Success Story: Voxo AB SHAPE Access to Improve Swedish Text-to-Speech Algorithms

Organisations & Codes Involved

Voxo AB is a Stockholm-based startup that specializes in extracting, analysing, and visualising voice data. Their services are used in multiple industries to provide insights and enable data-driven business development.

Technical/scientific Challenge:

Tools such as Apple’s Siri, Amazon’s Alexa, and Google Home have brought text-to-speech capabilities to the masses. These conversational assistants respond to natural-language requests and reply in kind. They use machine-learning models trained on large amounts of recorded speech samples matched with the corresponding text. When the assistant wants to say something, the model is able to build new utterances that sound natural.
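To make the idea of "recorded speech samples matched with the corresponding text" concrete, here is a minimal sketch of how such paired training data might be represented and how text could be turned into the integer symbols a model consumes. The file paths, the example sentences, and the names Utterance, build_vocab, and encode are hypothetical, chosen only for illustration; they are not taken from Voxo's system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    audio_path: str  # hypothetical path to a recorded speech clip
    transcript: str  # the text spoken in that clip


def build_vocab(utterances: List[Utterance]) -> Dict[str, int]:
    """Map every character seen in the transcripts to an integer ID."""
    chars = sorted({ch for u in utterances for ch in u.transcript.lower()})
    return {ch: i for i, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Turn text into a sequence of IDs, skipping characters not in the vocabulary."""
    return [vocab[ch] for ch in text.lower() if ch in vocab]


# Made-up example pairs; a real corpus would hold many hours of speech.
corpus = [
    Utterance("clips/0001.wav", "Hej, hur mår du?"),
    Utterance("clips/0002.wav", "Ditt saldo är hundra kronor."),
]

vocab = build_vocab(corpus)
ids = encode("Hur är saldot?", vocab)
```

Building the vocabulary from the transcripts themselves means Swedish letters such as å and ä get their own symbols without any special handling.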

These big tech companies also provide APIs for such capabilities, with support for many languages. To use them, you send the text to the provider's server and receive the generated speech back. That is often fine, but not when the text contains someone’s personal data. In the EU, GDPR requires that such data be handled correctly, and in particular it restricts transferring the data outside the EU. A third-party API from a transnational company cannot provide the required transparency.

Business impact:

The speech model will be a key component of a conversational assistant capable of providing information in real time in response to spoken natural-language questions. It will be capable of learning to pronounce jargon relevant to particular domains, such as banking. More industries will be able to use the model to automate processes and improve current solutions.


Voxo is keen to build on its existing voice expertise to enter market sectors that need the capability to synthesize voices speaking Swedish. The generation must not compromise the integrity of the data, which might be personal to a user. Thus, existing programmatic APIs are unsuitable, and Voxo is building its own solution using HPC.

As first-time HPC users, Voxo applied to SHAPE, the pan-European program for introductory industrial HPC access. They were delighted to receive help from ENCCS in writing their proposal to build a Swedish-language text-to-speech capability. Ultimately they were awarded 25,000 core hours on the JUWELS Booster cluster. This cluster is housed at the Jülich Supercomputing Centre in Germany and includes over 3700 latest-model NVIDIA A100 GPUs. These will be invaluable for training text-to-speech models for Swedish.


The model will generate audio streams quickly, so that users will be comfortable with natural conversation flow, without pauses for generating long replies. It will be implemented using existing Tacotron and WaveGlow technology, as described in a blog post from NVIDIA. This will make personal banking much more accessible by removing requirements such as visiting a bank branch, or owning and being able to use a computer or mobile computing device.
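Tacotron and WaveGlow are typically combined as a two-stage pipeline: an acoustic model turns text into a mel spectrogram, and a vocoder turns that spectrogram into an audio waveform. The sketch below shows only that structure. Both stages are simple stand-in functions, not real neural networks, and the constants and function names are assumptions made for illustration rather than details of Voxo's implementation.

```python
import math
from typing import List

SAMPLE_RATE = 22050   # assumed output sample rate in Hz
HOP_LENGTH = 256      # assumed number of audio samples per spectrogram frame
N_MELS = 80           # assumed number of mel bands per frame
FRAMES_PER_CHAR = 5   # hypothetical fixed duration per character


def text_to_mel(text: str) -> List[List[float]]:
    """Stand-in for the acoustic model (Tacotron's role): text -> mel frames."""
    frames: List[List[float]] = []
    for ch in text:
        # Placeholder value derived from the character, not a learned prediction.
        level = (ord(ch) % N_MELS) / N_MELS
        frames.extend([level] * N_MELS for _ in range(FRAMES_PER_CHAR))
    return frames


def mel_to_audio(mel: List[List[float]]) -> List[float]:
    """Stand-in for the vocoder (WaveGlow's role): mel frames -> waveform samples."""
    audio: List[float] = []
    for frame in mel:
        freq = 100.0 + 400.0 * frame[0]  # placeholder pitch taken from the first band
        for _ in range(HOP_LENGTH):
            t = len(audio) / SAMPLE_RATE
            audio.append(math.sin(2 * math.pi * freq * t))
    return audio


def synthesize(text: str) -> List[float]:
    """Run both stages in sequence, entirely on the local machine."""
    return mel_to_audio(text_to_mel(text))


audio = synthesize("Hej")
duration_seconds = len(audio) / SAMPLE_RATE
```

Because both stages run locally, the input text never leaves the machine, which is the property that matters for the personal-data concern described in this story. Swapping the stand-ins for trained models would leave the overall shape of synthesize unchanged.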


  • Keywords: High-Performance Computing, HPC, Supercomputing, Engineering, Software Optimisation, EuroCC
  • Industry sector: IT/HPC systems, services & software providers, manufacturing & engineering, natural science
  • Technology: HPC, HPDA, AI


This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 951732. The JU receives support from the European Union’s Horizon 2020 research and innovation program and Germany, Bulgaria, Austria, Croatia, Cyprus, the Czech Republic, Denmark, Estonia, Finland, Greece, Hungary, Ireland, Italy, Lithuania, Latvia, Poland, Portugal, Romania, Slovenia, Spain, Sweden, the United Kingdom, France, the Netherlands, Belgium, Luxembourg, Slovakia, Norway, Switzerland, Turkey, Republic of North Macedonia, Iceland, and Montenegro.