Contributing your voice

You can help us and the rest of the open voice community develop speech-to-text and text-to-speech models for your language.

Speech-to-text

When you speak to a computer, it transcribes the audio from your voice into text. There are many ways to do this, but they all rely on recordings of people speaking.

For speech-to-text, it is important to have:

Many different speakers and accents
A variety of recording devices and quality levels
- Typically 16 kHz audio with 16-bit samples
Multiple recording environments, including different rooms and noise levels
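
If you record on your own equipment, it can help to confirm what format a file actually uses before you share it. The following is a minimal sketch using Python's standard `wave` module. The function names `describe_wav` and `matches_format` are illustrative and are not part of any project's tooling, and the sketch assumes an uncompressed PCM WAV file.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate(),
            wav_file.getsampwidth() * 8,  # sample width is reported in bytes
            wav_file.getnchannels(),
        )


def matches_format(path, sample_rate_hz=16000, bits_per_sample=16):
    """Check a file against an expected sample rate and bit depth.

    The defaults mirror the typical speech-to-text format mentioned above;
    they are an assumption for illustration, not a hard requirement.
    """
    rate, bits, _channels = describe_wav(path)
    return rate == sample_rate_hz and bits == bits_per_sample
```

Calling `matches_format("my_recording.wav")` with a hypothetical file name returns `True` only when the file is 16 kHz with 16-bit samples. Pass different values to check against another target format.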

We recommend that users contribute to Mozilla's Common Voice project for speech-to-text. This free and open dataset crowdsources spoken sentences from people around the world. Contributors can also help by validating existing recordings.

Text-to-speech

When a computer speaks to you, it synthesizes audio from text. A text-to-speech dataset has different requirements than a speech-to-text dataset:

A single speaker, or equal amounts of data for all speakers
A high quality recording device
- Typically 48 kHz audio with 32-bit samples
A quiet, controlled recording environment such as a sound-proof booth
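
For a dataset with several speakers, the "equal amounts of data" requirement can be checked by totalling the recorded time per speaker. The sketch below assumes you already have a list of `(speaker_id, duration_seconds)` pairs. That input shape and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def seconds_per_speaker(clips):
    """Sum clip durations for each speaker.

    clips: iterable of (speaker_id, duration_seconds) pairs (assumed shape).
    """
    totals = defaultdict(float)
    for speaker_id, duration_seconds in clips:
        totals[speaker_id] += duration_seconds
    return dict(totals)


def balance_ratio(totals):
    """Return the smallest speaker total divided by the largest.

    A value of 1.0 means every speaker contributed the same amount of audio;
    values near 0 mean one speaker dominates the dataset.
    """
    if not totals:
        raise ValueError("no clips were provided")
    largest = max(totals.values())
    if largest == 0:
        raise ValueError("all clips have zero duration")
    return min(totals.values()) / largest
```

A ratio well below 1.0 suggests that some speakers need more recordings before the dataset is balanced.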

We recommend that users contribute to the LibriVox project for text-to-speech. This provides not only training data for the open voice community, but also free audiobooks for everyone to enjoy. Importantly, the books that are read must be in the public domain.
