Speech To Text Entity

A speech to text (STT) entity allows other integrations or applications to stream speech data to the STT API and get text back.

A speech to text entity is derived from homeassistant.components.stt.SpeechToTextEntity.



Properties

Properties should only return information from memory and must not do I/O (such as network requests).

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| supported_languages | list[str] | Required | The supported languages of the STT service. |
| supported_formats | list[AudioFormats] | Required | The supported audio formats of the STT service, wav or ogg. |
| supported_codecs | list[AudioCodecs] | Required | The supported audio codecs of the STT service, pcm or opus. |
| supported_bit_rates | list[AudioBitRates] | Required | The supported audio bit rates of the STT service, 8, 16, 24 or 32. |
| supported_sample_rates | list[AudioSampleRates] | Required | The supported audio sample rates of the STT service. |
| supported_channels | list[AudioChannels] | Required | The supported audio channels of the STT service, 1 or 2. |
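As a rough illustration of the properties listed above, the sketch below declares each one as a read-only property that returns a constant held in memory. To keep it runnable without Home Assistant installed, the enums and the base class are hypothetical stand-ins; their member names and the 16000 Hz sample rate are assumptions chosen for the example, and a real integration would use the types provided by Home Assistant's stt component instead.

```python
from enum import Enum


# Hypothetical stand-ins so the sketch runs on its own.
# Member names are illustrative, not taken from Home Assistant.
class AudioFormats(str, Enum):
    WAV = "wav"
    OGG = "ogg"


class AudioCodecs(str, Enum):
    PCM = "pcm"
    OPUS = "opus"


class AudioBitRates(int, Enum):
    BITRATE_16 = 16


class AudioSampleRates(int, Enum):
    SAMPLERATE_16000 = 16000  # assumed example value


class AudioChannels(int, Enum):
    CHANNEL_MONO = 1
    CHANNEL_STEREO = 2


class SpeechToTextEntity:
    """Stand-in for the Home Assistant base class."""


class MySpeechToTextEntity(SpeechToTextEntity):
    """Every property returns in-memory data only, with no I/O."""

    @property
    def supported_languages(self) -> list[str]:
        return ["en", "de"]

    @property
    def supported_formats(self) -> list[AudioFormats]:
        return [AudioFormats.WAV]

    @property
    def supported_codecs(self) -> list[AudioCodecs]:
        return [AudioCodecs.PCM]

    @property
    def supported_bit_rates(self) -> list[AudioBitRates]:
        return [AudioBitRates.BITRATE_16]

    @property
    def supported_sample_rates(self) -> list[AudioSampleRates]:
        return [AudioSampleRates.SAMPLERATE_16000]

    @property
    def supported_channels(self) -> list[AudioChannels]:
        return [AudioChannels.CHANNEL_MONO]


if __name__ == "__main__":
    entity = MySpeechToTextEntity()
    print(entity.supported_languages, entity.supported_formats)
```

Returning constants keeps the properties cheap to read, which is the point of the no-I/O rule: callers can query capabilities at any time without triggering a network request.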


Process audio stream

The process audio stream method is used to send audio to an STT service and get text back.
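The streaming idea can be sketched in isolation. The example below assumes the method receives a metadata object and an asynchronous stream of audio bytes and hands back a result object; SpeechMetadata, SpeechResult, FakeSpeechToTextEntity and fake_microphone are simplified, hypothetical stand-ins written for this sketch, and the "transcription" is only a byte count rather than a call to a real STT service.

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechMetadata:
    """Hypothetical, simplified description of the incoming audio."""

    language: str
    sample_rate: int


@dataclass
class SpeechResult:
    """Hypothetical, simplified result handed back to the caller."""

    text: Optional[str]
    success: bool


class FakeSpeechToTextEntity:
    """Stand-in entity: consumes the stream chunk by chunk."""

    async def async_process_audio_stream(
        self, metadata: SpeechMetadata, stream: AsyncIterable[bytes]
    ) -> SpeechResult:
        received = 0
        # Chunks are handled as they arrive instead of being loaded at once.
        async for chunk in stream:
            received += len(chunk)
        if received == 0:
            return SpeechResult(text=None, success=False)
        # A real entity would forward the audio to its STT service here.
        text = f"received {received} bytes ({metadata.language})"
        return SpeechResult(text=text, success=True)


async def fake_microphone() -> AsyncIterator[bytes]:
    """Yield three chunks of silence to imitate a live audio source."""
    for _ in range(3):
        yield b"\x00" * 320
        await asyncio.sleep(0)


async def main() -> SpeechResult:
    entity = FakeSpeechToTextEntity()
    metadata = SpeechMetadata(language="en", sample_rate=16000)
    return await entity.async_process_audio_stream(metadata, fake_microphone())


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In Home Assistant the method is a member of the entity class, as in the following skeleton.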

class MySpeechToTextEntity(SpeechToTextEntity):
    """Represent a Speech To Text entity."""

    async def async_process_audio_stream(self) -> None:
        """Process an audio stream to STT service.

        Only streaming content is allowed!