Text-to-Speech

Lets the agent reply in a natural, human-sounding voice. The "play this message" speaker button in the web chat uses this feature.

URL: /settings?tab=tts

Enabled

Master toggle. When off, the speaker icons disappear from the UI.

json
{ "tts": { "enabled": false } }

Provider

Which backend generates the audio. The openai and mistral options reference an entry you already configured on the Providers page, and the API key and base URL are read from there. The deepgram option is standalone and uses its own API key.

| Value | Notes |
| --- | --- |
| openai | Uses an OpenAI API-Key provider from providers.json. Calls its /v1/audio/speech endpoint with one of OpenAI's TTS models. |
| mistral | Uses a Mistral API-Key provider from providers.json. Synthesises with Mistral's Voxtral voices. |
| deepgram | Hosted Deepgram Aura voices. Standalone: does not use a provider entry. Uses the API key configured directly in the Deepgram card below (stored in tts.deepgramApiKey). |

The dropdown lists each matching provider as its own entry, e.g. OpenAI (My OpenAI) or Mistral Voxtral (Mistral Main). If no matching provider is configured, the option appears disabled.

The selected provider's id is stored in tts.providerId; tts.provider stores only the backend type:

json
{ "tts": { "provider": "openai", "providerId": "openai-main" } }
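The lookup implied by these two fields can be sketched as follows. This is a minimal illustration, not Axiom's actual code: it assumes providers.json is a list of objects with `id` and `type` keys, and the function name `resolve_tts_provider` is hypothetical.

```python
def resolve_tts_provider(settings, providers):
    """Return the provider entry selected by tts.providerId.

    Assumption: `providers` is the parsed content of providers.json, modelled
    here as a list of dicts with "id" and "type" keys (hypothetical schema).
    """
    tts = settings["tts"]
    backend = tts["provider"]
    if backend == "deepgram":
        # Deepgram is standalone: its key lives in tts.deepgramApiKey,
        # so there is no provider entry to resolve.
        return None
    for entry in providers:
        if entry.get("id") == tts.get("providerId") and entry.get("type") == backend:
            return entry
    raise LookupError(f"no {backend} provider with id {tts.get('providerId')!r}")


providers = [{"id": "openai-main", "type": "openai", "name": "My OpenAI"}]
settings = {"tts": {"provider": "openai", "providerId": "openai-main"}}
selected = resolve_tts_provider(settings, providers)
```

Keeping the backend type and the entry id in separate fields lets several entries of the same type coexist, which matches the per-entry dropdown described above.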

OpenAI model

Shown when provider is openai. The model is sent to the selected provider's /v1/audio/speech endpoint.

| Value | Notes |
| --- | --- |
| gpt-4o-mini-tts | Newer, supports tone/style instructions. Recommended default. |
| tts-1 | Classic, fast, cheap. |
| tts-1-hd | Higher-fidelity version of tts-1. |
json
{ "tts": { "openaiModel": "gpt-4o-mini-tts" } }

OpenAI voice

Shown when provider is openai. One of OpenAI's preset voices (alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, …). The list is loaded live from the OpenAI catalog so the options stay in sync if OpenAI ships new voices.

json
{ "tts": { "openaiVoice": "nova" } }

OpenAI instructions

Only shown for gpt-4o-mini-tts. Free-form tone/style guidance for the voice model, e.g.:

text
Speak calmly, with a slight Viennese accent, medium pace.

Leave empty for the neutral default.

json
{ "tts": { "openaiInstructions": "" } }
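To show how the three OpenAI settings fit together, here is a hedged sketch of a request body for /v1/audio/speech. The field names `model`, `voice`, `input`, `instructions` and `response_format` are those of OpenAI's speech endpoint; the helper name `build_openai_speech_body` is hypothetical, and how Axiom actually assembles the request is an assumption.

```python
def build_openai_speech_body(tts, text):
    """Build a JSON body for POST /v1/audio/speech from the tts settings.

    Illustrative only: the mapping from settings keys to request fields
    is an assumption, not taken from Axiom's source.
    """
    body = {
        "model": tts["openaiModel"],
        "voice": tts["openaiVoice"],
        "input": text,
        "response_format": tts.get("responseFormat", "mp3"),
    }
    instructions = tts.get("openaiInstructions", "").strip()
    # Instructions only apply to gpt-4o-mini-tts; an empty string means
    # "use the neutral default", so the field is omitted entirely.
    if tts["openaiModel"] == "gpt-4o-mini-tts" and instructions:
        body["instructions"] = instructions
    return body


tts = {
    "openaiModel": "gpt-4o-mini-tts",
    "openaiVoice": "nova",
    "openaiInstructions": "Speak calmly, with a slight Viennese accent, medium pace.",
    "responseFormat": "mp3",
}
body = build_openai_speech_body(tts, "Hello from the agent.")
```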

Mistral voice

Shown when provider is mistral. The UI splits the choice into two side-by-side dropdowns:

  • Speaker — one of the available Voxtral speakers, annotated with language (e.g. Nadia (German), Theo (English)).
  • Mood — emotional color (neutral, happy, serious, …).

The list is fetched live from the Voxtral catalog when you open the panel. The two selections are combined into a single voice id and stored in tts.mistralVoice:

json
{ "tts": { "mistralVoice": "nadia-neutral" } }
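The combination step might look like the sketch below. The lowercase `speaker-mood` pattern is inferred solely from the `nadia-neutral` example above; real Voxtral catalog ids may follow a different scheme, and both function names are hypothetical.

```python
def mistral_voice_id(speaker, mood):
    """Combine the two dropdown selections into one voice id.

    Assumption: ids are lowercase and joined with a hyphen, as in the
    single documented example "nadia-neutral".
    """
    return f"{speaker.strip().lower()}-{mood.strip().lower()}"


def split_mistral_voice_id(voice_id):
    """Inverse of mistral_voice_id: split on the last hyphen."""
    speaker, _, mood = voice_id.rpartition("-")
    return speaker, mood


voice = mistral_voice_id("Nadia", "neutral")
```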

Deepgram API key

Shown when provider is deepgram. The API key used to authenticate against Deepgram. Stored encrypted at rest in tts.deepgramApiKey. Get a free key at console.deepgram.com.

The field is rendered as a password input. Once saved, it shows a masked preview (e.g. dg_••••••abcd) — leave the masked value untouched to keep the existing key.

json
{ "tts": { "deepgramApiKey": "dg_..." } }
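The masked-preview behavior can be sketched as follows. This is an illustration under assumptions: the mask keeps a three-character prefix and the last four characters, matching the `dg_••••••abcd` example, and the helper names `mask_key` and `key_to_store` are hypothetical. Encryption at rest is out of scope here.

```python
BULLET = "\u2022"  # the bullet character used in the masked preview


def mask_key(key, prefix_len=3, visible=4):
    """Return a preview such as dg_••••••abcd (fixed run of six bullets)."""
    if len(key) <= prefix_len + visible:
        # Too short to reveal anything safely: mask every character.
        return BULLET * len(key)
    return key[:prefix_len] + BULLET * 6 + key[-visible:]


def key_to_store(submitted, existing):
    """Keep the existing key when the form still holds the untouched mask."""
    if existing and submitted == mask_key(existing):
        return existing
    return submitted


saved = "dg_1234567890abcd"
preview = mask_key(saved)
```

Comparing the submitted value against the mask is what makes "leave the masked value untouched" safe: the preview is never written back over the real key.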

Deepgram voice

Shown when provider is deepgram. Deepgram bundles voice and language into a single Aura model id (e.g. aura-2-thalia-en for an English voice, aura-2-ophelia-de for German), so picking a voice and picking a language is one decision.

The dropdown is pre-populated with a small list of common Aura voices. Click the refresh icon next to the dropdown to fetch the full, up-to-date voice catalog from your Deepgram account — useful when Deepgram releases new voices. Refreshing requires the API key to be saved first.

Stored in tts.deepgramModel:

json
{ "tts": { "deepgramModel": "aura-2-thalia-en" } }
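Because voice and language share one id, a UI that wants to label or group voices has to take the id apart. A minimal sketch, assuming every id follows the `family-voice-language` pattern seen in the two examples above (Deepgram does not guarantee this, and `parse_aura_model` is a hypothetical helper):

```python
def parse_aura_model(model_id):
    """Split an Aura model id into family, voice and language.

    Assumption: the last segment is the language code and the one before
    it is the voice name, as in "aura-2-thalia-en".
    """
    parts = model_id.split("-")
    if len(parts) < 3 or parts[0] != "aura":
        raise ValueError(f"not an Aura model id: {model_id!r}")
    return {
        "family": "-".join(parts[:-2]),
        "voice": parts[-2],
        "language": parts[-1],
    }


info = parse_aura_model("aura-2-thalia-en")
```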

Voice preview

A text field and a play button sit at the bottom of each provider block. Enter any text and click the speaker icon to hear it with the current settings applied immediately; there is no need to save first. Click the same button again to stop playback.

Audio format

Output container format used for the synthesized audio in the web chat.

| Value | Notes |
| --- | --- |
| mp3 | Universal default. Supports long-text chunking with Deepgram (see below). |
| wav | Uncompressed, large. Supports long-text chunking with Deepgram (PCM is concatenated and wrapped in a single WAV header). |
| opus | Small. With Deepgram, limited to 2000 characters per request (see below). |
| flac | Lossless. With Deepgram, limited to 2000 characters per request (see below). |
json
{ "tts": { "responseFormat": "mp3" } }

Deepgram long-text behavior

Deepgram's /v1/speak endpoint rejects any single request longer than 2000 characters. To make the “Read message aloud” button work for long assistant replies, Axiom transparently splits the input on sentence boundaries and synthesizes each chunk separately, then concatenates the result.
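The splitting step can be sketched roughly as below. This is not Axiom's implementation: the sentence-boundary regex, the greedy packing strategy, and the hard cut for a single oversized sentence are all assumptions, and `split_for_deepgram` is a hypothetical name.

```python
import re

MAX_CHARS = 2000  # per-request limit stated for Deepgram's /v1/speak


def split_for_deepgram(text, limit=MAX_CHARS):
    """Greedily pack whole sentences into chunks of at most `limit` chars."""
    # Split after ., ! or ? followed by whitespace (a simple heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


long_reply = "A. " * 1500
chunks = split_for_deepgram(long_reply)
```

Each chunk is then sent as its own request, and the audio responses are joined in order.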

Which formats this works for depends on whether the audio container can be safely concatenated byte-for-byte:

  • mp3 — frame-aligned; chunked output is concatenated directly.
  • wav — Deepgram returns headerless PCM (linear16); chunks are concatenated as raw samples and wrapped in a single WAV header.
  • opus and flac — use page/frame containers that do not survive naive concatenation. Inputs ≤ 2000 characters still work normally; longer inputs are rejected with an actionable error asking you to switch to mp3 or wav in Settings → Text-to-Speech.
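The per-format rules above can be sketched with Python's standard `wave` module. The sample rate and channel count are assumptions that must match whatever was requested from Deepgram, and `assemble_audio` is a hypothetical helper; for wav, the chunks are taken to be headerless 16-bit PCM as described.

```python
import io
import wave

CHUNKABLE = {"mp3", "wav"}  # formats the text says can be concatenated


def assemble_audio(chunks, fmt, sample_rate=24000, channels=1):
    """Join per-chunk audio bytes into one playable payload."""
    if len(chunks) > 1 and fmt not in CHUNKABLE:
        raise ValueError(
            f"{fmt} cannot be chunked; switch to mp3 or wav "
            "in Settings \u2192 Text-to-Speech"
        )
    if fmt == "wav":
        # Chunks are raw linear16 samples: concatenate them and write
        # a single WAV header around the result.
        buf = io.BytesIO()
        with wave.open(buf, "wb") as writer:
            writer.setnchannels(channels)
            writer.setsampwidth(2)  # 16-bit samples
            writer.setframerate(sample_rate)
            writer.writeframes(b"".join(chunks))
        return buf.getvalue()
    # mp3 is frame-aligned, so plain byte concatenation is enough.
    return b"".join(chunks)


pcm_chunks = [b"\x00\x00" * 10, b"\x01\x00" * 5]
wav_bytes = assemble_audio(pcm_chunks, "wav")
```

A single opus or flac response passes through untouched, which mirrors the rule that inputs of at most 2000 characters still work in those formats.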

The OpenAI and Mistral providers don't have this 2000-character limit and are unaffected.

Released under the MIT License.