Speech (TTS)

marmot speak: text in, audio out. It is TTY-aware, so it plays audio on a terminal and emits raw bytes when piped.

marmot speak <text> [flags…]

Providers: openai, openrouter, vercel, cloudflare. On first run, marmot detects available API keys in the env and auto-configures a default in this order: openrouter, vercel, cloudflare, openai. Override any time with marmot setup, marmot config set, or --provider.
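
The first-run detection order can be pictured as a first-match scan over environment variables. The sketch below is an illustration only, not marmot's implementation: the function name detect_provider and all four variable names are assumptions chosen for the example, so check marmot setup for the names it actually reads.

```shell
#!/bin/sh
# Hypothetical sketch: the first provider whose key is set wins.
# The env var names below are assumptions, not confirmed marmot names.
detect_provider() {
  if   [ -n "${OPENROUTER_API_KEY:-}" ];   then echo openrouter
  elif [ -n "${VERCEL_API_KEY:-}" ];       then echo vercel
  elif [ -n "${CLOUDFLARE_API_TOKEN:-}" ]; then echo cloudflare
  elif [ -n "${OPENAI_API_KEY:-}" ];       then echo openai
  else
    # No key found: report it and signal failure to the caller.
    echo none
    return 1
  fi
}
```

With only an OpenAI key set, this sketch would pick openai; adding an OpenRouter key would switch the result to openrouter, because it is checked first.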

Output

Default behavior is TTY-aware:

Invocation                     Output
marmot speak '...' (terminal)  Plays through speakers (writes a temp file, plays in foreground, deletes after).
marmot speak '...' > out.mp3   Writes raw audio bytes to stdout (auto-binary).
marmot speak '...' | next      Same: bytes on stdout.
marmot speak '...' -o hi.mp3   Writes to hi.mp3, prints the path.
marmot speak '...' --play      Plays. When piped, also emits bytes downstream so the pipeline continues.
marmot speak '...' --binary    Forces raw bytes regardless.
marmot speak '...' --b64       JSON envelope with inline base64.
marmot speak '...' --json      Writes file, emits full JSON envelope.
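
The default, flag-free behavior comes down to one question: is stdout a terminal? A minimal sketch of that check in POSIX shell follows. The function name output_mode is hypothetical, and this is not marmot's own code, only the standard way a script can make the same decision.

```shell
#!/bin/sh
# Sketch of TTY-aware output selection; not marmot's actual code.
# `[ -t 1 ]` is true when file descriptor 1 (stdout) is a terminal.
output_mode() {
  if [ -t 1 ]; then
    echo play    # interactive terminal: play through speakers
  else
    echo bytes   # redirected or piped: write raw audio to stdout
  fi
}
```

Called directly in an interactive shell this prints play; inside a pipeline or a redirect it prints bytes, which mirrors the first three rows of the table above.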

Examples

marmot speak 'Hello from marmot'                 # plays on TTY
marmot speak 'Hola mundo' --provider cloudflare --model @cf/myshell-ai/melotts
marmot speak 'Welcome' --voice nova -o ./hello.mp3

# Pipe bytes to a player
marmot speak 'Hello' | mpv -

# Play AND continue piping (e.g. round-trip transcribe)
marmot speak 'Hello from marmot' --play | marmot transcribe

# Steerable voice
marmot speak 'Welcome aboard' --model gpt-4o-mini-tts --voice ash \
  --instructions 'cheerful, slow, slightly British'

Flags

For cross-cutting flags see Common flags. Speak-specific:

Flag                      Description
--model <id>              Speech model. Defaults to provider's default.
--voice <name>            Voice id (provider-specific).
--format <fmt>            Audio format: mp3 (default), wav, flac, aac, opus.
--speed <n>               Playback speed multiplier (0.25–4.0). OpenAI only.
--instructions <text>     Steering text for steerable voices (e.g. gpt-4o-mini-tts).
-o, --output <path>       Output audio path.
-p, --prompt-file <path>  Read text from a file.
--play                    Play through speakers. Default on a TTY. When piped, also emits bytes downstream.
--wait                    With --play, block until playback finishes.
--binary                  Force raw audio bytes to stdout.
--b64                     JSON envelope with inline base64.
--json                    JSON envelope on stdout (instead of just the path).

--binary and --b64 are mutually exclusive. --play can be combined with --binary or with a pipe; this is the "play AND continue piping" mode shown in the examples.
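
For wrapper scripts that build a marmot speak command line, the exclusivity rule can be checked before the call. The sketch below is a hypothetical helper (check_flags and its error text are assumptions, not marmot output) that rejects --binary together with --b64 and lets every other combination through, including --play with --binary.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a wrapper script; not marmot's own code.
check_flags() {
  binary=0
  b64=0
  for arg in "$@"; do
    case "$arg" in
      --binary) binary=1 ;;
      --b64)    b64=1 ;;
    esac
  done
  # Only the --binary + --b64 pair is rejected.
  if [ "$binary" -eq 1 ] && [ "$b64" -eq 1 ]; then
    echo "error: --binary and --b64 are mutually exclusive" >&2
    return 1
  fi
  return 0
}
```

A wrapper might run check_flags "$@" first and only invoke marmot speak "$@" when it returns success.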