Speech (TTS)

`marmot speak` turns text into audio. It is TTY-aware: on a terminal it plays the audio, and when piped or redirected it emits raw audio bytes.

```
marmot speak <text> [flags…]
```

Supported providers: `openai`, `openrouter`, `vercel`, `cloudflare`. On first run, marmot checks the environment for available API keys and auto-configures a default provider, trying them in this order: openrouter → vercel → cloudflare → openai. Override the default at any time with `marmot setup`, `marmot config set`, or `--provider`.
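
The precedence described above can be sketched as a small shell function. This is only an illustration of the ordering, not marmot's actual implementation: the function name `detect_provider` is hypothetical, and the environment variable names are assumptions rather than the names marmot necessarily reads.

```shell
# Sketch of the first-run precedence: openrouter → vercel → cloudflare → openai.
# ASSUMPTION: the variable names below are illustrative placeholders.
detect_provider() {
  if [ -n "${OPENROUTER_API_KEY:-}" ]; then
    echo openrouter
  elif [ -n "${VERCEL_API_KEY:-}" ]; then
    echo vercel
  elif [ -n "${CLOUDFLARE_API_TOKEN:-}" ]; then
    echo cloudflare
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo openai
  else
    return 1   # no key found, so there is nothing to auto-configure
  fi
}
```

The first match wins, so a machine with both an OpenAI key and a Cloudflare token set would default to cloudflare unless `--provider` says otherwise.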

Output

Default behavior is TTY-aware:

| Invocation | Output |
| --- | --- |
| `marmot speak '...'` (terminal) | Plays through speakers (writes a temp file, plays in foreground, deletes after). |
| `marmot speak '...' > out.mp3` | Writes raw audio bytes to stdout (auto-binary). |
| `marmot speak '...' \| next` | Same: raw bytes on stdout. |
| `marmot speak '...' -o hi.mp3` | Writes to `hi.mp3`, prints the path. |
| `marmot speak '...' --play` | Plays. When piped, also emits bytes downstream so the pipeline continues. |
| `marmot speak '...' --binary` | Forces raw bytes regardless. |
| `marmot speak '...' --b64` | JSON envelope with inline base64. |
| `marmot speak '...' --json` | Writes file, emits full JSON envelope. |
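
The TTY check behind this behavior is the standard `test -t 1` idiom, which asks whether stdout is a terminal. The sketch below mirrors part of the table for illustration; `choose_output` is a hypothetical helper, not a marmot command, and it does not model `-o` or `--play`.

```shell
# Sketch of TTY-aware output selection (hypothetical helper, not marmot code).
choose_output() {
  # $1 is an optional explicit flag: --binary, --b64, --json, or empty.
  case "${1:-}" in
    --binary) echo "raw-bytes" ;;
    --b64)    echo "json-base64" ;;
    --json)   echo "json-envelope" ;;
    "")
      if [ -t 1 ]; then
        echo "play"        # stdout is a terminal: play through speakers
      else
        echo "raw-bytes"   # stdout is a pipe or a file: emit audio bytes
      fi
      ;;
    *) echo "unknown flag: $1" >&2; return 2 ;;
  esac
}
```

Because the decision depends on where stdout points, the same command line can behave differently in an interactive shell and in a script; pass an explicit flag when the behavior must not vary.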

Examples

```shell
marmot speak 'Hello from marmot'                 # plays on TTY
marmot speak 'Hola mundo' --provider cloudflare --model @cf/myshell-ai/melotts
marmot speak 'Welcome' --voice nova -o ./hello.mp3

# Pipe bytes to a player
marmot speak 'Hello' | mpv -

# Play AND continue piping (e.g. round-trip transcribe)
marmot speak 'Hello from marmot' --play | marmot transcribe

# Steerable voice
marmot speak 'Welcome aboard' --model gpt-4o-mini-tts --voice ash \
  --instructions 'cheerful, slow, slightly British'
```

Flags

For cross-cutting flags see Common flags. Speak-specific:

| Flag | Description |
| --- | --- |
| `--model <id>` | Speech model. Defaults to provider's default. |
| `--voice <name>` | Voice id (provider-specific). |
| `--format <fmt>` | Audio format: `mp3` (default), `wav`, `flac`, `aac`, `opus`. |
| `--speed <n>` | Playback speed multiplier (0.25–4.0). OpenAI only. |
| `--instructions <text>` | Steering text for steerable voices (e.g. gpt-4o-mini-tts). |
| `-o, --output <path>` | Output audio path. |
| `-p, --prompt-file <path>` | Read text from a file. |
| `--play` | Play through speakers. Default on a TTY. When piped, also emits bytes downstream. |
| `--wait` | With `--play`, block until playback finishes. |
| `--binary` | Force raw audio bytes to stdout. |
| `--b64` | JSON envelope with inline base64. |
| `--json` | JSON envelope on stdout (instead of just the path). |

`--binary` and `--b64` are mutually exclusive. `--play` can be combined with `--binary` or with a pipe; this is the "play AND continue piping" mode shown in the examples.
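
A wrapper script that builds flag lists dynamically may want to catch the conflicting pair before invoking marmot. The following is a minimal sketch under that assumption; `check_speak_flags` is a hypothetical helper, and it does not reproduce whatever error marmot itself reports.

```shell
# Hypothetical pre-flight check: reject argument lists that contain both
# --binary and --b64. Note that it inspects every argument, so text that is
# literally "--binary" or "--b64" would also be counted.
check_speak_flags() {
  has_binary=0
  has_b64=0
  for arg in "$@"; do
    case "$arg" in
      --binary) has_binary=1 ;;
      --b64)    has_b64=1 ;;
    esac
  done
  if [ "$has_binary" -eq 1 ] && [ "$has_b64" -eq 1 ]; then
    echo "error: --binary and --b64 are mutually exclusive" >&2
    return 1
  fi
  return 0
}
```

A caller could then write `check_speak_flags "$@" && marmot speak "$@"`, so that `--play --binary` passes through while `--binary --b64` is rejected early.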
