Talk API — Voice Input with AI Cleaning
The Talk API combines real-time speech-to-text with LLM-powered transcript cleaning. Send raw audio via WebSocket, get polished, publication-ready text back. Perfect for push-to-talk interfaces, voice notes, and dictation features.
```python
from sayd_ai import Sayd

client = Sayd(api_key="sk-your-key")

# Create a Talk session with config
session = client.talk.create(
    language="multi",            # Recommended. "auto", "en", "zh", or "multi" (multilingual)
    sample_rate=16000,           # Recommended. 8000 also supported.
    codec="pcm16",               # "pcm16" or "opus"
    cleaning_level="standard",   # "light", "standard", "aggressive"
    output_format="paragraph",   # "paragraph", "bullets", "raw"
)

# Real-time streaming from microphone
for event in session.stream_microphone():
    if event.type == "partial":
        print(f"\r {event.text}", end="", flush=True)
    elif event.type == "sentence":
        print(f"\n[final] {event.text}")
    elif event.type == "cleaned":
        print("\n--- AI Cleaned ---")
        print(event.cleaned_text)

# Also supports streaming from file:
# for event in session.stream_file("recording.wav"):
#     ...

print(f"Duration: {session.duration_minutes:.1f} min")
print(f"Cost: ${session.cost_usd:.4f}")
```

Parameters
| Parameter | Values | Description |
|---|---|---|
| language | "auto" | Auto-detect language (single language per session) |
| | "en" | English (also accepts en-US, en-GB, etc.) |
| | "zh" | Chinese (also accepts zh-CN, zh-TW, etc.) |
| | "multi" | Recommended. Multi-language mode — automatically detects and switches between languages within the same session. |
| sample_rate | 8000 / 16000 | 8000 or 16000 Hz (16000 recommended for best accuracy) |
| codec | "pcm16" / "opus" | Audio codec of the streamed frames |
| cleaning_level | "light" / "standard" / "aggressive" | How heavily the LLM cleans the transcript |
| output_format | "paragraph" / "bullets" / "raw" | Layout of the AI-cleaned text |
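With codec="pcm16", the service expects raw binary PCM16 frames at the configured sample rate. If your capture pipeline produces float samples in [-1.0, 1.0], a standard-library conversion sketch (the helper name is illustrative, not part of the SDK; little-endian byte order is an assumption here, as the usual PCM16 wire format, not something this doc specifies):

```python
import struct

def floats_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to little-endian PCM16 bytes."""
    clamped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clamped]
    return struct.pack("<%dh" % len(ints), *ints)

frame = floats_to_pcm16([0.0, 0.5, -0.5, 1.0])  # 4 samples -> 8 bytes
```

Each such byte string can be sent as one binary WebSocket frame.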
WebSocket Protocol
After creating a session via POST /api/talk, connect to the returned WebSocket URL to stream audio and receive results. The websocket_url already contains the API key as a query parameter — no additional authentication is needed for the WebSocket connection.
```text
# WebSocket Protocol — Talk API

## Connect
# POST /api/talk returns a pre-authenticated websocket_url.
# The API key is embedded as a query parameter — just connect directly:

ws = WebSocket(websocket_url)

# Example URL: wss://api2.memorion.me/v1/talk/stream/{session_id}?api_key=...&external_user_id=...
# No additional auth headers needed for WebSocket.

## Server Messages (you receive)
{"type": "ready"}                    # Session ready, start sending audio
{"type": "partial", "text": "..."}   # Interim transcript (may change)
{"type": "sentence", "segments": []} # Final transcript segment
{"type": "cleaned", "text": "..."}   # ✨ AI-cleaned result
{"type": "complete", ...}            # Session complete

## Client Messages (you send)
[binary PCM16 frames]   # Raw audio data
{"type": "end"}         # Signal end of recording
{"type": "keepalive"}   # Keep connection alive (send every ~15 seconds)

## Keepalive
# Send {"type": "keepalive"} every 15 seconds to prevent the WebSocket
# from timing out. The server does not reply to keepalive messages.

## End Signal & Drain Window
# After you send {"type": "end"}, the server enters a 500ms drain window:
#   1. Any audio data still in transit is accepted and processed (not discarded)
#   2. After 500ms, the server stops accepting new audio
#   3. The STT engine finalizes the remaining transcript
#   4. LLM cleaning runs on the complete transcript
#   5. Server sends {"type": "cleaned", "text": "..."} followed by {"type": "complete"}
# No client-side delay needed — just send "end" and wait for results.
```

API Endpoints
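Putting the protocol together: a minimal raw-client sketch. The third-party `websockets` package and the helper names are assumptions on my part; the doc specifies only the message shapes, and the official SDK above handles all of this for you. Keepalive is omitted for brevity; for long recordings, send `{"type": "keepalive"}` every ~15 seconds from a background task.

```python
import json

def encode_control(msg_type):
    """Serialize a client control message: {"type": "end"} or {"type": "keepalive"}."""
    return json.dumps({"type": msg_type})

async def stream_audio(websocket_url, frames):
    """Send PCM16 frames, signal end, and collect results until "complete"."""
    import websockets  # third-party: pip install websockets (an assumption)

    results = {"sentences": [], "cleaned": None}
    async with websockets.connect(websocket_url) as ws:
        # Wait for the server's ready signal before sending audio
        while json.loads(await ws.recv()).get("type") != "ready":
            pass
        for frame in frames:                  # binary PCM16 frames
            await ws.send(frame)
        await ws.send(encode_control("end"))  # starts the 500ms drain window
        async for raw in ws:
            msg = json.loads(raw)
            if msg["type"] == "sentence":
                results["sentences"].append(msg.get("segments", []))
            elif msg["type"] == "cleaned":
                results["cleaned"] = msg["text"]
            elif msg["type"] == "complete":
                break                         # server is done; safe to close
    return results

# Usage: asyncio.run(stream_audio(websocket_url, frames))
```

Note that "partial" events are simply ignored here; a real UI would render them as interim text, as the SDK example does.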
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/talk | Create a Talk session (returns websocket_url) |
| WS | websocket_url | Stream audio with AI cleaning (URL from POST response) |
| GET | /api/talk | List Talk sessions |
| GET | /api/talk/{id} | Get Talk session details & results |
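For clients not using the SDK, the session-creation step can be sketched with the third-party `requests` package. The base URL is inferred from the example WebSocket host and the bearer auth header is a guess; this doc shows neither, so treat both as assumptions to verify.

```python
API_BASE = "https://api2.memorion.me"  # assumption: same host as the example WebSocket URL

def build_talk_config(language="multi", sample_rate=16000, codec="pcm16",
                      cleaning_level="standard", output_format="paragraph"):
    """Validate and assemble the session config from the Parameters table."""
    if sample_rate not in (8000, 16000):
        raise ValueError("sample_rate must be 8000 or 16000")
    if codec not in ("pcm16", "opus"):
        raise ValueError("codec must be 'pcm16' or 'opus'")
    return {
        "language": language,
        "sample_rate": sample_rate,
        "codec": codec,
        "cleaning_level": cleaning_level,
        "output_format": output_format,
    }

def create_talk_session(api_key, **kwargs):
    """POST /api/talk and return the pre-authenticated websocket_url."""
    import requests  # third-party: pip install requests

    resp = requests.post(
        f"{API_BASE}/api/talk",
        headers={"Authorization": f"Bearer {api_key}"},  # assumption: bearer auth
        json=build_talk_config(**kwargs),
    )
    resp.raise_for_status()
    return resp.json()["websocket_url"]
```

The returned `websocket_url` is then passed straight to the WebSocket client; no further auth is needed, per the protocol notes above.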