Knowledge Base

Ground your agent’s answers in your own documents with retrieval-augmented generation

A knowledge base is a bundle of documents (PDF, plain text, markdown, or HTML) that your voice agent can consult during a call. You upload once; the server extracts, chunks, embeds, and indexes the content. Every agent attached to the knowledge base gets a built-in search_knowledge tool that retrieves the most relevant excerpts in real time.

Why use it

The LLM only knows what’s in its prompt. If you need it to answer from product manuals, policy documents, an FAQ, or internal runbooks, inlining everything into the system prompt is expensive and doesn’t scale past a few pages. A knowledge base gives the agent a cheap, fast way to look up exactly the passage it needs, when it needs it.

Create a knowledge base

```python
from speechify import Speechify

client = Speechify()

kb = client.tts.knowledge_bases.create(
    name="Product Handbook",
    description="Manuals, FAQs, troubleshooting",
)
print(kb.id)
```

Upload a document

Multipart upload. Max 10 MB per file.

```python
with open("manual.pdf", "rb") as f:
    doc = client.tts.knowledge_bases.upload_document(
        id=kb.id,
        file=f,
    )
print(doc.status, doc.chunk_count)
```

The response includes a `status` field that transitions from `embedding` to `ready` once every chunk is indexed. The upload call is synchronous; expect a few seconds per megabyte of input.

| Status | Meaning |
| --- | --- |
| `embedding` | Chunks are being embedded and inserted. |
| `ready` | All chunks indexed; the document is searchable. |
| `failed` | Extraction or embedding failed. See `error` for details. |
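If you want to wait for the `embedding`-to-`ready` transition before querying, a small poll loop is enough. The document-retrieval endpoint is not shown above, so this sketch takes any zero-argument status-returning callable rather than assuming a specific SDK method:

```python
import time

def wait_until_ready(fetch_status, timeout=60.0, interval=1.0):
    """Poll until the document is searchable.

    `fetch_status` is any zero-argument callable returning one of
    "embedding", "ready", or "failed" -- e.g. a lambda wrapping whatever
    document-retrieval call your SDK version exposes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status == "ready":
            return True
        if status == "failed":
            raise RuntimeError("document ingestion failed; check the error field")
        time.sleep(interval)
    raise TimeoutError("document not ready within timeout")
```

Since ingestion usually finishes within seconds, a short timeout with a sub-second interval is a reasonable default.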

Attach to an agent

```python
client.tts.agents.attach_knowledge_base(id=agent.id, kb_id=kb.id)
```

On the next conversation for that agent, search_knowledge is auto-registered as a function tool. The LLM decides when to call it based on the caller’s question; you don’t have to modify the agent prompt.

The tool is scoped to exactly the knowledge bases attached to the agent — it cannot query anything else, regardless of what the worker sends.
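For orientation, the auto-registered tool looks roughly like a standard function-tool definition. The exact schema is managed server-side; the field names below follow the common OpenAI function-tool shape and are an assumption, not the wire format:

```python
# Illustrative only: the server owns the real definition of this tool.
search_knowledge_tool = {
    "type": "function",
    "function": {
        "name": "search_knowledge",
        "description": (
            "Search the knowledge bases attached to this agent for "
            "passages relevant to the caller's question."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Natural-language search query.",
                },
            },
            "required": ["query"],
        },
    },
}
```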

Search via the API

You can also run semantic search directly, outside a conversation. Useful for UIs that want to show grounded snippets, or for verifying what the agent would retrieve.

```python
result = client.tts.knowledge_bases.search(
    kb_ids=[kb.id],
    query="what is the return policy for refurbished hardware",
    top_k=5,
)
for hit in result.hits:
    print(hit.filename, hit.score, hit.content[:120])
```

Each hit includes the source filename, the chunk content, and a cosine-similarity score. Scores are relative — use them for ranking, not as an absolute confidence metric.
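Because scores are only comparable within a single result set, one practical pattern is a relative cutoff against the top hit rather than a fixed absolute threshold. A minimal sketch, with plain dicts standing in for the SDK's hit objects:

```python
def filter_hits(hits, rel_threshold=0.8):
    """Keep hits scoring within `rel_threshold` of the best hit.

    Scores are relative to the query and result set, so the cutoff
    is anchored to the top score instead of a fixed absolute value.
    """
    if not hits:
        return []
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
    floor = ranked[0]["score"] * rel_threshold
    return [h for h in ranked if h["score"] >= floor]

hits = [
    {"filename": "manual.pdf", "score": 0.52},
    {"filename": "faq.md", "score": 0.49},
    {"filename": "notes.txt", "score": 0.31},
]
print(filter_hits(hits))  # drops the 0.31 outlier, keeps the close pair
```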

How it works

  1. Extract — the server reads the upload and extracts text. PDFs use per-page parsing with graceful skip-on-error; HTML is stripped to plain text; markdown and plain text pass through.
  2. Chunk — text is split into overlapping 1000-character windows with 200 characters of overlap. Chunk boundaries prefer paragraph breaks, then sentence ends, then spaces, so each chunk reads as a coherent passage.
  3. Embed — chunks are embedded in batches with OpenAI text-embedding-3-large (1536 dimensions via Matryoshka truncation).
  4. Index — embeddings land in Postgres pgvector with a cosine-distance IVFFlat index. Search is ANN (approximate nearest-neighbor), sub-50ms on the indexed path.
  5. Query — at call time, the search_knowledge tool sends the user’s question to the server. The server embeds the query, runs the ANN search, and returns the top-k chunks with filenames and scores for the LLM to quote.
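The chunking step above can be sketched as follows. This is an approximation of the described behavior (1000-character windows, 200 characters of overlap, boundary preference paragraph break, then sentence end, then space), not the server's exact implementation:

```python
def chunk_text(text, size=1000, overlap=200):
    """Split text into overlapping windows, preferring natural boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            window = text[start:end]
            # Prefer a paragraph break, then a sentence end, then a space,
            # but only if the boundary is not in the first half of the window.
            for sep in ("\n\n", ". ", " "):
                cut = window.rfind(sep)
                if cut > size // 2:
                    end = start + cut + len(sep)
                    break
        chunks.append(text[start:end].strip())
        if end >= len(text):
            break
        start = end - overlap  # step back to create the overlap
    return chunks
```

The half-window guard keeps a degenerate early boundary from producing tiny chunks, and stepping back by `overlap` means each chunk repeats the tail of the previous one, so a passage split across a boundary still appears whole in at least one chunk.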

Tips

  • One knowledge base per topic. A “Product Manuals” KB and a “Billing Policies” KB will retrieve more relevantly than a single “Everything” KB, because ANN search ranks within the whole pool.
  • Curate your source documents. Out-of-date or contradictory documents will surface; the retriever has no way to know which version is correct.
  • Expect a little added latency on retrieval turns. The search_knowledge tool adds one embedding round-trip and one DB query to the turn; in our measurements this is typically 200-500ms, noticeable but not disruptive.
  • Monitor the transcript. Every search_knowledge call is logged as a role=tool message on the conversation, including the query the LLM used and the chunks returned. If the agent is answering incorrectly, that’s the first place to look.
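Pulling those tool messages out of a transcript for inspection is a short filter. The message shape here, a list of role/content dicts, is an assumption for illustration; adapt it to whatever structure your transcript endpoint returns:

```python
def tool_messages(transcript):
    """Return only the role=tool entries from a conversation transcript."""
    return [m for m in transcript if m.get("role") == "tool"]

# Hypothetical transcript excerpt for illustration.
transcript = [
    {"role": "user", "content": "Do refurbished units have a warranty?"},
    {"role": "tool", "content": "query='refurbished warranty' -> 3 chunks"},
    {"role": "assistant", "content": "Yes, refurbished units carry a 90-day warranty."},
]
for m in tool_messages(transcript):
    print(m["content"])
```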