All API calls require an API key in the X-API-Key header, or a client session token in X-Client-Token. Obtain API keys from Dashboard → Settings → API Keys.
Note: Client endpoints require X-Client-Token. Admin endpoints require X-Admin-Token.
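A minimal sketch of picking the correct auth header per endpoint class. The header names are the ones listed above; the role labels ("api", "client", "admin") are illustrative, not part of the API.

```python
def auth_headers(role: str, credential: str) -> dict:
    """Map an endpoint class to the header it expects."""
    header_names = {
        "api": "X-API-Key",          # general API calls
        "client": "X-Client-Token",  # client endpoints
        "admin": "X-Admin-Token",    # admin endpoints
    }
    return {header_names[role]: credential}
```

Pass the returned dict as the `headers` argument of whatever HTTP client you use.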
| Field | Type | Required | Description |
|---|---|---|---|
| email | string | Required | User email address |
| password | string | Required | Account password |
{ "client_id": "clt-...", "token": "clt-...", "name": "John Doe", "email": "user@example.com" }
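A sketch of building the login body and pulling the session token out of the response shown above. The helper names are hypothetical; only the field names come from this doc.

```python
def login_payload(email: str, password: str) -> dict:
    """Request body for email/password login; both fields are required."""
    return {"email": email, "password": password}

def session_token(response: dict) -> str:
    """Extract the token to reuse as X-Client-Token on later calls."""
    return response["token"]
```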
| Field | Type | Required | Description |
|---|---|---|---|
| id_token | string | Required | Google ID token from Google Sign-In flow |
A Memory is an AI knowledge base that stores uploaded documents, websites, and videos. Each chatbot is backed by a distinct memory.
Header: X-Client-Token: {token}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Required | Chatbot memory name |
| description | string | Optional | Short description |
| provider | string | Optional | "gemini" (default) or "openai" |
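The create-memory fields above can be assembled like this; a minimal sketch that applies the documented "gemini" default and rejects unknown providers. The function name is illustrative.

```python
def create_memory_payload(name: str, description=None, provider: str = "gemini") -> dict:
    """Request body for creating a memory; only `name` is required."""
    if provider not in ("gemini", "openai"):
        raise ValueError("provider must be 'gemini' or 'openai'")
    body = {"name": name, "provider": provider}
    if description is not None:
        body["description"] = description
    return body
```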
All sources and chat history will be permanently deleted.
Send questions to a memory chatbot. The AI answers using only the knowledge stored in that memory.
| Field | Type | Required | Description |
|---|---|---|---|
| question | string | Required | User's question |
| top_k | integer | Optional | Number of chunks to retrieve (default: 5) |
| history | array | Optional | Chat history e.g. [{"role": "user", "content": "..."}] |
{ "answer": "Based on the knowledge base, ...", "sources": ["document.pdf", "website.com"], "latency_ms": 1240, "history_saved": true }
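A sketch of building the chat body and reading the cited sources from the response shape above. The helper names are hypothetical; the field names and the `top_k` default of 5 come from the table.

```python
def chat_payload(question: str, top_k: int = 5, history=None) -> dict:
    """Request body for the chat endpoint."""
    body = {"question": question, "top_k": top_k}
    if history:
        body["history"] = history  # e.g. [{"role": "user", "content": "..."}]
    return body

def cited_sources(response: dict) -> list:
    """Source filenames/domains the answer was grounded in."""
    return response.get("sources", [])
```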
| Query Param | Type | Default | Description |
|---|---|---|---|
| limit | integer | 10 | Max Q&A pairs to return |
| offset | integer | 0 | Pagination offset |
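The limit/offset pair above supports simple page-based iteration; a one-function sketch (0-based page index is my convention, not the API's):

```python
def page_params(page: int, limit: int = 10) -> dict:
    """Query params for the Nth page (0-based) of chat history."""
    return {"limit": limit, "offset": page * limit}
```

Request successive pages until fewer than `limit` Q&A pairs come back.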
Add knowledge to a memory by uploading PDFs, pasting web URLs, or providing YouTube links.
Content-Type: `multipart/form-data`. Form field: `file`
| Field | Type | Description |
|---|---|---|
| url | string | HTTP/HTTPS URL of the article/website |
| Field | Type | Description |
|---|---|---|
| url | string | YouTube video link |
| Field | Type | Description |
|---|---|---|
| url | string | HTTP/HTTPS URL of the JSON data |
| title | string | Descriptive title for the source |
| Field | Type | Required | Description |
|---|---|---|---|
| connection_string | string | Required | MongoDB URI (mongodb+srv://...) |
| database | string | Required | MongoDB Database Name |
| collection | string | Required | MongoDB Collection Name |
| title | string | Optional | Descriptive title for the source |
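A sketch of building the URL-based and MongoDB source bodies from the tables above. Field names are from this doc; the validation (HTTP/HTTPS prefix, MongoDB URI prefix, acceptance of plain `mongodb://` alongside `mongodb+srv://`) is my assumption.

```python
def url_source_payload(url: str, title=None) -> dict:
    """Body for website / YouTube / JSON-URL sources."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected an HTTP/HTTPS URL")
    body = {"url": url}
    if title is not None:
        body["title"] = title
    return body

def mongo_source_payload(connection_string: str, database: str,
                         collection: str, title=None) -> dict:
    """Body for a MongoDB source; the first three fields are required."""
    if not connection_string.startswith(("mongodb://", "mongodb+srv://")):
        raise ValueError("expected a MongoDB URI")
    body = {"connection_string": connection_string,
            "database": database, "collection": collection}
    if title is not None:
        body["title"] = title
    return body
```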
Generate short video reels from memory content or a specific answer. Placeholder videos are returned; integrate D-ID, HeyGen, or Runway ML for production use.
| Field | Type | Required | Description |
|---|---|---|---|
| topic | string | Optional | Topic/title for the reel (default: memory name) |
| style | string | Optional | "cinematic", "minimal", or "energetic" |
| content_text | string | Optional | Specific answer text to generate from (max 5000 chars) |
{ "success": true, "video_url": "https://...", "topic": "...", "style": "cinematic" }
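A sketch of assembling the reel-generation body with the constraints listed above. The 5000-character cap on `content_text` and the style names come from the table; treating "cinematic" as the default is an assumption based on the sample response.

```python
def reel_payload(topic=None, style: str = "cinematic", content_text=None) -> dict:
    """Body for reel generation; all fields are optional per the table above."""
    if style not in ("cinematic", "minimal", "energetic"):
        raise ValueError("unknown style")
    if content_text is not None and len(content_text) > 5000:
        raise ValueError("content_text exceeds the 5000-character limit")
    body = {"style": style}
    if topic is not None:
        body["topic"] = topic
    if content_text is not None:
        body["content_text"] = content_text
    return body
```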
Voice latency is the delay between a user speaking and the chatbot starting to respond. The total is the sum of several pipeline stages:
| Component | Typical | Range | Rating |
|---|---|---|---|
| Speech-to-Text (STT): transcribing voice to text (e.g. Whisper, Google STT) | 200–400 ms | 100–800 ms | Fast |
| Vector Search (RAG): finding relevant chunks in the memory | 50–150 ms | 30–400 ms | Fast |
| LLM Generation (Gemini): generating the AI answer (Google Gemini) | 800–1500 ms | 500–3000 ms | Moderate |
| LLM Generation (GPT-4o): generating the AI answer (OpenAI) | 600–1200 ms | 400–2500 ms | Moderate |
| Text-to-Speech (TTS): converting answer text to audio | 300–600 ms | 200–1200 ms | Moderate |
| Network round-trip: API calls + server overhead | 50–200 ms | 20–500 ms | Fast |
| Total (non-streaming) | 1.5–3.0 s | 1.0–5.0 s | Acceptable |
| Total (streaming, first token) | 300–700 ms | 200 ms–1.5 s | Excellent |
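The non-streaming total is just the sum of the pipeline stages; a quick sketch using the typical midpoints from the table above (the midpoint values are my rounding, not API output):

```python
# Typical midpoints (ms) per stage, non-streaming voice pipeline.
PIPELINE_MS = {
    "stt": 300,            # Speech-to-Text
    "vector_search": 100,  # RAG retrieval
    "llm_gemini": 1150,    # LLM generation (Gemini)
    "tts": 450,            # Text-to-Speech
    "network": 125,        # round-trips + server overhead
}

def total_latency_ms(stages=PIPELINE_MS) -> int:
    """Sum the stage latencies to estimate end-to-end delay."""
    return sum(stages.values())
```

The midpoint estimate lands at about 2.1 s, inside the 1.5–3.0 s non-streaming range quoted above.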
Tip: Enable streaming ("stream": true in the chat request) to reduce perceived latency. The first token appears in ~300–700 ms, making the response feel instant to users.
| Scenario | Total Latency | Notes |
|---|---|---|
| Simple Q&A (1–2 sources) | ~800 ms | Streaming + small context |
| Complex multi-document query (10+ sources) | ~2.5 s | More chunks to process |
| Cold start / first message | ~1.5–2 s | Model warm-up overhead |
| Subsequent messages (warm) | ~600 ms–1.2 s | Embedded context cache |
| Full voice pipeline (STT + LLM + TTS) | ~2.0–4.0 s | End-to-end voice interaction |
Embed the chatbot on any website using an iframe, or share a QR code that opens the public chat interface.
    <!-- Add to any webpage -->
    <iframe
      src="https://your-domain.com/memory-chat-public?id={memory_id}"
      width="400"
      height="600"
      style="border:none;border-radius:16px"
      allow="microphone">
    </iframe>
https://your-domain.com/memory-chat-public?id={memory_id}
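A sketch that fills the embed template above for a given memory; `your-domain.com` stays a placeholder, and the function name is illustrative.

```python
from html import escape

def iframe_snippet(memory_id: str, base: str = "https://your-domain.com") -> str:
    """Build the embed snippet for one memory's public chat page."""
    src = f"{base}/memory-chat-public?id={escape(memory_id)}"
    return (f'<iframe src="{src}" width="400" height="600" '
            f'style="border:none;border-radius:16px" allow="microphone"></iframe>')
```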
| Code | Error | Cause | Fix |
|---|---|---|---|
| 400 | Bad Request | Missing or invalid body fields | Check required parameters |
| 401 | Unauthorized | Missing or invalid API key / token | Check X-API-Key or X-Client-Token header |
| 403 | Forbidden | Accessing another user's resource | Use the correct client token |
| 404 | Not Found | Memory or resource doesn't exist | Verify the memory_id or resource path |
| 422 | Validation Error | Field type or constraint mismatch | Check field types (int vs string) |
| 500 | Server Error | Unexpected server-side failure | Check server logs; retry after a moment |