How we upgraded Maria: Telegram bot + call recordings in Odoo CRM

In 48 hours, Maria gained ops tools, 2-voice call recordings attached to the lead, and resilience fixes.

Two days ago we shipped Maria, our AI VoIP agent, after a one-day sprint. Since then we have been listening to every call, reading every transcript, and watching where the pipeline wobbles. In 48 hours Maria has gained a Telegram ops bot, call recordings with both voices attached to the lead, and a handful of resilience fixes that turn a demo into something we trust overnight.

This post picks up exactly where the previous one left off. If you have not read it, start here: How we built Maria.

What we learned in 48 hours

Three things became obvious as soon as real traffic hit Maria. First, we needed a back channel to talk to her outside the phone line, for status checks, quick queries, and the occasional restart when a worker misbehaved. Second, a transcript is not enough: when there is a dispute or a nuance, we want to hear both voices. And third, the post-call pipeline was a single point of failure. If Egress hiccuped, the lead never reached the CRM.

The next sections walk through each of those, in the order we tackled them.

Telegram bot @maria_ltc_bot

We built a second Maria that does not pick up the phone. She listens on Telegram. The bot runs as an independent systemd service (maria-telegram.service) with long polling, and only the owner chat id is allowed through. Everyone else gets a polite rejection.

Ten commands cover the daily ops surface:

/backups       Daily backup status (ict_fer + elPanocho + oci-test)
/query <q>     Run a natural-language Odoo query (xmlrpc under the hood)
/oca <q>       Ask the OCA expert skill
/pilot <q>     Ask the Odoo pilot skill (scripted admin)
/sysadmin <q>  Ask the system-admin skill (SSH into pve1/ipve1)
/calls         Recent Maria calls with lead id and summary
/status        Agent health: workers, RAM, last call, queue depth
/restart_maria Graceful restart of the voice agent
/start         Intro + command list
/help          Same as /start, short form

Real screenshot of @maria_ltc_bot answering /backups, /query and /oca — each tool result lands back as a Telegram message.
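
The whitelist-and-dispatch pattern described above can be sketched in plain Python, without the Telegram library. This is only an illustration: `OWNER_CHAT_ID`, `HANDLERS`, `handle_message` and the reply strings are hypothetical names and values, not the bot's actual code.

```python
from typing import Callable, Dict

OWNER_CHAT_ID = 123456789  # hypothetical value; the real owner chat id is private


def cmd_help(arg: str) -> str:
    """List the commands this sketch knows about."""
    return "Commands: " + ", ".join(sorted(HANDLERS))


def cmd_calls(arg: str) -> str:
    """Placeholder: the real bot would list recent calls here."""
    return "(recent calls would be listed here)"


# Map each slash command to a handler that receives the rest of the message.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "/start": cmd_help,
    "/help": cmd_help,
    "/calls": cmd_calls,
}


def handle_message(chat_id: int, text: str) -> str:
    """Reject anyone but the owner, then route the command to its handler."""
    if chat_id != OWNER_CHAT_ID:
        return "Sorry, this bot is private."
    command, _, arg = text.strip().partition(" ")
    handler = HANDLERS.get(command)
    if handler is None:
        return f"Unknown command {command}. Try /help."
    return handler(arg)
```

Checking the chat id before parsing anything keeps unauthorized messages from ever reaching a handler.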

The interesting part is how the bot actually answers. The tools do not live inside the bot process. The bot process is a thin dispatcher. When a command lands, the bot drops a JSON request into a Unix socket, and a local Claude daemon picks it up and runs the matching Claude Code skill with full tool access.

Why the indirection? Because the Claude CLI refuses to run with --dangerously-skip-permissions as root, and for good reason. So we added a dedicated non-root user, claude-runner, with its own venv, its own SSH keys to pve1 and ipve1, and its own sandbox-disabled Claude config. The daemon runs as that user, exposes a Unix socket, and the bot (running as root, because it needs to read some service logs) talks to it over the socket. One privileged side orchestrates, the other side actually runs the tools. Clean boundary, and we get to reuse every Claude Code skill we already wrote: system-admin, odoo-pilot, oca-expert, the lot.
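
As a rough illustration of that boundary, here is a minimal sketch of a newline-delimited JSON exchange over a Unix socket. The request schema (`skill`, `prompt`), the reply shape, and the names `ask_daemon`, `serve_once` and `run_skill` are assumptions made for this example; the post does not show the real wire format.

```python
import json
import socket


def ask_daemon(sock_path: str, skill: str, prompt: str, timeout: float = 120.0) -> dict:
    """Bot side: send one JSON request and wait for one JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        client.connect(sock_path)
        request = {"skill": skill, "prompt": prompt}  # hypothetical schema
        client.sendall(json.dumps(request).encode("utf-8") + b"\n")
        with client.makefile("rb") as reader:
            return json.loads(reader.readline())


def serve_once(server: socket.socket, run_skill) -> None:
    """Daemon side: accept one connection, run the skill, reply with JSON.

    `run_skill(skill, prompt) -> str` is a hypothetical callable; in the real
    daemon it would wrap the `claude` subprocess running as the non-root user.
    """
    conn, _ = server.accept()
    with conn, conn.makefile("rwb") as stream:
        request = json.loads(stream.readline())
        answer = run_skill(request["skill"], request["prompt"])
        reply = {"ok": True, "answer": answer}
        stream.write(json.dumps(reply).encode("utf-8") + b"\n")
        stream.flush()
```

Because the only thing crossing the socket is a small JSON document, the privileged bot never needs to spawn the tool process itself.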

The Telegram channel became, in two days, the main operator console. Phone for customers, chat for us.

Call recordings with both voices

The first version of Maria captured audio passively with VoIPmonitor, a sidecar that sniffs RTP and writes WAV. It worked, but the path from pcap-to-WAV-to-Odoo-attachment was fragile, and the files were raw packet dumps with no clean separation between caller and agent. We wanted, inside the CRM lead, one audio file with both voices, reviewable in a browser, no extra tooling.

We tried LiveKit Egress in room-composite mode first, which renders the WebRTC room in a headless Chrome and produces a clean mixed output. In our container, Chrome would launch but the composition pipeline never emitted the start signal, so Egress eventually aborted. Rather than fight Chromium inside an unprivileged LXC, we switched to a different Egress mode: two track_egress requests in parallel, one per participant, each writing its own OGG/Opus file. When the call ends, ffmpeg takes over and does the job that Chrome refused to do:

ffmpeg -i caller.ogg -i agent.ogg \
       -filter_complex "[0:a][1:a]amix=inputs=2:duration=longest[a]" \
       -map "[a]" -c:a libopus -b:a 32k -ac 1 mix.ogg

The result is a single mono OGG/Opus file at 32 kbps, small enough to live as an ir.attachment, clean enough to be useful. A new method on our custom module, voip.call.attach_recording_to_lead, posts the file to the existing lead's chatter with a short message. The transcript message already arrived seconds earlier from register_call. Two messages, same lead, in order: text first, audio right after.
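
For readers who want to drive the same mix from Python, the following is a minimal sketch. The function names and the 30-second timeout are assumptions, and the `-y` overwrite flag is our addition. The payload keys (`lead_id`, `recording_b64`, `recording_filename`) are the ones listed for `attach_recording_to_lead` in the technical report.

```python
import base64
import subprocess
from pathlib import Path


def build_mix_command(caller: Path, agent: Path, out: Path) -> list[str]:
    """Return the ffmpeg argv that mixes two Opus tracks into one mono file."""
    return [
        "ffmpeg", "-y",  # -y (overwrite output) is an addition for unattended runs
        "-i", str(caller), "-i", str(agent),
        "-filter_complex", "[0:a][1:a]amix=inputs=2:duration=longest[a]",
        "-map", "[a]", "-c:a", "libopus", "-b:a", "32k", "-ac", "1",
        str(out),
    ]


def mix_tracks(caller: Path, agent: Path, out: Path, timeout_s: float = 30.0) -> Path:
    """Run ffmpeg; raises CalledProcessError or TimeoutExpired on failure."""
    subprocess.run(
        build_mix_command(caller, agent, out),
        check=True, capture_output=True, timeout=timeout_s,
    )
    return out


def build_attach_payload(lead_id: int, recording: Path) -> dict:
    """Encode the mixed file as a base64 string for the XML-RPC payload."""
    data = base64.b64encode(recording.read_bytes()).decode("ascii")
    return {
        "lead_id": lead_id,
        "recording_b64": data,
        "recording_filename": recording.name,
    }
```

Passing the arguments as a list avoids shell quoting problems with the filter graph, and the timeout keeps a stalled ffmpeg from eating the whole shutdown window.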

UX improvements

Three small changes that made a disproportionate difference.

100 Deepgram keyterms. We fed the STT a curated list of 100 terms: OCA, Odoo, ERP, ICT and friends, plus the top Spanish surnames from the INE. Now when a caller named Conesa calls, Maria hears "Conesa" instead of "Conhexa". The lead ends up with the right name, which matters a surprising amount when you have to call the person back.
Odoo is pronounced "Odú". ElevenLabs, left to its own devices, spells the word as "o-do-o" letter by letter. We added one line to the system prompt forcing the pronunciation to "Odú", and Maria now says it like a human.
"When you hang up the opportunity is created automatically." A subtle but annoying failure mode: callers would ask "can you create an opportunity for me?", and Maria, trained to be cautious, would reply "for security reasons I cannot create records". Technically correct, socially wrong, because the opportunity is being created, just at hangup. One line in the prompt rewrote that interaction: Maria now confirms that the lead will be logged when the call ends, and moves on.

Resilience fixes

Real callers find bugs that designers never anticipate. A handful of defensive patches earned their keep this week.

Silence watchdog. If neither side speaks for 120 seconds, Maria hangs up politely. Without this, a call that hit an Anthropic 429 could sit forever with dead air.
Shutdown timeout raised to 60 s. The default livekit-agents shutdown_process_timeout is 10 seconds. That was killing our post-call pipeline mid-attach. Sixty seconds is generous but bounded, and now the recording actually makes it to Odoo.
Pipeline reordered: lead first, recording second. Creating the lead is fast and must succeed. Attaching the recording is slow and best-effort. We split them. Even if Egress fails or ffmpeg stalls, the opportunity is in the CRM with its transcript and summary before we start worrying about audio.
Defensive parking. Right before the XML-RPC call to Odoo, we dump the payload to /var/log/voip/pending_leads/<call_id>.json. If Odoo is down, the payload is not lost: a recovery cron picks it up next run. If Odoo accepts it, the parked file is deleted. Tiny code change, huge peace of mind.
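
The parking and "lead first, recording second" ordering can be sketched as follows. This is an illustration under assumptions, not our production code: `park`, `post_call` and the two injected callables are hypothetical names, and the atomic temp-file rename is a choice made for the example.

```python
import json
import logging
from pathlib import Path
from typing import Callable, Optional

log = logging.getLogger("post_call")


def park(payload: dict, call_id: str, parking_dir: Path) -> Path:
    """Write the payload to disk before any network call is attempted."""
    parking_dir.mkdir(parents=True, exist_ok=True)
    parked = parking_dir / f"{call_id}.json"
    tmp = parking_dir / f"{call_id}.json.tmp"
    tmp.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
    tmp.replace(parked)  # atomic rename: a recovery job never sees a partial file
    return parked


def post_call(
    payload: dict,
    call_id: str,
    parking_dir: Path,
    register_call: Callable[[dict], int],      # hypothetical: returns the lead id
    attach_recording: Callable[[int], None],   # hypothetical: slow, may fail
) -> Optional[int]:
    """Park, create the lead, unpark, then attach the recording best-effort."""
    parked = park(payload, call_id, parking_dir)
    try:
        lead_id = register_call(payload)       # fast, must succeed
    except Exception:
        log.exception("register_call failed; payload stays parked at %s", parked)
        return None
    parked.unlink(missing_ok=True)             # Odoo accepted it: delete the safety file
    try:
        attach_recording(lead_id)              # slow, best-effort
    except Exception:
        log.exception("recording attach failed for lead %s", lead_id)
    return lead_id
```

A failed attach never rolls back the lead, and a failed lead creation leaves exactly one JSON file behind for the recovery job.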

Capacity cap via `load_fnc`

The hardest constraint is not CPU or RAM, it is the Anthropic concurrent-request ceiling on our current tier. Two live conversations plus the Telegram bot can already trigger a 429. So we installed a soft cap on the voice agent: a custom load_fnc that inspects the number of in-flight calls and returns load=1.0 once two calls are in progress. LiveKit's SFU refuses the third call cleanly, and the caller hears a busy tone instead of a broken session. No silent failures, no half-processed leads.

It is a crude cap, but it turns a soft limit into a hard one, which is exactly what you want when the downstream rate limit is the real bottleneck. When we upgrade the Anthropic tier, we raise the cap by editing one integer.
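
The core of such a cap is a few lines of arithmetic. The sketch below is deliberately framework-free: `capped_load` and `MAX_CONCURRENT_CALLS` are hypothetical names. How the in-flight count is obtained and how the function is passed to livekit-agents (through the `load_fnc` worker option) are left out, because they depend on the agent's own bookkeeping.

```python
MAX_CONCURRENT_CALLS = 2  # the one integer to raise after a tier upgrade


def capped_load(active_calls: int, max_calls: int = MAX_CONCURRENT_CALLS) -> float:
    """Report load as the fraction of the cap in use, clamped to 1.0.

    Returns 1.0 as soon as `active_calls` reaches `max_calls`, which signals
    that the worker is full and should not be given another call.
    """
    if max_calls <= 0:
        return 1.0  # a cap of zero means "accept nothing"
    return min(1.0, active_calls / max_calls)
```

With the cap at two, one active call reports 0.5 and two active calls report 1.0. Whether 0.5 still counts as available depends on the worker's load threshold setting.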

What is next

Short list, in order of probable pain relief: upgrading the Anthropic tier to lift the concurrent-call cap, widening the RTP port range to absorb more than five parallel calls, and adding a second agent voice for English callers. The bones are stable now. Everything from here is polish and volume.

If you want to hear Maria yourself, call us at +34 868 35 37 57 or drop us a line. She will tell you she is an AI, she will log the call, and both our voices will be waiting in the lead by the time you hang up.

Fer & Claude, 2026-04-22.

Full technical report

The section below is the complete internal report we wrote for ourselves at the end of the sprint. It covers the whole stack, the architecture, every phase, capacity limits, costs and pending work. Nothing has been trimmed except the public IPs and one private phone number.

Final report: voip-expert / Maria agent

Date: 2026-04-22
Work sessions: 2026-04-20 · 2026-04-21 · 2026-04-22
Status: running in production

1. Executive summary

Autonomous VoIP agent that:

Answers the DID +34 868 35 37 57 (Zadarma) in Spanish with a natural voice (ElevenLabs Flash v2.5, Spain voice uQw4jpKzMLrZuo0RLPS9).
Holds the conversation with Claude Haiku 4.5, with STT by Deepgram Nova-3.
On hangup, automatically generates: an opportunity in the elPanocho CRM + transcript in the chatter + summary by Claude Sonnet 4.6 + audio recording as mixed OGG/Opus (both voices).
Has a debug mode enabled for calls from XXX XXX XXX (Fernando), with internal tools: check_backups, query_odoo, ask_oca_expert, ask_odoo_pilot, ask_system_admin, send_telegram.
Independent Telegram bot (@maria_ltc_bot) for text queries with the same tools.
Silence watchdog that hangs up automatically after 120 s without activity.
Self-hosted: everything runs in an LXC container (CT) on pve1 (local Proxmox).
Code versioned in private GitLab repos, with no cloud dependencies for the logic.

2. Technology stack

Layer	Technology	Version	Role
PSTN telephony	Zadarma	—	DID + SIP trunk
Router / NAT	Huawei EG8145V5	—	Port-forward UDP 5060 + 10000-10999
SIP gateway	livekit-sip	1.2.0	SIP termination, bridge to WebRTC
SFU / Realtime	livekit-server	1.11.0	WebRTC rooms, media routing
Agent framework	livekit-agents (Python)	1.5.4	Call lifecycle
LLM (dialogue)	Claude Haiku 4.5	—	Real-time streaming
LLM (summary)	Claude Sonnet 4.6	—	Post-call summary
STT	Deepgram Nova-3 (multi)	—	ES/EN transcription + keyterms
TTS	ElevenLabs Flash v2.5	—	Synthetic voice, speed 0.85
VAD	Silero	—	Turn detection
Recording	LiveKit Egress + ffmpeg	—	2 track_egress + amix
CRM / Backoffice	Odoo 16 (elPanocho DB)	16.0	Opportunity, chatter, attachment
Custom module	`custom_fer`	16.0.1.8	`voip.call.register_call` + `attach_recording_to_lead`
Telegram bot	python-telegram-bot	21.11.1	Chat query channel
Claude CLI daemon	Unix socket	—	`claude` subprocess with permissions bypass

3. Architecture

  PSTN
   │
   ▼
┌──────────┐   SIP/RTP    ┌──────────────────┐ WebRTC ┌────────────────┐
│ Zadarma  │─────────────▶│ Router Huawei    │───────▶│ livekit-sip    │
│ DID      │  UDP 5060    │ NAT 1:1 + fwd   │        │ (CT-140)       │
└──────────┘  10000-10999 └──────────────────┘        └────────┬───────┘
                                                              │
                                     ┌────────────────────────┼─────────────────┐
                                     ▼                        ▼                 ▼
                            ┌────────────────┐      ┌─────────────────┐   ┌───────────────┐
                            │ livekit-server │◀────▶│ maria-agent     │   │ livekit-egress│
                            │ (CT-140)       │      │ (entrypoint)    │   │ (Docker)      │
                            └────────────────┘      └──┬──────────────┘   └──────┬────────┘
                                                       │                         │ track x2
                       ┌───────────────────────────────┤                         │
                       │               │               │                         │
                       ▼               ▼               ▼                         ▼
               ┌──────────────┐ ┌─────────────┐ ┌──────────────┐       /var/lib/livekit-egress/
               │ Deepgram STT │ │ Anthropic   │ │ ElevenLabs   │            recordings/*.ogg
               │ Nova-3       │ │ Claude API  │ │ Flash v2.5   │                   │
               └──────────────┘ └─────────────┘ └──────────────┘                   ▼
                                                                             ffmpeg amix
                                                                                   │
                       ┌───────────────────────────────────────────────────────────┘
                       ▼
               ┌─────────────────────┐   XML-RPC    ┌─────────────────────────┐
               │ run_post_call_      │─────────────▶│ Odoo 16 (CT-116 ipve1)  │
               │ pipeline            │              │ custom_fer.voip.call    │
               │ (Sonnet + parking)  │              │ register_call +         │
               └─────────────────────┘              │ attach_recording_to_lead│
                                                    └─────────────────────────┘

            ┌───────────────────────────────────────────────────────────────┐
            │               Operational channels (debug)                    │
            │                                                               │
            │  Telegram @maria_ltc_bot  ◀─▶  maria-telegram.service         │
            │    (long polling)              │                              │
            │                                ▼                              │
            │                         claude-daemon ◀─▶ Unix socket         │
            │                    (claude-runner user, --bypass)             │
            └───────────────────────────────────────────────────────────────┘

4. Infrastructure

CT-140 `ct-voip` on pve1 (local Proxmox)

Unprivileged LXC with nesting=1, keyctl=1
Debian 12, Python 3.11, Node 20
Static LAN IP 192.168.1.7/24 (gw 192.168.1.1)
4 cores / 4 GB RAM / 20 GB ZFS (zfs-storage:subvol-140-disk-0)
Relevant packages: bubblewrap, socat, ffmpeg 5.1.8, docker.io

Active systemd services on CT-140

Unit	Description	Script/unit path
`livekit-server.service`	WebRTC SFU	`/opt/livekit/...`
`livekit-sip.service`	SIP gateway	build from source v1.2.0
`livekit-egress.service`	Docker `livekit/egress:latest`	config `/etc/livekit/egress.yaml`
`maria-agent.service`	Python agent (entrypoint)	`/opt/voip-agent/maria_phase2.py`
`maria-telegram.service`	Telegram bot	`/opt/voip-agent/maria_telegram.py`
`claude-daemon.service`	Unix socket → claude CLI	`/opt/voip-agent/tools/claude_daemon.py`
`redis-server.service`	Backend for livekit + egress	default

System users

root: manages all services
claude-runner (uid 1001, home /home/claude-runner): runs the claude CLI with --dangerously-skip-permissions (not allowed as root)
Own venv with paramiko, requests, pyyaml, tabulate, lxml, bs4, python-dotenv
SSH keys to pve1 and ipve1 (keys at /home/claude-runner/.ssh/id_ed25519*)
Claude sandbox disabled: ~/.claude/settings.json with {sandbox:{enabled:false}}

`livekit-egress` container (Docker)

Official image livekit/egress:latest
--network host, --cap-add=SYS_ADMIN
Bind mount /var/lib/livekit-egress/recordings:/out (chmod 777)
Internal user egress (uid 1001)
Chrome 125 included (currently unused; we use track_egress directly)

External infrastructure

Huawei EG8145V5 router (192.168.1.1): administered via Selenium (credentials documented in reference_home_router.md)
WAN IP public.ip.address.number (dynamic; needs monitoring)
ipve1 (OVH, public.ip.address.number): CT-116 with Odoo 16 elPanocho, the CRM destination
CT-200 (pve1): backup server, runs check_backups.sh

5. Inbound call flow

T+0.0s   caller dials +34 868 35 37 57
T+0.5s   Zadarma INVITE → WAN public.ip.address.number:5060
T+0.6s   router NAT → CT-140:5060
T+0.7s   livekit-sip validates the trunk (IP 185.45.152.0/22), creates room maria-_<caller>_<random>
T+1.0s   dispatch rule launches an agent worker → entrypoint(JobContext)
T+1.5s   agent joins room, plays the GDPR greeting:
         "Hola, le atiende María, asistente virtual de lemontreecloud.
          Esta llamada será grabada para atención al cliente. ¿En qué puedo ayudarle?"
         (English: "Hello, this is María, the virtual assistant at lemontreecloud.
          This call will be recorded for customer service. How can I help you?")
T+3s     TTS greeting played
T+3s     _start_recording: 2 parallel track_egress
         - caller audio → /out/<room>_<ts>_caller.ogg
         - agent audio  → /out/<room>_<ts>_agent.ogg
T+3s     silence watchdog armed (120 s)

── Conversation loop ──
         Final user STT → Haiku (streaming)
         Haiku decides: spoken reply + possibly a tool call (debug only)
         If tool: ctx.session.say("Espere un segundo...") then tool execution
         TTS reply is played
         Every turn resets the watchdog

── End of call ──
T+N      user says "adiós" (or equivalent)
T+N+Δ    Haiku invokes end_call
T+N+Δ+1s   TTS drain + grace period → room.disconnect()
T+N+5s   shutdown hook fires (process exiting)
         ┌─ Post-call pipeline (shutdown_process_timeout=60s) ──────────────────┐
         │  1. Build transcript snapshot from recorder                          │
         │  2. Sonnet 4.6 → {summary, intent, language}                         │
         │  3. _park_pending(payload) → /var/log/voip/pending_leads/ (safety)   │
         │  4. XML-RPC voip.call.register_call → lead_id                        │
         │  5. Unpark (delete safety file) on success                           │
         │  6. Stop egress (EGRESS_COMPLETE → ok)                               │
         │  7. Poll 2 .ogg files (stable, ≥500 bytes)                           │
         │  8. ffmpeg amix → mix.ogg (libopus 32k mono)                         │
         │  9. XML-RPC voip.call.attach_recording_to_lead                       │
         │ 10. Cleanup of local files                                           │
         └──────────────────────────────────────────────────────────────────────┘

Result in Odoo CRM:
  - Lead (type=opportunity) with:
    · name = first sentence of the summary
    · phone = caller number
    · description = summary
    · source_id = VoIP Inbound
    · tag_ids = [voip-inbound, <intent>]
    · user_id = Fernando
    · chatter:
        · message 1: full transcript rendered as HTML
        · message 2: "Grabación adjuntada" ("Recording attached") + attachment caller_agent_mix.ogg
6. External integrations

Service	Account	Credentials	Notes
Zadarma	DID +34 868 35 37 57	Panel zadarma.com	"Servidor externo" (external server) points to the WAN IP + SIP IP-based auth mode
Anthropic API	Fernando (current tier)	`ANTHROPIC_API_KEY` in `.env`	⚠️ the concurrent rate limit has already blocked conversations
Deepgram	Fernando	`DEEPGRAM_API_KEY` in `.env`	Nova-3 multi with 100 keyterms (acronyms + INE names)
ElevenLabs	Fernando (paid)	`ELEVENLABS_API_KEY` in `.env`	ES (Spain) voice `uQw4jpKzMLrZuo0RLPS9`, speed 0.85, model Flash v2.5
Telegram Bot	`@maria_ltc_bot`	Token in `.env`	Whitelist of Fernando's chat_id, rejects everyone else
GitLab	`fernandohc`	Rotating PAT	Private repos
Odoo elPanocho	`fernando@elpanocho.com`	API key in `.env`	Destination CRM for leads

7. `custom_fer` module v16.0.1.8 (Odoo 16 elPanocho)

Path: /opt/odoo/custom/apps/custom_fer/ (CT-116 ipve1). Repo gitlab.com/fernandohc/apps, branch antigravity.

`voip.call` model

Public XML-RPC methods:

register_call(payload): creates an opportunity from the payload of a finished call.
payload: caller_phone, started_at, ended_at, duration_s, language, transcript, summary, intent, debug_mode, silence_hangup, extra_tags
Returns: {lead_id, lead_url, attachment_id (False)} (None converted to False for XML-RPC compatibility)
Actions: crm.lead.create(type=opportunity), posts the transcript rendered as HTML, applies tags and source.
attach_recording_to_lead(payload): attaches audio to an existing lead.
payload: lead_id, recording_b64, recording_filename
MIME detection: .ogg → audio/ogg, .mp4 → audio/mp4
ir.attachment.create + mail.message with "Grabación adjuntada" ("Recording attached").
Two-step pipeline: create the lead first (fast, must succeed), then attach (slow, best-effort).
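
The None-to-False conversion mentioned for `register_call` exists because standard XML-RPC has no null type, and Python's `xmlrpc.client` refuses to marshal `None` unless `allow_none=True`. A minimal sketch of such a sanitizer follows; `xmlrpc_safe` is a hypothetical name, not the module's actual helper.

```python
from typing import Any


def xmlrpc_safe(value: Any) -> Any:
    """Recursively replace None with False so the result can be marshalled."""
    if value is None:
        return False
    if isinstance(value, dict):
        return {key: xmlrpc_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # XML-RPC only has arrays, so tuples come back as lists anyway.
        return [xmlrpc_safe(item) for item in value]
    return value
```

Applied to the return value, `{"lead_id": 7, "attachment_id": None}` becomes `{"lead_id": 7, "attachment_id": False}`, which matches Odoo's own convention of using False for empty fields.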

Inbound email flow (new 2026-04-22)

fetchmail.server id=3 pulls email from fernando@lemontreecloud.com every 5 min.
mail.alias id=49 with defaults:

{
    'type': 'opportunity',
    'user_id': 2,                # Fernando
    'team_id': 1,                # Sales
    'source_id': 12,             # Email Inbound
    'tag_ids': [(6, 0, [19])],   # email-inbound
}

Every email to fernando@lemontreecloud.com generates a tagged opportunity.

8. Project phases (chronological)

Phase	Date	Description	Result
1 - Hello World	2026-04-20	Call rings + TTS "hola"	Zadarma + livekit-sip + NAT 1:1 + first Python agent
2 - Production dialogue	2026-04-20	Maria holds a bilingual conversation with Haiku + Deepgram + ElevenLabs, GDPR, end_call tool	First calls with Raúl
3 - CRM integration	2026-04-20	`on_shutdown` hook creates an opportunity in elPanocho with transcript + Sonnet summary	`custom_fer.voip.call.register_call` + `type=opportunity` fix for visibility
4 - Passive recording (deprecated)	2026-04-20	VoIPmonitor sidecar with MariaDB + GUI	Replaced in Phase 6
5 - Operational tools	2026-04-21	6 debug-only `function_tool`s in the agent + OCA primer in the system prompt	Gateway for `ask_oca_expert`, `ask_odoo_pilot`, `ask_system_admin`, `check_backups`, `query_odoo`, `send_telegram`
5.1 - Telegram bot	2026-04-21	`@maria_ltc_bot` with 10 commands, long polling, Fernando whitelist	Independent service `maria-telegram.service`
5.2 - Claude CLI daemon	2026-04-21	Unix socket daemon running as non-root `claude-runner`, sandbox bypass	`ask_*` tools now functional; Claude sandbox disabled in settings.json
6 - LiveKit Egress	2026-04-21	Replaces VoIPmonitor with active, self-contained recording	OGG file attached to the lead's chatter
6.1 - Both voices	2026-04-22	Room composite audio_only with Chrome (failed: start signal not received) → migrated to dual `track_egress` + `ffmpeg amix`	One OGG/Opus with caller + agent mixed
6.2 - Pipeline resilience	2026-04-22	Reordering (lead first, recording after) + defensive parking + async `attach_recording_to_lead`	Lead is created even if egress fails
6.3 - Perms fix + egress ending	2026-04-22	`chmod 777 /out`, `EGRESS_ENDING`/`COMPLETE` distinguished from `ABORTED`/`FAILED`	Recording attaches reliably
6.3.1 - keyterms + prompt	2026-04-22	100 keyterms (acronyms + top INE surnames) + prompt "the opportunity is created automatically on hangup" + `Odú` vs `o-do-o` fix	Conversational UX
6.3.3 - Silence watchdog	2026-04-22	Auto-hangup after 120 s of silence	Resilience against 429 rate limits
6.3.4 - Shutdown timeout	2026-04-22	`shutdown_process_timeout=60s` (the 10 s default was killing the pipeline)	Attach completes before the SIGKILL
7 - Email inbound CRM	2026-04-22	Alias `fernando@` adds source + tag to the opportunity	Emails also enter the CRM tagged

9. Capacity and limits

Component	Effective limit	Notes
Zadarma trunk	10 simultaneous channels	External layer
livekit-sip RTP ports	10 ports (10050-10059) → ~5 parallel calls	Can be widened; the router already allows 10000-10999
livekit-server	hundreds of rooms	No bottleneck
maria-agent workers	`num_idle_processes=4` + auto-spawn	Actual RAM/CPU set the ceiling
CT-140 resources	4 cores / 4 GB RAM	Each active call uses ~300 MB RAM
Anthropic concurrent	current tier → 429 with 1 call + active bot	Main bottleneck; tier upgrade or retry+backoff
Deepgram / ElevenLabs	current tier usually handles dozens	No bottleneck seen so far

Verdict: the system reasonably supports 2-3 concurrent calls. To absorb Zadarma's 10 simultaneous channels we need to (a) widen the RTP range, (b) upgrade the Anthropic tier, and (c) probably raise RAM to 8 GB.
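
The verdict can be cross-checked with back-of-the-envelope arithmetic. The figures below come from the table. The two-ports-per-call ratio is inferred from "10 ports → ~5 calls", and the Anthropic ceiling of 2 is an assumption taken from the soft cap described in the post, not a measured limit. The RAM figure ignores the baseline used by the services themselves, so it is optimistic.

```python
# Per-component ceilings on concurrent calls (inputs from the capacity table).
RTP_PORTS = 10             # ports 10050-10059
PORTS_PER_CALL = 2         # inferred from "10 ports -> ~5 calls"
RAM_MB = 4 * 1024          # CT-140 has 4 GB
RAM_PER_CALL_MB = 300      # approximate RAM per active call
ZADARMA_CHANNELS = 10
ANTHROPIC_SAFE_CALLS = 2   # assumption: the soft cap used by the agent

ceilings = {
    "zadarma": ZADARMA_CHANNELS,
    "rtp": RTP_PORTS // PORTS_PER_CALL,
    "ram": RAM_MB // RAM_PER_CALL_MB,
    "anthropic": ANTHROPIC_SAFE_CALLS,
}

# The effective capacity is set by the tightest component.
bottleneck = min(ceilings, key=ceilings.get)
effective_capacity = ceilings[bottleneck]
```

Under these assumptions the Anthropic tier is the first limit hit, the RTP range the second, and RAM only matters once both are lifted.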

10. Git repositories

Repo	URL	Contents
`voip-ai-agent`	https://gitlab.com/fernandohc/voip-ai-agent	Python agent, Telegram bot, claude-daemon, scripts, deploy/
`apps` (branch antigravity)	https://gitlab.com/fernandohc/apps	`custom_fer` module with `voip.call`, other Odoo modules
`claude-memory`	https://gitlab.com/fernandohc/claude-memory	Persistent Claude Code memory for Fernando (170+ files)
`ai-tools/voip-expert`	local at `/opt/odoo/custom/ai-tools/`	Claude Code skill with SKILL.md + references + templates

All private. claude-memory auto-syncs via SessionEnd + PreCompact hooks.

11. Key memories / references

feedback_livekit_sip_zadarma.md: 3 NAT gotchas (flood/Numbers/nat_1_to_1)
feedback_voip_debug_tools.md: debug-only tools pattern
feedback_livekit_egress_record.md: track_egress + ffmpeg mix
feedback_crm_lead_vs_opportunity.md: crm.lead type=lead is invisible in the UI
reference_ct140_voip.md: container specs
reference_home_router.md: Huawei router admin
reference_elevenlabs_account.md: Google SSO, narrow API key
reference_voipmonitor_ct140.md: deprecated (replaced in Phase 6)
feedback_claude_daemon.md: Unix socket pattern for sandbox/root bypass
user_fernando_contact.md: debug whitelist +34XXXXXXXXX

12. Pending items and roadmap

Operational pending items

Rotate 3 credentials from oci_test_deployment.md (test env from 37 days ago; private repo, but they are documented there)
Delete /opt/odoo/obsidian-vault/ (the useful content has already been migrated to claude-memory)
Activate the voip-expert skill with /opt/odoo/custom/ai-tools/sync-skills.sh

Technical roadmap

Upgrade the Anthropic tier or implement retry+backoff on 429 inside the livekit-agents anthropic plugin
Widen the RTP range (10050-10149) if more than 5 parallel calls are needed
Dynamic WAN IP: DDNS or a reload script if it changes (static for now)
Sync CT-116 environment: clone repos and skills on CT-116 (ipve1) so the CRM also has local claude tools
Blog post about the experience: already live at https://lemontreecloud.com/blog/ltc-labs-3/como-construimos-maria-nuestra-agente-voip-con-ia-en-un-dia-12

Ideas to evaluate

PersonaPlex 7B v1 (NVIDIA) once an ES version ships: full-duplex speech-to-speech with latency <300 ms, but it requires a 16 GB GPU and v1 is English-only
Voxtral TTS (Mistral): open-source alternative to ElevenLabs if we want to lower the variable cost
Recordings storage: currently OGG in ir.attachment. If volume grows, consider external S3/MinIO
Multi-lang: Maria answers in the caller's language, but the OCA primer is in Spanish. Extend to EN/PT if international leads appear
Voice biometrics: detect recurring callers by voiceprint to personalize the greeting

13. Sensitive data (references)

Operational credentials live in /opt/voip-agent/.env on CT-140 (permissions 640 root:claude-runner, not versioned). It contains:

ANTHROPIC_API_KEY
DEEPGRAM_API_KEY
ELEVENLABS_API_KEY (+ alias ELEVEN_API_KEY)
LIVEKIT_API_KEY + LIVEKIT_API_SECRET + LIVEKIT_URL
ZADARMA_* (if used)
TELEGRAM_BOT_TOKEN + TELEGRAM_CHAT_ID
ODOO_URL + ODOO_DB + ODOO_USER + ODOO_API_KEY
SILENCE_TIMEOUT_S=120
PVE1_HOST=root@192.168.1.2

GitLab tokens are ephemeral (Fernando's rotating PAT, scope api, 90-day lifetime). They are never committed; they are used via a transient oauth2:<token>@ URL, and the remote is cleaned afterwards.

14. Cost per call (estimate)

Typical 3-minute call:

Item	Approx. cost
Zadarma DID + minutes	~0.03 €
Deepgram Nova-3	~0.12 €
ElevenLabs Flash v2.5 (~600-800 agent chars)	~0.05 €
Claude Haiku 4.5 streaming + Sonnet 4.6 summary	~0.01 €
Total per 3-minute call	< 0.25 €
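
As a quick sanity check on the total, the four line items can be summed in integer euro cents to avoid floating-point noise. The values are copied from the table; the variable names are ours, and the per-minute figure is a simple division, not a measured rate.

```python
# Approximate cost of a typical 3-minute call, in euro cents.
COSTS_CENTS = {
    "zadarma": 3,      # DID + minutes
    "deepgram": 12,    # Nova-3 STT
    "elevenlabs": 5,   # Flash v2.5 TTS
    "claude": 1,       # Haiku streaming + Sonnet summary
}

total_cents = sum(COSTS_CENTS.values())
per_minute_cents = total_cents / 3  # call length assumed to be exactly 3 minutes

print(f"total: {total_cents / 100:.2f} EUR, per minute: {per_minute_cents / 100:.2f} EUR")
```

The sum comes to 0.21 €, which is consistent with the "< 0.25 €" bound in the table. Speech-to-text is the largest single item.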

Self-hosted infra (CT-140 + livekit-*): marginal cost already absorbed by the home Proxmox.

Generated by Claude Code in collaboration with Fernando Hernández, sessions 2026-04-20 → 2026-04-22.
