How we upgraded Maria: Telegram bot + call recordings in Odoo CRM

In 48 hours, Maria gained ops tools, 2-voice call recordings attached to the lead, and resilience fixes.

Two days ago we shipped Maria, our AI VoIP agent, after a one-day sprint. Since then we have been listening to every call, reading every transcript, and watching where the pipeline wobbles. In 48 hours Maria has gained a Telegram ops bot, call recordings with both voices attached to the lead, and a handful of resilience fixes that turn a demo into something we trust overnight.

This post picks up exactly where the previous one left off. If you have not read it, start here: How we built Maria.

What we learned in 48 hours

Three things became obvious as soon as real traffic hit Maria. First, we needed a back channel to talk to her outside the phone line, for status checks, quick queries, and the occasional restart when a worker misbehaved. Second, a transcript is not enough: when there is a dispute or a nuance, we want to hear both voices. And third, the post-call pipeline was a single point of failure. If Egress hiccuped, the lead never reached the CRM.

The next sections walk through each of those, in the order we tackled them.

Telegram bot @maria_ltc_bot

We built a second Maria that does not pick up the phone. She listens on Telegram. The bot runs as an independent systemd service (maria-telegram.service) with long polling, and only the owner chat id is allowed through. Everyone else gets a polite rejection.
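The owner-only gate described above can be sketched in a few lines. This is a minimal illustration, not the bot's actual code: the `Incoming` type, the `OWNER_CHAT_ID` value and the rejection text are all hypothetical, and the real check would sit inside whatever handler the Telegram library calls.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical owner id; the real value would come from the bot's configuration.
OWNER_CHAT_ID = 123456789

# Hypothetical wording for the polite rejection.
REJECTION = "Sorry, this bot is private."


@dataclass
class Incoming:
    """The two fields of an incoming message that the gate needs."""
    chat_id: int
    text: str


def gate(msg: Incoming, owner_chat_id: int = OWNER_CHAT_ID) -> Optional[str]:
    """Return a rejection text for strangers, or None if the message may proceed."""
    if msg.chat_id != owner_chat_id:
        return REJECTION
    return None
```

Running the gate before any command parsing means an unknown chat never reaches the dispatcher.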

Ten commands cover the daily ops surface:

/backups       Daily backup status (ict_fer + elPanocho + oci-test)
/query <q>     Run a natural-language Odoo query (xmlrpc under the hood)
/oca <q>       Ask the OCA expert skill
/pilot <q>     Ask the Odoo pilot skill (scripted admin)
/sysadmin <q>  Ask the system-admin skill (SSH into pve1/ipve1)
/calls         Recent Maria calls with lead id and summary
/status        Agent health: workers, RAM, last call, queue depth
/restart_maria Graceful restart of the voice agent
/start         Intro + command list
/help          Same as /start, short form


Real screenshot of @maria_ltc_bot answering /backups, /query and /oca — each tool result lands back as a Telegram message.

The interesting part is how the bot actually answers. The tools do not live inside the bot process. The bot process is a thin dispatcher. When a command lands, the bot drops a JSON request into a Unix socket, and a local Claude daemon picks it up and runs the matching Claude Code skill with full tool access.

Why the indirection? Because the Claude CLI refuses to run with --dangerously-skip-permissions as root, and for good reason. So we added a dedicated non-root user, claude-runner, with its own venv, its own SSH keys to pve1 and ipve1, and its own sandbox-disabled Claude config. The daemon runs as that user, exposes a Unix socket, and the bot (running as root, because it needs to read some service logs) talks to it over the socket. One privileged side orchestrates, the other side actually runs the tools. Clean boundary, and we get to reuse every Claude Code skill we already wrote: system-admin, odoo-pilot, oca-expert, the lot.
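To make the boundary concrete, here is a minimal sketch of the two sides of such a socket. The wire format (one JSON object per line), the field names `command` and `argument`, and the helper names are assumptions for illustration; the post does not describe the real protocol, and the real daemon would launch the Claude CLI where this sketch calls `handler`.

```python
import json
import socket


def parse_command(text: str) -> dict:
    """Turn '/oca how do I ...' into a request dict (field names are hypothetical)."""
    command, _, argument = text.strip().partition(" ")
    return {"command": command.lstrip("/"), "argument": argument.strip()}


def ask_daemon(socket_path: str, request: dict, timeout: float = 120.0) -> dict:
    """Bot side: send one JSON line over the Unix socket and read one JSON line back."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(json.dumps(request).encode("utf-8") + b"\n")
        with sock.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())


def serve_once(server: socket.socket, handler) -> None:
    """Daemon side: accept one connection, run the handler, reply, close.

    `handler` stands in for "run the matching skill as the non-root user".
    """
    conn, _ = server.accept()
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        request = json.loads(stream.readline())
        stream.write(json.dumps(handler(request)) + "\n")
        stream.flush()
```

Because the only thing crossing the boundary is a small JSON document, the privileged bot never needs the runner's SSH keys, and the runner never needs the bot's Telegram token.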

The Telegram channel became, in two days, the main operator console. Phone for customers, chat for us.

Call recordings with both voices

The first version of Maria captured audio passively with VoIPmonitor, a sidecar that sniffs RTP and writes WAV. It worked, but the path from pcap-to-WAV-to-Odoo-attachment was fragile, and the files were raw packet dumps with no clean separation between caller and agent. We wanted, inside the CRM lead, one audio file with both voices, reviewable in a browser, no extra tooling.

We tried LiveKit Egress in room-composite mode first, which renders the WebRTC room with a headless Chrome and produces a clean mixed output. Inside our container (an unprivileged LXC), Chrome would launch but the composition pipeline would never emit the start signal, so Egress eventually aborted. Rather than fight Chromium inside an unprivileged LXC, we flipped to a different Egress mode: two track_egress requests in parallel, one per participant, each writing its own OGG/Opus file. When the call ends, ffmpeg takes over and does the job that Chrome refused to do:

ffmpeg -i caller.ogg -i agent.ogg \
       -filter_complex "[0:a][1:a]amix=inputs=2:duration=longest[a]" \
       -map "[a]" -c:a libopus -b:a 32k -ac 1 mix.ogg

The result is a single mono OGG/Opus file at 32 kbps, small enough to live as an ir.attachment, clean enough to be useful. A new method on our custom module, voip.call.attach_recording_to_lead, posts the file to the existing lead's chatter with a short message. The transcript message already arrived seconds earlier from register_call. Two messages, same lead, in order: text first, audio right after.
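For readers who drive ffmpeg from Python, the same command can be assembled as an argument list and run with a timeout. This is a sketch under assumptions: the function names are made up, the `-y` overwrite flag and the 30-second timeout are additions that the command above does not have, and the post does not show how the agent actually invokes ffmpeg.

```python
import subprocess
from pathlib import Path


def build_mix_command(caller: Path, agent: Path, output: Path, bitrate: str = "32k") -> list:
    """Build the ffmpeg argument list that mixes both tracks into one mono Opus file."""
    return [
        "ffmpeg", "-y",                      # -y (overwrite) is an addition, not in the post
        "-i", str(caller),
        "-i", str(agent),
        "-filter_complex", "[0:a][1:a]amix=inputs=2:duration=longest[a]",
        "-map", "[a]",
        "-c:a", "libopus", "-b:a", bitrate, "-ac", "1",
        str(output),
    ]


def mix_recordings(caller: Path, agent: Path, output: Path, timeout_s: float = 30.0) -> bool:
    """Run the mix; return False instead of raising so the caller can treat it as best-effort."""
    try:
        subprocess.run(
            build_mix_command(caller, agent, output),
            check=True, capture_output=True, timeout=timeout_s,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return output.exists()
```

Passing a list instead of a shell string avoids quoting problems if a room name ever ends up in a file path.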

UX improvements

Three small changes that made a disproportionate difference.

  • 100 Deepgram keyterms. We fed the STT a curated list of 100 terms: OCA, Odoo, ERP, ICT and friends, plus the top Spanish surnames from the INE. Now when a caller named Conesa calls, Maria hears "Conesa" instead of "Conhexa". The lead ends up with the right name, which matters a surprising amount when you have to call the person back.
  • Odoo is pronounced "Odú". ElevenLabs, left to its own devices, spells the word as "o-do-o" letter by letter. We added one line to the system prompt forcing the pronunciation to "Odú", and Maria now says it like a human.
  • "When you hang up the opportunity is created automatically." A subtle but annoying failure mode: callers would ask "can you create an opportunity for me?", and Maria, trained to be cautious, would reply "for security reasons I cannot create records". Technically correct, socially wrong, because the opportunity is being created, just at hangup. One line in the prompt rewrote that interaction: Maria now confirms that the lead will be logged when the call ends, and moves on.

Resilience fixes

Real callers find bugs that designers never anticipate. A handful of defensive patches earned their keep this week.

  • Silence watchdog. If neither side speaks for 120 seconds, Maria hangs up politely. Without this, a call that hit an Anthropic 429 could sit forever with dead air.
  • Shutdown timeout raised to 60 s. The default livekit-agents shutdown_process_timeout is 10 seconds. That was killing our post-call pipeline mid-attach. Sixty seconds is generous but bounded, and now the recording actually makes it to Odoo.
  • Pipeline reordered: lead first, recording second. Creating the lead is fast and must succeed. Attaching the recording is slow and best-effort. We split them. Even if Egress fails or ffmpeg stalls, the opportunity is in the CRM with its transcript and summary before we start worrying about audio.
  • Defensive parking. Right before the XML-RPC call to Odoo, we dump the payload to /var/log/voip/pending_leads/<call_id>.json. If Odoo is down, the payload is not lost: a recovery cron picks it up next run. If Odoo accepts it, the parked file is deleted. Tiny code change, huge peace of mind.
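The silence watchdog from the list above is essentially a timer that every turn resets. A minimal asyncio sketch follows, assuming a hypothetical `SilenceWatchdog` class and an `on_timeout` coroutine that would perform the polite hangup; how the real agent hooks this into its session events is not shown in the post.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SilenceWatchdog:
    """Calls `on_timeout` once if `touch()` is not called for `timeout_s` seconds."""

    def __init__(self, timeout_s: float, on_timeout: Callable[[], Awaitable[None]]):
        self._timeout_s = timeout_s
        self._on_timeout = on_timeout
        self._task: Optional[asyncio.Task] = None

    def touch(self) -> None:
        """Arm or re-arm the timer; call it on every user or agent turn."""
        self.cancel()
        self._task = asyncio.get_running_loop().create_task(self._wait())

    def cancel(self) -> None:
        """Stop the timer, for example when the call ends normally."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _wait(self) -> None:
        await asyncio.sleep(self._timeout_s)
        await self._on_timeout()
```

With `timeout_s=120`, a call stuck behind a rate-limit error would be released after two minutes of dead air instead of holding a worker indefinitely.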

Capacity cap via load_fnc

The hardest constraint is not CPU or RAM, it is the Anthropic concurrent-request ceiling on our current tier. Two live conversations plus the Telegram bot can already trigger a 429. So we installed a soft cap on the voice agent: a custom load_fnc that inspects the number of in-flight calls and returns load=1.0 when we are at two concurrent. LiveKit's SFU refuses the third call cleanly, and the caller hears a busy tone instead of a broken session. No silent failures, no half-processed leads.

It is a crude cap, but it turns a soft limit into a hard one, which is exactly what you want when the downstream rate limit is the real bottleneck. When we upgrade the Anthropic tier, we raise the cap by editing one integer.
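A sketch of what such a cap might look like, kept independent of the framework so it runs on its own. The `CallCap` class and its method names are hypothetical; the source only states that the function returns 1.0 at two concurrent calls, so the fractional values below the cap are an assumption. In livekit-agents the `load` method would be passed as the `load_fnc` worker option, and the exact callback signature depends on the library version.

```python
import threading


class CallCap:
    """Tracks in-flight calls and reports load as a fraction of the allowed maximum."""

    def __init__(self, max_calls: int = 2):
        self.max_calls = max_calls  # the "one integer" to raise after a tier upgrade
        self._active = 0
        self._lock = threading.Lock()

    def call_started(self) -> None:
        with self._lock:
            self._active += 1

    def call_ended(self) -> None:
        with self._lock:
            self._active = max(0, self._active - 1)

    def load(self) -> float:
        """Return 1.0 (full) at or above the cap, otherwise the fraction in use."""
        with self._lock:
            return min(1.0, self._active / self.max_calls)
```

The entrypoint would call `call_started()` when a job begins and `call_ended()` in its shutdown path, so the reported load always reflects live conversations rather than CPU usage.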

What is next

Short list, in order of probable pain relief: upgrading the Anthropic tier to lift the concurrent-call cap, widening the RTP port range to absorb more than five parallel calls, and adding a second agent voice for English callers. The bones are stable now. Everything from here is polish and volume.

If you want to hear Maria yourself, call us at +34 868 35 37 57 or drop us a line. She will tell you she is an AI, she will log the call, and both our voices will be waiting in the lead by the time you hang up.

Fer & Claude, 2026-04-22.

Full technical report

The section below is the complete internal report we wrote for ourselves at the end of the sprint. It covers the whole stack, the architecture, every phase, capacity limits, costs and pending work. Nothing has been trimmed except the public IPs and one private phone number.


Final report: voip-expert / Maria agent

Date: 2026-04-22
Work sessions: 2026-04-20 · 2026-04-21 · 2026-04-22
Status: live in production


1. Executive summary

Autonomous VoIP agent that:

  • Answers the DID +34 868 35 37 57 (Zadarma) in Spanish with a natural voice (ElevenLabs Flash v2.5, Spain voice uQw4jpKzMLrZuo0RLPS9).
  • Holds the conversation with Claude Haiku 4.5, with STT by Deepgram Nova-3.
  • On hangup, automatically generates: an opportunity in the elPanocho CRM + the transcript in the chatter + a summary by Claude Sonnet 4.6 + an audio recording as mixed OGG/Opus (both voices).
  • Has a debug mode enabled for calls from XXX XXX XXX (Fernando), with internal tools: check_backups, query_odoo, ask_oca_expert, ask_odoo_pilot, ask_system_admin, send_telegram.
  • Has an independent Telegram bot (@maria_ltc_bot) for text queries with the same tools.
  • Has a silence watchdog that hangs up automatically after 120 s without activity.
  • Is self-hosted: everything runs in an LXC container on pve1 (local Proxmox).
  • Keeps its code versioned in private GitLab repos, with no cloud dependencies for the logic.

2. Technology stack

| Layer | Technology | Version | Role |
|---|---|---|---|
| PSTN telephony | Zadarma | | DID + SIP trunk |
| Router / NAT | Huawei EG8145V5 | | Port-forward UDP 5060 + 10000-10999 |
| SIP gateway | livekit-sip | 1.2.0 | SIP termination, bridge to WebRTC |
| SFU / Realtime | livekit-server | 1.11.0 | WebRTC rooms, media routing |
| Agent framework | livekit-agents (Python) | 1.5.4 | Call lifecycle |
| LLM (dialogue) | Claude Haiku 4.5 | | Real-time streaming |
| LLM (summary) | Claude Sonnet 4.6 | | Post-call summary |
| STT | Deepgram Nova-3 (multi) | | ES/EN transcription + keyterms |
| TTS | ElevenLabs Flash v2.5 | | Synthetic voice, speed 0.85 |
| VAD | Silero | | Turn detection |
| Recording | LiveKit Egress + ffmpeg | | 2 track_egress + amix |
| CRM / Backoffice | Odoo 16 (elPanocho DB) | 16.0 | Opportunity, chatter, attachment |
| Custom module | custom_fer | 16.0.1.8 | voip.call.register_call + attach_recording_to_lead |
| Telegram bot | python-telegram-bot | 21.11.1 | Chat query channel |
| Claude CLI daemon | Unix socket | | claude subprocess with permissions bypass |

3. Architecture

  PSTN
   │
   ▼
┌──────────┐   SIP/RTP    ┌──────────────────┐ WebRTC ┌────────────────┐
│ Zadarma  │─────────────▶│ Router Huawei    │───────▶│ livekit-sip    │
│ DID      │  UDP 5060    │ NAT 1:1 + fwd    │        │ (CT-140)       │
└──────────┘  10000-10999 └──────────────────┘        └────────┬───────┘
                                                              │
                                     ┌────────────────────────┼─────────────────┐
                                     ▼                        ▼                 ▼
                            ┌────────────────┐      ┌─────────────────┐   ┌───────────────┐
                            │ livekit-server │◀────▶│ maria-agent     │   │ livekit-egress│
                            │ (CT-140)       │      │ (entrypoint)    │   │ (Docker)      │
                            └────────────────┘      └──┬──────────────┘   └──────┬────────┘
                                                       │                         │ track x2
                       ┌───────────────────────────────┤                         │
                       │               │               │                         │
                       ▼               ▼               ▼                         ▼
               ┌──────────────┐ ┌─────────────┐ ┌──────────────┐       /var/lib/livekit-egress/
               │ Deepgram STT │ │ Anthropic   │ │ ElevenLabs   │            recordings/*.ogg
               │ Nova-3       │ │ Claude API  │ │ Flash v2.5   │                   │
               └──────────────┘ └─────────────┘ └──────────────┘                   ▼
                                                                             ffmpeg amix
                                                                                   │
                       ┌───────────────────────────────────────────────────────────┘
                       ▼
               ┌─────────────────────┐   XML-RPC    ┌─────────────────────────┐
               │ run_post_call_      │─────────────▶│ Odoo 16 (CT-116 ipve1)  │
               │ pipeline            │              │ custom_fer.voip.call    │
               │ (Sonnet + parking)  │              │ register_call +         │
               └─────────────────────┘              │ attach_recording_to_lead│
                                                    └─────────────────────────┘

            ┌───────────────────────────────────────────────────────────────┐
            │                Operational channels (debug)                   │
            │                                                               │
            │  Telegram @maria_ltc_bot  ◀─▶  maria-telegram.service         │
            │    (long polling)              │                              │
            │                                ▼                              │
            │                         claude-daemon ◀─▶ Unix socket         │
            │                    (claude-runner user, --bypass)             │
            └───────────────────────────────────────────────────────────────┘

4. Infrastructure

CT-140 ct-voip on pve1 (local Proxmox)

  • Unprivileged LXC with nesting=1, keyctl=1
  • Debian 12, Python 3.11, Node 20
  • Static LAN IP 192.168.1.7/24 (gw 192.168.1.1)
  • 4 cores / 4 GB RAM / 20 GB ZFS (zfs-storage:subvol-140-disk-0)
  • Relevant packages: bubblewrap, socat, ffmpeg 5.1.8, docker.io

Active systemd services on CT-140

| Unit | Description | Script/unit path |
|---|---|---|
| livekit-server.service | WebRTC SFU | /opt/livekit/... |
| livekit-sip.service | SIP gateway | built from source, v1.2.0 |
| livekit-egress.service | Docker livekit/egress:latest | config /etc/livekit/egress.yaml |
| maria-agent.service | Python agent (entrypoint) | /opt/voip-agent/maria_phase2.py |
| maria-telegram.service | Telegram bot | /opt/voip-agent/maria_telegram.py |
| claude-daemon.service | Unix socket → claude CLI | /opt/voip-agent/tools/claude_daemon.py |
| redis-server.service | Backend for livekit + egress | default |

System users

  • root: manages all services
  • claude-runner (uid 1001, home /home/claude-runner): runs the claude CLI with --dangerously-skip-permissions (not allowed as root)
      • Own venv with paramiko, requests, pyyaml, tabulate, lxml, bs4, python-dotenv
      • SSH keys to pve1 and ipve1 (keys in /home/claude-runner/.ssh/id_ed25519*)
      • Claude sandbox disabled: ~/.claude/settings.json with {sandbox:{enabled:false}}

livekit-egress container (Docker)

  • Official image livekit/egress:latest
  • --network host, --cap-add=SYS_ADMIN
  • Bind mount /var/lib/livekit-egress/recordings:/out (chmod 777)
  • Internal user egress (uid 1001)
  • Chrome 125 included (currently unused; we use track_egress directly)

External infrastructure

  • Huawei EG8145V5 router (192.168.1.1): administered via Selenium (credentials documented in reference_home_router.md)
  • WAN IP public.ip.address.number (dynamic; needs monitoring)
  • ipve1 (OVH, public.ip.address.number): CT-116 with Odoo 16 elPanocho, the destination CRM
  • CT-200 (pve1): backup server, runs check_backups.sh

5. Inbound call flow

T+0.0s   caller dials +34 868 35 37 57
T+0.5s   Zadarma INVITE → WAN public.ip.address.number:5060
T+0.6s   router NAT → CT-140:5060
T+0.7s   livekit-sip validates the trunk (IP range 185.45.152.0/22), creates room maria-_<caller>_<random>
T+1.0s   dispatch rule launches agent worker → entrypoint(JobContext)
T+1.5s   agent joins room, plays the GDPR greeting (in Spanish):
         "Hola, le atiende María, asistente virtual de lemontreecloud.
          Esta llamada será grabada para atención al cliente. ¿En qué puedo ayudarle?"
         (English: "Hello, this is María, lemontreecloud's virtual assistant.
          This call will be recorded for customer service. How can I help you?")
T+3s     TTS greeting played
T+3s     _start_recording: 2 parallel track_egress requests
         - caller audio → /out/<room>_<ts>_caller.ogg
         - agent audio  → /out/<room>_<ts>_agent.ogg
T+3s     silence watchdog armed (120 s)

── Conversation loop ──
         Final user STT → Haiku (streaming)
         Haiku decides: spoken reply + possibly a tool call (debug only)
         If a tool is called: ctx.session.say("Espere un segundo...") ("One moment..."), then tool execution
         TTS reply is played
         Every turn resets the watchdog

── End of call ──
T+N      user says "adiós" (or equivalent)
T+N+Δ    Haiku invokes end_call
T+N+Δ+1s   TTS drain + grace period → room.disconnect()
T+N+5s   shutdown hook fires (process exiting)
         ┌─ Post-call pipeline (shutdown_process_timeout=60s) ──────────────────┐
         │   1. Build transcript snapshot from recorder                         │
         │   2. Sonnet 4.6 → {summary, intent, language}                        │
         │   3. _park_pending(payload) → /var/log/voip/pending_leads/ (safety)  │
         │   4. XML-RPC voip.call.register_call → lead_id                       │
         │   5. Unpark (delete safety file) on success                          │
         │   6. Stop egress (EGRESS_COMPLETE → ok)                              │
         │   7. Poll the 2 .ogg files (stable, ≥500 bytes)                      │
         │   8. ffmpeg amix → mix.ogg (libopus 32k mono)                        │
         │   9. XML-RPC voip.call.attach_recording_to_lead                      │
         │  10. Clean up local files                                            │
         └──────────────────────────────────────────────────────────────────────┘

Result in Odoo CRM:
  - Lead (type=opportunity) with:
    · name = first sentence of the summary
    · phone = caller number
    · description = summary
    · source_id = VoIP Inbound
    · tag_ids = [voip-inbound, <intent>]
    · user_id = Fernando
    · chatter:
        · message 1: full transcript rendered as HTML
        · message 2: "Grabación adjuntada" ("Recording attached") + attachment caller_agent_mix.ogg
6. External integrations

| Service | Account | Credentials | Notes |
|---|---|---|---|
| Zadarma | DID +34 868 35 37 57 | zadarma.com panel | "Servidor externo" (external server) points to the WAN IP + SIP IP-based auth mode |
| Anthropic API | Fernando (current tier) | ANTHROPIC_API_KEY in .env | ⚠️ the concurrent rate limit has already blocked conversations |
| Deepgram | Fernando | DEEPGRAM_API_KEY in .env | Nova-3 multi with 100 keyterms (acronyms + INE names) |
| ElevenLabs | Fernando (paid) | ELEVENLABS_API_KEY in .env | ES (Spain) voice uQw4jpKzMLrZuo0RLPS9, speed 0.85, model Flash v2.5 |
| Telegram Bot | @maria_ltc_bot | Token in .env | Whitelists Fernando's chat_id, rejects everyone else |
| GitLab | fernandohc | Rotating PAT | Private repos |
| Odoo elPanocho | fernando@elpanocho.com | API key in .env | Destination CRM for leads |

7. custom_fer module v16.0.1.8 (Odoo 16 elPanocho)

Path: /opt/odoo/custom/apps/custom_fer/ (CT-116 ipve1). Repo gitlab.com/fernandohc/apps, branch antigravity.

Model voip.call

Public XML-RPC methods:

  • register_call(payload): creates an opportunity from the payload of a finished call.
      • payload: caller_phone, started_at, ended_at, duration_s, language, transcript, summary, intent, debug_mode, silence_hangup, extra_tags
      • Returns: {lead_id, lead_url, attachment_id (False)} (None is converted to False for XML-RPC compatibility)
      • Actions: crm.lead.create(type=opportunity), posts the transcript rendered as HTML, applies tags and source.

  • attach_recording_to_lead(payload): attaches audio to an existing lead.
      • payload: lead_id, recording_b64, recording_filename
      • MIME detection: .ogg → audio/ogg, .mp4 → audio/mp4
      • ir.attachment.create + mail.message with "Grabación adjuntada" ("Recording attached").

  • Two-step pipeline: the lead is created first (fast, must succeed), then the recording is attached (slow, best-effort).
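On the client side, the payload for attach_recording_to_lead could be assembled as in the sketch below. The three payload keys come from the report; the helper names, and the `application/octet-stream` fallback for unknown extensions, are assumptions. The resulting dict would then be sent through Odoo's XML-RPC `execute_kw` endpoint, which is not shown here.

```python
import base64
from pathlib import Path

# Mapping taken from the report; any other extension is an assumed fallback.
MIME_BY_SUFFIX = {".ogg": "audio/ogg", ".mp4": "audio/mp4"}


def detect_mime(filename: str) -> str:
    """Pick the MIME type from the file extension, case-insensitively."""
    return MIME_BY_SUFFIX.get(Path(filename).suffix.lower(), "application/octet-stream")


def build_attach_payload(lead_id: int, recording: bytes, filename: str) -> dict:
    """Build the payload dict: XML-RPC cannot carry raw bytes in a dict, so base64-encode."""
    return {
        "lead_id": lead_id,
        "recording_b64": base64.b64encode(recording).decode("ascii"),
        "recording_filename": filename,
    }
```

At 32 kbps, a three-minute call is roughly 720 KB of audio, so the base64 payload stays under 1 MB and fits comfortably in a single request.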

Inbound email flow (new, 2026-04-22)

  • fetchmail.server id=3 pulls mail from fernando@lemontreecloud.com every 5 min.
  • mail.alias id=49 with these defaults (Python):

        {
            'type': 'opportunity',
            'user_id': 2,               # Fernando
            'team_id': 1,               # Sales
            'source_id': 12,            # Email Inbound
            'tag_ids': [(6, 0, [19])],  # email-inbound
        }

  • Every email sent to fernando@lemontreecloud.com generates a tagged opportunity.

8. Project phases (chronological)

| Phase | Date | Description | Result |
|---|---|---|---|
| 1: Hello World | 2026-04-20 | Call rings + TTS says "hola" | Zadarma + livekit-sip + NAT 1:1 + first Python agent |
| 2: Production dialogue | 2026-04-20 | Maria holds a bilingual conversation with Haiku + Deepgram + ElevenLabs, GDPR, end_call tool | First calls with Raúl |
| 3: CRM integration | 2026-04-20 | on_shutdown hook creates an opportunity in elPanocho with transcript + Sonnet summary | custom_fer.voip.call.register_call + type=opportunity fix for visibility |
| 4: Passive recording (deprecated) | 2026-04-20 | VoIPmonitor sidecar with MariaDB + GUI | Replaced in Phase 6 |
| 5: Operational tools | 2026-04-21 | 6 debug-only function_tool entries in the agent + OCA primer in the system prompt | Gateway for ask_oca_expert, ask_odoo_pilot, ask_system_admin, check_backups, query_odoo, send_telegram |
| 5.1: Telegram bot | 2026-04-21 | @maria_ltc_bot with 10 commands, long polling, Fernando whitelist | Independent service maria-telegram.service |
| 5.2: Claude CLI daemon | 2026-04-21 | Unix socket daemon running as non-root claude-runner, sandbox bypass | ask_* tools now working; Claude sandbox disabled in settings.json |
| 6: LiveKit Egress | 2026-04-21 | Replaces VoIPmonitor with self-contained active recording | OGG file attached to the lead's chatter |
| 6.1: Both voices | 2026-04-22 | Room composite audio_only with Chrome (failed: start signal not received) → migrated to dual track_egress + ffmpeg amix | One OGG/Opus with caller + agent mixed |
| 6.2: Pipeline resilience | 2026-04-22 | Reordering (lead first, recording second) + defensive parking + async attach_recording_to_lead | Lead is created even if egress fails |
| 6.3: Perms fix + egress ending | 2026-04-22 | chmod 777 /out, EGRESS_ENDING/COMPLETE distinguished from ABORTED/FAILED | Recording attaches reliably |
| 6.3.1: keyterms + prompt | 2026-04-22 | 100 keyterms (acronyms + top INE surnames) + prompt "the opportunity is created automatically on hangup" + fix Odú vs o-do-o | Conversational UX |
| 6.3.3: Silence watchdog | 2026-04-22 | Auto-hangup after 120 s of silence | Resilience against 429 rate limits |
| 6.3.4: Shutdown timeout | 2026-04-22 | shutdown_process_timeout=60s (the 10 s default was killing the pipeline) | Attach completes before SIGKILL |
| 7: Inbound email to CRM | 2026-04-22 | fernando@ alias adds source + tag to the opportunity | Emails also land in the CRM, tagged |

9. Capacity and limits

| Component | Effective limit | Notes |
|---|---|---|
| Zadarma trunk | 10 simultaneous channels | External layer |
| livekit-sip RTP ports | 10 ports (10050-10059) → ~5 parallel calls | Can be widened; the router already allows 10000-10999 |
| livekit-server | hundreds of rooms | Not a bottleneck |
| maria-agent workers | num_idle_processes=4 + auto-spawn | Actual RAM/CPU set the ceiling |
| CT-140 resources | 4 cores / 4 GB RAM | Each active call uses ~300 MB RAM |
| Anthropic concurrent | current tier → 429 with 1 call + bot active | Main bottleneck; upgrade the tier or add retry+backoff |
| Deepgram / ElevenLabs | current tier usually handles dozens | No bottleneck observed so far |

Verdict: it reasonably supports 2-3 concurrent calls. To absorb Zadarma's 10 simultaneous channels we need to (a) widen the RTP range, (b) upgrade the Anthropic tier, and (c) probably raise RAM to 8 GB.
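The bottleneck reasoning in this table reduces to taking the minimum across layers. The sketch below does that arithmetic with the table's figures; two inputs are assumptions rather than measurements: 2 UDP ports per call (inferred from "10 ports → ~5 calls") and a RAM estimate that ignores what the OS and services use at idle. The Anthropic figure of 2 is the soft cap the post sets through load_fnc.

```python
def bottleneck(limits: dict) -> tuple:
    """Return (layer name, max concurrent calls) for the most restrictive layer."""
    name = min(limits, key=limits.get)
    return name, limits[name]


limits = {
    "zadarma_trunk": 10,   # simultaneous channels on the trunk
    "rtp_ports": 10 // 2,  # 10 ports, assuming 2 UDP ports per call
    "ram": 4096 // 300,    # 4 GB at ~300 MB per call, ignoring idle usage
    "anthropic": 2,        # soft cap enforced through load_fnc
}

print(bottleneck(limits))  # ('anthropic', 2)
```

Raising the Anthropic cap alone would move the ceiling to the RTP range (5 calls), which matches the order of the fixes listed in the verdict.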


10. Git repositories

| Repo | URL | Contents |
|---|---|---|
| voip-ai-agent | https://gitlab.com/fernandohc/voip-ai-agent | Python agent, Telegram bot, claude-daemon, scripts, deploy/ |
| apps (branch antigravity) | https://gitlab.com/fernandohc/apps | custom_fer module with voip.call, other Odoo modules |
| claude-memory | https://gitlab.com/fernandohc/claude-memory | Persistent memory of Fernando's Claude Code (170+ files) |
| ai-tools/voip-expert | local, in /opt/odoo/custom/ai-tools/ | Claude Code skill with SKILL.md + references + templates |

All private. claude-memory auto-syncs via the SessionEnd + PreCompact hooks.


11. Key memories / references

  • feedback_livekit_sip_zadarma.md: 3 NAT gotchas (flood/Numbers/nat_1_to_1)
  • feedback_voip_debug_tools.md: debug-only tools pattern
  • feedback_livekit_egress_record.md: track_egress + ffmpeg mix
  • feedback_crm_lead_vs_opportunity.md: crm.lead with type=lead is invisible in the UI
  • reference_ct140_voip.md: container specs
  • reference_home_router.md: Huawei router admin
  • reference_elevenlabs_account.md: Google SSO, narrowly scoped API key
  • reference_voipmonitor_ct140.md: deprecated (replaced in Phase 6)
  • feedback_claude_daemon.md: Unix socket pattern to work around the sandbox/root restriction
  • user_fernando_contact.md: debug whitelist +34XXXXXXXXX

12. Pending work and roadmap

Operational to-dos

  • Rotate 3 credentials from oci_test_deployment.md (test environment from 37 days ago; the repo is private, but they are documented there)
  • Delete /opt/odoo/obsidian-vault/ (the useful content has already been migrated to claude-memory)
  • Activate the voip-expert skill with /opt/odoo/custom/ai-tools/sync-skills.sh

Technical roadmap

  • Upgrade the Anthropic tier, or implement retry+backoff on 429 inside the livekit-agents anthropic plugin
  • Widen the RTP range (10050-10149) if more than 5 parallel calls are needed
  • Dynamic WAN IP: DDNS or a reload script if it changes (it has stayed static so far)
  • Sync the CT-116 environment: clone repos and skills onto CT-116 (ipve1) so the CRM also has local claude tools
  • Blog post about the experience: already live at https://lemontreecloud.com/blog/ltc-labs-3/como-construimos-maria-nuestra-agente-voip-con-ia-en-un-dia-12

Ideas to evaluate

  • PersonaPlex 7B v1 (NVIDIA), once a Spanish version is released: full-duplex speech-to-speech with <300 ms latency, but it requires a 16 GB GPU and v1 is English-only
  • Voxtral TTS (Mistral): open-source alternative to ElevenLabs if we want to lower variable cost
  • Recordings storage: currently OGG in ir.attachment. If volume grows, consider external S3/MinIO
  • Multi-language: Maria answers in the caller's language, but the OCA primer is in Spanish. Extend to EN/PT if international leads arrive
  • Voice biometrics: detect a returning caller by voiceprint to personalize the greeting

13. Sensitive data (references)

Operational credentials live in /opt/voip-agent/.env on CT-140 (permissions 640 root:claude-runner, not versioned). It contains:

  • ANTHROPIC_API_KEY
  • DEEPGRAM_API_KEY
  • ELEVENLABS_API_KEY (+ alias ELEVEN_API_KEY)
  • LIVEKIT_API_KEY + LIVEKIT_API_SECRET + LIVEKIT_URL
  • ZADARMA_* (if used)
  • TELEGRAM_BOT_TOKEN + TELEGRAM_CHAT_ID
  • ODOO_URL + ODOO_DB + ODOO_USER + ODOO_API_KEY
  • SILENCE_TIMEOUT_S=120
  • PVE1_HOST=root@192.168.1.2

GitLab tokens are short-lived (Fernando's rotating PAT, api scope, 90-day lifetime). They are never committed; they are used through a transient oauth2:<token>@ URL, and the remote is cleaned up afterwards.


14. Cost per call (estimate)

Typical 3-minute call:

| Item | Approx. cost |
|---|---|
| Zadarma DID + minutes | ~0.03 € |
| Deepgram Nova-3 | ~0.12 € |
| ElevenLabs Flash v2.5 (~600-800 agent characters) | ~0.05 € |
| Claude Haiku 4.5 streaming + Sonnet 4.6 summary | ~0.01 € |
| Total per 3-minute call | < 0.25 € |

Self-hosted infra (CT-140 + livekit-*): marginal cost already absorbed by the home Proxmox.


Generated by Claude Code in collaboration with Fernando Hernández, sessions 2026-04-20 → 2026-04-22.
