Two days ago we shipped Maria, our AI VoIP agent, after a one-day sprint. Since then we have been listening to every call, reading every transcript, and watching where the pipeline wobbles. In 48 hours Maria has gained a Telegram ops bot, call recordings with both voices attached to the lead, and a handful of resilience fixes that turn a demo into something we trust to run unattended overnight.
This post picks up exactly where the previous one left off. If you have not read it, start here: How we built Maria.
What we learned in 48 hours
Three things became obvious as soon as real traffic hit Maria. First, we needed a back channel to talk to her outside the phone line, for status checks, quick queries, and the occasional restart when a worker misbehaved. Second, a transcript is not enough: when there is a dispute or a nuance, we want to hear both voices. And third, the post-call pipeline was a single point of failure. If Egress hiccuped, the lead never reached the CRM.
The next sections walk through each of those, in the order we tackled them.
Telegram bot @maria_ltc_bot
We built a second Maria that does not pick up the phone. She listens on Telegram. The bot runs as an independent systemd service (maria-telegram.service) with long polling, and only the owner chat id is allowed through. Everyone else gets a polite rejection.
Ten commands cover the daily ops surface:
/backups Daily backup status (ict_fer + elPanocho + oci-test)
/query <q> Run a natural-language Odoo query (xmlrpc under the hood)
/oca <q> Ask the OCA expert skill
/pilot <q> Ask the Odoo pilot skill (scripted admin)
/sysadmin <q> Ask the system-admin skill (SSH into pve1/ipve1)
/calls Recent Maria calls with lead id and summary
/status Agent health: workers, RAM, last call, queue depth
/restart_maria Graceful restart of the voice agent
/start Intro + command list
/help Same as /start, short form

Real screenshot of @maria_ltc_bot answering /backups, /query and /oca — each tool result lands back as a Telegram message.
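Here is a library-agnostic sketch of the bot's gatekeeping. It assumes the long-polling loop has already reduced each update to a (chat_id, text) pair. OWNER_CHAT_ID, the reply strings and the handler table are hypothetical placeholders. The production bot is built on python-telegram-bot, and its handler API is not reproduced here.

```python
from typing import Callable, Dict

OWNER_CHAT_ID = 123456789  # hypothetical; the real value would come from TELEGRAM_CHAT_ID
REJECTION = "Sorry, this bot only answers its owner."  # hypothetical wording

Handler = Callable[[str], str]


def make_dispatcher(handlers: Dict[str, Handler], owner_chat_id: int) -> Callable[[int, str], str]:
    """Build a function that maps an incoming (chat_id, text) to a reply string."""

    def dispatch(chat_id: int, text: str) -> str:
        # The whitelist check runs before any parsing: strangers get a polite rejection.
        if chat_id != owner_chat_id:
            return REJECTION
        command, _, arg = text.strip().partition(" ")
        # In group chats Telegram may send "/cmd@botname"; drop the suffix.
        command = command.split("@", 1)[0].lower()
        handler = handlers.get(command)
        if handler is None:
            return "Unknown command. Try /help."
        return handler(arg.strip())

    return dispatch


def help_text(_: str) -> str:
    return "/backups /query /oca /pilot /sysadmin /calls /status /restart_maria /start /help"


HANDLERS: Dict[str, Handler] = {
    "/start": help_text,
    "/help": help_text,
    # Stand-in: the real /query forwards its argument to the Claude daemon.
    "/query": lambda q: f"forwarding to daemon: {q!r}",
}

dispatch = make_dispatcher(HANDLERS, OWNER_CHAT_ID)
```

Because the whitelist lives in the dispatcher rather than in each handler, a newly added command cannot accidentally skip the check.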
The interesting part is how the bot actually answers. The tools do not live inside the bot process. The bot process is a thin dispatcher. When a command lands, the bot drops a JSON request into a Unix socket, and a local Claude daemon picks it up and runs the matching Claude Code skill with full tool access.
Why the indirection? Because the Claude CLI refuses to run with --dangerously-skip-permissions as root, and for good reason. So we added a dedicated non-root user, claude-runner, with its own venv, its own SSH keys to pve1 and ipve1, and its own sandbox-disabled Claude config. The daemon runs as that user, exposes a Unix socket, and the bot (running as root, because it needs to read some service logs) talks to it over the socket. One privileged side orchestrates, the other side actually runs the tools. Clean boundary, and we get to reuse every Claude Code skill we already wrote: system-admin, odoo-pilot, oca-expert, the lot.
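The bot-to-daemon hop can be sketched with the standard library alone. The sketch assumes a newline-delimited JSON protocol with one request and one reply per connection. The message fields (skill, prompt, ok, text, error) and run_skill are hypothetical. In the real daemon, run_skill would launch the Claude CLI as claude-runner.

```python
import json
import socket
import socketserver


def run_skill(skill: str, prompt: str) -> str:
    """Hypothetical stand-in for invoking a Claude Code skill through the CLI."""
    return f"[{skill}] {prompt}"


class SkillRequestHandler(socketserver.StreamRequestHandler):
    """Daemon side: read one JSON line, run the skill, answer with one JSON line."""

    def handle(self) -> None:
        try:
            request = json.loads(self.rfile.readline())
            reply = {"ok": True, "text": run_skill(request["skill"], request["prompt"])}
        except Exception as exc:  # report bad requests and tool failures back to the bot
            reply = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
        self.wfile.write(json.dumps(reply).encode() + b"\n")


def ask_daemon(sock_path: str, skill: str, prompt: str, timeout: float = 300.0) -> dict:
    """Bot side: one request, one reply, over the Unix socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)  # skills can be slow, but never block the bot forever
        sock.connect(sock_path)
        sock.sendall(json.dumps({"skill": skill, "prompt": prompt}).encode() + b"\n")
        with sock.makefile("rb") as reader:
            return json.loads(reader.readline())
```

A long-running daemon would probably serve with socketserver.ThreadingUnixStreamServer, so that one slow skill does not block other requests. Access control comes from the file permissions on the socket path. The root-owned bot can always connect.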
The Telegram channel became, in two days, the main operator console. Phone for customers, chat for us.
Call recordings with both voices
The first version of Maria captured audio passively with VoIPmonitor, a sidecar that sniffs RTP and writes WAV. It worked, but the path from pcap-to-WAV-to-Odoo-attachment was fragile, and the files were raw packet dumps with no clean separation between caller and agent. We wanted, inside the CRM lead, one audio file with both voices, reviewable in a browser, no extra tooling.
We tried LiveKit Egress in room-composite mode first, which renders the WebRTC room with a headless Chrome and produces a clean mixed output. On our CT, Chrome would launch but the composition pipeline would never emit the start signal, so Egress eventually aborted. Rather than fight Chromium inside an unprivileged LXC, we flipped to a different Egress mode: two track_egress requests in parallel, one per participant, each writing its own OGG/Opus file. When the call ends, ffmpeg takes over and does the job that Chrome refused to do:
```shell
ffmpeg -i caller.ogg -i agent.ogg \
  -filter_complex "[0:a][1:a]amix=inputs=2:duration=longest[a]" \
  -map "[a]" -c:a libopus -b:a 32k -ac 1 mix.ogg
```
The result is a single mono OGG/Opus file at 32 kbps, small enough to live as an ir.attachment, clean enough to be useful. A new method on our custom module, voip.call.attach_recording_to_lead, posts the file to the existing lead's chatter with a short message. The transcript message already arrived seconds earlier from register_call. Two messages, same lead, in order: text first, audio right after.
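This sketch shows one way to drive the same ffmpeg command from Python and package the result for Odoo. The function names are hypothetical. The -y flag (overwrite the output) is an addition to the command shown above. The payload keys follow the ones listed for attach_recording_to_lead in the technical report at the end of this post.

```python
import base64
import subprocess
from pathlib import Path


def build_amix_cmd(caller: Path, agent: Path, out: Path) -> list[str]:
    """ffmpeg argv: mix both tracks, keep the longer duration, encode mono Opus at 32 kbps."""
    return [
        "ffmpeg", "-y",
        "-i", str(caller), "-i", str(agent),
        "-filter_complex", "[0:a][1:a]amix=inputs=2:duration=longest[a]",
        "-map", "[a]", "-c:a", "libopus", "-b:a", "32k", "-ac", "1",
        str(out),
    ]


def mix_tracks(caller: Path, agent: Path, out: Path, timeout_s: float = 30.0) -> bytes:
    """Run ffmpeg from an argv list (no shell quoting) and return the mixed file's bytes."""
    subprocess.run(build_amix_cmd(caller, agent, out),
                   check=True, capture_output=True, timeout=timeout_s)
    return out.read_bytes()


def build_attach_payload(lead_id: int, filename: str, audio: bytes) -> dict:
    """XML-RPC payload; base64 keeps the binary audio safe inside an XML string."""
    return {
        "lead_id": lead_id,
        "recording_b64": base64.b64encode(audio).decode("ascii"),
        "recording_filename": filename,
    }
```

Passing an argument list instead of a shell string avoids quoting problems with room names. check=True turns an ffmpeg failure into an exception that the best-effort attach step can catch.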
UX improvements
Three small changes that made a disproportionate difference.
- 100 Deepgram keyterms. We fed the STT a curated list of 100 terms: OCA, Odoo, ERP, ICT and friends, plus the top Spanish surnames from the INE. Now when a caller named Conesa calls, Maria hears "Conesa" instead of "Conhexa". The lead ends up with the right name, which matters a surprising amount when you have to call the person back.
- Odoo is pronounced "Odú". ElevenLabs, left to its own devices, spells the word as "o-do-o" letter by letter. We added one line to the system prompt forcing the pronunciation to "Odú", and Maria now says it like a human.
- "When you hang up the opportunity is created automatically." A subtle but annoying failure mode: callers would ask "can you create an opportunity for me?", and Maria, trained to be cautious, would reply "for security reasons I cannot create records". Technically correct, socially wrong, because the opportunity is being created, just at hangup. One line in the prompt rewrote that interaction: Maria now confirms that the lead will be logged when the call ends, and moves on.
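This sketch shows how such a keyterm list might be assembled and sent. It assumes that Deepgram's keyterm prompting for Nova-3 takes one repeated keyterm query parameter per term. That is worth checking against the current Deepgram docs and your livekit-agents Deepgram plugin version. The surname list is an illustrative subset, not the INE ranking we actually used.

```python
from urllib.parse import urlencode

ACRONYMS = ["OCA", "Odoo", "ERP", "ICT"]
# Illustrative subset only; the real list came from INE surname statistics.
SURNAMES = ["García", "Rodríguez", "González", "Fernández", "López", "Conesa"]


def build_keyterms(*groups: list[str], limit: int = 100) -> list[str]:
    """Merge groups in priority order, drop case-insensitive duplicates, cap at `limit`."""
    seen: set[str] = set()
    terms: list[str] = []
    for group in groups:
        for term in group:
            key = term.casefold()
            if key not in seen:
                seen.add(key)
                terms.append(term)
    return terms[:limit]


def deepgram_query(terms: list[str]) -> str:
    """Query string for a streaming request (assumed `keyterm` parameter, repeated)."""
    params = [("model", "nova-3"), ("language", "multi")]
    params += [("keyterm", term) for term in terms]
    return urlencode(params)
```

The acronym group goes first, so if the cap is ever hit, surnames are the terms that get dropped.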
Resilience fixes
Real callers find bugs that designers never anticipate. A handful of defensive patches earned their keep this week.
- Silence watchdog. If neither side speaks for 120 seconds, Maria hangs up politely. Without this, a call that hit an Anthropic 429 could sit forever with dead air.
- Shutdown timeout raised to 60 s. The default livekit-agents shutdown_process_timeout is 10 seconds, and that was killing our post-call pipeline mid-attach. Sixty seconds is generous but bounded, and now the recording actually makes it to Odoo.
- Pipeline reordered: lead first, recording second. Creating the lead is fast and must succeed. Attaching the recording is slow and best-effort. We split them. Even if Egress fails or ffmpeg stalls, the opportunity is in the CRM with its transcript and summary before we start worrying about audio.
- Defensive parking. Right before the XML-RPC call to Odoo, we dump the payload to /var/log/voip/pending_leads/<call_id>.json. If Odoo is down, the payload is not lost: a recovery cron picks it up on its next run. If Odoo accepts it, the parked file is deleted. Tiny code change, huge peace of mind.
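The silence watchdog reduces to a resettable deadline. This asyncio sketch assumes the agent calls touch() on every user or agent turn. It also assumes the agent passes in a coroutine that says goodbye and ends the call. The class and method names are hypothetical, not livekit-agents API.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


class SilenceWatchdog:
    """Await `on_timeout` once if touch() is not called for `timeout_s` seconds."""

    def __init__(self, timeout_s: float, on_timeout: Callable[[], Awaitable[None]]) -> None:
        self.timeout_s = timeout_s
        self._on_timeout = on_timeout
        self._deadline = 0.0
        self._task: asyncio.Task | None = None

    def start(self) -> None:
        self.touch()
        self._task = asyncio.create_task(self._run())

    def touch(self) -> None:
        """Push the deadline forward; call on every turn from either side."""
        self._deadline = asyncio.get_running_loop().time() + self.timeout_s

    def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        # Sleep until the current deadline; if touch() moved it meanwhile, sleep again.
        while (remaining := self._deadline - loop.time()) > 0:
            await asyncio.sleep(remaining)
        await self._on_timeout()
```

With this design, touch() only updates a float. No task is cancelled or recreated on every turn, which keeps the hot path trivial.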
Capacity cap via load_fnc
The hardest constraint is not CPU or RAM; it is the Anthropic concurrent-request ceiling on our current tier. Two live conversations plus the Telegram bot can already trigger a 429. So we installed a cap on the voice agent: a custom load_fnc that inspects the number of in-flight calls and returns load=1.0 once two calls are in progress. LiveKit then refuses the third call cleanly, and the caller hears a busy tone instead of a broken session. No silent failures, no half-processed leads.
It is a crude cap, but it turns a soft limit into a hard one, which is exactly what you want when the downstream rate limit is the real bottleneck. When we upgrade the Anthropic tier, we raise the cap by editing one integer.
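The cap itself is one line of arithmetic. In this sketch, call_load maps in-flight calls to the 0..1 load that LiveKit compares against the worker's load threshold. How the count is obtained is an assumption. In livekit-agents each call runs in its own job process, so the count should come from the worker side (for example, its list of active jobs) rather than from a counter inside the entrypoint. The exact load_fnc signature also depends on the library version.

```python
MAX_CONCURRENT_CALLS = 2  # the one integer to edit after an Anthropic tier upgrade


def call_load(active_calls: int, cap: int = MAX_CONCURRENT_CALLS) -> float:
    """Proportional load, clamped to exactly 1.0 once `cap` calls are in flight."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return min(1.0, active_calls / cap)


# Hypothetical wiring; verify the WorkerOptions / load_fnc signature for your version:
# WorkerOptions(entrypoint_fnc=entrypoint,
#               load_fnc=lambda worker: call_load(len(worker.active_jobs)))
```

Assume the worker is marked full when the reported load reaches load_threshold. Then returning exactly 1.0 at the cap stops dispatch there for any threshold up to 1.0. A threshold at or below 0.5 would already stop it after one call.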
What is next
Short list, in order of probable pain relief: upgrading the Anthropic tier to lift the concurrent-call cap, widening the RTP port range to absorb more than five parallel calls, and adding a second agent voice for English callers. The bones are stable now. Everything from here is polish and volume.
If you want to hear Maria yourself, call us at +34 868 35 37 57 or drop us a line. She will tell you she is an AI, she will log the call, and both our voices will be waiting in the lead by the time you hang up.
Fer & Claude, 2026-04-22.
Full technical report
The section below is the complete internal report we wrote for ourselves at the end of the sprint. It covers the whole stack, the architecture, every phase, capacity limits, costs and pending work. Nothing has been trimmed except the public IPs and one private phone number.
Final report: voip-expert / María agent
Date: 2026-04-22
Work sessions: 2026-04-20 · 2026-04-21 · 2026-04-22
Status: running in production
1. Executive summary
An autonomous VoIP agent that:
- Answers the DID +34 868 35 37 57 (Zadarma) in Spanish with a natural voice (ElevenLabs Flash v2.5, Spain Spanish voice uQw4jpKzMLrZuo0RLPS9).
- Holds the conversation with Claude Haiku 4.5, with Deepgram Nova-3 for STT.
- On hangup, automatically produces: an opportunity in the elPanocho CRM + the transcript in the chatter + a summary by Claude Sonnet 4.6 + a mixed OGG/Opus audio recording (both voices).
- Has a debug mode, enabled for calls from XXX XXX XXX (Fernando), with internal tools: check_backups, query_odoo, ask_oca_expert, ask_odoo_pilot, ask_system_admin, send_telegram.
- Has an independent Telegram bot (@maria_ltc_bot) for text queries with the same tools.
- Has a silence watchdog that hangs up automatically after 120 s without activity.
- Is self-hosted: everything runs in an LXC CT on pve1 (local Proxmox).
- Keeps its code in private GitLab repos, with no cloud dependencies for the application logic.
2. Technology stack

| Layer | Technology | Version | Role |
|---|---|---|---|
| PSTN telephony | Zadarma | - | DID + SIP trunk |
| Router / NAT | Huawei EG8145V5 | - | Port-forward UDP 5060 + 10000-10999 |
| SIP gateway | livekit-sip | 1.2.0 | SIP termination, bridge to WebRTC |
| SFU / Realtime | livekit-server | 1.11.0 | WebRTC rooms, media routing |
| Agent framework | livekit-agents (Python) | 1.5.4 | Call lifecycle |
| LLM (dialogue) | Claude Haiku 4.5 | - | Real-time streaming |
| LLM (summary) | Claude Sonnet 4.6 | - | Post-call summary |
| STT | Deepgram Nova-3 (multi) | - | ES/EN transcription + keyterms |
| TTS | ElevenLabs Flash v2.5 | - | Voice synthesis, speed 0.85 |
| VAD | Silero | - | Turn detection |
| Recording | LiveKit Egress + ffmpeg | - | 2 track_egress + amix |
| CRM / Back office | Odoo 16 (elPanocho DB) | 16.0 | Opportunity, chatter, attachment |
| Custom module | custom_fer | 16.0.1.8 | voip.call.register_call + attach_recording_to_lead |
| Telegram bot | python-telegram-bot | 21.11.1 | Chat query channel |
| Claude CLI daemon | Unix socket | - | claude subprocess with permissions bypass |
3. Architecture

```text
PSTN
 │
 ▼
Zadarma DID
 │  SIP/RTP: UDP 5060 + 10000-10999
 ▼
Huawei router (NAT 1:1 + port forward)
 │
 ▼
livekit-sip (CT-140, SIP → WebRTC bridge)
 ├──▶ livekit-server (CT-140) ◀──▶ maria-agent
 ├──▶ maria-agent (entrypoint)
 │    ├──▶ Deepgram STT (Nova-3)
 │    ├──▶ Anthropic Claude API
 │    ├──▶ ElevenLabs TTS (Flash v2.5)
 │    └──▶ run_post_call_pipeline (Sonnet + parking)
 │         │  XML-RPC
 │         ▼
 │         Odoo 16 (CT-116, ipve1)
 │         custom_fer.voip.call: register_call + attach_recording_to_lead
 └──▶ livekit-egress (Docker)
      │  track egress x2
      ▼
      /var/lib/livekit-egress/recordings/*.ogg
      │
      ▼
      ffmpeg amix ──▶ run_post_call_pipeline

Operational channels (debug)
 Telegram @maria_ltc_bot ◀──▶ maria-telegram.service (long polling)
                              │
                              ▼
                              claude-daemon ◀──▶ Unix socket
                              (claude-runner user, --bypass)
```
4. Infrastructure
CT-140 ct-voip on pve1 (local Proxmox)
- Unprivileged LXC with nesting=1, keyctl=1
- Debian 12, Python 3.11, Node 20
- Static LAN IP 192.168.1.7/24 (gw 192.168.1.1)
- 4 cores / 4 GB RAM / 20 GB ZFS (zfs-storage:subvol-140-disk-0)
- Relevant packages: bubblewrap, socat, ffmpeg 5.1.8, docker.io
Active systemd services on CT-140

| Unit | Description | Script/unit path |
|---|---|---|
| livekit-server.service | WebRTC SFU | /opt/livekit/... |
| livekit-sip.service | SIP gateway | built from source, v1.2.0 |
| livekit-egress.service | Docker livekit/egress:latest | config /etc/livekit/egress.yaml |
| maria-agent.service | Python agent (entrypoint) | /opt/voip-agent/maria_phase2.py |
| maria-telegram.service | Telegram bot | /opt/voip-agent/maria_telegram.py |
| claude-daemon.service | Unix socket → claude CLI | /opt/voip-agent/tools/claude_daemon.py |
| redis-server.service | Backend for livekit + egress | default |
System users
- root: manages all services
- claude-runner (uid 1001, home /home/claude-runner): runs the claude CLI with --dangerously-skip-permissions (not allowed as root)
  - Own venv with paramiko, requests, pyyaml, tabulate, lxml, bs4, python-dotenv
  - SSH keys to pve1 and ipve1 (keys at /home/claude-runner/.ssh/id_ed25519*)
  - Claude sandbox disabled: ~/.claude/settings.json with {sandbox:{enabled:false}}
livekit-egress container (Docker)
- Official image livekit/egress:latest
- --network host, --cap-add=SYS_ADMIN
- Bind mount /var/lib/livekit-egress/recordings:/out (chmod 777)
- Internal user egress (uid 1001)
- Chrome 125 bundled (currently unused; we use track_egress directly)
External infrastructure
- Huawei EG8145V5 router (192.168.1.1): administered via Selenium (credentials documented in reference_home_router.md)
- WAN IP public.ip.address.number (dynamic; needs monitoring)
- ipve1 (OVH, public.ip.address.number): CT-116 with Odoo 16 elPanocho, the CRM target
- CT-200 (pve1): backup server, runs check_backups.sh
5. Inbound call flow

```text
T+0.0s      caller dials +34 868 35 37 57
T+0.5s      Zadarma INVITE → WAN public.ip.address.number:5060
T+0.6s      router NAT → CT-140:5060
T+0.7s      livekit-sip validates the trunk (IP 185.45.152.0/22), creates room maria-_<caller>_<random>
T+1.0s      dispatch rule launches an agent worker → entrypoint(JobContext)
T+1.5s      agent joins the room and plays the GDPR greeting (spoken in Spanish):
            "Hello, this is María, lemontreecloud's virtual assistant.
             This call will be recorded for customer service. How can I help you?"
T+3s        greeting TTS played
T+3s        _start_recording: 2 parallel track_egress
              - caller audio → /out/<room>_<ts>_caller.ogg
              - agent audio  → /out/<room>_<ts>_agent.ogg
T+3s        silence watchdog armed (120 s)

── Conversation loop ──
  final user STT → Haiku (streaming)
  Haiku decides: spoken reply + possibly a tool call (debug only)
  If a tool is called: ctx.session.say("Espere un segundo...") ("One moment..."), then the tool runs
  The TTS reply is played
  Every turn resets the watchdog

── End of call ──
T+N         user says "adiós" (goodbye) or equivalent
T+N+Δ       Haiku invokes end_call
T+N+Δ+1s    TTS drain + grace period → room.disconnect()
T+N+5s      shutdown hook fires (process exiting)

Post-call pipeline (shutdown_process_timeout=60s):
   1. Build transcript snapshot from the recorder
   2. Sonnet 4.6 → {summary, intent, language}
   3. _park_pending(payload) → /var/log/voip/pending_leads/ (safety copy)
   4. XML-RPC voip.call.register_call → lead_id
   5. Unpark (delete the safety file) on success
   6. Stop egress (EGRESS_COMPLETE → ok)
   7. Poll the 2 .ogg files (stable, ≥500 bytes)
   8. ffmpeg amix → mix.ogg (libopus 32k mono)
   9. XML-RPC voip.call.attach_recording_to_lead
  10. Clean up local files
```
Result in the Odoo CRM:
- Lead (type=opportunity) with:
  · name = first sentence of the summary
  · phone = caller number
  · description = summary
  · source_id = VoIP Inbound
  · tag_ids = [voip-inbound, <intent>]
  · user_id = Fernando
  · chatter:
    · message 1: full transcript rendered as HTML
    · message 2: "Grabación adjuntada" ("Recording attached") + attachment caller_agent_mix.ogg
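The ordering of steps 3 to 9 above can be sketched as follows. The sketch assumes register_call and attach_recording are callables that wrap the two XML-RPC methods. The function names, the atomic temp-file rename and the replay helper are illustrative additions, not the production code.

```python
import json
import logging
from pathlib import Path
from typing import Callable

log = logging.getLogger("post_call")
PENDING_DIR = Path("/var/log/voip/pending_leads")


def park(payload: dict, call_id: str, pending_dir: Path = PENDING_DIR) -> Path:
    """Step 3: persist the payload before talking to Odoo (write a temp file, then rename)."""
    pending_dir.mkdir(parents=True, exist_ok=True)
    target = pending_dir / f"{call_id}.json"
    tmp = pending_dir / f"{call_id}.json.tmp"
    tmp.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
    tmp.replace(target)  # atomic on POSIX: a crash never leaves a half-written .json
    return target


def run_post_call(payload: dict, call_id: str,
                  register_call: Callable[[dict], int],
                  attach_recording: Callable[[int], None],
                  pending_dir: Path = PENDING_DIR) -> int:
    parked = park(payload, call_id, pending_dir)
    lead_id = register_call(payload)  # step 4: fast, must succeed; on error the file stays parked
    parked.unlink()                   # step 5: Odoo accepted it, so unpark
    try:
        attach_recording(lead_id)     # steps 6-9: slow and best-effort
    except Exception:
        log.exception("lead %s created, recording not attached", lead_id)
    return lead_id


def replay_pending(register_call: Callable[[dict], int], pending_dir: Path = PENDING_DIR) -> int:
    """Recovery-cron sketch: retry each parked payload, unpark the ones Odoo accepts."""
    recovered = 0
    for path in sorted(pending_dir.glob("*.json")):
        try:
            register_call(json.loads(path.read_text(encoding="utf-8")))
        except Exception:
            log.warning("still pending: %s", path.name)
            continue
        path.unlink()
        recovered += 1
    return recovered
```

An exception from register_call propagates on purpose. The lead is the part that must not be lost, and the parked file is what guarantees that. A failure while attaching the recording, by contrast, is only logged.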
6. External integrations

| Service | Account | Credentials | Notes |
|---|---|---|---|
| Zadarma | DID +34 868 35 37 57 | zadarma.com panel | "External server" points to the WAN + IP-based SIP auth mode |
| Anthropic API | Fernando (current tier) | ANTHROPIC_API_KEY in .env | ⚠️ the concurrency rate limit has already blocked conversations |
| Deepgram | Fernando | DEEPGRAM_API_KEY in .env | Nova-3 multi with 100 keyterms (acronyms + INE surnames) |
| ElevenLabs | Fernando (paid) | ELEVENLABS_API_KEY in .env | Spain Spanish voice uQw4jpKzMLrZuo0RLPS9, speed 0.85, model Flash v2.5 |
| Telegram Bot | @maria_ltc_bot | Token in .env | Whitelists Fernando's chat_id, rejects everyone else |
| GitLab | fernandohc | Rotating PAT | Private repos |
| Odoo elPanocho | fernando@elpanocho.com | API key in .env | Target CRM for leads |
7. custom_fer module v16.0.1.8 (Odoo 16 elPanocho)
Path: /opt/odoo/custom/apps/custom_fer/ (CT-116 ipve1). Repo gitlab.com/fernandohc/apps, branch antigravity.
voip.call model
Public XML-RPC methods:
- register_call(payload): creates an opportunity from the payload of a finished call.
  - payload: caller_phone, started_at, ended_at, duration_s, language, transcript, summary, intent, debug_mode, silence_hangup, extra_tags
  - Returns: {lead_id, lead_url, attachment_id (False)} (None is converted to False for XML-RPC compatibility)
  - Actions: crm.lead.create(type=opportunity), posts the transcript rendered as HTML, applies tags and source.
- attach_recording_to_lead(payload): attaches audio to an existing lead.
  - payload: lead_id, recording_b64, recording_filename
  - MIME detection: .ogg → audio/ogg, .mp4 → audio/mp4
  - ir.attachment.create + mail.message with "Grabación adjuntada" ("Recording attached").
- Two-step pipeline: first create the lead (fast, must succeed), then attach the recording (slow, best-effort).
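Python's own xmlrpc.client has the same limitation on the sending side: it refuses to marshal None unless allow_none is enabled. Below is a minimal client-side sketch. It assumes Odoo's standard external API endpoints (/xmlrpc/2/common for authenticate, /xmlrpc/2/object for execute_kw). It also assumes register_call takes the payload as its single positional argument. xmlrpc_safe and call_register_call are hypothetical helper names.

```python
import xmlrpc.client
from typing import Any


def xmlrpc_safe(value: Any) -> Any:
    """Recursively replace None with False so the standard marshaller accepts it."""
    if value is None:
        return False
    if isinstance(value, dict):
        return {key: xmlrpc_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [xmlrpc_safe(item) for item in value]
    return value


def call_register_call(url: str, db: str, user: str, api_key: str, payload: dict) -> dict:
    """Authenticate, then invoke voip.call.register_call through execute_kw."""
    common = xmlrpc.client.ServerProxy(f"{url}/xmlrpc/2/common")
    uid = common.authenticate(db, user, api_key, {})
    models = xmlrpc.client.ServerProxy(f"{url}/xmlrpc/2/object")
    return models.execute_kw(db, uid, api_key, "voip.call", "register_call",
                             [xmlrpc_safe(payload)])
```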
Inbound email flow (new, 2026-04-22)
fetchmail.server id=3 pulls mail from fernando@lemontreecloud.com every 5 min.
mail.alias id=49 with these defaults:

```python
{
    'type': 'opportunity',
    'user_id': 2,                  # Fernando
    'team_id': 1,                  # Sales
    'source_id': 12,               # Email Inbound
    'tag_ids': [(6, 0, [19])],     # email-inbound
}
```

- Every email to fernando@lemontreecloud.com creates a tagged opportunity.
8. Project phases (chronological)

| Phase | Date | Description | Result |
|---|---|---|---|
| 1: Hello World | 2026-04-20 | The call rings + TTS "hola" | Zadarma + livekit-sip + NAT 1:1 + first Python agent |
| 2: Production dialogue | 2026-04-20 | María holds a bilingual conversation with Haiku + Deepgram + ElevenLabs, GDPR notice, end_call tool | First calls with Raúl |
| 3: CRM integration | 2026-04-20 | on_shutdown hook creates an opportunity in elPanocho with the transcript + a Sonnet summary | custom_fer.voip.call.register_call + type=opportunity fix for visibility |
| 4: Passive recording (deprecated) | 2026-04-20 | VoIPmonitor sidecar with MariaDB + GUI | Replaced in phase 6 |
| 5: Operational tools | 2026-04-21 | 6 debug-only function_tools in the agent + OCA primer in the system prompt | Gateway for ask_oca_expert, ask_odoo_pilot, ask_system_admin, check_backups, query_odoo, send_telegram |
| 5.1: Telegram bot | 2026-04-21 | @maria_ltc_bot with 10 commands, long polling, Fernando whitelist | Independent maria-telegram.service |
| 5.2: Claude CLI daemon | 2026-04-21 | Unix socket daemon running as the non-root claude-runner, sandbox bypass | ask_* tools working; Claude sandbox disabled in settings.json |
| 6: LiveKit Egress | 2026-04-21 | Replaces VoIPmonitor with active, self-contained recording | OGG file attached to the lead's chatter |
| 6.1: Both voices | 2026-04-22 | Room composite audio_only with Chrome (failed: start signal not received) → migrated to dual track_egress + ffmpeg amix | A single OGG/Opus with caller + agent mixed |
| 6.2: Pipeline resilience | 2026-04-22 | Reordering (lead first, recording after) + defensive parking + async attach_recording_to_lead | The lead is created even if egress fails |
| 6.3: Permissions fix + egress ending | 2026-04-22 | chmod 777 /out, EGRESS_ENDING/COMPLETE distinguished from ABORTED/FAILED | Recording attaches reliably |
| 6.3.1: Keyterms + prompt | 2026-04-22 | 100 keyterms (acronyms + top INE surnames) + prompt line "it is created automatically when you hang up" + Odú vs o-do-o fix | Conversational UX |
| 6.3.3: Silence watchdog | 2026-04-22 | Auto-hangup after 120 s of silence | Resilience against 429 rate limits |
| 6.3.4: Shutdown timeout | 2026-04-22 | shutdown_process_timeout=60s (the 10 s default was killing the pipeline) | Attach completes before SIGKILL |
| 7: Inbound email to CRM | 2026-04-22 | The fernando@ alias adds source + tag to the opportunity | Emails also enter the CRM, tagged |
9. Capacity and limits

| Component | Effective limit | Notes |
|---|---|---|
| Zadarma trunk | 10 simultaneous channels | External layer |
| livekit-sip RTP ports | 10 ports (10050-10059) → ~5 parallel calls | Can be widened; the router already allows 10000-10999 |
| livekit-server | hundreds of rooms | Not a bottleneck |
| maria-agent workers | num_idle_processes=4 + auto-spawn | Actual RAM/CPU set the ceiling |
| CT-140 resources | 4 cores / 4 GB RAM | Each active call uses ~300 MB RAM |
| Anthropic concurrency | current tier → 429 with 1 call + bot active | Main bottleneck; upgrade the tier or add retry + backoff |
| Deepgram / ElevenLabs | current tier usually handles dozens | No bottleneck observed |

Verdict: the setup reasonably supports 2-3 concurrent calls. To absorb Zadarma's 10 simultaneous channels we need to (a) widen the RTP range, (b) upgrade the Anthropic tier, and (c) probably raise RAM to 8 GB.
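The verdict follows from taking the minimum across the per-component limits. Here is a back-of-the-envelope sketch using the table's figures. Two RTP ports per call is an assumption that matches "10 ports → ~5 calls". The RAM figure is an upper bound that ignores livekit-server, Egress, Redis and the OS.

```python
RTP_RANGE = (10050, 10059)        # inclusive port range currently open in livekit-sip
PORTS_PER_CALL = 2                # assumption consistent with "10 ports -> ~5 calls"
RAM_MB, RAM_PER_CALL_MB = 4096, 300
ANTHROPIC_CAP = 2                 # the load_fnc cap described in the post
ZADARMA_CHANNELS = 10


def rtp_calls(first: int, last: int, per_call: int = PORTS_PER_CALL) -> int:
    """Whole calls that fit in an inclusive RTP port range."""
    return (last - first + 1) // per_call


limits = {
    "rtp_ports": rtp_calls(*RTP_RANGE),
    "ram_upper_bound": RAM_MB // RAM_PER_CALL_MB,
    "anthropic": ANTHROPIC_CAP,
    "zadarma": ZADARMA_CHANNELS,
}
bottleneck = min(limits, key=limits.get)
print(limits, "->", bottleneck, limits[bottleneck])
```

With the widened range from the roadmap (10050-10149), rtp_calls gives 50. That leaves Zadarma's 10 channels, the Anthropic tier and real RAM use as the binding constraints.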
10. Git repositories

| Repo | URL | Contents |
|---|---|---|
| voip-ai-agent | https://gitlab.com/fernandohc/voip-ai-agent | Python agent, Telegram bot, claude-daemon, scripts, deploy/ |
| apps (branch antigravity) | https://gitlab.com/fernandohc/apps | custom_fer module with voip.call, other Odoo modules |
| claude-memory | https://gitlab.com/fernandohc/claude-memory | Fernando's persistent Claude Code memory (170+ files) |
| ai-tools/voip-expert | local at /opt/odoo/custom/ai-tools/ | Claude Code skill with SKILL.md + references + templates |

All private. claude-memory auto-syncs via the SessionEnd and PreCompact hooks.
11. Key memories / references
feedback_livekit_sip_zadarma.md: 3 NAT gotchas (flood/Numbers/nat_1_to_1)
feedback_voip_debug_tools.md: debug-only tools pattern
feedback_livekit_egress_record.md: track_egress + ffmpeg mix
feedback_crm_lead_vs_opportunity.md: crm.lead with type=lead is invisible in the UI
reference_ct140_voip.md: container specs
reference_home_router.md: Huawei router administration
reference_elevenlabs_account.md: Google SSO, narrowly scoped API key
reference_voipmonitor_ct140.md: deprecated (replaced in phase 6)
feedback_claude_daemon.md: Unix socket pattern to work around the sandbox/root restriction
user_fernando_contact.md: debug whitelist +34XXXXXXXXX
12. Open items and roadmap
Operational to-dos
- Rotate 3 credentials in oci_test_deployment.md (a test environment from 37 days ago; the repo is private, but the credentials are written down there)
- Delete /opt/odoo/obsidian-vault/ (the useful content has already been migrated to claude-memory)
- Enable the voip-expert skill with /opt/odoo/custom/ai-tools/sync-skills.sh
Technical roadmap
- Upgrade the Anthropic tier or implement retry + backoff on 429 inside the livekit-agents anthropic plugin
- Widen the RTP range (10050-10149) if more than 5 parallel calls are needed
- Dynamic WAN IP: DDNS or a reload script if it changes (static for now)
- Sync the CT-116 environment: clone repos and skills onto CT-116 (ipve1) so the CRM also has local Claude tools
- Blog post about the experience: already live at https://lemontreecloud.com/blog/ltc-labs-3/como-construimos-maria-nuestra-agente-voip-con-ia-en-un-dia-12
Ideas to evaluate
- PersonaPlex 7B v1 (NVIDIA), once a Spanish version ships: full-duplex speech-to-speech with <300 ms latency, but it needs a 16 GB GPU and v1 is English-only
- Voxtral TTS (Mistral): an open-source alternative to ElevenLabs if we want to lower variable cost
- Recording storage: currently OGG in ir.attachment. If volume grows, consider external S3/MinIO
- Multi-language: María replies in the caller's language, but the OCA primer is in Spanish. Extend it to EN/PT if international leads appear
- Voice biometrics: recognise returning callers by voiceprint to personalise the greeting
13. Sensitive data (references)
Operational credentials live in /opt/voip-agent/.env on CT-140 (mode 640, owner root:claude-runner, not versioned). It contains:
ANTHROPIC_API_KEY
DEEPGRAM_API_KEY
ELEVENLABS_API_KEY (+ alias ELEVEN_API_KEY)
LIVEKIT_API_KEY + LIVEKIT_API_SECRET + LIVEKIT_URL
ZADARMA_* (if used)
TELEGRAM_BOT_TOKEN + TELEGRAM_CHAT_ID
ODOO_URL + ODOO_DB + ODOO_USER + ODOO_API_KEY
SILENCE_TIMEOUT_S=120
PVE1_HOST=root@192.168.1.2
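The stack already ships python-dotenv. As a stdlib-only alternative, the sketch below fails fast at startup when a required key is missing, and it flags an .env that other users can access (the report specifies mode 640). The parser is deliberately minimal: no export prefix and no multi-line values. Which keys count as required is our assumption, based on the list above.

```python
from pathlib import Path

# Assumed required set, taken from the key list above (ZADARMA_* is optional there).
REQUIRED = frozenset({
    "ANTHROPIC_API_KEY", "DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY",
    "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL",
    "TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID",
    "ODOO_URL", "ODOO_DB", "ODOO_USER", "ODOO_API_KEY",
})


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and comments, strips matching quotes."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def missing_keys(env: dict[str, str], required: frozenset[str] = REQUIRED) -> list[str]:
    return sorted(key for key in required if not env.get(key))


def check_env_file(path: Path) -> list[str]:
    """Return human-readable problems; an empty list means the file looks sane."""
    problems = [f"missing {key}" for key in missing_keys(parse_env(path.read_text()))]
    if path.stat().st_mode & 0o007:  # any permission bit set for "other" users
        problems.append(f"{path} is accessible to other users (expected mode 640)")
    return problems
```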
GitLab tokens are short-lived (Fernando's rotating PAT, api scope, 90-day lifetime). They are never committed; they are used through a transient oauth2:<token>@ URL, and the remote is cleaned up afterwards.
14. Cost per call (estimate)
Typical 3-minute call:

| Item | Approx. cost |
|---|---|
| Zadarma DID + minutes | ~€0.03 |
| Deepgram Nova-3 | ~€0.12 |
| ElevenLabs Flash v2.5 (~600-800 agent characters) | ~€0.05 |
| Claude Haiku 4.5 streaming + Sonnet 4.6 summary | ~€0.01 |
| Total per 3-minute call | < €0.25 |

Self-hosted infrastructure (CT-140 + livekit-*): the marginal cost is already absorbed by the home Proxmox.
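Summing the table's own line items as a quick sanity check (no new data):

```latex
\begin{aligned}
C_{3\,\text{min}} &\approx 0.03 + 0.12 + 0.05 + 0.01 = 0.21\ \text{EUR} < 0.25\ \text{EUR} \\
C_{1\,\text{min}} &\approx 0.21 / 3 = 0.07\ \text{EUR/min}
\end{aligned}
```

Deepgram is the largest line item, at roughly 57% of the estimated total (0.12 / 0.21).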
Generated by Claude Code in collaboration with Fernando Hernández, sessions 2026-04-20 → 2026-04-22.