Diego Sanz · Multi-Channel AI Customer Service Platform

Portfolio · Case Study

Diego S.

AI Engineering · Agentic Systems · Voice AI · Multi-Tenant SaaS

Education

Florida International University

BS · Computer Science

Florida International University

MBA · Marketing & E-Commerce

Pennsylvania State University

Master of Applied Statistics

Available now

Live · app.contactos.ai

Expert-Vetted

Top Rated

100% Job Success

Case Study

Multi-Channel AI Customer Service Platform

SaaS Product

AI Customer Service

4

Channels in production

< 1s

End-to-end voice response

2

AI processing runtimes


Project overview

Key project facts

Platform

Multi-Tenant AI Platform (SaaS)

Industry

Customer Service · Operations

Engagement type

Product Build

Status

In production · app.contactos.ai

Scope

Voice · WhatsApp · Web Chat · Campaigns

Technical stack

Python · FastAPI · LangGraph · Pipecat · Claude Haiku 4.5 · Anthropic · Twilio · Deepgram · ElevenLabs · MongoDB · PostgreSQL · Redis · n8n · Stripe · React · TypeScript · Docker

Engagement summary

A production multi-tenant AI customer service platform supporting four inbound and outbound channels: voice (Twilio + Deepgram + ElevenLabs), WhatsApp, web chat, and outbound campaigns. Text conversations are handled by a LangGraph ReAct agent with real-time function calling against live client APIs. Voice calls run through a dedicated Pipecat streaming pipeline (Silero VAD, Deepgram Nova-2 STT, Claude Haiku 4.5, and ElevenLabs TTS) that typically responds in about 800 ms end to end.

Multi-tenancy is achieved through a TenantRuntime Registry: each tenant is an isolated directory with its own config, tools, and formatters, so onboarding a new client is a configuration operation with no changes to shared application code. An embedded n8n control plane lets each tenant provision and trigger automation workflows from the platform: templates are instantiated programmatically through the n8n REST API and fired by app and call events.

< 1s

Voice Response Latency

End-to-end streaming: VAD + STT + LLM + TTS

4

Channels in Production

Voice · WhatsApp · Web Chat · Outbound Campaigns

2

AI Processing Runtimes

LangGraph ReAct (text channels) · Pipecat (real-time voice)


Technical Design · AI Customer Service Platform

System Architecture

Multi-Tenant AI Platform

Channels

WhatsApp

Meta Cloud API · Webhook

Web Chat

JS Embed · WebSocket

Twilio Voice

WS · 8kHz mu-law

Outbound Campaigns

Celery Dispatcher · n8n

Gateway

FastAPI · uvicorn

Production · HTTPS

Traefik · TLS

Let's Encrypt

JWT Auth

passlib · python-jose

Channel Router

core/channels/router.py

Webhook Handlers

Meta · Twilio · Internal

LangGraph ReAct Agent

chatbot_node

LLM + tool binding

→

tool_node

10 function tools

→

handoff_node

human escalation

MongoDB Checkpointer · TenantRuntime Registry · Session Manager · Anthropic Claude · LangSmith tracing

Pipecat Voice Pipeline

Silero VAD

voice activity detection

↓

Deepgram STT

Nova-2 · Spanish (es)

↓

Claude Haiku 4.5

function calling

↓

ElevenLabs TTS

flash_v2_5 · streaming

External APIs

Tenant API

per-tenant endpoints

n8n Control Plane

workflow provisioning · REST

Google Maps

travel time

Pinecone

address normalization

LangSmith

trace + eval

Sentry

error tracking

Data

MongoDB

conversations · messages · sessions

PostgreSQL

users · tenants

Redis

cache · Celery broker

React Dashboard

WebSocket · Analytics · KPIs · Broadcasts

Technical Approach

Inbound messages from the text channels are normalized through a single Channel Router, then dispatched to a shared LangGraph ReAct agent. Voice calls bypass text routing entirely and enter a dedicated real-time Pipecat pipeline for sub-second STT → LLM → TTS latency.
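The normalization step can be sketched as below. This is a minimal illustration, not the contents of core/channels/router.py: the InboundMessage envelope, the function names, and the web chat frame shape are all assumptions. Only the WhatsApp branch follows a known layout, the nested entry → changes → value → messages structure of a Meta Cloud API webhook body.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundMessage:
    """Hypothetical channel-agnostic envelope handed to the agent."""
    tenant_id: str
    channel: str
    sender: str
    text: str


def from_whatsapp(tenant_id: str, body: dict) -> list[InboundMessage]:
    """Flatten a Meta Cloud API style webhook body into envelopes."""
    out = []
    for entry in body.get("entry", []):
        for change in entry.get("changes", []):
            for msg in change.get("value", {}).get("messages", []):
                if msg.get("type") != "text":
                    continue  # media and other message types are out of scope here
                out.append(
                    InboundMessage(tenant_id, "whatsapp", msg["from"], msg["text"]["body"])
                )
    return out


def from_web_chat(tenant_id: str, frame: dict) -> list[InboundMessage]:
    """Assumed WebSocket frame shape: {"session_id": ..., "text": ...}."""
    return [InboundMessage(tenant_id, "web_chat", frame["session_id"], frame["text"])]


NORMALIZERS = {"whatsapp": from_whatsapp, "web_chat": from_web_chat}


def route(tenant_id: str, channel: str, payload: dict) -> list[InboundMessage]:
    """Pick the normalizer for a channel; unknown channels fail loudly."""
    try:
        normalize = NORMALIZERS[channel]
    except KeyError:
        raise ValueError(f"unsupported channel: {channel}") from None
    return normalize(tenant_id, payload)
```

Because every normalizer returns the same envelope type, the agent downstream never needs to know which channel a message arrived on.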

Key Architectural Decisions

LangGraph ReAct — agent loops until tool results satisfy the request or escalation is triggered; MongoDB checkpointer persists state across turns.

TenantRuntime Registry — each tenant is a directory with its own config, tools, and formatters. Adding a client is a new directory, not a code change.

Pipecat for voice — streaming pipeline keeps audio processing out of the main FastAPI event loop, enabling concurrent voice + messaging without blocking.

n8n control plane — per-tenant workflow templates are provisioned via the n8n REST API and dispatched through a universal webhook; call events can trigger workflows.
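The directory-per-tenant idea behind the TenantRuntime Registry can be sketched as follows. The config.json file name, the fields on TenantRuntime, and the demo tenants are assumptions made for illustration; the real registry also loads per-tenant tools and formatters, which this sketch leaves out.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TenantRuntime:
    """Hypothetical per-tenant runtime; only the config is loaded in this sketch."""
    tenant_id: str
    config: dict


class TenantRuntimeRegistry:
    """Discovers tenants by scanning a root directory, one subdirectory each."""

    def __init__(self, root: Path) -> None:
        self._runtimes: dict[str, TenantRuntime] = {}
        for tenant_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            config_path = tenant_dir / "config.json"  # assumed file name
            if not config_path.exists():
                continue  # skip directories that are not tenants
            config = json.loads(config_path.read_text(encoding="utf-8"))
            self._runtimes[tenant_dir.name] = TenantRuntime(tenant_dir.name, config)

    def get(self, tenant_id: str) -> TenantRuntime:
        try:
            return self._runtimes[tenant_id]
        except KeyError:
            raise LookupError(f"unknown tenant: {tenant_id}") from None

    def tenant_ids(self) -> list[str]:
        return sorted(self._runtimes)


def demo() -> list[str]:
    """Create two throwaway tenant directories and load them."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("acme", "globex"):
            (root / name).mkdir()
            (root / name / "config.json").write_text(
                json.dumps({"model": "claude-haiku-4-5"}), encoding="utf-8"
            )
        return TenantRuntimeRegistry(root).tenant_ids()
```

Adding a third directory under the root yields a third tenant on the next load, with no change to the registry class itself.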


Voice Pipeline — Real-Time Call Flow

Pipecat · FastAPI WebSocket

1

Inbound Channel

Twilio WebSocket

TwiML webhook → WebSocket upgrade → TwilioFrameSerializer (mu-law ↔ PCM)

8kHz · mu-law

~30ms

2

Voice Activity Detection

Silero VAD

Filters silence frames before STT — prevents spurious transcriptions and reduces API cost

ONNX model

~15ms

3

Speech-to-Text

Deepgram Nova-2

Streaming transcription · language: es (Spanish) · interim results discarded, final passed to aggregator

Nova-2 · streaming

~180ms

4

LLM — Function Calling

Claude Haiku 4.5

Lowest-latency Claude model · tools: validate_document, confirm_services, end_call · streaming

claude-haiku-4-5

~350ms

Tool calls (if triggered)

validate_document → Tenant API · confirm_services → Tenant API · end_call → EndFrame

5

Text-to-Speech

ElevenLabs

Model: eleven_flash_v2_5 · stability: 0.7 · similarity_boost: 0.75 · speed: 0.85 · streaming audio chunks

flash_v2_5

~200ms

6

Outbound Audio

Twilio Transport Output

PCM → mu-law re-encode → WebSocket → Twilio → caller's phone

8kHz · mu-law

~800ms total
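Steps 1 and 6 convert between 8 kHz mu-law telephone audio and linear PCM. In the pipeline that conversion belongs to the frame serializer; the standalone decoder below is only a sketch of the standard G.711 mu-law expansion, with function names chosen for illustration.

```python
import struct

BIAS = 0x84  # 132, the G.711 mu-law bias


def mulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF            # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07  # 3-bit segment number
    mantissa = u & 0x0F         # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def mulaw_to_pcm16(frame: bytes) -> bytes:
    """Decode a mu-law frame to little-endian 16-bit PCM."""
    samples = [mulaw_byte_to_pcm16(b) for b in frame]
    return struct.pack(f"<{len(samples)}h", *samples)


# 160 mu-law bytes are 20 ms of audio at 8 kHz and decode to 320 PCM bytes.
pcm = mulaw_to_pcm16(bytes([0xFF] * 160))
print(len(pcm))  # 320
```

Each mu-law byte expands to two PCM bytes, which is why the telephone leg stays cheap on bandwidth while the STT and TTS stages work on linear samples.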

Why This Stack

Each component was chosen for streaming latency, not batch throughput. Deepgram Nova-2 returns partials in real time. Claude Haiku 4.5 is Anthropic's fastest inference model. ElevenLabs flash_v2_5 streams audio chunks before synthesis is complete — the caller hears the first word before the full response is generated.

Key Decisions

Silero VAD first — eliminates silence and background noise before STT, cutting Deepgram cost and preventing empty LLM calls.

Claude Haiku over GPT-4o-mini — lower TTFT in real-world voice loops; consistent with the Anthropic-native backend for shared observability in LangSmith.

Function calling in voice — validate_document and confirm_services call the tenant's live API mid-conversation without breaking the audio stream.
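Mid-call function calling can be sketched as a small dispatch table. The tool definitions use the name / description / input_schema layout of Anthropic tool definitions, and the dispatcher maps a tool_use block to a tool_result block. The document_id parameter, the handler bodies, and their return values are hypothetical stand-ins; real handlers would call the tenant's live API.

```python
import json

TOOLS = [
    {
        "name": "validate_document",
        "description": "Check a caller's document number against the tenant API.",
        "input_schema": {
            "type": "object",
            "properties": {"document_id": {"type": "string"}},  # assumed parameter
            "required": ["document_id"],
        },
    },
    {
        "name": "end_call",
        "description": "Finish the call once the caller's request is resolved.",
        "input_schema": {"type": "object", "properties": {}},
    },
]


def validate_document(document_id: str) -> dict:
    """Stand-in for a tenant API call; accepts digit-only IDs in this sketch."""
    return {"valid": document_id.isdigit()}


def end_call() -> dict:
    """Stand-in for pushing an end-of-call frame into the pipeline."""
    return {"ended": True}


HANDLERS = {"validate_document": validate_document, "end_call": end_call}


def dispatch(tool_use: dict) -> dict:
    """Turn a tool_use block from the model into a tool_result block."""
    handler = HANDLERS.get(tool_use["name"])
    if handler is None:
        content, is_error = f"unknown tool: {tool_use['name']}", True
    else:
        content, is_error = json.dumps(handler(**tool_use["input"])), False
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": content,
        "is_error": is_error,
    }
```

Returning an error result for an unknown tool, instead of raising, lets the model recover in its next turn without dropping the call.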

End-to-End Latency

VAD + STT + LLM + TTS: ~800ms typical · <1.2s with tool call
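The headline figure can be checked against the per-stage numbers in the call flow. The sketch below sums them and shows how much of the 1.2 s ceiling is left for a tool round trip; treating the tool call as a single added delay is a simplifying assumption, and the dictionary keys are illustrative labels.

```python
# Per-stage latencies in milliseconds, as listed in the call flow.
STAGE_MS = {
    "twilio_inbound": 30,
    "silero_vad": 15,
    "deepgram_stt": 180,
    "claude_haiku_llm": 350,
    "elevenlabs_tts": 200,
}

TOOL_CALL_CEILING_MS = 1200  # the "<1.2s with tool call" bound


def total_ms(stages: dict[str, int]) -> int:
    """Sum of all stage latencies."""
    return sum(stages.values())


def tool_call_headroom_ms(stages: dict[str, int], ceiling_ms: int) -> int:
    """Time left for a tool round trip before the ceiling is exceeded."""
    return ceiling_ms - total_ms(stages)


print(total_ms(STAGE_MS))                                     # 775
print(tool_call_headroom_ms(STAGE_MS, TOOL_CALL_CEILING_MS))  # 425
```

The stages sum to 775 ms, consistent with the ~800 ms typical figure, which leaves roughly 425 ms for a tenant API call inside the 1.2 s bound.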


01 / 09 Multi-Channel AI Customer Service · Deliverable

Agent Workspace & Escalation

The agent's command center for live conversations — every channel in one place, with one-click human take-over from the AI.

What this screen does

Active conversation list

All in-flight conversations filtered by Active / Inactive and by ownership (All, Mine, Unassigned, AI).

Live thread with channel context

Customer messages, AI responses, and tool outputs interleaved; the channel (Web, WhatsApp, voice) shown right in the header.

Take over from AI

One-click "Return to AI," Reassign, or End conversation actions on the contact panel.

Identity + state at a glance

Channel, state (Transferred / Active), assigned agent, and start time pinned to the right.

React · TypeScript · FastAPI · LangGraph · Anthropic Claude · Pipecat

Anonymized representative UI


02 / 09 Multi-Channel AI Customer Service · Deliverable

Outbound Campaign Broadcasts

Mass outbound WhatsApp campaigns via Meta-approved templates, with per-send delivery tracking and audit trail.

What this screen does

Template-driven sends

Every campaign uses a Meta-approved WhatsApp template (reminder, appointment, follow-up, location, etc.).

Per-broadcast delivery stats

Recipients, sent, failed, and mode (Test or Real) tracked per send.

One-click new broadcast

Compose, target a recipient list, preview, and send from a single action.
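A template-driven send and its per-broadcast tally can be sketched as below. The payload follows the general shape of a WhatsApp Cloud API template message; the function names, the example template name, and the boolean result list are assumptions made for illustration.

```python
from collections import Counter


def template_payload(to: str, template: str, lang: str, body_params: list[str]) -> dict:
    """Build the JSON body for one WhatsApp template message."""
    payload = {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "template",
        "template": {"name": template, "language": {"code": lang}},
    }
    if body_params:
        # Positional values for the {{1}}, {{2}}, ... placeholders in the template body.
        payload["template"]["components"] = [
            {"type": "body", "parameters": [{"type": "text", "text": p} for p in body_params]}
        ]
    return payload


def broadcast_stats(results: list[bool], test_mode: bool) -> dict:
    """Summarize one broadcast; each result is True if the send was accepted."""
    counts = Counter(results)
    return {
        "recipients": len(results),
        "sent": counts[True],
        "failed": counts[False],
        "mode": "Test" if test_mode else "Real",
    }


# "appointment_reminder" is a hypothetical approved template name.
print(template_payload("573001112233", "appointment_reminder", "es", ["Ana", "10:00"])["type"])
print(broadcast_stats([True, True, False], test_mode=True))
```

Keeping the tally per send, with the mode recorded alongside it, is what makes a test broadcast distinguishable from a real one in the audit trail.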

React · TypeScript · FastAPI · LangGraph · Anthropic Claude · Pipecat

Anonymized representative UI


03 / 09 Multi-Channel AI Customer Service · Deliverable

Conversation Analytics

Operational health of the AI customer service platform — resolution, intent mix, agent load, and sentiment in one view.

What this screen does

Headline KPIs

AI resolution rate, total conversations, pending/closed counts, active sessions, and CSAT.

Resolution + handoff reasons

Donut of AI resolution, with the ranked reasons a conversation handed off to a human.

Top intents

The five most-asked topics across the period.

Human agent leaderboard

Per-agent close counts and online status alongside an AI-vs-human sentiment split.

React · TypeScript · FastAPI · LangGraph · Anthropic Claude · Pipecat

Anonymized representative UI


04 / 09 Multi-Channel AI Customer Service · Deliverable

AI Agent Configuration

Per-tenant configuration of the AI agent — model, behavior, tools, greeting, and system prompt — without touching code.

What this screen does

Model + sampling config

Pick the model (e.g. claude-haiku-4-5), temperature, and session expiry per tenant.

Tool registry

Bind the function-calling tools the agent can use (validate_document, confirm_operator, confirm_services, cancel_services).

Greeting & system prompt

Editable first message and full system prompt, applied per tenant.

React · TypeScript · FastAPI · LangGraph · Anthropic Claude · Pipecat

Anonymized representative UI


05 / 09 Multi-Channel AI Customer Service · Deliverable

Voice Call Logs & Recordings

Every inbound and outbound voice call in one searchable log — with one-click playback of the recording and the full conversation transcript.

What this screen does

Filterable call log

Inbound & outbound calls by date, agent, campaign, direction, and outcome.

Recording playback

Waveform audio player to replay any call inline.

Full transcript + tool-calls

Turn-by-turn AI/caller transcript with the live-API tool-calls the agent made mid-call.

Voice pipeline at a glance

Per-call view of the Twilio → Deepgram → Claude Haiku → ElevenLabs stack.

Pipecat · Twilio · Deepgram · Claude Haiku · ElevenLabs · React

Anonymized representative UI


06 / 09 Multi-Channel AI Customer Service · Deliverable

Voice Agent Configuration

Configure a voice AI agent end to end — telephony, voice, language, greeting, prompt, and the AI flow it runs — without touching code.

What this screen does

Telephony + voice stack

Twilio telephony, Deepgram Nova-2 STT, ElevenLabs TTS wired per agent.

Voice & model selection

Pick TTS provider/model, Voice ID, and language.

Greeting & prompt

Initial message plus a system prompt or reusable prompt template.

Routing & availability

Assign a phone number, link an AI Flow, set inbound/outbound and availability.

FastAPI · Twilio · Deepgram · ElevenLabs · React

Anonymized representative UI


07 / 09 Multi-Channel AI Customer Service · Deliverable

Phone Numbers · Buy & Manage

Provision telephony numbers in-app and assign them to agents — the layer that connects the voice agents to the phone network.

What this screen does

Search & buy

Find numbers by country and area code and provision instantly.

Purchased numbers

Manage every owned number in one list.

Assign to agents

Route each number to the right inbound/outbound agent.

Twilio-backed

Numbers provisioned and billed through Twilio.

Twilio · FastAPI · React

Anonymized representative UI


08 / 09 Multi-Channel AI Customer Service · Deliverable

Workflows · n8n Control Plane

Automation around the conversation — built on an embedded n8n control plane that connects the agents to hundreds of downstream apps.

What this screen does

n8n templates

Start from prebuilt templates — post-call summary → CRM, lead handoff → Slack, reminders.

500+ integrations

n8n connects to hundreds of apps, including CRMs, calendars, and ticketing systems.

My Workflows

Per-tenant workflows with triggers, execution counts, and on/off control.

Universal webhook dispatch

A single entry point fired by app + call events to run any workflow.
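The universal webhook dispatch can be sketched as one function that fans an event out to the matching workflows. The Workflow record, the event naming, and the host below are assumptions; the /webhook/<path> URL form is the usual shape of an n8n production webhook. The sketch only prepares the requests and does not send them.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    tenant_id: str
    trigger: str        # event name, e.g. "call.ended" (assumed naming)
    webhook_path: str   # path configured on the workflow's Webhook node
    active: bool = True


N8N_BASE_URL = "https://n8n.example.com"  # placeholder host


def build_requests(
    workflows: list[Workflow], tenant_id: str, event: str, payload: dict
) -> list[urllib.request.Request]:
    """Prepare one POST per active workflow of this tenant subscribed to the event."""
    body = json.dumps({"tenant_id": tenant_id, "event": event, "data": payload}).encode()
    return [
        urllib.request.Request(
            f"{N8N_BASE_URL}/webhook/{wf.webhook_path}",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for wf in workflows
        if wf.active and wf.tenant_id == tenant_id and wf.trigger == event
    ]
```

Each prepared request can then be sent with urllib.request.urlopen, or handed to a background worker so a slow workflow never blocks a live call. Filtering on tenant_id before dispatch keeps one tenant's events from ever reaching another tenant's workflows.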

n8n · FastAPI · Webhooks · React

Anonymized representative UI


09 / 09 Multi-Channel AI Customer Service · Deliverable

Billing & Plans

Subscription and usage management — current plan, tiers, and Stripe-backed payment, so each tenant's voice and messaging usage is metered and billed.

What this screen does

Plans & tiers

Current plan and upgrade tiers per tenant.

Usage metering

Voice/messaging usage tracked against the plan.

Call-minute top-ups

Add capacity at $0.10/min via Stripe Checkout.

Stripe-provisioned

Subscriptions provisioned via webhook on checkout.session.completed.
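Handling of a checkout.session.completed event can be sketched for the call-minute top-up case. The event is assumed to have been signature-verified already (for example with the Stripe SDK's webhook helper). Using client_reference_id to carry the tenant ID and keeping balances in a plain dict are assumptions for illustration; the $0.10 per-minute price is the one quoted above.

```python
from decimal import Decimal

PRICE_PER_MINUTE = Decimal("0.10")  # top-up price in dollars per call minute


def minutes_for_amount(amount_cents: int) -> int:
    """Whole call minutes bought by a payment given in cents."""
    return int(Decimal(amount_cents) / 100 / PRICE_PER_MINUTE)


def handle_event(event: dict, balances: dict[str, int]) -> bool:
    """Apply a verified Stripe event to in-memory minute balances.

    Returns True if the event was applied, False if it was ignored.
    """
    if event.get("type") != "checkout.session.completed":
        return False
    session = event["data"]["object"]
    if session.get("payment_status") != "paid":
        return False
    tenant_id = session["client_reference_id"]  # assumed to carry the tenant ID
    minutes = minutes_for_amount(session["amount_total"])
    balances[tenant_id] = balances.get(tenant_id, 0) + minutes
    return True


balances: dict[str, int] = {}
event = {
    "type": "checkout.session.completed",
    "data": {"object": {"payment_status": "paid", "client_reference_id": "acme", "amount_total": 2500}},
}
handle_event(event, balances)
print(balances)  # {'acme': 250}
```

Stripe can deliver the same event more than once, so a production handler would also record processed event IDs and skip repeats before crediting minutes.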

Stripe · FastAPI · React

Anonymized representative UI
