Industry: Retail Banking
Disciplines: Voice AI · NLP · Cloud
Duration: 7 months · 4 phases
Region: India · 3 GCC subsidiaries

The problem

The bank handled 4.2 million inbound calls a month across 1,200 seats. 71% were repetitive — balance, statement, card status, fixed-deposit info, branch hours. Average wait time during peak hit 4 minutes 40 seconds, and 38% of customers abandoned the IVR before reaching an agent. The existing IVR was a 14-year-old DTMF tree that everyone hated.

The solution

A streaming voice-AI agent built on a fine-tuned ASR (Whisper-large-v3 with Indic adaptation), an intent + slot LLM router, and a TTS layer with neural voices in 9 languages. The agent runs entirely on the bank’s VPC, talks to core banking via the existing ESB, and knows when to fall back to a human — with the entire context handed over so the customer never repeats themselves.

What we built

Streaming ASR with Indic accents

Whisper-large-v3 fine-tuned on 6,000 hours of Indian-English, Hindi, Tamil, Telugu, Kannada, Marathi, Gujarati, Bengali, and Punjabi call-centre audio. Word error rate 4.7% on noisy mobile calls.
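A streaming ASR front end consumes the call as a sequence of short, overlapping audio frames rather than a finished recording. The framing stage can be sketched as below; this is a minimal illustration, assuming 16 kHz mono 16-bit PCM, and the frame/overlap sizes are illustrative rather than the production values (the model call itself is out of scope here).

```python
# Sketch of the audio-framing stage that feeds a streaming ASR model.
# Assumes 16 kHz mono 16-bit PCM; frame and overlap durations are
# illustrative, not the tuned production values.

SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2

def frame_audio(pcm: bytes, frame_ms: int = 500, overlap_ms: int = 100):
    """Yield overlapping PCM frames suitable for chunked transcription."""
    frame_bytes = SAMPLE_RATE * frame_ms // 1000 * BYTES_PER_SAMPLE
    step_bytes = SAMPLE_RATE * (frame_ms - overlap_ms) // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), step_bytes):
        chunk = pcm[start:start + frame_bytes]
        if chunk:
            yield chunk

# One second of silence -> 500 ms frames stepped every 400 ms.
frames = list(frame_audio(b"\x00" * SAMPLE_RATE * BYTES_PER_SAMPLE))
```

The overlap lets consecutive frames share context at their boundaries, which reduces word breakage at chunk edges.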

Intent + slot routing in one pass

A single LLM call resolves intent, extracts slots (account number, card last 4, amount), and decides the next action. Average latency 320ms.
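The single-pass design means one structured reply carries intent, slots, and next action together. A minimal sketch of the response handling, assuming the LLM is prompted to return JSON; the schema, field names, and the 0.75 fallback threshold are illustrative, not the production contract.

```python
import json
from dataclasses import dataclass, field

@dataclass
class RouteDecision:
    intent: str
    confidence: float
    slots: dict = field(default_factory=dict)
    next_action: str = "ask_clarification"

# Illustrative threshold below which the call goes to a human.
FALLBACK_THRESHOLD = 0.75

def parse_route(llm_json: str) -> RouteDecision:
    """Parse the router LLM's JSON reply; fall back to a human when confidence is low."""
    data = json.loads(llm_json)
    decision = RouteDecision(
        intent=data["intent"],
        confidence=data["confidence"],
        slots=data.get("slots", {}),
        next_action=data.get("next_action", "ask_clarification"),
    )
    if decision.confidence < FALLBACK_THRESHOLD:
        decision.next_action = "handover_to_agent"
    return decision

reply = '{"intent": "card_status", "confidence": 0.92, "slots": {"card_last4": "4821"}, "next_action": "lookup_card"}'
decision = parse_route(reply)
```

Resolving everything in one call is what keeps the routing step inside the latency budget: there is no second round trip for slot filling.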

Live human handover with full context

When confidence drops or the customer says “agent please”, the call transfers to a human with a full transcript + extracted entities + suggested next action prefilled in the agent screen.
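The handover is effectively a context packet pushed to the agent desktop alongside the voice transfer. A sketch of what that packet might contain; the field names here are assumptions for illustration, as the real payload follows the agent-desktop integration contract.

```python
from dataclasses import dataclass, asdict

# Hypothetical shape of the context packet shown on the agent screen at
# handover. Field names are illustrative, not the integration contract.

@dataclass
class HandoverContext:
    call_id: str
    transcript: list       # list of (speaker, utterance) turns so far
    entities: dict         # slots already extracted by the router
    suggested_action: str  # prefilled next step for the agent
    handover_reason: str   # low confidence vs. explicit request

def build_handover(call_id, turns, entities, suggestion, reason) -> dict:
    """Bundle everything the agent needs so the customer never repeats themselves."""
    return asdict(HandoverContext(call_id, turns, entities, suggestion, reason))

packet = build_handover(
    "c-1042",
    [("customer", "agent please"), ("bot", "Connecting you now.")],
    {"card_last4": "4821"},
    "check_card_dispatch_status",
    "customer_requested_agent",
)
```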

PCI-DSS compliant logging

Card numbers, CVVs, OTPs, and PINs are masked at the audio layer before storage. Full conversation transcripts retained for 90 days, audio for 7. SOC 2 Type II audited.
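In production the masking happens at the audio layer before anything is stored, but the redaction idea is easiest to see on text. A text-layer analogue, with the caveat that these regex patterns are illustrative and deliberately simplistic, not exhaustive for real PCI-DSS scope.

```python
import re

# Text-layer analogue of the redaction step (production masks at the
# audio layer before storage). Patterns are illustrative only and would
# over- and under-match in real PCI-DSS scope.

PATTERNS = [
    (re.compile(r"\b\d{13,19}\b"), "[CARD]"),  # full card numbers (PAN)
    (re.compile(r"\b\d{6}\b"), "[OTP]"),       # 6-digit one-time passwords
    (re.compile(r"\b\d{4}\b"), "[PIN]"),       # 4-digit PINs / CVVs
]

def mask_transcript(text: str) -> str:
    """Redact sensitive digit runs before a transcript is persisted."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

masked = mask_transcript("My card is 4111111111111111 and the OTP is 482913")
```

Pattern order matters: the longest digit runs (full PANs) are masked first so shorter patterns never match fragments of an already-redacted number.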

9 languages with seamless code-mix

Customers routinely switch between English and a regional language mid-sentence. The model handles Hinglish, Tanglish, and Manglish without breaking flow.

Real-time supervisor analytics

Live dashboard of intent distribution, deflection rate, sentiment, and abandonment by language and queue. Drives weekly tuning of fallback thresholds.
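One of the dashboard's core metrics, deflection rate by language, is a simple aggregation over call records. A minimal sketch, assuming a hypothetical record shape with `lang` and `resolved_by_bot` fields:

```python
from collections import defaultdict

# Sketch of the per-language deflection-rate aggregation behind the
# supervisor dashboard. The record shape is an assumption for illustration.

def deflection_by_language(calls):
    """Fraction of calls fully resolved by the bot, grouped by language."""
    totals = defaultdict(int)
    deflected = defaultdict(int)
    for call in calls:
        totals[call["lang"]] += 1
        if call["resolved_by_bot"]:
            deflected[call["lang"]] += 1
    return {lang: deflected[lang] / totals[lang] for lang in totals}

calls = [
    {"lang": "hi", "resolved_by_bot": True},
    {"lang": "hi", "resolved_by_bot": False},
    {"lang": "ta", "resolved_by_bot": True},
]
rates = deflection_by_language(calls)
```

The same grouped-counter pattern extends to abandonment and sentiment by queue, which is what feeds the weekly tuning of fallback thresholds.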

How it’s built

Voice ASR: Whisper-large-v3 (fine-tuned) · Deepgram Nova-2 fallback
LLM Router: Llama-3.1-70B fine-tuned · vLLM serving
TTS: ElevenLabs Multilingual v2 · Coqui XTTS for self-hosted deployments
Telephony: Twilio · Asterisk · Avaya integration via SIP
Backend: Python (FastAPI) · gRPC streaming · Redis for session state
Hosting: Bank-VPC AWS · GPU pool on g5.2xlarge · multi-AZ failover
Compliance: PCI-DSS · RBI guidelines · SOC 2 Type II · DPDP Act
Observability: OpenTelemetry · Grafana · per-call audit trail

The numbers

62% · Routine call deflection in 6 months
-38% · Average handle time across all calls
9 · Languages live, including Hinglish code-mix
4.6/5 · Customer satisfaction (post-call IVR)
-44% · Reduction in IVR abandonment
320ms · P95 ASR + LLM round-trip latency

“We expected to deflect maybe 30% in year one. Hitting 62% by month six let us redeploy 380 seats from routine queues to outbound retention — that’s the real ROI.”

— Head of Customer Operations, Top-5 Indian Bank

Have a project that looks like this?

If your engagement combines 3 or more disciplines, we’d like to hear about it. Tell us the constraint, the deadline, and the outcome that matters — we’ll come back with a scoped proposal.