AI voice agents for a 1,200-seat retail bank call-centre
A top-5 private-sector bank in India was drowning in routine inbound calls — balance, mini-statement, card block, EMI status. We deployed a multilingual voice-AI agent that handled the first 90 seconds of every call, deflected 62% of routine queries, and handed off to a human seamlessly when the conversation went off-script.
The problem
The bank handled 4.2 million inbound calls a month across 1,200 seats. 71% were repetitive — balance, statement, card status, fixed-deposit info, branch hours. Average wait time during peak hit 4 minutes 40 seconds, and 38% of customers abandoned the IVR before reaching an agent. The existing IVR was a 14-year-old DTMF tree that everyone hated.
The solution
A streaming voice-AI agent built on a fine-tuned ASR (Whisper-large-v3 with Indic adaptation), an intent + slot LLM router, and a TTS layer with neural voices in 9 languages. The agent runs entirely inside the bank’s VPC, talks to the core banking system via the existing ESB, and knows when to fall back to a human — with the entire context handed over so the customer never repeats themselves.
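In outline, each turn of the call runs ASR, then the router, then either TTS or a human transfer. A minimal sketch of that loop, with stubbed stages — the names, dataclasses, and the 0.75 confidence floor are illustrative, not the production values:

```python
from dataclasses import dataclass, field

CONFIDENCE_FLOOR = 0.75  # illustrative; the real threshold is tuned weekly


@dataclass
class Turn:
    transcript: str   # ASR output for this utterance
    intent: str       # router's resolved intent
    slots: dict       # extracted entities (card last 4, amount, ...)
    confidence: float # router confidence for this turn


@dataclass
class CallState:
    turns: list = field(default_factory=list)

    def needs_human(self, turn: Turn) -> bool:
        # Fall back on low confidence or an explicit request for an agent
        return turn.confidence < CONFIDENCE_FLOOR or turn.intent == "request_agent"


def handle_turn(state: CallState, turn: Turn) -> str:
    state.turns.append(turn)
    if state.needs_human(turn):
        return "handover"   # transfer, carrying the full accumulated context
    return "respond"        # TTS speaks the agent's answer


state = CallState()
print(handle_turn(state, Turn("balance please", "check_balance", {}, 0.93)))  # respond
print(handle_turn(state, Turn("agent please", "request_agent", {}, 0.99)))    # handover
```

The point of keeping all context in `CallState` is that a handover can ship the whole history, which is what makes the transfer seamless.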
What we built
Streaming ASR with Indic accents
Whisper-large-v3 fine-tuned on 6,000 hours of call-centre audio in Indian English, Hindi, Tamil, Telugu, Kannada, Marathi, Gujarati, Bengali, and Punjabi. Word error rate: 4.7% on noisy mobile calls.
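The 4.7% figure is a standard word error rate: word-level edit distance over reference length. For reference, a minimal WER implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)


print(wer("block my credit card", "block my credit card"))  # 0.0
print(wer("block my credit card", "block my card"))         # 0.25
```

In practice a library such as jiwer computes the same quantity; the sketch just makes the metric concrete.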
Intent + slot routing in one pass
A single LLM call resolves intent, extracts slots (account number, card last 4, amount), and decides the next action. Average latency: 320 ms.
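One-pass routing works by asking the model for a single structured JSON object, then validating it strictly — anything malformed defaults to a handover. A hedged sketch of that validation layer (the prompt, intent set, and field names are assumptions, not the production schema):

```python
import json

# Hypothetical one-pass router prompt: the LLM returns one JSON object with
# intent, slots, and next action, so routing costs exactly one call.
ROUTER_PROMPT = """Classify the caller utterance. Return JSON with keys:
intent (one of: check_balance, mini_statement, block_card, emi_status, other),
slots (card_last4, amount if present), next_action (answer | verify | handover)."""

VALID_INTENTS = {"check_balance", "mini_statement", "block_card", "emi_status", "other"}


def parse_router_output(raw: str) -> dict:
    """Validate the LLM's JSON; any failure falls back to a human handover."""
    try:
        out = json.loads(raw)
        if out.get("intent") not in VALID_INTENTS:
            raise ValueError("unknown intent")
        out.setdefault("slots", {})
        out.setdefault("next_action", "handover")
        return out
    except (ValueError, TypeError):
        return {"intent": "other", "slots": {}, "next_action": "handover"}


# Example model response for "block my card ending 4417"
raw = '{"intent": "block_card", "slots": {"card_last4": "4417"}, "next_action": "verify"}'
print(parse_router_output(raw)["intent"])             # block_card
print(parse_router_output("garbage")["next_action"])  # handover
```

Failing closed (to a human) on any parse error is what keeps a single-call router safe at this latency budget.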
Live human handover with full context
When confidence drops or the customer says “agent please”, the call transfers to a human with a full transcript + extracted entities + suggested next action prefilled in the agent screen.
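The handover is effectively a context packet pushed to the human agent's screen at transfer time. A sketch of what that payload might look like — the field names and helper are hypothetical, shown only to make the "never repeats themselves" claim concrete:

```python
from dataclasses import dataclass, asdict


@dataclass
class HandoverPacket:
    """Context delivered to the human agent on transfer (field names assumed)."""
    transcript: list        # full turn-by-turn transcript so far
    entities: dict          # slots already extracted by the router
    suggested_action: str   # prefilled next step for the agent
    language: str           # lets routing pick an agent who speaks it


def build_handover(turns, entities, suggestion, language="hi-en"):
    return asdict(HandoverPacket(
        transcript=list(turns),
        entities=entities,
        suggested_action=suggestion,
        language=language,
    ))


packet = build_handover(
    ["Caller: mera card block karo", "Bot: which card, ending in?", "Caller: 4417"],
    {"card_last4": "4417"},
    "Confirm identity, then block card ****4417",
)
print(packet["suggested_action"])
```

Because the packet carries transcript, entities, and a suggested action together, the human picks up mid-conversation instead of restarting it.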
PCI-DSS compliant logging
Card numbers, CVVs, OTPs, and PINs are masked at the audio layer before storage. Full conversation transcripts are retained for 90 days, audio for 7 days. SOC 2 Type II audited.
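The production system masks at the audio layer; the same idea applied to transcripts can be sketched with pattern rules. These patterns are illustrative only, not the real masking rules:

```python
import re

# Illustrative transcript-side masking. Order matters: long PAN-like digit
# runs are replaced first, then OTPs, then short PIN/CVV-like runs.
PATTERNS = [
    (re.compile(r"\b(?:\d[ -]?){13,19}\b"), "[CARD]"),  # 13-19 digit card runs
    (re.compile(r"\b\d{6}\b"), "[OTP]"),                # 6-digit OTPs
    (re.compile(r"\b\d{3,4}\b"), "[PIN]"),              # PIN / CVV-length runs
]


def mask_pci(text: str) -> str:
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text


print(mask_pci("my card is 4417 1234 5678 9113 and the otp is 482913"))
# my card is [CARD] and the otp is [OTP]
```

Masking before storage (rather than at query time) is what keeps the retained transcripts and audio out of PCI-DSS scope for raw cardholder data.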
9 languages with seamless code-mix
Customers routinely switch between English and a regional language mid-sentence. The model handles Hinglish, Tanglish, and Manglish without breaking flow.
Real-time supervisor analytics
Live dashboard of intent distribution, deflection rate, sentiment, and abandonment by language and queue. Drives weekly tuning of fallback thresholds.
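Under the dashboard, these are simple rollups over per-call records. A sketch of the two headline aggregates — the record schema and field names are assumptions:

```python
from collections import Counter

# Hypothetical per-call records behind the dashboard
calls = [
    {"intent": "check_balance", "language": "hi", "resolved_by": "bot"},
    {"intent": "block_card",    "language": "ta", "resolved_by": "human"},
    {"intent": "check_balance", "language": "hi", "resolved_by": "bot"},
    {"intent": "emi_status",    "language": "en", "resolved_by": "bot"},
]


def deflection_rate(records):
    """Share of calls fully handled by the bot, with no human handover."""
    bot = sum(1 for r in records if r["resolved_by"] == "bot")
    return bot / len(records)


def intent_distribution(records):
    """Counts per intent, for the live intent-mix panel."""
    return Counter(r["intent"] for r in records)


print(deflection_rate(calls))                      # 0.75
print(intent_distribution(calls).most_common(1))   # [('check_balance', 2)]
```

Sliced by `language` and queue, the same rollups are what drive the weekly tuning of fallback thresholds.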
How it’s built
The numbers
“We expected to deflect maybe 30% in year one. Hitting 62% by month six let us redeploy 380 seats from routine queues to outbound retention — that’s the real ROI.”
— Head of Customer Operations, Top-5 Indian Bank
Have a project that looks like this?
If your engagement combines three or more disciplines, we’d like to hear about it. Tell us the constraint, the deadline, and the outcome that matters — we’ll come back with a scoped proposal.