Your client rings in. An AI agent picks up, books the appointment, confirms details, ends the call. What happens next determines whether that client tells five mates or ghosts you forever. Yet most agencies running AI voice for clients have no systematic way to measure whether people actually liked talking to the bot.
We sent 1,000 post-call NPS surveys across VoxReach deployments between October and January. The data surprised us. Not in the "AI is perfect" direction, but in what separates a promoter from a detractor when a human never touched the phone.
How we scored AI calls without annoying people
Standard NPS asks one question: "How likely are you to recommend us to a friend, 0-10?" Promoters score 9-10, passives 7-8, detractors 0-6. Your NPS is the percentage of promoters minus the percentage of detractors, landing somewhere between -100 and +100.
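In code, the whole metric is a few lines. A minimal sketch, assuming you can export the raw 0-10 scores:

```python
# Net Promoter Score: % promoters (9-10) minus % detractors (0-6).
def nps(scores: list[int]) -> float:
    if not scores:
        raise ValueError("need at least one score")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores), 1)

print(nps([10, 9, 9, 8, 7, 6, 3, 10]))  # 4 promoters, 2 detractors of 8 -> 25.0
```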
We triggered the survey 90 seconds after the AI agent hung up, via SMS to the number that just called. The message came from the same business name that appeared on caller ID. One question, one link, thirty seconds to complete. No follow-up nag if they ignored it.
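VoxReach fires this automatically, but if you wanted to wire up the same trigger yourself, a rough sketch looks like this. It assumes a call-ended webhook and uses Twilio for the SMS; the payload fields and survey URL are illustrative, not a real VoxReach API:

```python
import threading
from twilio.rest import Client

twilio = Client("ACCOUNT_SID", "AUTH_TOKEN")  # your Twilio credentials

def on_call_ended(payload: dict) -> None:
    # Hypothetical webhook handler: schedule the survey 90 seconds out.
    threading.Timer(90, send_survey, args=[payload]).start()

def send_survey(payload: dict) -> None:
    # One question, one link, no follow-up nag.
    twilio.messages.create(
        to=payload["caller_number"],       # the number that just called
        from_=payload["business_number"],  # matches the name on caller ID
        body=f"Thanks for calling {payload['business_name']}! "
             f"How likely are you to recommend us, 0-10? {payload['survey_url']}",
    )
```

In production you'd hand the 90-second delay to a job queue rather than an in-process timer, but the shape is the same.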
Response rate averaged 18%. That sounds low until you remember most post-purchase email NPS surveys pull 5-12%. People answer texts. They especially answer when the interaction just happened and took under two minutes.
Industries split predictably. Medical clinics and vet practices hit 22-24% response. Trade contractors sat around 14%. Mortgage brokers and conveyancers came in at 16%. One call we reviewed last Tuesday, from a plumber in Geelong: booked, confirmed, hung up, survey answered six minutes later with a 9 and the comment "easiest booking ever".
What scores actually look like by vertical
Aggregate NPS across all 1,000 surveys: +41. For context, Aussie telcos average -15 to +5, banks sit around +20 to +30, and top SaaS companies hit +50 to +70.
Breakdown by industry:
- Medical and allied health: +52
- Veterinary: +48
- Trade services (plumbing, electrical, HVAC): +38
- Professional services (legal, accounting, mortgage): +35
- Beauty and wellness: +44
- Fitness and personal training: +29
The pattern: industries where the caller has a clear, simple need and the AI can resolve it in one step scored higher. "Book my dog's vaccination" or "I need a sparkie Thursday arvo" both have obvious happy paths. "I want to refinance but I'm not sure if I should fix the rate" introduces ambiguity the AI has to navigate or escalate, and escalation always dents the score.
Detractors clustered around three failure modes. One: the AI misheard a critical detail (street name, date, surname). Two: the caller wanted to discuss something nuanced and the bot kept trying to book instead of transferring. Three: accent or background noise created a loop where the AI asked the same question twice. The last one accounted for 62% of detractor comments.
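You can pull a rough version of this split from your own detractor comments with a crude keyword tagger. The keyword lists below are guesses for illustration, not the taxonomy we used:

```python
# First-pass tagging of detractor comments into the three failure modes.
FAILURE_MODES = {
    "misheard_detail": ["wrong name", "wrong street", "wrong date", "misheard", "spelt"],
    "wouldnt_transfer": ["real person", "kept trying to book", "wanted to speak", "transfer"],
    "repeat_loop": ["same question", "asked again", "repeated", "couldn't hear me"],
}

def tag_comment(comment: str) -> list[str]:
    text = comment.lower()
    return [mode for mode, keys in FAILURE_MODES.items()
            if any(k in text for k in keys)] or ["other"]

print(tag_comment("It asked me the same question twice"))  # ['repeat_loop']
```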
Response rates predict retention better than scores
Here's what we didn't expect. Clients with NPS above +40 but survey response below 12% churned at a higher rate than clients with NPS around +25 and response above 20%. The better score didn't protect them.
Low response means low engagement. If your callers can't be bothered to tap a number thirty seconds after a successful booking, they're transactional. They'll switch to whoever answers faster next time. High response, even with middling scores, signals people care enough to give feedback. Those clients stick.
We tracked this across 40 VoxReach deployments with at least three months of history. The cohort with response rates above 18% renewed at 89%. The cohort below 12% renewed at 71%, regardless of NPS. Retention correlates more with engagement than satisfaction when the service is a commodity.
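The analysis itself is trivial once you have a per-deployment export with response rate and renewal status. A sketch with made-up field names:

```python
# Split deployments into engagement cohorts and compare renewal rates.
deployments = [
    {"name": "clinic_a", "response_rate": 0.22, "renewed": True},
    {"name": "plumber_b", "response_rate": 0.09, "renewed": False},
    # ... one record per deployment with 3+ months of history
]

def renewal_rate(cohort: list[dict]) -> float:
    return 100 * sum(d["renewed"] for d in cohort) / len(cohort) if cohort else 0.0

high = [d for d in deployments if d["response_rate"] > 0.18]
low = [d for d in deployments if d["response_rate"] < 0.12]
print(f"high engagement: {renewal_rate(high):.0f}% renewed")
print(f"low engagement:  {renewal_rate(low):.0f}% renewed")
```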
What to do with the number once you have it
First, set the baseline. Run NPS for the first 100 calls after go-live. That's your starting point. Anything above +30 means the AI isn't actively harming the brand. Above +45 and you're in "this is better than our old receptionist" territory for most SMBs.
Second, read every detractor comment. Most AI platforms log transcripts. Match the low score to the call, listen to the recording, fix the prompt or escalation rule that caused the failure. We've seen agents lift from +32 to +51 in four weeks just by tuning three misfire patterns.
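Mechanically that's a join between the survey export and the call log. A sketch assuming both sit in CSVs that share a call_id column (all column names here are hypothetical):

```python
import csv

# Keep only detractor surveys (score 0-6).
with open("surveys.csv") as f:
    detractors = [row for row in csv.DictReader(f) if int(row["score"]) <= 6]

# Index the call log so each low score maps back to its recording.
with open("calls.csv") as f:
    calls = {row["call_id"]: row for row in csv.DictReader(f)}

for survey in detractors:
    call = calls.get(survey["call_id"])
    if call:
        print(survey["score"], call["recording_url"], survey.get("comment", ""))
```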
Third, use the score in client reporting. Agencies running AI voice as a service need a retention metric clients understand. NPS is that metric. Stick it in the monthly dashboard next to call volume and conversion rate. When the score climbs, you have proof the AI is learning. When it drops, you catch problems before the client does.
Fourth, segment by call type. Inbound appointment booking will always outscore outbound follow-up or payment reminders. Don't average them. Track separately, optimise separately, report separately.
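With the nps() helper from earlier, segmentation is a short groupby, assuming your export carries a call_type column:

```python
from collections import defaultdict

def nps(scores: list[int]) -> float:  # same helper as above
    return round(100 * (sum(s >= 9 for s in scores) - sum(s <= 6 for s in scores)) / len(scores), 1)

def nps_by_call_type(surveys: list[dict]) -> dict[str, float]:
    buckets: dict[str, list[int]] = defaultdict(list)
    for s in surveys:
        buckets[s["call_type"]].append(int(s["score"]))
    return {ctype: nps(scores) for ctype, scores in buckets.items()}

# e.g. {'inbound_booking': 49.0, 'outbound_followup': 18.0}
```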
The real benchmark isn't other bots
Compare AI NPS to what the client's human team was pulling before the switch. Most SMBs never measured it, so you're starting from zero data. But anecdotally, a frazzled receptionist juggling three lines and a walk-in rarely delivers consistent delight. The AI won't either. It will deliver consistent adequacy, and for 80% of callers, that's enough to score a 9.
The other 20% will always prefer a human. Build the transfer path, set the timeout short, and accept that a +40 NPS with an AI agent is a commercial win even if it's not a UX triumph.
Sign up free at app.voxreach.com.au/signup to test post-call NPS on your own deployments. Thirty minutes of calls included, no card required.