Voice AI · Field Notes

Voice AI Fails at the Handoff, Not the Hello

The hello is the one line you can safely script. Everything after it is a handoff, from your screenplay to an actual human. Most builders never make it, because they're still writing the screenplay.

Every voice agent I've ever demoed nailed the hello. That part is easy. The model picks up on the first ring, sounds calm and human, gets the caller's name right, and the room nods. Somebody says it sounds better than half the people they've hired. The demo earns the standing ovation.

Then it goes live, and the complaints start coming in. And here is the thing I want operators to internalize before they sign anything: the complaints are almost never about the hello.

They're about the moment after — when the call leaves the script. The agent doesn't crash. It does something quieter and worse: it says the same lines to a frightened daughter that it says to a furious regular. It can't meet anyone where they are, because a screenplay has the same lines no matter who walks on stage. Most voice agents aren't deployed to think at all — they're deployed to recite.

The tool did not fail. We caged it.

Every quoted line you write is another bar. The more dialogue you script, the smaller the cage.

The Screenplay Mistake

Here is the pattern I see again and again. A team sets out to build a voice agent, and they treat the prompt like a script for a movie. They write the agent's exact lines. They try to anticipate every customer utterance and pre-write a response to each one. They build a decision tree of quoted dialogue, and they keep adding branches, convinced that if they just cover enough cases, they'll cover them all.

They never will. No one can. A real call is not a screenplay with a known cast; it's an improv scene with a stranger who didn't read the script. Every caller arrives at a different pace, in a different mood, carrying a different need, phrasing it in a way you didn't predict. The more lines you write to contain that, the smaller the cage you build around the agent.

I explain it to clients with a picture. A scripted agent is a train. It runs beautifully — as long as every caller stays on the rails you laid down in advance. The moment someone needs the dirt road, the train can't follow; it can only keep running the track. An agent built the right way is a car. It stays on the road by default, but when a caller needs it to go somewhere the map didn't anticipate, it can — within the guardrails of its role. Same destination. It just isn't bolted to one set of tracks.

That's the whole difference, and it's the one thing that makes this technology worth deploying. A modern model can reason. Prompted the right way, it makes every call original to the person on the line: it matches their tempo, hears what they're actually asking, and answers on-brand in words no one wrote in advance. Scripting it line by line smothers exactly that. You take a system that can think and you forbid it from thinking. I call the alternative Open Prompting.


Demo · The birdcage

Same caller. Two prompts. Only one can hear them.

The caller

Wait — is this a robot? Am I actually talking to a real person right now?

The prompt you wrote (14 quoted lines, four shown)

On greeting → “Thank you for calling Cedar Home Health — how can I help you today?”

On booking → “I'd be happy to book that for you. What day works best?”

On a question → “Let me pull up the account. Can I have the patient's date of birth?”

Fallback → “I'm sorry, I didn't quite catch that. Could you repeat that for me?”

What this caller hears (caged)

“I'm sorry, I didn't quite catch that. Could you repeat that for me?”

Misses: A fair, simple question — dodged. The caller now distrusts everything that follows.

Same line, every caller. The script can't bend to a frightened daughter, a furious patient, or a fair question — so it fires the nearest quote and misses.

The demo pairs a caller who didn't read the script with two builds: a scripted screenplay, which is a fixed tree of quoted lines that stays the same on every call, and an Open Prompting build that directs the agent instead of scripting it. Watch the prompt panel: the cage is built out of quotation marks. The key has none.

What You Just Saw

The screenplay never changed. It can't — it's a fixed set of lines. So when a frightened daughter, a furious patient, and someone simply asking "is this a robot?" all hit the same tree, it fires the nearest quote and misses every time. Not because the model is weak. Because we forbade it from doing anything but reciting.

The Open Prompting panel had no lines in it at all. It told the agent who it is, what it cares about, where its lane ends — and then trusted it to think. The result was a different, fitting, on-brand response for every caller, in words I never wrote. Same prompt, a different call every time.

Look at the two prompts side by side and the whole lesson is visible at a glance: the cage is made of quotation marks. Every quoted line is a bar. The prompt that set the agent free didn't contain a single one.

The art is learning to prompt agents for engagement without using quotation marks.
— The whole craft, in one line

Directed, Not Scripted

So if you're not writing lines, what are you writing? You're writing the agent's character and its judgment: the same things you'd give a sharp new hire on their first day. You'd never hand that person a script and tell them to read it to every caller. You'd tell them what the job is, what good looks like, and where to go when they hit the edge of what they can do.

In my system — Aethergrid — this is the law every prompt answers to. Almost every block is a directed block: it tells the agent what to accomplish and how to carry itself, never what to say. The agent writes its own lines, every call. The rare exception is a scripted block, reserved for the few places exact wording actually matters — spelling a phone number back digit by digit, a legal disclosure. Everything else is directed. When in doubt, it's directed.
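One way to hold a prompt to that rule is to check it mechanically. The sketch below is an illustration only: the `Block` structure, the block names, and the lint function are hypothetical and are not Aethergrid's actual format, which this article does not show. It flags quoted dialogue that has crept into a directed block while leaving scripted blocks alone.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical prompt block; not the real Aethergrid schema."""
    name: str
    kind: str  # "directed" or "scripted"
    text: str


# Matches text wrapped in straight or curly double quotes.
QUOTED = re.compile(r'"[^"]+"|“[^”]+”')


def quoted_lines_in_directed_blocks(blocks):
    """Return (block name, quoted text) pairs found in directed blocks."""
    hits = []
    for block in blocks:
        if block.kind != "directed":
            continue  # scripted blocks are allowed exact wording
        for match in QUOTED.finditer(block.text):
            hits.append((block.name, match.group(0)))
    return hits


# Illustrative prompt: two directed blocks, one scripted exception.
prompt = [
    Block("greeting", "directed", "Open warmly. Find out why they actually called."),
    Block("empathy", "directed", 'If upset, say: "I understand your frustration."'),
    Block("phone_readback", "scripted", 'Confirm the number digit by digit: "Five, five, five."'),
]

for name, quote in quoted_lines_in_directed_blocks(prompt):
    print(f"{name}: quoted dialogue in a directed block -> {quote}")
```

A check like this only catches the quotation marks. It cannot tell whether the remaining direction is any good, so it complements listening to real calls rather than replacing it.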

There's a test I run before any prompt ships. I call it the Fluidity Test: if the agent handles the same kind of call twice, do the two conversations sound different while reaching the same outcome? If they come out identical, the prompt is still a cage — so I go find the scripted blocks and free them.
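A rough version of the Fluidity Test can be automated. The sketch below assumes each call is recorded as an outcome label plus the agent's lines, and it uses plain character-level similarity from Python's standard `difflib`. The 0.8 cutoff and the sample calls are placeholders I made up for illustration, not calibrated values or real transcripts.

```python
from difflib import SequenceMatcher


def fluidity_test(call_a, call_b, max_similarity=0.8):
    """Pass when two calls reach the same outcome in noticeably different words.

    Each call is a dict with an "outcome" label and the agent's "lines".
    max_similarity is an assumed cutoff, not a calibrated one.
    Returns (passed, similarity).
    """
    same_outcome = call_a["outcome"] == call_b["outcome"]
    text_a = " ".join(call_a["lines"]).lower()
    text_b = " ".join(call_b["lines"]).lower()
    similarity = SequenceMatcher(None, text_a, text_b).ratio()
    return same_outcome and similarity < max_similarity, similarity


# Two illustrative bookings: same outcome, different callers.
brisk_caller = {
    "outcome": "visit_booked",
    "lines": ["Thursday at nine works. You're all set."],
}
worried_caller = {
    "outcome": "visit_booked",
    "lines": [
        "That sounds stressful, so let's make this easy.",
        "I can hold Thursday at nine for your mother, and I'll note the gate code for the nurse.",
    ],
}

passed, similarity = fluidity_test(brisk_caller, worried_caller)
print(f"fluid: {passed}, similarity: {similarity:.2f}")
```

Character similarity is a crude proxy for "sounds different", so treat a failing score as a prompt to go listen to the calls, not as a verdict.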

The moment	Scripted — the rails	Open Prompting — the road
Greeting	Say: “Thank you for calling, how may I help?”	Open warmly. Find out why they actually called.
Empathy	If upset, say: “I understand your frustration.”	Register what they're feeling, and respond to it like you mean it.
A question	If asked X, respond with line Y.	Answer the real question, in your own words.
The unexpected	Fallback: “I'm sorry, I didn't catch that.”	Reason from what you know. Ask only when you truly must.
The boundary	Never say anything not written above.	Stay in your lane — but speak freely inside it.

The left column is brittle precisely because it's specific. The right column is robust precisely because it isn't — it gives the model room to do the thing it's good at. Notice which column has the quotation marks, and which one doesn't.

From the field

This isn't theory. Samantha — the host I built for Castell Terrace, a Manhattan rooftop room — takes a lost-and-found call by shifting into grounded, serious clarity, then turns to a party of twelve a minute later with warm, playful charm. Same prompt, different caller, different call.

Sophie, the concierge for the Salt and Sky resort, mirrors a calm guest with calm and a stressed one with reassurance — never from a line I wrote, always from a disposition I gave her. Neither agent has a script for those moments. They have a character, a lane, and the freedom to drive.

A Car Still Needs Guardrails

None of this is turning the model loose and hoping. A car without guardrails isn't freedom — it's off-roading into a ditch. So in Aethergrid the guardrails are explicit: four decisions that draw the lane the agent is free to roam. Get them right and you can stop writing lines entirely, because the agent can improvise safely without ever leaving the property.

Role clarity

The first thing you write is not what the agent says. It's what the agent is for, and where its job ends. Give it a sharp boundary — handle these intents, hand off those, never guess at the line — and you've replaced a thousand scripted branches with one well-drawn lane. Inside that lane, let it think.

An agent that knows the exact edge of its job can be trusted to drive freely right up to that edge — and to stop cold at it.
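As a minimal sketch of what a drawn lane can look like in code, assume the platform classifies each call into an intent label. The intent names and hand-off targets below are hypothetical. The point is the shape: a short list of what the agent handles, named destinations for what it hands off, and a default that hands off anything unrecognized instead of guessing.

```python
# Hypothetical lane for a scheduling agent; all names are illustrative.
IN_LANE = {"book_visit", "reschedule_visit", "office_hours"}

# Intents that always leave the lane, each with a named destination.
NAMED_HANDOFFS = {
    "billing_dispute": "billing desk",
    "clinical_question": "on-call nurse",
}

# Anything the agent does not recognize goes to a person too.
DEFAULT_HANDOFF = "front desk"


def route(intent):
    """Return ("handle", None) inside the lane, else ("hand_off", target)."""
    if intent in IN_LANE:
        return ("handle", None)
    # Named hand-offs and unknown intents both leave the lane:
    # the agent never guesses at the boundary.
    return ("hand_off", NAMED_HANDOFFS.get(intent, DEFAULT_HANDOFF))


for intent in ("book_visit", "clinical_question", "something_new"):
    print(intent, "->", route(intent))
```

The routing table stays tiny on purpose. What the agent says inside the lane is still left to the directed prompt.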

Escalation path

When the agent reaches the edge of its role, the call has to go somewhere — to a named person, with the context already attached, through a transfer that actually connects. A thinking agent is good at recognizing that moment in real time. Don't waste that judgment by leaving the other end of the handoff undefined.

An open-prompted agent's best move is often knowing it's reached its limit. That move only works if there's a real human waiting.

Knowledge ownership

The freedom to reason makes current information non-negotiable. A scripted bot reads yesterday's line; a thinking agent argues yesterday's case persuasively. Someone has to own the facts behind it and keep them true, stored in the knowledge base rather than in the prompt, or the agent's fluency becomes a liability instead of an asset.

An open-prompted agent speaks fluently from whatever you give it — which means stale knowledge gets delivered with total confidence.
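Ownership is easier to enforce when every fact carries an owner and a last-verified date. The sketch below assumes a simple in-memory knowledge base. The field names, the 30-day window, and the sample entries are illustrative assumptions, not a description of any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Fact:
    """Hypothetical knowledge-base entry."""
    key: str
    value: str
    owner: str           # the person accountable for keeping it true
    last_verified: date


def stale_facts(facts, today, max_age_days=30):
    """Return facts whose last verification is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [fact for fact in facts if fact.last_verified < cutoff]


# Illustrative entries only.
knowledge_base = [
    Fact("office_hours", "weekdays, 8 to 5", "ops lead", date(2025, 1, 20)),
    Fact("holiday_schedule", "closed on public holidays", "ops lead", date(2024, 11, 1)),
]

for fact in stale_facts(knowledge_base, today=date(2025, 1, 31)):
    print(f"{fact.key} needs re-verification by {fact.owner}")
```

Running a report like this on a schedule gives the owner a concrete list to work through before the agent repeats something that is no longer true.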

First-workflow fit

Don't aim a newly freed agent at every call that rings. Choose one workflow that's high-volume, forgiving, and well within its competence, with a clean handoff for everything else. A narrow lane is what lets you trust direction over scripting in the first place — and what lets you prove it before you scale.

Pick the call the agent can already handle better than a hold queue. Prove that, then widen the lane.

The Handoff Is the Whole Game

Come back to the title. The hello is the last line you can safely script, because the greeting is the one moment every caller shares. The instant the caller responds, you've handed the conversation off — from your screenplay to a real person with a real situation. That handoff happens on every call, and it's where scripted agents stall and open-prompted agents come alive.

The human handoff — the transfer to a person — is just the most visible version of the same seam. And it should be directed too, not scripted: the agent should recognize the edge before the caller feels it, carry the context across so nobody starts over, and land them with someone who already knows why they called.

The test I use

Listen to a handoff from the caller's side. If they have to repeat a single thing they already told the agent, the handoff is broken — no matter how good the greeting was. A clean handoff feels like being expected.
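The same test can be approximated before anyone listens to a recording. This sketch assumes the agent passes a small context packet to the human at transfer time. The `HandoffPacket` fields and the checklist of what the human needs are hypothetical, and the values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffPacket:
    """Hypothetical context the agent carries across the transfer."""
    caller_name: str
    reason_for_call: str
    details: dict = field(default_factory=dict)
    transfer_target: str = ""


def questions_caller_must_repeat(packet, human_needs):
    """List the items the human would have to ask the caller for again."""
    known = {
        "caller_name": packet.caller_name,
        "reason_for_call": packet.reason_for_call,
        **packet.details,
    }
    # Missing keys and empty values both count as "the caller repeats it".
    return [item for item in human_needs if not known.get(item)]


packet = HandoffPacket(
    caller_name="(caller name)",
    reason_for_call="reschedule a home visit",
    details={"patient_dob": "(date of birth)"},
    transfer_target="scheduling coordinator",
)

needs = ["caller_name", "reason_for_call", "patient_dob", "callback_number"]
print("caller would have to repeat:", questions_caller_must_repeat(packet, needs))
```

An empty list here does not prove the handoff felt clean, but a non-empty one shows exactly what the caller will be asked twice.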

Demo · The guardrails

An open-prompted agent still needs its guardrails.

Handoff reliability · 100 simulated calls

Contained: agent handled it
Clean escalation: handed off warm
Dropped: fell into the gap
Hello, answered: 100%, unchanged in every configuration above

Callers who reach the agent's limit get walked to the right person — who already knows why they're calling. Most never register that a handoff happened.

These four decisions are the guardrails the car roams within — role clarity, a real escalation path, an owner for the knowledge, the right first workflow. Flip from an afterthought build to an operator build and watch the same agent go from off-the-rails to dependable. The greeting never changed; the guardrails did. Freedom without them is just a car with no road.

Prove It Before You Scale

The failure mode I see most often isn't a bad agent. It's a decent agent scaled on a feeling. The hello sounded great in the demo, so the team turns up the volume and adds workflows before they've measured whether the agent actually holds the road. Reliability is the gate, and it has to be instrumented, not sensed.

Step 01

Instrument the handoff

Per workflow, track the rates that matter: handled cleanly, escalated cleanly, and missed. The missed bucket is the one teams never look at — and the only one that tells the truth about whether the agent met the caller.

Step 02

Set a gate, not a vibe

Decide the reliability threshold a workflow must clear before it earns more volume. Write the number down. “It sounds good” is not a number, and a fluent agent sounds good even when it's wrong.

Step 03

Widen the proven lane

When a workflow clears the gate, add the next one — not more load onto an unproven one. Growth is widening the set of calls you've proven the agent can drive, one lane at a time.
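The three steps above can be sketched in a few functions. This assumes each call is logged with one of three outcome labels per workflow. The 2% missed-rate ceiling and the 200-call minimum are placeholder numbers chosen for illustration; the real gate is whatever number you write down for your own workflow.

```python
from collections import Counter

OUTCOMES = ("handled", "escalated", "missed")


def rates(outcomes):
    """Step 01: share of calls in each bucket for one workflow."""
    total = len(outcomes)
    if total == 0:
        return {name: 0.0 for name in OUTCOMES}
    counts = Counter(outcomes)
    return {name: counts[name] / total for name in OUTCOMES}


def clears_gate(outcomes, max_missed_rate=0.02, min_calls=200):
    """Step 02: a written-down threshold (both numbers are assumptions)."""
    if len(outcomes) < min_calls:
        return False  # too few calls to trust the rate yet
    return rates(outcomes)["missed"] <= max_missed_rate


def may_add_workflow(live_workflows):
    """Step 03: widen the lane only when every live workflow is proven."""
    return all(clears_gate(calls) for calls in live_workflows.values())


# Illustrative logs, not real data.
booking = ["handled"] * 180 + ["escalated"] * 18 + ["missed"] * 2
reminders = ["handled"] * 50 + ["missed"] * 5

print(rates(booking))
print("booking clears gate:", clears_gate(booking))
print("ready to widen:", may_add_workflow({"booking": booking, "reminders": reminders}))
```

In this sample the booking workflow clears the gate, but the reminders workflow has too few calls to count as proven, so the lane does not widen yet.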

Draw the lane, own the knowledge, design the handoff — and the bars come down. What's left is an agent that can actually hear the person on the line.

The tool did not fail. We caged it.

Across healthcare, home services, and hospitality, the voice agents that work share nothing in the model and everything in the build: a clear lane to roam within, a real escalation path, an owner for the knowledge — and a prompt built on Open Prompting, directed and never scripted. The whole craft is learning to prompt for engagement without using quotation marks.

I build these systems at Workforce Wave. If you're past the hello and stuck at the handoff, that's the conversation worth having.