Stop Making AI Sound Too Human. Please.

The uncanny valley of voice AI, and why almost-perfect might be worse than obviously fake

November 28, 2025 · 7 min read

The chatbot therapist listens without judgment. The voice assistant is endlessly patient. We've spent a decade teaching AI to be more empathetic, more emotionally intelligent, more us.

And it's working. Perhaps too well.

Technically, the results are brilliant. Psychologically, they've crossed into uncomfortable territory.

Welcome to the uncanny valley of voice AI. And it's happening faster than anyone expected.

The Valley We Can't See But Definitely Feel

In the 1970s, Japanese roboticist Masahiro Mori identified something peculiar: as robots become more human-like, our affinity for them increases. But only up to a point.

When they become almost but not quite human, something flips. We don't feel more comfortable. We feel deeply uncomfortable.

That dip in emotional response, that sense of eeriness when something is nearly human but detectably artificial, is the uncanny valley.

For decades, it applied mainly to visual representations: humanoid robots, CGI characters, realistic dolls.

Now it's moved into our ears.

Recent research from UC Berkeley found that people can distinguish real voices from AI-generated ones with only 65% accuracy, barely better than the 50% they'd get by guessing. That's remarkable progress for AI.
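To get a feel for what "barely better than chance" means here, consider a quick back-of-the-envelope check. The article doesn't give the study's sample size, so the 100 trials below are purely a hypothetical assumption:

```python
from math import comb

# Back-of-the-envelope check on the "barely better than chance" claim.
# Assumption: 100 trials at 65% correct; the real study's sample size
# is not given here, so n = 100 is purely illustrative.
n, k, p_chance = 100, 65, 0.5

# Exact one-sided binomial tail: the probability of scoring 65/100 or
# better if listeners were just guessing (50% chance per trial).
p_value = sum(comb(n, i) * p_chance**i * (1 - p_chance)**(n - i)
              for i in range(k, n + 1))

print(f"P(>= {k}/{n} correct by pure guessing) = {p_value:.4f}")
```

Under these assumptions, 65% is statistically above chance, but in practical terms listeners still misjudge roughly one voice in three, which is the sense in which detection has become unreliable.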

Check out this X post by @edler_plinius and listen to how human the AI he's talking to sounds.

Sesame AI's test and responses on X

But here's what's more interesting: when people encountered voices that were close to human but not quite there, their discomfort spiked.

People's responses to AI sounding too human

The technology has advanced so rapidly that AI-generated voices are passing through or have already passed through the uncanny valley.

We're at the point where it's not obviously fake, but it's not convincingly real either. And that in-between space is psychologically unsettling.

Why Almost-Human Feels Worse Than Obviously Fake

*India’s first AI regional news anchor. Source: Government Economic Times*

Studies on AI news anchors revealed something fascinating: viewers reacted more negatively to highly realistic AI presenters than to obviously artificial ones.

The AI anchors displayed rigid postures, misaligned lip movements, and flat speech modulation. They were technically impressive, but they triggered what researchers called "psychological rejection and emotional detachment."

Here's the uncomfortable truth:

Your brain is incredibly good at detecting tiny imperfections in human behaviour.

A slight delay between facial movement and speech. Inconsistent emotional tone. Rhythm that's just slightly off. When something looks or sounds 95% human, your brain focuses on the 5% that's wrong.

It's not conscious. You can't always articulate what's off. But you feel it. And that feeling translates to unease, distrust, and rejection.

Research on voice AI specifically found that subtle imperfections in tone, cadence, or emotional expression create a sense of unease amongst users.

The problem isn't that the AI sounds robotic. It's that it sounds almost human, which is somehow worse.

The Growing Resemblance Nobody Asked For

Photo by Katja Anokhina on Unsplash

AI voice technology is advancing at an extraordinary pace. With as little as 30 seconds of voice recording, systems can now clone voices with startling accuracy.

The snippets people heard in studies were only 3-10 seconds long and didn't feature yelling, laughing, or strong emotions. Yet they were already indistinguishable from real voices 35% of the time.

But here's what concerns me: the industry seems to be racing toward perfect human replication without asking whether that's actually what users want.

When I work with clients testing our translation platform, I've noticed a pattern. The ones who stick with it long-term aren't the ones demanding the most human-sounding AI.

They're the ones who want consistency, clarity, and transparency about what they're interacting with.

The Risk of Going Too Far

A Humanoid Robot. Generated by Canva AI

The research on this is accumulating quickly, and it's pointing in a concerning direction. Studies show that when AI voices become too realistic but retain subtle imperfections, they trigger several psychological responses:

Decreased trust. Users who detect an almost-human voice that feels "off" become frustrated and less trusting of the entire system.

Emotional withdrawal. When the uncanny valley triggers, people emotionally disengage from the interaction. They complete the transaction but don't form positive associations with the experience.

Increased scrutiny. Once something feels slightly wrong, users start looking for other problems. They become hyperaware of every small glitch or inconsistency.

Brand damage. These negative experiences don't stay isolated. People remember feeling creeped out more vividly than they remember neutral or positive interactions.

What I Think the Ideal AI Sounds Like

Photo by BENCE BOROS on Unsplash

After hundreds of hours watching people interact with voice AI across different contexts, here's what I've concluded: the ideal AI voice lands just short of the uncanny valley.

It's clearly synthetic but pleasant. You know immediately you're talking to an AI, but you don't mind. It has natural rhythm and appropriate emotional range, but it doesn't try to fake genuine emotion. It's consistent in a way humans aren't, and that consistency is reassuring rather than unsettling.

Most importantly, it's contextually appropriate. A sleep assistance app should sound different from a business email reader. A translation tool should sound different from a customer service bot. The voice should match the task, not try to be a generic "perfect human."

This isn't about technical limitations. Modern AI can sound remarkably human. It's about design choices. About recognising that crossing the uncanny valley entirely might not be desirable, even if it becomes possible.

How Smart Do You Actually Want Your AI to Be?

Photo by Amos K on Unsplash

This brings us to the deeper question that the uncanny valley forces us to confront: how human-like do we actually want our AI to be?

The technology is advancing fast enough that we'll soon have voice AI indistinguishable from humans. Research suggests we're already there for short interactions. As AI continues along its current trajectory, distinguishing real from fake voices will become very difficult, if not impossible.

But should we?

I'm building AI translation tools that help people communicate across languages. The goal isn't to replace human interpreters. It's to make communication possible when human interpreters aren't available or affordable. The AI doesn't need to sound exactly like a person. It needs to be reliable, clear, and trustworthy.

For most use cases, AI that's 80% human-like but 100% consistent beats AI that's 95% human-like but occasionally slips into the uncanny valley.

The question isn't "can we make AI sound perfectly human?" It's "should we, and for what purposes?"

The Reflection I'm Asking You to Have

If you’re building with voice AI, designing products that use it, or making decisions about implementing it, here’s what I’d encourage you to think about:

What does your use case actually require? Perfect human replication, or reliable, pleasant interaction?

How will users react when they realise they’re interacting with AI? Is the surprise going to enhance or damage the experience?

Are you designing for the uncanny valley, or designing to avoid it?

There’s a difference between “make it sound as human as possible” and “make it sound appropriately natural for this context.”

Where We Go From Here

Photo by Pawel Czerwinski on Unsplash

The uncanny valley of voice isn't going away soon. As AI gets better, we'll keep bumping up against that uncomfortable space where things are almost but not quite right.

The industry will eventually solve the technical challenges. We'll get voices that are genuinely indistinguishable from humans in all contexts, emotional ranges, and speaking styles.

But I'm not convinced that's the future we should be racing toward.

Maybe the goal isn't to cross the uncanny valley. Maybe it's to design AI voices that live comfortably on the near side of it, where they're helpful, reliable, and transparent about what they are.

So, how human-like do you want your AI to sound?

The technology can give you almost any answer. But the right answer might not be "as human as possible." It might be "human enough to be pleasant, synthetic enough to be honest."

That's where I'm placing my bets. Not because the technology can't go further, but because psychology suggests it shouldn't.

An X post from @forgebitz that inspired me to write this piece.

Also read: If the AI Bubble Bursts, What Happens to Us?