ElevenLabs Voice Changer: How to Convert Voices in 2026

Hands-on guide to the ElevenLabs Voice Changer (sts-eleven-labs) on Segmind. Code, audio samples, and pricing for 2026.


I have been spending a lot of time inside the ElevenLabs ecosystem this past quarter, and the one model I keep getting asked about is the voice changer. The pitch is simple: you record any audio, pick a target voice, and the model regenerates the same performance in that new voice. The timing, the emotion, and the pauses are preserved. What changes is who is speaking.

Search interest for "voice changer" and "AI voice clone for video" has been climbing steadily in 2026. Marketing teams want it for ad localization. Studios want it for ADR and language dubs. MCNs (multi-channel networks) want one creator's narration repackaged across three channels. The use cases are real, and the ElevenLabs Voice Changer (speech-to-speech) is one of the strongest models available today.

I ran the ElevenLabs Voice Changer on Segmind across real test cases to write this post. In this guide, I will walk through what the model actually does, three production use cases with side-by-side audio you can listen to, how it performs, what it costs, and where it falls short.

Ready to transform your voiceovers with AI? Explore the ElevenLabs Voice Changer on the Segmind model page and start creating multiple production-ready voice variants today!

TL;DR

  • Performance Preserved: ElevenLabs Voice Changer maintains timing, emotion, and pauses from the original recording while swapping the voice.
  • Versatile Applications: Ideal for marketing ads, film ADR/dubbing, and multi-channel content production. One recording can serve multiple variants.
  • Cost-Efficient: Duration-based pricing keeps production affordable; multiple voice variants cost only cents per clip.
  • Multilingual Support: Two models are available, one English-only for sharper English output and one multilingual for cross-language workflows.
  • Workflow Optimization: Reduces repetitive recording, streamlines multi-voice campaigns, and preserves editorial intent across channels.

What Is the ElevenLabs Voice Changer? 

The ElevenLabs Voice Changer is a speech-to-speech model. You give it an MP3 of someone speaking, tell it which target voice you want, and it produces a new MP3 with the same words and delivery, but the voice is the one you picked. 

On Segmind, it exposes the full ElevenLabs voice library (Rachel, Lily, Brian, Charlotte, Aria, Liam, and thirteen others), and you can also target a specific voice directly by its voice ID.

Two model variants are available. 

  • eleven_english_sts_v2 is optimized for English only. 
  • eleven_multilingual_sts_v2 supports multiple languages. 

The cost is the same either way: $0.00375 per second of output duration, which works out to a little over $0.11 for a typical thirty-second voiceover.
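To sanity-check that arithmetic, here is a minimal Python helper. It only multiplies duration by the per-second rate quoted above; the constant and function names are illustrative, and the rate should be checked against the Segmind model page before you budget with it.

```python
# Per-second rate quoted in this post for sts-eleven-labs on Segmind (USD).
RATE_PER_SECOND = 0.00375


def estimate_cost(duration_seconds: float, rate: float = RATE_PER_SECOND) -> float:
    """Estimate the cost of one conversion from the clip's duration.

    Speech-to-speech output matches the input length, so the source
    duration doubles as the billed output duration.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return duration_seconds * rate


if __name__ == "__main__":
    for seconds in (15, 30, 120, 240):
        print(f"{seconds:>4} s  ->  ${estimate_cost(seconds):.4f}")
```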

What the ElevenLabs Voice Changer Can Do 

What makes the voice changer interesting is what it preserves. 

  • It preserves the performance: the rhythm, the emphasis, the breaths, the laugh halfway through a sentence. 
  • It replaces the timbre: the speaker's age, gender, accent, and vocal quality. 
  • This is fundamentally different from text-to-speech, where you provide only a script and the model generates both the words and the delivery. With the voice changer, the performance is already in your recording: you control timing, emphasis, and emotion, and the model swaps the voice without altering that performance. 
  • The model also accepts a remove_background_noise flag, which applies ElevenLabs noise isolation to the input before conversion. I tested it on a clip with a slight room hiss, and the cleanup was good enough that I did not need to pre-process the audio separately. 
  • There is also a seed parameter for reproducible dialogue generation, and a voice_settings JSON for overriding stored settings for the given voice. 
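Here is a minimal sketch of what a single call could look like from Python, using only the standard library. Several details are assumptions rather than documented facts: the endpoint URL follows Segmind's usual `https://api.segmind.com/v1/<model-slug>` pattern, authentication is assumed to use an `x-api-key` header, the input-audio field is called `audio` purely for illustration, `voice_settings` is sent as a JSON string, and the response body is assumed to be the converted MP3. Check the request schema on the model page before relying on any of it. The other parameter names (`voice`, `voice_id`, `model_id`, `remove_background_noise`, `seed`) are the ones described in this post.

```python
import json
import os
import urllib.request

# Assumed endpoint, following Segmind's /v1/<model-slug> convention.
ENDPOINT = "https://api.segmind.com/v1/sts-eleven-labs"


def build_payload(audio_url, voice, voice_id, model_id="eleven_english_sts_v2",
                  remove_background_noise=False, seed=None, voice_settings=None):
    """Assemble the JSON body for one voice conversion."""
    payload = {
        "audio": audio_url,  # hypothetical field name for the source recording
        "voice": voice,
        "voice_id": voice_id,
        "model_id": model_id,
        "remove_background_noise": remove_background_noise,
    }
    if seed is not None:
        payload["seed"] = seed  # reuse a seed to regenerate a take you liked
    if voice_settings is not None:
        # Overrides the stored settings for this voice; sent as a JSON string (assumption).
        payload["voice_settings"] = json.dumps(voice_settings)
    return payload


def convert(payload, api_key):
    """POST the payload and return the response body (assumed to be MP3 bytes)."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()


if __name__ == "__main__":
    body = build_payload(
        audio_url="https://example.com/source.mp3",  # placeholder input
        voice="Lily",
        voice_id="pFZP5JQG7iQjIQuC4Bku",
        remove_background_noise=True,
        seed=42,
    )
    print(json.dumps(body, indent=2))
    key = os.environ.get("SEGMIND_API_KEY")  # hypothetical env var name
    if key:
        with open("lily.mp3", "wb") as out:
            out.write(convert(body, key))
```

The script only sends a request when the (hypothetical) `SEGMIND_API_KEY` environment variable is set; otherwise it just prints the payload, which is a cheap way to review a request before paying for it.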

Here is the test I keep coming back to: a male source voice converted to Lily, a brand-safe American English female.

Source script (spoken in input audio): Tired of slow campaign approvals? AdLoop ships ten ad variants in under a minute, all on brand, all ready to ship.

Parameters: voice: Lily  |  voice_id: pFZP5JQG7iQjIQuC4Bku  |  model_id: eleven_english_sts_v2

Source (Bill, male)

Converted (Lily, female)

Same words, same cadence, same emphasis on "under a minute" and "ready to ship". Different speaker.

Listen to the pacing in both clips. The pause before "all ready to ship" is preserved exactly. That is the part that matters for ads: you keep the performance the director already approved.

Use Case 1: AI Voice Changer for Marketing Agencies 

The pattern I see across marketing agencies is the production of variants. A creative director records one approved voiceover for an ad. They want 15 versions for 15 markets, 3 brand voices for A/B testing in the same market, or a male-and-female pair for paired social ads. Re-recording is expensive and slow. Hiring fifteen different voice actors is impractical. So the work tends to get cut.

The voice changer fixes this. You record the approved take once with any internal voice. You then run it through the voice changer with whichever target voices the campaign needs. The performance stays consistent because it is the same recording underneath. The brand voice varies by market or by variant.
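The record-once, convert-many pattern is easy to script. The sketch below plans one conversion job per target voice from a single approved take and attaches a cost estimate to each. It reuses the voice IDs from the examples in this post; the 8-second clip length and the file names are placeholders, and the $0.00375 per second rate is the one quoted in this post.

```python
from dataclasses import dataclass

RATE_PER_SECOND = 0.00375  # USD per second of output, as quoted in this post

# Target voices and IDs taken from the examples in this post.
CAMPAIGN_VOICES = {
    "Lily": "pFZP5JQG7iQjIQuC4Bku",
    "Charlotte": "XB0fDUnXU5powFXDhCwa",
    "Brian": "nPczCjzI2devNBz1zQrb",
}


@dataclass
class VariantJob:
    voice: str
    voice_id: str
    model_id: str
    output_file: str
    estimated_cost: float


def plan_variants(source_name, source_seconds, voices, model_id="eleven_english_sts_v2"):
    """Create one conversion job per target voice from a single approved take."""
    return [
        VariantJob(
            voice=voice,
            voice_id=voice_id,
            model_id=model_id,
            output_file=f"{source_name}_{voice.lower()}.mp3",
            # Same source for every job, so every variant costs the same.
            estimated_cost=source_seconds * RATE_PER_SECOND,
        )
        for voice, voice_id in voices.items()
    ]


if __name__ == "__main__":
    jobs = plan_variants("adloop_spot", 8.0, CAMPAIGN_VOICES)  # placeholder length
    for job in jobs:
        print(job.output_file, f"${job.estimated_cost:.3f}")
    print("total:", f"${sum(j.estimated_cost for j in jobs):.3f}")
```

Each job maps onto one API call; since billing is per call, the total grows linearly with the number of target voices.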

For agency workflows, the eleven_english_sts_v2 model is the one I default to for English campaigns. It is sharper than the multilingual variant on US English, and end-to-end latency on Segmind is consistently under 7 seconds. The test above cost well under ten cents. At that price, you stop worrying about how many variants the campaign team asks for.

Use Case 2: Voice Changer for Movie Making, ADR, and Dubbing 

Film and television studios spend a lot of money on automated dialogue replacement (ADR) and on dubbing. ADR is when an actor re-records a line in the studio because the on-set audio was unusable. Dubbing is when a different actor reads the same lines in a different language. Both are slow because they need a person in a booth.

The voice changer compresses one piece of that workflow. The director or the editor can voice the line themselves, in any voice, with the exact emotion and timing the cut needs. They then run it through the voice changer with the target actor's voice (or a cloned voice on ElevenLabs) and get back a take in the right voice with the right performance. 

It is a temp track or a preview track in most pipelines today. Some indie productions are using it as the final track. The quality is good enough that the conversation has shifted from "can this work?" to "where should the human still be in the loop?"

Source script (spoken in input audio): I told you we'd find it. Now we walk away, and we never speak of this place again.

Parameters: voice: Charlotte  |  voice_id: XB0fDUnXU5powFXDhCwa  |  model_id: eleven_multilingual_sts_v2

Source (George, male)

Converted (Charlotte, female)

Same dramatic delivery and pause structure. The voice and apparent gender of the speaker change.

I used eleven_multilingual_sts_v2 for this one. For dialogue-heavy work where you might want to dub the same line into Spanish or French later, the multilingual model is the safer default. The voice changer keeps the words of the input, so each dub still needs a source take in the target language; running those takes through the same target voice gives you a coherent set of takes across languages.
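That rule of thumb fits in a few lines. The function below is a hypothetical helper, not part of any SDK: it returns the English model only for English source audio that will stay in English, and the multilingual model otherwise. The model IDs are the two named in this post.

```python
ENGLISH_MODEL = "eleven_english_sts_v2"
MULTILINGUAL_MODEL = "eleven_multilingual_sts_v2"


def choose_model(source_language: str, may_dub_later: bool = False) -> str:
    """Pick a speech-to-speech model ID (hypothetical helper).

    English-only sources that will stay in English get the sharper English
    model; anything else, or anything that may be dubbed later, gets the
    multilingual model.
    """
    base = source_language.strip().lower().split("-")[0]  # "en-US" -> "en"
    if base == "en" and not may_dub_later:
        return ENGLISH_MODEL
    return MULTILINGUAL_MODEL


if __name__ == "__main__":
    print(choose_model("en-US"))
    print(choose_model("en", may_dub_later=True))
    print(choose_model("es"))
```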

Use Case 3: Voice Changer for Production Houses and MCNs 

MCNs and YouTube production houses are the use cases that most surprised me when I started looking. A multi-channel network might run a single creative producer who writes and voices a five-minute script. They then need that script to ship on three different channels with three different "host" personas. Today, that means writing three scripts and recording three times.

With the voice changer, the producer writes one script, records it once with their own voice or a temp voice, and then routes it through three different voice changer calls with three different target voices. 

Each channel ships with its branded host persona. The script and the underlying performance are identical, which is actually a positive: the editorial intent is preserved across all three channels.
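Because the three calls are independent, they can run in parallel. The sketch below fans one source recording out to a hypothetical channel-to-persona mapping with a thread pool. The `send` callable is injected so the example runs offline; in practice it would wrap the HTTP request to Segmind. The `audio` field name is an assumption, and the channel names are placeholders; the voice IDs come from the examples in this post.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical channel-to-persona mapping; voice IDs are from this post's examples.
CHANNEL_PERSONAS = {
    "channel_a": ("Brian", "nPczCjzI2devNBz1zQrb"),
    "channel_b": ("Charlotte", "XB0fDUnXU5powFXDhCwa"),
    "channel_c": ("Lily", "pFZP5JQG7iQjIQuC4Bku"),
}


def revoice_for_channels(audio_url, personas, send, model_id="eleven_english_sts_v2"):
    """Run one source recording through every channel's host voice in parallel.

    `send` is any callable that takes a request dict and returns audio bytes.
    Returns a dict mapping channel name to the converted audio.
    """
    def job(item):
        channel, (voice, voice_id) = item
        request = {
            "audio": audio_url,  # assumed input-field name
            "voice": voice,
            "voice_id": voice_id,
            "model_id": model_id,
        }
        return channel, send(request)

    with ThreadPoolExecutor(max_workers=max(1, len(personas))) as pool:
        return dict(pool.map(job, personas.items()))


if __name__ == "__main__":
    # Stand-in for a real API call, so the sketch runs offline.
    def fake_send(request):
        return f"audio in {request['voice']}".encode()

    outputs = revoice_for_channels("https://example.com/episode.mp3", CHANNEL_PERSONAS, fake_send)
    for channel, audio in outputs.items():
        print(channel, audio)
```

Threads fit here because each call spends its time waiting on the network rather than on local computation.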

Source script (spoken in input audio): If you only do one thing today, write down the three tasks that actually move the needle.

Parameters: voice: Brian  |  voice_id: nPczCjzI2devNBz1zQrb  |  model_id: eleven_english_sts_v2

Source (Aria, female creator)

Converted (Brian, male host)

Female creator narration converted to a male MCN host. Same emphasis on "move the needle".

The MCN math is straightforward. Three channels, five videos per week each, average voiceover length around four minutes. That is five voiceovers per channel per week, 15 across the network, or roughly 60 a month. 

At $0.00375 per second (the same rate on both models), each four-minute clip costs about $0.90. Doing the same three channels with three voice actors would run several thousand dollars a month.
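Here is the same back-of-the-envelope math as a small Python function, using the inputs from this paragraph (three channels, five videos per week each, four-minute voiceovers). Treating a month as four weeks is a simplifying assumption; pass a different `weeks_per_month` if you budget differently.

```python
RATE_PER_SECOND = 0.00375  # USD per second of output audio, as quoted in this post


def mcn_voiceover_cost(channels, videos_per_week, minutes_per_video, weeks_per_month=4):
    """Back-of-the-envelope voiceover spend for a multi-channel network."""
    clips_per_week = channels * videos_per_week
    cost_per_clip = minutes_per_video * 60 * RATE_PER_SECOND
    weekly = clips_per_week * cost_per_clip
    return {
        "clips_per_week": clips_per_week,
        "cost_per_clip": cost_per_clip,
        "weekly_cost": weekly,
        "monthly_cost": weekly * weeks_per_month,  # 4-week month by default
    }


if __name__ == "__main__":
    for key, value in mcn_voiceover_cost(3, 5, 4).items():
        print(key, round(value, 2))
```

With those inputs, that is 15 clips and about $13.50 per week, or roughly $54 per four-week month.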

How Much Does the ElevenLabs Voice Changer Cost?

Pricing on the voice changer is duration-based, not character-based, which is the right model for speech-to-speech work. The rate is $0.00375 per second of output audio. Since speech-to-speech preserves the input length, that means $0.00375 per second of input as well.

Use case  |  Typical clip length  |  Per-call cost  |  Monthly volume  |  Monthly spend
Short social ad VO  |  15 seconds  |  $0.056  |  500 variants  |  $28.13
YouTube intro/tag  |  30 seconds  |  $0.11  |  200 clips  |  $22.50
MCN long-form VO  |  4 minutes  |  $0.90  |  150 clips  |  $135.00
Podcast segment / ADR  |  2 minutes  |  $0.45  |  100 clips  |  $45.00

A few cost-optimization tips from my testing. 

  • First, trim silence from your source audio before sending it. The model bills on output duration, and any trailing silence in the input becomes paid silence in the output. 
  • Second, if you are running the same source against multiple target voices for A/B tests, you are paying per call, not per unique audio, so keep the variant count tight in early experiments. 
  • Third, use the seed parameter when you find a take you like. It makes the output reproducible on a best-effort basis (ElevenLabs does not guarantee determinism), which matters when a client asks for "the same thing but five seconds longer".
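The first tip can be scripted. Below is a minimal, standard-library-only sketch, assuming you export the source as 16-bit mono WAV before encoding it to MP3. The threshold (500 on a 16-bit scale) and the 100 ms pad are arbitrary starting values, not recommendations from ElevenLabs or Segmind; tune them against your own recordings.

```python
import array
import sys
import wave


def trim_silence(samples, sample_rate, threshold=500, pad_ms=100):
    """Drop leading and trailing samples quieter than `threshold`.

    `samples` holds signed 16-bit mono PCM values (a list or array.array).
    A short pad is kept on each side so the first and last words are not cut.
    """
    first = next((i for i, s in enumerate(samples) if abs(s) >= threshold), None)
    if first is None:
        return samples[:0]  # nothing above the threshold: the take is all silence
    last = next(i for i in range(len(samples) - 1, -1, -1) if abs(samples[i]) >= threshold)
    pad = int(sample_rate * pad_ms / 1000)
    return samples[max(first - pad, 0):min(last + pad + 1, len(samples))]


def trim_wav(in_path, out_path, threshold=500, pad_ms=100):
    """Trim a 16-bit mono WAV file and return how many seconds were removed."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("this sketch expects 16-bit mono PCM")
        samples = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV stores samples little-endian
    trimmed = trim_silence(samples, params.framerate, threshold, pad_ms)
    removed_seconds = (len(samples) - len(trimmed)) / params.framerate
    if sys.byteorder == "big":
        trimmed.byteswap()  # back to little-endian before writing
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(trimmed.tobytes())
    return removed_seconds
```

Multiplying the returned seconds by the per-second rate gives a rough figure for what each trim saves per call.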

Check out ElevenLabs Voice Changer pricing to create multiple high-quality voiceovers without the hassle of re-recording!

ElevenLabs Voice Changer Review: Strengths and Limitations 

Where the voice changer is strong: 

Timing fidelity, emotional fidelity, and the breadth of the available voice library. The model preserves micro-pauses and inflection so faithfully that you stop noticing it is a swap. The official ElevenLabs voices cover most production needs out of the box, and cloned voices give you a path to your own brand voice.

Where I would push back: 

The model copies the source performance exactly, which is great when the source performance is good, and a problem when it is not. If your input mumbles a word, the output mumbles in the new voice. If the input has a heavy accent, traces of it can leak into the output, especially on the multilingual model. 

Best practice is to make sure the source recording is clean and well-paced before you convert. Treat it like ADR: the take you record going in is the take you ship.
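A quick level check can catch some problems before you pay for a conversion. The sketch below works on 16-bit mono PCM samples and flags clipping and very quiet takes. The thresholds (samples at full scale, RMS below -30 dBFS) are illustrative assumptions, not guidance from ElevenLabs or Segmind, and no level check can tell you whether a word was mumbled; you still need to listen to the take.

```python
import math

FULL_SCALE = 32768.0  # magnitude of the most negative signed 16-bit sample


def to_dbfs(value):
    """Convert a linear 16-bit amplitude to dB relative to full scale."""
    return 20 * math.log10(value / FULL_SCALE) if value > 0 else float("-inf")


def preflight(samples, clip_level=32767, min_rms_dbfs=-30.0):
    """Report peak/RMS levels and simple warnings for a 16-bit mono take."""
    if len(samples) == 0:
        return {"peak_dbfs": float("-inf"), "rms_dbfs": float("-inf"),
                "clipped_samples": 0, "warnings": ["empty recording"]}
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    warnings = []
    if clipped:
        warnings.append(f"{clipped} samples at full scale (likely clipping)")
    if to_dbfs(rms) < min_rms_dbfs:
        warnings.append("take is very quiet; consider re-recording at a higher level")
    return {"peak_dbfs": to_dbfs(peak), "rms_dbfs": to_dbfs(rms),
            "clipped_samples": clipped, "warnings": warnings}


if __name__ == "__main__":
    print(preflight([0, 1200, -900, 32767]))
```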

FAQs

What is the ElevenLabs voice changer used for?

The ElevenLabs voice changer converts speech from one voice to a different target voice, preserving the original's timing and emotion. It is used for ad localization, ADR, dubbing in film, multi-channel voiceover for MCNs, and any workflow where the same performance needs to be shipped in multiple voices.

Is the ElevenLabs voice changer free?

The ElevenLabs voice changer is paid. On Segmind, it costs $0.00375 per second of output audio. There is no free tier.

Which ElevenLabs voice changer model should I use for English versus multilingual content?

The eleven_multilingual_sts_v2 model supports multiple languages. The eleven_english_sts_v2 model is English-only and produces sharper output on English content.

How does the ElevenLabs voice changer compare to TTS?

Text-to-speech starts from text, and the model invents the delivery. The voice changer starts from the speech you already recorded and only swaps the voice, preserving the original timing and emotion. Use TTS when you do not have source audio. Use the voice changer when you want to revoice.

Can the ElevenLabs voice changer clone my own voice?

Yes, indirectly. You clone your voice on ElevenLabs first, using their voice cloning tools, which gives you a custom voice ID. You then pass that voice ID to the Segmind sts-eleven-labs endpoint as the voice_id parameter, and the output will be in your cloned voice.

Conclusion

The ElevenLabs Voice Changer works best when your workflow is built around efficiency and consistency rather than manual repetition or separate takes for every channel. Teams that use it well spend less time re-recording, managing voiceover variants, and fixing timing inconsistencies, and more time delivering high-quality audio across ads, videos, and multi-channel productions. 

So, why wait? Explore Segmind's ElevenLabs Voice Changer model and start experimenting with voice conversion!