AI Audio Inpainting

Replace a section of audio with AI-synthesized speech that matches the surrounding voice. Fix a bad take without re-recording the whole thing.

Upload Audio to Inpaint

500 characters per second of audio replaced

Drag & drop a file here, or browse

Supports MP3, WAV, FLAC, OGG, M4A. Max 50MB. Up to 10 minutes.

Source audio — scrub to find the bad take

Inpaint Settings

How long to blend the splice points. The default is 80ms, which keeps match-cuts sounding natural with no audible double-trigger.
Inpainting audio...

Cloning the voice and synthesizing the replacement...

Slicing → cloning surrounding voice → splicing with crossfade
Taking a while? Your result will appear in your generation history once it is ready.
Inpainted Audio Ready

Before (Original)

After (Inpainted)

Download Inpainted Audio

How Audio Inpainting Works

Inpainting is the audio equivalent of Photoshop's content-aware fill. We clone the voice from the audio surrounding your selection, synthesize the new line in that voice, and splice it back with a short crossfade.

Best results: leave at least 3 seconds of clean speech immediately before the edit point so the cloner has good reference material.

Best Practices for Good Results

  • Keep the marked range as tight as possible: only the bad take
  • Replacement text should be roughly the same length as what it replaces
  • Set the language to match the source audio for best voice match
  • 80ms crossfade is usually inaudible; bump to 150ms if you hear a click
  • For long edits (>10s), consider re-recording the whole passage instead
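The tips above can be turned into a quick pre-submission check. The following is a minimal sketch, not part of the service: the function name `preflight` and its parameters are hypothetical, and the thresholds (3 seconds of lead-in, 10-second edits, 500 characters) are simply the figures quoted on this page. It cannot tell whether the lead-in audio is actually clean speech; it only checks timing and text length.

```python
def preflight(start_sec, end_sec, file_duration_sec, replacement_text, max_chars=500):
    """Return a list of warnings for a planned edit (hypothetical helper)."""
    if not (0 <= start_sec < end_sec <= file_duration_sec):
        raise ValueError("edit range must lie inside the file and have positive length")

    warnings = []
    # The page recommends at least 3 seconds of clean speech before the edit point.
    if start_sec < 3.0:
        warnings.append("less than 3 s of audio before the edit point")
    # For edits longer than 10 s the page suggests re-recording instead.
    if end_sec - start_sec > 10.0:
        warnings.append("edit is longer than 10 s; consider re-recording the passage")
    # max_chars defaults to the 500-character limit quoted for the Free tier.
    if len(replacement_text) > max_chars:
        warnings.append("replacement text exceeds the character limit")
    return warnings
```

An empty list means none of the timing or length guidelines were tripped.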

How AI Audio Inpainting Works

Surgical edits, voice-matched, with no re-recording session.

Step 1

Upload + Mark Range

Upload your audio and use the scrubber to mark the start/end of the section you want to replace. Type the replacement text.

Step 2

Voice Clone + Synthesize

We extract up to 12 seconds of clean reference audio surrounding your selection, clone the speaker's voice, and synthesize the new line in that voice.

Step 3

Crossfade Splice

The synthesized clip is spliced into the original recording with an equal-power crossfade at both edit points. The boundaries are inaudible.
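For readers curious what an equal-power crossfade looks like, here is a minimal sketch in plain Python. It is an illustration of the general technique, not the service's implementation: the function names, the 16 kHz sample rate, and the choice to overlap the clips (which shortens the result by the fade length at each joint) are all assumptions. The gains follow cosine and sine curves so that their squares always sum to 1, which keeps perceived loudness roughly constant through the fade.

```python
import math

def equal_power_gains(n):
    """Return (fade_out, fade_in) gain lists of length n whose squares sum to 1."""
    fade_out, fade_in = [], []
    for i in range(n):
        # t runs from 0 to 1 across the fade; a one-sample fade uses the midpoint.
        t = i / (n - 1) if n > 1 else 0.5
        fade_out.append(math.cos(t * math.pi / 2))
        fade_in.append(math.sin(t * math.pi / 2))
    return fade_out, fade_in

def crossfade(a, b, n):
    """Join sample lists a and b, overlapping the last n samples of a with the first n of b."""
    if n == 0:
        return a + b  # 0 ms crossfade is a hard cut
    if n > len(a) or n > len(b):
        raise ValueError("fade is longer than one of the clips")
    fade_out, fade_in = equal_power_gains(n)
    keep = len(a) - n
    overlap = [a[keep + i] * fade_out[i] + b[i] * fade_in[i] for i in range(n)]
    return a[:keep] + overlap + b[n:]

def splice(head, replacement, tail, sample_rate=16000, fade_ms=80):
    """Crossfade head -> replacement -> tail (sample_rate is an assumed value)."""
    n = int(sample_rate * fade_ms / 1000)
    return crossfade(crossfade(head, replacement, n), tail, n)
```

With the assumed 16 kHz rate, an 80ms fade spans 1,280 samples at each of the two edit points.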

Audio Inpainting Plans

Start free, upgrade when you need more

Free
  • Up to 10-minute source files
  • 500-character replacement text
  • 4-second inpaint per request
  • 80ms crossfade splice
  • OpenVoice + CosyVoice 2 backends
Most popular
Free Account
  • Up to 10-minute source files
  • 5,000-character replacement text
  • Tunable crossfade (0-250ms)
  • Voice-model override
  • Generation history + re-edit
Sign up free
Pro
  • Up to 30-minute source files
  • 100,000-character replacement text
  • Priority GPU queue
  • API access (/v1/audio-inpaint/)
  • Batch inpainting (multiple ranges)
Upgrade

Frequently Asked Questions

Audio inpainting (also called audio fill or speech overdub) lets you replace a section of an existing audio recording with new AI-synthesized speech that matches the original voice. It is the audio equivalent of Photoshop's content-aware fill: paint over the part you do not want, type what should be there instead, and the AI generates a seamless replacement.

Mark the time range to replace, type the new line of dialogue, and click Inpaint. Our AI clones the voice from the audio surrounding your selection, synthesizes the new line in that voice, and splices it back into your recording with a short crossfade so the edit is inaudible.

Use it when you have a single bad word, mispronunciation, name slip, swear word, or fact error in an otherwise-good take. Re-recording the entire passage often introduces tonal mismatch with the rest of the project. Inpainting fixes only what needs fixing while keeping every other syllable intact.

On the Free and Free Account tiers, source files can be up to 10 minutes long; Pro raises this to 30 minutes. The replacement text itself is capped at 500 characters on the Free tier, 5,000 with a Free Account, and 100,000 on Pro.

The voice match is very close. The AI uses up to 12 seconds of audio surrounding the edit as a voice reference, which is enough for any of our cloning-capable models (OpenVoice, CosyVoice 2) to capture the speaker's timbre, pitch, and speaking style. For best results, leave at least 3 seconds of clean speech immediately before the edit point.

We apply an 80ms equal-power crossfade at both splice points (head→replacement and replacement→tail) by default. You can tune this from 0ms (hard cut) up to 250ms via the Crossfade slider. Longer crossfades hide the edit more thoroughly but can audibly blend overlapping words at the boundary.

Audio inpainting follows the same language coverage as voice cloning. We auto-pick OpenVoice for most languages and CosyVoice 2 for Chinese, Japanese, and Korean. You can override the model in advanced settings.

You are charged 500 characters per second of audio replaced. A 4-second fix costs 2,000 characters. The cost is independent of how long the replacement text is, since the underlying clone synthesis is gated by the run time of the new clip, not the text length.
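The pricing rule above is simple enough to sketch as a function. This is an illustrative calculation only: the name `inpaint_cost` is hypothetical, and rounding fractional seconds up to the next whole character is an assumption, since the page does not say how partial seconds are billed.

```python
import math

CHARS_PER_SECOND = 500  # rate quoted on this page

def inpaint_cost(start_sec, end_sec):
    """Characters charged for replacing the range [start_sec, end_sec)."""
    duration = end_sec - start_sec
    if duration <= 0:
        raise ValueError("end_sec must be greater than start_sec")
    # Assumption: fractional results are rounded up; the actual billing rule is not stated.
    return math.ceil(duration * CHARS_PER_SECOND)

print(inpaint_cost(10.0, 14.0))  # 4-second fix -> 2000
```

Note that the replacement text does not appear as an input, matching the statement that cost depends only on the duration replaced.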

Per our Terms of Service, you may only inpaint audio you own or have explicit permission to edit. Generating fake quotes, deceptive content, or impersonations is prohibited. We watermark generated audio and log all inpainting jobs for abuse review.

Cutting a clip leaves a noticeable gap in pacing and breath; cross-fading two takes leaves a tonal mismatch. Inpainting fills the gap with speech that matches the surrounding voice, so listeners hear continuous, natural-sounding audio.

An API is available on the Pro plan: POST to /v1/audio-inpaint/ with the audio file, start_sec, end_sec, and replacement_text. The endpoint returns a job UUID; poll /v1/speech/results/?uuid= to retrieve the inpainted audio when ready. See API docs for details.
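A rough sketch of the submit-then-poll flow, using only the Python standard library. Only the two paths and the field names start_sec, end_sec, and replacement_text come from this page. Everything else is an assumption for illustration: the host `api.example.com`, the helper names, and the `fetch` callable that hides authentication and response parsing (neither is described here). Consult the API docs for the real request and response formats.

```python
import time
from urllib.parse import urlencode, urljoin

BASE_URL = "https://api.example.com"  # hypothetical host; the real one is not given here

def inpaint_url():
    """URL to POST the audio file and form fields to."""
    return urljoin(BASE_URL, "/v1/audio-inpaint/")

def build_inpaint_fields(start_sec, end_sec, replacement_text):
    """Form fields named on this page; the audio file is attached separately."""
    if end_sec <= start_sec:
        raise ValueError("end_sec must be greater than start_sec")
    return {
        "start_sec": str(start_sec),
        "end_sec": str(end_sec),
        "replacement_text": replacement_text,
    }

def results_url(job_uuid):
    """URL to poll for the finished job."""
    return urljoin(BASE_URL, "/v1/speech/results/") + "?" + urlencode({"uuid": job_uuid})

def wait_for_result(job_uuid, fetch, interval_sec=2.0, max_attempts=30):
    """Poll until fetch(url) returns something other than None.

    fetch is a caller-supplied function (an assumption of this sketch) that
    performs the HTTP GET and returns None while the job is still pending.
    """
    url = results_url(job_uuid)
    for _ in range(max_attempts):
        result = fetch(url)
        if result is not None:
            return result
        time.sleep(interval_sec)
    raise TimeoutError("inpainting job did not finish in time")
```

Passing the HTTP call in as `fetch` keeps the sketch independent of any particular client library and of the undocumented response shape.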

ElevenLabs Speech-to-Speech regenerates the entire voice line from scratch in a target voice. Our audio inpainting is surgical: it edits only the marked range, keeps every other byte of your original recording untouched, and matches the new clip to the surrounding voice rather than a separate voice library.
Fix Your Audio in Seconds

Replace any part of any recording with AI-synthesized speech that matches the original voice. Sign up free to start.