AIVoiceSeparator

How to Make a Karaoke Version of Any Song

You don't need to buy a karaoke disc or wait for someone to upload your favorite track. With AI vocal removal and automatic lyric transcription, you can turn almost any song into a polished, lyric-synced karaoke version in about ten minutes, for free.

🎤 Removes the lead vocal cleanly · 📝 Auto-synced LRC lyrics · 🎚️ Pitch-shift to your range

What "making a karaoke track" actually means

A karaoke version is really two things working together: an instrumental (the song with the lead vocal removed) and a set of time-synced lyrics that scroll so you know when to come in. Older karaoke systems shipped these as proprietary MIDI or CDG files. Today you can build the same thing from any recording you have the rights to use, because two problems that used to be hard (separating the voice from the music, and lining the lyrics up with the singing) are now handled well by AI.

The AI karaoke maker on AIVoiceSeparator handles both in one pass. It runs a three-model ensemble to strip the vocal, and it can run Whisper on the isolated vocal stem to generate a karaoke-ready .lrc file with timestamps. The rest of this guide walks through the full workflow, including how to pitch-shift the result to your own range and which players actually read karaoke lyric files.

Make a karaoke track from any song

🎤 Open the karaoke maker

Free 1 song/day · no signup · Patreon Pro = 20 songs/day

Step 1: Add your song (upload or paste a URL)

Open the AIVoiceSeparator app and choose how to add your track. You can drag in an audio file (MP3, WAV, FLAC, M4A and more, up to 100 MB / 15 minutes), or switch to the URL tab and paste a link. URL support covers YouTube, YouTube Music, SoundCloud and TikTok, which is handy if the song you want lives on YouTube rather than as a file on your disk. The server downloads the audio with yt-dlp, so you never have to fish an MP3 out of a sketchy converter site first.
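If you want to check a file against those upload limits before you send it, something like the following would do. This is a minimal sketch, not the service's own validation: it assumes a WAV input (so Python's standard library can read the duration), it reads "100 MB" as 100 × 1024 × 1024 bytes, and the function name is made up for illustration.

```python
import os
import wave

MAX_BYTES = 100 * 1024 * 1024   # assumption: "100 MB" read as 100 MiB
MAX_SECONDS = 15 * 60           # the 15-minute limit from this guide


def check_wav_limits(path):
    """Return a list of problems; an empty list means the file fits."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 100 MB")
    with wave.open(path, "rb") as wav:
        # Duration = number of frames divided by frames per second.
        seconds = wav.getnframes() / wav.getframerate()
    if seconds > MAX_SECONDS:
        problems.append("audio is longer than 15 minutes")
    return problems
```

An empty list means the file is within both limits; compressed formats such as MP3 or M4A would need a third-party library to read the duration.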

For the cleanest karaoke result, start from the highest-quality source you can. A 320 kbps stream or a lossless file gives the separation model more detail to work with than a tinny 96 kbps rip, and that difference is audible in the final instrumental.

Step 2: Turn on "Generate lyrics" before you process

This is the step most people skip and then wish they hadn't. Before you hit the separate button, enable the Generate lyrics toggle. When it's on, the service runs Whisper (an open speech-recognition model) on the isolated vocal stem, not the full mix. Running transcription on the clean vocal (instead of the original song) is the trick that makes the timing accurate: there's no instrumentation for the model to mishear as words.
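To make the idea concrete, here is a minimal sketch of how timed transcription segments could be turned into LRC lines. It assumes segments shaped like the ones the open-source openai-whisper package returns from transcribe(), meaning dicts with a "start" time in seconds and a "text" string. The function names are hypothetical, and this is an illustration, not AIVoiceSeparator's actual code.

```python
def format_lrc_timestamp(seconds):
    """Format a time in seconds as an LRC tag such as [01:14.30]."""
    centis = round(seconds * 100)
    minutes, centis = divmod(centis, 6000)
    secs, centis = divmod(centis, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def segments_to_lrc(segments):
    """Build LRC text from segments that carry 'start' and 'text' keys."""
    lines = []
    for segment in segments:
        text = segment["text"].strip()
        if text:  # skip empty segments so the player shows no blank lines
            lines.append(format_lrc_timestamp(segment["start"]) + text)
    return "\n".join(lines)


# Hand-written sample segments, standing in for a real transcription.
sample = [
    {"start": 12.5, "text": " First line of the verse"},
    {"start": 74.3, "text": " First line of the chorus"},
]
print(segments_to_lrc(sample))
```

Each lyric line starts at the moment its segment begins, which is all a basic LRC player needs.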

You get three files back, among them the karaoke-ready .lrc file and an SRT subtitle file.

Whisper supports a wide range of languages, so this works for Thai, Japanese, Korean, Chinese, Spanish and dozens more; see the multi-language karaoke section below.

Step 3: Separate the song

Click separate and let the AI work. AIVoiceSeparator's Studio mode runs a weighted ensemble of three models (BS-Roformer, Mel-Band Roformer and MDX23C) rather than a single network. Ensembling smooths out the artifacts any one model would leave behind, and the result measures about 12.97 dB SDR (signal-to-distortion ratio, where higher is better), a meaningful step above the older Demucs baseline. A typical five-minute song finishes in about six minutes on the GPU. If you're curious how that separation actually works under the hood, see our explainer on how AI vocal separation works.
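For a rough feel of what "weighted ensemble" and "SDR" mean, here is a small sketch in plain Python. It assumes the simplest possible blend, a weighted average of each model's waveform sample by sample, and the basic textbook SDR definition. How the service actually combines its models, the weights it uses, and the evaluation toolkit behind the published figure are not stated in this guide, so treat every name and number below as hypothetical.

```python
import math


def weighted_ensemble(estimates, weights):
    """Blend equal-length stem estimates with a weighted average per sample."""
    if len(estimates) != len(weights):
        raise ValueError("need one weight per estimate")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    length = len(estimates[0])
    if any(len(estimate) != length for estimate in estimates):
        raise ValueError("estimates must have the same length")
    return [
        sum(w * estimate[i] for w, estimate in zip(weights, estimates)) / total
        for i in range(length)
    ]


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in dB: signal energy over error energy."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error == 0:
        return float("inf")  # a perfect estimate has no distortion
    return 10 * math.log10(signal / error)


# Toy data: three "models" that each miss the true instrumental differently.
truth = [0.5, -0.5, 0.25, -0.25]
model_outputs = [
    [0.6, -0.5, 0.25, -0.25],
    [0.5, -0.6, 0.25, -0.25],
    [0.5, -0.5, 0.35, -0.25],
]
blend = weighted_ensemble(model_outputs, [1.0, 1.0, 1.0])
print(sdr_db(truth, model_outputs[0]), sdr_db(truth, blend))
```

In this toy case each model makes its error in a different place, so averaging shrinks every error and the blend scores a higher SDR than any single model. Real ensembles gain less, because the models' mistakes overlap.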

While the job runs, the service also detects the song's BPM and musical key. Both are genuinely useful for karaoke, because they tell you the tempo you'll be singing to and whether the key sits comfortably in your range.

Step 4: Download the instrumental (and the LRC)

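When the job completes, preview the stems in the browser, then download what you need: the instrumental (available as a lossless WAV) and, if you turned on Generate lyrics, the .lrc file.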

If your goal is purely the backing track and you don't care about lyrics, the dedicated instrumental extractor does the same separation with a workflow tuned for that one output.

Step 5: Pitch-shift the instrumental to your range (optional)

The original key is whatever the artist recorded in, and that's not always where your voice lives. If the chorus is screaming out of your reach, transpose the whole instrumental down a few semitones; if it sits too low and sounds muddy, nudge it up. Two or three semitones in either direction is usually enough to move a song into a comfortable range without sounding obviously processed.
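The arithmetic behind a semitone shift is simple: in equal temperament, each semitone multiplies frequency by the twelfth root of two. The sketch below shows that ratio and what a shift does to the detected key. The function names are illustrative, and the key helper only understands sharp-style note names.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def semitone_ratio(semitones):
    """Frequency multiplier for a shift of `semitones` (negative = down)."""
    return 2 ** (semitones / 12)


def transpose_key(key, semitones):
    """Return the note name reached by moving `key` by `semitones`."""
    index = NOTE_NAMES.index(key)
    return NOTE_NAMES[(index + semitones) % 12]


# A song detected in A, dropped two semitones, lands in G.
print(transpose_key("A", -2), round(semitone_ratio(-2), 4))
```

Dropping two semitones lowers every frequency by about 11 percent, which is why small shifts usually stay natural-sounding while large ones start to sound processed.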

Most modern karaoke players (KaraFun, Walaoke) have a built-in key control, so you can change pitch on the fly without re-rendering anything. If you'd rather bake the new key into the file, any audio editor (Audacity is free) can pitch-shift while preserving tempo. Because you downloaded a lossless WAV in the previous step, you can pitch-shift without stacking the lossy artifacts you'd get from re-encoding an MP3 over and over.

Step 6: Play it in a karaoke app

An LRC file only does its job if your player knows how to read it. The convention almost every player follows is simple: name the lyric file the same as the audio file and keep them in the same folder, for example song.mp3 next to song.lrc. Here are the common options:
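If you keep a whole folder of karaoke tracks, a short script can confirm that every audio file has its same-name lyric file. This is a minimal sketch; the function name and the list of audio extensions are assumptions chosen for illustration.

```python
from pathlib import Path

# Assumed set of audio extensions; extend it to match your own library.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a"}


def find_missing_lyrics(folder):
    """Return names of audio files that have no same-name .lrc beside them."""
    missing = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in AUDIO_EXTENSIONS:
            if not path.with_suffix(".lrc").exists():
                missing.append(path.name)
    return missing
```

Calling find_missing_lyrics on your karaoke folder returns the audio files a player would show without lyrics.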

🎬 VLC

The free, cross-platform standby. With same-name LRC files (and a lyrics extension enabled) VLC scrolls synced lyrics over your instrumental. Works on Windows, macOS, Linux, Android and iOS.

🎹 KaraFun

A purpose-built karaoke player with on-the-fly key and tempo controls and a big highlight-style lyric display. Great for living-room karaoke nights.

🎤 Walaoke

A lightweight Windows karaoke player popular for home setups; loads your instrumental plus the matching LRC and shows scrolling, color-highlighted lyrics.

🎵 MiniLyrics

A lyrics plug-in that hooks into players like foobar2000 and reads LRC timing, displaying synced lyrics as the track plays.

If you just want lyrics burned onto a video for a karaoke screen, use the SRT file instead and add it as a subtitle track in any video player or editor.
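The two formats differ mainly in how they mark time: LRC gives each line a start time only, while SRT gives every cue a start and an end. If you only have an LRC on hand, the sketch below shows one way the conversion could work. It is illustrative only: the function names are made up, each line is assumed to stay on screen until the next one starts, and the four-second duration for the final line is an arbitrary assumption.

```python
import re

# Matches tags like [01:14.30]; metadata tags such as [ar:Artist] are ignored.
LRC_LINE = re.compile(r"\[(\d+):(\d{2})(?:\.(\d{1,3}))?\](.*)")


def srt_time(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    seconds, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def lrc_to_srt(lrc_text, last_line_ms=4000):
    """Convert LRC text to SRT text; each cue ends where the next begins."""
    cues = []
    for line in lrc_text.splitlines():
        match = LRC_LINE.match(line.strip())
        if not match:
            continue
        minutes, seconds, fraction, text = match.groups()
        # Pad the fraction to three digits so ".30" means 300 ms.
        fraction_ms = int((fraction or "0").ljust(3, "0"))
        start = (int(minutes) * 60 + int(seconds)) * 1000 + fraction_ms
        if text.strip():
            cues.append((start, text.strip()))
    blocks = []
    for i, (start, text) in enumerate(cues):
        end = cues[i + 1][0] if i + 1 < len(cues) else start + last_line_ms
        blocks.append(f"{i + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}")
    return "\n\n".join(blocks) + ("\n" if blocks else "")
```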

Tips for clean karaoke results

Multi-language karaoke (Thai, Japanese, Korean and more)

One of the biggest advantages of building karaoke from AI rather than relying on a karaoke catalog is language coverage. Commercial karaoke libraries are deep for English and a handful of major markets and thin everywhere else. Because the lyrics here come from Whisper, the workflow handles Thai, Japanese, Korean, Mandarin, Cantonese, Spanish, Indonesian, Vietnamese and dozens of other languages, including songs that no karaoke service has ever produced.

The vocal-removal step is language-agnostic: the separation model doesn't care what's being sung, only that it's a human voice in the mix. So a Thai luk thung ballad or a J-pop single separates exactly as well as an English chart hit. For non-Latin scripts, double-check the transcription, since rare words and stylized spellings are where automatic transcription is most likely to slip.

A quick legal note

Making a karaoke version for your own practice or a private get-together is generally treated as personal use. Selling karaoke tracks you built from someone else's recording, uploading them publicly, or performing them commercially requires the rights holders' permission and is a different matter. You are responsible for having the rights to whatever you process. See our terms of use for the full picture. On the privacy side: every job (your upload and the stems we create) is deleted automatically after 24 hours, and your audio is never used to train AI models.

Frequently asked questions

Is making a karaoke track really free?

Yes. Anonymous users get 1 song per day at full Studio quality, including the lyric generation. Patreon Pro raises the limit to 20 songs per day with priority queueing.

What exactly is an LRC file?

It's a plain-text lyrics file where each line is prefixed with a timestamp like [01:14.30]. Karaoke players read those timestamps to scroll and highlight lyrics in sync with the music.
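Here is a rough sketch of what a player does with those timestamps: parse each tag into seconds, then look up which line should be highlighted at the current playback position. The function names are illustrative, and real players handle more cases (offsets, word-level timing) than this does.

```python
import bisect
import re

# Matches time tags like [01:14.30]; metadata tags such as [ar:Artist] do not match.
TIMESTAMP = re.compile(r"\[(\d+):(\d{2}(?:\.\d+)?)\]")


def parse_lrc(text):
    """Return (seconds, lyric) pairs sorted by time."""
    entries = []
    for line in text.splitlines():
        stamps = TIMESTAMP.findall(line)
        lyric = TIMESTAMP.sub("", line).strip()
        # A line may carry several tags when the same lyric repeats.
        for minutes, seconds in stamps:
            entries.append((int(minutes) * 60 + float(seconds), lyric))
    return sorted(entries)


def current_line(entries, position):
    """Return the lyric to highlight at `position` seconds into the song."""
    times = [time for time, _ in entries]
    index = bisect.bisect_right(times, position) - 1
    return entries[index][1] if index >= 0 else ""


lyrics = parse_lrc("[00:10.00]First line\n[01:14.30]Second line")
print(current_line(lyrics, 80))
```

The lookup picks the last line whose timestamp is at or before the playback position, so nothing is highlighted until the first line begins.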

Can I change the key to fit my voice?

Yes. Download the lossless instrumental and either use your karaoke player's built-in key control or pitch-shift it in a free editor like Audacity. Two to three semitones is usually plenty.

Will the instrumental have any leftover vocals?

The three-model ensemble removes the lead vocal cleanly on most studio tracks. Dense backing-vocal stacks and live recordings can leave faint traces; results vary by song.

Does this work for non-English songs?

Yes. Vocal removal is language-agnostic, and Whisper transcribes Thai, Japanese, Korean, Chinese, Spanish and many more languages for the lyric file.

How long do you keep my files?

Every job is deleted after 24 hours. We never use your audio for AI training.


Ready to sing? Build your karaoke track now

Free, no signup, no watermark: 1 song every 24 hours