How to Make a Karaoke Version of Any Song

You don't need to buy a karaoke disc or wait for someone to upload your favorite track. With AI vocal removal and automatic lyric transcription, you can turn almost any song into a polished, lyric-synced karaoke version in about ten minutes, for free.

🎤 Removes the lead vocal cleanly 📝 Auto-synced LRC lyrics 🎚️ Pitch-shift to your range

What "making a karaoke track" actually means

A karaoke version is really two things working together: an instrumental (the song with the lead vocal removed) and a set of time-synced lyrics that scroll so you know when to come in. Older karaoke systems shipped these as proprietary MIDI or CDG files. Today you can build the same thing from any recording you have the rights to use, because two problems that used to be hard (separating the voice from the music, and lining the lyrics up with the music) are now handled well by AI.

The AI karaoke maker on AIVoiceSeparator handles both in one pass. It runs a three-model ensemble to strip the vocal, and it can run Whisper on the isolated vocal stem to generate a karaoke-ready .lrc file with timestamps. The rest of this guide walks through the full workflow, including how to pitch-shift the result to your own range and which players actually read karaoke lyric files.

Step 1 — Add your song (upload or paste a URL)

Open the AIVoiceSeparator app and choose how to add your track. You can drag in an audio file (MP3, WAV, FLAC, M4A and more, up to 100 MB / 15 minutes), or switch to the URL tab and paste a link. URL support covers YouTube, YouTube Music, SoundCloud and TikTok, which is handy if the song you want lives online rather than as a file on your disk. The server downloads the audio with yt-dlp, so you never have to fish an MP3 out of a sketchy converter site first.
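
If you want to check a file against those upload limits before dragging it in, a minimal sketch looks like the one below. It only handles WAV files (via Python's standard wave module), and the function names and the reading of "100 MB" as 100 MiB are assumptions made for illustration, not something the service documents.

```python
import os
import wave

MAX_BYTES = 100 * 1024 * 1024   # assumption: "100 MB" is read as 100 MiB
MAX_SECONDS = 15 * 60           # the 15-minute limit mentioned above

def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def within_upload_limits(path):
    """True if the WAV file is under both the size and the length limit."""
    size_ok = os.path.getsize(path) <= MAX_BYTES
    length_ok = wav_duration_seconds(path) <= MAX_SECONDS
    return size_ok and length_ok
```

For compressed formats such as MP3 or M4A you would need a different way to read the duration; the size check works for any file.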

For the cleanest karaoke result, start from the highest-quality source you can. A 320 kbps stream or a lossless file gives the separation model more detail to work with than a tinny 96 kbps rip, and that difference is audible in the final instrumental.

Step 2 — Turn on "Generate lyrics" before you process

This is the step most people skip and then wish they hadn't. Before you hit the separate button, enable the Generate lyrics toggle. When it's on, the service runs Whisper — an open speech-recognition model — on the isolated vocal stem, not the full mix. Running transcription on the clean vocal (instead of the original song) is the trick that makes the timing accurate: there's no instrumentation for the model to mishear as words.

You get three files back:

SRT — standard video-subtitle format, with start/end timestamps. Good for putting lyrics on a video.
LRC — the karaoke format. Each line is tagged with a [mm:ss.xx] timestamp so a player can scroll and highlight lyrics in sync with the music.
TXT — plain text, no timing. Useful for printing a lyric sheet or proofreading.
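
To make the LRC format concrete, here is a small sketch of how timed transcription segments could be turned into LRC lines. It assumes segments shaped like the ones the open-source whisper Python package returns from model.transcribe (a list of dicts with "start", "end" and "text"); whether the service builds its files this way is not stated, and the function names are made up for the example.

```python
def lrc_timestamp(seconds):
    """Format a time in seconds as the [mm:ss.xx] tag used in LRC files."""
    centis = int(round(seconds * 100))
    minutes, rest = divmod(centis, 6000)
    return f"[{minutes:02d}:{rest // 100:02d}.{rest % 100:02d}]"

def segments_to_lrc(segments):
    """Build LRC text from segments with 'start' and 'text' keys."""
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        if text:  # skip empty segments
            lines.append(f"{lrc_timestamp(seg['start'])}{text}")
    return "\n".join(lines) + "\n"

# Hypothetical usage with the whisper package (not run here):
#   import whisper
#   result = whisper.load_model("base").transcribe("vocals.wav")
#   lrc_text = segments_to_lrc(result["segments"])
```

Only the start time of each segment is used, because an LRC line carries a single timestamp; the end times matter for the SRT file instead.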

Whisper supports a wide range of languages, so this works for Thai, Japanese, Korean, Chinese, Spanish and dozens more — more on multi-language karaoke below.

Step 3 — Separate the song

Click separate and let the AI work. AIVoiceSeparator's Studio mode runs a weighted ensemble of three models (BS-Roformer, Mel-Band Roformer and MDX23C) rather than a single network. Ensembling smooths out the artifacts any one model would leave behind, and the result is measured at about 12.97 dB SDR (signal-to-distortion ratio, where higher means a cleaner separation), a meaningful step above the older Demucs baseline. A typical five-minute song finishes in about six minutes on the GPU. If you're curious how that separation actually works under the hood, see our explainer on how AI vocal separation works.
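
As a rough illustration of the two ideas in that paragraph, the sketch below blends several model outputs with a weighted average and scores an estimate with a basic signal-to-distortion ratio. The weights, the sample-domain blending and the simple SDR definition are all assumptions for the example; the text does not say how the real ensemble combines its models or how the 12.97 dB figure was measured.

```python
import math

def weighted_ensemble(estimates, weights):
    """Blend same-length lists of samples with a weighted average."""
    total = sum(weights)
    return [
        sum(w * est[i] for est, w in zip(estimates, weights)) / total
        for i in range(len(estimates[0]))
    ]

def sdr_db(reference, estimate):
    """Basic SDR in dB: 10 * log10(|s|^2 / |s - s_hat|^2)."""
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error == 0:
        return math.inf  # a perfect estimate has no distortion
    return 10 * math.log10(signal / error)
```

In the toy case where one model overshoots the true signal by 10% and another undershoots by 10%, an equal-weight blend cancels the two errors, which is the intuition behind ensembling. Each model on its own scores 20 dB in that case.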

While the job runs, the service also detects the song's BPM and musical key — both genuinely useful for karaoke, because they tell you the tempo you'll be singing to and whether the key sits comfortably in your range.

Step 4 — Download the instrumental (and the LRC)

When the job completes, preview the stems in the browser, then download what you need:

The instrumental — this is your karaoke backing track. Grab it as lossless WAV or FLAC if you plan to pitch-shift or remix later, or MP3 320 kbps if you just want to sing along.
The .lrc file — your synced lyrics.
Optionally the vocal stem — useful as a reference guide track while you learn the melody.

If your goal is purely the backing track and you don't care about lyrics, the dedicated instrumental extractor does the same separation with a workflow tuned for that one output.

Step 5 — Pitch-shift the instrumental to your range (optional)

The original key is whatever the artist recorded in, and that's not always where your voice lives. If the chorus is screaming out of your reach, transpose the whole instrumental down a few semitones; if it sits too low and sounds muddy, nudge it up. Two or three semitones in either direction is usually enough to move a song into a comfortable range without sounding obviously processed.
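
If you know the detected key, the arithmetic behind a transposition is simple enough to sketch. The helper names below are illustrative, and the note table uses sharps only, so a flat key such as Bb would have to be written as A#.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def transpose_key(key, semitones):
    """Shift a key name such as 'A' or 'F#' by a number of semitones."""
    return NOTES[(NOTES.index(key) + semitones) % 12]

def pitch_ratio(semitones):
    """Frequency ratio of a shift in equal temperament: 2 ** (n / 12)."""
    return 2 ** (semitones / 12)
```

For example, transpose_key("A", -3) gives "F#", and pitch_ratio(-3) is the factor by which every frequency in the instrumental is scaled for that shift.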

Most modern karaoke players (KaraFun, Walaoke) have a built-in key control, so you can change pitch on the fly without re-rendering anything. If you'd rather bake the new key into the file, any audio editor (Audacity is free) can pitch-shift while preserving tempo. If you downloaded a lossless WAV or FLAC in the previous step, you can pitch-shift without stacking the lossy artifacts you'd get from re-encoding an MP3 over and over.

Step 6 — Play it in a karaoke app

An LRC file only does its job if your player knows how to read it. The convention almost every player follows is simple: name the lyric file the same as the audio file and keep them in the same folder — song.mp3 next to song.lrc. Here are the common options:

🎬 VLC

The free, cross-platform standby. With same-name LRC files (and a lyrics extension enabled) VLC scrolls synced lyrics over your instrumental. Works on Windows, macOS, Linux, Android and iOS.

🎹 KaraFun

A purpose-built karaoke player with on-the-fly key and tempo controls and a big highlight-style lyric display. Great for living-room karaoke nights.

🎤 Walaoke

A lightweight Windows karaoke player popular for home setups; loads your instrumental plus the matching LRC and shows scrolling, color-highlighted lyrics.

🎵 MiniLyrics

A lyrics plug-in that hooks into players like foobar2000 and reads LRC timing, displaying synced lyrics as the track plays.

If you just want lyrics shown on a video for a karaoke screen, use the SRT file instead: add it as a subtitle track in a video player, or burn it into the picture with a video editor.

Tips for clean karaoke results

Start from a clean studio recording. Live versions, acoustic covers with crowd noise, and heavily mastered loud tracks are harder to separate cleanly. A standard studio mix gives the cleanest instrumental.
Watch for backing vocals. Vocal removal targets all vocals, so dense harmony stacks and gang-vocal choruses sometimes leave faint residue or, conversely, pull out harmonies you wanted to keep. There's no perfect answer — listen and pick the result you prefer.
Proofread the LRC. Whisper is strong but not flawless on slang, proper nouns and fast rap. Open the TXT or LRC in any text editor and fix the handful of words it missed.
Keep lossless until the end. Do your pitch-shifting and editing on the WAV/FLAC, then export to MP3 only as the final step.
Match the lyric file name to the audio. The single most common reason lyrics "don't show up" is a filename mismatch.
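
The filename rule in the last tip is easy to check automatically. This is a minimal sketch that lists audio files in a folder with no same-name .lrc beside them; the function name and the set of extensions are choices made for the example.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a"}

def missing_lyrics(folder):
    """Return names of audio files in `folder` that lack a same-name .lrc file."""
    missing = []
    for path in sorted(Path(folder).iterdir()):
        is_audio = path.suffix.lower() in AUDIO_EXTENSIONS
        if is_audio and not path.with_suffix(".lrc").exists():
            missing.append(path.name)
    return missing
```

Whether song.LRC counts as a match for song.mp3 depends on the file system's case sensitivity, so sticking to lowercase extensions is the safest habit.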

Multi-language karaoke (Thai, Japanese, Korean and more)

One of the biggest advantages of building karaoke from AI rather than relying on a karaoke catalog is language coverage. Commercial karaoke libraries are deep for English and a handful of major markets and thin everywhere else. Because the lyrics here come from Whisper, the workflow handles Thai, Japanese, Korean, Mandarin, Cantonese, Spanish, Indonesian, Vietnamese and dozens of other languages — including songs that no karaoke service has ever produced.

The vocal-removal step is language-agnostic: the separation model doesn't care what's being sung, only that it's a human voice in the mix. So a Thai luk thung ballad or a J-pop single separates exactly as well as an English chart hit. For non-Latin scripts, double-check the transcription, since rare words and stylized spellings are where automatic transcription is most likely to slip.

A quick legal note

Making a karaoke version for your own practice or a private get-together is generally treated as personal use. Selling karaoke tracks you built from someone else's recording, uploading them publicly, or performing them commercially requires the rights holders' permission and is a different matter. You are responsible for having the rights to whatever you process. See our terms of use for the full picture. On the privacy side: every job (your upload and the stems we create) is deleted automatically after 24 hours, and your audio is never used to train AI models.

Frequently asked questions

Is making a karaoke track really free?

Yes. Anonymous users get 3 songs per month at full Studio quality, including the lyric generation. Patreon Pro raises the limit to 2 songs per day with priority queueing.

What exactly is an LRC file?

It's a plain-text lyrics file where each line is prefixed with a timestamp like [01:14.30]. Karaoke players read those timestamps to scroll and highlight lyrics in sync with the music.
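
For the curious, reading such a line back is a few lines of code. The sketch below is an illustration of the format, not any particular player's parser: it handles one timestamp per line and returns None for metadata tags such as [ar:Artist].

```python
import re

# [mm:ss] or [mm:ss.xx] followed by the lyric text
LRC_LINE = re.compile(r"\[(\d+):(\d{2})(?:\.(\d{1,3}))?\](.*)")

def parse_lrc_line(line):
    """Return (seconds, text) for a timed lyric line, or None otherwise."""
    match = LRC_LINE.match(line.strip())
    if not match:
        return None
    minutes, seconds, fraction, text = match.groups()
    total = int(minutes) * 60 + int(seconds)
    if fraction:
        total += int(fraction) / 10 ** len(fraction)
    return total, text.strip()
```

A player does essentially this for every line, then highlights whichever line has the latest timestamp that is not after the current playback position.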

Can I change the key to fit my voice?

Yes. Download the lossless instrumental and either use your karaoke player's built-in key control or pitch-shift it in a free editor like Audacity. Two to three semitones is usually plenty.

Will the instrumental have any leftover vocals?

The three-model ensemble removes the lead vocal cleanly on most studio tracks. Dense backing-vocal stacks and live recordings can leave faint traces; results vary by song.

Does this work for non-English songs?

Yes. Vocal removal is language-agnostic, and Whisper transcribes Thai, Japanese, Korean, Chinese, Spanish and many more languages for the lyric file.

How long do you keep my files?

Every job is deleted after 24 hours. We never use your audio for AI training.