I’m dealing with a scenario involving 100 .mkv files, each around 50 GB, and each of these files contains 3-5 subtitle tracks (.srt) in different languages. The extraction methods I’ve used so far take about 1 hour to extract all the .srt files from just one .mkv file.
I’m reaching out for your expertise on the best approach to solve this efficiently. If you could provide a step-by-step guide or share any insights and suggestions, I would greatly appreciate it.
In a Matroska file all frames of all tracks are spread out over the whole file & interleaved by their timestamps. Furthermore, indexes that map frames to their positions in the file only exist for some tracks, usually just the video track. That means it’s impossible to determine where all the subtitle frames are without reading the whole file.
In short: you cannot really speed this process up.
Do you have any suggestion on how to solve the dilemma? Is an hour a reasonable time to perform this task for that file size or have I just not found the right method yet?
Extraction speed very much depends on your storage speed (spinning HDD vs SSD vs USB thumb drive-style flash memory), on how the storage is attached (SATA/M.2 vs USB vs network-attached), and on what else is demanding I/O bandwidth at the same time. Other factors play a role as well: storage encryption combined with a weak CPU will yield low speeds, the HDD’s recording technology matters, and so does whether your disks are in a RAID and if so at which RAID level.
Granted, mkvextract & the whole of MKVToolNix aren’t known for their speed. However, one hour per 50 GB sounds really long. I’ve just taken a random Blu-ray with a 38 GB playlist item and created a 38 GB Matroska file from it. Extracting the subtitles from it took 5.5 minutes. That file is located on a locally attached (SATA) RAID 6 of non-perpendicular, 7200 RPM HDDs, definitely not the fastest type of storage, and I made sure to drop all OS-level caches before the extraction.
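For a rough sense of scale: 38 GB in 5.5 minutes works out to roughly 115 MB/s of sustained reading, whereas 50 GB in an hour corresponds to less than 15 MB/s of effective throughput. Those are only back-of-the-envelope numbers, but they show how far apart the two results are.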
You said “the extraction methods I’ve used so far” without specifying what those methods actually are. One possible issue could be that you’re running mkvextract once per track, extracting a single track at a time, instead of running it only once & extracting all the tracks within a single run. In effect, don’t do this:
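(The file name & track IDs below are only placeholders for illustration; the actual track IDs are shown by mkvmerge --identify movie.mkv.)

```
# one run per subtitle track: the whole 50 GB file is read three times
mkvextract movie.mkv tracks 2:english.srt
mkvextract movie.mkv tracks 3:german.srt
mkvextract movie.mkv tracks 4:french.srt
```

but rather this:

```
# a single run extracting all subtitle tracks: the file is read only once
mkvextract movie.mkv tracks 2:english.srt 3:german.srt 4:french.srt
```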
The difference should be obvious, given what I said above about how this all works: in the first example the program has to read 150 GB from your drives (three full passes over the same 50 GB file), in the second example only 50 GB.
Neither of those two is one of my own tools. In fact, the “MKV Extract” you’ve linked to states that it’s using a WASM port of ffmpeg, meaning it doesn’t have anything to do with MKVToolNix at all. I definitely cannot help you with it.
mkvextract is part of MKVToolNix. There are several ways to use it on a Mac, including but not limited to:
Download the disk image I provide & mount it. While it is mounted you can run mkvextract as /Volumes/MKVToolNix-*/MKVToolNix-*.app/Contents/MacOS/mkvextract <arguments>
You can install MKVToolNix with the Homebrew package manager. Afterwards you should be able to run it just by typing mkvextract <arguments>
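If you go the Homebrew route, it would look roughly like this (assuming the formula is still named mkvtoolnix):

```
# installs the command-line tools (mkvmerge, mkvextract, mkvinfo, etc.)
brew install mkvtoolnix
# quick check that the binary is on your PATH
mkvextract --version
```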
How do video players like VLC or PotPlayer play MKV files with subtitles without reading the entire file?
If they can seek to any position and display subtitles on demand, why can’t subtitle extraction tools do the same? Why can’t those tools seek like video players to avoid parsing the whole file?
Is it because of the following?
Subtitles are loaded lazily: the player only needs subtitles near the current playback time. It scans forward/backward locally (not the whole file) to find subtitle entries synchronized with the video.
Since subtitles lack their own indexes, their exact positions are unknown without parsing the entire interleaved data stream.
Indexes are completely optional in Matroska. mkvmerge does create them for subtitle tracks, but there’s no guarantee that all subtitle frames are actually listed in an index. Therefore relying on the indexes alone would risk missing entries during extraction.
Matroska groups a lot of content in what’s called a “cluster” (basically in order to save space on frame header metadata). Reading can only ever start at the beginning of such a cluster. Therefore the indexes refer to clusters, and that’s where players start playback when they seek (they often discard content from the cluster that occurs before the desired seek position). The effect is, though, that for subtitles that aren’t particularly sparse, all or most clusters have to be read anyway, meaning there wouldn’t be much to gain.
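If you want to see that structure for yourself, mkvinfo (also part of MKVToolNix) can print it; the file name is again just a placeholder:

```
# print the file's element structure (segment info, tracks, seek heads, etc.)
mkvinfo movie.mkv
# -v raises the verbosity level; higher levels include information about the clusters
mkvinfo -v movie.mkv
```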
Those two are technical reasons. There’s a third one: I’m completely uninterested in spending time on such functionality due to the technical reasons listed above.