I’m dealing with a scenario involving 100 .mkv files, each around 50 GB, and each of these files contains 3-5 subtitle tracks (.srt) in different languages. The extraction methods I’ve used so far take about 1 hour to extract all the .srt files from just one .mkv file.
I’m reaching out for your expertise on the best approach to solve this efficiently. If you could provide a step-by-step guide or share any insights and suggestions, I would greatly appreciate it.
In a Matroska file all frames of all tracks are spread out over the whole file & interleaved by their timestamps. Furthermore, indexes that map frames to their positions in the file only exist for some tracks, usually just the video track. That means it’s impossible to determine where all the subtitle frames are without reading the whole file.
In short: you cannot really speed this process up.
Do you have any suggestion on how to solve the dilemma? Is an hour a reasonable time to perform this task for that file size or have I just not found the right method yet?
Extraction speed very much depends on your storage speed (spinning HDD vs SSD vs USB thumb drive-style flash memory), on how the storage is attached (SATA/M.2 vs USB vs network-attached), and on what else is demanding I/O bandwidth at the same time. Other factors play a role as well: storage encryption combined with a weak CPU will yield low speeds, the HDD’s recording technology matters, and so does whether your disks are in a RAID and if so at which RAID level.
Granted, mkvextract & the whole of MKVToolNix aren’t known for their speed. However, one hour per 50 GB sounds really long. I’ve just taken a random Blu-ray with a 38 GB playlist item and created a 38 GB Matroska file from it. Extracting the subtitles from it took 5.5 minutes. That file is located on a locally attached (SATA) RAID 6 of non-perpendicular, 7200 RPM HDDs, definitely not the fastest type of storage, and I made sure to drop all OS-level caches before the extraction.
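For a rough sense of scale: 38 GB in 5.5 minutes works out to roughly 115 MB/s of sustained reading, whereas 50 GB in an hour corresponds to less than 15 MB/s of effective throughput. Those are only back-of-the-envelope numbers, but they show how far apart the two results are.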
You said “the extraction methods I’ve used so far” without specifying what those methods actually are. One possible issue could be that you’re running mkvextract once per track, extracting a single track at a time, instead of running it only once & extracting all the tracks within a single run. In effect, don’t do this:
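(The file name & track IDs below are only placeholders for illustration; the actual track IDs are shown by mkvmerge --identify movie.mkv.)

```
# one run per subtitle track: the whole 50 GB file is read three times
mkvextract movie.mkv tracks 2:english.srt
mkvextract movie.mkv tracks 3:german.srt
mkvextract movie.mkv tracks 4:french.srt
```

but rather this:

```
# a single run extracting all subtitle tracks: the file is read only once
mkvextract movie.mkv tracks 2:english.srt 3:german.srt 4:french.srt
```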
The difference should be obvious, given what I said above about how this all works: in the first example the program has to read 150 GB from your drives (three full passes over the same 50 GB file), in the second example only 50 GB.
Neither of those two is one of my own tools. In fact, the “MKV Extract” you’ve linked to states that it’s using a WASM port of ffmpeg, meaning it doesn’t have anything to do with MKVToolNix at all. I definitely cannot help you with it.
mkvextract is part of MKVToolNix. There are several ways to use it on a Mac, including but not limited to:
Download the disk image I provide & mount it. While it is mounted you can run mkvextract as /Volumes/MKVToolNix-*/MKVToolNix-*.app/Contents/MacOS/mkvextract <arguments>
You can install MKVToolNix with the Homebrew package manager. Afterwards you should be able to run it just by typing mkvextract <arguments>
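If you go the Homebrew route, it would look roughly like this (assuming the formula is still named mkvtoolnix):

```
# installs the command-line tools (mkvmerge, mkvextract, mkvinfo, etc.)
brew install mkvtoolnix
# quick check that the binary is on your PATH
mkvextract --version
```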
How do video players like VLC or PotPlayer play MKV files with subtitles without reading the entire file?
If they can seek to any position and display subtitles on demand, why can’t subtitle extraction tools do the same? Why can’t those tools seek like video players to avoid parsing the whole file?
Is it because of the following?
Subtitles are loaded lazily: the player only needs subtitles near the current playback time. It scans forward/backward locally (not the whole file) to find subtitle entries synchronized with the video.
Since subtitles lack their own indexes, their exact positions are unknown without parsing the entire interleaved data stream.
Indexes are completely optional in Matroska. mkvmerge does create them for subtitle tracks, but there’s no guarantee that all subtitle frames are actually listed in an index. Therefore relying on the indexes alone would risk missing entries during extraction.
Matroska groups a lot of content in what’s called a “cluster” (basically in order to save space on frame header metadata). Reading can only ever start at the beginning of such a cluster. Therefore the indexes refer to clusters, and that’s where players start playback when they seek (they often discard content from the cluster that occurs before the desired seek position). The effect is, though, that for subtitles that aren’t particularly sparse, all or most clusters have to be read anyway, meaning there wouldn’t be much to gain.
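If you want to see that structure for yourself, mkvinfo (also part of MKVToolNix) can print it; the file name is again just a placeholder:

```
# print the file's element structure (segment info, tracks, seek heads, etc.)
mkvinfo movie.mkv
# -v raises the verbosity level; higher levels include information about the clusters
mkvinfo -v movie.mkv
```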
Those two are technical reasons. There’s a third one: I’m completely uninterested in spending time on such functionality due to the technical reasons listed above.