Extract .srt “quickly” from large .mkv file

Hey everyone,

I’m dealing with a scenario involving 100 .mkv files, each around 50GB, and each of these files contains 3-5 .srt files in different languages. The extraction methods I’ve used so far take about 1 hour to extract all the .srt files for just one .mkv file.

I’m reaching out for your expertise on the best approach to solve this efficiently. If you could provide a step-by-step guide or share any insights and suggestions, I would greatly appreciate it.

Thanks in advance for your help!

Welcome!

In a Matroska file, the frames of all tracks are spread out over the whole file & interleaved by their timestamps. Furthermore, indexes that map frames to their positions in the file exist only for some tracks, usually only for the video track. That means that it’s impossible to determine where all the subtitle frames are without reading the whole file.

In short: you cannot really speed this process up.

Thank you for your quick reply!

Do you have any suggestions on how to solve this dilemma? Is an hour a reasonable time to perform this task for that file size, or have I just not found the right method yet? :slight_smile:

Extraction speed depends very much on your storage speed (spinning HDD vs SSD vs USB thumb drive-style flash memory), the way the storage is attached (SATA/M.2 vs USB vs network-attached), what else is demanding I/O at the same time, and other factors: storage encryption combined with a weak CPU will yield low speeds, as will HDDs using shingled recording (SMR), and if your disks are in a RAID, the RAID level matters too.

Granted, mkvextract & the whole of MKVToolNix aren’t known for their speed. However, one hour per 50 GB sounds really long. I’ve just taken a random Blu-ray with a 38 GB playlist item and created a 38 GB Matroska file from it. Extracting the subs from it takes 5.5 minutes. This file is located on a locally attached (SATA) RAID 6 of non-shingled, 7200 RPM HDDs; definitely not the fastest type of storage. I’ve also made sure to clear all OS-level caches before extraction.
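To put the two timings side by side, here is a rough back-of-the-envelope sketch of the average read throughput each one implies. It assumes 1 GB = 1000 MB and uses integer division, so the results are rounded down; `throughput_mb_s` is just a helper name made up for this illustration.

```shell
# Average throughput in MB/s for a given size (GB) and duration (seconds).
# Assumption: 1 GB = 1000 MB; shell integer division rounds down.
throughput_mb_s() {
  size_gb=$1
  seconds=$2
  echo $(( size_gb * 1000 / seconds ))
}

throughput_mb_s 38 330    # 38 GB in 5.5 minutes -> 115
throughput_mb_s 50 3600   # 50 GB in 1 hour      -> 13
```

Roughly 115 MB/s is in the range a spinning-disk array can sustain for sequential reads, while about 13 MB/s points at a bottleneck somewhere in the storage path or in the extraction method.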

You said “the extraction methods I’ve used so far” without specifying what they were. One possible issue could be that you’re running mkvextract once for each track, extracting a single track at a time, instead of running it only once & extracting all the tracks within a single run. In effect, don’t do this:

mkvextract source.mkv tracks 2:t02.srt
mkvextract source.mkv tracks 3:t03.srt
mkvextract source.mkv tracks 4:t04.srt

Do this instead:

mkvextract source.mkv tracks 2:t02.srt 3:t03.srt 4:t04.srt

The difference should be obvious, given what I said about how this all works: in the first example the program has to read 150 GB from your drives vs 50 GB in the second example.
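For a batch of files, the single-run approach could be wrapped roughly like this. It is only a sketch under a few assumptions: `srt_track_ids` and `extract_tracks` are hypothetical helper names, the output naming scheme (`<name>.t<ID>.srt`) is arbitrary, and the track list is scraped from the English text output of `mkvmerge --identify`, where SRT tracks show up as lines like `Track ID 2: subtitles (SubRip/SRT)`. For anything serious, mkvmerge's JSON identification output is the more robust thing to parse.

```shell
# Read `mkvmerge --identify` text output on stdin and print the IDs of
# SubRip/SRT subtitle tracks, one per line.
# Assumption: English output with lines such as
#   Track ID 2: subtitles (SubRip/SRT)
srt_track_ids() {
  sed -n 's/^Track ID \([0-9][0-9]*\): subtitles (SubRip\/SRT).*$/\1/p'
}

# Extract the given track IDs from one file in a SINGLE mkvextract run,
# so the file is only read once.
# Usage: extract_tracks source.mkv 2 3 4
# MKVEXTRACT may be overridden (e.g. MKVEXTRACT=echo for a dry run).
extract_tracks() {
  src=$1
  shift
  [ "$#" -gt 0 ] || return 0          # nothing to extract
  base=${src%.*}                      # file name without its extension
  # Replace each ID in the argument list with an "ID:output-file" spec.
  for id in "$@"; do
    set -- "$@" "${id}:${base}.t${id}.srt"
    shift
  done
  "${MKVEXTRACT:-mkvextract}" "$src" tracks "$@"
}

# Possible use for a directory of files (not run here):
#   for f in *.mkv; do
#     ids=$(mkvmerge --identify "$f" | srt_track_ids)
#     extract_tracks "$f" $ids
#   done
```

Running it with `MKVEXTRACT=echo` first prints the command that would be executed, which is a cheap way to check the track IDs and file names before committing to a full read of a 50 GB file.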

I’ve tried “MKV Extract” and “MKVToolNix Batch Tool”.

I will try and look into “mkvextract” and follow your tips. Will get back when I’ve experimented :slight_smile:

Neither of those two is one of my own tools. In fact, the “MKV Extract” you’ve linked to states that it’s using a WASM port of ffmpeg, meaning it doesn’t have anything to do with MKVToolNix at all. I definitely cannot help you with it.

Understood! But I will try your MKV Extract and see how it goes :slight_smile:

Is it possible to run mkvextract on a Mac? If yes, how do I install it?

If it only works for Windows, is there an installation guide somewhere? :slight_smile:

mkvextract is part of MKVToolNix. There are several ways to use it on a Mac, including but not limited to:

  1. Download the disk image I provide & mount it. While mounted, you can run it as /Volumes/MKVToolNix-*/MKVToolNix-*.app/Contents/MacOS/mkvextract <arguments>
  2. You can install MKVToolNix with the Homebrew package manager. Afterwards you should be able to run it just by typing mkvextract <arguments>
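
The two options above could be combined in a small helper that looks for mkvextract on the PATH first (as it would be after `brew install mkvtoolnix`) and falls back to a mounted disk image otherwise. This is a sketch only; `find_mkvextract` is a made-up name, and the fallback path is simply the pattern from option 1.

```shell
# Print the path of a usable mkvextract binary, or return 1 if none is found.
find_mkvextract() {
  # Option 2: installed via a package manager and available on the PATH.
  if command -v mkvextract >/dev/null 2>&1; then
    command -v mkvextract
    return 0
  fi
  # Option 1: run it straight from the mounted disk image.
  for candidate in /Volumes/MKVToolNix-*/MKVToolNix-*.app/Contents/MacOS/mkvextract; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Possible use (not run here):
#   mkvextract_bin=$(find_mkvextract) || echo "mkvextract not found" >&2
#   "$mkvextract_bin" source.mkv tracks 2:t02.srt 3:t03.srt 4:t04.srt
```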