How to detect a split seek head in EBML

I’m in the unenviable position of needing to edit the seek elements in the EBML header of an unknown number of files. The files disallow fast-forward/rewind in media servers like Emby, Jellyfin, and Plex. I’ve found that running mkvmerge in a script, or running mkclean, will make the needed change. However, I have many, many files, and while I could write a script to make the changes, I don’t know a speedy way to detect the files that need changing. Currently, I’m looking at GitHub scripts/software to edit EBML headers.
If anyone has any ideas, please let me know. Thank you.

Welcome!

I don’t have a solution, but I do have a couple points for you to consider.

The first point is about you saying that you need to know which files to process. That need doesn’t change with the tools you use. If you don’t like needing to know which file to modify for mkvmerge, then you won’t like having to know that with any other potential piece of software.

The second point is about how to detect such things. In general this isn’t really possible, for a variety of reasons. Primarily one would need to know what exactly those players miss/don’t like about a file. There’s no way to know for sure without having access to the source code (or a qualified comment from one of their developers), and you won’t get either. On top of that it’s highly likely that different players have different levels of support and will therefore choke for different reasons.

There are several conditions that might break things for a lot of players, including but not limited to:

  • no meta seek elements
  • no cues elements
  • cues elements not referenced from the first meta seek element
  • wrong positions within the meta seek elements

That being said, the well-known Ogg media container (used for things such as Vorbis or Opus) doesn’t even have an index, and players can seek in those files just fine. When designing Matroska we made sure the same type of algorithm can be applied to seeking in Matroska, too — it’d be just a tad less efficient & quick than using the cues for seeking.

My last point is one about time investment. You already know a solution that works well (remuxing with mkvmerge which is used by tons of people for this particular purpose, “normalizing” files). You’d have to spend a bit more time implementing a script to loop over all your files, but after that it’s just a matter of letting it run for a bit.
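
A loop like that could be sketched in Python roughly as follows. The function names, the ".remuxed.mkv" suffix and the choice to write a new file next to each original are assumptions for illustration; only the mkvmerge -o call itself is the tool's real interface.

```python
import subprocess
from pathlib import Path


def remux_command(source, target):
    """Build the mkvmerge call that rewrites source into target."""
    return ["mkvmerge", "-o", str(target), str(source)]


def remux_tree(root, suffix=".remuxed.mkv"):
    """Remux every .mkv below root; return the files mkvmerge failed on."""
    failed = []
    for source in sorted(Path(root).rglob("*.mkv")):
        if source.name.endswith(suffix):
            continue  # skip output of an earlier run
        target = source.with_name(source.stem + suffix)
        if target.exists():
            continue  # already processed, so the loop can be resumed
        result = subprocess.run(remux_command(source, target))
        # mkvmerge exits with 0 on success, 1 on warnings, 2 on errors.
        if result.returncode >= 2:
            failed.append(source)
    return failed
```

Calling remux_tree("/path/to/library") leaves the originals untouched, so you can compare a few results in your media server before replacing anything.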

On the other hand you have searching for and testing different programs, trying to understand what your players don’t like about affected files etc. Even if you find a program that can actually do that (I don’t know of one, otherwise I would’ve already mentioned it), it might not be easy to automate, or not automatable at all. Even in the best case you’re likely to invest an order of magnitude more time here than by just biting the bullet & remuxing every file as I said above.

Whatever you decide, good luck.

Currently, my strategy is to run mkvinfo or mkvalidator and catch the output in a Python or bash script. The output will give me the info I need to decide whether to process the file.
Once the seek data indicator is received, if it is received at all, I can use ‘killall’ (bash) or similar to stop the tool, and add the filename to a list in a text file, depending on the data received.
My hope was to find software to let me examine the file directly and automate the detection and edits in one script, a more elegant solution. I can do this piecemeal though.
Let me clarify: the issue is the split SeekHead element, with one part at the beginning of the file and one part at the end of the file. Programs like mkclean remove the problem of the second piece. My issue arises from the incomplete/incorrect implementation of the Matroska standard by media server software.
Oh, yes, ‘running it a bit’, try several thousand iterations.

Thank you and
Have a good day.
J.

That should be easy: run mkvinfo -v -v yourfile.mkv | sed -e '/+ Cluster/q' > output.txt and then analyze the contents of output.txt. It’ll contain the seek head sub-elements but stop after finding the first cluster (that’s what the sed call does: it exits as soon as it finds that pattern, leading the shell to kill mkvinfo, too, as it cannot write to its stdout anymore).
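
If you would rather do the same thing from Python than from a shell pipeline, a rough equivalent might look like this. The function names are made up, and the "+ Cluster" marker assumes mkvinfo's English output, the same assumption the sed pattern makes.

```python
import subprocess


def lines_until_first_cluster(lines, marker="+ Cluster"):
    """Keep lines up to and including the first one containing marker."""
    kept = []
    for line in lines:
        kept.append(line.rstrip("\n"))
        if marker in line:
            break  # same effect as sed's q command: print the line, then quit
    return kept


def mkvinfo_header(path):
    """Run mkvinfo -v -v on path and stop it after the first Cluster line."""
    process = subprocess.Popen(
        ["mkvinfo", "-v", "-v", str(path)],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        return lines_until_first_cluster(process.stdout)
    finally:
        process.kill()  # no-op if mkvinfo has already exited
        process.stdout.close()
        process.wait()
```

mkvinfo_header("yourfile.mkv") then returns the same lines that would have ended up in output.txt, ready to be searched for whatever seek head entries you decide to look for.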