Question about mkvmerge and the --append-mode option

Dear all,

First of all, I’d like to say a big thanks for the MKVToolNix suite! Until now, I have used ffmpeg to convert MPLS playlists or M2TS files to MKV, but I plan to switch to MKVToolNix for several reasons.

As a first step, I have read a great part of the mkvmerge documentation and noticed something that I don’t understand in the description of the --append-mode option. From the documentation:

When mkvmerge appends a track (called track2_1 from now on) from a second file (called file2) to a track (called track1_1) from the first file (called file1) then it has to offset all timestamps for track2_1 by an amount. For file mode this amount is the highest timestamp encountered in file1 even if that timestamp was from a different track than track1_1. In track mode the offset is the highest timestamp of track1_1.

As far as I have understood, a timestamp always relates to the beginning of an object, not to its end. If that is the case, shouldn’t the offset be the highest timestamp plus the duration of one frame (instead of the highest timestamp itself)? Otherwise, the last frame from track1_1 would be overwritten by the first frame of track2_1, and hence would be lost.
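To make the question concrete, here is a minimal sketch of the two candidate offsets. It only illustrates the reasoning above and says nothing about how mkvmerge is actually implemented; the 40 ms frame duration and the three-frame tracks are assumptions made up for the example.

```python
FRAME_MS = 40  # assumed frame duration (25 fps), not taken from the documentation

# Hypothetical tracks: frame start timestamps in milliseconds.
track1_1 = [0, 40, 80]
track2_1 = [0, 40, 80]


def append_timestamps(first, second, offset):
    """Return the combined timestamps after shifting `second` by `offset`."""
    return first + [ts + offset for ts in second]


# Offset = highest timestamp only: the last old frame and the first new
# frame end up with the same timestamp.
naive = append_timestamps(track1_1, track2_1, max(track1_1))

# Offset = highest timestamp + one frame duration: timestamps keep increasing.
adjusted = append_timestamps(track1_1, track2_1, max(track1_1) + FRAME_MS)

print(naive)     # [0, 40, 80, 80, 120, 160]
print(adjusted)  # [0, 40, 80, 120, 160, 200]
```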

I would be grateful if somebody could explain whether that’s a minor glitch in the documentation or whether I am wrong.

Thank you very much in advance, and best regards!

Welcome!

Yes, and it is calculated that way. The documentation is just inaccurate/incomplete.

In general you don’t need to set the append mode. For Blu-rays, don’t set it, use a playlist as the source. Don’t use the M2TS files directly as source files.
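As a rough sketch of what this looks like in practice, the helper below only assembles an mkvmerge argument list. The helper name and all file and playlist paths are hypothetical; `-o`, `--append-mode` and the `+` separator for appending are the mkvmerge options it relies on.

```python
def build_mkvmerge_command(output, sources, append_mode=None):
    """Build an mkvmerge argument list; several sources are appended with '+'."""
    command = ["mkvmerge", "-o", output]
    if append_mode is not None:
        if append_mode not in ("file", "track"):
            raise ValueError("append mode must be 'file' or 'track'")
        command += ["--append-mode", append_mode]
    for index, source in enumerate(sources):
        if index > 0:
            command.append("+")  # '+' tells mkvmerge to append the next file
        command.append(source)
    return command


# Recommended for Blu-rays: a playlist as the only source, no append mode set.
playlist_command = build_mkvmerge_command("movie.mkv", ["BDMV/PLAYLIST/00001.mpls"])

# Explicit appending with a non-default append mode (hypothetical part files).
append_command = build_mkvmerge_command(
    "joined.mkv", ["part1.mkv", "part2.mkv"], append_mode="track"
)

print(" ".join(playlist_command))
print(" ".join(append_command))
```

The resulting list could be handed to `subprocess.run` once the paths point at real files.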

No, that’s wrong. Content will not be overwritten/removed. The only thing that happens is that later content will be delayed too much, causing a short gap which might be noticeable during playback.

Wow! Thank you very much for the Welcome and for the fast answer.

You are right. “Overwritten” was totally the wrong wording; that was my fault. Of course, assigning a timestamp that is too early to a frame wouldn’t remove that frame from the file.

However, I currently don’t understand how a timestamp that is too early could lead to delays or gaps. You have explained that the docs are inaccurate and that it works as expected, but let’s assume for a moment that this is not the case.

Then (for example) the last video frame of the first input file would be at 40.040, and the first video frame of the second input file would also be at 40.040. I would then expect that the former isn’t shown at all, because the latter is already presented as soon as the player reaches 40.040 (that’s what I actually meant by “overwritten”).

Is this not the case? Where does a delay occur? Will the player present the first frame at 40.040 with full frame duration no matter what, then the second frame with the same timestamp with full frame duration, thereby shifting the actual presentation time of all following frames by one frame duration behind their timestamp? That would seriously worry me (and would make my second post (feature request) that I have written in the meantime totally useless) :)

Thanks to everybody for any insights!

Let’s assume you have two files you want to append, each with an audio & a video track. For the sake of simplicity let’s assume the video track is recorded at 25 FPS (meaning a duration of 40ms per frame), and the audio track has a frame duration of 33ms.

Let’s further assume that the first audio & video frames of the second file start at 0ms.

Now back to the first file: let’s assume the video track has 5 frames. This means that the last frame’s timestamp is (5 - 1) * 40ms = 160ms; the video track’s duration is obviously the last frame’s timestamp + the frame’s duration → 160ms + 40ms = 200ms (or 5 * 40ms).
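The same arithmetic as a small sketch; the function name is made up for illustration, and the track is assumed to start at 0 ms with a constant frame duration.

```python
def frame_timestamps(frame_count, frame_duration_ms):
    """Start timestamps of a constant-frame-rate track that begins at 0 ms."""
    return [index * frame_duration_ms for index in range(frame_count)]


video_timestamps = frame_timestamps(5, 40)
last_timestamp = video_timestamps[-1]
video_duration = last_timestamp + 40  # last timestamp + one frame duration

print(video_timestamps)  # [0, 40, 80, 120, 160]
print(last_timestamp)    # 160
print(video_duration)    # 200
```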

Similarly for the audio track, let’s assume 6 frames, meaning the last one starts at 5 * 33ms = 165ms & ends at 6 * 33ms = 198ms. That leaves a gap of 2ms between the end of the audio track & the end of the video track.

Now the question is: how many ms does mkvmerge add to the timestamps of the audio & video frames from the second file?

With the default append mode, file, mkvmerge uses the duration of the whole file. This is the maximum of all the end timestamps of all the frames in the first file; in this case, the maximums are 198ms (audio track) & 200ms (video track), ergo the file’s duration is 200ms. This is the value mkvmerge will add to all timestamps coming from the second file.

For the video track this means that there’s no gap in the playback as the first frame from the second file will start right after the last frame of the first file ends (both at 200ms). This isn’t true for the audio track, though: the last frame of the first file ends at 198ms, but the first frame of the second file will be played at 200ms, leaving a gap of 2ms for which there’ll be no content to play.

Players handle such gaps differently, often by shortening the duration of the video frames displayed, as gaps in audio playback are much more noticeable than one sped-up video frame.

With the alternative append mode, track, mkvmerge will not use the first file’s duration for that offset. Instead it’ll ensure that each track has a continuous stream of frames. In other words, while the handling of the video track will stay the same, a delay of 198ms instead of 200ms will be used for the audio track, ensuring there’s no gap.

You might wonder why mkvmerge doesn’t do this by default. The answer is that users often append content that is synchronized properly within each source file. This synchronization is lost in track append mode. Put differently: if you watch the second file, the first audio & the first video frames must be started at the same timestamp in order for their content to appear synchronized. If you use track append mode with such a file, they would not be started at the same timestamp but 2ms apart (audio earlier than video). Therefore the content of the second file would appear not to be synchronized when watching the appended file.

Such content is used way more often than content that’s the result of a splitting operation created with mkvmerge, which is what track mode is more suitable for.
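The trade-off between the two modes can be sketched as follows. This is a simplified model of the description above, not mkvmerge’s actual code; the function names are invented, and the track ends are computed from the frame counts and durations of the example (5 * 40ms for video, 6 * 33ms for audio).

```python
def track_end(frame_count, frame_duration_ms):
    """End of a track that starts at 0 ms: last timestamp + one frame duration."""
    return frame_count * frame_duration_ms


def append_offsets(track_ends, mode):
    """Offset added to each track's timestamps from the second file."""
    if mode == "file":
        # One offset for every track: the duration of the whole first file.
        file_duration = max(track_ends.values())
        return {name: file_duration for name in track_ends}
    if mode == "track":
        # Each track continues exactly where that same track ended.
        return dict(track_ends)
    raise ValueError("mode must be 'file' or 'track'")


first_file = {"video": track_end(5, 40), "audio": track_end(6, 33)}

for mode in ("file", "track"):
    offsets = append_offsets(first_file, mode)
    audio_gap = offsets["audio"] - first_file["audio"]  # silence at the join
    av_skew = offsets["video"] - offsets["audio"]       # sync shift in file 2
    print(f"{mode}: offsets={offsets} audio_gap={audio_gap}ms av_skew={av_skew}ms")
```

In this model, file mode keeps the second file’s audio & video in sync at the cost of a short audio gap at the join, while track mode closes the gap and shifts the second file’s audio earlier by the same amount.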

Thank you very much for the detailed explanation! Now I have understood what you have meant by gaps in your first post.

What you have written about the player behavior in case of such gaps was particularly interesting for me. Without doubt, it is correct that a sped-up video frame is harmless compared to a gap in (loud) audio. But the problem is more complex than it seems at first glance:

If we have multiple audio tracks with different codecs (e.g., an EAC3 track with 32ms frame duration and a DTS-HD MA track with roughly 10.7ms frame duration), things get complicated. This may be somewhat academic, but theoretically multiple audio tracks could be activated at the same time when playing the file.

The player would then need to speed up the presentation of the last video frame of the first input file so much that it ends with the earliest end of all active audio tracks, which means that not only the respective video frame is sped up, but also that there is loss of audio information.

And the player even might have to handle situations where one audio is shorter and the other audio track is longer than the video track in the first input file. I can’t imagine how a player could reasonably handle this.
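A small sketch of that situation, with made-up numbers: the video end and the frame counts are purely hypothetical, and the DTS-HD MA frame duration is assumed to be exactly 32/3 ms (512 samples at 48 kHz), which matches the “roughly 10.7ms” above.

```python
from fractions import Fraction

video_end_ms = Fraction(1000)  # hypothetical end of the video track

# (frame duration in ms, hypothetical frame count) per audio track
audio_tracks = {
    "eac3": (Fraction(32), 31),
    "dtshd_ma": (Fraction(32, 3), 94),  # assumed 512 samples at 48 kHz
}

audio_ends = {
    name: duration * count for name, (duration, count) in audio_tracks.items()
}

for name, end in audio_ends.items():
    difference = end - video_end_ms
    relation = "longer" if difference > 0 else "shorter"
    print(f"{name}: ends at {float(end):.2f}ms, "
          f"{abs(float(difference)):.2f}ms {relation} than the video")

# The earliest end among the audio tracks is where a gap would start first.
earliest_audio_end = min(audio_ends.values())
print(f"earliest audio end: {float(earliest_audio_end):.2f}ms")
```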

The primary reason why I have asked was that I wanted to know whether the documentation was correct. The motivation to research is the following:

We are currently searching for a reliable method to concatenate M2TS files and archive them in MKV format. The number of concatenated files may be high. I have investigated several dozen such M2TS files from various sources and have always found that each audio track was longer than the video track.

In this situation, the additional append mode that I have requested in my other post today might be handy. I haven’t read the replies to it yet, but will do so immediately.

Thank you very much, and best regards!

Oh it definitely is. I’ve only described the most basic case, but you’re 100% correct that syncing multiple audio sources gets quite tricky.

I think the best tool for the initial conversion from M2TS to MKV is actually eac3to at the moment, at least in general over all kinds of Blu-rays. The reason I don’t recommend MKVToolNix for the job is that MKVToolNix cannot handle seamless playback correctly, and I’m not really motivated to work on that feature either. This means that mkvmerge won’t detect & remove the duplicated sections around the connection points properly, leading to a few duplicate frames. eac3to can handle those cases just fine.

As soon as the content is inside Matroska, MKVToolNix is perfectly capable of handling it further.

OK, thank you very much for the tip! I’ll try that.