Append seems to insert a 2 second delay that is removed when audio is extracted

I combined two files into one by appending, and I ran into an audio/video synchronization problem with the second file I couldn’t solve.

  1. I ripped a DVD set to two MKV files using Handbrake, and since the audio was LPCM, and Handbrake can’t pass it through, I used FLAC as the output format. Both MKV files play fine.
  2. Mkvtoolnix can’t append FLAC streams, so I used eac3to to extract them both into WAV files, and I muxed them with their corresponding video into two new files.
  3. I combined these files with mkvtoolnix, producing an MKV file with a WAV audio track, which we can call 1.x265.WAV.mkv. It plays perfectly fine.
  4. Now for the problem. I wanted an MKV file with a FLAC audio track, so I used eac3to to extract the WAV track into a FLAC file, and I muxed it with the video into a new file using mkvtoolnix. Let’s call it 1.x265.FLAC.mkv.
  5. This new file plays without issue, except the audio in the second part is a full two seconds behind the video. I was unable to solve this. My workaround was to use Handbrake to transcode 1.x265.WAV.mkv to 2.x265.FLAC.mkv, which plays fine.

I listened to the extracted full-length WAV and FLAC files, and they don’t have the 2 second gap between parts that the video file has. I guess that explains why muxing the FLAC file with the video has the sync problem. The problem is not with eac3to, because I used another program to extract and convert the audio as a test, and it did the same thing. It’s like there’s a flag in 1.x265.WAV.mkv inserting a 2 second logical gap between parts that the audio extractors ignore.

Any idea what’s going on here?

Welcome!

In Matroska each & every frame has a timestamp & a duration. That information is lost when extracting tracks from Matroska into other formats. This is especially bad if the tracks are offset relative to each other, e.g. if audio & video only line up if video starts at 0s & audio content starts at 1s. In such a case extracting, possibly converting & re-combining would yield audio content that also starts at 0s, resulting in audio & video that don’t align.
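To make the timestamp loss concrete, here is a minimal Python sketch. It is only a toy model, not how any real extractor works: the `Frame` class, the helper names and the 20 ms frame length are all made up for illustration. It shows an audio track that starts at 1s coming back at 0s after a round trip through a raw format.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp_ms: int  # container-level start time of the frame
    duration_ms: int   # container-level duration of the frame


def extract_raw(frames):
    """Toy extraction: only the payload lengths survive, the timestamps do not."""
    return [f.duration_ms for f in frames]


def remux_from_zero(durations):
    """Toy remux of a raw stream: timestamps are rebuilt starting at 0."""
    rebuilt = []
    t = 0
    for d in durations:
        rebuilt.append(Frame(t, d))
        t += d
    return rebuilt


# Hypothetical audio track whose content starts at 1s (1000 ms).
audio = [Frame(1000 + i * 20, 20) for i in range(5)]
remuxed = remux_from_zero(extract_raw(audio))

print(audio[0].timestamp_ms)    # 1000
print(remuxed[0].timestamp_ms)  # 0
```

The 1s offset only existed in the container, so once the track is extracted nothing remains that could restore it.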

This doesn’t have to be your current problem, though it is something to keep in mind.

When appending, mkvmerge needs to shift the timecodes of the appended content upwards by the total duration of the file the content is appended to. If parts of the content in the first file are longer than other parts, gaps such as the one you’re describing might be created. A typical culprit might be a subtitle track whose last entry has a duration that exceeds the last audio & video content by a large amount. In that case you’ll notice a definite gap during playback. Depending on how your player works internally, the gap might only be noticeable in the video, or only in the audio, or even in both.
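The arithmetic behind such a gap can be sketched in Python. This is a simplified model of the shift described above, not mkvmerge’s actual code. The track layout and all numbers are hypothetical, picked so that a subtitle entry outlasts the audio & video by 2s.

```python
def track_end(frames):
    """End time in ms of a track given as (start_ms, duration_ms) pairs."""
    return max(start + duration for start, duration in frames)


def append_offset(first_file):
    """Shift for appended content: the end of the longest track in the first file."""
    return max(track_end(frames) for frames in first_file.values())


# Hypothetical first file; all times in milliseconds.
first = {
    "video": [(0, 5000), (5000, 5000)],  # ends at 10000
    "audio": [(0, 5000), (5000, 5000)],  # ends at 10000
    "subs": [(9000, 3000)],              # last entry ends at 12000
}

offset = append_offset(first)
audio_gap = offset - track_end(first["audio"])

print(offset)     # 12000
print(audio_gap)  # 2000
```

In this model the appended audio starts at 12s although the first part’s audio ends at 10s. An extractor that simply writes the samples back to back would drop that 2s of silence, which would be consistent with the extracted WAV & FLAC files having no gap.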

In general, extracting content from Matroska is risky due to the inevitable loss of container-level information.

Appreciate the detailed reply! Going forward, I guess I’ll avoid the situation by converting PCM to high bit rate lossy, pending Handbrake implementing PCM passthrough. I read the FAQ and understand why mkvtoolnix no longer supports FLAC appending. OTOH, what I did can work, as the second file I talked about was actually the result of appending three files using the method I described. I did that while trying to determine if there was a point where the process broke down. If I call the original source files 1, 2, 3, and 4, appending 1 to 2 (or any of the other files) was the problem, while 2-3-4 was fine. I guess that points to 1 as the problem.