Dear all …
According to the mkvmerge documentation, there are currently two possible append modes: file and track. I would like to ask whether it would be possible to add a further mode that works as described below.
Video production and file formats are not my field, so please don’t hesitate to let me know if my idea is bad. Since this post might become a bit lengthy, I’ll explain my motivation separately if anybody is interested; in any case, I believe that I have a good reason to ask for it.
Before explaining how it should actually work, we need some definitions, similar to the explanation for --append-mode in the mkvmerge documentation:
We have two input files (e.g. M2TS), called infile1 and infile2, and we want to append infile2 to infile1. Each input file contains the same number and types of tracks. Let’s say that this is one video track whose framerate is 24000/(1001s) and one audio track whose framerate is 31.25/s. The video and audio tracks in infile1 are called in1_v and in1_a, respectively, and the video and audio tracks in infile2 are called in2_v and in2_a, respectively.
The timestamps of the last frames in in1_v and in1_a are called ts_in1_vlast and ts_in1_alast, respectively. The timestamps of the first frames in in2_v and in2_a are called ts_in2_vfirst and ts_in2_afirst, respectively.
The duration of one video frame is called v_dur, and the duration of one audio frame is called a_dur.
As everybody knows, the problem with joining multimedia files is that audio and video in the first file almost never end at the same time. For this example, let’s assume that in1_a ends 20ms later than in1_v. That is, if we calculate (ts_in1_vlast + v_dur) and (ts_in1_alast + a_dur), the latter result is 20ms greater than the former.
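To make the 20ms assumption concrete, the following minimal Python sketch computes both end times with exact fractions. The frame counts (288 video frames, 376 audio frames) and the assumption that both tracks start at timestamp 0 are hypothetical values, chosen only so that the difference comes out at exactly 20ms; they are not taken from a real file.

```python
from fractions import Fraction

# Frame durations in seconds, derived from the frame rates stated above.
v_dur = Fraction(1001, 24000)      # video: 24000/1001 frames per second
a_dur = 1 / Fraction("31.25")      # audio: 31.25 frames per second, i.e. 32 ms per frame

# Hypothetical track lengths; both tracks are assumed to start at timestamp 0.
ts_in1_vlast = 287 * v_dur         # timestamp of the 288th (last) video frame
ts_in1_alast = 375 * a_dur         # timestamp of the 376th (last) audio frame

v_end = ts_in1_vlast + v_dur       # 12.012 s
a_end = ts_in1_alast + a_dur       # 12.032 s
gap = a_end - v_end                # 0.020 s, i.e. audio ends 20 ms after video

print(float(v_end), float(a_end), float(gap))
```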
The MKV output file is called outfile.
We would like to ask for a new, additional append mode, possibly called videoref, that works as follows:
1. Completely convert infile1 to MKV, filling up outfile and copying the timestamp of each frame from infile1 without changing its value; nothing special here.
2. Compute (ts_in1_vlast + v_dur).
3. Append a new cluster to outfile and assign it the timestamp that has been computed in step 2.
4. Get the first video frame from in2_v, put it into the new cluster created in step 3, and assign it the timestamp that has been computed in step 2. That video frame and the new cluster that contains it have the same timestamp afterwards.
5. Compute (ts_in2_afirst - ts_in2_vfirst) (this usually gives 0).
6. Get the first audio frame from in2_a, put it into the new cluster created in step 3, and assign it that cluster’s timestamp plus the offset computed in step 5. In cases where the latter is 0, the new cluster, the first video frame in it and the first audio frame in it have the same timestamp.
7. From then on, continue to copy the video and audio frames from infile2 to outfile, computing the new timestamps in the following way:
   a) Compute the time offset of each frame from ts_in2_vfirst.
   b) The timestamp of that frame in outfile is then the timestamp computed in step 2 plus the offset computed in step 7a).
Of course, it works the same when joining further input files or with more audio tracks.
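The steps above can be sketched in a few lines of Python. This is only an illustration of the timestamp arithmetic under simplifying assumptions, not mkvmerge code: frames are modelled as hypothetical (track, timestamp) tuples, clusters and interleaving are not modelled at all, and the function name append_videoref and the frame counts are made up for the example.

```python
from fractions import Fraction

def append_videoref(in1, in2, v_dur):
    """Append in2 to in1; both are lists of (track, timestamp) tuples,
    where track is "v" or "a" and timestamps are in seconds."""
    out = list(in1)                                   # step 1: copy infile1 unchanged
    ts_in1_vlast = max(ts for track, ts in in1 if track == "v")
    base = ts_in1_vlast + v_dur                       # step 2: timestamp of the new cluster
    ts_in2_vfirst = min(ts for track, ts in in2 if track == "v")
    for track, ts in in2:
        offset = ts - ts_in2_vfirst                   # steps 5 and 7a: offset from first video frame
        out.append((track, base + offset))            # steps 4, 6 and 7b: new timestamp
    return out

# Hypothetical input: 288 video and 376 audio frames in infile1,
# 3 video and 3 audio frames in infile2, all tracks starting at 0.
v_dur = Fraction(1001, 24000)
a_dur = Fraction(4, 125)
in1 = [("v", i * v_dur) for i in range(288)] + [("a", i * a_dur) for i in range(376)]
in2 = [("v", i * v_dur) for i in range(3)] + [("a", i * a_dur) for i in range(3)]

out = append_videoref(in1, in2, v_dur)
appended = out[len(in1):]                             # the frames that came from infile2
print([(track, float(ts)) for track, ts in appended])
```

In this sketch the first video frame and the first audio frame of infile2 both receive the timestamp (ts_in1_vlast + v_dur), because their offset from ts_in2_vfirst is 0; all later frames keep exactly the spacing they had in infile2.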
The basic idea behind this is the following:
When we have multiple M2TS files that we want to concatenate into one MKV file, we have perfect audio-video synchronization within each single input file. The only way to keep that perfect synchronization is to re-synchronize the audio tracks with the video track in the way shown above each time a new M2TS input file begins.
With the method shown above, we can join an arbitrary number of input files (hundreds or thousands) without the risk of even the slightest de-synchronization. I am not aware of any other approach at the demuxer / muxer level that can also guarantee this; at least, I have never seen one.
The proposed name videoref for the new append mode stems from the fact that the video framerate is taken as the basis when the muxer timestamps are re-synchronized during the transition from the end of one input file to the beginning of the next input file.
I believe that it wouldn’t be too difficult to implement this request. Nearly everything must be there already: to implement the append modes that already exist, there must be code that reads timestamps from frames in the input files and that processes such timestamps. It’s just another algorithm for computing the new timestamps that needs to be implemented.
Maybe too naive … what do you think about it?
Thank you very much in advance, and best regards!