I have a file that is 61,014,851 bytes in size (disk size 61,018,112 bytes)
I run mkvinfo on the file and get the report: segment size 61014799
Why doesn’t it match any of the size data that Windows gives us?
I am trying to build a table of my MKV files and wanted to get each file’s size with the mkvinfo tool.
Thanks
Welcome!
The segment is only part of the whole file. Each Matroska file starts with a small structure called “EBML Head”. Its size can vary. Next comes the segment.
Next thing is that in Matroska each element consists of three parts:
- the element’s ID
- the element’s size (length)
- its data
What mkvinfo reports in a line such as + Segment: size 3970342
is only the size of the third part, the data portion. The actual size of the element is therefore larger by the length of the ID part (4 bytes for the Segment; Matroska IDs are 1 to 4 bytes long in general) plus the length of the “size” part (this one is variable). Sounds confusing? It is.
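To make the ID / size / data split concrete, here is a minimal Python sketch of reading one element header from raw bytes. The function names are made up for illustration and this is not mkvinfo's code; it only assumes the standard EBML encoding, where the number of leading zero bits in the first byte gives the field's length.

```python
def vint_length(first_byte: int) -> int:
    """Length in bytes of an EBML variable-length field, from its first byte."""
    if first_byte == 0:
        raise ValueError("invalid EBML length marker")
    # One leading 1-bit after n zero bits means the field is n + 1 bytes long.
    return 8 - first_byte.bit_length() + 1


def read_element_header(buf: bytes, pos: int = 0):
    """Return (element_id, data_size, header_len); data_size is None if unknown."""
    id_len = vint_length(buf[pos])
    element_id = buf[pos:pos + id_len]          # the ID keeps its marker bits
    size_pos = pos + id_len
    size_len = vint_length(buf[size_pos])
    raw = int.from_bytes(buf[size_pos:size_pos + size_len], "big")
    all_ones = (1 << (7 * size_len)) - 1
    data_size = raw & all_ones                  # strip the length marker bit
    if data_size == all_ones:
        data_size = None                        # reserved value: "unknown size"
    return element_id, data_size, id_len + size_len


# Hypothetical Segment header: 4-byte ID followed by an 8-byte size field.
header = b"\x18\x53\x80\x67" + b"\x01" + (3970342).to_bytes(7, "big")
element_id, data_size, header_len = read_element_header(header)
print(element_id.hex(), data_size, header_len + data_size)
```

With a 4-byte ID and an 8-byte size field the header is 12 bytes long, so a data size of 3970342 gives a total element size of 3970354.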
What you can do is run mkvinfo with additional arguments that’ll output more information: mkvinfo -z -P yourfile.mkv
What that does is change each line to also include the element’s position as well as the element’s total size (meaning the size of its ID + size of its “size” part + the size of its data portion). Here’s an example:
+ Segment: size 3970342 at 40 size 3970354 data size 3970342
If you add the position (the **at 40** part) to the element’s total size (the **size 3970354** part), you should end up with the file’s size in almost all cases. In my case that is 40 + 3970354 = 3970394, and the file is indeed 3970394 bytes long.
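If you want to automate that addition for many files, a small Python sketch could look like the following. The regular expression is an assumption based only on the example line quoted above; the exact output format may differ between mkvinfo versions, and `expected_file_size` is a made-up name.

```python
import re

# Pattern modelled on the example line shown above (an assumption, not a spec).
SEGMENT_RE = re.compile(
    r"\+ Segment: size (\d+) at (\d+) size (\d+) data size (\d+)"
)


def expected_file_size(mkvinfo_output: str) -> int:
    """Add the Segment's position to its total size (ID + size field + data)."""
    match = SEGMENT_RE.search(mkvinfo_output)
    if match is None:
        raise ValueError("no Segment line with position and total size found")
    position = int(match.group(2))
    total_size = int(match.group(3))
    return position + total_size


line = "+ Segment: size 3970342 at 40 size 3970354 data size 3970342"
print(expected_file_size(line))  # 3970394
```

The result can then be compared with `os.path.getsize()` for the same file. For the file from the question, 61,014,851 - 61,014,799 = 52 bytes, which would fit a 40-byte EBML head plus a 12-byte Segment header, assuming the same layout as in the example.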
Ok, thank you very much.
I tried it and the values match.
The only thing that surprises me a little is the “you should end up with the file’s size in almost all cases” part.
You’re welcome.
Well, three situations come to mind in which this information wouldn’t actually match the file’s size:
- Technically there can be multiple segment elements in a single file. In that case the sum of all segment sizes + the size of the EBML head would match the file’s size. That being said, I don’t know of a single piece of software that creates such files or handles them properly. So for all intents and purposes you should never encounter them in the wild.
- If the file is incomplete (e.g. due to an aborted download), the size calculated from the elements will obviously be bigger than the file’s actual size.
- In Matroska (well, in EBML, technically) elements can have an “unknown size”. This means that the element extends to the end of its parent element (if it has a parent element) or to the end of the file (if it doesn’t have a parent element). The only element that doesn’t have a parent element is the Segment element. Therefore you can encounter files for which the Segment’s size is set to that “unknown” value, which means that it extends to the end of the file, and obviously you cannot calculate the file size from a value that is itself calculated from the file size.
And of course there can be bugs in programs.
Of these three possibilities you may encounter the second and third, but it’s very, very unlikely that you’ll encounter the first.
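As a rough illustration of how a script might tell these cases apart, here is a hedged Python sketch. The function name and the returned messages are invented for this example, and reading a too-small computed size as "another segment" is only a guess, since other trailing data would look the same.

```python
from typing import Optional


def explain_size_check(actual_size: int, segment_pos: int,
                       segment_total: Optional[int]) -> str:
    """Compare position + total Segment size with the real file size.

    segment_total is None when the Segment has an "unknown size".
    """
    if segment_total is None:
        # The Segment simply runs to the end of the file.
        return "unknown segment size: nothing to compare"
    computed = segment_pos + segment_total
    if computed == actual_size:
        return "match"
    if computed > actual_size:
        return "file is shorter than its elements claim: probably incomplete"
    # Assumption: leftover bytes could be another Segment or other data.
    return "extra data after the segment: possibly another segment"


print(explain_size_check(3970394, 40, 3970354))   # match
print(explain_size_check(3000000, 40, 3970354))   # incomplete file
print(explain_size_check(3970394, 40, None))      # unknown size
```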