A site for solving at least some of your technical problems...
I have not been able to find a reliable source describing the Info/Xing/LAME tag which appears in the very first frame of an MP3 Layer III file.
Here I describe the structure based on the GetVbrTag() function found in LAME.
See: libmp3lame/VbrTag.c (around line 362 in version 3.100).
The tag requires the file to be in Layer III format.
This means that the two-bit layer field, made of bits 1 and 2 of the second byte of the frame header, must be equal to 1 (binary 01):
((frame[1] >> 1) & 3) == 1
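As a minimal sketch, the check can be written in C as follows, assuming `frame` points to the first bytes of a frame. The helper name `is_layer3_frame` is hypothetical, and the 11-bit frame-sync test is my addition rather than something taken from GetVbrTag().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when the 4-byte MPEG audio header at `frame` starts with the
 * frame sync and declares Layer III (layer field, bits 2..1 of byte 1, == 01). */
static bool is_layer3_frame(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < 4)
        return false;
    /* 11-bit frame sync: all of byte 0 plus the top 3 bits of byte 1. */
    if (frame[0] != 0xFF || (frame[1] & 0xE0) != 0xE0)
        return false;
    return ((frame[1] >> 1) & 3) == 1;
}
```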
Part of the data in the LAME VBR Tag comes from the frame header:
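Once this data is read from the file, the code skips the 4-byte frame header and the side information that follows it (32 or 17 bytes for MPEG-1, 17 or 9 bytes for MPEG-2/2.5, the smaller size being used in mono) to reach the tag (i.e. the "Xing" or "Info" characters).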
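A sketch of that offset computation, assuming the usual Layer III side-information sizes (32/17 bytes for MPEG-1 stereo/mono, 17/9 bytes for MPEG-2 and 2.5). The name `xing_tag_offset` is hypothetical, and the values are worth checking against VbrTag.c before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset from the start of the frame to the "Xing"/"Info" characters:
 * the 4-byte frame header plus the side information. */
static size_t xing_tag_offset(const uint8_t *frame)
{
    assert(frame != NULL);
    int h_id   = (frame[1] >> 3) & 1;   /* 1 = MPEG-1, 0 = MPEG-2/2.5 */
    int h_mode = (frame[3] >> 6) & 3;   /* 3 = single channel (mono) */

    if (h_id)
        return h_mode != 3 ? 4 + 32 : 4 + 17;
    return h_mode != 3 ? 4 + 17 : 4 + 9;
}
```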
The headersize field gets set to:
((h_id + 1) * 72000 * h_bitrate) / samprate
h_bitrate, the bitrate in kbit/s, is also extracted from the MPEG header.
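As a worked example, a 128 kbit/s MPEG-1 frame (h_id = 1) at 44100 Hz gives (1 + 1) * 72000 * 128 / 44100, about 417.96, which the integer division truncates to 417. A direct C transcription of the formula (the function name is hypothetical):

```c
#include <assert.h>

/* headersize as computed by the formula above: the Layer III frame length in
 * bytes, without the padding bit. h_id is 1 for MPEG-1 and 0 for MPEG-2/2.5,
 * h_bitrate is in kbit/s and samprate in Hz. */
static int lame_headersize(int h_id, int h_bitrate, int samprate)
{
    assert(samprate > 0);
    return ((h_id + 1) * 72000 * h_bitrate) / samprate;
}
```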
The tag must start with exactly these characters, "Xing" or "Info". As far as LAME is concerned, the two are equivalent: the choice does not change how the parser works at all. Some players, however, interpret the tag as CBR or VBR depending on which one is used, so it is important to pick the correct one when adding the tag to a file.
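A sketch of both sides of that convention: a reader accepts either ID, while a writer picks one according to the CBR/VBR rule given just below. Both helper names are hypothetical, not LAME functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the 4 bytes at `p` are "Xing" or "Info"; a LAME-style parser
 * treats both IDs the same way. */
static bool is_vbr_tag_id(const unsigned char *p)
{
    assert(p != NULL);
    return memcmp(p, "Xing", 4) == 0 || memcmp(p, "Info", 4) == 0;
}

/* The ID a writer should use so that players classify the file correctly. */
static const char *tag_id_for(bool is_cbr)
{
    return is_cbr ? "Info" : "Xing";
}
```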
"Info" should be used for CBR (Constant Bit Rate).
"Xing" should be used for VBR/ABR (Variable Bit Rate/Average Bit Rate).
The tag appears right after the MP3 frame header (and the side information that follows it). In other words, the frame cannot include any audio data at all, only the Xing/Info and LAME headers.
VERY IMPORTANT: If the protection bit is set in that frame, the CRC16 does NOT appear before the Xing/Info tag; instead, it appears after it.
LAME then reads 8-, 16-, and 32-bit integers. The input is always treated as big endian.
The flags are read as a 32-bit big-endian integer.
Some of the data is read only if the corresponding flag is set. The fields are described in the order in which they are read from the frame.
Note: The order is important because, when a piece of data is not present, the offsets of all the following fields change.
If flag 0x0001 is set, read one 32-bit big-endian integer. It represents the total number of frames in the audio file.
If flag 0x0002 is set, read one 32-bit big-endian integer. It represents the total number of bytes of MPEG audio in the file. This does not include the ID3 tag; however, it does include this very tag.
If flag 0x0004 is set, read the table of contents (TOC). Its size is 100 bytes (see NUMTOCENTRIES).
This gets copied as is to the toc buffer. Each entry is a byte (uint8_t).
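Because each optional field shifts the offset of the next one, a reader has to walk the fields in order. A minimal sketch, assuming `p` points just past the "Xing"/"Info" characters; the flag values come from the struct at the end of this page (0x8 being the VBR scale described next), while the names `XING_*`, `struct xing_info` and `parse_xing_fields` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XING_FRAMES_FLAG    0x0001
#define XING_BYTES_FLAG     0x0002
#define XING_TOC_FLAG       0x0004
#define XING_VBR_SCALE_FLAG 0x0008
#define XING_NUMTOCENTRIES  100

struct xing_info {
    uint32_t flags;
    uint32_t frames;                  /* valid if flags & XING_FRAMES_FLAG */
    uint32_t bytes;                   /* valid if flags & XING_BYTES_FLAG */
    uint8_t  toc[XING_NUMTOCENTRIES]; /* valid if flags & XING_TOC_FLAG */
    int32_t  vbr_scale;               /* -1 when absent (GetVbrTag() convention) */
};

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Reads the flags and the optional fields in order. Returns the number of
 * bytes consumed, or 0 if `len` is too short for the fields the flags announce. */
static size_t parse_xing_fields(const uint8_t *p, size_t len, struct xing_info *out)
{
    size_t pos;
    assert(p != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    out->vbr_scale = -1;

    if (len < 4)
        return 0;
    out->flags = be32(p);
    pos = 4;

    if (out->flags & XING_FRAMES_FLAG) {
        if (len - pos < 4) return 0;
        out->frames = be32(p + pos);
        pos += 4;
    }
    if (out->flags & XING_BYTES_FLAG) {
        if (len - pos < 4) return 0;
        out->bytes = be32(p + pos);
        pos += 4;
    }
    if (out->flags & XING_TOC_FLAG) {
        if (len - pos < XING_NUMTOCENTRIES) return 0;
        memcpy(out->toc, p + pos, XING_NUMTOCENTRIES);
        pos += XING_NUMTOCENTRIES;
    }
    if (out->flags & XING_VBR_SCALE_FLAG) {
        if (len - pos < 4) return 0;
        out->vbr_scale = (int32_t)be32(p + pos);
        pos += 4;
    }
    return pos;
}
```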
If flag 0x0008 is set, read the VBR scale. It is calculated at creation time from the -V and -q parameters and is an integer percentage from 0 to 100.
Although the number would fit in a byte, it is saved in the tag as a 32-bit big-endian integer.
With LAME, the quality is calculated as follows:
int quality = 100 - 10 * gfp->VBR_q - gfp->quality
Note: the value saved in the file is quality.
Examples of how LAME converts command-line arguments to the VBR scale:
-V0 and -q0: 100 - 10 * 0 - 0 = 100   // 0x64
-V0 and -q2: 100 - 10 * 0 - 2 =  98   // 0x62
-V2 and -q5: 100 - 10 * 2 - 5 =  75   // 0x4B
-V9 and -q9: 100 - 10 * 9 - 9 =   1   // 0x01
                        ^   ^
                        |   |
                        |   +---- -q <arg>  -q 0 highest quality; -q 9 poor quality
                        |
                        +-------- -V <arg>  -V 0 highest quality; -V 9 poor quality
Notice that the number is limited to the range 0 to 100, so only the last byte is used. The other three bytes are expected to be 0.
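The computation and its storage as a 32-bit big-endian value can be sketched as below. Clamping to 0..100 is my reading of the limit stated above, not necessarily what LAME does, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* quality = 100 - 10 * V - q, clamped to 0..100 (assumption). */
static int vbr_scale_from_args(int vbr_q, int quality)
{
    int q = 100 - 10 * vbr_q - quality;
    if (q < 0)
        q = 0;
    if (q > 100)
        q = 100;
    return q;
}

/* Stored big endian: the three upper bytes are always 0 for 0..100. */
static void store_vbr_scale(uint8_t out[4], int scale)
{
    assert(scale >= 0 && scale <= 100);
    out[0] = 0;
    out[1] = 0;
    out[2] = 0;
    out[3] = (uint8_t)scale;
}
```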
The next 9 bytes hold the LAME tag: the letters "LAME" immediately followed by the version.
LAME<major>.<minor><release>
The <major> part is expected to be one digit (e.g. "3").
The <minor> part is expected to be two digits (e.g. "01", "93").
The <release> part is expected to be a letter ([a]lpha, [b]eta, [r]elease if patch != 0, or a space).
The string is expected to be at most 9 characters. If the <major> and/or <minor> parts are longer than expected, the <release> does not appear in the result. If they are shorter, a '\0' may end up in the file.
IMPORTANT: other encoders use a different name (e.g. GOGO) in this field. This lets players know what the following data represents. Although most fields are compatible between encoders, there can be some differences, which can be detected by examining the "LAME" version. I still name this field m_lame_version because this document only covers the LAME definitions.
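Since the 9-byte field is not NUL-terminated and may contain '\0' bytes, a reader might copy it as below. As an example of the overflow case described above, LAME 3.100 writes "LAME3.100": the three-digit minor version leaves no room for a release letter. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Copies the 9-byte encoder version field into a C string. Terminating at
 * index 9 keeps it valid; string functions stop at any embedded '\0'. */
static void read_encoder_version(const unsigned char field[9], char out[10])
{
    assert(field != NULL && out != NULL);
    memcpy(out, field, 9);
    out[9] = '\0';
}

/* Other encoders put their own name here, so check before trusting the
 * LAME-specific interpretation of the following fields. */
static bool is_lame_encoder(const char *version)
{
    return strncmp(version, "LAME", 4) == 0;
}
```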
The tag revision. At this time these 4 bits are set to 0.
The VBR type used to encode this file. This field goes through a translation table because LAME's internal vbr variable uses a different numbering scheme than the LAME tag.
The lowpass frequency, converted to a one-byte value:
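Assuming the revision occupies the upper nibble and the VBR type the lower nibble (which is how I read LAME's writer; C bitfield ordering is implementation-defined, so the struct at the end of this page cannot be relied on for this), the byte can be split with shifts and masks. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Revision in bits 7..4, translated VBR type in bits 3..0 (assumption). */
static unsigned tag_revision(uint8_t b) { return b >> 4; }
static unsigned tag_vbr_type(uint8_t b) { return b & 0x0F; }

static uint8_t pack_revision_vbr_type(unsigned revision, unsigned vbr_type)
{
    assert(revision < 16 && vbr_type < 16);
    return (uint8_t)((revision << 4) | vbr_type);
}
```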
nLowPass = cfg->lowpassfreq / 100.0 + .5;
Note that it is also clamped to a maximum of 255.
So the byte holds the lowpass frequency in units of 100 Hz, rounded to the nearest integer (e.g. 19500 Hz is stored as 195).
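The same arithmetic in C, together with the lossy inverse a reader would apply; the function names are hypothetical.

```c
#include <assert.h>

/* Hz / 100, rounded to nearest by adding .5 before truncation, clamped to 255. */
static unsigned char encode_lowpass(int lowpass_hz)
{
    int n;
    assert(lowpass_hz >= 0);
    n = (int)(lowpass_hz / 100.0 + .5);
    if (n > 255)
        n = 255;
    return (unsigned char)n;
}

/* A reader can only recover the frequency to the nearest 100 Hz. */
static int decode_lowpass(unsigned char b)
{
    return b * 100;
}
```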
Assuming a peak was found in the signal, this uint32_t holds that peak as a fixed-point value, normally between 0 and 1. The fixed-point format is 9.23 (9 bits to the left of the binary point and 23 to the right). The value is rounded to the nearest step.
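Converting between the stored value and a floating-point amplitude is a matter of scaling by 2^23. A sketch, assuming 1.0 means full scale; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PEAK_ONE 8388608.0   /* 2^23: the value 1.0 in 9.23 fixed point */

static double peak_from_fixed(uint32_t raw)
{
    return raw / PEAK_ONE;
}

static uint32_t peak_to_fixed(double peak)
{
    assert(peak >= 0.0);
    return (uint32_t)(peak * PEAK_ONE + 0.5);   /* round to nearest step */
}
```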
The radio replay gain, i.e. the gain to apply when replaying the signal. It is set to 0 if not available.
This value consists of 4 sub-fields, one of which is currently just padding.
The originator is set to 3 meaning «determined automatically».
The gain, expressed in tenths of a dB, is clamped between -510 and +510 inclusive.
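The sketch below decodes and builds this 16-bit word assuming the layout from the ReplayGain proposal: a 3-bit name code (1 = radio, 2 = audiophile), a 3-bit originator code, a sign bit and a 9-bit magnitude in tenths of a dB. In that layout there is no padding, and the widths differ from the ones in the struct at the end of this page, so treat it as an assumption to verify against VbrTag.c. The masks 0x2000, 0x0C00 and 0x200 are the ones I recall LAME using; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct replay_gain {
    unsigned name_code;    /* bits 15..13 */
    unsigned originator;   /* bits 12..10: 3 = determined automatically */
    int      gain_tenths;  /* sign bit 9 + magnitude bits 8..0, in 0.1 dB */
};

static struct replay_gain decode_replay_gain(uint16_t w)
{
    struct replay_gain rg;
    int magnitude = w & 0x1FF;

    rg.name_code   = (w >> 13) & 0x7;
    rg.originator  = (w >> 10) & 0x7;
    rg.gain_tenths = (w & 0x200) ? -magnitude : magnitude;
    return rg;
}

/* Radio gain as described in the text: originator 3, clamped to +/-510. */
static uint16_t encode_radio_gain(int gain_tenths)
{
    unsigned w = 0x2000 | 0x0C00;   /* name code 1 (radio), originator 3 */

    if (gain_tenths > 510)
        gain_tenths = 510;
    if (gain_tenths < -510)
        gain_tenths = -510;
    if (gain_tenths < 0)
        w |= 0x200 | (unsigned)(-gain_tenths);   /* sign bit + magnitude */
    else
        w |= (unsigned)gain_tenths;
    return (uint16_t)w;
}
```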
LAME sets the next 2 bytes to 0.
This field represents the audiophile replay gain, but LAME does not provide that value so it uses 0 instead.
One byte holding the ATH type (4 bits) and four one-bit flags.
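Decoding that byte with explicit shifts, assuming the ATH type sits in the low nibble followed by the four flags from least to most significant bit (the order of the struct at the end of this page as GCC lays it out on little-endian targets). The field names mirror the struct; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct lame_enc_flags {
    unsigned ath_type;        /* bits 0..3 */
    bool     expn_psy_tune;   /* bit 4 */
    bool     safe_joint;      /* bit 5 */
    bool     no_gap_more;     /* bit 6 */
    bool     no_gap_previous; /* bit 7 */
};

static struct lame_enc_flags decode_enc_flags(uint8_t b)
{
    struct lame_enc_flags f;
    f.ath_type        = b & 0x0F;
    f.expn_psy_tune   = (b >> 4) & 1;
    f.safe_joint      = (b >> 5) & 1;
    f.no_gap_more     = (b >> 6) & 1;
    f.no_gap_previous = (b >> 7) & 1;
    return f;
}
```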
The following byte represents the average bit rate in kbit/s.
The value gets clamped, so numbers larger than 255 are saved as 255.
All MP3 encoders generate a small delay at the beginning and add some padding at the end.
At the beginning, the delay is required to properly train the MP3 encoding process. This works best when some silence is added at the start of the file. The duration of that silence is the Delay.
At the end, more silence is required to make sure that the last frame of data gets saved. Without the extra silence, the resulting file could be missing one or two frames at the end of the audio stream, because some data would remain stuck (buffered) in the encoder. The Padding value indicates how much data is added at the end to ensure that all the input audio samples are saved in the MP3 stream.
These two values are saved as 12 bits each over a total of 3 bytes.
The LAME Noise Shaping configuration is saved as is in the LAME tag. It is 2 bits.
The MP3 format supports 4 stereo modes (strictly speaking these are channel modes; they are called stereo modes even though the Dual Channel mode is not stereo).
This field uses 3 bits even though only 4 modes are available.
One bit telling us whether the user-selected options are considered out of range, or, as LAME puts it, insane.
Two bits representing the frequency of the source:
One byte is unused. It gets set to 0.
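The noise shaping, stereo mode, non-optimal and source frequency fields described above share a single byte. A sketch of packing and unpacking it, assuming they are stored from the least significant bit upward in that order (2 + 3 + 1 + 2 bits), which matches the bitfield order of the struct at the end of this page; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct lame_misc {
    unsigned noise_shaping;    /* bits 0..1 */
    unsigned stereo_mode;      /* bits 2..4 */
    bool     non_optimal;      /* bit 5 ("unwise" settings) */
    unsigned source_frequency; /* bits 6..7 */
};

static struct lame_misc decode_misc(uint8_t b)
{
    struct lame_misc m;
    m.noise_shaping    = b & 0x03;
    m.stereo_mode      = (b >> 2) & 0x07;
    m.non_optimal      = (b >> 5) & 1;
    m.source_frequency = (b >> 6) & 0x03;
    return m;
}

static uint8_t encode_misc(struct lame_misc m)
{
    assert(m.noise_shaping < 4 && m.stereo_mode < 8 && m.source_frequency < 4);
    return (uint8_t)(m.noise_shaping | (m.stereo_mode << 2) |
                     ((unsigned)m.non_optimal << 5) | (m.source_frequency << 6));
}
```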
The LAME preset value. The preset enumeration defines many parameters, such as the bitrate or whether to use CBR or VBR.
The length of the encoded MP3 stream in bytes, as a uint32_t value.
A CRC16 of the encoded MP3 music data.
Another CRC16 representing the CRC of the whole Xing/Info tag up to and including the Music CRC.
Note that LAME always calculates and saves this CRC to the tag. It certainly doesn't hurt to always have it there and ignore it when the protection bit says so.
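If, as I understand it, LAME uses the CRC-16 variant with polynomial 0x8005 in reflected form (0xA001) and an initial value of 0, often catalogued as CRC-16/ARC, a table-free sketch looks like this. Verify it against the lookup table in LAME's sources before relying on it; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-16 (poly 0xA001), no final XOR. Pass 0 as `crc`
 * to start, or a previous result to continue over more data. */
static uint16_t crc16_lame(const uint8_t *data, size_t len, uint16_t crc)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
                            : (uint16_t)(crc >> 1);
    }
    return crc;
}
```

The standard check value for CRC-16/ARC over the ASCII string "123456789" is 0xBB3D, which makes a quick self-test.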
This is a rather broken C representation of the data:
struct __attribute__((__packed__)) XingTag {
    char     m_tag[4];                          // "Xing" or "Info"
    uint32_t m_flags;
    uint32_t m_frame_size;                      // if (flags & 0x1)
    uint32_t m_stream_size;                     // if (flags & 0x2)
    char     m_num_toc_entries[100];            // if (flags & 0x4)
    uint32_t m_vbr_scale;                       // if (flags & 0x8)
    char     m_lame_version[9];                 // "LAME<major>.<minor><release>"
    uint8_t  m_revision : 4;
    uint8_t  m_vbr_type : 4;
    uint8_t  m_lowpass_frequency;
    uint32_t m_peak_signal;                     // 9.23 fixed point
    uint16_t m_radio_replay_pad : 2;
    uint16_t m_radio_replay_set_name : 2;
    uint16_t m_radio_replay_originator_code : 2;
    uint16_t m_radio_replay_gain : 10;
    uint16_t m_audiophile_replay_gain;
    uint8_t  m_flag_ath_type : 4;
    uint8_t  m_flag_expn_psy_tune : 1;
    uint8_t  m_flag_safe_joint : 1;
    uint8_t  m_flag_no_gap_more : 1;
    uint8_t  m_flag_no_gap_previous : 1;
    uint8_t  m_average_bit_rate;
    uint8_t  m_delay_padding_delay_high;
    uint8_t  m_delay_padding_delay_low : 4;
    uint8_t  m_delay_padding_padding_high : 4;
    uint8_t  m_delay_padding_padding_low;
    uint8_t  m_noise_shaping : 2;
    uint8_t  m_stereo_mode : 3;
    uint8_t  m_non_optimal : 1;
    uint8_t  m_source_frequency : 2;
    uint8_t  m_unused;                          // set to 0
    uint16_t m_preset;
    uint32_t m_music_length;
    uint16_t m_music_crc16;
    uint16_t m_crc16;                           // if (protection bit)
};
Keep in mind that:
WARNING: the m_crc16 field is always calculated by LAME. Still, it seems that it should not be included if the protection bit is not set (remember also that the protection bit is inverted in the file, i.e. 0 means that protection is active and thus a CRC16 is present).