The Linux Page

MP3 Info Tag Specifications Rev0 (LAME 3.100)

Preface

I have not been able to find a reliable source describing the Info/Xing/LAME tag which appears in the very first frame of an MP3 Layer III file.

Here I describe the structure based on the GetVbrTag() function found in LAME.

See: libmp3lame/VbrTag.c (around line 362 in version 3.100).

Layer III

The tag requires the file to be in Layer III format.

This means that the 2-bit layer field formed by bits 1 and 2 of the second byte of the frame header (bit 0 being the least significant bit) must be equal to 1 (binary 01):

((frame[1] >> 1) & 3) == 1

MPEG Info to VBR Tag

Part of the data in the LAME VBR Tag comes from the frame header:

  • h_id — the MPEG version bit: 1 for MPEG-1, 0 for MPEG-2 (and MPEG-2.5)[1]
  • samprate — converted from the MPEG header data to an actual sample rate in Hz
  • headersize — calculated

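Once this data is read from the file, the code skips the 4-byte frame header and the side information that follows it to reach the tag (i.e. the "Xing" or "Info" characters). The side information is 32 bytes for MPEG-1 (17 bytes in mono) and 17 bytes for MPEG-2 (9 bytes in mono).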

The headersize field gets set to:

((h_id + 1) * 72000 * h_bitrate) / samprate

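h_bitrate is the bitrate in kbit/s, looked up from the bitrate index found in the MPEG header. The result is the size of the frame in bytes, ignoring the padding bit.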
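To make the steps above concrete, here is a minimal C sketch that validates the header, extracts h_id, h_bitrate and samprate, and computes headersize. The names parse_header() and HeaderInfo are hypothetical and this is not LAME's code; the tables are the standard Layer III bitrate and sample-rate tables, and the MPEG-2.5 test mirrors the `(buf[1] >> 4) == 0xE` check in GetVbrTag().

```c
#include <stdint.h>

/* Layer III bitrates in kbit/s, indexed by [h_id][bitrate index].
 * Row 0: MPEG-2/2.5, row 1: MPEG-1. Index 15 is invalid (stored as 0). */
static const int layer3_bitrates[2][16] = {
    { 0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 0 },
    { 0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0 }
};

/* Sample rates in Hz: row 0 MPEG-2, row 1 MPEG-1, row 2 MPEG-2.5. */
static const int sample_rates[3][4] = {
    { 22050, 24000, 16000, 0 },
    { 44100, 48000, 32000, 0 },
    { 11025, 12000, 8000, 0 }
};

/* Hypothetical holder for the header-derived values. */
typedef struct {
    int h_id;       /* 1 = MPEG-1, 0 = MPEG-2 or MPEG-2.5 */
    int h_bitrate;  /* kbit/s */
    int samprate;   /* Hz */
    int h_mode;     /* channel mode, 3 = mono */
    int headersize; /* frame size in bytes, padding bit ignored */
} HeaderInfo;

/* Returns 0 on success, -1 if this is not a usable Layer III header. */
int parse_header(const uint8_t *frame, HeaderInfo *info)
{
    if (frame[0] != 0xFF || (frame[1] & 0xE0) != 0xE0)
        return -1;                          /* no frame sync */
    if (((frame[1] >> 1) & 3) != 1)
        return -1;                          /* not Layer III */

    int h_id = (frame[1] >> 3) & 1;
    int sr_index = (frame[2] >> 2) & 3;
    int br_index = (frame[2] >> 4) & 0xF;

    info->h_id = h_id;
    info->h_mode = (frame[3] >> 6) & 3;
    info->h_bitrate = layer3_bitrates[h_id][br_index];
    /* A top nibble of 0xE in byte 1 means MPEG-2.5. */
    info->samprate = ((frame[1] >> 4) == 0xE)
                         ? sample_rates[2][sr_index]
                         : sample_rates[h_id][sr_index];
    if (info->h_bitrate == 0 || info->samprate == 0)
        return -1;                          /* free format or reserved value */

    info->headersize = ((h_id + 1) * 72000 * info->h_bitrate) / info->samprate;
    return 0;
}
```

For the common header FF FB 90 64 (MPEG-1, 128 kbit/s, 44.1 kHz), this gives headersize = 144000 * 128 / 44100 = 417 bytes with integer division.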

Xing/Info

The tag must start with exactly one of these two 4-character strings. As far as LAME is concerned, both are equivalent: the parser treats them exactly the same way. Some players, however, use the string to decide whether the file is CBR or VBR, so it is important to write the correct one when adding the tag to a file.

"Info" should be used for CBR (Constant Bit Rate).

"Xing" should be used for VBR/ABR (Variable Bit Rate/Average Bit Rate).

The tag appears right after the frame header and its side information. In other words, this frame does not include any audio data at all, only the Xing/Info and LAME headers.

VERY IMPORTANT: If the protection bit is set in that frame, the CRC16 will NOT be before the Xing/Info tag, instead it appears after the Xing/Info tag.
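The following sketch locates the tag, assuming a Layer III header: it skips the 4-byte header plus the side information (using the same sizes GetVbrTag() adds to its buffer pointer) and checks for "Xing" or "Info". find_vbr_tag() is a hypothetical name. Following the note above, it does not expect a CRC16 before the tag.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the offset of "Xing"/"Info" in the first
 * frame, or -1 if neither is found. *is_cbr is set to 1 for "Info" and
 * to 0 for "Xing". Assumes the header was already checked for Layer III. */
int find_vbr_tag(const uint8_t *frame, size_t len, int *is_cbr)
{
    if (len < 4)
        return -1;

    int h_id = (frame[1] >> 3) & 1;              /* 1 = MPEG-1 */
    int mono = ((frame[3] >> 6) & 3) == 3;       /* channel mode 3 = mono */
    int side_info = h_id ? (mono ? 17 : 32) : (mono ? 9 : 17);
    size_t offset = 4 + (size_t)side_info;       /* header + side info */

    if (len < offset + 4)
        return -1;
    if (memcmp(frame + offset, "Info", 4) == 0)
        *is_cbr = 1;
    else if (memcmp(frame + offset, "Xing", 4) == 0)
        *is_cbr = 0;
    else
        return -1;
    return (int)offset;
}
```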

Integers in Big Endian

LAME then reads 8-, 16-, and 32-bit integers. The input is always interpreted as big endian.
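A minimal sketch of such readers, in the spirit of LAME's ExtractI4(); the names read_u8(), read_be16() and read_be32() are hypothetical. Assembling the value byte by byte gives the same result on any host and does not require aligned pointers.

```c
#include <stdint.h>

/* 8-bit values need no byte swapping. */
uint8_t read_u8(const uint8_t *p)
{
    return p[0];
}

/* Most significant byte first. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```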

Flags

The flags are read as a 32-bit big-endian integer.

Some of the data is read only if the corresponding flag is set. The fields are described in the order in which the data is read from the frame.

Note: The order is important since if a piece of data is not present, the offset of the next fields changes.
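Because each optional field shifts the ones that follow, a parser must advance its cursor only when the corresponding flag is set. This sketch reads the fields in the order listed below; the flag names mirror the constants LAME uses (FRAMES_FLAG, BYTES_FLAG, TOC_FLAG, VBR_SCALE_FLAG), while XingFields and parse_xing_fields() are hypothetical. As in GetVbrTag(), vbr_scale is set to -1 when it is absent.

```c
#include <stdint.h>
#include <string.h>

#define FRAMES_FLAG    0x0001
#define BYTES_FLAG     0x0002
#define TOC_FLAG       0x0004
#define VBR_SCALE_FLAG 0x0008

typedef struct {
    uint32_t flags;
    uint32_t frames;     /* valid if FRAMES_FLAG */
    uint32_t bytes;      /* valid if BYTES_FLAG */
    uint8_t  toc[100];   /* valid if TOC_FLAG */
    int32_t  vbr_scale;  /* -1 if VBR_SCALE_FLAG is not set */
} XingFields;

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* tag points at "Xing"/"Info". Returns the number of bytes consumed
 * (the offset of the LAME version string), or 0 if len is too short. */
size_t parse_xing_fields(const uint8_t *tag, size_t len, XingFields *out)
{
    size_t pos = 4;                       /* skip "Xing"/"Info" */
    memset(out, 0, sizeof *out);
    out->vbr_scale = -1;

    if (len < pos + 4) return 0;
    out->flags = be32(tag + pos);
    pos += 4;

    if (out->flags & FRAMES_FLAG) {
        if (len < pos + 4) return 0;
        out->frames = be32(tag + pos);
        pos += 4;
    }
    if (out->flags & BYTES_FLAG) {
        if (len < pos + 4) return 0;
        out->bytes = be32(tag + pos);
        pos += 4;
    }
    if (out->flags & TOC_FLAG) {
        if (len < pos + 100) return 0;
        memcpy(out->toc, tag + pos, 100);
        pos += 100;
    }
    if (out->flags & VBR_SCALE_FLAG) {
        if (len < pos + 4) return 0;
        out->vbr_scale = (int32_t)be32(tag + pos);
        pos += 4;
    }
    return pos;
}
```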

Number of Frames (0x0001)

If flag 0x0001 is set, read one 32-bit big-endian integer. It represents the total number of frames in the audio file.

Size in Bytes (0x0002)

If flag 0x0002 is set, read one 32-bit big-endian integer. It represents the total number of bytes of MPEG audio in the file. This does not include the ID3 tag; however, it does include this tag itself.
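A common use of the frame and byte counts is to estimate the duration and average bitrate without scanning the whole file. This is a sketch assuming 1152 samples per Layer III frame for MPEG-1 and 576 for MPEG-2/2.5; the function names are hypothetical, and the duration still includes the encoder delay and padding described further down.

```c
#include <stdint.h>

/* Duration in seconds from the number of frames. */
double stream_duration_seconds(uint32_t frames, int h_id, int samprate)
{
    int samples_per_frame = h_id ? 1152 : 576;   /* Layer III */
    return (double)frames * samples_per_frame / samprate;
}

/* Average bitrate in kbit/s derived from the frame and byte counts. */
double average_kbps(uint32_t frames, uint32_t bytes, int h_id, int samprate)
{
    double seconds = stream_duration_seconds(frames, h_id, samprate);
    return seconds > 0.0 ? (double)bytes * 8.0 / seconds / 1000.0 : 0.0;
}
```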

TOC Data (0x0004)

The size of the TOC data is 100 bytes (see NUMTOCENTRIES).

If flag 0x0004 is set, these bytes are copied as is into the toc buffer. Each entry is a byte (uint8_t).
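GetVbrTag() only copies the TOC. Players commonly interpret it this way: entry i is the byte position of the point at i% of the playing time, scaled so that 256 means the end of the stream (the Size in Bytes value). Under that assumption, a seek sketch that interpolates between entries could look like this (toc_seek_offset() is a hypothetical name):

```c
#include <stdint.h>

/* Converts a position in percent (0..100) of the playing time into an
 * approximate byte offset, interpolating between adjacent TOC entries. */
uint32_t toc_seek_offset(const uint8_t toc[100], uint32_t stream_bytes,
                         double percent)
{
    if (percent < 0.0) percent = 0.0;
    if (percent > 100.0) percent = 100.0;

    int a = (int)percent;
    if (a > 99) a = 99;
    double fa = toc[a];
    double fb = (a < 99) ? toc[a + 1] : 256.0;   /* 256 = end of stream */
    double fx = fa + (fb - fa) * (percent - a);

    return (uint32_t)(fx / 256.0 * stream_bytes);
}
```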

VBR Scale (0x0008)

The VBR scale is calculated at encoding time from the -V and -q parameters. It is an integer percentage from 0 to 100.

Although the number would fit in a byte, it is saved in the tag as a 32-bit big-endian integer.

Calculation of the VBR Scale Value

With LAME, the quality is calculated as follows:

int quality = 100 - 10 * gfp->VBR_q - gfp->quality;

Note: the value saved in the file is quality.

Examples on how lame converts command arguments to the VBR scale:

-V0 and -q0:  100 - 10 * 0 - 0 = 100 // 0x64
-V0 and -q2:  100 - 10 * 0 - 2 = 98  // 0x62
-V2 and -q5:  100 - 10 * 2 - 5 = 75  // 0x4B
-V9 and -q9:  100 - 10 * 9 - 9 = 1   // 0x01
 ^       ^
 |       |
 |       +---- -q <arg>  -q 0 highest quality; -q 9 poor quality
 |
 +--------- -V <arg>  -V 0 highest quality; -V 9 poor quality

Notice that the number is limited to the range 0 to 100, so only the last (least significant) byte is used. The other three bytes are expected to be 0.
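The computation and its storage can be sketched as follows; vbr_scale() and write_vbr_scale() are hypothetical helpers that apply the formula above with the 0 to 100 limit, then write the 4 big-endian bytes.

```c
#include <stdint.h>

/* quality = 100 - 10 * V - q, limited to 0..100. */
uint32_t vbr_scale(int vbr_q, int quality)
{
    int scale = 100 - 10 * vbr_q - quality;
    if (scale < 0) scale = 0;
    if (scale > 100) scale = 100;
    return (uint32_t)scale;
}

/* Stores the value as a 32-bit big-endian integer; only out[3] can be
 * non-zero since the value is at most 100. */
void write_vbr_scale(uint8_t out[4], uint32_t scale)
{
    out[0] = (uint8_t)(scale >> 24);
    out[1] = (uint8_t)(scale >> 16);
    out[2] = (uint8_t)(scale >> 8);
    out[3] = (uint8_t)scale;
}
```

With -V2 -q5 this yields 75, stored as the bytes 00 00 00 4B.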

LAME Version

The next 9 bytes hold the version string: the letters "LAME" immediately followed by the version number.

LAME<major>.<minor><release>

The <major> part is expected to be one digit (e.g. "3").

The <minor> part is expected to be two digits (e.g. "01", "93").

The <release> part is expected to be a letter ([a]lpha, [b]eta, [r]elease if patch != 0, or a space).

The string is at most 9 characters. If the <major> and/or <minor> parts are longer than expected, the <release> letter does not appear in the result (for example, LAME 3.100 writes "LAME3.100"). If they are shorter, a '\0' may end up in the field.
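The truncation rule can be illustrated with a hypothetical helper, make_version_field(), which is not LAME's own code: the version is formatted with snprintf() and then cut to 9 bytes, so a three-digit <minor> pushes the <release> letter out of the field.

```c
#include <stdio.h>
#include <string.h>

/* Fills the 9-byte version field. Bytes past a short string stay '\0'. */
void make_version_field(char field[9], int major, int minor, char release)
{
    char buf[16];
    memset(field, 0, 9);
    snprintf(buf, sizeof buf, "LAME%d.%02d%c", major, minor, release);
    size_t n = strlen(buf);
    memcpy(field, buf, n < 9 ? n : 9);   /* keep at most 9 characters */
}
```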

IMPORTANT: other encoders use a different name (e.g. GOGO) in this field. This allows players to know what the following data represents. Although most fields are compatible between encoders, there can be some differences, which can be detected by examining this version string. I still name this field m_lame_version because I only have LAME definitions in this document.

Revision

At this time, these 4 bits are set to 0. They occupy the upper 4 bits of the byte shared with the VBR type.

VBR Type

The VBR type used to encode this file. This parameter goes through a translation table because LAME's internal VBR mode variable uses a different numbering scheme than the LAME tag.
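Splitting that shared byte is a matter of taking the two nibbles. This sketch assumes the byte is built as 0x10 * revision + vbr_type (revision in the upper nibble); split_revision_vbr() is a hypothetical name.

```c
#include <stdint.h>

/* Upper nibble: revision. Lower nibble: VBR type (tag numbering). */
void split_revision_vbr(uint8_t byte, int *revision, int *vbr_type)
{
    *revision = byte >> 4;
    *vbr_type = byte & 0x0F;
}
```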

Lowpass Frequency

The lowpass frequency converted to a one-byte value:

nLowPass = cfg->lowpassfreq / 100.0 + .5;

Note that it also gets clamped to a maximum of 255.

So the byte represents the lowpass frequency in units of 100 Hz, rounded to the nearest integer.
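A sketch of both directions, following the formula above; encode_lowpass() and decode_lowpass_hz() are hypothetical names, and the lower clamp at 0 is an added safety assumption.

```c
#include <stdint.h>

/* Divide by 100, round to nearest (+ 0.5 then truncate), clamp to 255. */
uint8_t encode_lowpass(double lowpass_hz)
{
    double v = lowpass_hz / 100.0 + 0.5;
    if (v > 255.0) v = 255.0;
    if (v < 0.0) v = 0.0;
    return (uint8_t)v;
}

/* Decoding only recovers the frequency to the nearest 100 Hz. */
int decode_lowpass_hz(uint8_t stored)
{
    return stored * 100;
}
```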

Peak Signal Amplitude

Assuming a peak was found in the signal, this uint32_t represents that peak as a fixed-point number between 0 and 1. The fixed-point format is 9.23 (9 bits to the left of the binary point and 23 to the right), so 1.0 is stored as 0x00800000. The value is rounded to the nearest integer.
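Converting to and from the 9.23 format only requires scaling by 2^23. A sketch with hypothetical names; encode_peak() assumes a non-negative amplitude where 1.0 is full scale.

```c
#include <stdint.h>

#define PEAK_ONE 8388608.0   /* 2^23, the 9.23 representation of 1.0 */

double decode_peak(uint32_t stored)
{
    return stored / PEAK_ONE;
}

/* Rounds to the nearest 9.23 value; amplitude must be >= 0. */
uint32_t encode_peak(double amplitude)
{
    return (uint32_t)(amplitude * PEAK_ONE + 0.5);
}
```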

Radio Replay Gain

The gain to apply when replaying the signal. Set to 0 if not available.

This 16-bit value is made of 4 sub-fields: a 3-bit name code (1 for radio), a 3-bit originator code, a sign bit, and a 9-bit gain magnitude.

The originator is set to 3 meaning «determined automatically».

The gain is expressed in units of 0.1 dB and clamped between -510 and +510 inclusive.
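A sketch of the radio replay gain field, assuming the layout LAME writes: name code in the top 3 bits (0x2000 for radio), originator in the next 3 bits (0x0C00 for "determined automatically"), the sign at 0x0200, and the magnitude in the low 9 bits. The helper names are hypothetical.

```c
#include <stdint.h>

/* Builds the field from a gain in tenths of a dB. */
uint16_t encode_radio_gain(int gain_tenths_db)
{
    if (gain_tenths_db > 510) gain_tenths_db = 510;
    if (gain_tenths_db < -510) gain_tenths_db = -510;

    uint16_t v = 0x2000;            /* name code 1: radio */
    v |= 0x0C00;                    /* originator 3: determined automatically */
    if (gain_tenths_db < 0) {
        v |= 0x0200;                /* sign bit */
        gain_tenths_db = -gain_tenths_db;
    }
    return (uint16_t)(v | gain_tenths_db);
}

/* Returns the gain in tenths of a dB and reports the two codes. */
int decode_radio_gain(uint16_t v, int *name_code, int *originator)
{
    *name_code = v >> 13;
    *originator = (v >> 10) & 7;
    int magnitude = v & 0x1FF;
    return (v & 0x0200) ? -magnitude : magnitude;
}
```

A stored value of 0 decodes to name code 0, which is how a missing gain shows up.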

Audiophile Replay Gain

LAME sets the next 2 bytes to 0.

This field represents the audiophile replay gain, but LAME does not compute that value, so it writes 0 instead.

LAME Tag Flags

One byte holding five flags: the 4-bit ATH type and four 1-bit flags (expn_psy_tune, safe_joint, no_gap_more, and no_gap_previous).
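A hypothetical unpacking helper, assuming the bits are assigned from the least significant bit upward in the order of the C structure below (ATH type in the low nibble, no_gap_previous in bit 7):

```c
#include <stdint.h>

typedef struct {
    int ath_type;          /* bits 0-3 */
    int expn_psy_tune;     /* bit 4 */
    int safe_joint;        /* bit 5 */
    int no_gap_more;       /* bit 6 */
    int no_gap_previous;   /* bit 7 */
} LameTagFlags;

LameTagFlags unpack_lame_flags(uint8_t b)
{
    LameTagFlags f;
    f.ath_type = b & 0x0F;
    f.expn_psy_tune = (b >> 4) & 1;
    f.safe_joint = (b >> 5) & 1;
    f.no_gap_more = (b >> 6) & 1;
    f.no_gap_previous = (b >> 7) & 1;
    return f;
}
```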

Average Bit Rate

The following byte represents the average bit rate in kbit/s.

The value gets clamped, so numbers larger than 255 are saved as 255.

Encoding Delay & Padding

All MP3 encoders generate a small delay at the beginning and add some padding at the end.

At the beginning, the delay is needed to properly prime the MP3 encoding process, which works best when some silence is added at the start of the file. The duration of that silence, counted in samples, is the Delay.

At the end, more silence must be added to make sure that the last frame of data gets saved. Without that extra silence, the resulting file could be missing one or two frames at the end of the audio stream, because some data would still be stuck (buffered) in the encoder. The Padding value indicates how many samples are added at the end to ensure that all of the input audio samples get saved in the MP3 stream.

These two values are saved as 12 bits each over a total of 3 bytes, the delay first.
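The decoding below follows the bit manipulation in GetVbrTag(); the encoder side is its inverse. Both helper names are hypothetical, and the values must fit in 12 bits (0 to 4095).

```c
#include <stdint.h>

/* b[0] and the high nibble of b[1] hold the delay; the low nibble of
 * b[1] and b[2] hold the padding. */
void decode_delay_padding(const uint8_t b[3], int *delay, int *padding)
{
    *delay = (b[0] << 4) | (b[1] >> 4);
    *padding = ((b[1] & 0x0F) << 8) | b[2];
}

void encode_delay_padding(uint8_t b[3], int delay, int padding)
{
    b[0] = (uint8_t)(delay >> 4);
    b[1] = (uint8_t)(((delay & 0x0F) << 4) | ((padding >> 8) & 0x0F));
    b[2] = (uint8_t)(padding & 0xFF);
}
```

For example, a delay of 576 samples and a padding of 1234 samples are stored as 24 04 D2.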

Noise Shaping

The LAME Noise Shaping configuration is saved as is in the LAME tag. It is 2 bits.

Stereo Mode

The MP3 format supports 4 channel modes: stereo, joint stereo, dual channel, and mono. The field is still named Stereo Mode, even though the Dual Channel and Mono modes are not really stereo.

This field uses 3 bits even though the frame header only defines 4 modes.

Non-Optimal

One bit telling us whether the user-selected options are considered out of range or, as LAME puts it, insane.

Source Frequency

Two bits representing the frequency of the source:

  • 00 — 32kHz (or less)
  • 01 — 44.1kHz
  • 10 — 48kHz
  • 11 — over 48kHz
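The Noise Shaping, Stereo Mode, Non-Optimal and Source Frequency fields share a single byte. Assuming the bits are assigned from the least significant bit upward in the order shown in the C structure below (noise shaping in bits 0-1, source frequency in bits 6-7), a hypothetical pack/unpack pair looks like this:

```c
#include <stdint.h>

typedef struct {
    int noise_shaping;     /* bits 0-1 */
    int stereo_mode;       /* bits 2-4 */
    int non_optimal;       /* bit 5 */
    int source_frequency;  /* bits 6-7: 0 <=32 kHz, 1 44.1, 2 48, 3 >48 */
} LameMisc;

LameMisc unpack_lame_misc(uint8_t b)
{
    LameMisc m;
    m.noise_shaping = b & 0x03;
    m.stereo_mode = (b >> 2) & 0x07;
    m.non_optimal = (b >> 5) & 1;
    m.source_frequency = (b >> 6) & 0x03;
    return m;
}

uint8_t pack_lame_misc(LameMisc m)
{
    return (uint8_t)((m.noise_shaping & 0x03) |
                     ((m.stereo_mode & 0x07) << 2) |
                     ((m.non_optimal & 1) << 5) |
                     ((m.source_frequency & 0x03) << 6));
}
```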

Unused

One byte is unused. It gets set to 0.

Preset

The LAME preset value. The preset enumeration defines many parameters, such as the bitrate or whether to use CBR or VBR.

Music Length

The length of the MP3 audio stream in bytes, stored as a uint32_t value.

Music CRC

A CRC16 of the encoded MP3 audio data (not of the PCM input).

Protection CRC

Another CRC16 representing the CRC of the whole Xing/Info tag up to and including the Music CRC.

Note that LAME always calculates and saves this CRC to the tag. It certainly doesn't hurt to always have it there and ignore it when the protection bit says so.
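LAME computes its CRC16 values with a lookup table. The sketch below assumes that table corresponds to the common CRC-16/ARC variant (reflected polynomial 0x8005, i.e. 0xA001, initial value 0) and computes it bit by bit instead; crc16_update() is a hypothetical name. Which bytes to feed it for the Protection CRC follows the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/ARC update. Start with crc = 0; the returned value can
 * be passed back in to continue over more data. */
uint16_t crc16_update(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
                            : (uint16_t)(crc >> 1);
    }
    return crc;
}
```

The standard check value for this variant is 0xBB3D over the ASCII string "123456789".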

As a C Structure

This is a rather broken[2] C representation of the data:

#include <stdint.h>

struct __attribute__((__packed__)) XingTag
{
  char      m_tag[4];                // "Xing" or "Info"
  uint32_t  m_flags;
  uint32_t  m_number_of_frames;      // if (flags & 0x1)
  uint32_t  m_stream_size;           // if (flags & 0x2)
  uint8_t   m_toc[100];              // if (flags & 0x4)
  uint32_t  m_vbr_scale;             // if (flags & 0x8)
  char      m_lame_version[9];       // "LAME<major>.<minor><release>"
  uint8_t   m_revision : 4;
  uint8_t   m_vbr_type : 4;
  uint8_t   m_lowpass_frequency;
  uint32_t  m_peak_signal;           // 9.23 fixed point
  uint16_t  m_radio_replay_set_name : 3;
  uint16_t  m_radio_replay_originator_code : 3;
  uint16_t  m_radio_replay_sign : 1;
  uint16_t  m_radio_replay_gain : 9;       // magnitude in 0.1 dB
  uint16_t  m_audiophile_replay_gain;
  uint8_t   m_flag_ath_type : 4;
  uint8_t   m_flag_expn_psy_tune : 1;
  uint8_t   m_flag_safe_joint : 1;
  uint8_t   m_flag_no_gap_more : 1;
  uint8_t   m_flag_no_gap_previous : 1;
  uint8_t   m_average_bit_rate;
  uint8_t   m_delay_padding_delay_high;
  uint8_t   m_delay_padding_delay_low : 4;
  uint8_t   m_delay_padding_padding_high : 4;
  uint8_t   m_delay_padding_padding_low;
  uint8_t   m_noise_shaping : 2;
  uint8_t   m_stereo_mode : 3;
  uint8_t   m_non_optimal : 1;
  uint8_t   m_source_frequency : 2;
  uint8_t   m_unused;                // set to 0
  uint16_t  m_preset;
  uint32_t  m_music_length;
  uint16_t  m_music_crc16;
  uint16_t  m_crc16;                 // if (protection bit)
};

Keep in mind that:

  • There is no padding, so fields are not aligned the way a C compiler would normally align them in a structure (i.e. the structure is packed)
  • Some fields have a HIGH and a LOW part because the values are packed in an odd number of bytes (e.g. I refrained from using a uint24_t for the delay & padding values)
  • Various fields may or may not be included
  • The structure size varies depending on the m_flags field (and on whether m_crc16 is present); it may be as short as 42 bytes and as long as 156 bytes. LAME always includes all the fields, so most MP3 files use 156 bytes for their Xing/Info tag
  • Multi-byte integers are stored big endian and the order of bit-fields within a storage unit is implementation-defined in C, so this structure documents the layout but cannot be used to read the raw bytes directly

WARNING: the m_crc16 field is always calculated by LAME. Still, it seems that it should not be included if the protection bit is not set (remember also that the protection bit is inverted in the file: 0 means that protection is active and thus a CRC16 is present).

  [1] The VBR tag handling views MPEG-2 and MPEG-2.5 as the exact same thing.
  [2] In Ada, you can define records with the "when" keyword to create a form of union based on another field in the structure. Ada also has advanced ways of defining bit fields, which would be really useful in this case. However, I think most people would have difficulty reading Ada code.