/*
   MPEG Maaate: An Australian MPEG audio analysis toolkit
   Copyright (C) 2000 Commonwealth Scientific and Industrial Research
   Organisation (CSIRO), Australia.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/

Layer 3 encoding
================

This document describes what data is stored in a Layer 3 encoded MPEG
audio frame and how such a frame is laid out. For more information,
refer to the literature, e.g.:

- ISO/IEC 11172-3 (MPEG1 audio) and ISO/IEC 13818-3 (MPEG2 audio)
- Peter Noll, "MPEG Digital Audio Coding", IEEE Signal Processing
  Magazine, Sept. 1997, pp. 59-81
- Davis Pan, "A Tutorial on MPEG/Audio Compression", IEEE Multimedia
  Vol. 2, No. 2, 1995, pp. 60-74
- Chapter 4, "Audio", in Haskell/Puri/Netravali, "Digital Video: An
  Introduction to MPEG-2", Chapman & Hall, New York, 1997


1) What data is encoded

Each channel is encoded separately (though possibly together with the
other channel, to allow adaptive bit allocation between channels). For
each channel, one "frame" stores two granules (MPEG1) or one granule
(MPEG2). Per granule and per channel, the following processing is
performed on a window of 1152 PCM input samples (because of the 50%
window overlap, only 576 of them are new for each granule). In the
diagram, processes are drawn in boxes; data is shown without boxes.
PCM-Samples (36 x 32 samples)                 > TIME domain
        |
        v
  ----------------------
  |polyphase Filterbank|
  ----------------------
        |
        v
32 equally spaced subbands                    > FREQUENCY domain
(containing 36 samples per subband)
        |
        v
  -----------------------
  |modified Discrete    |   (either with long windows of 36 samples
  |Cosine Transform MDCT|    or short windows of 12 samples; mixed
  -----------------------    windows for the different subbands are
        |                    possible, using long windows on the 2
        |                    lowest subbands and short windows on the
        |                    upper 30 subbands)
        v
18 sub-subbands for long windows (of 36 samples, 50% overlap)
or 3 x 6 sub-subbands for short windows (of 12 samples, 50% overlap)

Arranging these from low to high frequency, with the three successive
short-window results placed in a row, gives 576 frequency values.
Please note: after quantization these values are integers, not reals.

 1                                                                  576
 |-------------------------------------------------------------------|
 |          pairs of           |   quadruples of    |   pairs of     |
 | bigvalues  |value|<8191     | count1  |value|<=1 | rzero  value=0 |
 |-----------------------------|--------------------|----------------|
 | region0    |region1|region2 |                    |                |
 |------------|-------|--------|                    |                |
       |          |       |               |             NOT ENCODED
       v          v       v               v
 -------------   ...     ...    - 1 huffman table (A or B)
 |noise alloc|                  - huffman code bits encoding
 |& scalefacs|                    quadruples
 -------------                  - 1 signbit per nonzero value
       |  (according to           (i.e. up to 4 signbits per quadruple)
       |  psychoacoustic model)
       v
 - 1 huffman table per region (choice of 32 tables)
 - 4 bits of scalefactor selection information (long windows only)
 - 1 scalefactor per scalefactor group (21 for long windows,
   12 per short window)
 - huffman code bits encoding pairs
 - 1 signbit per nonzero value (i.e. up to 2 signbits per pair)

Scalefactor groups are built to approximate the widths of the critical
bands. The grouping depends on the sampling frequency and is determined
via table lookup. There are 21 scalefactor groups for long windows,
3 x 12 = 36 scalefactor groups for short windows, and 8 + 3 x 9 = 35
scalefactor groups for mixed windows.
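The partition of the 576 values shown above can be sketched as follows. This is a minimal, hypothetical Python sketch, assuming MPEG1 at 44.1 kHz with long windows only. The function `partition`, the table `SFB_LONG_44K` (scalefactor group boundaries for long windows at 44.1 kHz as commonly reproduced from ISO/IEC 11172-3; check it against the standard) and the convention that region 0 spans region0_count + 1 and region 1 spans region1_count + 1 scalefactor groups are assumptions, not part of this document. big_values, region0_count and region1_count are the sideinformation fields described in section 2; count1 is the number of quadruples deduced during decoding.

```python
# Hypothetical sketch: split the 576 quantized values of one granule/channel
# into region0/1/2 (bigvalues), count1 and rzero.
# ASSUMPTIONS: MPEG1, 44.1 kHz, long windows only; region k spans
# (regionk_count + 1) scalefactor groups, as in ISO/IEC 11172-3.

# Scalefactor group boundaries (value indices) for long windows at 44.1 kHz.
SFB_LONG_44K = [0, 4, 8, 12, 16, 20, 24, 30, 36, 44, 52, 62, 74, 90,
                110, 134, 162, 196, 238, 288, 342, 418, 576]

N_VALUES = 576  # frequency values per granule and channel


def partition(big_values, count1, region0_count, region1_count):
    """Return a dict of (start, end) value ranges, end exclusive."""
    big_end = 2 * big_values             # bigvalues are coded as pairs
    count1_end = big_end + 4 * count1    # count1 values are coded as quadruples
    if count1_end > N_VALUES:
        raise ValueError("big_values and count1 cover more than 576 values")

    # Region boundaries sit on scalefactor group boundaries, clipped to big_end.
    last = len(SFB_LONG_44K) - 1
    r1_start = min(SFB_LONG_44K[min(region0_count + 1, last)], big_end)
    r2_start = min(SFB_LONG_44K[min(region0_count + region1_count + 2, last)],
                   big_end)

    return {
        "region0": (0, r1_start),
        "region1": (r1_start, r2_start),
        "region2": (r2_start, big_end),
        "count1": (big_end, count1_end),
        "rzero": (count1_end, N_VALUES),  # not encoded: all values are zero
    }
```

For example, `partition(100, 20, 3, 4)` places region 0 at values 0-15, region 1 at 16-43, region 2 at 44-199, the count1 region at 200-279 and the rzero region at 280-575, i.e. rzero = 576 - 2 x 100 - 4 x 20 = 296 values.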
The boundaries of regions 0-2 are aligned with scalefactor group
boundaries.

Scalefactor groups are further combined into scalefactor classes for
coding the scalefactor selection information.

Scalefactor classes for long windows:
  class 0: scalefactor groups 0-5
  class 1: scalefactor groups 6-10
  class 2: scalefactor groups 11-15
  class 3: scalefactor groups 16-20

Scalefactor classes for short windows:
  class 0: scalefactor groups 0-5
  class 1: scalefactor groups 6-11

The scalefactor selection information specifies whether the
scalefactors of the first granule are reused in the second granule.
Additionally, for each granule and channel, scalefactor compression
information is determined. It specifies (via table lookup) the number
of bits used for the scalefactors in the scalefactor groups.

The samples in the bigvalues frequency bands are quantized
nonuniformly: the absolute value of each sample is raised to the power
0.75 before it is passed to a uniform quantizer. The quantized samples
are then Huffman coded with variable-length codewords.

The scalefactors are used to colour (spectrally shape) the
quantization noise. For each scalefactor group, one scalefactor is
calculated; it is the factor by which the samples in that group are
scaled before quantization.

The Huffman coding process usually generates a variable number of bits
per frame, whereas MPEG frames contain a constant number of bits,
calculated from the desired bitrate of the stream. A locally variable
bit allocation is nevertheless achieved via a "bit reservoir": frames
that do not need all of their bits donate the surplus to the
reservoir, and later frames may borrow bits from it. It is, however,
ensured that no bits of a frame are stored in future frames: a frame's
main data may start in earlier frames, but never extends into later
ones.


2) Layout of a frame

Each MPEG audio frame contains a header, after which the encoded audio
data is stored.
A layer 3 frame contains a second header, called the sideinformation,
followed by the main data:

  ------------------------
  | Header               |  (32 bit)
  ------------------------
  | optional CRC         |  (16 bit)
  ------------------------
  | Sideinformation      |  (MPEG1: 17 byte if mono, 32 byte if 2 channels)
  ------------------------
  | main data            |
  ------------------------
  | ancillary data       |  ("padding bits")
  ------------------------

Sideinformation:

- main_data_begin (9 bit): negative offset in bytes specifying where the
  main data for the frame begins (only main data bytes are counted; the
  headers and sideinformation of the frames in between are skipped).
  This implements the bit reservoir by allowing a frame to have its
  data stored in previous frames.
- private bits (5 bit if mono, 3 bit if stereo): for private use
  ("padding")
- scalefactor selection information (1 bit per scalefactor class):
  specifies for each scalefactor class of each channel whether the
  scalefactors of the first granule are reused in the second granule
- part2_3_length (12 bit): specifies for each channel and granule the
  number of bits used for scalefactors and Huffman-encoded samples.
- big_values (9 bit): specifies the number of sample pairs in the
  bigvalues region. The number of quadruples in the count1 region must
  be deduced during decoding from part2_3_length and big_values
  (quadruples are decoded until all part2_3_length bits have been
  consumed). The number of zeros in the rzero region can then be
  calculated from big_values, count1 and the number of frequency
  values (576): rzero = 576 - 2 x big_values - 4 x count1.
- global_gain (8 bit): specifies for each channel and granule the
  quantizer step size.
- scalefac_compress (4 bit): specifies for each channel and granule the
  number of bits used for the scalefactors (via table lookup).
- window_switching (1 bit): specifies for each channel and granule
  whether pure long windows are used (value: 0) or mixed/short/
  short->long/long->short windows (value: 1).
- table_select (5 bit): specifies for each channel, granule and region
  which Huffman table is used for encoding.
- region0_count (4 bit): specifies for each channel and granule the
  size of region 0 as a number of scalefactor groups minus one. The
  corresponding number of samples depends on the window_type, the
  mixed_window flag and the sampling frequency, and is found via table
  lookup.
- region1_count (3 bit): specifies for each channel and granule the
  size of region 1 as a number of scalefactor groups minus one; the
  number of samples is again found via table lookup. The remaining
  samples of the bigvalues region belong to region 2.
- window_type (2 bit): specifies for each channel and granule the type
  of window used (long/short/long->short/short->long).
- mixed_window (1 bit): specifies for each channel and granule whether
  mixed windows are used (value: 1), i.e. the lowest two subbands are
  transformed with long windows and the upper 30 with short windows.
- shortwin_gain (3 bit): specifies for each channel, granule and short
  window a gain offset from the global_gain.
- preflag (1 bit): specifies for each channel and granule whether
  additional high-frequency amplification is applied.
- scalefac_scale (1 bit): specifies for each channel and granule the
  quantization step of the scalefactors (a step of sqrt(2) if 0, of 2
  if 1).
- count1_table_select (1 bit): specifies for each channel and granule
  which Huffman table is used for the count1 region.

Main data:

- Granule 1: scalefactors + Huffman-encoded sample data
- Granule 2: scalefactors + Huffman-encoded sample data

where scalefac_l (0-4 bit) contains, for each channel, granule and
scalefactor group, the scalefactor for long windows, and scalefac_s
(0-4 bit) contains, for each channel, granule and scalefactor group,
the three scalefactors for the short windows. The Huffman-encoded data
consist of:

- bigvalues region: 0-19 bit per pair + up to 2 x 1-13 linbits + up to
  2 signbits (one per nonzero value)
- count1 region: 1-6 bit per quadruple + up to 4 signbits (one per
  nonzero value)

(linbits are transmitted for each value of a pair whose absolute value
reaches 15, provided the selected Huffman table uses linbits.)

The framesize depends on the sampling frequency of the PCM samples and
on the desired bitrate of the MPEG audio stream.
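For MPEG1, it may be calculated in bytes via the following formula:

  framesize = floor(144 * bitrate / samplingfrequency) + padding

where padding is the padding bit from the header (0 or 1). For MPEG2,
which stores one granule per frame, the factor is 72 instead of 144.
Allowed bitrates and sampling frequencies differ between MPEG1 and
MPEG2.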
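The frame size formula and the frame layout, together with the bit reservoir from section 1, can be sketched in Python as follows. This is a hypothetical illustration for Layer 3 only: the function names, the `SIDE_INFO_BYTES` table (including the MPEG2 sideinformation sizes of 9 and 17 bytes, which this document does not state and which are taken here as an assumption from ISO/IEC 13818-3) and the greedy reservoir policy are assumptions; a real encoder may choose main_data_begin differently.

```python
# Hypothetical sketch for Layer 3: frame size, bytes left for main data,
# and a greedy bit-reservoir policy. All sizes are in bytes.

HEADER_BYTES = 4              # 32-bit header
CRC_BYTES = 2                 # 16-bit CRC, only if the stream is protected
MAX_MAIN_DATA_BEGIN = 511     # largest value of the 9-bit main_data_begin

# ASSUMPTION: sideinformation sizes per (version, channels); the MPEG2
# values come from ISO/IEC 13818-3, not from this document.
SIDE_INFO_BYTES = {("mpeg1", 1): 17, ("mpeg1", 2): 32,
                   ("mpeg2", 1): 9, ("mpeg2", 2): 17}


def frame_size(bitrate, samplingfrequency, version="mpeg1", padding=0):
    """Frame length: floor(144 (MPEG1) or 72 (MPEG2) * bitrate / fs) + padding."""
    factor = 144 if version == "mpeg1" else 72
    return factor * bitrate // samplingfrequency + padding


def main_data_slot(bitrate, samplingfrequency, channels,
                   version="mpeg1", padding=0, crc=False):
    """Bytes in this frame available for main data (its own or later frames')."""
    overhead = HEADER_BYTES + SIDE_INFO_BYTES[(version, channels)]
    if crc:
        overhead += CRC_BYTES
    return frame_size(bitrate, samplingfrequency, version, padding) - overhead


def reservoir_offsets(slots, demands):
    """Greedy policy: start each frame's main data as early as allowed.

    slots[i]:   main-data bytes available in frame i
    demands[i]: bytes frame i needs (scalefactors + Huffman data)
    Returns the main_data_begin value chosen for each frame.
    """
    reservoir = 0  # unused bytes at the end of the previous frames
    offsets = []
    for slot, need in zip(slots, demands):
        begin = min(reservoir, MAX_MAIN_DATA_BEGIN)
        available = begin + slot  # may borrow from the past, never the future
        if need > available:
            raise ValueError("frame needs more than its slot plus the reservoir")
        offsets.append(begin)
        reservoir = available - need  # surplus donated to later frames
    return offsets
```

For example, a 128 kbit/s stereo MPEG1 stream at 44.1 kHz has 417-byte frames (418 with padding), leaving 381 bytes of main data per unpadded frame without CRC. With three such slots and demands of 300, 381 and 460 bytes, the greedy policy yields main_data_begin values of 0, 81 and 81; the third frame fits only because it borrows the 81 bytes left over by the first.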