/*
   MPEG Maaate: An Australian MPEG audio analysis toolkit
   Copyright (C) 2000 Commonwealth Scientific and Industrial Research
   Organisation (CSIRO), Australia.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/

Layer 1 encoding
================

This document describes what data is being stored in a Layer 1 encoded
MPEG audio frame. It also describes the layout of such a frame. For more
information refer to the literature, e.g.:

- ISO standard
- Peter Noll "MPEG Digital Audio Coding"
  IEEE Signal Processing Magazine, Sept. 1997, pp.59-81
- Davis Pan "A Tutorial on MPEG/Audio Compression"
  IEEE Multimedia Vol. 2, No. 7, 1995, pp. 60-74
- chapter 4 on "Audio" in Haskell/Puri/Netravali
  "Digital Video: An Introduction to MPEG-2",
  Chapman & Hall, New York, 1997


1) What data is encoded

Each channel is encoded separately (but possibly simultaneously with the
other one in order to allow adaptive bit allocation between channels).

For each channel, the following processing is performed for frames of
384 PCM input samples. (In the graph, processes are put into boxes, data
has no boxes.)

PCM-Samples (12 x 32 samples)              > TIME domain
       |
       v
----------------------
|polyphase Filterbank|
----------------------
       |
       v
32 equally spaced subbands                 > FREQUENCY domain
(containing 12 samples per subband,
 which form a block that is encoded together)
       |
       v
--------------------------
|dynamic bit allocation &|   (based on psycho-acoustic model)
|scalefactor calculation |
--------------------------
       |
       v
- 32 scalefactors (1 per subband and block of 12 samples)
- 32 bitallocation values (1 per subband and block of 12 samples)
- 12 x 32 quantized and normalized samples (12 per subband)

The bitallocation tells the decoder the number of bits used to represent
each encoded sample. During bit allocation, each subband is treated
separately. According to the signal-to-mask-ratio calculated by the
psycho-acoustic model, bits are allocated for the sample quantization in
an iterative loop starting with 0 bits for every subband. The
quantization is linear.

The scalefactor is a multiplier derived from the largest absolute sample
value in the block of 12 samples. Dividing the samples by it scales the
block such that the magnitude of its largest sample is no larger than
unity. The scalefactors therefore basically perform a normalization of
the 12 samples in a block.


2) Layout of frame

Each MPEG audio frame contains a header after which the encoded audio
data is stored. A layer 1 frame contains the following data:

------------------------
| Header               |  (32 bit)
------------------------
| optional CRC         |  (16 bit)
------------------------
| Bitallocation        |  (4 bit per subband, giving
| information          |   128 bit per channel)
------------------------
| Scalefactors: index  |  (6 bit per subband if used, giving
| to scalefactor table |   0-192 bit per channel)
------------------------
| quantized samples    |  (0 or 2-15 bit per sample)
------------------------
| ancillary data       |  ("padding bits")
------------------------

The framesize depends on the samplingfrequency of the PCM samples and on
the desired bitrate for the MPEG audio stream.
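It may be calculated via the following formula:

   framesize = 12 * bitrate / samplingfrequency

Here the bitrate is given in bit/s and the samplingfrequency in Hz. The
result is the framesize in slots, where a Layer 1 slot is 4 bytes long
(384 samples / samplingfrequency * bitrate gives the size in bits, and
dividing by 32 gives slots). The division is rounded down; if the padding
bit in the header is set, the frame is one slot longer.

Allowed bitrates and samplingfrequencies differ between MPEG1 and MPEG2.

Bits are allocated according to the desired framesize such that the
384 PCM samples may be encoded with varying detail in streams of
different bitrate.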
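
The framesize formula can be sketched in a few lines of Python. This is
only an illustration and not code from the toolkit: the function name
layer1_frame_size and its parameters are hypothetical. It assumes the
bitrate is given in bit/s, the samplingfrequency in Hz, that the formula
yields 4-byte slots rounded down, and that a set padding bit adds one
slot.

```python
SLOT_BYTES = 4  # a Layer 1 slot is 32 bit


def layer1_frame_size(bitrate, samplingfrequency, padding=False):
    """Return (slots, bytes) for one Layer 1 frame.

    Hypothetical helper for illustration only.
    bitrate is in bit/s, samplingfrequency in Hz.
    """
    # framesize = 12 * bitrate / samplingfrequency, rounded down
    slots = (12 * bitrate) // samplingfrequency
    if padding:
        # padding bit set in the header: one extra slot
        slots += 1
    return slots, slots * SLOT_BYTES


if __name__ == "__main__":
    for rate, freq, pad in [(384000, 48000, False),
                            (384000, 44100, False),
                            (384000, 44100, True)]:
        slots, size = layer1_frame_size(rate, freq, pad)
        print(rate, freq, pad, slots, size)
```

At 48000 Hz the division is exact, so every frame has the same size. At
44100 Hz it is not, which is why the padding slot exists: by switching it
on for some frames, the average stream bitrate can match the nominal one.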