/*
    MPEG Maaate: An Australian MPEG audio analysis toolkit
    Copyright (C) 2000 Commonwealth Scientific and Industrial Research Organisation
    (CSIRO), Australia.

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/

Layer 3 encoding
================
This document describes what data is being stored in a Layer 3
encoded MPEG audio frame. It also describes the layout of such
a frame. For more information refer to the literature, e.g.:
- ISO standard
- Peter Noll "MPEG Digital Audio Coding" IEEE Signal Processing
  Magazine, Sept. 1997, pp.59-81
- Davis Pan "A Tutorial on MPEG/Audio Compression" IEEE Multimedia
  Vol. 2, No. 2, 1995, pp. 60-74
- chapter 4 on "Audio" in Haskell/Puri/Netravali "Digital Video: An
  Introduction to MPEG-2", Chapman & Hall, New York, 1997


1) What data is encoded

Each channel is encoded separately (but possibly simultaneously 
with the other one in order to allow adaptive bit allocation 
between channels). For each channel, two (MPEG1) / one (MPEG2) 
granules are stored in one "frame". Per granule and per channel, 
the following processing is performed on 576 new PCM input samples
(i.e. 1152 samples per channel in an MPEG1 frame). Because the MDCT
windows overlap by 50%, a long window of 36 subband samples combines
the 18 subband samples of the current granule with the 18 of the
previous granule.

(In the graph, processes are put into boxes, data has no boxes.)


PCM-Samples (18 x 32 new samples)   > TIME domain
    |
    v
----------------------
|polyphase Filterbank|
----------------------
    |
    v
32 equally spaced subbands           > FREQUENCY domain
(containing 18 new samples per 
 subband)
    |
    v
-----------------------
|modified Discrete    | (either with long windows of 36 samples
|Cosine Transform MDCT|  or short windows of 12 samples; mixed
-----------------------  windows for the different subbands are
    |                    possible using long windows on 2 lowest
    |                    subbands and short windows on upper 30 
    |                    subbands)
    |
    v
18 sub-subbands for long windows (of 36 samples, 50% overlap) or
3 x 6 sub-subbands for short windows (of 12 samples, 50% overlap)
Arranging them from low frequency to high frequency, with the 
results of the three successive short windows placed in a row, 
yields 576 frequency values (please note: values are integers, not 
reals):

1                                                              576
|----------------------------------------------------------------|
| pairs of                   | quadruples of     | pairs of      |
| bigvalues |value|<8191     | count1 |value|<=1 | rzero value=0 |
|----------------------------|-------------------|---------------|
| region0    |region1|region2|        |                 |
|------------|-------|-------|        |                 v
      |         |       |             |            NOT ENCODED
      v         v       v             v
-------------  ...     ...        - 1 huffman table (A or B) 
|noise alloc|                     - huffman code bits encoding
|& scalefacs| (according to         quadruples
-------------  psychoacoustic     - 1 signbit per encoded sample
      |        model)               (i.e. 4 signbits per quadruple)
      v         
- 1 huffman table each (choice of 32 tables)
- 4 scalefactor selection information for long windows only
- 1 scalefactor per scalefactor group (21 for long windows, 
  12 for short windows)
- huffman code bits encoding pairs
- 2 signbits per encoded pair
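
The partitioning of the 576 frequency values shown in the graph can
be sketched in Python as follows. This is only an illustration: the
function name and the example numbers are hypothetical. big_values
counts pairs and count1 counts quadruples, as in the graph; whatever
remains is the rzero part, which is not encoded.

```python
GRANULE_VALUES = 576  # frequency values per granule and channel


def partition_granule(big_values, count1):
    """Return the number of values in the bigvalues, count1 and rzero parts.

    big_values: number of pairs in the bigvalues part
    count1:     number of quadruples in the count1 part
    """
    n_big = 2 * big_values    # bigvalues are coded as pairs
    n_count1 = 4 * count1     # count1 values are coded as quadruples
    n_rzero = GRANULE_VALUES - n_big - n_count1
    if n_rzero < 0:
        raise ValueError("bigvalues and count1 parts exceed 576 values")
    return n_big, n_count1, n_rzero


# Hypothetical example: 200 pairs and 30 quadruples.
print(partition_granule(200, 30))  # (400, 120, 56)
```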

Scalefactor groups are built approximating critical-band widths.
The grouping depends on the samplingfrequency and is determined
via a table lookup. There are 
 21 scalefactor groups for long windows,
 3 x 12 = 36 scalefactor groups for short windows, and
 8 + 3 x 9 = 35 scalefactor groups for mixed windows.
The scalefactor groups are aligned with the region boundaries for
region 0-2.

Scalefactor groups are further combined into scalefactor classes
for coding of scalefactor selection information.
Scalefactor classes for long windows:
 class 0: scalefactor groups 0-5
 class 1: scalefactor groups 6-10
 class 2: scalefactor groups 11-15
 class 3: scalefactor groups 16-20
Scalefactor classes for short windows:
 class 0: scalefactor groups 0-5
 class 1: scalefactor groups 6-11

The scalefactor selection information specifies whether the
scalefactors of the first granule are being reused in the
second granule.

Additionally for each granule and channel, a scalefactor 
compression information is determined. It specifies (via table 
lookup) the number of bits being used for the scalefactors in the 
scalefactor groups.
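
As a sketch of this table lookup, the following Python fragment maps
a 4-bit scalefactor compression value to two bit widths and sums the
scalefactor bits of one granule and channel. The table is written
down here as recalled from the MPEG1 standard and should be checked
against it; the names SLEN and scalefactor_bits are hypothetical.
The sketch assumes that the first width applies to groups 0-10 (long
windows) or 0-5 (short windows) and the second width to the
remaining groups. It ignores mixed windows and the reuse of
scalefactors signalled by the scalefactor selection information.

```python
# (width for the lower groups, width for the upper groups) in bits,
# indexed by the 4-bit scalefactor compression value (MPEG1).
SLEN = [
    (0, 0), (0, 1), (0, 2), (0, 3),
    (3, 0), (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 2), (2, 3), (3, 1),
    (3, 2), (3, 3), (4, 2), (4, 3),
]


def scalefactor_bits(scalefac_compress, short_windows=False):
    """Bits spent on scalefactors for one granule and channel."""
    slen1, slen2 = SLEN[scalefac_compress]
    if short_windows:
        # 12 groups (0-5 and 6-11), three short windows each
        return 3 * (6 * slen1 + 6 * slen2)
    # 21 groups: 0-10 use slen1, 11-20 use slen2
    return 11 * slen1 + 10 * slen2


print(scalefactor_bits(15))        # 74
print(scalefactor_bits(15, True))  # 126
```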

The samples in the bigvalues frequency bands are nonuniformly 
quantized.  This is implemented by raising the absolute value of 
the samples to the power 0.75 before passing to a uniform 
quantizer. The samples are then Huffman coded with variable-length 
codewords.
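
A minimal Python sketch of this power-law quantization is given
below. It only illustrates the principle: the single "step"
parameter stands in for the step size that a real encoder derives
from the global gain and the scalefactors, and the function names
are hypothetical.

```python
import math


def quantize(x, step):
    """Nonuniform quantization: compress |x| with power 0.75, then round."""
    magnitude = round((abs(x) / step) ** 0.75)
    return int(math.copysign(magnitude, x))


def dequantize(q, step):
    """Inverse operation: expand |q| with power 4/3 and rescale."""
    return math.copysign(abs(q) ** (4.0 / 3.0) * step, q)


# Large values are quantized more coarsely than small ones:
# neighbouring integer codes lie further apart after expansion.
for code in (1, 2, 10, 11):
    print(code, dequantize(code, 1.0))
```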

The scalefactors are used to colour the quantization noise. For 
each scalefactor group, one scalefactor is calculated which is a 
factor by which the samples in the scalefactor group are scaled 
before quantization.

The Huffman coding process usually generates a different number of 
bits for each frame. The MPEG frames, however, contain a constant 
number of bits calculated from the desired bitrate for the stream. 
A locally variable bitrate is realized via a "bit-reservoir". Frames 
which do not require all of their bits for encoding donate these 
bits to the reservoir such that later frames may borrow bits from 
the reservoir. It is however ensured that no bits of a frame are 
stored in future frames.
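
The bookkeeping of such a reservoir can be sketched as follows. This
is a simplified model, not the procedure of any particular encoder:
the function name, the per-frame demands and the frame budget are
hypothetical. The cap of 511 bytes is an assumption that follows
from the 9-bit byte offset (main_data_begin) described in section 2.

```python
def simulate_reservoir(frame_bits, demands, max_reservoir_bits=511 * 8):
    """Return the bits granted to each frame and the final reservoir level.

    frame_bits: constant number of main data bits available per frame
    demands:    bits each frame would like to spend
    """
    reservoir = 0
    granted = []
    for demand in demands:
        available = frame_bits + reservoir   # own bits plus saved bits
        spend = min(demand, available)       # never borrow from the future
        reservoir = min(available - spend, max_reservoir_bits)
        granted.append(spend)
    return granted, reservoir


# The first frame saves 200 bits; the later frames use them up.
print(simulate_reservoir(1000, [800, 1100, 1300]))  # ([800, 1100, 1100], 0)
```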


2) Layout of frame

Each MPEG audio frame contains a header after which the encoded
audio data is stored. A layer 3 frame contains another header
called sideinformation followed by the main data:

------------------------
| Header               |  (32 bit)
------------------------
| optional CRC         |  (16 bit)
------------------------
| Sideinformation      |  (17 byte if mono, 32 byte if 2 channels)
------------------------
| main data            |
------------------------
| ancillary data       |  ("padding bits")
------------------------

Sideinformation:
- main_data_begin (9 bit): negative offset in byte specifying 
  where the main data for the frame begins. This implements the 
  bit-reservoir by allowing a frame to have its data stored in 
  previous frames.
- private bits (5 bit if mono, 3 bit if stereo): for private use 
  ("padding")
- scalefactor selection information (1 bit): specifies for the 
  scalefactor classes of each channel if the scalefactors for the 
  first granule are reused in the second granule
- part2_3_length (12 bit): specifies for each channel and granule 
  the number of bits used for scalefactors and Huffman-encoded 
  samples.
- big_values (9 bit): specifies the number of sample pairs in the 
  bigvalues region. The number of quadruples in the count1-region 
  must be deduced from part2_3_length and big_values during 
  decoding.
  The number of zeros in the rzero-region can then be calculated 
  using big_values and count1 and the number of frequency bands 
  (576).
- global_gain (8 bit): specifies for each channel and granule the 
  quantizer step size.
- scalefac_compress (4 bit): specifies for each channel and 
  granule the number of bits used for the scalefactors (via table 
  lookup).
- window_switching (1 bit): specifies for each channel and granule
  if pure long windows are used (value: 0) or mixed/short/
  short->long/long->short (value: 1).
- table_select (5 bit): specifies for each channel and granule and
  each of the three regions which Huffman table is used for 
  encoding.
- region0_count (4 bit): specifies for each channel and granule the
  size of region 0, given as a number of scalefactor groups (minus 
  one). The resulting number of samples depends on the window_type 
  and the mixed_window flag and the samplingfrequency and is found 
  via table lookup.
- region1_count (3 bit): specifies for each channel and granule 
  the size of region 1 in the same way. The resulting number of 
  samples depends on the window_type and the mixed_window flag and 
  the samplingfrequency and is found via table lookup. The remaining 
  samples from the big_values region belong to region 2.
- window_type (2 bit): specifies for each channel and granule the 
  type of window being used (long/short/long->short/short->long).
- mixed_window (1 bit): specifies for each channel and granule if
  mixed windows are used (value: 1), i.e. the lowest two subbands
  are transformed with long windows, the upper 30 with short 
  windows.
- shortwin_gain (3 bit): specifies for each channel and granule and
  short window a gain offset from the global_gain
- preflag (1 bit): specifies for each channel and granule 
  additional high frequency amplification.
- scalefac_scale (1 bit): specifies for each channel and granule 
  the quantization of the scalefactors.
- count1_table_select (1 bit): specifies for each channel and 
  granule which Huffman table is used for the count1-region.
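
As a consistency check, the field widths listed above can be summed
up to reproduce the sideinformation sizes for MPEG1 (17 byte mono,
32 byte stereo). The sketch below makes two assumptions that go
beyond the list: window_type, mixed_window and shortwin_gain are
only present if window_switching is 1, and in that case only two
table_select fields are stored while region0_count and
region1_count are omitted. The function names are hypothetical.

```python
def window_dependent_bits(window_switching):
    """Bits of the fields whose presence depends on window_switching."""
    if window_switching:
        # window_type + mixed_window + 2 x table_select + 3 x shortwin_gain
        return 2 + 1 + 2 * 5 + 3 * 3
    # 3 x table_select + region0_count + region1_count
    return 3 * 5 + 4 + 3


def side_info_bits(channels, window_switching=0, granules=2):
    """Total number of sideinformation bits in an MPEG1 Layer 3 frame."""
    per_granule_channel = (
        12 + 9 + 8 + 4 + 1  # part2_3_length ... window_switching
        + window_dependent_bits(window_switching)
        + 1 + 1 + 1         # preflag, scalefac_scale, count1_table_select
    )
    private_bits = 5 if channels == 1 else 3
    scfsi_bits = 4 * channels  # 1 bit per scalefactor class and channel
    return (9 + private_bits + scfsi_bits
            + granules * channels * per_granule_channel)


print(side_info_bits(1) // 8, side_info_bits(2) // 8)  # 17 32
```

Both branches of window_dependent_bits add up to 22 bit, which is
why the sideinformation has a fixed size regardless of the window
type in use.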

Main Data:
- Granule 1: scalefactors + Huffman-encoded sample data
- Granule 2: scalefactors + Huffman-encoded sample data

where scalefac_l (0-4 bit) contains for each channel and granule
      and scalefactor band the scalefactors of long windows; 
and   scalefac_s (0-4 bit) contains for each channel and granule
      and scalefactor band three scalefactors for short windows;

also Huffman encoded data contain:
     0-19 bit in the bigvalues region + 2 x 1-13 linbits + 2 
              signbits,
     1-6  bit in the count1 region + 4 signbits.
     (linbits are required if the absolute value of a sample in 
      the pair is >=15)

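The framesize depends on the samplingfrequency of the PCM samples
and on the desired bitrate for the MPEG audio stream. For MPEG1 it
may be calculated in bytes (with the bitrate in bit/s and the
samplingfrequency in Hz) via the following formula:
framesize = 144 * bitrate / samplingfrequency
The result is rounded down; a frame whose padding bit is set in the
header is one byte longer. MPEG2 frames hold only one granule, so
the factor is 72 instead of 144. Allowed bitrates and 
samplingfrequencies differ between MPEG1 and MPEG2.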
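
A small worked example of the framesize formula, as a Python sketch.
The function name and the padding parameter are hypothetical; the
result is taken to be in bytes, with the bitrate in bit/s and the
samplingfrequency in Hz, and the factor 144 is assumed to apply to
MPEG1 frames.

```python
def frame_size_bytes(bitrate, sampling_frequency, padding=0):
    """Size of one MPEG1 Layer 3 frame in bytes (integer division)."""
    return 144 * bitrate // sampling_frequency + padding


print(frame_size_bytes(128000, 44100))     # 417
print(frame_size_bytes(128000, 44100, 1))  # 418
print(frame_size_bytes(128000, 48000))     # 384
```

At 44100 Hz the division does not come out even (417.96 byte), so
frames of 417 and 418 byte have to be mixed in order to hold the
nominal bitrate.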