MPEG Video Encoding
MPEG is the compression system of choice for
DVD-Video and Video CD.
MPEG (Moving Picture Experts Group) is the ISO committee
which is responsible for defining the various MPEG video specifications.
- MPEG-1, originally defined in 1992, was aimed at full-screen
video stored on a CD-ROM. It has since been incorporated into the Video CD
specification and is used on CD-ROMs.
- MPEG-2 came later and was intended for digital television
applications. It is also used for DVD-Video.
It supports interlaced video and variable bit rate (VBR) encoding.
- MPEG-3 was intended for HDTV, but this was later incorporated
into MPEG-2.
- MPEG-4 is intended for video conferencing,
Internet distribution and similar
applications using low bandwidths.
Video comprises a sequence of still pictures, or frames, which, without
compression, would require a data rate far too high even for DVD (see CCIR 601).
Various methods are used to compress the video information contained in these frames.
Each frame is divided into an array of macroblocks, each 16 x 16 pixels in size and
comprising 4 blocks of Y (luminance) and 1 block each of U and V (colour) information.
The colour information therefore has half the horizontal and vertical resolution of the
luminance information (see CCIR 601).
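The macroblock layout above can be turned into a quick calculation. The following is a minimal sketch only: the 720 x 576 frame size, the 25 frames per second rate and the 8 bits per sample are illustrative assumptions rather than figures from the text, and the function names are hypothetical.

```python
# Assumed, illustrative picture format (not taken from the text above).
WIDTH, HEIGHT, FPS = 720, 576, 25

MB_SIZE = 16     # a macroblock covers 16 x 16 pixels
BLOCK_SIZE = 8   # each Y, U or V block holds 8 x 8 samples


def macroblock_grid(width, height):
    """Return (columns, rows) of macroblocks, rounding up at the picture edges."""
    cols = -(-width // MB_SIZE)
    rows = -(-height // MB_SIZE)
    return cols, rows


def samples_per_macroblock():
    """4 Y blocks + 1 U block + 1 V block, each of 8 x 8 samples."""
    return (4 + 1 + 1) * BLOCK_SIZE * BLOCK_SIZE


def raw_bits_per_second(width, height, fps, bits_per_sample=8):
    """Uncompressed data rate when every macroblock is stored sample by sample."""
    cols, rows = macroblock_grid(width, height)
    return cols * rows * samples_per_macroblock() * bits_per_sample * fps


if __name__ == "__main__":
    cols, rows = macroblock_grid(WIDTH, HEIGHT)
    print("macroblocks per frame:", cols * rows)
    print("samples per macroblock:", samples_per_macroblock())
    print("raw Mbit/s:", raw_bits_per_second(WIDTH, HEIGHT, FPS) / 1e6)
```

Under these assumptions the uncompressed rate comes out well above 100 Mbit/s, which illustrates why compression is needed before video of this kind can be stored on a disc.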
The Y, U and V information in each macroblock is compressed using Discrete Cosine
Transform (DCT) encoding and Motion Compensation.
Discrete Cosine Transform
Discrete Cosine Transform is used to reduce the data required to represent a single
frame. Each of the 6 blocks per macroblock is an 8 x 8 array of samples, which
the DCT (a transform closely related to the Fourier transform) converts into
an 8 x 8 array of frequency coefficients. The coefficients are quantized and
then read out in diagonal lines (see diagram), which maps the 8 x 8 block to
a 1 x 64 sequence running roughly from low to high frequency. Run-length and
Huffman coding are then used to code the DCT data. If bandwidth is limited,
the higher frequency components are discarded, giving a 'fuzzy' result. MPEG
encoding can be allowed to degrade in this way to allow for lower bandwidths.
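The block-coding steps can be sketched as follows. In the usual ordering the transform is applied to the 8 x 8 block first, and the coefficients are then quantized and read out along diagonals into a 1 x 64 list. This is a simplified illustration, not the MPEG reference algorithm: the single uniform `step` value stands in for the real quantization tables, the entropy coding stage is omitted, and all function names are hypothetical.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def zigzag_order(n=N):
    """(row, col) positions visited along anti-diagonals, alternating direction."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def encode_block(block, step=16):
    """DCT, uniform quantization (assumed step size), then diagonal scan to 1 x 64."""
    coeffs = dct_2d(block)
    return [round(coeffs[r][c] / step) for r, c in zigzag_order()]


if __name__ == "__main__":
    flat = [[128] * N for _ in range(N)]   # a block with no detail at all
    print(encode_block(flat))              # only the first (DC) value is non-zero
```

A completely flat block produces a single non-zero value at the start of the scan followed by a run of zeros, which is the kind of sequence that run-length and Huffman coding represent compactly.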
Motion Compensation
Motion compensation is used to predict the values of
pixels by relocating a block of pixels from the previous picture. This
motion is described by a 2-dimensional vector giving the movement of the
block from its last position. This is particularly useful for pans and
other similar movement. Most prediction errors will be small since pixel
values do not have large changes within a small area. The error values will
therefore compress better than the values themselves. Quantization of the
prediction errors further reduces the information.
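As a rough illustration of the idea, the sketch below searches the previous picture for the block that best matches a block in the current picture and then forms the prediction error. The exhaustive search, the sum-of-absolute-differences cost, the 4 x 4 block size and the tiny test pictures are all assumptions chosen to keep the example short. MPEG defines how vectors are decoded, not how an encoder finds them.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a shifted block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def find_motion_vector(cur, ref, bx, by, size=4, search=2):
    """Exhaustive search for the (dx, dy) offset into ref that best predicts the block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would fall outside the reference picture.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def residual(cur, ref, bx, by, mv, size=4):
    """Prediction error: the current block minus the relocated reference block."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(size)] for y in range(size)]


# Synthetic test pictures: cur is ref panned one pixel to the right.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in ref]

if __name__ == "__main__":
    mv = find_motion_vector(cur, ref, 2, 2)
    print("motion vector:", mv)
    print("residual:", residual(cur, ref, 2, 2, mv))
```

For a pure pan the best vector points one pixel back into the previous picture and the residual is all zeros, so almost nothing remains to be coded for that block.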
MPEG Frame Types
Sequences of MPEG video comprise GOPs. Each GOP (Group of Pictures)
comprises video frames of three different types. These are:
- I-frames (Intra coded frames) use DCT encoding only to
compress a single frame without reference to any other frame in the sequence.
Typically I-frames are encoded with 2 bits per pixel on
average. Since the initial data comprises 4 bytes of Y, 1 byte of U and 1 byte of V (total
6 bytes = 48 bits) for every 4 pixels, or 12 bits per pixel, this gives a compression ratio
of about 6:1. I-frames are inserted every 12 to 15 frames and are used to
start a sequence, allowing video to be played from random positions and for fast
forward/reverse. Decoding of video can start only at an I-frame, not at a P-frame or B-frame.
- P-frames (Predicted frames)
are coded as differences from the last I or P frame. The new P-frame is first predicted by
taking the last I or P frame and 'predicting' the values of each new pixel, and the
differences between the predicted and actual values are then encoded. P-frames use
motion compensation and DCT encoding. As a result P-frames give a better compression
ratio than I-frames, by an amount that depends on the motion present.
Most prediction errors will be small since pixel values do not have large changes
within a small area, so the error values compress better than the values
themselves. Quantization of the prediction errors further reduces the information.
- B-frames (Bidirectional
frames) are coded as differences from the last or next I or P frame. B-frames use
prediction as for P-frames, but for each block either the previous I or P frame is used or
the next I or P frame. B-frames use motion compensation and DCT
encoding. This gives improved compression compared with P-frames,
because it is possible to choose for every macroblock whether the previous or next frame
is taken for comparison. Because B-frames require both previous and subsequent
frames for correct decoding, the order of MPEG frames as read is not the same as the
displayed order.
These frames are interleaved in a sequence such as
IBBPBBP.. or IBPBPBPBP.
The former is more difficult to encode but provides a higher compression ratio than the
latter.
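Because a B-frame cannot be decoded until the I or P frame that follows it has arrived, the frames are stored in a different order from the one in which they are shown. The sketch below illustrates one way to derive that stored order from a display-order pattern. The function name and the labelling of frames by display index are illustrative assumptions.

```python
def coded_order(display):
    """Reorder frame types from display order to the order in which they are stored.

    Each B-frame is held back until the next I or P frame, which it needs as a
    reference, has been emitted.
    """
    out = []
    pending_b = []
    for index, frame_type in enumerate(display):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            out.append((index, frame_type))
            out.extend(pending_b)
            pending_b = []
    # Trailing B-frames would really wait for a reference frame in the next GOP.
    out.extend(pending_b)
    return out


if __name__ == "__main__":
    labels = [f"{t}{i}" for i, t in coded_order("IBBPBBP")]
    print(" ".join(labels))   # I0 P3 B1 B2 P6 B4 B5
```

For the IBBPBBP pattern each P-frame moves ahead of the two B-frames that precede it on screen, so the decoder always holds both reference frames before it meets a B-frame.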