ISO/IEC 13818-2 Part 2: Video - Standard Text -

Available parts of the standard


Internal Information

Main Referee:

Heiner Schomaker

State of Entry:

Incomplete

Last update:

Feb. 25, 1994

Primary Source / Published in:

Document No.:
ISO/IEC JTC 1/SC 29 N 635
Title:
Working Document for ISO/IEC CD 13818-2: Information technology -
Generic coding of moving pictures and associated audio information -
Part 2 : Video
[ISO/IEC JTC 1/SC 29/WG 11 N 635] Date: 1993-11-15

Document Parts

Titlepage:

INTERNATIONAL ORGANISATION FOR STANDARDISATION

ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC1/SC29

CODING OF MOVING PICTURES AND ASSOCIATED AUDIO


ISO/IEC JTC1/SC29

WG11/602

November 1993, Seoul




INFORMATION TECHNOLOGY -


GENERIC CODING OF MOVING PICTURES AND ASSOCIATED AUDIO

Recommendation H.262

ISO/IEC 13818-2

Committee Draft


Draft of: November 5, 1993, 9:10

Foreword:

The ITU-T (the ITU Telecommunication Standardisation Sector) is a permanent organ of the International Telecommunication Union (ITU). The ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to developing telecommunication standards on a world-wide basis.

The World Telecommunication Standardisation Conference, which meets every four years, establishes the program of work arising from the review of existing questions and new questions among other things. The approval of new or revised Recommendations by members of the ITU-T is covered by the procedure laid down in the ITU-T Resolution No. 1 (Helsinki 1993). The proposal for Recommendation is accepted if 70% or more of the replies from members indicate approval.

ISO (the International Organisation for Standardisation) and IEC (the International Electrotechnical Commission) form the specialised system for world-wide standardisation. National Bodies that are members of ISO and IEC participate in the development of International Standards through technical committees established by the respective organisation to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organisations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.

In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC1. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75% of the national bodies casting a vote.

This specification is a committee draft that is being submitted for approval to the ITU-T and to ISO/IEC JTC1/SC29. It was prepared jointly by SC29/WG11, also known as MPEG (Moving Picture Experts Group), and the Experts Group for ATM Video Coding in ITU-T SG15. MPEG was formed in 1988 to establish standards for coding of moving pictures and associated audio for various applications such as digital storage media, distribution and communication. The Experts Group for ATM Video Coding was formed in 1990 to develop video coding standard(s) appropriate for B-ISDN using ATM transport.

In this specification Annex A, Annex B and Annex C contain normative requirements and are an integral part of this specification. Annex D, Annex E, Annex F and Annex G are informative and contain no normative requirements.

ISO/IEC

This International Standard is published in four Parts.

13818-1 systems
specifies the system coding of the specification. It defines a multiplexed structure for combining audio and video data and means of representing the timing information needed to replay synchronised sequences in real-time.

13818-2 video
specifies the coded representation of video data and the decoding process required to reconstruct pictures.

13818-3 audio
specifies the coded representation of audio data.

13818-4 conformance
specifies the procedures for determining the characteristics of coded bitstreams and for testing compliance with the requirements stated in 13818-1, 13818-2 and 13818-3.


Contents:


Introduction:

I.1 Purpose

This Part of this specification was developed in response to the growing need for a generic coding method of moving pictures and of associated sound for various applications such as digital storage media, television broadcasting and communication. The use of this specification means that motion video can be manipulated as a form of computer data and can be stored on various storage media, transmitted and received over existing and future networks and distributed on existing and future broadcasting channels.

I.2 Application

The applications of this specification cover, but are not limited to, such areas as listed below:

BSS
Broadcasting Satellite Service (to the home)
CATV
Cable TV Distribution on optical networks, copper, etc.
CDAD
Cable Digital Audio Distribution
DAB
Digital Audio Broadcasting (terrestrial and satellite broadcasting)
DTTB
Digital Terrestrial Television Broadcast
EC
Electronic Cinema
ENG
Electronic News Gathering (including SNG, Satellite News Gathering)
FSS
Fixed Satellite Service (e.g. to head ends)
HTT
Home Television Theatre
IPC
Interpersonal Communications (videoconferencing, videophone, etc.)
ISM
Interactive Storage Media (optical disks, etc.)
MMM
Multimedia Mailing
NCA
News and Current Affairs
NDB
Networked Database Services (via ATM, etc.)
RVS
Remote Video Surveillance
SSM
Serial Storage Media (digital VTR, etc.)

I.3 Profiles and levels

This specification is intended to be generic in the sense that it serves a wide range of applications, bit rates, resolutions, qualities and services. Applications should cover, among other things, digital storage media, television broadcasting and communications. In the course of creating this specification, various requirements from typical applications have been considered, necessary algorithmic elements have been developed, and they have been integrated into a single syntax. Hence this specification will facilitate the bitstream interchange among different applications.

Considering the practicality of implementing the full syntax of this specification, however, a limited number of subsets of the syntax are also stipulated by means of "profile" and "level". These and other related terms are formally defined in clause 3 of this specification.

A "profile" is a defined sub-set of the entire bitstream syntax that is defined by this specification. Within the bounds imposed by the syntax of a given profile it is still possible to require a very large variation in the performance of encoders and decoders depending upon the values taken by parameters in the bitstream. For instance it is possible to specify frame sizes as large as (approximately) 2[14] pels wide by 2[14] lines high. It is currently neither practical nor economic to implement a decoder capable of dealing with all possible frame sizes.

In order to deal with this problem "levels" are defined within each profile. A level is a defined set of constraints imposed on parameters in the bitstream. These constraints may be simple limits on numbers. Alternatively they may take the form of constraints on arithmetic combinations of the parameters (e.g. frame width multiplied by frame height multiplied by frame rate).
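A level check of the kind described above can be sketched as follows. This is a minimal illustration, not a normative conformance test: the function name `within_level` and all numeric limits are invented for the example and do not correspond to the defined limits of any real profile or level.

```python
# Hypothetical sketch of a level check: a level constrains individual
# parameters (simple limits on numbers) and arithmetic combinations of
# them (here, frame width * frame height * frame rate). All limits
# below are invented for illustration only.

def within_level(width, height, frame_rate,
                 max_width=720, max_height=576, max_pel_rate=10_368_000):
    """Return True if the picture parameters satisfy this made-up level."""
    if width > max_width or height > max_height:
        return False                          # simple limits on numbers
    # constraint on an arithmetic combination of parameters
    return width * height * frame_rate <= max_pel_rate

print(within_level(720, 576, 25))   # True: 10 368 000 pels/s, at the bound
print(within_level(720, 576, 30))   # False: combined constraint exceeded
```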

Bitstreams complying with this specification use a common syntax. In order to achieve a sub-set of the complete syntax, flags and parameters are included in the bitstream that signal the presence or otherwise of syntactic elements that occur later in the bitstream. In order to specify constraints on the syntax (and hence define a profile) it is thus only necessary to constrain the values of these flags and parameters that specify the presence of later syntactic elements.

I.4 The scalable and the non-scalable syntax

The full syntax can be divided into two major categories. The first is the non-scalable syntax, which is structured as a superset of the syntax defined in ISO/IEC 11172-2; its main feature is the addition of compression tools for interlaced video signals. The second is the scalable syntax, the key property of which is to enable the reconstruction of useful video from pieces of a total bitstream. This is achieved by structuring the total bitstream in two or more layers, starting from a standalone base layer and adding a number of enhancement layers. The base layer can use the non-scalable syntax, or in some situations conform to the ISO/IEC 11172-2 syntax.

I.4.1 Overview of the non-scalable syntax

The coded representation defined in the non-scalable syntax achieves a high compression ratio while preserving good image quality. The algorithm is not lossless as the exact pixel values are not preserved during coding. The choice of the techniques is based on the need to balance high image quality and compression ratio with the requirement for random access to the coded bitstream. Obtaining good image quality at the bitrates of interest demands very high compression, which is not achievable with intra picture coding alone. The need for random access, however, is best satisfied with pure intra picture coding. This requires a careful balance between intra- and interframe coding and between recursive and non-recursive temporal redundancy reduction.

A number of techniques are used to achieve high compression. The algorithm first uses block-based motion compensation to reduce the temporal redundancy. Motion compensation is used both for causal prediction of the current picture from a previous picture, and for non-causal, interpolative prediction from past and future pictures. Motion vectors are defined for each 16-pixel by 16-line region of the picture. The difference signal, i.e., the prediction error, is further compressed using the discrete cosine transform (DCT) to remove spatial correlation before it is quantised in an irreversible process that discards the less important information. Finally, the motion vectors are combined with the residual DCT information, and encoded using variable length codes.
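The prediction-plus-residual idea described above can be sketched with a toy example. The data, block size and function name are invented for illustration; real motion compensation operates on 16-pel by 16-line regions with sub-pel accuracy and coded residuals.

```python
# Minimal sketch (invented data): the encoder transmits a motion vector
# plus the prediction error; the decoder adds the error back to the
# motion-compensated prediction to reconstruct the block exactly.

def predict(reference, mv):
    """Fetch a 2x2 predicted block: reference pels offset by the motion vector."""
    dx, dy = mv
    return [[reference[y + dy][x + dx] for x in range(2)] for y in range(2)]

reference = [[10, 20, 30],
             [40, 50, 60],
             [70, 80, 90]]
current   = [[21, 31],
             [51, 61]]

mv = (1, 0)                                   # found by the encoder's search
pred = predict(reference, mv)                 # [[20, 30], [50, 60]]
residual = [[c - p for c, p in zip(cr, pr)]   # prediction error: what is coded
            for cr, pr in zip(current, pred)]
recon = [[p + r for p, r in zip(pr, rr)]      # decoder-side reconstruction
         for pr, rr in zip(pred, residual)]
print(recon == current)   # True: prediction + residual restores the block
```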

I.4.1.1 Temporal processing

Because of the conflicting requirements of random access and highly efficient compression, three main picture types are defined. Intra coded pictures (I-Pictures) are coded without reference to other pictures. They provide access points to the coded sequence where decoding can begin, but are coded with only moderate compression. Predictive coded pictures (P-Pictures) are coded more efficiently using motion compensated prediction from a past intra or predictive coded picture and are generally used as a reference for further prediction. Bidirectionally-predictive coded pictures (B-Pictures) provide the highest degree of compression but require both past and future reference pictures for motion compensation. Bidirectionally-predictive coded pictures are never used as references for prediction. The organisation of the three picture types in a sequence is very flexible. The choice is left to the encoder and will depend on the requirements of the application. Figure 0-1 illustrates the relationship among the three different picture types.
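The consequence of B-picture prediction is that coded order differs from display order: a B-picture needs both its past and future references decoded before it can be decoded, so the future reference is transmitted first. The sketch below illustrates this reordering; the I/P/B pattern chosen is one common arrangement, not one mandated by the specification.

```python
# Sketch of display-order to coded-order reordering: each reference
# picture (I or P) is moved ahead of the B-pictures that precede it in
# display order, since those B-pictures predict from it.

display_order = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]

def to_coded_order(pictures):
    """Emit each reference before the B-pictures that depend on it."""
    coded, pending_b = [], []
    for pic in pictures:
        if pic[0] == "B":
            pending_b.append(pic)     # hold B until its future reference is sent
        else:
            coded.append(pic)         # send the reference first
            coded.extend(pending_b)   # then the held B-pictures
            pending_b = []
    return coded + pending_b

print(to_coded_order(display_order))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```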

Figure 0-1 Example of temporal picture structure

I.4.1.2 Coding interlaced video

Each frame of interlaced video consists of two fields which are separated by one field-period. The specification allows either the frame to be encoded as one picture or the two fields to be encoded as two pictures. Frame encoding or field encoding can be adaptively selected on a frame-by-frame basis. Frame encoding is typically preferred when the video scene contains significant detail with limited motion. Field encoding, in which the second field can be predicted from the first, works better when there is fast movement.

I.4.1.3 Motion representation - macroblocks

As in ISO/IEC 11172-2, the choice of 16 by 16 macroblocks for the motion-compensation unit is a result of the trade-off between the coding gain provided by using motion information and the overhead needed to store it. Each macroblock can be temporally predicted in one of a number of different ways. For example, in frame encoding, the prediction from the previous reference frame can itself be either frame-based or field-based. Depending on the type of the macroblock, motion vector information and other side information is encoded with the compressed prediction error signal in each macroblock. The motion vectors are encoded differentially with respect to the last encoded motion vectors using variable length codes. The maximum length of the vectors that may be represented can be programmed, on a picture-by-picture basis, so that the most demanding applications can be met without compromising the performance of the system in more normal situations.

It is the responsibility of the encoder to calculate appropriate motion vectors. The specification does not specify how this should be done.

I.4.1.4 Spatial redundancy reduction

Both original pictures and prediction error signals have high spatial redundancy. This specification uses a block-based DCT method with visually weighted quantisation and run-length coding. After motion compensated prediction or interpolation, the residual picture is split into 8 by 8 blocks. These are transformed into the DCT domain where they are weighted before being quantised. After quantisation many of the coefficients are zero in value and so two-dimensional run-length and variable length coding is used to encode the remaining coefficients efficiently.
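The quantise-then-run-length step that follows the DCT can be sketched with made-up coefficients. A 4x4 block and a plain row scan stand in for the real 8x8 block, zigzag scan and VLC tables, and the quantiser below is a crude division rather than the visually weighted quantisation the specification defines.

```python
# Sketch: quantisation zeroes most high-frequency DCT coefficients, and
# the survivors are coded as (run-of-zeros, level) pairs along a scan.

def quantise(coeffs, step):
    return [[c // step for c in row] for row in coeffs]

def run_length(scan):
    """(run, level) pairs; trailing zeros become an end-of-block in practice."""
    pairs, run = [], 0
    for c in scan:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs

coeffs = [[140, 33, 5, 1],       # invented DCT coefficients: energy is
          [ 36,  9, 2, 0],       # concentrated in the low frequencies
          [  6,  1, 0, 0],
          [  1,  0, 0, 0]]
q = quantise(coeffs, 8)          # most high-frequency terms go to zero
scan = [q[y][x] for y in range(4) for x in range(4)]  # row scan for brevity
print(run_length(scan))          # [(0, 17), (0, 4), (2, 4), (0, 1)]
```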

I.4.1.5 Chroma formats

In addition to the 4:2:0 format supported in ISO/IEC 11172-2 this specification supports 4:2:2 and 4:4:4 chroma formats.

I.4.2 Scalable extensions

The scalability tools in this specification are designed to support applications beyond those supported by single layer video. Among the noteworthy application areas addressed are video telecommunications, video on asynchronous transfer mode (ATM) networks, interworking of video standards, video service hierarchies with multiple spatial, temporal and quality resolutions, HDTV with embedded TV, and systems allowing migration to higher temporal resolution HDTV. Although a simple solution to scalable video is the simulcast technique, which is based on transmission/storage of multiple independently coded reproductions of video, a more efficient alternative is scalable video coding, in which the bandwidth allocated to a given reproduction of video can be partially reutilised in coding of the next reproduction of video. In scalable video coding, it is assumed that, given an encoded bitstream, decoders of various complexities can decode and display appropriate reproductions of coded video. A scalable video encoder is likely to have increased complexity when compared to a single layer encoder. However, this standard provides several different forms of scalability that address nonoverlapping applications with corresponding complexities. The basic scalability tools offered are: data partitioning, SNR scalability, spatial scalability and temporal scalability. Moreover, combinations of these basic scalability tools are also supported and are referred to as hybrid scalability. In the case of basic scalability, two layers of video, referred to as the lower layer and the enhancement layer, are allowed, whereas in hybrid scalability up to three layers are supported. The following tables provide a few example applications of various scalabilities.

Table 0-? Applications of SNR scalability

Lower layer           | Enhancement layer          | Application
----------------------+----------------------------+---------------------------------
ITU-R-601             | Same resolution and format | Two quality service
                      | as lower layer             | for Standard TV
High Definition       | Same resolution and format | Two quality service
                      | as lower layer             | for HDTV
4:2:0 High Definition | 4:2:2 chroma simulcast     | Video production / distribution

Table 0-? Applications of spatial scalability

Base          | Enhancement   | Application
--------------+---------------+-------------------------------------------
prog (30Hz)   | prog (30Hz)   | CIF/SCIF compatibility or scalability
interl (30Hz) | interl (30Hz) | HDTV/SDTV scalability
prog (30Hz)   | interl (30Hz) | ISO/IEC 11172-2 compatibility with this
              |               | specification
interl (30Hz) | prog (60Hz)   | Migration to HR prog HDTV

Table 0-? Applications of temporal scalability

Base          | Enhancement   | Higher      | Application
--------------+---------------+-------------+---------------------------
prog (30Hz)   | prog (30Hz)   | prog (60Hz) | Migration to HR prog HDTV
interl (30Hz) | interl (30Hz) | prog (60Hz) | Migration to HR prog HDTV

I.4.2.1 Spatial scalable extension

Spatial scalability is a tool intended for use in video applications involving telecommunications, interworking of video standards, video database browsing, interworking of HDTV and TV etc., i.e., video systems with the primary common feature that a minimum of two layers of spatial resolution are necessary. Spatial scalability involves generating two spatial resolution video layers from a single video source such that the lower layer is coded by itself to provide the basic spatial resolution and the enhancement layer employs the spatially interpolated lower layer and carries the full spatial resolution of the input video source. The lower and the enhancement layers may either both use the coding tools in this specification, or use the ISO/IEC 11172-2 standard for the lower layer and this specification for the enhancement layer. The latter case achieves a further advantage by facilitating interworking between video coding standards. Moreover, spatial scalability offers flexibility in choice of video formats to be employed in each layer. An additional advantage of spatial scalability is its ability to provide resilience to transmission errors as the more important data of the lower layer can be sent over a channel with better error performance, while the less critical enhancement layer data can be sent over a channel with poor error performance.

I.4.2.2 SNR scalable extension

SNR scalability is a tool intended for use in video applications involving telecommunications, video services with multiple qualities, standard TV and HDTV, i.e., video systems with the primary common feature that a minimum of two layers of video quality are necessary. SNR scalability involves generating two video layers of the same spatial resolution but different video qualities from a single video source such that the lower layer is coded by itself to provide the basic video quality and the enhancement layer is coded to enhance the lower layer. The enhancement layer, when added back to the lower layer, regenerates a higher quality reproduction of the input video. The lower and the enhancement layers may either both use this specification, or use the ISO/IEC 11172-2 standard for the lower layer and this specification for the enhancement layer. An additional advantage of SNR scalability is its ability to provide a high degree of resilience to transmission errors as the more important data of the lower layer can be sent over a channel with better error performance, while the less critical enhancement layer data can be sent over a channel with poor error performance.

I.4.2.3 Temporal scalable extension

Temporal scalability is a tool intended for use in a range of diverse video applications from telecommunications to HDTV for which migration to higher temporal resolution systems from that of lower temporal resolution systems may be necessary. In many cases, the lower temporal resolution video systems may be either the existing systems or the less expensive early generation systems, with the motivation of introducing more sophisticated systems gradually. Temporal scalability involves partitioning of video frames into layers, wherein the lower layer is coded by itself to provide the basic temporal rate and the enhancement layer is coded with temporal prediction with respect to the lower layer; these layers, when decoded and temporally multiplexed, yield the full temporal resolution of the video source. The lower temporal resolution systems may only decode the lower layer to provide basic temporal resolution, whereas more sophisticated systems of the future may decode both layers and provide high temporal resolution video while maintaining interworking with earlier generation systems. An additional advantage of temporal scalability is its ability to provide resilience to transmission errors as the more important data of the lower layer can be sent over a channel with better error performance, while the less critical enhancement layer can be sent over a channel with poor error performance.
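The temporal multiplexing of the two layers can be sketched in a few lines. The frame labels and the alternating split are invented for illustration; the specification does not fix how frames are assigned to layers.

```python
# Sketch: the lower layer carries alternate frames at the basic temporal
# rate, the enhancement layer carries the remaining frames; interleaving
# the two decoded layers restores the full temporal resolution.

frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
lower = frames[0::2]        # basic temporal rate (e.g. 30 Hz)
enhancement = frames[1::2]  # coded with prediction from the lower layer

merged = [f for pair in zip(lower, enhancement) for f in pair]
print(merged == frames)     # True: full rate (e.g. 60 Hz) recovered
```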

I.4.2.4 Data partitioning extension

Data partitioning is a tool intended for use when two channels are available for transmission and/or storage of a video bitstream, as may be the case in ATM networks, terrestrial broadcast, magnetic media, etc. The bitstream is partitioned between these channels such that more critical parts of the bitstream (such as headers, motion vectors, DC coefficients) are transmitted in the channel with the better error performance, and less critical data (such as higher DCT coefficients) is transmitted in the channel with poor error performance. Thus, degradation due to channel errors is minimised since the critical parts of a bitstream are better protected. Data from neither channel can be decoded by a decoder that is not designed for data partitioned bitstreams.
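The split by criticality can be sketched as a simple partition. The element names and the critical/non-critical labelling are illustrative; the specification defines the actual partition boundary within the syntax.

```python
# Sketch of data partitioning: critical syntax elements go to the
# better-protected channel, the rest to the other. Both partitions are
# needed to reconstruct the complete bitstream.

elements = [("header", True), ("motion_vector", True),
            ("dc_coeff", True), ("ac_coeff_1", False), ("ac_coeff_2", False)]

channel0 = [name for name, critical in elements if critical]      # protected
channel1 = [name for name, critical in elements if not critical]  # less protected

print(channel0)  # ['header', 'motion_vector', 'dc_coeff']
print(channel1)  # ['ac_coeff_1', 'ac_coeff_2']
```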


Scope:

This Recommendation | International Standard specifies the coded representation of picture information for digital storage media and digital video communication and specifies the decoding process. The representation supports constant bitrate transmission, variable bitrate transmission, random access, channel hopping, scalable decoding, bitstream editing, as well as special functions such as fast forward playback, slow motion, pause and still pictures. This Recommendation | International Standard is compatible with ISO/IEC 11172-2 and upward or downward compatible with EDTV, HDTV, SDTV formats. This Recommendation | International Standard is primarily applicable to digital storage media, video broadcast and communication. The storage media may be directly connected to the decoder, or via communications means such as busses, LANs, or telecommunications links.


Field of Applications:

The applications of this specification cover, but are not limited to, such areas as listed below:

BSS
Broadcasting Satellite Service (to the home)
CATV
Cable TV Distribution on optical networks, copper, etc.
CDAD
Cable Digital Audio Distribution
DAB
Digital Audio Broadcasting (terrestrial and satellite broadcasting)
DTTB
Digital Terrestrial Television Broadcast
EC
Electronic Cinema
ENG
Electronic News Gathering (including SNG, Satellite News Gathering)
FSS
Fixed Satellite Service (e.g. to head ends)
HTT
Home Television Theatre
IPC
Interpersonal Communications (videoconferencing, videophone, etc.)
ISM
Interactive Storage Media (optical disks, etc.)
MMM
Multimedia Mailing
NCA
News and Current Affairs
NDB
Networked Database Services (via ATM, etc.)
RVS
Remote Video Surveillance
SSM
Serial Storage Media (digital VTR, etc.)
NOTE: From "Chapter I.2 Application"

Relationships to other Standards:

2 Normative references

The following ITU-T Recommendations and International Standards contain provisions which, through reference in this text, constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent editions of the standards indicated below. Members of IEC and ISO maintain registers of currently valid International Standards. The TSB (Telecommunication Standardisation Bureau) maintains a list of currently valid ITU-T Recommendations.


Definitions:

For the purposes of this Recommendation | International Standard, the following definitions apply.

3.1 AC coefficient:
Any DCT coefficient for which the frequency in one or both dimensions is non-zero.

3.2 backward compatibility:
A new coding standard is backward compatible with an existing coding standard if existing decoders (designed to operate with the existing coding standard) are able to continue to operate by decoding all or part of a bitstream produced according to the new coding standard.

3.3 backward motion vector:
A motion vector that is used for motion compensation from a reference picture at a later time in display order.

3.4 bidirectionally predictive-coded picture; B-picture:
A picture that is coded using motion compensated prediction from past and/or future reference pictures.

3.5 bitrate:
The rate at which the compressed bitstream is delivered from the storage medium to the input of a decoder.

3.6 block:
An 8-row by 8-column matrix of pels, or 64 DCT coefficients (source, quantised or dequantised).

3.7 bottom field:
One of two fields that comprise a frame of interlaced video. Each line of a bottom field is spatially located immediately below the corresponding line of the top field.

3.8 byte aligned:
A bit in a coded bitstream is byte-aligned if its position is a multiple of 8 bits from the first bit in the stream.
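The definition amounts to a single modulo test; the sketch below restates it directly (the function name is illustrative).

```python
# Byte alignment per the definition: a bit position is byte-aligned if
# it is a multiple of 8 bits from the first bit of the stream.
def byte_aligned(bit_position):
    return bit_position % 8 == 0

print(byte_aligned(16))  # True
print(byte_aligned(13))  # False
```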

3.9 byte:
A sequence of 8 bits.

3.10 channel:
A digital medium that stores or transports a bitstream constructed according to this specification.

3.11 chroma format:
Defines the number of chrominance blocks in a macroblock.

3.12 chroma simulcast:
A type of scalability (which is a subset of SNR scalability) where the enhancement layer(s) contain only coded refinement data for the DC coefficients, and all the data for the AC coefficients, of the chroma components.

3.13 chrominance (component):
A matrix, block or single pel representing one of the two colour difference signals related to the primary colours in the manner defined in the bitstream. The symbols used for the colour difference signals are Cr and Cb.

3.14 coded video bitstream:
A coded representation of a series of one or more pictures as defined in this specification.

3.15 coded order:
The order in which the pictures are stored and decoded. This order is not necessarily the same as the display order.

3.16 coded representation:
A data element as represented in its encoded form.

3.17 coding parameters:
The set of user-definable parameters that characterise a coded video bitstream. Bitstreams are characterised by coding parameters. Decoders are characterised by the bitstreams that they are capable of decoding.

3.18 component:
A matrix, block or single pel from one of the three matrices (luminance and two chrominance) that make up a picture.

3.19 compression:
Reduction in the number of bits used to represent an item of data.

3.20 constant bitrate coded video:
A compressed video bitstream with a constant average bitrate.

3.21 constant bitrate:
Operation where the bitrate is constant from start to finish of the compressed bitstream.

3.22 CRC:
Cyclic redundancy code.

3.23 data element:
An item of data as represented before encoding and after decoding.

3.24 data partitioning:
A method for dividing a bitstream into two separate bitstreams for error resilience purposes. The two bitstreams have to be recombined before decoding.

3.25 DC coefficient:
The DCT coefficient for which the frequency is zero in both dimensions.

3.26 DCT coefficient:
The amplitude of a specific cosine basis function.

3.27 decoder input buffer:
The first-in first-out (FIFO) buffer specified in the video buffering verifier.

3.28 decoder input rate:
The data rate specified in the video buffering verifier and encoded in the coded video bitstream.

3.29 decoder:
An embodiment of a decoding process.

3.30 decoding (process):
The process defined in this specification that reads an input coded bitstream and produces decoded pictures or audio samples.

3.31 dequantisation:
The process of rescaling the quantised DCT coefficients after their representation in the bitstream has been decoded and before they are presented to the inverse DCT.

3.32 digital storage media; DSM:
A digital storage or transmission device or system.

3.33 discrete cosine transform; DCT:
Either the forward discrete cosine transform or the inverse discrete cosine transform. The DCT is an invertible, discrete orthogonal transformation. The inverse DCT is defined in Annex A of this specification.

3.34 display order:
The order in which the decoded pictures are displayed. Normally this is the same order in which they were presented at the input of the encoder.

3.35 editing:
The process by which one or more compressed bitstreams are manipulated to produce a new compressed bitstream. Conforming edited bitstreams must meet the requirements defined in this specification.

3.36 encoder:
An embodiment of an encoding process.

3.37 encoding (process):
A process, not specified in this specification, that reads a stream of input pictures or audio samples and produces a valid coded bitstream as defined in this specification.

3.38 fast forward playback:
The process of displaying a sequence, or parts of a sequence, of pictures in display-order faster than real-time.

3.39 fast reverse playback:
The process of displaying the picture sequence in the reverse of display order faster than real-time.

3.40 field:
For an interlaced video signal, a "field" is the assembly of alternate lines of a frame. Therefore. an interlaced frame is composed of two fields a top field and a bottom field.

3.41 field period:
The reciprocal of twice the frame rate.

3.42 flag:
A variable which can take one of only the two values defined in this specification.

3.43 forbidden:
The term "forbidden" when used in the clauses defining the coded bitstream indicates that the value shall never be used. This is usually to avoid emulation of start codes.

3.44 forced updating:
The process by which macroblocks are intra-coded from time-to-time to ensure that mismatch errors between the inverse DCT processes in encoders and decoders cannot build up excessively.

3.45 forward compatibility:
A new coding standard is forward compatible with an existing coding standard if new decoders (designed to operate with the new coding standard) continue to be able to decode bitstreams of the existing coding standard.

3.46 forward motion vector:
A motion vector that is used for motion compensation from a reference picture at an earlier time in display order.

3.47 frame:
A frame contains lines of spatial information of a video signal. For progressive video, these lines contain samples starting from one time instant and continuing through successive lines to the bottom of the frame. For interlaced video a frame consists of two fields, a top field and a bottom field. One of these fields will commence one field period later than the other.

3.48 frame period:
The reciprocal of the frame rate.

3.49 frame rate:
The rate at which frames are output from the decoding process.

3.50 future reference picture:
A future reference picture is a reference picture that occurs at a later time than the current picture in display order.

3.51 header:
A block of data in the coded bitstream containing the coded representation of a number of data elements pertaining to the coded data that follow the header in the bitstream.

3.52 hybrid scalability:
Hybrid scalability is the combination of two (or more) types of scalability.

3.53 interlace:
The property of conventional television frames where alternating lines of the frame represent different instances in time.

3.54 intra coding:
Coding of a macroblock or picture that uses information only from that macroblock or picture.

3.55 intra-coded picture; I-picture:
A picture coded using information only from itself.

3.56 level:
A defined set of constraints on the values which may be taken by the parameters of this specification within a particular profile. A profile may contain one or more levels.

3.57 luminance (component):
A matrix, block or single pel representing a monochrome representation of the signal and related to the primary colours in the manner defined in the bitstream. The symbol used for luminance is Y.

3.58 macroblock:
The four 8 by 8 blocks of luminance data and the two (for 4:2:0 chroma format), four (for 4:2:2 chroma format) or eight (for 4:4:4 chroma format) corresponding 8 by 8 blocks of chrominance data coming from a 16 by 16 section of the luminance component of the picture. Macroblock is sometimes used to refer to the pel data and sometimes to the coded representation of the pel values and other data elements defined in the macroblock header of the syntax defined in this part of this specification. The usage is clear from the context.
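The block counts given in this definition can be summarised in a small sketch (illustrative only; the names below are invented for illustration and are not part of the normative syntax):

```python
# Illustrative summary of the macroblock definition: number of
# 8 by 8 blocks per macroblock for each chroma format.
LUMA_BLOCKS = 4  # a 16 by 16 luminance section is always four 8x8 blocks

CHROMA_BLOCKS = {
    "4:2:0": 2,
    "4:2:2": 4,
    "4:4:4": 8,
}

def blocks_per_macroblock(chroma_format):
    """Total 8x8 blocks (luminance + chrominance) in one macroblock."""
    return LUMA_BLOCKS + CHROMA_BLOCKS[chroma_format]
```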

3.59 motion compensation:
The use of motion vectors to improve the efficiency of the prediction of pel values. The prediction uses motion vectors to provide offsets into the past and/or future reference pictures containing previously decoded pel values that are used to form the prediction error signal.
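As an informal illustration (not the normative prediction process, which additionally covers half-pel interpolation, field prediction and edge handling), full-pel motion compensation amounts to an offset fetch from a reference picture:

```python
def motion_compensate(ref, mv, x0, y0, size=16):
    """Sketch of full-pel motion compensation: form a prediction block
    for the macroblock at (x0, y0) by offsetting into a previously
    decoded reference picture with motion vector (mv_x, mv_y).
    Half-pel interpolation and edge handling are omitted."""
    mv_x, mv_y = mv
    return [[ref[y0 + y + mv_y][x0 + x + mv_x] for x in range(size)]
            for y in range(size)]
```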

3.60 motion estimation:
The process of estimating motion vectors during the encoding process.

3.61 motion vector:
A two-dimensional vector used for motion compensation that provides an offset from the coordinate position in the current picture to the coordinates in a reference picture.

3.62 non-intra coding:
Coding of a macroblock or picture that uses information both from itself and from macroblocks and pictures occurring at other times.

3.63 parameter:
A variable within the syntax of this specification which may take one of a large range of values. A variable which can take one of only two values is a flag and not a parameter.

3.64 past reference picture:
A past reference picture is a reference picture that occurs at an earlier time than the current picture in display order.

3.65 pel aspect ratio:
The ratio of the nominal vertical height of a pel on the display to its nominal horizontal width.

3.66 pel:
Picture element.

3.67 picture:
Source, coded or reconstructed image data. A source or reconstructed picture consists of three rectangular matrices of 8-bit numbers representing the luminance and two chrominance signals. For progressive video, a picture is identical to a frame, while for interlaced video, a picture can refer to a frame, or the top field or the bottom field of the frame depending on the context.

3.68 prediction:
The use of a predictor to provide an estimate of the pel value or data element currently being decoded.

3.69 predictive-coded picture; P-picture:
A picture that is coded using motion compensated prediction from past reference pictures.

3.70 prediction error:
The difference between the actual value of a pel or data element and its predictor.

3.71 predictor:
A linear combination of previously decoded pel values or data elements.

3.72 profile:
A defined sub-set of the syntax of this specification.

3.73 Note
In this specification the word "profile" is used as defined above. It should not be confused with other definitions of "profile" and in particular it does not have the meaning that is defined by JTC1/SGFS.

3.74 quantisation matrix:
A set of sixty-four 8-bit values used by the dequantiser.

3.75 quantised DCT coefficients:
DCT coefficients before dequantisation. A variable length coded representation of quantised DCT coefficients is stored as part of the compressed video bitstream.

3.76 quantiser scale:
A scale factor coded in the bitstream and used by the decoding process to scale the dequantisation.
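The interplay of the quantisation matrix (3.74), the quantised DCT coefficients (3.75) and the quantiser scale (3.76) can be sketched as follows. This is a simplified sketch of non-intra reconstruction only; the normative process additionally specifies saturation and mismatch control:

```python
def dequantise_nonintra(QF, W, quantiser_scale):
    """Simplified sketch of non-intra dequantisation: reconstruct an
    8x8 block of DCT coefficients from the quantised values QF, the
    8x8 quantisation matrix W and the quantiser scale.  Saturation
    and mismatch control are omitted; division truncates toward
    zero."""
    def sign(x):
        return (x > 0) - (x < 0)

    def trunc_div(n, d):  # integer division truncating toward zero
        return int(n / d)

    return [[trunc_div((2 * QF[v][u] + sign(QF[v][u]))
                       * W[v][u] * quantiser_scale, 32)
             for u in range(8)] for v in range(8)]
```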

3.77 random access:
The process of beginning to read and decode the coded bitstream at an arbitrary point.

3.78 reference picture:
Reference pictures are the nearest adjacent I- or P-pictures to the current picture in display order.

3.79 reserved:
The term "reserved" when used in the clauses defining the coded bitstream indicates that the value may be used in the future for ISO/IEC defined extensions.

3.80 scalability:
Scalability is the ability of a decoder to decode an ordered set of bitstreams to produce a reconstructed sequence. Moreover, useful video is output when subsets are decoded. The minimum subset that can thus be decoded is the first bitstream in the set, which is called the base layer. Each of the other bitstreams in the set is called an enhancement layer. When addressing a specific enhancement layer, "lower layer" refers to the bitstream which precedes the enhancement layer.

3.81 side information:
Information in the bitstream necessary for controlling the decoder.

3.82 skipped macroblock:
A macroblock for which no data is encoded.

3.83 slice:
A series of macroblocks.

3.84 SNR scalability:
A type of scalability where the enhancement layer(s) contain only coded refinement data for the DCT coefficients of the base layer.

3.85 spatial scalability:
A type of scalability where an enhancement layer also uses predictions from pel data derived from a lower layer without using motion vectors. The layers can have different frame sizes, frame rates or chroma formats.

3.86 start codes [system and video]:
32-bit codes embedded in the coded bitstream that are unique. They are used for several purposes including identifying some of the structures in the coding syntax.

3.87 stuffing (bits); stuffing (bytes):
Code-words that may be inserted into the compressed bitstream that are discarded in the decoding process. Their purpose is to increase the bitrate of the stream.

3.88 temporal scalability:
A type of scalability where an enhancement layer also uses predictions from pel data derived from a lower layer using motion vectors. The layers have identical frame size, and chroma formats, but can have different frame rates.

3.89 top field:
One of two fields that comprise a frame of interlaced video. Each line of a top field is spatially located immediately above the corresponding line of the bottom field.

3.90 variable bitrate:
Operation where the bitrate varies with time during the decoding of a compressed bitstream.

3.91 variable length coding; VLC:
A reversible procedure for coding that assigns shorter code-words to frequent events and longer code-words to less frequent events.
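The principle can be illustrated with a toy prefix code (the code table below is invented for illustration and is not an MPEG-2 code table):

```python
# Toy variable length code: shorter code-words for more frequent
# events, longer ones for rarer events.  Invented for illustration.
TOY_VLC = {"1": "a", "01": "b", "001": "c", "000": "d"}

def vlc_decode(bits):
    """Decode a bit-string by accumulating bits until they match a
    code-word; prefix-freeness makes the procedure reversible."""
    decoded, word = [], ""
    for bit in bits:
        word += bit
        if word in TOY_VLC:
            decoded.append(TOY_VLC[word])
            word = ""
    if word:
        raise ValueError("bitstream ends inside a code-word")
    return decoded
```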

3.92 video buffering verifier; VBV:
A hypothetical decoder that is conceptually connected to the output of the encoder. Its purpose is to provide a constraint on the variability of the data rate that an encoder or editing process may produce.

3.93 video sequence:
A series of one or more pictures.

3.94 zig-zag scanning order:
A specific sequential ordering of the DCT coefficients from (approximately) the lowest spatial frequency to the highest.
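For illustration, this ordering can be derived by walking the anti-diagonals of the 8 by 8 block, alternating direction (a sketch only; the normative scan order is specified as a table in the body of this specification):

```python
def zigzag_order():
    """Derive the classic 8x8 zig-zag scan: visit the anti-diagonals
    d = v + u in turn, alternating between up-right and down-left, so
    coefficients are ordered from low to high spatial frequency.
    Entries are linear indices v * 8 + u (row v, column u)."""
    order = []
    for d in range(15):                       # 15 anti-diagonals
        if d % 2 == 0:                        # even: walk up-right
            order += [v * 8 + (d - v)
                      for v in range(min(d, 7), -1, -1) if d - v < 8]
        else:                                 # odd: walk down-left
            order += [(d - u) * 8 + u
                      for u in range(min(d, 7), -1, -1) if d - u < 8]
    return order
```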


Bibliography:

(This annex does not form an integral part of this Recommendation | International Standard)

1 Arun N. Netravali & Barry G. Haskell "Digital Pictures, representation and compression" Plenum Press, 1988

2 Didier Le Gall "MPEG: A Video Compression Standard for Multimedia Applications" Communications of the ACM, April 1991

3 C Loeffler, A Ligtenberg, G S Moschytz "Practical fast 1-D DCT algorithms with 11 multiplications" Proceedings IEEE ICASSP-89, Vol. 2, pp 988-991, Feb. 1989

4 See the Normative Reference for ITU-R Rec 601 (formerly CCIR Rec 601)

5 See the Normative Reference for IEC Standard Publication 461

6 See the Normative Reference for ITU-T Rec. H.261

7 See the Normative reference for IEEE Standard Specification P1180-1990

8 ISO/IEC 10918-1 | ITU-T T.81 (JPEG)

9 E Viscito and C Gonzales "A Video Compression Algorithm with Adaptive Bit Allocation and Quantization", Proc SPIE Visual Communications and Image Proc '91, Boston MA, November 10-15, Vol. 1605, p. 205, 1991

10 A Puri and R Aravind "Motion Compensated Video Coding with Adaptive Perceptual Quantization", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 1, p. 351, Dec. 1991.

11 C. Gonzales and E. Viscito, "Flexibly scalable digital video coding". Image Communications, Vol. 5, Nos. 1-2, February 1993

12 A.W. Johnson, T. Sikora and T.K. Tan, "Filters for Drift Reduction in Frequency Scalable Video Coding Schemes" <submitted for publication to Electronics Letters>

13 R.Mokry and D.Anastassiou, "Minimal Error Drift in Frequency Scalability for Motion-Compensated DCT Coding". IEEE Transactions on Circuits and Systems for Video Technology, <accepted for publication>

14 K.N. Ngan, J. Arnold, T. Sikora, T.K. Tan and A.W. Johnson. "Frequency Scalability Experiments for MPEG-2 Standard". Asia-Pacific Conference on Communications, Korea, August 1993.

15 T. Sikora, T.K. Tan and K.N. Ngan, "A Performance Comparison of Frequency Domain Pyramid Scalable Coding Schemes Within the MPEG Framework". Proc. PCS, Picture Coding Symposium, Lausanne, pp. 16.1 - 16.2, Switzerland March 1993.

16 Masahiro Iwahashi, "Motion Compensation Technique for 2:1 Scaled-down Moving Pictures". 8-14, Picture Coding Symposium '93.

17 Sikora, T. and Pang, K., "Experiments with Optimal Block-Overlapping Filters for Cell Loss Concealment in Packet Video", Proc. IEEE Visual Signal Processing and Communications Workshop, Melbourne, 21-22 Sept. 1993, pp. 247-250.

18 A. Puri "Video Coding Using the MPEG-2 Compression Standard", <to appear> Proc SPIE Visual Communications and Image Proc '93, Boston MA, November 1993.

19 A. Puri and A. Wong "Spatial Domain Resolution Scalable Video Coding", <to appear> Proc SPIE Visual Communications and Image Proc '93, Boston MA, November 1993.


Annex:

Annex D: Features Supported by the algorithm

NOTE: This Annex gives an overview of the features supported by the MPEG-2 video algorithm.

Annex F: Patent statements

(This annex does not form an integral part of this Recommendation | International Standard)

The following table summarises the formal patent statements received and indicates the parts of the MPEG-2 standard to which the statement applies.

The list includes all the companies that previously submitted the informal statement, but if no "X" is present it means that no formal statement was received from that company.

Company (marks listed in the order received: V = Video, A = Audio, S = Systems)

AT&T: X X X
BBC Research Department:
Bellcore: X
Belgian Science Policy Office: X X X
BOSCH: X X X
CCETT:
CSELT: X
David Sarnoff Research Center: X X X
Deutsche Thomson-Brandt GmbH: X X X
France Telecom CNET:
Fraunhofer Gesellschaft: X X
GC Technology Corporation: X X X
General Instruments:
Goldstar:
Hitachi, Ltd.:
International Business Machines Corporation: X X X
IRT: X
KDD: X
Massachusetts Institute of Technology: X X X
Matsushita Electric Industrial Co., Ltd.: X X X
Mitsubishi Electric Corporation:
National Transcommunications Limited:
NEC Corporation: X
Nippon Hoso Kyokai: X
Nippon Telegraph and Telephone: X
Nokia Research Center: X
Norwegian Telecom Research: X
Philips Consumer Electronics: X X X
OKI:
Qualcomm Incorporated: X
Royal PTT Nederland N.V., PTT Research (NL): X X X
Samsung Electronics:
Scientific Atlanta: X X X
Siemens AG: X
Sharp Corporation:
Sony Corporation:
Texas Instruments:
Thomson Consumer Electronics:
Toshiba Corporation: X
TV/Com: X X X
Victor Company of Japan Limited: