ISO/IEC 13818-2 Part 2: Video - Standard Text -

Available parts of the standard


Internal Information

Main Referee:

Heiner Schomaker

State of Entry:

Incomplete

Last update:

Feb. 25, 1994

Primary Source / Published in:

Document No.:
ISO/IEC JTC 1/SC 29 N 635
Title:
Working Document for ISO/IEC CD 13818-2: Information technology -
Generic coding of moving pictures and associated audio information -
Part 2 : Video
[ISO/IEC JTC 1/SC 29/WG 11 N 635] Date: 1993-11-15

Document Parts

Titlepage:

INTERNATIONAL ORGANISATION FOR STANDARDISATION

ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC1/SC29

CODING OF MOVING PICTURES AND ASSOCIATED AUDIO


ISO/IEC JTC1/SC29

WG11/602

November 1993, Seoul




INFORMATION TECHNOLOGY -


GENERIC CODING OF MOVING PICTURES AND ASSOCIATED AUDIO

Recommendation H.262

ISO/IEC 13818-2

Committee Draft


Draft of: November 5, 1993, 9:10

Foreword:

The ITU-T (the ITU Telecommunication Standardisation Sector) is a permanent organ of the International Telecommunication Union (ITU). The ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to developing telecommunication standards on a world-wide basis.

The World Telecommunication Standardisation Conference, which meets every four years, establishes the program of work arising from the review of existing questions and new questions among other things. The approval of new or revised Recommendations by members of the ITU-T is covered by the procedure laid down in the ITU-T Resolution No. 1 (Helsinki 1993). The proposal for Recommendation is accepted if 70% or more of the replies from members indicate approval.

ISO (the International Organisation for Standardisation) and IEC (the International Electrotechnical Commission) form the specialised system for world-wide standardisation. National Bodies that are members of ISO and IEC participate in the development of International Standards through technical committees established by the respective organisation to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organisations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.

In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC1. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75% of the national bodies casting a vote.

This specification is a committee draft that is being submitted for approval to the ITU-T and to ISO/IEC JTC1/SC29. It was prepared jointly by SC29/WG11, also known as MPEG (Moving Picture Experts Group), and the Experts Group for ATM Video Coding in ITU-T SG15. MPEG was formed in 1988 to establish standards for coding of moving pictures and associated audio for various applications such as digital storage media, distribution and communication. The Experts Group for ATM Video Coding was formed in 1990 to develop video coding standard(s) appropriate for B-ISDN using ATM transport.

In this specification Annex A, Annex B and Annex C contain normative requirements and are an integral part of this specification. Annex D, Annex E, Annex F and Annex G are informative and contain no normative requirements.

ISO/IEC

This International Standard is published in four Parts.

13818-1 systems
specifies the system coding of the specification. It defines a multiplexed structure for combining audio and video data and means of representing the timing information needed to replay synchronised sequences in real-time.

13818-2 video
specifies the coded representation of video data and the decoding process required to reconstruct pictures.

13818-3 audio
specifies the coded representation of audio data.

13818-4 conformance
specifies the procedures for determining the characteristics of coded bitstreams and for testing compliance with the requirements stated in 13818-1, 13818-2 and 13818-3.


Contents:


Introduction:

I.1 Purpose

This Part of this specification was developed in response to the growing need for a generic coding method of moving pictures and of associated sound for various applications such as digital storage media, television broadcasting and communication. The use of this specification means that motion video can be manipulated as a form of computer data and can be stored on various storage media, transmitted and received over existing and future networks and distributed on existing and future broadcasting channels.

I.2 Application

The applications of this specification cover, but are not limited to, such areas as listed below:

BSS
Broadcasting Satellite Service (to the home)
CATV
Cable TV Distribution on optical networks, copper, etc.
CDAD
Cable Digital Audio Distribution
DAB
Digital Audio Broadcasting (terrestrial and satellite broadcasting)
DTTB
Digital Terrestrial Television Broadcast
EC
Electronic Cinema
ENG
Electronic News Gathering (including SNG, Satellite News Gathering)
FSS
Fixed Satellite Service (e.g. to head ends)
HTT
Home Television Theatre
IPC
Interpersonal Communications (videoconferencing, videophone, etc.)
ISM
Interactive Storage Media (optical disks, etc.)
MMM
Multimedia Mailing
NCA
News and Current Affairs
NDB
Networked Database Services (via ATM, etc.)
RVS
Remote Video Surveillance
SSM
Serial Storage Media (digital VTR, etc.)

I.3 Profiles and levels

This specification is intended to be generic in the sense that it serves a wide range of applications, bit rates, resolutions, qualities and services. Applications should cover, among other things, digital storage media, television broadcasting and communications. In the course of creating this specification, various requirements from typical applications have been considered, necessary algorithmic elements have been developed, and they have been integrated into a single syntax. Hence this specification will facilitate the bitstream interchange among different applications.

Considering the practicality of implementing the full syntax of this specification, however, a limited number of subsets of the syntax are also stipulated by means of "profile" and "level". These and other related terms are formally defined in clause 3 of this specification.

A "profile" is a defined sub-set of the entire bitstream syntax that is defined by this specification. Within the bounds imposed by the syntax of a given profile it is still possible to require a very large variation in the performance of encoders and decoders depending upon the values taken by parameters in the bitstream. For instance it is possible to specify frame sizes as large as (approximately) 2[14] pels wide by 2[14] lines high. It is currently neither practical nor economic to implement a decoder capable of dealing with all possible frame sizes.

In order to deal with this problem "levels" are defined within each profile. A level is a defined set of constraints imposed on parameters in the bitstream. These constraints may be simple limits on numbers. Alternatively they may take the form of constraints on arithmetic combinations of the parameters (e.g. frame width multiplied by frame height multiplied by frame rate).
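A level check of the kind described above can be sketched as follows. This is a minimal illustration, not a normative conformance test: the function name `within_level` and all numeric limits are invented for the example and do not correspond to the defined limits of any real profile or level.

```python
# Hypothetical sketch of a level check: a level constrains individual
# parameters (simple limits on numbers) and arithmetic combinations of
# them (here, frame width * frame height * frame rate). All limits
# below are invented for illustration only.

def within_level(width, height, frame_rate,
                 max_width=720, max_height=576, max_pel_rate=10_368_000):
    """Return True if the picture parameters satisfy this made-up level."""
    if width > max_width or height > max_height:
        return False                          # simple limits on numbers
    # constraint on an arithmetic combination of parameters
    return width * height * frame_rate <= max_pel_rate

print(within_level(720, 576, 25))   # True: 10 368 000 pels/s, at the bound
print(within_level(720, 576, 30))   # False: combined constraint exceeded
```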

Bitstreams complying with this specification use a common syntax. In order to achieve a sub-set of the complete syntax, flags and parameters are included in the bitstream that signal the presence or otherwise of syntactic elements that occur later in the bitstream. In order to specify constraints on the syntax (and hence define a profile) it is thus only necessary to constrain the values of these flags and parameters that specify the presence of later syntactic elements.

I.4 The scalable and the non-scalable syntax

The full syntax can be divided into two major categories. The first is the non-scalable syntax, which is structured as a superset of the syntax defined in ISO/IEC 11172-2; its main feature is the addition of compression tools for interlaced video signals. The second is the scalable syntax, the key property of which is to enable the reconstruction of useful video from pieces of a total bitstream. This is achieved by structuring the total bitstream in two or more layers, starting from a standalone base layer and adding a number of enhancement layers. The base layer can use the non-scalable syntax, or in some situations conform to the ISO/IEC 11172-2 syntax.

I.4.1 Overview of the non-scalable syntax

The coded representation defined in the non-scalable syntax achieves a high compression ratio while preserving good image quality. The algorithm is not lossless as the exact pixel values are not preserved during coding. The choice of the techniques is based on the need to balance high image quality and compression ratio with the requirement for random access to the coded bitstream. Obtaining good image quality at the bitrates of interest demands very high compression, which is not achievable with intra picture coding alone. The need for random access, however, is best satisfied with pure intra picture coding. This requires a careful balance between intra- and interframe coding and between recursive and non-recursive temporal redundancy reduction.

A number of techniques are used to achieve high compression. The algorithm first uses block-based motion compensation to reduce the temporal redundancy. Motion compensation is used both for causal prediction of the current picture from a previous picture, and for non-causal, interpolative prediction from past and future pictures. Motion vectors are defined for each 16-pixel by 16-line region of the picture. The difference signal, i.e., the prediction error, is further compressed using the discrete cosine transform (DCT) to remove spatial correlation before it is quantised in an irreversible process that discards the less important information. Finally, the motion vectors are combined with the residual DCT information, and encoded using variable length codes.
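The prediction-plus-residual idea described above can be sketched with a toy example. The data, block size and function name are invented for illustration; real motion compensation operates on 16-pel by 16-line regions with sub-pel accuracy and coded residuals.

```python
# Minimal sketch (invented data): the encoder transmits a motion vector
# plus the prediction error; the decoder adds the error back to the
# motion-compensated prediction to reconstruct the block exactly.

def predict(reference, mv):
    """Fetch a 2x2 predicted block: reference pels offset by the motion vector."""
    dx, dy = mv
    return [[reference[y + dy][x + dx] for x in range(2)] for y in range(2)]

reference = [[10, 20, 30],
             [40, 50, 60],
             [70, 80, 90]]
current   = [[21, 31],
             [51, 61]]

mv = (1, 0)                                   # found by the encoder's search
pred = predict(reference, mv)                 # [[20, 30], [50, 60]]
residual = [[c - p for c, p in zip(cr, pr)]   # prediction error: what is coded
            for cr, pr in zip(current, pred)]
recon = [[p + r for p, r in zip(pr, rr)]      # decoder-side reconstruction
         for pr, rr in zip(pred, residual)]
print(recon == current)   # True: prediction + residual restores the block
```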

I.4.1.1 Temporal processing

Because of the conflicting requirements of random access and highly efficient compression, three main picture types are defined. Intra coded pictures (I-Pictures) are coded without reference to other pictures. They provide access points to the coded sequence where decoding can begin, but are coded with only moderate compression. Predictive coded pictures (P-Pictures) are coded more efficiently using motion compensated prediction from a past intra or predictive coded picture and are generally used as a reference for further prediction. Bidirectionally-predictive coded pictures (B-Pictures) provide the highest degree of compression but require both past and future reference pictures for motion compensation. Bidirectionally-predictive coded pictures are never used as references for prediction. The organisation of the three picture types in a sequence is very flexible. The choice is left to the encoder and will depend on the requirements of the application. Figure 0-1 illustrates the relationship among the three different picture types.
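The consequence of B-picture prediction is that coded order differs from display order: a B-picture needs both its past and future references decoded before it can be decoded, so the future reference is transmitted first. The sketch below illustrates this reordering; the I/P/B pattern chosen is one common arrangement, not one mandated by the specification.

```python
# Sketch of display-order to coded-order reordering: each reference
# picture (I or P) is moved ahead of the B-pictures that precede it in
# display order, since those B-pictures predict from it.

display_order = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]

def to_coded_order(pictures):
    """Emit each reference before the B-pictures that depend on it."""
    coded, pending_b = [], []
    for pic in pictures:
        if pic[0] == "B":
            pending_b.append(pic)     # hold B until its future reference is sent
        else:
            coded.append(pic)         # send the reference first
            coded.extend(pending_b)   # then the held B-pictures
            pending_b = []
    return coded + pending_b

print(to_coded_order(display_order))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```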

Figure 0-1 Example of temporal picture structure

I.4.1.2 Coding interlaced video

Each frame of interlaced video consists of two fields which are separated by one field-period. The specification allows either the frame to be encoded as one picture or the two fields to be encoded as two pictures. Frame encoding or field encoding can be adaptively selected on a frame-by-frame basis. Frame encoding is typically preferred when the video scene contains significant detail with limited motion. Field encoding, in which the second field can be predicted from the first, works better when there is fast movement.

I.4.1.3 Motion representation - macroblocks

As in ISO/IEC 11172-2, the choice of 16 by 16 macroblocks for the motion-compensation unit is a result of the trade-off between the coding gain provided by using motion information and the overhead needed to store it. Each macroblock can be temporally predicted in one of a number of different ways. For example, in frame encoding, the prediction from the previous reference frame can itself be either frame-based or field-based. Depending on the type of the macroblock, motion vector information and other side information is encoded with the compressed prediction error signal in each macroblock. The motion vectors are encoded differentially with respect to the last encoded motion vectors using variable length codes. The maximum length of the vectors that may be represented can be programmed, on a picture-by-picture basis, so that the most demanding applications can be met without compromising the performance of the system in more normal situations.

It is the responsibility of the encoder to calculate appropriate motion vectors. The specification does not specify how this should be done.

I.4.1.4 Spatial redundancy reduction

Both original pictures and prediction error signals have high spatial redundancy. This specification uses a block-based DCT method with visually weighted quantisation and run-length coding. After motion compensated prediction or interpolation, the residual picture is split into 8 by 8 blocks. These are transformed into the DCT domain where they are weighted before being quantised. After quantisation many of the coefficients are zero in value and so two-dimensional run-length and variable length coding is used to encode the remaining coefficients efficiently.
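The quantise-then-run-length step that follows the DCT can be sketched with made-up coefficients. A 4x4 block and a plain row scan stand in for the real 8x8 block, zigzag scan and VLC tables, and the quantiser below is a crude division rather than the visually weighted quantisation the specification defines.

```python
# Sketch: quantisation zeroes most high-frequency DCT coefficients, and
# the survivors are coded as (run-of-zeros, level) pairs along a scan.

def quantise(coeffs, step):
    return [[c // step for c in row] for row in coeffs]

def run_length(scan):
    """(run, level) pairs; trailing zeros become an end-of-block in practice."""
    pairs, run = [], 0
    for c in scan:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs

coeffs = [[140, 33, 5, 1],       # invented DCT coefficients: energy is
          [ 36,  9, 2, 0],       # concentrated in the low frequencies
          [  6,  1, 0, 0],
          [  1,  0, 0, 0]]
q = quantise(coeffs, 8)          # most high-frequency terms go to zero
scan = [q[y][x] for y in range(4) for x in range(4)]  # row scan for brevity
print(run_length(scan))          # [(0, 17), (0, 4), (2, 4), (0, 1)]
```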

I.4.1.5 Chroma formats

In addition to the 4:2:0 format supported in ISO/IEC 11172-2 this specification supports 4:2:2 and 4:4:4 chroma formats.

I.4.2 Scalable extensions

The scalability tools in this specification are designed to support applications beyond those supported by single layer video. Among the noteworthy application areas addressed are video telecommunications, video on asynchronous transfer mode (ATM) networks, interworking of video standards, video service hierarchies with multiple spatial, temporal and quality resolutions, HDTV with embedded TV, and systems allowing migration to higher temporal resolution HDTV. Although a simple solution to scalable video is the simulcast technique, which is based on transmission/storage of multiple independently coded reproductions of video, a more efficient alternative is scalable video coding, in which the bandwidth allocated to a given reproduction of video can be partially reutilised in coding of the next reproduction of video. In scalable video coding, it is assumed that, given an encoded bitstream, decoders of various complexities can decode and display appropriate reproductions of coded video. A scalable video encoder is likely to have increased complexity when compared to a single layer encoder. However, this standard provides several different forms of scalability that address nonoverlapping applications with corresponding complexities. The basic scalability tools offered are: data partitioning, SNR scalability, spatial scalability and temporal scalability. Moreover, combinations of these basic scalability tools are also supported and are referred to as hybrid scalability. In the case of basic scalability, two layers of video, referred to as the lower layer and the enhancement layer, are allowed, whereas in hybrid scalability up to three layers are supported. The following tables provide a few example applications of various scalabilities.

Table 0-? Applications of SNR scalability

Lower layer           | Enhancement layer          | Application
----------------------+----------------------------+---------------------------------
ITU-R-601             | Same resolution and format | Two quality service
                      | as lower layer             | for Standard TV
High Definition       | Same resolution and format | Two quality service
                      | as lower layer             | for HDTV
4:2:0 High Definition | 4:2:2 chroma simulcast     | Video production / distribution

Table 0-? Applications of spatial scalability

Base          | Enhancement   | Application
--------------+---------------+-------------------------------------------
prog (30Hz)   | prog (30Hz)   | CIF/SCIF compatibility or scalability
interl (30Hz) | interl (30Hz) | HDTV/SDTV scalability
prog (30Hz)   | interl (30Hz) | ISO/IEC 11172-2 compatibility with this
              |               | specification
interl (30Hz) | prog (60Hz)   | Migration to HR prog HDTV

Table 0-? Applications of temporal scalability

Base          | Enhancement   | Higher      | Application
--------------+---------------+-------------+---------------------------
prog (30Hz)   | prog (30Hz)   | prog (60Hz) | Migration to HR prog HDTV
interl (30Hz) | interl (30Hz) | prog (60Hz) | Migration to HR prog HDTV

I.4.2.1 Spatial scalable extension

Spatial scalability is a tool intended for use in video applications involving telecommunications, interworking of video standards, video database browsing, interworking of HDTV and TV etc., i.e., video systems with the primary common feature that a minimum of two layers of spatial resolution are necessary. Spatial scalability involves generating two spatial resolution video layers from a single video source such that the lower layer is coded by itself to provide the basic spatial resolution and the enhancement layer employs the spatially interpolated lower layer and carries the full spatial resolution of the input video source. The lower and the enhancement layers may either both use the coding tools in this specification, or use the ISO/IEC 11172-2 standard for the lower layer and this specification for the enhancement layer. The latter case achieves a further advantage by facilitating interworking between video coding standards. Moreover, spatial scalability offers flexibility in choice of video formats to be employed in each layer. An additional advantage of spatial scalability is its ability to provide resilience to transmission errors as the more important data of the lower layer can be sent over a channel with better error performance, while the less critical enhancement layer data can be sent over a channel with poor error performance.

I.4.2.2 SNR scalable extension

SNR scalability is a tool intended for use in video applications involving telecommunications, video services with multiple qualities, standard TV and HDTV, i.e., video systems with the primary common feature that a minimum of two layers of video quality are necessary. SNR scalability involves generating two video layers of the same spatial resolution but different video qualities from a single video source such that the lower layer is coded by itself to provide the basic video quality and the enhancement layer is coded to enhance the lower layer. The enhancement layer, when added back to the lower layer, regenerates a higher quality reproduction of the input video. The lower and the enhancement layers may either both use this specification, or use the ISO/IEC 11172-2 standard for the lower layer and this specification for the enhancement layer. An additional advantage of SNR scalability is its ability to provide a high degree of resilience to transmission errors as the more important data of the lower layer can be sent over a channel with better error performance, while the less critical enhancement layer data can be sent over a channel with poor error performance.

I.4.2.3 Temporal scalable extension

Temporal scalability is a tool intended for use in a range of diverse video applications from telecommunications to HDTV for which migration to higher temporal resolution systems from that of lower temporal resolution systems may be necessary. In many cases, the lower temporal resolution video systems may be either the existing systems or the less expensive early generation systems, with the motivation of introducing more sophisticated systems gradually. Temporal scalability involves partitioning of video frames into layers, wherein the lower layer is coded by itself to provide the basic temporal rate and the enhancement layer is coded with temporal prediction with respect to the lower layer; these layers, when decoded and temporally multiplexed, yield the full temporal resolution of the video source. The lower temporal resolution systems may only decode the lower layer to provide basic temporal resolution, whereas more sophisticated systems of the future may decode both layers and provide high temporal resolution video while maintaining interworking with earlier generation systems. An additional advantage of temporal scalability is its ability to provide resilience to transmission errors as the more important data of the lower layer can be sent over a channel with better error performance, while the less critical enhancement layer can be sent over a channel with poor error performance.
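The temporal multiplexing of the two layers can be sketched in a few lines. The frame labels and the alternating split are invented for illustration; the specification does not fix how frames are assigned to layers.

```python
# Sketch: the lower layer carries alternate frames at the basic temporal
# rate, the enhancement layer carries the remaining frames; interleaving
# the two decoded layers restores the full temporal resolution.

frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
lower = frames[0::2]        # basic temporal rate (e.g. 30 Hz)
enhancement = frames[1::2]  # coded with prediction from the lower layer

merged = [f for pair in zip(lower, enhancement) for f in pair]
print(merged == frames)     # True: full rate (e.g. 60 Hz) recovered
```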

I.4.2.4 Data partitioning extension

Data partitioning is a tool intended for use when two channels are available for transmission and/or storage of a video bitstream, as may be the case in ATM networks, terrestrial broadcast, magnetic media, etc. The bitstream is partitioned between these channels such that more critical parts of the bitstream (such as headers, motion vectors, DC coefficients) are transmitted in the channel with the better error performance, and less critical data (such as higher DCT coefficients) is transmitted in the channel with poor error performance. Thus, degradation due to channel errors is minimised since the critical parts of a bitstream are better protected. Data from neither channel can be decoded by a decoder that is not designed for data partitioned bitstreams.
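The split by criticality can be sketched as a simple partition. The element names and the critical/non-critical labelling are illustrative; the specification defines the actual partition boundary within the syntax.

```python
# Sketch of data partitioning: critical syntax elements go to the
# better-protected channel, the rest to the other. Both partitions are
# needed to reconstruct the complete bitstream.

elements = [("header", True), ("motion_vector", True),
            ("dc_coeff", True), ("ac_coeff_1", False), ("ac_coeff_2", False)]

channel0 = [name for name, critical in elements if critical]      # protected
channel1 = [name for name, critical in elements if not critical]  # less protected

print(channel0)  # ['header', 'motion_vector', 'dc_coeff']
print(channel1)  # ['ac_coeff_1', 'ac_coeff_2']
```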


Scope:

This Recommendation | International Standard specifies the coded representation of picture information for digital storage media and digital video communication and specifies the decoding process. The representation supports constant bitrate transmission, variable bitrate transmission, random access, channel hopping, scalable decoding, bitstream editing, as well as special functions such as fast forward playback, slow motion, pause and still pictures. This Recommendation | International Standard is compatible with ISO/IEC 11172-2 and upward or downward compatible with EDTV, HDTV, SDTV formats. This Recommendation | International Standard is primarily applicable to digital storage media, video broadcast and communication. The storage media may be directly connected to the decoder, or via communications means such as busses, LANs, or telecommunications links.


Field of Applications:

The applications of this specification cover, but are not limited to, such areas as listed below:

BSS
Broadcasting Satellite Service (to the home)
CATV
Cable TV Distribution on optical networks, copper, etc.
CDAD
Cable Digital Audio Distribution
DAB
Digital Audio Broadcasting (terrestrial and satellite broadcasting)
DTTB
Digital Terrestrial Television Broadcast
EC
Electronic Cinema
ENG
Electronic News Gathering (including SNG, Satellite News Gathering)
FSS
Fixed Satellite Service (e.g. to head ends)
HTT
Home Television Theatre
IPC
Interpersonal Communications (videoconferencing, videophone, etc.)
ISM
Interactive Storage Media (optical disks, etc.)
MMM
Multimedia Mailing
NCA
News and Current Affairs
NDB
Networked Database Services (via ATM, etc.)
RVS
Remote Video Surveillance
SSM
Serial Storage Media (digital VTR, etc.)
NOTE: From "Chapter I.2 Application"

Relationships to other Standards:

2 Normative references

The following ITU-T Recommendations and International Standards contain provisions which, through reference in this text, constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent editions of the standards indicated below. Members of IEC and ISO maintain registers of currently valid International Standards. The TSB (Telecommunication Standardisation Bureau) maintains a list of currently valid ITU-T Recommendations.


Definitions:

For the purposes of this Recommendation | International Standard, the following definitions apply.

3.1 AC coefficient:
Any DCT coefficient for which the frequency in one or both dimensions is non-zero.

3.2 backward compatibility:
A new coding standard is backward compatible with an existing coding standard if existing decoders (designed to operate with the existing coding standard) are able to continue to operate by decoding all or part of a bitstream produced according to the new coding standard.

3.3 backward motion vector:
A motion vector that is used for motion compensation from a reference picture at a later time in display order.

3.4 bidirectionally predictive-coded picture; B-picture:
A picture that is coded using motion compensated prediction from past and/or future reference pictures.

3.5 bitrate:
The rate at which the compressed bitstream is delivered from the storage medium to the input of a decoder.

3.6 block:
An 8-row by 8-column matrix of pels, or 64 DCT coefficients (source, quantised or dequantised).

3.7 bottom field:
One of two fields that comprise a frame of interlaced video. Each line of a bottom field is spatially located immediately below the corresponding line of the top field.

3.8 byte aligned:
A bit in a coded bitstream is byte-aligned if its position is a multiple of 8 bits from the first bit in the stream.
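The definition amounts to a single modulo test; the sketch below restates it directly (the function name is illustrative).

```python
# Byte alignment per the definition: a bit position is byte-aligned if
# it is a multiple of 8 bits from the first bit of the stream.
def byte_aligned(bit_position):
    return bit_position % 8 == 0

print(byte_aligned(16))  # True
print(byte_aligned(13))  # False
```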

3.9 byte:
A sequence of 8 bits.

3.10 channel:
A digital medium that stores or transports a bitstream constructed according to this specification.

3.11 chroma format:
Defines the number of chrominance blocks in a macroblock.

3.12 chroma simulcast:
A type of scalability (which is a subset of SNR scalability) where the enhancement layer(s) contain only coded refinement data for the DC coefficients, and all the data for the AC coefficients, of the chroma components.

3.13 chrominance (component):
A matrix, block or single pel representing one of the two colour difference signals related to the primary colours in the manner defined in the bitstream. The symbols used for the colour difference signals are Cr and Cb.

3.14 coded video bitstream:
A coded representation of a series of one or more pictures as defined in this specification.

3.15 coded order:
The order in which the pictures are stored and decoded. This order is not necessarily the same as the display order.

3.16 coded representation:
A data element as represented in its encoded form.

3.17 coding parameters:
The set of user-definable parameters that characterise a coded video bitstream. Bitstreams are characterised by coding parameters. Decoders are characterised by the bitstreams that they are capable of decoding.

3.18 component:
A matrix, block or single pel from one of the three matrices (luminance and two chrominance) that make up a picture.

3.19 compression:
Reduction in the number of bits used to represent an item of data.

3.20 constant bitrate coded video:
A compressed video bitstream with a constant average bitrate.

3.21 constant bitrate:
Operation where the bitrate is constant from start to finish of the compressed bitstream.

3.22 CRC:
Cyclic redundancy code.

3.23 data element:
An item of data as represented before encoding and after decoding.

3.24 data partitioning:
A method for dividing a bitstream into two separate bitstreams for error resilience purposes. The two bitstreams have to be recombined before decoding.

3.25 DC coefficient:
The DCT coefficient for which the frequency is zero in both dimensions.

3.26 DCT coefficient:
The amplitude of a specific cosine basis function.

3.27 decoder input buffer:
The first-in first-out (FIFO) buffer specified in the video buffering verifier.

3.28 decoder input rate:
The data rate specified in the video buffering verifier and encoded in the coded video bitstream.

3.29 decoder:
An embodiment of a decoding process.

3.30 decoding (process):
The process defined in this specification that reads an input coded bitstream and produces decoded pictures or audio samples.

3.31 dequantisation:
The process of rescaling the quantised DCT coefficients after their representation in the bitstream has been decoded and before they are presented to the inverse DCT.

3.32 digital storage media; DSM:
A digital storage or transmission device or system.

3.33 discrete cosine transform; DCT:
Either the forward discrete cosine transform or the inverse discrete cosine transform. The DCT is an invertible, discrete orthogonal transformation. The inverse DCT is defined in Annex A of this specification.

3.34 display order:
The order in which the decoded pictures are displayed. Normally this is the same order in which they were presented at the input of the encoder.

3.35 editing:
The process by which one or more compressed bitstreams are manipulated to produce a new compressed bitstream. Conforming edited bitstreams must meet the requirements defined in this specification.

3.36 encoder:
An embodiment of an encoding process.

3.37 encoding (process):
A process, not specified in this specification, that reads a stream of input pictures or audio samples and produces a valid coded bitstream as defined in this specification.

3.38 fast forward playback:
The process of displaying a sequence, or parts of a sequence, of pictures in display-order faster than real-time.

3.39 fast reverse playback:
The process of displaying the picture sequence in the reverse of display order faster than real-time.

3.40 field:
For an interlaced video signal, a "field" is the assembly of alternate lines of a frame. Therefore. an interlaced frame is composed of two fields a top field and a bottom field.

3.41 field period:
The reciprocal of twice the frame rate.

3.42 flag:
A variable which can take one of only the two values defined in this specification.

3.43 forbidden:
The term "forbidden" when used in the clauses defining the coded bitstream indicates that the value shall never be used. This is usually to avoid emulation of start codes.

3.44 forced updating:
The process by which macroblocks are intra-coded from time-to-time to ensure that mismatch errors between the inverse DCT processes in encoders and decoders cannot build up excessively.

3.45 forward compatibility:
A new coding standard is forward compatible with an existing coding standard if new decoders (designed to operate with the new coding standard) continue to be able to decode bitstreams of the existing coding standard.

3.46 forward motion vector:
A motion vector that is used for motion compensation from a reference picture at an earlier time in display order.

3.47 frame:
A frame contains lines of spatial information of a video signal. For progressive video, these lines contain samples starting from one time instant and continuing through successive lines to the bottom of the frame. For interlaced video a frame consists of two fields, a top field and a bottom field. One of these fields will commence one field period later than the other.

3.48 frame period:
The reciprocal of the frame rate.

3.49 frame rate:
The rate at which frames are output from the decoding process.

3.50 future reference picture:
A future reference picture is a reference picture that occurs at a later time than the current picture in display order.

3.51 header:
A block of data in the coded bitstream containing the coded representation of a number of data elements pertaining to the coded data that follow the header in the bitstream.

3.52 hybrid scalability:
Hybrid scalability is the combination of two (or more) types of scalability.

3.53 interlace:
The property of conventional television frames where alternating lines of the frame represent different instances in time.

3.54 intra coding:
Coding of a macroblock or picture that uses information only from that macroblock or picture.

3.55 intra-coded picture; I-picture:
A picture coded using information only from itself.

3.56 level:
A defined set of constraints on the values which may be taken by the parameters of this specification within a particular profile. A profile may contain one or more levels.

3.57 luminance (component):
A matrix, block or single pel representing a monochrome representation of the signal and related to the primary colours in the manner defined in the bitstream. The symbol used for luminance is Y.

3.58 macroblock:
The four 8 by 8 blocks of luminance data and the two (for 4:2:0 chroma format), four (for 4:2:2 chroma format) or eight (for 4:4:4 chroma format) corresponding 8 by 8 blocks of chrominance data coming from a 16 by 16 section of the luminance component of the picture. Macroblock is sometimes used to refer to the pel data and sometimes to the coded representation of the pel values and other data elements defined in the macroblock header of the syntax defined in this part of this specification. The usage is clear from the context.
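The block counts given in this definition can be summarised in a small sketch (illustrative only; the names below are invented for illustration and are not part of the normative syntax):

```python
# Illustrative summary of the macroblock definition: number of
# 8 by 8 blocks per macroblock for each chroma format.
LUMA_BLOCKS = 4  # a 16 by 16 luminance section is always four 8x8 blocks

CHROMA_BLOCKS = {
    "4:2:0": 2,
    "4:2:2": 4,
    "4:4:4": 8,
}

def blocks_per_macroblock(chroma_format):
    """Total 8x8 blocks (luminance + chrominance) in one macroblock."""
    return LUMA_BLOCKS + CHROMA_BLOCKS[chroma_format]
```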

3.59 motion compensation:
The use of motion vectors to improve the efficiency of the prediction of pel values. The prediction uses motion vectors to provide offsets into the past and/or future reference pictures containing previously decoded pel values that are used to form the prediction error signal.
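As an informal illustration (not the normative prediction process, which additionally covers half-pel interpolation, field prediction and edge handling), full-pel motion compensation amounts to an offset fetch from a reference picture:

```python
def motion_compensate(ref, mv, x0, y0, size=16):
    """Sketch of full-pel motion compensation: form a prediction block
    for the macroblock at (x0, y0) by offsetting into a previously
    decoded reference picture with motion vector (mv_x, mv_y).
    Half-pel interpolation and edge handling are omitted."""
    mv_x, mv_y = mv
    return [[ref[y0 + y + mv_y][x0 + x + mv_x] for x in range(size)]
            for y in range(size)]
```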

3.60 motion estimation:
The process of estimating motion vectors during the encoding process.

3.61 motion vector:
A two-dimensional vector used for motion compensation that provides an offset from the coordinate position in the current picture to the coordinates in a reference picture.

3.62 non-intra coding:
Coding of a macroblock or picture that uses information both from itself and from macroblocks and pictures occurring at other times.

3.63 parameter:
A variable within the syntax of this specification which may take one of a large range of values. A variable which can take one of only two values is a flag and not a parameter.

3.64 past reference picture:
A past reference picture is a reference picture that occurs at an earlier time than the current picture in display order.

3.65 pel aspect ratio:
The ratio of the nominal vertical height of a pel on the display to its nominal horizontal width.

3.66 pel:
Picture element.

3.67 picture:
Source, coded or reconstructed image data. A source or reconstructed picture consists of three rectangular matrices of 8-bit numbers representing the luminance and two chrominance signals. For progressive video, a picture is identical to a frame, while for interlaced video, a picture can refer to a frame, or the top field or the bottom field of the frame depending on the context.

3.68 prediction:
The use of a predictor to provide an estimate of the pel value or data element currently being decoded.

3.69 predictive-coded picture; P-picture:
A picture that is coded using motion compensated prediction from past reference pictures.

3.70 prediction error:
The difference between the actual value of a pel or data element and its predictor.

3.71 predictor:
A linear combination of previously decoded pel values or data elements.

3.72 profile:
A defined sub-set of the syntax of this specification.

3.73 Note
In this specification the word "profile" is used as defined above. It should not be confused with other definitions of "profile" and in particular it does not have the meaning that is defined by JTC1/SGFS.

3.74 quantisation matrix:
A set of sixty-four 8-bit values used by the dequantiser.

3.75 quantised DCT coefficients:
DCT coefficients before dequantisation. A variable length coded representation of quantised DCT coefficients is stored as part of the compressed video bitstream.

3.76 quantiser scale:
A scale factor coded in the bitstream and used by the decoding process to scale the dequantisation.
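The interplay of the quantisation matrix (3.74), the quantised DCT coefficients (3.75) and the quantiser scale (3.76) can be sketched as follows. This is a simplified sketch of non-intra reconstruction only; the normative process additionally specifies saturation and mismatch control:

```python
def dequantise_nonintra(QF, W, quantiser_scale):
    """Simplified sketch of non-intra dequantisation: reconstruct an
    8x8 block of DCT coefficients from the quantised values QF, the
    8x8 quantisation matrix W and the quantiser scale.  Saturation
    and mismatch control are omitted; division truncates toward
    zero."""
    def sign(x):
        return (x > 0) - (x < 0)

    def trunc_div(n, d):  # integer division truncating toward zero
        return int(n / d)

    return [[trunc_div((2 * QF[v][u] + sign(QF[v][u]))
                       * W[v][u] * quantiser_scale, 32)
             for u in range(8)] for v in range(8)]
```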

3.77 random access:
The process of beginning to read and decode the coded bitstream at an arbitrary point.

3.78 reference picture:
Reference pictures are the nearest adjacent I- or P-pictures to the current picture in display order.

3.79 reserved:
The term "reserved" when used in the clauses defining the coded bitstream indicates that the value may be used in the future for ISO/IEC defined extensions.

3.80 scalability:
Scalability is the ability of a decoder to decode an ordered set of bitstreams to produce a reconstructed sequence. Moreover, useful video is output when subsets are decoded. The minimum subset that can thus be decoded is the first bitstream in the set, which is called the base layer. Each of the other bitstreams in the set is called an enhancement layer. When addressing a specific enhancement layer, "lower layer" refers to the bitstream which precedes the enhancement layer.

3.81 side information:
Information in the bitstream necessary for controlling the decoder.

3.82 skipped macroblock:
A macroblock for which no data is encoded.

3.83 slice:
A series of macroblocks.

3.84 SNR scalability:
A type of scalability where the enhancement layer(s) contain only coded refinement data for the DCT coefficients of the base layer.

3.85 spatial scalability:
A type of scalability where an enhancement layer also uses predictions from pel data derived from a lower layer without using motion vectors. The layers can have different frame sizes, frame rates or chroma formats.

3.86 start codes [system and video]:
32-bit codes embedded in the coded bitstream that are unique. They are used for several purposes including identifying some of the structures in the coding syntax.

3.87 stuffing (bits); stuffing (bytes):
Code-words that may be inserted into the compressed bitstream that are discarded in the decoding process. Their purpose is to increase the bitrate of the stream.

3.88 temporal scalability:
A type of scalability where an enhancement layer also uses predictions from pel data derived from a lower layer using motion vectors. The layers have identical frame size, and chroma formats, but can have different frame rates.

3.89 top field:
One of two fields that comprise a frame of interlaced video. Each line of a top field is spatially located immediately above the corresponding line of the bottom field.

3.90 variable bitrate:
Operation where the bitrate varies with time during the decoding of a compressed bitstream.

3.91 variable length coding; VLC:
A reversible procedure for coding that assigns shorter code-words to frequent events and longer code-words to less frequent events.
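The principle can be illustrated with a toy prefix code (the code table below is invented for illustration and is not an MPEG-2 code table):

```python
# Toy variable length code: shorter code-words for more frequent
# events, longer ones for rarer events.  Invented for illustration.
TOY_VLC = {"1": "a", "01": "b", "001": "c", "000": "d"}

def vlc_decode(bits):
    """Decode a bit-string by accumulating bits until they match a
    code-word; prefix-freeness makes the procedure reversible."""
    decoded, word = [], ""
    for bit in bits:
        word += bit
        if word in TOY_VLC:
            decoded.append(TOY_VLC[word])
            word = ""
    if word:
        raise ValueError("bitstream ends inside a code-word")
    return decoded
```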

3.92 video buffering verifier; VBV:
A hypothetical decoder that is conceptually connected to the output of the encoder. Its purpose is to provide a constraint on the variability of the data rate that an encoder or editing process may produce.

3.93 video sequence:
A series of one or more pictures.

3.94 zig-zag scanning order:
A specific sequential ordering of the DCT coefficients from (approximately) the lowest spatial frequency to the highest.
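For illustration, this ordering can be derived by walking the anti-diagonals of the 8 by 8 block, alternating direction (a sketch only; the normative scan order is specified as a table in the body of this specification):

```python
def zigzag_order():
    """Derive the classic 8x8 zig-zag scan: visit the anti-diagonals
    d = v + u in turn, alternating between up-right and down-left, so
    coefficients are ordered from low to high spatial frequency.
    Entries are linear indices v * 8 + u (row v, column u)."""
    order = []
    for d in range(15):                       # 15 anti-diagonals
        if d % 2 == 0:                        # even: walk up-right
            order += [v * 8 + (d - v)
                      for v in range(min(d, 7), -1, -1) if d - v < 8]
        else:                                 # odd: walk down-left
            order += [(d - u) * 8 + u
                      for u in range(min(d, 7), -1, -1) if d - u < 8]
    return order
```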


Bibliography:

(This annex does not form an integral part of this Recommendation | International Standard)

1 Arun N. Netravali & Barry G. Haskell "Digital Pictures, representation and compression" Plenum Press, 1988

2 Didier Le Gall "MPEG: A Video Compression Standard for Multimedia Applications" Communications of the ACM, April 1991

3 C Loeffler, A Ligtenberg, G S Moschytz "Practical fast 1-D DCT algorithms with 11 multiplications" Proceedings IEEE ICASSP-89, Vol. 2, pp 988-991, Feb. 1989

4 See the Normative Reference for ITU-R Rec 601 (formerly CCIR Rec 601)

5 See the Normative Reference for IEC Standard Publication 461

6 See the Normative Reference for ITU-T Rec. H.261

7 See the Normative reference for IEEE Standard Specification P1180-1990

8 ISO/IEC 10918-1 | ITU-T T.81 (JPEG)

9 E Viscito and C Gonzales "A Video Compression Algorithm with Adaptive Bit Allocation and Quantization", Proc SPIE Visual Communications and Image Proc '91, Boston MA, November 10-15, Vol. 1605, p. 205, 1991

10 A Puri and R Aravind "Motion Compensated Video Coding with Adaptive Perceptual Quantization", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 1, p. 351, Dec. 1991.

11 C. Gonzales and E. Viscito, "Flexibly scalable digital video coding". Image Communications, Vol. 5, Nos. 1-2, February 1993

12 A.W. Johnson, T. Sikora and T.K. Tan, "Filters for Drift Reduction in Frequency Scalable Video Coding Schemes" <submitted for publication to Electronics Letters>

13 R.Mokry and D.Anastassiou, "Minimal Error Drift in Frequency Scalability for Motion-Compensated DCT Coding". IEEE Transactions on Circuits and Systems for Video Technology, <accepted for publication>

14 K.N. Ngan, J. Arnold, T. Sikora, T.K. Tan and A.W. Johnson. "Frequency Scalability Experiments for MPEG-2 Standard". Asia-Pacific Conference on Communications, Korea, August 1993.

15 T. Sikora, T.K. Tan and K.N. Ngan, "A Performance Comparison of Frequency Domain Pyramid Scalable Coding Schemes Within the MPEG Framework". Proc. PCS, Picture Coding Symposium, Lausanne, pp. 16.1 - 16.2, Switzerland March 1993.

16 Masahiro Iwahashi, "Motion Compensation Technique for 2:1 Scaled-down Moving Pictures". 8-14, Picture Coding Symposium '93.

17 Sikora, T. and Pang, K., "Experiments with Optimal Block-Overlapping Filters for Cell Loss Concealment in Packet Video", Proc. IEEE Visual Signal Processing and Communications Workshop, Melbourne, 21-22 Sept. 1993, pp. 247-250.

18 A. Puri "Video Coding Using the MPEG-2 Compression Standard", <to appear> Proc SPIE Visual Communications and Image Proc '93, Boston MA, November 1993.

19 A. Puri and A. Wong "Spatial Domain Resolution Scalable Video Coding", <to appear> Proc SPIE Visual Communications and Image Proc '93, Boston MA, November 1993.


Annex:

Annex D: Features Supported by the algorithm

NOTE: This Annex gives an overview of the features supported by the MPEG-2 video algorithm.

Annex F: Patent statements

(This annex does not form an integral part of this Recommendation | International Standard)

The following table summarises the formal patent statements received and indicates the parts of the MPEG-2 standard to which the statement applies.

The list includes all the companies that previously submitted the informal statement, but if no "X" is present it means that no formal statement was received from that company.

Company (marks listed in the order received: V = Video, A = Audio, S = Systems)

AT&T: X X X
BBC Research Department:
Bellcore: X
Belgian Science Policy Office: X X X
BOSCH: X X X
CCETT:
CSELT: X
David Sarnoff Research Center: X X X
Deutsche Thomson-Brandt GmbH: X X X
France Telecom CNET:
Fraunhofer Gesellschaft: X X
GC Technology Corporation: X X X
General Instruments:
Goldstar:
Hitachi, Ltd.:
International Business Machines Corporation: X X X
IRT: X
KDD: X
Massachusetts Institute of Technology: X X X
Matsushita Electric Industrial Co., Ltd.: X X X
Mitsubishi Electric Corporation:
National Transcommunications Limited:
NEC Corporation: X
Nippon Hoso Kyokai: X
Nippon Telegraph and Telephone: X
Nokia Research Center: X
Norwegian Telecom Research: X
Philips Consumer Electronics: X X X
OKI:
Qualcomm Incorporated: X
Royal PTT Nederland N.V., PTT Research (NL): X X X
Samsung Electronics:
Scientific Atlanta: X X X
Siemens AG: X
Sharp Corporation:
Sony Corporation:
Texas Instruments:
Thomson Consumer Electronics:
Toshiba Corporation: X
TV/Com: X X X
Victor Company of Japan Limited: