ISO/IEC 13818-2 Part 2: Video - Standard Text -
Available parts of the standard
Internal Information
Main Referee:
Heiner Schomaker
State of Entry:
Incomplete
Last update:
Feb. 25, 1994
Primary Source / Published in:
- Document No.: ISO/IEC JTC 1/SC 29 N 635
- Title: Working Document for ISO/IEC CD 13818-2: Information technology -
  Generic coding of moving pictures and associated audio information -
  Part 2: Video
- [ISO/IEC JTC 1/SC 29/WG 11 N 635] Date: 1993-11-15
Document Parts
INTERNATIONAL ORGANISATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29
CODING OF MOVING PICTURES AND ASSOCIATED AUDIO
ISO/IEC JTC1/SC29/WG11/602
November 1993, Seoul
INFORMATION TECHNOLOGY -
GENERIC CODING OF MOVING PICTURES AND ASSOCIATED AUDIO
Recommendation H.262
ISO/IEC 13818-2
Committee Draft
Draft of: November 5, 1993, 9:10
The ITU-T (the ITU Telecommunication Standardisation Sector) is a
permanent organ of the International Telecommunication Union (ITU).
The ITU-T is responsible for studying technical, operating and tariff
questions and issuing Recommendations on them with a view to
developing telecommunication standards on a world-wide basis.
The World Telecommunication Standardisation Conference, which meets
every four years, establishes, among other things, the programme of
work arising from the review of existing and new questions. The
approval of new or revised Recommendations by members of the ITU-T is
covered by the procedure laid down in ITU-T Resolution No. 1
(Helsinki, 1993). A proposal for a Recommendation is accepted if 70%
or more of the replies from members indicate approval.
ISO (the International Organisation for
Standardisation) and IEC (the International Electrotechnical
Commission) form the specialised system for world-wide
standardisation. National Bodies that are members of ISO and
IEC participate in the development of International Standards
through technical committees established by the respective
organisation to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in
fields of mutual interest. Other international organisations,
governmental and non-governmental, in liaison with ISO and IEC,
also take part in the work.
In the field of information
technology, ISO and IEC have established a joint technical
committee, ISO/IEC JTC1. Draft International Standards adopted
by the joint technical committee are circulated to national
bodies for voting. Publication as an International Standard
requires approval by at least 75% of the national bodies
casting a vote.
This specification is a committee draft that is being submitted for
approval to the ITU-T and to ISO/IEC JTC1/SC29. It was prepared
jointly by SC29/WG11, also known as MPEG (Moving Picture Experts
Group), and the Experts Group for ATM Video Coding in ITU-T SG15.
MPEG was formed in 1988 to establish standards for coding of moving
pictures and associated audio for various applications such as
digital storage media, distribution and communication. The Experts
Group for ATM Video Coding was formed in 1990 to develop video coding
standard(s) appropriate for B-ISDN using ATM transport.
In this specification Annex A, Annex B and Annex
C contain normative requirements and are an integral part of
this specification. Annex D, Annex E, Annex F and Annex G are
informative and contain no normative requirements.
This International Standard is published in four Parts:
- 13818-1 (systems) specifies the system coding of the specification.
  It defines a multiplexed structure for combining audio and video
  data and means of representing the timing information needed to
  replay synchronised sequences in real time.
- 13818-2 (video) specifies the coded representation of video data
  and the decoding process required to reconstruct pictures.
- 13818-3 (audio) specifies the coded representation of audio data.
- 13818-4 (conformance) specifies the procedures for determining the
  characteristics of coded bitstreams and for testing compliance with
  the requirements stated in 13818-1, 13818-2 and 13818-3.
- CONTENTS
- Foreword
- I Introduction
- I.1 Purpose
- I.2 Application
- I.3 Profiles and levels
- I.4 The scalable and the non-scalable syntax
- I.4.1 Overview of the non-scalable syntax
- I.4.1.1 Temporal processing
- I.4.1.2 Coding interlaced video
- I.4.1.3 Motion representation - macroblocks
- I.4.1.4 Spatial redundancy reduction
- I.4.1.5 Chroma formats
- I.4.2 Scalable extensions
- I.4.2.1 Spatial scalable extension
- I.4.2.2 SNR scalable extension
- I.4.2.3 Temporal scalable extension
- I.4.2.4 Data partitioning extension
- 1 Scope
- 2 Normative references
- 3 Definitions
- 4 Abbreviations and symbols
- 4.1 Arithmetic operators
- 4.2 Logical operators
- 4.3 Relational operators
- 4.4 Bitwise operators
- 4.5 Assignment
- 4.6 Mnemonics
- 4.7 Constants
- 5 Conventions
- 5.1 Method of describing bitstream syntax
- 5.2 Definition of functions
- 5.2.1 Definition of bytealigned() function
- 5.2.2 Definition of nextbits() function
- 5.2.3 Definition of next_start_code() function
- 5.3 Reserved, forbidden and marker_bit
- 5.4 Arithmetic precision
- 6 Video bitstream syntax and semantics
- 6.1 Structure of video data
- 6.1.1 Video sequence
- 6.1.1.1 Frame reordering
- 6.1.1.2 Sequence header
- 6.1.1.3 Group of pictures header
- 6.1.2 Picture
- 6.1.2.1 4:2:0 Format
- 6.1.2.2 4:2:2 Format
- 6.1.2.3 4:4:4 Format
- 6.1.2.4 Picture Types
- 6.1.2.5 Progressive and interlaced sequences
- 6.1.2.5.1 Field pictures
- 6.1.2.5.2 Frame pictures
- 6.1.3 Slice
- 6.1.3.1 The general slice structure
- 6.1.3.2 Restricted slice structure
- 6.1.4 Macroblock
- 6.1.5 Block
- 6.2 Video bitstream syntax
- 6.2.1 Start codes
- 6.2.2 Video Sequence
- 6.2.2.1 Sequence header
- 6.2.2.2 Extension and user data
- 6.2.2.2.1 Extension data
- 6.2.2.2.2 User data
- 6.2.2.3 Sequence extension
- 6.2.2.4 Sequence display extension
- 6.2.2.5 Sequence scalable extension
- 6.2.2.6 Group of pictures header
- 6.2.3 Picture header
- 6.2.3.1 Picture coding extension
- 6.2.3.2 Quant matrix extension
- 6.2.3.3 Picture display extension
- 6.2.3.4 Picture temporal scalable extension
- 6.2.3.5 Picture spatial scalable extension
- 6.2.3.6 Picture data
- 6.2.4 Slice
- 6.2.5 Macroblock
- 6.2.5.1 Macroblock modes
- 6.2.5.2 Motion vectors
- 6.2.5.2.1 Motion vector
- 6.2.5.3 Coded block pattern
- 6.2.6 Block layer
- 6.3 Video bitstream semantics
- 6.3.1 Semantic rules for higher syntactic structures
- 6.3.2 Video sequence
- 6.3.3 Sequence header
- 6.3.4 Extension and user data
- 6.3.5 Sequence extension
- 6.3.6 Sequence display extension
- 6.3.7 Quant matrix extension
- 6.3.8 Sequence scalable extension
- 6.3.9 Group of pictures header
- 6.3.10 Picture header
- 6.3.11 Picture Coding Extension
- 6.3.12 Picture display extension
- 6.3.13 Picture spatial scalable extension
- 6.3.14 Picture temporal scalable extension
- 6.3.15 Slice header
- 6.3.16 Macroblock
- 6.3.17 Block
- 7 The video decoding process
- 7.1 Higher syntactic structures
- 7.2 Variable length decoding
- 7.2.1 DC intra coefficients
- 7.2.2 Other coefficients
- 7.2.2.1 Table selection
- 7.2.2.2 First coefficient of a non-intra block
- 7.2.2.3 Escape coding
- 7.2.2.4 Summary
- 7.3 Inverse scan
- 7.3.1 Inverse scan for matrix download
- 7.4 Inverse Quantisation
- 7.4.1 Intra DC coefficient
- 7.4.2 Other coefficients
- 7.4.2.1 Weighting matrices
- 7.4.2.2 Quantiser scale factor
- 7.4.2.3 Reconstruction formulae
- 7.4.3 Saturation
- 7.4.4 Mismatch control
- 7.4.5 Summary
- 7.5 Inverse DCT
- 7.5.1 Non-coded blocks and skipped macroblocks
- 7.6 Motion compensation
- 7.6.1 Prediction modes
- 7.6.2 Prediction field and frame selection
- 7.6.2.1 Field prediction
- 7.6.2.2 Frame prediction
- 7.6.3 Motion vectors
- 7.6.3.1 Decoding the motion vectors
- 7.6.3.2 Vector restrictions
- 7.6.3.3 Updating motion vector predictors
- 7.6.3.4 Resetting motion vector predictors
- 7.6.3.5 Prediction in P-pictures
- 7.6.3.6 Dual prime additional arithmetic
- 7.6.3.7 Vectors for colour difference components
- 7.6.3.8 Semantic restrictions concerning predictions
- 7.6.3.9 Concealment motion vectors
- 7.6.4 Forming predictions
- 7.6.5 Motion vector selection
- 7.6.6 Skipped Macroblocks
- 7.6.6.1 P field-picture
- 7.6.6.2 P frame-picture
- 7.6.6.3 B field-picture
- 7.6.6.4 B frame-picture
- 7.6.7 Combining predictions
- 7.6.7.1 Simple frame predictions
- 7.6.7.2 Simple field predictions
- 7.6.7.3 16x8 Motion compensation
- 7.6.7.4 Dual prime
- 7.6.8 Adding prediction and coefficient data
- 7.7 Spatial Scalability
- 7.7.1 Prediction in scalable layer
- 7.7.2 Formation of 'spatial' prediction
- 7.7.2.1 General
- 7.7.2.2 Deinterlacing
- 7.7.2.3 Vertical resampling
- 7.7.2.4 Horizontal resampling
- 7.7.2.5 Chroma formats
- 7.7.2.6 Generalised slice structure in the lower layer
- 7.7.3 Selection and combination of spatial and temporal predictions
- 7.7.4 Updating motion vector predictors and Motion vector selection
- 7.7.4.1 Resetting motion vector predictors
- 7.7.5 Skipped macroblocks
- 7.7.6 Skipped pictures in the lower layer
- 7.8 SNR scalability
- 7.8.1 Higher syntactic structures
- 7.8.2 Macroblock
- 7.8.2.1 dct_type
- 7.8.2.2 Skipped Macroblocks
- 7.8.3 Block
- 7.8.3.1 VLC decoding
- 7.8.3.2 Inverse scan
- 7.8.3.3 Inverse quantisation
- 7.8.3.4 Addition of coefficients from the two layers
- 7.8.3.5 Remaining macroblock decoding steps
- 7.9 Temporal scalability
- 7.10 Data Partitioning
- 7.11 Hybrid scalability
- 8 Profiles and levels
- 8.1 Simple profile
- 8.1.1 Simple profile syntax
- 8.1.1.1 Picture coding type
- 8.1.1.2 Chroma sampling structure
- 8.1.1.3 Scalability
- 8.1.1.4 Slice structure
- 8.1.2 Main level
- 8.1.2.1 Frame dimensions
- 8.1.2.2 Coded data rate and VBV buffer size
- 8.1.2.3 Vector range
- 8.1.2.4 intra_dc_precision
- 8.2 Main profile
- 8.2.1 Main profile syntax
- 8.2.1.1 Chroma sampling structure
- 8.2.1.2 Scalability
- 8.2.1.3 Slice structure
- 8.2.2 Low level
- 8.2.2.1 Frame dimensions
- 8.2.2.2 Coded data rate and VBV buffer size
- 8.2.2.3 Vector range
- 8.2.2.4 intra_dc_precision
- 8.2.3 Main level
- 8.2.3.1 Frame dimensions
- 8.2.3.2 Coded data rate and VBV buffer size
- 8.2.3.3 Vector range
- 8.2.3.4 intra_dc_precision
- 8.2.4 High-1440 level
- 8.2.4.1 Frame dimensions
- 8.2.4.2 Coded data rate and VBV buffer size
- 8.2.4.3 Vector range
- 8.2.5 High level
- 8.2.5.1 Frame dimensions
- 8.2.5.2 Coded data rate and VBV buffer size
- 8.2.5.3 Vector range
- 8.3 SNR Scalable Profile
- 8.3.1 SNR Scalable profile syntax
- 8.3.1.1 Chroma sampling structure
- 8.3.1.2 Slice structure
- 8.3.2 Low level
- 8.3.2.1 Frame dimensions
- 8.3.2.2 Coded data rate and VBV buffer size
- 8.3.2.3 Vector range
- 8.3.2.4 intra_dc_precision
- 8.3.3 Main level
- 8.3.3.1 Frame dimensions
- 8.3.3.2 Coded data rate and VBV buffer size
- 8.3.3.3 Vector range
- 8.3.3.4 intra_dc_precision
- 8.4 Spatially Scalable Profile
- 8.4.1 Spatially Scalable profile syntax
- 8.4.1.1 Chroma sampling structure
- 8.4.1.2 Slice structure
- 8.4.2 High-1440 level
- 8.4.2.1 Frame dimensions
- 8.4.2.2 Coded data rate and VBV buffer size
- 8.4.2.3 Vector range
- 8.5 High profile
- 8.5.1 High profile syntax
- 8.5.1.1 Chroma sampling structure
- 8.5.1.2 Slice structure
- 8.5.1.3 Scalability
- 8.5.2 Main level
- 8.5.2.1 Frame dimensions
- 8.5.2.2 Coded data rate and VBV buffer size
- 8.5.2.3 Vector range
- 8.5.2.4 intra_dc_precision
- 8.5.3 High-1440 level
- 8.5.3.1 Frame dimensions
- 8.5.3.2 Coded data rate and VBV buffer size
- 8.5.3.3 Vector range
- 8.5.4 High level
- 8.5.4.1 Frame dimensions
- 8.5.4.2 Coded data rate and VBV buffer size
- 8.5.4.3 Vector range
- Annex A Discrete cosine transform
- Annex B Variable length code tables
- B.1 Macroblock addressing
- B.2 Macroblock type
- B.3 Macroblock pattern
- B.4 Motion vectors
- B.5 DCT coefficients
- Annex C Video buffering verifier
- C.1 Video buffering verifier
- Annex D Features supported by the algorithm
- D.1 Overview
- D.2 Video Formats
- D.2.1 Sampling Formats and Color
- D.2.2 Movie Timing
- D.2.3 Display Format Control
- D.2.5 Transparent coding of composite video
- D.3 Picture Quality
- D.4 Data Rate Control
- D.5 Low Delay Mode
- D.6 Random Access/Channel Hopping
- D.7 Scalability
- D.7.1 Use of SNR scalability at a single spatial resolution
- D.7.1.1 Additional features
- D.7.1.1.1 Error resilience
- D.7.1.1.2 Chroma simulcast
- D.7.1.2 SNR scalable encoding process
- D.7.1.2.1 Description
- D.7.1.2.2 A few important remarks
- D.7.2 Multiple resolution scalability bitstreams using SNR scalability
- D.7.2.1 Decoder Implementation
- D.7.2.2 Encoder Implementation
- D.7.3 Bitrate allocation in data partitioning
- D.7.4 Temporal scalability
- D.7.4.1 Progressive:progressive-to-progressive Temporal Scalability
- D.7.4.2 Progressive:interlace-to-interlace temporal scalability
- D.7.4.3 Interlace:interlace-to-interlace Temporal Scalability
- D.7.5 Hybrids of the spatial, the SNR and the temporal ...
- D.7.5.1 Spatial and SNR hybrid scalability applications
- D.7.5.2 Spatial and temporal hybrid scalability applications
- D.7.5.3 Temporal and SNR hybrid scalability applications
- D.8 Compatibility
- D.8.1 Compatibility with higher and lower resolution formats
- D.8.2 Compatibility with ISO/IEC IS 11172-2 (and ITU-T Rec. H.261)
- D.9 Complexity
- D.9.1 Restrictions to reduce decoder implementation cost
- D.10 Editing Encoded Bit Streams
- D.11 Trick modes
- D.12 Error Resilience
- D.12.1 Concealment possibilities
- D.12.1.1 Temporal predictive concealment
- D.12.1.1.1 Substitution from previous frame
- D.12.1.1.2 Motion compensated concealment
- D.12.1.1.3 Use of Intra MVs
- D.12.1.2 Spatial predictive concealment
- D.12.1.3 Layered coding to facilitate concealment
- D.12.1.3.1 Use of data partitioning
- D.12.1.3.2 Use of SNR scalable coding
- D.12.1.3.3 Use of spatial scalable coding
- D.12.1.3.4 Use of temporal scalable coding
- D.12.2 Spatial localisation
- D.12.2.1 Small slices
- D.12.2.2 Adaptive slice size
- D.12.3 Temporal localisation
- D.12.3.1 Intra pictures
- D.12.3.2 Intra slices
- D.12.4 Summary
- Annex E Profile and level restrictions
- Annex F Patent statements
- Annex G Bibliography
I.1 Purpose
This Part of this specification was developed in response to the growing need
for a generic coding method of moving pictures and of associated sound for
various applications such as digital storage media, television broadcasting and
communication. The use of this specification means that motion video can be
manipulated as a form of computer data and can be stored on various storage
media, transmitted and received over existing and future networks and
distributed on existing and future broadcasting channels.
I.2 Application
The applications of this specification cover, but are not limited to, such
areas as listed below:
- BSS   Broadcasting Satellite Service (to the home)
- CATV  Cable TV Distribution on optical networks, copper, etc.
- CDAD  Cable Digital Audio Distribution
- DAB   Digital Audio Broadcasting (terrestrial and satellite broadcasting)
- DTTB  Digital Terrestrial Television Broadcast
- EC    Electronic Cinema
- ENG   Electronic News Gathering (including SNG, Satellite News Gathering)
- FSS   Fixed Satellite Service (e.g. to head ends)
- HTT   Home Television Theatre
- IPC   Interpersonal Communications (videoconferencing, videophone, etc.)
- ISM   Interactive Storage Media (optical disks, etc.)
- MMM   Multimedia Mailing
- NCA   News and Current Affairs
- NDB   Networked Database Services (via ATM, etc.)
- RVS   Remote Video Surveillance
- SSM   Serial Storage Media (digital VTR, etc.)
I.3 Profiles and levels
This specification is intended to be generic in the sense that it serves a wide
range of applications, bit rates, resolutions, qualities and services.
Applications should cover, among other things, digital storage media,
television broadcasting and communications. In the course of creating this
specification, various requirements from typical applications have been
considered, necessary algorithmic elements have been developed, and they have
been integrated into a single syntax. Hence this specification will facilitate
the bitstream interchange among different applications.
Considering the practicality of implementing the full syntax of this
specification, however, a limited number of subsets of the syntax are also
stipulated by means of "profile" and "level". These and other related terms
are formally defined in clause 3 of this specification.
A "profile" is a defined sub-set of the entire bitstream syntax that is defined
by this specification. Within the bounds imposed by the syntax of a given
profile it is still possible to require a very large variation in the
performance of encoders and decoders depending upon the values taken by
parameters in the bitstream. For instance it is possible to specify
frame sizes as large as (approximately) 2^14 pels wide by 2^14 lines
high. It is currently neither practical nor economical to implement a
decoder capable of dealing with all possible frame sizes.
In order to deal with this problem "levels" are defined within each profile. A
level is a defined set of constraints imposed on parameters in the bitstream.
These constraints may be simple limits on numbers. Alternatively they may take
the form of constraints on arithmetic combinations of the parameters (e.g.
frame width multiplied by frame height multiplied by frame rate).
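For illustration, such a level check can be sketched as follows. This is an
informative sketch, not part of the normative text; the default limit values
are assumptions chosen only to demonstrate the two kinds of constraint (a
simple limit and an arithmetic combination of parameters):

```python
def within_level(width, height, frame_rate,
                 max_width=720, max_height=576,
                 max_luma_rate=10_368_000):
    """Check a level's constraints: simple limits on frame
    dimensions, plus a combined constraint on the luminance
    sample rate (width * height * frame_rate)."""
    if width > max_width or height > max_height:
        return False
    # arithmetic combination of parameters
    return width * height * frame_rate <= max_luma_rate

# 704x480 at 30 Hz satisfies both kinds of constraint;
# 720x576 at 30 Hz fits the dimension limits but exceeds the
# combined sample-rate limit.
```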
Bitstreams complying with this specification use a common syntax. In
order to achieve a sub-set of the complete syntax, flags and
parameters are included in the bitstream that signal the presence or
otherwise of syntactic elements that occur later in the bitstream. In
order to specify constraints on the syntax (and hence define a
profile), it is thus only necessary to constrain the values of the
flags and parameters that specify the presence of later syntactic
elements.
I.4 The scalable and the non-scalable syntax
The full syntax can be divided into two major categories. The first is
the non-scalable syntax, which is structured as a super-set of the
syntax defined in ISO/IEC 11172-2; its main feature is the extra
compression tools for interlaced video signals. The second is the
scalable syntax, the key property of which is to enable the
reconstruction of useful video from pieces of a total bitstream. This
is achieved by structuring the total bitstream in two or more layers,
starting from a standalone base layer and adding a number of
enhancement layers. The base layer can use the non-scalable syntax, or
in some situations conform to the ISO/IEC 11172-2 syntax.
I.4.1 Overview of the non-scalable syntax
The coded representation defined in the non-scalable syntax achieves a high
compression ratio while preserving good image quality. The algorithm is not
lossless as the exact pixel values are not preserved during coding.
The choice of techniques is based on the need to balance high image
quality and compression ratio against the requirement to allow random
access to the coded bitstream. Obtaining good image quality at the
bitrates of interest demands very high compression, which is not
achievable with intra picture coding alone. The need for random
access, however, is best satisfied with pure intra picture coding.
This requires a careful balance between intra- and interframe coding
and between recursive and non-recursive temporal redundancy reduction.
A number of techniques are used to achieve high compression. The algorithm
first uses block-based motion compensation to reduce the temporal redundancy.
Motion compensation is used both for causal prediction of the current picture
from a previous picture, and for non-causal, interpolative prediction from past
and future pictures. Motion vectors are defined for each 16-pixel by 16-line
region of the picture. The difference signal, i.e., the prediction error, is
further compressed using the discrete cosine transform (DCT) to remove spatial
correlation before it is quantised in an irreversible process that discards the
less important information. Finally, the motion vectors are combined with the
residual DCT information, and encoded using variable length codes.
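The first of these steps can be sketched as follows (an informative Python
sketch; `prediction_error` is a hypothetical helper that ignores half-pel
accuracy and picture-boundary handling):

```python
def prediction_error(current, reference, mv, size=16):
    """Form the 16-pixel by 16-line difference signal for one
    region: subtract the reference picture, displaced by the
    motion vector (dx, dy), from the current picture region.
    Pictures are lists of rows of sample values."""
    dx, dy = mv
    return [[current[y][x] - reference[y + dy][x + dx]
             for x in range(size)] for y in range(size)]
```

The resulting difference signal is what the DCT stage then compresses; a
perfect prediction yields an all-zero difference block.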
I.4.1.1 Temporal processing
Because of the conflicting requirements of random access and highly efficient
compression, three main picture types are defined. Intra coded pictures
(I-Pictures) are coded without reference to other pictures. They provide access
points to the coded sequence where decoding can begin, but are coded with only
moderate compression. Predictive coded pictures (P-Pictures) are coded more
efficiently using motion compensated prediction from a past intra or predictive
coded picture and are generally used as a reference for further prediction.
Bidirectionally-predictive coded pictures (B-Pictures) provide the highest
degree of compression but require both past and future reference pictures for
motion compensation. Bidirectionally-predictive coded pictures are never used
as references for prediction. The organisation of the three picture types in a
sequence is very flexible. The choice is left to the encoder and will depend on
the requirements of the application. Figure 0-1 illustrates the relationship
among the three different picture types.
Figure 0-1 Example of temporal picture structure
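Because B-pictures need their future reference before they can be decoded,
references are transmitted ahead of the B-pictures that precede them in
display order. A minimal sketch of this reordering (informative; it assumes a
closed sequence with no open GOP handling):

```python
def display_to_coded_order(pictures):
    """Reorder (type, index) pictures from display order into
    coded order: each reference picture (I or P) is emitted
    before the B-pictures that precede it in display order."""
    coded, pending_b = [], []
    for pic in pictures:
        if pic[0] in ("I", "P"):     # reference picture
            coded.append(pic)        # transmit the reference first
            coded.extend(pending_b)  # then the deferred B-pictures
            pending_b = []
        else:                        # B-picture: defer it
            pending_b.append(pic)
    coded.extend(pending_b)
    return coded

display = [("I", 0), ("B", 1), ("B", 2), ("P", 3),
           ("B", 4), ("B", 5), ("P", 6)]
# coded order: I0 P3 B1 B2 P6 B4 B5
```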
I.4.1.2 Coding interlaced video
Each frame of interlaced video consists of two fields which are separated by
one field-period. The specification allows either the frame to be encoded as
picture or the two fields to be encoded as two pictures. Frame encoding or
field encoding can be adaptively selected on a frame-by-frame basis. Frame
encoding is typically preferred when the video scene contains significant
detail with limited motion. Field encoding, in which the second field can be
predicted from the first, works better when there is fast movement.
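The separation of a frame into its two fields is a simple row split; the
following informative sketch shows the convention that the top field holds
the even-numbered rows and the bottom field the odd-numbered rows:

```python
def split_fields(frame):
    """Split an interlaced frame (a list of rows) into its two
    fields: even rows form the top field, odd rows the bottom."""
    top = frame[0::2]
    bottom = frame[1::2]
    return top, bottom
```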
I.4.1.3 Motion representation - macroblocks
As in ISO/IEC 11172-2, the choice of 16 by 16 macroblocks for the
motion-compensation unit is a result of the trade-off between the coding gain
provided by using motion information and the overhead needed to store it. Each
macroblock can be temporally predicted in one of a number of different ways.
For example, in frame encoding, the prediction from the previous reference
frame can itself be either frame-based or field-based. Depending on the type of
the macroblock, motion vector information and other side information is encoded
with the compressed prediction error signal in each macroblock. The motion
vectors are encoded differentially with respect to the last encoded motion
vectors using variable length codes. The maximum length of the vectors that may
be represented can be programmed, on a picture-by-picture basis, so that the
most demanding applications can be met without compromising the performance of
the system in more normal situations.
It is the responsibility of the encoder to calculate appropriate motion
vectors. The specification does not specify how this should be done.
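The differential coding of motion vectors described above can be sketched as
follows (informative; the variable length coding of the differences and the
predictor reset rules are omitted):

```python
def encode_motion_vectors(vectors):
    """Encode each (vx, vy) vector as a difference from the
    previously encoded vector; the predictor starts at (0, 0)."""
    pred, diffs = (0, 0), []
    for vx, vy in vectors:
        diffs.append((vx - pred[0], vy - pred[1]))
        pred = (vx, vy)
    return diffs

def decode_motion_vectors(diffs):
    """Invert the differential coding by accumulating differences."""
    pred, out = (0, 0), []
    for dx, dy in diffs:
        pred = (pred[0] + dx, pred[1] + dy)
        out.append(pred)
    return out
```

Because neighbouring macroblocks tend to move together, the differences are
usually small, which is what makes variable length coding of them efficient.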
I.4.1.4 Spatial redundancy reduction
Both original pictures and prediction error signals have high spatial
redundancy. This specification uses a block-based DCT method with visually
weighted quantisation and run-length coding. After motion compensated
prediction or interpolation, the residual picture is split into 8 by 8 blocks.
These are transformed into the DCT domain where they are weighted before being
quantised. After quantisation many of the coefficients are zero in value and so
two-dimensional run-length and variable length coding is used to encode the
remaining coefficients efficiently.
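The scan-and-run-length step can be sketched as follows. This informative
sketch generates one common zigzag ordering and codes (zero-run, level)
pairs; the DCT itself, the weighting matrices and the exact scan tables of
this specification are not reproduced:

```python
def zigzag_order(n=8):
    """Visit an n x n block along anti-diagonals, alternating
    direction, so low-frequency coefficients come first."""
    coords = [(x, y) for y in range(n) for x in range(n)]
    return sorted(coords,
                  key=lambda p: (p[0] + p[1],
                                 p[1] if (p[0] + p[1]) % 2 else p[0]))

def run_length(scanned):
    """Turn a scanned coefficient list into (zero-run, level)
    pairs for the nonzero coefficients."""
    pairs, run = [], 0
    for level in scanned:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs
```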
I.4.1.5 Chroma formats
In addition to the 4:2:0 format supported in ISO/IEC 11172-2 this specification
supports 4:2:2 and 4:4:4 chroma formats.
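The three formats differ only in how the chroma planes are subsampled
relative to the luminance plane, which the following informative sketch
computes:

```python
def chroma_dimensions(width, height, chroma_format):
    """Chroma plane size for a luminance plane of width x height:
    4:2:0 halves both dimensions, 4:2:2 halves only the
    horizontal dimension, 4:4:4 subsamples neither."""
    if chroma_format == "4:2:0":
        return width // 2, height // 2
    if chroma_format == "4:2:2":
        return width // 2, height
    if chroma_format == "4:4:4":
        return width, height
    raise ValueError(chroma_format)
```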
I.4.2 Scalable extensions
The scalability tools in this specification are designed to support
applications beyond those supported by single-layer video. Among the
noteworthy application areas addressed are video telecommunications,
video on asynchronous transfer mode (ATM) networks, interworking of
video standards, video service hierarchies with multiple spatial,
temporal and quality resolutions, HDTV with embedded TV, and systems
allowing migration to higher temporal resolution HDTV. Although a
simple solution to scalable video is the simulcast technique, which
is based on transmission/storage of multiple independently coded
reproductions of video, a more efficient alternative is scalable
video coding, in which the bandwidth allocated to a given
reproduction of video can be partially reutilised in coding of the
next reproduction. In scalable video coding, it is assumed that,
given an encoded bitstream, decoders of various complexities can
decode and display appropriate reproductions of coded video. A
scalable video encoder is likely to have increased complexity when
compared to a single-layer encoder. However, this standard provides
several different forms of scalability that address non-overlapping
applications with corresponding complexities. The basic scalability
tools offered are data partitioning, SNR scalability, spatial
scalability and temporal scalability. Combinations of these basic
scalability tools are also supported and are referred to as hybrid
scalability. In the case of basic scalability, two layers of video,
referred to as the lower layer and the enhancement layer, are
allowed, whereas in hybrid scalability up to three layers are
supported. The following tables provide a few example applications of
the various scalabilities.
Table 0-? Applications of SNR scalability
Lower layer     | Enhancement layer     | Application
----------------+-----------------------+-----------------------
ITU-R-601       | Same resolution and   | Two quality service
                | format as lower layer | for Standard TV
High Definition | Same resolution and   | Two quality service
                | format as lower layer | for HDTV
4:2:0 High      | 4:2:2 chroma simulcast| Video production /
Definition      |                       | distribution
Table 0-?. Applications of spatial scalability
Base | Enhancement | Application
----------------+---------------+-------------------------------
prog (30Hz) | prog (30Hz) | CIF/SCIF compatibility or
| | scalability
interl (30Hz) | interl(30Hz) | HDTV/SDTV scalability
prog (30Hz) | interl(30Hz) | ISO/IEC 11172-2/compatibility
| | with this specification
interl (30Hz) | prog (60Hz) | Migration to HR prog HDTV
Table 0-?. Applications of temporal scalability
Base | Enhancement | Higher | Application
----------------+---------------+---------------+---------------
prog (30Hz) | prog (30Hz) | prog (60Hz) | Migration to
| | | HR prog HDTV
interl (30Hz) | interl(30Hz) | prog (60Hz) | Migration to
| | | HR prog HDTV
I.4.2.1 Spatial scalable extension
Spatial scalability is a tool intended for use in video applications involving
telecommunications, interworking of video standards, video database browsing,
interworking of HDTV and TV etc., i.e., video systems with the primary common
feature that a minimum of two layers of spatial resolution are necessary.
Spatial scalability involves generating two spatial resolution video layers
from a single video source such that the lower layer is coded by itself to
provide the basic spatial resolution and the enhancement layer employs the
spatially interpolated lower layer and carries the full spatial resolution of
the input video source. The lower and the enhancement layers may either both
use the coding tools in this specification, or the ISO/IEC 11172-2 standard for
the lower layer and this specification for the enhancement layer. The latter
case achieves a further advantage by facilitating interworking between video
coding standards. Moreover, spatial scalability offers flexibility in
the choice of video formats to be employed in each layer. An
additional advantage of spatial scalability is its ability to provide
resilience to transmission errors, as the more important data of the
lower layer can be sent over a channel with better error performance,
while the less critical enhancement layer data can be sent over a
channel with poorer error performance.
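The core of forming the 'spatial' prediction is interpolating the decoded
lower-layer picture up to the enhancement-layer resolution. The following
informative sketch uses simple sample repetition; the actual resampling
filters of this specification are more elaborate:

```python
def upsample_2x(lower):
    """Double a decoded lower-layer picture (list of rows) in
    both dimensions by sample repetition, yielding a prediction
    at the enhancement layer's spatial resolution."""
    out = []
    for row in lower:
        wide = [v for v in row for _ in (0, 1)]  # repeat horizontally
        out.append(wide)
        out.append(list(wide))                   # repeat vertically
    return out
```

The enhancement layer then codes only the difference between the full
resolution input and this spatial prediction.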
I.4.2.2 SNR scalable extension
SNR scalability is a tool intended for use in video applications involving
telecommunications, video services with multiple qualities, standard TV and
HDTV, i.e., video systems with the primary common feature that a minimum of two
layers of video quality are necessary. SNR scalability involves generating two
video layers of same spatial resolution but different video qualities from a
single video source such that the lower layer is coded by itself to provide the
basic video quality and the enhancement layer is coded to enhance the lower
layer. The enhancement layer when added back to the lower layer regenerates a
higher quality reproduction of the input video. The lower and the
enhancement layers may either both use this specification, or use the
ISO/IEC 11172-2 standard for the lower layer and this specification
for the enhancement layer. An additional advantage of SNR scalability
is its ability to provide a high degree of resilience to transmission
errors, as the more important data of the lower layer can be sent
over a channel with better error performance, while the less critical
enhancement layer data can be sent over a channel with poorer error
performance.
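The layering principle can be sketched numerically: the lower layer carries
coarsely quantised coefficients, the enhancement layer carries a finer
quantisation of the remaining error, and adding the two layers regenerates a
higher quality reconstruction. This is an informative sketch; the
specification's actual inverse quantisation is more elaborate:

```python
def quantise(values, step):
    """Uniform quantisation to multiples of step (illustrative)."""
    return [step * round(v / step) for v in values]

original = [37, -12, 5, 0]                      # DCT coefficients
lower = quantise(original, 16)                  # coarse lower layer
residual = [o - l for o, l in zip(original, lower)]
enhancement = quantise(residual, 4)             # finer refinement layer
reconstruction = [l + e for l, e in zip(lower, enhancement)]
```

Decoding only `lower` gives the basic quality; adding `enhancement` brings
the reconstruction closer to the original coefficients.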
I.4.2.3 Temporal scalable extension
Temporal scalability is a tool intended for use in a range of diverse video
applications from telecommunications to HDTV for which migration to higher
temporal resolution systems from that of lower temporal resolution systems may
be necessary. In many cases, the lower temporal resolution video systems may be
either the existing systems or the less expensive early generation systems,
with the motivation of introducing more sophisticated systems gradually.
Temporal scalability involves partitioning of video frames into
layers, where the lower layer is coded by itself to provide the basic
temporal rate and the enhancement layer is coded with temporal
prediction with respect to the lower layer; these layers, when
decoded and temporally multiplexed, yield the full temporal
resolution of the video source. Lower temporal resolution systems may
decode only the lower layer to provide basic temporal resolution,
whereas more sophisticated systems of the future may decode both
layers and provide high temporal resolution video while maintaining
interworking with earlier generation systems. An additional advantage
of temporal scalability is its ability to provide resilience to
transmission errors, as the more important data of the lower layer
can be sent over a channel with better error performance, while the
less critical enhancement layer can be sent over a channel with
poorer error performance.
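The temporal multiplexing step can be sketched as follows (informative; it
assumes the simple case where the base layer carries every other frame and
the enhancement layer carries the frames in between):

```python
def temporal_multiplex(base, enhancement):
    """Interleave decoded base-layer and enhancement-layer frames
    to restore the full temporal rate of the source."""
    out = []
    for b, e in zip(base, enhancement):
        out.extend((b, e))
    return out
```

A basic decoder displays only `base` at half the frame rate; a full decoder
displays the interleaved sequence.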
I.4.2.4 Data partitioning extension
Data partitioning is a tool intended for use when two channels are
available for transmission and/or storage of a video bitstream, as may be the
case in ATM networks, terrestrial broadcast, magnetic media, etc. The
bitstream is partitioned between these channels such that more critical parts
of the bitstream (such as headers, motion vectors, DC coefficients) are
transmitted in the channel with the better error performance, and less critical
data (such as higher DCT coefficients) is transmitted in the channel with poor
error performance. Thus, degradation due to channel errors is minimised, since
the critical parts of a bitstream are better protected. A decoder that is not
intended for decoding data-partitioned bitstreams cannot decode the data from
either channel.
1 Scope
This Recommendation | International Standard specifies the coded representation
of picture information for digital storage media and digital video communication and
specifies the decoding process. The representation supports constant bitrate
transmission, variable bitrate transmission, random access, channel hopping, scalable
decoding, bitstream editing, as well as special functions such as fast forward playback,
slow motion, pause and still pictures. This Recommendation | International Standard is
compatible with ISO/IEC 11172-2 and upward or downward compatible with EDTV, HDTV, SDTV
formats. This Recommendation | International Standard is primarily applicable to digital
storage media, video broadcast and communication. The storage media may be directly
connected to the decoder, or via communications means such as busses, LANs, or
telecommunications links.
The applications of this specification cover, but are not limited to, such
areas as listed below:
- BSS: Broadcasting Satellite Service (to the home)
- CATV: Cable TV Distribution on optical networks, copper, etc.
- CDAD: Cable Digital Audio Distribution
- DAB: Digital Audio Broadcasting (terrestrial and satellite broadcasting)
- DTTB: Digital Terrestrial Television Broadcast
- EC: Electronic Cinema
- ENG: Electronic News Gathering (including SNG, Satellite News Gathering)
- FSS: Fixed Satellite Service (e.g. to head ends)
- HTT: Home Television Theatre
- IPC: Interpersonal Communications (videoconferencing, videophone, etc.)
- ISM: Interactive Storage Media (optical disks, etc.)
- MMM: Multimedia Mailing
- NCA: News and Current Affairs
- NDB: Networked Database Services (via ATM, etc.)
- RVS: Remote Video Surveillance
- SSM: Serial Storage Media (digital VTR, etc.)
NOTE: From "Chapter I.2 Application"
2 Normative references
The following ITU-T Recommendations and International Standards contain
provisions which, through reference in this text, constitute provisions of this
Recommendation | International Standard. At the time of publication, the
editions indicated were valid. All Recommendations and Standards are subject to
revision, and parties to agreements based on this Recommendation |
International Standard are encouraged to investigate the possibility of
applying the most recent editions of the standards indicated below. Members of
IEC and ISO maintain registers of currently valid International Standards. The
TSB (Telecommunication Standardisation Bureau) maintains a list of currently
valid ITU-T Recommendations.
- Recommendations and reports of the CCIR, 1990, XVIIth Plenary Assembly,
Dusseldorf, 1990, Volume XI - Part 1, Broadcasting Service (Television),
Rec. 601-2 "Encoding parameters of digital television for studios"
- CCIR Volume X and XI, Part 3, Recommendation 648: "Recording of audio
signals"
- CCIR Volume X and XI, Part 3, Report 955-2: "Sound broadcasting by satellite
for portable and mobile receivers", including Annex IV, Summary description of
advanced digital system II
- ISO/IEC 11172 (1993) "Information technology -- Coding of moving pictures
and associated audio for digital storage media at up to about 1.5 Mbit/s"
- IEEE Standard Specifications for the Implementations of 8 by 8 Inverse
Discrete Cosine Transform, IEEE Std 1180-1990, December 6, 1990
- IEC Publication 908:1987, "CD Digital Audio System"
- IEC Standard Publication 461, Second edition, 1986, "Time and control code
for video tape recorders"
- ITU-T Recommendation H.261 (1990) (formerly CCITT Recommendation H.261)
"Codec for audiovisual services at p x 64 kbit/s", Geneva, 1990
- ISO/IEC 10918-1 | ITU-T Rec. T.81 (JPEG) "Digital compression and coding of
continuous-tone still images"
3 Definitions
For the purposes of this Recommendation | International Standard, the following
definitions apply.
- 3.1 AC coefficient:
- Any DCT
coefficient for which the frequency in one or both dimensions is
non-zero.
- 3.2 backward compatibility:
- A
new coding standard is backward compatible with an existing coding standard if
existing decoders (designed to operate with the existing coding standard) are
able to continue to operate by decoding all or part of a bitstream produced
according to the new coding standard.
- 3.3 backward motion vector:
- A
motion vector that is used for motion compensation from a reference picture at
a later time in display order.
- 3.4 bidirectionally predictive-coded
picture; B-picture:
- A picture that is coded using motion compensated
prediction from past and/or future reference pictures.
- 3.5 bitrate:
- The rate at which
the compressed bitstream is delivered from the storage medium to the input of a
decoder.
- 3.6 block:
- An 8-row by 8-column
matrix of pels, or 64 DCT coefficients (source, quantised or dequantised).
- 3.7 bottom field:
- One of two
fields that comprise a frame of interlaced video. Each line of a bottom field
is spatially located immediately below the corresponding line of the top
field.
- 3.8 byte aligned:
- A bit in a
coded bitstream is byte-aligned if its position is a multiple of 8 bits from
the first bit in the stream.
- 3.9 byte:
- Sequence of 8 bits.
- 3.10 channel:
- A digital medium
that stores or transports a bitstream constructed according to this
specification.
- 3.11 chroma format:
- Defines
the number of chrominance blocks in a macroblock.
- 3.12 chroma simulcast:
- A type
of scalability (which is a subset of SNR scalability) where the enhancement
layer(s) contain only coded refinement data for the DC coefficients, and
all the data for the AC coefficients, of the chroma components.
- 3.13 chrominance (component):
- A
matrix, block or single pel representing one of the two colour difference
signals related to the primary colours in the manner defined in the bitstream.
The symbols used for the colour difference signals are Cr and Cb.
- 3.14 coded video bitstream:
- A
coded representation of a series of one or more pictures as defined in this
specification.
- 3.15 coded order:
- The order in
which the pictures are stored and decoded. This order is not necessarily the
same as the display order.
- 3.16 coded representation:
- A data element as represented in its encoded form.
- 3.17 coding parameters:
- The set of user-definable parameters that characterise a coded video
bitstream. Bitstreams are characterised by coding parameters. Decoders are
characterised by the bitstreams that they are capable of decoding.
- 3.18 component:
- A matrix, block
or single pel from one of the three matrices (luminance and two chrominance)
that make up a picture.
- 3.19 compression:
- Reduction in
the number of bits used to represent an item of data.
- 3.20 constant bitrate coded video:
- A compressed video bitstream with a constant average bitrate.
- 3.21 constant bitrate:
- Operation where the bitrate is constant from start to finish of the
compressed bitstream.
- 3.22 CRC:
- Cyclic redundancy
code.
- 3.23 data element:
- An
item of data as represented before encoding and after decoding.
- 3.24 data partitioning:
- A
method for dividing a bitstream into two separate bitstreams for error
resilience purposes. The two bitstreams have to be recombined before
decoding.
- 3.25 DC coefficient:
- The DCT
coefficient for which the frequency is zero in both dimensions.
- 3.26 DCT coefficient:
- The amplitude of a specific cosine basis function.
- 3.27 decoder input buffer:
- The
first-in first-out (FIFO) buffer specified in the video buffering
verifier.
- 3.28 decoder input rate:
- The
data rate specified in the video buffering verifier and encoded in the coded
video bitstream.
- 3.29 decoder:
- An embodiment of
a decoding process.
- 3.30 decoding (process):
- The
process defined in this specification that reads an input coded bitstream and
produces decoded pictures or audio samples.
- 3.31 dequantisation:
- The
process of rescaling the quantised DCT coefficients after their representation
in the bitstream has been decoded and before they are presented to the inverse
DCT.
- 3.32 digital storage media; DSM:
- A digital storage or transmission device or system.
- 3.33 discrete cosine transform;
DCT:
- Either the forward discrete cosine transform or the inverse
discrete cosine transform. The DCT is an invertible, discrete orthogonal
transformation. The inverse DCT is defined in Annex A of this
specification.
- 3.34 display order:
- The
order in which the decoded pictures are displayed. Normally this is the same
order in which they were presented at the input of the encoder.
- 3.35 editing:
- The process by
which one or more compressed bitstreams are manipulated to produce a new
compressed bitstream. Conforming edited bitstreams must meet the requirements
defined in this specification.
- 3.36 encoder:
- An embodiment of
an encoding process.
- 3.37 encoding (process):
- A process, not specified in this specification, that reads a stream of
input pictures or audio samples and produces a valid coded bitstream as defined
in this specification.
- 3.38 fast forward playback:
- The
process of displaying a sequence, or parts of a sequence, of pictures in
display-order faster than real-time.
- 3.39 fast reverse playback:
- The process of displaying the picture sequence in the reverse of display
order, faster than real-time.
- 3.40 field:
- For an interlaced
video signal, a "field" is the assembly of alternate lines of a frame.
Therefore, an interlaced frame is composed of two fields, a top field and a
bottom field.
- 3.41 field period:
- The reciprocal of twice the frame rate.
- 3.42 flag:
- A variable which can
take one of only the two values defined in this specification.
- 3.43 forbidden:
- The term
"forbidden" when used in the clauses defining the coded bitstream indicates
that the value shall never be used. This is usually to avoid emulation of
start codes.
- 3.44 forced updating:
- The
process by which macroblocks are intra-coded from time-to-time to ensure that
mismatch errors between the inverse DCT processes in encoders and decoders
cannot build up excessively.
- 3.45 forward compatibility:
- A
new coding standard is forward compatible with an existing coding standard if
new decoders (designed to operate with the new coding standard) continue to be
able to decode bitstreams of the existing coding standard.
- 3.46 forward motion vector:
- A
motion vector that is used for motion compensation from a reference picture at
an earlier time in display order.
- 3.47 frame:
- A frame contains
lines of spatial information of a video signal. For progressive video, these
lines contain samples starting from one time instant and continuing through
successive lines to the bottom of the frame. For interlaced video a frame
consists of two fields, a top field and a bottom field. One of these fields
will commence one field period later than the other.
- 3.48 frame period:
- The
reciprocal of the frame rate.
- 3.49 frame rate:
- The rate at
which frames are output from the decoding process.
- 3.50 future reference picture:
- A future reference picture is a reference picture that occurs at a later
time than the current picture in display order.
- 3.51 header:
- A block of data in
the coded bitstream containing the coded representation of a number of data
elements pertaining to the coded data that follow the header in the
bitstream.
- 3.52 hybrid scalability:
-
Hybrid scalability is the combination of two (or more) types of
scalability.
- 3.53 interlace:
- The property of
conventional television frames where alternating lines of the frame represent
different instances in time.
- 3.54 intra coding:
- Coding of a
macroblock or picture that uses information only from that macroblock or
picture.
- 3.55 intra-coded picture;
I-picture:
- A picture coded using information only from itself.
- 3.56 level:
- A defined set of
constraints on the values which may be taken by the parameters of this
specification within a particular profile. A profile may contain one or more
levels.
- 3.57 luminance (component):
- A
matrix, block or single pel representing a monochrome representation of the
signal and related to the primary colours in the manner defined in the
bitstream. The symbol used for luminance is Y.
- 3.58 macroblock:
- The four 8 by
8 blocks of luminance data and the two (for 4:2:0 chroma format), four (for
4:2:2 chroma format) or eight (for 4:4:4 chroma format) corresponding 8 by 8
blocks of chrominance data coming from a 16 by 16 section of the luminance
component of the picture. Macroblock is sometimes used to refer to the pel
data and sometimes to the coded representation of the pel values and other data
elements defined in the macroblock header of the syntax defined in this part of
this specification. The usage is clear from the context.
- 3.59 motion compensation:
- The
use of motion vectors to improve the efficiency of the prediction of pel
values. The prediction uses motion vectors to provide offsets into the past
and/or future reference pictures containing previously decoded pel values that
are used to form the prediction error signal.
- 3.60 motion estimation:
- The
process of estimating motion vectors during the encoding process.
- 3.61 motion vector:
- A
two-dimensional vector used for motion compensation that provides an offset
from the coordinate position in the current picture to the coordinates in a
reference picture.
- 3.62 non-intra coding:
- Coding
of a macroblock or picture that uses information both from itself and from
macroblocks and pictures occurring at other times.
- 3.63 parameter:
- A variable
within the syntax of this specification which may take one of a large range of
values. A variable which can take one of only two values is a flag and not a
parameter.
- 3.64 past reference picture:
- A
past reference picture is a reference picture that occurs at an earlier time
than the current picture in display order.
- 3.65 pel aspect ratio:
- The
ratio of the nominal vertical height of pel on the display to its nominal
horizontal width.
- 3.66 pel:
- Picture
element.
- 3.67 picture:
- Source,
coded or reconstructed image data. A source or reconstructed picture consists
of three rectangular matrices of 8-bit numbers representing the luminance and
two chrominance signals. For progressive video, a picture is identical to a
frame, while for interlaced video, a picture can refer to a frame, or the top
field or the bottom field of the frame depending on the context.
- 3.68 prediction:
- The use of a
predictor to provide an estimate of the pel value or data element currently
being decoded.
- 3.69 predictive-coded picture;
P-picture:
- A picture that is coded using motion compensated prediction
from past reference pictures.
- 3.70 prediction error:
- The difference between the actual value of a pel or data element and its
predictor.
- 3.71 predictor:
- A linear
combination of previously decoded pel values or data elements.
- 3.72 profile:
- A defined sub-set
of the syntax of this specification.
- 3.73 Note
- In this specification the
word "profile" is used as defined above. It should not be confused with other
definitions of "profile" and in particular it does not have the meaning that is
defined by JTC1/SGFS.
- 3.74 quantisation matrix:
- A set
of sixty-four 8-bit values used by the dequantiser.
- 3.75 quantised DCT coefficients:
- DCT coefficients before dequantisation. A variable length coded
representation of quantised DCT coefficients is stored as part of the
compressed video bitstream.
- 3.76 quantiser scale:
- A scale
factor coded in the bitstream and used by the decoding process to scale the
dequantisation.
- 3.77 random access:
- The process
of beginning to read and decode the coded bitstream at an arbitrary
point.
- 3.78 reference picture:
- Reference pictures are the nearest adjacent I- or P-pictures to the current
picture in display order.
- 3.79 reserved:
- The term
"reserved" when used in the clauses defining the coded bitstream indicates that
the value may be used in the future for ISO/IEC defined extensions.
- 3.80 scalability:
- Scalability
is the ability of a decoder to decode an ordered set of bitstreams to produce
a reconstructed sequence. Moreover, useful video is output when subsets are
decoded. The minimum subset that can thus be decoded is the first bitstream
in the set which is called the base layer. Each of the other bitstreams in the
set is called an enhancement layer. When addressing a specific enhancement
layer, "lower layer" refers to the bitstream which precedes the enhancement
layer.
- 3.81 side information:
- Information in the bitstream necessary for controlling the decoder.
- 3.82 skipped macroblock:
- A
macroblock for which no data is encoded.
- 3.83 slice:
- A series of
macroblocks.
- 3.84 SNR scalability:
- A type
of scalability where the enhancement layer(s) contain only coded refinement
data for the DCT coefficients of the base layer.
- 3.85 spatial scalability:
- A
type of scalability where an enhancement layer also uses predictions from pel
data derived from a lower layer without using motion vectors. The layers can
have different frame sizes, frame rates or chroma formats.
- 3.86 start codes [system and
video]:
- 32-bit codes embedded in the coded bitstream that are unique.
They are used for several purposes including identifying some of the structures
in the coding syntax.
- 3.87 stuffing (bits); stuffing (bytes):
- Code-words that may be inserted into the compressed bitstream that
are discarded in the decoding process. Their purpose is to increase the
bitrate of the stream.
- 3.88 temporal scalability:
- A
type of scalability where an enhancement layer also uses predictions from pel
data derived from a lower layer using motion vectors. The layers have
identical frame size, and chroma formats, but can have different frame
rates.
- 3.89 top field:
- One of two
fields that comprise a frame of interlaced video. Each line of a top field is
spatially located immediately above the corresponding line of the bottom
field.
- 3.90 variable bitrate:
- Operation where the bitrate varies with time during the decoding of a
compressed bitstream.
- 3.91 variable length coding; VLC:
- A reversible procedure for coding that assigns shorter code-words to
frequent events and longer code-words to less frequent events.
- 3.92 video buffering verifier; VBV:
- A hypothetical decoder that is conceptually connected to the output of the
encoder. Its purpose is to provide a constraint on the variability of the data
rate that an encoder or editing process may produce.
- 3.93 video sequence:
- A series
of one or more pictures.
- 3.94 zig-zag scanning order:
- A
specific sequential ordering of the DCT coefficients from (approximately) the
lowest spatial frequency to the highest.
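As a non-normative illustration of definition 3.94, the classic zig-zag order can be generated by visiting the 8 by 8 coefficient positions one anti-diagonal at a time, alternating the traversal direction on each diagonal. The normative scan tables (including the alternate scan) are defined in the body of this specification; the Python sketch below merely reproduces the classic pattern.

```python
def zigzag_order(n=8):
    """Return (row, column) pairs in classic zig-zag scan order."""
    coords = [(r, c) for r in range(n) for c in range(n)]
    # Primary key: the anti-diagonal r + c (low spatial frequency first).
    # Secondary key: alternate the traversal direction on each diagonal.
    return sorted(coords,
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

order = zigzag_order()
# The scan starts at the DC position and moves towards high frequencies:
# (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
```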
(This annex does not form an integral part of this Recommendation |
International Standard)
1 Arun N. Netravali & Barry G. Haskell "Digital Pictures, representation
and compression" Plenum Press, 1988
2 Didier Le Gall "MPEG: A Video Compression Standard for Multimedia
Applications" Communications of the ACM, April 1991
3 C Loeffler, A Ligtenberg, G S Moschytz "Practical fast 1-D DCT algorithms
with 11 multiplications" Proceedings IEEE ICASSP-89, Vol. 2, pp 988-991, Feb.
1989
4 See the Normative Reference for ITU-R Rec 601 (formerly CCIR Rec 601)
5 See the Normative Reference for IEC Standard Publication 461
6 See the Normative Reference for ITU-T Rec. H.261
7 See the Normative reference for IEEE Standard Specification P1180-1990
8 ISO/IEC 10918-1 | ITU-T T.81 (JPEG)
9 E Viscito and C Gonzales "A Video Compression Algorithm with Adaptive Bit
Allocation and Quantization", Proc SPIE Visual Communications and Image Proc
'91 Boston MA November 10-15 Vol. 1605 205, 1991
10 A Puri and R Aravind "Motion Compensated Video Coding with Adaptive
Perceptual Quantization", IEEE Trans. on Circuits and Systems for Video
Technology, Vol. 1 pp 351 Dec. 1991.
11 C. Gonzales and E. Viscito, "Flexibly scalable digital video coding", Image
Communications, Vol. 5, Nos. 1-2, February 1993
12 A.W.Johnson, T.Sikora and T.K. Tan, "Filters for Drift Reduction in
Frequency Scalable Video Coding Schemes" <Transmitted for publication to
Electronic Letters.>
13 R.Mokry and D.Anastassiou, "Minimal Error Drift in Frequency Scalability for
Motion-Compensated DCT Coding". IEEE Transactions on Circuits and Systems for
Video Technology, <accepted for publication>
14 K.N. Ngan, J. Arnold, T. Sikora, T.K. Tan and A.W. Johnson. "Frequency
Scalability Experiments for MPEG-2 Standard". Asia-Pacific Conference on
Communications, Korea, August 1993.
15 T. Sikora, T.K. Tan and K.N. Ngan, "A Performance Comparison of Frequency
Domain Pyramid Scalable Coding Schemes Within the MPEG Framework". Proc. PCS,
Picture Coding Symposium, Lausanne, pp. 16.1 - 16.2, Switzerland March 1993.
16 Masahiro Iwahashi, "Motion Compensation Technique for 2:1 Scaled-down Moving
Pictures". 8-14, Picture Coding Symposium '93.
17 Sikora, T. and Pang, K., "Experiments with Optimal Block-Overlapping Filters
for Cell Loss Concealment in Packet Video", Proc. IEEE Visual Signal Processing
and Communications Workshop, Melbourne, 21-22 Sept. 1993, pp. 247-250.
18 A. Puri "Video Coding Using the MPEG-2 Compression Standard", <to
appear> Proc SPIE Visual Communications and Image Proc '93 Boston MA
November,1993.
19 A. Puri and A. Wong "Spatial Domain Resolution Scalable Video Coding",
<to appear> Proc SPIE Visual Communications and Image Proc '93 Boston MA
November,1993.
- Annex D: Features Supported by the algorithm
- Annex F: Patent statements
NOTE: This annex gives an overview of the features supported
by the MPEG-2 video algorithm.
(This annex does not form an integral part of this Recommendation |
International Standard)
The following table summarises the formal patent statements received and
indicates the parts of the MPEG-2 standard to which the statement applies.
The list includes all the companies that previously submitted an informal
statement; where no "X" is present, no formal statement was received from that
company.
Company V A S
--------------------------------------------+-------+-------+------+
AT&T X X X
BBC Research Department
Bellcore X
Belgian Science Policy Office X X X
BOSCH X X X
CCETT
CSELT X
David Sarnoff Research Center X X X
Deutsche Thomson-Brandt GmbH X X X
France Telecom CNET
Fraunhofer Gesellschaft X X
GC Technology Corporation X X X
General Instruments
Goldstar
Hitachi, Ltd.
International Business Machines Corporation X X X
IRT X
KDD X
Massachusetts Institute of Technology X X X
Matsushita Electric Industrial Co., Ltd. X X X
Mitsubishi Electric Corporation
National Transcommunications Limited
NEC Corporation X
Nippon Hoso Kyokai X
Nippon Telegraph and Telephone X
Nokia Research Center X
Norwegian Telecom Research X
Philips Consumer Electronics X X X
OKI
Qualcomm Incorporated X
Royal PTT Nederland N.V., PTT Research (NL) X X X
Samsung Electronics
Scientific Atlanta X X X
Siemens AG X
Sharp Corporation
Sony Corporation
Texas Instruments
Thomson Consumer Electronics
Toshiba Corporation X
TV/Com X X X
Victor Company of Japan Limited