Date: 2025-08-08
ISO/IEC DIS 21794-2:2025(en)
ISO/IEC JTC1/SC 29/WG 01
Secretariat: JISC
Information technology — Plenoptic image coding system (JPEG Pleno) — Part 2: Light field coding
Technologies de l'information — Système de codage d'images plénoptiques (JPEG Pleno) — Partie 2: Titre manque
© ISO/IEC 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
This second edition of ISO/IEC 21794-2 ("Plenoptic image coding system (JPEG Pleno) Part 2: Light field coding") integrates AMD1 of ISO/IEC 21794-2 ("Profiles and levels for JPEG Pleno Light Field Coding") and includes the specification of an additional coding mode, entitled the Slanted 4D transform mode, together with its associated profile.
ISO/IEC 21794-2 ("Plenoptic image coding system (JPEG Pleno) Part 2: Light field coding") specifies two coding modes: the 4D transform mode and the 4D prediction mode. The 4D prediction mode is efficient for light fields of all baselines but depends on the availability of accurate depth information. The 4D transform mode, although not relying on any sort of depth information, is only efficient for coding narrow baseline light fields. The Slanted 4D transform mode, based on 4D transformations, is efficient for light fields with both narrow and wide baselines and does not rely on the availability of depth information.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC list of patent declarations received (see patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
A list of all parts in the ISO/IEC 21794 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A complete listing of these bodies can be found at www.iso.org/members.html and www.iec.ch/national-committees.
This document is part of a series of standards for a system known as JPEG Pleno. This document defines the JPEG Pleno framework. It facilitates the capture, representation, exchange and visualization of plenoptic imaging modalities. A plenoptic image modality can be a light field, point cloud or hologram, which are sampled representations of the plenoptic function in the form of, respectively, a vector function that represents the radiance of a discretized set of light rays, a collection of points with position and attribute information, or a complex wavefront. The plenoptic function describes the radiance in time and in space obtained by positioning a pinhole camera at every viewpoint in 3D spatial coordinates, every viewing angle and every wavelength, resulting in a 7D function.
JPEG Pleno specifies tools for coding these modalities while providing advanced functionality at system level, such as support for data and metadata manipulation, editing, random access and interaction, protection of privacy and ownership rights.
Information technology — Plenoptic image coding system (JPEG Pleno) — Part 2: Light field coding
1.0 Scope
This document specifies a coded codestream format for storage of light field modalities as well as associated metadata descriptors that are light field modality specific. This document also provides information on the encoding tools.
2.0 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ITU-T Rec. T.800 | ISO/IEC 15444‑1, Information technology — JPEG 2000 image coding system — Part 1: Core coding system
ITU-T Rec. T.801 | ISO/IEC 15444‑2, Information technology — JPEG 2000 image coding system — Part 2: Extensions
ISO/IEC 21794‑1:2020, Information technology — Plenoptic image coding system (JPEG Pleno) — Part 1: Framework
ISO/IEC 60559, Information technology — Microprocessor Systems — Floating-Point arithmetic
ITU-T Rec. T.81 | ISO/IEC 10918-1, Information technology — Digital compression and coding of continuous-tone still images — Requirements and guidelines
ITU-T Rec. T.84 | ISO/IEC 10918-3, Information technology — Digital compression and coding of continuous-tone still images: Extensions
3.0 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 21794-1 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
arithmetic coder
entropy coder that converts variable length strings to variable length codes (encoding) and vice versa (decoding)
3.2
bit-plane
two-dimensional array of bits
3.3
4D bit-plane
four-dimensional array of bits
3.4
coefficient
numerical value that is the result of a transformation or linear regression
3.5
compression
reduction in the number of bits used to represent source image data
3.6
depth
distance of a point in 3D space to the camera plane
3.7
disparity view
image that contains, for each pixel of a subaperture view, the apparent pixel shift between two subaperture views along either the horizontal or the vertical axis
3.8
hexadeca-tree
division of a 4D region into 16 (sixteen) 4D subregions
3.9
pixel
collection of sample values in the spatial image domain having all the same sample coordinates
EXAMPLE A pixel may consist of three samples describing its red, green and blue value.
3.10
plenoptic function
amount of radiance in time and in space obtained by positioning a pinhole camera at every viewpoint in 3D spatial coordinates, every viewing angle and every wavelength, resulting in a 7D representation
3.11
reference view
subaperture view that is used as one of the references to generate the intermediate views
3.12
subaperture view
subaperture image
image taken of the 3D scene by a pinhole camera positioned at a particular viewpoint and viewing angle
3.13
texture
pixel attributes
EXAMPLE Colour information, opacity, etc.
3.14
transform
transformation
mathematical mapping from one signal space to another
4.0 Symbols and abbreviated terms
4.1 Symbols
Codestream_Body() | coded image data in the codestream without Codestream_Header() |
Codestream_Header() | codestream header preceding the image data in the codestream |
decoded normalized disparity value at view | |
normalized disparity value at view | |
pointer to contiguous codestream for normalized disparity view | |
scaling parameter to translate quantized normalized disparity maps to positive range | |
DCODEC | disparity view codec type |
f | focal length |
fixed-weight merging parameter for view | |
view hierarchy value for view | |
horizontal camera centre coordinate for view | |
binary value defining the availability of a normalized disparity view | |
Lagrangian encoding cost | |
Lagrangian encoding cost of spatial partitioning | |
Lagrangian encoding cost of view partitioning | |
sparse filter regressor mask of texture component | |
LightField() | JPEG Pleno light field codestream |
quantized least-squares merging weight of texture component | |
MIDV | absolute value of the minimum value over all quantized normalized disparity views |
view merging mode for intermediate view | |
sparse filter order for view | |
number of least-squares merging coefficients for intermediate view | |
regressor template size parameter for sparse filter for view | |
NC | number of components in an image |
number of intermediate views | |
number of reference normalized disparity views | |
number of normalized disparity reference views for intermediate view | |
number of texture reference views for intermediate view | |
number of reference views | |
number of prediction residual views | |
total available number of regressors for sparse filter | |
Plev | level with which a particular codestream complies |
Ppih | profile with which a particular codestream complies |
2D image of dimensions | |
Q | normalized disparity quantization parameter |
R | rate or bitrate, expressed in bits per sample |
RCODEC | prediction residual view codec type |
RDATA | array of bytes containing for a single prediction residual view the RCODEC codestream after header information has been stripped |
RENCODING | array of bytes containing for a single prediction residual view the full RCODEC codestream |
RGB | colour data for the red, green and blue colour component of a pixel |
RHEADER | array of bytes containing for a single prediction residual view the header information from the RCODEC codestream |
pointer to contiguous codestream for prediction residual view | |
s | coordinate of the addressed subaperture image along the s-axis |
S | size of the light field image along the s-axis (COLUMNS) |
subscript of the column index of the reference view, | |
subscript of the column index of the reference normalized disparity view, | |
binary variable, determines if sparse filter is used (true) or not (false) | |
quantized sparse filter coefficients of texture component | |
de-quantized sparse filter coefficients of texture component | |
t | coordinate of the addressed subaperture image along the t-axis |
T | size of the light field image along the t-axis (ROWS) |
subscript of the row index of the reference view, | |
subscript of the row index of the reference normalized disparity view, | |
view coordinate subscripts for normalized disparity view | |
view coordinate subscripts for reference view | |
view coordinate subscripts for intermediate view | |
4D block dimensions at the 4D block partitioning stage | |
4D block dimensions at the bit-plane hexadeca-tree decomposition stage | |
TCODEC | reference view codec type |
TDATA | array of bytes containing for a single reference view the TCODEC codestream after header information has been stripped |
TENCODING | array of bytes containing for a single reference view the full TCODEC codestream |
THEADER | array of bytes containing for a single reference view the header information from the TCODEC codestream |
pointer to contiguous codestream for reference view | |
u | sample coordinate along the u-axis within the addressed subaperture image |
U | size of the subaperture image along the u-axis (WIDTH) |
v | sample coordinate along the v-axis within the addressed subaperture image |
V | size of the subaperture image along the v-axis (HEIGHT) |
vertical camera centre coordinate for view | |
view prediction parameters for intermediate view | |
texture value at view | |
decoded texture value at view | |
result of warping the texture view | |
horizontal distance between a pair of camera centres | |
vertical distance between a pair of camera centres | |
YCbCr | colour data for the luminance, the blue chrominance and the red chrominance component of a pixel |
depth value at view | |
distance based merging weight for reference view | |
distance based factor, used for defining the merging weight, at intermediate view | |
binary matrix, defining the locations of the non-zero merging weights in merging weight matrix | |
de-quantized least-squares merging weight of texture component | |
sparse filter coefficients at intermediate view | |
merging weight matrix for intermediate view | |
locations of the non-zero elements of | |
regressor template at pixel location | |
set of reference normalized disparity views for intermediate view | |
set of occluded pixels, which remain to be inpainted, during normalized disparity view synthesis at intermediate view | |
set of occluded pixels, which remain to be inpainted, during texture view synthesis at intermediate view | |
set of reference views for intermediate view |
4.2 Abbreviated terms
2D | two dimensional |
3D | three dimensional |
4D | four dimensional |
DCT | discrete cosine transform |
EPI | Epipolar Plane Image |
floating point | floating point notation as specified in ISO/IEC 60559 |
HTTP | hypertext transfer protocol |
IDCT | inverse DCT |
IPR | intellectual property rights |
IV | intermediate view; subaperture view that is generated from surrounding reference view(s) |
JPEG | Joint Photographic Experts Group |
JPL | JPEG Pleno file format |
LSB | least significant bit |
MSB | most significant bit |
R-D | rate-distortion |
RV | reference view |
URL | uniform resource locator |
XML | eXtensible Markup Language |
5.0 Conventions
5.1 Naming conventions for numerical values
Integer numbers are expressed as bit patterns, hexadecimal values or decimal numbers. Bit patterns and hexadecimal values have both a numerical value and an associated particular length in bits.
Hexadecimal notation, indicated by prefixing the hexadecimal number by "0x", may be used instead of binary notation to denote a bit pattern having a length that is an integer multiple of 4. For example, 0x41 represents an eight-bit pattern having only its second most significant bit and its least significant bit equal to 1. Numerical values that are specified under a "Code" heading in tables that are referred to as "code tables" are bit pattern values (specified as a string of digits equal to 0 or 1 in which the left-most bit is considered the most-significant bit). Other numerical values not prefixed by "0x" are decimal values. When used in expressions, a hexadecimal value is interpreted as having a value equal to the value of the corresponding bit pattern evaluated as a binary representation of an unsigned integer (i.e. as the value of the number formed by prefixing the bit pattern with a sign bit equal to 0 and interpreting the result as a two's complement representation of an integer value). For example, the hexadecimal value 0xF is equivalent to the 4-bit pattern '1111' and is interpreted in expressions as being equal to the decimal number 15.
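The following informative Python sketch (not part of the normative text) mirrors these interpretation rules:

```python
# Informative only: hexadecimal values and bit patterns evaluate to
# unsigned integers, as specified above.
assert 0xF == int("1111", 2) == 15       # 4-bit pattern '1111'
assert 0x41 == int("01000001", 2) == 65  # second MSB and LSB set to 1
```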
5.2 Operators
NOTE Many of the operators used in this document are similar to those used in the C programming language.
5.2.1 Arithmetic operators
+ | addition |
− | subtraction (as a binary operator) or negation (as a unary prefix operator) |
× | multiplication |
/ | division without truncation or rounding |
<< | left shift; x << s is defined as x × 2^s |
>> | right shift; x >> s is defined as ⎿x/2^s⏌ |
++ | increment with 1 |
-- | decrement with 1 |
umod | x umod a is the unique value y in the range 0 to a−1 such that x − y is an integer multiple of a |
& | bitwise AND operator; compares each bit of the first operand to the corresponding bit of the second operand. If both bits are 1, the corresponding result bit is set to 1. Otherwise, the corresponding result bit is set to 0. |
^ | bitwise XOR operator; compares each bit of the first operand to the corresponding bit of the second operand. If both bits are equal, the corresponding result bit is set to 0. Otherwise, the corresponding result bit is set to 1. |
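The umod and shift definitions above differ from the native operators of some programming languages (for example, the C remainder operator can be negative). An informative Python sketch:

```python
def umod(x: int, a: int) -> int:
    """x umod a: the unique y in [0, a-1] such that x - y is a multiple of a."""
    return x % a  # Python's % already yields a non-negative result for a > 0

assert umod(7, 4) == 3
assert umod(-1, 4) == 3   # a C-style remainder would give -1 here

# Shift operators as defined above: x << s is x * 2**s, x >> s is floor(x / 2**s)
assert (5 << 2) == 5 * 2**2 == 20
assert (-7 >> 1) == -4    # floor(-7/2) = -4, matching the definition of >>
```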
5.2.2 Logical operators
|| | logical OR |
&& | logical AND |
! | logical NOT |
5.2.3 Relational operators
> | greater than |
>= | greater than or equal to |
< | less than |
<= | less than or equal to |
== | equal to |
!= | not equal to |
5.2.4 Precedence order of operators
Operators are listed in descending order of precedence. If several operators appear in the same line, they have equal precedence. When several operators of equal precedence appear at the same level in an expression, evaluation proceeds according to the associativity of the operator either from right to left or from left to right.
Operators | Type of operation | Associativity |
() | expression | left to right |
[] | indexing of arrays | left to right |
++, -- | increment, decrement | left to right |
!, − | logical not, unary negation | right to left |
×, / | multiplication, division | left to right |
umod | modulo (remainder) | left to right |
+, − | addition and subtraction | left to right |
& | bitwise AND | left to right |
^ | bitwise XOR | left to right |
&& | logical AND | left to right |
|| | logical OR | left to right |
<<, >> | left shift and right shift | left to right |
< , >, <=, >= | relational | left to right |
5.3 Mathematical functions
|x| | absolute value; equals −x for x < 0, otherwise x |
sign(x) | sign of x: zero if x is zero, +1 if x is positive, −1 if x is negative |
clamp(x,min,max) | clamps x to the range [min,max]: returns min if x < min, max if x > max, otherwise x |
⎾x⏋ | ceiling of x; returns the smallest integer that is greater than or equal to x |
⎿x⏌ | floor of x; returns the largest integer that is less than or equal to x |
round(x) | rounding of x to the nearest integer, equivalent to ⎿x + 0.5⏌ |
6.0 General
6.1 Functional overview of the decoding process
This document specifies the JPEG Pleno Light Field superbox and the JPEG Pleno light field decoding algorithm. The generic JPEG Pleno Light Field superbox syntax is specified in Annex A.
The specified light field decoding algorithm distinguishes three coding modes:
— 4D transform mode: this mode is specified in Annex B and is based on a 4D inverse discrete cosine transform (IDCT), 4D block partitioning, and 4D bit-plane hexadeca-tree decoding.
— 4D prediction mode: this mode is based on the prediction of intermediate views from reference views and normalized disparity maps. The signalling syntax and decoding of the reference views is addressed in Annex C, the normalized disparity views in Annex D, and the prediction parameters and residual views in Annex E. The intermediate views are reconstructed in a decoding process that involves view warping, view merging and prediction error correction.
— Slanted 4D transform mode: this mode is specified in Annex F and is based on changing the EPI slants/slopes in the 4D blocks by applying a geometric transformation before the conventional 4D-DCT stage of the 4D transform mode architecture, notably to make the slopes of the lines composing the EPIs as aligned as possible with one of the separable 4D-DCT dimensions.
The overall architecture of the three coding modes (Figure 1) provides the flexibility to configure the encoding and decoding system depending on the requirements of the addressed use case.
Figure 1 — Generic JPEG Pleno light field decoder architecture
6.2 Encoder requirements
An encoding process converts source light field data to coded light field data.
In order to conform with this document, an encoder shall conform with the codestream format syntax and file format syntax specified in the annexes for the encoding process(es) embodied by the encoder.
6.3 Decoder requirements
A decoding process converts coded light field data to reconstructed light field data. Annexes A through F describe and specify the decoding process.
A decoder is an embodiment of the decoding process. In order to conform to this document, a decoder shall convert all, or specific parts of, any coded light field data that conform to the file format syntax and codestream syntax specified in Annexes A to F to a reconstructed light field.
7.0 Organization of the document
Annex A specifies the description of the JPEG Pleno Light Field superbox.
This document specifies three approaches to representing compressed light field data: the 4D Transform mode is specified in Annex B; the 4D Prediction mode is specified in Annex C, Annex D and Annex E, where Annex C details the signalling of the reference view data, Annex D the signalling of the normalized disparity views, and Annex E the signalling of the prediction parameters used to generate the intermediate views and of the residual view data that compensates for prediction errors; and the Slanted 4D Transform mode is specified in Annex F. Annex G defines the profiles and levels of the three coding modes.
Annex A — JPEG Pleno Light Field superbox
This annex specifies the use of the JPEG Pleno Light Field superbox, which is designed to contain compressed light field data and associated metadata. The listed boxes shall comply with their definitions as specified in ISO/IEC 21794-1.
This document may redefine the binary structure of some boxes defined as part of the ISO/IEC 15444-1 or ISO/IEC 15444-2 file formats. For those boxes, the definition found in this document shall be used for all JPL files.
Figure A.1 shows the hierarchical organization of the JPEG Pleno Light Field superbox contained by a JPL file. This illustration does not specify nor imply a specific order to these boxes. In many cases, the file will contain several boxes of a particular box type. The meaning of each of those boxes is dependent on the placement and order of that particular box within the file.
This superbox is composed of the following core elements:
— a JPEG Pleno Light Field Header box containing parameterization information about the light field such as size and colour parameters;
— a JPEG Pleno Light Field Reference View box containing the compressed reference views of the light field;
— a JPEG Pleno Light Field Disparity View box signalling disparity information for all or a subset of subaperture views;
— a JPEG Pleno Light Field Intermediate View box containing prediction parameters and, where present, compressed residual signals for subaperture views not encoded as reference views.
Table A.1 lists all boxes defined as part of this document. Boxes defined as part of the ISO/IEC 15444-1 or ISO/IEC 15444-2 file formats are not listed. A box that is listed in Table A.1 as “Required” shall exist within all conforming JPL files. For the placement of and restrictions on each box, see the relevant section defining that box.
Note that the IPR, XML and UUID boxes defined in Annex A can also be signalled at the level of the JPEG Pleno Light Field box to carry light-field-specific metadata.
Figure A.1 — Hierarchical organization of a JPEG Pleno Light Field superbox
The following boxes should be interpreted properly by all conforming readers. Each of these boxes conforms to the standard box structure as defined in ISO/IEC 21794-1:2020, Annex A. The following clauses define the value of the DBox field. It is assumed that the LBox, TBox and XLBox fields exist for each box in the file as defined in ISO/IEC 21794-1:2020, Annex A.
Table A.1 — Defined boxes
Box name | Type | Superbox | Required? | Comments |
JPEG Pleno Light Field box | ‘jplf’ (0x6A70 6C66) | Yes | Yes | This box contains a series of boxes that contain the encoded light field, its parameterization and associated metadata. (Defined in ISO/IEC 21794-1:2020, Annex A) |
JPEG Pleno Profile and Level box | ‘jppl’ (0x6A70 706C) | No | Yes | This box indicates to which profile and associated level the file format and codestream complies. (Defined in Annex A.3.2) |
JPEG Pleno Light Field Header box | 'jplh' (0x6A70 6C68) | Yes | Yes | This box contains generic information about the file, such as the number of components, bits per component and colour space. (Defined in Annex A.3.3) |
Light Field Header box | ‘lhdr’ (0x6C68 6472) | No | Yes | This box contains fixed length generic information about the light field, such as light field dimensions, subaperture image size, number of components, codec and bits per component. (Defined in Annex A.3.3.2) |
Camera Parameter box | ‘lfcp’ (0x6C66 6370) | No | No | This box signals intrinsic and extrinsic camera parameters for calibration of the light field data. (Defined in Annex A.3.3.3) |
Contiguous Codestream box | 'jp2c' (0x6A70 3263) | No | No | This box contains a JPEG Pleno codestream. (Defined in Annex A.3.4) |
JPEG Pleno Light Field Reference View superbox | ‘lfrv’ (0x6C66 7276) | Yes | No | This box contains a series of boxes that contain the encoded reference views and their associated parameters. (Defined in Annex C.2) |
JPEG Pleno Light Field Reference View Description box | ‘lfrd’ (0x6C66 7264) | No | No | This box signals which views are encoded as reference views and their encoding configuration. (Defined in Annex C.3.1) |
Common Codestream Elements box | ‘lfcc’ (0x6C66 6363) | No | No | This box contains the redundant part of the signalled codestreams. (Defined in Annex C.3.2) |
JPEG Pleno Light Field Normalized Disparity View superbox | ‘lfdv’ (0x6C66 6476) | Yes | No | This box contains a series of boxes that contain the encoded normalized disparity views and their associated parameters. (Defined in Annex D.2) |
JPEG Pleno Light Field Normalized Disparity View Description box | ‘lfdd’ (0x6C66 6464) | No | No | This box signals for which views normalized disparity information is signalled and their encoding configuration. (Defined in Annex D.3.1) |
JPEG Pleno Light Field Intermediate View superbox | ‘lfiv’ (0x6C66 6976) | Yes | No | This box contains a series of boxes that contain both the prediction parameters for the intermediate views and the encoded residual views. (Defined in Annex E.2) |
JPEG Pleno Light Field Prediction Parameter box | ‘lfpp’ (0x6C66 7070) | No | No | This box signals prediction parameter information for the intermediate views. (Defined in Annex E.3.1) |
JPEG Pleno Light Field Residual View Description box | ‘lfre’ (0x6C66 7265) | No | No | This box signals the encoding configuration for the residual views containing the prediction errors. (Defined in Annex E.3.2) |
Profiles and levels are defined in Annex G. The type of the JPEG Pleno Profile and Level box shall be ‘jppl’ (0x6A70 706C). The contents of the box shall have the organization as in Figure A.2, and its format shall be as in Table A.2.
Key
Ppih profile of the codestream (as defined in Annex G)
Plev level of the codestream (as defined in Annex G)
Figure A.2 — Organization of the contents of a JPEG Pleno Profile and Level box
Table A.2 — Format of the contents of the JPEG Pleno Profile and Level box
Field name | Size (bits) | Value |
Ppih | 16 | Variable, defined in Annex G |
Plev | 16 | Variable, defined in Annex G |
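The following informative Python sketch (function names are illustrative, and the extended-length XLBox case is omitted for brevity) shows how a reader can extract Ppih and Plev from such a box:

```python
import struct

def parse_jppl(buf: bytes) -> tuple[int, int]:
    """Parse a JPEG Pleno Profile and Level box: LBox, TBox ('jppl'),
    then Ppih and Plev as 16-bit big-endian unsigned integers."""
    lbox, tbox = struct.unpack_from(">I4s", buf, 0)
    if tbox != b"jppl":
        raise ValueError("not a JPEG Pleno Profile and Level box")
    ppih, plev = struct.unpack_from(">HH", buf, 8)
    return ppih, plev

# Example: a 12-byte box carrying Ppih = 1, Plev = 2
box = struct.pack(">I4sHH", 12, b"jppl", 1, 2)
assert parse_jppl(box) == (1, 2)
```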
The JPEG Pleno Light Field Header box contains generic information about the file, such as the number of components, bits per component and colour space. This box is a superbox. Within a JPL file, there shall be one and only one JPEG Pleno Light Field Header box. The JPEG Pleno Light Field Header box shall be located anywhere after the File Type box and before the Contiguous Codestream box. It shall also be at the same level as the JPEG Pleno Signature and File Type boxes. It shall not be inside any other superbox within the file.
The type of the JPEG Pleno Light Field Header box shall be 'jplh' (0x6A70 6C68).
This box contains several boxes. Other boxes may be defined in other documents and may be ignored by conforming readers. The boxes contained within the JPEG Pleno Light Field Header box that are defined within this document are shown in Figure A.3:
— The Light Field Header box specifies information about the reference grid geometry, bit depth and the number of components. This box shall be the first box in the JPEG Pleno Light Field Header box and is specified in A.3.3.2.
— The Bits Per Component box specifies the bit depth of the components in the file in cases where the bit depth is not constant across all components. Its structure shall be as specified in ISO/IEC 15444-1.
— The Colour Specification boxes specify the colour space of the decompressed image. Their structures shall be as specified in ISO/IEC 15444-2. There shall be at least one Colour Specification box within the JPEG Pleno Light Field Header box. The use of multiple Colour Specification boxes provides the ability for a decoder to be given multiple optimization or compatibility options for colour processing. These boxes may be positioned anywhere in the JPEG Pleno Light Field Header box provided that they come after the Light Field Header box. All Colour Specification boxes shall be contiguous within the JPEG Pleno Light Field Header box.
— The Channel Definition box defines the channels in the image. Its structure shall be as specified in ISO/IEC 15444-1. This box may be positioned anywhere in the JPEG Pleno Light Field Header box, provided that it comes after the Light Field Header box.
Key
lhdr Light Field Header box
bppc Bits Per Component box
colri Colour Specification boxes
cdef Channel Definition box
Figure A.3 — Organization of the contents of a JPEG Pleno Light Field Header box
This box contains fixed length generic information about the light field, such as light field dimensions, subaperture image size, number of components, codec and bits per component. The contents of the JPEG Pleno Light Field Header box shall start with a Light Field Header box. Instances of this box in other places in the file shall be ignored. The length of the Light Field Header box shall be 30 bytes, including the box length and type fields. Much of the information within the Light Field Header box is redundant with information stored in the codestream itself.
All references to "the codestream" in the descriptions of fields in this Light Field Header box apply to the codestream found in the first Contiguous Codestream box in the file. Files that contain contradictory information between the Light Field Header box and the first codestream are not conforming files. However, readers may choose to attempt to read these files by using the values found within the codestream.
The type of the Light Field Header box shall be 'lhdr' (0x6C68 6472) and the contents of the box shall have the format as in Figure A.4 and Table A.3:
— ROWS (T): The value of this parameter indicates the number of rows of the subaperture view array. This field is stored as a 4-byte big-endian unsigned integer.
— COLUMNS (S): The value of this parameter indicates the number of columns of the subaperture view array. This field is stored as a 4-byte big-endian unsigned integer.
— HEIGHT (V): The value of this parameter indicates the height of the sample grid. This field is stored as a 4-byte big-endian unsigned integer.
— WIDTH (U): The value of this parameter indicates the width of the sample grid. This field is stored as a 4-byte big-endian unsigned integer.
— NC: This parameter specifies the number of components in the codestream and is stored as a 2-byte big-endian unsigned integer. The value of this field shall be equal to the value of the NC field in the LFC marker in the codestream (as defined in B.3.2.6.3). If no Channel Definition box is available, the order of the components for colour images is R-G-B-Aux or Y-U-V-Aux.
— BPC: This parameter specifies the bit depth of the components in the codestream, minus 1, and is stored as a 1-byte field (Table A.4).
The low 7 bits of the value indicate the bit depth of the components. The MSB indicates whether the components are signed or unsigned. If the MSB is 1, then the components contain signed values. If the MSB is 0, then the components contain unsigned values. If the components vary in bit depth or sign, or both, then the value of this field shall be 255 and the Light Field Header box shall also contain a Bits Per Component box defining the bit depth of each component (as defined in A.3.3.2.2).
— C: This parameter specifies the compression algorithm used to compress the image data. It is encoded as a 1-byte unsigned integer. If the value is 0, the 4D transform mode coding is activated. If the value is 1, the 4D prediction mode is activated. If the value is 2, the Slanted 4D transform mode is activated. All other values are reserved for ISO/IEC use.
— UnkC: This field specifies if the actual colour space of the image data in the codestream is known. This field is encoded as a 1-byte unsigned integer. Legal values for this field are 0, if the colour space of the image is known and correctly specified in the Colour Space Specification boxes within the file, or 1 if the colour space of the light field is not known. A value of 1 will be used in cases such as the transcoding of legacy images where the actual colour space of the image data is not known. In these cases, while the colour space interpretation methods specified in the file may not accurately reproduce the image with respect to an original, the image should be treated as if the methods do accurately reproduce the image. Values other than 0 and 1 are reserved for ISO/IEC use.
— IPR: This parameter indicates whether this JPL file contains intellectual property rights information. If the value of this field is 0, this file does not contain rights information, and thus the file does not contain an IPR box. If the value is 1, then the file does contain rights information and thus does contain an IPR box as defined in ISO/IEC 15444-1. Other values are reserved for ISO/IEC use.
Key
ROWS (T) | number of rows of the subaperture view array |
COLUMNS (S) | number of columns of the subaperture view array |
HEIGHT (V) | subaperture view height |
WIDTH (U) | subaperture view width |
NC | number of components |
BPC | bits per component |
C | compression type |
UnkC | colour space unknown |
IPR | intellectual property |
Figure A.4 — Organization of the contents of a Light Field Header box
Table A.3 — Format of the contents of the Light Field Header box
Field name | Size (bits) | Value |
ROWS | 32 | 1 to (2^32 − 1)
COLUMNS | 32 | 1 to (2^32 − 1)
HEIGHT | 32 | 1 to (2^32 − 1)
WIDTH | 32 | 1 to (2^32 − 1)
NC | 16 | 1 to 16 384
BPC | 8 | See Table A.4
C | 8 | 0 to 2
UnkC | 8 | 0 to 1 |
IPR | 8 | 0 to 1 |
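The fixed layout of Table A.3 lends itself to a straightforward reader. The following informative Python sketch (names are illustrative) parses the 22-byte payload:

```python
import struct
from typing import NamedTuple

class LightFieldHeader(NamedTuple):
    rows: int        # ROWS (T)
    columns: int     # COLUMNS (S)
    height: int      # HEIGHT (V)
    width: int       # WIDTH (U)
    nc: int          # NC, number of components
    bpc: int         # BPC, bit depth - 1, MSB = signed (Table A.4)
    c: int           # C, compression type (0, 1 or 2)
    unkc: int        # UnkC, colour space unknown flag
    ipr: int         # IPR, intellectual property flag

def parse_lhdr_payload(payload: bytes) -> LightFieldHeader:
    """Parse the 22-byte payload of a Light Field Header box; with the
    8-byte box header (LBox, TBox) this gives the 30-byte box length."""
    return LightFieldHeader(*struct.unpack(">4IH4B", payload))

# Example: 5x5 views of 512x512 samples, 3 components, 10-bit unsigned
# (BPC = 9), 4D transform mode (C = 0), known colour space, no IPR box
payload = struct.pack(">4IH4B", 5, 5, 512, 512, 3, 9, 0, 0, 0)
assert parse_lhdr_payload(payload).width == 512
```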
Table A.4 — BPC values
Values (bits, MSB to LSB) | Component sample precision |
x000 0000 to x010 0101 | Component bit depth = value + 1. From 1 bit deep to 38 bits deep respectively (counting the sign bit, if appropriate). |
0xxx xxxx | Components are unsigned values. |
1xxx xxxx | Components are signed values. |
1111 1111 | Components vary in bit depth. |
All other values reserved for ISO/IEC use. |
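The following informative Python sketch (names are illustrative) decodes a BPC or BPCi byte according to Table A.4 and Table A.6:

```python
def decode_bpc(bpc: int) -> tuple[int, bool] | None:
    """Decode a BPC/BPCi byte: the low 7 bits give bit depth - 1, the
    MSB indicates signed samples. 255 means the depth varies per
    component (a Bits Per Component box is then required)."""
    if bpc == 255:
        return None                      # bit depth varies per component
    signed = bool(bpc & 0x80)            # MSB: 1 = signed, 0 = unsigned
    depth = (bpc & 0x7F) + 1             # low 7 bits: depth - 1
    return depth, signed

assert decode_bpc(9) == (10, False)      # 10-bit unsigned
assert decode_bpc(0x89) == (10, True)    # 10-bit signed
assert decode_bpc(255) is None
```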
The Bits Per Component box specifies the bit depth of each component. If the bit depth of all components in the codestream is the same (in both sign and precision), then this box shall not be present. Otherwise, this box specifies the bit depth of each individual component. The order of bit depth values in this box is the actual order in which those components are enumerated within the codestream. The exact location of this box within the JPEG Pleno Light Field Header box may vary provided that it follows the Light Field Header box.
There shall be one and only one Bits Per Component box inside a JPEG Pleno Light Field Header box.
The type of the Bits Per Component box shall be 'bpcc' (0x6270 6363). The contents of this box shall be as in Table A.5 and Figure A.5.
Key
BPCi bits per component
Figure A.5 — Organization of the contents of a Bits Per Component box
Table A.5 — Format of the contents of the Bits Per Component box
Field name | Size (bits) | Value |
BPCi | 8 | See Table A.6 |
This parameter specifies the bit depth of component i, minus 1, encoded as a 1-byte value (Table A.6). The ordering of the components within the Bits Per Component box shall be the same as the ordering of the components within the codestream. The number of BPCi fields shall be the same as the value of the NC field from the Light Field Header box. The value of this field shall be equivalent to the respective Ssizi field in the LFC marker in the codestream. The low 7 bits of the value indicate the bit depth of this component. The MSB indicates whether the component is signed or unsigned. If the MSB is 1, then the component contains signed values. If the MSB is 0, then the component contains unsigned values.
Table A.6 — BPCi values
Values (bits, MSB to LSB) | Component sample precision |
x000 0000 to x010 0101 | Component bit depth = value + 1. From 1 bit deep to 38 bits deep respectively (counting the sign bit, if appropriate). |
0xxx xxxx | Components are unsigned values. |
1xxx xxxx | Components are signed values. |
All other values reserved for ISO/IEC use. |
The Camera Parameter box provides information on the positioning of the local reference grid in the global reference grid, its size, and calibration information about the light field. This box is optional.
Camera models can be represented by matrices with particular properties that describe the mapping of the 3D world coordinate system to the image coordinate system. This 3D to 2D transform depends on a number of parameters, known as the intrinsic and extrinsic parameters.
As specified in ISO/IEC 21794-1, JPEG Pleno provides a mechanism to co-register plenoptic data contained by the JPL file on a 3D reference grid system. This reference grid system consists of a global and a local reference grid. The global reference grid allows the positioning of the individual modalities in the represented 3D scene. In addition, each JPEG Pleno Light Field, Point Cloud and Hologram box shall be assigned a local reference grid to address their sampled plenoptic data. This local reference grid is specified by signalling its translation and rotation with respect to the global reference grid. The rotation angles shall be determined utilizing the right-hand rule for curve orientation.
The parameters related to the local reference grid are signalled per plenoptic object in the associated JPEG Pleno Light Field, Point Cloud or Hologram box.
Figure A.6 — The global and local reference grid
In Figure A.6, boundaries and coordinate axes of the global and one local reference grid are shown. In each case, the samples or coefficients coincident with the left and upper boundaries are included in a given bounding box, while samples or coefficients along the right and/or lower boundaries are not included in that bounding box.
The camera modelling and calibration are described based on the local reference grid (XL, YL, ZL) in Figure A.6. Both intrinsic and extrinsic parameters are signalled to model and calibrate the camera setting and behaviour. The intrinsic parameters are the camera parameters that are internal and fixed to a particular camera/digitization setup, allowing the mapping between camera coordinates and pixel coordinates in the image plane[1]. The extrinsic parameters are the camera parameters that are external to the camera and may change with respect to the 3D local reference grid, defining the location and orientation of the camera with respect to the 3D local reference grid coordinate system.
Considering a pinhole camera model, the centre of projection is the ‘optical centre’ (C in Figure A.7). The camera's ‘principal axis’ (ZCAM in Figure A.7) is the line perpendicular to the image plane that passes through the pinhole. Its intersection with the image plane is known as the ‘principal point’ (p in Figure A.7) and it is the geometrical centre of the image. The parameters u0 and v0 are the principal points offsets, which are the coordinates of the principal point relative to the coordinate axes u and v (Figure A.7).
Figure A.7 — Pinhole camera geometry
The focal lengths f_u and f_v correspond to the distance between the optical centres of the cameras and their respective image planes. They are represented in terms of pixel dimensions in the u and v directions. For example, if the camera focal length is given in mm, one needs to convert it to pixel dimensions using Formulae (A.1) and (A.2). For square pixels the sensor height is equal to the sensor width, and f_u is equal to f_v.
f_u (pixel) = (focal length (mm) / sensor width (mm)) × image width (pixel)   (A.1)
f_v (pixel) = (focal length (mm) / sensor height (mm)) × image height (pixel)   (A.2)
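EXAMPLE A 50 mm lens on a 36 mm × 24 mm sensor producing a 1 920 × 1 280 pixel image gives f_u = (50/36) × 1 920 ≈ 2 667 pixels and f_v = (50/24) × 1 280 ≈ 2 667 pixels; the two values coincide because the pixels are square (36 mm/1 920 = 24 mm/1 280 per pixel).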
The axis skew parameter sk causes shear distortion in the projected image, and for most cameras its value is equal to zero. The parameters f_u, f_v, sk, u_0 and v_0 completely characterize the mapping of an image point from camera to pixel coordinates. They are known as the intrinsic or internal parameters of a camera system and can be represented by the transformation matrix K in Formula (A.3):
K = \begin{bmatrix} f_u & sk & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}   (A.3)
The matrix K is known as the calibration matrix. In general, the mapping from 3D local reference grid to the image is linear. A camera system is said to be calibrated when its intrinsic parameters are known, otherwise it is an uncalibrated camera system.
The parameters that relate the camera orientation and position to a 3D local reference grid coordinate system are called the extrinsic or external parameters. The geometric quantities (rotational and translational components) describing the relative position and orientation of the cameras are called the extrinsic parameters of the camera system. The rotation and translation can be represented in an extrinsic matrix taking the form of a rigid transformation matrix: a 3×3 rotation matrix R, and a 3×1 translation column-vector tM = (XCC, YCC, ZCC)T that can be represented in a 3×4 matrix, as in Formula (A.4).
[R \mid t_M] = \begin{bmatrix} r_{11} & r_{12} & r_{13} & X_{CC} \\ r_{21} & r_{22} & r_{23} & Y_{CC} \\ r_{31} & r_{32} & r_{33} & Z_{CC} \end{bmatrix}   (A.4)
Matrix P (Formula (A.5)), known as the projection matrix, or camera matrix, represents the pose of the 3D local reference grid coordinates relative to the image coordinates. It contains 6 independent parameters (Degrees of Freedom - DoF): 3 for rotation and 3 for translation. These parameters are expressed in the local reference grid (Figure A.6).
P = K\,[R \mid t_M]   (A.5)
The 3×4 matrix P relates the sensor plane 2D image coordinates u = (u, v) (\tilde{u} = (u, v, 1) in homogeneous coordinates) to the 3D local reference grid coordinates X = (X_L, Y_L, Z_L) (\tilde{X} = (X_L, Y_L, Z_L, 1) in homogeneous coordinates) via Formula (A.5). The mapping of a point in the 3D world into the 2D image is given by \tilde{u} = P\tilde{X}, where \tilde{u} is the image point in homogeneous coordinates, P is the camera matrix and \tilde{X} is the 3D local reference grid point in homogeneous coordinates. The extrinsic camera parameter matrix represents the current status of the camera in the 3D scene.
For example, a rotation in 3D local reference grid space involves an axis around which to rotate, and an angle of rotation, according to the right-handed coordinates.
Formulae (A.6), (A.7) and (A.8) show the rotation matrix values for rotations around these 3 axes:
R_X(\theta_X) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_X & -\sin\theta_X \\ 0 & \sin\theta_X & \cos\theta_X \end{bmatrix}, rotation around the XL axis, rotates YL, ZL, leaving the XL coordinates fixed   (A.6)
R_Y(\theta_Y) = \begin{bmatrix} \cos\theta_Y & 0 & \sin\theta_Y \\ 0 & 1 & 0 \\ -\sin\theta_Y & 0 & \cos\theta_Y \end{bmatrix}, rotation around the YL axis, rotates XL, ZL, leaving the YL coordinates fixed   (A.7)
R_Z(\theta_Z) = \begin{bmatrix} \cos\theta_Z & -\sin\theta_Z & 0 \\ \sin\theta_Z & \cos\theta_Z & 0 \\ 0 & 0 & 1 \end{bmatrix}, rotation around the ZL axis, rotates XL, YL, leaving the ZL coordinates fixed.   (A.8)
Any rotation can be expressed as a combination of the three rotations about the three axes, as per Formula (A.9):
R = R_Z(\theta_Z)\,R_Y(\theta_Y)\,R_X(\theta_X)   (A.9)
Formula (A.10) shows the mapping of a 3D local reference grid point (X_L, Y_L, Z_L) to the image coordinate system (u, v), for a calibrated system:
\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R \mid t_M] \begin{bmatrix} X_L \\ Y_L \\ Z_L \\ 1 \end{bmatrix}   (A.10)
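The following informative Python sketch chains Formulae (A.3) to (A.10) to project a 3D local reference grid point to pixel coordinates; the names are illustrative, and the rotation composition order and the scale-factor form of the projection follow the formulae as reconstructed above:

```python
import numpy as np

def K_matrix(fu, fv, sk, u0, v0):
    """Calibration matrix K of Formula (A.3)."""
    return np.array([[fu, sk, u0],
                     [0., fv, v0],
                     [0., 0., 1.]])

def rotation(theta_x, theta_y, theta_z):
    """Compose R from the axis rotations of Formulae (A.6) to (A.8)."""
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx  # one possible composition order (assumption)

def project(K, R, t, X):
    """Map a 3D point X = (XL, YL, ZL) to pixel coordinates (u, v)
    via P = K [R | t], Formulae (A.5) and (A.10)."""
    P = K @ np.hstack([R, np.reshape(t, (3, 1))])   # 3x4 camera matrix
    x = P @ np.append(X, 1.0)                       # homogeneous image point
    return x[:2] / x[2]                             # perspective division

# Example: a camera at the local origin looking along ZL
K = K_matrix(fu=2666.7, fv=2666.7, sk=0.0, u0=960, v0=640)
u, v = project(K, rotation(0, 0, 0), t=[0, 0, 0], X=[0.1, 0.0, 1.0])
# u = 960 + 2666.7 * 0.1 ~ 1226.7, v = 640
```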
The type of the Camera Parameter box shall be 'lfcp' (0x6C66 6370) and the contents of the box shall have the format as in Figure A.8.
If the Camera Parameter box is not signalled, all parameters specified in Figure A.8 and Table A.7 shall be initialized to zero, except the scaling values SGLX, SGLY and SGLZ, which shall be initialized to 1.
Key
Light field position in global reference grid | |
PP | precision of coordinates (Precision Prec = 16×2^PP) |
XLO | position of the origin of the local reference grid in the global reference system along the XG coordinate axis |
YLO | position of the origin of the local reference grid in the global reference system along the YG coordinate axis |
ZLO | position of the origin of the local reference grid in the global reference system along the ZG coordinate axis |
θX | rotation offset around the XG axis (in rad) |
θY | rotation offset around the YG axis (in rad) |
θZ | rotation offset around the ZG axis (in rad) |
SGLX | scaling of local reference grid system with respect to global reference grid system for the X-axes before rotation |
SGLY | scaling of local reference grid system with respect to global reference grid system for the Y-axes before rotation |
SGLZ | scaling of local reference grid system with respect to global reference grid system for the Z-axes before rotation |
Extrinsic parameters for pinhole camera corresponding to subaperture view (t, s) | |
ExtInt | signals which extrinsic and intrinsic camera parameters are signalled |
BaselineX | horizontal camera baseline, used when XCC(t,s) = s × BaselineX + XCC(0,0), and hence, XCC(0,1), XCC(0,2), …, XCC(T-1,S-1) do not need to be signalled |
BaselineY | vertical camera baseline, used when YCC(t,s) = t × BaselineY + YCC(0,0), and hence, YCC(0,1), YCC(0,2), …, YCC(T-1,S-1) do not need to be signalled |
XCC (t, s) | camera centre of subaperture view (t, s) in local reference grid along XL coordinate axis |
YCC(t, s) | camera centre of subaperture view (t, s) in local reference grid along YL coordinate axis |
ZCC (t, s) | camera centre of subaperture view (t, s) in local reference grid along ZL coordinate axis |
θXcam(t, s) | camera rotation offset around the XL axis (in rad) |
θYcam(t, s) | camera rotation offset around the YL axis (in rad) |
θZcam(t, s) | camera rotation offset around the ZL axis (in rad) |
Intrinsic parameters for pinhole camera corresponding to subaperture view (t, s) | |
f (t,s) | focal length (in mm) |
sW (t,s) | sensor width (in mm) |
sH (t,s) | sensor height (in mm) |
sk (t,s) | sensor skew |
u0 (t,s) | horizontal principal point offset |
v0 (t,s) | vertical principal point offset |
NOTE 1 PP indicates the floating-point precision used for the coordinates.
NOTE 2 XLO, YLO, ZLO, θX, θY, θZ, SGLX, SGLY, SGLZ, θXcam(t, s), θYcam(t, s) and θZcam(t, s) utilize the chosen floating-point precision.
Figure A.8 — Organization of the contents of the Camera Parameter box
The geometrical coordinates of the centre of the camera when acquiring the view (t, s) are denoted as (XCC(t, s), YCC(t, s), ZCC(t, s)). An example of camera centre coordinates for a planar camera array is given in Figure A.9, where both the horizontal and vertical coordinates are illustrated for five views in the camera array. The camera centres XCC and YCC are used together with the normalized disparity maps to obtain horizontal and vertical disparity maps between a pair of views in the light field. For usage examples see Annex D.4 and Annex E.4.
Figure A.9 — Organization of subaperture views and associated planar camera calibration information
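The following informative Python sketch illustrates one way the camera centres can scale a normalized disparity map into per-view-pair disparity maps; the scaling convention shown is an assumption for illustration only, the normative usage being given in Annex D.4 and Annex E.4:

```python
import numpy as np

def disparity_maps(norm_disp: np.ndarray,
                   xcc_ref: float, ycc_ref: float,
                   xcc_tgt: float, ycc_tgt: float):
    """Sketch: derive horizontal/vertical disparity maps between a
    reference and a target view from a normalized disparity map and
    the camera centres XCC, YCC (assumed convention, not normative)."""
    du = (xcc_tgt - xcc_ref) * norm_disp   # horizontal pixel shift
    dv = (ycc_tgt - ycc_ref) * norm_disp   # vertical pixel shift
    return du, dv
```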
Table A.7 — Format of the contents of the Camera Parameter box
Field name | Size (bits) | Value |
PP | 8 | 0 to (2^8 − 1)
XLO | variable | big endian floating-point |
YLO | variable | big endian floating-point |
ZLO | variable | big endian floating-point |
θX | variable | big endian floating-point
θY | variable | big endian floating-point
θZ | variable | big endian floating-point
SGLX | variable | big endian floating-point |
SGLY | variable | big endian floating-point |
SGLZ | variable | big endian floating-point |
ExtInt | 16 | See Table A.8 |
BaselineX | 32 | single precision, big endian floating-point |
BaselineY | 32 | single precision, big endian floating-point |
XCC(0,0) | 32 | single precision, big endian floating-point |
YCC(0,0) | 32 | single precision, big endian floating-point |
ZCC(0,0) | 32 | single precision, big endian floating-point |
θXcam(0,0) | 32 | single precision, big endian floating-point
θYcam(0,0) | 32 | single precision, big endian floating-point
θZcam(0,0) | 32 | single precision, big endian floating-point
f(0,0) | 32 | single precision, big endian floating-point |
sW(0,0) | 32 | single precision, big endian floating-point |
sH(0,0) | 32 | single precision, big endian floating-point |
sk(0,0) | 32 | single precision, big endian floating-point |
u0(0,0) | 32 | single precision, big endian floating-point |
v0(0,0) | 32 | single precision, big endian floating-point |
XCC(0,1) | 32 | single precision, big endian floating-point |
YCC(0,1) | 32 | single precision, big endian floating-point |
… | … | … |
u0(T-1,S-1) | 32 | single precision, big endian floating-point |
v0(T-1,S-1) | 32 | single precision, big endian floating-point |
Table A.8 — Meaning of ExtInt bits
Bit position | Value | Meaning |
0 (LSB) | 0 | XCC(t,s) = s × BaselineX + XCC(0,0) Signal BaselineX and XCC(0,0), the remaining (T×S)-1 XCC entries in Table A.7 are not signalled |
1 | The T×S XCC(t,s) entries in Table A.7 are signalled | |
1 | 0 | YCC(t,s) = t × BaselineY + YCC(0,0) Signal BaselineY and YCC(0,0), the remaining (T×S)-1 YCC entries in Table A.7 are not signalled |
1 | The T×S YCC(t,s) entries in Table A.7 are signalled | |
2 | 0 | ZCC(t,s) = ZCC(0,0) and the remaining (T×S)-1 ZCC entries in Table A.7 are not signalled |
1 | The T×S ZCC(t,s) entries in Table A.7 are signalled | |
3 | 0 | θXcam(t,s) = θXcam(0,0) and the remaining (T×S)-1 entries in Table A.7 are not signalled |
1 | The T×S θXcam(t,s) entries in Table A.7 are signalled | |
4 | 0 | θYcam(t,s) = θYcam(0,0) and the remaining (T×S)-1 entries in Table A.7 are not signalled |
1 | The T×S θYcam(t,s) entries in Table A.7 are signalled | |
5 | 0 | θZcam(t,s) = θZcam(0,0) and the remaining (T×S)-1 entries in Table A.7 are not signalled |
1 | The T×S θZcam(t,s) entries in Table A.7 are signalled | |
6 | 0 | f(t,s) = f(0,0) and the remaining (T×S)-1 entries in Table A.7 are not signalled |
1 | The T×S f(t,s) entries in Table A.7 are signalled | |
7 | 0 | sW(t,s) = sW(0,0) and the remaining (T×S)-1 entries in Table A.7 are not signalled |
1 | The T×S sW(t,s) entries in Table A.7 are signalled | |
8 | 0 | sH(t,s) = sH(0,0) and the remaining (T×S)-1 entries in Table A.7 are not signalled |
1 | The T×S sH(t,s) entries in Table A.7 are signalled | |
9 | 0 | sk(t,s) = sk(0,0) and the remaining (T×S)-1 entries in Table A.7 are not signalled |
1 | The T×S sk(t,s) entries in Table A.7 are signalled | |
10 | 0 | u0(t,s) = ⎿U/2⏌ and the (T×S) entries in Table A.7 are not signalled |
1 | The T×S u0(t,s) entries in Table A.7 are signalled. | |
11 | 0 | v0(t,s) = ⎿V/2⏌ and the (T×S) entries in Table A.7 are not signalled |
1 | The T×S v0(t,s) entries in Table A.7 are signalled | |
12-15 | 0 | Reserved for future ISO/IEC use |
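The following informative Python sketch (names are illustrative) derives from ExtInt which per-view parameter arrays are present in Table A.7; for bits set to 0, the defaults of Table A.8 apply instead:

```python
# Parameters controlled by ExtInt bits 0..11, in bit order (Table A.8)
EXTINT_PARAMS = ["XCC", "YCC", "ZCC", "thetaXcam", "thetaYcam",
                 "thetaZcam", "f", "sW", "sH", "sk", "u0", "v0"]

def signalled_per_view(ext_int: int) -> dict[str, bool]:
    """Return, for each camera parameter, whether all T*S entries are
    signalled (bit = 1) or the Table A.8 defaults apply (bit = 0)."""
    return {name: bool(ext_int >> bit & 1)
            for bit, name in enumerate(EXTINT_PARAMS)}

flags = signalled_per_view(0b000000000101)   # XCC and ZCC fully signalled
assert flags["XCC"] and flags["ZCC"] and not flags["YCC"]
```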
The Contiguous Codestream box contains a JPEG Pleno codestream.
The type of a Contiguous Codestream box shall be 'jp2c' (0x6A70 3263). The contents of the box shall be as in Figure A.10 and Table A.9:
Figure A.10 — Organization of the contents of the Contiguous Codestream box
Table A.9 — Format of the contents of the Contiguous Codestream box
Field name | Size (bits) | Value |
Code | Variable | Variable |
Code | This field contains valid and complete JPEG Pleno codestream components as specified in Annexes B, C, D, E and F. |
Annex B — 4D Transform mode
This annex describes an instantiation of the 4D transform mode encoder. Next, the codestream syntax is specified and, subsequently, the light field decoding process is detailed.
In the 4D transform mode,[2] the light field is encoded with a four-step process (Figure B.1). First, the 4D light field data is divided into fixed-size 4D blocks that are independently encoded according to a predefined and fixed scanning order. If any light field dimension is not a multiple of the corresponding fixed block size, the 4D blocks at the light field boundaries shall be truncated to fit the light field dimensions. The initial blocks can be further partitioned into a set of non-overlapping 4D sub-blocks, where the optimal partitioning parameters are derived based on a rate-distortion (R-D) criterion. Each sub-block is independently transformed by a variable block-size 4D DCT. Subsequently, the transformed blocks are quantized and entropy coded using hexadeca-tree bit-plane decomposition and adaptive arithmetic encoding, producing a compressed representation of the light field. This coding procedure is applied to each colour component independently.
Figure B.1 — 4D transform mode encoding architecture
A sample (pixel) of the light field is referenced in a 4D coordinate system along the t, s, v and u axes, where t and s represent the coordinates of the addressed subaperture view, and v and u the sample coordinates within the subaperture image, as illustrated in Figure B.2. The blocks are scanned in the directions t, s, v and u, with direction u corresponding to the inner loop of the scan; pseudo-code in Table B.1 describes the scan of the 4D blocks (see also the informative sketch after Figure B.2). Each 4D block generates a separate codestream, embedded in the overall codestream, which can be independently decoded in support of random access.
Figure B.2 — 4D structure
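The following informative Python sketch (the nominal block dimensions are illustrative) reproduces the scan order and the boundary truncation described above:

```python
def scan_4d_blocks(T, S, V, U, BLK_T, BLK_S, BLK_V, BLK_U):
    """Yield the origin and (possibly truncated) size of each 4D block,
    scanned in t, s, v, u order with u as the inner loop."""
    for t0 in range(0, T, BLK_T):
        for s0 in range(0, S, BLK_S):
            for v0 in range(0, V, BLK_V):
                for u0 in range(0, U, BLK_U):
                    # truncate block sizes at the light field boundaries
                    yield (t0, s0, v0, u0), (min(BLK_T, T - t0),
                                             min(BLK_S, S - s0),
                                             min(BLK_V, V - v0),
                                             min(BLK_U, U - u0))

# Example: 13x13 views of 620x440 samples, nominal blocks 13x13x64x64;
# the right/bottom-most spatial blocks are truncated to 440-384=56 rows
# and 620-576=44 columns
blocks = list(scan_4d_blocks(13, 13, 440, 620, 13, 13, 64, 64))
```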
The 4D block partitioning, as well as the clustering of the bit-planes of transform coefficients for efficient encoding, is signalled using tree structures (Figure B.3). The partitioning of the 4D blocks in sub-blocks is signalled with a binary tree using ternary flags. These flags signal whether:
— a block is transformed as is;
— a block is split into 4 blocks in the s, t (view) dimensions; or
— a block is split into 4 blocks in the u, v (spatial) dimensions.
Key
Sn split node
Ln leaf node
Figure B.3 — Binary tree representing 4D-block partitioning of a tk×sk×vk×uk 4D block
Before applying the 4D-DCT to the sub-blocks, a level-shift operation is performed to reduce the dynamic range requirements of the DCT (B.2.3.1). The deployed 4D DCT (B.2.3.2) is separable, i.e. the 1D transforms are computed separately in each of the 4 directions. An example of the computation flow of the 4D separable transform is depicted in Figure B.4. The order of the 1D transforms is arbitrary.
Figure B.4 — Separable forward 4D-DCT
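Owing to separability, the transform can be illustrated as four successive 1D DCTs, one along each axis. The following informative Python sketch uses SciPy's orthonormal DCT-II purely for illustration; the normative transform is specified in B.2.3.2:

```python
import numpy as np
from scipy.fft import dctn, idctn

def forward_4d_dct(block: np.ndarray) -> np.ndarray:
    """Separable forward 4D-DCT over the (t, s, v, u) axes."""
    return dctn(block, type=2, norm="ortho")   # 1D DCTs along every axis

def inverse_4d_dct(coeffs: np.ndarray) -> np.ndarray:
    """Separable inverse 4D-DCT (the IDCT used at the decoder)."""
    return idctn(coeffs, type=2, norm="ortho")

rng = np.random.default_rng(0)
block = rng.standard_normal((4, 4, 8, 8))      # a small t x s x v x u block
assert np.allclose(inverse_4d_dct(forward_4d_dct(block)), block)
```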
After the 4D-DCT is performed, the set of transform coefficients is sliced into 4D bit-planes. A transform coefficient is considered non-significant on a 4D bit-plane if its bits belonging to higher 4D bit-planes are all zero. Otherwise, the transform coefficient is considered to be significant. A hexadeca-tree with ternary flags is used to group the non-significant transform coefficients and thus localize the significant transform coefficients (Figure B.5). The ternary flags signal that:
— either a block of transform coefficients containing a significant coefficient at the current bit-plane is split into 16 blocks in the four t, s, v, u dimensions,
— or a block of transform coefficients not containing any significant coefficient at the current bit-plane is not split,
— or a block of transform coefficients containing a significant coefficient at the current bit-plane is discarded.
The 4D bit-planes are scanned from the most significant bit-plane to the least significant one, with the least significant 4D bit-plane determined by the desired quantization level. Both the hexadeca-tree bits and the bits from the transform coefficients are encoded using an adaptive arithmetic coder.
Key
Sn split node
Ln leaf node
Figure B.5 — Hexadeca-tree representing the clustering of bit-planes of 4D transform coefficients of a tb×sb×vb×ub 4D block
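Under the significance rule above, a coefficient c is non-significant at bit-plane b exactly when |c| < 2^(b+1), i.e. when all bits of its magnitude above bit-plane b are zero. An informative Python sketch:

```python
def significant(coeff: int, bitplane: int) -> bool:
    """A coefficient is significant at a given bit-plane if any bit of
    its magnitude above that bit-plane is non-zero (see text above)."""
    return abs(coeff) >> (bitplane + 1) != 0

# |c| = 0b10100 (20): all bits above plane 4 are zero, so c first counts
# as significant once the scan reaches plane 3
assert not significant(20, 4)
assert significant(20, 3)
```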
The following two subclauses provide more insight into the partitioning strategy and the quantization and entropy coding steps.
The partitioning optimization for each block in Figure B.6 is obtained as follows: initially, each block is transformed by a full-size DCT (B.2.3.2) and the Lagrangian encoding cost J0 is evaluated using the procedure of Table B.5. This cost is defined as J0 = D + λR, where D is the distortion incurred when representing the original block by its quantized version and R is the rate needed to encode it.
Figure B.6 — 4D Block of a light field
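EXAMPLE The Lagrangian cost evaluation can be sketched in C as follows (informative; the type and function names are illustrative):

/* Lagrangian encoding cost J = D + lambda * R. */
typedef struct {
    double distortion;  /* D: squared error between original and quantized block */
    double rate;        /* R: bits needed to encode the block                    */
} RDStats;

static double lagrangian_cost(RDStats s, double lambda)
{
    return s.distortion + lambda * s.rate;
}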
Next, the block can be partitioned into four sub-blocks, each one with approximately a quarter of the pixels, split in the spatial dimensions. For example, let us consider a block B of dimensions tk×sk×vk×uk. This block (pictured in Figure B.7 and Figure B.8) will be subdivided into four sub-blocks of sizes tk×sk×⎿vk/2⏌×⎿uk/2⏌, tk×sk×⎿vk/2⏌×(uk−⎿uk/2⏌), tk×sk×(vk−⎿vk/2⏌)×⎿uk/2⏌ and tk×sk×(vk−⎿vk/2⏌)×(uk−⎿uk/2⏌), respectively.
Figure B.8 shows a 4D block with dimensions tk×sk×vk×uk in the root node. When applying the spatial split (signalled with the spatialSplit flag) to the root node, the tree in Figure B.8 is obtained. Figure B.7 illustrates the four ways that a single view is partitioned using the spatialSplit flag, corresponding to the four nodes of Figure B.8. The optimal partition for each sub-block is computed by means of the recursive procedure described in Table B.5, and the Lagrangian costs of the four sub-blocks are added to compute the Lagrangian cost JS (spatial R-D cost).
Figure B.7 — Spatial partitioning of block with spatial dimensions vk×uk
Key
<graphic>21794-</graphic> node
<graphic>21794-</graphic> spatialSplit flag
Figure B.8 — Hierarchical 4D partitioning of a 4D block with dimensions tk×sk×vk×uk using the spatialSplit flag
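EXAMPLE The spatial split of a 4D block can be sketched in C as follows (informative; names are illustrative). Only the spatial dimensions (v,u) are halved; the view dimensions (t,s) are preserved:

typedef struct { int t, s, v, u; } Dims;

static void spatial_split(Dims b, Dims out[4])
{
    int vh = b.v / 2;                          /* floor(vk/2) */
    int uh = b.u / 2;                          /* floor(uk/2) */
    out[0] = (Dims){ b.t, b.s, vh,       uh       };
    out[1] = (Dims){ b.t, b.s, vh,       b.u - uh };
    out[2] = (Dims){ b.t, b.s, b.v - vh, uh       };
    out[3] = (Dims){ b.t, b.s, b.v - vh, b.u - uh };
}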
The block can also be partitioned into four sub-blocks, each one with approximately a quarter of the pixels, split in the view dimensions. For example, let us consider again a block B of dimensions tk×sk×vk×uk. This block (pictured in Figure B.9 and Figure B.10) will be subdivided into four sub-blocks of sizes ⎿tk/2⏌×⎿sk/2⏌×vk×uk, ⎿tk/2⏌×(sk−⎿sk/2⏌)×vk×uk, (tk−⎿tk/2⏌)×⎿sk/2⏌×vk×uk and (tk−⎿tk/2⏌)×(sk−⎿sk/2⏌)×vk×uk, respectively (Figure B.10). When applying the view split (signalled with the viewSplit flag) to the root node, the tree in Figure B.10 is obtained. Figure B.9 illustrates the four ways that a 4D block is partitioned using the viewSplit flag, corresponding to the four nodes of Figure B.10. The optimal partition for each sub-block is computed by means of the recursive procedure described in Table B.5 and the Lagrangian costs of the four sub-blocks are added to compute the Lagrangian cost JV (view R-D cost).
Figure B.9 — View partitioning of a 4D block of dimensions tk×sk×vk×uk
Key
<graphic>21794-</graphic> node
<graphic>21794-</graphic> viewSplit flag
Figure B.10 — Hierarchical 4D partitioning of a 4D block with dimensions tk×sk×vk×uk , using the viewSplit flag
Finally, the three Lagrangian costs (J0, JS and JV) are compared (Table B.5) and the one presenting the lowest value is chosen.
The recursive partition procedure (Table B.5) keeps track of this tree (Figure B.11) and returns a partitionString (Table B.5) that represents the optimal tree. The string is obtained as follows: once the lowest cost is chosen, the current value of partitionString is augmented by appending to it the flag corresponding to the chosen lowest cost. Then, the string returned by the recursive call that leads to the minimum cost is also appended to the end of the partitionString, and the procedure returns both the minimum cost (J0, JS or JV) and the updated partitionString.
Key
<graphic>21794-</graphic> node
<graphic>21794-</graphic> spatialSplit flag
<graphic>21794-</graphic> viewSplit flag
<graphic>21794-</graphic> transform flag
NOTE The transform flag signals that the node is a leaf node and will not be further partitioned.
Figure B.11 — Hierarchical 4D partitioning using the viewSplit flag and the spatialSplit flag
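EXAMPLE The selection of the lowest cost and the construction of the partitionString can be sketched in C as follows (informative; the flag characters and function name are illustrative, the normative flag values are listed in Table B.30):

#include <string.h>

static double choose_partition(double j0, double js, double jv,
                               char *partitionString,
                               const char *spatialString,
                               const char *viewString)
{
    if (j0 <= js && j0 <= jv) {                /* transform flag    */
        strcat(partitionString, "T");
        return j0;
    }
    if (js <= jv) {                            /* spatialSplit flag */
        strcat(partitionString, "S");
        strcat(partitionString, spatialString);
        return js;
    }
    strcat(partitionString, "V");              /* viewSplit flag    */
    strcat(partitionString, viewString);
    return jv;
}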
After the optimal partition tree is found, the Encode Partition procedure (Table B.6) is called to encode it.
Subsequently, the sub-blocks are subject to a DCT. However, before computing the forward DCT for a block of source light field samples, if the samples of the component are unsigned, those samples shall be level shifted to a signed representation. If the MSB of Ssizi from the LFC marker segment (see Annex B.3.2.6.3) is zero, all samples x(u,v,s,t) of the ith component are level shifted by subtracting the same quantity, 2^(Ssizi AND 0x7F) (half of the unsigned dynamic range), from each sample.
First, all components have to be converted to the same precision (bit depth). To this end, each colour component c is multiplied by a component-dependent scaling factor before the forward 4D-DCT.
For a given light field x(u,v,s,t), the corresponding 4D-DCT representation X(i,j,p,q) is computed by transforming the block along each dimension with the 1D DCT

X(k) = α(k) × Σn=0..N−1 x(n) cos((2n + 1)kπ / (2N)),

where α(0) = √(1/N), α(k) = √(2/N) for k > 0, and N is the size of the transform in the corresponding dimension. Output coefficients are represented as 32-bit integers. As indicated earlier, the transform order is arbitrary.
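EXAMPLE The 1D transform above can be sketched in C as follows (informative; the function name is illustrative). The 4D-DCT is then obtained by applying it along u, v, s and t, in any order:

#include <math.h>

static void dct_1d(const double *in, double *out, int n)
{
    const double pi = acos(-1.0);
    for (int k = 0; k < n; k++) {
        double alpha = (k == 0) ? sqrt(1.0 / n) : sqrt(2.0 / n);
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += in[i] * cos((2 * i + 1) * k * pi / (2.0 * n));
        out[k] = alpha * sum;
    }
}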
The quantization and entropy encoding rely on the R-D optimized hexadeca-tree structure, which is constructed based on the procedure listed in Table B.7 and which is further discussed below. This tree is uniquely represented by a series of ternary flags: lowerBitplane, splitBlock and zeroBlock (Table B.44). The hexadeca-tree is built by recursively subdividing a 4D block until all sub-blocks reach a 1×1×1×1 4D block-size. Starting from a 4D block of size tb×sb×vb×ub and a bit-plane initially set to maxBitplane (Table B.11), one of three operations is performed:
i) Lower the bit-plane: in this case, the descendant of the node is another block with the same dimensions as the original one, but represented with precision bitplane−1. This indicates, for all pixels of the block, that the binary representations of their magnitudes at the current bitplane and above are zero. This situation is encoded by the ternary flag value lowerBitPlane.
ii) Split the block: in this case, the node will have up to 16 children, each associated with a sub-block with approximately half the length of the original block in all four dimensions. For example, a block B of size tb×sb×vb×ub can be split in the following sub-blocks:
B0000 of size (⎿tb/2⏌ × ⎿sb/2⏌ × ⎿vb/2⏌ × ⎿ub/2⏌), B0001 of size (⎿tb/2⏌ × ⎿sb/2⏌ × ⎿vb/2⏌ × (ub−⎿ub/2⏌)),
B0010 of size (⎿tb/2⏌ × ⎿sb/2⏌ × (vb−⎿vb/2⏌) × ⎿ub/2⏌), B0011 of size (⎿tb/2⏌ × ⎿sb/2⏌ × (vb−⎿vb/2⏌) × (ub−⎿ub/2⏌)),
B0100 of size (⎿tb/2⏌ × (sb−⎿sb/2⏌) × ⎿vb/2⏌ × ⎿ub/2⏌), B0101 of size (⎿tb/2⏌ × (sb−⎿sb/2⏌) × ⎿vb/2⏌ × (ub−⎿ub/2⏌)),
B0110 of size (⎿tb/2⏌ × (sb−⎿sb/2⏌) × (vb−⎿vb/2⏌) × ⎿ub/2⏌), B0111 of size (⎿tb/2⏌ × (sb−⎿sb/2⏌) × (vb−⎿vb/2⏌) × (ub−⎿ub/2⏌)),
B1000 of size ((tb−⎿tb/2⏌) × ⎿sb/2⏌ × ⎿vb/2⏌ × ⎿ub/2⏌), B1001 of size ((tb−⎿tb/2⏌) × ⎿sb/2⏌ × ⎿vb/2⏌ × (ub−⎿ub/2⏌)),
B1010 of size ((tb−⎿tb/2⏌) × ⎿sb/2⏌ × (vb−⎿vb/2⏌) × ⎿ub/2⏌), B1011 of size ((tb−⎿tb/2⏌) × ⎿sb/2⏌ × (vb−⎿vb/2⏌) × (ub−⎿ub/2⏌)),
B1100 of size ((tb−⎿tb/2⏌) × (sb−⎿sb/2⏌) × ⎿vb/2⏌ × ⎿ub/2⏌), B1101 of size ((tb−⎿tb/2⏌) × (sb−⎿sb/2⏌) × ⎿vb/2⏌ × (ub−⎿ub/2⏌)),
B1110 of size ((tb−⎿tb/2⏌) × (sb−⎿sb/2⏌) × (vb−⎿vb/2⏌) × ⎿ub/2⏌), B1111 of size ((tb−⎿tb/2⏌) × (sb−⎿sb/2⏌) × (vb−⎿vb/2⏌) × (ub−⎿ub/2⏌)).
There are 16 possible sub-blocks, but depending on the size of the parent block, some of these descendant sub-blocks will have one or more of their lengths equal to zero and shall be skipped. All descendants have the same bit-depth as the parent. This situation is indicated by the flag value splitBlock (Table B.44).
iii) Discard the block: the node has no descendants and is represented by an all-zeros block. This situation is indicated by the flag value zeroBlock (Table B.44).
The procedure described in Table B.7 recursively subdivides the input block, as determined by the ternary flags in the segmentation string, until a 1×1×1×1 4D block-size is reached.
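EXAMPLE The computation of the up-to-16 sub-block dimensions can be sketched in C as follows (informative; names are illustrative). Sub-blocks with a zero-length dimension are skipped, as required above:

typedef struct { int t, s, v, u; } Dims4;

static int hexadeca_split(Dims4 b, Dims4 out[16])
{
    int lo[4] = { b.t / 2, b.s / 2, b.v / 2, b.u / 2 };
    int hi[4] = { b.t - lo[0], b.s - lo[1], b.v - lo[2], b.u - lo[3] };
    int n = 0;
    for (int m = 0; m < 16; m++) {   /* bit i of m selects a half of dimension i */
        Dims4 d = { (m & 8) ? hi[0] : lo[0],
                    (m & 4) ? hi[1] : lo[1],
                    (m & 2) ? hi[2] : lo[2],
                    (m & 1) ? hi[3] : lo[3] };
        if (d.t > 0 && d.s > 0 && d.v > 0 && d.u > 0)
            out[n++] = d;            /* zero-sized sub-blocks are skipped */
    }
    return n;                        /* number of non-empty sub-blocks */
}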
Given a particular hexadeca-tree, specified by a unique segmentationString of ternary segmentation flags, together with a particular block, the data can be encoded by means of the recursive procedure described in Table B.8. The inputs to this procedure are the transformed block to be encoded and the optimal partition string. It recursively subdivides the input block, as determined by the ternary flags in the segmentation string. When a 1×1×1×1 sub-block is reached, the magnitude of its single coefficient is encoded one bit at a time, from the current bit-plane down to the minimumBitPlane (Table B.8), using an arithmetic encoder with different context information for each bit. If the coefficient is not zero-valued, its sign is encoded as well.
The achieved rate and distortion depend heavily on the choice of the segmentation tree as well as on the data itself, and the two should be matched. The procedure described in Table B.7 recursively chooses the segmentation tree that is optimal in a rate-distortion sense for the encoding of a given block. The optimization works as follows: it starts with bitplane = maximumBitplane, segmentationString = null, variables J0 and J1 both set to infinity, and the full transformed input block. The transformed block is scanned and all its coefficients are compared to a threshold given by 2^bitplane. If the magnitudes of all of them are less than the threshold, the optimization procedure is recursively called with the same block as input, but with a bitplane value decreased by one (bitplane−1). The values returned by this recursive call are the new Lagrangian cost J0 and a rate-distortion optimized segmentation string lowerSegmentationString. However, if any coefficient is above the threshold, the transformed block is segmented into up to 16 sub-blocks as previously described. The optimization procedure is called recursively for each sub-block and the returned Lagrangian costs are added to obtain the new Lagrangian cost J1. The segmentation strings returned from these calls are concatenated to form the splitSegmentationString.
Another Lagrangian cost, J2, is evaluated considering the resulting cost if the block were replaced by a block entirely composed of zeros. The lowest cost is chosen, and the segmentation string is updated as follows:
— If the minimum cost is J0, the input segmentation string is augmented by appending a flag lowerBitplane followed by the lowerSegmentationString.
— If the minimum cost is J1, the input segmentation string is augmented by appending a flag splitBlock followed by the splitSegmentationString.
— If the minimum cost is J2, the input segmentation string is augmented by appending a flag zeroBlock.
The procedure returns the lowest cost and the resulting associated segmentation string.
The 4D coefficients, flags, and probability context information generated during the encoding process are input to the arithmetic encoder, which generates the compressed representation of the light field (Table B.4). The adaptive statistical binary arithmetic coding is detailed in Table B.12, Table B.13 and Table B.14. The arithmetic coding requires transmitting only the information needed to allow a decoder to determine the particular fractional interval between 0 and 1 to which the sequence is mapped, adapting to the changing statistics of the codestream.
In this subclause, a sample encoding algorithm is provided for informative purposes. The main procedure processing the individual 4D blocks is listed in Table B.1.
Table B.1 — 4D block scan procedure for 4D-Transform mode
LightField() { |
|
SOC_marker() | Write SOC marker |
LFC_marker() | Write LFC marker |
Write SCC_marker() for every colour component not having a global scaling factor equal to 1. | Write an SCC marker for each colour component whose global scaling factor differs from 1 |
PNT_marker() | Write PNT marker |
for(t=0; t<T; t+=BLOCK-SIZE_t){// scan order on t | Scan order on t (T defined in Light Field Header box) |
for(s=0; s<S; s+= BLOCK-SIZE_s){ // scan order on s | Scan order on s (S defined in Light Field Header box) |
for(v=0; v<V; v+= BLOCK-SIZE_v){ // scan order on v | Scan order on v (V defined in Light Field Header box) |
for(u=0; u<U; u+= BLOCK-SIZE_u){ // scan order on u | Scan order on u (U defined in Light Field Header box) |
for(c=0; c<NC; c++){ // scan order on colour components | Scan order on c (colour component) |
SOB_marker(); | Write SOB marker |
if ((TRNC) && ((T – BLOCK-SIZE_t) < t < T)) { tk = T umod BLOCK-SIZE_t; } else tk = BLOCK-SIZE_t; | Block size computation |
if ((TRNC) && ((S – BLOCK-SIZE_s) < s < S)) { sk = S umod BLOCK-SIZE_s; } else sk = BLOCK-SIZE_s; | Block size computation |
if ((TRNC) && ((V – BLOCK-SIZE_v) < v < V)) { vk = V umod BLOCK-SIZE_v; } else vk = BLOCK-SIZE_v; | Block size computation |
if ((TRNC) && ((U – BLOCK-SIZE_u) < u < U)) { uk = U umod BLOCK-SIZE_u; } else uk = BLOCK-SIZE_u; | Block size computation |
Padding(LF.BlockAtPosition(t, s, v, u)); | Fills the pixels outside the light field if needed. If TRNC ==1, there will be no pixel outside the light field, and Padding (Table B.2) will have no effect. Note that the 4D array LF.BlockAtPosition is a local copy of the currently processed 4D block (for colour component c). |
InitEncoder() | Initializes the arithmetic coder for each coded codestream (Table B.13) |
Encode(LF.BlockAtPosition(t, s, v, u), lambda) |
|
} // end of scan order on colour components loop |
|
} |
|
} |
|
} |
|
} |
|
Table B.2 — 4D block padding procedure
Procedure Padding(block) { | When TRNC == 0 and a block exceeds a light field dimension, the exceeded block pixels are filled with block values at the light field border |
if (t+tk > T) { |
|
for(ti = T; ti<t+tk; ++ti) { |
|
for(si = s; si<s+sk; ++si) { |
|
for(vi = v; vi<v+vk; ++vi) { |
|
for(ui = u; ui<u+uk; ++ui) { |
|
block[ti-t][si-s][vi-v][ui-u] = block[T-t-1][si-s][vi-v][ui-u] |
|
} |
|
} |
|
} |
|
} |
|
} |
|
if (s+sk > S) { |
|
for(ti = t; ti<t+tk; ++ti) { |
|
for(si = S; si< s+sk; ++si) { |
|
for(vi = v; vi<v+vk; ++vi) { |
|
for(ui = u; ui<u+uk; ++ui) { |
|
block[ti-t][si-s][vi-v][ui-u] = block[ti-t][S-s-1][vi-v][ui-u] |
|
} |
|
} |
|
} |
|
} |
|
} |
|
if (v+vk > V) { |
|
for(ti = t; ti<t+tk; ++ti) { |
|
for(si = s; si<s+sk; ++si) { |
|
for(vi = V; vi<v+vk; ++vi) { |
|
for(ui = u; ui<u+uk; ++ui) { |
|
block[ti-t][si-s][vi-v][ui-u] = block[ti-t][si-s][V-v-1][ui-u] |
|
} |
|
} |
|
} |
|
} |
|
} |
|
if (u+uk > U) { |
|
for(ti = t; ti<t+tk; ++ti) { |
|
for(si = s; si< s+sk; ++si) { |
|
for(vi = v; vi< v+vk; ++vi) { |
|
for(ui = U; ui< u+uk; ++ui) { |
|
block[ti-t][si-s][vi-v][ui-u] = block[ti-t][si-s][vi-v][U-u-1] |
|
} |
|
} |
|
} |
|
} |
|
} |
|
} |
|
Table B.3 — 4D-DCT block coefficient component scaling procedure
Procedure ScaleBlock(block){ | Scaling of the 4D-DCT coefficients components by Spscc (see B.3.2.6.3) |
for(ti = 0; ti< tk; ++ti) { |
|
for(si = 0; si< sk; ++si) { |
|
for(vi = 0; vi< vk; ++vi) { |
|
for(ui = 0; ui< uk; ++ui) { |
|
block[ti][si][vi][ui] =⎾Spscc[c] × block[ti][si][vi][ui]⏋; | The array Spscc contains the global scaling factors for each colour component. |
} |
|
} |
|
} |
|
} |
|
Table B.4 — 4D block encoding procedure for 4D-Transform mode
Procedure Encode(block, lambda) { |
|
(partitionString, J) = OptimizePartition(block, lambda); | Defined in Table B.5; returns the optimal partition string and its Lagrangian cost |
EncodeMinimumBitPlane(); | Defined in Table B.10 |
EncodePartition(partitionString, block); | Defined in Table B.6 |
FlushEncoder(); | Defined in Table B.14 |
} |
|
Table B.5 — 4D block partition optimization procedure
Procedure OptimizePartition(block, lambda) { | Finds the optimal 4D-block partition |
blockDCT = 4DDCT(block); | Transformed block (using Annex B.2.3.2 definitions) |
ScaleBlock(blockDCT); | Scaling of the 4D-DCT coefficients components by Spscc (see Table B.3) |
if( (tb,sb,vb,ub) == (tk,sk,vk,uk) ){ | If the block size is equal to the maximum block size |
EvaluateOptimalbitPlane(MinimumBitPlane, block, lambda); | Defined in Table B.11 |
set MinimumBitPlane to the optimal value found; |
|
set partitionString = ""; |
|
} |
|
segmentationString = ""; |
|
J0 = OptimizeHexadecaTree(block, maximumBitPlane, lambda, | Defined in Table B.7 |
segmentationString); | |
JS = infinity; |
|
if((vb > 1)&&(ub > 1)) { | If spatial dimensions (ub, vb) of the block are greater than the predefined minimum then segments the block into 4 (four) nonoverlapping sub-blocks (spatialSplit flag – Figure B.7, Figure B.8 and Table B.30) |
v'b = floor(vb/2); |
|
u'b = floor(ub/2); |
|
partitionStringS = ""; |
|
JS = OptimizePartition(block.GetSubblock( | Points to the sub-block position; Returns the Spatial Lagrangian R-D cost |
0,0,0,0,tb,sb,v'b,u'b), lambda); | |
JS += | Points to the sub-block position; Returns the Spatial Lagrangian R-D cost |
OptimizePartition(block.GetSubblock( | |
0,0,0,u'b,tb,sb,v'b,ub-u'b), lambda); | |
JS += | Points to the sub-block position; Returns the Spatial Lagrangian R-D cost |
OptimizePartition(block.GetSubblock( | |
0,0,v'b,0,tb,sb,vb-v'b,u'b), lambda); | |
JS += | Points to the sub-block position; Returns the Spatial Lagrangian R-D cost |
OptimizePartition(block.GetSubblock( | |
0,0,v'b,u'b,tb,sb,vb-v'b,ub-u'b),lambda); | |
} |
|
JV = infinity; |
|
if((tb > 1)&&(sb > 1)) { | If view dimensions (tb, sb) of the block are greater than the predefined minimum then segments the block into 4 (four) nonoverlapping sub-blocks (viewSplit flag – Figure B.9, Figure B.10 and Table B.30) |
t'b = floor(tb/2); |
|
s'b = floor(sb/2); |
|
partitionStringV = ""; |
|
JV = | Points to the sub-block position; Returns the View Lagrangian R-D cost |
OptimizePartition(block.GetSubblock( | |
0,0,0,0,t'b,s'b,vb,ub), lambda); | |
JV += | Points to the sub-block position; Returns the View Lagrangian R-D cost |
OptimizePartition(block.GetSubblock( | |
0,s'b,0,0,t'b,sb-s'b,vb,ub), lambda); | |
JV += | Points to the sub-block position; Returns the View Lagrangian R-D cost |
OptimizePartition(block.GetSubblock( | |
t'b,0,0,0,tb-t'b,s'b,vb,ub), lambda); | |
JV += | Points to the sub-block position; Returns the View Lagrangian R-D cost |
OptimizePartition(block.GetSubblock( | |
t'b,s'b,0,0,tb-t'b,sb-s'b,vb,ub), lambda); | |
} |
|
|
|
if((J0 < JS)&&(J0 < JV)) { | Returns: transform flag (Figure B.11, Table B.30) |
partitionString = cat(partitionString, transformFlag); |
|
return partitionString, J0; | Returns the Lagrangian cost of transforming the block and the transform flag (Figure B.11, Table B.30) |
} |
|
if((JS < J0)&&(JS < JV)) { |
|
partitionString = cat(partitionString, spatialSplitFlag); |
|
return partitionString, JS; | Returns the Lagrangian cost of the spatial segmentation and the spatialSplit flag (Figure B.11, Table B.30) |
} |
|
if((JV < JS)&&(JV < J0)) { |
|
partitionString = cat(partitionString, viewSplitFlag); |
|
return partitionString, JV; | Returns the Lagrangian cost of the view segmentation and the viewSplit flag (Figure B.11, Table B.30) |
} |
|
} |
|
Table B.6 — 4D block encode partition procedure
Procedure EncodePartition(partitionString, block) { | Encodes a 4D block |
blockDCT = 4DDCT(block); | Defined in Annex B.2.3.2 |
ScaleBlock(blockDCT); | Scaling of the 4D-DCT coefficients components by Spscc (Annex B.3.2.6.3 and Table B.3) |
if( (tb,sb,vb,ub) == (tk,sk,vk,uk)) { | If the block size is equal to the maximum block size |
point to the start of the partitionString; |
|
} |
|
get the partitionFlag at the current position of the |
|
partitionString; | |
advance the pointer to the partitionString by one position; |
|
|
|
if(partitionFlag == 0) EncodeBit(0,0); | Transmits the partitionFlag; |
if(partitionFlag == 1){ |
|
EncodeBit(1,0); |
|
EncodeBit(0,0); |
|
} |
|
if(partitionFlag == 2){ |
|
EncodeBit(1,0); |
|
EncodeBit(1,0); |
|
} |
|
|
|
if(partitionFlag == transformFlag) { | Transform flag (Figure B.11, Table B.30) |
OptimizeHexadecaTree(block, maxBitPlane, lambda, segmentationString); | Defined in Table B.7 |
set the segmentationStringPointer to the start of the |
|
segmentationString; | |
EncodeHexadecaTree(block, maxBitPlane); | Defined in Table B.8 |
} |
|
|
|
if(partitionFlag == spatialSplitFlag) { |
|
v'b = floor(vb/2); |
|
u'b = floor(ub/2); |
|
EncodePartition(block.GetSubblock( |
|
0,0,0,0,tb,sb,v'b,u'b)); | |
EncodePartition(block.GetSubblock( |
|
0,0,0,u'b,tb,sb,v'b,ub-u'b)); |
|
EncodePartition(block.GetSubblock( |
|
0,0,v'b,0,tb,sb,vb-v'b,u'b)); |
|
EncodePartition(block.GetSubblock( |
|
0,0,v'b,u'b,tb,sb,vb-v'b,ub-u'b)); |
|
} |
|
if(partitionFlag == viewSplitFlag) { |
|
t'b = floor(tb/2); |
|
s'b = floor(sb/2); |
|
EncodePartition(block.GetSubblock( |
|
0,0,0,0,t'b,s'b,vb,ub)); |
|
EncodePartition(block.GetSubblock( |
|
0,s'b,0,0,t'b,sb-s'b,vb,ub)); |
|
EncodePartition(block.GetSubblock( |
|
t'b,0,0,0,tb-t'b,s'b,vb,ub)); |
|
EncodePartition(block.GetSubblock( |
|
t'b,s'b,0,0,tb-t'b,sb-s'b,vb,ub)); |
|
} |
|
return; |
|
} |
|
Table B.7 — 4D block hexadeca-tree optimization procedure
Procedure OptimizeHexadecaTree(block, bitplane, lambda,segmentationString) { | Recursive Hexadeca-tree optimization procedure |
if(bitplane < InferiorBitPlane) { |
|
return the sum of the squared values of the coefficients | Energy of the block |
of the block; |
|
} |
|
|
|
if(the block is of size (tb,sb,vb,ub) == (1,1,1,1)) { | If the block size is 1x1x1x1 then: |
estimate the rate R to encode the remaining bits of the | estimate the rate (R) to encode the coefficient |
coefficient, from bitplane down to MinimumBitPlane; | |
evaluate the distortion D, as the squared error between the | evaluate the distortion D; |
coefficient represented with minimumBitPlane precision | |
and the full precision coefficient; | |
return J = D + lambda × R | return the Rate-Distortion cost |
} |
|
J0 = infinity; |
|
J1 = infinity; |
|
if (the magnitudes of all coefficients of the block | If the magnitudes of all elements of the block are smaller than 2^bitplane |
are less than 1 << bitplane) { | |
lowerSegmentationString = ""; | The lower segmentation string is null |
J0 = OptimizeHexadecaTree(block, bitplane-1, lambda, |
|
lowerSegmentationString); |
|
} |
|
else { | Subdivides the block in at most 16 non-overlapping sub-blocks t’b, s’b, v’b and u’b (Figure B.11) by splitting in half at each dimension whenever the length at that dimension is greater than one |
t'b = floor(tb/2); |
|
s'b = floor(sb/2); |
|
v'b = floor(vb/2); |
|
u'b = floor(ub/2); |
|
|
nseg_t = nseg_s = nseg_v = nseg_u = 1; |
|
if(tb > 1) nseg_t++; |

if(sb > 1) nseg_s++; |

if(vb > 1) nseg_v++; |

if(ub > 1) nseg_u++; |
|
splitSegmentationString = ""; | The split segmentation string is initially empty |
J1 = 0; |
|
for(t = 0; t < nseg_t; t++) { |
|
for(s = 0; s < nseg_s; s++) { |
|
for(v = 0; v < nseg_v; v++) { |
|
for(u = 0; u < nseg_u; u++) { |
|
new_t = t×t'b + (1-t)×(tb-t'b); |
|
new_s = s×s'b + (1-s)×(sb-s'b); |
|
new_v = v×v'b + (1-v)×(vb-v'b); |
|
new_u = u×u'b + (1-u)×(ub-u'b); |
|
get subBlock from block. | The subBlock size is (new_t, new_s, new_v, new_u) and the position is (t×(tb−t'b), s×(sb−s'b), v×(vb−v'b), u×(ub−u'b)) |
J1 += |
|
OptimizeHexadecaTree(subBlock, bitplane, lambda, |
|
splitSegmentationString); |
|
} |
|
} |
|
} |
|
} |

} | Closes the else branch opened for the split case |
|
J2 = 0; |
|
J2 = sum of the squared values of the coefficients of the block + | Energy of the block plus the Lagrangian multiplier times the rate to encode the zeroBlock flag (Table B.30). |
+ lambda × rate to encode flag zeroBlock flag; | |
J0 += lambda × rate to encode the lowerBitPlane flag | The cost to encode the lowerBitplane Flag (Table B.30). |
J1 += lambda × rate to encode the splitBlock flag | The cost to encode the splitBlock flag (Table B.30). |
if(J0 < J1 && J0 < J2) { |
|
segmentationString = cat(segmentationString, lowerBitplane, |
|
lowerSegmentationString); |
|
return J0, segmentationString | Returns the Lagrangian cost and the optimal segmentation string |
} |
|
if(J1 < J0 && J1 < J2) { |
|
segmentationString = cat(segmentationString, splitBlock, splitSegmentationString); |
|
return J1, segmentationString | Returns the Lagrangian cost and the optimal segmentation string |
} |
|
if(J2 < J0 && J2 < J1) { |
|
segmentationString = cat(segmentationString, zeroBlock); |
|
return J2, segmentationString | Returns the Lagrangian cost and the optimal segmentation string |
} |
|
} |
|
Table B.8 — 4D block hexadeca-tree encoding procedure
Procedure EncodeHexadecaTree(block, bitplane) { | Encodes the resulting blocks from the hexadeca-tree structure and associated flags. |
if(bitplane < InferiorBitPlane) return | Below the lowest level |
if(the block is of size (tb,sb,vb,ub) == (1,1,1,1)) { |
|
EncodeCoefficient(block, bitplane); | Defined in Table B.9 |
return; | A 1×1×1×1 block has no further descendants |
} |
|
get the segmentationFlag at the current position of the segmentationString; |
|
advance the pointer to the segmentationString by one position; |
|
|
|
EncodeBit((segmentationFlag>>1) & 01, 33 + 2×bitplane); | Transmits the segmentationFlag (Table B.12) |
UpdateModel((segmentationFlag>>1) & 01, 33 + 2×bitplane); | Defined in Table B.42 |
|
|
if(((segmentationFlag>>1) & 01) == 0){ |
|
EncodeBit((segmentationFlag & 01), 34 + 2×bitplane); | Defined in Table B.12 |
UpdateModel((segmentationFlag & 01), 34 + 2×bitplane); | Defined in Table B.42 |
} |
|
if(segmentationFlag == zeroBlock) return; |
|
|
|
if (segmentationFlag == lowerBitPlane) { | Lowers the bit-plane |
EncodeHexadecaTree(block, bitplane-1); |
|
} |
|
if (segmentationFlag == splitBlock) { |
|
t'b = floor(tb/2); |
|
s'b = floor(sb/2); |
|
v'b = floor(vb/2); |
|
u'b = floor(ub/2); |
|
nseg_t = nseg_s = nseg_v = nseg_u = 1; | Number of segments in each 4D dimension |
if(tb > 1) nseg_t++; |
|
if(sb > 1) nseg_s++; |
|
if(vb > 1) nseg_v++; |
|
if(ub > 1) nseg_u++; |
|
|
|
for(t = 0; t < nseg_t; t++) { |
|
for(s = 0; s < nseg_s; s++) { |
|
for(v = 0; v < nseg_v; v++) { |
|
for(u = 0; u < nseg_u; u++) { |
|
new_t = t×t'b + (1-t)×(tb-t'b); |
|
new_s = s×s'b + (1-s)×(sb-s'b); |
|
new_v = v×v'b + (1-v)×(vb-v'b); |
|
new_u = u×u'b + (1-u)×(ub-u'b); |
|
get subBlock from block at position | The subBlock size is (new_t, new_s, new_v, new_u) and the position is (t×(tb−t'b), s×(sb−s'b), v×(vb−v'b), u×(ub−u'b)) |
(t×(tb−t'b), s×(sb−s'b), v×(vb−v'b), u×(ub−u'b)) | |
EncodeHexadecaTree(subBlock, bitplane); |
|
} |
|
} |
|
} |
|
} |
|
} |
|
} |
|
Table B.9 — 4D block hexadeca-tree encode coefficient procedure
Procedure EncodeCoefficient(coefficient, bitplane){ | Uses the arithmetic encoder to encode the coefficient bit using the value of the bit-plane as context. |
Magnitude = |Coefficient|; |
|
for(bitplane_counter=bitplane ; |
|
bitplane_counter>=MinimumBitPlane; |
|
bitplane_counter--) { |
|
CoefficientBit = (Magnitude >> bitplane_counter) & 01H; |
|
EncodeBit(CoefficientBit, bitplane_counter + 1); | Transmits CoefficientBit (Table B.12); the context matches the decoder side (Table B.33) |
UpdateModel(CoefficientBit, bitplane_counter + 1); | Defined in Table B.42 |
} |
|
if (Magnitude > 0) { |
|
if(Coefficient > 0) EncodeBit(0,0); | Transmits signal (transmits '0') (Table B.12 |
else EncodeBit(1,0); | Transmits signal (transmits '1') (Table B.12) |
} |
|
} |
|
Table B.10 — 4D-block hexadeca-tree encode minimum bit-plane procedure
Procedure EncodeMinimumBitPlane() { | Encodes the 8-bit unsigned representation of the minimum bit-plane (MinimumBitPlane) |
for(bitplane_counter=7 ; bitplane_counter>=0; |
|
bitplane_counter--) { |
|
Bit = (MinimumBitPlane >> bitplane_counter) & 01H; |
|
EncodeBit(Bit,0); | Transmits Bit (Table B.12) |
} |
|
} |
|
Table B.11— 4D block optimal bit-plane procedure
Procedure EvaluateOptimalbitPlane(MinimumBitPlane, block, lambda) { | Returns the minimal bit-plane |
Jmin = infinity; |
|
MinimumBitPlane = MaximumBitPlane; |
|
AccumulatedRate = 0.0; |
|
for(bitplane = MaximumBitPlane; bitplane >= 0; bitplane--) { |
|
Distortion = 0.0 |
|
for all pixels in block { |
|
AccumulatedRate += rate to encode bit at current bitplane for the current pixel; |
|
Distortion += distortion incurred by encoding the current pixel with bitplane precision; |

} |

J = Distortion + lambda × AccumulatedRate; |

if(J <= Jmin) { |

MinimumBitPlane = bitplane; |

Jmin = J; |

} |
|
} |
|
return(MinimumBitPlane); |
|
} |
|
Table B.12 —Encode bit procedure
Procedure EncodeBit(inputbit, modelIndex) { | Encodes the bit that composes the codestream (for modelIndex see Table B.31) |
length = floor((superiorLimit - inferiorLimit + 1) × probability of symbol 0 for modelIndex); | The probability of symbol 0 is estimated from the acumFreq_0 and acumFreq_1 vectors of modelIndex |

if(inputbit == 0) superiorLimit = inferiorLimit + length - 1; |

else inferiorLimit = inferiorLimit + length; |
|
while((MSB of inferiorLimit == MSB of superiorLimit) || ((inferiorLimit >= 4000H) && (superiorLimit < C000H))) { | Renormalization loop |
|
if(MSB of inferiorLimit == MSB of superiorLimit) { |
|
bit = MSB of inferiorLimit; |
|
transmits bit |
|
inferiorLimit = inferiorLimit << 1; | Shifts a zero into the LSB |
superiorLimit = superiorLimit << 1; | Shifts a zero into the LSB |
superiorLimit = superiorLimit + 1; |
|
while(ScalingsCounter > 0) { |
|
ScalingsCounter = ScalingsCounter - 1; |
|
transmits (1-bit) |
|
} |
|
} |
|
if((inferiorLimit >= 4000H) && (superiorLimit < C000H)) { |
|
inferiorLimit = inferiorLimit << 1; | Shifts a zero into the LSB |
superiorLimit = superiorLimit << 1; | Shifts a zero into the LSB |
superiorLimit = superiorLimit+1; |
|
inferiorLimit = inferiorLimit ^ 8000H; |
|
superiorLimit = superiorLimit ^ 8000H; |
|
ScalingsCounter = ScalingsCounter + 1; |
|
} |
|
} |
|
} |
|
Table B.13 —Init encoder procedure
Procedure InitEncoder() { | Initializes the Arithmetic Encoder |
inferiorLimit = 0; | Inferior limit of the interval length |
superiorLimit = FFFFH; | Superior limit of the interval length |
ScalingsCounter = 0; |
|
} |
|
Table B.14 —Flush encoder procedure
Procedure FlushEncoder() { | When the encoding is complete, the bits in the buffer must be moved to output codestream before a terminating marker is generated. |
ScalingsCounter++; |
|
if(inferiorLimit >= 4000H) |
|
bit = 1; |
|
else |
|
bit = 0; |
|
transmit bit; |
|
while(ScalingsCounter > 0) { |
|
transmits (1-bit); |
|
ScalingsCounter--; |
|
} |
|
} |
|
In this subclause, the decoding procedure of the codestreams for light field data encoded with the 4D Transform mode is specified. The codestream is signalled as payload of the Contiguous Codestream box defined in Annex A.3.4. Figure B.12 illustrates the decoder architecture.
Figure B.12 — 4D transform mode light field decoder architecture
This subclause specifies the marker and marker segment syntax and semantics defined by this document. These markers and marker segments provide codestream information for this document. Further, this subclause provides a marker and marker segment syntax that is designed to be used in future specifications that include this document as a normative reference.
This document does not include a definition of conformance. The parameter values of the syntax described in this annex are not intended to portray the capabilities required to be compliant.
This document uses markers and marker segments to delimit and signal the characteristics of the source image and codestream. This set of markers and marker segments is the minimal information needed to achieve the features of this document and is not a file format. A minimal file format is offered in Annexes A and B.
The main header is a collection of markers and marker segments. It is found at the beginning of the codestream.
Every marker is two bytes long. The first byte consists of a single 0xFF byte. The second byte denotes the specific marker and can have any value in the range 0x01 to 0xFE. Many of these markers are already used in ITU-T Rec. T.81 | ISO/IEC 10918-1, ITU-T Rec. T.84 | ISO/IEC 10918-3, ITU-T Rec. T.800 | ISO/IEC 15444-1 and ITU-T Rec. T.801 | ISO/IEC 15444-2 and shall be regarded as reserved unless specifically used.
A marker segment includes a marker and associated parameters, called marker segment parameters. In every marker segment the first two bytes after the marker shall be an unsigned value that denotes the length in bytes of the marker segment parameters (including the two bytes of this length parameter but not the two bytes of the marker itself). When a marker segment that is not specified in this document appears in a codestream, the decoder shall use the length parameter to discard the marker segment.
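EXAMPLE Marker detection and the skipping of an unknown marker segment can be sketched in C as follows (informative; the function names are illustrative, and the sketch assumes the general 16-bit length field described above):

#include <stdio.h>
#include <stdint.h>

/* A marker is two bytes: 0xFF followed by a code in the range 0x01 to 0xFE. */
static int is_marker(uint8_t b0, uint8_t b1)
{
    return (b0 == 0xFF) && (b1 >= 0x01) && (b1 <= 0xFE);
}

/* Skips an unknown marker segment using its 16-bit big-endian length field,
   which includes the two bytes of the length parameter itself. */
static int skip_marker_segment(FILE *f)
{
    int hi = fgetc(f);
    int lo = fgetc(f);
    if (hi == EOF || lo == EOF)
        return -1;
    long length = ((long)hi << 8) | (long)lo;
    return fseek(f, length - 2, SEEK_CUR);    /* skip the segment parameters */
}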
Each marker segment is described in terms of its function, usage, and length. The function describes the information contained in the marker segment. The usage describes the logical location and frequency of this marker segment in the codestream. The length describes which parameters determine the length of the marker segment.
These descriptions are followed by a figure that shows the order and relationship of the parameters in the marker segment. Figure B.13 shows an example of this type of figure. The marker segments are designated by the three-letter code of the marker associated with the marker segment. The parameter symbols have capital letter designations followed by the marker's symbol in lower-case letters. A rectangle is used to indicate a parameter's location in the marker segment. The width of the rectangle is proportional to the number of bytes of the parameter. A shaded rectangle (diagonal stripes) indicates that the parameter is of varying size. Two parameters with superscripts and a grey area between indicate a run of several of these parameters.
Figure B.13 — Example of the marker segment description figures
The figure is followed by a list that describes the meaning of each parameter in the marker segment. If parameters are repeated, the length and nature of the run of parameters is defined. As an example, in Figure B.13, the first rectangle represents the marker with the symbol MAR. The second rectangle represents the size of the length parameter SLmar (Table B.15). The third rectangle represents the length parameter Lmar. Parameters Amar, Bmar, Cmar, and Dmar are 8-, 16-, 32-bit and variable length respectively. The notation Emari implies that there are n different parameters, Emari, in a row.
After the list is a table that either describes the allowed parameter values or provides references to other tables that describe these values. Tables for individual parameters are provided to describe any parameter without a simple numerical value. In some cases, these parameters are described by a bit value in a bit field. In this case, an "x" is used to denote bits that are not included in the specification of the parameter or sub-parameter in the corresponding row of the table.
Table B.15 — Size parameters for the SLlfc, SLscc and SLpnt
Value (bits) | Parameter size | |
MSB | LSB | |
xxxx xx00 | Length parameter is 16 bits. | |
xxxx xx01 | Length parameter is 32 bits. | |
xxxx xx10 | Length parameter is 64 bits. | |
| All other values reserved | |
Table B.16 lists the markers specified in this document.
Table B.16 — List of defined marker segments
| Symbol | Code | Main Header | 4D Block Header |
Start of codestream | SOC | 0xFFA0 | Required | Not allowed |
Light field Configuration | LFC | 0xFFA1 | Required | Not allowed |
Colour component scaling | SCC | 0xFFA2 | Optional | Optional |
Codestream pointer set | PNT | 0xFFA3 | Optional | Not allowed |
Start of block | SOB | 0xFFA4 | Not allowed | Required |
End of codestream | EOC | 0xFFD9 | Not allowed | Not allowed |
Figure B.14 shows the construction of the codestream. The solid lines show required marker segments. The following markers and marker segments are required to be in a specific location: SOC, LFC, SOB and EOC; the PNT marker segment, when present, also has a fixed location (between the LFC marker segment and the first SOB marker segment). The dashed lines show optional or conditionally required marker segments.
Figure B.14 — Codestream structure
The delimiting marker and marker segments shall be present in all codestreams conforming to this document. Each codestream has only one SOC marker, one EOC marker, and contains at least one 4D block. Each 4D block has one SOB marker. The SOC, SOB, and EOC are delimiting markers, not marker segments, and have no explicit length information or other parameters.
Function: Marks the beginning of a codestream specified in this document (Table B.17).
Usage: Main header. This is the first marker in the codestream. There shall be only one SOC per codestream.
Length: Fixed.
SOC | marker code |
Table B.17 — Start of codestream parameter values
Field name | Size (bits) | Value |
SOC (Start of Codestream) | 16 | 0xFFA0 |
Function: Provides information about the uncompressed light field such as the width and height of the subaperture views, number of subaperture views in rows and columns, number of components, component bit depth, number of 4D blocks and size of the 4D blocks, component bit depth for transform coefficients (Figure B.15):
— Llfc: The length of the marker segment in bytes; its value grows by two bytes (one for Ssizi and one for max_bitplanei) per component (see Table B.18).
— ROWS (T): The value of this parameter indicates the number of rows of the subaperture view array. This field is stored as a 4-byte big-endian unsigned integer.
— COLUMNS (S): The value of this parameter indicates the number of columns of the subaperture view array. This field is stored as a 4-byte big-endian unsigned integer.
— HEIGHT (V): The value of this parameter indicates the height of the sample grid. This field is stored as a 4-byte big-endian unsigned integer.
— WIDTH (U): The value of this parameter indicates the width of the sample grid. This field is stored as a 4-byte big-endian unsigned integer.
— NC: This parameter specifies the number of components in the codestream and is stored as a 2-byte big-endian unsigned integer. The value of this field shall be equal to the value of the NC field in the Light Field Header box. If no Channel Definition Box is available, the order of the components for colour images is R-G-B-Aux or Y-U-V-Aux.
— Ssizi: The precision (bit depth) and sign of the samples of the ith component before DC level shifting is performed (i.e. the precision of the original component samples before any processing is performed). If the component sample values are signed, then the range of component sample values is −2^(Ssizi AND 0x7F) ≤ component sample value ≤ 2^(Ssizi AND 0x7F) − 1. There is one occurrence of this parameter for each component. The order corresponds to the component's index, starting with zero.
— TRNC: If unset (TRNC = 0), all 4D blocks initially have the same fixed size and the padding procedure (Table B.2) is applied to blocks exceeding the light field dimensions; if set (TRNC = 1), boundary blocks have their sizes truncated to fit the light field dimensions.
Usage: Main header. There shall be one and only one in the main header immediately after the SOC marker segment. There shall be only one LFC per codestream.
Length: Fixed.
Key
LFC | marker code |
SLlfc | size of Llfc parameter |
Llfc | length of marker segment in bytes (not including the marker) |
ROWS (T) | number of rows of the subaperture view array |
COLUMNS (S) | number of columns of the subaperture view array |
HEIGHT (V) | subaperture view height |
WIDTH (U) | subaperture view width |
NC | number of (colour) components |
Ssizi | precision (depth) in bits and sign of the ith component samples |
N_4D | number of 4D blocks in which the light field is segmented |
BLOCK-SIZE_t | size of the 4D block in the t direction – number of rows of the view array |
BLOCK-SIZE_s | size of the 4D block in the s direction – number of columns of the view array |
BLOCK-SIZE_v | size of the 4D block in the v direction – height of the views |
BLOCK-SIZE_u | size of the 4D block in the u direction – width of the views |
max_bitplanei | precision (depth) in bits of the ith component for the transform coefficients |
TRNC | flag indicating that 4D blocks with dimensions spanning the full available light field dimensions have their sizes truncated to fit in the light field dimensions |
LFC | marker code |
SLlfc | size of Llfc parameter |
Figure B.15 — Light field configuration syntax
Table B.18 — Format of the contents of configuration parameter set for the 4D coding
Field name | Size (bits) | Value |
LFC | 16 | 0xFFA1 |
SLlfc | 8 | 0 |
Llfc | 16 | 42 to 32808 |
ROWS | 32 | 1 to (232– 1) |
COLUMNS | 32 | 1 to (232– 1) |
HEIGHT | 32 | 1 to (232– 1) |
WIDTH | 32 | 1 to (232– 1) |
NC | 16 | 1 to 16 384 |
Ssizi | 8 | See Table B.19 |
N_4D | 32 | 1 to (232– 1) |
BLOCK-SIZE_t | 32 | 1 to (232– 1) |
BLOCK-SIZE_s | 32 | 1 to (232– 1) |
BLOCK-SIZE_v | 32 | 1 to (232– 1) |
BLOCK-SIZE_u | 32 | 1 to (232– 1) |
max_bitplanei | 8 | Number less than or equal to |
TRNC | 8 | 0 or 1 |
Table B.19 — Component Ssiz parameter
Value (bits) | Component sample precision | |
MSB | LSB | |
x000 0000 | Component sample bit depth = value + 1. From 1 bit deep to 38 bits deep respectively (counting the sign bit, if appropriate). | |
0xxx xxxx | Component sample values are unsigned values. | |
1xxx xxxx | Component sample values are signed values. | |
| All other values reserved for ISO/IEC use. | |
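EXAMPLE The interpretation of an Ssiz byte (Table B.19) can be sketched in C as follows (informative; the function name is illustrative):

#include <stdint.h>

static void parse_ssiz(uint8_t ssiz, int *bit_depth, int *is_signed)
{
    *bit_depth = (ssiz & 0x7F) + 1;   /* component sample bit depth = value + 1 */
    *is_signed = (ssiz >> 7) & 1;     /* MSB set: signed sample values          */
}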
Function: Describes the global scaling factor used for a colour component (Table B.20). If this marker segment is not signalled for a specific colour component, the value of the global scaling factor for this colour component shall be 1.
Usage: Main header. No more than one per any given component may be present. Optional.
Length: Variable depending on the number of colour components.
SCC | marker code. Table B.20 shows the size and values of the symbol and parameters for the colour component scaling marker segment. |
SLscc | size of Lscc parameter |
Lscc | length of marker segment in bytes (not including the marker) Lscc = 6 for NC < 257; Lscc = 8 for NC ≥ 257 |
Cscc | index of the component to which this marker segment relates The components are indexed 0, 1, 2, etc. (either 8 or 16 bits depending on NC value defined in the LFC marker segment). |
Spscc | Scaling factor used for the colour component Cscc (see Table B.21) |
|
Table B.20 — Colour component scale parameter values
Field name | Size (bits) | Value |
SCC | 16 | 0xFFA2 |
SLscc | 8 | 0 |
Lscc | 16 | 6 or 8 |
Cscc | 8 or 16 | 0 to 255 if NC < 257; 0 to 16 383 if NC ≥ 257 |
Spscc | 16 | Table B.21 |
Table B.21 — Quantization values for the Spscc parameter
Value (bits) | Scaling factor values | |
MSB | LSB | |
xxxx x000 0000 0000 | mantissa of scaling factor value | |
0000 0xxx xxxx xxxx | exponent of scaling factor value | |
Function: Provides pointers to the codestream associated with each 4D block to facilitate efficient access (Table B.22).
Usage: Optional. If present, must be included in the main header between the LFC marker segment and the first SOB marker segment defined in this document. Only one PNT marker segment shall be embedded in the main header (Figure B.16).
Length: Variable.
Key
PNT marker code
SLpnt size of Lpnt parameter
Lpnt length of marker segment in bytes (not including the marker)
Spnt size of the PPnt parameter
PPnti,c pointer to codestream of 4D block and colour component c
NOTE 1 The value of the Lpnt parameter is determined by the number of 4D blocks N_4D (defined in the LFC marker segment), the number of colour components and the size of the PPnt parameters (see Spnt).
NOTE 2 The PPnti,c pointer indicates the position of the addressed 4D block codestream – its SOB marker – in the Contiguous Codestream box counting from the beginning of this box, i.e. the LBox field.
Figure B.16 — Codestream pointer set syntax
Table B.22 — Format of the contents of the codestream pointer set for the 4D coding
Field name | Size (bits) | Value |
PNT | 16 | 0xFFA3 |
SLpnt | 8 | 2 |
Lpnt | 64 | 13 to |
Spnt | 8 | Table B.23 |
PPnti | 32 if Spnt = 0 | 1 to (232– 1) |
Table B.23 — Size parameters for Spnt
Value (bits) | Parameter size | |
MSB | LSB | |
xxxx xxx0 | PPnt parameters are 32 bits. | |
xxxx xxx1 | PPnt parameters are 64 bits. | |
| All other values reserved. | |
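EXAMPLE Random access to a 4D block codestream through a PPnt pointer can be sketched in C as follows (informative; the names are illustrative). The pointer is counted from the beginning of the Contiguous Codestream box, i.e. its LBox field (see NOTE 2 above):

#include <stdio.h>
#include <stdint.h>

static int seek_to_block(FILE *f, long box_start, uint64_t ppnt)
{
    /* Positions the stream on the SOB marker of the addressed 4D block. */
    return fseek(f, box_start + (long)ppnt, SEEK_SET);
}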
Function: Marks the beginning of a 4D block (Table B.24).
Usage: Every 4D block header. Shall be the first marker segment in a 4D Block header. There shall be at least one SOB in a codestream. There shall be only one SOB per 4D block.
Length: Fixed.
SOB | marker code |
Table B.24 — Format of the contents of the start of block
Field name | Size (bits) | Value |
SOB | 16 | 0xFFA4 |
Function: Indicates the end of the codestream (Table B.25).
NOTE 1 This marker shares the same code as the EOI marker in ITU-T Rec. T.81 | ISO/IEC 10918-1 and the EOC marker in ITU-T Rec. T.800 | ISO/IEC 15444-1.
Usage: Shall be the last marker in a codestream. There shall be one EOC per codestream.
NOTE 2 In the case a file has been corrupted, it is possible that a decoder could extract much useful compressed image data without encountering an EOC marker.
Length: Fixed.
EOC | marker code |
Table B.25 — Format of the contents of end of codestream
Field name | Size (bits) | Value |
EOC | 16 | 0xFFD9 |
The procedure to parse and decode a light field codestream as contained by the Contiguous Codestream box (Annex A.3.4), with dimensions (max_t, max_s, max_v, max_u) defined as ROWS, COLUMNS, HEIGHT and WIDTH in Figure B.15 (Table B.18), with block dimensions (tk, sk, vk, uk) defined as BLOCK-SIZE_t, BLOCK-SIZE_s, BLOCK-SIZE_v and BLOCK-SIZE_u in Figure B.15 (Table B.18), and with a maximum number of bit-planes max_bitplane (defined in Figure B.15 (Table B.18)), is described in the pseudo-code of Table B.26 (all variables are integers). The scan order is t, s, v, u, as described in Table B.26.
The coordinate set (t, s, v, u) refers to the top-left corner of the 4D block. The procedure “ResetArithmeticDecoder()” resets all the context model probabilities of the arithmetic decoder. The procedure “LocateContiguousCodestream()” reads the pointer corresponding to position (t,s,v,u) and delivers the respective codestream to the procedure “DecodeContiguousCodestream()”. The procedure “DecodeContiguousCodestream()” decodes the components of the block contained in the codestream, enabling the sequential decoding of the light field. The codestreams of the components are decoded sequentially.
Table B.26 — JPEG Pleno (JPL) codestream structure for the 4D transform mode
Defined syntax | Simplified structure |
LightField() { |
|
SOC_marker() | Codestream_Header() |
LFC_marker() | |
Read all SCC_marker() | |
PNT_marker() | |
for(t=0; t<T; t+=BLOCK-SIZE_t ){// scan order on t | Codestream_Body() |
for(s=0; s<S; s+= BLOCK-SIZE_s){ // scan order on s |
|
for(v=0; v<V; v+= BLOCK-SIZE_v){ // scan order on v |
|
for(u=0; u<U; u+= BLOCK-SIZE_u){ // scan order on u |
|
for(c=0; c<NC; c++){ // scan order on colour components |
|
// Initializes the arithmetic decoder for each decoded 4D block codestream |
|
ResetArithmeticDecoder(); |
|
// Finds the corresponding 4D block codestream for the desired position on the light field |
|
LocateContiguousCodestream (t, s, v, u); |
|
SOB_marker() |
|
// Decodes contiguous 4D block codestream found by the previous procedure |
|
if ( (TRNC) && ((T – BLOCK-SIZE_t) < t < T) ) { |
|
tk = T umod BLOCK-SIZE_t; } | |
else tk = BLOCK-SIZE_t; | |
if ( (TRNC) && ((S – BLOCK-SIZE_s) < s < S) ) { |
|
sk = S umod BLOCK-SIZE_s; } | |
else sk = BLOCK-SIZE_s; | |
if ( (TRNC) && ((V – BLOCK-SIZE_v) < v < V) ) { |
|
vk = V umod BLOCK-SIZE_v; } | |
else vk = BLOCK-SIZE_v; | |
if ( (TRNC) && ((U – BLOCK-SIZE_u) < u < U) ) { |
|
uk = U umod BLOCK-SIZE_u; } | |
else uk = BLOCK-SIZE_u; | |
LF.BlockAtPosition(t, s, v, u) = |
|
DecodeContiguousCodestream(tk, sk, vk, uk); | |
} // end of scan order on colour components loop |
|
} // end of scan order on u loop |
|
} // end of scan order on v loop |
|
} // end of scan order on s loop |
|
} // end of scan order on t loop |
|
EOC_marker() | Codestream_End() |
} |
The datasets (all texture views in Figure B.1) are composed of 4D light fields of dimensions t×s×v×u. The views are addressed by the s,t coordinate pair, while the u,v pair addresses a sample within each s,t view, as pictured in Figure B.6.
The root node of the tree corresponds to a full t×s×v×u transform. The partition tree is represented by a series of ternary flags:
— | A spatialSplit flag indicates that a tk×sk×vk×uk block is segmented into a set of four sub-blocks {spatialSubblock0, spatialSubblock1, spatialSubblock2 and spatialSubblock3}, of dimensions tk×sk×⎿vk/2⏌×⎿uk/2⏌, tk×sk×(vk−⎿vk/2⏌)×⎿uk/2⏌, tk×sk×⎿vk/2⏌×(uk−⎿uk/2⏌) and tk×sk×(vk−⎿vk/2⏌)×(uk−⎿uk/2⏌) respectively. Figure B.7, Figure B.8 and Figure B.17 illustrate the results when applying the spatialSplit flag; |
— | A viewSplit flag indicates that a tk×sk×vk×uk block is segmented into a set of four sub-blocks {viewSubblock0, viewSubblock1, viewSubblock2 and viewSubblock3}, of dimensions ⎿tk/2⏌×⎿sk/2⏌×vk×uk, (tk−⎿tk/2⏌)×⎿sk/2⏌×vk×uk, ⎿tk/2⏌×(sk−⎿sk/2⏌)×vk×uk and (tk−⎿tk/2⏌)×(sk−⎿sk/2⏌)×vk×uk respectively. Figure B.9, Figure B.10 and Figure B.18 illustrate the results when applying the viewSplit flag; |
— | The partition tree has its leaf nodes marked by a transform flag. Each sub-block that is not a leaf is recursively decoded in this fashion, from sub-block 0 to 3 of the sub-block set, and the decoded flags of the sub-trees of each sub-block are concatenated in this order. |
NOTE The transform flag signals that the node is a leaf node and will be no further partitioned.
— Figure B.11 shows six nodes marked by a transform flag, two nodes split (segmented) according to the spatialSplit and the viewSplit flags, and the 9th node, of dimensions ⎿tk/2⏌×(sk−⎿sk/2⏌)×⎿vk/2⏌×⎿uk/2⏌, which can be further segmented using either the spatialSplit flag or the viewSplit flag.
Figure B.17 depicts the result of a 4D block with dimensions of 9×9×434×625 (tk×sk×vk×uk) partitioned, using the spatialSplit flag, into a sub-block with dimensions of 9×9×217×312, in grey (tk×sk×(vk−⎿vk/2⏌)×⎿uk/2⏌). Note that, in Figure B.17, the partitioned v and u dimensions are: 217 = 434 − ⎿434/2⏌ (vk−⎿vk/2⏌) and 312 = ⎿625/2⏌ (⎿uk/2⏌).
Figure B.18 depicts the result of a 4D block with dimensions of 9×9×434×625 (tk×sk×vk×uk) partitioned, using the viewSplit flag, into a sub-block with dimensions of 5×4×434×625, in grey ((tk−⎿tk/2⏌)×⎿sk/2⏌×vk×uk). Note that the partitioned t and s dimensions are: 5 = 9 − ⎿9/2⏌ (tk−⎿tk/2⏌) and 4 = ⎿9/2⏌ (⎿sk/2⏌).
Figure B.17 — Example of a 4D block with dimensions tk×sk×(vk−⎿vk/2⏌)×⎿uk/2⏌ (in grey) superimposed on a 4D block with dimensions tk×sk×vk×uk
Figure B.18 — Example of a 4D block with dimensions (tk−⎿tk/2⏌)×⎿sk/2⏌×vk×uk (in grey) superimposed on a 4D block with dimensions tk×sk×vk×uk
For a contiguous codestream of a 4D-block with size tk×sk×vk×uk, a 4D-block partitioning decoding procedure is performed (DecodeContiguousCodestream), that uses the recursive procedure “Procedure DecodePartitionStep”, both defined in Table B.27 and Table B.28.
Table B.27 — Decode contiguous codestream procedure
Procedure DecodeContiguousCodestream(tk, sk, vk, uk){ | Decodes a 4D block of size (tk, sk, vk, uk) |
|
|
ReadMinimumBitPlane(); | Reads an 8-bit integer that represents the lowest bit-plane of the transform coefficients, as defined in Table B.34 |
Block=DecodePartitionStep(0,0,0,0,tk,sk,vk,uk); | Defined in Table B.29 |
Return Block; |
|
} |
|
Table B.28 — 4D-DCT block coefficient component inverse scaling
Procedure InverseScaleBlock(block){ | Inverse scaling of the 4D-DCT coefficients of colour component c by Spscc[c] (see B.3.2.6.3) |
for(ti = 0; ti<tk; ++ti) { |
|
for(si = 0; si< sk; ++si) { |
|
for(vi = 0; vi< vk; ++vi) { |
|
for(ui = 0; ui< uk; ++ui) { |
|
block[ti][si][vi][ui] = block[ti][si][vi][ui] / Spscc[c]; |
|
} |
|
} |
|
} |
|
} |
|
} |
|
Table B.29 — Decode partition step procedure
Procedure DecodePartitionStep(tpp, spp, vpp, upp, tk, sk, vk, uk){ | Recursively decodes the 4D block and the partition flags of a contiguous codestream |
Block decodedBlock; |
|
ReadPartitionTreeFlag(); | Reads flag from arithmetic decoder – defined in Table B.37 |
if (flag == transform) { | Reached the Leaf Node (Transform flag; Figure B.11, Table B.30) |
Block = DecodeBlock(max_bitplane); | Recursively decodes the DCT coefficients (Table B.43) |
InverseScaleBlock(Block); | Inverse scaling of the 4D-DCT coefficients components by Spscc (see B.3.2.6.3) |
decodedBlock = 4D_IDCT(Block); | Performs the inverse DCT of the decoded coefficients (Annex B.3.7) |
} |
|
elseIf (flag == spatialSplit){ | Figure B.7, Figure B.8, Figure B.17 and Table B.30 |
Int new_tp, new_sp, new_vp, new_up, new_ tk, new_sk, |
|
new_vk, new_uk; | |
new_tp = tpp; |
|
new_sp = spp; |
|
new_vp = vpp; |
|
new_up = upp; |
|
new_tk = tk; |
|
new_sk = sk; |
|
new_vk = floor(vk/2); |
|
new_uk = floor(uk/2); |
|
decodedSubBlock = DecodePartitionStep(new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Recursively calls the procedure to decode the top-left sub-block |
decodedBlock.CopyFrom(decodedSubBlock, 0, 0, 0, 0, new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Fills the top-left quadrant of the decoded block with decoded sub-block |
new_up = upp +floor(uk/2); |
|
new_uk = uk – floor(uk/2); |
|
decodedSubBlock = DecodePartitionStep(new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Recursively calls the procedure to decode the top-right sub-block |
decodedBlock.CopyFrom(decodedSubBlock, 0, 0, 0, 0, new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Fills the top-right quadrant of the decoded block with decoded sub-block |
new_vp = vpp + floor(vk/2); |

new_vk = vk – floor(vk/2); |

new_up = upp; |

new_uk = floor(uk/2); |

decodedSubBlock = DecodePartitionStep(new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Recursively calls the procedure to decode the bottom-left sub-block |
decodedBlock.CopyFrom(decodedSubBlock, 0, 0, 0, 0, new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Fills the bottom-left quadrant of the decoded block with the decoded sub-block |
new_up = upp + floor(uk/2); |

new_uk = uk – floor(uk/2); |
|
decodedSubBlock = DecodePartitionStep(new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Recursively calls the procedure to decode the bottom-right sub-block |
decodedBlock.CopyFrom(decodedSubBlock, 0, 0, 0, 0, new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Fills the bottom-right quadrant of the decoded block with decoded sub-block |
} |
|
|
|
elseIf (flag == viewSplit) { | Figure B.9, Figure B.10, Figure B.18 and Table B.30 |
Int new_tp, new_sp, new_vp, new_up, new_ tk, new_sk, new_vk, new_uk; |
|
new_tp = tpp; |
|
new_sp = spp; |
|
new_vp = vpp; |
|
new_up = upp; |
|
new_tk = floor(tk/2); |
|
new_sk = floor(sk/2); |
|
new_vk = vk; |
|
new_uk = uk; |
|
decodedSubBlock = DecodePartitionStep(new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Recursively calls the procedure to decode the top-left sub-block |
decodedBlock.CopyFrom(decodedSubBlock, 0, 0, 0, 0, new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Fills the top-left quadrant of the decoded block with decoded sub-block |
new_sp = spp + floor(sk/2); |
|
new_sk = sk – floor(sk/2); |
|
decodedSubBlock = DecodePartitionStep(new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Recursively calls the procedure to decode the top-right sub-block |
decodedBlock.CopyFrom(decodedSubBlock, 0, 0, 0, 0, new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Fills the top-right quadrant of the decoded block with decoded sub-block |
new_tp = tpp + floor(tk/2); |

new_tk = tk – floor(tk/2); |

new_sp = spp; |

new_sk = floor(sk/2); |

decodedSubBlock = DecodePartitionStep(new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Recursively calls the procedure to decode the bottom-left sub-block |
decodedBlock.CopyFrom(decodedSubBlock, 0, 0, 0, 0, new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Fills the bottom-left quadrant of the decoded block with the decoded sub-block |
new_sp = spp + floor(sk/2); |

new_sk = sk – floor(sk/2); |
|
decodedSubBlock = DecodePartitionStep(new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Recursively calls the procedure to decode the bottom-right sub-block |
decodedBlock.CopyFrom(decodedSubBlock, 0, 0, 0, 0, new_tp, new_sp, new_vp, new_up, new_tk, new_sk, new_vk, new_uk); | Fills the bottom-right quadrant of the decoded block with decoded sub-block |
} |
|
return decodedBlock; |
|
} |
|
Table B.30 — Lists of partition flags representations
Partition Flag | Representation |
transform | 0 |
spatialSplit | 1 |
viewSplit | 2 |
Figure B.19 shows a simple block diagram of a binary adaptive arithmetic decoder. The compressed light field data cd and a context cx from the decoder's model unit (not shown) are input to the arithmetic decoder. The arithmetic decoder's output is the decision d. The encoder and decoder model units need to supply exactly the same context cx for each given decision. The decoding process shall first be initialized (Table B.38). The contexts (cx) and bytes of compressed light field data (as needed) are read and passed on to the decoder until all contexts have been read. The decoding process computes the binary decision d and returns a value of either 0 or 1. The estimation procedures for the probability models, which provide adaptive estimates of the probability for each context, are part of the decoding process.
Figure B.19 — Arithmetic decoder inputs and output
The contexts of the arithmetic coder are defined in Table B.31. When the adaptive flag is Off the probability model is fixed.
Table B.31 — Arithmetic coder contexts
Symbol | Context range | Adaptive Flag |
DCT coefficient sign | 0 | Off |
DCT coefficients bits | 1-32 | On |
Hexadecatree flags | 33-98 | On |
Partition flags | 0 | Off |
Each call to the arithmetic decoder to read a variable is performed according to the following syntax:
symbol=read(Context, AdaptiveFlag);
Each call to read a symbol from the stream may update the arithmetic decoder models if the adaptive flag is On. The procedure read(Context, AdaptiveFlag) is defined in Table B.32 and the variable Context is defined in Table B.31.
Table B.32 — Read bit using arithmetic decoder procedure
Procedure Read(Context, AdaptiveFlag){ | Context defined in Table B.31 |
Bit =DecodeBit(Context); | Defined in Table B.41 |
If (AdaptiveFlag == On) { |
|
If (Bit == 0) { |
|
UpdateModel(0, Context); | Defined in Table B.42 |
} |
|
Else { |
|
UpdateModel(1,Context); | Defined in Table B.42 |
} |
|
} |
|
return Bit; |
|
} |
|
Table B.33 — Read magnitude procedure
Procedure ReadMagnitude(bitplane, MinimumBitPlane){ | Reads the magnitude of a DCT coefficient |
Magnitude = 0; |
|
for(bitplane_counter=bitplane; bitplane_counter >= MinimumBitPlane; | MinimumBitPlane denotes the lowest bit-plane used for encoding (see footnote a) |
bitplane_counter--){ | |
Magnitude = Magnitude << 1; |
|
CoefficientBit =Read(bitplane_counter + 1, On); | Decodes a coefficient bit |
Magnitude += CoefficientBit; |
|
} |
|
Magnitude = Magnitude << MinimumBitPlane; |
|
if (Magnitude > 0) { |
|
Magnitude += (1 << MinimumBitPlane)/2; |
|
} |
|
return Magnitude; |
|
} |
|
a Its value is encoded as an 8-bit unsigned integer and should be less than or equal to the variable max_bitplane defined in Figure B.15. It shall be read just before a DecodePartitionStep procedure, using the procedure “ReadMinimumBitPlane” defined in Table B.34. | |
Table B.34 — Read minimum bitplane procedure
Procedure ReadMinimumBitPlane(){ | MinimumBitPlane denotes the lowest bit-plane used for encoding |
MinimumBitPlane = 0; |
|
for(counter=0 ; counter<8; counter++){ |
|
MinimumBitPlane = MinimumBitPlane << 1; |
|
Bit =Read(0, Off); | Defined in Table B.32 |
MinimumBitPlane += Bit; |
|
} |
|
return MinimumBitPlane; |
|
} |
|
If a DCT coefficient is different from zero, its sign shall be read from the stream with the procedure in Table B.35.
Table B.35 — Read sign procedure
Procedure ReadSign(){ | Reads DCT coefficient sign if ≠ 0 |
sign = read(0, Off); |
|
return sign; |
|
} |
|
Every ternary hexadecatree flag shall be read using the procedure in Table B.36:
Table B.36 — Read hexadeca-tree flag procedure
Procedure ReadHexadecatreeFlag(bitplane){ | Reads ternary hexadecatree flag |
bit1 = read(33 + 2×bitplane, On); |

bit0 = 0; |

if(bit1 == 0) |

bit0 = read(34 + 2×bitplane, On); |

flag = bit0 + 2×bit1; |
|
return flag; |
|
} |
|
Every ternary partition flag shall be read using the procedure in Table B.37.
Table B.37 — Read partition tree flag procedure
Procedure ReadPartitionTreeFlag(){ | Reads ternary partition flag: |
flag = read(0, Off); |
|
if(flag ==1){ |
|
bit = read(0, Off); |
|
if(bit == 1) |
|
flag++; |
|
} |
|
return flag; |
|
} |
|
The arithmetic decoder employs probabilistic models to decode symbols from the codestream.
Definitions and procedures to reset the arithmetic decoder, to update the probabilistic models and to read information from the stream are detailed in this subclause.
The procedures use two vectors (acumFreq_0 and acumFreq_1), defined with length MAX_NUMBER_OF_MODELS.
The codec uses 99 models numbered from 0 to 98 (i.e. MAX_NUMBER_OF_MODELS = 99).
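EXAMPLE The following C fragment is an informative sketch of the probabilistic model state implied by Tables B.39 and B.42. The array names follow the text; the file-level organization is illustrative.

#include <stdint.h>

#define MAX_NUMBER_OF_MODELS 99   /* models indexed 0..98 */

/* Cumulative frequency counts per context (Table B.39):
   acumFreq_0 counts occurrences of symbol 0, acumFreq_1 counts both
   symbols, so P(bit = 0 | context) is approximately
   acumFreq_0[cx] / acumFreq_1[cx]. */
uint32_t acumFreq_0[MAX_NUMBER_OF_MODELS];
uint32_t acumFreq_1[MAX_NUMBER_OF_MODELS];

/* Table B.39 - probabilistic model initialization */
void InitProbabilisticModel(int modelIndex) {
    acumFreq_0[modelIndex] = 1;
    acumFreq_1[modelIndex] = 2;
}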
At initialization the decoder limits are set and first bits are read from the stream as specified in the procedure in Table B.38.
Table B.38 — Initialize decoder procedure
Procedure InitDecoder(){ | Arithmetic decoder initialization |
inferiorLimit = 0; |
superiorLimit = FFFFH; |
tag = Read16bitsFromStream(); |
} |
Procedure Read16bitsFromStream(){ | Reads 16 bits from the stream |
tag = 0; |
for(n=0; n<16; n++){ |
tag = tag << 1; |
bit = Read1BitFromStream(); | Reads one byte at a time, LSB first |
tag = tag + bit; | Appends bit to tag |
} |
return tag; |
} |
|
A probabilistic model is initialized by executing the procedure in Table B.39.
Table B.39 — Initialize probabilistic model procedure
Procedure InitProbabilisticModel(modelIndex){ | Probabilistic model initialization |
acumFreq_0[modelIndex] = 1; |
|
acumFreq_1[modelIndex] = 2; |
|
} |
|
The arithmetic decoder is reset by calling the “InitDecoder” procedure (Table B.38), followed by the “InitProbabilisticModel” procedure (Table B.39) for every model, as specified in Table B.40.
Table B.40 — Reset arithmetic decoder procedure
Procedure ResetArithmeticDecoder(){ |
|
InitDecoder(); | Defined in Table B.38 |
for (counter=0; counter<99;counter++) |
|
InitProbabilisticModel(counter); |
|
} |
|
The procedure to decode a bit from the stream is described in Table B.41.
Table B.41 — Decode bit procedure
bit = DecodeBit(modelIndex){ | Decodes a bit from the codestream (modelIndex defined in Table B.31) |
threshold = floor(((tag - inferiorLimit + 1) × |
|
acumFreq_1[modelIndex]-1)/( superiorLimit - inferiorLimit + 1)); | |
length = floor(((superiorLimit - inferiorLimit + 1) × |
|
acumFreq_0[modelIndex])/acumFreq_1[modelIndex]); | |
|
|
if(threshold < acumFreq_0[modelIndex]) { |
|
bitDecoded = 0; |
|
superiorLimit = inferiorLimit + length-1; |
|
} |
|
else { |
|
bitDecoded = 1; |
|
inferiorLimit = inferiorLimit + length; |
|
} |
|
while((MSB of inferiorLimit == MSB of superiorLimit) || |

((inferiorLimit >= 4000H) && (superiorLimit < C000H))) { | |
if(MSB of inferiorLimit == MSB of superiorLimit) { |
|
inferiorLimit = inferiorLimit << 1; | Shifts a zero into the LSB |
superiorLimit = superiorLimit << 1; | Shifts a zero into the LSB |
superiorLimit = superiorLimit + 1; |
|
tag = tag << 1; | Shifts a zero into the LSB |
bit = Read1bitFromStream(); | Reads one byte at a time, LSB first |
tag += bit; |
|
inferiorLimit = inferiorLimit & FFFFH; |
|
superiorLimit = superiorLimit & FFFFH; |
|
tag = tag & FFFFH; |
|
} |
|
if((inferiorLimit >= 4000H) && (superiorLimit < C000H )) { |
|
inferiorLimit = inferiorLimit << 1; | Shifts a zero into the LSB |
superiorLimit = superiorLimit << 1; | Shifts a zero into the LSB |
superiorLimit = superiorLimit+1; |
|
tag = tag << 1; | Shifts a zero into the LSB |
bit = Read1bitFromStream(); |
|
tag+= bit; |
|
inferiorLimit = inferiorLimit ^ 8000H; |
|
superiorLimit = superiorLimit ^ 8000H; |
|
tag = tag ^ 8000H; |
|
inferiorLimit = inferiorLimit & FFFFH; |
|
superiorLimit = superiorLimit & FFFFH; |
|
tag = tag & FFFFH; |
|
} |
|
} |
|
inferiorLimit = inferiorLimit & FFFFH; |
|
superiorLimit = superiorLimit & FFFFH; |
|
tag = tag & FFFFH; |
|
return bitDecoded; |
|
} |
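EXAMPLE The following C fragment is an informative transcription of the DecodeBit procedure of Table B.41. It assumes the model arrays of Table B.39 and a caller-supplied Read1BitFromStream; it is a sketch, not a replacement for the normative table.

#include <stdint.h>

extern uint32_t acumFreq_0[], acumFreq_1[];
extern int Read1BitFromStream(void);   /* bit source supplied by the caller */

static uint32_t inferiorLimit, superiorLimit, tag;  /* set by InitDecoder */

int DecodeBit(int modelIndex) {
    uint32_t range = superiorLimit - inferiorLimit + 1;
    uint32_t threshold =
        ((tag - inferiorLimit + 1) * acumFreq_1[modelIndex] - 1) / range;
    uint32_t length = (range * acumFreq_0[modelIndex]) / acumFreq_1[modelIndex];
    int bitDecoded;

    if (threshold < acumFreq_0[modelIndex]) {
        bitDecoded = 0;
        superiorLimit = inferiorLimit + length - 1;
    } else {
        bitDecoded = 1;
        inferiorLimit = inferiorLimit + length;
    }

    /* Renormalization, following the two conditions of Table B.41. */
    while ((((inferiorLimit ^ superiorLimit) & 0x8000u) == 0) ||
           (inferiorLimit >= 0x4000u && superiorLimit < 0xC000u)) {
        if (((inferiorLimit ^ superiorLimit) & 0x8000u) == 0) {
            /* Equal MSBs: shift a zero into the LSBs. */
            inferiorLimit = (inferiorLimit << 1) & 0xFFFFu;
            superiorLimit = ((superiorLimit << 1) + 1) & 0xFFFFu;
            tag = ((tag << 1) + (uint32_t)Read1BitFromStream()) & 0xFFFFu;
        }
        if (inferiorLimit >= 0x4000u && superiorLimit < 0xC000u) {
            /* Underflow case: shift, then flip the MSB of all registers. */
            inferiorLimit = ((inferiorLimit << 1) ^ 0x8000u) & 0xFFFFu;
            superiorLimit = (((superiorLimit << 1) + 1) ^ 0x8000u) & 0xFFFFu;
            tag = (((tag << 1) + (uint32_t)Read1BitFromStream()) ^ 0x8000u) & 0xFFFFu;
        }
    }
    return bitDecoded;
}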
|
The procedure to update the statistic model is described in Table B.42.
Table B.42 — Update model procedure
Procedure UpdateModel(bit, modelIndex){ | Probabilistic model update (modelIndex defined in Table B.31) |
if(bit == 0) { |
|
acumFreq_0[modelIndex]++; |
|
acumFreq_1[modelIndex]++; |
|
} |
|
else { |
|
acumFreq_1[modelIndex]++; |
|
} |
|
if(acumFreq_1[modelIndex] == 4095){ |
|
acumFreq_1[modelIndex] = acumFreq_1[modelIndex]/2; |
|
acumFreq_0[modelIndex] = acumFreq_0[modelIndex]/2; |
|
if(acumFreq_0[modelIndex] == 0) { |
|
acumFreq_0[modelIndex]++; |
|
acumFreq_1[modelIndex]++; |
|
} |
|
} |

} |
|
The MinimumBitPlane value indicates the minimum depth that the hexadeca-tree decoder will descend for the current block.
For each Block SB_k, of size tk×sk×vk×uk, corresponding to the leaves of the partition tree, a hexadeca-tree decoding procedure is performed. It is described by the following recursive procedure:
Table B.43 — Decode block procedure
SB = DecodeBlock(bitPlane) { | Decodes the 4D blocks from the hexadeca-tree structure |
if (bitPlane < MinimumBitPlane){ |
|
return SB = zero | Returns Block = 0 |
} |
|
if (SB is of size 1×1×1×1){ |
|
M = ReadMagnitude(bitplane, MinimumBitPlane); | Reads a bitPlane-MinimumBitPlane precision positive integer M from input (Table B.33) |
if (M > 0){ |
|
sign = ReadSign(); | Reads a sign bit (Table B.35) |
if(sign == 1) M = -M; |
|
} |
|
return SB = M |
|
} |
|
flag = ReadHexadecatreeFlag(bitplane); | Reads hexadeca-tree flag (Table B.36) |
if (flag == “zeroBlock”) return SB = zero |
|
if (flag == “lowerBitPlane”){ |
|
SB = DecodeBlock(bitPlane-1) |
|
return SB |
|
} |
|
if (flag == “splitBlock”){ |
|
t'b = floor(tb/2); | tb,sb,vb,ub are the dimensions of the original Block (SB) |
s'b = floor(sb/2); | |
v'b = floor(vb/2); | |
u'b = floor(ub/2); | |
nseg_t = nseg_s = nseg_v = nseg_u = 1; |
|
if(tb > 1) nseg_t++; |
|
if(sb > 1) nseg_s++; |
|
if(vb > 1) nseg_v++; |
|
if(ub > 1) nseg_u++; |
|
|
|
for(t = 0; t < nseg_t; t++) { |
|
for(s = 0; s < nseg_s; s++) { |
|
for(v = 0; v < nseg_v; v++) { |
|
for(u = 0; u < nseg_u; u++) { |
|
|
|
new_t = t×t'b + (1-t)×(tb-t'b); |
|
new_s = s×s'b + (1-s)×(sb-s'b); |
|
new_v = v×v'b + (1-v)×(vb-v'b); |
|
new_u = u×u'b + (1-u)×(ub-u'b); |
|
SSB_t,s,v,u = DecodeBlock(bitPlane); | Decodes the sub-block of size new_t×new_s×new_v×new_u |
|
} | |
} | |
} | |
} | |
return SB = cat(SSB_t,s,v,u) | The side-by-side concatenation of all sub-blocks composes the original block. |
} |
|
} |
|
Table B.44 lists the representations of the hexadeca-tree flags.
Table B.44 — Hexadeca-tree flags
Hexadeca-tree flag | Representation |
lowerBitPlane | 0 |
splitBlock | 1 |
zeroBlock | 2 |
Figure B.20 — Inverse 4D-DCT
As with the direct transform, the inverse transform (4D-IDCT) is separable, i.e. with 1D-IDCTs computed separately in each of the 4 directions. An example of the computation flow of the t×s×v×u separable 4D-IDCT is depicted in Figure B.20 (the 4D inverse transform is the same irrespective of the order of application of the inverse 1D transform). After applying the inverse 4D-DCT a level shift operation is performed.
For a given light field 4D-DCT representation X(i,j,p,q), the corresponding light field x(u,v,s,t) can be computed by inverse transforming each dimension in sequence, each 1D inverse transform being computed as
x(n) = Σ α(k) X(k) cos((2n+1)kπ/(2N)), k = 0, …, N−1,
where α(0) = √(1/N), α(k) = √(2/N) for k > 0, and N is the size of the transform. Output pixels are represented as 32-bit integers. As indicated earlier, the transform order is arbitrary.
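EXAMPLE The following C fragment is an informative sketch of the 1D inverse transform above; applying it successively along the u, v, s and t axes realizes the separable 4D-IDCT of Figure B.20. The function name and the double-precision arithmetic are illustrative choices.

#include <math.h>

/* Inverse 1D DCT of length N (DCT-III with orthonormal scaling):
   x[n] = sum_{k=0..N-1} a(k) X[k] cos((2n+1) k pi / (2N)),
   with a(0) = sqrt(1/N) and a(k) = sqrt(2/N) for k > 0. */
static void idct_1d(const double *X, double *x, int N) {
    const double pi = 3.14159265358979323846;
    for (int n = 0; n < N; n++) {
        double acc = sqrt(1.0 / N) * X[0];
        for (int k = 1; k < N; k++)
            acc += sqrt(2.0 / N) * X[k] * cos((2 * n + 1) * k * pi / (2.0 * N));
        x[n] = acc;
    }
}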
After processing the inverse DCT for a block of source light field samples, the reconstructed samples of the components that are unsigned shall be inversely level shifted. If the MSB of Ssizi from the LFC marker segment (see B.4.2.2) is zero, all reconstructed samples x(u,v,s,t) of the ith component are level shifted by adding the same quantity to each sample.
NOTE Due to quantization effects, the reconstructed samples x(u,v,s,t) can exceed the dynamic range of the original samples. There is no specified procedure for this overflow or underflow situation. However, clipping the value within the original dynamic range is a typical solution.
This annex describes an instantiation of the reference view encoder for the 4D prediction mode. Next, the decoding process is detailed.
The JPEG Pleno Light Field Reference View box is a superbox that contains the following (see Figure C.1):
— a JPEG Pleno Light Field Reference View Description box, signalling the configuration of the reference view encoding;
— a Common Codestream Elements box, signalling redundant header information from individual codestreams of the reference views;
— a Contiguous Codestream box, containing as payload the individual codestreams of the reference views.
The type of the JPEG Pleno Reference View box shall be ‘lfrv’ (0x6C66 7276).
Figure C.1 — Organization of the JPEG Pleno Light Field Reference View superbox
The JPEG Pleno Light Field Reference View Description box contains information on the encoder used to individually encode the reference views, the number of reference views, which views are encoded as reference views, and pointers to the individual codestreams (see Figure C.2 and Table C.1).
The type of the JPEG Pleno Reference View Description box shall be ‘lfrd’ (0x6C66 7264).
Figure C.2— Organization of the contents of a Light Field Reference View Description box
Table C.1 — Format of the contents of the Light Field Reference View Description box (optional)
Field name | Size (bits) | Value |
TCODEC | 8 | 0 to (2^8 - 1) |
NREF | 16 | 0 to (2^16 - 1) |
H | variable | 1 bit per view [0, 1] |
PP | 8 | 0 (Precision = 32) or 1 (Precision = 64) |
TPEC_1 | Precision | 1 to (2^Precision - 1) |
TPEC_NREF | Precision | 1 to (2^Precision - 1) |
TCODEC | identifier for the codec deployed for reference views. TCODEC values shall correspond to the potential values for the coder type C defined in the JPX file format (ISO/IEC 15444-2) for the Image Header box. The default value of TCODEC = 7 corresponds to ISO/IEC 15444 (JPEG 2000). | |
NREF | number of reference views | |
H | this field specifies per subaperture image whether it is encoded as a reference view (1) or not (0). The 1-bit flags are concatenated in row-wise scan order and 8 successive bits are packed as a byte. If the number of views is not a multiple of 8, the remaining bits of the last byte are set to zero (zero padding to stuff the last byte). | |
PP | pointer precision. 0 indicates 32-bit precision (unsigned integer, default option); 1 indicates 64-bit precision (unsigned integer). Other values are not valid. | |
TPEC_i | pointer to the contiguous (EXTRDATA) codestream of texture data for texture reference view i. This pointer indicates the position of the EXTRDATA codestream in the Contiguous Codestream box counting from the beginning of this box, i.e. the LBox field. | |
EXTRDATA | externally encoded data payload. Contains texture data encoded with the external codec TCODEC. The header information produced by the external codec has been removed. | |
When TCODEC = 7, the codec deployed can be ISO/IEC 15444-1 or other Parts of the ISO/IEC 15444 series, such as ISO/IEC 15444-2, which adds support for multi-component transforms. The required capabilities are signalled to the JPEG 2000 decoder using the extended capabilities marker (CAP), which was introduced in ISO/IEC 15444-1. The CAP marker is defined as 0xFF50 followed by a variable length field indicating the Parts of the ISO/IEC 15444 series containing extended capabilities that are used to encode the image.
NOTE Row-wise scan order is illustrated with black arrows and grey arrows. Grey arrows indicate the move to the beginning of the next row. The coordinates (t_k, s_k), k = 1, …, NREF, are the set of view indices for which H(t, s) = 1.
Figure C.3 — Reference view configuration array H with five reference views in centre plus corners configuration
This box contains redundant codestream elements extracted from the reference view codestreams (Figure C.4 and Table C.2). If this box is signalled, the contained codestream elements, representing codestream header information, shall be concatenated with every codestream fragment contained by the Contiguous Codestream box (Figure C.5 and Table C.3). This box is optional.
The type of the Common Codestream Elements box shall be ‘lfcc’ (0x6C66 6363).
Figure C.4 — Organization of the contents of the Common Codestream Elements box
Table C.2 — Format of the contents of the Common Codestream Elements box
Field name | Size (bits) | Value |
Common codestream payload | variable | variable |
Common codestream payload | The common codestream element can be, for example, the header information from the codestreams of the individual reference views. | |
The Contiguous Codestream box is specified in Annex A.3.4. In this subclause, its payload is specified, which corresponds to the stripped codestreams of the reference views, i.e. codestreams with the redundant header information removed and stored in the Common Codestream Elements box in the JPEG Pleno Reference View box.
Figure C.5 — Organization of the contents of the Contiguous Codestream box for reference view signalling in the 4D Prediction mode
Table C.3 — Format of the contents of the JPEG Pleno Light Field Contiguous Codestream box for reference view signalling in the 4D Prediction mode
Field name | Size (bits) | Value | Comments |
Codestream for reference view 1 | variable | variable | External codec codestream |
Codestream for reference view 2 | variable | variable | External codec codestream |
… | … | … | … |
Codestream for reference view NREF | variable | variable | External codec codestream |
This clause provides a description of encoding operations for the reference views. The following variables are elements obtained from matrix H, used for indexing the reference views:
t_k | Subscript of the row index for the reference view k |
s_k | Subscript of the column index for the reference view k |
Reference views are encoded with an external codec identified by TCODEC. Since all reference views share the same dimensions and bit depth, the header information of such an encoding needs to be obtained only once. For this reason, the externally encoded files are split into two parts: 1) the header information, 2) the remaining codestream. The encoder places the redundant header information as payload in the Common Codestream Elements box, and the decoder needs to concatenate the header to the remaining codestream part prior to decoding with the TCODEC decoder.
The texture views of the light field are denoted as I(t, s, v, u, c). The views are addressed by the (t, s) coordinate pair, while the (v, u) pair addresses a pixel within each view and index c stands for the colour component. Similarly, the decoded texture views are denoted as Ĩ(t, s, v, u, c), the normalized disparity views of the light field as D(t, s, v, u), and the decoded normalized disparity views as D̃(t, s, v, u), where 1 ≤ t ≤ T, 1 ≤ s ≤ S, 1 ≤ v ≤ V, 1 ≤ u ≤ U, and c ranges over the colour components.
|
Figure C.6 — Light field with dimensions T×S×V×U displaying reference views at locations marked REF (in row-wise scan order)
For a given reference view the encoding of texture is performed as in Table C.4. An example configuration of reference views is illustrated in Figure C.6. Note that the numbering follows the row-wise scanning pattern as illustrated in Figure C.3.
NOTE Row-wise scan order is illustrated with black arrows and grey arrows. Grey arrows indicate the move to the beginning of the next row. The coordinates (t_k, s_k), k = 1, …, NREF, are the set of view indices for which H(t, s) = 1.
Table C.4 — Procedure for encoding the reference views
1 | Read the next reference view from the light field |
2 | Encode it with the TCODEC encoder |
3 | Repeat steps 1 and 2 until all NREF reference views have been encoded |
4 | Extract and remove the common codestream element (THEADER) |
5 | Output all stripped codestreams (TDATA) to the Contiguous Codestream box |
6 | Output the common codestream element (THEADER) as the payload of the Common Codestream Elements box, see Table C.2 |
The encoding procedure of a reference view is displayed as a flowchart in Figure C.7. In Table C.4, step 4 extracts the header information from the encoded file. The header information is an array of bytes (THEADER), and subsequently these bytes are removed from the codestream. The result is a codestream that is not fully decodable with the TCODEC decoder. The header information array THEADER needs to be concatenated back prior to decoding. The decoder can obtain the header information from the Common Codestream Elements box, detailed in Table C.2, append it to the beginning of the stripped codestream (EXTRDATA), and decode successfully. This stripping of header information saves bytes and makes the encoding less redundant, since only a single unique header is required.
When obtaining the common codestream element, the encoder must use the correct markers for identifying the header section in the encoded codestream. In the case of TCODEC = 7, the deployed codec is JPEG 2000, and the marker identifying the last two bytes of the header is the SOT marker (0xFF90), i.e. the start-of-tile marker. For other TCODEC types, the marker will depend on the chosen codec. Note that the decoding process of the common codestream element is transparent to the codec type. Hence, no normative constraints need to be imposed on the exact cutting point of the TCODEC codestream.
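EXAMPLE The following C fragment is an informative sketch of how an encoder can locate the cutting point for TCODEC = 7 by scanning for the first SOT marker; the function name and the byte-buffer interface are illustrative.

#include <stddef.h>
#include <stdint.h>

/* Returns the length of the common header, i.e. the offset of the byte
   just after the first SOT marker (0xFF90), or 0 if no SOT marker is
   found. Per this subclause, the SOT marker bytes are the last two bytes
   of the header placed in the Common Codestream Elements box; everything
   after them is the stripped codestream (EXTRDATA). */
static size_t common_header_length(const uint8_t *cs, size_t len) {
    for (size_t i = 0; i + 1 < len; i++)
        if (cs[i] == 0xFF && cs[i + 1] == 0x90)
            return i + 2;
    return 0;
}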
The process of Table C.4 applies to all reference views. The encoder will output all the reference view data and common codestream element information according to the format of Light Field Reference View Description box.
Figure C.7 — Overview of the encoding of reference view i
This clause provides a description of decoding operations for the reference views. Reference views are views that use no prediction; they are the views on the lowest hierarchical level.
Since the externally encoded files are split into two parts, 1) the header information and 2) the remaining codestream, the decoder needs to concatenate the header to the remaining codestream part prior to decoding with the TCODEC decoder.
See Figure C.8 for an overview of the reference view decoding process. The detailed steps for decoding the reference view are given in Table C.5.
Table C.5 — Procedure for decoding reference view i
1 | Obtain the header payload from the Common Codestream Elements box, and store the payload as an array of bytes (THEADER) |
2 | Obtain the reference view i data from the Contiguous Codestream box, pointed to by TPEC_i, and store it as TDATA |
3 | Concatenate THEADER and TDATA as [THEADER TDATA] |
4 | Decode the concatenated codestream with the TCODEC decoder |
5 | Store the output as the decoded reference view i |
6 | Repeat from step 2 until all reference views have been decoded. |
Figure C.8 — Overview of decoding reference view i
This annex describes an instantiation of the normalized disparity view encoder for the 4D prediction mode. Thereafter, the decoding process is detailed.
The JPEG Pleno Light Field Normalized Disparity View box is a superbox that contains the following (see Figure D.1):
— a JPEG Pleno Light Field Normalized Disparity View Description box, signalling the configuration of the normalized disparity view encoding;
— a Common Codestream Elements box, signalling redundant header information from individual codestreams of the normalized disparity views;
— a Contiguous Codestream box, containing as payload the individual codestreams of the normalized disparity views.
The type of JPEG Pleno Light Field Normalized Disparity View box shall be ‘lfdv’ (0x6C66 6476).
Figure D.1 — Organization of the JPEG Pleno Light Field Normalized Disparity View superbox
The JPEG Pleno Light Field Normalized Disparity View Description box contains information on the encoder used to individually encode the normalized disparity views, the number of normalized disparity views, which views are encoded as normalized disparity views, and pointers to the individual codestreams.
The type of JPEG Pleno Light Field Normalized Disparity View Description box shall be ‘lfdd’ (0x6C66 6464).
Figure D.2— Organization of the contents of a Light Field Normalized Disparity View Description box
Table D.1 — Format of the contents of the Light Field Normalized Disparity View Description box
Field name | Size (bits) | Value |
DCODEC | 8 | 0 to (2^8 - 1) |
Q | 8 | 0 to (2^8 - 1) |
(minimum normalized disparity value) | 16 | 0 to (2^16 - 1) |
NREF | 16 | 0 to (2^16 - 1) |
H | variable | 1 bit per view [0, 1] |
PP | 8 | 0 (Precision = 32) or 1 (Precision = 64) |
DPEC_1 | Precision | 1 to (2^Precision - 1) |
DPEC_NREF | Precision | 1 to (2^Precision - 1) |
DCODEC | identifier for the codec deployed for normalized disparity views. DCODEC values shall correspond to the potential values for the coder type C defined in the JPX file format (ISO/IEC 15444-2) for the Image Header box. The default value of DCODEC = 7 corresponds to ISO/IEC 15444 (JPEG 2000). |
Q | depth quantization parameter |
| minimum normalized disparity value (optional). The integer value is represented as an unsigned 16-bit integer. |
NREF | number of normalized disparity (reference) views |
H | this field specifies per normalized disparity view whether it is signalled as a reference (1) or not (0). The 1-bit flags are concatenated in row-wise scan order and 8 successive bits are packed as a byte. If the number of views is not a multiple of 8, the remaining bits of the last byte are set to zero (zero padding to stuff the last byte). |
PP | pointer precision. 0 indicates 32-bit precision (unsigned integer, default option); 1 indicates 64-bit precision (unsigned integer). Other values are not valid. |
DPEC_i | pointer to the contiguous (EXTRDATA) codestream of normalized disparity data for normalized disparity reference view i. This pointer indicates the position of the EXTRDATA codestream in the Contiguous Codestream box counting from the beginning of this box, i.e. the LBox field. |
EXTRDATA | externally encoded data payload Contains disparity data encoded with the external codec DCODEC. |
When DCODEC = 7, the codec deployed can be ISO/IEC 15444-1 or other Parts of the ISO/IEC 15444 series, such as ISO/IEC 15444-2, which adds support for multi-component transforms. The required capabilities are signalled to the JPEG 2000 decoder using the extended capabilities marker (CAP), which was introduced in ISO/IEC 15444-1. The CAP marker is defined as 0xFF50 followed by a variable length field indicating the Parts of the ISO/IEC 15444 series containing extended capabilities that are used to encode the image.
NOTE Row-wise scan order is illustrated with black arrows and grey arrows. Grey arrows indicate the move to the beginning of the next row. The coordinates (t_k, s_k), k = 1, …, NREF, are the set of view indices for which H(t, s) = 1.
Figure D.3 — Normalized disparity view configuration array H with five reference views in centre plus corners configuration
The redundant codestream syntax from the individual codestreams of the normalized disparity data is extracted and signalled as specified in Annex C.3.2 for the reference views. This box is optional.
The Contiguous Codestream box has been specified in Annex A.3.4. In this subclause, its payload is specified, which corresponds to the stripped codestreams of the normalized disparity views, i.e. codestreams with the redundant header information removed and stored in the Common Codestream Elements box in the JPEG Pleno Normalized Disparity View box (see Figure D.4 and Table D.2).
Figure D.4 — Organization of the contents of the Contiguous Codestream box for normalized disparity view signalling in the 4D Prediction mode
Table D.2 — Format of the contents of the JPEG Pleno Light Field Contiguous Codestream box for normalized disparity view signalling in the 4D Prediction mode
Field name | Size (bits) | Value | Comments |
Codestream for normalized disparity view 1 | variable | variable | External codec codestream |
Codestream for normalized disparity view 2 | variable | variable | External codec codestream |
… | … | … | … |
Codestream for normalized disparity view NREF | variable | variable | External codec codestream |
The horizontal disparity map and the vertical disparity map between two views (t1, s1) and (t2, s2) express the correspondence of a pixel (v1, u1) in the first view with a pixel (v2, u2) in the second view. If the pixel is not occluded by another pixel, the colour attributes of the two corresponding pixels are very similar to one another, allowing one to be predicted from the other.
The disparity maps can be estimated from the texture views using optical flow stereo estimation methods[3]. For the case of light fields there exist more specialized methods utilizing the whole light field in the estimation[4][5]. In the latter category, the disparity maps between each pair of views can be obtained from a single normalized disparity map per view. The normalized disparity of pixel (v, u) in view (t, s) is denoted as D(t, s, v, u). For a pair of two arbitrary views (t1, s1) and (t2, s2), the normalized disparity map can be used to find corresponding pixels by scaling it with the vertical and horizontal baselines between the two views (see Annex E.4.2):
v2 = v1 + D(t1, s1, v1, u1) × by, u2 = u1 + D(t1, s1, v1, u1) × bx,
where by and bx are the vertical and horizontal baselines between the views.
This clause provides details on how the normalized disparity data is encoded using the DCODEC encoder. Normalized disparity data is used to predict intermediate views from reference views using disparity-based warping, see Annex E.
The normalized disparity views are real-valued floating-point quantities. A quantized integer-precision normalized disparity view is obtained by multiplying the floating-point normalized disparity view by the quantization factor derived from the depth quantization parameter Q (see Table D.1) and rounding to the nearest integer.
For codecs which do not support encoding of negative data, the quantized normalized disparity is level-shifted to the positive range using a shifting constant. The shifting constant is common to all normalized disparity maps in the light field and is obtained over all quantized normalized disparity maps from their minimum value, signalled as the minimum normalized disparity value in Table D.1. In this case of level-shifting to the positive range, the quantized normalized disparity data becomes the quantized value plus the shifting constant.
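EXAMPLE The following C fragment is an informative sketch of the quantization and level-shifting of a single normalized disparity sample; the function name, the explicit factor and shift parameters, and the rounding call are illustrative assumptions.

#include <math.h>
#include <stdint.h>

/* Quantizes one floating-point normalized disparity value: multiply by
   the quantization factor, round to the nearest integer, then add the
   light-field-wide shifting constant `shift` so that codecs without
   signed-sample support see only non-negative 16-bit data. `factor` is
   assumed to be derived from the depth quantization parameter Q of
   Table D.1. */
static uint16_t quantize_disparity(double d, double factor, int32_t shift) {
    int32_t q = (int32_t)lround(d * factor);   /* quantize */
    return (uint16_t)(q + shift);              /* level-shift to positive range */
}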
The normalized disparity views are usually provided at the lowest hierarchical level (see the example in Figure D.5) and are used to predict subsequent intermediate views at higher hierarchies. Normalized disparity views are encoded with the external codec DCODEC. Since all normalized disparity views share the same dimensions and bit depth, the header information of such an encoding needs to be obtained only once. For this reason, the externally encoded files are split into two parts: 1) the header information, 2) the remaining codestream. The encoder places the redundant header information as payload in the Common Codestream Elements box, and the decoder needs to concatenate the header to the remaining codestream part prior to decoding with the DCODEC decoder.
Figure D.5 — Light field with dimensions T×S×V×U displaying normalized disparity reference views at locations marked REF (in row-wise scan order)
The following variables are elements obtained from matrix H, used for indexing the normalized disparity views:
t_k | Subscript of the row index for the normalized disparity view k |
s_k | Subscript of the column index for the normalized disparity view k |
An example configuration of normalized disparity views is given in Figure D.5. For a given normalized disparity view, the encoding of normalized disparity using DCODEC is performed as in Table D.3.
Table D.3 — Normalized disparity view encoding procedure for normalized disparity view i when using DCODEC
1 | Obtain the reference normalized disparity view i |
2 | Quantize by rounding to the nearest integer after multiplication by the quantization factor |
3 | Level-shift by adding the shifting constant (see Annex D.4) |
4 | Encode with the DCODEC encoder |
5 | Extract and remove the common codestream element (DHEADER) |
6 | Output the headerless codestream (DDATA) to the Contiguous Codestream box |
7 | Output the common codestream element as the payload in the Common Codestream Elements box, see Table C.2. |
8 | Repeat from step 1 until all NREF normalized disparity views have been encoded |
Step 5 extracts the header information from the encoded file. The header information is an array of bytes (DHEADER), and subsequently these bytes are removed from the encoded file. The result is an encoding that is not fully decodable with the DCODEC decoder. The header information array DHEADER needs to be concatenated back prior to decoding. The decoder can obtain the header information from the Common Codestream Elements box, append it to the beginning of the headerless encoding, and decode successfully. This stripping of header information saves bytes and makes the encoding less redundant, since only a single unique header is required.
When obtaining the common codestream element, the encoder must use the correct markers for identifying the header section in the encoded codestream. In the case of DCODEC = 7, the deployed codec is JPEG 2000, and the marker identifying the last two bytes of the header is the SOT marker (0xFF90), i.e. the start-of-tile marker. For other DCODEC types, the marker will depend on the chosen codec. Note that the decoding process of the common codestream element is transparent to the codec type.
This clause provides details on decoding the normalized disparity data using the DCODEC decoder. The normalized disparity views are usually provided at the lowest hierarchical level and used to predict subsequent intermediate views at higher hierarchical levels, see Annex E.
The normalized disparity views are real-valued floating-point quantities. During encoding they are quantized into integer range by a multiplication with the quantization factor followed by rounding to the nearest integer, and encoded using 16 bits. Optionally, for normalized disparity views with negative values, the data is level shifted using the quantity signalled in Table D.1. This value is obtained only once over all normalized disparity views. Level shifting is not necessary for codecs which support negative input data.
Consider an encoded normalized disparity view obtained from decoding a codestream encoded with DCODEC. First, the decoder level-shifts the data back to its original range. Next, the inverse quantization step is applied by division with the quantization factor; the result is truncated to 16 fractional bits and represents the final decoded normalized disparity view.
The steps for decoding the normalized disparity view i using DCODEC are given in Table D.4.
Table D.4 — Normalized disparity view decoding procedure for normalized disparity view i using DCODEC
1 | Obtain the header payload from the Common Codestream Elements box as an array of bytes (DHEADER) |
2 | Obtain the normalized disparity view i data from the Contiguous Codestream box, pointed to by DPEC_i, and store it as DDATA |
3 | Concatenate DHEADER and DDATA as [DHEADER DDATA] |
4 | Decode with the DCODEC decoder |
5 | Level shift by subtracting the shifting constant |
6 | Dequantize to floating-point values by division with the quantization factor |
7 | Repeat from step 2 until all normalized disparity views have been decoded. |
This annex specifies the decoding process for intermediate view decoding in the 4D prediction mode. It also specifies the superbox containing all information necessary to reconstruct and to decode the intermediate views, such as prediction parameters and encoded residual prediction data.
However, for informative purposes, it first describes an instantiation of the intermediate view encoder for the 4D prediction mode.
The JPEG Pleno Light Field Intermediate View box is a superbox that contains the following (see Figure E.1):
— a JPEG Pleno Light Field Prediction Parameter box, signalling the parameters of the intermediate view prediction;
— a JPEG Pleno Light Field Residual View Description box, signalling the configuration of the residual view encoding;
— a Common Codestream Elements box, signalling redundant header information from individual codestreams of the residual views;
— a Contiguous Codestream box, containing as payload the individual (stripped) codestreams of the residual views.
The type of JPEG Pleno Light Field Intermediate View box shall be ‘lfiv’ (0x6C66 6976).
Figure E.1 — Organization of the JPEG Pleno Light Field Intermediate View superbox
The prediction parameters box contains prediction parameters for the intermediate views, i.e. the views that are not signalled as reference views. For each intermediate view this box contains updated hierarchy information for the intermediate view configuration, view merging mode options and sparse filter parameters. For each intermediate view the prediction parameters are contained in a separate prediction parameter block (see Figure E.2 and Table E.1).
The type of JPEG Pleno Light Field Prediction Parameter box shall be ‘lfpp’ (0x6C66 7070).
Figure E.2 — Organization of the contents of a Light Field Prediction Parameter box
Table E.1 — Format of the contents of the Light Field Prediction Parameters Description box
Field name | Size (bits) | Value |
NI | 32 | 0 to (2^32 - 1) |
hsize | 8 | 1 to (2^8 - 1) |
| variable | [0, (2^hsize - 1)] in hsize bits per view |
| 32 | 1 to (2^32 - 1) |
| 32 | 1 to (2^32 - 1) |
Prediction parameter block 1 | variable | variable |
… | … | … |
Prediction parameter block NI | variable | variable |
NI | number of intermediate views |
hsize | precision in bits for specifying the number of hierarchical levels in the array |
| the value specifying the type of view (its hierarchical level). The hsize bits per view are concatenated into one stream; 8 successive bits are packed as a byte. If the total number of bits is not a multiple of 8, the remaining bits of the last byte are set to zero. |
| subscript of the row index in row-wise scanning order for the intermediate view |
| subscript of the column index in row-wise scanning order for the intermediate view |
| pointer to the Prediction Parameters block for the intermediate view. This pointer indicates the position of this Prediction Parameters block in the Prediction Parameters Description box counting from the beginning of this box, i.e. the LBox field. |
NOTE Skipping of the centre reference view is illustrated with the rounded arrow. The hierarchy level is indicated at each view position. In this example there are four hierarchical levels and four non-existing views.
Figure E.3 — Subaperture views are scanned in row-wise order skipping the reference views that have been signalled earlier (see Annex C.3.1)
For each intermediate view the prediction parameters are contained in a separate prediction parameter block, see Table E.2.
Table E.2 — Format of the contents of the Prediction Parameter block
Field name | Size (bits) | Value |
| 8 | 0 to (2^8 - 1) |
| 8 | 0 to (2^8 - 1) |
| 16 | 0 to (2^16 - 1) |
| 16 | 0 to (2^16 - 1) |
| 16 | 0 to (2^16 - 1) |
| 16 | 0 to (2^16 - 1) |
| 16 | 0 to (2^16 - 1) |
| 16 | 0 to (2^16 - 1) |
| 16 | 0 to (2^16 - 1) |
| 16 | 0 to (2^16 - 1) |
| 8 | {0, 1, 2} |
| 8 | [0, 1] |
| 16 | -(2^15 - 1) to (2^15 - 1) |
| 16 | -(2^15 - 1) to (2^15 - 1) |
| 16 | -(2^15 - 1) to (2^15 - 1) |
| 32 | single precision, big endian floating-point |
| 8 | 2^8 - 1 |
| 8 | 2^8 - 1 |
| 32 | -(2^31 - 1) to (2^31 - 1) |
| 32 | -(2^31 - 1) to (2^31 - 1) |
| 32 | -(2^31 - 1) to (2^31 - 1) |
| number of reference views for the intermediate view |
| number of normalized disparity views for the intermediate view |
| subscript of the row index of the reference view |
| subscript of the column index of the reference view. The set of reference views for the intermediate view is given in row-wise scan order |
| subscript of the row index of the normalized disparity view |
| subscript of the column index of the normalized disparity view. The set of normalized disparity reference views for the intermediate view is given in row-wise scan order |
| view merging mode for the texture view: |
| 0: Least-squares merging, Annex E.5.4.1 |
| 1: Fixed-weight merging, Annex E.5.4.2 |
| 2: Median merging, Annex E.7 |
| sparse filter enabled/disabled at the view |
| number of LS merging coefficients for the view |
| least-squares merging weight of a component |
| fixed-weight merging parameter for the view |
| regressor template parameter of the sparse filter for the view |
| sparse filter order for the view |
| sparse filter coefficients of a component |
| sparse filter regressor mask of a component |
The JPEG Pleno Light Field Residual View Description box contains information on the encoder used to individually encode the residual views, the number of residual views, which views are encoded as residual views, and pointers to the individual codestreams (see Figure E.4 and Table E.3).
The type of JPEG Pleno Light Field Residual View Description box shall be ‘lfre’ (0x6C66 7265).
Figure E.4 — JPEG Pleno Light Field Residual View Description box
Table E.3 — Format of the contents of the JPEG Pleno Light Field Residual View Description box
Field name | Size (bits) | Value |
RCODEC | 8 | 0 to (2^8 - 1) |
NRES | 16 | 0 to (2^16 - 1) |
| variable | 1 bit per view [0, 1] |
PP | 8 | 0 (Precision = 32) or 1 (Precision = 64) |
| Precision | 1 to (2^Precision - 1) |
| Precision | 1 to (2^Precision - 1) |
RCODEC | identifier for the codec deployed for residual images. RCODEC values shall correspond to the potential values for the coder type C defined in the JPX file format (ISO/IEC 15444-2) for the Image Header box. The default value of RCODEC = 7 corresponds to ISO/IEC 15444 (JPEG 2000). |
NRES | number of texture residual images |
whenever | |
subscript of the row index for the intermediate image | |
subscript of the column index for the intermediate image | |
PP | pointer precision. 0 indicates 32-bit precision (unsigned integer, default option) – 1 indicates 64-bit precision (unsigned integer). Other values are not valid |
| pointer to the contiguous (EXTRDATA) codestream of residual data for the residual image. This pointer indicates the position of the EXTRDATA codestream in the Contiguous Codestream box counting from the beginning of this box, i.e. the LBox field. |
EXTRDATA | externally encoded data payload |
When RCODEC = 7, the codec deployed can be ISO/IEC 15444-1 or other Parts of the ISO/IEC 15444 series, such as ISO/IEC 15444-2, which adds support for multi-component transforms. The required capabilities are signalled to the JPEG 2000 decoder using the extended capabilities marker (CAP), which was introduced in ISO/IEC 15444-1. The CAP marker is defined as 0xFF50 followed by a variable length field indicating the Parts of the ISO/IEC 15444 series containing extended capabilities that are used to encode the image.
Figure E.5 — Subaperture views are scanned in row-wise order skipping the reference views that have been signalled earlier (see Annex C.3.1) and also skipping non-existing views (see Annex E.3.1)
The redundant codestream syntax from the individual codestreams of the residual view data is extracted and signalled as specified in Annex C.3.2 for the reference views.
The Contiguous Codestream box has been specified in Annex A.3.4. In this subclause, its payload is specified, which corresponds to the stripped codestreams of the residual views, i.e. codestreams with the redundant header information removed and stored in the Common Codestream Elements box in the JPEG Pleno Residual View Description box (see Figure E.6 and Table E.4).
Figure E.6 — Organization of the contents of the Contiguous Codestream box
Table E.4 — Format of the contents of the Contiguous Codestream box
Field name | Size (bits) | Value | Comments |
Codestream for residual view 1 | variable | variable | External codec codestream |
Codestream for residual view 2 | variable | variable | External codec codestream |
… | … | … | … |
Codestream for residual view NRES | variable | variable | External codec codestream |
In this clause, the view warping notation and algorithm is introduced. Warping is the action of spatially displacing samples by the amount of horizontal and vertical disparity to obtain a warped view.
Disparity is a quantity derived from the normalized disparity information. The disparity is obtained by multiplying the normalized disparity with the horizontal and vertical camera distance between views, see Annex D.4.
Warping is used for predicting an intermediate texture view based on a set of reference views. Warping is also used for predicting an intermediate normalized disparity view based on a set of normalized disparity views.
The views are addressed by the (t, s) coordinate pair, while the (v, u) pair addresses a pixel within each view and index c stands for the colour component. The subaperture views of the light field are denoted as I(t, s, v, u, c), the decoded subaperture views as Ĩ(t, s, v, u, c), the normalized disparity views of the light field as D(t, s, v, u), and the decoded normalized disparity views as D̃(t, s, v, u), where 1 ≤ t ≤ T, 1 ≤ s ≤ S, 1 ≤ v ≤ V, 1 ≤ u ≤ U, and c ranges over the colour components.
Consider two subaperture views (t1, s1) and (t2, s2). The pixel values of the first view can be used to predict the pixel values of the second view by applying displacements based on the pixel-wise disparity at the first view. The view warped from (t1, s1) to (t2, s2) is denoted accordingly.
The geometrical coordinates of the centre of the camera when acquiring the view (t, s) are denoted as (x(t, s), y(t, s)), assuming that all camera centres are located on the same depth plane, where the world space is defined by the horizontal, vertical, and depth axes. The camera centre coordinates are specified in the Camera parameter box in Annex A.3.3.3.
The horizontal baseline for a pair of two cameras is a scalar quantity and refers to the horizontal distance between the two camera centres, while the vertical baseline is a scalar quantity and refers to the vertical distance between the two camera centres. For two views (t1, s1) and (t2, s2) the baselines become bx = x(t2, s2) - x(t1, s1) and by = y(t2, s2) - y(t1, s1), where the camera centre coordinates are given in Annex A.3.3.3.
The algorithm for warping the texture and normalized disparity of a view to the location of another view is defined as described in Table E.5.
Table E.5 — View warping algorithm
initialize | initializes warped normalized disparity view | ||||
initialize | initializes warped texture view | ||||
horizontal baseline | |||||
| vertical baseline | ||||
for | |||||
for | |||||
| obtain vertical disparity between views | ||||
| obtain horizontal disparity | ||||
| vertical displacement by vertical disparity | ||||
| horizontal displacement by horizontal disparity | ||||
| if | check visibility by depth comparison and also check that displaced pixels are not out of bounds |
| && | ||||
| warp normalized disparity | ||||
| warp texture | ||||
| |||||
end |
| ||||
end |
| ||||
For each intermediate view, the set of reference view indices and the set of normalized disparity view indices have been designed at the encoder and are defined in Table E.2. For a given subaperture view, at most as many view warping operations are required as there are reference and normalized disparity views in these sets.
For intermediate depth view prediction, the decoder needs to warp the set of reference normalized disparity views to the intermediate view location to obtain the warped normalized disparity views.
For intermediate view prediction, the decoder needs to warp the set of reference texture views to the intermediate view location to obtain the warped texture views.
For each intermediate view, the set of reference view indices and the set of normalized disparity indices have been designed at the encoder and are defined in Table E.2.
Intermediate view prediction starts with the warping of each of the reference and normalized disparity views to the camera position of the intermediate view, as presented in Annex E.4.3. Due to occlusions these warped versions may exhibit artifacts such as holes. The warped reference views are merged together to mitigate the occlusion problem.
Three view merging modes are available. The view merging mode for each intermediate view is controlled by the merging mode variable of Table E.2. If the value is 0, the merging mode is least-squares optimal filter design based on occlusion classes[6]. If the value is 1, the view merging design is based on a fixed filter for each occlusion class, with merging weights derived from the relative distances between views. If the value is 2, the median is applied as the merging operator, and no occlusion classes are used. The modes {1, 2} are used in low-rate cases, where the bit budget does not allow for transmitting the parameters of the full least-squares design.
For normalized disparity views the merging mode is not signalled because the merging mode is always the median mode.
In this subclause, the occlusion state-based partitioning of the intermediate view pixels is described. This subclause applies to those intermediate views for which the merging mode equals 0 or 1.
For any warped view, due to occlusions, it is possible that not all pixel locations are assigned a value in the main loop of the warping algorithm of Table E.5. The occlusion state at a pixel in the current view is a binary vector with one element per reference view, each element stating whether the corresponding warped reference view has a defined value at that pixel. The term occlusion state refers to the combination of warped reference views which are available at a certain pixel location[5]. The occlusion states are stored in binary vectors, which have length equal to the number of reference views used at this particular intermediate view.
The number of reference views for the intermediate view is read from the Prediction Parameters codestream block, in Table E.2.
Consider an intermediate view with three reference views. For each warped reference view and its warped normalized disparity view, a selector is defined that equals 1 at pixel locations where the warped normalized disparity view has a defined value and 0 otherwise.
For three reference views there are 2³ = 8 occlusion classes; in general, for N reference views there are 2^N occlusion classes. The partitioning of pixels at the intermediate view based on the warped views is given in Table E.6. A 2D label array stores the label at each pixel location, thus providing a type of segmentation.
Table E.6 — Possible pixel labels at an intermediate view based on its warped reference views
Label | View 1 selector | View 2 selector | View 3 selector |
0 | 0 | 0 | 0 |
1 | 1 | 0 | 0 |
2 | 0 | 1 | 0 |
3 | 1 | 1 | 0 |
4 | 0 | 0 | 1 |
5 | 1 | 0 | 1 |
6 | 0 | 1 | 1 |
7 | 1 | 1 | 1 |
The partitioning process is sensitive to the order of the views, and it is important to obey the same order at the encoder and the decoder. The order shall be the row-wise scan order of the subscripts in the set of reference views.
A binary matrix is introduced whose rows consist of the binary representations of the occlusion-class labels; for three reference views its rows are the eight label rows of Table E.6.
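EXAMPLE The following C fragment is an informative sketch of the computation of the occlusion-class label of Table E.6 at one pixel location; the bit ordering follows the row-wise scan order of the reference set, and the function and parameter names are illustrative.

/* Computes the occlusion-state label at one pixel: bit k of the label
   is 1 when warped reference view k has a defined (non-occluded) value
   at that pixel. `defined[k]` is the availability selector for the k-th
   warped reference view; N reference views give 2^N possible labels. */
static int occlusion_label(const int *defined, int N) {
    int label = 0;
    for (int k = 0; k < N; k++)
        if (defined[k])
            label |= 1 << k;
    return label;
}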
In this subclause, the merging weight matrix used in the occlusion state based view merging is introduced. This subclause applies to those intermediate views for which the merging mode equals 0 or 1.
The merging weight matrix, or coefficient matrix, for a texture component has one row per occlusion class and one column per reference view, and contains the merging weights, or coefficients, used in obtaining the merged intermediate texture view as a linear combination of the multiple warped reference texture views.
In this subclause, the construction of the merging weight matrix is presented for the case of least-squares optimal view merging weights. This subclause applies to those intermediate views for which the merging mode equals 0.
The least-squares design of the merging weights of a colour component of an intermediate view is done at the encoder, and the obtained weights are signalled to the decoder as the 16-bit signed integers defined in Annex E.3.2. The de-quantized LS merging weights are obtained by de-quantizing these integers, with the result truncated to 16 fractional bits. The de-quantized values fill the columns of the merging weight matrix appended one after another, with elements corresponding to occlusion classes in which the view is unavailable being skipped.
In this subclause, the construction of the merging weight matrix is presented for the case of geometric distance-based view merging weights. This subclause applies to those intermediate views for which the merging mode equals 1.
When using the fixed-weight merging mode, the weight matrix is derived from a scaling parameter, which controls the influence of views based on their geometric distance in the camera array. The scaling parameter is obtained from the fixed-weight merging parameter signalled in Annex E.3.1.
When predicting the intermediate view, the weight applied to the pixels obtained from a texture reference view decreases with the geometric distance between that reference view and the intermediate view, as controlled by the scaling parameter. A matrix of such distance-based weights is defined with one row per occlusion class, containing entries only for the views available in that class. The sum of the elements on each row of this matrix is used as a normalization factor, and the final weight matrix, which is identical for all texture components, is obtained by normalizing each row to unit sum.
In this subclause, the texture view merging formula used in the linear combination based view merging modes is presented. This subclause applies to those intermediate views for which the merging mode equals 0 or 1.
For an intermediate view, the merged pixel value at a location for a component is obtained as the linear combination of the warped reference view values at that location, weighted by the row of the merging weight matrix selected by the occlusion label at that location.
In this subclause, the texture view merging formula used in the median operator based view merging mode is presented. This subclause applies to those intermediate views for which the merging mode equals 2.
For an intermediate view, the merged pixel value at a location for a component is the median of the values of the warped reference views that are defined at that location.
In this subclause, the normalized disparity view merging formula is presented. This subclause applies to all intermediate views.
For an intermediate view, the merged normalized disparity value at a location is the median of the values of the warped normalized disparity views that are defined at that location.
This subclause describes the inpainting method used for filling missing normalized disparity and texture values found in the intermediate views.
During the warping process at an intermediate view, occlusions prevent some pixel locations from acquiring a value. For missing pixels an in-painting process is used.
First, the locations of undefined pixels are extracted as one set for normalized disparity views and one set for texture views. Then, at each location in these sets, a 3×3 neighbourhood is considered and the median of its defined values is calculated. The median value is assigned to the undefined location. Since this action is performed sequentially over all undefined pixels, all undefined pixels are ensured to be filled. The median operation is illustrated by an example in the NOTE to Figure E.7, where the missing centre value (illustrated in black) is filled by the median of the defined values (illustrated in white) of the 3×3 neighbourhood.
NOTE The value picked for in-painting is the median of the defined (non-occluded) neighbours (white squares).
Figure E.7 — Hole filling applied to the undefined value (black square) in the centre of the 3×3 neighbourhood defined by the dashed line
The inpainting algorithm is applied to both texture and normalized disparity views. For texture views the algorithm processes each texture component independently.
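EXAMPLE The following C fragment is an informative sketch of the 3×3 median hole filling; the sentinel value for undefined pixels and the even-count median convention are illustrative assumptions.

/* 3x3 median hole filling: for an undefined pixel, collect the defined
   values in its 3x3 neighbourhood and assign their median. Pixels are
   processed sequentially in raster order, so previously filled values
   can serve as neighbours for later holes. UNDEF marks a hole. */
#define UNDEF (-1.0f)

static float median_of(float *v, int n) {   /* insertion sort, n <= 8 */
    for (int i = 1; i < n; i++)
        for (int j = i; j > 0 && v[j - 1] > v[j]; j--) {
            float t = v[j]; v[j] = v[j - 1]; v[j - 1] = t;
        }
    return (n % 2) ? v[n / 2] : 0.5f * (v[n / 2 - 1] + v[n / 2]);
}

static void inpaint(float *img, int V, int U) {
    for (int v = 0; v < V; v++)
        for (int u = 0; u < U; u++) {
            if (img[v * U + u] != UNDEF) continue;
            float nb[8]; int n = 0;
            for (int dv = -1; dv <= 1; dv++)
                for (int du = -1; du <= 1; du++) {
                    int vv = v + dv, uu = u + du;
                    if ((dv || du) && vv >= 0 && vv < V && uu >= 0 && uu < U &&
                        img[vv * U + uu] != UNDEF)
                        nb[n++] = img[vv * U + uu];
                }
            if (n > 0) img[v * U + u] = median_of(nb, n);  /* fill the hole */
        }
}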
Consider an intermediate view. The normalized disparity view obtained from warping and merging the set of reference normalized disparity views is the merged normalized disparity view, see Annex E.8. The set of pixel locations containing undefined pixels is extracted from it. The neighbourhood at a pixel location is defined as the set of indices of the 3×3 window centred at that location.
The inpainted normalized disparity view is obtained by assigning to each undefined location of the merged normalized disparity view (Annex E.8) the median of the defined values within its neighbourhood.
Similarly, the inpainted texture view is obtained by assigning to each undefined location of the merged texture view (Annexes E.6 and E.7) the median of the defined values within its neighbourhood, for each texture component independently.
This subclause describes the filtering step used as the final stage in view prediction for any intermediate view which has the sparse filter enabled. For texture views the algorithm processes each texture component independently. In the following subclauses, the derivations for a single component are given.
Sparse filtering is used to perform the final adjustment of the component of the merged and in-painted intermediate view. The filter is controlled by two parameters: the neighbourhood size N and the filter order. The neighbourhood size controls the largest possible size of the filter, while the filter order controls the largest number of non-zero filter coefficients used. Prior to the filtering operation, each colour component of the decoded image is padded to dimensions V + 2N and U + 2N, by replicating the values in the first and last row and column N times. The resulting padded colour components are stored separately. The dimensions involved in the padding are illustrated in Figure E.8.
NOTE Padding is done by replicating the values in the first and last row and column N times.
Figure E.8 — Padding of the image of dimensions V×U into the padded version of dimensions (V + 2N)×(U + 2N)
At a pixel location (v, u), the regressor template for the sparse filter is defined as the (2N+1)×(2N+1) neighbourhood centred at that location, i.e. the vector of the (2N+1)² coordinate pairs of the neighbourhood.
The non-zero filter coefficient locations tell which elements of the regressor template are selected to be non-zero in the final filter weights; their number equals the sparse filter order. A location index one past the template size indicates the use of the bias (or intercept) term in the filter.
The non-zero coefficient locations are obtained as the non-zero bit locations of the binary regressor mask vector defined in Annex E.3.1. Each non-zero bit location is stored in a location vector and sorted increasingly. The decoding operation for the locations of the non-zero filter coefficients is illustrated in Figure E.9.
NOTE Each non-zero bit location in the mask indicates the location of a non-zero filter coefficient in the template. In this example, the largest non-zero filter coefficient location is one past the template size, indicating that the bias term is also used in filtering. The binary mask vector is signalled as bytes, and if its length is not a multiple of 8, the remaining bits of the last byte are put to zero (zero padding to stuff the last byte).
Figure E.9 — Example of decoding the locations of the non-zero filter coefficients from the binary mask vector into the sorted location vector
For locations that correspond to regressors within the template, de-quantization is applied to the quantized filter coefficients. For the location selecting the bias term, de-quantization is skipped, since the bias term is integer-valued. In both cases the coefficients are truncated to 20 fractional bits and stored in a real-valued coefficient vector.
The filtered pixel value at a pixel location is obtained as the sum of the non-zero filter coefficients multiplied by the corresponding regressor values taken from the padded component; when the bias term is in use, it is added to the sum. The filtering result is cropped to the valid sample range and stored as the filtered component.
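EXAMPLE The following C fragment is an informative sketch of the application of the sparse filter at one padded pixel location; the row-major ordering of the template and all names are illustrative assumptions.

/* Applies the sparse filter at one location (v, u) of the padded
   component Ipad (row stride strideU). loc[i] is the 1-based index into
   the (2N+1)x(2N+1) template of the i-th non-zero coefficient coef[i];
   an index one past the template size selects the bias term. */
static double sparse_filter_pixel(const double *Ipad, int strideU,
                                  int v, int u, int N,
                                  const int *loc, const double *coef, int Nc) {
    int side = 2 * N + 1, tsize = side * side;
    double acc = 0.0;
    for (int i = 0; i < Nc; i++) {
        if (loc[i] == tsize + 1) {
            acc += coef[i];                   /* bias (intercept) term */
        } else {
            int k = loc[i] - 1;               /* template index -> offsets */
            int dv = k / side - N, du = k % side - N;
            acc += coef[i] * Ipad[(v + dv) * strideU + (u + du)];
        }
    }
    return acc;   /* caller crops to the valid sample range */
}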
Residual images are provided for intermediate views for which residual data is signalled, to improve image quality after prediction. Residual images are supported for texture views only. The residual part of the light field and the decoded residual part are denoted and indexed analogously to the texture views in Annex E.4.1.
Residual images contain the error which remains after view prediction, i.e. the difference between the original texture view and the predicted texture view.
For a given residual texture image, the encoding of residual texture is performed as in Table E.7.
Table E.7 — Procedure for encoding view prediction residual using RCODEC
1 | Denote the view prediction residual as the difference between the original and predicted texture views |
2 | Level shift the residual to the non-negative range |
3 | Encode with the RCODEC encoder |
4 | Repeat steps 1 to 3 until all NRES residual views have been encoded |
5 | Extract and remove the common codestream element (RHEADER) |
6 | Output all stripped codestreams (RDATA) to the Contiguous Codestream box |
7 | Output the common codestream element (RHEADER) as the payload of the Common Codestream Elements box |
Step 5 extracts the header information from the encoded file. The header information is an array of bytes (RHEADER), and subsequently these bytes are removed from the codestream. The result is a codestream that is not fully decodable with the RCODEC decoder. The header information array RHEADER needs to be concatenated back prior to decoding. The decoder can obtain the header information from the Common Codestream Elements box, append it to the beginning of the stripped codestream (EXTRDATA), and decode successfully. This stripping of header information saves bytes and makes the encoding less redundant, since only a single unique header is required.
When obtaining the common codestream element, the encoder must use the correct markers for identifying the header section in the encoded codestream. In the case of RCODEC = 7, the deployed codec is JPEG 2000, and the marker identifying the last two bytes of the header is the SOT marker (0xFF90), i.e. the start-of-tile marker. For other RCODEC types, the marker will depend on the chosen codec. Note that the decoding process of the common codestream element is transparent to the codec type. Hence, no normative constraints need to be imposed on the exact cutting point of the RCODEC codestream.
The steps for decoding the JPEG 2000 encoded view prediction residual are given in Table E.8.
Table E.8 — Procedure for decoding view prediction residual using RCODEC
1 | Obtain the header payload from the Common Codestream Elements box, and store the payload as an array of bytes (RHEADER) |
2 | Obtain the residual view data from the Contiguous Codestream box, pointed to by the corresponding pointer, and store it as RDATA |
3 | Concatenate RHEADER and RDATA as [RHEADER RDATA] |
4 | Decode with the RCODEC decoder |
5 | De-quantize by multiplying with 2 and level-shift back to the signed range |
6 | Store the result as the decoded residual view |
7 | Repeat from step 2 until all residual views have been decoded. |
For an intermediate view, the texture view prediction steps include:
1) View warping from the reference view set, see Annex E.4.
2) View merging of the warped texture reference views, see Annexes E.6 and E.7, resulting in the merged texture view.
3) Inpainting of the merged texture view, see Annex E.9, resulting in the in-painted texture view.
4) If the sparse filter is disabled, the in-painted texture view is used as the predicted texture view.
5) If the sparse filter is enabled, the sparse filter is applied to the in-painted texture view, see Annex E.10, and the result is used as the predicted texture view.
If residual data is signalled, the view prediction residual is decoded, see Annex E.11.3, and applied to the predicted texture view by adding it sample-wise.
For any intermediate view, a set of normalized disparity reference views is defined.
The warped normalized disparity reference views are obtained by warping each view of this set to the intermediate view location.
For normalized disparity view prediction at any intermediate view, the following steps are taken:
1) Warp the set of reference normalized disparity views to the intermediate view location, obtaining the warped normalized disparity views.
2) Merge the warped normalized disparity views into a single image, see Annex E.8.
3) Inpaint the merged normalized disparity view, see Annex E.9.2.
4) Store the result as the predicted normalized disparity view.
This clause provides details on how the decoder predicts the intermediate texture views from the texture reference views.
For any intermediate view, a set of texture reference views is defined. The warped texture reference views are obtained by warping each view of this set to the intermediate view location.
For an intermediate view, the texture view prediction steps include:
1) View warping from the reference view set, see Annex E.4.
2) View merging of the warped reference views, see Annexes E.6 and E.7, resulting in the merged texture view.
3) Inpainting of the merged texture view, see Annex E.9, resulting in the in-painted texture view.
4) If the sparse filter is not applied, the in-painted texture view is used as the predicted texture view.
5) If the sparse filter is applied, the sparse filter is applied to the in-painted texture view, see Annex E.10, and the result is used as the predicted texture view.
6) If present, the view prediction residual is decoded, see Annex E.11.3.
7) The residual is applied to the predicted view, see Annex E.11.4.
8) The result is cropped to the valid sample range and stored as the decoded intermediate texture view.
The intermediate texture view prediction reflects the prediction performed at the encoder, with the difference that the encoder additionally designs the prediction parameters (LS merging weights, fixed merging weights, and the sparse filter) and decides the view merging mode.
In this subclause, the hierarchical encoding, which enables random access capabilities and determines the encoding quality, is described.
The angular random-access capability can be defined as the number of views that have to be decoded before any single view can be decoded. A flexible setting of random view access is obtained by organizing the encoding of the views according to hierarchical levels. The first hierarchy is formed by the central view (or any other preferred view), as in Figure E.10. The second hierarchy contains all views that are decoded using only the central view. The third hierarchy has the views that can be decoded by first decoding views from the preceding two hierarchies. The overall number of hierarchies is set according to the requirements.
For low bit rates, a small number of references on a small number of hierarchical levels is sufficient for encoding the light field. For example, at the lowest bitrate there is a single reference, typically the central view, with all other views being synthesized from this one view. At the highest bitrate there may be 6 levels of hierarchy, with about half of the views being references (specifically those in the lower hierarchical levels), while the highest level contains only views that are not used as references for any other view. Figure E.10 illustrates the view hierarchy configuration for an encoding of a light field captured with a plenoptic camera.
If the best encoding efficiency is targeted, the number of hierarchical levels is unconstrained and should be larger when the bitrate is higher. If random access is targeted, one could use, for example, a single reference level with a single reference view (e.g. the centre view) at all bitrates.
Key
— decoded views
— reference view used in inter-view prediction
Figure E.10 — Hierarchical structure of the subaperture views
In Figure E.10, the centre view works as a reference for 8 views in the next hierarchical level. Rate allocation between hierarchical levels varies; usually more rate is allocated to the lower hierarchical levels, since prediction performance increases towards the higher hierarchical levels.
This Annex first describes, for informative purposes, an instantiation of the Slanted 4D Transform mode encoder [8]. Next, the codestream syntax is specified and, subsequently, the light field decoding process is detailed.
The necessity of such a geometric transformation can be understood by analysing the reasons for the poor RD performance of the 4D-transform mode for light fields with wide baselines. To do so, the light field Epipolar Plane Images (EPIs) are first defined [1-8]. EPIs are 2D spatial-angular slices of the light field: if the light field is a matrix of views indexed by s (horizontal) and t (vertical), with each view being an image with coordinates u (horizontal) and v (vertical), then the EPI associated to coordinate (v0, t0) consists of the stacked horizontal scan lines of vertical coordinate v0 taken from all views with vertical view index t0. Conversely, the EPI associated to coordinate (u0, s0) consists of the stacked vertical scan lines of horizontal coordinate u0 taken from all views with horizontal view index s0.
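As an informative illustration of this definition, the following C sketch extracts the s × u EPI at (v0, t0) from a light field stored as a flat array; the array layout and all names are assumptions of this example.

#include <stddef.h>

/* Informative sketch: extract the s-u EPI at fixed (v0, t0) from a light
 * field LF stored as a flat T x S x V x U array of samples. epi then holds
 * the S stacked horizontal scan lines, one per view of the row t = t0. */
void extract_epi_su(const int *LF, int T, int S, int V, int U,
                    int t0, int v0, int *epi /* S x U samples */)
{
    (void)T; /* T only documents the extent of the first dimension */
    for (int s = 0; s < S; s++)
        for (int u = 0; u < U; u++)
            epi[s * U + u] = LF[(((size_t)t0 * S + s) * V + v0) * U + u];
}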
For a light field of a Lambertian scene, an EPI is composed of diagonal lines of samples with constant intensity, where each sample corresponds to the projection of a single 3D scene point. It is then possible to derive a depth-from-slope relationship which associates to each EPI slope a depth for the corresponding scene point. Therefore, if a 4D block images a scene of constant depth, its EPIs consist of texture regions composed of straight lines with a fixed inclination, as depicted in Figure F.1 (a). For light fields with narrow baselines, such as lenslet light fields, the EPIs of the 4D blocks tend to be as in Figure F.1 (b).
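Under the common pinhole, parallel-camera model, this depth-from-slope relationship can be written as follows (an informative sketch, up to sign conventions; the symbols f for focal length, \Delta for the inter-view baseline and z for the scene-point depth are notation assumed for this example only):

\frac{\delta u}{\delta s} \;=\; \sigma \;=\; \frac{f\,\Delta}{z},
\qquad\text{equivalently}\qquad
z \;=\; \frac{f\,\Delta}{\sigma}.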
Figure F.1—(a) EPI consisting of texture regions composed of straight lines with a fixed inclination; (b) EPI of a lenslet light field with a narrow baseline.
When applying a 2D-DCT to an EPI as in Figure F.1 (b), there is good energy compaction, since there is a large amount of redundancy in the vertical direction. On the other hand, if the 2D-DCT is applied to the EPI of a wider baseline 4D block, such as the one in Figure F.1 (a), the energy compaction will be much smaller. In a nutshell, the Slanted 4D transform mode applies a geometric transformation to the 4D block such that its EPIs become as in Figure F.2 prior to the 2D-DCT. Since this EPI has, like the EPIs of the lenslet light fields in Figure F.1 (b), a large amount of redundancy in the vertical direction, the 2D-DCT has very good energy compaction properties. Such a transformation as applied to an EPI is defined in Figure F.3 and is referred to as a Slant Transformation. The slope that is vertically aligned by the associated Slant Transformation is referred to as the slant of the transformation. The left-hand side of Figure F.3 shows two slopes that are slanted on the right-hand side of Figure F.3.
The images of 3D-space points with depth z on the s × u EPI, that are straight lines with inclinations δu/δs = σ = tan β, are shown in the original light field in the left-hand side of Figure F.3. These straight lines are mapped by the 2D Slant transform to straight lines on the s × u EPI of the slanted light field that are orthogonal to the u axis, as shown in the right-hand side of Figure F.3. This implies that a rectangular region on the original s × u EPI is mapped by the 2D Slant transform into a parallelogram whose side is also inclined by β, but in the opposite direction of the straight line in the original EPI.
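In matrix form, the 2D Slant transform acting on continuous EPI coordinates can be sketched as follows (informative, up to sign conventions; the normative sample-index mapping is the one given in Table F.9):

\begin{pmatrix} s' \\ u' \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ -\sigma & 1 \end{pmatrix}
\begin{pmatrix} s \\ u \end{pmatrix},
\qquad\text{so that lines } u = u_0 + \sigma s \text{ map to } u' = u_0 .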
Figure F.2—Resulting EPI after applying a geometric transformation.
Figure F.3—Slant Transformation.
A 4D Slant Transformation can be applied to a 4D block in a separable way, that is, one applies a 2D Slant Transformation to all (s, u) EPIs of a 4D block and then to all (t, v) EPIs of the resulting 4D block.
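An informative sketch of one such 2D Slant, as it would be applied to each EPI in turn, is given below. It handles integer slants only; EMPTY_VALUE and all other names are assumptions of this example, and the normative index mapping is that of Table F.9.

#include <stdlib.h>

#define EMPTY_VALUE (-1)  /* assumed sentinel for positions with no sample */

/* Informative sketch: integer 2D Slant (shear) of one S x U EPI, aligning
 * lines of slope -slant with the s axis. The output has width
 * U + |slant|*(S-1); positions receiving no source sample stay empty.
 * Applying this to all (s, u) EPIs and then to all (t, v) EPIs of a 4D
 * block realizes the separable 4D Slant described above. */
void shear_epi(const int *epi, int S, int U, int slant,
               int *out /* S x (U + |slant|*(S-1)) samples */)
{
    int outU = U + abs(slant) * (S - 1);
    for (int s = 0; s < S; s++) {
        int shift = slant * s;                     /* per-row displacement */
        if (slant < 0) shift -= slant * (S - 1);   /* keep indices non-negative */
        for (int u = 0; u < outU; u++)
            out[s * outU + u] = EMPTY_VALUE;       /* initialize row as empty */
        for (int u = 0; u < U; u++)
            out[s * outU + u + shift] = epi[s * U + u];
    }
}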
A difficulty with this approach is that, for the EPIs of the Slant Transformation of the 4D block to be as the one in Figure F.1 (b), for which the 2D-DCT has excellent energy compaction properties, all the EPI lines should have similar slopes, which would imply that all corresponding 3D-space points have the same depth. However, in complex scenes this is usually not true. An example can be seen in Figure F.5 (a), (b) and (c). There, one can notice that the EPI associated with the line highlighted in red in the central view of the light field has many different slopes in Figure F.5 (b). This is because this scene has multiple objects at various depths. Figure F.5 (c) shows a Slant Transformation of this EPI. No single Slant Transformation can produce a large amount of redundancy along the direction associated with the separable 4D-DCT (the vertical direction in Figure F.5), as required for the 4D-DCT to have excellent energy compaction properties. A way to circumvent this problem is to partition the EPI into variable-sized blocks so that the slopes of the lines in each EPI block are similar. In this case, a Slant Transformation with a different slant should be applied to each block, such that all transformed EPI blocks have a large amount of redundancy along the vertical direction. Such Slant Transformations are the essence of the Slanted 4D Transform Mode. Its high-level coding architecture is described in Section F.2.1. The Slant Transformation is performed by the Slant Transform module, and the determination of the optimum block partition and the optimum slant associated with each resulting 4D block is performed by the Hierarchical Slant Tree Optimization module.
Figure F.4 shows the overall architecture of the proposed Slanted 4D TM codec. The changes to the 4D Transform Mode codec (Annex B) are highlighted by the blocks with grey background while the blocks with white background are the same as in Figure B.1 since this coding mode is an extension of the 4D Transform Mode, specified in Annex B.
In the 4D transform mode, the light field is encoded with a four-step process, represented in Figure F.4 by the white background blocks. First, the 4D light field data is divided into fixed-sized 4D blocks that are independently encoded according to a predefined and fixed scanning order. These blocks can be further partitioned into a set of non-overlapping 4D sub-blocks, where the optimal partitioning parameters are derived based on a rate-distortion (R-D) criterion. Each sub-block is independently transformed by a variable block-size 4D-DCT. Subsequently, the transformed blocks are quantized and entropy-coded using hexadeca-tree bit plane decomposition and adaptive arithmetic encoding, producing a compressed representation of the light field.[2] This coding procedure is applied to each colour component independently. A detailed specification of the blocks with a white background can be found in Annex B.
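For reference, the separable 4D-DCT applied in this process has the usual DCT-II form, sketched below in standard notation (informative; the normative definition of the transform is given in Annex B):

X(k_t,k_s,k_v,k_u) \;=\; \alpha(k_t)\,\alpha(k_s)\,\alpha(k_v)\,\alpha(k_u)
\sum_{t=0}^{t_b-1}\sum_{s=0}^{s_b-1}\sum_{v=0}^{v_b-1}\sum_{u=0}^{u_b-1}
x(t,s,v,u)\,
\cos\frac{\pi(2t+1)k_t}{2t_b}\,
\cos\frac{\pi(2s+1)k_s}{2s_b}\,
\cos\frac{\pi(2v+1)k_v}{2v_b}\,
\cos\frac{\pi(2u+1)k_u}{2u_b}

where \alpha(0) = \sqrt{1/N} and \alpha(k) = \sqrt{2/N} for k > 0, with N the corresponding block dimension.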
The specifications of the blocks with a grey background are given in the sequel.
Figure F.4—Slanted 4D-Transform mode encoding architecture.
F.3.1.1 Delimiting markers and marker segments
F.3.1.1.1 Light field configuration (LFC)
In addition to the uncompressed light field parameters specified in Section B.3.2.6.3, the Slanted 4D-Transform mode introduces the following parameter:
— BLRatio: This parameter specifies the ratio between the horizontal and vertical view array baselines and is stored as a 32-bit big-endian unsigned integer. The accepted range is 0.125 ≤ BLRatio ≤ 8, with a precision of up to 6 decimal places. The decimal BLRatio value is multiplied by 2^28 prior to being converted to a 32-bit big-endian unsigned integer, as sketched below.
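The fixed-point mapping above can be sketched in C as follows (informative; the function names are illustrative and not part of this document):

#include <stdint.h>
#include <math.h>

/* Informative sketch of the BLRatio fixed-point mapping: the decimal ratio
 * is scaled by 2^28 and stored as a 32-bit unsigned integer (serialized in
 * big-endian byte order in the codestream). */
uint32_t blratio_encode(double ratio)          /* 0.125 <= ratio <= 8 */
{
    return (uint32_t)llround(ratio * 268435456.0);   /* 2^28 */
}

double blratio_decode(uint32_t coded)
{
    return (double)coded / 268435456.0;              /* TO_FLOAT(BLRatio)/2^28 */
}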
The Slant Transform module performs the geometric transformation for each 4D sub-block such that the Slanted 4D block at its output maximizes the 4D-DCT compression efficiency. This module has the objective of geometrically transforming a 4D block of a light field by performing the proposed Slant operation. Figure F.5 (b) shows an example of the s × u EPI of the input Light Field at line 644 (highlighted in red in Figure F.5 (a)), and Figure F.5 (c) shows the resulting EPI after the Slant transform is applied.
Figure F.5— (a) Laboratory1 light field top view, (b) s × u EPI at line 644 (highlighted in red in (a)), and (c) slanted EPI, with stretched s dimension for better visualization.
In the Slant Tree Optimization module (Section F.2.5 and Table F.5), a 4D block is recursively partitioned and slanted. That is, given a 4D block to code, it is first slanted, then the resulting slanted 4D block is split and each resulting 4D sub-block is recursively slanted and split or not, always following an RD criterion. This process is guided by a Slant tree, with the iterative search performed so that the tree leaves contain slanted 4D blocks such that the 4D-DCT applied to their padded versions (see Section F.3.4 and Table F.11) results in better compression performance. This implies that the final slanted 4D sub-blocks at the leaves of this tree are the result of cascades of 4D block splits and Slant transforms.
During the Slant tree optimization process, based on successive slanting and partitioning, it is possible that a 4D block is slanted in one direction at a specific recursion level, and, at the next recursion level, some of the 4D sub-blocks resulting from block partitioning are further slanted in opposite directions. This might lead to a hyper-rectangular region where all samples are empty. In this case, these samples should be discarded (trimmed). The trimming process applied to an EPI is illustrated in Figure F.6.
Figure F.6— Slant-split-trim sequence. (a) Original 2D EPI of a 4D block. (b) 2D EPI block in (a) after application of a Slant transform with slope parameter σ. (c) Slanted EPI block in (b) after splitting. (d) Slant transforms of the blocks in (c) with slope parameters σ’ and σ’’, σ ≠ σ′. The samples to be trimmed are on the right-hand side of (d). Note that in this example there is no need for trimming on the left-hand side. (e) EPI blocks after the trimming operation. Note that, after trimming, the non-empty samples define trapezoids.
Section F.2.2 presented the adopted 4D Slant transform, which has the goal of maximizing the 4D-DCT compression efficiency for a given 4D block. The Slant transform of an input tb ⨉ sb ⨉ vb ⨉ ub 4D block is a sheared hyper-parallelepiped. This 4D sheared hyper-parallelepiped is then recursively partitioned, slanted, and trimmed, as detailed in Sections F.2.2, F.2.3 and F.2.5, generating non-empty samples that are finally organized as 4D hyper-trapezoids.
The padding module is needed because the 4D-DCT must be applied to hyper-rectangular 4D blocks and not to any other shape. However, if a Slant transform is applied to a 4D block, it is sheared into a 4D hyper-parallelepiped that cannot be directly processed by the 4D-DCT. In addition, the trimming procedure may generate 4D hyper-trapezoids. In this context, the sheared 4D hyper-parallelepipeds or 4D hyper-trapezoids must be padded along the t, s, v and u directions, so that they become 4D hyper-rectangles.
The goal of the Slant Tree Optimization module is to recursively generate a block partition tree with associated Slant transforms for its nodes that generate the optimal 4D-DCT compression efficiency. The Hierarchical Slant Tree Optimization module architecture is depicted in Figure F.7 and works as follows:
Figure F.7—Slant Tree Optimization module architecture
To maximize the compression efficiency of the 4D-DCT applied after padding, each hyper-rectangular region of the light field may have a different Slant transform. Therefore, the tb ⨉ sb ⨉ vb ⨉ ub 4D blocks output by the regular 4D block partitioning module may be further partitioned so that after each individual 4D block has been subjected to its optimal Slant transform, trimmed, and padded, the set of slanted 4D blocks reaches optimized 4D-DCT-based compression efficiency. Thus, the Hierarchical Slant tree optimization module has the objective of finding, recursively, the optimal 4D block partitioning and the associated optimal Slant transform parameterization for each resulting 4D sub-block. The Hierarchical Slant Tree Optimization is described in detail in Table F.3.
Finally, the slanted, trimmed, and padded 4D blocks are encoded by the standard JPEG Pleno Light Field 4D TM (Annex B) with characteristics that maximize the final RD performance.
F.3.6 Sample encoding procedure
In this subclause, a sample encoding algorithm is provided for informative purposes. The main procedure processing the individual 4D blocks is listed in Table F.1.
Table F.1 — 4D block scan procedure for Slanted 4D-Transform mode
LightField() { |
|
SOC_marker() | Writes SOC marker |
LFC_marker_Sl() | Writes LFC marker, including the parameter BLRatio defined in section F.3.1.1.1 |
Write SCC_marker() for every colour component not having a global scaling factor equal to 1. | Writes the SCC marker for each colour component whose global scaling factor is different from 1 |
PNT_marker() | Writes PNT marker |
for(t=0; t<T; t+=BLOCK-SIZE_t ){// scan order on t | Scan order on t (T defined in Light Field Header box) |
for(s=0; s<S; s+= BLOCK-SIZE_s){ // scan order on s | Scan order on s (S defined in Light Field Header box) |
for(v=0; v<V; v+= BLOCK-SIZE_v){ // scan order on v | Scan order on v (V defined in Light Field Header box) |
for(u=0; u<U; u+= BLOCK-SIZE_u){ // scan order on u | Scan order on u (U defined in Light Field Header box) |
for(c=0; c<NC; c++){ // scan order on colour components | Scan order on c (colour component) |
SOB_marker(); | Writes SOB marker |
if ( (TRNC) && ((T – BLOCK-SIZE_t) < t < T) ) { tk = T umod BLOCK-SIZE_t; } else tk = BLOCK-SIZE_t; | Block size computation |
if ( (TRNC) && ((S – BLOCK-SIZE_s) < s < S) ) { sk = S umod BLOCK-SIZE_s; } else sk = BLOCK-SIZE_s; | Block size computation |
if ( (TRNC) && ((V – BLOCK-SIZE_v) < v < V) ) { vk = V umod BLOCK-SIZE_v; } else vk = BLOCK-SIZE_v; | Block size computation |
if ( (TRNC) && ((U – BLOCK-SIZE_u) < u < U) ) { uk = U umod BLOCK-SIZE_u; } else uk = BLOCK-SIZE_u; | Block size computation |
Padding(LF.BlockAtPosition(t, s, v, u)); | Fills the pixels outside the light field if needed. If TRNC ==1, there will be no pixel outside the light field, and Padding (Table B.2) will have no effect. Note that the 4D array LF.BlockAtPosition is a local copy of the currently processed 4D block (for colour component c). |
InitEncoder() | Initializes the arithmetic coder for each coded codestream (defined in Table B.13) |
EncodeSl(LF.BlockAtPosition(t, s, v, u), lambda) | Defined in Table F.2 |
} // end of scan order on colour components loop |
|
} |
|
} |
|
} |
|
} |
|
Table F.2 —4D block encoding procedure for Slanted 4D-Transform mode
Procedure EncodeSl(block, lambda) { |
|
OptimizeSlantTree(block, lambda, slantTree); | Defined in Table F.3 |
EncodeSlantTree(block, lambda, slantTree); | Defined in Table F.4 |
FlushEncoder(); | Defined in Table B.14 |
} |
|
Table F.3 — Hierarchical slant tree optimization procedure
Procedure OptimizeSlantTree(block, lambda, slantTree) { | Finds the best slant tree (see Figure F.4 and Figure F.7). The 4D block is received from the regular 4D block partitioning module (Figure F.4 and Table B.1) |
[J0, bestSlant, fail] = FindBestSlant(block, lambda); | Finds the best slant for the 4D block (Table F.5). |
slantedBlock = Slant(block, bestSlant); | Applies the Slant transform to the 4D block (Section F.3.2 and Table F.9). |
trimmedBlock = Trim(slantedBlock); | Applies the trimming to the 4D block (Section F.2.3 and Table F.10). |
SlantSubTree.empty; | The slant tree is a quadtree data structure with an integer field for the slant and four pointers for the four child nodes. If all pointers are NULL, then the node is a leaf node. |
if( (tb,sb,vb,ub) <= (ts,ss,vs,us)){ | If the block size is smaller than or equal to the minimum slant block size… |
slantTree.AppendLeafNode(bestSlant); | … a leaf node is appended to the tree. |
return(J0); |
|
} |
|
v'b = floor(vb/2); |
|
u'b = floor(ub/2); |
|
JS = OptimizeSlantTree(trimmedBlock.GetSubblock( 0,0,0,0,tb,sb,v'b,u'b), lambda, topLeftSlantSubTree); | The procedure is recursively called for the four child sub-blocks: top-left, top-right, bottom-left and bottom-right. |
JS += OptimizeSlantTree(trimmedBlock.GetSubblock( 0,0,0,u’b,tb,sb,v'b,ub-u'b), lambda, topRightSlantSubTree); | |
JS += OptimizeSlantTree(trimmedBlock.GetSubblock( 0,0,v’b,0,tb,sb,vb-v'b, u’b), lambda, bottomLeftSlantSubTree); | |
JS += OptimizeSlantTree(trimmedBlock.GetSubblock( 0,0,v’b,u’b,tb,sb,vb-v'b, ub-u’b), lambda, bottomRightSlantSubTree); | |
if(J0 < JS) { | If the cost of splitting the block is greater than the cost of current block… |
slantTree.AppendLeafNode(bestSlant); | … a leaf node is appended to the tree… |
return(J0); | …and the cost of current block is returned. |
} |
|
slantTree.AppendSubTreeAsTopLeftChild(bestSlant, topLeftSlantSubTree); | If the cost of splitting the block is less than the cost of current block, the split operation is signaled in the tree by appending the four child blocks. |
slantTree.AppendSubTreeAsTopRightChild(bestSlant, topRightSlantSubTree); | |
slantTree.AppendSubTreeAsBottomLeftChild(bestSlant, bottomLeftSlantSubTree); | |
slantTree.AppendSubTreeAsBottomRightChild(bestSlant, bottomRightSlantSubTree); | |
return(JS); |
|
} |
|
Table F.4 — Slant tree encoding procedure
Procedure EncodeSlantTree(block, lambda, slantTree) { |
|
bestSlant = slantTree.currentNode.slant; |
|
EncodeInteger(bestSlant+slantMax, numberOfSlantBits); |
|
slantedBlock = Slant(block, bestSlant); | Applies the Slant transform to the 4D block (Section F.3.2 and Table F.9). |
trimmedBlock = Trim(slantedBlock); | Applies the trimming to the 4D block (Section F.2.3 and Table F.10). |
if(slantTree.currentNode.isLeaf == true) { |
|
Encode(isLeafNodeFlag); |
|
paddedBlock = Padd(trimmedBlock); | Applies the padding procedure to the 4D block (Section F.3.4 and Table F.11). |
OptimizePartition(paddedBlock, lambda); | Finds the best partition for the 4D block (Section B.2.2 and Table B.5). |
EncodePartition(paddedBlock, lambda); | Encodes the 4D block (Section B.2.4 and Table B.6). |
return; |
|
} |
|
Encode(isInternalNodeFlag); |
|
v'b = floor(vb/2); |
|
u'b = floor(ub/2); |
|
EncodeSlantTree(trimmedBlock.GetSubblock(0,0,0,0,tb,sb,v'b,u'b), lambda, slantTree.topLeftChildSubTree); | If the current node is not a leaf node, the procedure is recursively called for the four child sub-blocks: top-left, top-right, bottom-left and bottom-right. |
EncodeSlantTree(trimmedBlock.GetSubblock(0,0,0,u'b,tb,sb,v'b,ub-u'b), lambda, slantTree.topRightChildSubTree); | |
EncodeSlantTree(trimmedBlock.GetSubblock(0,0,v'b,0,tb,sb,vb-v'b,u'b), lambda, slantTree.bottomLeftChildSubTree); | |
EncodeSlantTree(trimmedBlock.GetSubblock(0,0,v'b,u'b,tb,sb,vb-v'b,ub-u'b), lambda, slantTree.bottomRightChildSubTree); | |
return; |
|
} |
|
Table F.5 — Find Optimum slant procedure
Procedure FindBestSlant(block, lambda) { | Finds the optimal slant (Figure F.7) |
optimumJ = infinity; |
|
optimumSlant = 0; |
|
valid = 0; |
|
for(slant = minSlant; slant < maxSlant; slant++) { | Iterates through the entire range of possible slant values. |
slantedBlock = Slant(block, slant); | Applies the Slant transform to the 4D block (Section F.3.2 and Table F.9). |
trimmedBlock = Trim(slantedBlock); | Applies the trimming to the 4D block (Section F.2.3 and Table F.10). |
[paddedBlock, fail] = Padd(trimmedBlock); | Applies the padding procedure to the 4D block (Section F.3.4 and Table F.11). |
If (!fail){ |
|
valid = 1; | Indicates that at least one Slant transformation tested resulted in a valid padded 4D block. |
transPaddBlock = 4DDCT(paddedBlock); | Applies 4D-DCT (Annex B.2.3.2) |
J = EvaluateCost(transPaddBlock, lambda, trimmedBlock.numberOfNonEmptyPixels); | Evaluates the Lagrangian cost (Table F.6). |
if(J < optimumJ) { |
|
optimumJ = J; |
|
optimumSlant = slant; |
|
} |
|
} |
|
} |
|
return(optimumJ, optimumSlant, !valid); | Returns the optimal cost, the optimal slant and the failure indication, matching [J0, bestSlant, fail] in Table F.3 |
|
} |
|
Table F.6 — Evaluate Lagrangian cost procedure
Procedure EvaluateCost(block, lambda, numberOfNonEmptyPixels) { | Evaluates the Lagrangian cost of a 4D block. |
rate = EvaluateRate(block, 30); | Evaluates the rate of a 4D block (Table F.7). |
distortion = numberOfNonEmptyPixels*EvaluateDistortion(block)/block.numberOfPixels; | Evaluates the distortion of a 4D block (Table F.8). |
return(distortion + lambda*rate); |
|
} |
|
Table F.7 — Evaluate rate procedure
Procedure EvaluateRate(block, currentBitPlane) { | Evaluates the number of bits used to encode a 4D block using the given bitplane |
if(currentBitPlane < minimumBitPlane) { | If the current bitplane is lower than the minimum bitplane, no bits are used to encode the block |
return(0); |
|
} |
|
if(block.numberOfPixels == 1) { |
|
rate = currentBitPlane-minimumBitPlane+1; |
|
if(abs(block.pixel[0][0][0][0]) >> minimumBitPlane !=0) { |
|
rate++; |
|
} |
|
return(rate); |
|
} |
|
significance = 0; |
|
threshold = 1 << currentBitPlane; |
|
for(iv = 0; iv < vb; iv++) { | Scans the block for any sample whose magnitude contains a bit “1” in the current or in any superior bitplane. |
for(iu = 0; iu < ub; iu++) { |
|
for(it = 0; it < tb; it++) { |
|
for(is = 0; is < sb; is++) { |
|
if(abs(block.pixel[it][is][iv][iu]) >= threshold) {
|
|
significance = 1; |
|
goto search_end; |
|
} |
|
} |
|
} |
|
} |
|
search_end: |
|
if(significance == 0) { |
|
bits = EvaluateRate(block, currentBitPlane-1); | If no significant sample was found at the current bitplane, evaluates the rate at the next lower bitplane. |
return(bits); |
|
} |
|
v'b = floor(vb/2); |
|
u'b = floor(ub/2); |
|
bits = EvaluateRate(block.GetSubblock(0, 0, 0, 0, tb, sb, v'b, u'b), currentBitPlane); | Recursively evaluates the bitrate for the four sub-blocks: top-left, top-right, bottom-left and bottom-right |
bits += EvaluateRate(block.GetSubblock(0, 0, 0, u'b, tb, sb, v'b, ub-u'b), currentBitPlane); | |
bits += EvaluateRate(block.GetSubblock(0, 0, v'b, 0, tb, sb, vb-v'b, u'b), currentBitPlane); | |
bits += EvaluateRate(block.GetSubblock(0, 0, v'b, u'b, tb, sb, vb-v'b, ub-u'b), currentBitPlane); | |
return(bits); |
|
} |
|
Table F.8 — Evaluate distortion procedure
Procedure EvaluateDistortion(block) { | Evaluates the distortion of encoding a 4D block using the minimum bitplane |
distortion = 0; |
|
for(iv = 0; iv < vb; iv++) { |
|
for(iu = 0; iu < ub; iu++) { |
|
for(it = 0; it < tb; it++) { |
|
for(is = 0; is < sb; is++) {
|
magnitude = abs(block.pixel[it][is][iv][iu]); |
|
quantizedMagnitude = (magnitude >> minimumBitPlane) << minimumBitPlane; | Computes the quantized magnitude by setting to zero the bitplanes lower than the minimum |
if(quantizedMagnitude > 0){ |
|
quantizedMagnitude += (1 << minimumBitPlane) >> 1; | Adds half of the quantization step for mid-point reconstruction |
|
} |
|
distortion += pow(magnitude - quantizedMagnitude, 2); | The distortion is defined as the square of the difference between the original and quantized magnitudes |
} |
|
} |
|
} |
|
} |
|
return(distortion); |
|
} |
|
Table F.9 — Slant block procedure
Procedure Slant(block, slant) { | Applies the Slant transform to the 4D block (Section F.3.2) |
scaleSlantRatio = TO_FLOAT(BLRatio) / pow(2, 28); | Computes the scales to be used in the Slant transform from the ratio between the horizontal and vertical view array baselines (Section B.3.2.6.3). |
scaleSlantU = 1; |
|
scaleSlantV = 1/scaleSlantRatio; |
|
if(tb < sb) { |
|
pV = round(scaleSlantV*slant*(tb-1)/(sb-1)); | Computes the increments in the slanted block size, considering the scales according to vertical and horizontal baselines |
pU = round(scaleSlantU*slant); | |
} |
|
else { |
|
pV = round(scaleSlantV*slant); | Computes the increments in the slanted block size, considering the scales according to vertical and horizontal baselines |
pU = round(scaleSlantU*slant*(sb-1)/(tb-1)); | |
} |
|
if(tb == 1) pV = 0; |
|
if(sb == 1) pU = 0; |
|
for(iv = 0; iv < vb+abs(pV); iv++) {
|
for(iu = 0; iu < ub+abs(pU); iu++) { |
|
for(it = 0; it < tb; it++) { |
|
for(is = 0; is < sb; is++) { |
|
slantedBlock.pixel[it][is][iv][iu] = EMPTY_VALUE; | Initializes the output 4D block with empty values |
} |
|
} |
|
} |
|
} |
|
for(iv = 0; iv < vb+abs(pV); iv++) { | Iterates over the full extent of the slanted output block |
|
for(iu = 0; iu < ub+abs(pU); iu++) {
|
for(it = 0; it < tb; it++) { |
|
posV = iv; |
|
if(tb > 1) { |
|
posV += round(pV*it/(tb-1)); | Computes the new V position of the slant-transformed sample |
if(pV > 0) posV -= pV;
|
} |
|
for(is = 0; is < sb; is++) { |
|
posU = iu; |
|
if(sb > 1) { |
|
posU += round(pU*is/(sb-1)); | Computes new U position of slant-transformed sample |
if(pU > 0) posU -= pU;
|
} |
|
if((0 <= posV)&&(posV < vb)&&(0 <= posU)&&(posU < ub)) {
|
slantedBlock.pixel[it][is][iv][iu] = block.pixel[it][is][posV][posU]; | Populates the transformed output 4D block |
} |
|
} |
|
} |
|
} |
|
} |
|
return(slantedBlock); |
|
} |
|
Table F.10 — Trim block procedure
Procedure Trim(block) { | Applies the trimming to the 4D block (Section F.2.3). |
max_u = max_v = max_s = max_t = 0; |
|
min_u = ub-1; |
|
min_v = vb-1; |
|
min_s = sb-1; |
|
min_t = tb-1; |
|
for(iv = 0; iv < vb; iv++) { | Searches the 4D block for non-empty hyper-rectangular region |
for(iu = 0; iu < ub; iu++) { |
|
for(it = 0; it < tb; it++) { |
|
for(is = 0; is < sb; is++) {
|
if(block.pixel[it][is][iv][iu] != EMPTY_VALUE) { | If a non-empty pixel is found, updates the non-empty hyper-rectangular region boundaries |
if(min_u > iu) min_u = iu; |
|
if(min_v > iv) min_v = iv; |
|
if(min_t > it) min_t = it; |
|
if(min_s > is) min_s = is; |
|
if(max_u < iu) max_u = iu; |
|
if(max_v < iv) max_v = iv;
|
if(max_t < it) max_t = it; |
|
if(max_s < is) max_s = is;
|
} |
|
} |
|
} |
|
} |
|
} |
|
trimmedBlock = block.GetSubblock(min_t, min_s, min_v, min_u, max_t-min_t+1, max_s-min_s+1, max_v-min_v+1, max_u-min_u+1); | Extracts the non-empty hyper-rectangular region |
return(trimmedBlock , min_t, min_s, min_v, min_u); | Returns the trimmed block and its position within the input block |
} |
|
Table F.11 — Padd block procedure
Procedure Padd(block) { | Applies the padding procedure to the 4D block (Section F.3.4). |
for(iv = 0; iv < vb; iv++) { |
|
for(iu = 0; iu < ub; iu++) { |
|
for(it = 0; it < tb; it++) { |
|
lowerS = -1; |
|
searchS = 0; |
|
empty = 1; |
|
while((empty == 1)&&(searchS < sb)) { |
|
if(block.pixel[it][searchS][iv][iu] != EMPTY_VALUE) { | Searches for empty pixels to be filled across the S dimension, to the left of the slant-transformed samples |
empty = 0; |
|
lowerS = searchS; |
|
} |
|
searchS++; |
|
} |
|
for(is = 0; is < lowerS; is++) { |
|
block.pixel[it][is][iv][iu] = block.pixel[it][lowerS][iv][iu]; | Fills empty pixels across the S dimension, to the left of the slant-transformed samples |
} |
|
higherS = sb; |
|
searchS = sb-1; |
|
empty = 1; |
|
while((empty == 1)&&(searchS >= 0)) { |
|
if(block.pixel[it][searchS][iv][iu] != EMPTY_VALUE) { | Searches for empty pixels to be filled across the S dimension, to the right of the slant-transformed samples |
empty = 0; |
|
higherS = searchS; |
|
} |
|
searchS--; |
|
} |
|
for(is = higherS+1; is < sb; is++) { |
|
block.pixel[it][is][iv][iu] = block.pixel[it][higherS][iv][iu]; | Fills empty pixels across S dimension, in the right of the slant-transformed samples |
} |
|
} |
|
} |
|
} |
|
for(iv = 0; iv < vb; iv++) { |
|
for(iu = 0; iu < ub; iu++) { |
|
for(is = 0; is < sb; is++) { |
|
lowerT = -1; |
|
searchT = 0; |
|
empty = 1; |
|
while((empty == 1)&&(searchT < tb)) { |
|
if(block.pixel[searchT][is][iv][iu] != EMPTY_VALUE) { | Searches for empty pixels to be filled across the T dimension, on top of the slant-transformed samples |
empty = 0; |
|
lowerT = searchT; |
|
} |
|
searchT++; |
|
} |
|
for(it = 0; it < lowerT; it++) { |
|
block.pixel[it][is][iv][iu] = block.pixel[lowerT][is][iv][iu]; | Fills empty pixels across T dimension, on top of the slant-transformed samples |
} |
|
higherT = tb; |
|
searchT = tb-1; |
|
empty = 1; |
|
while((empty == 1)&&(searchT >= 0)) { |
|
if(block.pixel[searchT][is][iv][iu] != EMPTY_VALUE) { | Searches for empty pixels to be filled across the T dimension, at the bottom of the slant-transformed samples |
empty = 0; |
|
higherT = searchT; |
|
} |
|
searchT--; |
|
} |
|
for(it = higherT+1; it < tb; it++) { |
|
block.pixel[it][is][iv][iu] = block.pixel[higherT][is][iv][iu]; | Fills empty pixels across T dimension, in the bottom of the slant-transformed samples |
} |
|
} |
|
} |
|
} |
|
for(iv = 0; iv < vb; iv++) { | Iterates the 4D blocks samples to check if padding procedure was successful |
for(iu = 0; iu < ub; iu++) { |
|
for(it = 0; it < tb; it++) { |
|
for(is = 0; is < sb; is++) { |
|
if(block.pixel[it][is][iv][iu] == EMPTY_VALUE) { | The padding procedure fails if any block sample remains empty |
return(block, 1);
|
} |
|
} |
|
} |
|
} |
|
} |
|
return(block, 0);
|
} |
|
In this subclause, the decoding procedure of the codestreams for light field data encoded with the Slanted 4D Transform mode is specified. The codestream is signalled as payload of the Contiguous Codestream box defined in Annex A.3.4. Figure F.8 illustrates the decoder architecture.
This section specifies the marker and marker segment syntax and semantics defined by this document. These markers and marker segments provide codestream information for this document. Further, this subclause provides a marker and marker segment syntax that is designed to be used in future specifications that include this document as a normative reference.
This document does not include a definition of conformance. The parameter values of the syntax described in this annex are not intended to portray the capabilities required to be compliant.
This document uses markers and marker segments to delimit and signal the characteristics of the source image and codestream. This set of markers and marker segments is the minimal information needed to achieve the features of this document and is not a file format. A minimal file format is offered in Annexes A and B.
Figure F.8— Slanted 4D transform mode light field decoder architecture
The main header is a collection of markers and marker segments. The main header is found at the beginning of the codestream.
Every marker is two bytes long. The first byte consists of a single 0xFF byte. The second byte denotes the specific marker and can have any value in the range 0x01 to 0xFE. Many of these markers are already used in ITU-T Rec. T.81 | ISO/IEC 10918-1, ITU-T Rec. T.84 | ISO/IEC 10918-3, ITU-T Rec. T.800 | ISO/IEC 15444-1 and ITU-T Rec. T.801 | ISO/IEC 15444-2 and shall be regarded as reserved unless specifically used.
A marker segment includes a marker and associated parameters, called marker segment parameters. In every marker segment the first two bytes after the marker shall be an unsigned value that denotes the length in bytes of the marker segment parameters (including the two bytes of this length parameter but not the two bytes of the marker itself). When a marker segment that is not specified in the document appears in a codestream, the decoder shall use the length parameter to discard the marker segment.
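As an informative illustration of this skipping rule (all names are illustrative and not part of this document), a decoder can step over an unknown marker segment as follows:

#include <stdint.h>
#include <stddef.h>

/* Informative sketch: at 'pos' the buffer holds a marker (0xFF 0xXX)
 * followed, for marker segments, by a 2-byte big-endian length that counts
 * the parameter bytes including the length field itself but not the marker.
 * Returns the offset just past the segment. */
size_t skip_marker_segment(const uint8_t *cs, size_t pos)
{
    uint16_t len = (uint16_t)((cs[pos + 2] << 8) | cs[pos + 3]);
    return pos + 2 + len;   /* 2 marker bytes + len parameter bytes */
}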
Function: Provides information about the uncompressed light field, such as the width and height of the subaperture views, the number of subaperture views in rows and columns, the number of components, the component bit depth, the number of 4D blocks, the size of the 4D blocks, and the component bit depth for transform coefficients (Figure B.15).
The procedure to parse and decode a light field codestream as contained by the Contiguous Codestream box (Annex A.3.4), with dimensions (max_t, max_s, max_v, max_u) defined as ROW, COLUMN, HEIGHT and WIDTH in Figure B.15 (Table B.18), with block dimensions (tk, sk, vk, uk) defined as BLOCK-SIZE_t, BLOCK-SIZE_s, BLOCK-SIZE_v and BLOCK-SIZE_u in Figure B.15 and Table B.18, with a maximum number of bit-planes of max_bitplane (defined in Figure B.15 and Table B.18), is described in the pseudo-code (all variables are integers). The scan order is in the order t, s, v, u, as described in Table B.26.
The coordinate set (t, s, v, u) refers to the left, superior corner of the 4D block. The procedure “ResetArithmeticDecoder()” resets all the context model probabilities of the arithmetic decoder. The procedure “LocateContiguousCodestream()” reads the pointer corresponding to position (t, s, v, u) and delivers the respective codestream to the procedure “DecodeContiguousCodestream()”. The procedure “DecodeContiguousCodestream()” decodes the components of the block contained in the codestream, enabling the sequential decoding of the light field. The codestreams of the components are decoded sequentially.
The procedures that are specific to the decoder of the Slanted 4D transform mode are described in Tables F.12 to F.16.
Table F.12 — JPEG Pleno (JPL) codestream structure for Slanted 4D transform mode
Defined syntax | Simplified structure |
LightField() { |
|
SOC_marker() | Codestream_Header() |
LFC_marker_Sl() | |
Read all SCC_marker() | |
PNT_marker() | |
for(t=0; t<T; t+=BLOCK-SIZE_t){// scan order on t | Codestream_Body() |
for(s=0; s<S; s+= BLOCK-SIZE_s){// scan order on s |
|
for(v=0; v<V; v+= BLOCK-SIZE_v){// scan order on v |
|
for(u=0; u<U; u+= BLOCK-SIZE_u){// scan order on u |
|
for(c=0; c<NC; c++){ // scan order on colour components |
|
// Initializes the arithmetic decoder for each decoded 4D block codestream |
|
ResetArithmeticDecoder(); |
|
SOB_marker() |
|
if ( (TRNC) && ((T – BLOCK-SIZE_t) < t < T) ) { |
|
tk = T umod BLOCK-SIZE_t; } | |
else tk = BLOCK-SIZE_t; | |
if ( (TRNC) && ((S – BLOCK-SIZE_s) < s < S) ) { |
|
sk = S umod BLOCK-SIZE_s; } | |
else sk = BLOCK-SIZE_s; | |
if ( (TRNC) && ((V – BLOCK-SIZE_v) < v < V) ) { |
|
vk = V umod BLOCK-SIZE_v; } | |
else vk = BLOCK-SIZE_v; | |
if ( (TRNC) && ((U – BLOCK-SIZE_u) < u < U) ) { |
|
uk = U umod BLOCK-SIZE_u; } | |
else uk = BLOCK-SIZE_u; | |
LF.BlockAtPosition(t, s, v, u) = |
|
DecodeSlantTree(); | |
} // end of scan order on colour components loop |
|
} // end of scan order on u loop |
|
} // end of scan order on v loop |
|
} // end of scan order on s loop |
|
} // end of scan order on t loop |
|
EOC_marker() | Codestream_End() |
} |
Table F.13 — Decode Slant Tree procedure
Procedure DecodeSlantTree() { | Decodes the slant tree (Figure F.8). |
numberOfSlantBits = 1; |
|
value = 2; |
|
while(value-1 < slantMax) { | Computes the number of bits used to represent the slants within the valid range |
numberOfSlantBits++; |
|
value = value << 1; |
|
} |
|
numberOfSlantBits++; | Adds an extra bit for the sign |
decodedBlock.zeros(); |
|
DecodeSlantTreeStep(decodedBlock); | Initializes the slant tree decoding from the root node (Table F.14). |
return(decodedBlock); |
|
} |
|
Table F.14 — Decode Slant Tree Step procedure
Procedure DecodeSlantTreeStep(decodedBlock) { | Decodes a node of the slant tree (Figure F.8). |
slant = DecodeInteger(numberOfSlantBits)-slantMax; | Recovers the slant parameter from the encoded bitstream |
slantedBlock = Slant(decodedBlock, slant); | Applies the Slant transform (Table F.9) and trims the 4D block (Section F.2.3) to find the correct block size |
[trimmedBlock, tp, sp, vp, up] = Trim(slantedBlock); | |
slantTreeFlag = DecodeInteger(1); | Recovers the partitioning from the codestream |
Block tempBlock; |
|
if(slantTreeFlag == LEAF_NODE) { | If the current node is a leaf node: |
ResetArithmeticDecoder(); | Defined in Table B.40 |
tempBlock = DecodeContiguousCodestream(tp, sp, vp, up); | Decodes the slant-transformed 4D block (Table B.27) |
} |
|
else { | If the current node is a split node: |
v'b = floor(vb/2); |
|
u'b = floor(ub/2); |
|
decodedSubBlock = DecodeSlantTreeStep(trimmedBlock.GetSubblock(0, 0, 0, 0, tb, sb, v'b, u'b)); | Recursively calls the procedure to decode the top-left child block |
tempBlock.CopyFrom(decodedSubBlock, TOP_LEFT); | Copies the pixels from the top-left decoded child block into the top-left quarter of the parent block |
decodedSubBlock = DecodeSlantTreeStep(trimmedBlock.GetSubblock(0, 0, 0, u'b, tb, sb, v'b, ub-u'b)); | Recursively calls the procedure to decode the top-right child block |
tempBlock.CopyFrom(decodedSubBlock, TOP_RIGHT); | Copies the pixels from the top-right decoded child block into the top-right quarter of the parent block |
decodedSubBlock = DecodeSlantTreeStep(trimmedBlock.GetSubblock(0, 0, v'b, 0, tb, sb, vb-v'b, u'b)); | Recursively calls the procedure to decode the bottom-left child block |
tempBlock.CopyFrom(decodedSubBlock, BOTTOM_LEFT); | Copies the pixels from the bottom-left decoded child block into the bottom-left quarter of the parent block |
decodedSubBlock = DecodeSlantTreeStep(trimmedBlock.GetSubblock(0, 0, v'b, u'b, tb, sb, vb-v'b, ub-u'b)); | Recursively calls the procedure to decode the bottom-right child block |
tempBlock.CopyFrom(decodedSubBlock, BOTTOM_RIGHT); | Copies the pixels from the bottom-right decoded child block into the bottom-right quarter of the parent block |
} |
|
decodedBlock = DeSlant(tempBlock, slant); | Applies the inverse Slant transform to the 4D block (Table F.15) |
} |
|
Table F.15 — Deslant block procedure
Procedure DeSlant(block, slant) { | Applies the inverse Slant transform to the 4D block (Section F.3.2). |
scaleSlantRatio = TO_FLOAT(BLRatio) / pow(2, 28); | Computes scale to be used in the slant transform from the ratio between the horizontal and vertical view array baselines (Section B.3.2.6.3). |
scaleSlantU = 1; |
|
scaleSlantV = 1/scaleSlantRatio; |
|
if(tb < sb) { |
|
pV = round(scaleSlantV*slant*(tb-1)/(sb-1)); | Computes the increments in the block size, considering the scales according to vertical and horizontal baselines |
pU = round(scaleSlantU*slant); | |
} | |
else { | |
pV = round(scaleSlantV*slant); | |
pU = round(scaleSlantU*slant*(sb-1)/(tb-1)); | |
} |
|
if(tb == 1) pV = 0; |
|
if(sb == 1) pU = 0; |
|
for(iv = 0; iv < vb-abs(pV); iv++) {
|
for(iu = 0; iu < ub-abs(pU); iu++) { |
|
for(it = 0; it < tb; it++) { |
|
for(is = 0; is < sb; is++) { |
|
deSlantedBlock.pixel[it][is][iv][iu] = EMPTY_VALUE; | Initializes the output de-slanted 4D block with empty values |
} |
|
} |
|
} |
|
} |
|
for(iv = 0; iv < block.vb; iv++) {
|
for(iu = 0; iu < block.ub; iu++) { |
|
for(it = 0; it < tb; it++) { |
|
posV = iv; |
|
if(tb > 1) { |
|
posV += round(pV*it/(tb-1)); | Computes the new V position of the sample |
if(pV > 0) posV -= pV;
|
} |
|
for(is = 0; is < sb; is++) { |
|
posU = iu; |
|
if(sb > 1) { |
|
posU += round(pU*is/(sb-1)); | Computes new U position of the sample |
if(pU > 0) posU -= pU;
|
} |
|
if((0 <= posV)&&(posV < vb)&&(0 <= posU)&&(posU < ub)) {
|
deSlantedBlock.pixel[it][is][posV][posU] = block.pixel[it][is][iv][iu]; | Populates the output de-slanted 4D block |
} |
|
} |
|
} |
|
} |
|
} |
|
return(deSlantedBlock); |
|
} |
|
Table F.16 — CopySubblock
Procedure CopySubblock(targetBlock, sourceSubblock, pt, ps, pv, pu) { | Copies samples from a source 4D block to a target 4D block |
for(it=0; it < sourceSubblock.tb; it++) { |
|
for(is=0; is < sourceSubblock.sb; is++) { |
|
for(iv=0; iv < sourceSubblock.vb; iv++) { |
|
for(iu=0; iu < sourceSubblock.ub; iu++) { |
|
targetBlock.pixel[it+pt][is+ps][iv+pv][iu+pu] = sourceSubblock.pixel[it][is][iv][iu];
|
} |
|
} |
|
} |
|
} |
|
} |
|
This annex defines three profiles for JPEG Pleno Light Field coding (ISO/IEC 21794-2 2ED). The first profile named baseline block-based profile comprises all coding tools belonging to the 4D transform mode coding path. The second profile named baseline view-based profile comprises all coding tools belonging to the 4D prediction mode coding path. The third profile named slanted block-based profile comprises all coding tools belonging to the Slanted 4D transform mode coding path.
Figure G.1 shows the relation between ISO/IEC 21794-2 2ED coding tools and profiles defined in this Annex.
Figure G.1 — Generic JPEG Pleno light field decoder architecture and associated profiles
The JPEG Pleno profile and level box (A.3.2) shall be used for signalling one of the three profiles, setting the Ppih value according to the values defined in Table G.1. JPEG Pleno decoders that conform to the Baseline view-based profile shall deploy the ISO/IEC 15444-1 codec for coding reference views (Annex C.3.1), normalized disparity views (Annex D.3.1), and residual images (Annex E.3.1).
Table G.2 shows the baseline block-based profile levels, Table G.3 shows the baseline view-based profile levels, and Table G.4 shows the slanted block-based profile levels.
A sample represents a single component value of a channel. For example, if a light field has N subaperture views, each with resolution V × U and three (3) channels (e.g. RGB), and additionally five (5) disparity views with the same resolution and a single (1) channel, the total number of samples is computed as N × V × U × 3 + 5 × V × U.
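As an illustration with hypothetical numbers (not taken from this document), a 13 × 13 array of views of resolution 625 × 434 with 3 channels, plus 5 single-channel disparity views of the same resolution, yields

169 \times 625 \times 434 \times 3 + 5 \times 625 \times 434
= 137\,523\,750 + 1\,356\,250
= 138\,880\,000

samples, which is below the 256M maximum number of samples of Level 1.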
Table G.1—JPEG Pleno profiles
Profile | Ppih |
Baseline block-based profile | 1 |
Baseline view-based profile | 2 |
Slanted block-based profile | 3 |
Table G.2— Baseline block-based profile levels
Level | Plev | Maximum Number of Samples | Maximum block dimension |
1 | 1 | 256M | 64 |
2 | 2 | 1024M | 96 |
3 | 3 | 4096M | 128 |
4 | 4 | 16384M | 192 |
The maximum bit precision for source light field samples is 16 bpp.
Table G.3— Baseline view-based profile levels
Level | Plev | Maximum Number of Samples |
1 | 1 | 256M |
2 | 2 | 1024M |
3 | 3 | 4096M |
4 | 4 | 16384M |
The maximum bit precision for source light field samples is 16 bpp.
Table G.4— Slanted block-based profile levels
Level | Plev | Maximum Number of Samples | Maximum padded block dimensiona |
1 | 1 | 256M | 64 |
2 | 2 | 1024M | 96 |
3 | 3 | 4096M | 128 |
4 | 4 | 16384M | 192 |
a Maximum padded block dimension = maximum (block dimension + abs(maximum_disparity) × number of views) | |||
The maximum bit precision for source light field samples is 16 bpp.
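As an illustration of footnote a in Table G.4, with hypothetical values not taken from this document (a block dimension of 64 samples, a maximum disparity magnitude of 2 samples per view, and 13 views), the maximum padded block dimension is

64 + |{-2}| \times 13 = 64 + 26 = 90 \le 96,

which fits within the Level 2 maximum padded block dimension.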
[1] HARTLEY R., ZISSERMAN A. Multiple View Geometry in Computer Vision (2nd ed.). Cambridge University Press, New York, NY, USA, 2003.
[2] ALVES G., DE CARVALHO M. B., PAGLIARI C. L., GARCIA P., SEIDEL I., PEREIRA M. P., SCHUELER C., TESTONI V., PEREIRA F., DA SILVA E. A. B. "The JPEG Pleno Light Field Coding Standard 4D-Transform Mode: How to Design an Efficient 4D-Native Codec". IEEE Access, vol. 8, pp. 170807-170829, 2020.
[3] HU Y., SONG R., LI Y. "Efficient Coarse-to-Fine PatchMatch for Large Displacement Optical Flow", 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 5704-5712.
[4] SHIN C., JEON H., YOON Y., KWEON I. S., KIM S. J. "EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, 2018, pp. 4748-4757.
[5] RÜEFENACHT D., NAMAN A. T., MATHEW R., TAUBMAN D. "Base-Anchored Model for Highly Scalable and Accessible Compression of Multiview Imagery", IEEE Transactions on Image Processing, vol. 28, no. 7, pp. 3205-3218, July 2019.
[6] ASTOLA P., TABUS I. "Light Field Compression of HDCA Images Combining Linear Prediction and JPEG 2000", 2018 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 1860-1864.
[7] ASTOLA P., TABUS I. "WaSP: Hierarchical Warping, Merging, and Sparse Prediction for Light Field Image Compression", 2018 7th European Workshop on Visual Information Processing (EUVIP), 2018.
[8] CARVALHO M. B., PAGLIARI C. L., ALVES G., SCHRETTER C., SCHELKENS P., PEREIRA F., DA SILVA E. A. B. "Supporting Wider Baseline Light Fields in JPEG Pleno With a Novel Slanted 4D-DCT Coding Mode". IEEE Access, vol. 11, pp. 28294-28317, 2023.