This document describes the technique used to improve the chroma prediction in the Thor video codec.

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

Modern video coding standards such as Thor [I-D.fuldseth-netvc-thor] form predictions for the luma channel (Y) and chroma channels (U and V), which are encoded separately (in that order). The prediction for each channel has spatial or temporal dependencies only in its own channel. Most of the perceived information in a video is found in the luma channel, but correlations between the luma and chroma channels remain. For instance, the same shape of an object can often be seen in all three channels, and if this correlation is not exploited, some structural information will be transmitted three times. Thor attempts to improve the chroma prediction by finding a linear relationship between each of the initial chroma predictions and the luma prediction and, if certain criteria are satisfied, using that relationship to form a new prediction based on the reconstructed luma samples.

2. Definitions

2.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

3. Background

The improved predictions are derived from the reconstructed luma samples using a mapping. The underlying assumption is that the colours can be identified by their luminosities. Informally we can say that a new chroma prediction is formed from the reconstructed luma block painted with the colours of the initial chroma prediction.

            
There is often a linear correlation between the luma and chroma channels, so that a chroma sample c can be expressed as a linear function of the corresponding luma sample y:

c = a*y + b

Figure 1: Linear relationship

Since it would be too costly to transmit the values a and b in the linear mapping, and since both the encoder and decoder must be able to compute identical predictions, a and b are derived from data available to both using linear regression.

4. Computing the improved prediction

            
Since the assumption that the correlation is the same in the predicted block and in the reconstructed block is not always true, the new prediction from luma might not be better even when there is a very good correlation in the predicted block. Therefore, we can only expect an improvement if the initial prediction is bad, and the luma residual is used as an estimate for this. The initial chroma prediction is kept unless the average squared difference between the reconstructed luma samples yr and the predicted luma samples y for an N*N prediction block is above 64:

        _N_ _N_
        \   \
        /__ /__ (yr(i, j) - y(i, j)) ^ 2
        i=1 j=1
        -------------------------------- > 64
                       N*N

Figure 2: Requirement for improvement 1
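The luma residual criterion can be sketched in C as follows. This is a minimal illustration, not the Thor implementation: the function name, the 8-bit sample type and the row-major layout with a `stride` parameter are assumptions. It also assumes the comparison is meant as a real-valued average, so the division by N*N is replaced by a multiplication on the other side.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when the average squared difference
 * between the reconstructed luma block yr and the predicted luma block y
 * exceeds 64 for an n*n block.
 * Assumptions: 8-bit samples, row-major storage, same stride for both. */
bool luma_residual_is_large(const uint8_t *yr, const uint8_t *y,
                            int n, int stride)
{
    int64_t sum = 0;

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int d = yr[i * stride + j] - y[i * stride + j];
            sum += d * d;
        }
    }

    /* sum / (n*n) > 64, rewritten to avoid the division. */
    return sum > (int64_t)64 * n * n;
}
```

With this formulation a block whose samples all differ by exactly 8 gives an average of exactly 64 and therefore keeps the initial chroma prediction.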

            
The encoder and decoder must compute a and b using the same least-squares fit for an N*N prediction block, where y and c denote the luma and chroma samples in the initial prediction:

        _N_ _N_                            _N_ _N_
        \   \                              \   \
 Ysum = /__ /__ y(i, j)             Csum = /__ /__ c(i, j)
        i=1 j=1                            i=1 j=1

        _N_ _N_                            _N_ _N_
        \   \                              \   \
YYsum = /__ /__ y(i, j) ^ 2        CCsum = /__ /__ c(i, j) ^ 2
        i=1 j=1                            i=1 j=1

        _N_ _N_
        \   \
YCsum = /__ /__ y(i, j) * c(i, j)
        i=1 j=1

Figure 3: Equations for linear regression 1

            
The sums Ysum, Csum, YYsum, CCsum and YCsum will all be contained within a 32 bit signed integer. Then the following must be computed using 64 bit arithmetic:

SSyy = YYsum - ((Ysum * Ysum) >> 2*log2(N))
SScc = CCsum - ((Csum * Csum) >> 2*log2(N))
SSyc = YCsum - ((Ysum * Csum) >> 2*log2(N))

Figure 4: Equations for linear regression 2
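As an illustration of the regression sums, the following C sketch accumulates Ysum, Csum, YYsum, CCsum and YCsum in 32 bit integers and then forms SSyy, SScc and SSyc in 64 bit arithmetic. The struct and function names, the 8-bit sample type and the shared `stride` are assumptions made for the example. The cross term uses Ysum * Csum, as in the usual least-squares covariance.

```c
#include <stdint.h>

/* Hypothetical container for the regression statistics of one block. */
typedef struct {
    int32_t ysum, csum;        /* Ysum, Csum */
    int64_t ssyy, sscc, ssyc;  /* SSyy, SScc, SSyc */
} RegressionStats;

/* Computes the statistics for an N*N block with N = 1 << log2n.
 * y and c are the initial luma and chroma predictions.
 * Assumptions: 8-bit samples, row-major storage, same stride for both. */
RegressionStats regression_stats(const uint8_t *y, const uint8_t *c,
                                 int log2n, int stride)
{
    int n = 1 << log2n;
    int shift = 2 * log2n;     /* divide by N*N */
    int32_t ysum = 0, csum = 0, yysum = 0, ccsum = 0, ycsum = 0;
    RegressionStats s;

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int32_t yv = y[i * stride + j];
            int32_t cv = c[i * stride + j];
            ysum += yv;
            csum += cv;
            yysum += yv * yv;
            ccsum += cv * cv;
            ycsum += yv * cv;
        }
    }

    /* The products of two sums can exceed 32 bits, hence the casts. */
    s.ysum = ysum;
    s.csum = csum;
    s.ssyy = yysum - (((int64_t)ysum * ysum) >> shift);
    s.sscc = ccsum - (((int64_t)csum * csum) >> shift);
    s.ssyc = ycsum - (((int64_t)ysum * csum) >> shift);
    return s;
}
```

For a 2*2 block with y = {0, 1, 2, 3} and c = 2*y + 10, this yields SSyy = 5, SScc = 20 and SSyc = 10, so SSyc / SSyy recovers the slope 2.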

            
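Still using 64 bit arithmetic, if

SSyy > 0 /\ 2 * SSyc * SSyc > SSyy * SScc

Figure 5: Requirement for improvement 2

then a and b are computed as fixed-point values with 16 fractional bits:

a = (SSyc << 16) / SSyy
b = ((Csum << 16) - a * Ysum) >> 2*log2(N)

Figure 6: Equation for linear regression 3

and the improved chroma prediction c' is computed from the reconstructed luma samples yr:

c'(i, j) = clip((a * yr(i, j) + b) >> 16)

Figure 7: Improved chroma prediction

Otherwise the initial chroma prediction is kept.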
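The fixed-point fit and the per-sample prediction might look as follows in C. This is a sketch under stated assumptions rather than the normative procedure. The names are hypothetical, samples are assumed to be 8 bits so clip() limits to 0..255, the acceptance test is taken to be the squared-correlation test 2 * SSyc * SSyc > SSyy * SScc, and the intercept uses Ysum as in an ordinary least-squares fit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-point model c = a*y + b, 16 fractional bits each. */
typedef struct {
    int64_t a;
    int64_t b;
} LinearModel;

/* Returns false when the requirement for improvement is not met, in
 * which case the caller keeps the initial chroma prediction. */
bool fit_linear_model(int64_t ssyy, int64_t sscc, int64_t ssyc,
                      int32_t ysum, int32_t csum, int log2n,
                      LinearModel *m)
{
    if (!(ssyy > 0 && 2 * ssyc * ssyc > ssyy * sscc))
        return false;

    /* (SSyc << 16) / SSyy; the multiplication avoids left-shifting a
     * negative value, which is undefined behaviour in C. */
    m->a = (ssyc * 65536) / ssyy;

    /* Right-shifting a negative value is implementation-defined in C;
     * an arithmetic shift is assumed here. */
    m->b = (((int64_t)csum << 16) - m->a * ysum) >> (2 * log2n);
    return true;
}

/* clip() for 8-bit samples (an assumption about the sample range). */
uint8_t clip_u8(int64_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Improved chroma prediction for one reconstructed luma sample. */
uint8_t predict_chroma(const LinearModel *m, uint8_t yr)
{
    return clip_u8((m->a * yr + m->b) >> 16);
}
```

With SSyy = 5, SScc = 20, SSyc = 10, Ysum = 6, Csum = 52 and N = 2, the fit gives a = 2.0 and b = 10.0 in 16.16 fixed point, so a reconstructed luma sample of 5 maps to a chroma prediction of 20.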

            
The above assumes 4:4:4 format. For the 4:2:0 format the predicted luma block must be subsampled first:

y'(i, j) = (y(2*i, 2*j)   + y(2*i+1, 2*j) +
            y(2*i, 2*j+1) + y(2*i+1, 2*j+1) + 2) >> 2

Figure 8: Subsampling of predicted luma block

The improved chroma prediction is then computed at luma resolution and subsampled in the same way:

c'(i, j) = (clip((a*yr(2*i, 2*j) + b) >> 16) +
            clip((a*yr(2*i+1, 2*j) + b) >> 16) +
            clip((a*yr(2*i, 2*j+1) + b) >> 16) +
            clip((a*yr(2*i+1, 2*j+1) + b) >> 16) + 2) >> 2

Figure 9: Subsampling of improved chroma prediction
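For 4:2:0, both subsampling steps average a 2*2 group of samples with rounding. The following C sketch illustrates this. The function names, 8-bit samples and row-major layout are assumptions, and a and b are taken to be the 16.16 fixed-point coefficients of the linear model.

```c
#include <stdint.h>

/* clip() for 8-bit samples (an assumption about the sample range). */
uint8_t clip_sample(int64_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Subsamples a 2n*2n predicted luma block to n*n by averaging each
 * 2*2 group with rounding. */
void subsample_luma_420(const uint8_t *y, int y_stride,
                        uint8_t *out, int out_stride, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            const uint8_t *p = y + 2 * i * y_stride + 2 * j;
            out[i * out_stride + j] =
                (uint8_t)((p[0] + p[1] + p[y_stride] + p[y_stride + 1] + 2) >> 2);
        }
    }
}

/* Improved chroma sample at chroma position (i, j): predict from each
 * of the four co-located reconstructed luma samples, clip each result,
 * then average with rounding. */
uint8_t predict_chroma_420(int64_t a, int64_t b,
                           const uint8_t *yr, int yr_stride, int i, int j)
{
    const uint8_t *p = yr + 2 * i * yr_stride + 2 * j;
    int sum = clip_sample((a * p[0] + b) >> 16) +
              clip_sample((a * p[1] + b) >> 16) +
              clip_sample((a * p[yr_stride] + b) >> 16) +
              clip_sample((a * p[yr_stride + 1] + b) >> 16);

    return (uint8_t)((sum + 2) >> 2);
}
```

Clipping each of the four predictions before averaging, rather than clipping the average, follows the order of operations in the subsampled prediction formula.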

In intra mode the chroma prediction improvement must be performed right after each transform, since the new chroma reconstruction will be used to predict the next block.

5. Performance

The improved chroma prediction may significantly improve the compression efficiency for images or video containing high correlations between the channels. It is particularly useful for encoding screen content, 4:4:4 content, high frequency content and "difficult" content where traditional prediction techniques perform poorly. Little quality change is seen for content not in these categories, but there is a general small increase in chroma PSNR.

An encoder configured for low delay and medium complexity was used for the following results. The numbers have been computed using the Bjontegaard Delta Rate (BDR [BDR]). The rates for Y, U and V are shown separately.


+--------------+--------------------+--------------------+
|              |        4:4:4       |        4:2:0       |
+--------------+------+------+------+------+------+------+
|Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
+--------------+------+------+------+------+------+------+
|cad_waveform  |-14.2%|-17.5%|-16.1%| -3.7%| -5.2%| -5.3%|
|pcb_layout    | -4.8%| -7.1%| -8.2%| -1.1%| -1.8%| -1.5%|
|ppt_doc_xls   |-19.6%| -9.1%|-10.8%| -0.3%| -1.2%| -0.0%|
|vc_doc_sharing| -3.0%| -6.5%| -6.7%| -0.0%| -0.2%| -2.1%|
|web_browsing  | -0.5%| -0.8%| -0.8%| -0.7%| -3.6%| -1.1%|
|wordEditing   | -4.3%| -6.0%| -3.5%| -0.1%| -0.4%| -0.7%|
|park_joy      | -0.2%| -0.5%| -0.2%| -0.5%| -4.4%| -1.1%|
|old_town_cross| -0.2%| -1.4%| -0.7%| -0.0%| -4.2%| -1.7%|
+--------------+------+------+------+------+------+------+
|Average       | -5.9%| -6.1%| -5.9%| -0.8%| -2.6%| -1.7%|
+--------------+------+------+------+------+------+------+

Figure 10: Compression Performance, improved prediction for intra blocks only


+--------------+--------------------+--------------------+
|              |        4:4:4       |        4:2:0       |
+--------------+------+------+------+------+------+------+
|Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
+--------------+------+------+------+------+------+------+
|cad_waveform  |-22.6%|-27.9%|-25.9%| -2.8%| -3.9%| -3.7%|
|pcb_layout    |-18.9%|-27.1%|-20.5%| -1.1%| -1.8%| -1.6%|
|ppt_doc_xls   | -6.4%|-12.4%|-13.5%| -0.4%| -0.2%| -0.8%|
|vc_doc_sharing| -5.7%|-11.9%|-11.9%| -0.1%| -2.9%| -0.6%|
|web_browsing  | -1.4%| -1.8%| -1.8%| -0.6%| -1.0%| -1.2%|
|wordEditing   |-12.9%|-16.3%|-13.5%| -0.3%| -5.4%| -1.2%|
|park_joy      | -5.7%| -7.3%| -6.9%| -1.3%| -3.0%| -1.9%|
|old_town_cross| -1.9%| -2.4%| -2.4%| -0.2%| -4.9%| -1.7%|
+--------------+------+------+------+------+------+------+
|Average       | -9.4%|-13.4%|-12.1%| -0.8%| -2.8%| -1.7%|
+--------------+------+------+------+------+------+------+

Figure 11: Compression Performance, improved prediction using intra only coding


+--------------+--------------------+--------------------+
|              |        4:4:4       |        4:2:0       |
+--------------+------+------+------+------+------+------+
|Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
+--------------+------+------+------+------+------+------+
|cad_waveform  |-10.3%|-13.5%|-11.6%| -0.6%| -1.1%| -1.3%|
|pcb_layout    | -3.6%| -5.8%| -5.2%|  0.0%|  0.0%|  0.0%|
|ppt_doc_xls   | -1.1%| -0.6%| -0.5%|  0.0%|  0.0%|  0.0%|
|vc_doc_sharing| -0.0%|  0.0%| -1.5%|  0.0%| -0.1%|  0.1%|
|web_browsing  | -0.1%| -0.1%| -0.1%|  0.0%| -0.2%| -0.4%|
|wordEditing   | -9.2%|-13.3%|-13.1%|  0.0%| -0.1%|  0.1%|
|park_joy      | -1.3%| -7.1%| -1.1%| -0.3%| -8.0%| -1.5%|
|old_town_cross|  0.0%| -0.1%|  0.1%|  0.0%| -0.0%|  0.0%|
+--------------+------+------+------+------+------+------+
|Average       | -3.2%| -5.1%| -4.1%| -0.1%| -1.2%| -0.4%|
+--------------+------+------+------+------+------+------+

Figure 12: Compression Performance, improved prediction for inter blocks only


+--------------+--------------------+--------------------+
|              |        4:4:4       |        4:2:0       |
+--------------+------+------+------+------+------+------+
|Sequence      |   Y  |   U  |   V  |   Y  |   U  |   V  |
+--------------+------+------+------+------+------+------+
|cad_waveform  |-20.0%|-24.7%|-22.4%| -4.1%| -5.7%| -5.6%|
|pcb_layout    | -7.3%|-11.1%|-10.1%| -1.1%| -1.8%| -1.6%|
|ppt_doc_xls   |-19.6%| -8.9%| -9.0%| -0.3%| -1.2%| -0.8%|
|vc_doc_sharing| -3.2%| -6.5%|-10.1%|  0.2%| -0.0%| -0.5%|
|web_browsing  | -0.5%| -0.3%| -0.5%| -0.8%| -3.7%| -2.5%|
|wordEditing   | -9.3%|-14.1%|-13.9%| -0.1%| -1.0%| -0.6%|
|park_joy      | -1.4%| -7.4%| -1.2%| -0.8%| -9.9%| -1.4%|
|old_town_cross| -0.2%| -1.4%| -0.5%| -0.0%| -4.3%| -1.7%|
+--------------+------+------+------+------+------+------+
|Average       | -7.7%| -9.3%| -8.5%| -0.9%| -3.4%| -1.8%|
+--------------+------+------+------+------+------+------+

Figure 13: Compression Performance, improved prediction for intra and inter blocks

6. IANA Considerations

This document has no IANA considerations yet. TBD

7. Security Considerations

This document has no security considerations yet. TBD

8. Acknowledgments

The author would like to thank Arild Fuldseth and Mo Zanaty for reviewing this document and design.

9. References

9.1. Normative References

[I-D.fuldseth-netvc-thor]	Fuldseth, A., Bjontegaard, G., Midtskogen, S., Davies, T. and M. Zanaty, "Thor Video Codec", Internet-Draft draft-fuldseth-netvc-thor-02, March 2016.
[RFC2119]	Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

9.2. Informative References

[BDR]	Bjontegaard, G., "Calculation of average PSNR differences between RD-curves", ITU-T SG16 Q6 VCEG-M33, April 2001.

Author's Address

Steinar Midtskogen
Cisco
Lysaker, Norway
EMail: stemidts@cisco.com
