Network Working Group S. Midtskogen
Internet-Draft M. Zanaty
Intended status: Standards Track Cisco
Expires: April 21, 2016 October 19, 2015

Constrained Low Pass Filter
draft-midtskogen-netvc-clpf-00

Abstract

This document describes a low complexity filtering technique which is being used as a low pass loop filter in the Thor video codec.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 21, 2016.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


Table of Contents

1. Introduction

Modern video coding standards such as Thor [I-D.fuldseth-netvc-thor] include in-loop filters which correct artifacts introduced in the encoding process. Thor includes a deblocking filter which correct artifacts introduced by the block based nature of the encoding process, and a low pass filter correcting artifacts not corrected by the deblocking filter, in particular artifacts introduced by quantisation errors of transform coefficients and by the interpolation filter. Since in-loop filters have to be applied in both the encoder and decoder, it is highly desirable that these filters have low computational complexity.

2. Definitions

2.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2.2. Terminology

This document will refer to a pixel X and its eight neighbouring pixel A, B, C, D, E, F, G, H ordered in the following pattern.


+---+---+---+
| A | B | C |
+---+---+---+
| D | X | E |
+---+---+---+
| F | G | H |
+---+---+---+

Figure 1: Filter pixel positions

In Thor the frames are divided into super blocks (SB) 64x64 pixels in size and each SB can be divided into smaller coding blocks (CB).

3. Filtering Process

3.1. General Approach

Given a pixel X and its neighbouring pixels described above an index for a 2^8 (256) entry lookup table can be constructed by comparing X with each of its neighbours and organise the results into an 8 bit binary number. For instance:


I = (A>X)*2^0 + (B>X)*2^1 + (C>X)*2^2 + (D>X)*2^3 +
    (E>X)*2^4 + (F>X)*2^5 + (G>X)*2^6 + (H>X)*2^7

Figure 2: Equation 1

Comparisons evaluate into 0 or 1. Each entry in the lookup table contains an offset by which X is to be increased. If this offset is 1, it has always the effect of making X more like its neighbours, thus it effectively is a low pass filter. If the offset is a number greater than 1, X may need to be clipped into its valid range. The table is shared by the encoder and the decoder. A simple way of constructing the table is to try different values for a single entry, set the others to 0 and find the value that gives the best improvement, if any, based on a squared error metric for a set of typical videos, and then repeat this process for every entry.

If a neighbour pixel is outside the image frame, it is given the same value as X. This rule can optionally be applied to block boundaries to allow blocks to be filtered independently in parallel with low memory requirements.

The same table can be used when forming an index using the less-than operator instead:


I = (A<X)*2^0 + (B<X)*2^1 + (C<X)*2^2 + (D<X)*2^3 +
    (E<X)*2^4 + (F<X)*2^5 + (G<X)*2^6 + (H<X)*2^7

Figure 3: Equation 2

In this case the entries must be regarded as negative offsets. That is, the offset by which X is to be decreased.

The effectiveness of the filter depends on the contents of the video frame and how different parts of the frame were predicted. In some cases the filter will have an adverse effect. To counter this the encoder tests the effectiveness and signals whether to apply the filter at frame level or at block level or at both levels. This signal, at frame level, block level or both, can also indicate which table to use in a set of different tables shared by the encoder and decoder. A set of tables could be one table made up of only 0s and 1s and the signal could be a scale factor for the entries which would eliminate the need of storing multiple tables.

To reduce the signaling cost at block level the filter can be disabled implicitly for certain block types for which the filter is expected to have little or sometimes adverse effects, such as for blocks without residual information or for bi-predictive blocks (since the bi-predictive averaging have low pass properties).

3.2. Simplification

It can be noted that some simplifications of the filter can be reduced to simple equations, such as:


X' = X + ((B>X)+(D>X)+(E>X)+(G>X) > 2) -
              ((B<X)+(D<X)+(E<X)+(G<X) > 2)

Figure 4: Equation 3

This is equivalent to a table having 1 for the entries at 26, 74, 82, 88, 90, and 0 for everything else. This simplification, which is used by Thor, can be rephrased as follows:

  • Every pixel is compared to the adjacent left, right, upper, lower pixel.
  • If at least three of them are greater, the center pixel is increased by one.
  • If at least three of them are smaller, the center pixel is decreased by one.

Thor does not filter across SB borders and has a binary signal at frame and SB level as well as at sequence level. Fully bi-predictive SBs are never filtered, nor uni-predictive if they lack residual information, and no signal is transmitted for these blocks. Other SBs are filtered (unless filtering has been disabled at frame or sequence level), but individual CBs within a SB are unchanged if they are bi-predictive or lack residual information. Note that B, D, E, G and X are the unfiltered values; the order in which the pixels are filtered does not matter.

The filter is applied after the deblocking filter.

The simplifications adopted by Thor reduce the computational complexity at a slight loss of compression effectiveness.

3.3. Complexity Considerations

Many modern computers have processors supporting SIMD (single instruction, multiple data) instructions, which perform the same operation on multiple data. The filter in Thor is particularily suited for such architectures since multiple comparisons, additions and subtractions of pixels and results can be performed with single instructions. The frame and block border restrictions can be implemented using a shuffle instruction, also frequently available in processors supporting SIMD instructions.

Since the filter does not depend on any filtered pixels within the same frame, further parallelism can be achieved on multi core architectures. And since there is no filtering across SB borders, the memory requirement for storing filtered values before copying them back into the original frame is very low.

4. Performance

The table below show filters effect on the bandwidth for a selection of 10 second video sequences encoded in Thor with uni-prediction only. The numbers have been computed using the Bjontegaard Delta Rate (BDR). BDR-low and BDR-high indicate the effect at low and bigh bitrates respectively.


+----------------+---------+---------+---------+
|Sequence        |   BDR   | BDR-low | BDR-high|
+----------------+---------+---------+---------+
|Kimono          | -1.509% | -0.766% | -2.510% |
|BasketballDrive | -2.860% | -1.571% | -4.514% |
|BQTerrace       | -6.634% | -3.838% | -7.997% |
|FourPeople      | -4.484% | -2.322% | -8.037% |
|Johnny          | -3.608% | -1.457% | -7.043% |
|ChangeSeats     | -4.744% | -1.900% | -8.280% |
|HeadAndShoulder | -6.697% | -0.670% |-14.925% |
|TelePresence    | -2.875% | -0.966% | -5.778% |
+----------------+---------+---------+---------+
|Average         | -4.176% | -1.686% | -7.386% |
+----------------+---------+---------+---------+

Figure 5: Compression Performance without Biprediction

Whilst the filter objectively performs better at relatively high bitrates, the subjective effect seems better at relatively low bitrates, and overall the subjective effect seems better than what the objective numbers suggest.

If bi-prediction is allowed, there is less bandwidth reduction as the table below shows.


+----------------+---------+---------+---------+
|Sequence        |   BDR   | BDR-low | BDR-high|
+----------------+---------+---------+---------+
|Kimono          | -0.909% | -0.413% | -1.535% |
|BasketballDrive | -1.159% | -0.843% | -1.542% |
|BQTerrace       | -1.641% | -1.067% | -2.011% |
|FourPeople      | -2.450% | -1.528% | -3.724% |
|Johnny          | -2.108% | -1.143% | -3.550% |
|ChangeSeats     | -2.462% | -1.157% | -4.107% |
|HeadAndShoulder | -2.412% | -0.909% | -4.483% |
|TelePresence    | -1.477% | -0.210% | -3.542% |
+----------------+---------+---------+---------+
|Average         | -1.827% | -0.909% | -3.062% |
+----------------+---------+---------+---------+

Figure 6: Compression Performance with Biprediction

5. IANA Considerations

This document has no IANA considerations yet. TBD

6. Security Considerations

This document has no security considerations yet. TBD

7. Acknowledgements

The authors would like to thank Arild Fuldseth and Gisle Bjontegaard for reviewing this document and design, and providing constructive feedback and direction.

8. Normative References

[I-D.fuldseth-netvc-thor] Fuldseth, A., Bjontegaard, G., Midtskogen, S., Davies, T. and M. Zanaty, "Thor Video Codec", Internet-Draft draft-fuldseth-netvc-thor-01, October 2015.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

Authors' Addresses

Steinar Midtskogen Cisco Lysaker, Norway EMail: stemidts@cisco.com
Mo Zanaty Cisco RTP,NC, USA EMail: mzanaty@cisco.com