The WebM Open Media Project Blog

http://webmproject.blogspot.com/

Articles published on the site

  • Inside WebM Technology: The VP8 Alternate Reference Frame

    15 June 2010, by noreply@blogger.com (John Luther). Tags: inside webm, vp8

    Since the WebM project was open-sourced just a week ago, we've seen blog posts and articles about its capabilities. As an open project, we welcome technical scrutiny and contributions that improve the codec. We know from our extensive testing that VP8 can match or exceed other leading codecs, but to get the best results, it helps to understand more about how the codec works. In this first of a series of blog posts, I'll explain some of the fundamental techniques in VP8, along with examples and metrics.

    The alternate reference frame is one of the most exciting quality innovations in VP8. Let’s delve into how VP8 uses these frames to improve prediction and thereby overall video quality.

    Alternate Reference Frames in VP8

    VP8 uses three types of reference frames for inter prediction: the last frame, a "golden" frame (one frame's worth of decompressed data from the arbitrarily distant past) and an alternate reference frame. Overall, this design has a much smaller memory footprint on both encoders and decoders than designs with many more reference frames. In video compression it is very rare for more than three reference frames to provide a significant quality benefit, while the increase in memory footprint from the extra frames is substantial.

    Unlike other types of reference frames used in video compression, which are displayed to the user by the decoder, the VP8 alternate reference frame is decoded normally but is never shown to the user. It is used solely as a reference to improve inter prediction for other coded frames. Because alternate reference frames are not displayed, VP8 encoders can use them to transmit any data that are helpful to compression. For example, a VP8 encoder can construct one alternate reference frame from multiple source frames, or it can create an alternate reference frame using different macroblocks from hundreds of different video frames.

    The current VP8 implementation enables two different types of usage for the alternate reference frame: noise-reduced prediction and past/future directional prediction.

    Noise-Reduced Prediction

    The alternate reference frame is transmitted and decoded like other frames, so its use adds no extra computation in decoding. The VP8 encoder, however, is free to use more sophisticated processing to create it during offline encoding. One application of the alternate reference frame is noise-reduced prediction. In this application, the VP8 encoder uses multiple input source frames to construct one reference frame through temporal or spatial noise filtering. This "noise-free" alternate reference frame is then used to improve prediction for encoding subsequent frames.

    You can make use of this feature by setting ARNR parameters in VP8 encoding, where ARNR stands for "Alternate Reference Noise Reduction." A sample two-pass encoding setting with the parameters:

    --arnr-maxframes=5 --arnr-strength=3

    enables the encoder to use 5 consecutive input source frames to produce one alternate reference frame with a filtering strength of 3. Here is an example showing the quality benefit of using this experimental ARNR feature on the standard test clip "Hall Monitor." (Each line on the graph represents the quality of an encoded stream on a given clip at multiple datarates. Higher points on the Y axis (PSNR) indicate better quality.)

    [Graph: PSNR vs. datarate for the VP8_ARNR and VP8_NO_ARNR encodings of "Hall Monitor"]
    The only difference between the two curves in the graph is that VP8_ARNR was produced by encodings with ARNR parameters and VP8_NO_ARNR was not. As we can see from the graph, noise-reduced prediction is very helpful to compression quality when encoding noisy sources. We've just started to explore this idea but have already seen strong improvements on noisy input clips similar to "Hall Monitor." We feel there's a lot more we can do in this area.
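
    For applications that drive the encoder through the libvpx C API instead of the command line, here is a minimal sketch of the equivalent ARNR settings. It assumes the VP8E_SET_ARNR_MAXFRAMES and VP8E_SET_ARNR_STRENGTH controls from vpx/vp8cx.h; exact names and defaults may vary across libvpx versions.

    #include <vpx/vpx_encoder.h>
    #include <vpx/vp8cx.h>

    /* Sketch: apply --arnr-maxframes=5 --arnr-strength=3 to an encoder
     * already initialized with vpx_codec_enc_init(). */
    static void enable_arnr(vpx_codec_ctx_t *encoder) {
        /* Build each alternate reference frame from up to 5 source frames. */
        vpx_codec_control(encoder, VP8E_SET_ARNR_MAXFRAMES, 5);
        /* Temporal filter strength, matching --arnr-strength=3. */
        vpx_codec_control(encoder, VP8E_SET_ARNR_STRENGTH, 3);
    }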

    Improving Prediction without B Frames

    The lack of B frames in VP8 has sparked some discussion about its ability to achieve competitive compression efficiency. VP8 encoders, however, can make intelligent use of the golden reference and the alternate reference frames to compensate for this. The VP8 encoder can choose to transmit an alternate reference frame that resembles a "future" frame, and encoding of subsequent frames can make use of information from the past (last frame and golden frame) and from the future (alternate reference frame). Effectively, this helps the encoder achieve results similar to bidirectional (B frame) prediction without requiring frame reordering in the decoder. In two-pass encoding mode, compression can be improved by using encoding parameters that enable lagged encoding and automatic placement of alternate reference frames:

    --auto-alt-ref=1 --lag-in-frames=16

    Used this way, the VP8 encoder can achieve improved prediction and compression efficiency without increasing the decoder’s complexity:

    [Graph: PSNR vs. datarate for VP8 encodings of "Mobile and calendar" with and without alternate reference frames]
    In the video compression community, "Mobile and calendar" is known as a clip that benefits significantly from the use of B frames. The graph above illustrates that the alternate reference frame benefits VP8 significantly without using B frames.
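
    The same pair of options can be set through the libvpx C API. A minimal sketch, assuming vpx_codec_vp8_cx() and the VP8E_SET_ENABLEAUTOALTREF control from vpx/vp8cx.h (names may vary by libvpx version); note that the lag must be configured before the encoder is initialized:

    #include <vpx/vpx_encoder.h>
    #include <vpx/vp8cx.h>

    /* Sketch: initialize a VP8 encoder with lagged encoding and automatic
     * alternate reference frame placement enabled. */
    static vpx_codec_err_t init_with_auto_alt_ref(vpx_codec_ctx_t *encoder,
                                                  vpx_codec_enc_cfg_t *cfg) {
        cfg->g_lag_in_frames = 16;  /* --lag-in-frames=16: buffer future frames */
        vpx_codec_err_t res =
            vpx_codec_enc_init(encoder, vpx_codec_vp8_cx(), cfg, 0);
        if (res != VPX_CODEC_OK)
            return res;
        /* --auto-alt-ref=1: let the encoder place alt-ref frames itself. */
        return vpx_codec_control(encoder, VP8E_SET_ENABLEAUTOALTREF, 1);
    }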

    Keep an eye on this blog for more posts about VP8 encoding. You can find more information on the encoding parameters above, and other detailed instructions for using our VP8 encoders, on our site, or join our discussion list.

    Yaowu Xu, Ph.D., is a codec engineer at Google.

  • VP8 Codec Optimization Update

    15 June 2010, by noreply@blogger.com (John Luther). Tags: inside webm
    Since WebM launched in May, the team has been working hard to make the VP8 video codec faster. Our community members have contributed improvements, but there's more work to be done in some interesting areas related to performance (more on those below).


    Encoder


    The VP8 encoder is ripe for speed optimizations. Scott LaVarnway's effort to write an x86 assembly version of the quantizer will help significantly toward this goal, as the quantizer is called many times while the encoder decides how much detail from the image will be transmitted.
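
    To see why this loop matters, consider its shape. The sketch below is illustrative only, not the actual VP8 quantizer: a tight per-coefficient loop of this form runs for every block the encoder evaluates, which is why a hand-tuned assembly version pays off.

    #include <stdint.h>
    #include <stdlib.h>

    /* Illustrative scalar quantizer, not the real VP8 code: divide each
     * transform coefficient by its quantizer step, preserving the sign. */
    static void quantize_block(const int16_t *coeff, const int16_t *step,
                               int16_t *qcoeff, int n) {
        for (int i = 0; i < n; ++i) {
            int mag = abs(coeff[i]) / step[i];  /* coarser step = less detail kept */
            qcoeff[i] = (int16_t)(coeff[i] < 0 ? -mag : mag);
        }
    }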

    For those of you eager to get involved, one piece of low-hanging fruit is writing a SIMD version of the ARNR temporal filtering code. Also, much of the assembly code only makes use of the SSE2 instruction set, and there are surely newer extensions that could be put to use. There is also redundant code to remove and other general cleanup to do (Yaowu Xu has already submitted some changes along these lines).
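
    As a rough illustration of why the temporal filter is a good SIMD target (a generic sketch, not the libvpx ARNR code), the work boils down to independent per-pixel arithmetic across aligned source frames, which maps directly onto SIMD lanes:

    #include <stddef.h>
    #include <stdint.h>

    /* Generic temporal noise filter sketch: average each pixel position
     * across several aligned source frames. */
    static void temporal_average(const uint8_t *const frames[], int num_frames,
                                 uint8_t *out, size_t num_pixels) {
        for (size_t i = 0; i < num_pixels; ++i) {
            unsigned sum = 0;
            for (int f = 0; f < num_frames; ++f)
                sum += frames[f][i];
            out[i] = (uint8_t)((sum + num_frames / 2) / num_frames);  /* rounded mean */
        }
    }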

    At a higher level, there is room to explore alternative motion search strategies in the encoder. Eventually the motion search could be decoupled entirely, allowing motion fields to be calculated elsewhere (for example, on a graphics processor).

    Decoder


    Decoder optimizations can bring higher resolutions and smoother playback to less powerful hardware.

    Jeff Muizelaar has submitted some changes which combine the IDCT and summation with the predicted block into a single function, helping us avoid storing the intermediate result, thus reducing memory transfers and avoiding cache pollution. This changes the assembly code in a fundamental way, so we will need to sync the other platforms up or switch them to a generic C implementation and accept the performance regression. Johann Koenig is working on implementing this change for ARM processors, and we'll merge these changes into the mainline soon.
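
    Schematically (this is not the actual libvpx routine), the change replaces a two-pass "run the IDCT into a scratch buffer, then add the prediction" sequence with a single pass that adds the prediction as each output pixel is produced, so the intermediate block never touches memory:

    #include <stdint.h>

    static uint8_t clamp_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

    /* Schematic only: residual[] stands in for the output of the final IDCT
     * stage for one 4x4 block. Fusing the prediction add into that stage
     * avoids storing and reloading a 16-entry intermediate buffer. */
    static void idct_add_fused(const int16_t residual[16],
                               const uint8_t pred[16], uint8_t dst[16]) {
        for (int i = 0; i < 16; ++i)
            dst[i] = clamp_u8(pred[i] + residual[i]);
    }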

    In addition, Tim Terriberry is exploring a different method of bounds checking in the "bool decoder." The bool decoder is performance-critical, as it is called several times for each bit in the input stream. The current code handles this check with a simple clamp in the innermost loops and a less-frequent copy into a circular buffer. That copy can be expensive at higher data rates. Tim's patch removes the circular buffer, but uses a more complex clamp in the innermost loops. These inner loops have historically been troublesome on embedded platforms.
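
    For a rough sense of what a clamp means here (a schematic sketch, not libvpx's actual bool decoder), the refill step can substitute zero bytes once the read pointer reaches the end of the buffer instead of reading past it:

    #include <stdint.h>

    /* Schematic refill for an arithmetic ("bool") decoder: feed zeros once
     * the input is exhausted rather than reading out of bounds. */
    typedef struct {
        const uint8_t *ptr, *end;
        uint32_t value;  /* bit window being decoded */
        int bits;        /* valid bits currently in the window */
    } bool_reader;

    static void refill(bool_reader *r) {
        while (r->bits <= 24) {
            uint8_t byte = (r->ptr < r->end) ? *r->ptr++ : 0;  /* the clamp */
            r->value = (r->value << 8) | byte;
            r->bits += 8;
        }
    }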

    As part of these efforts, I've started rewriting higher-level parts of the decoder. I believe there is an opportunity to improve performance by paying better attention to data locality and cache layout, and by reducing memory bus traffic in general. Another area I plan to explore is improving utilization in the multi-threaded decoder by separating the bitstream decoding from the rest of the image reconstruction, using work units larger than a single macroblock, and not tying functionality to a specific thread. To get involved in these areas, subscribe to the codec-devel mailing list and provide feedback on the code as it's written.

    Embedded Processors


    We want to optimize multiple platforms, not just desktops. Fritz Koenig has already started looking at the performance of VP8 on the Intel Atom platform. This platform needs some attention because we wrote our current x86 assembly code with an out-of-order processor in mind. Since Atom is an in-order processor (much like the original Pentium), the instruction scheduling of all of the x86 assembly code needs to be reexamined. One option we're looking at is scheduling the code for the Atom processor and seeing whether that impacts the performance on other x86 platforms such as the Via C3 and AMD Geode. This is shaping up to be a lot of work, but doing it would give us an opportunity to tighten up our assembly code.

    These issues, along with wanting to make better use of the larger register file on x86_64, may reignite every assembly programmer's (least?) favorite debate: whether or not to use intrinsics. Yunqing Wang has been experimenting with this a bit, but initial results aren't promising. If you have experience in dealing with a lot of assembly code across several similar-but-kinda-different platforms, these maintainability issues might be familiar to you. I hope you'll share your thoughts and experiences on the codec-devel mailing list.
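
    For readers who haven't picked a side, the trade-off looks like this in miniature (an illustrative SSE2 snippet, unrelated to any particular libvpx routine): intrinsics keep the code in portable C and let the compiler allocate registers and schedule instructions, which is exactly what a hand-written assembly file refuses to trust it with.

    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stdint.h>

    /* Eight 16-bit additions in one SSE2 instruction, written with
     * intrinsics instead of a hand-scheduled assembly file per platform. */
    static void add_eight_i16(const int16_t *a, const int16_t *b, int16_t *out) {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));
    }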

    Optimizing codecs is an iterative (some would say never-ending) process, so stay tuned for more posts on the progress we're making, and by all means, start hacking yourself.

    It's exciting to see that we're starting to get substantial code contributions from developers outside of Google, and I look forward to more as WebM grows into a strong community effort.

    John Koleszar is a software engineer at Google.
  • New WebM Product Rollouts

    14 June 2010, by noreply@blogger.com (John Luther)
    We are excited to see more WebM-enabled products launching every day.

    Mozilla has now officially enabled WebM playback in the nightly dev builds of Firefox. Read all about it on their blog.

    We also want to welcome several new supporters to the WebM project, including Oracle/Sun, Videantis, Flumotion, VMIX and others. See the most recent list on our supporters page.
  • Changes to the WebM Open Source License

    5 June 2010, by noreply@blogger.com (John Luther)
    You'll see on the WebM license page and in our source code repositories that we've made a small change to our open source license. There were a couple of issues that popped up after we released WebM at Google I/O a couple weeks ago, specifically around how the patent clause was written.

    As it was originally written, if a patent action was brought against Google, the patent license terminated. This provision itself is not unusual in an OSS license, and similar provisions exist in the Apache License 2.0 and in version 3 of the GPL. The twist was that ours terminated "any" rights and not just rights to the patents, which made our license incompatible with both GPLv2 and GPLv3. Also, in doing this, we effectively created a potentially new open source copyright license, something we are loath to do.

    Using patent language borrowed from both the Apache and GPLv3 patent clauses, in this new iteration of the patent clause we've decoupled patents from copyright, thus preserving the pure BSD nature of the copyright license. This means we are no longer creating a new open source copyright license, and the patent grant can exist on its own. Additionally, we have updated the patent grant language to make it clearer that the grant includes the right to modify the code and give it to others. (We've updated the licensing FAQ to reflect these changes as well.)

    We've also added a definition for the "this implementation" language, to make that clearer.

    Thanks for your patience as we worked through this, and we hope you like, enjoy and (most importantly) use WebM and join with us in creating more freedom online. We had a lot of help on these changes, so thanks to our friends in open source and free software who traded many emails, often at odd hours, with us.

    Chris DiBona is the Open Source Programs Manager at Google.