AVX2-Optimized VVC (H.266) Decoding Filters

Overview

This project took place over Summer 2023 as part of Google Summer of Code (GSoC). Its objective was to implement the in-loop filter portion of the new Versatile Video Codec (VVC, H.266) decoder currently under development in FFmpeg. The in-loop filters that I chose to implement are the Sample Adaptive Offset (SAO) and deblocking filters. These filters are already implemented in C, and my objective was to write x86_64 assembly versions that are more performant than the C "template" version.

The need for SIMD-accelerated code for this new decoder is clear - even with SIMD, software decoding of VVC requires lots of computational power, and compilers are currently unable to optimize code to fully utilize the SIMD aspects of modern processors.

SAO Filter

SAO is an in-loop filter used in both HEVC and VVC to improve the objective quality of the reconstructed picture by suppressing banding artifacts and ringing artifacts caused by quantization errors of high-frequency components. The SAO filter can be split into two parts - a band filter and an edge filter.
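To make the two parts concrete, here is a minimal C sketch of what each one does to a single row of 8-bit samples. It is only an illustration of the HEVC-style logic as I understand it, not FFmpeg's actual code: the names sao_band_row, sao_edge_row, clip_u8 and sign3 are hypothetical, the edge filter only looks at the horizontal direction, and picture borders are simply left untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a value to the valid range of an 8-bit sample. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Returns -1, 0 or 1 depending on whether a is below, equal to or above b. */
static int sign3(int a, int b)
{
    return (a > b) - (a < b);
}

/* Band filter: the 0..255 range is split into 32 bands of 8 values each.
 * The four consecutive bands starting at band_pos (wrapping around) each
 * get their own offset; every other band is left unchanged. */
static void sao_band_row(uint8_t *dst, const uint8_t *src, int width,
                         int band_pos, const int offsets[4])
{
    assert(band_pos >= 0 && band_pos < 32);
    for (int x = 0; x < width; x++) {
        int band = src[x] >> 3;          /* 8-bit samples: shift by 8 - 5 */
        int k = (band - band_pos) & 31;  /* distance from the first band */
        dst[x] = clip_u8(src[x] + (k < 4 ? offsets[k] : 0));
    }
}

/* Edge filter (horizontal direction only in this sketch): each sample is
 * compared with its left and right neighbours and classified as a
 * local minimum, concave corner, convex corner or local maximum.
 * offsets[] is indexed in that order; flat samples get no offset. */
static void sao_edge_row(uint8_t *dst, const uint8_t *src, int width,
                         const int offsets[4])
{
    if (width <= 0)
        return;
    /* Assumption: the first and last samples are copied as-is. */
    dst[0] = src[0];
    dst[width - 1] = src[width - 1];
    for (int x = 1; x < width - 1; x++) {
        int s = sign3(src[x], src[x - 1]) + sign3(src[x], src[x + 1]);
        int off = s == 0 ? 0 : offsets[s < 0 ? s + 2 : s + 1];
        dst[x] = clip_u8(src[x] + off);
    }
}
```

Both loops are branch-light, per-sample operations with no dependency between neighbouring outputs, which is what makes them a good fit for SIMD: a vector version can process a whole register of samples per iteration.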

As part of a "qualification task" to join FFmpeg's GSoC 2023 program, I had to bring over the SAO filter from HEVC to VVC. This is possible as the filter is largely unchanged between the two. During this period before GSoC, I was largely able to bring forward the band filter. However, the edge filter eluded me, and I was not able to complete it.

Thus, my first GSoC task was to complete the edge filter implementation. This was largely done by the middle of June, and my code was merged in this PR.

Deblocking Filter

As with the SAO filter, the deblocking filter can be split into two parts: the luma deblocking filter and the chroma deblocking filter. The deblocking filters also share some commonality with the HEVC versions, which I intended to port. I also tried writing the "stronger" version of these filters, but I was ultimately confined to working only on the parts that were already implemented in HEVC.

I chose to start with the chroma deblocking filter, as it had a checkasm test that could easily be modified to work with the new filter and would guide me as I worked on it. While the checkasm test did prove helpful, it also gave me a lot of issues!

The chroma deblocking filter took until early August to complete - mainly because I did not catch some pixel boundaries and I was getting lots of segfaults that I was not able to pin down. I then tried writing a version that respected these boundaries, but found myself stuck. However, I decided to benchmark the version I had that worked, and the results came back much more performant than I had expected, so my work will be submitted as such.

This left me with little time to work on the luma deblocking filter. I honestly wish I could say that the luma deblocker was complete to some level and included in my final GSoC submission, but that is not to be. Complicating matters, the lease on the place I was renting ended on the 24th, leaving me very little time to work on the filter around the move. I had identified where the HEVC and VVC filters differ (mainly in accessing and calculating tc, and using tc to clip pixels to a limit), but YUView still shows many luma pixels with off-by-one errors.
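For readers unfamiliar with tc, here is a minimal C sketch of what "using tc to clip pixels to a limit" means. The helper names (clip3, clip_u8, clip_to_limit, chroma_normal_filter) are hypothetical and this is not the FFmpeg code. The formula is the HEVC normal chroma filter as I understand it, chosen only because it is the shortest example; the luma filters use longer taps and codec-specific multiples of tc, which are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp v to the inclusive range [lo, hi]. */
static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Clamp a value to the valid range of an 8-bit sample. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)clip3(0, 255, v);
}

/* Keep a filtered sample within +/- limit of the unfiltered one.
 * In a deblocking filter, limit is derived from tc. */
static int clip_to_limit(int orig, int filtered, int limit)
{
    assert(limit >= 0);
    return clip3(orig - limit, orig + limit, filtered);
}

/* One line of samples across a block edge: p1 p0 | q0 q1.
 * The correction delta is clipped to [-tc, tc] before being applied,
 * so a real edge in the picture is never smoothed by more than tc.
 * Note: this relies on >> being an arithmetic shift for negative values,
 * which holds on the compilers FFmpeg targets. */
static void chroma_normal_filter(uint8_t *p0, uint8_t *q0,
                                 int p1, int q1, int tc)
{
    int delta = clip3(-tc, tc, (((*q0 - *p0) * 4) + p1 - q1 + 4) >> 3);
    *p0 = clip_u8(*p0 + delta);
    *q0 = clip_u8(*q0 - delta);
}
```

Because the limit scales with tc, a small mistake in how tc is looked up or scaled changes where the clamp kicks in, and the visible result is samples that differ from the reference by a very small amount.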

Current State

  • AVX2 SAO Code: Merged

  • AVX2 Chroma Deblocker: Currently blocked by checkasm throwing an error on the GitHub Actions check. I am not sure what that is about, as on my system checkasm reports that all functions pass. Furthermore, all conformance checks pass, so I believe the issue is with checkasm and the GitHub Actions tester.

  • AVX2 Luma Deblocker: Still needs more work. I am not sure where the error is, as I have accounted for all tc changes. Further time will be needed for the Luma deblocker to be ready.

Out-of-scope(?) work

This was entirely out of scope of my original GSoC project, but I got a MacBook Pro with an ARM-based M1 Pro chip after I had submitted my GSoC proposal. One night, after bashing my head on the chroma filter to no end, I decided to work on a distraction, and did the exact same port of the SAO filter from HEVC to VVC on ARM NEON. I spent a few hours on this and it has been merged in this PR.

Future work

  • Luma Deblocker: the luma deblocker currently found in this PR and the codepack below does not work. Obviously, this needs to be fixed.

  • Chroma Deblocker: shift != 0 (which I believe is where the pixels are close to an end) should be handled in assembly. The strong case should also not be too hard to add.

  • ARM NEON: Once the issues with the luma deblocker are resolved and I have a sense of all the changes needed to port the luma deblocking filter, an ARM NEON port of the filter might be a good idea.

Before I go into the next part, I would like to say that I am not abandoning the FFvvc project after the end of GSoC. My availability might not be as good, as I will have to juggle classes, club activities, work, and this project. However, I will continue working on this to the best of my ability.

Where to find my code

Acknowledgement

I would like to extend my utmost appreciation to Nuo Mi, my mentor, who has been amazing to work with. Despite the timezone difference, Nuo Mi has been able to explain exactly what I needed in order to get to the point where I am right now. I would also like to thank the FFmpeg project as a whole for taking me up on this GSoC project.

I would like to thank my parents for being extremely supportive of me working through GSoC even if they can't quite grasp what I'm doing.

I would like to thank my roommates Tom, August, Hemant, and Kelton for keeping me sane over the summer while I was cooped up in my room staring at assembly - thanks for all the beers!

And finally - thank you, the reader, for reading my submission/blog post! I'm glad I was able to keep your attention, and I hope you got something out of my overall GSoC experience!

Till Next Time, Shaun

Where I've been

The Quiet Kid

I'm not sure if it comes across, but I've never exactly been the loud kid. If you've met me IRL, you know I tend to not have much to say, even if I'm pretty comfortable with you. The same goes for this blog!

The time in between the last post and this has been a whirlwind, though. I suppose an update is in order.

Another year in the bag

I took an entire year of purely compsci classes, and I seem to have done pretty well by all metrics! These classes were all of great interest to me, but even then I had to sell my soul to the devil for the grades I earned.

The classes that were particularly memorable were Operating Systems, Formal Languages and Automata Theory, and Introduction to Parallel Programming. The professors for these classes truly are GOATs, and the content was really engaging.

This was also the first year that I TA'ed. I would like to think I was a competent TA, if a little too unserious at times.

Summer of Everything

This summer is certainly a hectic one for me. Let's start from the top - I'm currently in the Google Summer of Code program, writing accelerated decoding filters for FFmpeg! You can read about exactly what I'm doing here! It's really exciting stuff, at the intersection of multimedia and low-level computing. x86_64 Assembly may be a daunting task, and throwing SIMD on top of that makes it super inaccessible, but it's really not that bad once you have a good infrastructure for macros, and FFmpeg certainly has that in spades!

The other main shenanigan I've been up to is that I'm now Studio Engineer at Radio K! I'm in charge of the audio side of Off the Record! It's really an honor that I get to record the amazing musicians in this part of the world and witness it first-hand! You can hear my mixes as well as the amazing performances here!

One last thing is that ACM UMN is, after many long years, finally getting the upgraded 10-gigabit network! My tenure as systems administrator may be over, but it's all-hands-on-deck for this network upgrade, and I'm giving all the guidance I can to our new sysadmin in this transition!

Future of the Blog

My best friend recently talked to me about the lack of simple explanations for digital multimedia - she's had to rely on me to explain lots of stuff in digital audio and video. Essentially, there really isn't an accessible resource for this stuff.

I'm planning to write a short series of blogposts on this site explaining the stuff that goes into your codecs, from audio to video, and generally just nerd out about digital audio and video in an accessible manner! Sort of a more approachable version of xiphmont's amazing "Digital Media Primer". Stay tuned for those!

I'm also planning to write an article or two about x86_64 assembly, SIMD, and efficient memory access. I'm going to be a TA for Machine Architecture and Organization next semester, so not only will this be a resource for future students, but it will also give me a chance to practice explaining this stuff elegantly!

Till next time, Shaun