Will AI Power the Next Leap in Video Compression?

Artificial intelligence (AI) plays a role in the digital content economy that goes well beyond anticipating which TV series you’d like to binge-watch next. Increasingly, it is being brought to bear on a more challenging (and equally long-standing) technical problem: how to shave bits off a video file without fatally compromising quality.

While the advent of adaptive bitrate streaming and next-generation codecs like HEVC has ushered in significant video compression improvements, streaming service providers and encoding companies are still exploring ways to wring out greater efficiency, particularly as streaming services expand globally into regions with slower Internet connections.

At the recent Mobile World Congress, Netflix demonstrated a new compression process, dubbed Dynamic Optimizer, that it claims leverages AI (specifically, machine learning) to better compress video frames. Working with university partners, Netflix collected data from viewers to understand what they perceived as a quality image across a variety of metrics (degree of distortion, types of visual artifacts, and so on). It then used this data, alongside more established video quality metrics, to train an algorithm to compress video frames in a manner that preserves the elements deemed most important to the average viewer.

Essentially, Netflix incorporated a variety of subjective human perceptions of video quality into an algorithm that can use that data to shape how a given piece of video content should be compressed. The results demonstrated at Mobile World Congress appear to build on a 2016 paper detailing what was then called Video Multimethod Assessment Fusion (memo to Netflix marketers: Dynamic Optimizer is way, way better).
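The core idea, fusing several elementary quality metrics into one score calibrated against human opinion, can be sketched in a few lines. The example below is an illustration, not Netflix’s actual implementation: VMAF fuses its features with a trained SVM regressor, while this sketch substitutes a simple least-squares linear fit, and the metric values and mean opinion scores (MOS) are made-up numbers.

```python
import numpy as np

# Toy training set: each row holds elementary quality metrics for one
# encoded clip (hypothetical values), e.g. a detail-preservation score
# and a PSNR-like score. The target is the mean opinion score (MOS)
# that viewers assigned to that clip.
features = np.array([
    [0.95, 42.0],
    [0.80, 36.5],
    [0.60, 30.0],
    [0.40, 25.5],
])
mos = np.array([92.0, 78.0, 55.0, 34.0])

# Fit fusion weights by least squares (VMAF itself uses an SVM
# regressor; a linear fit keeps the idea visible).
X = np.column_stack([features, np.ones(len(features))])  # bias column
weights, *_ = np.linalg.lstsq(X, mos, rcond=None)

def fused_score(detail: float, psnr: float) -> float:
    """Predict a perceptual quality score from elementary metrics."""
    return float(np.dot([detail, psnr, 1.0], weights))
```

Once trained, such a score can steer an encoder: try several compression settings per scene and keep the cheapest one whose predicted score clears a quality threshold.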

According to Quartz’s Joon Ian Wong, Netflix showed off two implementations of the technology at Mobile World Congress. The first was a 555 kbps video stream that looked identical to one compressed with the Dynamic Optimizer at half the bandwidth. The second was a pair of 100 kbps video streams, one compressed with the Dynamic Optimizer, the other not. The latter, Wong wrote, “appeared patchy and distorted” while the former was “dramatically” improved.

Netflix isn’t the only company applying AI to the challenge of improved encoding. At IBC 2016, Harmonic unveiled its EyeQ technology, which uses AVC-based codecs and “the mechanics of the human visual system” to measure video quality in real time. Armed with this data, the system can better remove bits that are unnecessary (to the human eye) without degrading perceived quality. Leveraging an understanding of how the human visual system works, EyeQ can identify individual shapes in a video image and prioritize motion over texture, contrast over luminance, and faces above other objects when determining which bits can be lost and which must be preserved. According to Harmonic, the system can reduce bandwidth consumption by up to 50 percent.
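That kind of perceptual prioritization boils down to spending the bit budget where viewers actually look. Here is a minimal sketch of the idea; the priority table and `allocate_bits` function are hypothetical illustrations of the ordering described above (faces first, then motion, contrast, texture), not Harmonic’s actual EyeQ logic.

```python
# Hypothetical perceptual weights: higher weight -> more of the frame's
# bit budget, i.e. finer quantization for that region.
PRIORITY = {"face": 1.0, "motion": 0.8, "contrast": 0.6, "texture": 0.3}

def allocate_bits(block_labels, budget_bits):
    """Split a frame's bit budget across blocks in proportion to the
    perceptual priority of what each block was classified as containing."""
    total = sum(PRIORITY[label] for label in block_labels)
    return [budget_bits * PRIORITY[label] / total for label in block_labels]
```

A real encoder would translate these weights into per-block quantization parameters rather than raw bit counts, but the principle is the same: texture detail is sacrificed first, faces last.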

Both the Netflix and Harmonic approaches point to a future where AI-driven software is layered on top of existing codecs to extract still more improvements in video compression. But there are challenges ahead. A major one is familiar to all programmers: the quality of the data.

As Netflix noted in its 2016 paper, the algorithm tasked with evaluating video quality was trained on data gathered from consumers in a single viewing environment (in this case, a TV at a standardized distance). Netflix, of course, is available on a range of devices and is watched under a range of conditions. To improve further, the algorithm has to be trained on a wider range of viewing experiences. The same goes for the quality of the video streams that viewers judge: as the paper notes, titles in the Netflix library vary fairly widely in source quality, so human assessments of those videos may not yield universally applicable quality standards.

Still, those challenges seem surmountable, and it appears likely that more OTT services will adopt this approach. Indeed, Netflix has open-sourced its Video Multimethod Assessment Fusion code so that other industry participants can contribute to its improvement.

Watching endless hours of TV may not be making people any smarter, but it does appear to be helping our software raise its I.Q.