*Originally published on NoJitter.com

Vidyo is positioned to see tremendous benefit from a revenue and deployment perspective as WebRTC clients proliferate using VP9 SVC

Google and Vidyo jointly announced an agreement in which "Vidyo will develop a scalable video extension for the VP9 codec as part of the WebRTC client open source project". What does this really mean, and what will be the impact for both WebRTC and Vidyo?

This article will explore the implications of the announcement, but first will offer some background on the technology.

A Short Discussion of Video Encoding
Digital video must be compressed, using a video codec, so that it can be transmitted efficiently over a network. Although many video codecs exist, the most prevalent in the enterprise video conferencing industry today are H.263 and the newer H.264. Another codec, VP8, is currently used in the WebRTC implementations in the Google Chrome and Mozilla Firefox browsers. Google is currently working on the next iteration of its VP codec, VP9.

The H.264 codecs allow video to be compressed into bit rates that are half or less of the H.263 bit rates, for equivalent video quality. H.264 "AVC" or baseline profile codecs have been available since they were approved by the ITU in May 2003, and several clarifications or enhancements have been added since then.

One of the most significant enhancements, approved in November 2007, was H.264 Scalable Video Coding (H.264 Annex G). SVC leverages the same encoding techniques but allows the encoding engine to split the video into a base layer, which is itself a valid AVC stream, and several enhancement layers or streams. These enhancement layers can represent spatial resolution (screen size), temporal resolution (frame rate) or video image quality. Vidyo was the company that really brought H.264 SVC into the video conferencing world through its line of SVC-enabled Vidyo endpoints and infrastructure.

It is this additive capability of SVC layers that makes this encoding technique so compelling, because it eliminates the need for video transcoding and bridging devices. Even if some layers of the full video stream are removed, the remaining layers form a valid video bit stream for target endpoints that support lower quality. For example, a mobile phone, with a small screen, requires much less video information to show a high-quality image on its small display; consequently, it does not need or use all of the SVC layers a telepresence system would require. Contrast this with a non-SVC call, in which a transcoding video bridge would be required to connect systems with different resolutions to the same call.
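This additive layer-dropping behavior can be sketched in a few lines of Python. The layer names, resolutions, and frame rates below are purely illustrative and not drawn from any real codec API:

```python
# Illustrative sketch of the additive SVC layer model; layer names,
# resolutions, and frame rates are hypothetical, not from a real codec.

def select_layers(layers, max_width, max_fps):
    """Keep only the layers an endpoint can use. Because layers are
    additive, the surviving subset is still a valid bit stream."""
    return [l for l in layers
            if l["width"] <= max_width and l["fps"] <= max_fps]

stream = [
    {"name": "base (AVC)",      "width": 320,  "fps": 15},
    {"name": "spatial enh. 1",  "width": 640,  "fps": 15},
    {"name": "spatial enh. 2",  "width": 1280, "fps": 15},
    {"name": "temporal enh.",   "width": 1280, "fps": 30},
]

# A phone with a small screen keeps only the lower layers; a
# telepresence system would take all four.
phone = select_layers(stream, max_width=640, max_fps=15)
```

No transcoder is involved: the phone's subset is produced simply by discarding the layers it cannot use.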

Figure 1. H.264 SVC Introduces Temporal, Spatial and Quality Video Layers

It is the responsibility of the SVC-compliant endpoints to signal their capabilities to other endpoints and to any infrastructure participating in the call. Note that SVC does not use less bandwidth than AVC; it may actually increase bandwidth by 10% to 15% compared with AVC. But the tradeoff is worth it because the video infrastructure should, in principle, be less expensive.

SVC-encoded video performs better over networks with significant packet loss or limited bandwidth. This is because only those video layers that can make it through the network are sent; the decoder then uses whatever layers arrive to reconstruct the video image, possibly at a lower frame rate, a smaller image size, or a lower video quality. H.264 AVC and H.264 SVC both require about half the bandwidth of the older H.263 codec, and it is anticipated that H.265 and VP9 will require about half the bandwidth of their predecessors.
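As a rough worked example of the halving rule of thumb, combined with the 10% to 15% SVC overhead noted earlier (the H.263 starting figure is purely illustrative):

```python
# Back-of-the-envelope arithmetic for the "half the bandwidth per
# generation" rule of thumb; the H.263 starting figure is illustrative.

h263_kbps = 1024
h264_kbps = h263_kbps / 2       # H.264 ~ half the bandwidth of H.263
vp9_kbps = h264_kbps / 2        # VP9 / H.265 expected to halve it again

# SVC layering itself adds roughly 10-15% overhead versus plain AVC
h264_svc_kbps = h264_kbps * 1.15
```

So a call that needed about 1 Mbps under H.263 would need roughly a quarter of that under VP9, before any SVC overhead.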

Compressing video with a newer codec usually requires more CPU processing than older codecs do. Consequently, care must be taken when deploying a new codec to ensure that the devices on which the video is to be compressed have enough processing power.

Not all SVC encoders are created equal. The standard really defines how to decode video, not how to encode it, so video encoders from different vendors will deliver varying video quality and bandwidth efficiency. In principle, all encoders implementing the same video standard should at least interoperate at the base layer. The reality is that implementations from different vendors may not interoperate even at the base layer, and cross-vendor SVC implementations certainly do not. In addition, incompatibilities may arise even for the same codec (H.264, for example) because of proprietary signaling a vendor may choose to use.

Figure 2 below shows the video compression codecs used by major desktop video conferencing vendors.

Figure 2. Video Compression Codecs Used in Several Desktop Video Solutions


* Note that Lync 2013 does not support H.263. Lync 2010 does support H.264. Also see http://social.technet.microsoft.com/Forums/en-US/ocscapacityplanning/thread/8bb71480-64d8-47f3-b639-0f4b7d3320ff for more details on the Microsoft codecs.
** The Vidyo endpoints do not natively support H.263 or H.264 AVC; a gateway is required to connect with endpoints that use these codecs. Vidyo asked that H.263 and H.264 be included in this list so that readers would not be misled into thinking that Vidyo does not support these older codecs at all.

A Short Discussion of Multipoint Video
The first question many video users ask after experiencing a point-to-point video call is how to have a video meeting with three or more people. There are basically two mechanisms for enabling multiparty video, depending upon which codecs and bridging hardware are being used: a Multipoint Control Unit (MCU) or a video media relay server.

Traditional MCUs
If multiple endpoints in a call are using single-layer codecs like H.264 AVC or H.263 (or earlier codecs), then an MCU is required for audio and/or video bridging. (This assumes continuous presence, i.e., video from multiple video endpoints viewable simultaneously on the same screen, sometimes called "Hollywood Squares" video). Each video endpoint enters into a point-to-point call with the MCU. The MCU receives video feeds from all endpoints and mixes both the audio streams and the video streams and then sends a single audio and a single video stream back to each endpoint.

In order to do this mixing, the MCU must first decode the audio and video streams. It then combines or mixes the audio, often mixing only the two or three audio inputs with the greatest amplitude. Simultaneously, the MCU takes the images corresponding to the loudest audio inputs and composites them into a single smaller image. It then re-encodes the audio and video and returns these streams to the individual endpoints. (There is more processing involved than is described here; for example, some subtraction is needed when mixing audio so that a speaker's own audio is not returned. However, for the purposes of this paper, this description will suffice.)
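The loudest-speaker selection step can be sketched as follows. Real MCUs operate on decoded audio frames, but this simplified Python model, with hypothetical participant names and amplitude values, shows the selection and self-audio-exclusion logic:

```python
# Simplified sketch of loudest-speaker selection in an MCU; participant
# names and amplitude values are hypothetical.

def pick_loudest(levels, n=3):
    """Return the ids of the n participants with the highest audio level."""
    ranked = sorted(levels, key=levels.get, reverse=True)
    return ranked[:n]

def mix_for(participant, levels, n=3):
    """Select the loudest speakers to mix for one listener, excluding
    the listener's own audio so a speaker never hears themselves."""
    others = {p: v for p, v in levels.items() if p != participant}
    return pick_loudest(others, n)

levels = {"alice": 0.9, "bob": 0.2, "carol": 0.7, "dave": 0.5}
print(mix_for("alice", levels))  # alice's mix never contains alice
```

The MCU would then composite the video of those same selected speakers before re-encoding.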

Figure 3. How A Traditional MCU Mixes Video

MCUs exist as software running on a server or as dedicated hardware with Digital Signal Processing (DSP) chips. Large enterprises typically use hardware-based MCUs for performance reasons. By the nature of the processes involved, MCUs add some latency (typically less than 200 milliseconds) to a multipoint video conference. In addition to doing the processing necessary to create a composite video image, the MCU must have "jitter" buffers to reassemble packets that arrive out of order, a common occurrence on many networks. Also, because there are multiple encode/decode cycles, the video quality will degrade slightly.

Media Relay Servers for SVC
SVC codecs and the endpoints that support SVC have enabled a different way to provide multipoint video. These endpoints are able to encode and decode multiple streams simultaneously. An SVC multipoint video solution is controlled by a media relay server that determines which layers are sent to each connected endpoint. At least one H.264 SVC solution, that from Vidyo, also requires the media server even in point-to-point calls between Vidyo's H.264 SVC endpoints. In any case, as discussed above, each endpoint receives only those SVC layers it can properly decode, based on its screen size, its processing power, and the dynamically computed bandwidth available between the endpoint and the video router.

In an SVC solution, no video is mixed or transcoded (assuming all endpoints are SVC; if there is a mix of SVC and non-SVC endpoints, some mixing will still be required). For SVC endpoints, the media relay server replicates and routes video streams for each participant to the other endpoints without mixing. The SVC-compliant endpoint simultaneously decodes these multiple video streams, each with their own layers, and displays a multipoint image properly on its corresponding screen.
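A minimal sketch of that routing decision follows, assuming each sender publishes a list of cumulative layer bit rates and each receiver reports its available downlink. The data structures and the even per-sender budget split are assumptions for illustration, not Vidyo's actual algorithm:

```python
# Hypothetical sketch of a media relay server's per-receiver layer
# selection; data structures and budget policy are assumptions.

def route(senders, receivers):
    """senders: {id: ascending cumulative layer bit rates in kbps}
    receivers: {id: available downlink kbps}
    Returns {receiver: {sender: number of layers forwarded}}."""
    plan = {}
    for rid, budget in receivers.items():
        # naive policy: split the downlink evenly across the other senders
        per_sender = budget // max(len(senders) - 1, 1)
        plan[rid] = {}
        for sid, layers in senders.items():
            if sid == rid:
                continue  # never route a participant's video back to them
            # highest cumulative layer that fits the per-sender budget
            n = sum(1 for cum in layers if cum <= per_sender)
            plan[rid][sid] = max(n, 1)  # always forward at least the base layer
    return plan

senders = {"a": [200, 600, 1500], "b": [200, 600, 1500], "c": [200, 600, 1500]}
receivers = {"a": 3000, "b": 1000, "c": 400}
plan = route(senders, receivers)
```

Note that no video is decoded or re-encoded anywhere in this path; the server only decides which already-encoded layers to forward.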

Because SVC media relay servers do not encode or decode the video, the video quality will be higher than when an MCU is used. In addition, routing video packets adds less latency (typically 10 to 20 milliseconds) than an MCU does.

Figure 4. SVC Video Call: Media Relay Server Replicates and Routes Video Packets, Mixing Only The Audio

The Implication of Using SVC for WebRTC
One of the complaints about WebRTC video is that it requires a lot of bandwidth--typically between 300 kbps and 2 Mbps--and that there are few options available to control that bandwidth. Creating the new VP9 SVC codec will reduce bandwidth in two ways:

1. VP9 will be able to more efficiently compress the video, and may give equivalent image quality at half the bandwidth. This is a huge benefit to VP9 in WebRTC.

2. Using SVC technology will allow WebRTC developers to provide excellent video quality even on low-bandwidth networks or networks with significant packet loss. Proof that the approach works can be seen in the current Vidyo H.264 SVC implementation running on mobile devices like Android phones, iPhones, and iPads over Wi-Fi connections.

The current WebRTC deployment using VP8 does not scale particularly well beyond a small number of endpoints in a call, because each endpoint must make a direct connection with every other endpoint. Some companies have built, or are working on, WebRTC MCUs so that the MCU infrastructure takes over the many video streams and the bandwidth and processing power that would otherwise be required of each endpoint in a multiparty video call.
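A quick calculation shows why the full-mesh model breaks down as calls grow; the per-stream bit rate here is illustrative:

```python
# Why a full-mesh call scales poorly: every endpoint encodes for, and
# receives from, every other endpoint. The 1000 kbps figure is illustrative.

def mesh_cost(n, kbps_per_stream=1000):
    streams_per_endpoint = n - 1          # one outgoing stream per peer
    total_links = n * (n - 1) // 2        # point-to-point connections in the call
    uplink_kbps = streams_per_endpoint * kbps_per_stream
    return total_links, uplink_kbps

# 4 participants: 6 links, 3 Mbps uplink per endpoint
# 8 participants: 28 links, 7 Mbps uplink per endpoint
print(mesh_cost(4), mesh_cost(8))
```

The link count grows quadratically with participants, which is exactly the load an MCU or media router is meant to absorb.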

With WebRTC VP9 SVC, point-to-point calls will work just fine. But multipoint calls will need a media router. It is the media router that will be able to provide the fine controls for routing video packets. Using the VP9 SVC codec, each endpoint encodes at the highest quality that it is capable of producing. The media server determines what packets to send to all the other participants based upon what resolution they want to display for each participant, balanced with the available bandwidth and computational power of the device. A website serving up WebRTC will be able to become the media router, or this function can be disaggregated to a specialized server that only does media routing.

The bottom line is that WebRTC based on VP9 SVC will require much less bandwidth than WebRTC based on VP8 does. But just as H.265 will require more processing power than H.264, VP9 will likely require more processing power than VP8. Should mobile chipset manufacturers include VP9 SVC in their future chip designs, mobile devices will be able to support VP9 SVC just as easily as they support H.264 SVC today.

The Implication of VP9 SVC for Vidyo
Vidyo as a company has had remarkable success providing video communications technology to end user companies and to OEM manufacturers who have incorporated Vidyo's video capabilities within their own products. Vidyo will provide the WebRTC browser endpoint SVC technology through Google to the WebRTC open source project; however, the "secret sauce" of controlling the video effectively remains highly valuable Vidyo proprietary technology. If WebRTC ultimately includes VP9 SVC as enabled by Vidyo's technology, then every web server that uses WebRTC potentially becomes a customer for Vidyo's media routing engine.

As Vidyo stated in a recent interview: "Application developers may create their own SVC media routers from scratch, or they can use ours. Nothing prevents them from using the WebRTC VP9 SVC capabilities." However, Vidyo has 38 patents issued for optimizing control and routing video, with more on the way, which are largely not part of the WebRTC project. The company is positioned to see tremendous benefit from a revenue and deployment perspective as WebRTC clients proliferate using VP9 SVC.