Every website you visit may download a different collaboration app!
When I think of voice, video, or data on my PC, tablet, or smartphone, I think of applications over which I have some control. For example, I can choose if I want to download the Lync, Skype, GoToMeeting, WebEx, Vidyo, or Google Hangouts clients. Choice is preserved across all of my devices: I choose what I want on my device. I can also remove any of these apps any time I want.
But WebRTC changes all that.
In thinking about how WebRTC will be used, it is possible that many of the sites I visit will have their own unique real-time communications and collaboration application, and that this application will be downloaded automatically in the JavaScript my browser fetches from the web server, without any permission required on my part.
Given the variety in each of these communications applications, it would be very easy to inadvertently click on something that gave camera or microphone control to someone I don't know and don't care to know. How does WebRTC provide security for this brave new world of ubiquitous browser-based voice, video, and data?
I recently had the privilege of speaking with Eric Rescorla of RTFM, Inc. about this topic. Eric is the author of two IETF RTC-Web working group documents focusing on WebRTC security--one discusses WebRTC Security Considerations and the other proposes a WebRTC Security Architecture that satisfies these security considerations.
According to these documents, "RTC-Web communications are directly controlled by some Web server,...[and] a Web browser might expose a JavaScript API which allows a server to place a video call [unknowingly by the user]. Unrestricted access to such an API would allow any site which a user visited to "bug" a user's computer, capturing any activity which passed in front of their camera."
In this post, I will discuss WebRTC security and how it has been specifically formulated to protect users from unauthorized persons or sites creating malicious scripts that could take over control of the user's camera and microphone. It is important to recognize that WebRTC security is still under development in the working groups; hence, there may be some variation in the final specification from what is discussed below.
Because the user has no control over the Web servers visited during a browsing session, a key to making WebRTC secure is to make the browser the only trusted base from which security decisions can be made and to assume that any Web site could have malicious JavaScript code embedded in it. Furthermore, identity is at the heart of any decision to allow a Web-based application to have camera and microphone control.
WebRTC Calling Scenarios
Eric's security considerations document identifies several different WebRTC calling scenarios and what the user expectations are from those scenarios.
1. Using A Dedicated Calling Service: A user may establish a relationship with a Website that provides a calling service. This could be a site that effectively provides a "rendezvous" capability or directory for calling other people using WebRTC, or it could be a service that interconnects WebRTC with the PSTN or enterprise infrastructure. In this case, because there is an established trust relationship with the website, the user may want to give this service the ability to automatically access the camera and microphone. Social networking sites or gaming sites may be examples of a dedicated calling service.
However, by giving the site long-term authorization, the user is effectively also automatically giving the site the ability to "bug" the computer and make calls on the user's behalf. User expectation is that the site is not listening in on the calls and that the user can be sure the call is actually made to the intended person or entity.
2. Calling The Site Itself: Suppose a user looking for information goes to a support website or an e-commerce site or a vendor site and wishes to contact someone associated with the organization that owns the site. An easy way to do this with WebRTC would be for the site owner to put a button on the site with verbiage to this effect: "Click here to talk to a representative". The user assumes that he is actually calling the site he is visiting, and the expectation is that this site will be able to access the camera and microphone with the user's permission one time only, and only for the duration of the call.
3. Redirection and Calling: We are bombarded with advertisements on many of the sites we visit. Often these ads are served up by parties not even affiliated with the site we are visiting. If we click on an ad, we will often be redirected to a site we may not know and with which we have no relationship at all. The original site we visited may not even know that we were redirected by clicking on an ad. We would expect that the site to which we were redirected would not be able to take over camera and microphone control without our permission.
The above scenarios deal with the Web browser being pointed at a particular site, and the user at least having some control over allowing that site to make calls based on the site origin and the relationship. WebRTC must make sure that "origin-based" attacks can be avoided.
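To make "origin-based" a little more concrete: browsers scope decisions like this to a page's origin, meaning its scheme, host, and port taken together. The sketch below is my own illustration rather than anything taken from Eric's documents. The function names originOf and grantApplies are hypothetical, and the parsing is deliberately simplified (no IPv6 literals, http and https only).

```typescript
// Reduce an http(s) URL to its origin: scheme, host, and port.
// Simplified sketch: IPv6 literals and non-http schemes are not handled.
function originOf(url: string): string | null {
  const match = /^(https?):\/\/(?:[^\/?#@]*@)?([^\/:?#]+)(?::(\d+))?/i.exec(url);
  if (match === null) return null;
  const scheme = match[1].toLowerCase();
  const host = match[2].toLowerCase();
  // Fill in the default port so "https://a" and "https://a:443" compare equal.
  const port = match[3] ?? (scheme === "https" ? "443" : "80");
  return `${scheme}://${host}:${port}`;
}

// A camera/microphone grant made to one origin must not be honored
// for a page that comes from any other origin.
function grantApplies(grantedTo: string, requestingPage: string): boolean {
  const granted = originOf(grantedTo);
  const requesting = originOf(requestingPage);
  return granted !== null && granted === requesting;
}
```

With this model, a grant given to https://calling-service.example.com/ covers other pages on that same origin, but not the plain-HTTP version of the same host and not a site reached by clicking an ad.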
However, another kind of attack is also possible: one mounted by a network attacker, often called a man-in-the-middle attack. These kinds of attacks can be made when we use an unsecured network, such as a public hotspot or an unprotected home Wi-Fi network, and browse over HTTP rather than HTTPS (secure HTTP). While on the unsecured network, we point our browser to a site that is unaffiliated with a calling service we may have authorized to make calls on our behalf. The attack proceeds as follows (per Eric's document):
1. I connect to http://anything.example.org/. Note that this site is unaffiliated with the calling service.
2. The attacker modifies my HTTP connection to inject an IFRAME (or a redirect) to http://calling-service.example.com.
3. The attacker forges the response that appears to come from http://calling-service.example.com/, so my browser believes it is talking to the calling service I authorized, and injects JavaScript that initiates a call to himself.
Attacks can also be made while connected to secure HTTPS sites if that site fetches JavaScript from an unsecured, HTTP, site.
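The common thread in the attack above is that anything fetched over plain HTTP can be rewritten in transit. As a rough illustration, the sketch below flags which scripts on a page would be exposed in this way. It is a simplification I wrote for this post, not how any browser actually implements its mixed-content checks, and the function names are hypothetical.

```typescript
// Return the script URLs that would be fetched over plain HTTP.
// Relative and protocol-relative ("//host/x.js") URLs inherit the page's scheme.
function scriptsOpenToTampering(pageUrl: string, scriptUrls: string[]): string[] {
  const pageIsPlainHttp = /^http:\/\//i.test(pageUrl);
  return scriptUrls.filter((src) => {
    if (/^http:\/\//i.test(src)) return true;   // explicitly insecure
    if (/^https:\/\//i.test(src)) return false; // explicitly secure
    return pageIsPlainHttp;                     // inherits the page's scheme
  });
}

// A page is only as trustworthy as its least protected script:
// the page itself must be HTTPS and no script may arrive over HTTP.
function pageCanBeTrusted(pageUrl: string, scriptUrls: string[]): boolean {
  return /^https:\/\//i.test(pageUrl) &&
    scriptsOpenToTampering(pageUrl, scriptUrls).length === 0;
}
```

An HTTPS page that pulls in even one script from an http:// address fails this check, which is the situation described in the last sentence above.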
The Security Mechanisms Within WebRTC
This kind of scenario is pretty scary to think about. It reminds me of the rather sophisticated attack at the Iranian nuclear facilities where the microphones and video cameras were hacked by a third party who could see and hear what was going on within these facilities. How can WebRTC's security mechanisms prevent unauthorized parties from taking over our devices?
Security is based around trust, and in WebRTC, any security or trust property that the user needs enforced would need to be guaranteed by the browser. Realistically, however, in a working system, the browser must rely on other trusted sources. For example, if I log into a website that provides WebRTC rendezvous services (a directory), then I trust that website to assure that the other users I may wish to call are also authenticated. The website itself becomes the trusted identity provider. There are a number of other third-party identity providers such as BrowserID, Federated Google Login, Facebook Connect, LinkedIn, OAuth, OpenID, and WebFinger. WebRTC can also use these trusted third parties to verify a user's identity.
Here's how it works (see figure below).
Figure 1. The WebRTC security architecture.
User A and User B are both connected to the same secure website via HTTPS. They have also authenticated their identity using their credentials with either an external identity provider or with the website itself.
User A decides to call user B. This will likely be done by clicking on some type of a "call" button next to B's name. When A clicks on the call button, the Web server sends a message to the JavaScript running in A's browser that creates two peer connections: one for audio and one for video (assuming this is an audio and video call--a peer connection is needed for both media types). At this point, no security has been invoked and any website can proceed with WebRTC to this point.
Next, the calling application needs to actually get the audio and video from the microphone and camera. User A is presented with a "door hanger", which is a pop-up window that asks if the website can access user A's camera and microphone.
Figure 2. An example of a WebRTC "door hanger" asking the user for permission to access the microphone and camera (source: a real call using Uberconference.com).
This door hanger has two key elements:
1. It identifies which website is asking for use of the camera and microphone.
2. It gives you the option to allow or deny camera and microphone access.
There are differences in how the browsers currently implement camera and microphone access permissions. Chrome implements persistent permissions only, meaning that once a site is given permission, it keeps that permission on later visits. Firefox, on the other hand, has implemented one-time permissions, meaning that the user must approve camera and microphone access every time, no matter how often the same site is visited to place a call.
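The difference between the two approaches is easy to see in a toy model. The sketch below is purely illustrative: the class MediaPermissions and the askUser callback (which stands in for the door hanger) are names I made up, and real browsers keep considerably more state than this.

```typescript
type PermissionMode = "persistent" | "one-time";

// Toy model of a browser's camera/microphone permission bookkeeping.
class MediaPermissions {
  private readonly mode: PermissionMode;
  private readonly remembered = new Set<string>();

  constructor(mode: PermissionMode) {
    this.mode = mode;
  }

  // askUser stands in for the door hanger; it returns true if the user allows access.
  request(origin: string, askUser: (origin: string) => boolean): boolean {
    if (this.mode === "persistent" && this.remembered.has(origin)) {
      return true; // granted on an earlier visit, so no new prompt
    }
    const allowed = askUser(origin);
    if (allowed && this.mode === "persistent") {
      this.remembered.add(origin); // remember the grant for this origin only
    }
    return allowed;
  }
}
```

In persistent mode the user is asked once per site; in one-time mode the user is asked on every call. Either way, a grant to one site says nothing about any other site.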
In addition, the browser window will always display an indicator showing the user that they are in a WebRTC call. If the indicator cannot be displayed, then the standard requires the call to be terminated.
Figure 3. Indicator showing that this site is using the camera or microphone in a WebRTC call (source: a real call using Uberconference.com).
Once user A gives permission to use the camera and microphone, the peer connection contacts the identity server to get a token that binds user A's identity to his "fingerprint" (a hash of the certificate A's browser will present when the media channels are secured, which uniquely identifies that browser's keys). Next, the peer connection looks up possible IP addresses through which the media can flow in order to traverse firewall or NAT devices (these IP addresses are actually ICE candidates--we won't go into the details of ICE, TURN, and STUN in this article, but they can and will be used with WebRTC to securely traverse network boundaries).
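For readers who want to see what a fingerprint looks like on the wire: in the session description (SDP) that the browsers exchange, it appears as an a=fingerprint attribute giving a hash algorithm and a colon-separated hex digest of the browser's certificate. The sketch below pulls those attributes out of an SDP string. The function and type names are my own, and this is a minimal parser for illustration, not a complete SDP implementation.

```typescript
interface Fingerprint {
  algorithm: string; // for example "sha-256"
  digest: string;    // colon-separated hex, normalised to upper case
}

// Collect every "a=fingerprint:<algorithm> <digest>" attribute from an SDP string.
function extractFingerprints(sdp: string): Fingerprint[] {
  const found: Fingerprint[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=fingerprint:(\S+)\s+([0-9A-Fa-f:]+)\s*$/.exec(line);
    if (match !== null) {
      found.push({
        algorithm: match[1].toLowerCase(),
        digest: match[2].toUpperCase(),
      });
    }
  }
  return found;
}
```

It is this value, not the media itself, that the identity server's token vouches for.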
At this point, A's browser sends a "communications offer" to the Web server, which in turn routes the offer to user B's Web browser. The JavaScript in B's browser processes the offer, and the very first thing it does is contact A's identity server to verify that the identity of A in the offer is the same as the identity of A provided by the identity server. As mentioned above, the identity server may be external to the website or it could be the website itself. Once the identity is verified, the "trusted element" icon is shown in the browser URL address pane.
After verifying the identity of user A, user B's browser pops up a message indicating that there is an incoming call from A. If B accepts this invitation, B's browser sets up the peer connection, asks for permission to use the camera and microphone, contacts B's identity server, and returns a message to A containing B's security information, the media information, and the IP addresses needed to traverse B's firewall/NAT.
A's browser receives this message, and contacts B's identity server to verify B's identity. Once B's identity is verified, the two browsers can set up the audio and video exchange on the two media channels that each browser previously established.
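The verification step each browser performs can be pictured as a simple comparison: what the offer (relayed by the website, and therefore not trusted on its own) claims, versus what the identity server vouches for. The sketch below is a simplified data model of my own; the interface and function names are hypothetical, and the real exchange with an identity provider involves more than a field-by-field comparison.

```typescript
interface Fingerprint {
  algorithm: string;
  digest: string;
}

// What the receiving browser read from the offer relayed by the website.
interface OfferClaims {
  identity: string;
  fingerprints: Fingerprint[];
}

// What the caller's identity server vouched for.
interface VerifiedAssertion {
  identity: string;
  fingerprints: Fingerprint[];
}

function sameFingerprint(a: Fingerprint, b: Fingerprint): boolean {
  return a.algorithm.toLowerCase() === b.algorithm.toLowerCase() &&
    a.digest.toUpperCase() === b.digest.toUpperCase();
}

// Accept the offer only if the identity matches and every fingerprint
// in the offer is one the identity server actually vouched for.
function offerMatchesAssertion(offer: OfferClaims, assertion: VerifiedAssertion): boolean {
  if (offer.identity !== assertion.identity) return false;
  if (offer.fingerprints.length === 0) return false;
  return offer.fingerprints.every((fp) =>
    assertion.fingerprints.some((vouched) => sameFingerprint(fp, vouched)));
}
```

If a malicious site swapped in its own fingerprint while relaying the offer, the comparison would fail, and the receiving browser would have no verified identity to show for the caller.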
The browsers perform a Datagram Transport Layer Security (DTLS) handshake on every media channel (two channels in this case because there is both voice and video). Once these DTLS handshakes are completed, the media is encrypted and begins flowing between the browsers using the Secure Real-time Transport Protocol (SRTP).
Security in WebRTC is still a work in progress. Although this identity model has not yet been implemented in the browsers supporting WebRTC, it is under active development. It is also important to note that the identity server portion of the WebRTC security model will be optional and application specific so that people can make anonymous calls when needed or appropriate, such as when connecting to an ecommerce or support site.
It is also important to note that at any time during the call, if the user points the browser to a different website, the call is terminated because the JavaScript is torn down.
DTLS Versus SDES
There is consensus that DTLS will be mandatory for WebRTC to support. There is active debate over whether to allow SDES at all (see Laurent Philonenko's post on NoJitter.com). SDES, which is used more in the SIP world, would help interface more easily to existing SIP-based infrastructure. Mozilla Firefox currently does not support SDES while Google Chrome does.
The implication here is that if DTLS-SRTP is used, then there will need to be a border element between the WebRTC world and the SIP world. Hence, the border controller providers likely have an excellent future in front of them as secure WebRTC calling and communications applications become widely deployed.
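One way to picture the border element's job: the two keying methods show up differently in the session description. DTLS-SRTP endpoints advertise an a=fingerprint attribute, while SDES endpoints carry the key material itself in an a=crypto attribute. The sketch below checks whether two sides share a keying method. It is a minimal illustration under my own assumptions, with hypothetical function names, and it says nothing about how any particular border controller is built.

```typescript
interface KeyingOffered {
  dtls: boolean; // an a=fingerprint attribute is present (DTLS-SRTP)
  sdes: boolean; // an a=crypto attribute is present (SDES)
}

// Inspect an SDP string for the attributes each keying method uses.
function keyingOffered(sdp: string): KeyingOffered {
  const lines = sdp.split(/\r?\n/);
  return {
    dtls: lines.some((line) => line.startsWith("a=fingerprint:")),
    sdes: lines.some((line) => line.startsWith("a=crypto:")),
  };
}

// If the two sides have no keying method in common, something in the
// middle has to terminate one form of SRTP keying and originate the other.
function needsBorderTranslation(webrtcSdp: string, sipSdp: string): boolean {
  const a = keyingOffered(webrtcSdp);
  const b = keyingOffered(sipSdp);
  const shared = (a.dtls && b.dtls) || (a.sdes && b.sdes);
  return !shared;
}
```

A DTLS-only browser talking to an SDES-only SIP endpoint is exactly the case where this returns true.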
Conclusion
WebRTC uses IETF communications protocols to assure that media and data flowing between browsers are secure. The level of security in a given call will depend on several factors as well as on the context of the communications application.
If HTTPS is not used or an HTTPS site pulls in JavaScript from an HTTP-only site, then there will be a lower level of security (the browser will also alert you that the page has both secure and non-secure data, so you can intelligently decide whether to continue). Furthermore, if there is no identity server involved, which will often be the case when a user goes to a simple calling service and logs on, then the level of security will be good but not as good as if there were an independent identity server.
Finally, anonymous calls can be made, but the security standard under development does suggest that the browser should allow one-time only camera and microphone access permissions.
WebRTC requires the communications application to ask the user if it can access the camera and microphone. There is some variation as to the persistence of the approval: persistent (Chrome) or one-time (Firefox). DTLS is required in WebRTC; there is active discussion about SDES being another security option.
Readers interested in trying WebRTC for themselves can go to www.uberconference.com and sign up for a free basic account (this is voice only), or alternatively, they may point their Chrome or Firefox 22 Beta browser to https://apprtc.appspot.com to try voice and video.