The WebSocket protocol is fairly simple, but understanding how it works is essential to understanding how to secure (and exploit) it. The protocol consists of two parts: a handshake and the data transfer.
The handshake is intended to be compatible with HTTP-based server-side software and intermediaries; therefore, the handshake is done via an HTTP Upgrade request.
The client initiates the handshake with the server:
-- CODE language-http --GET /chat HTTP/1.1
Sec-WebSocket-Protocol: chat, superchat
And the server responds:
-- CODE language-http --HTTP/1.1 101 Switching Protocols
In addition to the typical HTTP headers, this interaction includes some headers unique to the WebSocket handshake: Sec-WebSocket-Protocol, Sec-WebSocket-Version, Sec-WebSocket-Key, and Sec-WebSocket-Accept. The client sends the Sec-WebSocket-Protocol header to tell the server which sub-protocols it is willing to use. The server selects one or none of them and echoes its choice in the response. The Sec-WebSocket-Version header designates which version of the WebSocket protocol the client wishes to use. The Sec-WebSocket-Key and Sec-WebSocket-Accept headers are less self-explanatory. Sec-WebSocket-Key contains a base64-encoded nonce. The client sends it to the server, and the server responds in the Sec-WebSocket-Accept header with a value derived from the key: the base64-encoded SHA-1 digest of the key concatenated with a fixed GUID defined in RFC 6455. This lets the server prove to the client that it received the client’s handshake request, and it prevents the client from accepting a cached handshake response as if it had come from the server.
If the Sec-WebSocket-Accept value matches the expected value and the server responds with a 101 HTTP status code, then the WebSocket connection will be established and frames can be sent between the server and the client.
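The Sec-WebSocket-Accept derivation can be sketched in a few lines of Python. RFC 6455 defines the value as the base64-encoded SHA-1 digest of the client's key concatenated with a fixed GUID. The function names `compute_accept` and `accept_matches` are made up for this illustration, and the sample key is the one used in the RFC's own example.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    # Concatenate the key exactly as sent (still base64 text) with the GUID.
    material = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    digest = hashlib.sha1(material).digest()
    return base64.b64encode(digest).decode("ascii")


def accept_matches(sec_websocket_key: str, sec_websocket_accept: str) -> bool:
    """Client-side check: does the server's answer match the key we sent?"""
    return compute_accept(sec_websocket_key) == sec_websocket_accept.strip()


if __name__ == "__main__":
    # Sample key from the RFC 6455 handshake example.
    print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

Note that this exchange only shows that the server understood a WebSocket handshake. It does not authenticate either party, so it offers no protection against the cross-origin attacks discussed later.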
The security model for WebSockets, as defined in RFC 6455, Section 1.6, states that “the WebSocket Protocol uses the origin model used by web browsers to restrict which web pages can contact a WebSocket server when the WebSocket Protocol is used from a web page.” Ensuring that WebSocket connections can only be established from trusted origins defends against cross-origin attacks such as Cross-Site Request Forgery (CSRF). Although the standard defines this model, the WebSocket protocol itself does not enforce a check of the Origin header in the client’s handshake request. The task of ensuring that the handshake comes from a trusted origin is left to the developers implementing the WebSocket protocol in their application. A weak or missing check of the Origin header in the client’s handshake leaves the WebSocket open to abuse by an attacker. Before we can understand how to use a weak implementation of this check to attack WebSockets, we must first understand how CSRF attacks work.
CSRF attacks trick users into submitting authenticated requests to an application. They allow an attacker to inherit the identity and privileges of a victim and make requests on the victim’s behalf. One way such an attack can work is by exploiting the properties of cookies. Cookies are stored in the user’s browser and are scoped to a specific domain. When the browser makes a request to a domain that a cookie is scoped to, it attaches that cookie to the request. If an application identifies users through unique identifiers stored in cookies, a CSRF attack can abuse this mechanism to forge requests to the target site from the victim’s browser. The cookies associated with the target site are attached to the forged request, and when the server receives it, the server reads the session cookie and handles the request with the privileges of the user that the session cookie belongs to.
For the attack to be meaningful, the attacker’s malicious website needs to be able to open a valid WebSocket connection as the victim and send and read messages that are sent over the connection. The simple web page below achieves just that.
When the victim visits the attacker’s website that is hosting the content shown above, the site will initiate the handshake and the victim’s browser will append their session cookies to the request.
-- CODE language-http --GET /chat HTTP/1.1
To protect against Cross-Site WebSocket Hijacking (CSWSH) and CSRF attacks alike, a few things can be done. The most common defense against these kinds of attacks is to implement a CSRF token. A CSRF token is an additional piece of authentication data that is unique to each request made by a user. The token is added to each request, either in an HTTP header or in the body of the request, and validated by the server before it evaluates the rest of the request. Since the token is unique to each request made by the user, an attacker cannot insert the correct token into a forged payload, and the server rejects the request. Additionally, applications can check each request’s Origin header to ensure that it is coming from a trusted source. When the CSRF payload attempting to initiate the WebSocket handshake is sent to the server, the Origin header is set to the attacker’s malicious website. Proper validation of the Origin header will show that the request is coming from an untrusted source, and the server can discard it. Another way to defend against this type of attack is to use a session management system that does not send user session identifiers through cookies. Session identifiers can instead be sent in HTTP headers, which the browser does not attach automatically and which the attacker’s page cannot read, so the attacker cannot authenticate to the target application as the victim.
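Two of these defenses, Origin validation and a per-request token, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: `TRUSTED_ORIGINS`, `is_trusted_origin`, and `CsrfTokenStore` are hypothetical names introduced for illustration, the allowlist entries are placeholders, and the in-memory dictionary stands in for whatever session storage the application really uses.

```python
import hmac
import secrets
from typing import Dict, Optional

# Hypothetical allowlist: replace with the origins the application actually serves.
TRUSTED_ORIGINS = frozenset({"https://example.com", "https://app.example.com"})


def is_trusted_origin(origin_header: Optional[str], trusted=TRUSTED_ORIGINS) -> bool:
    """Accept the handshake only if Origin exactly matches an allowlisted origin."""
    if origin_header is None:
        return False  # no Origin header: reject rather than guess
    # Exact comparison: prefix or substring checks are easy to bypass.
    return origin_header.strip().lower() in trusted


class CsrfTokenStore:
    """Single-use tokens keyed by session ID (in-memory, for illustration only)."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}

    def issue(self, session_id: str) -> str:
        """Create a fresh random token for the session's next request."""
        token = secrets.token_urlsafe(32)
        self._tokens[session_id] = token
        return token

    def validate(self, session_id: str, presented: str) -> bool:
        """Check the presented token; the stored token is consumed either way."""
        expected = self._tokens.pop(session_id, None)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(expected.encode(), presented.encode())
```

The Origin check compares the whole value on purpose. A check that only tests whether the header starts with or contains the trusted host name would accept an origin such as `https://example.com.attacker.net`, which is the kind of weak validation an attacker looks for.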