WebSocket on Rails by Action Cable

Published: Aug 3, 2024

In the web application domain, we hear some protocol names. Absolutely, HTTP or HTTPS is the most famous protocol that all web developers know. Although there’s a mechanism of Keep-Alive, a single request/response sequence with a single client/server is all done by HTTP. The client initiates the HTTP request to the server. Once the client receives the HTTP response from the server, communication finishes. As far as HTTP is used, the server just waits and waits. Only when the request comes in, the server can send back some data to the client. This communication style is surprisingly capable of doing many things, so most web applications are satisfied with HTTP.

But! Let’s think of a chat application. Two or more people write messages to each other. How do we see newer messages from other folks? Reload every time? Definitely, no. We don’t want to click a reload button every time to see newer ones. For this type of communication, a protocol called WebSocket has been created. Since then, WebSocket is used for a chat, multi-player game, online presence status, and more. This blog post is about WebSocket protocol and how Ruby on Rails handles it.

What is WebSocket protocol

WebSocket is a protocol defined by RFC6455. As many documents or blog posts explain, WebSocket provides a two-way interactive communication session over a single TCP connection. The two-way interactive communication is often called a full-duplex bidirectional communication. Not like HTTP, both client and server can send data each other.

WebSocket Handshake

The communication by WebSocket initially starts from normal HTTP request/response. Then the communication is upgraded to WebSocket. Once the bidirectional communication establishes, a single TCP connection is used to send/receive data until a client or server closes the connection.

The initial HTTP request/response sequence is called a WebSocket handshake. MDN (Mozilla Developer Network) document explains how the handshake goes. Suppose the server listens to the WebSocket request at http://example.com/chat, the client sends the HTTP request something like below:

GET /chat HTTP/1.1
Host: example.com:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

The HTTP request header above has Connection: Upgrade and Upgrade: websocket. These are keys to get started. Also, Sec-WebSocket-Key header field is there. The value is basically 16 bytes base 64 encoded string. Precisely, the forgiving-base64-encoded or isomorphic encoded is used to create a string. The key is used to avoid Cross-site WebSocket hijacking.

When the server receives the upgrade request, it returns something like below as a response.

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

The status code is 101. Upon receiving the response above, the protocol is upgraded to WebSocket. Among the response header field, we see a Sec-WebSocket-Accept field. It is the answer to Sec-WebSocket-Key from the client. The client can verify the server when it receives the Sec-WebSocket-Accept.

The Sec-WebSocket-Accept value is created by the steps below:

  1. Concatenate Sec-WebSocket-Key field value in the request and GUID value, 258EAFA5-E914-47DA-95CA-C5AB0DC85B11.
    The GUID value is always the same, a magic number.
  2. Take SHA1 hash and base64 encode.

In Ruby, the encoding can be done by Digest::SHA1.base64digest. Let’s try above Sec-WebSocket-Key and see if the same value of Sec-WebSocket-Accept will be created.

$ irb
irb(main):001> require 'digest'
=> false
irb(main):002> key = 'dGhlIHNhbXBsZSBub25jZQ=='
=> "dGhlIHNhbXBsZSBub25jZQ=="
irb(main):003> magic = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'
=> "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
irb(main):004> accpt = Digest::SHA1.base64digest(key + magic)
=> "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="

img: WebSocket

WebSocket Heartbeat

After the WebSocket connection is established, a client or server can send a ping to a counterpart. The client or server who receives the ping must return a pong to the other side immediately or within a certain time frame. This is the heartbeat of the WebSocket.

The heartbeat by pinging is useful for handling the connection for the reasons below:

  • verify the client and server are connected
  • prevent firewalls and proxies from terminating inactive connections

That is all about WebSocket as a protocol.

Action Cable

Ruby on Rails supports WebSocket by Action Cable. The Action Cable was introduced on Rails 5. Just including (or not skipping) Action Cable makes WebSocket available to use. Let’s try Action Cable.

Create a Rails app

Rails is an all-inclusive type web framework. All features are supported by default. We don’t need to do anything special to enable Action Cable while creating a Rails app. Only when --minimal or --skip-action-cable option is specified, Action Cable is not included. Even when --api option is specified, Action Cable a.k.a WebSocket is supported.

For a simplicity, create a Rails app with default options.

$ rails new just-a-sample

Test WebSocket

Nothing special. Go to just-a-sample directory and start the Rails server.

$ cd just-a-sample
$ bin/rails s

That’s it. WebSocket is ready to accept the request. ActionCable.server is mounted on /cable by default, so using a curl command, try below HTTP request.

$ curl --http1.1 -i -N \
> -H 'Sec-Websocket-Version: 13' \
> -H 'Sec-Websocket-Key: QUo86XL2bHszCCpigvKqHg==' \
> -H "Connection: Upgrade" \
> -H "Upgrade: websocket" \
> -H "Origin: http://localhost:3000/cable" \
> http://localhost:3000/cable
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: 8dlVwoWynsF/RauFo6HjkWl7dLk=

�{"type":"welcome"}�${"type":"ping","message":1722608438}�${"type":"ping","message":1722608441}�${"type":"ping",
"message":1722608444}�${"type":"ping","message":1722608447}

As we see, the status code 101 is returned. The connection is upgraded to WebSocket, and Sec-WebSocket-Accept is returned. Right after the connection is established, a welcome message is sent back. Then, ping messages come in a fixed interval.

On the console that the Rails server is running, we see the message below:

Started GET "/cable" for ::1 at 2024-08-02 23:20:35 +0900
Started GET "/cable" [WebSocket] for ::1 at 2024-08-02 23:20:35 +0900
Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: websocket)

When, the curl command quits by hitting control-c, the message below shows up:

Finished "/cable" [WebSocket] for ::1 at 2024-08-02 23:20:48 +0900

Note

As we see, starting WebSocket on Rails is very easy, zero configuration. That’s a good news, but also a bad news. If the Rails app is created without –skip-action-cable option, the app accepts WebSocket requests. The app keeps sending ping frames to the clients who requested a WebSocket connection. What if many curl requests try to establish WebSocket connections? The ping frame is light weight, and the WebSocket connection can’t do anything except sending pings. The risk of data breach or session hijacking would be almost zero, however, DoS (DDoS) type attack might be possible.

It’s a good practice to specify –skip-action-cable option if the app doesn’t need WebSocket.

Comments and Discussions

GitHub Discussions: WebSocket on Rails by Action Cable #10

References

Latest Posts

Real-time App on Rails by Action Cable

The previous blog post, WebSocket on Rails by Action Cable, focused on WebSocket as a protocol. As in the previous post, by default, Rails app responds to WebSocket connection requests without any hassle. However, other than connecting and sending ping frames, it doesn’t do anything. This blog post focuses on an application side and explains how we can create a full-duplex, bidirectional app.

WebSocket on Rails by Action Cable

In the web application domain, we hear some protocol names. Absolutely, HTTP or HTTPS is the most famous protocol that all web developers know. Although there’s a mechanism of Keep-Alive, a single request/response sequence with a single client/server is all done by HTTP. The client initiates the HTTP request to the server. Once the client receives the HTTP response from the server, communication finishes. As far as HTTP is used, the server just waits and waits. Only when the request comes in, the server can send back some data to the client. This communication style is surprisingly capable of doing many things, so most web applications are satisfied with HTTP.

Conserving Network Resources -- Keep-Alive

“When you type a URL in your browser, what will happen?” If you are a web developer, you might have answered this sort of interview question once or twice. It would be a popular question to test the knowledge how the Internet works. As it is famous, you will find many answers here and there online.