Ruby Sorcery Part 2: Ractor, Chapter 2

~9 min.read

In the previous chapter of this excurusion into Ractor1, the concept of the actor model was introduced and a toy TCP server was created. It was a naive implementation that created new Ractors for every TCP connection made to the server, and it looked like this:

require 'socket'

tcp_server = Ractor.new do
  server = TCPServer.new(1337)

  loop do
    Ractor.new(server.accept) do |client|
      loop do
        input = client.gets
        client.puts(input.upcase)
      end
    end
  end
end

While simple, this isn’t something you’d call ‘production ready’. There is at least one notable shortcoming: see if you can guess what it is, it will be addressed in a later chapter. Meanwhile, it’s a good time to turn this toy example into something more serious.

Building an HTTP server with Ractor

In the beginning, the HyperText Transfer Protocol was (and still is!) a purely text-based specification layered on top of TCP. It has grown a hell of a lot of complexity over the years, in order to power more complex interactions over the web, but it is still easier to build a server that understands HTTP (even just HTTP 1.1) than it is to build the browser that handles all of the rendering.

So, it sounds like a good time to turn this toy TCP server into a basic HTTP server, using Ractors, and by the end of this chapter it should be possible for the the server to understand a simple request:

GET /hello-world HTTP/1.1
Host: localhost:1337

There is, of course, a lot more to serving HTTP than this. For a start, there’s no HTTP response! And then there are the other HTTP methods. Much of that will be covered in the following chapters.

Parsing the request

The anatomy of an HTTP request is divided, essentially, into three parts:

GET /hello-world HTTP/1.1        | 1. Method, location/target, HTTP version
Host: localhost:1337             | 2. Headers (host, content-type, accept,
Content-Type: application/json       caching, user-agent, etc.)
                                 | Empty line to separate headers and body
{ data: "Hello, world!" }        | 3. Body
                                 | Final empty line to denote end of request

The first line states the request method (i.e. GET, POST, etc.) and the target of the request, which will often be a relative path on the server but can also be a full URL.

What follows is a list of headers, which are key/value pairs used to provide extra information about the request. For the sake of simplicity, most of these will be ignored, and there are many of them.

Finally, there is a place for the body of the request. This is optional, but it must have an empty line both before and afterwards if it is present. A typical request body may contain URLEncoded form data, which is how your typical HTML forms work, but there is not much of a restriction provided that the Content Type header describes the format of the payload, e.g if it’s JSON, XML, or perhaps even something like an image or a video.

In this chapter, the main focus is on the first section, and some of the second. And since the purpose of the post is to demonstrate Ractor, and because in the Actor model, everything is an actor… the thing that parses the request will be an Actor too.

Something like this should do the trick, and provide a foundation to build on.

require 'strscan'

HttpRequest = Struct.new(
  :method, :location, :version, :host, :content_type,
  keyword_init: true
)

HttpRequestParser = Ractor.new do
  while raw_req = StringScanner.new(receive)
    http_req = HttpRequest.new

    case raw_req.scan_until(/\n/).strip.split(" ")
    in "GET" => method, location, "HTTP/1.1" => http_version
      http_req.method = method
      http_req.location = location
      http_req.version = http_version
    end

    raw_req.scan_until(/(\r\n){2}/).split("\n").map(&:strip).each do |header|
      case header.split(": ")
      in "Content-Type", content_type
        http_req.content_type = content_type
      in "Host", host
        http_req.host = host
      else
        next
      end
    end

    # `move` the object as this Ractor no longer needs ownership
    # the Ractor that calls `take` will... take... ownership
    Ractor.yield(http_req, move: true)
  end
ensure
  raw_req.terminate
end

What we have here is a Ractor that waits for incoming HTTP request messages, and then parses them into something that the server can more easily work with by pulling out important info like the request location, the HTTP method, the content type, and the body. In this example, Ruby’s pattern matching features are liberally employed to handle the parsing in some places; this is more for the sake of demonstration to show that it can be done, not necessarily that it always should be.

In any case, once the HttpRequest object is constructed, it is yielded so that another Ractor can use the object, and therefore it will sit in a queue (or a mailbox, in actor model parlance) until it is taken from it. As a final housekeeping step, the string scanner instance used to parse the request is terminated. It’s always a good idea to clean up after yourself if the language provides you the mechanism to do so.

Going back to the functionality at hand; this basically shunts the parsing of HTTP requests into another thread, which means that the Ractors responsible for managing the TCP layer can stay responsible for that, and hand over the application-layer responsibilities to other actors/processes/Ractors.

The TCP server now requires an upgrade: it’s going to read input but it can no longer work on a line-by-line basis, because a HTTP message takes up many lines. The only thing we can really depend on is that it always ends with two carriage returns (CR-LF characters, or ‘\r\n’ in a string).

Ractor.new do
  tcp_server = TCPServer.new(1337)

  loop do
    Ractor.new(tcp_server.accept) do |client|
      HttpRequestParser.send(client.gets("\r\n\r\n"))
      request = HttpRequestParser.take
      client.puts("requested: #{request.location}")
      client.close
    end
  end
end

The most significant change, here, is that the innnermost Ractor sends input over to the new HttpRequestParser Ractor. It then immediately waits for a response. That seems a bit weird - why not just do it inline? - but that’s only because the job of the HTTP Parser is pretty basic right now, whereas in future a whole bunch of things can happen in between the TCP layer reading in some data, and the TCP layer sending back a bunch of HTML or JSON or some such.

In our toy examples, this works fine, but try this with many clients at once and you will experience chaos. This is because we’re using a single global ractor to parse input from any number of connections. Perhaps it shouldn’t be a ractor at all, or it should work a little differently. This will be addressed in another chapter, as it becomes clear that building a concurrent HTTP server isn’t as simple as it looks even when your concurrency primitives are threadsafe.

Note that this won’t work with curl yet, because the server isn’t returning an appropriate response.

With that said, it’s a good time to combine these two things, to make a functioning server:

require 'socket'
require 'strscan'

HttpRequest = Struct.new(
  :method, :location, :version, :host, :content_type, :headers, :body,
  keyword_init: true
)

HttpRequestParser = Ractor.new do
  while raw_req = StringScanner.new(receive)
    http_req = HttpRequest.new

    case raw_req.scan_until(/$/).strip.split(" ")
    in "GET" => method, location, "HTTP/1.1" => http_version
      http_req.method = method
      http_req.location = location
      http_req.version = http_version
    end

    raw_req.scan_until(/(\r\n){2}/).split("\n").map(&:strip).each do |header|
      case header.split(": ")
      in "Content-Type", content_type
        http_req.content_type = content_type
      in "Host", host
        http_req.host = host
      else
        next
      end
    end

    # `move` the object as this Ractor no longer needs ownership
    # the Ractor that calls `take` will... take... ownership
    Ractor.yield(http_req, move: true)
  end
ensure
  raw_req.terminate
end

Ractor.new do
  tcp_server = TCPServer.new(1337)

  loop do
    Ractor.new(tcp_server.accept) do |client|
      HttpRequestParser.send(client.gets("\r\n\r\n"))
      request = HttpRequestParser.take
      client.puts("requested: #{request.location}")
      client.close
    end
  end
end

Let’s see it in action!

It’s gonna take a little bit more work to turn this into a workable HTTP server, but let’s recap:

  • The TCP server now knows about HTTP, even if it’s just a little bit
  • There is a parser for HTTP requests which can learn how to parse more of the protocol in future
  • It primarily uses Ractors for communication

The next chapter will focus on creating a valid response, something that curl will like. Keep in mind that the primarily goal is to get something that works, warts and all, and later on it will be revisited, having learned more.


References

  1. https://www.kamelasa.dev/posts/ruby-sorcery-ractor.html↩︎
  2. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Trailer↩︎
  3. https://www.kamelasa.dev/posts/ruby-sorcery.html↩︎
  4. https://ruby-doc.com/core-3.0.0/Enumerator.html↩︎

Changelog

  • f8b5436 Fix typo at end of post
  • 3fe4dfa Formatting
  • b652ba7 More clarification
  • 950104d Formatting
  • 2046f14 Add clarification
  • ae95100 Fix syntax error
  • c963ebc More on Ractor
  • 20f61a8 draft: more ractor