Real-Time Streaming Data Meetup

On November 21 I was pleased to participate in a meetup entitled Real-Time Streaming Data.  The organizer of this meetup assembles a wide variety of presenters and topics under the umbrella topic of “Large-Scale Production Engineering.” Chris (the organizer) does a remarkable job of keeping a pipeline of interesting talks coming.  I’m particularly interested in the January talk humorously entitled “Whatever happened to IPV6?”

Continue reading Real-Time Streaming Data Meetup

Twisted: txWS and Autobahn and Resource together

The web is organized in terms of “resources”, and many web frameworks make it easy to define a website as a collection of resources managed by an HTTP Resource handler.  Twisted is no exception: it has an excellent Web Server that manages a forest of Resources.  In Twisted terms, its web server is a case of one particular type of Protocol handler: an HTTP server that manages resources.

Sometimes it’s convenient to organize a web-site as a collection of Protocol handlers that each manage a different portion of the Resource forest.  There are many reasons for doing this, but most come down to when the characteristics of some of the resources are better handled by one type of server over another.

Apache installations are configured to do this all the time.  It is customary for static resources (files) to be served by a handler that is optimized for file serving.  Dynamic portions of the website can then be handled by Rails or Django.  In common use, one organizes their web site as a collection of handlers, each mounted at a different URL prefix.

"/static/*"             StaticHandler
"/mywebsite/*"          DynamicHandler

Streaming uploads are another type of traffic that is often better served by its own type of protocol handler.  In Twisted, the default Resource handler buffers an entire request before handling it.  If your application wants to handle upload packets as they arrive (i.e., a streaming upload), you need a custom handler.

"/static/*"             StaticHandler
"/mywebsite/*"          DynamicHandler
"/uploads/*"            UploadHandler

WebSockets

And then there are websockets.  The WebSocket protocol has been evolving and changing over the past few years.  A WebSockets protocol implements a full-duplex channel between the client and server.  Right now, Twisted’s Resources don’t co-mingle with the two popular WebSockets implementations txWS and Autobahn.  Out of the box, Twisted makes it easy to set up a handler for an HTTP Site on different ports, but it doesn’t make it easy to set up two different Protocol handlers at different URI prefixes.

When I have needed to run a standard HTTP website along-side a Twisted WebSocket, I’ve organized my site something like the code below.

reactor.listenTCP(80, HTTPFactory())
reactor.listenTCP(8080, WebSocketFactory())

This isn’t exactly what I wanted to do.  What I wanted was something that dispatched HTTP requests to entirely different protocol handlers based on the URI prefix.

reactor.listenTCP(80,
 ProxyFactory(
   "/mywebsocket/*" = WebSocketFactory(),
   "/everythingelse/*" = HTTPFactory(),
 ))

I wanted to have a little proxy factory that could be configured to route a connection to the proper handler.  This way, I could write specialized Protocol handlers for special applications (streaming media in my case), and still use the proven parts of Twisted’s Resources for everything else.  This way I could drop in either one of the websockets toolkits (txWS or Autobahn) in a Twisted application.

A Proxy Toolkit

I wrote a little toolkit for proxying an incoming TCP connection and dispatching it to one of a collection of Twisted services.  The first few packets of the connection are collected for the dispatcher to make its decision.  These packets are replayed into the chosen service and then the original connection is spliced into the new service.  The service can implement a full-duplex connection like a WebSocket if needed.  A service may be out-of-process as well.  The toolkit can proxy traffic besides HTTP since it operates at Layer 4 of the OSI model.

You can read about the toolkit here at http://github.com/sheffler/StreamProx.  It comes with a couple of examples that show how to use it.

Update: 2012-11-05

Autobahn recently added a Resource handler. There is a Twisted repository branch adding txWS as a standard resource; it is not released yet.

Virtual Servers for Physical Devices

Sensr.net is a webcam site — but it is also much more.  Sensr.net provides software services for physical devices connected to the Internet like cameras. (see my DCS-920 article)

Sensr.net makes it easy for users to create virtual “camera servers” that run in the cloud.  A camera server is a collection of software that continuously monitors and processes camera images.  These services are hosted in the cloud and accessed over the web, providing “web services” for a user’s camera device.

The services that a Sensr.net camera server provides include the following:

  • connection points for uploading images through FTP or HTTP,
  • a Web server for browsing recorded images,
  • image processing (motion detection, motion regions, enhancement),
  • real-time live streaming and viewing,
  • event notification of camera status change through email and SMS,
  • long-term archival and storage,
  • portals for sharing through Facebook,
  • a programming API for developing your own applications.

One of the things that Sensr.net does really well is make it easy to Add and  Delete camera servers.  Behind these two operations are the set-up and tear-down of the virtual services allocated for a particular physical device.

Setting up a new camera is simple enough that using a temporary camera server to track some short-lived occasion is not that hard.  And since Sensr.net can use the camera built-in to most laptops, it’s pretty easy to add a new camera server for your laptop camera for a short-term suveillance operation like:

  • watching your hotel room while you’re out, or
  • looking for that pesky rodent in the backyard at night.

I allocated one to track the sunlight on a sick azalea bush for a day and discovered that it got no direct sun over the period of 24 hours.

Give Sensr.net a try.  You’ll find your own things to do with it once you have a disposable virtual camera server at hand.

WebSocket Upgrade Handshake. EventMachine HttpServer and Null characters.

I took advantage of some holiday downtime to experiment with WebSockets a bit.  As of December, Chrome implements HTML5 WebSockets natively.  I think we can expect more implementations shortly.

My experiments were inspired by this blog entry and my own ongoing experiments with EventMachine.  I learned two things from this exercise:

  1. The WebSocket Upgrade Handshake should be sent as a contiguous stream of bytes.
  2. Null characters in WebSockets streams may confuse existing HTTP Protocol handlers.

Upgrade Handshake

The websocket protocol (IETF Spec) is absolutely clear about the fact that the first three lines of the Upgrade Protocol Handshake must match “character-for-character” as laid out in the specification. I wanted to know if these first three lines could be sent as separate frames.  In experimenting with the Chrome implementation I learned that the three lines should be sent as one multi-line string.

Using an EventMachine Connection to listen on a port, I returned the handshake using three separate send_data calls.  Chrome failed to recognize this as a handshake by refusing to report an open connection.

send_data("HTTP/1.1 101 Web Socket Protocol Handshake\r\n")
send_data("Upgrade: WebSocket\r\n")
send_data("Connection: Upgrade\r\n")
send_data("WebSocket-Origin: #{origin}\r\n")
send_data("WebSocket-Location: #{location}\r\n")
send_data("\r\n")

I changed the handshake to the following.  This succeeded in creating an open connection between the browser and server.

upgrade = "HTTP/1.1 101 Web Socket Protocol Handshake\r\n"
upgrade << "Upgrade: WebSocket\r\n"
upgrade << "Connection: Upgrade\r\n"
upgrade << "WebSocket-Origin: #{origin}\r\n"
upgrade << "WebSocket-Location: #{location}\r\n"
upgrade << "\r\n"
send_data(upgrade)

Null Characters

In another experiment, I wanted to subclass EventMachine::HttpServer to modify it to be a WebSocket server. The idea was to detect WebSocket connections and upgrade the connection if a WebSocket is detected. This technique would use the HttpServer header parsing mechanism only and then would get the Request handler out of the way to allow the bytes that are sent in the socket stream to flow freely.

HttpServer has a mode (dont_accumulate_post) that tells it to call method (receive_post_data) on the bytes received in a Request body as they are received, rather than to accumulate the entire body. I reasoned that I could receive bytes in the WebSocket stream by using this option and overriding receive_post_data. This didn’t work. [Yes, I realize all of this is an abuse of HTTP.]

The reason that this didn’t work is that each message in a WebSocket stream starts with a Null (\000) byte. EventMachine::HttpServer is written in C and uses a string function (strpbrk) to do request body parsing. Most of what HttpServer sees is strings that start with Null.

Of course, the WebSockets protocol is not a subset of HTTP exactly, and I shouldn’t expect this to work, but now I know.

I did succeed in subclassing EventMachine::HttpServer to give me the byte streams. Here’s an outline of what I did. This puts a big conditional before the processing of every packet received, but it achives the effect I want. HttpServer parses the headers and I get the byte stream. My existing HTTP header-processing (cookies, sessions) works with my WebSocket stream.

class WsServer < EM::Connection
  include EM::HttpServer

  def post_init
    super
    no_environment_strings
    dont_accumulate_post        # dont read all of the post data
    @header_processing = true
  end

  def process_http_request
    ...
    @header_processing = false
  end

  def receive_data(*args)
    if @header_processing
      super
    else
      receive_post_data(args)
    end
  end

  def receive_post_data(x)
    puts "GOT POST DATA:#{x.inspect}:"
  end
end

Conclusion

The WebSocket protocol superficially looks like a subset of HTTP, and even the terminology “upgrade” suggests that it is only a refinement.  Strictly speaking however, it is a different (but related) protocol.  It will be a while until servers, caches and middleware get the details sorted out.