Real-Time Streaming Data Meetup

On November 21 I was pleased to participate in a meetup entitled Real-Time Streaming Data.  The organizer of this meetup assembles a wide variety of presenters and topics under the umbrella topic of “Large-Scale Production Engineering.” Chris (the organizer) does a remarkable job of keeping a pipeline of interesting talks coming.  I’m particularly interested in the January talk humorously entitled “Whatever happened to IPV6?”

Twisted: txWS and Autobahn and Resource together

The web is organized in terms of “resources”, and many web frameworks make it easy to define a website as a collection of resources managed by an HTTP Resource handler.  Twisted is no exception: it has an excellent Web Server that manages a forest of Resources.  In Twisted terms, its web server is a case of one particular type of Protocol handler: an HTTP server that manages resources.

Sometimes it’s convenient to organize a web-site as a collection of Protocol handlers that each manage a different portion of the Resource forest.  There are many reasons for doing this, but most come down to some resources being better served by one type of server than by another.

Apache installations are configured to do this all the time.  It is customary for static resources (files) to be served by a handler that is optimized for file serving.  Dynamic portions of the website can then be handled by Rails or Django.  In common use, one organizes their web site as a collection of handlers, each mounted at a different URL prefix.

"/static/*"             StaticHandler
"/mywebsite/*"          DynamicHandler

Streaming uploads are another type of traffic that is often better served by its own type of protocol handler.  In Twisted, the default Resource handler buffers an entire request before handling it.  If your application wants to handle upload packets as they arrive (i.e., a streaming upload), you need a custom handler.

"/static/*"             StaticHandler
"/mywebsite/*"          DynamicHandler
"/uploads/*"            UploadHandler

WebSockets

And then there are websockets.  The WebSocket protocol has been evolving and changing over the past few years.  It implements a full-duplex channel between the client and server.  Right now, Twisted’s Resources don’t co-mingle with the two popular WebSockets implementations, txWS and Autobahn.  Out of the box, Twisted makes it easy to run different handlers (an HTTP Site, for example) on different ports, but it doesn’t make it easy to set up two different Protocol handlers at different URI prefixes.

When I have needed to run a standard HTTP website along-side a Twisted WebSocket, I’ve organized my site something like the code below.

reactor.listenTCP(80, HTTPFactory())
reactor.listenTCP(8080, WebSocketFactory())

This isn’t exactly what I wanted to do.  What I wanted was something that dispatched HTTP requests to entirely different protocol handlers based on the URI prefix.

# Wished-for API: ProxyFactory did not exist when this was written.
reactor.listenTCP(80,
 ProxyFactory({
   "/mywebsocket/*": WebSocketFactory(),
   "/everythingelse/*": HTTPFactory(),
 }))

I wanted to have a little proxy factory that could be configured to route a connection to the proper handler.  This way, I could write specialized Protocol handlers for special applications (streaming media in my case), and still use the proven parts of Twisted’s Resources for everything else.  It would also let me drop either one of the websockets toolkits (txWS or Autobahn) into a Twisted application.

A Proxy Toolkit

I wrote a little toolkit for proxying an incoming TCP connection and dispatching it to one of a collection of Twisted services.  The first few packets of the connection are collected for the dispatcher to make its decision.  These packets are replayed into the chosen service and then the original connection is spliced into the new service.  The service can implement a full-duplex connection like a WebSocket if needed.  A service may be out-of-process as well.  The toolkit can proxy traffic besides HTTP since it operates at Layer 4 of the OSI model.
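
The "collect a few packets, then decide" step can be sketched without any networking at all.  The code below is not StreamProx’s API; choose_service, NEED_MORE and ROUTES are names invented for this illustration, and it assumes the traffic of interest starts with an HTTP request line.

```python
NEED_MORE = object()  # sentinel: the caller should keep buffering


def choose_service(buffered, routes, default):
    """Pick a service from the bytes collected so far on a new connection."""
    line, sep, _rest = buffered.partition(b"\r\n")
    if not sep:
        return NEED_MORE  # the first line has not fully arrived yet
    parts = line.split(b" ")
    if len(parts) != 3:
        return default  # does not look like "METHOD PATH VERSION"
    path = parts[1]
    for prefix, service in routes:
        if path.startswith(prefix):
            return service
    return default


# Ordered from most to least specific; the service names are placeholders.
ROUTES = [(b"/mywebsocket/", "websocket"), (b"/", "http")]
```

Once a service has been chosen, the same buffered bytes would be replayed into it before the two connections are spliced together, so the service sees the request from its first byte.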

You can read about the toolkit here at http://github.com/sheffler/StreamProx.  It comes with a couple of examples that show how to use it.

Update: 2012-11-05

Autobahn recently added a Resource handler. There is a Twisted repository branch adding txWS as a standard resource; it is not released yet.

Running Twisted Daemons with twistd

Twisted ships with a nice daemon runner called “twistd” that can do a lot of different things for your Twisted plugin or application.  It can set the UID/GID of your process, open up a log file and manage a PID file for you.   All of this is configured with command-line options to twistd.

While each of these options is well documented, the way in which they interact and the order in which these properties are applied to daemon creation are not.  Starting a daemon involves grabbing some ports, changing UID/GID, and opening and managing both a logfile and a PID file.  The management of the logfile and PID file is complicated by the fact that the UID of twistd changes between the time these files are created and when twistd wants to modify them.

This post explains the sequence of operations that twistd performs for starting a daemon defined in a TAC file on Unix.  We will consider only a minimal subset of the options that twistd handles so that we can focus on the interactions due to the daemon’s UID/GID changing.  Our assumptions are that:

  • twistd is started as user root,
  • our daemon will run at reduced privileges,
  • our application is defined in a TAC file and
  • we are discussing Unix only.

Along the way we’ll explain how the order in which twistd performs its operations creates a few “gotchas” that one might run into when setting up logging and a PID file.

An Overview of the twistd Daemon-starter

Although twistd actually has more options than are shown here, the ones that we are interested in for this article are summarized below.

twistd --uid=UID --gid=GID --umask=UMASK --chroot=CHROOT
     --rundir=RUNDIR --pidfile=PIDFILE --logfile=LOGFILE -y myapp.tac

Here is an outline illustrating the steps that twistd performs in order to create the daemon from the TAC file. We’ll describe each of the steps in turn.

  1. preApplication
    1. checkPID
  2. createApplication
    1. readTacFile
    2. startLogger
  3. postApplication
    1. setupEnvironment
      1. chroot
      2. chdir
      3. umask
      4. daemonize
      5. openPIDFile
    2. privilegedStartService
    3. switchUidGid
    4. startApplication
    5. startReactor
    6. removePIDFile

The “preApplication” phase is what is executed before the application is even created.

  • checkPID – this checks for the existence of a prior PID file and removes it if the old PID does not correspond to a currently running process.  This step is performed as user root.

The “createApplication” phase deals with the instantiation of the object that defines a Twisted application.  An application is a service object created with a call to the function twisted.application.service.Application.

  • readTacFile – Read the contents of the TAC file “myapp.tac” as Python source code.  Evaluate it in a completely empty namespace. Upon completion return the value of the variable “application” (if there is one) and discard everything else.  Note: this step is performed as user root.  The code in the TAC file is free to import Python modules at will and can interact with the file system, but it is not able to access global Python state and can only return a single value.
  • startLogger – If the Application object returned in the previous step has not defined a logger, then give this application a default rotating logfile.  This step is performed as user root.  The logfile will be created with name LOGFILE and owned by user “root.”
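
The readTacFile step can be pictured as "run the file in a fresh dictionary and keep one name."  The sketch below is not twistd’s own code; load_application and TAC_SOURCE are illustrative names, and a plain dictionary stands in for a real Application object so that the example needs nothing beyond the standard library.

```python
def load_application(source, filename="myapp.tac"):
    """Run TAC-style source in a fresh namespace and return its `application`."""
    namespace = {}
    exec(compile(source, filename, "exec"), namespace)
    # Everything else the file defined is dropped along with the namespace.
    return namespace.get("application")


TAC_SOURCE = '''
import os                                          # imports work as usual
helper = os.path.basename("/var/run/myapp.pid")    # discarded after loading
application = {"name": "myapp", "pidfile": helper} # stand-in for an Application
'''

app = load_application(TAC_SOURCE)
```

Because all of this happens before the UID/GID switch, anything the TAC file does at import time (opening files, for example) is done as root.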

The “postApplication” phase does most of the real work of running the Twisted application.

  • setupEnvironment – Call chroot(CHROOT), chdir(RUNDIR) and set the umask to UMASK.  Daemonize the process by doing the double-fork trick.  Lastly, create a PID file with name PIDFILE.  The PID file will be created by user “root.”
  • privilegedStartService – Grab the ports that are needed.
  • switchUidGid – Change the daemon’s user and group IDs to UID and GID.
  • startApplication – Call startService on our application’s service object.
  • startReactor – Start the Twisted event-reactor in motion.  It is while the reactor is running that the logfile will be rotated.  Log management will be done with the privileges of UID/GID.
  • removePIDFile –  After the reactor has finished, remove PIDFILE.  This will be done with permissions UID/GID.

Implications for Logfile Rotation

Using the configuration described here, the LOGFILE will be created as user “root” and group “root”, but rotated as user UID and group GID.  If you want rotation to work as advertised it is necessary to put the LOGFILE in a directory in which UID/GID has permissions to rename files.

Implications for PID file creation

The PIDFILE will be created as user “root” but when it comes time to remove it, the daemon process will have the permissions UID/GID.  If you want your daemon to be able to remove its PID file, then it should be placed in a directory in which UID/GID has permissions to remove files.
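
Both implications come down to the same question: can UID/GID remove or rename an entry in the directory that holds the file?  Here is a rough pre-flight check, assuming classic Unix mode bits only (ACLs and supplementary groups are ignored); the function name can_remove_or_rename is invented for this sketch.

```python
import os
import stat


def can_remove_or_rename(directory, uid, gid, file_owner_uid=0):
    """Best-effort check: may (uid, gid) remove or rename a file in directory?"""
    if uid == 0:
        return True  # root bypasses the mode bits
    st = os.stat(directory)
    if uid == st.st_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif gid == st.st_gid:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    if (st.st_mode & needed) != needed:
        return False
    # In a sticky directory only the file's owner or the directory's owner
    # may delete, and twistd creates these files as root (file_owner_uid=0).
    if (st.st_mode & stat.S_ISVTX) and uid not in (st.st_uid, file_owner_uid):
        return False
    return True
```

The sticky-bit case is worth noting: a world-writable directory like /tmp usually has it set, so a root-created PID file placed there generally cannot be removed by the unprivileged daemon.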

Conclusion

Twisted’s daemon-runner is a useful and well-tested program that has been in use for a long time.  Some of its side-effects are due to the order in which it performs its steps.  This note laid out some of these steps to explain how process permissions interact with logfile rotation and PID file removal.

Twisted: learning about Cred and Basic/Digest Authentication

One of Twisted’s best features is its credentials plug-in architecture.  Twisted abstracts the notion of username/password across many different protocols and password-protection schemes.  By abstracting the idea of “credentials” it is possible to implement one password checking facility and use it to protect both FTP and HTTP logins.  Twisted has a deep class hierarchy and a fairly significant learning curve, however, so it’s difficult to get your head around some of the concepts until you dive in.

This article will share a little bit of what I’ve learned recently using Twisted.  I’ll show how to implement simple password protection of a static web resource using Basic and Digest auth schemes, and I’ll show an example that doesn’t quite work!  Then I’ll explain what is wrong with that example and fix it. The code snippets shown are complete programs that may be a good starting point for experimenting further.

Password Dictionary

The first example implements a password dictionary checker for a simple web site protected with Basic Auth.  The PasswordDictChecker class implements the interface “ICredentialsChecker.”  Checking a credential is performed by the method requestAvatarId: if the password stored in the dictionary for that user matches the password sent, then the username is returned; otherwise, the lookup fails.

Our HTTP Realm (class HttpPasswordRealm) maps authenticated users to resources. In our case, there is only one resource and it is fixed: self.myresource. Any authenticated user receives the same fixed resource regardless of their username.

Our web resource is a simple static example. We render the last part of the path as part of the page.

Our main program stitches all of the pieces together. We mount our resource at http://localhost:8081/example/foo, and it is protected by the passwords in the dictionary. Run this program and try it out. Any of the three passwords will get you to the same resource.

Incorrectly extending our First example to Digest Authentication

In the next example, we substitute a DigestCredentialFactory using the md5 hash algorithm for the BasicCredentialFactory of the first example.  This seems to be a straightforward substitution: instead of using the Basic auth scheme, I want Twisted to use the Digest auth scheme with the same password dictionary.  Unfortunately, this program does not allow access to the resource. It was not entirely obvious to me why not. If you run this, you will see that no matter what passwords you type, you are not granted access.  Twisted doesn’t complain, but it does deny you access to the resource.

For a while I believed that there was a problem with Twisted.  There was not: there was a problem with my understanding of the cred system. Our class PasswordDictChecker implements the interface for checkers.ICredentialsChecker – a credentials checker.  What I didn’t get my head around at first was the sub-interface specification.  Our first PasswordDictChecker says that the “credentialInterfaces” that it implements are only for credentials.IUsernamePassword. What I observed was that the code for requestAvatarId wasn’t even called in this case.  It turned out that the interface specification resulted in the credentials being rejected even before any password comparison was performed.

Upon further research I understood that the Digest authentication method did not supply credentials of the type IUsernamePassword. Instead, it supplies credentials of the type IUsernameHashedPassword. Because this “type” is not in the list of credentialInterfaces, the authentication machinery cannot find a checker for our Digest authentication scheme, and all logins fail.
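
Some background on why Digest cannot supply a plaintext password: the browser never sends one.  It sends an MD5 response computed from the password, a server nonce and the request itself, so the server can only verify a candidate password by redoing the computation.  Below is a sketch of the RFC 2617 calculation for qop=auth, using only the standard library; the function names are my own, and this is not how Twisted structures its implementation.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the Digest `response` value a client would send (RFC 2617)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret part
    ha2 = md5_hex(f"{method}:{uri}")                 # request part
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


def check_password(candidate, sent_response, **fields):
    """Verify a stored password against the response the client sent."""
    return digest_response(password=candidate, **fields) == sent_response
```

A checkPassword-style method is the natural shape for this: the credentials object holds the nonce and the response, and the checker only has to hand it the stored password.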

Digest Authentication

Extending our example to Digest auth involves two changes:

  • adding credentials.IUsernameHashedPassword to the list of credentialInterfaces, and
  • replacing the equality comparison with a call to the credentials object’s checkPassword method.

The resulting password checker declares two credentials interfaces and uses the method call “checkPassword” to verify password validity.  This password checker is versatile enough to be used with both Basic and Digest authentication.  (It is also flexible enough to use with other protocols, such as FTP.) As a matter of fact, this password checker is essentially the same as one bundled with Twisted that is appropriately called twisted.cred.checkers.InMemoryUsernamePasswordDatabaseDontUse.  (Do not use it because in-memory passwords can be a security hole.)

Conclusion

This article showed a straightforward implementation of a password checker that authenticates against an in-memory dictionary. We used this checker with HTTP Basic auth, and then attempted to use the same checker with HTTP Digest auth. Using the abstraction facilities of the Twisted Cred system, we finally implemented a dictionary password checker that was compatible with either Basic or Digest auth.