Generators in C

A “generator” is like a function except that it produces a  series of values, rather than a single return value.  Generators are a nice way to produce the values of a collection, one by one, and are often used with iterator constructs.  Generators have also found use as an event-driven scheduling construct and as the basis for light-weight thread packages. I used them to write a temporal expression package in Oroboro.

An example generator in Python is defined as shown below.  Each execution of the “yield” statement suspends execution of the function and produces a value.

def series_of_ints(n):
  i = 0
  while i < n:
    yield i
    i = i + 1
  return

In Python, it would be used like this.

g = series_of_ints(10)
for i in g.next():
  print "The next value is", i

The actual call to the generator-function “series_of_ints” is a factory function that produces a generator object.  The generator’s “next” method produces the next value in the sequence or halts if there are no more.

C does not have any such notion of generators.  Part of the reason is that generator functions need to save their state in the heap and not on the stack since they may be resumed.

I wanted to do an experiment and see what it would be like to implement something “generator-like” in C.   I do not particularly like what I came up with, but here is a very short explanation of how it works.  In my experiment, a generator is defined to be a function with a single argument of a special type.  All such generators have the same “stack profile” while active.  Since generators have the same stack profile, it is possible to JMP between the instructions of various generators using a technique similar to what’s been called a trampoline.  I take advantage of the special stack profile by writing a “generator helper macro” that sets up the stack and jumps to the next instruction of the generator using SETJMP.

Because all generators in my system must have the same stack profile, no generator function may use any local variables.  (Local variables are allocated on the stack.) Generators have to arrange to store their variables in the heap.  I defined a macro “V()” to help reference variables in the heap in a small effort to make the code prettier.

Here is an example of a counter generator that yields with a series of ints.  Between calls to the generator is instruction pointer and local variable value (“i”) are saved on the heap.

int counter1(slam_gen_p me)
{
  SLAM_GEN_STARTUP;

  for (V(i) = 0; V(i) < 10; V(i)++) {
    fprintf(stderr, "+++ gen:%s i:%d\n", me->name, V(i));
    SLAM_GEN_YIELD(V(i));
  }
 return 0;
}

The generator yields with a value, or terminates execution by returning 0.  Yielding and returning are two different things.

Use of the generator looks like this:

slam_gen_p gen1;

int i1;
gen1 = slam_gen_init(counter1);
do {
  i1 = slam_gen_next(gen1)
} while (i1 != 0);

The call to the “next” function resumes the generator.  The value i1 “returned” is 1 if the generator yielded and 0 if it returned.  The last value that the generator yielded with is available in its heap record through a macro.

You can view an example program here: slam_gen.c

This little experiment explored the following ideas:

  • restricting the stack-profile of all generators to be the same (a single arg) so that a helper could set up that profile
  • using SETJMP to jump from helper, into the generator and back out though YIELD
  • using return values to signify whether YIELD or EXIT was intended, not what the yielded value was
  • moving “local” variables into the heap and accessing them through a macro

In the end, I think this is just to ugly to actually use without actual compiler or interpreter support, and it may not be very portable. C is simply not the right language to use: stick with an interpreter.

4 thoughts on “Generators in C”

  1. It looks like ECMAScript 7 is going to add generators and the yield statement as an official part of the language. I learned this last night at a WebRTC meetup, from a researcher already using the feature. It will be available in browsers as well as node.js running on a server.

    https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/function*

    This particular researcher had developed a WebRTC media processing hub called Kurento that was programmed using Javascript. Using generators and the yield statement, he was able to describe complex chains of video processing elements. The alternative would have been a multitude of nested Javascript functions.

  2. In the years since I first wrote this, the Go programming language (http://golang.org) has gained serious traction. It implements a concurrency mechanism close to co-routines (named “Go-routines”) and includes full garbage-collection. “Go” is the way to go!

Leave a Reply

Your email address will not be published. Required fields are marked *