Recommended listening: Gétatchèw Mèkurya - Ethiopian Urban Modern Music Vol. 5 (1972)
Here's my source code review of the coroutine
C library by an author called cloudwu. Figuring out this code was a great exercise in understanding cooperative scheduling, coroutines/fibers, and handling stacks in user-space. We'll focus on these concepts during the review. There's some clever bits that I absolutely loved as well, and about which I'm excited to share.
coroutine
is built atop the ucontext
library. I've dedicated the immediately subsequent section to a brief foray into ucontext
and its most commonly used APIs. Thereafter, we'll take what we know about ucontext
and review the coroutine source code.
ucontext
ucontext
handles much of the heavy lifting needed to implement something like threads i.e. saving registers, context switching, etc.
The core API is very simple and focused around the ucontext_t
struct:
typedef struct ucontext_t {
struct ucontext_t *uc_link;
stack_t uc_stack;
mcontext_t uc_mcontext;
// signals to block in the context
sigset_t uc_sigmask;
// ...
} ucontext_t;
It'll be helpful to first see some examples to understand what purpose each of these fields serves. We'll look at a few example programs.
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>
static ucontext_t ctx1;
int main(void)
{
int x = 0;
getcontext(&ctx1);
int y = 0;
printf("x=%d, y=%d\n", ++x, ++y);
setcontext(&ctx1);
return EXIT_SUCCESS;
}
First, we have getcontext
. It accepts as input a ucontext_t
and will take whatever registers - including the program counter - we have in the current execution context and save them in this ctx1
context.
Next, we have setcontext
. This will cause the program to resume execution right after the getcontext
call. We'll therefore infinitely loop between the two calls.
Quite illustrative here is the x
and y
variables. The value of x
continues to increase while y
forever remains 1. This happens because setcontext
restores the registers and stack to that of the last getcontext
call, including the program counter (hence resuming execution back at the most recent getcontext
call). So when we jump back to the getcontext
line in the execution flow, y
- which is allocated on the stack - no longer exists. We're constantly re-declaring it and pushing it onto the stack.
Next, we have makecontext
and swapcontext
, which will be fundamental in cloudwu's coroutines implementation.
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>
static ucontext_t context;
void handler(int arg1, int arg2) {
printf("Function called with arguments: %d, %d\n", arg1, arg2);
}
int main(void) {
char stack[2048];
getcontext(&context);
context.uc_stack.ss_sp = stack;
context.uc_stack.ss_size = sizeof(stack);
makecontext(&context, (void (*)())handler, 2, 10, 20);
printf("pre-swap\n");
setcontext(&context);
printf("Back to main function - will never run\n");
return EXIT_SUCCESS;
}
In main
, we leverage makecontext
, which allows us to create a new context with its own stack. Notice we'll also have to allocate a stack for the context. Using this context allows us to reset the program counter to a different function (handler
in the demo program). If you've ever written assembly, this is very analogous to a jmp
instruction. The third argument to makecontext
is an integer indicating the number of arguments we want to pass to the context handler, and any subsequent makecontext
arguments are the handler's arguments - the quantity of which, obviously, should align with however many arguments we pass.
Know that the handler arguments must be integers, which isn't particularly useful in practice. The coroutine
library handles this in a unique way that we'll see later.
In this example, we call setcontext
on this new context to immediately invoke handler
with 2 integer arguments. The default behavior after handler
finishes executing is to end the process. Thus, the string "Back to the main function - will never run\n"
will never be written to stdout.
We can control this behavior by leveraging the uc_link
field in the ucontext
struct. Setting this field tells the program to setcontext
to the context stored in uc_link
as soon as it finishes executing.
In the following example we set uc_link
to main_context
. As soon as handler
finishes running, execution will resume at getcontext(&main_context)
, causing an infinite loop.
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>
static ucontext_t main_context;
static ucontext_t context;
void handler(int arg1, int arg2) {
printf("Function called with arguments: %d, %d\n", arg1, arg2);
}
int main() {
char stack[2048];
getcontext(&main_context);
getcontext(&context);
context.uc_stack.ss_sp = stack;
context.uc_stack.ss_size = sizeof(stack);
context.uc_link = &main_context;
makecontext(&context, (void (*)())handler, 2, 10, 20);
printf("pre-swap\n");
setcontext(&context);
printf("Back to main function\n");
return EXIT_SUCCESS;
}
This is really quite akin to what threads do, and I've heard that a lot of university courses teach ucontext
for toy thread scheduler implementations.
One can certainly see why - much of this is almost perfectly analogous to pthread_create
and pthread_join
superficially.
There's one more function we should look at: swapcontext
.
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>
static ucontext_t main_context;
static ucontext_t context;
void handler(int arg1, int arg2) {
printf("Function called with arguments: %d, %d\n", arg1, arg2);
swapcontext(&context, &main_context);
}
int main() {
char stack[2048];
getcontext(&context);
context.uc_stack.ss_sp = stack;
context.uc_stack.ss_size = sizeof(stack);
makecontext(&context, (void (*)())handler, 2, 10, 20);
printf("pre-swap\n");
swapcontext(&main_context, &context);
printf("Back to main function\n");
return EXIT_SUCCESS;
}
In this program, we made a slight modification in that we're now calling swapcontext
. swapcontext
effectively calls getcontext
on the first argument, then setcontext
on the second. That is, we create a checkpoint of sorts at swapcontext
, then immediately begin executing handler
. Notice we don't need getcontext
to initialize main_context
, as the swap call does this for us.
After the printf
statement in handler
, we call swapcontext
again and resume execution just after the initial swapcontext
call in main
. So the program output looks like this:
pre-swap
Function called with arguments: 10, 20
Back to main function
This should be sufficient knowledge of ucontext
such that we can appreciate coroutine
.
Source Code Review: coroutine
A while back I found an implementation of coroutines written in C by someone called cloudwu. I bookmarked the library because it looked like a very concise and well-written implementation of coroutines that would be easy to digest. Coroutines are a flavor of cooperative multi-tasking centered around suspending and resuming execution flow. You might be familiar with the concept because of Go's stdlib implementation, Goroutines, or Kotlin's kotlinx.coroutines
library.
You can find the source code for coroutine
here (this is the exact revision I'll be reviewing). All code samples henceforth are authored by cloudwu, with occasional albeit slight modifications I've made to improve readability for the sake of code review (predominantly spacing), or to add annotations. I'll prefix all of my annotations with Z:
.
We begin with the example program, main.c, found in the root level directory.
#include "coroutine.h"
#include <stdio.h>
struct args {
int n;
};
static void
foo(struct schedule * S, void *ud) {
struct args * arg = ud;
int start = arg->n;
int i;
for (i=0;i<5;i++) {
printf("coroutine %d : %d\n",coroutine_running(S) , start + i);
coroutine_yield(S);
}
}
static void
test(struct schedule *S) {
struct args arg1 = { 0 };
struct args arg2 = { 100 };
int co1 = coroutine_new(S, foo, &arg1);
int co2 = coroutine_new(S, foo, &arg2);
printf("main start\n");
while (coroutine_status(S,co1) && coroutine_status(S,co2)) {
coroutine_resume(S,co1);
coroutine_resume(S,co2);
}
printf("main end\n");
}
int
main() {
struct schedule * S = coroutine_open();
test(S);
coroutine_close(S);
return 0;
}
In main
, we create a struct called schedule
, then pass it into a test
function where we initialize two coroutines and continually call coroutine_resume
on them until they are no longer active. If you run this program, you'll see that the coroutines take turns executing the foo
function each with their own local state. The output is as follows:
main start
coroutine 0 : 0
coroutine 1 : 100
coroutine 0 : 1
coroutine 1 : 101
coroutine 0 : 2
coroutine 1 : 102
coroutine 0 : 3
coroutine 1 : 103
coroutine 0 : 4
coroutine 1 : 104
main end
I was really quite interested to see how this logic is implemented. I decided to read the source code in order of execution. We'll therefore begin in main
, with the schedule
struct and coroutine_open
.
struct schedule {
char stack[STACK_SIZE];
ucontext_t main;
int nco;
int cap;
int running;
struct coroutine **co;
};
In coroutine
, a schedule
is an opaque struct that represents the runtime environment in which all of the coroutines execute and context switch. Despite being logically well-written, coroutine
uses unnecessarily terse names that make it difficult to understand. I've therefore summarized each field below to elucidate its purpose:
stack
is the stack for the current coroutine execution environmentmain
is effectively a checkpoint context for the coroutine. This is where we'll resume execution when a coroutine yields control back to the scheduler.nco
tracks the number of coroutines that have been added to an array thereofco
is said arraycap
is the capacity of the coroutines array,co
, which will be pre-initialized and realloc'd whennco
reachescap
running
is the index of the currently executing coroutine qua theco
array
coroutine_open
is a very simple initializer function for a schedule
. The initial coroutine capacity is 16 and the co
array's elements are initialized to 0 via memset
. The author does this because we will later need to perform NULL
checks against any one of these slots when inserting new coroutines.
struct schedule *
coroutine_open(void) {
struct schedule *S = malloc(sizeof(*S));
S->nco = 0;
S->cap = DEFAULT_COROUTINE;
S->running = -1;
S->co = malloc(sizeof(struct coroutine *) * S->cap);
memset(S->co, 0, sizeof(struct coroutine *) * S->cap);
return S;
}
Now we have our schedule
in main
and we pass it down through test
. We'll call coroutine_new
twice, initializing two coroutines. We will first look at the coroutine
struct, then the function:
struct coroutine {
coroutine_func func;
void *ud;
ucontext_t ctx;
struct schedule * sch;
ptrdiff_t cap;
ptrdiff_t size;
int status;
char *stack;
};
func
is the coroutine handler. This is the "work" performed by the coroutine.ud
is the coroutine handler arg. Later on, we will see how the library manages to pass this in.ctx
is the context.sch
is a pointer to theschedule
that manages this coroutine. As far as I can tell, it's not used in the library.cap
is the coroutine's stack capacitysize
is the coroutine's stack sizestack
is where we persist the coroutine's stack prior to yieldingstatus
indicates the coroutine's state -COROUTINE_DEAD
,COROUTINE_READY
,COROUTINE_RUNNING
, orCOROUTINE_SUSPEND
ptrdiff_t was a type I had not encountered before. The signed integer type of the result of subtracting two pointers, it exists primarily for pointer arithmetic. Interesting!
Here's the function coroutine_new
, annotated:
int
coroutine_new(struct schedule *S, coroutine_func func, void *ud) {
// Z: Just mallocs and sets all fields to defaults
struct coroutine *co = _co_new(S, func , ud);
// Z: If the num coroutines in the schedule is at or above capacity...
if (S->nco >= S->cap) {
// Z: Reallocate to have double the capacity and memset the memory.
int id = S->cap;
S->co = realloc(S->co, S->cap * 2 * sizeof(struct coroutine *));
memset(S->co + S->cap , 0 , sizeof(struct coroutine *) * S->cap);
// Z: Assign the new coroutine at the next available index (the old capacity)
S->co[S->cap] = co;
// Z: Now update the capacity
S->cap *= 2;
++S->nco;
return id;
} else {
// Z: We have enough capacity. Find the next available index
int i;
for (i = 0; i < S->cap; i++) {
// Z: Calculating this way always gives us the next available index
// and always cycles to the beginning if we run out of cap (shouldnt
// happen because of the realloc cond above)
int id = (i + S->nco) % S->cap;
if (S->co[id] == NULL) {~~~~
S->co[id] = co;
++S->nco;
return id;
}
}
}
// Z: Trigger a failure (this will be compiled out in release builds)
assert(0);
return -1;
}
Not much of interest in _co_new
, just a struct initializer. The coroutine is initialized with a status of COROUTINE_READY
. coroutine_new
is super basic, but I've annotated it nonetheless. The author has a very terse style that I'm not particularly fond of and it makes everything harder to understand than it should be. Really, this is your standard fare initialization logic with a few interesting clever bits.
One such clever bit that I like is the assert
towards the end. When targeting release builds in most C compilers, this will get stripped out but it serves its purpose during development as a safeguard against invariant violations that should never happen.
Okay, so back in main.c we've initialized two coroutines. Now we enter a while loop that won't break until either coroutine_status
handler returns a falsy value.
int co1 = coroutine_new(S, foo, &arg1);
int co2 = coroutine_new(S, foo, &arg2);
printf("main start\n");
while (coroutine_status(S, co1) && coroutine_status(S, co2)) {
coroutine_resume(S, co1);
coroutine_resume(S, co2);
}
coroutine_status
is very simple, but again, I have annotated it:
int
coroutine_status(struct schedule * S, int id) {
// Z: Maintain invariant: should be a positive index and below the capacity
assert(id>=0 && id < S->cap);
// Z: if the slot contains NULL, the coroutine is no longer active
if (S->co[id] == NULL) {
return COROUTINE_DEAD;
}
return S->co[id]->status;
}
Assuming neither coroutine has a status of COROUTINE_DEAD
, this loop will keep going. Inside the loop body, the program calls coroutine_resume
on both coroutines. coroutine_resume
is where things get interesting:
void
coroutine_resume(struct schedule * S, int id) {
assert(S->running == -1);
assert(id >=0 && id < S->cap);
struct coroutine *C = S->co[id];
if (C == NULL)
return;
int status = C->status;
switch(status) {
case COROUTINE_READY:
getcontext(&C->ctx);
C->ctx.uc_stack.ss_sp = S->stack;
C->ctx.uc_stack.ss_size = STACK_SIZE;
C->ctx.uc_link = &S->main;
S->running = id;
C->status = COROUTINE_RUNNING;
uintptr_t ptr = (uintptr_t)S;
makecontext(&C->ctx, (void (*)(void)) mainfunc, 2, (uint32_t)ptr, (uint32_t)(ptr>>32));
swapcontext(&S->main, &C->ctx);
break;
case COROUTINE_SUSPEND:
memcpy(S->stack + STACK_SIZE - C->size, C->stack, C->size);
S->running = id;
C->status = COROUTINE_RUNNING;
swapcontext(&S->main, &C->ctx);
break;
default:
assert(0);
}
}
It's difficult to read in its as-is state, so I'll break it down into three parts: the setup, the COROUTINE_READY
code path, and the COROUTINE_SUSPEND
code path.
Every call to coroutine_resume
begins with this preliminary logic, which I've annotated.
// Z: We shouldn't call resume while the schedule is in a running state.
// Should be set back to -1 after every run, so it shouldn't be possible
// for user code to violate this invariant. Hence why it's an `assert`.
assert(S->running == -1);
// Z: Ditto - we do a size and capacity check.
assert(id >=0 && id < S->cap);
// Z: Last, if the coroutine at the given index is NULL, we exit.
struct coroutine *C = S->co[id];
if (C == NULL)
return;
At this point in our systematic breakdown of main.c, we have just invoked coroutine_resume
for the first time, so we'll hit the COROUTINE_READY
case of the switch statement.
// Z: If the coroutine is COROUTINE_READY, create the context and prepare to
// run the coroutine
case COROUTINE_READY:
// Z: Setup the coroutine's context...
getcontext(&C->ctx);
// Z: ...and its stack
C->ctx.uc_stack.ss_sp = S->stack;
C->ctx.uc_stack.ss_size = STACK_SIZE;
// Z: S->main is where we will return after switching to ctx.
// Literally the swapcontext line below.
C->ctx.uc_link = &S->main;
S->running = id;
C->status = COROUTINE_RUNNING;
uintptr_t ptr = (uintptr_t)S;
// Z: Now, this looks interesting...
// makecontext expects two ints and on some architectures this means 32 bit
// values so we pass in the low 32 bits, and the high 32 bits of the
// schedule struct, then reconcile it in the context handler.
makecontext(&C->ctx, (void (*)(void)) mainfunc, 2, (uint32_t)ptr, (uint32_t)(ptr>>32));
// Z:
Run the coroutine. We'll swap the stack and reset the program counter to C->ctx, which is `mainfunc`.
// Calls getcontext on main and initializes it such that it resumes on the
// line after this swapcontext call.
swapcontext(&S->main, &C->ctx);
break;
You'll notice the pointers being passed in (and used in the next code sample) as arguments to the context handler. The manpage for swapcontext
explains this a bit.
On architectures where int and pointer types are the same size (e.g., x86-32, where both types are 32 bits), you may be able to get away with passing pointers as arguments to makecontext() following argc. However, doing this is not guaranteed to be portable, is undefined according to the standards, and won't work on architectures where pointers are larger than ints.
We'll look at mainfunc
next, as this is where the flow of execution is about to proceed. mainfunc
is brief, and as usual, I've annotated it liberally:
static void
mainfunc(uint32_t low32, uint32_t hi32) {
// Z: Reconcile the schedule pointer from the high and low 32 bits
uintptr_t ptr = (uintptr_t)low32 | ((uintptr_t)hi32 << 32);
struct schedule *S = (struct schedule *)ptr;
// Z: Grab the id of the running coroutine from the schedule
int id = S->running;
// Z: ...and grab the coroutine
struct coroutine *C = S->co[id];
// Z: Run the coroutine's handler function, passing along the coroutine handler argument.
// We'll want to peek back at main.c to remember what happens next.
C->func(S,C->ud);
// Z: If the program calls coroutine_yield, we'll context switch out of this function immediately
// and never hit this line. Thus, if the coroutine does not yield, it is effectively done and gets
// destroyed. This does not happen at this point in main.c.
_co_delete(C);
S->co[id] = NULL;
--S->nco;
S->running = -1;
}
We've hit the C->func
line and we are now in foo
- the coroutine handler - in main.c. Let's pick up there:
static void foo(struct schedule *S, void *ud) {
struct args *arg = ud;
int start = arg->n;
int i;
for (i = 0; i < 5; i++) {
printf("coroutine %d : %d\n", coroutine_running(S), start + i);
coroutine_yield(S);
}
}
After some work, we call coroutine_yield
, which will save the stack, call swapcontext
to context switch back to coroutine_resume
, right before the COROUTINE_READY
case's break
statement. This ultimately lands us back in main.c, about to call the coroutine_resume
on the second coroutine.
This is the "cooperative" multi-tasking characteristic of coroutines. At some point during a coroutine's execution, we simply save the stack, and context switch to some other coroutine, picking up where we left off.
Here's coroutine_yield
, where we suspend the running coroutine:
void
coroutine_yield(struct schedule * S) {
// Z: Another quick check - self explanatory
int id = S->running;
assert(id >= 0);
struct coroutine *C = S->co[id];
// Z: This one is interesting - we assert that the pointer address is equal to the stack address
// to prove we persisted the stack properly
assert((char *)&C > S->stack);
// Z: Persist the stack state
_save_stack(C, S->stack + STACK_SIZE);
C->status = COROUTINE_SUSPEND;
S->running = -1;
swapcontext(&C->ctx , &S->main);
}
Again, this swapcontext
call will jump back to the swapcontext
checkpoint at the end of the COROUTINE_READY
case statement. Now the first coroutine has suspended and the second coroutine does everything we just looked at.
After the second coroutine has run through all of these calls, we won't hit mainfunc
again until the end of the coroutine's lifetime. Back in main.c, coroutine_resume
is called with co1
for a second time.
This time, however, the coroutine's state is COROUTINE_SUSPEND
, so we hit the other case statement:
case COROUTINE_SUSPEND:
memcpy(S->stack + STACK_SIZE - C->size, C->stack, C->size);
S->running = id;
C->status = COROUTINE_RUNNING;
swapcontext(&S->main, &C->ctx);
break;
Here, we restore the coroutine's stack and swapcontext
again. This is where the code gets very interesting: because we've saved the stack, swapcontext
now context switches to the end of the coroutine_yield
stack frame. We immediately exit coroutine_yield
into the foo
handler function. The stack state is exactly as it was before, so we make another iteration of the loop and call coroutine_yield
, save the stack again, and return to the end of the COROUTINE_SUSPEND
block.
By the way, we should take a peek at _save_stack
. I've annotated it as usual
static void _save_stack(struct coroutine *C, char *top) {
// Z: Grab a reference of where we're at in memory
// This will be stored on the stack, so we'll use its address to figure out the size of the stack.
char dummy = 0;
// Z: top of the stack minus addr HERE should be less than the stack size
assert(top - &dummy <= STACK_SIZE);
// Z: update capacity and realloc if not enough
if (C->capacity < top - &dummy) {
free(C->stack);
C->capacity = top - &dummy;
C->stack = malloc(C->capacity);
}
C->size = top - &dummy;
// Z: starting at the address of dummy (the last thing pushed onto the stack),
// copy memory all the way up to size
memcpy(C->stack, &dummy, C->size);
}
Maybe this sort of trickery is typical among C programmers, but I thought it was rather clever; using a stack-allocated value's address to verifiably determine the current stack size.
This is the crux of the context switching logic. Both coroutines will continue to suspend and yield, context switching into the other. Each time we make a context switch, we restore the stack of the running coroutine, do a little work, save the stack again, and continue the cycle ad infinitum (or until we stop calling coroutine_resume
and coroutine_yield
).
And I really found this quite clever: the first time we finish executing foo
without calling coroutine_yield
, we're suddenly in the same stack frame where mainfunc
was initially invoked. Execution picks up after the C->func
call and the coroutine is deleted:
static void
mainfunc(uint32_t low32, uint32_t hi32) {
uintptr_t ptr = (uintptr_t)low32 | ((uintptr_t)hi32 << 32);
struct schedule *S = (struct schedule *)ptr;
int id = S->running;
struct coroutine *C = S->co[id];
C->func(S,C->ud);
// Z: After the final coroutine_yield, the outermost handler func
// stack frame exits and we continue execution here
_co_delete(C);
S->co[id] = NULL;
--S->nco;
S->running = -1;
}
There's a few utility functions that for the sake of brevity I have not included in the review. I had the most trouble comprehending the stack persistence logic because I only have a cursory level of experience with pointer arithmetic, but once I started to get it I gained an even more acute appreciation of the code.
My favorite part of this library was the stack frame manipulation around coroutine_yield
and mainfunc
- it's a completely bonkers stroke of genius that sent chills down my spine the moment it all clicked for me. The fact that we just fall out of the stack frame and into the other coroutine...totally baller. I've seen this same kind of "fallthrough" trick before (in a manner), specifically in the reactivity library Evan You implemented for the Vue frontend framework. Evan does this thing where you call a function, trigger a proxy trap inside of that function, then push the function itself onto a stack so you can start invoking it any time the proxy traps are triggered subsequently. Yes, it's nuts. In fact, the @vue/reactivity
library is so cool, I'll have to do a separate source code review for it on this blog (stay tuned for that). Anyway, I hope you enjoyed coroutine
; drop me a note and let me know what you thought!