A style guide for state machines

Michael Stone, August 28, 2020

I have a coffee-making robot.

A week ago the tablet I use for controlling the robot broke, creating a small life hiccup. However, because the original control software is open, I was able to port the software to run on my laptop.

Along the way, I recognized an old nemesis: naming things is hard. Here, the things to be named were the states in the state machine I needed to design for making coffee. (I first ran into this when trying to write a websocket connection and document syncing controller for focus.)

“How could I make this easier?”, I asked.

“Perhaps a style guide for state machines — or at least some rules of thumb — would help?”, I answered.


Ultimately, I wrote the program I needed over the course of about a week, and it was a great experience. (And very rewarding, at the end, to see the robot gradually come back to life and spring into motion.)

In the meantime though, here’s what I think worked well for me as I worked to design an actual coffee-making state machine(!)

Naming states

While I don’t think there’s a “right answer” about how to name states that exists in isolation from your problem — ultimately, naming them ‘0’, ‘1’, ‘2’, … has much to commend it — I’m assuming that if you’re still reading, you’re here looking for something a little bit more ergonomic and human-friendly.

With that in mind, here are some alternatives to consider:

The forward perspective:

“Need X”,

“Want X”.

The backward perspective:

“What have we (last) set in motion?”

“What are we working or waiting to complete?”

Ing-nominalization:

Ongoing action

Simultaneity

The regular expressions perspective:

The states are letters, not words. Maybe choose the “letters” mnemonically?

The build systems perspective:

State names can be derived algorithmically from what you are trying to build.

In any event: consider my espresso machine controller problem.

To get started, we need to make a BLE connection, and to do that we need to power on the BLE adapter. What shall we name the initial state?

If we go with any of the first five, they signal nothing specific about what work we need to do.

If we go with the last, do we need to distinguish two substates: the one before we have initiated the power-on request, and the one in which we are waiting for the request to complete?

What do we want to do about the controller state (needPowerOn), the local adapter state (any of a number of things), and the machine state (not yet known)?

Maybe we ought to go take a look at state charts here? Also, what about USL/001AXES? Both seem to take a rather implicit approach to guiding state naming.

Modeling agency

Let’s try a different tack.

Proposed rule: decide who drives the state machine forward. Then it will be more clear — and more clearly observable — how to divide and name the states that the agent being modeled is passing through as they act.

How did it work out?

Ultimately, I decided that there was a previously implicit Controller driving the states forward, and that this Controller was what I had to build.

In my example, what does the controller do?

For me, the controller drives the machine through a collaborative dance of:

What does this description reveal?

I think it shows that I’m most interested in using controller states to represent periods or intervals of time in which something is happening which is necessary to the process of making coffee with this robot — and that I’m not too interested in modeling other triggers, events, potential states, or behaviors. Hence, for my purposes, it seems that ing-nominalization gives a good result — having written out this outline, I named my controller’s states after the ing-nominalizations of the activities that were happening in each identified period or interval.
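A minimal sketch of this naming style in Swift follows. Only heating and rinsing appear in the excerpt later in this post; the other case names are assumptions made for illustration, not the actual controller's states.

```swift
// Hypothetical controller states, each named for the activity that is
// under way during that interval. Only `heating` and `rinsing` come from
// the post; the remaining names are illustrative assumptions.
enum ControllerState: String, CaseIterable {
    case connecting
    case heating
    case rinsing
    case brewing
}

// Each name completes the sentence "The controller is ...".
for state in ControllerState.allCases {
    print("The controller is \(state.rawValue).")
}
```

Naming each case for an ongoing activity means a log line or debugger view of the state already says what interval the controller believes it is in.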

What else did I learn?

Another thing I learned while working all this out is that it’s very helpful to ‘make atomic’ the process of moving from one state to the next.

Here’s a “before” example that worried me in this regard:

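case (.heating, .idle, .ready):
    // Declare that we are rinsing before the machine has confirmed anything.
    state = .rinsing
    // Ask the machine to start a hot water rinse.
    set(.hotWaterRinse)
    // Five seconds later, ask the machine to return to idle.
    DispatchQueue.main.asyncAfter(deadline: .now() + 5) {
        self.set(.idle)
    }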

After writing a ~working version that looked like this, I ultimately found that I don’t like this style of transition-implementation for this task because I want all of these changes to happen “more atomically”.

What do I mean by “more atomically”?

Originally, I thought that I was worried about actual concurrency problems (which, in a more serious application, could be of grave concern). For example: in an ideal world, I don’t want to declare that we “Are Rinsing” until the machine has accepted the write (and maybe even until the machine state has changed; see the Three Mile Island experience with valves). Similarly, if any of these steps takes a long time, I don’t want to schedule the return to idle in less than 5 seconds, and I also don’t want the machine to rinse forever if we lose connectivity, which means I probably need to program a clock on the machine as part of setting all this in motion. One could also worry about more garden-variety distributed systems problems: perhaps about races here with the Bluetooth code, or about dropped, duplicated, corrupted, or delivered-out-of-order writes.

Ultimately though, I discovered that these problems bothered me less than a more basic challenge: convincing myself of a precursor condition, probably necessary but not sufficient, that my controller would only direct the machine to do things when it had a clear and relatively recent view of both its own state and that of the machine.

Before I resolved all of these issues (again, incompletely for a production application, but to my satisfaction in this much more informal setting), here was a comment that I wrote to try to understand what was bothering me here:

// worry: can the machine report ready too soon? could
// our model of the machine's state report ready when
// the machine is actually in a different state because
// we haven't received a new reading yet? I think I've
// addressed this by making set(...) clear
// lastStateInfoReading but I still worry.

The last sentence hints at my solution so far, which is rather brute force (but I think, effective).

First, I ensured that all writes to the controller’s state happened by way of a helper function. This way, I could ensure that every write to the controller’s state would clear the controller’s record of the machine’s state. That way, by making the control loop handle missing machine states by waiting, I could ensure that the controller would only initiate (new) actions once it had received a fresh set of sensor readings from the machine.
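As a rough sketch of this first step, assuming hypothetical type and method names (only lastStateInfoReading and the state names .heating, .rinsing, .idle, and .hotWaterRinse come from the excerpts above; everything else is invented for illustration):

```swift
// Hypothetical types: the post does not show the real declarations.
enum ControllerState { case heating, rinsing }
enum MachineState { case idle, hotWaterRinse }

final class Controller {
    // Only the helper below may write the controller's state.
    private(set) var state: ControllerState = .heating
    // The most recent machine reading; nil means "no fresh reading yet".
    private(set) var lastStateInfoReading: MachineState?

    // The single write path: changing state discards the old reading.
    func transition(to newState: ControllerState) {
        state = newState
        lastStateInfoReading = nil
    }

    // Called whenever the machine reports its state.
    func receive(_ reading: MachineState) {
        lastStateInfoReading = reading
        step()
    }

    // The control loop waits whenever there is no fresh reading.
    func step() {
        guard let machine = lastStateInfoReading else { return }
        if state == .heating && machine == .idle {
            transition(to: .rinsing)
        }
    }
}

let controller = Controller()
controller.step()           // no reading yet, so nothing happens
controller.receive(.idle)   // a fresh reading lets the controller act
```

Because private(set) blocks writes from outside the class and transition(to:) is the only place inside it that assigns state, no transition can leave a stale reading behind. Swift does not enforce the second half, so that part remains a convention within the class.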

Second, I switched the controller’s decision procedure from a series of sequenced ifs to a single large switch statement that simultaneously scrutinized both the controller’s state and the machine’s state (now ~adequately guaranteed for my purposes to be freshly read).
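A sketch of what such a switch might look like, under stated assumptions: the Decision type, the decide function, and the .waiting and .notReady cases are hypothetical, and only the (.heating, .idle, .ready) pattern is taken from the excerpt above.

```swift
// Hypothetical types; only .heating, .rinsing, .idle, and .ready
// appear in the post.
enum ControllerState { case heating, rinsing, waiting }
enum MachineState { case idle, hotWaterRinse }
enum MachineSubstate { case ready, notReady }

// What the controller should do next, expressed as a value.
enum Decision: Equatable {
    case wait
    case move(to: ControllerState)
}

func decide(_ controller: ControllerState,
            _ machine: MachineState?,
            _ substate: MachineSubstate?) -> Decision {
    // A missing reading means the controller has no fresh view: wait.
    guard let machine = machine, let substate = substate else {
        return .wait
    }
    // One switch examines the controller's state and the machine's
    // state together, so each case names a complete situation.
    switch (controller, machine, substate) {
    case (.heating, .idle, .ready):
        return .move(to: .rinsing)
    case (.rinsing, .idle, _):
        return .move(to: .waiting)
    default:
        return .wait
    }
}

print(decide(.heating, .idle, .ready))
print(decide(.heating, nil, nil))
```

Returning a Decision instead of performing side effects inside the switch is a design choice made for this sketch; it keeps the decision procedure easy to test on its own.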

In the end, this approach made it very easy to articulate when a transition is driven by the controller reacting to news that the machine’s state has changed, versus when it is instead driven by the passage of time as measured by the controller (possibly contingent on the machine state still being, as of the last update, in an acceptable state for the forthcoming transition).
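One way to make that distinction visible in code is to give the two kinds of trigger their own event cases. The following is a sketch only: the Event type, the advance function, and the .idling case are hypothetical names, not taken from the actual controller.

```swift
// Hypothetical types for illustration.
enum ControllerState { case heating, rinsing, idling }
enum MachineState { case idle, hotWaterRinse }

// The two kinds of news that can drive a transition.
enum Event {
    case machineReported(MachineState)   // the machine's state changed
    case timerElapsed                    // the controller's own clock fired
}

func advance(_ state: ControllerState,
             _ lastReading: MachineState?,
             _ event: Event) -> ControllerState {
    switch (state, event) {
    // Driven by news from the machine.
    case (.heating, .machineReported(.idle)):
        return .rinsing
    // Driven by the passage of time, but only if the last reading still
    // shows the machine in the state this transition expects.
    case (.rinsing, .timerElapsed) where lastReading == .hotWaterRinse:
        return .idling
    default:
        return state
    }
}

print(advance(.heating, nil, .machineReported(.idle)))
print(advance(.rinsing, .hotWaterRinse, .timerElapsed))
```

The where clause carries the "contingent on the machine state" condition, so a timer that fires without a suitable reading leaves the controller where it is.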