David Bakin’s programming blog.


Race Conditions and Debugging Multithreaded Programs: What Are Race Conditions?

Here is a useful paper that characterizes race conditions formally: What Are Race Conditions? Some Issues and Formalizations (Robert H. B. Netzer, Barton P. Miller) [ACM Letters on Programming Languages and Systems v1n1, March 1992)].

Netzer (who wrote his PhD thesis on this subject) classifies data races into two categories: general races, which pertain to programs which are meant to be deterministic, and data races, which pertain to programs which are non-deterministic. Then he also presents an orthogonal classification of races: feasible races which “capture the intuitive notions desired for debugging” but which are hard to compute completely and accurately, and apparent races which “capture less accurate notions” which can be detected in practice, but which are sufficiently less accurate that they tend to swamp the user with false positives.

So: general races cause non-deterministic execution in programs intended to be deterministic, and data races cause non-atomic execution of critical sections in non-deterministic programs. Thus both kind of races can cause failures in programs.

In fact, in my most recent job, I dealt with both kinds of races in the application program we were building.

In fact, both general races and data races are important concepts. Your typical Windows application program is non-deterministic – execution depends on the precise timing and ordering of input events (keystroke, mouse movement, and system messages) – but also contains large sections that are intended to operated deterministically (e.g., if in Photoshop you load a certain image file, and execute a specific filter with specific parameters on it, and then you save the resulting image in another file, then the result should be the same each time you do it even if the timing of your mouse movements differs from run to run.)

(Note that the “critical sections” that are violated by data races need not be the operating system-provided primitive, like CRITICAL_SECTION objects in Windows. It just means any bit of code that implements—or is supposed to implement—mutual exclusion.)

(By the way, I found this article somewhat difficult to understand: the differences between the feasible data races and apparent races, which are the key to Netzer’s classification, were hard to grasp. Also it wasn’t really clear what he meant by feasible execution. Finally, the writing was repetitive in places.)