David Bakin’s programming blog.

Unit Testing 'Legacy' Code You Can't Change

Introduction to a series on unit testing “legacy” code that you can’t change; see also:

How do you test legacy code that you’re not allowed to change?

For the purposes of this note, legacy code is code that is running in production.

This will be a series of notes that talk about ways to write unit tests in the face of code that was not designed to be unit tested and which you’re not allowed to change.

  • This note introduces the issue by describing why it might be the case you’re not allowed to change your legacy code.
  • (The other notes in this series - with the techniques that can be used - will be listed here as they’re available.)

(This note will be C++-centric but the idea is translatable to other languages, even if the exact techniques are not.)

When does this happen? It happens when

  • you’re modifying important code that is in production and for which there are insufficient (or nonexistent!) unit tests in the area you’re working in, and you really need tests to make your change correctly,
  • the code you want to unit test is hidden from your unit test in some way by the syntax/semantics of C++, and
  • you’re not allowed to change the program source code to make the stuff you need visible to your tests.
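
To make the second condition concrete, here is a minimal sketch of what “hidden” can look like. Everything in it (the file name, `Invoice`, `percent_of`, the 8% rate) is made up for illustration and is not taken from any real code base.

```cpp
// pricing.cpp - a hypothetical production file you're not allowed to edit.

namespace {
// Unnamed namespace => internal linkage. A test living in another .cpp
// file cannot name this function, let alone call it.
int percent_of(int amount_cents, int percent) {
    return amount_cents * percent / 100;
}
}  // namespace

class Invoice {
public:
    explicit Invoice(int subtotal_cents) : subtotal_cents_(subtotal_cents) {}

    // The only entry point a unit test can legally reach.
    int total_cents() const { return subtotal_cents_ + tax_cents(); }

private:
    // Private member: you'd like to test this on its own, but
    // `invoice.tax_cents()` does not compile outside the class.
    int tax_cents() const { return percent_of(subtotal_cents_, 8); }

    int subtotal_cents_;
};
```

A test can construct an `Invoice` and check `total_cents()`, but it has no direct way to exercise `tax_cents()` or `percent_of()` by themselves, and the rules say you can’t edit this file to expose them.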

Why might you not be allowed to change that code?

  • It may very well be old code - very old code - that has been running for quite a while, that your company depends on, and where the people who wrote and understood the code are unavailable. And it was written before unit tests were a “thing”.

    • Consider the code base at a major player in the travel industry. The bulk of their revenue depended on users being able to run searches for hotel rooms all the time. Their fleet of >1500 machines was running a C++ code base that not only was begun 20+ years ago, but whose foundation layer (threads, memory allocation, exception handling, etc.) was still the original code - which nobody understood. Over time 3 completely different HTTP clients had been created (all from scratch) and incorporated into the code, and all 3 were in use in different parts of it. The original comprehensive logging system, which had all the information you needed to debug something, logged to SQL Server. But usage of the system (esp. transaction rate) had long since overwhelmed the ability of any cluster of SQL Server databases to keep up, so it could not be used - i.e., it was turned off altogether. What was left was normal “developer” logs written to stdout and collected into Splunk, which were simultaneously way too verbose and way too incomplete (for example: you couldn’t track user-oriented “transactions” over multiple machines because the request/response IDs changed as they were sharded from machine to machine and that wasn’t logged).

      The development team was still in the building but they had decamped en masse to write the replacement system, so a couple of contractors were brought in to maintain the old one until the new system could be deployed. It was a 6-month-max contracting job because the new system was “almost ready” and would be deployed in 3 months. I was one of those contractors, and I was there two years before the replacement was deployed. And the cardinal rule for those two years was: do not change anything at all unless it absolutely has to be changed to fix a bug. Because downtime for that application cost the company millions.

  • But it can also be much newer code, where the people who understand it are still around, and which is still being modified/updated to deal with new requirements or to address bugs, but where for whatever reason unit tests were not originally written. Having them would help you (and your team, and your company) be more confident in your bug fixes and your feature upgrades.

  • And it definitely includes code that is in production and for which correctness is so essential, trumping performance and even reliability, that changing it in any way - for example, to make it easier to write tests - is strongly discouraged.

    • Bitcoin Core is in this category, because any code that ships to production could be responsible for writing blocks to the absolutely positively unchangeable-after-being-written blockchain, and you can’t afford to have code changes invalidate previously written blocks or transactions, or inadvertently make valid some transactions that shouldn’t be. One of the ways to achieve safety under that condition is to mandate that you change nothing that doesn’t absolutely need changing, and change that as little as possible.

      This is especially the case in code that is related to the “consensus” - the code that controls the decisions made together by all the distributed nodes, and thus what gets written into the blockchain permanently. Bitcoin nodes are run by many groups (and individuals) and it is very difficult - probably becoming more and more difficult over time - to ensure that an upgrade, especially an emergency bug-fix upgrade, is uniformly deployed in any short period of time. (And it has happened in the past that this has been needed, when Bitcoin was much younger and there were fewer nodes. For example, the 2013 “output value overflow bug”, for which the “fix” involved a controversial rollback that was only possible because a small group of BTC miners controlled so much “hashrate” that they could be individually contacted and convinced to act immediately.)

This code may actually have other kinds of tests besides unit tests: e.g., functional, acceptance, regression.

  • Bitcoin Core is an example where there are a lot of great functional tests, and unit tests too, yet some areas are missing unit tests. Those would verify and pinpoint local correctness as bugs are fixed or features are added or changed, and would also help explain how different routines actually work and interact with each other.
    • And it’s a fairly modern C++ code base, though one that isn’t particularly friendly to adding unit tests after the fact, and is quite unfriendly to the idea of changing existing, working, deployed code in order to make unit tests easier to write.

(By the way, Michael Feathers’ book “Working Effectively With Legacy Code” is totally excellent overall and also includes great tips on how to retrofit unit tests to a legacy codebase.)