Unit Testing 'Legacy' Code You Can't Change

2022-07-08 (Updated: 2022-07-29)

Unit Testing , C++ , Legacy Systems

Introduction to a series on unit testing “legacy” code that you can’t change; see also:

Introduction ☜ you are here
part 1 - Threading The Needle

How do you test legacy code that you’re not allowed to change?

For the purposes of this note legacy code is code that is running in production.

This will be a series of notes that talk about ways to write unit tests in the face of code that was not designed to be unit tested and which you’re not allowed to change.

This note introduces the issue by describing why it might be the case you’re not allowed to change your legacy code.
(The other notes in this series - with the techniques that can be used - will be listed here as they’re available.)

(This note will be C++-centric but the idea is translatable to other languages, even if the exact techniques are not.)

When does this happen? It happens when

you’re modifying important code that is in production and for which there are insufficient (or nonexistent!) unit tests in the area you’re working, and you really need tests to make your change correctly,,
the code you want to unit test is hidden from your unit test in some way by the syntax/semantics of C++, and
you’re not allowed to change the program source code to make the stuff you need visible, visible, in order to write your needed tests.

Why might you not be allowed to change that code?

It may very well be old code - very old code - that has been running for quite awhile, that your company depends on, and where the people who wrote and understood the code are unavailable. And it was written before unit tests were a “thing”.
👉 Click to open use case!
- Consider the code base at a major player in the travel industry. The bulk of their revenue depended on users being able to run searches for hotel rooms all the time. Their fleet of >1500 machines was running a C++ code base that not only was begun 20+ years ago but the foundation layer (threads, memory allocation, exception handling, etc.) was still using the original code - that nobody understood. Over time 3 completely different HTTP clients were created (all from scratch) and incorporated into the code and all 3 were in use in various different parts. The original comprehensive logging system that had all the information you needed to debug something logged to SQL Server but because usage of the system (esp. transaction rate) had long since overwhelmed the ability of any cluster of SQL server databases to keep up it could not be used - i.e., it was turned off altogether. What was left was normal “developer” logs written to stdout and collected into Splunk. Which were simultaneously way too verbose and also way too incomplete (for example: you couldn’t track user-oriented “transactions” over multiple machines because the request/response IDs changed as they we sharded from machine to machine and that wasn’t logged).
  
  The development team was still in the building but they had decamped en masse to write the replacement system so a couple of contractors were brought in to maintain it until the new system could be deployed. It was a 6 month max contracting job because the new system was “almost ready” and would be deployed in 3 months. I was one of those contractors and I was there two years before the replacement was deployed. And the cardinal rule for those two years was: do not change anything at all that did not absolutely have to be changed except to fix a bug. Because downtime for that application cost the company millions.
But it can also be much newer code, where the people who understand it are still around but, and which is still being modified/updated to deal with new requirements, or to address bugs, where for whatever reason unit tests were not originally written, yet having them would help you (and your team, and your company) be more confident in your bug fix and your feature upgrades.
And it definitely includes code that is in production and for which correctness is so essential, trumping performance and even reliability, that changing it in any way - for example, to make it easier to write tests - is strongly discouraged.
👉 Click to open use case!
- Bitcoin Core is in this category: Because any code that ships to production could be responsible for writing blocks to the absolutely positively unchangable-after-being-written blockchain and you can’t afford to have code changes invalidate previously-written-blocks/transaction. Or, code changes that inadvertently make some transactions now valid that shouldn’t be. One of the ways to achieve safety in that condition is to mandate you change nothing that doesn’t absolutely need changing, and that as little as possible.
  
  This is especially the case in code that is related to the “consensus” - that code that controls the decisions made together by all the distributed nodes, and thus, what gets written into the blockchain permanently. Bitcoin nodes are run by many groups (and individuals) and it is very difficult - probably becoming more and more difficult over time - to ensure an upgrade - especially an emergency bug-fix upgrade - is uniformly deployed in any short period of time. (And it has happened in the past that this has been needed, when Bitcoin was much younger and there were fewer nodes. For example, the 2013 “output value overflow bug”, for which the “fix” involved a controversal rollback that was only possible because a small group of BTC miners controlled so much “hashrate” that they could be individually contacted and be convinced to act immediately).

This code may actually have other kinds of tests besides unit tests: e.g., functional, acceptance, regression.

Bitcoin Core is an example where there are a lot of great functional tests and unit tests too, yet some areas are missing unit tests that would ensure and pinpoint local correctness, as bugs are fixed or features added or changed, as well as help explain how different routines actually work and interact with each other.
- And it’s a fairly modern C++ code base though one that isn’t particularly friendly to providing unit tests after the fact, and is quite unfriendly to the idea of changing existing working deployed code in order to make unit tests easier to write.

(By the way, Michael Feathers’ book “Working Effectively With Legacy Code” is totally excellent overall and also includes great tips on how to retrofit unit tests to a legacy codebase.)

TODO See if some of the sections above - the “case studies” can be hidden with the HTML details tag - doesn’t format correctly now with Hugo.

TODO Mention Boccara’s book “The Legacy Code Programmer’s Toolbox” (if appropriate).

TODO Unit testing when you can’t change the code to make it easier to test (e.g., Bitcoin Core) - this ties into the “don’t write legacy apps” book - also mention why Bitcoin Core is in this category: because any code that ships to production could be responsible for writing blocks to the absolutely positively unchangable-after-being-written blockchain and you can’t afford to have code changes invalidate previously-written-blocks/transaction and one of the ways to achieve safety in that condition is to mandate you change nothing that doesn’t absolutely need changing, and that as little as possible.

TODO Each of these sections should have examples of how to do the test

TODO Each of these sections should also say how the code should have been written to avoid the problem (with the same example code, revised).

TODO Mention here (or in another post?) the difference between 100% coverage testing including branches and full code path testing: they’re not the same. (And it comes up in Bitcoin Core!) And you need to and can write code differently to more easily enable path testing.

Nevertheless, coverage is your friend when doing this kind of retrofit of tests to legacy code.
DRAFT Unless a class is declared final you can derive from it, even if it doesn’t have any virtual methods. You need to be careful to not add any fields: instance size must remain the same. Then casts back-and-forth will work and you can pass them around as the bare class. Just add behavior with this technique. (Doesn’t give access to private methods/fields, only protected ones.)
DRAFT ODR is not violated as long as duplicate declarations exactly match. It can be fragile to do this, but it can be useful for static classes & methods hidden inside a.cpp. (Not useful for stuff inside an anonymous namespace because anon namespaces are unique within each compilation unit.)
- Would actually be preferable to have a “test only” internal header file that can be #included in the source if you can make that change to the repo.
DRAFT Use of explicit instantiation of template classes/functions even if the definitions are hidden within a .cpp. As long as the declarations are in an .hpp and there can be explicit instantiations of useful types within the .cpp with the defns. I.e., templates hidden inside file - but maybe they have explicit instantiations you can use - or even derive from.
DRAFT assert raises SIGABRT on Linux (and an SEH on Windows). On Linux, the Boost Test framework allows (via execution_monitor) capture of signals (and other events) and reification of them as an exception that can be tested against. This doesn’t appear to work on Windows though (either MSVC or mingw). Googletest has death tests which work on both platforms and can be used for asserts. Generally, one shouldn’t need to unit test asserts, but it can be necessary in special cases.
DRAFT If allowed a change: You might be able to get away with a declaration of a friend class inside of a .hpp, in some cases. That’ll give access to private methods/fields.
DRAFT In case of emergency you can build a separate test binary where you have a source file that #includes a .cpp and therefore gives direct access to stuff with internal file scope - statics, and stuff in anonymous namespaces. Also gives a chance to define things - including macros that override stuff in ways that are normally totally illegitimate - before the .cpp is #included - thus changing it on-the-fly for the purposes of that single test. Probably should be considered fragile.
DRAFT Use of Linux LD_PRELOAD to override symbols in shared libraries (doesn’t work, of course, for statically linked libraries). For symbols in a statically linked library you can arrange that your override code appears in an object file - always linked. For symbols in another object file - some linkers have workarounds, e.g., gnu ld has --wrap=symbol to provide a redefine, during linking, symbols so that you can link your override version instead of the actual version while still letting the actual version be available (hence, “wrapping”) (see ld manual command line options). These techniques can work even in the face of existing build systems selecting which object files to put where, as long as you can control the linker command line for the test executable. (This is harder in MSVC and may not even be possible in the case the symbol you want to override is in an object file with other symbols.)
DRAFT gcc has a command line option -include file which does: “process file as if #include "file" appeared as the first line of the primary source file”. This might be able to be used to inject macros that redefine certain symbols sufficiently to allow them to be able to be overridden externally. Probably difficult or impossible because the included stuff happens first - too bad there isn’t an option to force the include after. And macros are pretty simple, can’t distinguish between multiple uses of static with different semantics, for example.
DRAFT might be able to preprocess source file with some better preprocessor than cpp, e.g., awk or … anything really. In that case you might be able to arrange to compile the preprocessed file only, not the original, by fussing with the build script. Might be able to look for stuff in context. Or, simpler - and possibly simply by using cat (!) might be able to append code to the file that can expose (via trivial wrappers) functions outside the file. Might be as simple as appending using declarations to make things visible (and renamed) - though this doesn’t seem to be defined to work for stuff in anon namespaces (how exactly would it work?).