Unit Testing 'Legacy' Code You Can't Change - 1 - Threading The Needle

2022-07-29 (Updated: 2022-08-26)

Unit Testing , C++ , Legacy Systems

Part 1 of a series on unit testing “legacy” code that you can’t change; see also:

Introduction
part 1 - Threading The Needle ☜ you are here

The most obvious way to test hidden free functions and methods (“hidden” means not visible due to being declared in a source file at file scope or in an anonymous namespace) is to test them through their callers.

Definitions

source file: A file containing C++ source, most often with an associated header file. Almost synonymous with “compilation unit”.
hidden {method, free function, class}: An entity defined at file scope or in an anonymous namespace.
mut: Method (free function, class) Under Test (a play on SUT, System Under Test).

This is applicable under certain circumstances:

You can call the calling function in such a way that it always calls the hidden MUT, and the behavior of the caller is affected in known ways by the return value(s) of the MUT such that you can observe it,
You can arrange that the calling function calls the MUT with specified arguments,
- sometimes these arguments are passed directly as arguments to the caller,
- sometimes they may be wrapped in a struct or class that you pass to the caller,
- sometimes you can arrange that the caller, in computing arguments to the MUT, computes values you want to use.
And it is easier if one of the arguments, passed to the caller to be passed to the MUT, is a method/function/class that you can mock or spy to affect the MUT’s behavior in some observable way, or use to observe its progress/results.

To be honest, I doubt these circumstances happen too often. But it did happen when I wrote unit tests for Bitcoin Core’s taproot/tapscript script interpreter code. (And that exciting (and tricky) case study is given later in this post.)

Here is a contrived example: Consider a file that contains a hidden function¹, and its visible caller²:

namespace {

int hidden_fn(int i) {
    if (i < 0) throw std::domain_error("must be non-negative");
    return 4*i*i - 3*i + 2;
}

} // anon ns

enum class TYPE { Option1, Option2 };

int conduit_fn(TYPE type, int i) {
    switch (type) {
        case TYPE::Option1: return i * 5;
        case TYPE::Option2: return (int)std::pow(hidden_fn(i), 2);
        default: throw std::invalid_argument("bad type");
    }
}

hidden_fn is what we want to test, but it is in an anonymous namespace. The only caller is conduit_fn. However, we can see that if the TYPE parameter type to conduit_fn is TYPE::Option2 then hidden_fn is called with the second argument of conduit_fn, the result is squared, and returned. So that’s the path we’ll take for the test³:

int test_hidden_fn() {
    struct { int value; int expected; } data[] = {{0, 4}, {2, 144}, {5, 7569}};
    for (size_t i = 0; i < size(data); ++i) {
        assert(data[i].expected == conduit_fn(TYPE::Option2, data[i].value));
    }
    try {
        throw conduit_fn(TYPE::Option2, -1);
    } catch (const domain_error&) {
        // expected
    }
    return 0;
}

This test method only tests hidden_fn - you’ll need another test for the rest of conduit_fn’s behavior.

Obviously this test is fragile with respect to changes in conduit_fn. If case TYPE::Option2 no longer calls hidden_fn, if the second argument isn’t directly passed to hidden_fn, or if the result of hidden_fn is no longer squared then this test will need to be changed.

But in that case someone is changing the “unchangable source code”. And while that change is happening that someone should take the opportunity to provide a test path for hidden_fn. As one example, it could be moved out of the anonymous namespace. Which is what should have been done in the first place, there most likely having been no good reason to make it hidden in the first place.

👉 Note that my strong suggestion that these functions should not have been hidden at all is in conflict to the usual recommendation to keep things private that are not explicitly part of an interface. Here, for example, the recommended “best practices” C++ Core Guidelines recommends everything that isn’t part of an interface should be buried in an anonymous namespace:

SF.22: Use an unnamed (anonymous) namespace for all internal/non-exported entities

Reason Nothing external can depend on an entity in a nested unnamed namespace. Consider putting every definition in an implementation source file in an unnamed namespace unless that is defining an “external/exported” entity.

And the same recommendation goes for declaring these methods static, which is simpler the older C++ way of hiding things within file scope.

I think this dogmatic approach is severely flawed, comprises testability (and thus correctness and reliability), and is unnecessary given other facilities in the language and in software engineering practice that would also protect these non-interface definitions from misuse. That is to say, it’s not a good recommendation, and should be (mostly) ignored.

The proper way to handle this (to be covered in more detail in a future post in this series) would be to:

Not hide any of these functions via an anonymous namespace or the static keyword.
Have, for this source file, a second header that declares only these “internal only” methods, and that header can be given a suggestive name, like foo_internal_decls_only.h to make its purpose crystal clear⁴, and
Use lint or other validation scripts to confirm that headers named .*_internal_decls_only are only #included by source files in directories named test/. (This step optional, needed only if you can’t trust your devs to follow the rules, even in the face of code reviews.)

Anyway. To continue.

You might wonder what this technique looks like in practice, instead of in a contrived example.

Here is where I used this technique in unit tests for Bitcoin Core:

The need was to unit test (among other things) the routines VerifyWitnessProgram and ExecuteWitnessScript inside of the unchangable file interpreter.cpp. I didn’t want to change interpreter.cpp as it is considered part of the Bitcoin “consensus” enforcement, and changes to it are considered very carefully. Even simply making hidden functions visible outside the file had the potential to delay acceptance of the PR, so I wanted to avoid that.

Here is ExecuteWitnessScript’s definition - 38 lins with a lot of code paths. You don’t have to read that code - just note that it is doing a lot of work in a bunch of code paths (with loops as well as conditionals), is obviously part of the Bitcoin “consensus” as it is executing a script that is part of the blockchain, that its behavior can’t be changed at all because it is already in production (that is, executing on Bitcoin nodes and therefore allowing transactions to be placed on the blockchain), and is declared static so it can’t be reached from outside its file.

static bool ExecuteWitnessScript(const Span<const valtype>& stack_span, const CScript& exec_script, unsigned int flags, SigVersion sigversion, const BaseSignatureChecker& checker, ScriptExecutionData& execdata, ScriptError* serror)
{
    std::vector<valtype> stack{stack_span.begin(), stack_span.end()};

    if (sigversion == SigVersion::TAPSCRIPT) {
        // OP_SUCCESSx processing overrides everything, including stack element size limits
        CScript::const_iterator pc = exec_script.begin();
        while (pc < exec_script.end()) {
            opcodetype opcode;
            if (!exec_script.GetOp(pc, opcode)) {
                // Note how this condition would not be reached if an unknown OP_SUCCESSx was found
                return set_error(serror, SCRIPT_ERR_BAD_OPCODE);
            }
            // New opcodes will be listed here. May use a different sigversion to modify existing opcodes.
            if (IsOpSuccess(opcode)) {
                if (flags & SCRIPT_VERIFY_DISCOURAGE_OP_SUCCESS) {
                    return set_error(serror, SCRIPT_ERR_DISCOURAGE_OP_SUCCESS);
                }
                return set_success(serror);
            }
        }

        // Tapscript enforces initial stack size limits (altstack is empty here)
        if (stack.size() > MAX_STACK_SIZE) return set_error(serror, SCRIPT_ERR_STACK_SIZE);
    }

    // Disallow stack item size > MAX_SCRIPT_ELEMENT_SIZE in witness stack
    for (const valtype& elem : stack) {
        if (elem.size() > MAX_SCRIPT_ELEMENT_SIZE) return set_error(serror, SCRIPT_ERR_PUSH_SIZE);
    }

    // Run the script interpreter.
    if (!EvalScript(stack, exec_script, flags, checker, sigversion, execdata, serror)) return false;

    // Scripts inside witness implicitly require cleanstack behaviour
    if (stack.size() != 1) return set_error(serror, SCRIPT_ERR_CLEANSTACK);
    if (!CastToBool(stack.back())) return set_error(serror, SCRIPT_ERR_EVAL_FALSE);
    return true;
}

Let’s see if we can test it using the method of this post: Get at it through its caller.

Here we see the only caller of ExecuteWitnessScript: It is VerifyWitnessProgram and here is its definition - again, please don’t read all of that. Just note it is doing significant work with lots of code paths - no loops this time!, is definitely part of the Bitcoin consensus since it is validating that a transaction’s script is valid before it either executed or added to the blockchian, and it needs to be thoroughly tested.

static bool VerifyWitnessProgram(const CScriptWitness& witness, int witversion, const std::vector<unsigned char>& program, unsigned int flags, const BaseSignatureChecker& checker, ScriptError* serror, bool is_p2sh)
{
    CScript exec_script; //!< Actually executed script (last stack item in P2WSH; implied P2PKH script in P2WPKH; leaf script in P2TR)
    Span stack{witness.stack};
    ScriptExecutionData execdata;

    if (witversion == 0) {
        ... 22 lines I don't need to test elided ...
    } else if (witversion == 1 && program.size() == WITNESS_V1_TAPROOT_SIZE && !is_p2sh) {
        // BIP341 Taproot: 32-byte non-P2SH witness v1 program (which encodes a P2C-tweaked pubkey)
        if (!(flags & SCRIPT_VERIFY_TAPROOT)) return set_success(serror);
        if (stack.size() == 0) return set_error(serror, SCRIPT_ERR_WITNESS_PROGRAM_WITNESS_EMPTY);
        if (stack.size() >= 2 && !stack.back().empty() && stack.back()[0] == ANNEX_TAG) {
            // Drop annex (this is non-standard; see IsWitnessStandard)
            const valtype& annex = SpanPopBack(stack);
            execdata.m_annex_hash = (CHashWriter(SER_GETHASH, 0) << annex).GetSHA256();
            execdata.m_annex_present = true;
        } else {
            execdata.m_annex_present = false;
        }
        execdata.m_annex_init = true;
        if (stack.size() == 1) {
            // Key path spending (stack size is 1 after removing optional annex)
            if (!checker.CheckSchnorrSignature(stack.front(), program, SigVersion::TAPROOT, execdata, serror)) {
                return false; // serror is set
            }
            return set_success(serror);
        } else {
            // Script path spending (stack size is >1 after removing optional annex)
            const valtype& control = SpanPopBack(stack);
            const valtype& script_bytes = SpanPopBack(stack);
            exec_script = CScript(script_bytes.begin(), script_bytes.end());
            if (control.size() < TAPROOT_CONTROL_BASE_SIZE || control.size() > TAPROOT_CONTROL_MAX_SIZE || ((control.size() - TAPROOT_CONTROL_BASE_SIZE) % TAPROOT_CONTROL_NODE_SIZE) != 0) {
                return set_error(serror, SCRIPT_ERR_TAPROOT_WRONG_CONTROL_SIZE);
            }
            execdata.m_tapleaf_hash = ComputeTapleafHash(control[0] & TAPROOT_LEAF_MASK, exec_script);
            if (!VerifyTaprootCommitment(control, program, execdata.m_tapleaf_hash)) {
                return set_error(serror, SCRIPT_ERR_WITNESS_PROGRAM_MISMATCH);
            }
            execdata.m_tapleaf_hash_init = true;
            if ((control[0] & TAPROOT_LEAF_MASK) == TAPROOT_LEAF_TAPSCRIPT) {
                // Tapscript (leaf version 0xc0)
                execdata.m_validation_weight_left = ::GetSerializeSize(witness.stack, PROTOCOL_VERSION) + VALIDATION_WEIGHT_OFFSET;
                execdata.m_validation_weight_left_init = true;
                return ExecuteWitnessScript(stack, exec_script, flags, SigVersion::TAPSCRIPT, checker, execdata, serror);
            }
            if (flags & SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_TAPROOT_VERSION) {
                return set_error(serror, SCRIPT_ERR_DISCOURAGE_UPGRADABLE_TAPROOT_VERSION);
            }
            return set_success(serror);
        }
    } else {
    ... more stuff I don't need to test elided ...
    }
}

And it is also declared static and isn’t available outside of its file.

Obviously an ideal approach would be to refactor all of this code by breaking it up into several smaller, visible, methods which themselves would be more easily tested (including making it clear that full branch coverage would be sufficient to test all functinality), all the while guaranteeing somehow that no change to behavior was made while moving stuff around. But in lieu of that we’ll (again) use the method of this post to test it: Get at it through its caller. So who calls it?

That would be VerifyScript, declared here - and finally we’ve got an entry point into this file.

bool VerifyScript(const CScript& scriptSig, const CScript& scriptPubKey, const CScriptWitness* witness, unsigned int flags, const BaseSignatureChecker& checker, ScriptError* serror)
{
    // Testing Taproot code paths in `VerifyWitnessProgram` and
    // `SigVersion::TAPSCRIPT` code paths in `ExecuteWitnessScript`.

    // Both `VerifyWitnessProgram` and `ExecuteWitnessScript` are `static`
    // inside of `interpreter.cpp` and thus inaccessible to a unit test.
    // The way to get to them is indirectly via `VerifyScript`.

    // Tests all success and failure paths mentioned in BIP-341 and
    // BIP-342.

    // This is a _unit test_ not a _functional test_ and the unit being
    // tested here does _not_ include actually verifying the signature.
    // That is tested elsewhere (e.g., by tests `verify_schnorr_signature`
    // and `check_schnorr_signature` in this file).  _This_ test _mocks_
    // the Schnorr signature verfication.  Thus the test data need not
    // actually have valid signatures (and is thus easier to prepare).

~425 lines of code follow, which includes a fluent API fixture for writing tests of these methods, and I still have a few tests to write. But the end result is already full branch coverage of VerifyWitnessScript and full branch coverage of ExecuteWitnessScript is coming shortly.

Note that a very nice thing about unit tests is that they are unit tests. You only have to test the code in this unit since other units have their own tests. Thus, it turns out that in this code the method used to validate the cryptographic signatures of the various scripts is actually - and fortunately! - passed in to VerifyScript as a C++ functor (in this case, virtual methods in a class passed as an argument) and from there plumbed through to ExecuteWitnessScript. Cryptographic signature verification is not part of this unit (and it is tested elsewhere) so doesn’t need to be tested here. Thus for these purposes I can mock signature verification, which makes it easier to write the scripts that will exercise the behavior of ExecuteWitnessScript, since I can just declare them properly signed (or not!) as needed without actually having to sign them.

And thus these critical parts of the Bitcoin Taproot/Tapscript implementation gained unit tests without changing consensus code that was already deployed and working in the Bitcoin network.

NOTES (HIDDEN):

wrap with outer via #include - this way can introduce stuff before and after the “real” source code
- e.g., provide visible wrappers for classes, free functions at file scope via static or anonymous namespace
- e.g., provide explicit instantiation (or additional specialization) of hidden templates

TODO See if some of the sections above - the “case studies” can be hidden with the HTML details tag - doesn’t format correctly now with Hugo.

TODO Mention Boccara’s book “The Legacy Code Programmer’s Toolbox” (if appropriate).

TODO Unit testing when you can’t change the code to make it easier to test (e.g., Bitcoin Core) - this ties into the “don’t write legacy apps” book - also mention why Bitcoin Core is in this category: because any code that ships to production could be responsible for writing blocks to the absolutely positively unchangable-after-being-written blockchain and you can’t afford to have code changes invalidate previously-written-blocks/transaction and one of the ways to achieve safety in that condition is to mandate you change nothing that doesn’t absolutely need changing, and that as little as possible.

TODO Each of these sections should have examples of how to do the test

TODO Each of these sections should also say how the code should have been written to avoid the problem (with the same example code, revised).

TODO Mention here (or in another post?) the difference between 100% coverage testing including branches and full code path testing: they’re not the same. (And it comes up in Bitcoin Core!) And you need to and can write code differently to more easily enable path testing.

Nevertheless, coverage is your friend when doing this kind of retrofit of tests to legacy code.
DRAFT Instantiate a template with a spy class. This can be very fragile, as all spys are, as you’re depending on knowing how the template uses the spy. (Use standard mocking styles to minimize this.)
DRAFT Unless a class is declared final you can derive from it, even if it doesn’t have any virtual methods. You need to be careful to not add any fields: instance size must remain the same. Then casts back-and-forth will work and you can pass them around as the bare class. Just add behavior with this technique. (Doesn’t give access to private methods/fields, only protected ones.)
DRAFT ODR is not violated as long as duplicate declarations exactly match. It can be fragile to do this, but it can be useful for static classes & methods hidden inside a.cpp. (Not useful for stuff inside an anonymous namespace because anon namespaces are unique within each compilation unit.)
- Would actually be preferable to have a “test only” internal header file that can be #included in the source if you can make that change to the repo.
DRAFT Use of explicit instantiation of template classes/functions even if the definitions are hidden within a .cpp. As long as the declarations are in an .hpp and there can be explicit instantiations of useful types within the .cpp with the defns. I.e., templates hidden inside file - but maybe they have explicit instantiations you can use - or even derive from.
DRAFT assert raises SIGABRT on Linux (and an SEH on Windows). On Linux, the Boost Test framework allows (via execution_monitor) capture of signals (and other events) and reification of them as an exception that can be tested against. This doesn’t appear to work on Windows though (either MSVC or mingw). Googletest has death tests which work on both platforms and can be used for asserts. Generally, one shouldn’t need to unit test asserts, but it can be necessary in special cases.
DRAFT If allowed a change: You might be able to get away with a declaration of a friend class inside of a .hpp, in some cases. That’ll give access to private methods/fields.
DRAFT In case of emergency you can build a separate test binary where you have a source file that #includes a .cpp and therefore gives direct access to stuff with internal file scope - statics, and stuff in anonymous namespaces. Also gives a chance to define things - including macros that override stuff in ways that are normally totally illegitimate - before the .cpp is #included - thus changing it on-the-fly for the purposes of that single test. Probably should be considered fragile.
DRAFT Use of Linux LD_PRELOAD to override symbols in shared libraries (doesn’t work, of course, for statically linked libraries). For symbols in a statically linked library you can arrange that your override code appears in an object file - always linked. For symbols in another object file - some linkers have workarounds, e.g., gnu ld has --wrap=symbol to provide a redefine, during linking, symbols so that you can link your override version instead of the actual version while still letting the actual version be available (hence, “wrapping”) (see ld manual command line options). These techniques can work even in the face of existing build systems selecting which object files to put where, as long as you can control the linker command line for the test executable. (This is harder in MSVC and may not even be possible in the case the symbol you want to override is in an object file with other symbols.)
DRAFT gcc has a command line option -include file which does: “process file as if #include "file" appeared as the first line of the primary source file”. This might be able to be used to inject macros that redefine certain symbols sufficiently to allow them to be able to be overridden externally. Probably difficult or impossible because the included stuff happens first - too bad there isn’t an option to force the include after. And macros are pretty simple, can’t distinguish between multiple uses of static with different semantics, for example.
DRAFT might be able to preprocess source file with some better preprocessor than cpp, e.g., awk or … anything really. In that case you might be able to arrange to compile the preprocessed file only, not the original, by fussing with the build script. Might be able to look for stuff in context. Or, simpler - and possibly simply by using cat (!) might be able to append code to the file that can expose (via trivial wrappers) functions outside the file. Might be as simple as appending using declarations to make things visible (and renamed) - though this doesn’t seem to be defined to work for stuff in anon namespaces (how exactly would it work?).

Necessary #includes , present in the file, omitted here for space. ↩︎
And here, a header file that contains the declarations of TYPE and conduit_fn is assumed. ↩︎
Written here in a stupid manner without any test framework at all. It’s just an example! ↩︎
And in many build systems it could be placed somewhere where its location would signal it is special. E.g., frequently interface headers are placed in a special include/ directory while internal-only headers are placed side-by-side with their source files. ↩︎