David Bakin’s programming blog.


Unit Testing 'Legacy' Code You Can't Change - 1 - Threading The Needle

Part 1 of a series on unit testing “legacy” code that you can’t change; see also:

The most obvious way to test hidden free functions and methods (“hidden” means not visible due to being declared in a source file at file scope or in an anonymous namespace) is to test them through their callers.

Definitions

  • source file: A file containing C++ source, most often with an associated header file. Almost synonymous with “compilation unit”.
  • hidden {method, free function, class}: An entity defined at file scope or in an anonymous namespace.
  • mut: Method (free function, class) Under Test (a play on SUT, System Under Test).

This is applicable under certain circumstances:

  • You can call the calling function in such a way that it always calls the hidden MUT, and the behavior of the caller is affected in known ways by the return value(s) of the MUT such that you can observe it,
  • You can arrange that the calling function calls the MUT with specified arguments,
    • sometimes these arguments are passed directly as arguments to the caller,
    • sometimes they may be wrapped in a struct or class that you pass to the caller,
    • sometimes you can arrange that the caller, in computing arguments to the MUT, computes values you want to use.
  • And it is easier if one of the arguments, passed to the caller to be passed to the MUT, is a method/function/class that you can mock or spy to affect the MUT’s behavior in some observable way, or use to observe its progress/results.

To be honest, I doubt these circumstances happen too often. But it did happen when I wrote unit tests for Bitcoin Core’s taproot/tapscript script interpreter code. (And that exciting (and tricky) case study is given later in this post.)

Here is a contrived example: Consider a file that contains a hidden function1, and its visible caller2:

namespace {

int hidden_fn(int i) {
    if (i < 0) throw std::domain_error("must be non-negative");
    return 4*i*i - 3*i + 2;
}

} // anon ns

enum class TYPE { Option1, Option2 };

int conduit_fn(TYPE type, int i) {
    switch (type) {
        case TYPE::Option1: return i * 5;
        case TYPE::Option2: return (int)std::pow(hidden_fn(i), 2);
        default: throw std::invalid_argument("bad type");
    }
}

hidden_fn is what we want to test, but it is in an anonymous namespace. The only caller is conduit_fn. However, we can see that if the TYPE parameter type to conduit_fn is TYPE::Option2 then hidden_fn is called with the second argument of conduit_fn, the result is squared, and returned. So that’s the path we’ll take for the test3:

int test_hidden_fn() {
    struct { int value; int expected; } data[] = {{0, 4}, {2, 144}, {5, 7569}};
    for (size_t i = 0; i < size(data); ++i) {
        assert(data[i].expected == conduit_fn(TYPE::Option2, data[i].value));
    }
    try {
        throw conduit_fn(TYPE::Option2, -1);
    } catch (const domain_error&) {
        // expected
    }
    return 0;
}

This test method only tests hidden_fn - you’ll need another test for the rest of conduit_fn’s behavior.

Obviously this test is fragile with respect to changes in conduit_fn. If case TYPE::Option2 no longer calls hidden_fn, if the second argument isn’t directly passed to hidden_fn, or if the result of hidden_fn is no longer squared then this test will need to be changed.

But in that case someone is changing the “unchangable source code”. And while that change is happening that someone should take the opportunity to provide a test path for hidden_fn. As one example, it could be moved out of the anonymous namespace. Which is what should have been done in the first place, there most likely having been no good reason to make it hidden in the first place.

👉 Note that my strong suggestion that these functions should not have been hidden at all is in conflict to the usual recommendation to keep things private that are not explicitly part of an interface. Here, for example, the recommended “best practices” C++ Core Guidelines recommends everything that isn’t part of an interface should be buried in an anonymous namespace:

SF.22: Use an unnamed (anonymous) namespace for all internal/non-exported entities

Reason Nothing external can depend on an entity in a nested unnamed namespace. Consider putting every definition in an implementation source file in an unnamed namespace unless that is defining an “external/exported” entity.

And the same recommendation goes for declaring these methods static, which is simpler the older C++ way of hiding things within file scope.

I think this dogmatic approach is severely flawed, comprises testability (and thus correctness and reliability), and is unnecessary given other facilities in the language and in software engineering practice that would also protect these non-interface definitions from misuse. That is to say, it’s not a good recommendation, and should be (mostly) ignored.

The proper way to handle this (to be covered in more detail in a future post in this series) would be to:

  • Not hide any of these functions via an anonymous namespace or the static keyword.
  • Have, for this source file, a second header that declares only these “internal only” methods, and that header can be given a suggestive name, like foo_internal_decls_only.h to make its purpose crystal clear4, and
  • Use lint or other validation scripts to confirm that headers named .*_internal_decls_only are only #included by source files in directories named test/. (This step optional, needed only if you can’t trust your devs to follow the rules, even in the face of code reviews.)

Anyway. To continue.

You might wonder what this technique looks like in practice, instead of in a contrived example.

Here is where I used this technique in unit tests for Bitcoin Core:

The need was to unit test (among other things) the routines VerifyWitnessProgram and ExecuteWitnessScript inside of the unchangable file interpreter.cpp. I didn’t want to change interpreter.cpp as it is considered part of the Bitcoin “consensus” enforcement, and changes to it are considered very carefully. Even simply making hidden functions visible outside the file had the potential to delay acceptance of the PR, so I wanted to avoid that.

Here is ExecuteWitnessScript’s definition - 38 lins with a lot of code paths. You don’t have to read that code - just note that it is doing a lot of work in a bunch of code paths (with loops as well as conditionals), is obviously part of the Bitcoin “consensus” as it is executing a script that is part of the blockchain, that its behavior can’t be changed at all because it is already in production (that is, executing on Bitcoin nodes and therefore allowing transactions to be placed on the blockchain), and is declared static so it can’t be reached from outside its file.

static bool ExecuteWitnessScript(const Span<const valtype>& stack_span, const CScript& exec_script, unsigned int flags, SigVersion sigversion, const BaseSignatureChecker& checker, ScriptExecutionData& execdata, ScriptError* serror)
{
    std::vector<valtype> stack{stack_span.begin(), stack_span.end()};

    if (sigversion == SigVersion::TAPSCRIPT) {
        // OP_SUCCESSx processing overrides everything, including stack element size limits
        CScript::const_iterator pc = exec_script.begin();
        while (pc < exec_script.end()) {
            opcodetype opcode;
            if (!exec_script.GetOp(pc, opcode)) {
                // Note how this condition would not be reached if an unknown OP_SUCCESSx was found
                return set_error(serror, SCRIPT_ERR_BAD_OPCODE);
            }
            // New opcodes will be listed here. May use a different sigversion to modify existing opcodes.
            if (IsOpSuccess(opcode)) {
                if (flags & SCRIPT_VERIFY_DISCOURAGE_OP_SUCCESS) {
                    return set_error(serror, SCRIPT_ERR_DISCOURAGE_OP_SUCCESS);
                }
                return set_success(serror);
            }
        }

        // Tapscript enforces initial stack size limits (altstack is empty here)
        if (stack.size() > MAX_STACK_SIZE) return set_error(serror, SCRIPT_ERR_STACK_SIZE);
    }

    // Disallow stack item size > MAX_SCRIPT_ELEMENT_SIZE in witness stack
    for (const valtype& elem : stack) {
        if (elem.size() > MAX_SCRIPT_ELEMENT_SIZE) return set_error(serror, SCRIPT_ERR_PUSH_SIZE);
    }

    // Run the script interpreter.
    if (!EvalScript(stack, exec_script, flags, checker, sigversion, execdata, serror)) return false;

    // Scripts inside witness implicitly require cleanstack behaviour
    if (stack.size() != 1) return set_error(serror, SCRIPT_ERR_CLEANSTACK);
    if (!CastToBool(stack.back())) return set_error(serror, SCRIPT_ERR_EVAL_FALSE);
    return true;
}

Let’s see if we can test it using the method of this post: Get at it through its caller.

Here we see the only caller of ExecuteWitnessScript: It is VerifyWitnessProgram and here is its definition - again, please don’t read all of that. Just note it is doing significant work with lots of code paths - no loops this time!, is definitely part of the Bitcoin consensus since it is validating that a transaction’s script is valid before it either executed or added to the blockchian, and it needs to be thoroughly tested.

static bool VerifyWitnessProgram(const CScriptWitness& witness, int witversion, const std::vector<unsigned char>& program, unsigned int flags, const BaseSignatureChecker& checker, ScriptError* serror, bool is_p2sh)
{
    CScript exec_script; //!< Actually executed script (last stack item in P2WSH; implied P2PKH script in P2WPKH; leaf script in P2TR)
    Span stack{witness.stack};
    ScriptExecutionData execdata;

    if (witversion == 0) {
        ... 22 lines I don't need to test elided ...
    } else if (witversion == 1 && program.size() == WITNESS_V1_TAPROOT_SIZE && !is_p2sh) {
        // BIP341 Taproot: 32-byte non-P2SH witness v1 program (which encodes a P2C-tweaked pubkey)
        if (!(flags & SCRIPT_VERIFY_TAPROOT)) return set_success(serror);
        if (stack.size() == 0) return set_error(serror, SCRIPT_ERR_WITNESS_PROGRAM_WITNESS_EMPTY);
        if (stack.size() >= 2 && !stack.back().empty() && stack.back()[0] == ANNEX_TAG) {
            // Drop annex (this is non-standard; see IsWitnessStandard)
            const valtype& annex = SpanPopBack(stack);
            execdata.m_annex_hash = (CHashWriter(SER_GETHASH, 0) << annex).GetSHA256();
            execdata.m_annex_present = true;
        } else {
            execdata.m_annex_present = false;
        }
        execdata.m_annex_init = true;
        if (stack.size() == 1) {
            // Key path spending (stack size is 1 after removing optional annex)
            if (!checker.CheckSchnorrSignature(stack.front(), program, SigVersion::TAPROOT, execdata, serror)) {
                return false; // serror is set
            }
            return set_success(serror);
        } else {
            // Script path spending (stack size is >1 after removing optional annex)
            const valtype& control = SpanPopBack(stack);
            const valtype& script_bytes = SpanPopBack(stack);
            exec_script = CScript(script_bytes.begin(), script_bytes.end());
            if (control.size() < TAPROOT_CONTROL_BASE_SIZE || control.size() > TAPROOT_CONTROL_MAX_SIZE || ((control.size() - TAPROOT_CONTROL_BASE_SIZE) % TAPROOT_CONTROL_NODE_SIZE) != 0) {
                return set_error(serror, SCRIPT_ERR_TAPROOT_WRONG_CONTROL_SIZE);
            }
            execdata.m_tapleaf_hash = ComputeTapleafHash(control[0] & TAPROOT_LEAF_MASK, exec_script);
            if (!VerifyTaprootCommitment(control, program, execdata.m_tapleaf_hash)) {
                return set_error(serror, SCRIPT_ERR_WITNESS_PROGRAM_MISMATCH);
            }
            execdata.m_tapleaf_hash_init = true;
            if ((control[0] & TAPROOT_LEAF_MASK) == TAPROOT_LEAF_TAPSCRIPT) {
                // Tapscript (leaf version 0xc0)
                execdata.m_validation_weight_left = ::GetSerializeSize(witness.stack, PROTOCOL_VERSION) + VALIDATION_WEIGHT_OFFSET;
                execdata.m_validation_weight_left_init = true;
                return ExecuteWitnessScript(stack, exec_script, flags, SigVersion::TAPSCRIPT, checker, execdata, serror);
            }
            if (flags & SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_TAPROOT_VERSION) {
                return set_error(serror, SCRIPT_ERR_DISCOURAGE_UPGRADABLE_TAPROOT_VERSION);
            }
            return set_success(serror);
        }
    } else {
    ... more stuff I don't need to test elided ...
    }
}

And it is also declared static and isn’t available outside of its file.

Obviously an ideal approach would be to refactor all of this code by breaking it up into several smaller, visible, methods which themselves would be more easily tested (including making it clear that full branch coverage would be sufficient to test all functinality), all the while guaranteeing somehow that no change to behavior was made while moving stuff around. But in lieu of that we’ll (again) use the method of this post to test it: Get at it through its caller. So who calls it?

That would be VerifyScript, declared here - and finally we’ve got an entry point into this file.

bool VerifyScript(const CScript& scriptSig, const CScript& scriptPubKey, const CScriptWitness* witness, unsigned int flags, const BaseSignatureChecker& checker, ScriptError* serror)
{
    // Testing Taproot code paths in `VerifyWitnessProgram` and
    // `SigVersion::TAPSCRIPT` code paths in `ExecuteWitnessScript`.

    // Both `VerifyWitnessProgram` and `ExecuteWitnessScript` are `static`
    // inside of `interpreter.cpp` and thus inaccessible to a unit test.
    // The way to get to them is indirectly via `VerifyScript`.

    // Tests all success and failure paths mentioned in BIP-341 and
    // BIP-342.

    // This is a _unit test_ not a _functional test_ and the unit being
    // tested here does _not_ include actually verifying the signature.
    // That is tested elsewhere (e.g., by tests `verify_schnorr_signature`
    // and `check_schnorr_signature` in this file).  _This_ test _mocks_
    // the Schnorr signature verfication.  Thus the test data need not
    // actually have valid signatures (and is thus easier to prepare).

~425 lines of code follow, which includes a fluent API fixture for writing tests of these methods, and I still have a few tests to write. But the end result is already full branch coverage of VerifyWitnessScript and full branch coverage of ExecuteWitnessScript is coming shortly.

Note that a very nice thing about unit tests is that they are unit tests. You only have to test the code in this unit since other units have their own tests. Thus, it turns out that in this code the method used to validate the cryptographic signatures of the various scripts is actually - and fortunately! - passed in to VerifyScript as a C++ functor (in this case, virtual methods in a class passed as an argument) and from there plumbed through to ExecuteWitnessScript. Cryptographic signature verification is not part of this unit (and it is tested elsewhere) so doesn’t need to be tested here. Thus for these purposes I can mock signature verification, which makes it easier to write the scripts that will exercise the behavior of ExecuteWitnessScript, since I can just declare them properly signed (or not!) as needed without actually having to sign them.

And thus these critical parts of the Bitcoin Taproot/Tapscript implementation gained unit tests without changing consensus code that was already deployed and working in the Bitcoin network.


  1. Necessary #includes , present in the file, omitted here for space. ↩︎

  2. And here, a header file that contains the declarations of TYPE and conduit_fn is assumed. ↩︎

  3. Written here in a stupid manner without any test framework at all. It’s just an example! ↩︎

  4. And in many build systems it could be placed somewhere where its location would signal it is special. E.g., frequently interface headers are placed in a special include/ directory while internal-only headers are placed side-by-side with their source files. ↩︎