Advantages of Semantic Logging

I see at least two definite advantages to “semantic logging” (a.k.a. “structured logging”) … and some more that could be realized (I think, I hope).

I’m working on tooling for semantic logging as a background project. Targeted languages are C++ and C#. I’m looking at:

  • an external DSL (i.e., specification file in an appropriate syntax, as for protobufs, with a generator for language-specific headers/implementation/whatever),
  • IDE support in the form of Visual Studio and Visual Studio Code extensions,
    • Goal for the latter is that the developer can point to a code line where a logging statement should be inserted, give it a name, and enter (or pick from) a list of fields, then the logging method is generated (in the external DSL file) and the call inserted at that location.
  • integration with call-stack contexts, exception/recovery contexts, work-item/work-thread contexts, and transaction-oriented contexts (what OpenTelemetry calls “traces”),
  • multiple sinks, from standard log files to OpenTelemetry logging and metrics.

More to come as that develops.

Advantage: Logging is done in “semantic” chunks - the message is “structured”, composed of “semantic” fields

The first major advantage is that the logging is done in “semantic” chunks - the message itself is “structured”, and it is composed of “semantic” fields (typically a key/value dictionary). (Most discussions of semantic logging focus on this aspect.)

Structured log messages are much easier to interpret. Messages have uniform syntax, and, especially important, at the field level within a message there is consistent formatting of values and possibly even consistent naming of fields.

  • It’s very difficult and tiresome - and sometimes impossible - to dig information out of service logs where each log message was individually formatted by a developer at the time the code was written, thinking only of the immediate need and caring little or nothing for consistency with the rest of the system. The same field is given different names in different messages - even adjacent messages - the same values are logged with different formatting, the fields are not in the same order, they’re mixed with free text, and, and, and … It makes it very difficult to search for common values or common patterns.

  • With semantic logging the consistent naming of fields and formatting of values (and lack of interspersed random text) makes it easier to grep for what you’re looking for, or for patterns, and also enables the use of tools designed to do a structured search (e.g., search in JSON structures), without having to go to heroics (i.e., the kind of stuff that people pay Splunk for: mining structure out of unstructured logs).

It’s very easy for the developer, when writing a series of log messages in closely related code - most often sequential code! - to omit values from various log messages “knowing” that it was “just logged right there” a few lines above. This by itself means the troubleshooter, reading the logs, has to put things together himself, piecing together information from multiple log messages that are not apparently connected except in the code itself.

  • This gets worse when, later, log messages are added to that code or to methods called from that code - the log messages which the original developer thought would be together in the log are suddenly separated by an arbitrary distance.

  • (And now imagine a refactoring where the method is split into different async tasks and the messages for a single operation are no longer close together - and not even on the same thread!)

  • Semantic logging reduces the chance of this happening (by providing schemes for inexpensively logging all relevant values, even if there is some redundancy, and by providing ways to link related messages via “contexts” (see below)).

Semantic logging can unify metrics/telemetry collection with logging. Instead of having to annotate code with calls to both metrics/telemetry APIs and also, separately, to logging APIs, a single call to a logging method can both log and also send (a subset of) the logged data to metrics/telemetry sinks.

Advantage: Log statements are no longer ugly and obtrusive messes of formatting that get in the way of reading/understanding the actual program (business logic/control flow)

A second major advantage of semantic logging - the way I envision it - is that it eliminates the mess of formatting code combined with logging boilerplate (specifying log levels, etc.) that make up a logging statement, replacing it with a simple method call on a method with an appropriate semantic name.

  • E.g., Log.BlockVerification() instead of Log(INFO, "Block verification for #%16s size %d", blockId, block.size()).

    (Most semantic logging libraries I’ve seen don’t go this far; they just simplify the logging statement by eliminating explicit formatting, e.g., Log(INFO, "block#", blockId, "szBlock", block.size()). That’s less crufty - but there’s still a bunch of cruft there.)

It is always pointed out that code is read much more often than it is written (or modified). It’s exceptionally important that anyone reading the code - a reviewer, someone troubleshooting a problem in production, or even just the original author coming back to it 3 months later to add a new feature - be easily able to understand both the business logic and the necessary programming-overhead control flow. Log statements, as they exist today, are mostly useless for helping a reader understand what is going on: They’re a big mess of formatting and boilerplate stuck right in the middle of actual working code. Every time you read the code you’ve got to stumble over multiple interruptions in the form of these log statements. (I’ve yet to see an IDE, or IDE extension, that automatically folds away log statements…)

They’re not even useful as documentation, e.g., in place of comments, because of all the formatting syntax and boilerplate mixed into the actual “logged” content.

If, instead, each log statement were a bespoke method call on a logger instance custom-crafted to suit the logging needs of a particular method/module/component, then not only would all the cruft be gone, but all that would be left is a method call whose name is semantically meaningful to the reader.

  • This needs tooling/library support/IDE support: see below.
  • Method names can still go out-of-date, just like comments, so you have to watch for it. It’s not a surefire replacement.
  • (It kind of makes you want exception recovery code to be handled this way too: the dream of exception throwing/handling where the code to prepare for/handle/recover from an exception doesn’t interrupt the reader’s understanding of the main line of code.)

Potential advantage: Integration with other logging “context”

Each logging statement stands alone in the code, but there is a relationship between logging statements at runtime: they’re sequential within a given thread of control. But the thread of control isn’t always evident in the source as simply going from one statement to the next in the text. Places where control flow is obscured in the source, yet you’d like an unambiguous timeline of log entries for troubleshooting or metrics, include:

  • The call graph of which methods are called in which order, and which are nested calls in other methods.

  • The flow when an error/exception is processed (which is not the ordinary flow, and which in a system using “exceptions” (like C++) includes returning up the call stack, unwinding as you go and skipping intermediate “uninteresting” frames).

  • Work items/work threads where a computation is split into one or more chunks and moved off in a work item to run on some other thread asynchronously (or on the same thread when it is not busy, with coroutines or async/await.)

  • A “transaction”, which is a large computation driven from an external source - a user or some external client service.

    • Canonically, a user clicking on a shopping cart to purchase something, or
    • A single API call to a service.
      • Or even multiple related API calls to a service that together accomplish a single goal.
  • There are even computations which span multiple machines (perhaps a large computation sharded into similar pieces, perhaps calling out to one or many separate microservices). Ideally the transaction logging context should travel to the remote machines too, so all the separate logs (and metrics) can be connected. (OpenTelemetry has facilities for this, I believe.)

These contexts should not have to be explicitly recorded by user-written logging statements. It’s unwieldy (or even very difficult) to pass all the relevant context information around so it can be logged, and it would be expensive too (as well as adding to the cruft). Plus, it is extremely easy to forget to log the appropriate context with each and every log statement (anyone who’s dealt with any legacy code - even code from last year! - will have seen this).

Instead, for each kind of context, a separate start/stop (or enter/exit) should be recorded - along with a unique identifying mark (a transaction id, user id, or work item id) - and the contexts should be (where appropriate) nested; then the semantic logging routines should know where these context markers are and automatically add them to each logging method.

  • These context start/stop (enter/exit) routines should preferably be scoped, using whatever language abilities there are. (C++: RAII objects on the stack; C#: using statements.)
    • And that means that often each scope can be both entered and left with only one statement in the code!
  • In some cases the context start/stop routines can be injected into the code with annotations, or aspects …
  • I think Serilog does this, or something quite like transaction contexts; other logging and metrics collection systems do as well.

Potential advantage: Ease of programming with coding assists (tooling)

There’s great potential to make semantic logging very easy for the programmer - much easier than traditional logging (where the programmer is always fiddling with formatting, and frequently doesn’t even get it right - which you can see in the logs when rare (or even not-so-rare) log messages have obviously missing, incorrect, or incorrectly formatted data).

Possible choices include:

  • Embedded DSL to write log method implementations, which can keep terms and formatting consistent, while specifying appropriate context information to log, and direct individual fields to various metrics collectors (including computing summaries and trends)
  • External DSL with an external language processor (compiler) to generate language-specific declarations/definitions (like the protoc compiler for the protobuf specification language)
  • Use of reflection (if the language supports it) so that less work needs to go into defining fields/values, including picking field names/values directly out of structures.
  • Use of customizable code annotations (if the language supports it) to assist reflective code in pulling out the proper field names/values.
  • And, very important: IDE support (extensions, language server) that make it easy for a programmer to insert a brand new semantic logging method - giving it a proper name with meaning, and specifying which fields/values/contexts to log - and then both generating the body (in the DSL) of the new logging method in the proper file and inserting a call to that new logging method in the code where the developer wants it.
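To make the external-DSL idea concrete, here is a sketch of what one specification entry might look like. The syntax is entirely invented - loosely modeled on protobuf - and every keyword and name is hypothetical:

```
// Hypothetical DSL sketch (invented syntax): each entry declares a named
// log method, its level, the contexts to stamp, its fields, and which
// fields also feed metrics. A generator would emit C++/C# from this.
log BlockVerification {
  level   = info;
  context = [transaction, work_item];
  field block_id   : string;
  field block_size : uint64 [metric = counter("bytes_verified_total")];
}
```

From an entry like this, the generator could emit the Log.BlockVerification() method discussed above, with consistent field names and formatting baked in.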
