Logging Diagnostics

OK, confession time. I have no idea where the code is for this. Bad puppy. Could have sworn it was up on GitHub. In my defence this is like 15 years old so… Anyway, I still think there’s some nice ideas in here.

You would think after all this time that there would be a decent way to log diagnostic info generated by running code. There are lots of attempts out there but none of them seem to be good enough. Now, to be fair, this may be one of those areas where there is no ‘one size fits all’ solution and all the current solutions are perfect in their own context. I’m not convinced though so I’ve written my own along with most other people out there. It’s written in C++ as that’s what I use for most of my work. This was my wish list:

  • Lightweight – it’s only logging
  • Fast enough – not too much of a performance hit
  • Simple to call – it gets called a lot
  • Easy integration – every project should have it
  • Steams based interface – typesafe and no tedious type conversions
  • Thread safe – log entries should be atomic
  • Extensible – different message destinations, formatting etc.
  • Reliable under crashes – no unflushed buffers
  • Works out of the box – default config should be fine for 99% of users

Adding logging to an application is not a big deal. It’s not complicated so don’t make it so. The most widely used logging code is the Apache family of log4*. Implementations are available for a wide variety of platforms based around the original Java library, log4j. I recently tried to integrate log4cxx, the C++ port of log4j, into a project. A bit of sweating and beating got it built and basically working but in the end it turned out it wouldn’t play nicely with Windows services and it was at that point that I gave up. And the reason I gave up? The log4cxx distribution weighs in at over 7 megs and there’s no way I was going to start fixing anything that complicated to get some diagnostics messages in a file. Yes, it does all sorts of very clever things but that’s all in theory and at the end of the day most people don’t need ’em. Despite its popularity it’s just a really, really, badly designed product. I can’t think of any reason at all why a logging library should require a 7 meg download. That’s just poor. My logger will have three files. One header, one implementation and one test to exercise the code and explain how to integrate it and how to extend it.

The integration should be trivial. Drop the .h and .cpp files in the project build. Declare a global logger. Write messages to it. End of story. All the extras that weirdos and control freaks might want should be pretty trivial too but they should not be forced on the sane. Think about it, how often do you really need to specify the name of the log file yourself. Yes, sometimes you might have two apps writing logs to the same folder, but that’s not the default behaviour. Default behaviour is that everything gets dumped into ‘log.txt’. And default behaviour is not to ask the client code to specify what format the month name should be in for the timestamps. Oh, and macros will only be used where they provide a clear benefit.

Specifying streams based logging is very C++ specific but it’s also central to the operation. Streams mean you don’t have all that terrifying printf formatting kludgery. Steams mean it’s easy to extend the library to write output to the screen, or a teletype, or a remote machine in your beach-side monitoring station in Tuvalu. Streams mean you can write the contents of any code object directly to the log without having to repeat your text output routines every time. Just make the object streamable and Bob’s your aunty.

Multiple threads should be able to write to the same log file without causing parts of one message to be inserted part way through another message. In other words, the writing of each message to the log should be an atomic transaction that cannot be interrupted by another thread. A simple file locking mechanism will do the job. Working out where to put it is the only tricky bit.

All that stuff about keeping things simple is very true but there will come a time when you need to step outside convention. You may want to internationalise the timestamps. You may want to add a custom prefix or suffix to certain log messages. You may want to enable logging only for certain events. So, the library has been written to make use of both predefined and custom stream manipulators. Don’t want milliseconds displayed in the timestamp? Write your own ‘prefix’ manipulator and attach it to the logger in place of the default one. Ditto for multi-level message indentation to show a stack trace for example. The point is, you only rarely need to use these features. And when you do, the mechanism for using them or adding your own is simple and consistent.

One minor but important pet peeve of mine is logging libraries not being robust in the face of crashes. Most rely on default buffering of the log messages which is fine when all’s well but the point of the exercise here is to give clues to the developer in the event of failure. So, the buffer gets flushed after each message is written. That’s the best you can do but most libraries don’t do it.