Sunday, June 10, 2007

The Case for Asynchronous Logging

It is common practice for federated systems to maintain separate log files to assist in finger-pointing should an error occur in production. However, this duplication of effort is unsustainable in the world of High Frequency Finance (HFF), where messaging volumes are approaching 400K messages per second. That is prompting a rethink, and perhaps a spirit of cooperation, between data sinks and sources.

I propose that it's time to assign responsibility for logging to one party, not both, and to log asynchronously. Deciding who carries the responsibility, however, is not obvious. To understand the problem, let's look at the issues involved (or, if you're of the glass-half-empty persuasion, who gets the blame). Here's an example of a possible route between two applications:

  • Memory/Disk/SAN
  • Sender Application
  • Application proxy
  • TCP/IP Stack
  • Software Firewall
  • Hardware NIC
  • Network infrastructure (various routers/switches/firewalls, LAN/WAN etc.)
  • Receiver's NIC
  • Software Firewall
  • TCP/IP Stack
  • Application proxy
  • Receiver Application
  • Memory/Disk/SAN
As you can see, there's quite a lot to go wrong. Let's now analyse where to perform the logging.

Sender Logging


If we rely on the sender, there's the immediate advantage that the sender has to account for the log file space, access control and maintenance. However, from a consumer's point of view, that means a lack of control, and potentially the case where you require a log and it has been deleted or is offline. From an audit point of view, you have increased the external dependency and hence the risk.

From the sender's perspective, consumer lifecycle management also becomes slightly more difficult, as you now have to poll your customers to see if they are still consuming your data. It's not unknown for applications to be turned off without the feeds being turned off, because nobody knew whom to contact.
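The polling burden described above could be sketched roughly as follows. This is only an illustration: the ConsumerRegistry class, its method names and the five-minute default are assumptions of mine, not part of any existing feed product.

```python
import time


class ConsumerRegistry:
    """Hypothetical sender-side record of when each consumer last responded."""

    def __init__(self, stale_after_s=300.0, clock=time.monotonic):
        self.stale_after_s = stale_after_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen = {}

    def heartbeat(self, consumer_id):
        """Call whenever a consumer acknowledges a message or answers a poll."""
        self.last_seen[consumer_id] = self.clock()

    def stale_consumers(self):
        """Return the consumers that have been silent for longer than the limit."""
        now = self.clock()
        return sorted(
            cid for cid, seen in self.last_seen.items()
            if now - seen > self.stale_after_s
        )
```

Anything returned by stale_consumers() is a candidate for a phone call before the feed is switched off, which is exactly the overhead the sender takes on.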

Receiver Logging

With receiver logging we have, effectively, a forensic record of the transfer across the stack, and we have control over the log file lifecycle. It seems strange to state the obvious, but for higher performance you should log to local disk, not NFS or SAN storage, and then back up the log files to resilient storage.
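As a minimal sketch of what receiver-side logging off the hot path might look like, here is one way to do it with Python's standard logging module. The function name log_received, the logger name "receiver.audit" and the (sequence number, payload) message shape are assumptions for illustration; the queue hands records to a background thread, so the receive loop never waits on the disk.

```python
import logging
import logging.handlers
import queue


def log_received(messages, log_path):
    """Log each (seq, payload) pair to a local file via a background thread."""
    log_queue = queue.Queue(-1)  # unbounded here; a bounded queue is a design choice
    file_handler = logging.FileHandler(log_path)  # assumed to be on local disk
    file_handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))

    # The listener thread owns the file handler and performs the actual writes.
    listener = logging.handlers.QueueListener(log_queue, file_handler)
    listener.start()

    queue_handler = logging.handlers.QueueHandler(log_queue)
    logger = logging.getLogger("receiver.audit")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit records out of the root logger
    logger.addHandler(queue_handler)

    count = 0
    try:
        for seq, payload in messages:
            # Only enqueues the record; no disk I/O happens on this thread.
            logger.info("seq=%d bytes=%d", seq, len(payload))
            count += 1
    finally:
        listener.stop()  # drains the queue before returning
        logger.removeHandler(queue_handler)
        file_handler.close()
    return count
```

Backing the resulting file up to resilient storage would be a separate, scheduled job rather than part of the receive loop.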

Service Level Monitoring

A nice addition is to write service level monitoring as part of the productionised system. In this way you can monitor the normal performance of the system and build a predictive capability on application performance.
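One simple way to capture "normal performance" is a rolling baseline of message latency that flags samples well outside it. The sketch below assumes latency in milliseconds and a three-sigma rule; the LatencyBaseline class and its parameters are hypothetical, and a real predictive capability would need rather more than this.

```python
import statistics
from collections import deque


class LatencyBaseline:
    """Hypothetical rolling baseline of latency samples, in milliseconds."""

    def __init__(self, window=1000, threshold_sigmas=3.0, min_samples=30):
        self.samples = deque(maxlen=window)  # oldest samples drop off automatically
        self.threshold_sigmas = threshold_sigmas
        self.min_samples = min_samples

    def observe(self, latency_ms):
        """Record a sample; return True if it is abnormally slow for the baseline."""
        abnormal = False
        if len(self.samples) >= self.min_samples:
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples)
            abnormal = latency_ms > mean + self.threshold_sigmas * stdev
        # The sample joins the baseline either way, so the baseline adapts over time.
        self.samples.append(latency_ms)
        return abnormal
```

Feeding observe() from the receiver's own log records keeps the monitoring in the same place as the logging, with no extra hop across the stack.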

Conclusion

Asynchronous logging has the potential to save considerable disk space and processor time whilst reducing maintenance overhead. The receiver, or data sink, is the right place to log: it tests the whole circuit between sender and receiver, and it puts the management of log files in the domain of the application, which is where it belongs from a resource, audit and service level management perspective.
