Logging: To Log or not to Log that is the Question!

We discuss why you should add logging to your programs, applications and services, what you should (and should not) log and why just using the print() or printf() function is not sufficient.

15-08-2022



Many programming languages, including Java and C#, have commonly used logging libraries, and Python has a built-in logging module. This blog discusses why you should add logging to your programs, applications and services, what you should (and should not) log, and why just using the print() or printf() function is not sufficient.

Why Log?

Logging is typically a key aspect of any production application; this is because it is important to provide appropriate information to allow future investigation following some event or issue in such applications. These investigations include:

  • Diagnosing failures; that is why did an application fail or crash.
  • Identifying unusual or unexpected behaviour, which might not cause the application to fail but which may leave it in an unexpected state or where data may be corrupted etc.
  • Identifying performance or capacity issues; in such situations the application is performing as expected but it is not meeting some non-functional requirements associated with the speed at which it is operating or its ability to scale as the amount of data or the number of users grows.
  • Dealing with attempted malicious behaviour in which some outside agent is attempting to affect the behaviour of the system or to acquire information which they should not have access to. This could happen, for example, if you are creating a web application and a user tries to hack into your web server.
  • Regulatory or legal compliance. In some cases, records of program execution may be required for regulatory or legal reasons. This is particularly true of the financial sector where records must be kept for many years in case there is a need to investigate the organisations’ or individuals’ behaviour.

What is the Purpose of Logging?

There are therefore two general reasons to log what an application is doing during its operation:

  • For diagnostic purposes so that recorded events / steps can be used to analyse the behaviour of the system after the event.
  • For auditing purposes that allow for later analysis of the behaviour of the system for business, legal or regulatory purposes. For example, to determine who did what, with what, and when.

Without such logged information it is impossible after the event to know what happened. For example, if all you know is that an application crashed (unexpectedly stopped executing) how can you determine what state the application was in, what functions, methods etc. were being executed and which statements run?

Remember that although a developer may have been using an IDE to run their applications during development, and may have been using its debugging facilities to see which functions, methods and statements are being executed and even what values variables hold, this is not how most production systems are run. In general, a production system will be run either from a command line or possibly through a shortcut (for example on a Windows box) to simplify running the program. All the users will know is that something failed or that the behaviour they expected didn’t occur - if in fact they are aware of any issue at all!

Logs are therefore key to after-the-event analysis of failures and unexpected behaviour, and to analysis of the operation of the system for business reasons.

What should you Log?

One question that you might be considering at this point is ‘what information should I log?’.

An application should log enough information so that post event investigators can understand what was happening, when and where. In general, this means that you will want to log the time of the log message, the module / filename, function name or method name executing, potentially the log level being used (see later) and in some cases the parameter values / state of the environment, program or class involved.
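As a minimal sketch of the fields listed above, the following uses Python's logging module to stamp every message with the time, log level, module and function name. The logger name, the place_order function and its parameters are hypothetical and exist only for illustration; an in-memory buffer stands in for a real log file.

```python
import io
import logging

# Format string: timestamp, level, module, function name, then the message.
LOG_FORMAT = "%(asctime)s %(levelname)s %(module)s.%(funcName)s: %(message)s"


def build_logger(stream):
    """Return a logger that writes formatted records to the given stream."""
    logger = logging.getLogger("orders_example")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep this example's output away from the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger


def place_order(logger, order_id, quantity):
    """Hypothetical business function that logs its parameters on entry."""
    logger.info("placing order id=%s quantity=%d", order_id, quantity)
    return quantity > 0


buffer = io.StringIO()  # stands in for a log file in this sketch
log = build_logger(buffer)
place_order(log, "A-17", 3)
print(buffer.getvalue(), end="")
```

The function name and module are filled in by the logging module itself, so the developer only supplies the message and the parameter values.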

In many cases developers log the entry to (and, to a lesser extent, the exit from) a function or method. However, it may also be useful to log what happens at branch points within a function or method so that the logic of the application can be followed.

All applications should log all errors / exceptions, although care is needed to ensure that this is done appropriately. For example, if an exception is caught and then re-thrown several times, it is not necessary to log it every time it is caught. Indeed, doing this can make the log files much larger, cause confusion when the problem is being investigated and result in unnecessary overheads. One common approach is to log an exception at the point where it is first caught and not to log it again after that.
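One way to picture the "log it once" approach is the sketch below, written with Python's logging module. The three functions are hypothetical layers of an application: only the lowest one logs the exception (with its traceback), while the layers above re-raise or handle it without logging it again.

```python
import io
import logging

buffer = io.StringIO()  # stands in for a log file in this sketch
logger = logging.getLogger("exception_example")
logger.setLevel(logging.DEBUG)
logger.handlers.clear()
logger.propagate = False
logger.addHandler(logging.StreamHandler(buffer))


def parse_quantity(text):
    """Lowest layer: log the failure once, with traceback, then re-raise."""
    try:
        return int(text)
    except ValueError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("could not parse quantity %r", text)
        raise


def handle_request(text):
    """Middle layer: adds context to the error but does not log it again."""
    try:
        return parse_quantity(text)
    except ValueError as err:
        raise RuntimeError("request rejected") from err


def main_loop(text):
    """Top layer: reacts to the failure; the details are already in the log."""
    try:
        return handle_request(text)
    except RuntimeError:
        return None
```

Calling main_loop("abc") leaves a single entry and a single traceback in the log, even though the error passed through three layers.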

What not to Log

The follow-on question to consider is ‘what information should I not log?’.

One general category of information that should not be logged is personal or sensitive information, including anything that can be used to identify an individual. This sort of information is known as PII, or Personally Identifiable Information.

Such information includes

  • user ids and passwords,
  • email addresses,
  • date of birth, birthplace,
  • personally identifiable financial information such as bank account details, credit card details etc.,
  • biometric information,
  • medical / health information,
  • government issued personal information such as passport details, driver’s license number, social security numbers, National Insurance numbers etc.,
  • official organisational information such as professional registrations and membership numbers,
  • physical addresses, phone (landline) numbers, mobile phone numbers,
  • verification related information such as mother’s maiden name, pets’ names, high school, first school, favourite film, etc.,
  • increasingly, online information relating to social media such as Facebook or LinkedIn accounts.

All the above represents sensitive information and much of it can be used to identify an individual; none of this information should be logged directly.

That does not mean that you cannot or should not log that a user logged in; you may well need to do that. However, the information should at least be obfuscated and should not include anything that is not required. For example, you may record that a user represented by some id attempted to log in at a specific time and whether they were successful or not. However, you should not log their password and may choose not to log the actual userid - instead you may log an id that can be mapped back to their actual userid.
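A rough sketch of this idea in Python is shown below. It is one possible approach rather than a prescribed one: the userid is replaced by a keyed hash (HMAC) so that the same user always produces the same reference in the log. The key, the logger name and the function names are all assumptions made for this example, and mapping a reference back to a userid would require recomputing the hash for known users or keeping a separate, protected lookup table.

```python
import hashlib
import hmac
import io
import logging

# Assumption: a real system would load this key from a secrets store.
PSEUDONYM_KEY = b"example-key-not-for-production"


def pseudonymise(user_id):
    """Return a stable 16-character token that stands in for user_id."""
    digest = hmac.new(PSEUDONYM_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def log_login_attempt(logger, user_id, success):
    """Log the outcome of a login attempt without the userid or password."""
    logger.info("login attempt user_ref=%s success=%s",
                pseudonymise(user_id), success)


buffer = io.StringIO()  # stands in for a log file in this sketch
audit_log = logging.getLogger("audit_example")  # hypothetical logger name
audit_log.setLevel(logging.INFO)
audit_log.handlers.clear()
audit_log.propagate = False
audit_log.addHandler(logging.StreamHandler(buffer))

log_login_attempt(audit_log, "alice@example.com", False)
print(buffer.getvalue(), end="")
```

Note that the password is never passed to the logging function at all, which is the simplest way to make sure it cannot end up in a log file.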

You should also be careful about writing data input into an application directly into a log file. One way in which a malicious agent can attack an application (particularly a web application) is by attempting to send very large amounts of data to it (as part of a field or as a parameter to an operation). If the application blindly logs all data submitted to it, then the log files can fill up very quickly. This can result in the file store being used by the application filling up and causing potential problems for all software using the same file store. Unvalidated input written to a log can also contain line breaks or other control characters that forge or corrupt log entries; this related form of attack is known as a log (or log file) injection attack and is well documented (see https://www.owasp.org/index.php/Log_Injection which is part of the well-respected Open Web Application Security Project).
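A simple defensive measure, sketched below in Python, is to pass any user-supplied value through a helper that limits its length and escapes control characters before it reaches the logger. The helper name and the 200 character limit are assumptions for illustration; a real application would pick a limit that suits its own data.

```python
MAX_LOGGED_CHARS = 200  # hypothetical limit chosen for this example


def sanitise_for_log(value, limit=MAX_LOGGED_CHARS):
    """Return a bounded, single-line representation of value for logging."""
    text = str(value)
    dropped = max(0, len(text) - limit)
    # Escape non-printable characters (such as CR and LF) so that the input
    # cannot start a new, forged line in the log file.
    cleaned = "".join(
        ch if ch.isprintable() else ch.encode("unicode_escape").decode("ascii")
        for ch in text[:limit]
    )
    if dropped:
        cleaned += f"...[{dropped} more chars not logged]"
    return cleaned


print(sanitise_for_log("bob\r\nINFO admin logged in"))
print(len(sanitise_for_log("x" * 1_000_000)))
```

The first call shows a line break being rendered as visible escape characters rather than as a new log line; the second shows that a very large input contributes only a small, bounded amount of text to the log.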

In general you should also aim for empty logs in a production system; that is, only information that needs to be logged in a production system should be logged (often information about errors, exceptions or other unexpected behaviour). However, during testing much more detail is required so that the execution of the system can be followed. It should therefore be possible to select how much information is logged depending on the environment the code is running in (that is, within a test environment or within a production environment).
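One common way to make the amount of logging depend on the environment is to read the log level from an environment variable, as in the Python sketch below. The variable name APP_LOG_LEVEL and the default of WARNING are assumptions made for this example.

```python
import logging
import os

# Names accepted in the (hypothetical) APP_LOG_LEVEL environment variable.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def level_from_environment(environ=None):
    """Return the configured log level, defaulting to WARNING."""
    environ = os.environ if environ is None else environ
    name = environ.get("APP_LOG_LEVEL", "").upper()
    return _LEVELS.get(name, logging.WARNING)


logger = logging.getLogger("env_example")  # hypothetical logger name
logger.setLevel(level_from_environment())

# In a test environment APP_LOG_LEVEL=DEBUG would let this through;
# with the default (WARNING) it is discarded.
logger.debug("entering reconciliation step")
```

A test environment can then set APP_LOG_LEVEL=DEBUG while production leaves it unset, with no change to the code itself.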

A final point to note is that it is important to log information to the correct place. Many applications (and organisations) log general information to one log file, errors and exceptions to another and security information to a third. It is therefore important to know where your log information is being sent and not to send information to the wrong log location.
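As an illustration, the Python sketch below sends general information to one file and errors to another by attaching two handlers to the same logger. The file names general.log and errors.log and the logger name are assumptions for this example, and a security log could be added in the same way; a temporary directory is used so that the sketch can be run anywhere.

```python
import logging
import tempfile
from pathlib import Path

LINE_FORMAT = "%(asctime)s %(levelname)s %(message)s"


def build_split_logger(log_dir):
    """Return a logger that writes general and error messages to separate files."""
    logger = logging.getLogger("split_example")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False

    # General log: INFO and WARNING only (errors are filtered out).
    general = logging.FileHandler(Path(log_dir) / "general.log", encoding="utf-8")
    general.setLevel(logging.INFO)
    general.addFilter(lambda record: record.levelno < logging.ERROR)

    # Error log: ERROR and above.
    errors = logging.FileHandler(Path(log_dir) / "errors.log", encoding="utf-8")
    errors.setLevel(logging.ERROR)

    for handler in (general, errors):
        handler.setFormatter(logging.Formatter(LINE_FORMAT))
        logger.addHandler(handler)
    return logger


def close_logger(logger):
    """Flush and detach all handlers so the files can be read or removed."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)


with tempfile.TemporaryDirectory() as tmp:
    app_log = build_split_logger(tmp)
    app_log.info("service started")
    app_log.error("payment gateway unreachable")
    close_logger(app_log)
    general_text = Path(tmp, "general.log").read_text(encoding="utf-8")
    error_text = Path(tmp, "errors.log").read_text(encoding="utf-8")

print(general_text, end="")
print(error_text, end="")
```

The calling code simply logs at the appropriate level; which file a message ends up in is decided entirely by the handler configuration.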

Why not just use print?

If you want to log information in your application then the next question is how should you do that? At first sight it may appear that the print() style function in your favoured language would be the ideal way to log information (and indeed many training courses and introductory texts do exactly that). Thus we need to consider whether using a print() function is the best way to log information.

In actual fact, using print() to log information in a production system is almost never the right answer. This is for several reasons:

  • The print() function typically writes strings to the standard output (stdout) or standard error (stderr) stream, which by default directs output to the console / terminal. For example, when you run an application within an IDE, the output is displayed in the Console window. If you run an application from the command line, then the output is directed back to that command / terminal window. Both are fine during development, but what if the program is not run from a command window and is instead run from a container or as a service? Then there may not be a terminal / console window to send the data to, and the data may simply be lost. As it happens, the stdout and stderr output streams can be redirected to a file (or files). However, this is typically done when the program is launched and may be easily omitted. In addition, the only option is to send all stdout or all stderr output to a specific file.
  • Another issue with using the print() function is that all calls to print will be output. When using most loggers, it is possible to specify the log level required. These different log levels allow different amounts of information to be generated depending upon the scenario. For example, in a well-tested reliable production system we may only want error related or critical information to be logged. This will reduce the amount of information we are collecting and reduce any performance impact introduced by logging into the application. However, during testing phases we may want a far more detailed level of logging.
  • In other situations, we may wish to change the log level being used for a running production system without needing to modify the actual code (as this has the potential to introduce errors into the code). Instead, we would like to have the facility to externally change the way in which the logging system behaves, for example through a configuration file. This allows system administrators to modify the amount and the detail of the information being logged. It typically also allows the destination of the log information to be changed.
  • Finally, when using the print() function a developer can use whatever format they like: they can include a timestamp on the message or not, they can include the module or function / method name or not, and they can include parameters or not. Using a logging system usually standardises the information generated along with the log message. Thus, all log messages will have (or not have) a timestamp, or all messages will include (or not include) information on the function or method in which they were generated etc.
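The points above can be sketched using Python's logging module and its dictConfig function. In the example below the log level, the destination and the message format all live in a configuration dictionary rather than in the application code. Here the dictionary is written inline to keep the sketch self-contained; the assumption is that in practice it would be loaded from an external file. The logger name and the charge function are hypothetical.

```python
import logging
import logging.config

# Assumption: in practice this dictionary would be loaded from an external
# configuration file rather than being defined in the code.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "standard": {
            "format": "%(asctime)s %(levelname)s %(name)s %(funcName)s: %(message)s",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "standard",
            "stream": "ext://sys.stderr",
        },
    },
    "loggers": {
        "payments_example": {
            "level": "ERROR",  # change to "DEBUG" for a test environment
            "handlers": ["console"],
            "propagate": False,
        },
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
logger = logging.getLogger("payments_example")


def charge(amount):
    """Hypothetical function that logs at two different levels."""
    logger.debug("charge called with amount=%s", amount)  # dropped at ERROR level
    if amount <= 0:
        logger.error("rejected non-positive amount=%s", amount)
        return False
    return True


charge(25)
charge(0)
```

Unlike a print() call, the debug message costs almost nothing when the level is set to ERROR, and every message that is emitted has the same layout.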

Common Logging Frameworks

There are many logging frameworks available across a wide range of technologies and programming languages; here are just a few.

  • Java. Common logging frameworks used with Java include Log4J, LogBack and the Java Logging API. In addition, SLF4J (the Simple Logging Facade for Java) is a pluggable framework that relies on an underlying logging framework (such as LogBack or Log4J); as such, it provides a common interface to a wide range of loggers.
  • Python has included a built-in logging module since Python 2.3. This module, the logging module, defines functions and classes which implement a flexible logging framework that can be used in any Python application / script or in Python libraries / modules.
  • C#. As with the other languages listed here there are numerous C# logging libraries available including Log4net, NLog and Logary.
  • JavaScript. There is a very wide range of logging libraries for JavaScript including Log4JS, Winston, Pino, and npmlog.
  • Go. Go comes with a simple logging framework but many projects use libraries that build on that such as Logrus and Zerolog.

Although logging frameworks differ in the specific details of what they offer, almost all offer the same core elements (although different names are sometimes used). The Python logging module mentioned above is no different, and the core elements that make up its logging framework and processing pipeline are shown below (note that a very similar diagram could be drawn for logging frameworks in Java, Kotlin, C# etc.).

Typical Logging Framework (derived from Python’s logging module).

The core elements of the logging framework (some of which are optional) are shown above and described below:

  • Log Message. This is the message to be logged from the application.
  • Logger. Provides the programmer’s entry point / interface to the logging system. The Logger class provides a variety of methods that can be used to log messages at different levels.
  • Handler. Handlers determine where to send a log message; built-in handlers include file handlers that send messages to a file and HTTP handlers that send messages to a web server.
  • Filter. Filters are an optional element in the logging pipeline. They can be used to further filter the information to be logged, providing fine grained control of which log messages are output (for example to a log file).
  • Formatter. These are used to format the log message as required. This may involve adding timestamps, module, and function / method information etc. to the original log message.
  • Configuration Information. The logger (and associated handlers, filters, and formatters) can be configured either programmatically in Python or through configuration files. These configuration files can be written using key-value pairs or in YAML (a simple, human-readable data format). YAML originally stood for Yet Another Markup Language and now officially stands for YAML Ain’t Markup Language!
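To make the elements described above concrete, here is a small programmatic sketch using Python's logging module. A logger passes each message to a handler, which applies its own level, an optional filter and a formatter before writing the result. The logger name, the DropHealthChecks filter and the messages are hypothetical, and an in-memory stream stands in for a log file.

```python
import io
import logging


class DropHealthChecks(logging.Filter):
    """Hypothetical filter that discards routine health-check messages."""

    def filter(self, record):
        return "healthcheck" not in record.getMessage()


stream = io.StringIO()  # stands in for a log file in this sketch

# Handler: decides where messages go and applies its own level and filters.
handler = logging.StreamHandler(stream)
handler.setLevel(logging.INFO)
handler.addFilter(DropHealthChecks())

# Formatter: adds the level and logger name to the original message.
handler.setFormatter(logging.Formatter("%(levelname)s|%(name)s|%(message)s"))

# Logger: the only part of the pipeline the application code calls directly.
logger = logging.getLogger("pipeline_example")
logger.setLevel(logging.DEBUG)
logger.handlers.clear()
logger.propagate = False
logger.addHandler(handler)

logger.debug("cache warmed")              # passes the logger, dropped by the handler level
logger.info("healthcheck ok")             # dropped by the filter
logger.warning("disk usage at %d%%", 91)  # formatted and written

print(stream.getvalue(), end="")  # WARNING|pipeline_example|disk usage at 91%
```

Only the three logger calls would normally appear in application code; everything above them is set-up that could equally be moved into a configuration file.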

It is worth noting that much of the logging framework is hidden from the developer who only sees the logger; the remainder of the logging pipeline is either configured by default or via log configuration information typically in the form of a log configuration file.

Summary

Most applications should provide some form of logging; however, such log data should be handled appropriately using a logging framework developed for your programming language / technology stack.

