Many programming languages, such as Java and C#, have widely used logging libraries, and Python also has a logging module. This blog discusses why you should add logging to your programs, applications and services, what you should (and should not) log and why just using the print() or printf() function is not sufficient.
Logging is typically a key aspect of any production application, because it provides the information needed to investigate an event or issue after it has occurred. These investigations include:
In general, there are therefore two reasons to log what an application is doing during its operation:
Without such logged information it is impossible after the event to know what happened. For example, if all you know is that an application crashed (unexpectedly stopped executing), how can you determine what state the application was in, which functions or methods were being executed and which statements were run?
Remember that, although a developer may have run their application from an IDE during development, possibly using its debugging facilities to see which functions or methods are executing, which statements run and even the values of variables, this is not how most production systems are run. In general, a production system will be run either from a command line or possibly through a shortcut (for example on a Windows box) to simplify running the program. All the users will know is that something failed or that the behaviour they expected didn't occur, if in fact they are aware of any issue at all!
Logs are therefore key to after-the-event analysis of failures and unexpected behaviour, and to analysis of the operation of the system for business reasons.
One question that you might be considering at this point is ‘what information should I log?’.
An application should log enough information so that post event investigators can understand what was happening, when and where. In general, this means that you will want to log the time of the log message, the module / filename, function name or method name executing, potentially the log level being used (see later) and in some cases the parameter values / state of the environment, program or class involved.
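As a minimal sketch of what this can look like with Python's standard logging module, the formatter below records the time, level, module and function name alongside the message, and the message itself carries the parameter values. The logger name `orders` and the function `place_order` are hypothetical; output is captured in memory only so it is easy to inspect.

```python
import io
import logging

# Capture output in memory for the example; a real application would more
# likely write to a file or a log aggregation service.
stream = io.StringIO()
handler = logging.StreamHandler(stream)

# asctime, levelname, module and funcName are standard LogRecord attributes.
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s %(module)s.%(funcName)s: %(message)s"
))

logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep this example's output away from the root logger

def place_order(order_id, quantity):
    # Record the parameter values an investigator would need later.
    logger.info("placing order id=%s quantity=%d", order_id, quantity)

place_order("A-1001", 3)
output = stream.getvalue()
```

Passing the values as arguments (rather than pre-formatting the string) lets the logging module skip the formatting work entirely when the message's level is disabled.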
In many cases developers log the entry to (and, to a lesser extent, the exit from) a function or method. However, it may also be useful to log what happens at branch points within a function or method so that the logic of the application can be followed.
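One way to do this in Python, sketched here under the assumption that entry/exit messages belong at DEBUG level, is a small decorator for entry and exit plus explicit log calls at each branch. The decorator `log_entry_exit` and the function `apply_discount` are hypothetical illustrations, not part of the logging module.

```python
import functools
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger = logging.getLogger("pricing")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
logger.propagate = False

def log_entry_exit(func):
    """Hypothetical helper: log entry to and exit from the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("entering %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.debug("exiting %s result=%r", func.__name__, result)
        return result
    return wrapper

@log_entry_exit
def apply_discount(price, is_member):
    # Log at the branch point so the path taken through the logic is visible.
    if is_member:
        logger.debug("member branch: applying discount")
        return round(price * 0.9, 2)
    logger.debug("non-member branch: no discount")
    return price

total = apply_discount(50.0, True)
lines = stream.getvalue().splitlines()
```

A decorator keeps entry/exit logging out of the function body, while branch messages stay inline because only the function itself knows which path it took.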
All applications should log all errors and exceptions, although care is needed to ensure that this is done appropriately. For example, if an exception is caught and then re-thrown several times, it is not necessary to log it every time it is caught. Indeed, doing so can make the log files much larger, cause confusion when the problem is being investigated and add unnecessary overhead. One common approach is to log an exception where it is first caught and not to log it again after that.
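The "log once where first caught" approach might look like this in Python: the lowest-level handler calls `logger.exception` (which logs at ERROR and includes the traceback) and re-raises, and the layers above simply let the exception pass without logging it again. The functions `read_port` and `start_server` are hypothetical.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(funcName)s: %(message)s"))
logger = logging.getLogger("config")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

def read_port(settings):
    try:
        return int(settings["port"])
    except (KeyError, ValueError):
        # First place the problem is caught: log it once, with the traceback.
        logger.exception("invalid 'port' setting: %r", settings.get("port"))
        raise

def start_server(settings):
    # The exception passes through here; no second log entry is written.
    port = read_port(settings)
    return f"listening on {port}"

try:
    start_server({"port": "eighty"})
except ValueError:
    pass  # the caller decides how to react; the error is already logged

output = stream.getvalue()
```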
The follow-on question to consider is ‘what information should I not log?’.
One general area not to log is any personal or sensitive information, including any information that can be used to identify an individual. This sort of information is known as PII, or Personally Identifiable Information.
Such information includes
All the above represents sensitive information and much of it can be used to identify an individual; none of this information should be logged directly.
That does not mean that you cannot or should not log that a user logged in; you may well need to do that. However, the information should at least be obfuscated and should not include anything that is not required. For example, you may record that a user represented by some id attempted to log in at a specific time and whether or not they were successful. However, you should not log their password, and you may choose not to log the actual userid; instead you may log an id that can be mapped back to their actual userid.
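One possible way to produce such a mappable id, sketched here as an assumption rather than a recommended scheme, is a keyed hash (HMAC) of the userid: anyone holding the key can recompute the token for a given user to find their entries, but the log itself contains neither the raw userid nor the password. The key, the helper `pseudonymise` and the function `record_login_attempt` are hypothetical; a lookup table held outside the logs would be an equally valid choice.

```python
import hashlib
import hmac
import io
import logging

# Assumption: a real system would load this key from a secrets store,
# never from source code; it is hard-coded here only for illustration.
PSEUDONYM_KEY = b"example-key-do-not-use"

def pseudonymise(user_id: str) -> str:
    """Hypothetical helper: map a userid to a stable token via HMAC-SHA256."""
    digest = hmac.new(PSEUDONYM_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger = logging.getLogger("auth")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

def record_login_attempt(user_id: str, password: str, success: bool) -> None:
    # The password is deliberately never passed to the logger.
    logger.info("login attempt user=%s success=%s", pseudonymise(user_id), success)

record_login_attempt("alice@example.com", "s3cret!", success=False)
output = stream.getvalue()
```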
You should also be careful about writing data input into an application directly into a log file. One way in which a malicious agent can attack an application (particularly a web application) is by sending it very large amounts of data (as part of a field or as a parameter to an operation). If the application blindly logs all data submitted to it, the log files can fill up very quickly, which can in turn fill the file store used by the application and cause problems for all software sharing that file store. Logging unvalidated input also opens the door to log (or log file) injection attacks, in which crafted input, for example input containing line breaks, forges or corrupts log entries; this is well documented (see https://www.owasp.org/index.php/Log_Injection, which is part of the well-respected Open Web Application Security Project).
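A simple defence against both problems, sketched here under stated assumptions, is to clean untrusted values before they reach the logger: escape carriage returns and newlines so one input cannot pose as several log entries, and cap the length so an oversized field cannot flood the log. The helper `safe_for_log` and the 100-character limit are hypothetical choices, not standard library features.

```python
import io
import logging

MAX_LOGGED_CHARS = 100  # assumption: choose a limit that suits your application

def safe_for_log(value: str, limit: int = MAX_LOGGED_CHARS) -> str:
    """Hypothetical helper: escape line breaks and cap length before logging."""
    # Escape CR/LF so crafted input cannot masquerade as a separate log entry.
    cleaned = value.replace("\r", "\\r").replace("\n", "\\n")
    if len(cleaned) > limit:
        # Keep a prefix and record how much was dropped, not the whole payload.
        cleaned = f"{cleaned[:limit]}...[{len(cleaned) - limit} chars truncated]"
    return cleaned

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger = logging.getLogger("web")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

forged = "bob\nINFO login succeeded user=admin"  # tries to fake a second entry
huge = "x" * 10_000                               # oversized field value
logger.info("search term=%s", safe_for_log(forged))
logger.info("search term=%s", safe_for_log(huge))
lines = stream.getvalue().splitlines()
```

Each call now produces exactly one line, and the forged "login succeeded" text stays visibly embedded in the search term rather than appearing as its own entry.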
In general, you should also aim for empty logs in a production system; that is, only information that needs to be logged in a production system should be logged (often information about errors, exceptions or other unexpected behaviour). However, during testing much more detail is required so that the execution of the system can be followed. It should therefore be possible to select how much information is logged depending on the environment the code is running in (that is, a test environment or a production environment).
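In Python this selection is usually just a matter of setting the logger's level at start-up. The sketch below assumes a hypothetical `APP_ENV` environment variable with the values `test` or `production`; the application code makes the same logging calls in both environments, and only the configured level decides what is written.

```python
import io
import logging
import os

def configure_logging(stream, env=None):
    """Pick a log level from a hypothetical APP_ENV setting ('test' or 'production')."""
    if env is None:
        env = os.environ.get("APP_ENV", "production")
    # Verbose output in test; only warnings and above in production.
    level = logging.DEBUG if env == "test" else logging.WARNING
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger = logging.getLogger(f"app.{env}")
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False
    return logger

results = {}
for env in ("test", "production"):
    buf = io.StringIO()
    log = configure_logging(buf, env)
    # The same calls are made in both environments; only the level differs.
    log.debug("entering calculate_total")
    log.warning("stock level low")
    results[env] = buf.getvalue().splitlines()
```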
A final point to note is that it is important to log information to the correct place. Many applications (and organisations) log general information to one log file, errors and exceptions to another and security information to a third. It is therefore important to know where your log information is being sent and not to send information to the wrong log location.
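With Python's logging module, routing to different destinations is done with handlers. The sketch below sends general messages to one file, errors to a second and security events (through a separate logger) to a third. The file names `app.log`, `errors.log` and `security.log`, the logger names and the temporary directory are assumptions for illustration; real deployments would use whatever locations the organisation prescribes.

```python
import logging
import os
import tempfile

log_dir = tempfile.mkdtemp()  # assumption: real systems use a managed location
fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

def file_handler(filename, level):
    handler = logging.FileHandler(os.path.join(log_dir, filename), encoding="utf-8")
    handler.setLevel(level)
    handler.setFormatter(fmt)
    return handler

general = file_handler("app.log", logging.INFO)
# Keep errors out of the general file; they have their own destination.
general.addFilter(lambda record: record.levelno < logging.ERROR)

app_logger = logging.getLogger("shop")
app_logger.setLevel(logging.INFO)
app_logger.propagate = False
app_logger.addHandler(general)
app_logger.addHandler(file_handler("errors.log", logging.ERROR))

# Security events use their own logger and their own file.
security_logger = logging.getLogger("security")
security_logger.setLevel(logging.INFO)
security_logger.propagate = False
security_logger.addHandler(file_handler("security.log", logging.INFO))

app_logger.info("basket updated")
app_logger.error("payment service unavailable")
security_logger.warning("repeated failed logins from one client")

# Close the handlers so the files are flushed before being read back.
for lg in (app_logger, security_logger):
    for h in list(lg.handlers):
        h.close()
        lg.removeHandler(h)

def read(name):
    with open(os.path.join(log_dir, name), encoding="utf-8") as f:
        return f.read()
```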
If you want to log information in your application, the next question is how you should do that. At first sight it may appear that the print() style function in your favoured language would be the ideal way to log information (and indeed many training courses and introductory texts do exactly that). We therefore need to consider whether using a print() function is the best way to log information.
In actual fact, using print() to log information in a production system is almost never the right answer, for several reasons:
There are many logging frameworks available across a wide range of technologies and programming languages; here are just a few.
Although logging frameworks differ in the specific details of what they offer, almost all offer the same core elements (although different names are sometimes used). The Python logging module illustrated above is no different; the core elements that make up the logging framework and its processing pipeline are shown below (note that a very similar diagram could be drawn for logging frameworks in Java, Kotlin, C# etc.).
Typical Logging Framework (derived from Python’s logging module).
The core elements of the logging framework (some of which are optional) are shown above and described below:
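As a rough sketch of how these elements fit together in Python's logging module, the pipeline can be assembled by hand: application code calls a logger, which creates a LogRecord; the logger's level and the handler's level and filters decide whether the record continues; and the formatter turns it into text. The filter class `DropHealthChecks` and the logger name `inventory` are hypothetical.

```python
import io
import logging

class DropHealthChecks(logging.Filter):
    """Hypothetical filter: discard noisy health-check messages."""
    def filter(self, record: logging.LogRecord) -> bool:
        return "health check" not in record.getMessage()

stream = io.StringIO()

# Formatter: turns a LogRecord into text.
formatter = logging.Formatter("%(levelname)s [%(name)s] %(message)s")

# Handler: decides where formatted records go (here, an in-memory stream).
handler = logging.StreamHandler(stream)
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)
handler.addFilter(DropHealthChecks())

# Logger: the only object application code calls directly.
logger = logging.getLogger("inventory")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

logger.debug("cache warmed")      # passes the logger, rejected by the handler level
logger.info("health check ok")    # rejected by the filter
logger.info("item 42 restocked")  # passes every stage
lines = stream.getvalue().splitlines()
```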
It is worth noting that much of the logging framework is hidden from the developer, who only sees the logger; the remainder of the logging pipeline is configured either by default or via log configuration information, typically in the form of a log configuration file.
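In Python, configuration held as data is typically applied with `logging.config.dictConfig`. In this minimal sketch the dictionary is written inline, but the same structure could be loaded from a JSON or YAML configuration file (that loading step is omitted here as an assumption). After configuration, the application code only touches the logger. The logger name `billing` and the temporary log path are hypothetical.

```python
import logging
import logging.config
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "billing.log")

# Formatters, handlers and loggers described as data rather than code.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "billing_file": {
            "class": "logging.FileHandler",
            "filename": log_path,
            "formatter": "plain",
            "level": "INFO",
        },
    },
    "loggers": {
        "billing": {"handlers": ["billing_file"], "level": "INFO", "propagate": False},
    },
}

logging.config.dictConfig(LOGGING_CONFIG)

# Application code only ever sees the logger.
log = logging.getLogger("billing")
log.debug("not written: below the configured level")
log.info("invoice 17 issued")

for h in log.handlers:
    h.flush()

with open(log_path, encoding="utf-8") as f:
    contents = f.read()
```

Changing the level, format or destination then means editing the configuration, not the application code.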
Most applications should provide some form of logging; however, such log data should be handled appropriately using a logging framework developed for your programming language / technology stack.