You may well have come across the term AIOps recently and wondered what on earth it is. In this blog post we will try to explain what AIOps is and why it may be of interest to you (and why you may want to hold off investing in it just yet!).
The acronym AIOps is short for “Artificial Intelligence for IT Operations”, a term coined by Gartner back in 2016. Gartner used it to describe a category of technologies that apply Artificial Intelligence (AI) and Machine Learning (ML) analytic techniques to enhance IT operations analytics. It should be noted, however, that in practice an AIOps implementation can also combine straightforward numerical or statistical analytics with AI or ML techniques.
In general, the aim of an AIOps implementation is to support the primary functions of IT Operations: monitoring an IT organisation’s systems and resolving the issues that arise within them. This might involve providing an alert monitoring and rectification system, a proactive anomaly detection system that predicts future issues (such as problems with a database or server farm), or a unified (and enriched) view of data coming from multiple data sources.
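To make proactive anomaly detection a little more concrete, here is a minimal Python sketch of the kind of simple statistical analytics that can sit alongside ML in an AIOps implementation: a rolling z-score test that flags any metric reading lying more than a few standard deviations away from its recent history. The function name `detect_anomalies`, the window size, the threshold and the CPU figures are all illustrative assumptions, not part of any particular AIOps product.

```python
from statistics import mean, stdev

def detect_anomalies(values, window=5, threshold=3.0):
    """Return the indices of readings that deviate from the preceding
    `window` readings by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is treated as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        z = (values[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical CPU utilisation samples (%) from a single server.
cpu = [41, 43, 40, 42, 44, 41, 43, 97, 42, 40]
print(detect_anomalies(cpu))  # -> [7], the spike to 97%
```

A production system would probably need something more robust, for example a baseline that accounts for daily or weekly usage patterns, but the principle of comparing new data against learned "normal" behaviour is the same.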
Consider the processing cycle of a typical (non-AIOps) alert monitoring system, which takes an alert from the moment it is raised through to its resolution.
In a traditional operations environment, many of these steps may be performed by a human, such as the initial review of the alerts, the removal of false positives, the analysis and identification of the root cause, and finally the remedial action. This may be done via an alert monitoring system and its user interface.
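As a rough illustration of what automating this cycle might look like, the following Python sketch encodes those steps as simple hand-written rules: duplicate alerts are dropped, known false positives are filtered out, the remaining alerts are grouped by host as a crude stand-in for root cause analysis, and a remedial action is looked up for the most severe alert on each host. The `Alert` type, the noise list, the remediation table and the severity scale are all hypothetical. In an AIOps system, rules like these are the parts that ML would aim to learn from past data.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen makes alerts hashable, so duplicates can be detected
class Alert:
    host: str
    check: str
    severity: int  # hypothetical scale: 1 = info ... 5 = critical

# Hypothetical rules that an operator might otherwise apply by hand.
KNOWN_NOISE = {"backup_window_io_spike"}
REMEDIATIONS = {"disk_full": "rotate and compress logs",
                "service_down": "restart service"}

def process(alerts):
    # 1. Initial review: drop exact duplicates, keeping first-seen order.
    unique = list(dict.fromkeys(alerts))
    # 2. Remove known false positives.
    real = [a for a in unique if a.check not in KNOWN_NOISE]
    # 3. Crude correlation: group by host as a stand-in for root cause analysis.
    by_host = defaultdict(list)
    for a in real:
        by_host[a.host].append(a)
    # 4. Remedial action: take the most severe alert per host and look up a fix.
    actions = {}
    for host, group in by_host.items():
        worst = max(group, key=lambda a: a.severity)
        actions[host] = REMEDIATIONS.get(worst.check, "escalate to a human")
    return actions

alerts = [
    Alert("db01", "disk_full", 4),
    Alert("db01", "disk_full", 4),               # duplicate
    Alert("db01", "slow_query", 2),
    Alert("web03", "backup_window_io_spike", 3),  # known false positive
    Alert("web02", "service_down", 5),
    Alert("web02", "high_latency", 3),
]
actions = process(alerts)
print(actions)
```

Anything the rules do not recognise falls through to "escalate to a human", which reflects the point that automation supplements, rather than removes, the operator.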
In an AIOps environment, Machine Learning techniques can be applied to many of these steps. In many cases this involves analysing previous data using one or more ML algorithms (such as decision trees, random forests or even simple linear regressors).
These algorithms can be trained on the available data and then used to identify new issues, or predict future ones, from new (previously unseen) data.
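For example, a simple linear regressor can be trained on historical disk usage and then used to predict when a disk (say, on a database server) will fill up. The sketch below fits an ordinary least-squares line in plain Python and extrapolates it to the disk's capacity. `fit_line`, `days_until_full` and the usage figures are illustrative assumptions, and the sketch assumes at least two days of history.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def days_until_full(usage_gb, capacity_gb):
    """Train on past daily usage, then extrapolate the fitted line to predict
    how many days remain until the disk is full. Returns None if usage
    is not growing."""
    days = list(range(len(usage_gb)))
    slope, intercept = fit_line(days, usage_gb)
    if slope <= 0:
        return None
    day_full = (capacity_gb - intercept) / slope
    return day_full - days[-1]

history = [100, 110, 120, 130, 140]  # hypothetical GB used on days 0..4
print(days_until_full(history, 200))  # -> 6.0
```

The "training" here is just the line fit on past data; the "prediction" is applying that line to days it has never seen, which is exactly the train-then-predict pattern described above, in its simplest form.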
In many cases this ‘previous data’ may be referred to as Big Data, as the volume of information available for analysis can be very large. This means that AIOps often extends into the domain of data analytics, which may involve data cleansing, exploratory data analysis and the application of numerical and statistical techniques, as well as the use of Machine Learning algorithms. It can also involve data visualization techniques drawn from both the data analyst’s toolkit and the traditional IT Operations environment.
If this has piqued your interest, then the next question to consider is why you should adopt AIOps. There are several compelling reasons driving the current interest in AIOps:
IT Operations needs are exceeding human scale. The typical large IT operations environment is growing in both size and complexity year on year. At the same time, the acceptable time between an issue appearing and it being fixed is shrinking, with both organisations and users expecting issues to be fixed immediately (if not before they even happen). In many cases this is overwhelming human-based systems and driving the need for more automation. This automation increasingly needs to be sophisticated (intelligent) enough to deal with a problem directly rather than just report on it.
IT Operations data is growing at an exponential rate. As in almost every other part of an organisation, the data collected by IT Operations departments (both historical and live) is growing exponentially, and analysing and exploiting it is becoming increasingly difficult. Techniques from AI/ML and data analytics can help with this task, and combining them with IT Operations systems leads naturally to AIOps.
Increasingly dynamic, agile environments. For many organisations the IT infrastructure now effectively is the organisation: if it stops working, the organisation stops working. To enable such organisations to adapt to an ever-changing world and to grow within their marketplace, IT infrastructure has had to evolve to become more dynamic and agile. In many cases this has been achieved by adopting a more service-based, cloud-oriented architecture. This provides flexibility, but it requires that the IT Operations department and its tooling can also adapt as configurations, data flows and services change.
Business functions taking more responsibility for IT systems. Another trend is that responsibility for owning and managing software systems is increasingly being devolved to the business functions within an organisation. This has happened in part thanks to the rise of DevOps and the ability of different functions to request new services and have them provisioned with limited or no direct involvement from IT Operations. However, this means that monitoring, anomaly detection and rectification systems need to be able to cope with such dynamic, fast-moving environments.
It should be noted that the drive towards AIOps is not about replacing the human element, but about addressing the size, scale and speed of current and future IT Operations needs. Admittedly, the role of humans in these environments may change and may require additional skill sets, but humans will still be needed.
So, if AIOps is so great, why isn't every organisation rushing to adopt it right now? Well, there are some challenges that need to be considered:
High cost of, and time for, implementation. Current platforms are expensive, and the time taken to implement and configure them is significant. A typical system will require a lengthy implementation, testing and verification period. Gartner highlights this in its Market Guide for AIOps Platforms, stating that “Successful deployments require time and effort, including a structured road map by the end user”.
Challenges in analysing data from legacy systems. Another challenge is analysing data from legacy systems that were never intended for automated analysis, or at most were only intended to store data in a database so that a human could view it. This can lead to time-consuming ETL (Extract, Transform and Load) requirements, and the resulting data may still suffer from problems with accuracy, completeness, noise and so on.
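The sketch below shows the Extract and Transform parts of such an ETL step in Python, assuming a hypothetical legacy log format of pipe-separated `timestamp|HOST|LEVEL|message` lines. Well-formed lines become structured records with normalised host names and numeric severities; malformed lines are counted and rejected rather than silently guessed at. The format, the severity codes and the `extract_transform` function are invented for illustration, and the Load step (writing the records to a data store) is omitted.

```python
from datetime import datetime

# Hypothetical legacy format: "YYYY-MM-DD HH:MM:SS|HOST|LEVEL|free text"
SEVERITY = {"INF": 1, "WRN": 3, "ERR": 4, "CRT": 5}

def extract_transform(lines):
    """Parse legacy log lines into structured records.
    Returns (records, number_of_rejected_lines)."""
    records, rejected = [], 0
    for line in lines:
        parts = line.rstrip("\n").split("|", 3)
        if len(parts) != 4:
            rejected += 1  # incomplete record
            continue
        stamp, host, level, message = (p.strip() for p in parts)
        try:
            ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            rejected += 1  # corrupt or impossible timestamp
            continue
        if level not in SEVERITY:
            rejected += 1  # unknown severity code
            continue
        records.append({"timestamp": ts, "host": host.lower(),
                        "severity": SEVERITY[level], "message": message})
    return records, rejected

raw = [
    "2023-01-05 10:15:02|DB01|ERR|Disk 95% full",
    "2023-01-05 10:16:00|web02|WRN|Slow response",
    "garbage line",
    "2023-13-40 99:00:00|DB01|ERR|bad clock",
]
records, rejected = extract_transform(raw)
print(len(records), rejected)  # -> 2 2
```

Counting rejected lines, rather than discarding them quietly, gives an early indication of just how much accuracy and completeness work a legacy source will need.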
Difficulty providing a technical understanding of the decisions made. One issue with many ML techniques is that their decision-making process is quite opaque. A system's ability to explain why it reached the decisions it did, and why it performed the actions it did, may be important both for understanding the system's behaviour and for building trust in it. However, ML techniques that are primarily statistical, such as neural networks, are not well suited to providing such explanations.
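By contrast, some simpler models are easy to explain. The following Python sketch (an illustrative assumption, not a technique prescribed by any AIOps platform) learns a one-level decision tree, or "decision stump", that picks the latency threshold which best separates past incidents from normal readings. Because the learned model is a single number, every alert it raises can be explained in plain terms. The function names, the `latency_ms` metric and the training data are hypothetical.

```python
def train_stump(samples):
    """Learn a one-level decision tree: the threshold on a single metric that
    best separates past incidents from normal readings.
    `samples` is a list of (value, is_incident) pairs."""
    best_threshold, best_errors = None, len(samples) + 1
    for threshold in sorted({value for value, _ in samples}):
        # Predict "incident" when value >= threshold, then count the mistakes.
        errors = sum((value >= threshold) != is_incident
                     for value, is_incident in samples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def explain(value, threshold, metric="latency_ms"):
    """Return a human-readable justification for the decision."""
    if value >= threshold:
        return f"ALERT: {metric}={value} >= learned threshold {threshold}"
    return f"OK: {metric}={value} < learned threshold {threshold}"

# Hypothetical history of (latency in ms, was it an incident?) pairs.
history = [(120, False), (180, False), (240, False), (610, True),
           (750, True), (300, False), (900, True)]
threshold = train_stump(history)
print(explain(850, threshold))  # -> ALERT: latency_ms=850 >= learned threshold 610
```

The trade-off is that a stump like this can only capture very simple patterns; the more expressive the model, the harder this kind of one-line explanation becomes.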
Lack of trust in decisions made. This point is related to the previous one but has a different emphasis. Particularly in a modern IT Operations department, it may be difficult to get system administrators (or, possibly more significantly, security administrators) to trust an automated AIOps system to manage critical systems.
AIOps is (probably) here to stay; as Gartner indicates, there is no future in IT operations in which AIOps is not included.
If you found this article interesting, you might be interested in our AIOps Training Course.