Network monitoring: why telcos need AI and machine learning

Network monitoring is a critical IT process that involves all network components. Routers, switches, firewalls, servers, virtual machines and other devices must be monitored constantly, both to evaluate their performance and to prevent malfunctions or downtime.

If this is true for any company, it is even more true for those in the telecommunications sector, which is characterized by very complex distributed architectures. For an operator in this segment, a delayed response to service unavailability, even a temporary one, can have disastrous repercussions on the business, as it can push customers to switch to the competition. For this reason, network monitoring for telcos must be proactive and able to anticipate problems before they arise. This requires a holistic, non-fragmented view of the entire architecture, which is why the most advanced systems today use artificial intelligence (AI) and machine learning to support telco network monitoring.

 

Network monitoring and AIOps platforms

Gartner defines network performance monitoring and diagnostics (NPMD) solutions as those that “leverage a combination of packet data, flow data and infrastructure metrics to provide historical, real-time and predictive views of network availability and performance and on the traffic of applications running on the network”. The same definition notes that NPMD tools increasingly use AIOps (Artificial Intelligence for IT Operations) platforms to quickly identify the root causes of performance degradation. In the case of telcos, this capability applies to network and security systems, telephony equipment (IPBX, media gateways, CTI, etc.), physical and virtual servers, middleware and databases. Such an ecosystem requires end-to-end monitoring that eliminates silos between systems and avoids a proliferation of alarms which, instead of making the solution easier to identify, increases the workload of the dedicated teams.

 

Clustering of alerts thanks to AI

The example just mentioned, an excess of network anomaly reports, helps to explain the difference between network monitoring as we have known it so far and its version enhanced by the artificial intelligence of AIOps solutions. A traditional alert model, in which every anomaly found triggers an alarm, would be unmanageable in contemporary telcos because of the amount of data that can generate alarms. The introduction of 5G, among other things, is expected to drive exponential growth in M2M (machine-to-machine) SIMs for IoT architectures, in addition to the SIMs used by people. The volume of data subject to network monitoring will therefore make it impossible to manage configuration parameters, anomaly detection and the associated alarms in the traditional way. Machine learning algorithms, on the other hand, help to cluster detections of parameters that deviate from the values deemed correct, and introduce automation mechanisms that make network monitoring more efficient.
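As a rough illustration of what clustering alerts means in practice, the sketch below groups raw alerts into a smaller number of incidents. It is only a minimal example under stated assumptions: the `Alert` fields, the `cluster_alerts` function and the 60-second `max_gap` are hypothetical, and a simple time-gap rule per site stands in for the learned clustering that an AIOps platform would apply.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since an arbitrary start (assumed unit)
    site: str         # hypothetical site or network segment identifier
    metric: str       # e.g. "latency" or "packet_loss"


def cluster_alerts(alerts, max_gap=60.0):
    """Group alerts from the same site whose timestamps are at most
    max_gap seconds apart. A stand-in for real ML-based clustering."""
    by_site = defaultdict(list)
    for alert in alerts:
        by_site[alert.site].append(alert)

    clusters = []
    for site in sorted(by_site):
        site_alerts = sorted(by_site[site], key=lambda a: a.timestamp)
        current = [site_alerts[0]]
        for alert in site_alerts[1:]:
            if alert.timestamp - current[-1].timestamp <= max_gap:
                # Close enough in time: treat as part of the same incident.
                current.append(alert)
            else:
                clusters.append(current)
                current = [alert]
        clusters.append(current)
    return clusters


# Illustrative data only, not taken from a real network.
alerts = [
    Alert(0.0, "site-a", "latency"),
    Alert(20.0, "site-a", "packet_loss"),
    Alert(45.0, "site-a", "latency"),
    Alert(30.0, "site-b", "latency"),
    Alert(600.0, "site-a", "latency"),
]
clusters = cluster_alerts(alerts)
for group in clusters:
    print(group[0].site, len(group), "alert(s)")
```

In this toy data set, five raw alerts collapse into three incidents, so the operations team has three items to review instead of five. A production system would typically cluster on many more dimensions than time and site.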

 

Automation for network monitoring in telcos

The automation that network monitoring in telcos can draw on, thanks to artificial intelligence and machine learning, concerns various aspects. For example, it applies to “intelligent” remediation processes that optimize network operation by comparing incoming data with standard operating models. The definition of KPIs (key performance indicators) and KQIs (key quality indicators) can thus be continuously improved through the self-optimization of the network and the progressive selection of relevant indicators. Unlike the fixed thresholds used in conventional network monitoring systems, data analyses carried out with artificial intelligence provide real-time insights that make it possible to correlate, for example, network failures with data relating to the customer experience. This also reduces investigation times, because performance problems in all telco services are discovered and identified faster.
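The contrast between a fixed threshold and a data-driven one can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function names, the window of 5 samples, the factor `k=3.0`, the 50 ms limit and the latency values are all hypothetical, and a rolling mean and standard deviation stand in for the more sophisticated models an AIOps platform would use.

```python
from statistics import mean, stdev


def fixed_threshold_anomalies(values, limit):
    """Flag indices whose value exceeds a static limit."""
    return [i for i, v in enumerate(values) if v > limit]


def adaptive_anomalies(values, window=5, k=3.0):
    """Flag indices whose value deviates by more than k standard
    deviations from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Skip perfectly flat history, where sigma is zero.
        if sigma > 0 and abs(values[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged


# Illustrative latency samples in milliseconds (made-up values).
latency_ms = [20, 21, 19, 20, 22, 21, 20, 35, 21, 20]

print(fixed_threshold_anomalies(latency_ms, limit=50))  # static 50 ms limit
print(adaptive_anomalies(latency_ms))                   # baseline learned from recent data
```

With these sample values, the static 50 ms limit flags nothing, while the adaptive check flags the jump to 35 ms at index 7 because it is far outside the recent baseline of roughly 20 ms. The design choice here is that the baseline comes from the data itself rather than from a manually configured value.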