Figure 1.1 Network and service management at large.
1.2 Data Collection and Monitoring Protocols
Any decision process must be guided by the ability to obtain data about the status of the system. In a typical network, devices from different vendors, with different functionalities and capabilities, and belonging to different administrative domains, create heterogeneous scenarios in which collecting data calls for standardized instruments and tools. Often this heterogeneity leads each vendor to offer advanced but proprietary solutions for interacting with its own devices. Here we present an overview of the major standard protocols that allow one to collect data from network devices, leaving custom solutions out of this description.
1.2.1 SNMP Protocol Family
TCP/IP network management was originally based on the Simple Network Management Protocol (SNMP) family. SNMP standardizes the collection and organization of information about devices on an IP network. It is based on the manager/agent model with a simple request/response format: the network manager issues a request and the managed agents send responses in return. SNMP exposes management data in the form of variables organized in a Management Information Base (MIB), which describes the system status and configuration. These variables can then be remotely queried and manipulated, allowing both the collection of information and changes to the configuration, provided the manager is authorized to control such variables. SNMPv1 is the original version of the protocol [4]. More recent versions, SNMPv2c and SNMPv3, feature improvements in performance, flexibility, and especially security [5, 6].
Via this simple approach, an authorized manager can remotely check and change the configuration of devices under its administrative domain, propagating changes while obtaining an updated picture of the network status. SNMP thus offers a means both to collect information from the network devices and to control them, but it does not provide any means to determine which configuration is best to deploy.
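The manager/agent exchange described above can be sketched with a toy in-memory agent. This is only an illustration of the request/response model and of authorization on MIB variables, not an SNMP implementation: the `ToyAgent` class and its `get`/`set` methods are hypothetical, and the community strings (the credential mechanism of SNMPv1 and SNMPv2c) carry example values. The two OIDs are the standard MIB-II `sysUpTime.0` and `sysName.0` objects.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """In-memory stand-in for a managed device; not a real SNMP agent."""
    mib: dict = field(default_factory=dict)  # OID string -> value
    read_community: str = "public"           # example credentials (assumption)
    write_community: str = "private"

    def get(self, community, oid):
        # Either community string is allowed to read a variable.
        if community not in (self.read_community, self.write_community):
            raise PermissionError("not authorized to read")
        return self.mib[oid]

    def set(self, community, oid, value):
        # Only the write community may change the configuration.
        if community != self.write_community:
            raise PermissionError("not authorized to write")
        self.mib[oid] = value
        return value


SYS_UPTIME = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0
SYS_NAME = "1.3.6.1.2.1.1.5.0"    # sysName.0

agent = ToyAgent(mib={SYS_UPTIME: 123456, SYS_NAME: "router-1"})

# The manager polls a variable, changes it, and polls it again.
name_before = agent.get("public", SYS_NAME)
agent.set("private", SYS_NAME, "edge-router-1")
name_after = agent.get("public", SYS_NAME)

# A manager without write authorization is refused.
try:
    agent.set("public", SYS_NAME, "intruder")
    write_denied = False
except PermissionError:
    write_denied = True
```

Note that nothing in the sketch decides what value should be written: as in SNMP itself, choosing the configuration is left to whatever logic drives the manager.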
1.2.2 Syslog Protocol
Similarly to SNMP, the Syslog protocol family [7] offers mechanisms for the collection of logging information. Initially used on Unix systems and developed since 1980, the protocol introduces a layered architecture that allows the use of any transport protocol. The Syslog protocol enables a machine to send system log messages across networks to event message collectors. It implements a push approach, where the devices send information to the collectors. The protocol is designed simply to transport and distribute these event messages, enabling the centralized collection of logs from servers, routers, and devices in general. Unlike SNMP, Syslog cannot distribute any configuration, which must be handled through other communication channels.
Messages include a facility code and a severity level. The former identifies the type of program that is logging the message (e.g. kernel, user, mail, daemon, etc.). The latter defines the urgency of the message (e.g. emergency, alert, critical, error, warning, debug, etc.). This allows for simple filtering and easy reading of the messages. When operating in a network, syslog uses a client‐server paradigm, where the collector server listens for messages from clients. Originally designed to run over the User Datagram Protocol (UDP), recent versions also support TCP and the Transport Layer Security (TLS) protocol for reliable and secure communications.
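As a small illustration of how the facility code and severity level support filtering, the sketch below uses the syslog priority value (PRI), which the standard syslog formats carry at the start of each message as `<PRI>` and compute as facility × 8 + severity. The helper names and the sample messages are hypothetical; only a subset of the facility codes is listed.

```python
# Numeric codes as defined by the syslog specifications (subset of facilities).
FACILITIES = {"kernel": 0, "user": 1, "mail": 2, "daemon": 3}
SEVERITIES = {
    "emergency": 0, "alert": 1, "critical": 2, "error": 3,
    "warning": 4, "notice": 5, "informational": 6, "debug": 7,
}


def encode_pri(facility, severity):
    """Combine facility and severity into the PRI value."""
    return FACILITIES[facility] * 8 + SEVERITIES[severity]


def decode_pri(pri):
    """Split a PRI value back into (facility code, severity code)."""
    return divmod(pri, 8)


def parse_pri(message):
    """Extract the integer PRI from a message that starts with '<PRI>'."""
    return int(message[1:message.index(">")])


def at_least_as_urgent(message, threshold="warning"):
    """Lower severity codes are more urgent, so keep codes <= threshold."""
    _, severity = decode_pri(parse_pri(message))
    return severity <= SEVERITIES[threshold]


# Made-up messages: daemon.error, daemon.debug, kernel.emergency.
messages = ["<27>disk failure on sda", "<31>cache refreshed", "<0>kernel panic"]
urgent = [m for m in messages if at_least_as_urgent(m)]
```

Here `urgent` keeps the first and third messages and drops the debug one; the free-text part after the PRI is not interpreted at all.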
Syslog suffers from the lack of a standard message format, so that each application supports a custom set of messages. It is common for even different software releases of the same application to use different formats, which complicates the automatic parsing of the messages.
1.2.3 IP Flow Information eXport (IPFIX)
Both Syslog and SNMP make it possible to collect information about the status of devices. The Internet Protocol Flow Information Export (IPFIX) protocol instead defines a standard means to collect information about the traffic flowing in the network. The granularity at which it works is the flow, i.e. a group of packets having the same source and destination [8]. It defines the components involved in the measurement and reporting of information on IP flows. A Metering Process generates Flow Records; an Exporting Process transmits the information using the IPFIX protocol; and a Collecting Process receives it as IPFIX Data Records. The IPFIX protocol is a push mechanism only, and IPFIX cannot distribute configurations to the Exporters. Like Syslog, it offers the means to collect information, in this case about the traffic flowing in a network, but does not provide any means to process it. Being based on traffic meters, it opens the possibility of implementing traffic profiling, traffic engineering, QoS monitoring, and intrusion detection solutions that analyze the flow‐based traffic measurements and generate valuable feedback for the network managers. IPFIX is an evolution of NetFlow, a proprietary predecessor introduced by Cisco in 1996 to collect and monitor IP network flow information. IPFIX not only supports the Stream Control Transmission Protocol (SCTP) at the transport layer but also allows the use of TCP or UDP to offload the metering application.
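The three roles named above can be sketched as follows. This is a conceptual illustration only: it does not produce the IPFIX wire format, the function names are hypothetical, and the flow key is assumed to be the common 5-tuple (addresses, ports, and protocol number) rather than anything mandated by the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    size: int  # bytes


def metering_process(packets):
    """Aggregate packets into flow records keyed by the assumed 5-tuple."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.size
    return [{"key": key, **counters} for key, counters in flows.items()]


def exporting_process(flow_records, collector):
    """Push every record to the collector; nothing flows back (push only)."""
    for record in flow_records:
        collector.append(record)


# Made-up traffic: two TCP packets of one flow and a single UDP packet.
packets = [
    Packet("10.0.0.1", "10.0.0.2", 40000, 443, 6, 1500),
    Packet("10.0.0.1", "10.0.0.2", 40000, 443, 6, 500),
    Packet("10.0.0.3", "10.0.0.2", 53000, 53, 17, 80),
]
collector = []  # stands in for the Collecting Process
exporting_process(metering_process(packets), collector)
```

The collector ends up with one record per flow carrying packet and byte counters; any analysis of those records would be a separate application on top of it.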
The NetFlow and IPFIX protocols are examples of “metadata‐based” techniques, which can provide valuable operational insight for network performance, security, and other applications. For instance, in IP networks, metadata records document the flows. In each flow record, the “who” and “whom” are IP addresses and port numbers, and the “how long” is byte and packet counts. Direct capture and analysis of the underlying data packets themselves, e.g. by exporting the raw packets, can also be used for network performance and security troubleshooting. This typically involves a level of technical complexity and expense that, in most situations, does not yield a more actionable understanding than an effective system for the collection and analysis of metadata comprising network flow records.
The main critical point of IPFIX is its lack of scalability, owing to the cost of data collection at the exporter and the excessive network load at the collector. This often forces operators to activate packet sampling options, which limits visibility.
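How sampling limits visibility can be seen in a toy example. The sketch assumes deterministic 1‐in‐N sampling, which is only one possible sampling scheme, and uses a made-up packet trace in which each packet is represented just by the label of its flow.

```python
from collections import Counter


def sample_one_in_n(packets, n):
    """Keep every n-th packet, starting with the first one."""
    return packets[::n]


def estimate_counts(sampled, n):
    """Scale the sampled per-flow packet counts back up by the sampling rate."""
    return {flow: count * n for flow, count in Counter(sampled).items()}


# Made-up trace of 100 packets: flow "B" sends only two packets.
trace = ["A"] + ["B", "B"] + ["A"] * 97

true_counts = Counter(trace)
estimated = estimate_counts(sample_one_in_n(trace, 10), 10)
```

With a sampling rate of 1 in 10, the large flow is estimated at 100 packets against 98 actually sent, while the short flow does not appear in `estimated` at all.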
1.2.4 IP Performance Metrics (IPPM)
Internet Protocol Performance Metrics (IPPM) is an example of a successful standardization effort [9]. It defines metrics for accurately measuring and reporting the quality, performance, and reliability of the network. These include connectivity, one‐way delay and loss, round‐trip delay and loss, delay variation, loss patterns, packet reordering, bulk transport capacity, and link bandwidth capacity measurements. It offers a standard and common ground to define and measure performance, so that even measurements performed by different vendors and implementations refer to the same monitored metric. In a nutshell, it enables common performance monitoring.
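To make a few of these metrics concrete, the sketch below derives one‐way delay, loss, and delay variation from send and receive timestamps of numbered probe packets. It is a simplified reading of the metrics rather than the normative definitions: the function name and the data are hypothetical, sender and receiver clocks are assumed to be synchronized, and delay variation is taken as the difference between the delays of successive received packets.

```python
def one_way_metrics(sent, received):
    """Compute simple one-way metrics.

    sent, received: dicts mapping sequence number -> timestamp in
    milliseconds, taken at the sender and at the receiver respectively.
    """
    # One-way delay for every probe that arrived.
    delays = {seq: received[seq] - t for seq, t in sent.items() if seq in received}
    # Probes that never arrived count as lost.
    lost = [seq for seq in sent if seq not in received]
    loss_ratio = len(lost) / len(sent)
    # Delay variation between successive received probes, in sequence order.
    ordered = [delays[seq] for seq in sorted(delays)]
    variation = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return delays, lost, loss_ratio, variation


# Made-up measurement: four probes sent 100 ms apart, probe 3 is lost.
sent = {1: 0, 2: 100, 3: 200, 4: 300}
received = {1: 20, 2: 125, 4: 318}

delays, lost, loss_ratio, variation = one_way_metrics(sent, received)
```

Because both endpoints apply the same definitions to the same kind of timestamps, two independent implementations of such a computation would report comparable numbers, which is the point of standardizing the metrics.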
Among the standard protocols, the One‐Way Active Measurement Protocol and Two‐Way Active Measurement Protocol (OWAMP