2.3. State of the art
Beyond the generalization of broadband, 5G promises to improve our quality of life through connected ecosystems, and mMTC will play a decisive role in bringing these about. Although this vision of the IoT is very attractive, it also creates many challenges for network manufacturers and operators. Indeed, a very large number of sensors embedded in IoT objects will need to be supported in the coming years.
As explained in section 2.2.3, each terminal wishing to connect to the network must initiate a random access procedure. However, this procedure was initially designed for a limited number of terminals, and the high density targeted by NB-IoT can very quickly lead to severe congestion. Since the number of preambles available at each RAO is limited, the higher the number of terminals attempting access, the higher the risk of collision, and a collision causes the procedure to fail for all the terminals that have chosen the same preamble. Terminals whose access attempt has failed can retransmit the preamble after a wait time, but these retransmissions lead to poor use of spectral resources on the one hand, and to increased energy consumption at the terminals on the other (Harwahyu et al. 2019).
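As a rough illustration of how quickly collisions build up, consider the simplifying assumption (ours, not a result from the cited works) that $n$ terminals each pick one of $N$ preambles uniformly and independently in the same RAO. The probability that a given terminal collides is then:

```latex
P_{\text{coll}} = 1 - \left(1 - \frac{1}{N}\right)^{n-1}
% Example under the same assumption: N = 48 preambles, n = 48 terminals
% P_coll = 1 - (47/48)^{47} \approx 0.63
```

Under this assumption, a terminal already has roughly a 63% chance of colliding when the number of contending terminals merely equals the number of preambles.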
Given its criticality, the random access procedure has been the subject of several studies. Some, such as Baracat and Brito (2018), Harwahyu et al. (2018) and Jiang et al. (2018), have proposed analytical models for optimizing the probability of success of the terminals' access attempts and the average access time in different configurations, especially under time constraints (Harwahyu et al. 2018). Others focus on retransmissions. In Sun et al. (2017), a model based on Markov chains is proposed to model the number of retransmissions. In Harwahyu et al. (2019), the authors propose a model that seeks a compromise between the number of repetitions planned in the physical layer and the number of retransmissions expected in the MAC layer, optimizing these two values by using the probability of preamble detection. The study shows that taking retransmissions into account in NPRACH can reduce the number of repetitions, the latter being necessary only when network conditions deteriorate.
In Lin et al. (2016), Hwang et al. (2018) and Jeon et al. (2018), the focus is on transmission of the preamble and estimation of the arrival time. These works propose, respectively, a detection algorithm on the receiver side, a new hopping pattern in the NPRACH frequency domain and a framework to detect multiple users. In Zhang et al. (2020), the TA preambles that have undergone a collision are used to improve the performance of the random access procedure.
From the perspective of standardization, congestion control at the network access level was identified very early on as a priority by the 3GPP and ETSI standardization bodies (3GPP 2011). The cellular IoT, especially NB-IoT, therefore naturally benefits from the solutions proposed for the standards that preceded it. Among these solutions, we find ACB (Access Class Barring) and its extension EAB (Extended Access Barring), slotted random access, backoffs specific to MTC, dynamic allocation of resources, etc. (Ali et al. 2017).
The ACB and EAB tackle the problem at its root by blocking access to the network, broadcasting blocking parameters in the system information blocks (SIBs) at each RAO. In particular, the terminals receive a blocking probability p and a blocking time Tb for this opportunity. Each terminal wanting to access the network draws a random value q uniformly between 0 and 1. If q < p, the terminal is allowed to make an access attempt; otherwise, the attempt is postponed for a time Tb. This mechanism has since been extended. In the EAB, the terminals are classed according to their Quality of Service (QoS) requirements, and the EAB algorithm dynamically blocks low-priority terminals according to the arrival rate by broadcasting a bitmap in SIB14.
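The barring test described above can be sketched in a few lines of Python. This is a minimal illustration that follows the text's description only (a fixed postponement Tb, a single class of terminals); the function names `acb_check` and `attempt_access` are hypothetical and do not come from the standard.

```python
import random


def acb_check(p, rng=random):
    """Return True if the terminal passes the barring test for this RAO.

    p is the parameter broadcast by the network; q is the terminal's
    own uniform draw in [0, 1). The terminal passes when q < p.
    """
    q = rng.random()
    return q < p


def attempt_access(p, t_b, now, rng=random):
    """Return (allowed, time of the next possible attempt).

    If the test fails, the attempt is postponed by the blocking time t_b,
    as described in the text (illustrative simplification).
    """
    if acb_check(p, rng):
        return True, now
    return False, now + t_b


# Illustrative usage with a seeded generator for reproducibility.
rng = random.Random(42)
allowed, next_time = attempt_access(p=0.3, t_b=4.0, now=0.0, rng=rng)
print(allowed, next_time)
```

Note that, with this convention, a larger p lets more terminals through, and p = 1 disables the access control altogether.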
Congestion control via these techniques clearly relies entirely on the blocking probability defined by the network. Since a terminal passes the access control when q < p, a value of p that is too high lets a significant number of terminals through, thus leading to collisions. Conversely, if p is too small, collisions are reduced, but a large number of terminals switch to the inactive mode, which leads to under-use of resources. It is, therefore, essential to calculate an optimal blocking probability for effective congestion management.
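This trade-off can be made concrete with a small numerical sketch. It rests on assumptions that are ours rather than the source's: n terminals are contending (including those that will be blocked), each passes the access control independently with probability p, and each passing terminal picks one of N preambles uniformly. Under these assumptions, the expected number of successful preambles is n·p·(1 − p/N)^(n−1), which is maximized at p = min(1, N/n). The function names are hypothetical.

```python
def expected_successes(n, num_preambles, p):
    """Expected number of preambles chosen by exactly one terminal.

    Assumes n contending terminals, each passing the access control
    independently with probability p and then choosing one of
    num_preambles preambles uniformly at random.
    """
    return n * p * (1 - p / num_preambles) ** (n - 1)


def best_factor(n, num_preambles):
    """Value of p maximizing expected_successes under the same assumptions."""
    if n <= 0:
        return 1.0
    return min(1.0, num_preambles / n)


# Illustrative values: 480 contending terminals, 48 preambles.
n, num_preambles = 480, 48
for p in (0.02, 0.05, 0.1, 0.2, 1.0):
    print(p, round(expected_successes(n, num_preambles, p), 2))
print("best p:", best_factor(n, num_preambles))
```

The sketch also shows why the base station needs to know n: the best value of p depends directly on the number of contending terminals.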
A study of ACB and EAB performance was carried out in Toor and Jin (2017). Comparison of the two techniques via simulation has shown that the ACB is best suited to communications with stringent delay constraints, and the EAB to energy-constrained terminals. However, calculating the optimal blocking probability relies on the base station's ability to know the number of terminals attempting to access the network. In practice, this information is not available, since the base station has no knowledge of the number of terminals whose access attempts have been blocked.
Several mechanisms have been proposed to estimate the number of terminals attempting to access the network (including the terminals blocked by the access control) so as to deduce the blocking probability to be used. In Park and Lim (2016), in the absence of knowledge of the number of blocked terminals, the authors use a heuristic to adapt the blocking probability. The algorithm proposed in Liu et al. (2020) makes a recursive Bayesian estimation of the active terminals in each class and, depending on this estimation, allocates preambles to the different classes. The algorithm was then improved by assigning an ACB blocking factor to each class, independently of the others, for better congestion control. In Jin et al. (2017), a recursive Bayesian estimation of the active terminals, based on the number of unchosen preambles, makes it possible to calculate a blocking factor for terminals whose arrivals are sporadic. The performance of the EAB technique is studied in Cheng et al. (2015) for LTE-A networks. The optimal values of the paging cycle and of the periodicity of SIB14 are then derived by subjecting the analytical model to targeted QoS constraints.
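To give an idea of how the number of unchosen preambles carries information about the load, here is a deliberately simple moment-based estimator. It is not the recursive Bayesian estimation used in the works cited above, nor the estimator of Bouzouita et al. (2019); it only inverts the expected number of free preambles, N(1 − 1/N)^x, under the assumption of uniform and independent preamble choices. The function names are hypothetical.

```python
import math


def estimate_contenders(free, num_preambles):
    """Estimate how many terminals transmitted a preamble in one RAO.

    Inverts E[free] = N * (1 - 1/N)**x for x. Returns None when no
    preamble is free, since the observation is then saturated and the
    inversion is undefined.
    """
    if num_preambles < 2:
        raise ValueError("at least two preambles are required")
    if not 0 <= free <= num_preambles:
        raise ValueError("free must lie between 0 and num_preambles")
    if free == 0:
        return None
    return math.log(free / num_preambles) / math.log(1 - 1 / num_preambles)


def estimate_backlog(free, num_preambles, p):
    """Scale the estimate by 1/p to account for blocked terminals.

    Assumes each contending terminal passed the access control
    independently with probability p (0 < p <= 1).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    x = estimate_contenders(free, num_preambles)
    return None if x is None else x / p


# Illustrative usage: 20 free preambles out of 48, with p = 0.5.
print(estimate_backlog(20, 48, 0.5))
```

A single observation of this kind is noisy, which is one reason the literature favours recursive estimators that combine successive RAOs.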
In this chapter, we build on an estimator proposed in an earlier work (Bouzouita et al. 2019) and, unlike the works cited above, we use reinforcement learning techniques, in particular the TD3 algorithm, to calculate an optimal blocking factor from a set of past estimations. To our knowledge, this is the first time that this type of algorithm has been used to manage the massive access of terminals in NB-IoT networks, with the exception of our previous contribution (Hadjadj-Aoul and Ait-Chellouche 2020).
2.4. Model for accessing IoT terminals
The proposed model gives an overview of IoT devices executing the ACB algorithm. During the random access attempt, the IoT devices compete for the same set of available preambles. As the 3GPP standard indicates, the number of preambles N is an integer, as explained in section 2.3 (ETSI 2011).
During each access opportunity (i.e. Random Access Channel, RACH), these preambles are divided into three groups: successful preambles (chosen by a single device), preambles in collision (chosen by two or more devices) and free preambles (not chosen by any device).
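The three categories above can be illustrated by simulating a single access opportunity. The sketch below assumes that every contending device picks one preamble uniformly at random, independently of the others; the function name `simulate_rao` and the numerical values are illustrative only.

```python
import random
from collections import Counter


def simulate_rao(num_devices, num_preambles, rng):
    """Simulate one access opportunity.

    Returns (successful, collided, free): the number of preambles chosen
    by exactly one device, by two or more devices, and by no device.
    """
    # Count how many devices selected each preamble index.
    choices = Counter(rng.randrange(num_preambles) for _ in range(num_devices))
    successful = sum(1 for count in choices.values() if count == 1)
    collided = sum(1 for count in choices.values() if count >= 2)
    free = num_preambles - len(choices)
    return successful, collided, free


# Illustrative usage: 60 devices competing for 48 preambles.
rng = random.Random(1)
print(simulate_rao(60, 48, rng))
```

Averaging the three counts over many simulated opportunities gives empirical estimates of the quantities whose mean values are calculated analytically in the rest of this section.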
In what follows, we calculate the average values of these quantities, which we derived in Bouzouita et al. (2015). These values are subsequently used by our algorithms.
Let us define $q_N = 1 - 1/N$. The average number of successful preambles $N_S$ during a RACH opportunity is given as follows (this is the classic problem of throwing balls into bins):
$$N_S = x_2 \, q_N^{x_2 - 1} \qquad [2.1]$$
where $x_2$ represents the number of devices attempting access. As we demonstrated in Bouzouita et al. (2019), equation [2.1] is maximized