Multi-Processor System-on-Chip 1. Liliana Andrade.

Author: Liliana Andrade
Publisher: John Wiley & Sons Limited
ISBN: 9781119818281
face detection for triggering an alert, accompanied by an image or video, on the owner’s smartphone;

       – smart speakers with voice control, employing local speech recognition for a limited vocabulary of voice commands while relaying other speech data into the cloud for more advanced analysis;

       – smart sensing devices used in agriculture to monitor and control, for example, soil quality, crop yield and livestock, while sporadically communicating data over cellular connections using, for example, NB-IoT protocols for low power consumption.

      Many IoT edge devices are battery-operated and demand an optimized implementation in order to enable a long battery life. Therefore, we must target low power consumption for functions that need to be performed in software locally on the IoT edge device. This, in turn, requires programmable processors that are optimized for executing these software functions efficiently, which is the topic of this chapter.

      1.2.1. Control processing, DSP and machine learning

      Low-power IoT edge devices typically perform a range of different functions locally on the device. They run a local application that controls the device, its sensors and other interfaces, such as a communications interface to the network and a user interface. For this purpose, a processor must have capabilities for efficient processing of control code, including low branch overheads, efficient interrupt handling, timers, efficient integration with peripherals, support for real-time kernels, etc.

      The processing of sensor data typically involves digital signal processing (DSP) with functions such as filtering (e.g. FIR, correlation, biquad), transforms (e.g. FFT, DCT), and vector and matrix operations. Voice data can be processed by various DSP functions, including noise reduction and echo cancellation. In addition, the IoT edge device can perform encoding and/or decoding of voice or audio data, for example, to implement an audio playback function on the device.
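      As an illustration of the filtering functions mentioned above, the following is a minimal sketch of a direct-form FIR filter in Python. The function name fir_filter and the moving-average coefficients are assumptions chosen for the example. Floating-point values are used for readability, whereas an embedded implementation would typically use fixed-point data.

```python
from collections import deque


def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    # Delay line holding the most recent len(coeffs) inputs, newest first.
    delay = deque([0.0] * len(coeffs), maxlen=len(coeffs))
    out = []
    for x in samples:
        # Pushing on the left drops the oldest sample from the right end.
        delay.appendleft(x)
        # One multiply-accumulate per filter tap.
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out


# Hypothetical usage: 4-tap moving average applied to a unit step.
print(fir_filter([1.0] * 6, [0.25] * 4))
# prints [0.25, 0.5, 0.75, 1.0, 1.0, 1.0]
```

      Each output sample costs one multiply-accumulate per tap, which is why the MAC throughput of a processor largely determines the achievable FIR performance.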

      Communicating data involves further DSP functions. For example, some key functions in an NB-IoT protocol stack involve FFT, auto- and cross-correlations, and complex multiplications and convolutions. Furthermore, trigonometric functions such as sine and cosine must be performed. In addition, such protocol stacks handle convolutional codes, for example, by means of Viterbi decoding.
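      To make the role of cross-correlation and complex multiplication concrete, the following sketch locates a known reference sequence inside a stream of complex samples. It is a generic illustration and not the synchronization procedure of NB-IoT. The names cross_correlate and find_offset, as well as the sample values, are assumptions made for the example.

```python
def cross_correlate(received, reference):
    """Return r[lag] = sum over n of received[lag + n] * conj(reference[n])."""
    lags = len(received) - len(reference) + 1
    result = []
    for lag in range(lags):
        acc = 0j
        for n, ref in enumerate(reference):
            # Complex multiply-accumulate with the conjugated reference.
            acc += received[lag + n] * ref.conjugate()
        result.append(acc)
    return result


def find_offset(received, reference):
    """Return the lag at which the correlation magnitude peaks."""
    corr = cross_correlate(received, reference)
    return max(range(len(corr)), key=lambda lag: abs(corr[lag]))


# Hypothetical usage: the reference starts three samples into the stream.
reference = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
received = [0j, 0j, 0j] + reference + [0j, 0j]
print(find_offset(received, reference))  # prints 3
```

      The inner loop consists entirely of complex multiply-accumulate operations, which is the kind of work that benefits from native support for complex arithmetic.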

      We conclude that the efficient processing of sensor data on an IoT edge device requires processors equipped with DSP capabilities. The relevant DSP capabilities are:

       – support for fixed-point data types and arithmetic, including fixed-point multiply-accumulate (MAC) instructions, wide accumulators, and efficient saturation and rounding;

       – support for floating-point data types and instructions, including fused multiply-add instructions;

       – advanced address generation for efficient memory access, including circular and bit-reversed addressing for DSP kernels such as FIR filters and FFTs;

       – zero-overhead loops;

       – support for complex data types and arithmetic, including complex multiply and MAC instructions;

       – support for vector or SIMD processing to enable increased efficiency by exploiting data parallelism;

       – efficient divide and square root operations;

       – high load/store bandwidth, as DSP functions can be memory-access intensive.
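      Two of the capabilities listed above can be modeled in a few lines of Python. The sketch below assumes the common Q15 fixed-point format (16-bit signed values scaled by 2^15), which the text does not prescribe. The helper names saturate_q15, q15_dot and bit_reverse are hypothetical. The code models the arithmetic only and says nothing about how a particular processor implements it.

```python
Q15_MAX = 32767   # largest Q15 value, just below +1.0
Q15_MIN = -32768  # smallest Q15 value, exactly -1.0


def saturate_q15(value):
    """Clamp to the Q15 range instead of wrapping around on overflow."""
    return max(Q15_MIN, min(Q15_MAX, value))


def q15_dot(a, b):
    """Dot product of two Q15 vectors using a wide accumulator."""
    acc = 0  # Python integers are unbounded, modeling a wide accumulator
    for x, y in zip(a, b):
        acc += x * y  # Q15 * Q15 gives a Q30 product
    # Round to nearest by adding half an LSB, then shift Q30 down to Q15.
    rounded = (acc + (1 << 14)) >> 15
    return saturate_q15(rounded)


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index, as used to reorder FFT data."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


# 0.5 * 0.5 + 0.5 * 0.5 = 0.5, which is 16384 in Q15.
print(q15_dot([16384, 16384], [16384, 16384]))  # prints 16384
# (-1.0) * (-1.0) twice gives 2.0, which saturates to the Q15 maximum.
print(q15_dot([-32768, -32768], [-32768, -32768]))  # prints 32767
# Bit-reversed access order for an 8-point FFT.
print([bit_reverse(i, 3) for i in range(8)])  # prints [0, 4, 2, 6, 1, 5, 3, 7]
```

      Rounding and saturation are applied once, after the accumulation. This is the benefit of a wide accumulator: intermediate sums keep full precision and cannot overflow.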

      Integrated circuits for low-power IoT edge devices may use one or more processors for implementing the different types of processing. Multiple processors are required if a single processor cannot handle the complete software workload. A further reason for using multiple processors is that specialized processors can be used for the different types of processing. More specifically, different processors can be used for control processing, DSP and machine learning.

      However, there are also good reasons to aim to reduce the number of processors. Lower cost is a key benefit, which is particularly relevant for low-cost IoT edge devices that are produced in high volumes. The use of fewer processors also reduces design complexity, as it simplifies the interconnect and memory subsystem required to integrate the processors. Furthermore, if multiple interacting functions are combined to be executed on a single processor, then this limits data movement and reduces the software overhead for communication. An additional benefit for software developers is that a single tool chain can be used. To enable the flexible combination of functions, we need versatile processors that can efficiently execute different types of workloads, including control tasks, DSP and machine learning. Such processors are also referred to as DSP-enhanced RISC cores. They add a broad set of instructions for DSP and machine learning to a RISC core. If done well, the hardware overhead of these additions can be kept small, for example, by sharing the register file and using unified functional units (e.g. a multiplier) for control processing, DSP and machine learning. Today, optimized DSP-enhanced RISC cores are available from IP vendors.

      1.2.2. Configurability and extensibility

       – Configurability: the processor IP is delivered as a parameterized processor that can be configured by the chip designer for the targeted application. More specifically, unnecessary features can be deconfigured and optimal parameters can be selected for various architectural features. This may involve optimization of the compute capabilities, memory organization, external interfaces, etc. For example, the chip designer may configure the memory subsystem with closely coupled memories and/or caches. Configurability allows performance to be optimized for the application at hand, while reducing area and power consumption.

       – Extensibility: the processor can be extended with custom instructions to enhance the performance for specific application functions. For the application at hand, the performance