1.1.3. Previous work
The literature on remote sensing data fusion is extensive, reflecting intense interest in the topic, as shown by the recent sharp increase in the number of papers published in the major remote sensing journals and the growing number of related sessions at international conferences. Indeed, data fusion has a long-standing tradition in remote sensing, since EO is by definition dynamic (thus implying the multitemporal capability of remote sensing instruments), multiresolution (multiple spatial and spectral resolutions) and related to different physical quantities (thus requiring multiview/multisensor capability) (Waltz and Llinas 1990).
Data fusion is defined differently depending on the final goal of the user. Li et al. (1995) and Pohl and van Genderen (2014) considered data fusion in remote sensing as the combination of two or more algorithms. This may include, but is not restricted to, multiresolution fusion and pansharpening techniques, whose aim is to obtain multispectral images of increased spatial resolution (Vivone et al. 2015); resolution blending, which consists of providing time series of data at their maximum spatial and spectral resolutions (referred to as parallel pansharpening in the multitemporal domain) (Huang and Song 2012); and data fusion for missing information reconstruction by using complementary data (Wang and Liang 2014).
An alternative perspective is to define data fusion in remote sensing as a decision fusion process that combines the information that is obtained from different data sets and provides sufficient generalization capability (Wald 1999). According to this definition, any type of image processing that combines two or more data sets, for example, for land cover classification, atmospheric correction or application of vegetation indices, could be considered as data fusion.
Within the former definition, various families of data fusion techniques have been proposed in the literature. On the one hand, these methods generally differ in their application requirements, such as the availability of ground reference data, the collected prior information and/or ancillary data that can be used in the development of the system according to a multisource processing architecture. On the other hand, it is important to properly understand the user needs with respect to economic cost, processing time and performance. Figure 1.3 summarizes the general architecture of a data fusion technique.
Figure 1.3. General data fusion architecture. Images 1 through N have generally been acquired by distinct sensors, at different spatial resolutions and/or with different radar frequencies and spectral bands
As discussed previously, the availability of remote sensing imagery at varying resolutions has increased, and merging images of different spatial resolutions has become a significant operation in the field of digital remote sensing. A variety of multiscale fusion approaches have been developed since the late 1980s. In the following, we give an overview of the most common approaches found in the literature. We can broadly divide them into two groups: (i) transformation techniques and (ii) modeling techniques.
Methods in (i) consist of replacing the entire set of multiscale images by a single composite representation that incorporates all relevant data. The multiscale transformations usually employ pyramid transforms (Burt 1984), the discrete wavelet transform (Piella 2003; Forster et al. 2004; Zhang and Hong 2005), the undecimated wavelet transform (Rockinger 1996; Chibani and Houacine 2003), the dual-tree complex wavelet transform (Demirel and Anbarjafari 2010; Iqbal et al. 2013; Zhang and Kingsbury 2015; Nelson et al. 2018), the curvelet transform (Choi et al. 2005; Nencini et al. 2007), the contourlet transform (ALEjaily et al. 2008; Shah et al. 2008) and the nonsubsampled contourlet transform (Yang et al. 2007).
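As a minimal sketch of the transformation idea in (i), the following Python code fuses two images through a one-level 2D Haar wavelet transform. It is an illustration under stated assumptions, not the method of any of the works cited above: the two inputs are assumed to be co-registered and already resampled to a common grid with even dimensions, the Haar filter stands in for the more elaborate transforms listed, and the fusion rule (average the approximation band, keep the detail coefficient of larger magnitude) is one common choice among many. The function names haar_dwt2, haar_idwt2 and fuse_haar are hypothetical.

```python
import numpy as np

def haar_dwt2(img):
    """One-level orthonormal 2D Haar transform of an even-sized 2D array."""
    if img.ndim != 2 or img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    a, b = img[0::2, 0::2], img[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = img[1::2, 0::2], img[1::2, 1::2]   # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0            # local average (coarse band)
    d_cols = (a - b + c - d) / 2.0            # difference across columns
    d_rows = (a + b - c - d) / 2.0            # difference across rows
    d_diag = (a - b - c + d) / 2.0            # diagonal difference
    return approx, d_cols, d_rows, d_diag

def haar_idwt2(approx, d_cols, d_rows, d_diag):
    """Exact inverse of haar_dwt2."""
    h, w = approx.shape
    out = np.empty((2 * h, 2 * w), dtype=float)
    out[0::2, 0::2] = (approx + d_cols + d_rows + d_diag) / 2.0
    out[0::2, 1::2] = (approx - d_cols + d_rows - d_diag) / 2.0
    out[1::2, 0::2] = (approx + d_cols - d_rows - d_diag) / 2.0
    out[1::2, 1::2] = (approx - d_cols - d_rows + d_diag) / 2.0
    return out

def fuse_haar(img_a, img_b):
    """Fuse two same-sized images into a single composite image.

    Assumed rule: average the approximation bands and, for each detail
    coefficient, keep the one with the larger absolute value.
    """
    if img_a.shape != img_b.shape:
        raise ValueError("images must share the same grid")
    bands_a = haar_dwt2(np.asarray(img_a, dtype=float))
    bands_b = haar_dwt2(np.asarray(img_b, dtype=float))
    fused = [(bands_a[0] + bands_b[0]) / 2.0]
    for da, db in zip(bands_a[1:], bands_b[1:]):
        fused.append(np.where(np.abs(da) >= np.abs(db), da, db))
    return haar_idwt2(*fused)
```

Because the transform is invertible, fusing an image with itself returns that image unchanged, which is a convenient sanity check. In practice, several decomposition levels and a smoother or redundant transform would typically be used.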
Techniques in (ii) include multiscale approaches that focus on exploiting the coarser resolutions in the data set in order to obtain computationally fast algorithms. The seminal papers (Basseville et al. 1992a, 1992b) introduced the basis for multiscale autoregressive modeling on dyadic trees. Since then, straightforward approaches have been proposed to deal with multiresolution images using trees (Pérez 1993; Chardin 2000; Laferté et al. 2000; Kato and Zerubia 2012; Voisin 2012; Hedhli et al. 2014). A detailed review of some of these methods can be found in Graffigne et al. (1995) and Willsky (2002).
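To make the idea of autoregressive modeling on a dyadic tree concrete, the sketch below simulates a scalar process in which each node s is generated from its parent at the next coarser scale as x(s) = a * x(parent(s)) + w(s), with w(s) independent Gaussian noise. This is a simplified illustration only: the scalar state, the constant coefficient a, the noise level and the function name simulate_dyadic_ar are assumptions made for the example, not the formulation of the cited papers.

```python
import random

def simulate_dyadic_ar(levels, a=0.9, noise_std=0.5, seed=0):
    """Simulate a coarse-to-fine scalar autoregression on a dyadic tree.

    Returns a list of levels; level k holds the 2**k node values at scale k
    (level 0 is the root, i.e. the coarsest scale).
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    rng = random.Random(seed)
    tree = [[rng.gauss(0.0, 1.0)]]            # root state drawn from N(0, 1)
    for _ in range(1, levels):
        children = []
        for x_parent in tree[-1]:
            for _ in range(2):                # every node has two children
                children.append(a * x_parent + rng.gauss(0.0, noise_std))
        tree.append(children)
    return tree

if __name__ == "__main__":
    for scale, values in enumerate(simulate_dyadic_ar(4)):
        print(scale, len(values))
```

The recursion runs in a single pass from the root to the leaves, so the cost is linear in the number of nodes. For images, the same construction is usually applied to a quadtree, in which every node has four children instead of two.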
In broader terms, multisensor analysis encompasses all processes that deal with data and information from multiple sensors to achieve refined or improved information compared with the result that could be obtained from a single source (Waltz and Llinas 1990; Pohl and van Genderen 1998; Hall and Llinas 2001). The accuracy of the classification of remote sensing images, for instance, is generally improved when data from multiple image sources are introduced into the processing chain in a suitable manner (e.g. Dousset and Gourmelon 2003; Nguyen et al. 2011; Gamba et al. 2011; Hedhli et al. 2015). As mentioned above, images from microwave and optical sensors provide complementary information that helps in discriminating the different classes. Several procedures have been introduced in the literature. On the one hand, post-classification techniques first segment the two data sets separately and then produce the joint classification by using, for example, random forests (e.g. Waske and van der Linden 2008), support vector machines with ad hoc kernels (Muñoz-Marí et al. 2010) or artificial neural networks (Mas and Flores 2008). On the other hand, other methods directly classify the combined multisensor data by using, for instance, statistical mixture models (e.g. Dousset and Gourmelon 2003; Voisin et al. 2012; Prendes 2015), entropy-based techniques (e.g. Roberts et al. 2008) and fuzzy analysis (e.g. Benz 1999; Stroppiana et al. 2015). Furthermore, for complex data, especially when dealing with urban areas, radar images can help to differentiate land covers, owing to differences in the surface roughness, shape and moisture content of the observed ground surface (e.g. Brunner et al. 2010). The use of multisensor data in image classification has become increasingly popular as sophisticated software and hardware facilities have become available to handle the growing volumes of data.
The decision on which of these techniques is the most suitable is very much driven by the applications and the typology of input remote sensing data.
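The difference between the two families of multisensor classification procedures described above can be sketched for a single pixel as follows. In the first function, each sensor has already been processed separately and only the per-sensor class probabilities are combined; in the second, the raw features of both sensors are stacked before any classifier is applied. The product rule used in the first function is a simple stand-in chosen for brevity (it assumes the two sensors are conditionally independent given the class and that the class priors are uniform); it is not one of the classifiers cited above, and the function names are hypothetical.

```python
def fuse_posteriors(p_optical, p_radar):
    """Combine two per-sensor class posteriors with the product rule.

    Assumes conditional independence of the sensors given the class and
    uniform class priors. Returns a normalized list of class probabilities.
    """
    if len(p_optical) != len(p_radar):
        raise ValueError("both sensors must score the same set of classes")
    joint = [po * pr for po, pr in zip(p_optical, p_radar)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("the two sensors rule out every class")
    return [j / total for j in joint]

def stack_features(f_optical, f_radar):
    """Concatenate per-sensor feature vectors for a single joint classifier."""
    return list(f_optical) + list(f_radar)

def most_likely_class(probabilities):
    """Index of the class with the highest probability."""
    return max(range(len(probabilities)), key=lambda k: probabilities[k])

if __name__ == "__main__":
    optical = [0.6, 0.4]   # hypothetical posteriors from the optical image
    radar = [0.2, 0.8]     # hypothetical posteriors from the radar image
    print(most_likely_class(fuse_posteriors(optical, radar)))
    print(stack_features([0.31, 0.52, 0.18], [-12.4]))
```

With the hypothetical values shown, the optical sensor alone favors the first class, whereas the fused posterior favors the second because the radar evidence is stronger. The stacked vector would instead be passed to a single classifier trained on the combined features.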
Recently, with the rise of neural networks, several multisensor data fusion techniques have been proposed based on feed-forward multilayer perceptron and convolutional neural network (CNN) architectures. Indeed, the huge amount of available data makes the use of deep neural network (DNN) models possible. Many effective multi-task approaches have been developed recently to train DNN models on large-scale remote sensing benchmarks (e.g. Chen et al. 2017; Carvalho et al. 2019; Cheng et al. 2020). The aim of these multi-task methods is to learn an embedding space from different sensors (i.e. tasks). This can be done by first learning the embedding of each modality separately and then combining all of the learned features into a joint representation. This representation is then used as the input to the last layers of different high-level visual applications, for example, remote sensing classification, monitoring or change detection. Alternatively, DNN models can be used as a heterogeneous data fusion framework, learning the related parameters from all of the input sources (e.g. Ghamisi et al. 2016; Benedetti et al. 2018; Minh et al.