1.4.2.4. Detecting the variance of the color difference
Shin et al. (2017) attempt to avoid the assumption that color channels are processed independently. Instead of working on each channel separately, as was done until then, they work on the difference between the green and red channels and on the difference between the green and blue channels. This more accurately reflects the operations of many demosaicing algorithms, which first interpolate the green channel and then use its information to interpolate the red and blue channels. They compute the variance of these two difference maps for each of the four possible patterns, and select as the correct pattern the one with the highest variance. This is what is expected of the original pattern, whose pixels were sampled rather than interpolated, since interpolation tends to smooth the color differences. Although the dependence between color channels is hard-coded, color differences are indeed used in many current algorithms, and this approach represents a first step toward a full understanding of demosaicing artifacts.
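A minimal sketch of this idea, assuming the image is an RGB NumPy array. It scores each candidate pattern with the plain variance of G − R on that pattern's red sites plus the variance of G − B on its blue sites. The exact statistic of Shin et al. (2017) may differ (for instance by filtering the maps first), so the function names, the pattern table and the scoring rule are illustrative choices, as is the toy color-difference demosaicer used to exercise the detector.

```python
import numpy as np

# Illustrative convention: (row, col) offsets of the red and blue samples
# inside the 2 x 2 Bayer cell, for each of the four possible patterns.
PATTERNS = {
    "RGGB": ((0, 0), (1, 1)),
    "GRBG": ((0, 1), (1, 0)),
    "GBRG": ((1, 0), (0, 1)),
    "BGGR": ((1, 1), (0, 0)),
}


def detect_bayer_pattern(rgb):
    """Score each pattern by var(G - R) on its red sites plus var(G - B)
    on its blue sites, and return the highest-scoring pattern."""
    rgb = np.asarray(rgb, dtype=np.float64)
    diff_r = rgb[..., 1] - rgb[..., 0]  # G - R map
    diff_b = rgb[..., 1] - rgb[..., 2]  # G - B map
    scores = {
        name: diff_r[ry::2, rx::2].var() + diff_b[by::2, bx::2].var()
        for name, ((ry, rx), (by, bx)) in PATTERNS.items()
    }
    return max(scores, key=scores.get), scores


def bilinear_fill(values, oy, ox):
    """Keep `values` on the lattice (oy::2, ox::2) and fill the other sites
    by bilinear interpolation (periodic borders, even image size assumed)."""
    m = np.zeros_like(values)
    m[oy::2, ox::2] = values[oy::2, ox::2]
    horiz = m + 0.5 * (np.roll(m, 1, axis=1) + np.roll(m, -1, axis=1))
    return horiz + 0.5 * (np.roll(horiz, 1, axis=0) + np.roll(horiz, -1, axis=0))


def simulate_color_difference_demosaic(rgb, pattern):
    """Hypothetical toy demosaicer: green is kept as-is (assumed already
    interpolated) and red/blue are rebuilt from interpolated G - R, G - B."""
    rgb = np.asarray(rgb, dtype=np.float64)
    (ry, rx), (by, bx) = PATTERNS[pattern]
    g = rgb[..., 1]
    diff_r = bilinear_fill(g - rgb[..., 0], ry, rx)
    diff_b = bilinear_fill(g - rgb[..., 2], by, bx)
    return np.stack([g - diff_r, g, g - diff_b], axis=-1)
```

On a random 64 × 64 image demosaiced with `simulate_color_difference_demosaic(img, "GRBG")`, the interpolated sites of the difference maps are averages of two or four samples and therefore have a lower variance, so `detect_bayer_pattern` should recover `"GRBG"`. On real photographs the margin is much smaller, since image content dominates the variance.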
1.4.2.5. Detection by neural networks of the relative position of blocks
More recently, Bammey et al. (2020) proposed to train a self-supervised convolutional neural network (CNN) to detect the modulo-(2, 2) position of blocks in the image. As convolutions are invariant to translation, the network has no access to the absolute position of the pixels and must infer this position from the image content. Demosaicing artifacts, and to some extent JPEG artifacts, are the only relevant information a network can use to this end. As a result, training a network to detect this position implicitly makes it analyze demosaicing artifacts, which leads to a local detection of the Bayer matrix’s position. Erroneous outputs of the network are caused by inconsistencies in the image’s mosaic, and can thus be seen as traces of forgery.
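Because the target is a position, the labels come for free from the crop coordinates. The following hypothetical sketch generates such training pairs, assuming the image is a NumPy array that still carries its original mosaic and that the four positions are encoded as classes 2 · (y mod 2) + (x mod 2). Neither the network nor the actual training setup of Bammey et al. (2020) is reproduced here.

```python
import numpy as np


def make_position_batch(image, patch_size, batch_size, rng):
    """Crop patches at random offsets and label each one with the
    modulo-(2, 2) position of its top-left pixel, encoded as the class
    2 * (y % 2) + (x % 2) in {0, 1, 2, 3} (illustrative encoding)."""
    h, w = image.shape[:2]
    ys = rng.integers(0, h - patch_size + 1, size=batch_size)
    xs = rng.integers(0, w - patch_size + 1, size=batch_size)
    patches = np.stack(
        [image[y:y + patch_size, x:x + patch_size] for y, x in zip(ys, xs)]
    )
    labels = 2 * (ys % 2) + (xs % 2)
    return patches, labels
```

A classifier trained on such pairs can only beat chance by reading 2-periodic cues from the pixels, which is why it implicitly learns to analyze demosaicing artifacts.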
This method obtains better results than previous works, and can help further analyze a forgery, as different kinds of forgeries cause different artifacts. For instance, copy-move will cause a locally consistent shift in the network’s output, whereas inpainting, usually performed by cloning multiple small patches onto the target area, may show each cloned patch detected with a different pattern. Other manipulations, such as blurring, or the copy-move of an image region that features no mosaic (for instance due to downsampling), may locally remove the mosaic, in which case the output of the network will be noise-like in the forged region. Even better results can be achieved with internal learning, that is, by retraining the network directly on the images under study. This lets the network adapt to different post-processing operations, most importantly JPEG compression.
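To read these patterns off the network's output, one can compare each block's prediction with the position that dominates the image. The sketch below assumes, hypothetically, that the network outputs one class per block in {0, 1, 2, 3}, encoded as 2 · dy + dx, and that the majority class corresponds to the authentic grid; it is an illustration, not the post-processing used by Bammey et al. (2020).

```python
import numpy as np


def mosaic_inconsistency(pred_classes):
    """pred_classes: 2D integer array of per-block predicted positions,
    encoded as 2 * dy + dx with dy, dx in {0, 1} (assumed encoding).
    Returns the dominant class, a mask of blocks disagreeing with it, and
    the (dy, dx) shift of each block's mosaic relative to the dominant one."""
    pred = np.asarray(pred_classes, dtype=int)
    counts = np.bincount(pred.ravel(), minlength=4)
    dominant = int(np.argmax(counts))  # assumed to be the authentic grid
    mask = pred != dominant
    shift_y = (pred // 2 - dominant // 2) % 2
    shift_x = (pred % 2 - dominant % 2) % 2
    return dominant, mask, np.stack([shift_y, shift_x], axis=-1)
```

In this representation, a copy-move tends to appear as a connected region sharing a single nonzero shift, inpainting as a patchwork of different shifts, and a region without mosaic as a random mixture of classes.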
However, this method is more computationally intensive than the other algorithms presented, especially when internal learning is needed. This makes it less practical when many images must be analyzed.
1.4.3. Limits of demosaicing detection
Recent methods proposed by Choi et al. (2011), Shin et al. (2017) or Bammey et al. (2020) are able to analyze the mosaic of images well enough for practical applications. It is now possible to detect, even locally, the position of the Bayer matrix. Detecting the presence of demosaicing artifacts is generally easy, although their absence is not necessarily a sign of falsification, because most modern demosaicing algorithms leave few or no artifacts in regions that are easy to interpolate. However, the range of images that can be analyzed remains limited. Demosaicing artifacts are 2-periodic and lie in the highest frequencies. As a result, they are entirely lost when the image is downsampled by a factor of 2 or more. More generally, resizing an image also rescales its demosaicing artifacts; even when they are not lost, detection methods would need to be adapted to their new frequencies. JPEG compression is an even more important limitation. As compression mainly reduces the precision of the high-frequency components of an image, demosaicing artifacts are easily lost on compressed images. To date, even the best methods struggle to analyze CFA artifacts at a quality factor of 95, which is relatively high. The internal learning presented in Bammey et al. (2020) provides some degree of robustness to JPEG compression; however, demosaicing artifact detection remains limited to high-quality images that are uncompressed or barely compressed, and at full resolution. This makes it a good complement to the detection of JPEG compression, which we will now present.
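The loss under downsampling can be checked numerically. The sketch below measures the normalized magnitude of the (π, π) DFT coefficient, where a 2-periodic checkerboard component a(−1)^(x+y) concentrates. It assumes a synthetic grayscale image and plain decimation without low-pass prefiltering (real resizers usually filter first, which removes the artifact just as surely); the function name is our own.

```python
import numpy as np


def nyquist_magnitude(img):
    """Normalized magnitude of the (pi, pi) DFT coefficient, i.e. the
    amplitude of a (-1)**(x + y) checkerboard component. Even sizes only."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    if h % 2 or w % 2:
        raise ValueError("even image dimensions required")
    return np.abs(np.fft.fft2(img)[h // 2, w // 2]) / (h * w)
```

For a 64 × 64 image made of a linear ramp plus 3(−1)^(x+y), this returns about 3. After keeping one pixel out of two in each direction (`img[::2, ::2]`), every retained pixel has an even x + y, so the checkerboard collapses into a constant offset and the measure drops to about 0: the 2-periodic trace has become indistinguishable from image content.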
1.5. JPEG compression, its traces and the detection of its alterations
In this section, we seek to determine the compression history of an image. We will focus on the JPEG algorithm, which is nowadays the most common method to store images. Most cameras use this format, but others exist, such as HEIF, used in particular in Apple products since 2017. HEIF is also a lossy compression format and therefore leaves traces; however, these traces differ from those produced by JPEG. As we will see, analyzing the JPEG coding of an image makes it possible to detect local manipulations. To this end, JPEG-based detection methods take advantage of the structured loss of information caused by this step of the processing chain.
1.5.1. The JPEG compression algorithm
In JPEG encoding, the division of the image into 8 × 8 blocks and the application of a quantization step lead to the appearance of discontinuities at the edges of these blocks in the decompressed image.
Figure 1.9 shows the blocking effect that appears after JPEG compression; contrast enhancement makes the 8 × 8 blocks clearly visible. Most of the information is lost during the quantization step, explored in more detail in section 1.2.4. The blocking effect is caused by this quantization, whose strength depends on the Q parameter and which is applied to each 8 × 8 block. Standard JPEG compression therefore leaves two characteristic traces: the division into non-overlapping 8 × 8 blocks, and the quantization of the DCT coefficients according to a quantization matrix. In other words, the two features to be detected from the image are:
1) the origin of the 8 × 8 grid;
2) the values of the quantization matrix.
Figure 1.9. Close-ups of an image before and after compression. The contrast has been enhanced to reveal the JPEG artifacts, in particular the blocking effect, which makes the edges of the 8 × 8 blocks visible.
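Both traces can be reproduced with a toy encoder. The sketch below quantizes the orthonormal DCT of each 8 × 8 block of a grid anchored at (0, 0) with a given quantization matrix, then decodes. It is a simplification written for illustration: grayscale only, with no color conversion, chroma subsampling, clipping or entropy coding, and the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that coefficients = C @ block @ C.T."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def jpeg_like_roundtrip(img, qmatrix):
    """Quantize the DCT of every non-overlapping 8 x 8 block (grid origin at
    (0, 0)) and decode. Returns the decoded image and the dequantized DCT
    coefficients, which are all multiples of the quantization matrix."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    if h % 8 or w % 8:
        raise ValueError("image size must be a multiple of 8")
    c = dct_matrix(8)
    decoded = np.empty_like(img)
    coeffs = np.empty_like(img)
    for y in range(0, h, 8):
        for x in range(0, w, 8):
            block = img[y:y + 8, x:x + 8] - 128.0    # level shift
            q = np.round(c @ block @ c.T / qmatrix)  # quantization indices
            coeffs[y:y + 8, x:x + 8] = q * qmatrix   # dequantized DCT
            decoded[y:y + 8, x:x + 8] = c.T @ (q * qmatrix) @ c + 128.0
    return decoded, coeffs
```

With a smooth ramp `2 * x + y` and a coarse uniform matrix of 100, every AC coefficient is rounded to zero: each decoded block becomes flat, and all intensity changes concentrate on the block edges, which is exactly the blocking effect of Figure 1.9.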
In order to authenticate an image, the detections above must confirm that (1) the origin of the grid is aligned with the top-left corner of the image, and (2) the quantization matrix estimated from the image is similar to the one stored in the header of the JPEG file. If the image is not stored in the JPEG format, and thus has no header providing the quantization matrix, estimating this information from the image itself is all the more useful as an initial analysis.
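Point (1) can be checked by looking at where intensity jumps concentrate. The sketch below is not one of the methods presented in this chapter but a basic blockiness heuristic written for illustration: it averages the absolute differences between adjacent rows and columns according to their position modulo 8, and returns the offset with the strongest discontinuities. An unaltered JPEG image should ideally give (0, 0), whereas a crop, for instance, shifts the detected origin.

```python
import numpy as np


def estimate_grid_origin(img, block=8):
    """Return the (row, col) offset modulo `block` at which absolute
    differences between adjacent rows/columns are strongest on average.
    Assumes a grayscale image at least 2 * block pixels on each side."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # d_col[:, k] compares columns k and k + 1, i.e. a boundary at x = k + 1.
    d_col = np.abs(np.diff(img, axis=1))
    d_row = np.abs(np.diff(img, axis=0))
    col_pos = np.arange(1, w) % block
    row_pos = np.arange(1, h) % block
    col_scores = [d_col[:, col_pos == r].mean() for r in range(block)]
    row_scores = [d_row[row_pos == r, :].mean() for r in range(block)]
    return int(np.argmax(row_scores)), int(np.argmax(col_scores))
```

On natural images, edges in the content also contribute to these averages, so in practice such a score is only reliable when the blocking effect is strong enough, or when it is refined by the dedicated grid detection methods.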
Methods proposed by Pevny and Fridrich (2008) make it possible to detect whether an image has undergone double compression, which is an immediate argument against its authenticity: it implies that the compression step of the processing chain was duplicated, i.e. that the image was decompressed and then compressed again.
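Double compression leaves a characteristic trace in the statistics of the DCT coefficients. The sketch below is not the method of Pevny and Fridrich (2008). It only illustrates the kind of trace involved: quantizing a set of coefficients with a step q1, dequantizing, and requantizing with a step q2 leaves periodically spaced empty bins in the histogram of the final indices. The function names and the Laplacian test data are assumptions, and the exact empty bins depend on the rounding convention (NumPy rounds halves to even).

```python
import numpy as np


def quantization_indices(coeffs, step):
    """Quantization indices round(c / step), as stored in a JPEG file."""
    return np.round(np.asarray(coeffs, dtype=np.float64) / step).astype(int)


def double_quantization_indices(coeffs, step1, step2):
    """Quantize with step1, dequantize, then quantize again with step2."""
    dequantized = quantization_indices(coeffs, step1) * step1
    return quantization_indices(dequantized, step2)


def index_histogram(indices, max_abs=10):
    """Counts of index values in [-max_abs, max_abs]; entry i counts the
    index i - max_abs."""
    idx = indices[np.abs(indices) <= max_abs] + max_abs
    return np.bincount(idx, minlength=2 * max_abs + 1)
```

With q1 = 3 and q2 = 2, the dequantized values are multiples of 3, so their halves are integers or half-integers; the index 1 (among others) can never be produced, whereas a single quantization with step 2 populates every bin.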
In the following sections, grid detection and quantization matrix estimation methods are illustrated. When no detection is made, the image may be classified as not having undergone JPEG compression.
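As a simple illustration of quantization estimation, the sketch below looks for the largest step such that almost all nonzero coefficients of one DCT frequency, recomputed on the decoded image, lie close to one of its multiples. It is a hypothetical heuristic, not one of the methods presented in the following sections; `max_step`, `tol` and `min_fraction` are illustrative parameters. Returning 1 corresponds to the "no detection" case: the coefficients show no evidence of quantization.

```python
import numpy as np


def estimate_quant_step(coeffs, max_step=64, tol=0.5, min_fraction=0.9):
    """Estimate the quantization step of one DCT frequency from coefficients
    recomputed on a decoded image. Returns 1 if no step larger than 1 fits."""
    c = np.asarray(coeffs, dtype=np.float64)
    c = c[np.abs(c) > tol]  # zeros are multiples of every step: ignore them
    if c.size == 0:
        return 1
    # Search from the largest step down, so that a divisor of the true step
    # (which also fits) is not returned first.
    for step in range(max_step, 1, -1):
        residual = np.abs(c - step * np.round(c / step))
        if np.mean(residual <= tol) >= min_fraction:
            return step
    return 1
```

Applied to each of the 64 frequencies, this yields an estimated quantization matrix that can be compared with the one stored in the file header, when there is one.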
1.5.2. Grid detection
In a JPEG-compressed image, the 8 × 8 blocks follow a regular pattern starting at the top-left pixel of the image; the grid therefore has its origin at (0, 0).
The aim of the method is to find the traces left by the division of the image into 8 × 8 blocks during JPEG compression. This leads to having the position of the grid by giving