2.2. Examples of discrete sources
2.2.1. Simple source (memoryless)
This is a source for which the probability of the occurrence of a symbol does not depend on the previous symbols:
[2.1]
[2.2]
2.2.2. Discrete source with memory
It is a source for which the probability of occurrence of a symbol depends on the symbols already emitted (statistical dependence) and on instants 1, 2, ..., n where they have been emitted:
[2.3]
If the source is in addition stationary then the dependence is only on the ranks and not on the moments when the symbols have been transmitted, so:
[2.4]
that is, the statistical properties are independent of any temporal translation. We can consider that a text written in any language is an example of such a source.
2.2.3. Ergodic source: stationary source with finite memory
The ergodicity of an information source implies that a temporal average calculated over an infinite duration is equal to the statistical average.
2.2.4. First order Markovian source (first order Markov chain)
First order Markov sources play an important role in several domains, for example, in (Caragata et al. 2015), the authors used such a source to model the cryptanalysis of a digital watermarking algorithm.
It is characterized by:
[2.5]
with:
The probability
[2.6]
The probability that at time n the source is in the state k is:
By introducing matrix notations:
Taking into account the relationship [2.8], the relation [2.7] is also written as:
where Tt is the transposed matrix of transition probabilities.
Moreover, if the source is, stationary, then:
in other words, the probabilities of the occurrence of the symbols do not depend on n. It is the same for the transition probabilities pl,k, then the relation [2.9] is written:
[2.10]
where P0 is the matrix of probabilities governing the generation of symbols by the source at the initial instant n = 0.
2.3. Uncertainty, amount of information and entropy (Shannon’s 1948 theorem)
The realization of an event x of probability p(x) conveys a quantity of information h(x) related to its uncertainty. h(x) is an increasing function of its improbability 1/p(x):
If an event x is certain, then p(x) = 1, the uncertainty is zero and therefore the quantity of information h(x) brought by its realization is null.
Moreover, let us consider the realization of a pair of independent events x and y. Their joint realization leads to the amount of information it brings being the sum of the quantities of information brought by each of them:
but ft(x,y) is also written:
so, from the relationships [2.11] and [2.12], the form of the function f is of logarithmic type, the base a of the logarithmic function can be any
[2.13]
It is agreed by convention that we select, as the unit of measure of information, the information obtained by the random selection of a single event out of two equally probable events pi = pj = 1/2. In this case, we can write:
If we choose the logarithm in base 2, λ becomes equal to unity and therefore h(xi) = h(xj) = log2(2) = 1 Shannon (Sh) or 1 bit of information, not to be confused with the digital bit (binary digit) which represents one of the binary digits 0 or 1.
Finally, we can then write:
[2.14]
It is sometimes convenient to work with logarithms in base e or with logarithms in base 10. In these cases, the units will be:
loge e = 1 natural unit = 1 nat (we choose 1 among e)
log10 10 = 1 decimal unit = dit (we choose 1 among 10)
Knowing that: