List of Figures
1.1 An example of a chain of threats with two levels of recursion.
1.2 The rollback recovery is enabled by periodically taking checkpoints and usually logging of the requests received.
1.3 With redundant instances in the system, the failure of a replica in some cases can be masked and the system continue providing services to its clients without any disruption.
1.4 Main types of assets in a distributed system.
2.1 An example distributed system.
2.2 Consistent and inconsistent global state examples.
2.3 An example of the domino effect in recovery with uncoordinated checkpointing.
2.4 Finite state machine specification for the coordinator in the Tamir and Sequin checkpointing protocol.
2.5 Finite state machine specification for the participant in the Tamir and Sequin checkpointing protocol.
2.6 Normal operation of the Tamir and Sequin checkpointing protocol in an example three-process distributed system.
2.7 Finite state machine specification for the Chandy and Lamport distributed snapshot protocol.
2.8 Normal operation of the Chandy and Lamport global snapshot protocol in an example three-process distributed system.
2.9 A comparison of the channel state definition between (a) the Chandy and Lamport distributed snapshot protocol and (b) the Tamir and Sequin global checkpointing protocol.
2.10 Example state intervals.
2.11 An example for pessimistic logging.
2.12 Transport level (a) and application level (b) reliable messaging.
2.13 Optimization of pessimistic logging: (a) concurrent message logging and execution (b) logging batched messages.
2.14 Probability density function of the logging latency.
2.15 A summary of the mean logging latency and mean end-to-end latency under various conditions.
2.16 Probability density function of the end-to-end latency.
2.17 Normal operation of the sender-based logging protocol.
2.18 An example normal operation of the sender-based logging protocol.
2.19 Two concurrent failures could result in the loss of determinant information for regular messages.
3.1 The three-tier architecture.
3.2 The Java EE architecture.
3.3 An example runtime path of an end-user request.
3.4 Component class and component instances.
3.5 The chi-square cumulative distribution function for degree of freedom of 1, 2, 3, 4, 5.
3.6 The path shape of the example runtime path shown in Figure 3.3.
3.7 Component class and component instances.
3.8 Dependency templates for nodes, processes, network paths, and the neighbor sets.
3.9 A partial dependency graph for an example system.
3.10 The error function.
3.11 A hypothetical dependency graph with abnormality for each component and the weight for each edge labeled.
3.12 The components that form a cycle in the f-map are reduced to a single unit in the r-map for recursive recovery.
3.13 The architecture of an Operator Undo framework.
4.1 The replication algorithm is typically implemented in a fault tolerance middleware framework.
4.2 Active replication, without (top) and with (bottom) voting at the client.
4.3 Passive replication.
4.4 Semi-active replication.
4.5 A write-all algorithm for data replication.
4.6 The problem of the write-all-available algorithm for data replication.
4.7 Preventing a transaction from accessing a not-fully-recovered replica is not sufficient to ensure one-copy serializable execution of transactions.
4.8 An example run of the quorum consensus algorithm on a single data item.
4.9 Basic steps for optimistic data replication for an operation-transfer system.
4.10 An example run of a system with three sites that uses Lamport clocks.
4.11 An example run of a system with three sites that uses vector clocks.
4.12 An example for the determination of the new version vector value after reconciling a conflict.
4.13 An example operation propagation using vector clocks in a system with three replicas.
4.14 An example for operation propagation using timestamp matrices in a system with three replicas.
4.15 Update commit using ack vectors in a system with three replicas.
4.16 Update commit using timestamp matrices in a system with three replicas.
4.17 An illustration of the CAP theorem.
4.18 Partition mode and partition recovery.
5.1 Examples of systems that ensure uniform total ordering and nonuniform total ordering.
5.2 In the sequencer based approach, a general system is structured into a combination of two subsystems, one with a single receiver