DISTRIBUTED SYSTEM

Checkpointing

Checkpointing means periodically saving the state of an application to stable storage. After a failure, the application can restart from the most recent saved state, or checkpoint, instead of from the beginning.
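As a minimal sketch of this idea for a single process, the Python example below saves a loop's state after every step and resumes from the last saved state after a simulated crash. The function names (`save_checkpoint`, `load_checkpoint`, `sum_squares`) and the use of a local pickle file as "stable storage" are assumptions made for illustration only.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to path atomically: a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())    # push the bytes to disk before renaming
    os.replace(tmp_path, path)  # atomic rename over any previous checkpoint


def load_checkpoint(path, default):
    """Return the last saved state, or default if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def sum_squares(n, path, crash_at=None):
    """Sum i*i for i in range(n), checkpointing after every step.

    crash_at is a hypothetical test hook that simulates a failure.
    """
    state = load_checkpoint(path, {"i": 0, "total": 0})
    while state["i"] < n:
        if crash_at is not None and state["i"] == crash_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["i"] ** 2
        state["i"] += 1
        save_checkpoint(state, path)
    return state["total"]
```

If `sum_squares(10, path, crash_at=6)` fails partway, calling `sum_squares(10, path)` again picks up at step 6 rather than step 0. Writing to a temporary file and renaming it is one common way to avoid leaving a half-written checkpoint behind.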

Coordinated Checkpointing

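Coordinated checkpointing synchronizes checkpoints across all processes in a distributed system so that, taken together, they form a globally consistent state. In uncoordinated checkpointing, each process takes checkpoints independently, which can leave the saved states inconsistent: one process's checkpoint may record a message as received while the sender's checkpoint does not record it as sent (an orphan message). Recovery may then force processes to roll back to ever-earlier checkpoints, a problem known as the domino effect. Coordinated checkpointing avoids this.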
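To make "globally consistent" concrete, here is a small Python sketch that checks a set of local checkpoints for orphan messages, that is, messages recorded as received but not as sent. The `Message` tuple, the event-counter representation, and the function names are assumptions chosen for illustration.

```python
from collections import namedtuple

# A message is described by its sender and receiver, plus the local event
# numbers at which the send and the receive happened on each side.
Message = namedtuple("Message", "sender send_event receiver recv_event")


def orphan_messages(cut, messages):
    """Return the messages recorded as received but not as sent.

    cut maps each process name to the number of local events covered by
    its checkpoint; event k is covered when k <= cut[process].
    """
    return [
        m for m in messages
        if m.recv_event <= cut[m.receiver] and m.send_event > cut[m.sender]
    ]


def is_consistent(cut, messages):
    """A set of checkpoints is consistent when it has no orphan messages."""
    return not orphan_messages(cut, messages)


# P1 sends at its 3rd event; P2 receives at its 2nd event.
msgs = [Message("P1", 3, "P2", 2)]
both_recorded = is_consistent({"P1": 3, "P2": 2}, msgs)   # send and receive saved
orphan_case = is_consistent({"P1": 2, "P2": 2}, msgs)     # receive saved, send not
in_transit = is_consistent({"P1": 3, "P2": 1}, msgs)      # send saved, receive not
```

Only `orphan_case` is inconsistent. A message that was sent but not yet received (`in_transit`) does not break consistency, although a real protocol still has to save or replay it so that it is not lost on recovery.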

Process of Coordinated Checkpointing

  1. Initiation:

    • A coordinator process initiates the checkpointing process. This could be a dedicated coordinator or any process that assumes this role.
  2. Checkpoint Request:

    • The coordinator sends a checkpoint request to all participating processes.
  3. Local Checkpoints:

    • On receiving the request, each process temporarily stops its execution and takes a local checkpoint by saving its current state (e.g., variable values and execution point) to stable storage. The checkpoint is tentative until the coordinator confirms it.
  4. Acknowledgment:

    • After taking the local checkpoint, each process sends an acknowledgment to the coordinator.
  5. Global Consistency:

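    • Once the coordinator has received acknowledgments from all processes, it knows that a globally consistent checkpoint has been taken. It notifies the processes, which make their local checkpoints permanent and resume normal execution. If some acknowledgment is missing, the checkpoint is not confirmed and the previous confirmed checkpoint remains the recovery point.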
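The five steps above can be sketched as a small single-threaded simulation in Python. This is an illustration under simplifying assumptions, not a production protocol: the class and method names are hypothetical, messages are modeled as direct method calls, "stable storage" is an in-memory copy, and the final commit call stands in for however processes learn that they may resume.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.state = {"counter": 0}
        self.blocked = False
        self.tentative = None   # local checkpoint awaiting confirmation
        self.committed = None   # last globally confirmed checkpoint

    def work(self):
        """Do one unit of application work unless blocked by a checkpoint."""
        if self.blocked:
            return False
        self.state["counter"] += 1
        return True

    def on_checkpoint_request(self):
        """Steps 3 and 4: stop, save a copy of the state, acknowledge."""
        self.blocked = True
        self.tentative = dict(self.state)
        return ("ack", self.name)

    def on_commit(self):
        """Step 5: make the checkpoint permanent and resume execution."""
        self.committed = self.tentative
        self.tentative = None
        self.blocked = False


class Coordinator:
    def __init__(self, processes):
        self.processes = processes

    def run_checkpoint(self):
        # Steps 1 and 2: initiate and send the request to every process.
        acks = {p.on_checkpoint_request()[1] for p in self.processes}
        # Step 5: confirm only if every process acknowledged.
        if acks != {p.name for p in self.processes}:
            return False
        for p in self.processes:
            p.on_commit()
        return True
```

A short usage example: create `Process("P1")` and `Process("P2")`, call `work()` a few times, then `Coordinator([...]).run_checkpoint()`. Afterwards each process's `committed` attribute holds the state it had when the request arrived, and `work()` succeeds again. In-flight messages are left out of this sketch; a real implementation must also account for them.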

Techniques for Coordinated Checkpointing

  1. Blocking Coordinated Checkpointing:

    • Processes stop their execution during checkpointing to ensure consistency.
    • Simple to implement but can lead to increased latency due to the blocking nature.
  2. Non-Blocking Coordinated Checkpointing:

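    • Processes continue to execute while the checkpoint is taken. Consistency is preserved by other means, for example by sending special marker messages along the communication channels so that each process knows which messages belong before the checkpoint and which after it.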
    • More complex to implement but reduces latency and performance impact.
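One well-known non-blocking technique is the Chandy-Lamport snapshot algorithm, which uses marker messages. The text above does not name a specific algorithm, so the Python sketch below is offered only as an example of the non-blocking approach. It assumes reliable FIFO channels; the `Node` class, the money-transfer workload, and all method names are hypothetical.

```python
from collections import deque

MARKER = "MARKER"


class Node:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.peers = {}         # peer name -> Node
        self.inbox = {}         # sender name -> FIFO channel into this node
        self.snapshot = None    # recorded local state (balance)
        self.channel_log = {}   # sender name -> messages recorded in flight
        self.recording = set()  # incoming channels still being recorded

    def connect(self, other):
        """Register other as a peer and create the channel other -> self."""
        self.peers[other.name] = other
        self.inbox[other.name] = deque()

    def send(self, to, amount):
        self.balance -= amount
        self.peers[to].inbox[self.name].append(amount)

    def start_snapshot(self):
        self._record(skip=None)

    def _record(self, skip):
        # Save local state, start recording incoming channels (except the
        # one the first marker arrived on), and send a marker to every peer.
        self.snapshot = self.balance
        self.channel_log = {s: [] for s in self.inbox}
        self.recording = {s for s in self.inbox if s != skip}
        for peer in self.peers.values():
            peer.inbox[self.name].append(MARKER)

    def deliver(self, sender):
        """Process the next message on the channel from sender."""
        msg = self.inbox[sender].popleft()
        if msg == MARKER:
            if self.snapshot is None:
                self._record(skip=sender)       # first marker seen
            else:
                self.recording.discard(sender)  # channel fully recorded
        else:
            self.balance += msg                 # normal work continues
            if sender in self.recording:
                self.channel_log[sender].append(msg)


def snapshot_total(nodes):
    """Sum of recorded balances plus money recorded as in flight."""
    return sum(
        n.snapshot + sum(sum(log) for log in n.channel_log.values())
        for n in nodes
    )
```

Nodes keep sending and delivering ordinary messages throughout. In a money-transfer workload like this one, a consistent snapshot shows up as `snapshot_total` equalling the true total in the system, even though no node ever paused.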

Applications of Coordinated Checkpointing

  1. Distributed Databases:
    • Ensures consistency across database replicas by synchronizing state during checkpoints.
  2. Distributed Computing Frameworks:
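    • Apache Flink takes coordinated checkpoints of streaming jobs: a checkpoint coordinator triggers each checkpoint, and barriers flowing through the data streams align the operators' snapshots. Hadoop also uses checkpoints, for example the HDFS NameNode's periodic metadata checkpoint, although MapReduce recovers from task failures mainly by re-executing the failed tasks.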
  3. Cloud Services:
    • Cloud platforms can use coordinated checkpointing to capture a consistent state of distributed services or groups of virtual machines, so that they can be restored together after a failure.