DATABASE MANAGEMENT SYSTEM

DISTRIBUTED DATABASE MODEL

 

A distributed database stores its data across multiple physical locations, which may be sites on the same network or in different geographical regions. Compared with a centralized database, the distributed model offers several advantages, including improved availability, scalability, and reliability.

Properties of Distributed Database Model:

1. Data Distribution

Data can be distributed across multiple locations using different strategies:

  • Horizontal Partitioning (Sharding): A table is split by rows, so each shard holds a subset of the rows with the full set of columns. Each shard is stored on a different database server.
  • Vertical Partitioning: A table is split by columns, so each partition holds a subset of the columns, usually together with the primary key so that rows can be rejoined. Different partitions are stored on different servers.
  • Replication: Copies of the same data are stored on multiple servers. This improves availability and fault tolerance.
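The three strategies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customers table, the hash-based shard choice, and the function names (shard_for, horizontal_partition, vertical_partition, replicate) are hypothetical and do not come from any particular DBMS.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard by hashing the key (stable across runs, unlike built-in hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def horizontal_partition(rows, key_field, num_shards):
    """Sharding: every row lands in exactly one shard, with all of its columns."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards


def vertical_partition(rows, key_field, column_groups):
    """Split rows by column group; each fragment keeps the key so rows can be rejoined."""
    return [
        [{key_field: row[key_field], **{c: row[c] for c in cols}} for row in rows]
        for cols in column_groups
    ]


def replicate(rows, num_replicas):
    """Replication: every replica holds its own full copy of the data."""
    return [[dict(row) for row in rows] for _ in range(num_replicas)]


# Hypothetical sample table used only for this illustration.
customers = [
    {"id": 1, "name": "Ada", "city": "London", "balance": 120},
    {"id": 2, "name": "Linus", "city": "Helsinki", "balance": 75},
    {"id": 3, "name": "Grace", "city": "New York", "balance": 300},
    {"id": 4, "name": "Edsger", "city": "Rotterdam", "balance": 50},
]

shards = horizontal_partition(customers, "id", num_shards=2)
profile, account = vertical_partition(customers, "id", [["name", "city"], ["balance"]])
replicas = replicate(customers, num_replicas=3)
```

Real systems often combine these strategies, for example by sharding a table and then replicating each shard.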

2. Transparency

Distributed databases aim to provide transparency to users in several ways:

  • Location Transparency: Users do not need to know the location of the data.
  • Replication Transparency: Users are unaware of the replication of data.
  • Fragmentation Transparency: Users do not need to know how data is fragmented across the network.
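One way to picture these three kinds of transparency is a thin access layer that hides sites, copies, and fragments behind put and get calls. The sketch below is a toy model: the Site and TransparentTable classes and the CRC-based routing rule are assumptions made for illustration, not how any specific product works.

```python
import zlib


class Site:
    """A hypothetical database site holding some rows in memory."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class TransparentTable:
    """Callers use put/get only; sites, replicas, and fragments stay hidden."""

    def __init__(self, fragments):
        # fragments: list of replica groups, each group a list of Site objects
        self.fragments = fragments

    def _replicas(self, key):
        # Fragmentation transparency: the routing rule is internal to this class.
        index = zlib.crc32(str(key).encode("utf-8")) % len(self.fragments)
        return self.fragments[index]

    def put(self, key, value):
        # Replication transparency: one logical write updates every copy.
        for site in self._replicas(key):
            site.rows[key] = value

    def get(self, key):
        # Location transparency: the caller never names a site.
        for site in self._replicas(key):
            if key in site.rows:
                return site.rows[key]
        raise KeyError(key)


sites = [Site("a1"), Site("a2"), Site("b1"), Site("b2")]
table = TransparentTable([sites[0:2], sites[2:4]])
table.put("order-17", {"total": 40})
order = table.get("order-17")
```

The calling code never mentions a1, a2, b1, or b2, which is the point of the three properties listed above.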

3. Distributed Database Architecture

Distributed databases can be organized in various architectures:

  • Homogeneous Distributed Database: All sites use the same DBMS software and are aware of each other. They cooperate to process user requests.
  • Heterogeneous Distributed Database: Different sites may use different DBMS software, schemas, or data models. Special software is required to enable communication and coordination between different systems.
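The "special software" in the heterogeneous case is often a layer of wrappers that translate each site's native interface and schema into a common one. The following sketch assumes two hypothetical sites, one relational (using Python's built-in sqlite3 module) and one document-style (a plain dictionary); the class names and the find_customer helper are invented for illustration.

```python
import sqlite3


class SqlSite:
    """A relational site backed by an in-memory SQLite database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add(self, cid, name):
        self.conn.execute(
            "INSERT INTO customers (id, name) VALUES (?, ?)", (cid, name)
        )

    def find(self, cid):
        row = self.conn.execute(
            "SELECT id, name FROM customers WHERE id = ?", (cid,)
        ).fetchone()
        return None if row is None else {"id": row[0], "name": row[1]}


class DocumentSite:
    """A document-style site with a different schema (different field names)."""

    def __init__(self):
        self.docs = {}

    def add(self, cid, name):
        self.docs[cid] = {"customer_id": cid, "full_name": name}

    def find(self, cid):
        doc = self.docs.get(cid)
        # Translate the local schema into the common {"id", "name"} shape.
        return None if doc is None else {"id": doc["customer_id"], "name": doc["full_name"]}


def find_customer(sites, cid):
    """Mediator: ask each site through the common interface until one answers."""
    for site in sites:
        found = site.find(cid)
        if found is not None:
            return found
    return None


sql_site, doc_site = SqlSite(), DocumentSite()
sql_site.add(1, "Ada")
doc_site.add(2, "Grace")
result = find_customer([sql_site, doc_site], 2)
```

In a homogeneous system this translation layer is unnecessary, because every site already speaks the same DBMS interface.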

4. Distributed Transactions

Distributed transactions span multiple databases and need to ensure ACID (Atomicity, Consistency, Isolation, Durability) properties. They involve:

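  • Two-Phase Commit Protocol (2PC): A coordinator first asks every participating node whether it is prepared to commit (the prepare or voting phase). If all nodes vote yes, the coordinator tells them all to commit; if any node votes no, it tells them all to roll back. Either way, every node reaches the same outcome.
  • Concurrency Control: Mechanisms such as locking or timestamp ordering that handle concurrent access to the database and keep the data consistent.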
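The two phases of 2PC can be sketched as follows. This is a simplified, in-memory model that assumes no messages are lost and the coordinator never crashes; the Participant class and the two_phase_commit function are hypothetical names, and real implementations also log each step to durable storage.

```python
class Participant:
    """A hypothetical node that takes part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"
        self.data = {}
        self._pending = None

    def prepare(self, writes):
        """Phase 1: stage the writes and vote yes (True) or no (False)."""
        if not self.can_commit:
            self.state = "aborted"
            return False
        self._pending = dict(writes)
        self.state = "prepared"
        return True

    def commit(self):
        """Phase 2 (all voted yes): make the staged writes visible."""
        self.data.update(self._pending)
        self._pending = None
        self.state = "committed"

    def rollback(self):
        """Phase 2 (someone voted no): discard the staged writes."""
        self._pending = None
        self.state = "aborted"


def two_phase_commit(participants, writes):
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare(writes) for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:  # phase 2: global commit
            p.commit()
        return True
    for p in participants:  # phase 2: global rollback
        p.rollback()
    return False


orders, inventory = Participant("orders"), Participant("inventory")
committed = two_phase_commit([orders, inventory], {"order-1": "paid"})

billing, shipping = Participant("billing"), Participant("shipping", can_commit=False)
aborted = two_phase_commit([billing, shipping], {"order-2": "paid"})
```

In the second run a single no vote from the shipping node is enough to roll back the billing node as well, which is how the protocol preserves atomicity.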

6. Scalability and Fault Tolerance

  • Scalability: Distributed databases can scale horizontally by adding more nodes to the network, handling more data and more users.
  • Fault Tolerance: Redundancy and replication across multiple nodes improve the system’s fault tolerance. If one node fails, data can still be accessed from another node.
  • Causal Consistency: Operations that are causally related are seen by all nodes in the same order.
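The fault tolerance point above, that data can still be read from another node when one fails, can be illustrated with a small failover read. The Replica class, its alive flag, and the read_with_failover helper are assumptions made for this sketch; real systems detect failures with timeouts and health checks rather than a flag.

```python
class Replica:
    """A hypothetical node holding a full copy of the data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.alive = True

    def read(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in turn and return (value, name of the replica that answered)."""
    for replica in replicas:
        try:
            return replica.read(key), replica.name
        except ConnectionError:
            continue  # this node failed, so try the next copy
    raise ConnectionError("no replica available")


snapshot = {"user:1": "Ada"}
nodes = [Replica("node-a", snapshot), Replica("node-b", snapshot), Replica("node-c", snapshot)]

nodes[0].alive = False  # simulate the failure of one node
value, served_by = read_with_failover(nodes, "user:1")
```

Adding more replicas to the list raises the number of simultaneous failures the read path can survive, at the cost of keeping more copies in sync.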

Advantages of Distributed Databases:

  • Improved Availability and Reliability: Data replication helps the system remain operational even if some nodes fail, as long as at least one copy of the required data is still reachable.
  • Scalability: The system can handle growth by adding more nodes.
  • Geographical Distribution: Data can be stored closer to where it is needed, reducing access latency.
  • Flexibility: Allows integration of different types of databases and data models.

Disadvantages of Distributed Databases:

  • Complexity: More complex to design, implement, and manage compared to centralized databases.
  • Consistency Challenges: Ensuring data consistency across multiple nodes can be difficult.
  • Performance Overhead: Coordination and communication between nodes can introduce performance overhead.
  • Security Risks: Every additional site and network link is another place where data must be protected, so data distributed across multiple locations can be more exposed to security breaches.

Examples of Distributed Databases:

  • Google Spanner
  • Amazon Aurora
  • Apache Cassandra
  • CockroachDB
  • MongoDB (in a sharded configuration)