DATABASE MANAGEMENT SYSTEM

PARALLEL AND DISTRIBUTED DATABASE

Parallel and distributed databases are two types of database systems designed to handle large volumes of data and provide efficient processing. While they share some similarities, they have distinct properties that cater to different needs and use cases.

Parallel Databases

Parallel databases use multiple processors and storage devices within a single system to perform concurrent data processing tasks. The goal is to increase performance through parallelism.

Properties of Parallel Databases:

  1. Data Partitioning:
    • Horizontal Partitioning: Distributes rows of a table across multiple disks.
    • Vertical Partitioning: Distributes columns of a table across multiple disks.
    • Hybrid Partitioning: Combines both horizontal and vertical partitioning.
  2. Parallel Query Processing:
    • Inter-Query Parallelism: Multiple queries are executed simultaneously on different processors.
    • Intra-Query Parallelism: A single query is broken down into smaller operations (for example, scans of different partitions), which are executed in parallel on different processors and whose results are then combined.
  3. Load Balancing:
    • Ensures that the workload is evenly distributed across all processors to prevent bottlenecks and maximize resource utilization.
  4. Fault Tolerance:
    • Incorporates mechanisms for detecting and recovering from hardware or software failures to ensure continuous operation.
  5. Scalability:
    • The system can handle increasing amounts of data and users by adding more processors and storage devices.
  6. Synchronization and Coordination:
    • Requires sophisticated algorithms to manage data consistency and synchronization between processors, especially for write operations.
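
To make horizontal partitioning and intra-query parallelism concrete, the following is a minimal sketch rather than how any particular database engine implements them. The table (`orders`), the column names, and the helper functions are hypothetical, and Python threads merely stand in for the separate processors of a real parallel database. Rows are assigned to partitions by taking the key modulo the number of partitions, and a single SUM query is then answered by aggregating each partition independently and combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, num_partitions):
    """Horizontal partitioning: place each row in a partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Integer keys are used here, so key % num_partitions picks the partition.
        partitions[row[key] % num_partitions].append(row)
    return partitions


def partial_sum(partition, column):
    """Work done by one worker: aggregate only the rows in its own partition."""
    return sum(row[column] for row in partition)


def parallel_sum(partitions, column):
    """Intra-query parallelism: one SUM query split into per-partition tasks."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: partial_sum(p, column), partitions))
    # Final step: combine the partial results into the query answer.
    return sum(partials)


# Hypothetical sample table: eight orders with an id and an amount.
orders = [{"order_id": i, "amount": i * 10} for i in range(1, 9)]
partitions = hash_partition(orders, "order_id", num_partitions=4)
total = parallel_sum(partitions, "amount")
```

Because each partition here receives the same number of rows, the workers carry equal load; a skewed key distribution would leave some workers with more rows than others, which is the situation load balancing is meant to avoid.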

Distributed Databases

Distributed databases consist of multiple interconnected databases spread across different locations, often geographically dispersed. Each node manages its own data independently while cooperating with the other nodes over a network to process queries and transactions.

Properties of Distributed Databases:

  1. Data Distribution:
    • Fragmentation: Divides the database into smaller pieces, called fragments, which are distributed across different nodes.
      • Horizontal Fragmentation: Distributes rows of a table across multiple locations.
      • Vertical Fragmentation: Distributes columns of a table across multiple locations.
    • Replication: Copies of data are maintained at multiple sites to ensure availability and fault tolerance.
  2. Transparency:
    • Provides a unified view of the data to users, hiding the complexities of the distributed system. Types of transparency include:
      • Location Transparency: Users do not need to know the physical location of the data.
      • Replication Transparency: Users are unaware of data replication across multiple sites.
      • Fragmentation Transparency: Users do not need to know how data is fragmented.
  3. Concurrency Control:
    • Ensures that multiple transactions can occur simultaneously without leading to inconsistencies. Techniques include:
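      • Two-Phase Commit (2PC): An atomic commit protocol used alongside concurrency control. All participating nodes vote on a transaction; it is committed only if every node agrees, and otherwise it is aborted on all nodes.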
      • Distributed Locking: Manages access to data across multiple nodes to prevent conflicts.
  4. Data Consistency:
    • Maintains data integrity and consistency across all nodes. Strategies include:
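      • Strong Consistency: Every read reflects the most recent committed write, regardless of which node serves it (more complex and costly to coordinate).
      • Eventual Consistency: If no new updates are made, all copies of the data converge to the same value over time; reads may return stale data in the meantime (more scalable).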
  5. Scalability:
    • Can handle growing amounts of data and users by adding more nodes to the system.
  6. Fault Tolerance and High Availability:
    • Redundant data storage and distributed control mechanisms ensure that the system can withstand failures at individual nodes without significant downtime.
  7. Autonomy:
    • Each node operates independently and can manage its own data, leading to decentralized control and administration.
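
The Two-Phase Commit protocol listed under Concurrency Control can be sketched as follows. This is a simplified illustration under stated assumptions, not a production implementation: the `Participant` class and the `two_phase_commit` function are hypothetical names, the nodes live in one process, and the sketch leaves out the logging, timeouts, and coordinator-failure handling that a real system needs.

```python
class Participant:
    """Hypothetical node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # whether the local work can succeed
        self.state = "INIT"

    def prepare(self):
        # Phase 1: vote yes only if the local changes can be committed.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1 (voting): ask every participant whether it can commit.
    votes = [p.prepare() for p in participants]

    # Phase 2 (decision): send the same outcome to all participants.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMITTED"
    for p in participants:
        p.abort()
    return "ABORTED"


# Case 1: every node votes yes, so the transaction commits everywhere.
ok_nodes = [Participant("A"), Participant("B")]
ok_result = two_phase_commit(ok_nodes)

# Case 2: one node votes no, so the transaction is aborted everywhere.
mixed_nodes = [Participant("A"), Participant("B", can_commit=False)]
mixed_result = two_phase_commit(mixed_nodes)
```

In both cases all nodes end in the same state, which is the all-or-nothing guarantee the protocol is meant to provide.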

Comparison of Parallel and Distributed Databases

While both parallel and distributed databases aim to improve performance and handle large datasets, they do so in different ways:

  • Parallel Databases focus on dividing tasks within a single system to exploit parallelism and increase processing speed.
  • Distributed Databases focus on distributing data across multiple systems to ensure availability, fault tolerance, and scalability, often in geographically dispersed environments.