
DFS (DISTRIBUTED FILE SYSTEM)

 

A Distributed File System (DFS) is a file system that lets clients on multiple hosts access files shared over a network. This allows multiple users on multiple machines to share files and storage resources. By distributing file storage and access across many systems, a DFS aims to provide redundancy, reliability, and better access times than a centralized file system.

 

Characteristics of Distributed File Systems

  1. Transparency: Users and applications should see the distributed file system as if it were a local file system, without needing to know the details of where files are physically stored.
    • Location Transparency: Users don’t need to know the physical location of the files.
    • Access Transparency: The way files are accessed is consistent, regardless of where they are located.
  2. Scalability: The system should scale well as the number of users and files increases. This involves handling large volumes of data efficiently and expanding storage capacity seamlessly.
  3. Reliability and Availability: The system should ensure data reliability and availability even in the face of hardware or network failures.
    • Replication: Storing copies of files on multiple machines to ensure availability even if one machine fails (a small sketch of this idea follows this list).
    • Fault Tolerance: Ability to recover from failures without data loss.
  4. Performance: Efficient file access and management, including fast read/write operations and low latency.
  5. Security: Ensuring that data is protected against unauthorized access and breaches.
    • Authentication and Authorization: Ensuring that only authorized users can access or modify the files.
    • Encryption: Protecting data in transit and at rest.
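
To make the replication and fault-tolerance characteristics concrete, here is a minimal Python sketch of reading from whichever replica is currently reachable. All node names and file contents are invented for illustration; no particular DFS works exactly this way.

  import random

  class ReplicaSet:
      """Illustrative only: one file stored as copies on several nodes."""
      def __init__(self, replicas):
          # replicas: mapping of node name -> file contents (None if node is down)
          self.replicas = replicas

      def read(self):
          # Try replicas in random order; the first reachable node serves the
          # read, so the file stays available while any one copy survives.
          nodes = list(self.replicas)
          random.shuffle(nodes)
          for node in nodes:
              data = self.replicas[node]
              if data is not None:
                  return node, data
          raise IOError("all replicas unavailable")

  # node2 has failed, but the read still succeeds from another copy
  rs = ReplicaSet({"node1": b"report v1", "node2": None, "node3": b"report v1"})
  print(rs.read())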

Examples of Distributed File Systems

  1. Network File System (NFS):
    • Developed by Sun Microsystems.
    • Allows a user on a client computer to access files over a network in a manner similar to how local storage is accessed.
  2. Andrew File System (AFS):
    • Developed at Carnegie Mellon University.
    • Emphasizes scalability and supports large distributed environments.
  3. Google File System (GFS):
    • Designed for Google’s internal data storage needs.
    • Optimized for large-scale data processing and fault tolerance.
  4. Hadoop Distributed File System (HDFS):
    • Part of the Apache Hadoop project.
    • Designed to run on commodity hardware and handle large datasets across many machines.
    • Provides high throughput access to application data and is highly fault-tolerant.
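
As a quick illustration, HDFS can be reached from Python through Apache Arrow's filesystem layer. This assumes a reachable NameNode and a local libhdfs installation; the host, port, and path below are placeholders, not real endpoints.

  from pyarrow import fs

  # Connect to the HDFS NameNode (placeholder host and port).
  hdfs = fs.HadoopFileSystem("namenode.example.com", 8020)

  # Read a remote file much as if it were local: access transparency in practice.
  with hdfs.open_input_stream("/data/example.txt") as f:
      contents = f.read()
  print(len(contents), "bytes")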

Working of Distributed File Systems

First, a DFS distributes datasets across multiple clusters or nodes. Each node contributes its own computing power, which lets the DFS process the datasets in parallel. A DFS also replicates datasets by copying the same pieces of information onto multiple clusters. Replication gives the system fault tolerance, since data can be recovered after a node or cluster failure, and high concurrency, since the same piece of data can be processed in several places at once. A sketch of this chunk-and-replicate step follows.
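
The sketch below splits a dataset into fixed-size chunks and assigns each chunk to several nodes. The chunk size, node names, and round-robin placement policy are all assumptions made for illustration.

  def place_chunks(data, nodes, chunk_size=8, replication=2):
      """Split data into fixed-size chunks and assign each chunk
      to `replication` distinct nodes via round-robin placement."""
      chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
      placement = {}
      for idx, chunk in enumerate(chunks):
          owners = [nodes[(idx + r) % len(nodes)] for r in range(replication)]
          placement[idx] = (chunk, owners)
      return placement

  layout = place_chunks(b"hello distributed world!", ["nodeA", "nodeB", "nodeC"])
  for idx, (chunk, owners) in layout.items():
      print(idx, chunk, "->", owners)  # each chunk lives on two different nodes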

Clients access data on a DFS through namespaces. Organizations can group shared folders into logical namespaces. A DFS namespace is a virtual shared folder that contains shared folders from multiple servers; it presents files to users as one shared folder with multiple subfolders. When a user requests a file, the DFS serves the first available copy of the file.
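
A toy sketch of that resolution step, with the namespace layout and the availability table invented for illustration:

  # One logical folder backed by shares on several servers.
  namespace = {
      r"\\corp\public\reports": [r"\\server1\reports", r"\\server2\reports"],
  }

  available = {
      r"\\server1\reports": False,  # server1 is currently down
      r"\\server2\reports": True,
  }

  def resolve(logical_path):
      """Return the first available backing share for a logical folder."""
      for share in namespace[logical_path]:
          if available[share]:
              return share
      raise IOError("no available copy of " + logical_path)

  print(resolve(r"\\corp\public\reports"))  # falls through to \\server2\reports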

 

DFS Operations

  1. File Read/Write: Basic operations for reading and writing data to files.
  2. File Metadata Operations: Operations to manage file metadata, such as creation, deletion, renaming, and permission setting.
  3. File Replication: Ensuring data redundancy by creating and managing copies of files.
  4. Failure Recovery: Mechanisms to recover from hardware or software failures without data loss.
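
Taken together, these four groups could map onto a client interface along the following lines. This is a sketch only; no real DFS exposes exactly this API, and every method name here is illustrative.

  from abc import ABC, abstractmethod

  class DFSClient(ABC):
      """Illustrative interface grouping the operations listed above."""

      # File read/write
      @abstractmethod
      def read(self, path: str, offset: int, length: int) -> bytes: ...
      @abstractmethod
      def write(self, path: str, offset: int, data: bytes) -> None: ...

      # File metadata operations
      @abstractmethod
      def create(self, path: str) -> None: ...
      @abstractmethod
      def delete(self, path: str) -> None: ...
      @abstractmethod
      def rename(self, old: str, new: str) -> None: ...
      @abstractmethod
      def set_permissions(self, path: str, mode: int) -> None: ...

      # File replication and failure recovery
      @abstractmethod
      def replicate(self, path: str, copies: int) -> None: ...
      @abstractmethod
      def recover(self, failed_node: str) -> None: ...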

Challenges in Distributed File Systems

  1. Consistency: Ensuring that all clients see a consistent view of the file system, especially when updates occur (a quorum-based sketch follows this list).
    • Eventual Consistency: Updates propagate to all nodes eventually, but not immediately.
    • Strong Consistency: Every read reflects the most recent write, across all nodes, immediately.
  2. Network Latency and Partitioning: Dealing with the inherent latency in network communication and handling situations where parts of the network become unreachable.
  3. Scalability: Maintaining performance and reliability as the system grows.
  4. Security: Protecting data from unauthorized access and ensuring secure communication.
  5. Load Balancing: Distributing workload evenly across servers to prevent any single server from becoming a bottleneck.
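
As promised under challenge 1, here is a toy quorum sketch: with n replicas, requiring w acknowledgements per write and reading from r replicas gives strong consistency whenever r + w > n, while smaller quorums trade that guarantee for lower latency (eventual consistency). Everything below is illustrative, not any particular system's protocol.

  class QuorumStore:
      """Toy quorum replication over in-process (version, value) replicas."""

      def __init__(self, n=3, w=2, r=2):
          self.replicas = [(0, None)] * n
          self.w, self.r = w, r
          self.version = 0

      def write(self, value):
          # A write returns once w replicas acknowledge; the remaining
          # replicas are left stale here to mimic slow propagation.
          self.version += 1
          for i in range(self.w):
              self.replicas[i] = (self.version, value)

      def read(self):
          # Read r replicas and keep the newest version seen. With
          # r + w > n, at least one sampled replica holds the latest write.
          sampled = self.replicas[-self.r:]
          return max(sampled, key=lambda rep: rep[0])[1]

  store = QuorumStore(n=3, w=2, r=2)
  store.write("v1")
  print(store.read())  # "v1": read and write quorums overlap (2 + 2 > 3)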