Centralized and Distributed Database Systems

1. Centralized Database

A centralized database is a database whose storage devices are all attached to a common CPU. Almost all programs and data are held on this one central system, and every system connected to it depends on the central storage for its data and programs.

Advantages of centralized database:

A central database is easy to maintain
A consistent, up-to-date database can be maintained easily
Central control of data, and restrictions on it, can be maintained easily
Its structure is less complex than that of a distributed database system

2. Distributed Database

A distributed database is a database in which storage devices are not all attached to a common CPU. Instead, the storage devices may be located in multiple computers in the same physical location or dispersed over a network of interconnected computers.

Collections of data in the form of databases can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, corporate intranets or extranets, or on other company networks.

The replication and distribution of databases improve database performance at end-user worksites.

Two processes keep distributed databases up to date:

(a) Replication

Replication is an important process in distributed databases.

Specialized software is used to detect changes in distributed databases
Once changes are identified, replication ensures all databases look the same
Can be complex and time-consuming depending on:
- Size of databases
- Number of distributed locations
Requires significant time and computational resources
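
The replication steps above (detect changes, then make every database look the same) can be sketched as follows. This is a minimal illustration only: each site is modelled as a Python dict, and the names detect_changes and replicate are hypothetical, not taken from any replication product.

```python
def detect_changes(source, replica):
    """Return (changed, deleted): entries the replica is missing or has
    stale, and keys the replica still holds that the source dropped."""
    changed = {k: v for k, v in source.items()
               if k not in replica or replica[k] != v}
    deleted = [k for k in replica if k not in source]
    return changed, deleted


def replicate(source, replicas):
    """Apply detected changes so every replica matches the source.
    Returns the total number of changes applied."""
    applied = 0
    for replica in replicas:
        changed, deleted = detect_changes(source, replica)
        replica.update(changed)
        for key in deleted:
            del replica[key]
        applied += len(changed) + len(deleted)
    return applied


# Assumed example data: three sites, two of them out of date.
site_a = {"cust:1": "Ada", "cust:2": "Grace"}
site_b = {"cust:1": "Ada", "cust:3": "stale"}
site_c = {}
changes_applied = replicate(site_a, [site_b, site_c])
```

The work grows with both the size of each database and the number of replicas, which matches the cost factors listed above.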

(b) Duplication

Duplication is a process in which:

One database is identified as a master
Other databases are created as copies

Key points:

Usually done at fixed intervals (e.g., after hours)
Ensures each distributed location has the same data
Only the master database allows updates
Prevents overwriting of local data
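
A rough sketch of duplication, under the assumption that each database is an in-memory dict: the master accepts updates, and the copies are overwritten wholesale whenever the (hypothetical) duplicate function runs, for example from an after-hours scheduled job.

```python
import copy


class MasterDatabase:
    """The single database that accepts updates."""

    def __init__(self):
        self.data = {}

    def update(self, key, value):
        self.data[key] = value


class ReadOnlyCopy:
    """A copy at a distributed location; it rejects local updates."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key)

    def update(self, key, value):
        raise PermissionError("updates are only allowed on the master")

    def refresh_from(self, master):
        # Replace the whole copy with a snapshot of the master.
        self._data = copy.deepcopy(master.data)


def duplicate(master, copies):
    """Run at a fixed interval: give every location the master's data."""
    for location in copies:
        location.refresh_from(master)
```

Between two runs a copy may be stale; that is the price of only synchronizing at fixed intervals.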

Both replication and duplication help maintain data consistency across distributed systems.

3. Basic Architecture

A database user accesses the distributed database through:

Local applications
Global applications

(a) Local Applications

Do not require data from other sites
All required data resides in the current site only

(b) Global Applications

Require data from multiple sites
Data resides in:
- Current site
- Other connected sites
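
The local/global distinction can be expressed as a small check. The sketch below assumes a hypothetical placement table that maps each data item to the one site holding it; the function name and the site names are illustrative only.

```python
def classify_application(required_items, current_site, placement):
    """Return "local" if every required item resides at current_site,
    otherwise "global"."""
    sites_needed = {placement[item] for item in required_items}
    return "local" if sites_needed <= {current_site} else "global"


# Assumed placement of data items across two sites.
placement = {"orders": "paris", "stock": "paris", "payroll": "tokyo"}
```

For example, an application at the "paris" site that reads only "orders" and "stock" is local, while one that also needs "payroll" is global.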

4. Important Considerations

For distributed databases, the following conditions must be ensured:

(a) Distribution is Transparent

Users should interact with the system as if it were a single logical system
Applies to:
- Performance
- Data access methods

(b) Transactions are Transparent

Each transaction must maintain database integrity across multiple databases
Transactions are divided into sub-transactions, each affecting one database system
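
As an illustration of dividing a transaction into sub-transactions, the sketch below groups a transaction's operations by the site that holds each data item, so that each group touches exactly one database system. The placement table and the operation format are assumptions made for the example.

```python
from collections import defaultdict


def split_into_subtransactions(operations, placement):
    """Group (item, action) operations by the site storing each item.
    Returns a dict: site name -> list of operations for that site."""
    subtransactions = defaultdict(list)
    for item, action in operations:
        subtransactions[placement[item]].append((item, action))
    return dict(subtransactions)


# Assumed example: a transfer touching accounts stored at two sites.
placement = {"acct_A": "site1", "acct_B": "site2"}
transfer = [("acct_A", "debit 100"), ("acct_B", "credit 100")]
subs = split_into_subtransactions(transfer, placement)
```

Keeping integrity across the sub-transactions (all succeed or none do) is a separate problem that a commit protocol has to solve.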

5. Advantages of Distributed Database

The advantages of a distributed database can be summarized in the following points:

• A distributed database can manage distributed data with different levels of transparency, such as fragmentation transparency and replication transparency.

• It increases the reliability and availability of data.

• It makes the database easier to expand.

• A distributed database reflects the organizational structure. Database fragments are located in the departments they relate to.

• A distributed database helps protect valuable data: if there were ever a catastrophic event such as a fire, the data would not all be in one place, but distributed across multiple locations.

• A distributed database can be economical: it costs less to build a network of smaller computers with the combined power of a single large computer.

• Distributed databases are modular: systems can be modified, added and removed without affecting the other modules (systems).

6. Disadvantages of Distributed Database

• Extra work must be performed by the database administrator, who has to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems instead of one big one, and extra database design work is needed to account for the disconnected nature of the database.

• A distributed database is more complex. It needs more extensive infrastructure, which raises labour costs.

• Remote database fragments are not centralized, so the remote sites that hold them must be secured, along with the infrastructure connecting them.

• Integrity is difficult to maintain: enforcing integrity over a network may require too much of the network’s resources to be feasible.

• There is no defined standard for distributed databases, and there are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS.

• Distributed database design is very complex. Besides the normal difficulties, the design has to consider fragmentation of data, allocation of fragments to specific sites, and data replication.

• It requires additional software and operating system support for the distributed environment.

• Concurrency control is a major issue in distributed databases; it requires techniques such as locking and timestamping.

Concurrency and Recovery

The design of concurrency control and recovery in a distributed database system (DDBS) has to consider aspects beyond those of a centralized database system (DBS). These aspects include:

1. Multiple Copies of Data

Concurrency control has to keep the copies of data consistent. Recovery, on the other hand, has to make a copy consistent with the others whenever a site recovers from a failure.

• Failure of communication links

• Failure of individual sites

• Distributed commit: some sites may fail while a transaction is committing, so the two-phase commit protocol is used to handle this problem.

• Deadlocks on multiple sites.
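
The two-phase commit mentioned above can be sketched as below. This is a simplified, in-memory illustration: the Participant class and its can_commit flag are hypothetical, and the sketch leaves out the logging, timeouts and coordinator-failure handling that a real protocol needs.

```python
class Participant:
    """One site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumed: False simulates a failed site
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if this site is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere or abort everywhere."""
    # Phase 1 (voting): ask every participant, collecting all votes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only on a unanimous yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote (or a site that cannot respond) makes every site abort, so no site is left with a partial result.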

Two approaches to managing concurrency control are described next: designating a distinguished copy of a data item, and voting.

2. Distinguished Copy of a Data Item

There are three variations of this method: the primary site technique, the primary site with backup site technique, and the primary copy technique.

(a) Primary Site Technique

These techniques are described as follows:

(i) Primary Site

In this method, a single site is designated as the coordinator site. All locks and unlocks for all data units are controlled by this site. The advantage is that it is easy to implement. However, the method has two downsides: the coordinator site can become overloaded, and it forms a single point of failure for the entire DDBS.
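
A minimal sketch of the coordinator's role, assuming exclusive locks only and a single in-memory lock table; the class name PrimarySiteLockManager is hypothetical.

```python
class PrimarySiteLockManager:
    """Lock table kept at the one coordinator site."""

    def __init__(self):
        self._locks = {}  # data unit -> id of the transaction holding it

    def lock(self, txn, unit):
        """Grant the lock if the unit is free or already held by txn."""
        holder = self._locks.get(unit)
        if holder is None or holder == txn:
            self._locks[unit] = txn
            return True
        return False

    def unlock(self, txn, unit):
        """Release the lock; only the holder may release it."""
        if self._locks.get(unit) == txn:
            del self._locks[unit]
            return True
        return False
```

Because every lock and unlock request from every site goes through this one object, the sketch also makes both downsides visible: all traffic lands on one site, and losing it loses the whole lock table.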

(ii) Primary Site with Backup Site

This technique addresses the second disadvantage of the primary site technique by designating a backup site that can take over as the new coordinator if the primary site fails. When that happens, another backup site has to be selected.

(iii) Primary Copy Technique

This method distributes the load across the sites that hold a designated primary copy of a data unit, instead of centralizing control of all data units at one coordinator site. This way, if a site goes down, only transactions involving the primary copies residing on that site are affected.
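
To illustrate how the primary copy technique limits the impact of a site failure, the sketch below assumes a hypothetical table mapping each data unit to the site that holds its primary copy. Lock requests for a unit are routed to that site only.

```python
def lock_site_for(unit, primary_site_of, up_sites):
    """Return the site that coordinates locks for this unit,
    or None if that site is currently down."""
    site = primary_site_of[unit]
    return site if site in up_sites else None


def affected_units(primary_site_of, failed_site):
    """Units whose transactions are blocked when failed_site goes down."""
    return sorted(u for u, s in primary_site_of.items() if s == failed_site)


# Assumed example: primary copies spread over three sites.
primary_site_of = {"orders": "s1", "stock": "s2", "payroll": "s3", "prices": "s2"}
```

With this table, a failure of "s2" blocks only "prices" and "stock"; locks on "orders" and "payroll" can still be granted.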

(b) Voting

Unlike the distinguished copy techniques described above, this method does not designate any copy or site as the coordinator. When a transaction attempts to lock a data unit, it must send a lock request to every site that holds a copy of that unit. If a majority of those sites do not grant the lock, the transaction fails and sends a cancellation to all of them. Otherwise it keeps the lock and informs all sites that the lock has been granted.
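
The voting rule can be sketched as follows. Each site holding a copy is modelled as a dict acting as its local lock table; this representation and the function name are assumptions made for the example, and message passing and timeouts are left out.

```python
def request_lock_by_voting(txn, unit, site_lock_tables):
    """Ask every site holding a copy of unit for a lock.
    Keep the lock only if a strict majority of sites grant it."""
    granted = []
    for table in site_lock_tables:
        if table.get(unit) is None:
            table[unit] = txn       # this site votes yes and locks its copy
            granted.append(table)
    if len(granted) * 2 > len(site_lock_tables):
        return True                 # majority reached: the lock is held
    for table in granted:           # no majority: cancel the granted votes
        del table[unit]
    return False
```

Since two transactions cannot both hold a strict majority of the copies, at most one of them can obtain the lock on a given data unit at a time.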

(c) Recovery

The recovery process involves identifying that a failure occurred, what type of failure it was, and at which site it happened. Distributed recovery relies on mechanisms such as database logs, update protocols, and transaction failure recovery protocols.

3. Concurrency and Recovery Algorithm

A concurrency control algorithm ensures that transactions execute atomically. It does this by controlling the interleaving of concurrent transactions, to give the illusion that transactions execute serially, one after the next, with no interleaving at all. Interleaved executions whose effects are the same as serial executions are called serializable. Serializable executions are correct because they support this illusion of transaction atomicity.
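
One common way to test whether an interleaved execution is serializable is to check conflict serializability with a precedence graph. This is a standard textbook technique rather than something prescribed by the text above, and it is a sufficient (not necessary) condition. The schedule format in the sketch is an assumption.

```python
def is_conflict_serializable(schedule):
    """schedule: list of (txn, op, item) tuples, with op "r" or "w".
    Builds a precedence graph and reports whether it is acyclic."""
    edges = {}
    for i, (t1, op1, x1) in enumerate(schedule):
        edges.setdefault(t1, set())
        for t2, op2, x2 in schedule[i + 1:]:
            # Two operations conflict if they come from different
            # transactions, touch the same item, and one is a write.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges[t1].add(t2)

    state = {}  # txn -> "visiting" or "done"

    def has_cycle(node):
        state[node] = "visiting"
        for nxt in edges[node]:
            if state.get(nxt) == "visiting":
                return True
            if state.get(nxt) is None and has_cycle(nxt):
                return True
        state[node] = "done"
        return False

    return not any(state.get(t) is None and has_cycle(t) for t in edges)
```

An acyclic graph means the interleaving has the same effect as some serial order of the transactions; a cycle means no such order exists under this conflict-based test.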

A recovery algorithm monitors and controls the execution of programs so that the database includes only the results of transactions that run to a normal completion. If a failure occurs while a transaction is executing and the transaction is unable to finish executing, then the recovery algorithm must wipe out the effects of the partially completed transaction. That is, it must ensure that the database does not reflect the results of such transactions. Moreover, it must ensure that the results of transactions that do execute are never lost.
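
The two guarantees above (wipe out partial results, never lose committed results) are often met with a log that is replayed after a failure. The sketch below is one simple undo/redo scheme under stated assumptions: the log record layout is hypothetical, a missing key is recorded as an old value of None, and transactions are assumed not to overwrite each other's uncommitted data.

```python
def recover(db, log):
    """db: dict holding whatever reached stable storage before the crash.
    log: list of ("write", txn, key, old_value, new_value) and
    ("commit", txn) records, in the order they were written."""
    committed = {txn for kind, txn, *_ in log if kind == "commit"}

    # Undo pass (backward): remove the effects of unfinished transactions.
    for kind, txn, *rest in reversed(log):
        if kind == "write" and txn not in committed:
            key, old, _new = rest
            if old is None:
                db.pop(key, None)
            else:
                db[key] = old

    # Redo pass (forward): make sure committed results are present.
    for kind, txn, *rest in log:
        if kind == "write" and txn in committed:
            key, _old, new = rest
            db[key] = new
    return db
```

For instance, if T1 wrote a = 10 and committed, while T2 changed a to 99 and crashed before committing, recovery restores a to 10.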
