Saturday, February 23, 2008

Purpose of Database Systems

Consider part of a savings bank enterprise that keeps information about all customers and savings accounts maintained at the bank. The savings account and customer records are kept in permanent system files. In addition to these files, the system has a number of application programs that allow one to manipulate the files, including :

- A program to debit or credit an account
- A program to add a new account
- A program to find the balance of an account
- A program to generate monthly statements

These application programs have been written by system programmers in response to the needs of the bank organization.

New application programs are added to the systems as the need arises. For example, suppose that new government regulation allow the savings bank to offer checking accounts. As a result, new permanent files are created that contain information about all the checking accounts maintained in the bank, and new application programs may need to be written. Thus, as time goes by, more files and more application programs are added to the systems. Since these files and programs have been created over a long period of time, presumably by different programmers, the files are likely to have different formats and the programs may be written in several programming languages.

The environment described above is a typical file-processing system, which is supported by a conventional operating system. Permanent records are stored in various files, and a number of different application programs are written to extract records from and add records to the appropiate files.
This scheme has a number of major disadvantages :

Data redundancy and inconsistency. Since the files and application programs are created by different programmers over a long period of time, the same piece of information may be duplicated in several places (files). For example, the address and phone number of a particular customer may appear in a file that consist of savings account records, and in a file that consist of checking account records. This redundancy leads to higher storage and access cost as well as potential data inconsistency. By data inconsistency, we mean that various copies of the same data no longer agree. For example, if the address of a customer changes, and the change has been reflected in only savings account records, then data inconsistency result.

Difficulty in accessing data. Suppose that one of the officers in the bank needs to find out the names of all the customers who live in the area of the city with zip code 59318. The officers calls the data processing department and asks them to generate such a list. As this is an unusual request that was not anticipated when the original system was designed, there is no application program on hand to generate the list of all customers. The bank officer has now two choices. Either to get the list of customers and ask one of the secretaries to extract manually the needed information, or to ask the data-processing department to have one of the system programmers write such an application program. Both alternatives are obviously unsatisfactory. Suppose that such a program is actually written, and that several days later, the same officer needs to trim that list to include only those customers that have an account balance of $10,00 or more. As expected, a program to generate such a list does not exist. Again, the officer has the previous two options, neither of which is satisfactory.

What we are pointing out is that this environment does not allow one to retrieve needed data in a convenient and efficient manner. Better data retrieval systems must be developed for general use.

Data isolation. Since data is scattered in various files, and files may be in different formats, it is difficult to write new application programs to retrieve the appropriate data.

Multiple users. In order to improve the overall performance of the system and obtain a faster response time, many system allow multiple users to update the data simultaneously. In such an environment , interaction of concurrent updates may result in inconsistent data. For example, consider bank account A with $100. If two customers withdraw funds (say $50 and $100 respectively) from account A at about the same time, the result of the concurrent executions may leave the account may contain either $450, or $400, rather than $350. In order to guard against this possibility, some form of supervision must be maintained in the system. Since data may be accessed by many different application programs which have not been previously coordinated, such a supervisor is very difficult to obtain.

Security problems. Not every user of the database system should be able to access all the data. For example, in a banking system, a person who prepares the payroll checks can only see that part of the database that has information about the various bank employees. He or she cannot access only account information. They cannot access information concerning salaries of employees. Since application programs are added to the system in an ad hoc manner, it is difficult to enforce such security constrains.

Integrity problems. The data values stored in the database must satisfy certain types of consistency constraints. For example, the balance of a bank account may never fall below a prespecified amount (for example, $25). These constraints must be enforced in the system. This enforcement can be carried out by adding appropriate code in the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them. This is compounded in the case where constraints involve several data items from different files.

No comments: