Saturday, February 23, 2008

Purpose of Database Systems

Consider part of a savings bank enterprise that keeps information about all customers and savings accounts maintained at the bank. The savings account and customer records are kept in permanent system files. In addition to these files, the system has a number of application programs that allow one to manipulate the files, including :

- A program to debit or credit an account
- A program to add a new account
- A program to find the balance of an account
- A program to generate monthly statements

These application programs have been written by system programmers in response to the needs of the bank organization.

New application programs are added to the systems as the need arises. For example, suppose that new government regulation allow the savings bank to offer checking accounts. As a result, new permanent files are created that contain information about all the checking accounts maintained in the bank, and new application programs may need to be written. Thus, as time goes by, more files and more application programs are added to the systems. Since these files and programs have been created over a long period of time, presumably by different programmers, the files are likely to have different formats and the programs may be written in several programming languages.

The environment described above is a typical file-processing system, which is supported by a conventional operating system. Permanent records are stored in various files, and a number of different application programs are written to extract records from and add records to the appropiate files.
This scheme has a number of major disadvantages :

Data redundancy and inconsistency. Since the files and application programs are created by different programmers over a long period of time, the same piece of information may be duplicated in several places (files). For example, the address and phone number of a particular customer may appear in a file that consist of savings account records, and in a file that consist of checking account records. This redundancy leads to higher storage and access cost as well as potential data inconsistency. By data inconsistency, we mean that various copies of the same data no longer agree. For example, if the address of a customer changes, and the change has been reflected in only savings account records, then data inconsistency result.

Difficulty in accessing data. Suppose that one of the officers in the bank needs to find out the names of all the customers who live in the area of the city with zip code 59318. The officers calls the data processing department and asks them to generate such a list. As this is an unusual request that was not anticipated when the original system was designed, there is no application program on hand to generate the list of all customers. The bank officer has now two choices. Either to get the list of customers and ask one of the secretaries to extract manually the needed information, or to ask the data-processing department to have one of the system programmers write such an application program. Both alternatives are obviously unsatisfactory. Suppose that such a program is actually written, and that several days later, the same officer needs to trim that list to include only those customers that have an account balance of $10,00 or more. As expected, a program to generate such a list does not exist. Again, the officer has the previous two options, neither of which is satisfactory.

What we are pointing out is that this environment does not allow one to retrieve needed data in a convenient and efficient manner. Better data retrieval systems must be developed for general use.

Data isolation. Since data is scattered in various files, and files may be in different formats, it is difficult to write new application programs to retrieve the appropriate data.

Multiple users. In order to improve the overall performance of the system and obtain a faster response time, many system allow multiple users to update the data simultaneously. In such an environment , interaction of concurrent updates may result in inconsistent data. For example, consider bank account A with $100. If two customers withdraw funds (say $50 and $100 respectively) from account A at about the same time, the result of the concurrent executions may leave the account may contain either $450, or $400, rather than $350. In order to guard against this possibility, some form of supervision must be maintained in the system. Since data may be accessed by many different application programs which have not been previously coordinated, such a supervisor is very difficult to obtain.

Security problems. Not every user of the database system should be able to access all the data. For example, in a banking system, a person who prepares the payroll checks can only see that part of the database that has information about the various bank employees. He or she cannot access only account information. They cannot access information concerning salaries of employees. Since application programs are added to the system in an ad hoc manner, it is difficult to enforce such security constrains.

Integrity problems. The data values stored in the database must satisfy certain types of consistency constraints. For example, the balance of a bank account may never fall below a prespecified amount (for example, $25). These constraints must be enforced in the system. This enforcement can be carried out by adding appropriate code in the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them. This is compounded in the case where constraints involve several data items from different files.

Friday, February 22, 2008

Introduction - Database system concepts

A database management system (DBMS) consists of a collection of interrelated data and a set of programs to access that data. The collection of data is usually referred to as the database. The database contains information about one particular enterprise. The primary goal of a DBMS is to provide an environment that is both convenient and efficient to use in retrieving information from and storing information into the database.

Database systems are designed to manage large bodies of information. The management of data involves both the definition of structures for the storage of information and the provision of mechanisms for the manipulation of information. In addition, the database system must provide for the safety of information stored in the database, despite system crashes or attempts at unauthorized access. If data is to be shared among several users, the system must avoid possible anomalous results.

Because of the importance of information in most organizations, the database is a valueable resource. This has led to the development of a large body of concepts and techniques for the efficient management of data. In this champter, we present a brief introduction to the principles of database systems.

Purpose of Database Systems
· Data redundancy and inconsistency
· Difficulty in accessing data
· Data isolation
· Multiple users
· Security problems
· Integrity problems

Data Abstraction
· Physical level
· Conceptual level
· View level

Data Models
· Object-Based Logical Models
· Record-Based Logical Models
· Physical Data Models

Instances and Schemes

Data Independence

Data Definition Language

Data Manipulation Language

Database Manager
· Interaction with the file manager
· Integrity enforcement
· Security enforcement
· Backup and recovery
· Concurrency control

Database Administrator
· Scheme definition
· Storage structure and access method definition
· Scheme and physical organization modification
· Granting of authorization for data access
· Integrity constraint specification

Database Users
· Application programmers
· Casual users
· Naïve users
· Specialized users

Overall System Structure
· File manager
· Database manager
· Query processor
· DML precompiler
· DDL complier