Database Management Systems
Databases have been in use since the earliest days of electronic computing, but the vast majority of these were custom programs written to access custom databases. Unlike modern systems, which can be applied to widely different databases and needs, these systems were tightly linked to the database in order to gain speed at the expense of flexibility.
A database is an organized collection of related information.
A relational database is a collection of relations, or two-dimensional tables.
An entity is a thing of significance about which information needs to be known. Examples are departments, employees, and orders.
An attribute is something that describes or qualifies an entity. For example, for the employee entity the attributes would be the employee number, name, job title, department number and so on.
A named association between entities is called a relationship.
Database Management System (DBMS)
A database management system (DBMS) is a computer program (or more typically, a suite of them) designed to manage a database (a large set of structured data), and run operations on the data requested by numerous clients.
Typical examples of DBMS use include accounting, human resources and customer support systems.
Originally found only in large organizations with the computer hardware needed to support large data sets, DBMSs have more recently emerged as a fairly standard part of any company back office.
DBMSs are found at the heart of most database applications.
Sometimes DBMSs are built around a private multitasking kernel with built-in networking support, although nowadays these functions are left to the operating system.
A database management system (DBMS) is a system, usually automated and computerized, for the management of any collection of compatible, and ideally normalized, data.
A database application is computer software written to manage the data of a particular application or problem.
As computers grew in capability, this trade-off became increasingly unnecessary and a number of general-purpose database systems emerged; by the mid-1960s there were a number of such systems in commercial use.
Interest in a standard began to grow, and Charles Bachman, author of one such product, IDS, founded the Database Task Group within CODASYL, the group responsible for the creation and standardization of COBOL.
In 1971 they delivered their standard, which generally became known as the Codasyl approach, and soon a number of commercial products based on it were available.
The Codasyl approach was based on the manual navigation of a linked dataset which was formed into a large network.
When the database was first opened, the program was handed back a link to the first record in the database, which also contained pointers to other pieces of data.
To find any particular record the programmer had to step through these pointers one at a time until the required record was returned. Simple queries like "find all the people in Sweden" required the program to walk the entire data set and collect the matching results.
There was, essentially, no concept of "find" or "search".
This might sound like a serious limitation today, but in an era when the data was most often stored on magnetic tape such operations were too expensive to contemplate anyway.
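The pointer-stepping described above can be illustrated with a minimal sketch. It is not Codasyl code: the `Record` class, the `build_chain` and `find_by_country` helpers, and the sample names are all hypothetical, and a single chain of `next` pointers stands in for the much richer network of links a real navigational database would hold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """One record in a hypothetical navigational data set."""
    name: str
    country: str
    next: Optional["Record"] = None  # pointer to the next record, if any


def build_chain(rows):
    """Link (name, country) pairs into a chain and return the first record."""
    head = None
    for name, country in reversed(rows):
        head = Record(name, country, head)
    return head


def find_by_country(first, country):
    """Walk every pointer from the first record, collecting matches."""
    matches = []
    current = first
    while current is not None:
        if current.country == country:
            matches.append(current.name)
        current = current.next  # step to the next record by hand
    return matches


# Opening the "database" hands back only the first record.
first_record = build_chain([
    ("Anna", "Sweden"),
    ("Bob", "Canada"),
    ("Carl", "Sweden"),
])
print(find_by_country(first_record, "Sweden"))
```

Every query of this kind visits the whole chain, because the program has no way to ask for matching records other than following the links itself.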
IBM also had its own DBMS in 1968, known as IMS. IMS was a development of software written for the Apollo program on the System/360. IMS was generally similar in concept to Codasyl, but used a strict hierarchy for its model of data navigation instead of Codasyl’s network model.
Both concepts later became known as navigational databases due to the way data was accessed, and Bachman’s 1973 Turing Award presentation was titled The Programmer as Navigator.
IMS is classified as a hierarchical database. IDS and IDMS (both CODASYL databases), as well as Cincom’s TOTAL database, are classified as network databases.
Edgar Codd worked at IBM in San Jose, California, in one of their offshoot offices that was primarily involved in the development of hard disk systems.
He was unhappy with the navigational model of the Codasyl approach, notably the lack of a “search” facility, which was becoming increasingly useful when the database was stored on disk instead of tape.
In 1970 he wrote a number of papers that outlined a new approach to database construction that eventually culminated in the groundbreaking A Relational Model of Data for Large Shared Data Banks.
In this paper he described a new system for storing and working with large databases. Instead of records being stored in some sort of linked list of free-form records as in Codasyl, Codd’s idea was to use a "table" of fixed-length records.
A linked-list system would be very inefficient when storing "sparse" databases where some of the data for any one record could be left empty.
The relational model solved this by splitting the data into a series of normalized tables, with optional elements being moved out of the main table to where they would take up room only if needed.
For instance, a common use of a database system is to track information about users: their name, login information, various addresses and phone numbers.
In the navigational approach all of these data would be placed in a single record, and unused items would simply not be placed in the database. In the relational approach, the data would be normalized into a user table, an address table and a phone number table (for instance).
Records would be created in these optional tables only if the address or phone numbers were actually provided.
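A minimal sketch of this layout follows, using Python's standard `sqlite3` module. The table names (`users`, `addresses`, `phones`), the column names, and the sample rows are assumptions made for illustration, as is the choice of the login name as the linking column; the text only names the three kinds of table.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        login TEXT PRIMARY KEY,
        name  TEXT NOT NULL
    );
    CREATE TABLE addresses (
        login  TEXT NOT NULL REFERENCES users(login),
        street TEXT NOT NULL
    );
    CREATE TABLE phones (
        login  TEXT NOT NULL REFERENCES users(login),
        number TEXT NOT NULL
    );
""")

# Every user gets a row in the main table.
conn.execute("INSERT INTO users VALUES ('alice', 'Alice Example')")
conn.execute("INSERT INTO users VALUES ('bob', 'Bob Example')")

# Optional rows exist only where the data was actually provided:
# alice has two phone numbers and one address, bob has neither.
conn.execute("INSERT INTO phones VALUES ('alice', '555-0100')")
conn.execute("INSERT INTO phones VALUES ('alice', '555-0101')")
conn.execute("INSERT INTO addresses VALUES ('alice', '1 Sample Street')")


def row_count(table):
    """Return the number of rows in one of the hypothetical tables above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


for table in ("users", "addresses", "phones"):
    print(table, row_count(table))
```

The user with no address or phone number occupies space only in the main table, which is the saving the normalized design is meant to provide for sparse data.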
Linking the information back together is the key to this system. In the relational model some bit of information was used as a “key”, uniquely defining a particular record.
- When information was being collected about a user, information stored in the optional (or related) tables would be found by searching for this key.
- For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as its key. This "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for.
- Just as the navigational approach would require programs to loop in order to collect records, the relational approach would require loops to collect information about any one record.
- Codd’s solution to the necessary looping was a set-oriented language, a suggestion that would later spawn the ubiquitous SQL.
- Using a branch of mathematics known as tuple calculus, he demonstrated that such a system could support all the operations of normal databases (inserting, updating etc.) as well as providing a simple system for finding and returning sets of data in a single operation.
- Codd’s paper was picked up by two people at Berkeley, Eugene Wong and Michael Stonebraker. They started a project known as INGRES using funding that had already been allocated for a geographical database project, using student programmers to produce code.
- Beginning in 1973, INGRES delivered its first test products, which were generally ready for widespread use in 1979.
- During this time a number of people had moved “through” the group; perhaps as many as 30 people worked on the project, about five at a time.
- INGRES was similar to System R in a number of ways, including the use of a "language" for data access, known as QUEL. QUEL was in fact relational, having been based on Codd’s own Alpha language, but has since been corrupted to follow SQL, thus violating much the same concepts of the relational model as SQL itself.
- IBM itself did only one test implementation of the relational model, PRTV, and a production one, Business System 12, both now discontinued.
- Honeywell did MRDS for Multics, and now there are two new implementations: Alphora Dataphor and Rel.
- All other DBMS implementations usually called relational are actually SQL DBMSs.
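The key-based re-linking and set-oriented retrieval described earlier can be contrasted in a short sketch. It uses SQL through Python's standard `sqlite3` module purely because that is readily available, not because it matches Codd's Alpha or QUEL. The table names, column names, sample data, and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (login TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE phones (login TEXT NOT NULL, number TEXT NOT NULL);
    INSERT INTO users  VALUES ('alice', 'Alice Example'), ('bob', 'Bob Example');
    INSERT INTO phones VALUES ('alice', '555-0100'), ('alice', '555-0101');
""")


def phones_by_loop(login):
    """Record-at-a-time: read every phone row and filter in program code."""
    numbers = []
    for row_login, number in conn.execute(
        "SELECT login, number FROM phones ORDER BY number"
    ):
        if row_login == login:  # the program compares keys itself
            numbers.append(number)
    return numbers


def phones_by_query(login):
    """Set-oriented: one declarative request returns the whole matching set."""
    rows = conn.execute(
        "SELECT p.number FROM users AS u "
        "JOIN phones AS p ON p.login = u.login "
        "WHERE u.login = ? ORDER BY p.number",
        (login,),
    )
    return [number for (number,) in rows]


print(phones_by_loop("alice"))
print(phones_by_query("alice"))
```

Both functions return the same numbers. The difference is where the looping lives: in the first it is written by the programmer, in the second the key comparison is stated once and the database system finds the matching set.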