Relational Database/Model

Database Systems Design, Implementation, and Management, 6e
Chapter 2: Data Models

ISBN: 061921323X Author: Peter Rob, Carlos M. Coronel
copyright © 2005 Course Technology

2.4.3 The Relational Model

The relational model, first developed by E. F. Codd (of IBM) in 1970, represented a major breakthrough for both users and designers. To use an analogy, the relational model produced an “automatic transmission” database to replace the “standard transmission” databases that preceded it. Its conceptual simplicity set the stage for a genuine database revolution.

Codd’s work was considered ingenious but impractical in 1970. The relational model’s conceptual simplicity was bought at the expense of computer overhead; computers at that time lacked the power to implement the relational model. Fortunately, computer power grew exponentially, as did operating system efficiency. Better yet, the cost of computers diminished rapidly even as their power grew. Today, even microcomputers, costing but a fraction of what their mainframe ancestors did, can run sophisticated relational database software such as Informix, Oracle, Ingress, DB2, and other mainframe relational software.

The relational database model presented in this chapter is designed to introduce a more detailed discussion in the next chapter. In fact, the relational model is so important that it will serve as the basis for our discussions in most of the remaining chapters.
Basic Structure

The relational data model is implemented through a very sophisticated relational database managementsystem (RDBMS). The RDBMS performs the same basic functions provided by the hierarchical and network DBMS systems, plus a host of other functions that make the relational data model easier to understand and to implement.

Arguably the most important advantage of the RDBMS is its ability to let the user/designer operate in a human logical environment. The RDBMS manages all of the complex physical details. Thus, the relational database is perceived by the user to be a collection of tables in which data are stored.

Each table is a matrix consisting of a series of row/column intersections. Tables, also called relations, are related to each other by sharing a common entity characteristic. For example, the CUSTOMER table in Figure 2.4 might contain a sales agent’s number that is also contained in the AGENT table.

This chapter’s databases are stored in the Databases\Student\Ch02 folder on the student CD. For example, the CUSTOMER and AGENT table contents shown in Figure 2.4 are found in the database named Ch02_InsureCo.

Figure 2.4 Linking Relational Tables

[Click to enlarge]

The common link between the CUSTOMER and AGENT tables thus enables us to match the customer to his/her sales agent, even though the customer data are stored in one table and the sales representative data are stored in another table. For example, we can easily determine that customer Dunne’s agent is Alex Alby, because for customer Dunne the CUSTOMER table’s AGENT_CODE is 501, which matches the AGENT table’s AGENT_CODE for agent Alex Alby. Although the tables are completely independent of one another, we can easily connect the data between tables. The relational model thus provides a minimum level of controlled redundancy to eliminate most of the redundancies commonly found in file systems.

By linking the AGENT and the CUSTOMER through the AGENT_CODE, we can determine that agent Leah Hahn (AGENT_CODE = 502) works with customers Ramas (10010), Smith (10012), Olowski (10013), and Brown (10016). We can also determine that customer Dunne’s agent is Alex Alby, and so on.

The relationship type (1:1, 1:M, or M:N) is often shown in a relational schema, an example of which is depicted in Figure 2.5. A relational schema is a visual representation of the relational database’s entities, the attributes within those entities, and the relationships between those entities.

As you examine Figure 2.5, note that the relational schema shows the connecting fields (in this case, AGENT_CODE) and the relationship type, 1:M. Microsoft Access, the database software application used to generate Figure 2.5, employs the 8 symbol to indicate the “many” side. In this example, the CUSTOMER represents the “many” side, because an AGENT can have many CUSTOMERs. The AGENT represents the “1” side, because each CUSTOMER has only one AGENT.

A relational table stores a collection of related entities. In this respect, the relational database table resembles a file. But there is one crucial difference between a table and a file: a table yields complete data and structural independence because it is a purely logical structure. How the data are physically stored in the database is of no concern to the user or the designer; the perception is what counts here. And this property of the relational data model, explored in depth in the next chapter, became the source of a real database revolution.

Figure 2.5 A Relational Schema

[Click to enlarge]

Like the hierarchical and network databases, the relational database is a single data repository in which data independence is maintained. However, the relational data model adds significant advantages:


Structural independence. Because the relational data model does not use a navigational data access system, data access paths are irrelevant to relational database designers, programmers, and end users. Changes in the relational database structure do not affect the DBMS’s data access in any way. Therefore, the relational data model achieves the structural independence not found in the hierarchical and network data models. (Recall that structural independence exists when it is possible to make changes in the database structure without affecting the application programs’ ability to access the data.) In contrast to the relational database, any change in the hierarchical database’s tree structure or in the network database’s sets will affect the data access paths, thus requiring changes in all data access programs.

Improved conceptual simplicity. Although the hierarchical and network data models were conceptually much simpler than the file management systems they replaced, the relational data model is even simpler at the conceptual level. Because the relational data model gives us the luxury of ignoring physical data storage characteristics, we can concentrate on the logical view of the database. That is, we can focus on the human perception of data storage rather than on the often difficult-to-comprehend manner in which the computer “sees” the same data.

Easier database design, implementation, management, and use. Because the relational model achieves both data independence and structural independence, it becomes much easier to design the database and to manage its contents.

Ad hoc query capability. One of the reasons for the relational data model’s rise to dominance in the database market is its very powerful and flexible query capability. For most relational database software, the query language is Structured Query Language (SQL). In fact, we have heard the sentiment, “If it doesn’t use SQL, it’s not relational,” expressed in professional meetings as well as in popular computer magazines such as InfoWeek. Given its importance, we will devote two chapters to SQL: Chapter 6, “An Introduction to Structured Query Language (SQL),” and Chapter 7, “Advanced SQL.”

SQL makes pure ad hoc queries a reality; it is a fourth-generation language (4GL) that allows the user to specify what must be done without specifying how it must be done. The RDBMS uses SQL to translate the user query into the technical code required to retrieve the requested data. Consequently, the relational database SQL language makes it possible to retrieve data with far less effort than any other database or file environment.

Keep in mind that any SQL-based relational database application involves three parts: a user interface, a set of tables within the database, and the SQL “engine.” The interface might include menus, query operations, and report generators. Basically, the interface allows the end user to interact with the data. Each interface is a product of the software vendor’s idea of meaningful interaction with the data. You can also design your own customized interface with the help of application generators that are now standard fare in the database software arena. The database tables simply store the data.

Largely hidden from the end user, the SQL engine does the tough database jobs. Within the RDBMS, SQL is used to create table structures, maintain the data dictionary and the system catalog, channel database table access and maintenance, and translate user requests into a format that the computer can handle. The database system’s maintenance is the crucial task that the RDBMS handles largely without the end user’s knowledge.

A powerful database management system. A good RDBMS is a much more complex piece of software than the DBMS found in the hierarchical and network databases. Its complexity stems from the fact that it performs far more (and more complex) tasks for both the system’s designers and the users. Consequently, a good RDBMS makes it possible to hide a system’s (physical) complexity from both the database designer and the end user.

Because the RDBMS performs the unseen hardware tasks, we need not focus on the physical aspects of the database. Instead, we will concentrate our efforts on the logical portion of the relational database and its design in the chapters that follow.

The relational database’s substantial advantages over the hierarchical and network databases are gained at the cost of some disadvantages:

1. Substantial hardware and system software overhead. The same RDBMS that hides most of the system’s complexity also necessitates its substantial hardware and causes its operating system overhead. It simply takes a more powerful computer to perform all the RDBMS-assigned tasks. Consequently, early relational database systems tended to be slower than the other contemporary database systems. However, with the rapid growth of hardware capability and the constant stream of operating system improvements, that “slow” label is fading.
2. Can facilitate poor design and implementation. In a sense, the relational environment’s easy-to-use asset is also its liability. Relational software, especially at the microcomputer level, is so simple to use that relatively untrained people find it easy to generate handy reports and queries without giving much thought to the need to design a proper database. As the database grows, lack of proper design slows down the system and produces the data anomalies found in file systems.
3. May promote “islands of information” problems. Because the relational data model is so easy to use, too many people find it easy to create their own database subsets and applications. Although end-user autonomy is desirable when the end users query a common database, such autonomy may also lead to the development of database subsets that are “owned” by divisions or individuals. The proliferation of such database subsets may produce the same “islands of information” problem that was characteristic of the file management system, with the same tendency to promote inconsistent data, thus causing problems in information generation and validation.

Because the relational database’s disadvantages are relatively minor when compared to its substantial advantages, it has become the dominant model for production databases. However, the ever-increasing complexity of the data environment kept database professionals searching for alternate data models, especially for models with a greater visual component.

Source: Wikipedia

A relational database matches data by using common characteristics found within the data set. The resulting groups of data are organized and are much easier for many people to understand.

For example, a data set containing all the real-estate transactions in a town can be grouped by the year the transaction occurred; or it can be grouped by the sale price of the transaction; or it can be grouped by the buyer's last name; and so on.

Such a grouping uses the relational model (a technical term for this is schema). Hence, such a database is called a "relational database."

The software used to do this grouping is called a relational database management system. The term "relational database" often refers to this type of software.

Relational databases are currently the predominant choice in storing financial records, manufacturing and logistical information, personnel data and much more.

* 1 Contents
o 1.1 Terminology
o 1.2 Relations or Tables
o 1.3 Base and derived relations
+ 1.3.1 Domain
o 1.4 Constraints
+ 1.4.1 Primary Keys
+ 1.4.2 Foreign keys
o 1.5 Stored procedures
o 1.6 Indices
* 2 Relational operations
* 3 Normalization
* 4 Relational database management systems
* 5 References

[edit] Contents

Strictly, a relational database is a collection of relations (frequently called tables). Other items are frequently considered part of the database, as they help to organize and structure the data, in addition to forcing the database to conform to a set of requirements.
[edit] Terminology

The term relational database was originally defined and coined by Edgar Codd at IBM Almaden Research Center in 1970.[1]
Relational database terminology.

Relational database theory uses a set of mathematical terms, which are roughly equivalent to SQL database terminology. The table below summarizes some of the most important relational database terms and their SQL database equivalents.
Relational term SQL equivalent
relation, base relvar table
derived relvar view, query result, result set
tuple row
attribute column
[edit] Relations or Tables
Main articles: Relation (database) and Table (database)

A relation is defined as a set of tuples that have the same attributes. A tuple usually represents an object and information about that object. Objects are typically physical objects or concepts. A relation is usually described as a table, which is organized into rows and columns. All the data referenced by an attribute are in the same domain and conform to the same constraints.

The relational model specifies that the tuples of a relation have no specific order and that the tuples, in turn, impose no order on the attributes. Applications access data by specifying queries, which use operations such as select to identify tuples, project to identify attributes, and join to combine relations. Relations can be modified using the insert, delete, and update operators. New tuples can supply explicit values or be derived from a query. Similarly, queries identify tuples for updating or deleting. It is necessary for each tuple of a relation to be uniquely identifiable by some combination (one or more) of its attribute values. This combination is referred to as the primary key.
[edit] Base and derived relations
Main articles: Relvar and View (database)

In a relational database, all data are stored and accessed via relations. Relations that store data are called "base relations", and in implementations are called "tables". Other relations do not store data, but are computed by applying relational operations to other relations. These relations are sometimes called "derived relations". In implementations these are called "views" or "queries". Derived relations are convenient in that though they may grab information from several relations, they act as a single relation. Also, derived relations can be used as an abstraction layer.
[edit] Domain
Main article: data domain

A domain describes the set of possible values for a given attribute. Because a domain constrains the attribute's values and name, it can be considered constraints. Mathematically, attaching a domain to an attribute means that "all values for this attribute must be an element of the specified set."

The character data value 'ABC', for instance, is not in the integer domain. The integer value 123, satisfies the domain constraint.
[edit] Constraints
Main article: Constraint

Constraints allow you to further restrict the domain of an attribute. For instance, a constraint can restrict a given integer attribute to values between 1 and 10. Constraints provide one method of implementing business rules in the database. SQL implements constraint functionality in the form of check constraints.

Constraints restrict the data that can be stored in relations. These are usually defined using expressions that result in a boolean value, indicating whether or not the data satisfies the constraint. Constraints can apply to single attributes, to a tuple (restricting combinations of attributes) or to an entire relation.

Since every attribute has an associated domain, there are constraints (domain constraints). The two principal rules for the relational model are known as entity integrity and referential integrity.
[edit] Primary Keys
Main article: Primary Key

A primary key uniquely defines a relationship within a database. In order for an attribute to be a good primary key it must not repeat. While natural attributes are sometimes good primary keys, Surrogate keys are often used instead. A surrogate key is an artificial attribute assigned to an object which uniquely identifies it (For instance, in a table of information about students at a school they might all be assigned a Student ID in order to differentiate them). The surrogate key has no intrinsic meaning, but rather is useful through its ability to uniquely identify a tuple.

Another common occurrence, especially in regards to N:M cardinality is the composite key. A composite key is a key made up of two or more attributes within a table that (together) uniquely identify a record. (For example, in a database relating students, teachers, and classes. Classes could be uniquely identified by a composite key of their room number and time slot, since no other class could have that exact same combination of attributes. In fact, use of a composite key such as this can be a form of data verification, albeit a weak one.)
[edit] Foreign keys
Main article: Foreign key

A foreign key is a reference to a key in another relation, meaning that the referencing table has, as one of its attributes, the values of a key in the referenced table. Foreign keys need not have unique values in the referencing relation. Foreign keys effectively use the values of attributes in the referenced relation to restrict the domain of one or more attributes in the referencing relation.

A foreign key could be described formally as: "For all tables in the referencing relation projected over the referencing attributes, there must exist a table in the referenced relation projected over those same attributes such that the values in each of the referencing attributes match the corresponding values in the referenced attributes."
[edit] Stored procedures
Main article: Stored procedure

A stored procedure is executable code that is associated with, and generally stored in, the database. Stored procedures usually collect and customize common operations, like inserting a tuple into a relation, gathering statistical information about usage patterns, or encapsulating complex business logic and calculations. Frequently they are used as an application programming interface (API) for security or simplicity. Implementations of stored procedures on SQL DBMSs often allow developers to take advantage of procedural extensions (often vendor-specific) to the standard declarative SQL syntax.

Stored procedures are not part of the relational database model, but all commercial implementations include them.
[edit] Indices
Main article: Index (database)

An index is one way of providing quicker access to data. Indices can be created on any combination of attributes on a relation. Queries that filter using those attributes can find matching tuples randomly using the index, without having to check each tuple in turn. This is analogous to using the index of a book to go directly to the page on which the information you are looking for is found i.e. you do not have to read the entire book to find what you are looking for. Relational databases typically supply multiple indexing techniques, each of which is optimal for some combination of data distribution, relation size, and typical access pattern. B+ trees, R-trees, and bitmaps.

Indices are usually not considered part of the database, as they are considered an implementation detail, though indices are usually maintained by the same group that maintains the other parts of the database.
[edit] Relational operations
Main article: Relational algebra

Queries made against the relational database, and the derived relvars in the database are expressed in a relational calculus or a relational algebra. In his original relational algebra, Codd introduced eight relational operators in two groups of four operators each. The first four operators were based on the traditional mathematical set operations:

* The union operator combines the tuples of two relations and removes all duplicate tuples from the result. The relational union operator is equivalent to the SQL UNION operator.
* The intersection operator produces the set of tuples that two relations share in common. Intersection is implemented in SQL in the form of the INTERSECT operator.
* The difference operator acts on two relations and produces the set of tuples from the first relation that do not exist in the second relation. Difference is implemented in SQL in the form of the EXCEPT or MINUS operator.
* The cartesian product of two relations is a join that is not restricted by any criteria, resulting in every tuple of the first relation being matched with every tuple of the second relation. The cartesian product is implemented in SQL as the CROSS JOIN join operator.

The remaining operators proposed by Codd involve special operations specific to relational databases:

* The selection, or restriction, operation retrieves tuples from a relation, limiting the results to only those that meet a specific criteria, i.e. a subset in terms of set theory. The SQL equivalent of selection is the SELECT query statement with a WHERE clause.
* The projection operation is essentially a selection operation in which duplicate tuples are removed from the result. The SQL GROUP BY clause, or the DISTINCT keyword implemented by some SQL dialects, can be used to remove duplicates from a result set.
* The join operation defined for relational databases is often referred to as a natural join. In this type of join, two relations are connected by their common attributes. SQL's approximation of a natural join is the INNER JOIN join operator.
* The relational division operation is a slightly more complex operation, which involves essentially using the tuples of one relation (the dividend) to partition a second relation (the divisor). The relational division operator is effectively the opposite of the cartesian product operator (hence the name).

Other operators have been introduced or proposed since Codd's introduction of the original eight including relational comparison operators and extensions that offer support for nesting and hierarchical data, among others.
[edit] Normalization
Main article: Database normalization

Normalization was first proposed by Codd as an integral part of the relational model. It encompasses a set of best practices designed to eliminate the duplication of data, which in turn prevents data manipulation anomalies and loss of data integrity. The most common forms of normalization applied to databases are called the normal forms. Normalization trades reducing redundancy for increased information entropy. Normalization is criticised because it increases complexity and processing overhead required to join multiple tables representing what are conceptually a single item[citation needed].
[edit] Relational database management systems
Main article: Relational database management system

Relational databases, as implemented in relational database management systems, have become a predominant choice for the storage of information in new databases used for financial records, manufacturing and logistical information, personnel data and much more. Relational databases have often replaced legacy hierarchical databases and network databases because they are easier to understand and use, even though they are much less efficient. As computer power has increased, the inefficiencies of relational databases, which made them impractical in earlier times, have been outweighed by their ease of use. However, relational databases have been challenged by Object Databases, which were introduced in an attempt to address the object-relational impedance mismatch in relational database, and XML databases.

The three leading commercial relational database vendors are Oracle, Microsoft, and IBM.[2]. The three leading open source implementations are MySQL, PostgreSQL, and SQLite.
[edit] References

1. ^ Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387. doi:10.1145/362384.362685.
2. ^ Gartner Says Worldwide Relational Database Market Increased 14 Percent in 2006, includes revenue estimates for leading database companies

Retrieved from ""
Categories: Databases | Database theory | Types of databases

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License