Structured Query Language (SQL)

The Structured Query Language (SQL) is the set of instructions used to interact with a relational database.

In fact, SQL is the only language that most databases actually understand. Whenever you interact with such a database, the software translates your commands (whether they are mouse clicks or form entries) into SQL statements that the database knows how to interpret.

SQL has three major components:

Data Manipulation Language (DML)
Data Definition Language (DDL)
Data Control Language (DCL)

1. Data Definition Language (DDL)

The Data Definition Language (DDL) contains commands that are less frequently used. DDL commands modify the actual structure of a database rather than the database’s contents.

Examples of commonly used DDL commands include:

CREATE TABLE (to generate a new database table)
ALTER TABLE (to modify the structure of a database table)
DROP TABLE (to delete a database table)

2. Data Manipulation Language (DML)

The Data Manipulation Language (DML) contains the subset of SQL commands used most frequently—those that simply manipulate the contents of a database in some forms.

The four most common DML commands are:

SELECT (retrieve information from a database)
INSERT (add new information to a database)
UPDATE (modify information currently stored in a database)
DELETE (remove information from a database)

3. Data Control Language (DCL)

The Data Control Language (DCL) is used to manage user access to databases.

It consists of two commands:

GRANT (used to add database permissions for a user)
REVOKE (used to take away existing permissions)

These two commands form the core of the relational database security model.

Query By Example (QBE)

QBE or Query By Example is a language similar to SQL and is used for the same purpose as SQL.

It is different from SQL and from most other database query languages, in having a graphical user interface that allows users to write queries by creating example tables on the screen.

A user needs minimal information to get started and the whole language contains relatively few concepts. QBE is suitable for small operations. It means QBE is especially suited for queries that are not too complex and can be expressed in terms of a few tables.

QBE was developed at IBM and QBE is an IBM trademark.

1. Basic QBE Queries

A user writes queries by creating example tables. QBE uses domain variables to create example tables.

Domain of a variable is determined by the column in which it appears and variable symbols are prefixed with underscore (_) to distinguish them from constants. Constants, including strings, appear unquoted, in contrast to SQL.

The fields that should appear in the answer are specified by using the command P, which stands for print. The fields containing this command are analogous to the target list in SQL.

We introduce QBE through example queries involving just one relation. To print the names and ages of all sailors, we would create the following example table:

| Sailors | sid | sname | rating | age   |
| ------- | --- | ----- | ------ | ----- |
|         |     | P.\_N |        | P.\_A |

A variable that appears only once can be omitted; QBE supplies a unique new name internally. Thus, the previous query could also be written by omitting variables \_N and \_A, leaving just P. in the sname and age columns.

The query corresponds to the following DRC query:

{⟨N, A⟩ | ∃ I, T (⟨I, N, T, A⟩ ∈ Sailors)}

A convenient shorthand notation is that if we want to print all fields in some relation, we can place P. under the name of the relation. This notation is like the SELECT * convention in SQL. It is equivalent to placing a P. in every field.

| Sailors | sid | sname | rating | age |
| ------- | --- | ----- | ------ | --- |
| P.      |     |       |        |     |

Selections are expressed by placing a constant in some field:

| Sailors | sid | sname | rating | age |
| ------- | --- | ----- | ------ | --- |
| P.      |     |       | 10     |     |

Placing a constant, say 10, in a column is the same as placing the condition = 10.

This query is very similar in form to the equivalent DRC query:

{⟨I, N, 10, A⟩ | ⟨I, N, 10, A⟩ ∈ Sailors}

We can use other comparison operators

(<, >, ≤, ≥, ≠) as well.

Example:

- rating < 10 → retrieve sailors with rating less than 10
- rating ≠ 10 → retrieve sailors whose rating is not equal to 10

The expression ¬10 in an attribute column is the same as ≠ 10.

2. Aggregates

Like SQL, QBE supports aggregate operations such as AVG, COUNT, MAX, MIN and SUM.

By default, these aggregate operators do not eliminate duplicates, with the exception of COUNT, which does eliminate duplicates. To eliminate duplicate values, the variants AVG.UNQ and SUM.UNQ must be used (irrelevant for MIN and MAX).

Curiously, there is no variant of COUNT that does not eliminate duplicates.

Example Table

sid	sname	rating	age
22	dustin	7	45.0
58	rusty	10	35.0
44	horatio	7	35.0

The query will print the value as 38.3. It means the value 35 is counted twice.

If we want to count each age once, we have to specify P.AVG._A as follows:

| Sailors | sid | sname | rating | age       |
| ------- | --- | ----- | ------ | --------- |
|         |     |       |        | \_A       |
|         |     |       |        | P.AVG.\_A |

In this way, we will get the result as 40.

3. The Conditions Box

Simple conditions can be expressed directly in columns of the example tables. For more complex conditions, QBE provides a feature called a conditions box.

Conditions boxes are used to:

- Express a condition involving two or more columns (e.g., R/A > 0.2)
- Express a condition involving an aggregate operation on a group (e.g., AVG.\_A > 30)

This is similar to the HAVING clause in SQL.

Example

The following query prints those ratings for which the average age is more than 30:

| Sailors | sid | sname | rating | age | Conditions   |
| ------- | --- | ----- | ------ | --- | ------------ |
|         |     |       | G.P    | \_A | AVG.\_A > 30 |

AND / OR Conditions

We can express conditions involving AND and OR operators.

Example: Print names of sailors who are younger than 20 or older than 30:

| Sailors | sid | sname | rating | age | Conditions           |
| ------- | --- | ----- | ------ | --- | -------------------- |
|         |     | P.    |        | \_A | \_A < 20 OR 30 < \_A |

Query Processing and Optimization

All database systems must be able to respond to requests for information from the user i.e., process queries. Obtaining the desired information from a database efficiently is the goal of query processing and optimization.

A database system in a predictable and reliable fashion is the scientific art of query processing. Getting these results back in a timely manner deals with the technique of query optimization.

For example, consider a database of cars and drivers. Each car and driver are unique and each car can have 0 or more drivers but only one owner. A driver can own and drive multiple cars. There are three relations: cars, drivers, and car_driver with the following attributes:

Relation: Vehicles

Attribute	Length	Whether a key
Vehicle_id	20	Yes
Manufacturer	20	No
Model	20	No
Year	4	No
Owner	20	No

Relation: Drivers

Attribute	Length	Whether a key
Driver_id	10	Yes
First_name	20	No
Last_name	20	No
Age	2	No

Relation: Car_Driver

Attribute	Length	Whether a key
Car_Name	15	Yes
Driver_id	10	No

The Query Processing

There are three phases that a query passes through during the DBMS processing of that query:

(a) Parsing and translation
(b) Optimization
(c) Evaluation

(a) Parsing and Translation

The parsing and translation phase has three steps:

(i) During the first step a query submitted to a DBMS is to convert the query into a form usable by the query processing engine. Usually the query is represented in a high-level language such as SQL by using certain sequences of characters to represent various types of tokens such as keywords, operators, operands, literal strings, etc.

Like all languages, there are rules (syntax and grammar) that govern how the tokens can be combined into understandable (i.e., valid) statements.

(ii) During the second step the primary job of parser is performed. Under this step the parser extracts the tokens from the raw string of characters and translate them into the corresponding internal data elements (i.e., relational algebra operations and operands) and structures (i.e., query tree, query graph).

(iii) During the third step the parser verifies the validity and syntax of the original query string.

(b) Optimization

During the optimization phase, the query processor applies rules to the internal data structures of the query to transform these structures into equivalent but more efficient representations.

The rules can be based upon:

Mathematical models of relational algebra expressions
Tree (heuristics)
Cost estimates of different algorithms applied to operations
Semantics within the query and the relations involved

Selecting the proper rules, when to apply them, and how they are applied is the function of the query optimization engine.

(c) Evaluation

The last phase of this process is the evaluation. During this phase, a query is evaluated.

The best evaluation plan candidate generated by the optimization engine is selected and executed.
There may be multiple methods of executing a query.
Queries can be executed:
- Sequentially
- In parallel (independent processes)
- As interdependent pipelines (threads/processes)

Whatever method is selected, the actual results should remain the same.

The Types of Indices

Indices are very useful for a database. The utilization of indices can reduce the execution time of operations such as SELECT and JOIN.

Some common types of indices used in DBMS are:

(a) Dense Index

In a dense index:

The data file is ordered by the search key
Every search key value has a separate index record
This structure requires only a single seek to find the first occurrence of a set of contiguous records with the desired search value

(b) Sparse Index

In sparse index, the data file is ordered by the index search key and only some of the search key values have corresponding index records. Each index record’s data file pointer points to the first data file record with the search key value.

While this structure can be less efficient (in terms of number of disk accesses) than a dense index to find the desired records, it requires less storage space and less overhead during insertion and deletion operations.

(c) Primary Index

In primary index, the data file is ordered by the attribute that is also the search key in the index file. Primary indices can be dense or sparse. This is also referred to as an index sequential file.

For scanning through a relation’s records in sequential order by a key value, this is one of the fastest and more efficient structures—locating a record has a cost of 1 seek and the contiguous makeup of the records in sorted order minimizes the number of blocks that have to be read.

However, after large numbers of insertions and deletions, the performance can degrade quite quickly and the only way to restore the performance is to perform a reorganization.

(d) Secondary Index

In secondary index, the data file is ordered by an attribute that is different from the search key in the index file. Secondary indices must be dense.

(e) Multi-level Index

In multi-level index, an index structure consists of 2 or more tiers of records, where an upper tier’s records point to associated index records of the tier below. The bottom tier’s index records contain the pointers to the data file records.

Multi-level indices can be used, for instance, to reduce the number of disk block reads needed during a binary search.

(f) Clustering Index

In the clustering index, a two-level index structure is used:

The first level contains the clustering field value in one field
A second field points to a block (of second-level records)

The records in the second level have one field that points to an actual data file record or to another second-level block.

(g) B⁺-Tree Index

In a B⁺-tree index, the multi-level index is organized as a balanced tree structure.

Finding a search key value in a B⁺-tree is proportional to the height of the tree
Maximum number of seeks required is: ⌈logᵦ(n)⌉

While this, on average, is more than a single-level dense index that requires only one seek, the B⁺-tree structure has a distinct advantage in that it does not require reorganization—it is self-optimizing because the tree is kept balanced during insertions and deletions.

Many mission-critical applications require high performance with near-100% uptime, which cannot be achieved with structures requiring reorganization. The leaves of the B⁺-tree are used to reorganize the data file.

1. Data Definition Language (DDL)​

2. Data Manipulation Language (DML)​

3. Data Control Language (DCL)​