Query Algorithms

Queries are useful as they reduce the number of scan operations on the underlying physical file structure.

For each relational operation, there can be several different access paths to the particular records needed. The query execution engine can have many algorithms designed to process particular relational operations and access path combinations.

Some examples of algorithms for both the select and join operations are given below.

(a) Selection Algorithms

The select operation must search through the data files for records meeting the selection criteria.

The following are some examples of simple (one attribute):

(i) S1. Linear Search

Every record from the file is read and compared to the selection criteria.

- Execution cost for searching on a non-key attribute is **br**, where **br** is the number of blocks in the file representing relation _r_
- On a key attribute, the average cost is **br / 2**, with a worst case of **br**

(ii) S2. Binary Search on Primary Key

A binary search, on equality, performed on a primary key attribute (file ordered by the key) has a worst-case cost of:

⌈log⁡2(br)⌉\lceil \log_2(br) \rceil

⌈log2(br)⌉

This can be significantly more efficient than linear search, particularly for a large number of records.

(iii) S3. Search using a Primary Index on Equality

With a B⁺-tree index, an equality comparison on a key attribute will have a worst-case cost of:

Height of the tree (in the index file)
Plus one to retrieve the record from the data file

An equality comparison on a non-key attribute will be the same except that multiple records may meet the condition. In that case, we add the number of blocks containing the records to the cost.

(iv) S4. Search using a Primary Index on Comparison

When the comparison operators (**<, ≤, >, ≥**) are used to retrieve multiple records from a file sorted by the search attribute, the first record satisfying the condition is located and the total blocks before (**<, ≤**) or after (**>, ≥**) is added to the cost of locating the first record.
```text
---

# **(b) Join Algorithms**

Like selection, the join operation can be implemented in a variety of ways. In terms of disk accesses, join operations can be very expensive, so implementing and utilizing efficient join algorithms is critical in minimizing a query’s execution time.

The following are four well-known types of join algorithms:

---

## **(i) J1. Nested-Loop Join**

This algorithm consists of an inner for loop nested within an outer for loop.

To illustrate this algorithm, we use the following notations:

- **r, s** → Relations r and s
- **tr** → Tuple (record) in relation r
- **ts** → Tuple (record) in relation s
- **nr** → Number of records in relation r
- **ns** → Number of records in relation s
- **br** → Number of blocks with records in relation r
- **bs** → Number of blocks with records in relation s

---

```text

for each tuple tr in r
    for each tuple ts in s
        if join condition is true for (tr, ts)
            add tr + ts to the result

Cost Analysis

Each record in outer relation r is scanned once
Each record in inner relation s is scanned nr times
Total record scans = nr × ns

If only one block of each relation fits into memory:

Cost = nr × bs + br

If all blocks of both relations fit into memory:

Cost = br + bs

If all blocks of relation s (inner relation) fit into memory:

Cost = br + bs

Therefore, if one relation can fully fit into memory, it is better to use it as the inner relation.

Note

Even though nested-loop join is expensive in worst cases, it has an advantage:

It does not impose restrictions on access paths
Works regardless of join condition

(ii) J2. Index Nested Loop Join

This algorithm is similar to nested-loop join, but:

Uses an index on the inner relation’s join attribute
Avoids full scan of the inner relation

Each lookup in the inner loop becomes an equality selection

Let:

c = cost of lookup

(iii) J3. Sort-Merge Join

This algorithm:

Works for natural joins and equi-joins
Requires both relations (r and s) to be sorted on join attributes

Key Points

Each record in both relations is scanned only once
Worst and best-case cost:

br+bsbr + bs

br+bs

Variants are used when:
- Data is unsorted
- Secondary indices exist

(iv) J4. Hash Join

Like sort-merge join, hash join is used for:

Natural joins
Equi-joins

Working

Uses two hash tables (one per relation)
Records are partitioned based on hash values of join attributes
Matching partitions are joined

Steps

Partition both relations using hash function
Build hash table for smaller relation
Probe with records from other relation
Output matching tuples

Note

Collisions may occur
Still efficient due to partitioning

Query Optimization

The function of a DBMS query optimization engine is to:

Find an evaluation plan that reduces total execution cost

Heuristic-Based Optimization

Uses mathematical rules
Transforms query into more efficient form

Based On

Relational algebra expressions
Query trees
Cost estimation
Query semantics

Goal

Select:

Which rules to apply
When to apply
How to apply

Typically, optimization works on internal representations such as:

Query tree
Generated from high-level language (e.g., SQL)

Expression

After a high-level query (i.e., SQL statement) has been parsed into an equivalent relational algebra expression, the query optimizer can perform heuristic rules on the expression and tree to transform the expression and tree into equivalent but optimized forms.

(a) Selection Algorithms

(i) S1. Linear Search​

(ii) S2. Binary Search on Primary Key​

(iii) S3. Search using a Primary Index on Equality​

(iv) S4. Search using a Primary Index on Comparison​

Cost Analysis​

Note​

(ii) J2. Index Nested Loop Join​

(iii) J3. Sort-Merge Join​

Key Points​

(iv) J4. Hash Join​

Working​

Steps​

Note​

Query Optimization

Heuristic-Based Optimization​

Based On​

Goal​

Expression

(i) S1. Linear Search

(ii) S2. Binary Search on Primary Key

(iii) S3. Search using a Primary Index on Equality

(iv) S4. Search using a Primary Index on Comparison

Cost Analysis

Note

(ii) J2. Index Nested Loop Join

(iii) J3. Sort-Merge Join

Key Points

(iv) J4. Hash Join

Working

Steps

Note

Heuristic-Based Optimization

Based On

Goal