Skip to main content

Query Algorithms

Queries are useful as they reduce the number of scan operations on the underlying physical file structure.

For each relational operation, there can be several different access paths to the particular records needed. The query execution engine can have many algorithms designed to process particular relational operations and access path combinations.

Some examples of algorithms for both the select and join operations are given below.

(a) Selection Algorithms

The select operation must search through the data files for records meeting the selection criteria.

The following are some examples of simple (one attribute):

Every record from the file is read and compared to the selection criteria.


- Execution cost for searching on a non-key attribute is **br**, where **br** is the number of blocks in the file representing relation _r_
- On a key attribute, the average cost is **br / 2**, with a worst case of **br**

(ii) S2. Binary Search on Primary Key​

A binary search, on equality, performed on a primary key attribute (file ordered by the key) has a worst-case cost of:


⌈log⁑2(br)βŒ‰\lceil \log_2(br) \rceil

⌈log2(br)βŒ‰

This can be significantly more efficient than linear search, particularly for a large number of records.


(iii) S3. Search using a Primary Index on Equality​

With a B⁺-tree index, an equality comparison on a key attribute will have a worst-case cost of:

  • Height of the tree (in the index file)
  • Plus one to retrieve the record from the data file

An equality comparison on a non-key attribute will be the same except that multiple records may meet the condition. In that case, we add the number of blocks containing the records to the cost.

(iv) S4. Search using a Primary Index on Comparison​

When the comparison operators (**<, ≀, >, β‰₯**) are used to retrieve multiple records from a file sorted by the search attribute, the first record satisfying the condition is located and the total blocks before (**<, ≀**) or after (**>, β‰₯**) is added to the cost of locating the first record.
```text
---

# **(b) Join Algorithms**

Like selection, the join operation can be implemented in a variety of ways. In terms of disk accesses, join operations can be very expensive, so implementing and utilizing efficient join algorithms is critical in minimizing a query’s execution time.

The following are four well-known types of join algorithms:

---

## **(i) J1. Nested-Loop Join**

This algorithm consists of an inner for loop nested within an outer for loop.

To illustrate this algorithm, we use the following notations:

- **r, s** β†’ Relations r and s
- **tr** β†’ Tuple (record) in relation r
- **ts** β†’ Tuple (record) in relation s
- **nr** β†’ Number of records in relation r
- **ns** β†’ Number of records in relation s
- **br** β†’ Number of blocks with records in relation r
- **bs** β†’ Number of blocks with records in relation s

---

```text

for each tuple tr in r
for each tuple ts in s
if join condition is true for (tr, ts)
add tr + ts to the result

Cost Analysis​

  • Each record in outer relation r is scanned once
  • Each record in inner relation s is scanned nr times
  • Total record scans = nr Γ— ns

If only one block of each relation fits into memory:

  • Cost = nr Γ— bs + br

If all blocks of both relations fit into memory:

  • Cost = br + bs

If all blocks of relation s (inner relation) fit into memory:

  • Cost = br + bs

Therefore, if one relation can fully fit into memory, it is better to use it as the inner relation.

Note​

Even though nested-loop join is expensive in worst cases, it has an advantage:

  • It does not impose restrictions on access paths
  • Works regardless of join condition

(ii) J2. Index Nested Loop Join​

This algorithm is similar to nested-loop join, but:

  • Uses an index on the inner relation’s join attribute
  • Avoids full scan of the inner relation

Each lookup in the inner loop becomes an equality selection

Let:

  • c = cost of lookup

(iii) J3. Sort-Merge Join​

This algorithm:

  • Works for natural joins and equi-joins
  • Requires both relations (r and s) to be sorted on join attributes

Key Points​

  • Each record in both relations is scanned only once
  • Worst and best-case cost:

br+bsbr + bs

br+bs
  • Variants are used when:
    • Data is unsorted
    • Secondary indices exist

(iv) J4. Hash Join​

Like sort-merge join, hash join is used for:

  • Natural joins
  • Equi-joins

Working​

  • Uses two hash tables (one per relation)
  • Records are partitioned based on hash values of join attributes
  • Matching partitions are joined

Steps​

  1. Partition both relations using hash function
  2. Build hash table for smaller relation
  3. Probe with records from other relation
  4. Output matching tuples

Note​

  • Collisions may occur
  • Still efficient due to partitioning

Query Optimization

The function of a DBMS query optimization engine is to:

Find an evaluation plan that reduces total execution cost

Heuristic-Based Optimization​

  • Uses mathematical rules
  • Transforms query into more efficient form

Based On​

  • Relational algebra expressions
  • Query trees
  • Cost estimation
  • Query semantics

Goal​

Select:

  • Which rules to apply
  • When to apply
  • How to apply

Typically, optimization works on internal representations such as:

  • Query tree
  • Generated from high-level language (e.g., SQL)

Expression

After a high-level query (i.e., SQL statement) has been parsed into an equivalent relational algebra expression, the query optimizer can perform heuristic rules on the expression and tree to transform the expression and tree into equivalent but optimized forms.