Skip to main content

File Organization and Indexing

File Organization governs how records inside a Table are actually stored on physical disk drives.

Common Organizations

  1. Heap File Organization Records are placed anywhere in the file where there is free space. They are completely unordered. Fast for insertions, terrible for structured reading.

  2. Sequential File Organization Records are stored in a strict sequential order, dictated by a Search Key. Excellent for range queries.

  3. Hashing File Organization A hash function is computed on a key attribute, outputting the direct physical location block. Offers O(1) retrieval speeds.


Indexing

Whenever we run a query across millions of rows, scanning every record (linear search) is exceptionally slow. An index is a secondary data structure built to vastly increase the speed of data retrieval operations.

Types of Indexing:

  • Primary Index: Ordered index built on the Primary Key.
  • Secondary Index: An auxiliary index built on secondary non-key fields (like Email or Username).

Trees in Indexing

B-Tree

A balanced multi-way search tree. All leaf nodes sit at the exact same depth. Internal nodes carry both keys and actual pointer records. Can make sequential traversal complicated.

B+ Tree

The most dominant indexing structure in modern databases (MySQL InnoDB, PostgreSQL). In a B+ Tree, all data pointers are stored exclusively at the leaf nodes. The internal nodes only store routing keys. The leaf nodes are doubly-linked together, providing incredibly fast multi-page sequential scans.