File Organization and Indexing
File Organization governs how records inside a Table are actually stored on physical disk drives.
Common Organizations
-
Heap File Organization Records are placed anywhere in the file where there is free space. They are completely unordered. Fast for insertions, terrible for structured reading.
-
Sequential File Organization Records are stored in a strict sequential order, dictated by a
Search Key. Excellent for range queries. -
Hashing File Organization A hash function is computed on a key attribute, outputting the direct physical location block. Offers
O(1)retrieval speeds.
Indexing
Whenever we run a query across millions of rows, scanning every record (linear search) is exceptionally slow. An index is a secondary data structure built to vastly increase the speed of data retrieval operations.
Types of Indexing:
- Primary Index: Ordered index built on the Primary Key.
- Secondary Index: An auxiliary index built on secondary non-key fields (like
EmailorUsername).
Trees in Indexing
B-Tree
A balanced multi-way search tree. All leaf nodes sit at the exact same depth. Internal nodes carry both keys and actual pointer records. Can make sequential traversal complicated.
B+ Tree
The most dominant indexing structure in modern databases (MySQL InnoDB, PostgreSQL). In a B+ Tree, all data pointers are stored exclusively at the leaf nodes. The internal nodes only store routing keys. The leaf nodes are doubly-linked together, providing incredibly fast multi-page sequential scans.