Core Concepts & Architecture
To understand how AI models are built in practice, you need to grasp core data science terminology and the hierarchy that relates AI, Machine Learning, and Deep Learning.
The AI / ML / DL Hierarchy
AI, Machine Learning, and Deep Learning are not synonyms. They nest like Russian dolls: Deep Learning is a subset of Machine Learning, which is in turn a subset of AI.
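- AI: Any technique that enables computers to mimic human intelligence or behavior, including hand-written rule-based systems that do not learn from data.
- ML: A subset of AI made up of statistical techniques that let computers learn patterns from data instead of being explicitly programmed with rules.
- DL: A subset of ML built on multi-layered neural networks, loosely inspired by the structure of the human brain. These models are typically computationally heavy.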
The Data Science Pipeline
Data, Features, and Labels
Data is the lifeblood of AI. A dataset is a collection of data, either structured (e.g., rows and columns in a table) or unstructured (e.g., free text, images, or audio).
- Features: The input variables (e.g., patient age, blood pressure, heart rate).
- Labels: The target variable you are trying to predict (e.g., whether the patient has diabetes).
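To make the features/labels distinction concrete, here is a minimal sketch that separates a small table of patient records into a feature matrix X and a label vector y. The records, their values, the field names, and the helper `split_features_labels` are all hypothetical, invented for illustration; the X/y naming is a common convention (scikit-learn uses it, for instance), not a requirement.

```python
# Hypothetical patient records; all values are made up for illustration.
records = [
    {"age": 45, "blood_pressure": 130, "heart_rate": 78, "has_diabetes": 1},
    {"age": 32, "blood_pressure": 118, "heart_rate": 70, "has_diabetes": 0},
    {"age": 58, "blood_pressure": 142, "heart_rate": 85, "has_diabetes": 1},
]

FEATURE_NAMES = ["age", "blood_pressure", "heart_rate"]  # input variables
LABEL_NAME = "has_diabetes"                              # target variable

def split_features_labels(rows, feature_names, label_name):
    """Return (X, y): X holds one feature vector per row, y the matching labels."""
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[label_name] for row in rows]
    return X, y

X, y = split_features_labels(records, FEATURE_NAMES, LABEL_NAME)
print(X)  # [[45, 130, 78], [32, 118, 70], [58, 142, 85]]
print(y)  # [1, 0, 1]
```

Each row of X lines up with the entry at the same position in y, which is the shape most learning algorithms expect.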
Training vs Testing
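A model should not be evaluated on the same data it learned from. A model that has simply memorized the training examples (a failure known as overfitting) would score well on that data yet perform poorly on new data, and testing on the training set would hide the problem. The data is therefore split before training: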
- Training Set (typically 80%): Given to the algorithm to learn the patterns.
- Testing Set (typically 20%): Held out and never shown to the algorithm during training. Used only at the very end to estimate how well the model performs on unseen data.
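The 80/20 split described above can be sketched in plain Python as follows. The function `train_test_split` here is a hypothetical, simplified stand-in written for illustration; scikit-learn ships a real function of the same name in `sklearn.model_selection` (with parameters such as `test_size` and `random_state`), but this is not its implementation. The seed value and the toy data are assumptions.

```python
import random

def train_test_split(X, y, test_fraction=0.2, seed=42):
    """Randomly hold out `test_fraction` of the rows as a test set.

    Returns (X_train, X_test, y_train, y_test). Shuffling first reduces
    the risk of a biased split when the input happens to be sorted.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Ten hypothetical samples with a single feature each; label = feature parity.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))  # 8 2
```

Splitting indices rather than the lists themselves keeps every feature vector paired with its own label after shuffling.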
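By trying different algorithms (like Random Forests or Support Vector Machines) and comparing their performance on a separate validation set carved out of the training data (or by cross-validation), data scientists select the best-performing model to deploy to production. The testing set is kept out of this comparison: choosing a model by its test score would leak information into that choice and make the final accuracy estimate overly optimistic.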
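A minimal sketch of model selection that keeps the test set untouched until the end: candidate models are compared on a validation split, and only the winner is scored once on the test set. Several assumptions keep it dependency-free: the synthetic dataset (two features, label 1 when their sum exceeds 1, with about 10% of labels flipped as noise) is invented, the 60/20/20 split is an arbitrary choice, and a majority-class baseline plus k-nearest-neighbor classifiers are swapped in for Random Forests or SVMs. The 1-NN model also illustrates memorization: it scores perfectly on its own training data.

```python
import random
from collections import Counter

def make_dataset(n, seed=0):
    """Hypothetical synthetic data: two features in [0, 1); label is 1 when
    x0 + x1 > 1, with about 10% of labels flipped as noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        label = 1 if x[0] + x[1] > 1.0 else 0
        if rng.random() < 0.1:
            label = 1 - label  # flipped labels are random, so memorizing them cannot help on new data
        X.append(x)
        y.append(label)
    return X, y

def fit_majority(X, y):
    """Baseline: always predict the most frequent training label."""
    most_common = Counter(y).most_common(1)[0][0]
    return lambda x: most_common

def make_knn(k):
    """Return a fit function for a k-nearest-neighbor classifier."""
    def fit(X, y):
        def predict(x):
            # Indices of the k training points closest to x (squared Euclidean distance).
            nearest = sorted(
                range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
            )[:k]
            # Majority vote among the neighbors' labels.
            return Counter(y[i] for i in nearest).most_common(1)[0][0]
        return predict
    return fit

def accuracy(predict, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def subset(X, y, ids):
    return [X[i] for i in ids], [y[i] for i in ids]

X, y = make_dataset(200)
ids = list(range(len(X)))
random.Random(1).shuffle(ids)
# 60% train, 20% validation, 20% test (an assumed split).
X_train, y_train = subset(X, y, ids[:120])
X_val, y_val = subset(X, y, ids[120:160])
X_test, y_test = subset(X, y, ids[160:])

candidates = {"majority": fit_majority, "1-NN": make_knn(1), "15-NN": make_knn(15)}

# Model selection looks only at validation accuracy.
val_scores = {}
for name, fit in candidates.items():
    model = fit(X_train, y_train)
    val_scores[name] = accuracy(model, X_val, y_val)
best_name = max(val_scores, key=val_scores.get)

# 1-NN memorizes: every training point is its own nearest neighbor.
one_nn = candidates["1-NN"](X_train, y_train)
train_acc_1nn = accuracy(one_nn, X_train, y_train)

# The test set is used exactly once, for the chosen model.
final_model = candidates[best_name](X_train, y_train)
test_score = accuracy(final_model, X_test, y_test)

print("validation accuracy:", val_scores)
print("1-NN training accuracy:", train_acc_1nn)
print("selected:", best_name, "test accuracy:", test_score)
```

A common variation refits the chosen model on the training and validation data combined before the single test evaluation; it is omitted here for brevity.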