Large Language Models

Large Language Models (LLMs) are the backbone of modern text-generation products like ChatGPT and Claude, and of openly released model families like Llama.

What is an LLM?

At its core, an LLM is a very large neural network trained on an enormous corpus of text, much of it drawn from the public internet. Its fundamental goal is surprisingly simple: predict the next token (roughly, the next word or word fragment) in a sequence.

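When you ask ChatGPT a question, it doesn't "think" about the answer the way a person would, and it doesn't look up a stored reply. Based on patterns learned from the text it was trained on, the model computes a probability for every possible next token, picks one (usually a likely one, though not always the single most likely), appends it to the text, and repeats, one token at a time.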
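To make the "predict, pick, repeat" step concrete, here is a minimal sketch of a single prediction step. The scores in `toy_logits` are made up for illustration and the function names are hypothetical; a real model produces scores for tens of thousands of tokens, but turning scores into probabilities (softmax) and then choosing a token works along these lines.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

def sample_next_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical scores for the token that follows "The cat sat on the".
toy_logits = {" mat": 3.0, " sofa": 2.0, " roof": 1.0, " moon": -1.0}

probs = softmax(toy_logits)
greedy = max(probs, key=probs.get)                     # always the likeliest token
sampled = sample_next_token(probs, random.Random(0))   # weighted random choice

print(probs)
print("greedy:", greedy)
print("sampled:", sampled)
```

Greedy selection always returns the highest-probability token, while sampling occasionally picks a less likely one, which is one reason the same prompt can produce different answers.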

Key Terminology

1. Parameters

You often hear about models having "70 Billion" or "1 Trillion" parameters. Parameters are the internal variables (weights and biases) that the neural network learned during training. Generally, models with more parameters tend to be better at reasoning and retaining complex contextual knowledge, though they need more hardware to run: the memory required just to hold the weights grows roughly in proportion to the parameter count.
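As a rough back-of-the-envelope check on hardware cost, the memory needed just to store a model's weights is the parameter count multiplied by the bytes used per parameter. The sketch below assumes three common storage precisions; the function name is hypothetical, and the estimate ignores everything else a running model needs (activations, caches, and so on).

```python
def weight_memory_gb(num_parameters, bytes_per_parameter):
    """Memory needed just to store the weights, in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

# Common storage precisions; which one a given model uses is an assumption here.
BYTES_PER_PARAMETER = {"float32": 4, "float16": 2, "int8": 1}

for name, count in [("7B", 7e9), ("70B", 70e9)]:
    for precision, size in BYTES_PER_PARAMETER.items():
        gb = weight_memory_gb(count, size)
        print(f"{name} @ {precision}: {gb:.0f} GB")
```

Under these assumptions a 70-billion-parameter model stored at 2 bytes per parameter needs about 140 GB for its weights alone, which is why large models are typically split across several accelerators or stored at lower precision.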

2. Tokens

LLMs don't read words; they read "tokens." A token can be a whole word (like "apple") or part of a word (like "un" and "believable"). Every model has a fixed "context window" (token limit) on how much text it can take into account at once. If you paste a 500-page book into ChatGPT and it exceeds that limit, the input is either rejected or cut down, and the model has no access to the parts that fall outside the window, such as the beginning of the book.
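The sketch below illustrates both ideas with a toy tokenizer and a toy context window. The vocabulary, the greedy longest-match rule, and the "keep the most recent tokens" truncation policy are all simplifying assumptions for illustration; production tokenizers learn their vocabulary from data, and applications handle overflow in different ways.

```python
def toy_tokenize(text, vocab):
    """Greedy longest-match split of each word into pieces found in vocab."""
    tokens = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest remaining piece first; fall back to one character.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in vocab or end == start + 1:
                    tokens.append(piece)
                    start = end
                    break
    return tokens

def fit_to_context(tokens, context_window):
    """Keep only the most recent tokens that fit in the window."""
    return tokens[-context_window:] if context_window > 0 else []

# Hypothetical vocabulary, chosen to match the examples in the text.
toy_vocab = {"apple", "un", "believable", "an", "is"}

tokens = toy_tokenize("An apple is unbelievable", toy_vocab)
print(tokens)                     # ['an', 'apple', 'is', 'un', 'believable']
print(fit_to_context(tokens, 3))  # ['is', 'un', 'believable']
```

With a window of three tokens, the first two tokens are dropped, so anything the model says next cannot depend on them.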

3. Pre-training vs Fine-Tuning

  • Pre-training: The expensive, often months-long process of training the model on very large volumes of raw text so that it picks up grammar, facts, and some reasoning patterns.
  • Fine-Tuning: Taking that raw "foundation model" and training it further on a smaller, targeted dataset so it behaves the way you want (e.g., following instructions as a helpful assistant, or using medical jargon correctly).
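The two stages can be mimicked with a toy "model" that simply counts which word tends to follow which. This is only an analogy under loud assumptions: real LLMs adjust billions of weights by gradient descent rather than counting word pairs, and the corpora and function names below are invented for illustration. The shape of the process is the same, though: train on broad text first, then continue training on narrower text and watch the predictions shift.

```python
from collections import Counter, defaultdict

def train(counts, corpus):
    """Update word-pair counts in place: how often each word follows another."""
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

# Invented example data: broad text first, then domain-specific text.
general_corpus = [
    "the patient reader waited",
    "a patient reader smiled",
    "the patient was calm",
]
medical_corpus = [
    "the patient presented with fever",
    "the patient presented with cough",
    "each patient presented with pain",
]

model = train(defaultdict(Counter), general_corpus)  # stands in for pre-training
before = predict_next(model, "patient")

train(model, medical_corpus)                         # stands in for fine-tuning
after = predict_next(model, "patient")

print(before, "->", after)  # reader -> presented
```

The same starting model gives a different prediction after the second round of training, because the narrower data has outweighed what it learned from the general text.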