1. Supervised learning
The model trains on labeled examples: input → correct output. It's the classic recipe for spam classifiers, sentiment models, and image tagging. It needs labeled data, which is expensive at scale. That's why foundation models aren't pre-trained with pure supervised learning, though supervised fine-tuning on labeled examples is how they're adapted to narrow tasks.
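A minimal sketch of the idea, assuming a made-up two-feature spam dataset (the features, the DATA list, and all helper names are hypothetical). It fits logistic regression by plain gradient descent, and every update is driven by the gap between the model's prediction and the human-provided label.

```python
import math

# Hypothetical labeled data: features = [count of "free", count of "winner"], label 1 = spam.
DATA = [
    ([0.0, 0.0], 0),
    ([1.0, 0.0], 0),
    ([0.0, 1.0], 0),
    ([3.0, 2.0], 1),
    ([2.0, 3.0], 1),
    ([4.0, 4.0], 1),
]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_logistic_regression(data, lr=0.1, epochs=3000):
    """Full-batch gradient descent on the mean log-loss; the labels drive every update."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            error = predict_proba(weights, bias, x) - y  # gradient of log-loss w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias

weights, bias = train_logistic_regression(DATA)
predictions = [int(predict_proba(weights, bias, x) >= 0.5) for x, _ in DATA]
print(predictions)  # [0, 0, 0, 1, 1, 1]
```

The toy data is linearly separable, so the fitted model reproduces every label; real spam filters differ mainly in scale and feature extraction, not in this basic loop.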
2. Unsupervised learning
No labels. The model finds structure on its own: clusters, anomalies, embeddings, dimensionality reduction. It makes fewer headlines today, but it's still core for grouping unknown data, detecting anomalies such as fraud, and building embeddings used in RAG.
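To make "finds structure on its own" concrete, here is plain k-means (Lloyd's algorithm), one common clustering method among many. The points and the kmeans helper are invented for illustration; note that the function only ever sees coordinates, never a label.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on 2-D points; no labels are used anywhere."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, (x, y) in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignment

points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```

The cluster ids are arbitrary (0 and 1 may come out swapped for a different seed); only the grouping itself is meaningful, which is typical of unsupervised output.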
3. Self-supervised learning
The trick that made modern LLMs possible. The model invents its own labels from raw data: predict the next token, fill in a masked word, or tell a matching image-caption pair apart from mismatched ones. Huge unlabeled corpora become trainable, and the model picks up structure that transfers to a wide range of downstream tasks.
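A sketch of how the training labels are manufactured from raw text, assuming whitespace tokenization and hand-picked mask positions (real pipelines use subword tokenizers, and BERT-style training masks a random fraction of tokens). The helper names are hypothetical. No human annotation is involved: every target is a piece of the input itself.

```python
def next_token_pairs(tokens, context_size):
    """Next-token prediction: every context window is paired with the token that follows it."""
    return [(tokens[i - context_size:i], tokens[i]) for i in range(context_size, len(tokens))]

def masked_token_example(tokens, mask_positions, mask_symbol="[MASK]"):
    """Masked-token prediction: hide chosen tokens; the hidden originals become the labels."""
    masked = list(tokens)
    labels = {}
    for pos in mask_positions:
        labels[pos] = tokens[pos]
        masked[pos] = mask_symbol
    return masked, labels

tokens = "the cat sat on the mat".split()  # raw text, no annotation
print(next_token_pairs(tokens, context_size=2)[0])  # (['the', 'cat'], 'sat')
print(masked_token_example(tokens, mask_positions=[2]))
# (['the', 'cat', '[MASK]', 'on', 'the', 'mat'], {2: 'sat'})
```

One six-word sentence already yields four next-token training pairs, which is why unlabeled web-scale text turns into an enormous supervised-looking dataset.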
4. Reinforcement learning (RL)
The model interacts with an environment, takes actions, and learns from a reward signal. Used in robotics, game-playing, and increasingly in agent training where the 'environment' is a set of tools and the 'reward' is task success. RL is sample-hungry and often unstable to train, but it's how you teach a model to choose actions, not just predict text.
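Tabular Q-learning is one classic RL algorithm that shows the act, observe reward, update loop in a few lines. The five-state corridor environment, its reward scheme, and the hyperparameters below are illustrative assumptions, not taken from any library or benchmark.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the goal (terminal)
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # move left or right

def step(state, action):
    """Hypothetical corridor environment: reward 1 only on reaching the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            # Epsilon-greedy action choice, breaking ties at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

Nobody ever tells the agent "go right"; that policy emerges purely from a sparse reward at the end of the corridor. Even this trivial task takes many episodes of trial and error, a small-scale view of why RL is sample-hungry.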
5. Reinforcement learning from human feedback (RLHF)
RL with a learned reward model that approximates human preferences. It's the standard alignment recipe for instruction-tuned chat models: humans rank outputs, a reward model learns to predict those rankings, and the LLM is fine-tuned with RL (typically PPO) to produce highly ranked outputs. Newer preference-optimization methods (DPO, KTO) skip the explicit reward model and the RL loop but keep the same idea.
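The pairwise losses behind this recipe fit in a few lines. This sketch assumes a linear reward model over hypothetical hand-scored features (real reward models are neural networks that score full text) and an illustrative beta of 0.1 for DPO. The log-probabilities passed to dpo_loss would come from the policy being trained and a frozen reference model; KTO is not shown.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def reward_model_loss(reward_chosen, reward_rejected):
    """Pairwise (Bradley-Terry) loss: small when the human-preferred output scores higher."""
    return -log_sigmoid(reward_chosen - reward_rejected)

def reward(weights, features):
    """Hypothetical linear reward model: a weighted sum of hand-made features."""
    return sum(w * f for w, f in zip(weights, features))

def train_reward_model(pairs, lr=0.5, epochs=200):
    """Fit the linear reward model by gradient descent on the mean pairwise loss."""
    weights = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d/d(margin) of -log(sigmoid(margin)) is -sigmoid(-margin)
            coeff = math.exp(log_sigmoid(-margin))
            for i, (c, r) in enumerate(zip(chosen, rejected)):
                grad[i] -= coeff * (c - r)
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO: the same pairwise loss, with each reward replaced by beta * log(pi / pi_ref)."""
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    return -log_sigmoid(chosen_reward - rejected_reward)

# Hypothetical preference data: (features of preferred output, features of rejected output),
# with features = [helpfulness, rudeness] scored by hand.
PAIRS = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.2, 0.9]),
    ([0.9, 0.3], [0.4, 0.2]),
]
weights = train_reward_model(PAIRS)
print([reward(weights, c) > reward(weights, r) for c, r in PAIRS])  # [True, True, True]
```

Putting the two losses side by side shows what "skip the explicit reward model but keep the same idea" means: DPO reuses the Bradley-Terry form, but the implicit reward is the policy's log-probability ratio against the reference model.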
6. Transfer learning & fine-tuning
Not a learning paradigm so much as a workflow: take a model trained on a huge generic corpus and continue training it on your narrow data. It's much cheaper than training from scratch and the foundation of most domain-specific LLM projects today.
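A toy illustration of why warm-starting pays off, assuming a two-parameter linear model as a stand-in for a foundation model and invented "generic" and "narrow" datasets. The same handful of fine-tuning steps lands much closer to the narrow task when it starts from pre-trained weights than when it starts from zeros.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def gradient_descent(w, b, data, lr, steps):
    """Minimise MSE by gradient descent, starting from the given (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

# "Generic" pre-training data: plenty of examples of y = 2x + 1.
generic = [(i / 10, 2 * (i / 10) + 1) for i in range(-20, 21)]
# "Narrow" task: a handful of examples of a related relationship, y = 2x + 1.5.
narrow = [(x, 2 * x + 1.5) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

pretrained = gradient_descent(0.0, 0.0, generic, lr=0.1, steps=500)
fine_tuned = gradient_descent(*pretrained, narrow, lr=0.1, steps=10)    # continue training
from_scratch = gradient_descent(0.0, 0.0, narrow, lr=0.1, steps=10)     # same budget, no prior

print(f"fine-tuned loss:   {mse(*fine_tuned, narrow):.4f}")
print(f"from-scratch loss: {mse(*from_scratch, narrow):.4f}")
```

The pre-trained slope is already right for the narrow task, so fine-tuning only has to adjust the offset; the from-scratch run spends its small budget relearning structure the generic data had already provided.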
