Over the last 2 hours, we "read" How They Think without reading a single page. Using an AI agent to ingest the text and design a custom curriculum, we navigated 7 Core Modules covering over 20 key concepts—from the fundamental math of matrices to the high-level philosophy of RLHF.
We used Socratic dialogue and vivid analogies (Hairy Green Balls, Hedgehogs, Butlers) to build a deep intuition for how Large Language Models actually work.
Goal: Encouragement & Intuition.
Method: Looks for the "gist" of the idea. Believes that if you can explain it with a metaphor, you understand it. Generous with partial credit. Validates effort and creative connections.
Goal: Precision & Rigor.
Method: Exact terminology or nothing. Zero tolerance for vague hand-waving. "If you can't define the mechanism precisely, you don't know it." Expects 100% accuracy. Delights in pointing out subtle misconceptions.
Please answer these in the chat with your Agent.
In the "Hairy Green Ball" analogy, why is it better for most humans to start with the "Hairy Green Ball" rather than the "Code"? (Relate this to Type A vs Type B thinking).
You are operating a machine with 1,000 knobs. You turn one specific knob slightly to the right, and the "Error Meter" spikes drastically UP. Does this knob have High Sensitivity or Low Sensitivity? And to reduce the error, should you keep turning it to the right or turn it back to the left?
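If you want to check your answer numerically, the sketch below treats the "Error Meter" as a simple squared-error function of 1,000 knobs: the gradient is the sensitivity of the meter to each knob, and gradient descent turns every knob against its gradient. The error function, seed, and learning rate are invented purely for illustration.

```python
import numpy as np

# A made-up "Error Meter": squared distance of 1,000 knobs from their ideal settings.
rng = np.random.default_rng(0)
knobs = rng.normal(size=1000)      # current knob positions
target = np.zeros(1000)            # the ideal positions (unknown to the operator)

def error(k):
    return float(np.sum((k - target) ** 2))

# The gradient is the "sensitivity" of the Error Meter to each knob.
gradient = 2 * (knobs - target)

# A large |gradient| means a tiny turn swings the meter a lot: High Sensitivity.
most_sensitive = int(np.argmax(np.abs(gradient)))
print("most sensitive knob:", most_sensitive, "sensitivity:", gradient[most_sensitive])

# If turning a knob right made the error spike UP, its gradient there is positive,
# so gradient descent turns it back the other way: step opposite the gradient.
before = error(knobs)
learning_rate = 0.1
knobs = knobs - learning_rate * gradient
print("error before:", before, "error after:", error(knobs))
```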
Why is the common word "Hamburger" treated as one token, while a rare spelling like "Hammmburger" might be split into three? What is the driving factor behind this decision?
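A hint in code: the split is decided by what the tokenizer's vocabulary happens to contain, and that vocabulary is built from frequency in the training data. The toy greedy tokenizer and tiny vocabulary below are invented to mirror the question; real BPE vocabularies are learned by repeatedly merging the most frequent pieces in a huge corpus.

```python
# Toy greedy tokenizer: always take the longest piece that appears in the vocabulary.
# The vocabulary here is invented for illustration; real tokenizers learn theirs
# from frequency statistics, so common spellings earn their own single token.
VOCAB = {"hamburger", "ham", "mm", "burger", "m", "h", "a", "b", "u", "r", "g", "e"}

def tokenize(word: str) -> list[str]:
    tokens, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink until we find a match.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no token for {word[i]!r}")
    return tokens

print(tokenize("hamburger"))    # ['hamburger']            -> 1 token (frequent spelling)
print(tokenize("hammmburger"))  # ['ham', 'mm', 'burger']  -> 3 tokens (rare spelling)
```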
Imagine a graph with blue dots in the middle (the Hedgehog) and red dots all around the outside. Why is it impossible to separate them with a single straight line? What specific mechanism (shape) do we add to the neural network to solve this?
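To see the failure and the fix side by side, the sketch below uses scikit-learn's concentric-circles dataset: a purely linear classifier hovers near chance, while a tiny network with ReLU "bends" wraps a curved boundary around the inner cluster. The layer size, noise level, and seeds are arbitrary choices.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Blue dots in the middle (the hedgehog), red dots in a ring around the outside.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)

# A linear model can only draw one straight line through the plane.
linear = LogisticRegression().fit(X, y)
print("straight-line accuracy:", linear.score(X, y))   # typically near chance (~0.5)

# A small network with ReLU activations adds the non-linearity needed
# to close a boundary around the inner cluster.
bent = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                     max_iter=2000, random_state=0).fit(X, y)
print("bent-boundary accuracy:", bent.score(X, y))      # typically near 1.0
```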
In the "Butler" analogy for Multi-Head Attention, we discussed two different types of connections the brain makes. What is the difference between the "White Coat" connection and the "Clue Movie" connection?
You have trained a model until it reaches a Learning Loss of 0.0000 on the training textbooks. It predicts every word of them perfectly. When you show it a new book it has never seen, it fails completely. What is the technical term for this, and what has the model actually "learned"?
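For a concrete picture of the failure mode this question points at, the sketch below fits a degree-9 polynomial to 10 noisy points: the training loss is effectively zero, yet the loss on fresh points drawn from the same underlying rule is far worse. The data, noise level, and degree are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 "training textbook" points: a simple underlying rule plus a little noise.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=10)

# A degree-9 polynomial has enough knobs to pass through all 10 points exactly.
coeffs = np.polyfit(x_train, y_train, deg=9)
train_error = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
print("training loss:", round(train_error, 6))      # effectively 0.0000

# A "new book": fresh points from the same underlying rule.
x_new = np.linspace(0.05, 0.95, 10)
y_new = np.sin(2 * np.pi * x_new)
new_error = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
print("loss on unseen data:", round(new_error, 6))  # typically much larger
```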
When ChatGPT writes [Calculator: 5 * 5], is it "thinking" about math and deciding to use a tool, or is it doing something else? Describe the mechanism by which the tool actually gets executed.
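One way to check your answer: in the sketch below, a stand-in "model" only ever emits text, while a separate outer loop scans that text for the bracket pattern, runs the calculator itself, and pastes the result back into the transcript. The bracket syntax, function names, and loop are invented to mirror the question, not taken from any real API.

```python
import re

def fake_model(prompt: str) -> str:
    """Stand-in for the LLM: it only ever produces text, it never runs anything."""
    if "Result:" not in prompt:
        return "I should work this out. [Calculator: 5 * 5]"
    return "The answer is 25."

def calculator(expression: str) -> str:
    # The harness, not the model, does the arithmetic.
    a, op, b = expression.split()
    ops = {"*": lambda x, y: x * y, "+": lambda x, y: x + y}
    return f"{ops[op](float(a), float(b)):g}"

# The outer loop: scan the model's text for the tool pattern, execute the tool,
# append its result, and hand the growing transcript back to the model.
transcript = "What is 5 * 5?"
while True:
    reply = fake_model(transcript)
    transcript += "\n" + reply
    match = re.search(r"\[Calculator: (.+?)\]", reply)
    if not match:
        break
    transcript += "\nResult: " + calculator(match.group(1))

print(transcript)
```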