Lessons Worth Arguing About
Why learning science cannot settle how to teach the lessons that matter most
Maths teachers have been introducing the same concepts for centuries, and they still cannot agree on how best to do it. Take negative numbers. Someone shares their preferred introduction — debt and borrowing, or a thermometer dipping below zero, or the abstract number line, or a physical line that pupils walk along — and the replies fill with teachers who have thought about this for years and settled on something different. The arguments are detailed, sincere, and unresolved. The same thing happens with fractions, with the equals sign, with the first lesson on algebra.
What strikes me about these arguments is not that they happen but that they persist. Negative numbers have been taught for centuries, to millions of children a year, by a profession that now has access to a substantial body of cognitive science. If there were a settled best way to introduce them, we have had every opportunity to converge on it. We have not. And the disagreement is not coming from the uninformed edges of the profession. The people who disagree most precisely are the ones who have thought hardest.
Notice, too, which lessons attract this kind of argument. Nobody fights about the third consolidation lesson on percentages, or the mixed practice on simultaneous equations. Those lessons matter in the sense that every lesson has the potential to be cognitively demanding, setting off substantial reorganisation of ideas in a student’s mind. But in a practice lesson, that reorganisation is dispersed across a designed sequence of examples and the learner’s own restructuring, much of which runs beyond the teacher’s direct reach. In the lesson where negative numbers first appear, or where a fraction stops being a slice of pizza and becomes a number with a position, the reorganisation is concentrated in a single deliberate act of teacher instruction. It matters how the first representation of something is presented, because it becomes the object through which much of what follows is interpreted.
Those are the lessons the arguments cluster around: the ones where an important and complex idea is met for the first time. Some subjects seem to have more of them than others. Maths is full of these moments; history and English literature revisit their central abstractions (such as causation, metaphor, tone) across so many contexts that fewer single lessons seem to carry that kind of weight, though I am open to teachers of those subjects telling me I am wrong.
Here is the strange thing. If these lessons carry so much weight, you would expect them to be the ones our science of instruction speaks to most clearly. The opposite seems true. The closer you look at a canonical lesson, the harder it becomes to say what teaching it well actually requires. Why doesn’t learning science give us clear principles for exactly the lessons where the stakes are highest?
Both sides have a principle
It is not that the principles are missing. There are higher-order principles in instruction. Make ideas concrete for novices. Prefer representations that generalise. Surface likely misconceptions early. Avoid overloading working memory. Vary examples along the dimension that matters and keep the irrelevant features constant. We know all these principles well. Our problem is that they conflict.
Go back to negative numbers. The diver or submarine below the waterline is vivid and immediately graspable; it honours concreteness. Debt and borrowing extends cleanly into addition and subtraction of signed quantities; it honours generalisability. But debt, whilst concrete in theory, is only concrete for a child who has already met the thing being applied. To one who has never owed or been owed, debt is not a familiar situation lent to an unfamiliar number. Debt is itself unfamiliar, and the supposed concrete context is as abstract as the line it was meant to improve upon. The walked number line makes direction physical; enacting an idea with the whole body is a reliable way to make it stick, which is the principle it honours. But it piles load onto working memory, since the child must track facing, direction, and backwards steps at once. Thus, what sticks may be the walk rather than the structure it was meant to teach.
A teacher who insists on the most generalisable model is teaching for a payoff that arrives later, and pays for it now. And they do so with no immediate signal to tell them that the payoff is worthwhile. Generalisability is a real principle. So is the comprehensibility that they trade it against. Neither is wrong, and neither simply outranks the other: a model too abstract to grasp has no long run to pay off into, and a model grasped but ungeneralisable strands the child later. They are not rivals on a scale so much as conditions on each other, and the judgement is how far you can press one before the other gives way. [1]
These collisions appear everywhere once you look for them. In chemistry, the Bohr model of the atom with its electrons in tidy orbits is enormously comprehensible and generative: it carries energy levels, it makes bonding tractable, it gets novices moving. It is also wrong in a way that has to be dismantled later, and teachers take different approaches to the timing and method of dismantling it. In history, a framework for causation that might include long-term conditions, triggers, and the interplay of circumstance and agency is what lets a novice see structure in events at all, but the debate is about how firmly to impose it before it starts doing the student’s thinking for them. In English, teachers disagree about whether to give the definition of a device like metaphor first, or to build it from examples and non-examples: the definition gives pupils something precise to look for, but met too early it can become a label they apply rather than a distinction they understand.
The hard instructional questions do not usually arise because one side has a principle and the other side has a prejudice. They arise because both sides have a principle that is worthy of application.
Two wrong conclusions
When principles collide like this, two conclusions become tempting, and I think both are wrong.
The first is that the rules simply don’t work and so real teaching is just craft and intuition all the way down. If “make it concrete” and “make it generalisable” can each be cited against the other, what use is either?
The second is that every concept is sui generis. So, the introduction to fractions and the introduction to algebra are entirely separate design problems, each requiring its own novel pedagogy reasoned out from nothing.
What actually happens, I think, is something in between. You do have to reason at the level of the individual concept — there is no escaping that. But the reasoning doesn’t necessarily need to involve anything novel. When a teacher deliberates between the diving board and the debt model, they are not inventing new pedagogy. They are adjudicating between principles they already hold: deciding how much comprehensibility is worth giving up for generalisability, for this concept, with these learners, at this point in the sequence. The question is not which principle is true. They all are. The question is which matters most here.
This refines an argument I made in Regularities All The Way Down. I suggested there that beneath the general regularities of cognitive science, and the knowledge-type regularities of Engelmann, sits a third layer of idea-level specifics in the form of irreducible facts about what makes this particular concept hard. What I under-described was where that locality comes from. The idea-level layer may not consist mostly of new rules. It is the level at which the higher rules collide, and at which their relative weights have to be set. The regularities go all the way down; what changes at the bottom is that they start to compete.
It also explains why the profession’s oldest arguments never resolve. Conceptual understanding first or procedure first; concrete before abstract or abstract before concrete; one canonical representation or several. These debates persist partly because each side is generalising from a different region of the curriculum or with different kinds of students in mind.
Settling the argument
Whether the hard choices in canonical lessons reflect unsettled weightings of higher-order principles or irreducible facts about the concept itself, they are in either case empirical questions in principle. We should be able to know whether the generalisable model of negative numbers is worth its cost in early confusion to 8-year-olds. I argued last month that platforms and student-level randomisation may finally make this kind of fine-grained verification affordable, by testing thousands of small instructional bets and preserving what we learn.
But canonical lessons are also where the verification signal is hardest to construct. The whole problem is that the competing principles may pay off over different horizons. A representation chosen for comprehensibility shows its advantage in next week’s quiz. A representation chosen for generalisability may not pay off until subtraction of negatives, or coordinates, or algebra — months or years downstream. A trial that measures only immediate success will systematically reward the comprehensible model over the generalisable one, which is not a neutral measurement but a verdict on the weighting, smuggled in through the outcome measure. To adjudicate fairly between colliding principles, the probes have to span both horizons: boundary cases and transfer items as well as familiar ones, delayed measures as well as immediate ones, the error patterns that distinguish a usable model from a brittle procedure. That is demanding. It is most demanding precisely where it matters most.
So I have stopped reading the negative numbers arguments as a sign that the profession is muddled. The arguments persist because the trade-off is real and nobody yet has the evidence to settle it. When experienced teachers argue about a first representation, they are not failing to apply the science; they are doing the part of the work the science has not yet reached — arguing about weights. That, I think, is the right way to understand the lessons worth arguing about. They are not the lessons where the rules run out. They are the lessons where everyone in the argument is holding a rule that is true.
[1] There is a research literature doing precisely this kind of weighting, even when it doesn’t describe itself that way. A recent variation-theory study of addition and subtraction with negative numbers (Grade 6, five iterated teaching cycles) concluded that the central rationale is best made visible not through any applied context but through the sum of opposite numbers: the observation that a number and its negative cancel. That is a judgement about weights: it privileges the model that pays off later, in algebra and in the meaning of the minus sign, over the contexts that are easier to grasp on day one. Their warning is the same one I am making here, i.e. that a child who can rewrite 7 − (−4) as 7 + 4 without knowing why has been handed a procedure, not an understanding.
https://www.tandfonline.com/doi/full/10.1080/00313831.2025.2591121#d1e149
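One way to spell out the "why" behind rewriting 7 − (−4) as 7 + 4, using only the sum-of-opposites idea described above, is sketched below. It is an illustration rather than a derivation taken from the cited study, and it assumes just two facts: that 4 + (−4) = 0, and that any number minus itself is zero.

```latex
\begin{align*}
7 - (-4) &= \bigl(7 + 0\bigr) - (-4) \\
         &= \bigl(7 + 4 + (-4)\bigr) - (-4) && \text{since } 4 + (-4) = 0 \\
         &= 7 + 4 + \bigl((-4) - (-4)\bigr) && \text{take away the } -4 \text{ just introduced} \\
         &= 7 + 4 + 0 \\
         &= 11
\end{align*}
```

Read this way, the 4 that appears in 7 + 4 is what is left behind once the −4 has been removed from a pair of opposites, which is the reason the procedure alone leaves out.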



Increasingly, I’m starting to think that there not being a right way, merely a bunch of wrong ways, is why we need humans. How a teacher explains something is also linked to their own personality and preferences in ways that are non-trivial!
On the negative numbers example, I am still trying to work out how helpful the double sided counters are. They are a good model… but to what extent are they useful in introducing the concept and to what extent does it add unhelpful complexity?