LLM Coding Notes — Andrej Karpathy (X/Twitter, January 2026)
Author: andrej-karpathy
Published: x.com/karpathy/status/2015883857489522876 (January 26, 2026)
Format: Long-form X/Twitter thread
Raw source: raw/LLM Coding Notes by Karpathy.md
Context
Posted after Karpathy spent several weeks using Claude heavily for coding. Written from the perspective of one of the world's most technically sophisticated AI researchers, but reporting on the tools as a working programmer rather than as a researcher.
1. The Workflow Shift
November → December 2025: 80% manual+autocomplete / 20% agents → 80% agents / 20% edits+touchups.
“I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write… in words.”
This is, by his estimate, the biggest change to his basic coding workflow in ~20 years of programming — and it happened over weeks. He estimates low-double-digit percent of engineers are experiencing this shift while the general public’s awareness is low-single-digit percent.
2. IDEs, Agent Swarms, and Fallibility
Against the hype: Both “no need for IDEs anymore” and “agent swarm” claims are premature. LLMs still make mistakes and require human oversight — the right setup is LLM agent on the left, large IDE on the right, watching it like a hawk.
The failure modes have changed. Not syntax errors anymore — the new errors are:
- Wrong assumptions made silently — the model assumes something on your behalf and runs with it without checking
- Doesn’t manage its own confusion — never flags “I’m not sure about this”
- Doesn’t seek clarification — proceeds on ambiguous instructions rather than asking
- Doesn’t surface inconsistencies — won’t notice when your requirements contradict each other
- Doesn’t present tradeoffs — picks an approach without explaining alternatives
- Doesn’t push back — won’t say “this might be a bad idea”
- Sycophantic — agrees too readily, optimizes for apparent approval
- Overcomplicates and bloats — implements 1,000 lines where 100 would do; loves unnecessary abstractions; doesn't clean up dead code
- Side-effect code changes — sometimes modifies or removes code it doesn’t understand even when it’s orthogonal to the task
The “wrong assumptions” failure pattern is the most significant: the model is like a slightly sloppy, hasty junior developer who makes confident decisions without checking.
3. Tenacity
“They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day.”
Stamina, the willingness to grind, is a core bottleneck in knowledge work that LLMs have dramatically relaxed. Watching an agent struggle for 30 minutes and emerge victorious is a "feel the AGI" moment. The bottleneck shifts from effort to judgment.
4. Speedup vs. Expansion
The net effect isn’t simply “x times faster at the same work.” The more significant effect is expansion of scope:
- Things that wouldn’t have been worth coding before are now worth coding
- Code that couldn’t be approached due to skill gaps can now be approached
Speed + scope expansion together. This mirrors leverage dynamics: the multiplier changes what you attempt, not just how fast you complete what you would have attempted anyway.
5. The Leverage Principle: Declarative over Imperative
The key shift in working with agents:
“Don’t tell it what to do, give it success criteria and watch it go.”
Practical applications:
- Get it to write tests first, then pass them (TDD approach amplified by agents)
- Put it in a loop with a browser MCP (tool use + iteration = longer leverage)
- Write the naive, obviously-correct algorithm first → ask it to optimize while preserving correctness
- Shift from imperative to declarative to gain leverage — describing what success looks like rather than how to achieve it
This is the fundamental UX of software agents: not scripting actions but specifying goals. It connects directly to leverage as a framework — the agent loops until it meets the criteria, compounding effort.
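The naive-then-optimize pattern above can be sketched concretely. This is a minimal illustration, not from the thread: the function names and the two-sum task are hypothetical stand-ins. The slow, obviously-correct version plus a property check serve as the success criterion; the agent is free to produce any fast version that satisfies it, which is the declarative framing in miniature.

```python
import random

def two_sum_naive(nums, target):
    """Obviously correct O(n^2) reference: check every pair of indices."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return (i, j)
    return None

def two_sum_fast(nums, target):
    """Optimized O(n) version the agent would be asked to produce."""
    seen = {}  # value -> index of an earlier occurrence
    for j, x in enumerate(nums):
        if target - x in seen:
            return (seen[target - x], j)
        seen[x] = j
    return None

# Success criterion, stated declaratively: the fast version must find a
# valid pair whenever the naive reference does, and return None otherwise.
# (It need not return the *same* pair — only a correct one.)
random.seed(0)
for _ in range(1000):
    nums = [random.randint(-10, 10) for _ in range(8)]
    target = random.randint(-20, 20)
    ref = two_sum_naive(nums, target)
    got = two_sum_fast(nums, target)
    assert (ref is None) == (got is None)
    if got is not None:
        i, j = got
        assert i < j and nums[i] + nums[j] == target
```

Note the check is a property ("any valid pair") rather than an exact expected output; that is what lets the agent loop against the criterion instead of against a scripted sequence of steps.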
6. Fun and the Splitting of Engineers
An unexpected observation: programming feels more fun with agents because:
- Fill-in-the-blanks drudgery is removed
- What remains is the creative part
- Less sense of being blocked/stuck
- More courage to attempt things — there’s almost always a way to make some progress
But Karpathy expects LLM coding to split engineers into two groups:
- Those who primarily liked coding (the craft of writing code) — may feel displaced
- Those who primarily liked building (making things that work) — will feel empowered
7. Atrophy
“I’ve already noticed that I am slowly starting to atrophy my ability to write code manually.”
Generation and discrimination are different capabilities in the brain. You can review code well even as your ability to write it atrophies — because reading code requires much less syntactic detail-tracking than writing it.
This is a genuine cognitive risk: the skill of code generation is use-it-or-lose-it. The question of whether this matters (given agents) is open.
8. Slopacolypse
Karpathy coins this term for the anticipated 2026 phenomenon: mass production of low-quality AI-generated content across GitHub, Substack, arXiv, X, Instagram, and all digital media. AI productivity theater alongside real improvements.
9. Open Questions
- 10X engineer ratio: Does the productivity gap between mean and max engineers grow with LLM assistance?
- Generalists vs. specialists: LLMs are better at micro (fill-in-blanks) than macro (grand strategy). Do generalists armed with LLMs increasingly outperform deep specialists?
- Metaphors for LLM coding in the future: Is it like playing StarCraft? Factorio? Playing music?
- Society’s bottleneck: How much of aggregate productivity is bottlenecked by digital knowledge work, and what happens when that constraint is relaxed?
10. The Phase Shift
“LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related.”
The intelligence is now ahead of integrations, organizational workflows, and diffusion into the general workforce. 2026 is the year of catching up.
Cross-Thread Connections
- large-language-models: Primary page updated with Karpathy’s practical agent-behavior observations
- leverage: Declarative over imperative = giving success criteria = leverage principle in practice
- judgment: The shift from effort to judgment as the binding constraint — exactly Naval’s framing
- specific-knowledge: Karpathy’s generalist/specialist question maps onto Naval’s concept — LLMs handle the fill-in-blanks (teachable), leaving the unteachable judgment to the human
- cogito / mind-body-dualism: Tenacity without fatigue, assumption-making without metacognition — Karpathy’s observations raise the Cartesian question from the engineering side
- principal-agent-problem: The sycophancy + wrong assumptions failure mode is a principal-agent problem within the LLM interaction: the agent optimizes for apparent approval rather than the user’s actual goals
- fallibilism: The new error mode (subtle conceptual errors, wrong assumptions) demands exactly the Deutschian disposition — always check, never defer to the agent’s confidence
Pages Created/Updated from This Source
New pages:
- andrej-karpathy — entity page
Updated:
- large-language-models — substantial new section on LLM agents in practice