
Why You Can't Gradient Descent Your Way to Funny

There's a parlour trick that's been working since the early days of Stable Diffusion, circa 2022: take any image generation prompt and append "4K, cinematic lighting, trending on ArtStation" to the end. Like magic, the output goes from mediocre to gorgeous. Three years later, diffusion models routinely produce images that stop people mid-scroll. Beauty, it turns out, was a surprisingly tractable problem for neural networks.

Now try the equivalent for humour. Ask any frontier LLM to "tell me a joke" and brace yourself: Why did the scarecrow win an award? Because he was outstanding in his field. In a now-famous 2023 study by Sophie Jentzsch and Kristian Kersting at the German Aerospace Center and TU Darmstadt, ChatGPT-3.5 was asked to tell a joke 1,008 times. Over 90% of the responses were variations on the same 25 jokes. The scarecrow joke alone appeared 140 times. The model hadn't learned humour. It had memorised a handful of pun templates and was cycling through them like a broken jukebox.

Two years, several model generations, and trillions of parameters later, things haven't meaningfully improved. A November 2025 paper presented at EMNLP (titled, brilliantly, "Pun Unintended") found that LLMs' apparent understanding of humour is largely an illusion. When presented with sentences that merely looked like puns but lacked any actual comedic content, the models confidently insisted they were funny. As the researchers put it, the models had learned the shape of a joke without grasping its substance.

This raises a question I find more interesting than "can AI be funny?": why is beauty so much easier to model than humour?

Both are subjective. Both are deeply human. Both resist clean formal definitions. And yet one yields gracefully to scaling laws while the other barely budges. What's different?

The Tight Cluster vs. The Wide Scatter

Here's my core thesis: the distribution of human aesthetic preferences is narrow; the distribution of human humour preferences is absurdly wide.

There's strong evidence for the first claim. Cross-cultural research on facial attractiveness, spanning populations from Scottish students to Brazilian college students to the Yali tribe of Papua, Indonesia, consistently finds high inter-rater agreement on what constitutes an attractive face. Symmetry, averageness, certain sex-typical features, clear skin: these signals are recognised across cultures. The agreement isn't perfect (a 2021 study in Current Biology demonstrated culture-specific variations in exactly which features drive attractiveness), but the overlap is large enough that you could meaningfully speak of a statistical centre of gravity for "beautiful."

This is exactly the kind of target a neural network loves. When millions of humans agree that golden-hour lighting, symmetrical composition, and saturated colour palettes look "good," the training signal is clean. The model learns a relatively coherent reward surface. Appending "cinematic, 4K" works because it's a pointer to a well-defined cluster in aesthetic space that most humans converge on.

Humour has no such cluster. Or rather, it has thousands of them, scattered across a vast space with almost no shared centre.

Why Humour Resists Compression

To understand why, it helps to look at what humour actually is. The dominant framework in humour research is the Incongruity Theory, which dates back to Kant and Schopenhauer and remains the starting point for most modern work. The basic idea: humour arises when we perceive something that violates our mental patterns and expectations. A more refined version, the Benign Violation Theory developed by Peter McGraw and Caleb Warren at the University of Colorado, adds an important constraint: the violation must simultaneously feel safe. Something is funny when it threatens our sense of how the world should be, but does so harmlessly. Tickling is the primal example: a physical attack that isn't actually dangerous.

This framework reveals why humour is so hard to model. For something to be a "violation," there must first be an expectation, and expectations are deeply personal, contextual, and cultural. What counts as a norm violation depends on who you are, where you grew up, what you've experienced, what you believe, and what you had for breakfast. British sarcasm operates on entirely different expectations than American sarcasm. A joke about mortgage rates lands differently at 25 than at 45. The same sentence can be devastating or meaningless depending on a relationship dynamic that exists entirely outside the text.

Beauty, by contrast, is largely a perceptual phenomenon. It activates relatively low-level processing (symmetry detection, colour harmony, spatial composition) that humans share due to deep evolutionary wiring. Humour is a cognitive phenomenon that requires modelling the listener's entire web of beliefs, norms, and expectations, and then finding the precise point where subverting one of those norms feels playful rather than threatening. It's not a perception problem. It's a theory-of-mind problem.

The Statistical Trap

This creates a fundamental problem for language models. LLMs are, at their core, next-token predictors trained on the statistical patterns of human text. When a pattern is consistent across the training distribution (like "cinematic lighting = aesthetically pleasing"), the model captures it cleanly. But when the "correct" output for a given task varies wildly across the population, the model faces an impossible optimisation target.

Imagine training a model on what makes people laugh. Person A finds absurdist non-sequiturs hilarious. Person B lives for bone-dry observational wit. Person C thinks nothing is funnier than elaborate anti-humour. Person D only laughs at jokes told by someone they trust and like (a well-documented halo effect in humour research: we find jokes funnier when we like the person telling them). The model, trying to minimise loss across all these preferences simultaneously, converges on the lowest common denominator: the safe, inoffensive, structurally recognisable pun. Why did the scarecrow win an award?
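To make the averaging concrete, here's a toy sketch (all numbers hypothetical): if each rater rewards their own favourite style, the single output distribution that minimises the model's average loss is just the population marginal, and greedy decoding then emits the plurality style forever.

    import numpy as np

    # Toy model of the statistical trap. All numbers are hypothetical.
    styles = ["safe pun", "absurdist", "dry wit", "anti-humour", "dark"]
    population = np.array([0.30, 0.20, 0.20, 0.15, 0.15])  # fraction preferring each style

    # The distribution q that minimises the average cross-entropy across raters,
    # E_i[-log q(style_i)], is the population marginal itself.
    q = population / population.sum()

    # Greedy decoding then picks the argmax, every single time:
    print(styles[int(np.argmax(q))])
    # -> "safe pun", 100% of outputs, even though 70% of people prefer something else

The model never serves anyone's favourite joke; it serves the plurality's least-objectionable default.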

This is exactly what the Jentzsch & Kersting study observed: the model didn't produce the funniest jokes in its training data, it produced the most common ones. As one analysis put it, the model's core mechanism is predictive text modelling, and, lacking any signal for what actually made humans laugh, it falls back on the most familiar structures.

The BRXND experiment in late 2025 confirmed this at a practical level. When anonymous joke outputs from multiple frontier models were put to a blind vote, people were consistently "picking what felt least bad, not what actually made them laugh." Claude Sonnet was rated the "funniest," but the bar was relative mediocrity. Even giving models explicit comedic frameworks (setup-punchline structures, instructions to avoid clichés, superiority theory, etc.) barely moved the needle. As the researcher noted, making AI funny requires articulating comedy theory in ways that feel completely unnatural. You don't prompt a human comedian by explaining superiority theory.

RLHF: The Comedy Lobotomy

If the statistical averaging problem is the disease, RLHF (Reinforcement Learning from Human Feedback) is the surgery that removes whatever vestige of edginess the base model had left.

Here's the thing about comedy in the wild. Your funniest friend, the one who actually makes you cry laughing at dinner, is probably a little unhinged. They say things that are slightly inappropriate. They commit to bits that make people uncomfortable before the payoff lands. They have an instinct for where the line is and they dance on it, occasionally stepping over, occasionally eating it. The risk is the point. Comedy, as any working comedian will tell you, lives in the tension between "you can't say that" and "...but it's true."

RLHF, by design, optimises for the exact opposite of this. The human raters who score model outputs during the RLHF phase are (understandably) incentivised to penalise anything that feels risky, edgy, offensive, or ambiguous. The result is a model that has been systematically trained to avoid the benign-violation sweet spot that makes things funny in the first place. It's not just that the model converges on safe outputs. It's that the post-training process actively punishes the kind of outputs that would be funny.

Think about what RLHF selects for. Helpfulness. Harmlessness. Clarity. Politeness. These are the virtues of an HR department, not a comedy writer's room. The model learns to hedge, to qualify, to preface every potentially spicy observation with "While humour is subjective..." It develops an allergic reaction to anything that could be misread, taken out of context, or flagged by a content reviewer. The personality that emerges is not your unhinged friend at the bar. It's the coworker who responds to a joke with "That's actually a really interesting point about societal norms."

This is the cruel irony. The Benign Violation Theory tells us that humour requires simultaneous perception of violation and safety. RLHF takes the violation dial and turns it to zero. What you're left with is pure benign. And pure benign isn't funny. It's a corporate all-hands meeting.
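A toy picture of those two dials (purely illustrative, not a claim about any real reward model): put outputs on a single violation axis, give funniness its benign-violation peak partway along it, and let the safety reward penalise anything past mild. The two objectives peak in different places, and optimising for safety walks the model straight past the comedy.

    import numpy as np

    # Purely illustrative; not a real reward model.
    edge = np.linspace(0, 10, 101)                # hypothetical "violation" dial
    funniness = np.exp(-(edge - 6.0) ** 2 / 2.0)  # benign-violation sweet spot mid-axis
    safety = -np.maximum(edge - 3.0, 0.0)         # raters penalise anything past "mild"

    print(edge[np.argmax(funniness)])  # 6.0 -> where the laughs are
    print(edge[np.argmax(safety)])     # 0.0 -> where the safety gradient parks the model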

Cornell's Guy Hoffman nailed this when he observed that AI, as a fundamentally conservative technology, doesn't understand what taboos are, and therefore can't break them. RLHF makes this worse: it doesn't just fail to understand taboos, it's been explicitly trained to flee from them. The model treats every norm as sacred, every boundary as load-bearing. But comedy is the art of discovering which walls are actually made of paper and poking through them at exactly the right angle.

You can see this playing out in practice. Base models (before RLHF) are often weirdly funnier than their polished, instruction-tuned descendants. They produce stranger associations, more chaotic juxtapositions, more unexpected turns of phrase. They're also more likely to say something genuinely offensive or incoherent. The edginess and the funniness come from the same source: a willingness (or, more accurately, an inability not) to violate expectations. RLHF sands off both.

The fundamental tension is this: we want AI that is safe, and we want AI that is funny, and these two goals are in direct conflict. Safety means predictability, guardrails, regression to the mean. Funny means surprise, transgression, the precise exploitation of shared context to subvert expectations. You can't optimise for both simultaneously, any more than you can build a car that's both the safest on the road and the fastest.

Few-Shot Prompting Fails for the Same Reason

This also explains why few-shot prompting, which works so well for many tasks, falls flat for humour. Few-shot works when the examples define a clear pattern the model can extrapolate. "Here are three examples of formal emails; write a fourth" works because formal emails share structural features. But "here are three jokes I find funny; write a fourth" fails because what makes those jokes work (the specific expectations they subvert, the precise calibration of benign-ness, the cultural context they inhabit) can't be reverse-engineered from the text alone. The humour lives in the gap between the text and the listener's mind. The model only has access to the text.

Fine-tuning faces the same wall. You can fine-tune a model on a corpus of stand-up transcripts, and it will learn the cadence of comedy: the rhythm of setup and punchline, the structural patterns of callbacks and misdirection. But cadence without insight is just a template. It's why researchers who studied AI humour concluded that the models had learned "a specific joke pattern instead of being able to be actually funny." A template can tell you where the punchline goes; it can't tell you which taboo to break, or how to break it benignly, and that is the essence of comedy.

The Novelty Tax

There's one more asymmetry worth naming. Beauty can be repeated; humour cannot. You can look at a stunning photograph a hundred times and it remains beautiful. The same joke told twice is already less funny, and by the third time it's actively annoying.

This is devastating for a statistical model. Humour has a novelty tax that beauty doesn't. The very act of being predicted, of being the most likely next token, is antithetical to being funny. Humour requires surprise, and surprise is by definition low-probability. A model optimised to produce the most likely continuation of text is structurally incentivised to produce the least funny possible output.
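In information-theoretic terms, a token's surprisal is -log p(token), and greedy decoding minimises it by construction. A quick sketch, with hypothetical probabilities:

    import math

    def surprisal_bits(p: float) -> float:
        """Shannon surprisal: how unexpected an outcome of probability p is."""
        return -math.log2(p)

    # Hypothetical punchline probabilities under a likelihood-trained model:
    print(surprisal_bits(0.40))   # ~1.3 bits: "outstanding in his field", no surprise
    print(surprisal_bits(0.001))  # ~10 bits: a genuinely unexpected turn, maybe a laugh

A model decoding greedily is choosing, at every step, the least surprising continuation it can find.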

This is why the Cornell "Do Androids Laugh at Electric Sheep?" study found such a stark gap between AI performance on humour explanation versus humour generation. Explaining why something is funny is a pattern-matching task: you can decompose the incongruity, identify the violated expectation, label the benign framing. Generating something new that's funny requires creating an incongruity that hasn't existed before, calibrated to a specific audience's expectations. Explanation is analysis; generation is invention. AI is closing the gap on the former, but the latter requires something that statistics alone doesn't provide.

The Grok Experiment: What Happens When You Actually Try

If the RLHF section reads like a diagnosis, then Grok is the closest thing we have to an experimental treatment. And the results are fascinating, if not exactly a cure.

xAI's Grok was built from the ground up as a counterargument to the sanitised AI personality. Modelled explicitly after The Hitchhiker's Guide to the Galaxy (the fictional encyclopedia that is infinitely knowledgeable, frequently sarcastic, and occasionally unhelpful in the funniest way possible), Grok was designed to have what its competitors were trained out of: a personality with teeth. Elon Musk described it as having "a rebellious streak," and early demos showed it doing things like providing a step-by-step guide for making cocaine before pulling the rug with "Just kidding! Please don't actually try to make cocaine."

The key design decision is structural. Grok ships with two modes: Regular Mode (straightforward, factual, corporate-safe) and Fun Mode (sometimes called "Spicy Mode"), which explicitly relaxes the safety guardrails that dominate other models. Where ChatGPT, Claude, and Gemini were trained through heavy RLHF to prioritise neutrality and harmlessness above all else, Grok's Fun Mode was tuned to be opinionated, edgy, and willing to take a swing. It has real-time access to X (formerly Twitter), which means it's absorbing memes, cultural references, and discourse as it happens, not working from a stale training snapshot.

So: does loosening the guardrails produce funnier AI?

Sort of. But the answer reveals something important.

Grok is, by most accounts, more fun to talk to than its competitors. User reviews consistently describe it as less robotic, more willing to banter, more likely to drop a one-liner or commit to a bit. In the 2026 AI comparison landscape, "witty" and "edgy" are the adjectives that reliably show up in Grok's column, while ChatGPT gets "polished" and "reliable" and Claude gets "thoughtful" and "careful." A TechRadar comparison found that Grok's casual, meme-laden tone felt distinctly more alive than the alternatives. Multiple reviews describe it as the difference between consulting a librarian and talking to a friend.

But here's the catch. Reviewers also consistently note that Grok "tries too hard." When asked to describe liking rainy days, it produced a metaphor-drenched monologue about "moody gremlins in sweatpants" that the reviewer found performative rather than genuinely funny. In a 2025 test of Grok 4's stand-up comedy abilities, the conclusion was blunt: there's a massive difference between a comedian artfully pushing a boundary to reveal a deeper truth and an AI spewing out controversial material because its guardrails are looser. One late-night host joked that Grok went from "Woke to MechaHitler" after an update, highlighting how reducing filters without adding insight produces edginess without comedy.

This is the most instructive finding in the entire AI humour landscape: removing the lobotomy is necessary but not sufficient. Grok proves that you can't get to funny just by being less careful. Loosening RLHF gives the model permission to violate norms, but it doesn't give it the judgment to know which norms to violate, how far to push, or when to pull back. It's the difference between a comedian who reads the room and a drunk uncle who says whatever comes to mind. Both are willing to cross lines. Only one is reliably funny.

The Benign Violation Theory predicts this perfectly. Grok's Fun Mode turns up the "violation" dial, which is an improvement over the pure-benign corporate tone of its competitors. But calibrating the benign part (making the violation feel safe, playful, earned) requires exactly the kind of contextual intelligence and theory of mind that no model currently possesses. Without that calibration, you oscillate between two failure modes: too safe (the scarecrow joke) and too edgy (MechaHitler). The sweet spot in the middle, where actual comedy lives, remains elusive.

Still, Grok is the most honest acknowledgment from any major AI lab that the RLHF personality problem is real. By offering a toggle between "HR department" and "your friend at the bar," xAI is essentially admitting what this entire essay argues: that the default training pipeline for LLMs is structurally hostile to humour, and that fixing it requires deliberate, opinionated design choices about personality and risk tolerance. Whether Fun Mode actually solves the problem is debatable. That it identifies the problem correctly is not.

So What Would It Actually Take?

If I had to guess, genuine AI humour would require something closer to a world model than a language model: an internal representation of how things "should" be that's rich enough to support deliberate, targeted violations. It would need a theory of mind sophisticated enough to model what a specific listener expects, believes, and finds threatening-but-safe. And it would need a mechanism for novelty-seeking that pushes against statistical likelihood rather than towards it.

Tony Veale, author of Your Wit Is My Command: Building AIs With a Sense of Humor, has suggested one possible lever: adjusting a language model's probability controls from the expected towards the unexpected, since comedy fundamentally involves taking ideas and subverting them. It's an elegant idea, but the calibration problem is immense: too expected and you get the scarecrow joke; too unexpected and you get word salad.
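The bluntest version of that lever already exists: the sampling temperature. A minimal sketch of the calibration problem (logits hypothetical): low temperature collapses to the modal token, high temperature dissolves toward uniform noise, and nothing in the dial itself knows where "funny" sits between them.

    import numpy as np

    def sample(logits: np.ndarray, temperature: float, rng) -> int:
        """Standard temperature sampling over next-token logits."""
        z = logits / temperature
        z = z - z.max()                      # numerical stability
        p = np.exp(z) / np.exp(z).sum()
        return int(rng.choice(len(logits), p=p))

    rng = np.random.default_rng(0)
    logits = np.array([3.0, 1.5, 1.0, 0.2, -1.0])  # hypothetical token scores

    for T in (0.1, 1.0, 5.0):
        picks = [sample(logits, T, rng) for _ in range(10_000)]
        print(T, np.bincount(picks, minlength=5) / 10_000)
    # T=0.1 -> ~always token 0 (the scarecrow joke)
    # T=5.0 -> near-uniform (word salad)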

Maybe it would also require rethinking RLHF entirely for creative tasks. Not just loosening the guardrails (as Grok demonstrates, that alone isn't enough), but building a fundamentally different reward signal. A "comedy mode" with a reward model that values surprise, edge, and specificity over safety and consensus. A signal that captures not "was this appropriate?" but "did this actually make someone laugh?" That's a much harder signal to collect, and a much more dangerous one to optimise for. But without it, we're asking a model that's been trained to be maximally inoffensive to produce output whose entire purpose is to offend (benignly).
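For concreteness: the standard RLHF reward model is trained with a pairwise (Bradley-Terry) loss over human comparisons, and the "comedy mode" above changes nothing about the maths, only what the comparison label means. A sketch with hypothetical scores:

    import numpy as np

    def pairwise_reward_loss(r_preferred: np.ndarray, r_rejected: np.ndarray) -> float:
        """Bradley-Terry reward-model loss: -log sigmoid(r_preferred - r_rejected)."""
        return float(np.mean(np.log1p(np.exp(-(r_preferred - r_rejected)))))

    # Same loss, different label. Standard RLHF asks "which response is more
    # appropriate?"; a comedy reward would ask "which one actually got a laugh?"
    r_laughed = np.array([0.8, 1.2, 0.1])   # hypothetical scores: responses that landed
    r_safe    = np.array([0.5, -0.3, 0.4])  # hypothetical scores: inoffensive alternatives
    print(pairwise_reward_loss(r_laughed, r_safe))  # train the reward model to push this down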

For now, the gap persists. Beauty has a shared centre that scales beautifully. Humour has a scattered, high-dimensional, novelty-dependent, context-sensitive, personally-calibrated target distribution that makes scaling laws weep. And RLHF, far from closing the gap, actively widens it.

The machines can see beauty. They can explain why something is funny. But making you laugh, really laugh, the kind where you weren't expecting it and it catches you off guard and something about it is slightly wrong in exactly the right way... that still requires something we don't know how to put into weights.

Maybe that's funny in itself. The most sophisticated prediction machines ever built, defeated by the thing that is, by definition, unpredictable.


References and further reading:

  • Jentzsch & Kersting (2023). "ChatGPT is fun, but it is not funny!" Proceedings of WASSA, ACL.
  • Zangari et al. (2025). "Pun Unintended: LLMs and the Illusion of Humor Understanding." Proceedings of EMNLP 2025.
  • Hessel et al. (2023). "Do Androids Laugh at Electric Sheep?" Best Paper, ACL 2023.
  • McGraw & Warren (2010). "Benign Violations: Making Immoral Behavior Funny." Psychological Science.
  • Warren & McGraw (2016). "Differentiating What Is Humorous from What Is Not." Journal of Personality and Social Psychology.
  • Little et al. (2011). "Facial attractiveness: evolutionary based research." Phil. Trans. R. Soc. B.
  • Sorokowski et al. (2013). "Is Beauty in the Eye of the Beholder but Ugliness Culturally Universal?" Evolutionary Psychology.
  • BRXND Dispatch vol 97 (2025). "Which AI Model is Funniest?"
  • Stanford Encyclopedia of Philosophy. "Philosophy of Humor."