
The Karpathy Loop: build a Claude skill that improves itself

Leonardo Garcia-Curtis · 13/05/2026
TL;DR

The Karpathy Loop is a feedback pattern named after Andrej Karpathy's iterative training playbook. Applied to Claude skills, it works like this: a skill produces output, a second skill critiques it against criteria you defined, the original skill updates its own instructions based on the critique, and the next run is slightly better. Set it up as a weekly Cowork task and the skill is genuinely better six months later than it was the day you shipped it. The trick is defining the critique criteria sharply; otherwise the loop drifts. Four moving parts: producer skill, critic skill, evaluation criteria, and a change log.


8 min read  ·  Operator-level pattern  ·  Last updated 13 May 2026

Part of Learn Claude Code: The Complete Operator's Guide. For the operator's overview of Skills, Connectors, Cowork, and Artifacts, start there.

Most people ship a skill, use it for a month, and the skill stays the same.

The output gets better only when you remember to fix it. Which is almost never.

Karol Zieminski borrowed an idea from Andrej Karpathy's training loops and applied it to Cowork. The skill itself iterates. Every week it critiques its own output and updates its instructions. The version you ship in month six is genuinely better than the version you shipped in week one, and you did not touch it.

This is how to build one.

What a Karpathy Loop actually is

In machine learning, Karpathy's iterative loop is the discipline of building a baseline, evaluating it carefully, identifying the single biggest failure mode, and fixing that one thing. Repeat forever. Compounding.

Applied to a Claude skill, the loop has four moving parts:

  • The producer skill. The thing that does the work. Writes the blog post, summarises the meeting, builds the report.
  • The critic skill. A separate skill whose only job is to score the producer's output against criteria you defined.
  • The evaluation criteria. A short rubric the critic uses. "Does the post sound like Leo? Does it use any of the banned phrases? Does it have a clear call to action?"
  • The change log. A file the loop writes describing what changed in the producer's instructions this cycle.

The loop runs weekly. The producer writes. The critic scores. If the score on a specific criterion is below threshold, the loop updates the producer's instructions to address that one criterion and writes the change log. Next week the producer is slightly better.
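A minimal Python sketch of one weekly cycle, assuming the producer, critic, and instruction-rewriting steps are plain functions you supply (in Cowork they would be skill invocations, not Python). The names `LoopState`, `run_cycle`, and the default threshold of 4 are hypothetical; 4 mirrors the "below 4" cut-off in the critic rubric in Part 2.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Scores = Dict[str, int]  # criterion name -> score from 1 to 5


@dataclass
class LoopState:
    instructions: str  # the producer skill's current instructions
    change_log: List[str] = field(default_factory=list)


def run_cycle(
    state: LoopState,
    produce: Callable[[str], str],      # hypothetical: instructions -> output
    critique: Callable[[str], Scores],  # hypothetical: output -> scores
    revise: Callable[[str, str], str],  # hypothetical: (instructions, criterion) -> new instructions
    threshold: int = 4,
) -> Optional[str]:
    """Run one cycle; return the criterion that was addressed, or None."""
    output = produce(state.instructions)
    scores = critique(output)
    failing = {name: s for name, s in scores.items() if s < threshold}
    if not failing:
        state.change_log.append("Nothing below threshold; instructions unchanged.")
        return None
    # Fix one thing only: the weakest criterion (ties broken alphabetically).
    target = min(failing, key=lambda name: (failing[name], name))
    state.instructions = revise(state.instructions, target)
    state.change_log.append(f"{target} scored {failing[target]}; instructions revised.")
    return target


state = LoopState(instructions="Write a blog post.")
addressed = run_cycle(
    state,
    produce=lambda instr: "draft text",
    critique=lambda out: {"voice_match": 4, "call_to_action": 2, "internal_links": 3},
    revise=lambda instr, crit: instr + f"\nFix: {crit}.",
)
print(addressed)  # call_to_action
```

Revising only the single weakest criterion per cycle is the "fix one thing" discipline from Karpathy's loop; changing several instructions at once would make it hard to tell from the change log which edit helped.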

Why it works

Three things compound at once.

First, the critic stays calibrated. It reads the same rubric every week, so its scoring stays consistent.

Second, the producer stays anchored to the criteria. Every cycle, the instructions get refined toward what the criteria reward.

Third, you stay informed. The weekly change log is a 30-second read, so you spot when the loop is improving or heading in a weird direction.

It is the same pattern good editors use on writers. The skill gets edited by another skill, on a schedule, against rules you defined.

The four-part setup

Part 1: Build the producer skill normally

Use the Module 4 walkthrough in Skills 101. Build the skill that does the work you actually want done. Get it to v1.

For our example we will use a blog-post writer skill called blog-writer.

Part 2: Build a critic skill

This is the new piece. The critic skill's only job is to score output against your rubric.

In Claude Desktop, hit the + next to Skills, pick Create skill, then Write skill instructions. The dialog gives you three fields: name, description, instructions. Fill them in for the critic.

[Figure: the Create skill flow in Claude Desktop: + button, then Create skill, then Write skill instructions]

The critic skill instructions look like this:

    You are a critic skill for blog posts. Read the post the
    producer wrote. Score it 1-5 on each of these criteria:
    1. Voice match: does it sound like Leo?
    2. Phrase ban: any use of "leverage", "stakeholders",
       "moving forward", em-dashes, en-dashes?
    3. Call to action: does it end with a workshop CTA?
    4. Internal links: are there at least 3 real internal links
       to existing pages?
    5. Structure: bold sub-headers on their own line, short
       paragraphs, no walls of text?
    For each criterion below 4, write one sentence of specific
    feedback. Output as JSON.
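The rubric says "Output as JSON" but does not specify a shape. A minimal Python sketch of validating the critic's reply before the loop acts on it, assuming a hypothetical shape with a `scores` object and a `feedback` object keyed by the criterion names below (the key names are illustrative, not part of the original rubric):

```python
import json
from typing import Dict, Tuple

# Hypothetical keys for the five rubric criteria.
CRITERIA = ("voice_match", "phrase_ban", "call_to_action", "internal_links", "structure")


def parse_critique(raw: str) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Parse the critic's JSON reply; raise ValueError if it breaks the rubric."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    scores = data.get("scores")
    feedback = data.get("feedback", {})
    if not isinstance(scores, dict) or not isinstance(feedback, dict):
        raise ValueError("'scores' and 'feedback' must be JSON objects")
    for name in CRITERIA:
        s = scores.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(s, int) or isinstance(s, bool) or not 1 <= s <= 5:
            raise ValueError(f"{name}: expected an integer 1-5, got {s!r}")
        # The rubric requires one sentence of feedback for every score below 4.
        if s < 4 and not feedback.get(name):
            raise ValueError(f"{name} scored {s} but has no feedback")
    return {name: scores[name] for name in CRITERIA}, feedback


raw = json.dumps({
    "scores": {"voice_match": 5, "phrase_ban": 2, "call_to_action": 4,
               "internal_links": 4, "structure": 5},
    "feedback": {"phrase_ban": "Uses the banned word stakeholders twice."},
})
scores, feedback = parse_critique(raw)
print(scores["phrase_ban"])  # 2
```

Rejecting a malformed reply, rather than guessing at it, keeps one bad critic run from rewriting the producer's instructions.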

Sharp criteria are everything. "Better tone" is useless. "Does it use the word stakeholders" is testable.
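Criterion 2 is testable enough that it does not even need a model. A minimal Python sketch of a deterministic check for the banned list in the rubric, assuming whole-word, case-insensitive matching (so it would miss variants such as "leveraged"); `phrase_ban_hits` is a hypothetical helper, not part of Claude or Cowork:

```python
import re
from typing import List

# Banned items from the critic rubric. The dashes are written as escapes:
# U+2014 is the em-dash and U+2013 is the en-dash.
BANNED_PHRASES = ("leverage", "stakeholders", "moving forward")
BANNED_CHARS = {"\u2014": "em-dash", "\u2013": "en-dash"}


def phrase_ban_hits(text: str) -> List[str]:
    """Return each banned phrase or dash that appears in text."""
    hits = []
    for phrase in BANNED_PHRASES:
        # \b anchors make this a whole-word, case-insensitive match.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text, flags=re.IGNORECASE):
            hits.append(phrase)
    for char, name in BANNED_CHARS.items():
        if char in text:
            hits.append(name)
    return hits


draft = "Moving forward, we will leverage AI \u2014 fast."
print(phrase_ban_hits(draft))  # ['leverage', 'moving forward', 'em-dash']
```

A check like this could run alongside the critic as a hard floor for criterion 2; judgement calls such as voice match still need the critic.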

Part 3: Schedule the loop in Cowork

In Cowork, schedule a weekly task, something like Sunday at 8pm, with a prompt along these lines:

    Take the three blog posts the blog-writer skill produced this
    week from the "Blog drafts" Notion database. For each, invoke
    the critic-blog skill to score it. Identify the criterion with
    the lowest aggregate score across the three posts. Update the
    blog-writer skill's instructions to specifically address that
    criterion. Write what-changed.md noting which criterion scored
    lowest and what change was made.

That is the loop. The critic scores last week's work, the task finds the weakest criterion, and the producer's instructions get updated to fix it.
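The "lowest aggregate score" step is simple arithmetic. A minimal Python sketch, assuming each post's critique is a dict of scores covering the same criteria; the function names and the what-changed.md entry layout are hypothetical, since Cowork performs this step from the natural-language task:

```python
from datetime import date
from pathlib import Path
from typing import Dict, List


def lowest_aggregate_criterion(critiques: List[Dict[str, int]]) -> str:
    """Sum each criterion across the week's posts; return the lowest (ties alphabetical)."""
    if not critiques:
        raise ValueError("need at least one critique")
    totals: Dict[str, int] = {}
    for scores in critiques:
        for name, score in scores.items():
            totals[name] = totals.get(name, 0) + score
    return min(totals, key=lambda name: (totals[name], name))


def append_change_log(path: Path, week: date, criterion: str, change: str) -> str:
    """Append one entry to what-changed.md and return the text written."""
    entry = (
        f"## Week of {week.isoformat()}\n"
        f"- Lowest criterion: {criterion}\n"
        f"- Change made: {change}\n\n"
    )
    with path.open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry


week_scores = [
    {"voice_match": 4, "phrase_ban": 3, "call_to_action": 5},
    {"voice_match": 5, "phrase_ban": 2, "call_to_action": 4},
    {"voice_match": 4, "phrase_ban": 4, "call_to_action": 2},
]
print(lowest_aggregate_criterion(week_scores))  # phrase_ban (total 9, vs 11 and 13)
```

Summing works because every post is scored on the same criteria; if posts could skip a criterion, averaging per criterion would be the fairer comparison.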

Part 4: Read the change log every week

This is the discipline that keeps the loop honest.

Open the change log every Monday. 30 seconds. Three questions:

• Does the change make sense?
• Is the loop moving the producer in the direction I actually want?
• Is any criterion stuck at a low score for multiple weeks (meaning the loop cannot fix it on its own)?

If anything looks off, intervene. The loop is a junior teammate, not a god. You are still the lead.
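The third question can be automated if the loop also records each week's per-criterion scores. A minimal Python sketch, assuming a hypothetical history list (oldest week first) of average scores per criterion; the threshold of 4 and the three-week window are illustrative choices, not rules from this post:

```python
from typing import Dict, List


def stuck_criteria(
    history: List[Dict[str, float]], threshold: float = 4.0, weeks: int = 3
) -> List[str]:
    """Return criteria scoring below threshold in every one of the last `weeks` weeks."""
    if weeks < 1 or len(history) < weeks:
        return []
    recent = history[-weeks:]
    # Only consider criteria that were scored in every recent week.
    names = set(recent[0]).intersection(*recent[1:])
    return sorted(n for n in names if all(week[n] < threshold for week in recent))


history = [
    {"voice_match": 3.0, "internal_links": 2.7},
    {"voice_match": 4.3, "internal_links": 2.3},
    {"voice_match": 4.7, "internal_links": 3.0},
    {"voice_match": 4.7, "internal_links": 2.7},
]
print(stuck_criteria(history))  # ['internal_links']
```

A criterion that shows up here is the signal to intervene by hand: the loop has had several tries and the instruction edits alone are not fixing it.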

Where this breaks

Soft criteria. "Tone is better" is not measurable, so the loop cannot improve against it. Every criterion in your rubric must be checkable in one read.

Too many criteria. Five is sharp. Twenty is noise. The loop will jitter, trying to improve everything and fixing nothing.

Critic too lenient. If the critic scores everything 4-5, there is no improvement signal. Calibrate the critic on a deliberately bad draft and a deliberately good one, and make sure the score range is wide.
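The calibration step can be made concrete. A minimal Python sketch, assuming you have the critic's scores for one deliberately bad draft and one deliberately good draft; `uncalibrated_criteria` and the minimum gap of 2 points are hypothetical starting values, not rules from this post:

```python
from typing import Dict, List


def uncalibrated_criteria(
    bad: Dict[str, int], good: Dict[str, int], min_gap: int = 2
) -> List[str]:
    """Criteria where the critic fails to separate the bad draft from the good one."""
    shared = sorted(set(bad) & set(good))
    return [name for name in shared if good[name] - bad[name] < min_gap]


bad_draft = {"voice_match": 2, "phrase_ban": 1, "structure": 4}
good_draft = {"voice_match": 5, "phrase_ban": 5, "structure": 5}
print(uncalibrated_criteria(bad_draft, good_draft))  # ['structure']
```

Any criterion flagged here needs a sharper rubric line, or a worked example in the critic's instructions, before the loop can trust its score as a signal.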

No human in the read. The change log is what keeps the loop accountable. Skip it for a month and you stop knowing what your producer skill is doing.

What we use this for at Waboom AI

If you want to see a Karpathy Loop running live on real client work, we demo it at our Claude Code course. Watching a skill rewrite its own instructions on a Sunday night is the moment most operators stop treating Claude as a chat tool.

Our voice-DNA enforcer skill runs a Karpathy Loop weekly. Every weekend it reads the week's blog drafts, scores them against our voice rules, and tightens the rules where the drafts drifted.

Six months in, the skill catches things we did not even know to flag in week one. Em-dashes hidden in image alt-text. Three-adjective fanfare. "Imagine this" openings.

The original v1 of that skill had nine rules. The current version has 28. We wrote nine of them. The loop wrote the other 19.

What to do next

You need v1 of a producer skill before you build a loop. Do Skills 101 first if you have not.

Then pick the work you do most often, build a producer, build a critic, schedule the loop. By month three you will feel the compounding.

Credit: this post adapts Karol Zieminski's original Karpathy Loop write-up. We strongly recommend reading the source for additional context.

Self-paced

Build your first skill before you build a loop. Six short modules. One hour. Free.

Start Claude Skills 101 →

Hands-on with us

The live workshop covers loops, critics, and the change-log discipline. You leave with a producer + critic running on your real work.

See the workshop →

Leonardo Garcia-Curtis

Founder & CEO at Waboom AI. Building voice AI agents that convert.
