( ・_・)ノ Ritchot's Corner

I built an AI Literacy course

In my last piece on this site, I closed with a line about how my goal for 2026 was to build. At the time, I did not know exactly what that would look like. I had just finished a stretch of writing about LLMs, benchmarks, and the state of the field, and I was ready to stop talking about AI tools and start making something with them. That vague aspiration turned into expanding my master's work into a four-module, research-grounded AI literacy program for the corporate workforce, built on a custom-coded platform and live now at ai-literacy.ritchot.me.

I finished my Master's in Educational Technology and Instructional Design just under a year ago, and the capstone project was the seed of what you see at that link. The capstone was a course teaching AI Literacy through the mechanism of tokenization: how language models actually break text apart, predict the next token, and generate language one probability at a time. I built it on Canva, and it worked well enough to fulfill the academic requirements. But the capstone was constrained by the rubric and by the reality that academic deliverables are written for evaluators, not for the people who would actually use them (I mentioned this in my piece on the MIT study, where I called it a capstone built for rubric requirements that would largely be tossed into a void). The core idea, though, was sound. I wanted to take it and rebuild it as a real learning product, something an L&D director would envision as a useful corporate AI Literacy course.

The gap the capstone started to address still largely exists when I watch people interact with "AI." People have access to AI tools and are already using them, but they do not understand how these systems actually generate language. Most still treat them as magic information retrieval machines, and they lack the judgment framework to evaluate whether what comes back is reliable enough to act on. The World Economic Forum's 2025 Future of Jobs Report found that 63% of employers identify skill gaps as the primary barrier to business transformation. Tamkin and McCrory's productivity research documented an 81% median task time reduction for workers using AI effectively (with other RCTs showing between 14% and 56%), but "effectively" is doing a lot of work in that sentence. The productivity gains are available. The workforce does not yet have the skills to capture them. And the training that exists tends to address tool familiarity (how to prompt your LLM of choice for email drafting) rather than the underlying understanding that would let someone judge which tasks to delegate and whether the output is reliable.

I found Anthropic's 4D competency framework a few months after completing my capstone. The framework defines four observable dimensions of AI fluency: Delegation (knowing which tasks to assign to AI and which to retain), Description (communicating effectively with AI systems), Discernment (evaluating the reliability of AI outputs), and Diligence (maintaining transparency and accountability in AI-augmented work). It mapped cleanly onto a curriculum I had already built around the same intuitions, and it provided the competency language that connects learning objectives to measurable workplace behaviors. The research I had been collecting, from Handa et al.'s task-level adoption data to cognitive science findings on overconfidence and unchecked AI acceptance, fit those four dimensions well enough that the framework became the organizing spine of the program.

The four modules follow a specific sequence: context, then evidence, then mechanism, then application. Module 1 establishes why AI literacy is a business problem using workforce data. Module 2 surfaces what the workforce is actually doing with AI through interactive data dashboards built on task-level adoption research and productivity data. Module 3 teaches how language models actually generate text (tokenization, next-token prediction, attention mechanisms, context windows). This is the module that traces most directly back to my original capstone: if you do not understand how these systems produce language, you cannot evaluate what they produce. Module 4 integrates all four competency dimensions into applied practice: task decomposition, prompt reformulation, output verification, iterative refinement, and a diligence statement exercise that asks learners to articulate their own accountability framework for AI-assisted work.
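
As a rough illustration of the "one probability at a time" idea behind Module 3, here is a toy next-token step. It is a sketch under loud assumptions: the four-token vocabulary and the scores are invented, the function names are hypothetical, and this is not the code behind the course's playground. A real model produces scores over a vocabulary of many thousands of tokens and usually samples rather than always taking the top choice.

```typescript
// Toy next-token step: turn raw scores (logits) into probabilities, then pick a token.
// The vocabulary and scores are made up for illustration only.

/** Convert logits to a probability distribution (numerically stable softmax). */
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

/** Greedy decoding: return the token with the highest probability. */
function pickNextToken(
  vocab: string[],
  logits: number[]
): { token: string; probability: number } {
  if (vocab.length === 0 || vocab.length !== logits.length) {
    throw new Error("vocab and logits must be non-empty and the same length");
  }
  const probs = softmax(logits);
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return { token: vocab[best], probability: probs[best] };
}

// Hypothetical scores for the token that follows "The capital of France is".
const toyVocab = [" Paris", " Lyon", " a", " the"];
const toyLogits = [4.0, 1.5, 0.5, 0.2];
const next = pickNextToken(toyVocab, toyLogits);
console.log(next.token, next.probability.toFixed(2));
```

The point of the sketch is that the model never "looks up" an answer. It assigns a probability to every candidate token and emits one, which is why a fluent output is not automatically a reliable one.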

Across those four modules, there are 37 sections, 12 interactive practice activities (including filterable data dashboards, a tokenizer playground, a next-token prediction demonstration, and a multi-step AI interaction sandbox), and 7 downloadable reference materials designed as take-home tools for on-the-job application. Every module section traces backward to a documented gap in the research corpus and forward to a measurable assessment. If it could not be justified by evidence and measured by an observable behavior change, I cut it. The course today is what I consider the minimum viable product. I intend to expand it with further research I have already read, but I felt an urgency to get this course out now. It has been almost a year since my original capstone, and I still feel that the L&D/AI Enablement programs I have seen are teaching AI literacy fundamentally wrong.

The program scopes to the AI interaction model most people use today: conversational interfaces, not agentic systems. Tools like Claude Code, Cowork, Codex, and Antigravity operate on a slightly different paradigm (autonomous multi-step execution rather than turn-by-turn dialogue) and would complicate the curriculum substantially. I may build that as a separate course. But the core competencies the program develops — knowing what to delegate, how to evaluate outputs, and when to intervene — transfer upward when the tools get more capable.

The course was built on a custom-coded platform (Vite, React, TypeScript, Tailwind CSS, Recharts for data visualization) rather than in an off-the-shelf authoring tool. The Canva version had done what it could, but authoring tools impose their own constraints on how content flows and how interactions behave, and they limit what data you can capture. I wanted a learning experience where the instructional design drove the architecture rather than the other way around. Building the platform from scratch was also a bet on a question I think L&D will have to answer soon: how do you architect a learning experience as software? Organizations building internal tools rather than purchasing them will need people who can bridge instructional design and software engineering, and the implications of that shift extend well beyond platform choice.

I built this program in approximately 150–160 hours of total development time, as a solo developer, while holding a full-time teaching position at an international school in Singapore. Roughly 120 of those hours went to research, instructional design, and platform build. The remaining 30–40 hours went to iterative review: verifying content accuracy against source papers, refining instructional sequencing, and integrating additional research. That work is easy to leave off a project timeline, but it is the difference between a product that passes a surface-level review and one that holds up under scrutiny. The early phases (research gathering, evidence compilation, instructional design documentation) consumed roughly four hours per weekend session, at a rather casual pace. Once I had the specifications locked and the content documents written, the build phase picked up pace, but even then the total calendar span was approximately eight weeks, interrupted by a full week lost to illness and by reduced capacity from a separate injury. To put that in context, the industry benchmark for this level of interactivity estimates roughly 735 hours of development time.

The 735-hour figure comes from Bryan Chapman's standard industry benchmark survey of approximately 4,000 learning professionals. His Level 3 category (simulations, individualized interactions, gamified elements) estimates 217–716 hours of development per finished hour of learning content, with a central estimate around 490:1. The program contains approximately 1.5 finished hours of content at Level 3 interactivity (interactive dashboards, a tokenizer playground, a next-token prediction simulation, a multi-step sandbox). At 490:1, that gives a baseline of roughly 735 hours (490 × 1.5) for the content alone. Chapman's estimates assume off-the-shelf authoring tools and account for content development only. A custom-coded platform, an evaluation framework with an xAPI event taxonomy, a research corpus compiled from primary sources, and project management documentation would push the realistic estimate higher, though I will not pin a specific multiplier on work Chapman's survey was not designed to measure.

Against the content baseline alone, the compression ratio is roughly 4.6–4.9x (735 benchmark hours against 150–160 actual hours), a 78–80% reduction in development time that accounts for the full development cycle, not just the build phase. That range is in line with what Tamkin and McCrory's productivity research documented across AI-augmented knowledge work. I achieved that compression as an independent developer building a portfolio piece with no corporate legal review, no compliance requirements, and no learner data collection to worry about. An enterprise team would face security audits, accessibility certification, data handling policies, and cross-functional coordination overhead that would narrow the ratio. The time savings would still be significant, but the compression would not be as dramatic, and that gap is itself one of the things the program teaches learners to anticipate when scoping AI-augmented work.
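
For anyone who wants to check the arithmetic, the benchmark and compression figures can be reproduced with a few lines. This is a minimal sketch using only the numbers quoted in the text (Chapman's Level 3 ratios, about 1.5 finished hours, 150–160 actual hours); the function and variable names are my own illustrative choices.

```typescript
// Figures taken from the text: Chapman Level 3 ratios (development hours per
// finished hour), ~1.5 finished hours of content, 150-160 actual hours.
const chapmanLevel3 = { low: 217, central: 490, high: 716 };
const finishedHours = 1.5;
const actualHours = { low: 150, high: 160 };

/** Benchmark development hours for a given hours-per-finished-hour ratio. */
function benchmarkHours(ratio: number, contentHours: number): number {
  return ratio * contentHours;
}

/** How many times faster the actual build was than the benchmark. */
function compressionRatio(benchmark: number, actual: number): number {
  return benchmark / actual;
}

/** Share of benchmark time saved, as a percentage. */
function percentReduction(benchmark: number, actual: number): number {
  return (1 - actual / benchmark) * 100;
}

const baseline = benchmarkHours(chapmanLevel3.central, finishedHours); // 735
for (const hours of [actualHours.low, actualHours.high]) {
  console.log(
    `${hours} h: ${compressionRatio(baseline, hours).toFixed(1)}x, ` +
      `${percentReduction(baseline, hours).toFixed(0)}% reduction`
  );
}
// prints "150 h: 4.9x, 80% reduction"
// prints "160 h: 4.6x, 78% reduction"
```

The same helper applied to the ends of Chapman's range gives 325.5 hours at 217:1 and 1,074 hours at 716:1, so the compression claim is sensitive to which ratio is treated as the baseline.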

I was able to build this program in such a compressed timeline because I had the domain expertise to specify what needed to be built and the instructional design background to know why. I also had enough hours with AI development tools to treat them as a working partner. Without those three things, the same tools produce something that looks plausible — the formatting is clean and the sections are logically ordered — but the instructional sequencing is wrong, the assessment alignment is off, and the practice activities test recall rather than judgment. It passes a surface-level review. It does not change behavior. The tool did not replace the expertise. The expertise is what made the tool productive.

The economics of learning development have changed, but not in the direction most people assume. AI does not make L&D cheaper. It makes expert L&D practitioners significantly more productive and forces every team member to work more fluidly across the entire stack. A single practitioner with domain expertise, instructional design training, and fluency in AI-augmented development workflows can now produce work that previously required an expansive cross-functional team. The value proposition of L&D teams has shifted. Volume of output and mastery of a specific authoring tool are no longer valuable metrics. What separates useful L&D teams from obsolete ones is "taste": the ability to design programs grounded in evidence, specify clearly what needs to be built (and what "done" looks like), evaluate carefully whether what was built actually changes behavior, iterate quickly, and use AI tools as a development partner throughout.

The workflow that produced this compression mirrors what the program teaches. Every build session followed a specification-verification loop: I would write a detailed specification document (what to build, what content to use, what acceptance criteria to meet, what decisions the AI should make silently versus what required my approval), hand it to Claude Code for implementation, review the output against the specification, and iterate. The AI handled the volume — generating component code, populating content, wiring up state management — and that freed me to work on other tasks in parallel. I handled the judgment: verifying accuracy against source papers, checking instructional sequencing, and catching the moments where a technically correct implementation missed the pedagogical intent. That loop — specification then verification — is the Delegation-Discernment cycle the program teaches. The build process became the case study.
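
The specification-verification loop described above could be modeled roughly as follows. This is only a sketch: the field names, the example criteria, and the `reviewAgainstSpec` helper are hypothetical and are not taken from the actual specification documents used in the build. It assumes the human reviewer records which acceptance criteria they personally confirmed, and the loop continues until none are left unconfirmed.

```typescript
// Hypothetical shape of a build specification; all names are illustrative.
interface AcceptanceCriterion {
  id: string;
  description: string;
}

interface BuildSpec {
  title: string;
  criteria: AcceptanceCriterion[];
  silentDecisions: string[];  // choices the AI may make without asking
  approvalRequired: string[]; // choices that need human sign-off
}

interface ReviewResult {
  passed: string[]; // criterion ids the reviewer confirmed
  failed: string[]; // criterion ids that need another iteration
  done: boolean;    // true only when every criterion passed
}

/** Compare a human reviewer's confirmations against the spec's criteria. */
function reviewAgainstSpec(spec: BuildSpec, confirmedIds: Set<string>): ReviewResult {
  const passed: string[] = [];
  const failed: string[] = [];
  for (const criterion of spec.criteria) {
    if (confirmedIds.has(criterion.id)) {
      passed.push(criterion.id);
    } else {
      failed.push(criterion.id);
    }
  }
  return { passed, failed, done: failed.length === 0 };
}

// Example: one criterion confirmed on the first pass, so the loop continues.
const tokenizerSpec: BuildSpec = {
  title: "Tokenizer playground",
  criteria: [
    { id: "content-matches-source", description: "Explanatory text matches the source paper" },
    { id: "sequencing", description: "Activity appears after the concept it practices" },
  ],
  silentDecisions: ["component file layout"],
  approvalRequired: ["any change to learner-facing wording"],
};
const firstPass = reviewAgainstSpec(tokenizerSpec, new Set(["sequencing"]));
console.log(firstPass.done, firstPass.failed);
```

The design choice worth noticing is that "done" is defined by the specification, not by the tool: the implementation step can be delegated, but the confirmations come from the person who owns the judgment.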

The platform tracks learner progress, knowledge check responses, practice activity completion, and time-on-task using an xAPI-aligned event taxonomy. All of it stays local; I am not collecting anything on my end for this version. A built-in sample admin dashboard (Cmd+Shift+A on Mac, Ctrl+Shift+A on Windows) surfaces completion patterns, knowledge check response distributions, and event timelines. The Kirkpatrick evaluation framework (reaction, learning, behavior, results) is visible in the program's architecture diagrams by default to signal thinking at the management level rather than the course level. Because evaluation lives in the architecture, the program generates the data an L&D director needs to justify continued investment and measure behavior change, reporting that usually requires a separate project.

If the bet I described earlier pays off, and organizations do start architecting learning experiences as software rather than purchasing them, L&D teams will need to work much more closely with technical teams. Someone has to verify that the AI-generated code is secure, accessible, and maintainable. Someone has to ensure the platform architecture does not introduce compliance risks or data handling problems. The era of "vibe coding" a learning platform into existence without technical oversight is a liability waiting to materialize. This means organizations will need to hire L&D practitioners who are technical enough to collaborate with engineers on shared problems, or engineers who understand instructional design well enough to evaluate what the AI produces against pedagogical intent. I can do both for this project because I am an independent developer with none of the enterprise constraints I described earlier. An enterprise team would not have that luxury. But the people who can bridge that gap between instructional design and software engineering are exactly the people organizations will need to teach AI fluency to their workforce in the first place. My hope is that this ends with L&D teams that are small, technically minded, and able to move and pivot quickly.
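
To make the xAPI-aligned event taxonomy mentioned earlier more concrete, here is a simplified sketch of what statements and a knowledge check summary might look like. The two verb IRIs come from the ADL xAPI vocabulary; everything else (the activity IDs, the `example.com` home page, the helper names, and the in-memory log) is a hypothetical stand-in and not the platform's actual schema.

```typescript
// Simplified xAPI-style statement; a full statement has more optional fields.
interface XapiStatement {
  actor: { account: { homePage: string; name: string } };
  verb: { id: string; display: { "en-US": string } };
  object: { id: string };
  result?: { success?: boolean; duration?: string }; // duration is ISO 8601, e.g. "PT45S"
  timestamp: string;
}

// Verb IRIs from the ADL vocabulary.
const VERBS: Record<"answered" | "completed", XapiStatement["verb"]> = {
  answered: { id: "http://adlnet.gov/expapi/verbs/answered", display: { "en-US": "answered" } },
  completed: { id: "http://adlnet.gov/expapi/verbs/completed", display: { "en-US": "completed" } },
};

/** Build one statement; learnerId and activityId are hypothetical identifiers. */
function makeStatement(
  learnerId: string,
  verb: keyof typeof VERBS,
  activityId: string,
  result?: XapiStatement["result"],
  when: Date = new Date()
): XapiStatement {
  return {
    actor: { account: { homePage: "https://example.com", name: learnerId } },
    verb: VERBS[verb],
    object: { id: activityId },
    ...(result ? { result } : {}),
    timestamp: when.toISOString(),
  };
}

/** Count correct and incorrect responses per knowledge check activity. */
function knowledgeCheckSummary(
  log: XapiStatement[]
): Map<string, { correct: number; incorrect: number }> {
  const summary = new Map<string, { correct: number; incorrect: number }>();
  for (const s of log) {
    if (s.verb.id !== VERBS.answered.id) continue;
    const entry = summary.get(s.object.id) ?? { correct: 0, incorrect: 0 };
    if (s.result?.success) entry.correct += 1;
    else entry.incorrect += 1;
    summary.set(s.object.id, entry);
  }
  return summary;
}

// Example log kept in memory; nothing leaves the learner's machine in this sketch.
const checkId = "https://example.com/activities/module-3/check-1";
const sampleLog: XapiStatement[] = [
  makeStatement("learner-1", "answered", checkId, { success: true, duration: "PT45S" }),
  makeStatement("learner-1", "answered", checkId, { success: false }),
  makeStatement("learner-1", "completed", "https://example.com/activities/module-3"),
];
console.log(knowledgeCheckSummary(sampleLog).get(checkId));
```

A summary like this is the kind of aggregate a response distribution view could be built from, and because the statement shape follows xAPI conventions, the same events could later be sent to a learning record store if data collection were ever turned on.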

I keep thinking about id Software [1]. Four people built Wolfenstein 3D in four months. Six built DOOM in thirteen. Fewer than ten developers shipped 28 games (if we include their Softdisk era) across the studio's first five and a half years. Their speed did not come from cutting corners. It came from cumulative expertise, tight iteration loops, and an obsessive investment in building their own tools rather than working around someone else's constraints. The model was deep technical ownership per person rather than distributed accountability across a department, with speed coming from domain mastery and custom tooling rather than headcount. John Carmack made the counterpoint himself when he left Meta in 2022, describing an organization with a "ridiculous amount of people and resources" that constantly self-sabotaged and squandered effort. The version of L&D I want to see looks more like id in 1993 than a corporate training department in 2024.

Organizations that invest in building this internal capability will have an advantage. Buying AI tools for an L&D team is table stakes. The leverage is in the specification-verification fluency that makes those tools productive: the ability to write clear build instructions and evaluate outputs against evidence until the product meets a standard. Teams that develop this fluency will produce learning programs at a pace and cost their competitors cannot match. The 85% of employers who plan to prioritize workforce upskilling by 2030 will need to decide whether that upskilling happens through traditional development timelines or through the kind of AI-augmented workflow this project demonstrates.

The program is live. You can explore it at ai-literacy.ritchot.me. Over the coming weeks, I will be publishing follow-up posts that walk through the portfolio documentation behind the program: the needs analysis and research grounding, the evaluation framework, the program design artifacts, and the project management records. Each post will stand on its own as a discussion of what careful L&D development looks like in that particular discipline, with the supporting documents available for anyone who wants to see the full scope of what went into this.

The question I keep sitting with is whether the field will develop its own fluency fast enough to shape what this shift looks like, or whether it will be shaped by people who understand the technology but not the discipline.


  1. John Romero has described the early id team as a "hive mind" — no manager, each person owning a specific domain, everyone technically deep enough to self-direct. David Kushner's Masters of Doom is the definitive account of those years and is worth reading for anyone interested in what small, technically obsessive teams can produce when the constraints are self-imposed rather than organizational.