( ・_・)ノ Ritchot's Corner

the needs analysis for my AI literacy course

A little under a year ago I finished my Master's in Educational Technology and Instructional Design, and the capstone was a course teaching AI literacy through the mechanism of tokenization: how language models break text into tokens, predict the next one, and generate language one probability at a time. The instructional problem I wrote then was narrow and specific. Learners in Grades 9–11, my capstone read, "lack foundational understanding of how AI models process text through tokenization, limiting their ability to critically engage with AI tools." I built the thing on Canvas, it satisfied the rubric, and I was reasonably proud of it. But it was a course written for evaluators, scoped to a secondary classroom, and constrained by an academic deliverable's requirements rather than the needs of anyone who would actually use it. I said as much in my last post.

When I decided to rebuild it as a real learning product for the corporate workforce (live now at ai-literacy.ritchot.me), I rebuilt the whole thing from scratch, starting with the needs analysis. A learner analysis written for fifteen-year-olds does not transfer to a thirty-eight-year-old operations manager who already uses LLMs every day and does not know they are overrelying on them.

Starting with the problem

The temptation in any instructional design project is to start building. You have a topic, you can picture the modules, and the authoring tool is right there. You believe you are at the very least directionally correct based on your experience, so just get to it. The discipline of needs analysis exists to stop you, because the order of operations matters: you define the measurable business goal first, then the observable behaviors that would achieve it, then the practice that builds those behaviors, and only then the content. I structured the analysis as four documents: an executive problem statement, a market-level capability gap analysis, an evidence-based learner persona, and an action map following Cathy Moore's framework. Each one exists to make a later design decision defensible rather than intuitive. All four are also available directly as PDFs: the problem statement, the capability gap analysis, the learner persona, and the action map.

My capstone had versions of some of this. It contained a learner analysis and a section I called "anticipated learner needs," and those were honest as far as they went. But they went as far as a rubric and a secondary audience, which is to say not very far into the world the course was now meant for. Rebuilding meant re-grounding every claim in primary research about the actual target population: mid-career knowledge workers in functions like marketing, operations, HR, finance, and product. The persona that came out of that is a composite, but it is a composite where every attribute traces to a documented finding rather than to my sense of what a learner is probably like.

What the evidence found

The first thing the research found was a judgment gap. Sixty-five percent of professionals describe their AI use as augmentative: the model supports their judgment rather than replacing it. The behavioral data disagrees, and by a margin that depends on which dataset you use. Against Anthropic's task-level analysis of actual Claude conversations, the real split is 57% augmentation to 43% automation (Handa et al., 2025), an eight-point gap; against the more recent internal analysis the Anthropic Interviewer cites, it widens to eighteen. Either way the direction holds: people believe they are exercising oversight in situations where the task structure shows they are not. They do not know they are overrelying. That is not a motivation problem you can solve with a better pep talk; it is a design constraint, and it determined how the whole program treats delegation and verification.
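The arithmetic behind the two gaps is easy to check. Here is a minimal sketch: the 65% self-report and the 57% observed share are the figures quoted above, while the 47% result is not quoted in this post and is only what an eighteen-point gap would imply, so treat it as derived rather than sourced. The function names are mine, for illustration.

```python
# Shares are in percentage points.
SELF_REPORTED = 65        # professionals who describe their AI use as augmentative
OBSERVED_TASK_LEVEL = 57  # augmentation share in Handa et al. (2025)


def perception_gap(self_reported: int, observed: int) -> int:
    """Points by which self-reported augmentation exceeds observed augmentation."""
    return self_reported - observed


def implied_observed(self_reported: int, gap: int) -> int:
    """Observed augmentation share implied by a stated gap (derived, not sourced)."""
    return self_reported - gap


print(perception_gap(SELF_REPORTED, OBSERVED_TASK_LEVEL))  # 8
print(implied_observed(SELF_REPORTED, 18))                 # 47
```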

The productivity stakes are large, but the headline number is easy to oversell, because the real figure spans a wide range depending on the trials and tasks tested. Tamkin and McCrory's analysis of Claude conversations reports a median task-time reduction of 81%. But that 81% measures only the work done inside the AI interaction. The randomized controlled trials in the same analysis measure the full work cycle, the AI exchange plus the time spent afterward verifying and correcting what it produced, and they land between 14% and 56%. Both numbers are real; they just measure different things. The distance between them is mostly that post-interaction work, which is exactly the judgment a competency program is supposed to build.
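A worked example may make the difference between the two measures concrete. This is a sketch under loud assumptions: the 60-minute baseline is hypothetical, and applying the 81% figure and the trial range to the same task is my illustration, not something either source does. It only shows how much follow-up time would reconcile the numbers.

```python
def implied_followup_minutes(baseline: float, in_chat_cut: float,
                             full_cycle_cut: float) -> float:
    """Minutes of verification and correction implied when the in-chat time
    falls by in_chat_cut but the whole work cycle falls by only
    full_cycle_cut (both given as fractions of the baseline)."""
    in_chat = baseline * (1 - in_chat_cut)
    full_cycle = baseline * (1 - full_cycle_cut)
    return round(full_cycle - in_chat, 1)


BASELINE = 60.0  # hypothetical task length in minutes, not from either source

for full_cycle_cut in (0.56, 0.14):  # the two ends of the controlled-trial range
    print(full_cycle_cut, implied_followup_minutes(BASELINE, 0.81, full_cycle_cut))
```

On that hypothetical hour-long task, the sketch implies roughly 15 minutes of follow-up work at the 56% end of the range and roughly 40 minutes at the 14% end.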

The compliance risk compounds the productivity story. A survey of 319 knowledge workers across 936 real tasks found that higher confidence in generative AI predicts reduced critical thinking, and that the effect is sharpest precisely when a task feels routine (Lee et al., CHI 2025). The worker who has used a tool successfully for three months is more likely to skip verification than the one who started last week. So the program could not treat confidence as the goal. Confidence without calibration is a failure mode.

There is also a geographic dimension that I happen to live. I am writing this from Singapore, which Anthropic's Economic Index puts at 4.6 times the Claude usage you would expect for its working-age population; Canada, my home country, sits at 2.9 times, and several large economies sit well below parity (Appel, McCrory, and Tamkin, 2025). For a multinational, that means teams in the same company, working the same deliverables, are operating at very different levels of fluency, and a standardized competency baseline is one of the few things that puts them on common footing.

The last finding was the one I did not expect to be so prevalent. Sixty-nine percent of professionals report that social stigma is an active barrier to AI use at work: not reluctance to adopt, but reluctance to disclose adoption that is already happening (Anthropic Interviewer, 2025). People are using these tools, seeing the gains, and hiding both. That made the social dimension, to me, as important as the mechanical understanding. A course that builds individual competence but ignores the social conditions for visible use produces learners who apply what they learned privately and never change anything their organization can measure.

This is where the capstone's core idea earned its place. The original problem, that tokenization stays a "black box" and reinforces surface-level engagement with AI, turned out to be the same problem the workforce data describes, just dressed in adult clothing. The misconception I had named for teenagers, confusing generation with retrieval, is the precise misconception that produces uncritical acceptance of a fabricated citation in a client deliverable. The mechanism survived the audience change. It became Module 3, and everything in the program that asks a learner to evaluate an output rests on it.

Why the market solves the wrong problem

The best offering from the big labs is Anthropic's AI Capabilities and Limitations course, and it landed the better part of a year after I had already done the same groundwork. It frames generative AI around behavioral properties, addresses hallucination mechanics directly, and teaches learners to treat specificity (names, dates, statistics, citations) as the zone where fabrication concentrates. That is real foundational content, and most programs do not approach it.¹

The structural limitation is, in my opinion, the target competency. The dominant design paradigm across the market trains people to operate a specific interface, whether a Copilot summary or a Gemini draft, without building the conceptual model needed to evaluate what the interface produces. CompTIA's own research, "AI's Impact on Productivity and the Workforce," notes that just 34% of companies require AI training at all, and the programs that exist tend to measure completion rather than behavior change. A 95% completion rate confirms that employees clicked through modules. It says nothing about whether they retained anything or did their jobs differently afterward. None of the major programs I looked at include a behavioral follow-up, a manager observation protocol, or any structural feature that addresses the 69% stigma finding. They are, for the most part, well-produced answers to a question the research suggests is not the binding one. The binding question is judgment, and judgment is harder to package.

From analysis to design

The payoff of doing the analysis this way is a property most courses cannot demonstrate: traceability. In the action map, every observable behavior traces backward to a documented gap in the research and forward to a specific practice activity and assessment. That backward chain, from a business goal through behaviors to the research that warrants them, is the thing I would point a hiring manager to. It is also the thing the section count and the production polish cannot tell you, because a course can look finished and still be built on assumption.
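Traceability of this kind can be checked mechanically. The sketch below is one hypothetical way to represent an action map as data and flag any behavior that is missing its backward link (evidence) or forward links (practice, assessment). The class, the field names, and both rows are illustrative assumptions, not copied from the actual action map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    name: str
    evidence: str    # backward link: the research finding that warrants it
    practice: str    # forward link: the activity that builds it
    assessment: str  # forward link: how it is checked


def untraceable(action_map: list[Behavior]) -> list[str]:
    """Names of behaviors missing a backward or a forward link."""
    return [
        b.name
        for b in action_map
        if not (b.evidence.strip() and b.practice.strip() and b.assessment.strip())
    ]


# Illustrative rows only. The second is deliberately left without evidence
# to show what the check catches.
action_map = [
    Behavior("Verify citations before sending a deliverable",
             "Lee et al., CHI 2025",
             "Fabricated-citation review exercise",
             "Scored scenario"),
    Behavior("Disclose AI use to the team",
             "",
             "Peer share-out",
             "Manager observation"),
]

print(untraceable(action_map))
```

An empty result would mean every behavior in the map traces both ways; here the second row is flagged because its evidence field is blank.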

The 4D competency vocabulary (Delegation, Description, Discernment, Diligence), drawn from Anthropic's fluency framework, does double duty here. It is a competency taxonomy, but it is also a change-management instrument, because a shared language for how you use AI is a precondition for the peer normalization that the stigma data says is missing.

Limitations

One of the persona's gaps, what I called underuse through overcaution, is a design-grounded inference from behavioral data rather than a directly observed survey item, and I flagged it as such in the document. The program was scoped for two delivery environments, a custom-coded platform and a corporate Articulate build, but only the custom platform was actually produced; the Articulate version was a deliberate scope cut, because this is an independent portfolio project and not a funded enterprise engagement. Some of the supporting market data, particularly the ROI figures, comes from a vendor-commissioned survey, and I have marked it that way wherever it appears. And the 81% productivity median, to repeat the earlier point, is observational and optimistic; the controlled-trial range is lower. Realistically, I am a practitioner who did this work solo around a full-time teaching job, not a research team, and the analysis should be read as rigorous, directional, and a work in progress rather than definitive.

My capstone was teaching the mechanics of how these models generate text before the major labs had published their competency frameworks. The problem was visible early. What I do not know is whether the discipline will catch up to it on its own, or whether the courses that ship will keep answering the question that is easy to package instead of the one the data keeps pointing at.

If you want to chat, shoot me an email. If you would like to get updates, subscribe to my blog via email or RSS feed. You can also follow me on LinkedIn and X.


¹ The course carries a 2026 copyright, and its first video content was published in April 2026. My capstone, teaching the same mechanistic foundation to a harder audience, was submitted on 20 June 2025, the better part of a year before that course's first videos appeared. I note this not as a grievance but as evidence for the post's actual claim: the gap was identifiable, and my work on a solution arrived well before anything comparable from the AI labs.