Workers Are Spending as Much Time Supervising AI as Actually Working. That's a Problem Nobody Planned For.
A new survey finds workers spend roughly equal time "botsitting" AI tools as doing productive work. Here's what that means and what to do about it.

A survey circulating inside enterprise HR and productivity circles this week contains a number that should make every AI budget owner uncomfortable: workers are spending roughly as much time supervising, correcting, and babysitting AI tools as they are doing the work those tools were supposed to replace.
The term that's started appearing in workforce discussions is "botsitting." It describes the hidden labor of checking AI outputs, re-prompting when results miss the mark, catching errors before they ship, and generally doing the meta-work of managing a tool that was supposed to reduce work. The survey, drawing on responses from workers across industries where AI adoption is heaviest, found this ratio approaching parity in many roles. Not 10% overhead. Not 20%. Close to 50-50 in some cases.
That's not a productivity gain. That's a job swap.
How "Botsitting" Actually Happens
The pattern is consistent enough that it deserves a clear description. A worker delegates a task to an AI tool. The output comes back plausible but slightly wrong, or technically correct but wrong in context, or right in structure but wrong in tone. The worker then spends time diagnosing what went wrong, re-prompting, checking the second attempt, and cleaning up the final version.
Multiply that cycle across a full workday and you can see how the math falls apart. The AI is handling the first draft, but the human is handling everything else. In many cases, an experienced worker doing the task from scratch would have been faster.
This connects directly to a verification gap that's been quietly building across the industry. The KPMG hallucination incident, in which a major consulting firm had to retract a published AI-generated report, illustrated the same structural failure at a higher-stakes level. When AI output can't be trusted on sight, every output requires a human review pass. That review pass is labor. It just doesn't show up in the productivity metrics that vendors use to sell you the product.
The legal profession is learning this the hard way. Courts are now sanctioning lawyers who submit AI-generated filings without adequate verification. The court doesn't care that the tool was supposed to save time. It cares that the filing was wrong.
The Productivity Illusion in the Numbers
Here's what makes this particularly tricky for organizations: the botsitting problem doesn't show up cleanly in aggregate productivity data.
When a company deploys an AI writing tool and tracks output volume, the numbers often look good. More documents produced. More emails sent. More reports drafted. What the metrics don't capture is the time each worker spends reviewing those documents before they go out. That time gets absorbed silently into individual schedules. It looks like productivity. It's actually overhead.
The tools creating this dynamic aren't obscure. They're the mainstream enterprise AI platforms, coding assistants, and writing helpers that organizations have been adopting at scale over the past 18 months. The problem isn't that these tools don't work. They do. The problem is that their failure modes require human attention at almost exactly the rate their successes generate output.
Put differently: the AI writes an email in a minute. If checking whether it's right takes roughly as long as writing the email yourself would have, net throughput is... about the same as before.
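To make that break-even point concrete, here is a minimal Python sketch. The function name, its parameters, and every minute value are hypothetical assumptions for illustration, not figures from the survey. Delegating only pays off when prompting, reviewing, and fixing together take less time than doing the task by hand, and each re-prompt adds another full draft-and-review cycle.

```python
# Hypothetical break-even check for one delegated task.
# All minute values below are illustrative assumptions, not survey data.

def net_minutes_saved(manual: float, generate: float, review: float,
                      rework: float, retries: int = 0) -> float:
    """Minutes saved (positive) or lost (negative) by delegating a task to AI.

    manual   -- time to do the task from scratch
    generate -- time spent prompting and waiting for one draft
    review   -- time spent checking one draft
    rework   -- time spent fixing the draft that is finally accepted
    retries  -- extra prompt-and-review cycles before a draft is usable
    """
    # Every retry repeats the generate + review cycle; rework happens once.
    ai_path = (generate + review) * (1 + retries) + rework
    return manual - ai_path


# A routine email: slow-ish to write by hand, quick to check.
print(net_minutes_saved(manual=5.0, generate=0.5, review=1.0, rework=0.5))  # → 3.0

# A judgment-heavy email that needed one re-prompt.
print(net_minutes_saved(manual=10.0, generate=1.0, review=3.0, rework=3.0, retries=1))  # → -1.0
```

The second case comes out a minute behind: a single retry doubles the draft-and-review cost, which is the botsitting cycle described above in miniature.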
Why This Wasn't in the Sales Deck
AI vendors have generally presented their tools' upside in terms of output volume and speed. They rarely quantify the verification burden those outputs create. That's not necessarily dishonest — in controlled demos and narrow use cases, verification overhead can be minimal. But real-world deployment across diverse tasks and users produces a much messier picture.
The issue is particularly acute in roles where quality and accuracy are non-negotiable. Medical documentation. Legal filings. Financial analysis. These aren't domains where "good enough on first pass" is acceptable. The botsitting burden is heaviest precisely where the stakes are highest.
This is why tools designed specifically for high-verification environments are getting serious investment. Pramaana Labs, for example, just closed a $27 million seed round from Khosla Ventures to bring formal verification methods to AI outputs in law, drug discovery, and tax preparation. The pitch is essentially: here's a layer that reduces the botsitting burden by making outputs checkable through mathematical proof rather than human judgment. It's a real problem looking for a real solution.
The AI verification problem is one that most organizations haven't systematically addressed. They've bought the tool. They haven't built the verification workflow that makes the tool actually reliable.
The Tool-Stack Problem Behind the Tool
There's also a structural issue that compounds the botsitting problem. Many organizations didn't deploy one AI tool. They deployed several, poorly integrated, with workers context-switching between them constantly. When your writing tool, your coding assistant, your research tool, and your meeting summarizer are all separate products with separate interfaces and separate failure modes, the supervision overhead multiplies across each one.
This is the AI stack problem in a nutshell: a collection of tools isn't a system. Each tool works in isolation, and the cognitive load of managing all of them sits entirely with the worker.
The net result is that workers become tool managers as much as task doers. They're not just botsitting individual outputs. They're orchestrating a disjointed set of tools, each with its own quirks, error patterns, and supervision requirements.
What Organizations Should Actually Do
The botsitting problem isn't solved by buying fewer AI tools or more AI tools. It's solved by being honest about what your current setup actually costs in human attention.
A few things worth doing now:
Audit the real time cost. Ask workers to track, for one week, how much time they spend reviewing, correcting, and re-prompting AI outputs. Don't guess. Measure. The number will surprise you.
Match tools to task types. AI tools perform very differently depending on the type of task. Narrow, well-defined tasks with clear right/wrong answers produce output that's faster to verify. Open-ended, judgment-heavy tasks often produce output that takes longer to check than it would take to write from scratch. Use your AI tools accordingly.
Build verification into the workflow, not as an afterthought. If you're using AI to produce anything that matters, the review step should be a named, scheduled part of the process with time allocated for it. Not a vague "we'll check it before it goes out."
Be skeptical of productivity metrics that don't account for review time. If your AI vendor's ROI case is built on output volume alone, push back. Ask for data on verification overhead. If they can't provide it, assume it's significant.
Reduce tool fragmentation where possible. Fewer tools with tighter integration cut the context-switching cost. This doesn't mean one monolithic platform. It means being deliberate about which tools talk to each other and which create dead ends.
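The first two steps above, auditing the real time cost and matching tools to task types, can be combined into one weekly summary. What follows is a minimal Python sketch, assuming each worker logs every AI-assisted task with a task type, minutes spent prompting, minutes spent reviewing and correcting, and an estimate of how long the task would have taken by hand. The field names, the categories, and the rule that a task type is worth delegating only when it saves time overall are all hypothetical.

```python
# Hypothetical one-week audit: summarise AI supervision time per task type.
# Field names, categories, and the "worth delegating" rule are assumptions.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogEntry:
    task_type: str           # e.g. "email", "analysis"
    ai_minutes: float        # prompting and waiting on the tool
    review_minutes: float    # reviewing, correcting, and re-prompting
    manual_estimate: float   # worker's estimate to do the task by hand


@dataclass
class TaskSummary:
    supervision_share: float  # fraction of the AI path spent supervising
    net_minutes_saved: float  # positive means delegating saved time

    @property
    def worth_delegating(self) -> bool:
        return self.net_minutes_saved > 0


def audit(entries: list[LogEntry]) -> dict[str, TaskSummary]:
    # Sum each kind of time per task type across the whole week.
    totals = defaultdict(lambda: {"ai": 0.0, "review": 0.0, "manual": 0.0})
    for e in entries:
        t = totals[e.task_type]
        t["ai"] += e.ai_minutes
        t["review"] += e.review_minutes
        t["manual"] += e.manual_estimate

    report = {}
    for task_type, t in totals.items():
        ai_path = t["ai"] + t["review"]
        share = t["review"] / ai_path if ai_path else 0.0
        report[task_type] = TaskSummary(share, t["manual"] - ai_path)
    return report


# Example with made-up entries for one worker's week.
week = [
    LogEntry("email", 2.0, 3.0, 15.0),
    LogEntry("email", 1.0, 2.0, 10.0),
    LogEntry("analysis", 5.0, 40.0, 35.0),
]
for task_type, summary in audit(week).items():
    print(task_type, summary, "delegate" if summary.worth_delegating else "do by hand")
```

With these made-up entries, email comes out 17 minutes ahead while analysis loses 10, even though both show supervision taking most of the AI-assisted time. The manual estimate is the weakest input here, since it is a self-reported guess, so it is worth sanity-checking against a few tasks actually done by hand.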
The AI automation blind spot most organizations have isn't in the tasks they haven't automated. It's in the oversight labor they haven't accounted for in the tasks they have automated.
The Honest Bottom Line
AI tools are genuinely useful. That's not what's in question here. What's in question is whether the productivity case for broad, undifferentiated AI deployment holds up when you include the full cost of supervision.
For many roles and many tasks, it doesn't. The output is faster. The verification is not. And when you add those together, you get roughly the same amount of total work, just distributed differently between the human and the machine.
That's not a reason to abandon AI tools. It's a reason to deploy them more carefully, measure them more honestly, and stop assuming that deploying a tool is the same thing as solving a problem.
The organizations that figure out that distinction in the next 12 months will get ahead. The ones that don't will spend those 12 months botsitting.


