I Went Shopping While My AI Did a Week's Worth of Work

A real-world stress test of async AI productivity, with receipts

Here's a question I've been wrestling with: how much can you reliably delegate to AI today? Not in theory. In practice, on a normal day, with real work that actually matters.

Last week, I ran the experiment. The tool: Claude Dispatch, part of Anthropic's broader Cowork product. Cowork, which launched in January, is essentially Claude Code for knowledge workers. It gives Claude access to your files and applications through a desktop workspace, connects to dozens of apps through connectors, and when no connector exists, falls back to directly controlling your mouse and keyboard. Dispatch, which arrived in the last couple of weeks, adds the key piece: you can message Claude from your phone while it works on your desktop. Scan a QR code, and your phone becomes a remote control for an AI agent sitting at your computer.

The combination feels like texting a competent assistant who happens to have full access to your machine. The constraint: your Mac has to stay awake with Claude Desktop running. Close the lid or let it sleep, and the whole thing dies. Not exactly "the cloud," but it works.

This capability is available as a research preview in Cowork on Pro and Max plans. It requires both the Claude Desktop app and the Claude mobile app.

The plan: I had eight tasks queued up. Research, content, a Notion audit, an app build, a weekly automation. A solid morning's work. I wasn't going to send them all at once. I'd fire them off one by one throughout the day and see what came back.

The destination: Westfield, a London shopping district, with my 12-year-old daughter Layla. She needed flares, trainers, and a few other things. I needed to find out if AI can do my job while I'm not at my desk.

Morning: Flares and First Tasks

We got to Westfield around 09:30. While Layla browsed, I fired off the first task from my phone: research how enterprise leaders are talking about AI agents on LinkedIn this week, find the 10 most engaged posts, and save a structured brief to my desktop. Sent it, pocketed the phone, and went back to having (unrecognised) opinions about flares.

Over the next couple of hours, between shops, I sent three more tasks: cross-reference my last five saved articles in Notion and map the connections between them. Organise my Downloads folder, archive anything older than 30 days. Scan my files for every AI framework I've created and build an index.

Each time, same routine. Pull out phone, paste the task, wait for the "Read" confirmation, put it away. The pings started coming back. "Running the LinkedIn scan now." "Building the connections map." "Organising Downloads." It felt exactly like texting a colleague who's back at the office handling things while you're out.

Then at 12:36, the first problem: "You're out of extra usage. Resets 2pm." I'd burned through my Max plan allocation in a morning. I'm already on the highest tier, and a serious Dispatch session still hit the ceiling. Pace yourself better than I did.

Lunch: King Prawns and a Reset

We stopped for lunch. I had the king prawn noodles. Layla had a chicken wrap. I checked the Dispatch chat to see what had landed so far. The LinkedIn brief was already on my desktop, sources attributed, analysis structured. The connections map had been created but got stuck in a sandbox, a recurring theme I'd discover later. Downloads were mostly organised, though it had missed some folders and left about 10 files behind.

Good enough to be useful. Not perfect enough to be unsupervised.

That felt like the honest verdict on everything Dispatch would produce that day.

Afternoon: Trainers, Tasks, and Building an App

Usage reset at 2pm. We headed to the trainer shops. While Layla tried on what felt like every pair in the building, I sent the next round.

Build me an interactive Framework Explorer app. Clickable cards for each of my consulting frameworks that expand into visual explainers with diagrams. Client-ready, shareable. Save it to my desktop.

That one was ambitious. The result: a 999-line HTML file with 12 framework cards, SVG diagrams, and category filters, built while I watched my daughter decide between two nearly identical pairs of white trainers. Not pixel-perfect, but presentable.

I also asked it to turn the morning's LinkedIn research into a filterable HTML dashboard, write a full newsletter draft, and set up a weekly scheduled task that would run every Monday morning to scan for new AI agent launches, update my Notion database, and draft a LinkedIn post from the findings.

Between shops, my phone kept buzzing. "Newsletter draft done. 1,053 words." "Building the Framework Explorer now, give me a few minutes." "Weekly scan scheduled." The permission prompts were the friction point. Every time Dispatch needed to write a file or start a code session, my phone buzzed asking me to approve. The whole pitch is "delegate and walk away," but I kept getting pulled back into Dispatch mid-conversation about whether the trainers came in a half size up.

Coming Home: The Review

By the time I sat down at my desk at home, there were 21 distinct outputs waiting. Not eight, 21. One task doesn't equal one deliverable. The LinkedIn research alone produced a markdown brief, an interactive dashboard, and content angles I could write against.

I sent one final task before leaving: QA everything you've done today. Score each output for completeness, accuracy, and readiness. Flag anything you're not confident about.

That report was the first thing I opened. And it was surprisingly honest. It gave itself 6/10 on accuracy where warranted, flagged engagement metrics as estimates rather than verified numbers, and explicitly told me not to share certain outputs externally without checking them first. It prioritised my review for me.

The breakdown: roughly 57% of outputs were ready to use or needed only light editing. 29% were usable with caveats. 14% needed verification. Three files got stuck in task sandboxes, created successfully but unable to get out the door. The newsletter draft it wrote fabricated examples instead of using the real data from the day. It needed a full rewrite.

What This Actually Means

Let me put this plainly. The outputs sitting on my desktop, the research briefs, the interactive apps, the Notion updates, the newsletter draft, the scheduled automation, would have taken me days to produce manually. Not hours. Days. I reviewed and edited them in an evening. Some needed more work than others, but I was starting from completed outputs, not a blank page. That's a fundamentally different relationship with work. How much you come home to is up to you.

The shift isn't "AI does your job." It's: you come home to a stack of completed work that needs your judgment, not your labour.

There's a pattern emerging across AI-enabled work that people are calling the shift from executor to orchestrator. As AI handles more of the production, the human role moves toward briefing, steering, and judging quality. That's exactly what happened here. I didn't write, research, or build anything. I framed the tasks, set the constraints, and reviewed the outputs. The scarce skill wasn't execution. It was taste: knowing whether what came back was good enough to use.

And that's where the risk sits too. The LinkedIn dashboard looked complete and read fluently, but it was partially fabricated. That's junk-food work: output that feels productive but contains nothing real. Without a critical review pass, I'd have published fiction. The QA report I asked for at the end was the antidote, but only because I thought to ask for it.

Three things made the difference between tasks that ran cleanly and ones that needed rescue. First, brief quality. The tasks where I spent a minute writing a specific brief worked. The quick, vague ones didn't. That figures. Second, context infrastructure. Tasks that could draw on my existing Notion databases and project files were stronger than ones starting from scratch. Third, designing for review, not supervision. The best outputs were ones where I'd specified the format, so I could scan and approve in two minutes rather than wade through unstructured content.

Try This Week (requires access to Claude Cowork/Dispatch)

Pick one repetitive knowledge task. The kind that involves gathering, processing, and structuring information. Write a clear brief: what sources, what format, what "done" looks like. Send it before you leave for the day. Review when you're back.
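For illustration, here's the shape of a brief that tends to run cleanly (a made-up template, not one of the briefs I actually sent):

```
Task: Scan [your trusted sources] for articles on [topic] published this week.
Sources: only [the specific sites or databases you name]. Flag anything else.
Output: one markdown file saved to my desktop, with three sections:
  1. Top 10 items: title, link, one-line summary each.
  2. Themes you noticed across them.
  3. Anything you couldn't verify, flagged clearly.
Done looks like: I can scan the file in two minutes and approve it.
```

The last line matters most. Specifying what "done" looks like is what turns the review from a slog into a scan.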

Score the result.

My AI produced 21 outputs while Layla and I had a successful shopping day. She got the flares and most of what she needed. The trainers remain elusive, which means another shopping trip, and honestly, probably another Dispatch session.

I want to be clear: Dispatch is a research preview. The sandbox issues, the permission prompts, the usage ceiling: those are real friction points. But I tested this on a random Tuesday while shopping with my daughter, and I came home to days' worth of work that needed editing, not doing.

Imagine where this is in six months.

Imagine where it is in a year.

The goal isn't to use AI more. It's to use it when you're not there.

I've been writing about AI productivity and AI building for 18 months. Starting next week, I'll be tackling this full time, at a company I'm very excited about.

More soon.

Faisal

P.S. Know someone else who’d benefit from this? Share this issue with them.

Received this from a friend? Subscribe below.