Infrastructure for computer-use agents.
UseDesktop is a research lab building infrastructure to turn real desktop workflows into verifier-backed training, evaluation, and RL data for computer-use agents.
What we do
- Proprietary end-to-end software. UseDesktop builds its own software for the full computer-use agent (CUA) data loop: collection, normalization, labeling, verifier authoring, training, evaluation, and runtime, so each team can turn its own workflows into agent training data.
- Research-first RL environments. We work on the hard parts of RL environments: multi-objective rewards, programmatic verifiers, reward-hacking checks, and evaluation methods for tasks that are difficult to judge.
- Real workflow data. We turn real desktop work into verifier-backed data packages: task instructions, human demos, screen and action traces, environment state, reward signals, failure traces, and train/eval splits for SFT, evals, and RL.
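As a rough illustration of what such a data package might hold, here is a minimal Python sketch. The class and field names (TaskPackage, Step, split_counts, and so on) are hypothetical and chosen only to mirror the items listed above; they are not UseDesktop's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Step:
    """One recorded step: a screenshot reference plus the action taken."""
    screenshot_path: str
    action: Dict[str, Any]  # hypothetical shape, e.g. {"type": "click", "x": 120, "y": 48}


@dataclass
class TaskPackage:
    """Hypothetical container for one verifier-backed task."""
    task_id: str
    instruction: str
    demo: List[Step] = field(default_factory=list)  # human demonstration trace
    environment_state: Dict[str, Any] = field(default_factory=dict)
    reward: Optional[float] = None  # filled in once a verifier has scored the outcome
    failure_traces: List[List[Step]] = field(default_factory=list)
    split: str = "train"  # "train" or "eval"

    def __post_init__(self) -> None:
        # Reject anything other than the two splits this sketch assumes.
        if self.split not in ("train", "eval"):
            raise ValueError(f"unknown split: {self.split!r}")


def split_counts(packages: List[TaskPackage]) -> Dict[str, int]:
    """Count how many packages fall into each split."""
    counts = {"train": 0, "eval": 0}
    for package in packages:
        counts[package.split] += 1
    return counts
```

Keeping the split on the package itself, rather than in a separate index, is one simple way to make train/eval membership visible wherever the package travels.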
Research thesis
Verifying the unverifiable.
Real desktop workflows are messy: partial progress, judgment calls, failures, retries, files, portals, screenshots, and state changes across tools. That is why most CUA data stops at toy browser tasks.
UseDesktop turns those workflows into inspectable task packages with traces, outcomes, verifier audits, pass@k evals, and reward signals.
Why a research lab
This is not a labeling operation. A useful CUA data package has to define the task, preserve the workflow, build a verifier, calibrate difficulty, measure model failure modes, and prove that training on it improves capability beyond a benchmark. That requires research judgment and proprietary software, not just data collection.
UseDesktop is a lab because the product is an outcome: workflow data that can move a model. We build the capture stack, eval harnesses, verifier audits, and training experiments ourselves so every package comes with evidence, not just files.
The thesis
Good CUA data is not data that merely raises a public benchmark score, looks clean to an expert, or carries a real-world label. The useful signal is hard, long-horizon work: a person moving across tools, making intermediate decisions, failing, correcting, and producing an outcome that can be verified.
Real-world data is not automatically high-quality data. UseDesktop focuses on economically meaningful task distributions that come with trajectories, outcomes, verifiers, and reward signals, because those packages can teach computer-use agents to generalize beyond a benchmark.
How we are different
Most data vendors sell volume: more trajectories, more labels, or generic synthetic tasks. UseDesktop sells tested workflow packages with evidence that they are worth training on.
Every package needs a quality story: what source produced it, whether the task is solvable, where models fail, how the verifier can be wrong, and why the eval split is not contaminated.
- Active testing. We do not ship tasks on the strength of inspecting a sample. We test whether they are solvable, ambiguous, too easy, too sparse, or brittle under UI changes.
- Verifier audits. We audit false positives, false negatives, edge cases, and reward-hacking loopholes so the verifier is not the weak point.
- Difficulty calibration. We run task sets across models and report pass@1, pass@3, and pass@5 instead of relying on one benchmark score.
- Contamination control. We track source, versioning, train/eval splits, customer isolation, and public benchmark overlap before calling data reusable.
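Two of the checks in the list above reduce to small calculations, sketched here in Python. The function names are hypothetical, and the source does not say how UseDesktop computes these numbers. The pass@k function uses the common combinatorial estimator (the chance that at least one of k attempts, drawn without replacement from n recorded attempts with c successes, succeeds), which is an assumption on our part. The verifier audit assumes each run has a human ground-truth label to compare against.

```python
from math import comb
from typing import Dict, List, Tuple


def verifier_error_rates(human: List[bool], verifier: List[bool]) -> Dict[str, float]:
    """Compare verifier verdicts with human ground-truth labels.

    false_positive_rate: share of truly failed runs that the verifier passed.
    false_negative_rate: share of truly successful runs that the verifier failed.
    """
    if len(human) != len(verifier):
        raise ValueError("label lists must have the same length")
    false_pos = sum(1 for h, v in zip(human, verifier) if not h and v)
    false_neg = sum(1 for h, v in zip(human, verifier) if h and not v)
    negatives = sum(1 for h in human if not h)
    positives = len(human) - negatives
    return {
        "false_positive_rate": false_pos / negatives if negatives else 0.0,
        "false_negative_rate": false_neg / positives if positives else 0.0,
    }


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n recorded attempts, c of which succeeded."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed, giving pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def difficulty_report(n: int, c: int, ks: Tuple[int, ...] = (1, 3, 5)) -> Dict[str, float]:
    """Report pass@k for each k in ks for a single task."""
    return {f"pass@{k}": pass_at_k(n, c, k) for k in ks}
```

For example, a task with 10 recorded attempts and 2 successes gives pass@1 of 0.2, pass@3 of about 0.53, and pass@5 of about 0.78. The spread across k says more about difficulty than any single score.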
Where this goes
The market starts with frontier labs buying scarce long-horizon RL data. It expands when enterprises begin post-training agents on their own internal workflows.
UseDesktop is building the recurring data layer for that future: real workflow acquisition, verifier-backed task packages, eval reports, failure traces, and training data that keeps computer-use agents improving.