2 tagged with "Productivity"
Productivity improvements and automation research for knowledge workers
·mike
TheAgentCompany: Benchmarking LLM Agents on Real-World Enterprise Tasks
TheAgentCompany evaluates LLM agents on 175 real workplace tasks across a simulated intranet with GitLab, OwnCloud, and RocketChat. The best model (Gemini-2.5-Pro) completes only 30% of tasks at $4 per task, which shows that autonomous agents remain far from viable for accounting and finance workflows.
ai
llm
automation
machine-learning
·mike
WorkArena++: The 93% Gap Between Human and AI Agent Performance on Compositional Enterprise Tasks
WorkArena++ (NeurIPS 2024) benchmarks 682 compositional enterprise tasks across three difficulty levels. GPT-4o solves 2.1% of them while humans solve 93.9%. The benchmark isolates why current AI agents fail at implicit-goal knowledge work and why that gap matters for autonomous accounting automation.
ai
llm
automation
enterprise-software