1 tagged with "Web Interface"

Web-based interfaces and browser agents for financial AI systems

June 14, 2026 · mike

WebArena: The 812-Task Benchmark That Measures What Web Agents Actually Can and Cannot Do

GPT-4 completes only 14.41% of WebArena's 812 realistic web tasks, while humans reach 78.24%. The dominant failure mode is false infeasibility, where the agent conservatively declares a feasible task impossible and refuses to act. This has direct implications for any agent operating Fava or other finance web UIs.

llm

automation

machine-learning