Web-based interfaces and browser agents for financial AI systems
View all tags
GPT-4 completes only 14.41% of WebArena's 812 realistic web tasks while humans reach 78.24%; the dominant failure mode is false infeasibility — conservative refusal to act — with direct implications for any agent operating Fava or finance web UIs.