Tian Pan
Research Engineer
·tian
Toolformer: Self-Supervised Tool Use and Its Limits for Finance AI
A close reading of Toolformer (Meta AI, NeurIPS 2023): how perplexity-filtered self-supervised training teaches a 6.7B-parameter model to call external APIs, where it outperforms GPT-3 175B on arithmetic benchmarks, and why its single-step architecture cannot support the chained tool calls required for structured ledger operations.
ai
llm
machine-learning
automation
·tian
FinBen: Benchmarking LLMs Across 36 Financial Tasks — Implications for Accounting AI
FinBen (NeurIPS 2024) evaluates 15 LLMs across 36 financial datasets and finds that GPT-4 reaches 0.63 Exact Match on numerical QA and only 0.54 on stock movement forecasting, close to chance. Here is what those numbers mean for building a reliable accounting agent on a Beancount ledger.
ai
llm
machine-learning
finance