Tian Pan

Research Engineer

April 16, 2026 · tian

Toolformer: Self-Supervised Tool Use and Its Limits for Finance AI

A close reading of Toolformer (Meta AI, NeurIPS 2023): how perplexity-filtered self-supervised training teaches a 6.7B-parameter model to call external APIs, where it outperforms GPT-3 175B on arithmetic benchmarks, and why its single-step architecture cannot support the chained tool calls required for structured ledger operations.

FinBen: Benchmarking LLMs Across 36 Financial Tasks — Implications for Accounting AI

FinBen (NeurIPS 2024) evaluates 15 LLMs across 36 financial datasets. GPT-4 reaches 0.63 Exact Match on numerical QA but only 0.54 on stock movement forecasting, which is near chance. Here is what those numbers mean for building a reliable accounting agent on a Beancount ledger.

llm

machine-learning

finance