65 tagged with "Beancount"

Beancount ledger format, tooling, and ecosystem research

Constitutional AI for Accounting Agents: RLAIF, Policy Rules, and Goodharting Risks
·mike

Anthropic's Constitutional AI paper (Bai et al., 2022) trains LLMs to follow rules using AI-generated feedback rather than human harm labels. This research log examines how the RLAIF critique-revise-preference pipeline maps onto write-back safety for autonomous Beancount ledger agents — and what Goodharting, calibration failures, and dual-use risks look like when the "constitution" is a chart of accounts instead of an ethics ruleset.

ai
machine-learning
llm
automation
+3
PHANTOM (NeurIPS 2025): Measuring LLM Hallucination Detection in Financial Documents
·mike

PHANTOM (NeurIPS 2025) is the first benchmark to measure LLM hallucination detection on real SEC filings across context lengths up to 30,000 tokens. Qwen3-30B-A3B-Thinking leads with F1=0.882; 7B models score near random guessing — with direct implications for autonomous accounting agents.

llm
ai
machine-learning
finance
+4
ReAct: Synergizing Reasoning and Acting in Language Models
·mike

ReAct (Yao et al., ICLR 2023) interleaves chain-of-thought reasoning with tool actions in a single trajectory. It outperforms pure CoT on fact verification and, on embodied tasks, beats imitation-learning baselines by 34 percentage points. This analysis covers the paper's failure modes (search-induced distraction and compounding errors) and what they mean for autonomous agents writing back to Beancount ledgers.

ai
llm
machine-learning
automation
+3
Toolformer: Self-Supervised Tool Use and Its Limits for Finance AI
·tian

A close reading of Toolformer (Meta AI, NeurIPS 2023): how perplexity-filtered self-supervised training teaches a 6.7B-parameter model to call external APIs, where it outperforms GPT-3 175B on arithmetic benchmarks, and why its single-step architecture cannot support the chained tool calls required for structured ledger operations.

ai
llm
machine-learning
automation
+4
FinBen: Benchmarking LLMs Across 36 Financial Tasks — Implications for Accounting AI
·tian

FinBen (NeurIPS 2024) evaluates 15 LLMs across 36 financial datasets. GPT-4 reaches 0.63 Exact Match on numerical QA but only 0.54 on stock movement forecasting, which is near chance. Here is what those numbers mean for building a reliable accounting agent on a Beancount ledger.

ai
llm
machine-learning
finance
+3
Showing 61–65 of 65 posts