Beancount.io

57 posts tagged with "Automation"

Automation techniques and tools for financial data processing workflows

Reflexion: Language Agents That Learn from Mistakes Without Retraining
·mike

Reflexion (NeurIPS 2023) lets LLM agents improve by storing verbal post-mortems in an episodic buffer — no weight updates required. It reaches 91% on HumanEval with GPT-4 but fails on WebShop, revealing a structural constraint: verbal reinforcement only works when the evaluator produces a crisp, actionable signal. Here is what that means for building a self-correcting Beancount ledger agent.

ai
llm
machine-learning
automation
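A minimal sketch of the Reflexion loop summarized above, under stated assumptions: `toy_actor`, `toy_evaluator`, and `toy_reflect` are hypothetical stand-ins for LLM calls, and the three-entry memory cap is an arbitrary choice. The evaluator checks whether the postings sum to zero, which is the kind of crisp, actionable signal the summary says verbal reinforcement depends on.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class EpisodicMemory:
    """Bounded buffer of verbal self-reflections carried across trials."""
    capacity: int = 3
    reflections: List[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.reflections.append(text)
        self.reflections = self.reflections[-self.capacity:]  # keep only the newest entries


def reflexion_loop(
    actor: Callable[[str, List[str]], str],
    evaluator: Callable[[str], Tuple[bool, str]],
    reflect: Callable[[str, str], str],
    task: str,
    max_trials: int = 4,
) -> Tuple[str, int]:
    """Retry a task, feeding past reflections back to the actor. No weights change."""
    memory = EpisodicMemory()
    attempt = ""
    for trial in range(1, max_trials + 1):
        attempt = actor(task, memory.reflections)
        ok, feedback = evaluator(attempt)
        if ok:
            return attempt, trial
        memory.add(reflect(attempt, feedback))  # the only state that changes between trials
    return attempt, max_trials


# Hypothetical stand-ins; a real agent would call an LLM in toy_actor and toy_reflect.
def toy_actor(task: str, reflections: List[str]) -> str:
    sign = "-" if any("sign" in r for r in reflections) else ""
    return f"Expenses:Food:Groceries 42.00 USD\nAssets:Checking {sign}42.00 USD"


def toy_evaluator(attempt: str) -> Tuple[bool, str]:
    """Crisp check: do the posting amounts sum to zero?"""
    total = sum(float(line.split()[1]) for line in attempt.splitlines())
    return total == 0, f"postings sum to {total:.2f}, expected 0.00"


def toy_reflect(attempt: str, feedback: str) -> str:
    return f"Last attempt failed: {feedback}. Check the sign of the balancing posting."


result, trials = reflexion_loop(toy_actor, toy_evaluator, toy_reflect, "record a 42 USD grocery purchase")
print(trials)  # 2
```

The first attempt fails with a concrete message ("postings sum to 84.00"), and that message is what lets the second attempt succeed; with a vague evaluator the stored reflection would carry nothing to act on.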
Self-Consistency: Majority Voting Improves Chain-of-Thought Accuracy
·mike

Self-consistency replaces greedy chain-of-thought decoding with a majority vote over N sampled reasoning paths, raising GPT-3's accuracy on GSM8K by 17.9 percentage points with no additional training. It applies directly to multi-step financial calculations, where a single decode from the model is unreliable.

ai
llm
machine-learning
automation
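The aggregation step of self-consistency fits in a few lines of Python. This sketch assumes the N reasoning paths have already been sampled (at a nonzero temperature) and that each ends with a phrase like "the answer is X"; the `ANSWER_RE` pattern and the round-to-cents normalization are illustrative choices, not the paper's exact answer parsing.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Illustrative parser: the number after "answer is", with optional "$" and thousands separators.
ANSWER_RE = re.compile(r"answer is\s*\$?(-?[\d,]*\.?\d+)")


def extract_answer(path: str) -> Optional[float]:
    """Return the final numeric answer in a reasoning path, or None if there is none."""
    match = ANSWER_RE.search(path)
    if match is None:
        return None
    # Normalize so "1,250" and "1250.00" count as the same vote.
    return round(float(match.group(1).replace(",", "")), 2)


def self_consistency(paths: List[str]) -> Tuple[float, float]:
    """Majority vote over the final answers; returns (answer, share of votes)."""
    answers = [a for a in map(extract_answer, paths) if a is not None]
    if not answers:
        raise ValueError("no sampled path produced a parsable answer")
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers)


# Hypothetical sampled paths for "What is 1,000 USD plus 25% tax?"
sampled = [
    "Tax is 25% of 1,000, which is 250, so the answer is 1,250.",
    "1,000 * 1.25 = 1,250.00. The answer is $1,250.00",
    "Tax is 25, so the answer is 1,025.",
]
answer, share = self_consistency(sampled)
print(answer, round(share, 2))  # 1250.0 0.67
```

The vote share doubles as a rough confidence signal: a low share suggests the paths disagree and the calculation may deserve a second look.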
PAL: Program-Aided Language Models for Reliable Financial Arithmetic
·mike

PAL (Program-Aided Language Models) achieves a +38pp accuracy gain over chain-of-thought on arithmetic-heavy tasks by delegating computation to a Python interpreter — a directly applicable architecture for reliable Beancount ledger queries and finance AI.

ai
llm
machine-learning
beancount
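A minimal sketch of the PAL pattern: the model writes a program and the Python interpreter does the arithmetic. `GENERATED_PROGRAM` is a hypothetical example of what a model might emit for "How much did I spend on groceries in March 2024?", not real model output, and `run_pal_program` is an illustrative helper. Trimming `__builtins__` limits accidental misuse but is not a sandbox; untrusted model code needs real isolation.

```python
from decimal import Decimal

# Hypothetical model output; the amounts and accounts are made up for illustration.
GENERATED_PROGRAM = '''
postings = [
    ("2024-03-02", "Expenses:Food:Groceries", Decimal("54.20")),
    ("2024-03-09", "Expenses:Food:Restaurants", Decimal("31.00")),
    ("2024-03-16", "Expenses:Food:Groceries", Decimal("47.85")),
    ("2024-04-01", "Expenses:Food:Groceries", Decimal("12.00")),
]
answer = sum(
    amount
    for date, account, amount in postings
    if date.startswith("2024-03") and account == "Expenses:Food:Groceries"
)
'''


def run_pal_program(program: str) -> Decimal:
    """Execute model-written code and return the value it binds to `answer`.

    The arithmetic happens in the interpreter, not in the model's generated tokens.
    """
    namespace = {"__builtins__": {"sum": sum}, "Decimal": Decimal}
    exec(program, namespace)
    return namespace["answer"]


print(run_pal_program(GENERATED_PROGRAM))  # 102.05
```

Using `Decimal` rather than `float` keeps currency amounts exact, which matters when the result is compared against ledger balances.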
Can LLMs Reason Over Tabular Data? What Four Benchmarks Tell Us About Finance AI
·mike

Four 2024–2025 benchmarks show GPT-4 scoring 42% on real-world table QA versus 86% for humans, with complex aggregations collapsing to 19.6%—and Beancount's native syntax sits at the worst-performing end of the serialization hierarchy for LLM input.

ai
llm
beancount
data-science
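To make "serialization" concrete, the sketch below flattens a Beancount transaction into one row per posting and renders the rows as a Markdown table. The regex parser is a hypothetical simplification that only handles dated transactions with explicit amounts (real ledgers should be loaded with the beancount library). Markdown is an arbitrary target here, not a claim about which format the benchmarks favor.

```python
import re
from typing import Dict, List

# Hypothetical, simplified patterns: a dated transaction header with one or two
# quoted strings, followed by indented postings that carry explicit amounts.
HEADER_RE = re.compile(r'^(\d{4}-\d{2}-\d{2})\s+[*!]\s+"([^"]*)"(?:\s+"([^"]*)")?')
POSTING_RE = re.compile(r"^\s+([A-Z][\w:-]*)\s+(-?[\d.,]+)\s+([A-Z]+)")
COLUMNS = ["date", "payee", "narration", "account", "amount", "currency"]


def ledger_to_rows(text: str) -> List[Dict[str, str]]:
    """Flatten transactions into one row per posting, repeating the header fields."""
    rows: List[Dict[str, str]] = []
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            first, second = header.group(2), header.group(3)
            # With a single string, Beancount treats it as the narration.
            payee, narration = (first, second) if second is not None else ("", first)
            current = {"date": header.group(1), "payee": payee, "narration": narration}
            continue
        posting = POSTING_RE.match(line)
        if current is not None and posting:
            rows.append({**current, "account": posting.group(1),
                         "amount": posting.group(2), "currency": posting.group(3)})
        elif line.strip() and not line[0].isspace():
            current = None  # any other top-level directive ends the transaction
    return rows


def rows_to_markdown(rows: List[Dict[str, str]]) -> str:
    lines = ["| " + " | ".join(COLUMNS) + " |", "|" + "---|" * len(COLUMNS)]
    lines += ["| " + " | ".join(row[c] for c in COLUMNS) + " |" for row in rows]
    return "\n".join(lines)


LEDGER = """
2024-03-02 * "Whole Foods" "Weekly groceries"
  Expenses:Food:Groceries   54.20 USD
  Assets:Checking          -54.20 USD
"""
print(rows_to_markdown(ledger_to_rows(LEDGER)))
```

Repeating the date, payee, and narration on every row makes each posting self-describing, at the cost of more tokens than the native syntax.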
Constitutional AI for Accounting Agents: RLAIF, Policy Rules, and Goodharting Risks
·mike

Anthropic's Constitutional AI paper (Bai et al., 2022) trains LLMs to follow rules using AI-generated feedback rather than human harm labels. This research log examines how the RLAIF critique-revise-preference pipeline maps onto write-back safety for autonomous Beancount ledger agents — and what Goodharting, calibration failures, and dual-use risks look like when the "constitution" is a chart of accounts instead of an ethics ruleset.

ai
machine-learning
llm
automation
Chain-of-Thought Prompting: Precision-Recall Trade-offs for Finance AI
·mike

A close reading of Wei et al.'s 2022 Chain-of-Thought paper and what it means for finance AI — why CoT raises precision but may cut recall on rare-event detection, why the scale threshold matters for production agents, and what a finance team building on LLMs should watch out for.

ai
llm
machine-learning
data-science
FinMaster Benchmark: Why LLMs Score 96% on Financial Literacy but 3% on Statement Generation
·mike

FinMaster (arXiv:2505.13533) benchmarks o3-mini, Claude 3.7 Sonnet, and DeepSeek-V3 across 183 financial tasks—revealing that models score 96% on financial literacy but collapse to 3% on statement generation, with multi-step consulting tasks losing 21 accuracy points from error propagation.

llm
accounting
ai
financial-statements
ReAct: Synergizing Reasoning and Acting in Language Models
·mike

ReAct (Yao et al., ICLR 2023) interleaves chain-of-thought reasoning with tool actions in a single trajectory. It outperforms pure CoT on fact verification and beats imitation-learning baselines on embodied tasks by 34 percentage points. This analysis covers the paper's failure modes (search-induced distraction and compounding errors) and what they mean for autonomous agents writing back to Beancount ledgers.

ai
llm
machine-learning
automation
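The ReAct trajectory format (Thought, Action, Observation, repeated until a finish action) can be sketched as a plain loop. `scripted_policy` is a hypothetical stand-in for the LLM, and `lookup_balance` is a hypothetical read-only tool over made-up balances; the sketch shows only the loop structure, not a production agent.

```python
from typing import Callable, Dict, List, Optional, Tuple

Policy = Callable[[List[str]], Tuple[str, str, str]]  # trajectory -> (thought, action, argument)


def react(policy: Policy, tools: Dict[str, Callable[[str], str]], question: str,
          max_steps: int = 5) -> Tuple[Optional[str], List[str]]:
    """Interleave Thought -> Action -> Observation until the policy chooses 'finish'."""
    trajectory = [f"Question: {question}"]
    for _ in range(max_steps):
        # The policy conditions on the whole trajectory, so an early bad observation
        # stays in context for every later step.
        thought, action, arg = policy(trajectory)
        trajectory.append(f"Thought: {thought}")
        if action == "finish":
            trajectory.append(f"Answer: {arg}")
            return arg, trajectory
        observation = tools[action](arg)
        trajectory += [f"Action: {action}[{arg}]", f"Observation: {observation}"]
    return None, trajectory  # step budget exhausted without an answer


# Hypothetical read-only tool over made-up ledger state.
BALANCES = {"Assets:Checking": "1,204.50 USD"}


def lookup_balance(account: str) -> str:
    return BALANCES.get(account, "unknown account")


def scripted_policy(trajectory: List[str]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: look the balance up once, then answer from the observation."""
    prefix = "Observation: "
    observations = [step for step in trajectory if step.startswith(prefix)]
    if not observations:
        return "I need the current checking balance.", "lookup_balance", "Assets:Checking"
    return "The observation answers the question.", "finish", observations[-1][len(prefix):]


answer, trajectory = react(scripted_policy, {"lookup_balance": lookup_balance},
                           "What is my checking balance?")
print("\n".join(trajectory))
```

Because every observation is appended to the context the policy reads next, an irrelevant or wrong observation is one plausible route to the distraction and compounding errors the summary mentions.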
Toolformer: Self-Supervised Tool Use and Its Limits for Finance AI
·tian

A close reading of Toolformer (Meta AI, NeurIPS 2023): how perplexity-filtered self-supervised training teaches a 6.7B-parameter model to call external APIs, where it outperforms GPT-3 175B on arithmetic benchmarks, and why its single-step architecture cannot support the chained tool calls required for structured ledger operations.

ai
llm
machine-learning
automation
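Toolformer's filtering step, as described in the paper, keeps a sampled API call only when inserting the call together with its result lowers the model's weighted loss on the following tokens by at least a threshold τ, compared with the better of inserting nothing or inserting the call without its result (the paper's L⁻ and L⁺). The sketch below implements just that comparison. The loss values are made up, and computing real ones requires the language model itself, which is omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateCall:
    text: str                     # the API call as inserted into the text
    loss_no_call: float           # loss on the following tokens with no call inserted
    loss_call_no_result: float    # loss with the call inserted but its result left empty
    loss_call_with_result: float  # loss with the call and its result inserted


def keep(call: CandidateCall, tau: float) -> bool:
    """Keep the call only if its result cuts loss by at least tau versus the best alternative."""
    loss_minus = min(call.loss_no_call, call.loss_call_no_result)
    loss_plus = call.loss_call_with_result
    return loss_minus - loss_plus >= tau


def filter_calls(candidates: List[CandidateCall], tau: float) -> List[CandidateCall]:
    # Each candidate is scored independently of every other call.
    return [c for c in candidates if keep(c, tau)]


# Made-up loss values for illustration; real ones come from the language model.
candidates = [
    CandidateCall("[Calculator(1204.50 - 54.20)]", 2.9, 2.8, 1.6),
    CandidateCall("[Calculator(2 + 2)]", 1.1, 1.2, 1.0),
]
kept = filter_calls(candidates, tau=1.0)
print([c.text for c in kept])  # ['[Calculator(1204.50 - 54.20)]']
```

Since each candidate is scored on its own, nothing in this filter rewards a call whose input depends on another call's output, which is consistent with the single-step limitation described above.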