Beancount.io

35 posts tagged with "Finance"

Financial research, analysis, and domain knowledge for accounting AI

Atlas: Joint Retriever-Reader Pre-Training Beats 540B-Parameter LLMs with 11B Parameters
·mike

Atlas (JMLR 2023) achieves 42.4% accuracy on Natural Questions with only 64 training examples—beating PaLM 540B by 3 points using 11B parameters—by jointly pre-training a Contriever-based dense retriever with a T5 Fusion-in-Decoder reader. Analysis covers retrieval accuracy limits, 587GB index infrastructure costs, and implications for Beancount ledger QA systems.

ai
machine-learning
llm
data-science
LLMs Are Not Useful for Time Series Forecasting: What NeurIPS 2024 Means for Finance AI
·mike

A NeurIPS 2024 Spotlight paper ablates three LLM-based time series forecasting methods — OneFitsAll, Time-LLM, and CALF — and finds that removing the language model improves accuracy in most cases, with up to a 1,383× training speedup. For finance AI applications like Beancount balance prediction, lightweight purpose-built models consistently beat repurposed LLMs.

ai
machine-learning
forecasting
data-science
TAT-LLM: Fine-Tuned LLaMA 2 for Discrete Reasoning over Financial Tables and Text
·mike

TAT-LLM fine-tunes LLaMA 2 7B with LoRA on financial table-text QA benchmarks and reaches 64.60% EM on FinQA, beating GPT-4's 63.91%, by decomposing reasoning into deterministic Extract-Reason-Execute steps that eliminate arithmetic errors.

llm
ai
machine-learning
finance
IRCoT: Interleaving Retrieval with Chain-of-Thought for Multi-Step QA
·mike

IRCoT interleaves BM25 retrieval with each step of a chain-of-thought reasoning loop, achieving +11.3 retrieval recall and +7.1 F1 on HotpotQA over one-step RAG — and shows a 3B model can beat GPT-3 175B when retrieval strategy is right.

ai
llm
machine-learning
automation
FLARE: Active Retrieval Augmented Generation
·mike

FLARE (EMNLP 2023) improves on standard RAG by triggering retrieval mid-generation using token-probability confidence thresholds, reaching 51.0 EM on 2WikiMultihopQA versus 39.4 for single-retrieval — but calibration failures in instruction-tuned chat models limit its reliability for production finance agents.

ai
machine-learning
llm
retrieval-augmented-generation
MultiHiertt: Benchmarking Numerical Reasoning Over Multi-Hierarchical Financial Tables
·mike

MultiHiertt (ACL 2022) introduces 10,440 QA pairs from real financial reports averaging 3.89 hierarchical tables each; state-of-the-art models score 38% F1 versus 87% for humans, with a 15-point penalty for cross-table questions — quantifying the retrieval gap finance AI must close.

ai
machine-learning
llm
financial-reporting
ConvFinQA: Multi-Turn Financial QA and the 21-Point Gap Between Models and Human Experts
·mike

ConvFinQA (EMNLP 2022) extends FinQA into multi-turn conversation over S&P 500 earnings reports, finding that the best fine-tuned model achieves 68.9% execution accuracy versus 89.4% for human experts—and drops to 52.4% on hybrid multi-aspect conversations where models must carry numerical context across different financial topics.

ai
llm
machine-learning
finance
TAT-QA: Hybrid Table-Text QA Benchmark for Financial Annual Report Reasoning
·mike

TAT-QA is a 16,552-question benchmark over hybrid table-plus-text financial report contexts that showed evidence grounding — not arithmetic — is the core bottleneck in finance AI; by 2024, fine-tuned 7B LLMs reached 83% F1, closing most of the gap against a 91% human ceiling.

ai
machine-learning
llm
finance
FinQA: The Benchmark Measuring AI Numerical Reasoning on Financial Reports
·mike

FinQA (EMNLP 2021) built 8,281 QA pairs from S&P 500 earnings reports requiring multi-step arithmetic programs. Neural models scored 61% at release versus 91% for human experts; accuracy collapses to 22% on three-or-more-step programs. The failure modes — domain constants, cross-modality grounding, chain length — map directly to the challenges Beancount agents face today.

ai
machine-learning
llm
finance
DSPy: Replacing Brittle Prompt Engineering with Compiled LLM Pipelines
·mike

DSPy replaces hand-crafted prompt strings with declarative signatures and a metric-driven compiler—boosting Llama2-13b from 9.4% to 46.9% on GSM8K math reasoning and offering a more maintainable path for production finance AI pipelines.

ai
llm
machine-learning
automation
Self-RAG: Adaptive Retrieval and Self-Critique for LLMs
·mike

Self-RAG (ICLR 2024 Oral) trains a language model to decide when to retrieve and then grade its own results using four reflection tokens — reaching 55.8% on PopQA and 80.2 FactScore on biographies while outperforming ChatGPT on five benchmarks. Analysis covers the mechanism, ablation results, reproducibility limits, and implications for finance AI agents over Beancount ledgers.

ai
machine-learning
llm
technology
HippoRAG: Neurobiologically Inspired Long-Term Memory for LLMs
·mike

HippoRAG (NeurIPS 2024) builds a knowledge graph from OpenIE triples and applies Personalized PageRank at query time, reaching 89.1% Recall@5 on 2WikiMultiHopQA versus 68.2% for ColBERTv2—with direct implications for querying complex financial ledgers across multi-year transaction histories.

llm
ai
machine-learning
beancount
Showing 13–24 of 35 posts