Beancount.io

4 posts tagged with "Security"

Safety, security, and guardrail research for AI agents in financial contexts

Verifiably Safe Tool Use for LLM Agents: STPA Meets MCP
By mike

CMU and NC State researchers propose using System-Theoretic Process Analysis (STPA) and a capability-enhanced Model Context Protocol (MCP) to derive formal safety specifications for LLM agent tool use, with Alloy-based verification demonstrating the absence of unsafe flows in a calendar-scheduling case study.

Tags: ai, llm, security, automation, and 3 more

AGrail: Adaptive Safety Guardrails for LLM Agents That Learn Across Tasks
By mike

AGrail (ACL 2025) introduces a cooperative guardrail built from two LLMs that uses test-time adaptation to refine its safety checks at inference time. On Safe-OS it achieves a 0% prompt-injection attack success rate while preserving 95.6% of benign actions; by comparison, GuardAgent and LLaMA-Guard block up to 49.2% of legitimate actions.

Tags: ai, llm, security, automation, and 3 more

ShieldAgent: Verifiable Safety Policy Reasoning for LLM Agents
By mike

ShieldAgent (ICML 2025) replaces LLM-based guardrails with probabilistic rule circuits built on Markov Logic Networks, achieving 90.4% accuracy on agent attacks while making 64.7% fewer API calls. This post also considers what that means for verifiable safety in financial AI systems.

Tags: ai, llm, machine-learning, security, and 4 more

GuardAgent: Deterministic Safety Enforcement for LLM Agents via Code Execution
By mike

GuardAgent (ICML 2025) places a separate LLM agent between a target agent and its environment and verifies every proposed action by generating and running Python code. It achieves 98.7% policy-enforcement accuracy while preserving 100% task completion, versus 81% accuracy and 29–71% task failure when the safety rules are embedded in the prompt.

Tags: ai, llm, automation, security, and 3 more