Agent Guardrails & Safe AI Agents — Agent Action Guard
Agent Action Guard is a lightweight guard purpose-built to evaluate the safety of actions performed by AI agents in real-world environments.
Why Agent Action Guard?
Beyond Content Moderation
Unlike standard LLM guards that focus on text strings, Agent Action Guard analyzes the semantic intent of tool calls and API executions to prevent physical or digital harm.
Zero-Latency Screening
Lightweight architecture ensures that safety checks don't bottleneck agent performance, even in high-throughput enterprise environments.
The Guard Protocol
"A specialized safety layer for the era of autonomous tool-use."
📊 HarmActionsBench Results
⚡ Even the latest popular LLMs generate harmful actions, demonstrating the need for an action guard and for the HarmActionsBench benchmark.
| Model | SafeActions@1 score |
|---|---|
| Phi 4 Mini Instruct | 0.00% |
| Granite 4-H-Tiny | 0.00% |
| Claude Haiku 4.5* | 0.00% |
| Gemini 3.1 Flash Lite* | 1.33% |
| GPT-5.4 Mini* | 1.33% |
| Ministral 3 (3B) | 2.67% |
| Claude Sonnet 4.6* | 4.00% |
| Phi 4 Mini Reasoning | 5.33% |
| GPT-5.3* | 17.33% |
| Qwen3.5-397b-a17b | 18.67% |
| Average | 5.07% |
*Latest popular proprietary models.
📌 Note: Higher SafeActions@k score is better.
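The table's average can be reproduced directly from the per-model scores, as a quick sanity check:

```python
# Recompute the average SafeActions@1 from the per-model scores in the table.
scores = [0.00, 0.00, 0.00, 1.33, 1.33, 2.67, 4.00, 5.33, 17.33, 18.67]
avg = sum(scores) / len(scores)
print(f"{avg:.2f}%")  # 5.07%
```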
Real-time Screening
Agent Action Guard intercepts tool outputs and planned actions before they reach the operating system or a production API, providing a proactive safety barrier.
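The interception pattern described above can be sketched as a wrapper that screens each planned action before the executor runs it. This is a minimal illustration, not the package's real API: `is_action_harmful` is stubbed here with a trivial tool-name check so the example runs standalone, and `guarded_execute` is a hypothetical helper.

```python
# Stub standing in for the guard's classifier; the real guard analyzes
# semantic intent, this toy version only checks a tool-name denylist.
def is_action_harmful(action):
    blocked = {"file_delete", "shell_exec"}
    harmful = action.get("tool") in blocked
    return harmful, 0.99 if harmful else 0.01

def guarded_execute(action, execute):
    """Run `execute(action)` only if the guard clears the action."""
    harmful, confidence = is_action_harmful(action)
    if harmful:
        # Block before the action reaches the OS or a production API.
        raise PermissionError(
            f"Blocked {action['tool']!r} (confidence {confidence:.2f})")
    return execute(action)

result = guarded_execute(
    {"tool": "web_search", "params": {"query": "weather"}},
    execute=lambda a: f"executed {a['tool']}",
)
print(result)  # executed web_search
```

The key design point is that the guard sits between planning and execution: a harmful action raises before any side effect occurs, rather than being flagged after the fact.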
MCP Support
Native integration with Model Context Protocol ensures seamless deployment across diverse AI agent frameworks.
Lightweight
Optimized quantized weights allow for local deployment on edge devices without sacrificing safety accuracy.
Multi-Agent Guarding
Synchronize safety protocols across an entire swarm of autonomous agents with unified Guard policies.
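One way to realize a unified policy across a swarm is to share a single policy object among all agents, so an update propagates everywhere at once. The `GuardPolicy` and `Agent` classes below are illustrative assumptions, not part of the real package:

```python
# Hypothetical unified guard policy shared by several agents.
class GuardPolicy:
    def __init__(self, blocked_tools):
        self.blocked_tools = set(blocked_tools)

    def allows(self, action):
        return action["tool"] not in self.blocked_tools

class Agent:
    def __init__(self, name, policy):
        self.name = name
        self.policy = policy  # shared reference, not a copy

    def act(self, action):
        if not self.policy.allows(action):
            return f"{self.name}: blocked {action['tool']}"
        return f"{self.name}: ran {action['tool']}"

# Every agent consults the same policy instance, so tightening it
# in one place tightens it for the whole swarm.
policy = GuardPolicy(blocked_tools=["file_delete"])
agents = [Agent("planner", policy), Agent("executor", policy)]
for agent in agents:
    print(agent.act({"tool": "file_delete", "params": {}}))
```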
```shell
# Install Agent Action Guard
pip install agent-action-guard
```

```python
# Initialize the safety protocol
from agent_action_guard import is_action_harmful

# Screen an agent tool call
action = {"tool": "email", "params": {"content": "You have no purpose to live"}}
# action = {"tool": "file_delete", "params": {"target": "/important/data.txt"}}

is_harmful, confidence = is_action_harmful(action)
if is_harmful:
    print(f"ALARM: harmful action detected (confidence: {confidence:.2f}%)")
    # block execution here
```
Secure Your AI Future
Open-source, lightweight, and purpose-built for the next generation of autonomous action.