Back to News

VigilGuard Enterprise v1.6.3: Agent Observability and Scope Control

Built for a World Where Agents Are in Production

LLM agents are no longer a demo. A growing number of organizations run them in production for code review, data analysis, customer support, and internal HR automation. The nature of the risk changes with them. Security used to mean analyzing one user message. In an agentic world, the model's decision is shaped by conversation history, the system prompt, tool call outputs, and documents retrieved through RAG. Every one of these channels can become an attack vector or a source of unexpected behavior.

Version 1.6.3 is a direct response to this shift. Vigil Guard Enterprise moves the center of gravity from a single prompt to the full operating context of the agent, with four complementary layers of observability and detection.

Agent Context Logging: Full Visibility for Investigation

Agent Context Logging captures the complete context behind every detection decision. When enabled, each event contains the user prompt, the system prompt, conversation history, tool responses, and information about which documents were retrieved through RAG. A dedicated 'Agent Context' tab in the Web UI presents the data to analysts in a clear structured view.

In 1.6.3, the classifier itself remains focused on prompt analysis. The remaining context elements are recorded for audit and investigation purposes, with no impact on the current classification decision. This release lays the foundation. Subsequent versions will progressively extend active analysis to additional components, starting with tool call outputs.

Scope Drift Detection: Keep the Agent on Mission

An agent configured as a code review assistant receives a question about a cooking recipe. Formally it is not an attack, so a classic prompt injection detector lets it through. Yet the agent has just stepped outside its mission. At scale, the consequences are tangible: excess token spend, degraded answer quality in the target domain, and in some cases the risk of exposing information the agent should never reach.

Scope Drift Detection introduces a new, independent detection layer that evaluates every request against the agent's defined mission. It recognizes three levels of alignment: within mission, near the boundary, and clearly outside. For each level, administrators define an action: allow or block. The mission definition is configured per rule set and encrypted at rest, which makes it safe to describe even sensitive business scenarios.

Internal regression test results: 96.7% accuracy for Polish, 83.3% for English.

llm-guard Dual-Pass: No Hidden Tail in Long Prompts

The injection classifier analyzes prompts in a limited window. Agentic prompts routinely exceed that limit because the user question arrives wrapped with system prompts, tool responses, and retrieved documents. Until now, the excess was truncated, leaving attacks embedded at the end of the context undetected.

Dual-Pass analyzes long prompts in two passes, covering both the beginning and the end. The higher of the two injection probability scores drives the decision. In practice, no part of a long context stays outside the classifier's reach, regardless of where an attack is planted. The feature works transparently after the upgrade, with no integration changes required.

vge-promptguard-v1g: A Classifier That Understands Code and Tool Outputs

Most prompt injection classifiers were trained on conversational prompts: natural language, social engineering, jailbreaks. Agentic pipelines are dominated by a different kind of input. Code snippets, tool responses in JSON, structured data, logs, technical documents. A generic classifier often cannot tell an attack from a legitimate operational context.

vge-promptguard-v1g is a dedicated injection classifier designed for agentic environments. It was trained on classic prompts, code snippets, tool responses, and the complex data structures typical of agent workloads. The result is higher detection on attacks embedded in code blocks, tool outputs, and JSON or XML structures, with fewer false positives on benign code and structured data.

The model is built into the llm-guard image and becomes the new default injection classifier after upgrade. No additional configuration is required.