Implement and operate an AI firewall within the guardrail components that inspects prompts, content, and responses for injection, sensitive data, and policy violations.
AI/ML / Multi Agent Refarch / Controls / DEV
AI Firewall Implementation and Management
CCC.MARefArc.CN10 · PREV
Related Capabilities
| ID | Title | Description |
|---|---|---|
| CCC.MARefArc.CP16 | Model-interaction zero-trust guardrails | Enforces authentication and authorization for every inference request and applies input validation against prompt injection, output filtering and redaction, access control, rate limits, and cost management before and after model execution. |
| CCC.MARefArc.CP06 | Agent collaboration and orchestration patterns | Supports supervisor/worker decomposition, skills-based routing, and agent-as-a-tool handoff for decomposing and executing complex tasks across multiple agents. |
| CCC.MARefArc.CP15 | LLM inference gateway routing | Validates inference requests and routes each to the correct model instance, abstracting model hosting behind a consistent interface. |
| CCC.MARefArc.CP14 | Approved-model registry and lifecycle | Catalog of approved models with metadata, version information, configuration parameters, and usage constraints, ensuring agents access only models meeting organizational, regulatory, and security standards. |
| CCC.MARefArc.CP03 | Agent registry and lifecycle management | Catalog of available agents with their capabilities, metadata, and configuration, supporting versioning, lifecycle management, and controlled onboarding of new agents. |
| CCC.MARefArc.CP22 | Runtime protection | Monitors agent actions and model outputs during execution to detect unsafe, non-compliant, or anomalous behavior, enforcing constraints, blocking disallowed actions, or triggering escalation. |
| CCC.MARefArc.CP02 | Human-in-the-loop output review | Application-embedded controls that allow users to review, approve, or modify agent outputs before they are executed or shared. |
| CCC.MARefArc.CP05 | Agent-ingress zero-trust guardrails | Treats all inputs as untrusted and enforces authentication, authorization, input validation, content filtering, access control, rate limits, and dynamic policy before any request reaches an agent. |
| CCC.MARefArc.CP01 | User-facing application surface | Presentation and orchestration surface (web, mobile, chatbot, workflow tool, or integrated enterprise system) that captures user intent, forwards requests to the agent layer, and returns agent outputs. |
| CCC.MARefArc.CP12 | Authoritative knowledge source bases | Internal and external repositories of structured data, unstructured documents, and graph-based representations that provide authoritative information for grounding. |
| CCC.MARefArc.CP13 | Vector-based semantic retrieval | Vector databases providing semantic search and grounding so agents can find relevant information from large text corpora. |
| CCC.MARefArc.CP08 | Built-in trusted tools | A collection of bundled, trusted tools providing fundamental capabilities: the MCP client bridge to the external MCP layer, a sandboxed shell, workspace I/O, and web search. |
| CCC.MARefArc.CP09 | Agent memory | Short-term in-session context management (trimming and summarization to control length, cost, and latency) and durable long-term memory across sessions, including session summaries and user/task personalization. |
Related Threats
| ID | Title | Description |
|---|---|---|
| CCC.MARefArc.TH08 | Denial of Wallet via token-expensive or unthrottled agentic calls | Token-expensive prompts, large-document chunking, or poorly throttled agentic loops drive excessive model and tool invocations, exhausting token budgets, triggering throttling, or inflating cost beyond capacity planning. |
| CCC.MARefArc.TH09 | Technology service provider outage or degradation | Tight coupling to a specific external model provider with limited failover leaves the system exposed to provider outages or performance degradation under load, violating business-continuity expectations. |
| CCC.MARefArc.TH10 | VRAM exhaustion on model-serving infrastructure | Configuration changes, aggressive caching, or memory leaks in model-serving libraries behind the LLM gateway exhaust GPU VRAM, degrading responsiveness or crashing model serving. |
| CCC.MARefArc.TH14 | Model overreach and scope creep beyond validated use | Agents are used beyond their validated scope as users discover new applications or systems are repurposed without re-evaluation, producing unreliable outputs in untested contexts; weak registry scoping and orchestration boundaries accelerate the drift. |
| CCC.MARefArc.TH15 | Reputational harm from offensive or misleading outputs | The system generates offensive, misleading, or inappropriate outputs, or is manipulated into doing so, that are attributed to the organization, with reputational and regulatory impact when output filtering and human review are insufficient. |
| CCC.MARefArc.TH11 | Direct prompt injection overrides guardrails | An actor interacting through the application crafts inputs that override system prompts, bypass safety guardrails, or coerce disclosure, requiring no special privileges and exploiting any gap in ingress and model-interaction guardrails. |
| CCC.MARefArc.TH12 | Indirect prompt injection via retrieved or processed content | Malicious instructions hidden in retrieved documents, web-search results, tool outputs, or persisted memory are processed by an agent and hijack its decision-making, escalate privileges, trigger unauthorized actions, or exfiltrate data, which is especially dangerous in automated multi-agent workflows. |
| CCC.MARefArc.TH13 | Model profiling and system-prompt extraction | Crafted prompt sequences probe model internals to extract proprietary system prompts, configurations, or fine-tuning and RAG corpus content, enabling intellectual-property theft, model cloning, or follow-on attacks. |
Assessment Requirements
| ID | Text | Applicability |
|---|---|---|
| CCC.MARefArc.CN10.AR01 | The gateway guardrails MUST include an AI firewall that screens inputs for prompt injection and policy violations and screens outputs for sensitive-data disclosure and harmful content. | tlp-clear, tlp-green, tlp-amber, tlp-red |
| CCC.MARefArc.CN10.AR02 | AI firewall rules MUST be centrally managed and versioned. | tlp-clear, tlp-green, tlp-amber, tlp-red |
Guideline Mappings
| Framework | ID | Remarks |
|---|---|---|
| finos-air | AIR-PREV-017 |