AI Firewall Implementation and Management

CCC.MARefArc.CN10 · PREV

Implement and operate an AI firewall within the guardrail components. The firewall inspects prompts, retrieved or processed content, and responses for prompt injection, sensitive data, and policy violations.
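
As a rough illustration of the inspection described above, the following Python sketch screens an inbound prompt for injection phrases and redacts sensitive data from an outbound response. Everything in it is an assumption made for illustration: the names screen_input, screen_output, and Verdict, the two injection phrases, and the two sensitive-data patterns are hypothetical and are not defined by this control. A production firewall would likely rely on maintained detectors or classifiers rather than a few regular expressions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical example patterns, chosen only to make the sketch concrete.
INJECTION_PATTERNS = {
    "override_instructions": re.compile(
        r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE
    ),
    "prompt_extraction": re.compile(
        r"reveal (your |the )?system prompt", re.IGNORECASE
    ),
}
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),  # assumed key format
}


@dataclass
class Verdict:
    allowed: bool   # False means the firewall blocks the message
    text: str       # message text, redacted where applicable
    findings: list = field(default_factory=list)


def screen_input(prompt: str) -> Verdict:
    """Block the prompt if any injection pattern matches."""
    findings = [
        f"injection:{name}"
        for name, pattern in INJECTION_PATTERNS.items()
        if pattern.search(prompt)
    ]
    return Verdict(allowed=not findings, text=prompt, findings=findings)


def screen_output(response: str) -> Verdict:
    """Redact sensitive data from the response and record what was found."""
    findings = []
    redacted = response
    for name, pattern in SENSITIVE_PATTERNS.items():
        redacted, count = pattern.subn(f"[REDACTED:{name}]", redacted)
        if count:
            findings.append(f"sensitive:{name}")
    return Verdict(allowed=True, text=redacted, findings=findings)
```

In this sketch the inbound path blocks on a match while the outbound path redacts and lets the response through; whether to block, redact, or escalate is a policy decision that the control leaves to the implementer.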

Related Capabilities

ID	Title	Description
CCC.MARefArc.CP16	Model-interaction zero-trust guardrails	Enforces authentication and authorization for every inference request and applies input validation against prompt injection, output filtering and redaction, access control, rate limits, and cost management before and after model execution.
CCC.MARefArc.CP06	Agent collaboration and orchestration patterns	Supports supervisor/worker decomposition, skills-based routing, and agent-as-a-tool handoff for decomposing and executing complex tasks across multiple agents.
CCC.MARefArc.CP15	LLM inference gateway routing	Validates inference requests and routes each to the correct model instance, abstracting model hosting behind a consistent interface.
CCC.MARefArc.CP14	Approved-model registry and lifecycle	Catalog of approved models with metadata, version information, configuration parameters, and usage constraints, ensuring agents access only models meeting organizational, regulatory, and security standards.
CCC.MARefArc.CP03	Agent registry and lifecycle management	Catalog of available agents with their capabilities, metadata, and configuration, supporting versioning, lifecycle management, and controlled onboarding of new agents.
CCC.MARefArc.CP22	Runtime protection	Monitors agent actions and model outputs during execution to detect unsafe, non-compliant, or anomalous behavior, enforcing constraints, blocking disallowed actions, or triggering escalation.
CCC.MARefArc.CP02	Human-in-the-loop output review	Application-embedded controls that allow users to review, approve, or modify agent outputs before they are executed or shared.
CCC.MARefArc.CP05	Agent-ingress zero-trust guardrails	Treats all inputs as untrusted and enforces authentication, authorization, input validation, content filtering, access control, rate limits, and dynamic policy before any request reaches an agent.
CCC.MARefArc.CP01	User-facing application surface	Presentation and orchestration surface (web, mobile, chatbot, workflow tool, or integrated enterprise system) that captures user intent, forwards requests to the agent layer, and returns agent outputs.
CCC.MARefArc.CP12	Authoritative knowledge source bases	Internal and external repositories of structured data, unstructured documents, and graph-based representations that provide authoritative information for grounding.
CCC.MARefArc.CP13	Vector-based semantic retrieval	Vector databases providing semantic search and grounding so agents can find relevant information from large text corpora.
CCC.MARefArc.CP08	Built-in trusted tools	A collection of bundled, trusted tools providing fundamental capabilities: the MCP client bridge to the external MCP layer, a sandboxed shell, workspace I/O, and web search.
CCC.MARefArc.CP09	Agent memory	Short-term in-session context management (trimming and summarization to control length, cost, and latency) and durable long-term memory across sessions, including session summaries and user/task personalization.

Related Threats

ID	Title	Description
CCC.MARefArc.TH08	Denial of Wallet via token-expensive or unthrottled agentic calls	Token-expensive prompts, large-document chunking, or poorly throttled agentic loops drive excessive model and tool invocations, exhausting token budgets, triggering throttling, or inflating cost beyond capacity planning.
CCC.MARefArc.TH09	Technology service provider outage or degradation	Tight coupling to a specific external model provider with limited failover leaves the system exposed to provider outages or performance degradation under load, violating business-continuity expectations.
CCC.MARefArc.TH10	VRAM exhaustion on model-serving infrastructure	Configuration changes, aggressive caching, or memory leaks in model-serving libraries behind the LLM gateway exhaust GPU VRAM, degrading responsiveness or crashing model serving.
CCC.MARefArc.TH14	Model overreach and scope creep beyond validated use	Agents are used beyond their validated scope as users discover new applications or systems are repurposed without re-evaluation, producing unreliable outputs in untested contexts; weak registry scoping and orchestration boundaries accelerate the drift.
CCC.MARefArc.TH15	Reputational harm from offensive or misleading outputs	The system generates, or is manipulated into generating, offensive, misleading, or inappropriate outputs that are attributed to the organization, causing reputational and regulatory impact when output filtering and human review are insufficient.
CCC.MARefArc.TH11	Direct prompt injection overrides guardrails	An actor interacting through the application crafts inputs that override system prompts, bypass safety guardrails, or coerce disclosure, requiring no special privileges and exploiting any gap in ingress and model-interaction guardrails.
CCC.MARefArc.TH12	Indirect prompt injection via retrieved or processed content	Malicious instructions hidden in retrieved documents, web-search results, tool outputs, or persisted memory are processed by an agent and hijack its decision-making, escalate privileges, trigger unauthorized actions, or exfiltrate data, which is especially dangerous in automated multi-agent workflows.
CCC.MARefArc.TH13	Model profiling and system-prompt extraction	Crafted prompt sequences probe model internals to extract proprietary system prompts, configurations, or fine-tuning and RAG corpus content, enabling intellectual-property theft, model cloning, or follow-on attacks.

Assessment Requirements

ID	Text	Applicability
CCC.MARefArc.CN10.AR01	The gateway guardrails MUST include an AI firewall that screens inputs for prompt injection and policy violations and screens outputs for sensitive-data disclosure and harmful content.	tlp-clear, tlp-green, tlp-amber, tlp-red
CCC.MARefArc.CN10.AR02	AI firewall rules MUST be centrally managed and versioned.	tlp-clear, tlp-green, tlp-amber, tlp-red
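
One way to read AR02 is that every firewall instance loads its rules from a single store that keeps each published rule set as an immutable, numbered version. The Python sketch below shows that idea with an in-memory stand-in. The names RuleSet and RuleRegistry, the (rule_id, direction, pattern) rule shape, and the SHA-256 digest are all assumptions for illustration; the control does not prescribe a storage backend or a rule format.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSet:
    version: int   # monotonically increasing, starting at 1
    rules: tuple   # sorted tuple of (rule_id, direction, pattern) triples
    digest: str    # SHA-256 over the canonical JSON form of the rules


class RuleRegistry:
    """Hypothetical in-memory stand-in for a central, versioned rule store."""

    def __init__(self):
        self._versions = []

    def publish(self, rules):
        """Store the rules as a new immutable version and return it."""
        normalized = tuple(sorted(tuple(rule) for rule in rules))
        canonical = json.dumps(normalized, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        ruleset = RuleSet(len(self._versions) + 1, normalized, digest)
        self._versions.append(ruleset)
        return ruleset

    def latest(self):
        """Return the most recently published version."""
        if not self._versions:
            raise LookupError("no rule set has been published")
        return self._versions[-1]

    def get(self, version):
        """Return a specific version, for example to audit or roll back."""
        if not 1 <= version <= len(self._versions):
            raise LookupError(f"unknown rule set version: {version}")
        return self._versions[version - 1]


registry = RuleRegistry()
v1 = registry.publish([("inj-001", "input", "ignore previous instructions")])
v2 = registry.publish([
    ("inj-001", "input", "ignore previous instructions"),
    ("dlp-001", "output", "email"),
])
```

Because earlier versions are never modified, an assessor can compare the version and digest reported by a running firewall against the registry, and an operator can roll back by redeploying an earlier version number.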

Guideline Mappings

Framework	ID	Remarks
finos-air	AIR-PREV-017