AI Firewall Implementation and Management

CCC.MARefArc.CN10 · PREV

Implement and operate an AI firewall within the guardrail components that inspects prompts, content, and responses for injection, sensitive data, and policy violations.
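
The inspection described above can be illustrated with a minimal Python sketch. It is not a definitive implementation: the pattern lists, the `Verdict` type, and the `screen_input` / `screen_output` names are all hypothetical, and a handful of regular expressions stands in for whatever detection a real AI firewall would use. It assumes inputs with injection findings are blocked and outputs with sensitive data are redacted.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately tiny pattern sets, for illustration only.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


@dataclass
class Verdict:
    allowed: bool                      # False means the text must not be forwarded
    text: str                          # original or redacted text
    findings: list = field(default_factory=list)


def screen_input(prompt: str) -> Verdict:
    """Block a prompt if any injection pattern matches."""
    findings = [p.pattern for p in INJECTION_PATTERNS if p.search(prompt)]
    return Verdict(allowed=not findings, text=prompt, findings=findings)


def screen_output(response: str) -> Verdict:
    """Redact sensitive data from a response and record what was found."""
    findings = []
    redacted = response
    for label, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(redacted):
            findings.append(label)
            redacted = pattern.sub(f"[REDACTED:{label}]", redacted)
    return Verdict(allowed=True, text=redacted, findings=findings)
```

In this sketch a caller would run `screen_input` before the request reaches a model or agent and `screen_output` before the response leaves the guardrail layer. Whether a finding blocks, redacts, or only logs is a policy decision the control itself does not prescribe.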

Related Capabilities

| ID | Title | Description |
| --- | --- | --- |
| CCC.MARefArc.CP16 | Model-interaction zero-trust guardrails | Enforces authentication and authorization for every inference request and applies input validation against prompt injection, output filtering and redaction, access control, rate limits, and cost management before and after model execution. |
| CCC.MARefArc.CP06 | Agent collaboration and orchestration patterns | Supports supervisor/worker decomposition, skills-based routing, and agent-as-a-tool handoff for decomposing and executing complex tasks across multiple agents. |
| CCC.MARefArc.CP15 | LLM inference gateway routing | Validates inference requests and routes each to the correct model instance, abstracting model hosting behind a consistent interface. |
| CCC.MARefArc.CP14 | Approved-model registry and lifecycle | Catalog of approved models with metadata, version information, configuration parameters, and usage constraints, ensuring agents access only models meeting organizational, regulatory, and security standards. |
| CCC.MARefArc.CP03 | Agent registry and lifecycle management | Catalog of available agents with their capabilities, metadata, and configuration, supporting versioning, lifecycle management, and controlled onboarding of new agents. |
| CCC.MARefArc.CP22 | Runtime protection | Monitors agent actions and model outputs during execution to detect unsafe, non-compliant, or anomalous behavior, enforcing constraints, blocking disallowed actions, or triggering escalation. |
| CCC.MARefArc.CP02 | Human-in-the-loop output review | Application-embedded controls that allow users to review, approve, or modify agent outputs before they are executed or shared. |
| CCC.MARefArc.CP05 | Agent-ingress zero-trust guardrails | Treats all inputs as untrusted and enforces authentication, authorization, input validation, content filtering, access control, rate limits, and dynamic policy before any request reaches an agent. |
| CCC.MARefArc.CP01 | User-facing application surface | Presentation and orchestration surface (web, mobile, chatbot, workflow tool, or integrated enterprise system) that captures user intent, forwards requests to the agent layer, and returns agent outputs. |
| CCC.MARefArc.CP12 | Authoritative knowledge source bases | Internal and external repositories of structured data, unstructured documents, and graph-based representations that provide authoritative information for grounding. |
| CCC.MARefArc.CP13 | Vector-based semantic retrieval | Vector databases providing semantic search and grounding so agents can find relevant information from large text corpora. |
| CCC.MARefArc.CP08 | Built-in trusted tools | A collection of bundled, trusted tools providing fundamental capabilities: the MCP client bridge to the external MCP layer, a sandboxed shell, workspace I/O, and web search. |
| CCC.MARefArc.CP09 | Agent memory | Short-term in-session context management (trimming and summarization to control length, cost, and latency) and durable long-term memory across sessions, including session summaries and user/task personalization. |

Related Threats

| ID | Title | Description |
| --- | --- | --- |
| CCC.MARefArc.TH08 | Denial of Wallet via token-expensive or unthrottled agentic calls | Token-expensive prompts, large-document chunking, or poorly throttled agentic loops drive excessive model and tool invocations, exhausting token budgets, triggering throttling, or inflating cost beyond capacity planning. |
| CCC.MARefArc.TH09 | Technology service provider outage or degradation | Tight coupling to a specific external model provider with limited failover leaves the system exposed to provider outages or performance degradation under load, violating business-continuity expectations. |
| CCC.MARefArc.TH10 | VRAM exhaustion on model-serving infrastructure | Configuration changes, aggressive caching, or memory leaks in model-serving libraries behind the LLM gateway exhaust GPU VRAM, degrading responsiveness or crashing model serving. |
| CCC.MARefArc.TH14 | Model overreach and scope creep beyond validated use | Agents are used beyond their validated scope as users discover new applications or systems are repurposed without re-evaluation, producing unreliable outputs in untested contexts; weak registry scoping and orchestration boundaries accelerate the drift. |
| CCC.MARefArc.TH15 | Reputational harm from offensive or misleading outputs | The system generates offensive, misleading, or inappropriate outputs, or is manipulated into doing so, that are attributed to the organization, with reputational and regulatory impact when output filtering and human review are insufficient. |
| CCC.MARefArc.TH11 | Direct prompt injection overrides guardrails | An actor interacting through the application crafts inputs that override system prompts, bypass safety guardrails, or coerce disclosure, requiring no special privileges and exploiting any gap in ingress and model-interaction guardrails. |
| CCC.MARefArc.TH12 | Indirect prompt injection via retrieved or processed content | Malicious instructions hidden in retrieved documents, web-search results, tool outputs, or persisted memory are processed by an agent and hijack its decision-making, escalate privileges, trigger unauthorized actions, or exfiltrate data, which is especially dangerous in automated multi-agent workflows. |
| CCC.MARefArc.TH13 | Model profiling and system-prompt extraction | Crafted prompt sequences probe model internals to extract proprietary system prompts, configurations, or fine-tuning and RAG corpus content, enabling intellectual-property theft, model cloning, or follow-on attacks. |

Assessment Requirements

| ID | Text | Applicability |
| --- | --- | --- |
| CCC.MARefArc.CN10.AR01 | The gateway guardrails MUST include an AI firewall that screens inputs for prompt injection and policy violations and screens outputs for sensitive-data disclosure and harmful content. | tlp-clear, tlp-green, tlp-amber, tlp-red |
| CCC.MARefArc.CN10.AR02 | AI firewall rules MUST be centrally managed and versioned. | tlp-clear, tlp-green, tlp-amber, tlp-red |
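
One way to picture the "centrally managed and versioned" requirement is a single registry that publishes immutable, numbered rule sets, each with a content digest that enforcement points can log or verify. The Python sketch below is an assumption-laden illustration, not something the control prescribes: `RuleSet`, `RuleRegistry`, and the `(rule_id, pattern, action)` tuple layout are hypothetical, and an in-memory list stands in for a real central store.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSet:
    version: int    # monotonically increasing version number
    rules: tuple    # tuple of (rule_id, pattern, action) tuples
    digest: str     # SHA-256 of the canonical rule content


class RuleRegistry:
    """Hypothetical central store acting as the single source of firewall rules."""

    def __init__(self):
        self._versions = []

    def publish(self, rules):
        """Store a new immutable version; earlier versions are never modified."""
        ordered = sorted(rules)  # canonical order so the digest ignores input order
        canonical = json.dumps(ordered, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        rule_set = RuleSet(len(self._versions) + 1, tuple(ordered), digest)
        self._versions.append(rule_set)
        return rule_set

    def latest(self):
        if not self._versions:
            raise LookupError("no rule set has been published")
        return self._versions[-1]

    def get(self, version):
        """Fetch an exact version, for example to audit or roll back."""
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"unknown rule set version: {version}")
        return self._versions[version - 1]
```

Under these assumptions, each firewall instance would load `latest()` from the registry and record the `version` and `digest` it enforced, so an assessor can tie any decision back to a specific rule set.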

Guideline Mappings

| Framework | ID | Remarks |
| --- | --- | --- |
| finos-air | AIR-PREV-017 | |