Protect model and tool availability by enforcing quality-of-service controls, rate limits, and abuse and DDoS mitigation at the gateways.
AI/ML / Multi Agent Refarch / Controls / DEV
Quality of Service and DDoS Prevention
CCC.MARefArc.CN06 · PREV
Related Capabilities
| ID | Title | Description |
|---|---|---|
| CCC.MARefArc.CP16 | Model-interaction zero-trust guardrails | Enforces authentication and authorization for every inference request and applies input validation against prompt injection, output filtering and redaction, access control, rate limits, and cost management before and after model execution. |
| CCC.MARefArc.CP06 | Agent collaboration and orchestration patterns | Supports supervisor/worker decomposition, skills-based routing, and agent-as-a-tool handoff for decomposing and executing complex tasks across multiple agents. |
| CCC.MARefArc.CP15 | LLM inference gateway routing | Validates inference requests and routes each to the correct model instance, abstracting model hosting behind a consistent interface. |
| CCC.MARefArc.CP14 | Approved-model registry and lifecycle | Catalog of approved models with metadata, version information, configuration parameters, and usage constraints, ensuring agents access only models meeting organizational, regulatory, and security standards. |
Related Threats
| ID | Title | Description |
|---|---|---|
| CCC.MARefArc.TH08 | Denial of Wallet via token-expensive or unthrottled agentic calls | Token-expensive prompts, large-document chunking, or poorly throttled agentic loops drive excessive model and tool invocations, exhausting token budgets, triggering throttling, or inflating cost beyond capacity planning. |
| CCC.MARefArc.TH09 | Technology service provider outage or degradation | Tight coupling to a specific external model provider with limited failover leaves the system exposed to provider outages or performance degradation under load, violating business-continuity expectations. |
| CCC.MARefArc.TH10 | VRAM exhaustion on model-serving infrastructure | Configuration changes, aggressive caching, or memory leaks in model-serving libraries behind the LLM gateway exhaust GPU VRAM, degrading responsiveness or crashing model serving. |
Assessment Requirements
| ID | Text | Applicability |
|---|---|---|
| CCC.MARefArc.CN06.AR01 | The LLM and MCP gateways MUST enforce per-consumer rate limits and quotas. | tlp-clear, tlp-green, tlp-amber, tlp-red |
| CCC.MARefArc.CN06.AR02 | Gateways MUST apply DDoS and abuse detection and load shedding to preserve availability under load. | tlp-clear, tlp-green, tlp-amber, tlp-red |
Guideline Mappings
| Framework | ID | Remarks |
|---|---|---|
| finos-air | AIR-PREV-008 |