Enforces authentication and authorization on every inference request. Around each model call it also applies input validation against prompt injection, access control, rate limits, and cost management beforehand, and output filtering and redaction afterward.
Model-interaction zero-trust guardrails
CCC.MARefArc.CP16
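The guardrail sequence this capability describes (authenticate and authorize, validate input, call the model, then redact output) can be sketched roughly as below. This is a minimal illustration, not a reference implementation: the names `VALID_TOKENS`, `ALLOWED_ROLES`, `INJECTION_PATTERNS`, `guarded_call`, and the regex patterns are all hypothetical, and a real deployment would likely rely on an identity provider and a maintained policy or classifier service rather than static tables and regular expressions.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical credential store and policy; stand-ins for a real identity provider.
VALID_TOKENS = {"tok-alice": "analyst", "tok-bob": "guest"}
ALLOWED_ROLES = {"analyst"}

# Illustrative patterns only; regexes alone are not a sufficient injection defense.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class GuardrailError(Exception):
    """Raised when a request is rejected before or after model execution."""


@dataclass
class Request:
    token: str   # opaque caller credential (assumed bearer-style token)
    prompt: str


def check_access(req: Request) -> None:
    # Authentication: is the token known? Authorization: is its role permitted?
    role = VALID_TOKENS.get(req.token)
    if role is None:
        raise GuardrailError("unauthenticated")
    if role not in ALLOWED_ROLES:
        raise GuardrailError("unauthorized")


def validate_input(prompt: str) -> None:
    # Pre-execution check: reject prompts matching known injection phrasing.
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            raise GuardrailError("possible prompt injection")


def redact_output(text: str) -> str:
    # Post-execution check: mask e-mail addresses before returning the response.
    return EMAIL_PATTERN.sub("[REDACTED]", text)


def guarded_call(req: Request, model: Callable[[str], str]) -> str:
    # Every request passes all checks; nothing is trusted by default.
    check_access(req)
    validate_input(req.prompt)
    return redact_output(model(req.prompt))
```

Checks run on every call, with no cached "trusted" state, which is the zero-trust aspect of the control. The `model` argument is any callable taking a prompt and returning text, so the same wrapper can be exercised with a stub in tests.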
Related Threats
| ID | Title | Description |
|---|---|---|
| CCC.MARefArc.TH01 | Model memorization leaks sensitive data across sessions | The hosted models accessed through the LLM layer may memorize sensitive inputs or training data and later disclose customer PII, proprietary algorithms, or trading strategies, including cross-user leakage into unrelated sessions. |
| CCC.MARefArc.TH02 | Hosted-provider data-handling exposure | Sensitive data submitted through the LLM gateway to third-party hosted models is exposed when the provider lacks transparent encryption, retention limits, or secure-deletion guarantees, leaving the institution without control over data it no longer holds. |
| CCC.MARefArc.TH08 | Denial of Wallet via token-expensive or unthrottled agentic calls | Token-expensive prompts, large-document chunking, or poorly throttled agentic loops drive excessive model and tool invocations, exhausting token budgets, triggering throttling, or inflating cost beyond capacity planning. |
| CCC.MARefArc.TH11 | Direct prompt injection overrides guardrails | An actor interacting through the application crafts inputs that override system prompts, bypass safety guardrails, or coerce disclosure, requiring no special privileges and exploiting any gap in ingress and model-interaction guardrails. |
| CCC.MARefArc.TH13 | Model profiling and system-prompt extraction | Crafted prompt sequences probe model internals to extract proprietary system prompts, configurations, or fine-tuning and RAG corpus content, enabling intellectual-property theft, model cloning, or follow-on attacks. |
| CCC.MARefArc.TH15 | Reputational harm from offensive or misleading outputs | The system generates offensive, misleading, or inappropriate outputs, whether spontaneously or under manipulation, and these are attributed to the organization, causing reputational and regulatory impact when output filtering and human review are insufficient. |
| CCC.MARefArc.TH16 | Confident hallucination and fabricated facts | Lacking ground truth, and given ambiguous prompts or helpfulness-biased tuning, the model fabricates plausible but false facts, figures, or citations; their high fluency makes the errors hard to catch and likely to be acted upon. |
| CCC.MARefArc.TH18 | RAG grounding failures | Even with retrieval, responses may contradict retrieved documents, drop caveats truncated by the context window, fill gaps with incorrect general knowledge, exceed authorized advisory scope, or adopt an inappropriate tone or certainty for the domain. |
| CCC.MARefArc.TH21 | Backdoor triggers and safety-mechanism disablement | Where weights are accessible, adversarial fine-tuning, engineered trigger phrases, or tampering disables alignment and content-moderation safeguards, causing targeted unsafe behaviour under specific conditions. |
| CCC.MARefArc.TH26 | Intellectual-property leakage and licensing violations | Outputs may replicate copyrighted training material, employees may leak trade secrets into AI tools, and improper platform licensing or terms-of-service violations create contractual and legal liability. |
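The Denial of Wallet threat (CCC.MARefArc.TH08) in the table above could be limited with a per-caller token budget checked before each model or tool invocation. The sketch below assumes a simple fixed-window budget; the class name `TokenBudget`, its fields, and the example limits are hypothetical, and production systems may prefer sliding windows or provider-side quotas.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBudget:
    """Fixed-window token budget for one caller or agent (illustrative only)."""
    limit: int                 # maximum tokens allowed per window (assumed value)
    window_seconds: float      # window length in seconds
    used: int = 0
    window_start: float = field(default_factory=time.monotonic)

    def try_spend(self, tokens: int, now: Optional[float] = None) -> bool:
        # `now` is injectable so the logic can be tested without sleeping.
        now = time.monotonic() if now is None else now
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.used = 0
        # Refuse the call if it would push usage past the limit.
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True
```

For example, with `TokenBudget(limit=1000, window_seconds=60)`, a 600-token call succeeds, a following 500-token call in the same window is refused, and the same call succeeds once the window rolls over. An agentic loop would call `try_spend` with its estimated token count before each step and stop or escalate when it returns `False`.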