Model-interaction zero-trust guardrails

CCC.MARefArc.CP16

Enforces authentication and authorization for every inference request and applies input validation against prompt injection, output filtering and redaction, access control, rate limits, and cost management before and after model execution.
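
The pre- and post-execution checks described above can be sketched as a wrapper around the model call. This is a minimal illustration, not a reference implementation: the `Request` type, the `model:invoke` scope name, the regex patterns, and the `guarded_inference` function are all hypothetical, and a real deployment would likely rely on a maintained injection classifier and a fuller redaction ruleset rather than two regular expressions.

```python
import re
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

# Hypothetical screening patterns, for illustration only.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your|the) system prompt", re.I),
]
# Simplified email matcher used as a stand-in for output redaction rules.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class GuardrailError(Exception):
    """Raised when a request is rejected before it reaches the model."""


@dataclass
class Request:
    caller: Optional[str]    # authenticated identity, None if unauthenticated
    scopes: FrozenSet[str]   # permissions granted to the caller
    prompt: str


def guarded_inference(req: Request, model: Callable[[str], str]) -> str:
    # Pre-execution: authenticate and authorize every request.
    if req.caller is None:
        raise GuardrailError("unauthenticated")
    if "model:invoke" not in req.scopes:  # assumed scope name
        raise GuardrailError("unauthorized")
    # Pre-execution: screen the input for known injection phrasing.
    if any(p.search(req.prompt) for p in INJECTION_PATTERNS):
        raise GuardrailError("suspected prompt injection")

    raw = model(req.prompt)

    # Post-execution: redact email addresses from the output.
    return EMAIL.sub("[REDACTED]", raw)
```

Rejecting before the model call keeps blocked requests from consuming tokens, while redaction runs on every response regardless of how the prompt was screened.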

Related Threats

ID	Title	Description
CCC.MARefArc.TH01	Model memorization leaks sensitive data across sessions	The hosted models accessed through the LLM layer may memorize sensitive inputs or training data and later disclose customer PII, proprietary algorithms, or trading strategies, including cross-user leakage into unrelated sessions.
CCC.MARefArc.TH02	Hosted-provider data-handling exposure	Sensitive data submitted through the LLM gateway to third-party hosted models is exposed when the provider lacks transparent encryption, retention limits, or secure-deletion guarantees, leaving the institution without control over data it no longer holds.
CCC.MARefArc.TH08	Denial of Wallet via token-expensive or unthrottled agentic calls	Token-expensive prompts, large-document chunking, or poorly throttled agentic loops drive excessive model and tool invocations, exhausting token budgets, triggering throttling, or inflating cost beyond capacity planning.
CCC.MARefArc.TH11	Direct prompt injection overrides guardrails	An actor interacting through the application crafts inputs that override system prompts, bypass safety guardrails, or coerce disclosure, requiring no special privileges and exploiting any gap in ingress and model-interaction guardrails.
CCC.MARefArc.TH13	Model profiling and system-prompt extraction	Crafted prompt sequences probe model internals to extract proprietary system prompts, configurations, or fine-tuning and RAG corpus content, enabling intellectual-property theft, model cloning, or follow-on attacks.
CCC.MARefArc.TH15	Reputational harm from offensive or misleading outputs	The system generates, or is manipulated into generating, offensive, misleading, or inappropriate outputs that are attributed to the organization, causing reputational and regulatory impact when output filtering and human review are insufficient.
CCC.MARefArc.TH16	Confident hallucination and fabricated facts	Lacking ground truth and faced with ambiguous prompts or helpfulness-biased tuning, the model fabricates plausible but false facts, figures, or citations, presented with high fluency that makes errors hard to catch and likely to be acted upon.
CCC.MARefArc.TH18	RAG grounding failures	Even with retrieval, responses may contradict retrieved documents, drop caveats truncated by the context window, fill gaps with incorrect general knowledge, exceed authorized advisory scope, or adopt an inappropriate tone or certainty for the domain.
CCC.MARefArc.TH21	Backdoor triggers and safety-mechanism disablement	Where weights are accessible, adversarial fine-tuning, engineered trigger phrases, or tampering disables alignment and content-moderation safeguards, causing targeted unsafe behaviour under specific conditions.
CCC.MARefArc.TH26	Intellectual-property leakage and licensing violations	Outputs may replicate copyrighted training material, employees may leak trade secrets into AI tools, and improper platform licensing or terms-of-service violations create contractual and legal liability.
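
For the rate-limit and cost-management part of this capability, one possible mitigation for CCC.MARefArc.TH08 is a per-caller token budget checked before each model call. The sketch below assumes a sliding time window and an in-memory store. The `TokenBudget` class, its parameters, and the injectable `clock` are hypothetical names chosen for illustration. A production gateway would more likely keep these counters in a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class TokenBudget:
    """Per-caller token budget over a sliding time window (illustrative only)."""

    def __init__(self, max_tokens: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._usage: Dict[str, Deque[Tuple[float, int]]] = defaultdict(deque)

    def _spent(self, caller: str) -> int:
        # Drop usage records that have aged out of the window, then total the rest.
        now = self.clock()
        events = self._usage[caller]
        while events and now - events[0][0] >= self.window_seconds:
            events.popleft()
        return sum(tokens for _, tokens in events)

    def try_spend(self, caller: str, tokens: int) -> bool:
        # Refuse the call if it would push the caller over budget.
        if self._spent(caller) + tokens > self.max_tokens:
            return False
        self._usage[caller].append((self.clock(), tokens))
        return True
```

Calling `try_spend` with an estimated token count before each model or tool invocation bounds what a runaway agentic loop can spend per window. Budgets are tracked per caller, so one noisy caller does not exhaust another's allowance.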