Skip to main content

AI/ML / Multi Agent Refarch / Capabilities / DEV

Model-interaction zero-trust guardrails

CCC.MARefArc.CP16

Enforces authentication and authorization for every inference request and applies input validation against prompt injection, output filtering and redaction, access control, rate limits, and cost management before and after model execution.

Related Threats

IDTitleDescription
CCC.MARefArc.TH01Model memorization leaks sensitive data across sessionsThe hosted models accessed through the LLM layer may memorize sensitive inputs or training data and later disclose customer PII, proprietary algorithms, or trading strategies, including cross-user leakage into unrelated sessions.
CCC.MARefArc.TH02Hosted-provider data-handling exposureSensitive data submitted through the LLM gateway to third-party hosted models is exposed when the provider lacks transparent encryption, retention limits, or secure-deletion guarantees, leaving the institution without control over data it no longer holds.
CCC.MARefArc.TH08Denial of Wallet via token-expensive or unthrottled agentic callsToken-expensive prompts, large-document chunking, or poorly throttled agentic loops drive excessive model and tool invocations, exhausting token budgets, triggering throttling, or inflating cost beyond capacity planning.
CCC.MARefArc.TH11Direct prompt injection overrides guardrailsAn actor interacting through the application crafts inputs that override system prompts, bypass safety guardrails, or coerce disclosure, requiring no special privileges and exploiting any gap in ingress and model-interaction guardrails.
CCC.MARefArc.TH13Model profiling and system-prompt extractionCrafted prompt sequences probe model internals to extract proprietary system prompts, configurations, or fine-tuning and RAG corpus content, enabling intellectual-property theft, model cloning, or follow-on attacks.
CCC.MARefArc.TH15Reputational harm from offensive or misleading outputsThe system generates offensive, misleading, or inappropriate outputs, or is manipulated into doing so, that are attributed to the organization, with reputational and regulatory impact when output filtering and human review are insufficient.
CCC.MARefArc.TH16Confident hallucination and fabricated factsLacking ground truth and faced with ambiguous prompts or helpfulness-biased tuning, the model fabricates plausible but false facts, figures, or citations, presented with high fluency that makes errors hard to catch and likely to be acted upon.
CCC.MARefArc.TH18RAG grounding failuresEven with retrieval, responses may contradict retrieved documents, drop caveats truncated by the context window, fill gaps with incorrect general knowledge, exceed authorized advisory scope, or adopt an inappropriate tone or certainty for the domain.
CCC.MARefArc.TH21Backdoor triggers and safety-mechanism disablementWhere weights are accessible, adversarial fine-tuning, engineered trigger phrases, or tampering disables alignment and content-moderation safeguards, causing targeted unsafe behaviour under specific conditions.
CCC.MARefArc.TH26Intellectual-property leakage and licensing violationsOutputs may replicate copyrighted training material, employees may leak trade secrets into AI tools, and improper platform licensing or terms-of-service violations create contractual and legal liability.