Where model weights are accessible, an attacker can use adversarial fine-tuning, engineered trigger phrases, or direct tampering to disable alignment and content-moderation safeguards, causing targeted unsafe behaviour under specific conditions.
Backdoor triggers and safety-mechanism disablement
CCC.MARefArc.TH21
Related Capabilities
| ID | Title | Description |
|---|---|---|
| CCC.MARefArc.CP16 | Model-interaction zero-trust guardrails | Enforces authentication and authorization for every inference request and applies input validation against prompt injection, output filtering and redaction, access control, rate limits, and cost management before and after model execution. |
| CCC.MARefArc.CP14 | Approved-model registry and lifecycle | Catalog of approved models with metadata, version information, configuration parameters, and usage constraints, ensuring agents access only models meeting organizational, regulatory, and security standards. |
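One way the approved-model registry (CCC.MARefArc.CP14) could help against weight tampering is to record a cryptographic digest for each approved artifact and refuse to load anything that does not match. The following is a minimal sketch under stated assumptions, not a prescribed implementation: the registry is modelled as a plain dictionary, and the names `sha256_of_file` and `verify_model_artifact` are hypothetical. A digest check of this kind only detects changes made after registration; it cannot reveal a backdoor that was already present in the weights when the model was approved.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Mapping


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_artifact(
    model_id: str, path: Path, registry: Mapping[str, str]
) -> bool:
    """Check a weights file against the digest recorded for model_id.

    `registry` is a hypothetical stand-in for the approved-model registry:
    it maps a model identifier to the expected SHA-256 hex digest.
    Returns False for unknown models and for digest mismatches.
    """
    expected = registry.get(model_id)
    if expected is None:
        return False  # model is not in the approved registry
    actual = sha256_of_file(path)
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(actual, expected)
```

A caller would run `verify_model_artifact` before loading weights and treat a `False` result as a reason to block the load and raise an alert. In practice the registry itself also needs integrity protection (for example, signed entries), otherwise an attacker who can modify the weights may also be able to modify the recorded digest.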
Related Controls
| ID | Title | Description |
|---|---|---|
| CCC.MARefArc.CN05 | Legal and Contractual Frameworks for AI Systems | Establish contractual controls with model and MCP service providers covering data handling, retention and deletion, intellectual property, liability, and supply-chain integrity. |
| CCC.MARefArc.CN08 | Role-Based Access Control for AI Data | Enforce least-privilege, role-based access control over all AI data stores, including source bases, the vector store, and model artifacts. |
| CCC.MARefArc.CN13 | MCP Server Security Governance | Govern the onboarding, verification, and ongoing monitoring of MCP servers so that only approved, integrity-verified servers are reachable, and supply-chain compromise is detected. |
External Mappings
| Framework | ID | Remarks |
|---|---|---|
| air-vec | AIR-SEC-008-04 | |
| air-vec | AIR-SEC-008-05 | |