CCC.GenAI.TH02: Data Poisoning
Threat ID: CCC.GenAI.TH02
Title: Data Poisoning
Description:
Data poisoning occurs when training, fine-tuning or embedding data is tampered with in order to modify the model's behaviour, for example steering it towards specific outputs, degrading performance or introducing backdoors.
Related Capabilities
ID | Title | Description |
---|---|---|
CCC.Core.F02 | Encryption at Rest Enabled by Default | The service automatically encrypts all data using industry-standard cryptographic protocols prior to being written to a storage medium. |
CCC.Core.F06 | Access Control | The service automatically enforces user configurations to restrict or allow access to a specific component or a child resource based on factors such as user identities, roles, groups, or attributes. |
CCC.GenAI.F03 | Embedding Model Selection | Ability to select a foundation model used for tasks like semantic search, clustering, and document similarity by converting text into vector embeddings. |
CCC.GenAI.F06 | Customizable Model Selection | Provide users the ability to fine-tune models with their own data. |
CCC.GenAI.F21 | Generate Content | Ability to generate a response given a foundation model, parameter values, and a prompt. |
CCC.GenAI.F22 | Data Control | Ensures prompts, model outputs, embeddings, and training data fed by customers are not used to train foundation models. |
CCC.GenAI.F24 | Content Moderation | Ensures the service detects and filters abusive, harmful, and sensitive information to support responsible and safe use of the service. |
External Mappings
Reference ID | Entry ID | Strength | Remarks |
---|---|---|---|
FINOS-AIGF | AIR-SEC-009 | 0 | Data Poisoning |
SAIF | DP | 0 | Data Poisoning |
OWASP-LLM-TOP10 | LLM04:2025 | 0 | Data and Model Poisoning |
MITRE-ATLAS | AML.T0020 | 0 | Poison Training Data |
MITRE-ATLAS | AML.T0070 | 0 | RAG Poisoning |
Controls
ID | Title | Objective | Control Family | Threat Mappings | Guideline Mappings | Assessment Requirements |
---|---|---|---|---|---|---|
CCC.GenAI.C03 | Data Provenance and Source Vetting | Ensure that all data for training, fine-tuning or RAG comes from trusted, approved sources and is authorised for the intended purposes in order to prevent the initial introduction of malicious content or leaked sensitive data. | Data | 2 | 3 | 2 |
CCC.GenAI.C04 | Sanitisation of Ingested Data | Validate and sanitise all data ingested by GenAI systems from external sources or internal knowledge bases, whether for training, conversion to vector embeddings, or real-time retrieval, in order to remove or redact poisoned or sensitive data before further processing. | Data | 2 | 3 | 2 |
CCC.GenAI.C08 | Quality Control and Red Teaming | Establish a formal program for quality evaluation and adversarial testing (red teaming) to ensure GenAI systems meet all business, quality, security and compliance requirements before being deployed into production environments. | Model Assurance and Evaluation | 5 | 5 | 2 |
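
One way to picture the provenance and integrity checks implied by CCC.GenAI.C03 and CCC.GenAI.C04 is the minimal Python sketch below. It is illustrative only and not part of the CCC catalog: the `APPROVED_SOURCES` allowlist, the `Record` type, and the `vet` function are hypothetical names, and it assumes a SHA-256 digest was recorded for each item when its source was vetted. A record is accepted only if it comes from an approved source and its content still matches that digest.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical allowlist of approved data sources (illustrative names only).
APPROVED_SOURCES = {"internal-wiki", "vendor-feed"}


@dataclass(frozen=True)
class Record:
    source: str   # identifier of the system the record came from
    content: str  # raw text destined for training, fine-tuning or embedding
    sha256: str   # digest assumed to have been recorded when the source was vetted


def digest(text: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 encoded string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def vet(records):
    """Split records into (accepted, rejected).

    A record is accepted only when its source is on the allowlist and
    its content still hashes to the digest recorded at vetting time.
    """
    accepted, rejected = [], []
    for record in records:
        trusted = record.source in APPROVED_SOURCES
        intact = digest(record.content) == record.sha256
        (accepted if trusted and intact else rejected).append(record)
    return accepted, rejected
```

A check like this only detects tampering after vetting and data from unapproved sources. It does not detect poisoned content that was already present when the digest was recorded, which is why content-level sanitisation and adversarial testing remain separate controls.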