CCC.GenAI.C04: Sanitisation of Ingested Data
Control ID:CCC.GenAI.C04
Title:Sanitisation of Ingested Data
Objective:Validate and sanitise all data ingested by GenAI systems
from extenal sources or internal knowledge bases, whether
for training, conversion to vector embeddings, or real-time
retireval, in order to remove or redact poisoned or sensitive
data before further processing.
Control Family:
Data
Related Threats
ID | Title | Description | External Mappings | Capability Mappings | Control Mappings |
---|---|---|---|---|---|
CCC.GenAI.TH02 | Data Poisoning | Data poisoning occurs when training, fine-tuning or embedding data is tampered with in order to modify the model's behaviour, for example steering it towards specific outputs, degrading performance or introducing backdoors. | 4 | 1 | 0 |
CCC.GenAI.TH03 | Sensitive Information Disclosure | Sensitive data can be memorised by the model from user interaction or training and may then be leaked to unintended and unauthorised parties by querying the model, for example through crafted prompts. | 4 | 1 | 0 |
Related Capabilities
ID | Title | Description |
---|---|---|
CCC.Core.F02 | Encryption at Rest Enabled by Default | The service automatically encrypts all data using industry-standard cryptographic protocols prior to being written to a storage medium. |
CCC.Core.F06 | Access Control | The service automatically enforces user configurations to restrict or allow access to a specific component or a child resource based on factors such as user identities, roles, groups, or attributes. |
CCC.GenAI.F03 | Embedding Model Selection | Ability to select a foundation model used for tasks like semantic search, clustering, and document similarity by converting text into vector embeddings. |
CCC.GenAI.F06 | Customizable Model Selection | Provide users the ability to fine-tune models with their own data. |
CCC.GenAI.F21 | Generate Content | Ability to generate a response given a foundation model, parameter values, and a prompt. |
CCC.GenAI.F22 | Data Control | Ensures prompts, model outputs, embeddings, and training data fed by customers are not used to train foundation models. |
CCC.GenAI.F24 | Content Moderation | Ensure the service detects and filters abusive, harmful, and sensitive information to ensure responsible and safe use of the service. |
Guideline Mappings
Reference ID | Entry ID | Strength | Remarks |
---|---|---|---|
FINOS-AIGF | AIR-PREV-002 | 0 | Data Filtering From External Knowledge Bases |
SAIF | Training Data Sanitization | 0 | - |
MITRE-ATLAS | AML.M0007 | 0 | Sanitize Training Data |