CCC.GenAI.C03: Data Provenance and Source Vetting
Control ID: CCC.GenAI.C03
Title: Data Provenance and Source Vetting
Objective: Ensure that all data used for training, fine-tuning or
retrieval-augmented generation (RAG) comes from trusted, approved
sources and is authorised for its intended purpose, so that malicious
content or leaked sensitive data is not introduced in the first place.
Control Family: Data
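The objective combines two checks: the source must be approved, and it must be authorised for the intended purpose. A minimal sketch of that gate follows. The `APPROVED_SOURCES` allowlist, the example hostnames, the purpose labels and the `vet_source` helper are all hypothetical and are not defined by this control.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist (an assumption for illustration):
# maps an approved host to the purposes it is authorised for.
APPROVED_SOURCES = {
    "data.internal.example.com": {"training", "fine-tuning", "rag"},
    "docs.example.org": {"rag"},
}


@dataclass(frozen=True)
class VettingResult:
    allowed: bool
    reason: str


def vet_source(url: str, purpose: str) -> VettingResult:
    """Check a data source URL against the allowlist for one purpose."""
    # hostname is None for strings that are not URLs; treat that as unknown.
    host = (urlparse(url).hostname or "").lower()
    purposes = APPROVED_SOURCES.get(host)
    if purposes is None:
        return VettingResult(False, f"source '{host}' is not approved")
    if purpose not in purposes:
        return VettingResult(
            False, f"source '{host}' is not authorised for '{purpose}'"
        )
    return VettingResult(True, "approved")
```

In this sketch a source approved only for RAG is rejected when requested for training, so the decision is denied by default and each purpose needs its own explicit authorisation.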
Related Threats
ID | Title | Description | External Mappings | Capability Mappings | Control Mappings |
---|---|---|---|---|---|
CCC.GenAI.TH02 | Data Poisoning | Data poisoning occurs when training, fine-tuning or embedding data is tampered with in order to modify the model's behaviour, for example steering it towards specific outputs, degrading performance or introducing backdoors. | 4 | 1 | 0 |
CCC.GenAI.TH03 | Sensitive Information Disclosure | The model can memorise sensitive data from user interactions or training and may later leak it to unintended, unauthorised parties who query the model, for example through crafted prompts. | 4 | 1 | 0 |
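One common way to detect tampering after a source has been vetted is to record a content hash alongside the source and re-check it before the data is used. The sketch below assumes this approach; the record layout and the `record_provenance` and `verify_provenance` helpers are hypothetical and are not prescribed by this control.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def record_provenance(name: str, source: str, data: bytes) -> dict:
    """Build a provenance record at ingestion time (hypothetical layout)."""
    return {"name": name, "source": source, "sha256": sha256_hex(data)}


def verify_provenance(record: dict, data: bytes) -> bool:
    """Return True only if the data still matches the recorded digest."""
    return hmac.compare_digest(record["sha256"], sha256_hex(data))
```

A hash check of this kind only shows that the data has not changed since it was recorded. It does not show that the content was safe at the source, which is why it complements source vetting rather than replacing it.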
Related Capabilities
ID | Title | Description |
---|---|---|
CCC.Core.F02 | Encryption at Rest Enabled by Default | The service automatically encrypts all data using industry-standard cryptographic protocols prior to being written to a storage medium. |
CCC.Core.F06 | Access Control | The service automatically enforces user configurations to restrict or allow access to a specific component or a child resource based on factors such as user identities, roles, groups, or attributes. |
CCC.GenAI.F03 | Embedding Model Selection | Ability to select a foundation model used for tasks like semantic search, clustering, and document similarity by converting text into vector embeddings. |
CCC.GenAI.F06 | Customizable Model Selection | Provide users the ability to fine-tune models with their own data. |
CCC.GenAI.F21 | Generate Content | Ability to generate a response given a foundation model, parameter values, and a prompt. |
CCC.GenAI.F22 | Data Control | Ensures that prompts, model outputs, embeddings, and training data supplied by customers are not used to train foundation models. |
CCC.GenAI.F24 | Content Moderation | The service detects and filters abusive, harmful, and sensitive information to support responsible and safe use of the service. |
Guideline Mappings
Reference ID | Entry ID | Strength | Remarks |
---|---|---|---|
FINOS-AIGF | AIR-PREV-006 | 0 | Data Quality & Classification/Sensitivity |
SAIF | Training Data Management | 0 | - |
MITRE-ATLAS | AML.M0025 | 0 | Maintain AI Dataset Provenance |