
CCC.GenAI.C03: Data Provenance and Source Vetting

Control ID: CCC.GenAI.C03
Title: Data Provenance and Source Vetting
Objective: Ensure that all data used for training, fine-tuning or retrieval-augmented generation (RAG) comes from trusted, approved sources and is authorised for its intended purpose, so that malicious content or leaked sensitive data is never introduced into the system in the first place.
Control Family: Data
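The objective requires that data be authorised for its intended purpose, not merely approved in general. Below is a minimal Python sketch of that distinction. It assumes each vetted source is recorded with the set of uses (training, fine-tuning, RAG) it was approved for. `Purpose`, `ApprovedSource` and `is_authorised` are hypothetical names for illustration and are not defined by CCC.

```python
from dataclasses import dataclass, field
from enum import Enum


class Purpose(Enum):
    # The three data uses named in the control objective.
    TRAINING = "training"
    FINE_TUNING = "fine-tuning"
    RAG = "rag"


@dataclass(frozen=True)
class ApprovedSource:
    # Hypothetical record of a vetted source and the uses it was approved for.
    source_id: str
    approved_by: str
    # Defaults to an empty set: a source is approved for nothing unless stated.
    authorised_purposes: frozenset[Purpose] = field(default_factory=frozenset)


def is_authorised(source: ApprovedSource, purpose: Purpose) -> bool:
    """True only if this source was approved for this specific purpose."""
    return purpose in source.authorised_purposes


# Example: a wiki approved for RAG is not thereby approved for fine-tuning.
wiki = ApprovedSource("internal-wiki", "data-governance", frozenset({Purpose.RAG}))
print(is_authorised(wiki, Purpose.RAG))          # True
print(is_authorised(wiki, Purpose.FINE_TUNING))  # False
```

Keeping purposes as an explicit allowlist per source means that any new use of existing data requires a fresh approval decision.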

Related Threats

ID | Title | Description | External Mappings | Capability Mappings | Control Mappings
CCC.GenAI.TH02 | Data Poisoning | Data poisoning occurs when training, fine-tuning or embedding data is tampered with in order to modify the model's behaviour, for example steering it towards specific outputs, degrading performance or introducing backdoors. | 4 | 1 | 0
CCC.GenAI.TH03 | Sensitive Information Disclosure | Sensitive data can be memorised by the model from user interaction or training and may then be leaked to unintended and unauthorised parties by querying the model, for example through crafted prompts. | 4 | 1 | 0
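Data poisoning often involves tampering with data after it has been vetted. A common provenance practice, assumed here rather than prescribed by this control, is to record a SHA-256 digest of each dataset at approval time and re-check it before every training or ingestion run. Any post-approval modification then shows up as a digest mismatch. This does not catch data that was already poisoned when it was approved, because that is the job of source vetting itself. The helper names below are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_provenance(path: Path, recorded_sha256: str) -> bool:
    """True if the file still matches the digest recorded when it was approved."""
    return sha256_of_file(path) == recorded_sha256.strip().lower()
```

The recorded digest should be stored with the provenance record, away from the dataset, so that anyone able to modify the data cannot also update its expected hash.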

Related Capabilities

ID | Title | Description
CCC.Core.F02 | Encryption at Rest Enabled by Default | The service automatically encrypts all data using industry-standard cryptographic protocols prior to being written to a storage medium.
CCC.Core.F06 | Access Control | The service automatically enforces user configurations to restrict or allow access to a specific component or a child resource based on factors such as user identities, roles, groups, or attributes.
CCC.GenAI.F03 | Embedding Model Selection | Ability to select a foundation model used for tasks like semantic search, clustering, and document similarity by converting text into vector embeddings.
CCC.GenAI.F06 | Customizable Model Selection | Provides users the ability to fine-tune models with their own data.
CCC.GenAI.F21 | Generate Content | Ability to generate a response given a foundation model, parameter values, and a prompt.
CCC.GenAI.F22 | Data Control | Ensures prompts, model outputs, embeddings, and training data supplied by customers are not used to train foundation models.
CCC.GenAI.F24 | Content Moderation | Ensures the service detects and filters abusive, harmful, and sensitive information, supporting responsible and safe use of the service.

Guideline Mappings

Reference ID | Entry ID | Strength | Remarks
FINOS-AIGF | AIR-PREV-006 | 0 | Data Quality & Classification/Sensitivity
SAIF | Training Data Management | 0 | -
MITRE-ATLAS | AML.M0025 | 0 | Maintain AI Dataset Provenance

Assessment Requirements

ID | Description | Applicability
CCC.GenAI.C03.TR01 | When data is designated for model training or RAG ingestion, then its source MUST be explicitly approved and its provenance documented. | tlp-clear, tlp-green, tlp-amber, tlp-red
CCC.GenAI.C03.TR02 | Data from unvetted sources MUST NOT be used in production systems. | tlp-clear, tlp-green, tlp-amber, tlp-red
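TR01 and TR02 can be enforced as a gate in front of an ingestion pipeline. The Python sketch below assumes a simple allowlist of approved source IDs and a minimal, hypothetical provenance record containing origin, approver and collection date. The field names, `vet`, `require_vetted` and `UnvettedSourceError` are illustrative only and are not part of the control.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical minimum provenance fields (TR01: provenance documented).
    source_id: str     # identifier checked against the approved-source allowlist
    origin: str        # where the data came from, e.g. a system name or URL
    approved_by: str   # who explicitly approved the source
    collected_on: str  # ISO 8601 date the data was obtained


class UnvettedSourceError(Exception):
    """Raised when data fails the approval or provenance checks."""


REQUIRED_FIELDS = ("origin", "approved_by", "collected_on")


def vet(record: Optional[ProvenanceRecord], approved_sources: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the data passed vetting."""
    if record is None:
        return ["no provenance record"]
    problems: list[str] = []
    # TR01: the source must be explicitly approved.
    if record.source_id not in approved_sources:
        problems.append(f"source {record.source_id!r} is not explicitly approved")
    # TR01: provenance must be documented (no blank fields).
    missing = [name for name in REQUIRED_FIELDS if not getattr(record, name).strip()]
    if missing:
        problems.append("provenance incomplete: " + ", ".join(missing))
    return problems


def require_vetted(
    record: Optional[ProvenanceRecord], approved_sources: set[str]
) -> ProvenanceRecord:
    """Ingestion gate (TR02): raise rather than let unvetted data through."""
    problems = vet(record, approved_sources)
    if record is None or problems:
        raise UnvettedSourceError("; ".join(problems))
    return record
```

If `require_vetted` is called before every training, fine-tuning or RAG ingestion job, unvetted data fails loudly instead of reaching a production system. `vet` on its own can report problems during a review without blocking anything.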