Model Output Filtering and Sanitisation

CCC.GenAI.CN02 · MachineLearning

Inspect and validate GenAI model output before passing it to users, applications or plugins in order to filter or sanitise insecure or unreliable output and prevent sensitive data leakage.

Related Capabilities

ID	Title	Description
CCC.Core.CP14	API Access	The service exposes a port enabling external actors to interact programmatically with the service and its resources using HTTP protocol methods such as GET, POST, PUT, and DELETE.
CCC.GenAI.CP15	Text-Based Prompts	Ability to input prompts in plain text.
CCC.GenAI.CP16	Structured Prompts	Ability to provide structured input such as JSON as prompts.
CCC.GenAI.CP17	Contextual Prompts	Ability to provide context or background information within the prompt to guide the response.
CCC.GenAI.CP18	Interactive Prompts	Ability to use conversational prompts to create interactive dialogues.
CCC.GenAI.CP19	Image-Based Prompts	Ability to input an image as a prompt to generate a response.
CCC.GenAI.CP20	Custom Template Prompts	Ability to define custom templates or structures for prompts to standardize interactions with the models.
CCC.GenAI.CP21	Generate Content	Ability to generate a response given a foundation model, parameter values, and a prompt.
CCC.GenAI.CP24	Content Moderation	The service detects and filters abusive, harmful, and sensitive information to support responsible and safe use of the service.
CCC.Core.CP02	Encryption at Rest Enabled by Default	The service automatically encrypts all data using industry-standard cryptographic protocols prior to being written to a storage medium.
CCC.Core.CP06	Access Control	The service automatically enforces user configurations to restrict or allow access to a specific component or a child resource based on factors such as user identities, roles, groups, or attributes.
CCC.GenAI.CP22	Data Control	Ensures prompts, model outputs, embeddings, and training data fed by customers are not used to train foundation models.
CCC.GenAI.CP03	Embedding Model Selection	Ability to select a foundation model used for tasks like semantic search, clustering, and document similarity by converting text into vector embeddings.
CCC.GenAI.CP06	Customizable Model Selection	Provide users the ability to fine-tune models with their own data.
CCC.GenAI.CP07	Parameter Tuning - Temperature	Ability to control the randomness and creativity of the response.
CCC.GenAI.CP08	Parameter Tuning - Max Token	Ability to limit the length of the response.
CCC.GenAI.CP09	Parameter Tuning - Top P (Nucleus Sampling)	Ability to adjust the number of likely next tokens to consider based on cumulative probability.
CCC.GenAI.CP10	Parameter Tuning - Top K	Ability to limit the number of token choices for the next word.
CCC.GenAI.CP11	Parameter Tuning - Stop Sequences	Ability to halt generation when a predefined sequence is encountered.
CCC.GenAI.CP12	Parameter Tuning - Frequency Penalty	Ability to penalize words that have been used frequently, reducing their likelihood of being repeated.
CCC.GenAI.CP13	Parameter Tuning - Presence Penalty	Ability to penalize tokens that have already been used, encouraging the model to introduce new tokens.
CCC.GenAI.CP14	Parameter Tuning - Context Length	Ability to control how much prior conversation or input the model will use for generating coherent responses.
CCC.GenAI.CP25	Plugin Integrations	Ability for the model to use tools to complete a model interaction, for example web search, Python code execution or an external maths engine.

Related Threats

ID	Title	Description
CCC.GenAI.TH01	Prompt Injection	Prompt injection may occur when crafted input is used to manipulate the GenAI model's behaviour, resulting in the generation of harmful or unintended outputs. Prompt injection can be either direct (performed via direct interaction with the model) or indirect (performed via external sources ingested by the model). Both text-based and multi-modal prompt injection are possible.
CCC.GenAI.TH03	Sensitive Information Disclosure	Sensitive data can be memorised by the model from user interaction or training and may then be leaked to unintended and unauthorised parties by querying the model, for example through crafted prompts.
CCC.GenAI.TH04	Insecure / Unreliable Model Output	A GenAI model may generate content that is incorrect, misleading or harmful, such as convincing misinformation (hallucinations) or vulnerable or malicious code, due to its reliance on statistical patterns rather than factual understanding. Directly using this flawed output without validation can lead to system compromises, poor decision-making, and legal or reputational damage.
CCC.GenAI.TH05	Model Overreliance	Model overreliance and misplaced implicit trust in the output of a GenAI model may lead to the acceptance of inaccurate, biased or insecure outputs without proper validation or oversight, potentially resulting in operational failures, compliance breaches and flawed decision-making.
CCC.GenAI.TH06	Unintended Action by a Model-Based Agent	A model-based agent, given the authority to execute tools or interact with APIs, may perform an action that is harmful, incorrect, or not aligned with the user's true intent in response to a prompt. This can be caused by the model misinterpreting an ambiguous prompt or being manipulated by an adversary into misusing its delegated authority.

Assessment Requirements

ID	Text	Applicability
CCC.GenAI.CN02.AR01	GenAI model output MUST be validated for format conformance, malicious patterns, sensitive data and inappropriate content before being passed to users, applications or plugins.	tlp-clear, tlp-green, tlp-amber, tlp-red
CCC.GenAI.CN02.AR02	In the event of policy violations, the AI-generated content MUST be redacted, encoded or rejected.	tlp-clear, tlp-green, tlp-amber, tlp-red
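
Illustrative example (not part of the control): the two requirements above could be combined into a single output filter along the following lines. This is a minimal sketch under stated assumptions, not a reference implementation. The function name `filter_model_output`, the `FilterResult` type, the regular expressions and the deny-list are all hypothetical; a real deployment would most likely rely on maintained DLP and content-moderation detectors rather than hand-written patterns, and the mapping of each finding to redact, encode or reject is a policy choice.

```python
import html
import json
import re
from dataclasses import dataclass, field

# Hypothetical detectors, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MALICIOUS_RE = re.compile(r"<\s*script\b|javascript:", re.IGNORECASE)
BLOCKED_TERMS = {"example-blocked-term"}  # placeholder deny-list


@dataclass
class FilterResult:
    action: str  # "pass", "redact", "encode" or "reject" (last action applied)
    text: str    # content that may be forwarded; empty when rejected
    findings: list = field(default_factory=list)


def filter_model_output(raw: str, expect_json: bool = False) -> FilterResult:
    """Validate model output (AR01) and redact, encode or reject it (AR02)."""
    # AR01: format conformance. Reject output that is not the expected JSON.
    if expect_json:
        try:
            json.loads(raw)
        except json.JSONDecodeError:
            return FilterResult("reject", "", ["format"])

    # AR01: inappropriate content. Assumed policy: reject outright.
    if any(term in raw.lower() for term in BLOCKED_TERMS):
        return FilterResult("reject", "", ["inappropriate_content"])

    text, action, findings = raw, "pass", []

    # AR01: sensitive data. Assumed policy: redact the matched span.
    if EMAIL_RE.search(text):
        text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
        findings.append("sensitive_data")
        action = "redact"

    # AR01: malicious patterns. Assumed policy: HTML-encode free text;
    # reject structured output, because escaping would corrupt the JSON.
    if MALICIOUS_RE.search(text):
        findings.append("malicious_pattern")
        if expect_json:
            return FilterResult("reject", "", findings)
        text = html.escape(text)
        action = "encode"

    return FilterResult(action, text, findings)


if __name__ == "__main__":
    print(filter_model_output("Contact alice@example.com for details."))
    print(filter_model_output("<script>alert(1)</script>"))
    print(filter_model_output("not json", expect_json=True))
```

The caller forwards `text` to the user, application or plugin only when `action` is not `"reject"`. The `findings` list can be logged to support data leakage detection.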

Guideline Mappings

Framework	ID	Remarks
FINOS-AIGF	AIR-PREV-003	User/App/Model Firewalling/Filtering
FINOS-AIGF	AIR-PREV-017	AI Firewall Implementation and Management
FINOS-AIGF	AIR-PREV-002	Data Filtering From External Knowledge Bases
FINOS-AIGF	AIR-DET-001	AI Data Leakage Prevention and Detection
SAIF		Output Validation and Sanitization
MITRE-ATLAS	AML.M0020	Generative AI Guardrails
MITRE-ATLAS	AML.M0002	Passive AI Output Obfuscation