
Data Filtering From External Knowledge Bases

CCC.MARefArc.CN01 · PREV

Sanitize, filter, and classify data that the Knowledge Layer ingests from internal and external knowledge bases before it is embedded into the vector store or used for retrieval-augmented generation (RAG), preventing inadvertent exposure or manipulation of sensitive organizational knowledge.
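
The sanitize, classify, and filter sequence described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference architecture's implementation: the regex detectors, the `scan_and_filter` and `classify` functions, and the policy that maps findings to TLP labels are all hypothetical. A production pipeline would rely on a vetted PII/DLP scanner and an organization-defined classification scheme.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detectors for illustration only; a real pipeline would use a
# vetted PII/DLP scanner rather than a handful of regular expressions.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),  # 13-16 digits
}

# TLP labels ordered from least to most restrictive.
TLP_ORDER = ["tlp-clear", "tlp-green", "tlp-amber", "tlp-red"]


@dataclass
class ScanResult:
    text: str                                     # redacted text
    findings: dict = field(default_factory=dict)  # detector name -> hit count
    label: str = "tlp-clear"                      # assigned classification
    admitted: bool = True                         # may it be embedded?


def classify(findings: dict) -> str:
    """Hypothetical policy: regulated identifiers are red, other PII is amber."""
    if "us_ssn" in findings or "card_number" in findings:
        return "tlp-red"
    if findings:
        return "tlp-amber"
    return "tlp-clear"


def scan_and_filter(text: str, max_label: str = "tlp-green") -> ScanResult:
    """Redact sensitive spans, classify the document, and gate it on max_label."""
    findings = {}
    redacted = text
    for name, pattern in SENSITIVE_PATTERNS.items():
        redacted, count = pattern.subn(f"[REDACTED:{name}]", redacted)
        if count:
            findings[name] = count
    label = classify(findings)
    # Admit only documents whose label does not exceed the store's ceiling.
    admitted = TLP_ORDER.index(label) <= TLP_ORDER.index(max_label)
    return ScanResult(redacted, findings, label, admitted)


if __name__ == "__main__":
    doc = "Contact alice@example.com about the Q3 rollout."
    result = scan_and_filter(doc, max_label="tlp-amber")
    print(result.label, result.admitted, result.text)
```

In this sketch, redaction runs before classification so that even admitted text never carries the raw sensitive values into the embedding step, while the `max_label` ceiling lets a broadly shared vector store reject documents that a more restricted store could still accept.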

Related Capabilities

| ID | Title | Description |
| --- | --- | --- |
| CCC.MARefArc.CP14 | Approved-model registry and lifecycle | Catalog of approved models with metadata, version information, configuration parameters, and usage constraints, ensuring agents access only models meeting organizational, regulatory, and security standards. |
| CCC.MARefArc.CP11 | Adaptive learning | Generates learning signals based on execution outcomes to refine prompts, adjust agent configurations, or improve tool-selection strategies. |
| CCC.MARefArc.CP16 | Model-interaction zero-trust guardrails | Enforces authentication and authorization for every inference request and applies input validation against prompt injection, output filtering and redaction, access control, rate limits, and cost management before and after model execution. |

Related Threats

| ID | Title | Description |
| --- | --- | --- |
| CCC.MARefArc.TH06 | Foundation-model training and fine-tuning data poisoning | Adversaries tamper with training, fine-tuning, or third-party data feeds behind the approved models, mislabeling data or embedding backdoor triggers and biases that corrupt downstream decisions without visible symptoms until a major failure. |
| CCC.MARefArc.TH07 | Adaptive-learning and continuous-learning exploitation | The adaptive-learning capability that refines prompts and configurations from execution outcomes can be steered by an adversary who systematically feeds misleading signals, gradually skewing agent behaviour when validation of learning inputs is inadequate. |
| CCC.MARefArc.TH01 | Model memorization leaks sensitive data across sessions | The hosted models accessed through the LLM layer may memorize sensitive inputs or training data and later disclose customer PII, proprietary algorithms, or trading strategies, including cross-user leakage into unrelated sessions. |
| CCC.MARefArc.TH02 | Hosted-provider data-handling exposure | Sensitive data submitted through the LLM gateway to third-party hosted models is exposed when the provider lacks transparent encryption, retention limits, or secure-deletion guarantees, leaving the institution without control over data it no longer holds. |

Assessment Requirements

| ID | Text | Applicability |
| --- | --- | --- |
| CCC.MARefArc.CN01.AR01 | Data ingested into the Knowledge Layer MUST be scanned and filtered for sensitive content before it is embedded or indexed for retrieval. | tlp-clear, tlp-green, tlp-amber, tlp-red |
| CCC.MARefArc.CN01.AR02 | Ingestion pipelines MUST enforce source-level allow and deny rules so that unapproved repositories cannot be embedded into the vector store. | tlp-clear, tlp-green, tlp-amber, tlp-red |
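
Source-level allow and deny rules can be sketched as a default-deny gate that runs before any document is fetched or embedded. This is a minimal sketch under assumptions: the glob-style rule patterns, the host names, and the `is_source_approved` and `partition_sources` functions are hypothetical examples; real rules would come from the organization's inventory of approved repositories.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical rules, matched against "host/path" of each source URI.
# Glob "*" matches any run of characters, including "/".
ALLOW_RULES = (
    "docs.example.com/*",
    "wiki.example.internal/spaces/PUBLIC/*",
)
DENY_RULES = (
    "*/hr/*",
    "*/legal/privileged/*",
)


def source_key(uri: str) -> str:
    """Normalize a source URI to 'host/path' (urlparse lowercases the hostname)."""
    parsed = urlparse(uri)
    return f"{parsed.hostname or ''}{parsed.path}"


def is_source_approved(uri: str, allow=ALLOW_RULES, deny=DENY_RULES) -> bool:
    """Default-deny gate: deny rules win, then the source must match an allow rule."""
    key = source_key(uri)
    if any(fnmatchcase(key, pattern) for pattern in deny):
        return False
    return any(fnmatchcase(key, pattern) for pattern in allow)


def partition_sources(uris):
    """Split candidate sources into (approved, rejected) before any fetch or embed."""
    approved, rejected = [], []
    for uri in uris:
        (approved if is_source_approved(uri) else rejected).append(uri)
    return approved, rejected


if __name__ == "__main__":
    candidates = [
        "https://docs.example.com/guides/setup",
        "https://docs.example.com/hr/salaries",
        "https://wiki.example.internal/spaces/FINANCE/q3",
    ]
    print(partition_sources(candidates))
```

Deny rules take precedence over allow rules, and a source that matches no allow rule is rejected, so a newly connected repository stays out of the vector store until it is explicitly approved.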

Guideline Mappings

| Framework | ID | Remarks |
| --- | --- | --- |
| finos-air | AIR-PREV-002 | |