Non-deterministic and non-reproducible outputs

CCC.MARefArc.TH17

Probabilistic sampling, internal-state variation, context sensitivity, and decoding parameters cause identical inputs to yield different outputs across runs, undermining testing, reproducibility, and reliable evaluation.
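
The effect of probabilistic sampling and decoding parameters can be illustrated with a toy decoder. The sketch below is not any provider's implementation: `TOY_LOGITS`, `sample_token`, and `generate` are hypothetical names, and the fixed logit vector stands in for a real model's next-token scores. It shows that greedy decoding (temperature 0) repeats exactly, that sampling repeats only when the random seed is fixed, and that different seeds produce different outputs for the same input.

```python
import math
import random

# Hypothetical next-token scores; a real model recomputes these at every step.
TOY_LOGITS = [2.0, 1.5, 1.0, 0.5]


def softmax(logits, temperature):
    """Convert logits to probabilities, scaled by a positive temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Pick one token index: greedy at temperature 0, sampled otherwise."""
    if temperature == 0:
        # Greedy decoding always returns the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def generate(logits, temperature, seed=None, length=20):
    """Generate a token sequence; seed=None draws fresh entropy on each call."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(length)]


if __name__ == "__main__":
    print("greedy  :", generate(TOY_LOGITS, 0))
    print("seeded  :", generate(TOY_LOGITS, 1.0, seed=7))
    print("seeded  :", generate(TOY_LOGITS, 1.0, seed=7))
    print("unseeded:", generate(TOY_LOGITS, 1.0))
```

In a real serving stack, fixing the seed and temperature narrows this source of variation but may not remove it, since the other factors named above (internal-state variation and context sensitivity) can still change the output.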

Related Capabilities

ID	Title	Description
CCC.MARefArc.CP14	Approved-model registry and lifecycle	Catalog of approved models with metadata, version information, configuration parameters, and usage constraints, ensuring agents access only models meeting organizational, regulatory, and security standards.
CCC.MARefArc.CP15	LLM inference gateway routing	Validates inference requests and routes each to the correct model instance, abstracting model hosting behind a consistent interface.
CCC.MARefArc.CP20	Feedback engine	Collects and aggregates structured and unstructured feedback from users, evaluators, and automated systems, including correctness assessments, preference signals, and quality ratings, to inform system improvement.

Related Controls

ID	Title	Description
CCC.MARefArc.CN03	System Acceptance Testing	Validate agents, models, and end-to-end workflows against accuracy, robustness, bias, drift, and compliance criteria before promotion to production, and re-validate after material changes.
CCC.MARefArc.CN07	AI Model Version Pinning	Pin and record explicit model versions in the Model Registry so that model behaviour is reproducible and provider-side changes are surfaced rather than silently absorbed.
CCC.MARefArc.CN17	AI System Observability	Instrument every layer to emit logs, traces, metrics, and events to the Observability Layer so that behaviour, drift, availability, and data handling are continuously visible and auditable.
CCC.MARefArc.CN19	Human Feedback Loop for AI Systems	Capture human feedback on agent outputs through the Feedback Engine and Human Supervision capabilities and feed it into evaluation and improvement of agents and models.
CCC.MARefArc.CN21	Automated Evaluation Using LLM-as-a-Judge	Use automated model-based evaluation in the Evaluation Layer to assess output quality, grounding, bias, and policy compliance at scale.
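
As a rough illustration of how version pinning, observability, and acceptance testing might work together against this threat, the sketch below records a fingerprint of everything that determines a run and measures how often repeated runs agree. It is an assumption-laden example, not part of the reference architecture: `run_fingerprint` and `agreement_rate`, and the field names `model_version` and `decoding_params`, are hypothetical.

```python
import hashlib
import json
from collections import Counter


def run_fingerprint(model_version, decoding_params, prompt):
    """Hash the pinned model version, decoding parameters, and prompt.

    Two runs with the same fingerprint were configured identically, so any
    difference in their outputs comes from non-determinism, not from a
    configuration or version change.
    """
    payload = json.dumps(
        {
            "model_version": model_version,
            "decoding_params": decoding_params,
            "prompt": prompt,
        },
        sort_keys=True,  # make the hash independent of dict key order
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def agreement_rate(outputs):
    """Return the fraction of outputs equal to the most common output."""
    if not outputs:
        raise ValueError("need at least one output")
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / len(outputs)


if __name__ == "__main__":
    fp = run_fingerprint("example-model-v1", {"temperature": 0.2, "top_p": 0.9}, "Summarise the ticket.")
    print(fp[:16], agreement_rate(["yes", "yes", "no", "yes"]))
```

An acceptance test could then compare the agreement rate for a given fingerprint against a threshold chosen by the team, rather than asserting exact output equality.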

External Mappings

Framework	ID	Remarks
air-vec	AIR-OP-006-01
air-vec	AIR-OP-006-02
air-vec	AIR-OP-006-03
air-vec	AIR-OP-006-04