Non-deterministic and non-reproducible outputs

CCC.MARefArc.TH17

Probabilistic sampling, internal-state variation, context sensitivity, and decoding parameters cause identical inputs to yield different outputs across runs, undermining testing, reproducibility, and reliable evaluation.
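To make the role of sampling and decoding parameters concrete, here is a minimal sketch using a toy four-token distribution in place of a real model. The names `sample_token`, `generate`, and `TOY_LOGITS` are hypothetical and exist only for this illustration. It assumes temperature-scaled softmax sampling, with temperature 0 treated as greedy decoding. It does not model the other sources named above, such as internal-state variation or context sensitivity.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index; temperature 0 is treated as greedy (argmax)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax weights (shifted by the max for stability).
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate(logits, temperature, seed, length=20):
    """Generate `length` token indices from a fixed toy distribution."""
    rng = random.Random(seed)  # private RNG so each run is isolated
    return [sample_token(logits, temperature, rng) for _ in range(length)]

# Hypothetical logits standing in for a model's next-token scores.
TOY_LOGITS = [2.0, 1.5, 0.5, 0.1]

# Greedy decoding ignores the RNG, so the seed does not matter.
greedy_a = generate(TOY_LOGITS, 0, seed=1)
greedy_b = generate(TOY_LOGITS, 0, seed=2)

# Sampling with the same seed repeats; sampling across seeds varies.
seeded_a = generate(TOY_LOGITS, 1.0, seed=42)
seeded_b = generate(TOY_LOGITS, 1.0, seed=42)
distinct_runs = {tuple(generate(TOY_LOGITS, 1.0, seed=s)) for s in range(50)}

print("greedy runs identical:", greedy_a == greedy_b)
print("same-seed runs identical:", seeded_a == seeded_b)
print("distinct outputs over 50 seeds:", len(distinct_runs))
```

In this toy setting a fixed seed or greedy decoding is enough to make runs repeat. In a deployed system that may not hold, because the remaining sources of variation listed in the threat description sit outside the caller's control.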

Related Capabilities

| ID | Title | Description |
| --- | --- | --- |
| CCC.MARefArc.CP14 | Approved-model registry and lifecycle | Catalog of approved models with metadata, version information, configuration parameters, and usage constraints, ensuring agents access only models meeting organizational, regulatory, and security standards. |
| CCC.MARefArc.CP15 | LLM inference gateway routing | Validates inference requests and routes each to the correct model instance, abstracting model hosting behind a consistent interface. |
| CCC.MARefArc.CP20 | Feedback engine | Collects and aggregates structured and unstructured feedback from users, evaluators, and automated systems, including correctness assessments, preference signals, and quality ratings, to inform system improvement. |

Related Controls

| ID | Title | Description |
| --- | --- | --- |
| CCC.MARefArc.CN03 | System Acceptance Testing | Validate agents, models, and end-to-end workflows against accuracy, robustness, bias, drift, and compliance criteria before promotion to production, and re-validate after material changes. |
| CCC.MARefArc.CN07 | AI Model Version Pinning | Pin and record explicit model versions in the Model Registry so that model behaviour is reproducible and provider-side changes are surfaced rather than silently absorbed. |
| CCC.MARefArc.CN17 | AI System Observability | Instrument every layer to emit logs, traces, metrics, and events to the Observability Layer so that behaviour, drift, availability, and data handling are continuously visible and auditable. |
| CCC.MARefArc.CN19 | Human Feedback Loop for AI Systems | Capture human feedback on agent outputs through the Feedback Engine and Human Supervision capabilities and feed it into evaluation and improvement of agents and models. |
| CCC.MARefArc.CN21 | Automated Evaluation Using LLM-as-a-Judge | Use automated model-based evaluation in the Evaluation Layer to assess output quality, grounding, bias, and policy compliance at scale. |
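
One way to picture how version pinning and observability work together against this threat is a per-run record of the conditions that influence an output. The sketch below is an assumption for illustration, not part of the reference architecture: `RunRecord`, its fields, the example identifiers, and `same_conditions` are all hypothetical names. It hashes the pinned model version, decoding parameters, seed, and prompt into a fingerprint so that two runs can be checked for matching recorded conditions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of the conditions behind one inference call."""
    model_id: str
    model_version: str  # explicit pinned version, never an alias like "latest"
    temperature: float
    top_p: float
    seed: int
    prompt: str

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def same_conditions(a: RunRecord, b: RunRecord) -> bool:
    """True when both runs recorded identical model and decoding settings."""
    return a.fingerprint() == b.fingerprint()

# Example values are invented for this sketch.
baseline = RunRecord("example-model", "2024-01-15", 0.0, 1.0, 7, "Summarise the report.")
rerun = RunRecord("example-model", "2024-01-15", 0.0, 1.0, 7, "Summarise the report.")
drifted = RunRecord("example-model", "2024-03-02", 0.0, 1.0, 7, "Summarise the report.")

print("baseline vs rerun:", same_conditions(baseline, rerun))
print("baseline vs drifted:", same_conditions(baseline, drifted))
```

A matching fingerprint only shows that the recorded inputs were the same. Outputs may still differ for the reasons given in the threat description, which is why a mismatch in outputs under a matching fingerprint is itself a useful signal to log and evaluate.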

External Mappings

| Framework | ID | Remarks |
| --- | --- | --- |
| air-vec | AIR-OP-006-01 | |
| air-vec | AIR-OP-006-02 | |
| air-vec | AIR-OP-006-03 | |
| air-vec | AIR-OP-006-04 | |