Checklist

AI audit evidence

Before AI affects a regulated decision, make the evidence trail non-negotiable.

AI governance without evidence is theatre. This checklist helps you preserve the records needed to show inputs, outputs, human review, accountability, challenge routes, and decision context.

AI audit evidence | 12 min | Advanced | 18 checks

Primary question

Can you evidence how AI affected the decision, who remained accountable, and how the outcome can be challenged?

Use this checklist before allowing AI, automation, scoring, recommendation, classification, ranking, triage, or decision-support systems to affect regulated or high-stakes decisions. The issue is not whether AI is present. The issue is whether the organisation can later show what the system did, what data it used, who approved it, how humans reviewed it, how errors were challenged, and what evidence survives after the decision.

Evidence goal

Create an audit-ready evidence trail before AI influences regulated, consequential, or contestable decisions.

This checklist helps prepare audit evidence for AI-influenced decisions. It does not certify legal compliance, model safety, fairness, accuracy, explainability, regulatory approval, or lawful use by itself.

Working principles

What this checklist is trying to protect.

A decision trail must survive the decision.

If the organisation cannot later show what the AI system saw, produced, influenced, and triggered, the decision trail is weak.

Human oversight must be evidenced.

A human-in-the-loop claim is empty unless the record shows what the human reviewed, changed, approved, rejected, or escalated.

Inputs and outputs must be reconstructable.

A decision cannot be audited properly if prompts, source data, model outputs, scores, explanations, thresholds, and versions are missing.

Appeal requires evidence.

A challenge or appeal route is hollow if the affected person or reviewer cannot understand the basis of the decision.

Recommended use

Before AI affects eligibility, access, pricing, ranking, moderation, triage, fraud assessment, compliance review, employment, education, finance, insurance, healthcare, housing, public services, or legal workflows.
Before deploying AI decision-support into a regulated, high-stakes, customer-impacting, citizen-impacting, employee-impacting, or appealable process.
Before relying on AI outputs in audit, compliance, governance, risk, operational resilience, legal, or board reporting.
Before claiming that a human remains in control of an AI-assisted decision.

Not for

Certifying that an AI system is lawful, safe, fair, unbiased, explainable, or regulator-approved.
Replacing data protection impact assessments, AI risk assessments, legal review, model validation, security testing, or regulatory advice.
Proving that a specific AI output was correct.
Allowing high-risk AI use without human accountability, contestability, and retained evidence.

The affected decision is clearly defined. (critical / scope)

Name the decision, workflow, stage, user group, and consequence.

Why this matters

AI evidence fails when the organisation cannot define what decision the system actually affects. A vague AI use case is not enough for audit, accountability, or challenge.

What good looks like

The decision or workflow is named.
The stage where AI appears is identified.
The affected person, customer, citizen, employee, applicant, claimant, or user group is clear.
The consequence of the decision is understood.

Common mistake

Describing the system as an AI assistant without defining the exact decision it influences.

Next action

Write one plain sentence stating the decision, affected group, and consequence before deployment.

The AI role in the decision is classified. (critical / role)

Distinguish generation, scoring, ranking, classification, recommendation, triage, summarisation, detection, or final decision.

Why this matters

Audit requirements change depending on whether AI merely supports a human, ranks cases, screens people out, recommends action, generates evidence, or effectively decides the outcome.

What good looks like

The AI function is classified.
The system boundary between support and decision influence is clear.
The workflow does not hide a de facto automated decision behind human review language.

Common mistake

Calling AI advisory when humans almost always follow the output without meaningful review.

Next action

Classify the AI role and identify whether it is advisory, influential, gating, or determinative.

The regulated or high-stakes context is identified. (high / risk)

Identify whether the workflow touches legal rights, access, eligibility, money, health, safety, employment, education, public services, or protected interests.

Why this matters

AI in consequential workflows needs stronger evidence than AI used for low-risk productivity. The evidence burden rises when people can be harmed, excluded, misclassified, overcharged, denied, or unfairly treated.

What good looks like

The regulated area or high-stakes consequence is stated.
Relevant governance, compliance, or review duties are identified.
The workflow is not treated as low-risk merely because AI is not the formal final decision-maker.

Common mistake

Treating AI-assisted triage, scoring, or prioritisation as harmless because a human signs off at the end.

Next action

Mark the workflow as regulated, high-stakes, or ordinary, and document the reason.

The input data used by the AI system is recorded or reconstructable. (critical / inputs)

Retain the data, documents, records, fields, files, or case information supplied to the system.

Why this matters

An AI-influenced decision cannot be properly audited if no one can reconstruct what information the system received. Missing inputs make error detection, fairness review, appeal, and accountability harder.

What good looks like

The relevant input fields, documents, files, or case data are retained.
The input snapshot is linked to the decision record.
Later edits to source data do not erase what the AI originally saw.

Common mistake

Logging only the final decision while allowing the input data snapshot to disappear or change.

Next action

Define what input snapshot must be retained for each AI-influenced decision.
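
One way to keep later edits from erasing what the AI originally saw is to freeze a copy of the inputs at decision time and store a hash next to it. The following is a minimal sketch, assuming the inputs fit in a JSON-serialisable dictionary. The function names, field names, and the DEC-0001 identifier are illustrative assumptions, not part of this checklist.

```python
import hashlib
import json
from datetime import datetime, timezone


def freeze_input_snapshot(decision_id: str, case_data: dict) -> dict:
    """Return a snapshot record of what the AI system received (hypothetical schema)."""
    # Canonical JSON (sorted keys) so the same data always yields the same hash.
    canonical = json.dumps(case_data, sort_keys=True, separators=(",", ":"))
    return {
        "decision_id": decision_id,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "payload": canonical,
        "sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }


def snapshot_is_intact(snapshot: dict) -> bool:
    """Recompute the hash to detect changes made to the payload after capture."""
    digest = hashlib.sha256(snapshot["payload"].encode("utf-8")).hexdigest()
    return digest == snapshot["sha256"]


case = {"applicant_id": "A-123", "income": 42000, "documents": ["payslip.pdf"]}
snap = freeze_input_snapshot("DEC-0001", case)

case["income"] = 50000  # a later edit to the live source record
print(json.loads(snap["payload"])["income"])  # 42000
print(snapshot_is_intact(snap))  # True
```

The hash only shows that the stored payload has not changed since capture. Where the snapshot is stored, and who may read it, still has to be decided separately.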

The prompt, instruction, query, rule, or system task is recorded. (high / instructions)

Record what the AI was asked to do, not just what it answered.

Why this matters

Outputs depend heavily on instructions. Without the prompt, query, tool instruction, retrieval context, or system task, later review cannot determine whether the output was appropriate or distorted by framing.

What good looks like

The user prompt or system instruction is retained.
The task objective is identifiable.
Any retrieval query, tool call, or rule condition is logged where relevant.

Common mistake

Keeping the AI output but losing the instruction that produced it.

Next action

Add prompt or instruction capture to the decision audit log.
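
Instruction capture can be as simple as appending one structured entry per AI call, keyed by the decision it belongs to. This is a sketch under stated assumptions: the entry fields, the log_instruction helper, and the example prompt text are hypothetical, and an in-memory buffer stands in for whatever append-only log store the organisation actually uses.

```python
import io
import json
from datetime import datetime, timezone


def log_instruction(log, decision_id, system_instruction, user_prompt, retrieval_queries=()):
    """Append one JSON line describing what the AI system was asked to do."""
    entry = {
        "decision_id": decision_id,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "system_instruction": system_instruction,
        "user_prompt": user_prompt,
        "retrieval_queries": list(retrieval_queries),
    }
    log.write(json.dumps(entry) + "\n")
    return entry


audit_log = io.StringIO()  # stand-in for an append-only file or log store
log_instruction(
    audit_log,
    decision_id="DEC-0001",
    system_instruction="Summarise the claim file for a human reviewer.",
    user_prompt="Summarise claim C-77",
    retrieval_queries=["policy wording for claim C-77"],
)
```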

The source quality and known input limitations are recorded. (high / source-quality)

Document missing, uncertain, stale, partial, disputed, inferred, or low-confidence inputs.

Why this matters

AI systems can make weak data look authoritative. Audit evidence should show when input data was incomplete, stale, contested, uncertain, or unsuitable for the decision.

What good looks like

Known source limitations are recorded.
Missing or uncertain data is flagged.
The AI output is not presented as stronger than the input basis allows.

Common mistake

Feeding uncertain data into AI and preserving only the polished output.

Next action

Add an input-quality field or note to the decision evidence record.

The model, system, provider, and version are recorded. (critical / model)

Record the AI system identity and version used for the decision.

Why this matters

AI systems change. A later reviewer cannot assume the current model behaves like the model used at the time of the decision. Version evidence is central to reproducibility and accountability.

What good looks like

Provider or system name is recorded.
Model name or version is recorded where available.
Deployment, configuration, or release version is retained for internal systems.

Common mistake

Logging that AI was used without recording which system or version generated the output.

Next action

Add model, system, provider, and version fields to the audit record.
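
The fields named above could be held in a small immutable record that also reports which of them are missing. This is an illustrative sketch. The class name, field names, and example values such as ExampleVendor are assumptions, and real providers may expose version information differently or not at all.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SystemIdentity:
    """Which AI system produced the output for one decision (hypothetical schema)."""
    provider: str
    system_name: str
    model_name: Optional[str] = None
    model_version: Optional[str] = None
    deployment_version: Optional[str] = None

    def missing_fields(self) -> List[str]:
        # Empty or None values are gaps a later reviewer would have to guess at.
        return [name for name, value in asdict(self).items() if not value]


identity = SystemIdentity(
    provider="ExampleVendor",
    system_name="claims-triage",
    model_name="example-model",
    model_version=None,  # not exposed by the provider in this example
    deployment_version="2024.06.1",
)
print(identity.missing_fields())  # ['model_version']
```

Recording the gap explicitly is more useful to a later reviewer than silently omitting the field.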

Relevant configuration, thresholds, rules, or retrieval settings are recorded. (high / configuration)

Capture the settings that shaped the AI output or decision influence.

Why this matters

Outputs may depend on thresholds, temperature, policies, retrieval sources, scoring weights, safety filters, ranking logic, or business rules. Without configuration evidence, the decision path may not be reconstructable.

What good looks like

Relevant thresholds or rules are retained.
Retrieval sources or knowledge bases are identified where used.
Configuration changes are versioned.

Common mistake

Treating the model name as enough while ignoring the configuration that actually shaped the result.

Next action

Define which configuration fields must be logged for this workflow.

System change history is retained. (medium / change-control)

Keep deployment, policy, prompt, retrieval, threshold, and model-change records.

Why this matters

AI audit evidence becomes weak if system behaviour changed but the organisation cannot show when, why, and under whose authority.

What good looks like

Relevant change logs exist.
Changes have dates and approvers.
Decision records can be connected to the system state at the time.

Common mistake

Updating prompts, rules, or models without preserving the earlier decision context.

Next action

Link decision records to the system version and change log.
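
If the change log records when each version took effect, the system state behind any past decision can be looked up from the decision time. The sketch below assumes a change log kept sorted by effective date. The Change fields, version labels, dates, and approver names are invented for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from typing import List, NamedTuple


class Change(NamedTuple):
    effective_from: datetime
    version: str
    summary: str
    approver: str


# Must stay sorted by effective_from for the lookup below to work.
CHANGE_LOG: List[Change] = [
    Change(datetime(2024, 1, 10, tzinfo=timezone.utc), "v1.0", "initial release", "risk committee"),
    Change(datetime(2024, 3, 5, tzinfo=timezone.utc), "v1.1", "threshold raised", "model owner"),
    Change(datetime(2024, 6, 20, tzinfo=timezone.utc), "v2.0", "new model and prompt", "risk committee"),
]


def system_state_at(decision_time: datetime) -> Change:
    """Return the change entry that was in force when the decision was made."""
    times = [change.effective_from for change in CHANGE_LOG]
    index = bisect_right(times, decision_time) - 1
    if index < 0:
        raise LookupError("decision predates the first recorded system version")
    return CHANGE_LOG[index]


state = system_state_at(datetime(2024, 4, 1, tzinfo=timezone.utc))
print(state.version, state.approver)  # v1.1 model owner
```

A lookup like this only works if changes are logged with dates and approvers in the first place, which is the point of this check.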

The AI output, score, recommendation, classification, summary, or rationale is retained. (critical / outputs)

Keep the actual AI output that influenced the decision.

Why this matters

If the output is not retained, the organisation may be unable to show whether the AI influenced the decision appropriately, incorrectly, unfairly, or at all.

What good looks like

The AI output is stored with the decision record.
Scores, labels, rankings, recommendations, generated text, or risk flags are retained where relevant.
The retained output can be matched to the input and model version.

Common mistake

Keeping only the final human decision and discarding the AI output that shaped it.

Next action

Add output capture to the workflow before AI affects decisions.

Human review is evidenced, not merely asserted. (critical / human-review)

Record what the human reviewed, changed, accepted, rejected, escalated, or overrode.

Why this matters

A human-in-the-loop claim is weak if the record only shows that a human clicked approve. Meaningful oversight needs evidence of review, judgement, and authority.

What good looks like

The reviewer identity or role is recorded.
The reviewer saw the relevant input, output, and context.
Any acceptance, rejection, override, escalation, or modification is recorded.
Rubber-stamp review is detectable.

Common mistake

Claiming human oversight when the human routinely accepts the AI recommendation without documented reasoning.

Next action

Add a human review record that captures action and reasoning, not just approval.
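
A review record that refuses to be saved without an action and a reason makes bare approval clicks impossible to log as oversight. The sketch below also computes an acceptance rate as one rough signal of rubber-stamp review. The class, the action vocabulary, and the idea of using acceptance rate as a signal are assumptions for illustration. A high rate on its own does not prove that review was inadequate.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_ACTIONS = {"accepted", "modified", "rejected", "overridden", "escalated"}


@dataclass(frozen=True)
class HumanReview:
    """What one reviewer did with one AI output (hypothetical schema)."""
    decision_id: str
    reviewer_role: str
    action: str
    reasoning: str
    viewed_input: bool
    viewed_output: bool

    def __post_init__(self):
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown review action: {self.action!r}")
        if not self.reasoning.strip():
            raise ValueError("a review record needs reasoning, not just approval")


def acceptance_rate(reviews: List[HumanReview]) -> float:
    """Share of reviews where the AI output was accepted unchanged."""
    if not reviews:
        return 0.0
    accepted = sum(1 for review in reviews if review.action == "accepted")
    return accepted / len(reviews)


reviews = [
    HumanReview("DEC-0001", "claims officer", "accepted", "Matches policy terms.", True, True),
    HumanReview("DEC-0002", "claims officer", "accepted", "Documents complete.", True, True),
    HumanReview("DEC-0003", "claims officer", "accepted", "No conflicting evidence.", True, True),
    HumanReview("DEC-0004", "claims officer", "overridden", "Score ignored a recent payment.", True, True),
]
print(acceptance_rate(reviews))  # 0.75
```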

Overrides, escalations, and disagreements are recorded. (high / accountability)

Preserve when humans disagree with AI or when cases need escalation.

Why this matters

Overrides and escalations are important evidence of meaningful control. They also reveal failure modes, bias, uncertainty, and workflow pressure.

What good looks like

Override reasons are recorded.
Escalation triggers are defined.
Disagreements between AI and human judgement are retained for review.

Common mistake

Designing the workflow so AI disagreement disappears into the final decision record.

Next action

Add override and escalation fields before the workflow goes live.

The final decision is linked to the AI evidence trail. (critical / decision)

Connect the final outcome to inputs, AI output, human review, and decision rationale.

Why this matters

A final decision without its evidence trail is difficult to audit or challenge. The record should show how the decision was reached, not just what the outcome was.

What good looks like

The final decision record links to the input snapshot.
The AI output is linked.
Human review and rationale are linked.
The decision time and responsible role are recorded.

Common mistake

Storing the final decision in one system and the AI logs somewhere else with no stable connection.

Next action

Define a stable decision ID that links all evidence objects.
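
A stable decision ID is easiest to reason about when the final decision record carries it and every evidence object is linked through it. The following is a minimal sketch, assuming evidence objects live in other stores and are referenced by path or key. The class, the required evidence kinds, and the reference format are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List

# Evidence kinds this sketch treats as mandatory; adjust per workflow.
REQUIRED_EVIDENCE = ("input_snapshot", "ai_output", "human_review", "rationale")


@dataclass
class DecisionRecord:
    outcome: str
    responsible_role: str
    decision_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    decided_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    evidence: Dict[str, str] = field(default_factory=dict)

    def link(self, kind: str, reference: str) -> None:
        """Attach a reference to an evidence object held elsewhere."""
        self.evidence[kind] = reference

    def missing_evidence(self) -> List[str]:
        return [kind for kind in REQUIRED_EVIDENCE if kind not in self.evidence]


record = DecisionRecord(outcome="claim declined", responsible_role="claims officer")
record.link("input_snapshot", "snapshots/" + record.decision_id)
record.link("ai_output", "outputs/" + record.decision_id)
print(record.missing_evidence())  # ['human_review', 'rationale']
```

A check like missing_evidence could run before a decision is finalised, so that gaps are caught while the evidence can still be captured.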

The decision rationale is recorded in human-understandable terms. (high / rationale)

The record should explain why the outcome was reached.

Why this matters

A model score or generated output may not be enough for accountability. A human-readable rationale helps reviewers, auditors, affected people, and appeal handlers understand the basis of the outcome.

What good looks like

The rationale identifies the main reasons for the outcome.
The rationale distinguishes AI output from human judgement.
The rationale is clear enough for review.

Common mistake

Treating a score, label, or model explanation as a complete decision rationale.

Next action

Require a short rationale for AI-influenced decisions.

Uncertainty, confidence, or limitations are recorded. (high / uncertainty)

Preserve low confidence, missing data, ambiguous results, and known limitations.

Why this matters

AI systems often create false confidence. The audit record should show when the output was uncertain, incomplete, borderline, or dependent on weak inputs.

What good looks like

Confidence or uncertainty is recorded where available.
Known limitations are retained.
Borderline or low-confidence cases trigger review or escalation.

Common mistake

Presenting uncertain AI outputs as clean, authoritative decisions.

Next action

Add uncertainty and limitation fields to the decision record.
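
Uncertainty fields are most useful when they also drive routing, so that borderline or incomplete cases reach a human. The sketch below assumes the system produces a numeric score compared against a threshold. The margin, the route names, and the rule that missing inputs always escalate are illustrative choices, not recommendations.

```python
def assess_uncertainty(score: float, threshold: float, margin: float, missing_inputs: int) -> dict:
    """Build uncertainty fields for a decision record and pick a route."""
    # A score close to the threshold is treated as borderline.
    borderline = abs(score - threshold) <= margin
    if missing_inputs > 0:
        route = "escalate"
    elif borderline:
        route = "human_review"
    else:
        route = "standard"
    return {
        "score": score,
        "threshold": threshold,
        "borderline": borderline,
        "missing_inputs": missing_inputs,
        "route": route,
    }


print(assess_uncertainty(0.52, 0.50, 0.05, 0)["route"])  # human_review
print(assess_uncertainty(0.90, 0.50, 0.05, 0)["route"])  # standard
print(assess_uncertainty(0.90, 0.50, 0.05, 2)["route"])  # escalate
```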

A challenge, appeal, review, or correction route is defined. (critical / contestability)

Affected people and reviewers need a route to question AI-influenced outcomes.

Why this matters

Contestability is weak if there is no practical path to challenge the decision. The evidence trail should support review, correction, escalation, and explanation.

What good looks like

There is a defined review or appeal process.
The process can access the relevant evidence trail.
Correction or override outcomes are recorded.

Common mistake

Offering a generic complaints route that cannot access or interpret the AI decision evidence.

Next action

Define how an AI-influenced decision can be challenged and what evidence the review will use.

The retention period for AI decision evidence is defined. (high / retention)

Keep evidence long enough for audit, appeal, legal, regulatory, and operational needs.

Why this matters

Evidence that is deleted too soon cannot support audits, appeals, complaints, investigations, or regulatory review. Evidence retained too long may create privacy and governance problems.

What good looks like

Retention period is documented.
Retention aligns with the decision type and obligations.
Deletion, minimisation, and preservation rules are defined.

Common mistake

Letting logs expire before appeals, audits, or disputes could arise.

Next action

Define evidence retention for this AI workflow before deployment.
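
A retention rule can be expressed as a function from decision date and workflow class to the earliest date on which deletion is allowed, with open challenges suspending deletion. This is a sketch only. The retention periods below are placeholder numbers, not legal or regulatory guidance, and the function name and parameters are assumptions.

```python
from datetime import date, timedelta
from typing import Optional

# Placeholder periods in days, for illustration only. Real values depend on
# the decision type and the obligations that apply to it.
RETENTION_DAYS = {"regulated": 2555, "high-stakes": 1825, "ordinary": 365}


def earliest_deletion_date(decided_on: date, workflow_class: str,
                           under_challenge: bool = False) -> Optional[date]:
    """Return the first date deletion is allowed, or None while evidence must be preserved."""
    if under_challenge:
        # An open appeal, audit, or dispute suspends deletion in this sketch.
        return None
    return decided_on + timedelta(days=RETENTION_DAYS[workflow_class])


print(earliest_deletion_date(date(2023, 1, 1), "ordinary"))  # 2024-01-01
print(earliest_deletion_date(date(2023, 1, 1), "ordinary", under_challenge=True))  # None
```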

Privacy, access, and disclosure controls are defined. (high / privacy)

Decision evidence may contain personal, sensitive, confidential, or regulated information.

Why this matters

Strong evidence does not mean unlimited disclosure. AI audit records need access controls, minimisation, purpose limitation, and safe review paths.

What good looks like

Access to AI decision evidence is role-controlled.
Sensitive inputs and outputs are handled deliberately.
Disclosure boundaries are defined for audit, appeal, and external review.

Common mistake

Creating rich decision logs without controlling who can access or disclose them.

Next action

Define access controls and disclosure boundaries before retaining AI decision evidence at scale.
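
Disclosure boundaries can be written down as an explicit mapping from role to the evidence fields that role may see, with everything else withheld by default. The roles, field names, and policy below are hypothetical. An actual policy would come from the organisation's privacy and legal review.

```python
# Which evidence fields each role may see (illustrative policy only).
DISCLOSURE_POLICY = {
    "auditor": {"outcome", "rationale", "input_snapshot", "ai_output", "human_review", "model_version"},
    "appeal_handler": {"outcome", "rationale", "input_snapshot", "ai_output", "human_review"},
    "affected_person": {"outcome", "rationale"},
}


def disclose(record: dict, role: str) -> dict:
    """Return only the fields the role is allowed to see; unknown roles see nothing."""
    allowed = DISCLOSURE_POLICY.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}


evidence = {
    "outcome": "claim declined",
    "rationale": "Policy excludes the reported damage type.",
    "input_snapshot": "snapshots/DEC-0001",
    "ai_output": "outputs/DEC-0001",
    "human_review": "reviews/DEC-0001",
    "model_version": "example-model 1.2",
}
print(sorted(disclose(evidence, "affected_person")))  # ['outcome', 'rationale']
print(disclose(evidence, "contractor"))  # {}
```

Denying by default means a newly added evidence field stays hidden until someone deliberately decides who may see it.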

Completion

What stronger AI audit evidence looks like

You are in a stronger position when every AI-influenced decision can be connected to the input snapshot, prompt or instruction, model and configuration, output, human review, final decision, rationale, uncertainty, challenge route, and retention rule.

Stronger position

The affected decision and AI role are clearly defined.
The input snapshot is retained or reconstructable.
Prompts, instructions, queries, rules, or system tasks are recorded.
Model, provider, system, version, and relevant configuration are recorded.
AI outputs, scores, labels, recommendations, or summaries are retained.
Human review is evidenced through action and reasoning.
The final decision links to the AI evidence trail.
Uncertainty and limitations are recorded.
A challenge or appeal route can access the decision evidence.
Retention, privacy, and access controls are defined.

Weak position

The organisation cannot define where AI affects the decision.
Only the final decision is retained.
Inputs, prompts, outputs, model versions, or configurations are missing.
Human oversight is asserted but not evidenced.
Overrides and escalations disappear from the record.
The affected person cannot meaningfully challenge the outcome.
Logs expire before audit, appeal, investigation, or regulatory review.
Evidence records contain sensitive data without proper access controls.

Next steps

What to do from here.

If the AI workflow is not live yet

Do not deploy into regulated or high-stakes decisions until the evidence trail is designed. Define the decision ID, input snapshot, output record, human review record, final decision link, challenge route, and retention rule first.

If the AI workflow is already live

Map the current evidence gaps immediately. Start with missing inputs, outputs, model versions, human review records, and final decision links. Governance paperwork written later does not replace decision evidence that was never captured.

If AI-influenced decisions are already being challenged

Preserve relevant records immediately. Avoid overwriting model logs, prompts, source data, outputs, review notes, decision rationales, and appeal records. Consider legal, compliance, and forensic support before drawing conclusions.

This checklist prepares evidence. It does not decide legal outcomes, certify ownership, prove infringement, prove compliance, or replace professional advice.