Before AI affects a regulated decision, make the evidence trail non-negotiable.
AI governance without evidence is theatre. This checklist helps you preserve the records needed to show inputs, outputs, human review, accountability, challenge routes, and decision context.
AI audit evidence · 12 min · Advanced · 18 checks
Primary question
Can you evidence how AI affected the decision, who remained accountable, and how the outcome can be challenged?
Use this checklist before allowing AI, automation, scoring, recommendation, classification, ranking, triage, or decision-support systems to affect regulated or high-stakes decisions.
The issue is not whether AI is present. The issue is whether the organisation can later show what the system did, what data it used, who approved it, how humans reviewed it, how errors were challenged, and what evidence survives after the decision.
Evidence goal
Create an audit-ready evidence trail before AI influences regulated, consequential, or contestable decisions.
This checklist helps prepare audit evidence for AI-influenced decisions. It does not certify legal compliance, model safety, fairness, accuracy, explainability, regulatory approval, or lawful use by itself.
Working principles
What this checklist is trying to protect.
A decision trail must survive the decision.
If the organisation cannot later show what the AI system saw, produced, influenced, and triggered, the decision trail is weak.
Human oversight must be evidenced.
A human-in-the-loop claim is empty unless the record shows what the human reviewed, changed, approved, rejected, or escalated.
Inputs and outputs must be reconstructable.
A decision cannot be audited properly if prompts, source data, model outputs, scores, explanations, thresholds, and versions are missing.
Appeal requires evidence.
A challenge or appeal route is hollow if the affected person or reviewer cannot understand the basis of the decision.
Section 1
Decision scope
Define exactly where AI affects the workflow and what kind of decision is being influenced.
Name the decision, workflow, stage, user group, and consequence.
Why this matters
AI evidence fails when the organisation cannot define what decision the system actually affects. A vague AI use case is not enough for audit, accountability, or challenge.
What good looks like
The decision or workflow is named.
The stage where AI appears is identified.
The affected person, customer, citizen, employee, applicant, claimant, or user group is clear.
The consequence of the decision is understood.
Common mistake
Describing the system as an AI assistant without defining the exact decision it influences.
Next action
Write one plain sentence stating the decision, affected group, and consequence before deployment.
Distinguish whether the AI generates, scores, ranks, classifies, recommends, triages, summarises, detects, or makes the final decision.
Why this matters
Audit requirements change depending on whether AI merely supports a human, ranks cases, screens people out, recommends action, generates evidence, or effectively decides the outcome.
What good looks like
The AI function is classified.
The system boundary between support and decision influence is clear.
The workflow does not hide a de facto automated decision behind human review language.
Common mistake
Calling AI advisory when humans almost always follow the output without meaningful review.
Next action
Classify the AI role and identify whether it is advisory, influential, gating, or determinative.
Identify whether the workflow touches legal rights, access, eligibility, money, health, safety, employment, education, public services, or protected interests.
Why this matters
AI in consequential workflows needs stronger evidence than AI used for low-risk productivity. The evidence burden rises when people can be harmed, excluded, misclassified, overcharged, denied, or unfairly treated.
What good looks like
The regulated area or high-stakes consequence is stated.
Relevant governance, compliance, or review duties are identified.
The workflow is not treated as low-risk merely because AI is not the formal final decision-maker.
Common mistake
Treating AI-assisted triage, scoring, or prioritisation as harmless because a human signs off at the end.
Next action
Mark the workflow as regulated, high-stakes, or ordinary, and document the reason.
Section 2
Input evidence
Preserve what the AI system received and what context shaped the output.
Retain the data, documents, records, fields, files, or case information supplied to the system.
Why this matters
An AI-influenced decision cannot be properly audited if no one can reconstruct what information the system received. Missing inputs make error detection, fairness review, appeal, and accountability harder.
What good looks like
The relevant input fields, documents, files, or case data are retained.
The input snapshot is linked to the decision record.
Later edits to source data do not erase what the AI originally saw.
Common mistake
Logging only the final decision while allowing the input data snapshot to disappear or change.
Next action
Define what input snapshot must be retained for each AI-influenced decision.
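One way to make sure later edits to source data cannot erase what the AI originally saw is to serialise the inputs at decision time and record a hash of that text. The sketch below assumes the case inputs are JSON-serialisable; `InputSnapshot`, `take_snapshot`, and `verify_snapshot` are hypothetical names, not part of any particular product.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


def canonical_json(data) -> str:
    """Deterministic JSON text: sorted keys, no extra whitespace."""
    return json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class InputSnapshot:
    decision_id: str
    captured_at: str   # UTC timestamp of capture
    content_json: str  # exactly what the AI system received, stored as immutable text
    content_hash: str  # SHA-256 of content_json, recorded at capture time


def take_snapshot(decision_id: str, case_inputs: dict) -> InputSnapshot:
    # Serialising at capture time means later edits to the live case record
    # cannot change what the snapshot says the AI saw.
    text = canonical_json(case_inputs)
    return InputSnapshot(
        decision_id=decision_id,
        captured_at=datetime.now(timezone.utc).isoformat(),
        content_json=text,
        content_hash=sha256_text(text),
    )


def verify_snapshot(snapshot: InputSnapshot) -> bool:
    """True if the retained text still matches the hash recorded at capture."""
    return sha256_text(snapshot.content_json) == snapshot.content_hash
```

Sorting keys makes the hash independent of field order, so two systems that hold the same inputs produce the same fingerprint. The hash only detects tampering; where the snapshot is stored still needs its own write protection.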
Record what the AI was asked to do, not just what it answered.
Why this matters
Outputs depend heavily on instructions. Without the prompt, query, tool instruction, retrieval context, or system task, later review cannot determine whether the output was appropriate or distorted by framing.
What good looks like
The user prompt or system instruction is retained.
The task objective is identifiable.
Any retrieval query, tool call, or rule condition is logged where relevant.
Common mistake
Keeping the AI output but losing the instruction that produced it.
Next action
Add prompt or instruction capture to the decision audit log.
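Instruction capture can be as simple as one append-only record per AI call, keyed by the decision ID. This is a minimal sketch assuming a local JSON Lines file stands in for the audit log; a production system would more likely use a write-once store. The record fields and function names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class InstructionRecord:
    decision_id: str
    task_objective: str       # what the AI was asked to achieve, in plain words
    system_instruction: str   # system prompt or configured task
    user_prompt: str          # the prompt or query actually sent
    retrieval_queries: list[str] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)
    captured_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def append_instruction(log_path: Path, record: InstructionRecord) -> None:
    """Append one record per line; existing lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")


def instructions_for(log_path: Path, decision_id: str) -> list[dict]:
    """Return every captured instruction for one decision, in capture order."""
    with log_path.open(encoding="utf-8") as fh:
        rows = [json.loads(line) for line in fh if line.strip()]
    return [row for row in rows if row["decision_id"] == decision_id]
```

Keeping the task objective as its own field lets a reviewer judge whether the output answered the question that was actually asked.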
Document missing, uncertain, stale, partial, disputed, inferred, or low-confidence inputs.
Why this matters
AI systems can make weak data look authoritative. Audit evidence should show when input data was incomplete, stale, contested, uncertain, or unsuitable for the decision.
What good looks like
Known source limitations are recorded.
Missing or uncertain data is flagged.
The AI output is not presented as stronger than the input basis allows.
Common mistake
Feeding uncertain data into AI and preserving only the polished output.
Next action
Add an input-quality field or note to the decision evidence record.
Section 3
Model and system evidence
Record which system produced the output and what configuration affected it.
Record the AI system identity and version used for the decision.
Why this matters
AI systems change. A later reviewer cannot assume the current model behaves like the model used at the time of the decision. Version evidence is central to reproducibility and accountability.
What good looks like
Provider or system name is recorded.
Model name or version is recorded where available.
Deployment, configuration, or release version is retained for internal systems.
Common mistake
Logging that AI was used without recording which system or version generated the output.
Next action
Add model, system, provider, and version fields to the audit record.
Capture the settings that shaped the AI output or decision influence.
Why this matters
Outputs may depend on thresholds, temperature, policies, retrieval sources, scoring weights, safety filters, ranking logic, or business rules. Without configuration evidence, the decision path may not be reconstructable.
What good looks like
Relevant thresholds or rules are retained.
Retrieval sources or knowledge bases are identified where used.
Configuration changes are versioned.
Common mistake
Treating the model name as enough while ignoring the configuration that actually shaped the result.
Next action
Define which configuration fields must be logged for this workflow.
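A sketch of a system-state record that holds the model identity alongside the configuration that shaped the output. It assumes the configuration can be expressed as a dictionary; `REQUIRED_CONFIG_FIELDS` is a hypothetical list that each workflow would define for itself, and the fingerprint is simply a SHA-256 hash of the canonical JSON.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical: the configuration keys this workflow has decided must always be logged.
REQUIRED_CONFIG_FIELDS = ("threshold", "temperature", "retrieval_sources", "policy_version")


@dataclass(frozen=True)
class SystemState:
    provider: str
    system_name: str
    model_version: str
    deployment_version: str
    config: str              # canonical JSON of the logged configuration
    config_fingerprint: str  # SHA-256 of `config`, cheap to compare across decisions


def capture_system_state(provider: str, system_name: str, model_version: str,
                         deployment_version: str, config: dict) -> SystemState:
    # Refuse to record a decision whose configuration is only partly logged.
    missing = [key for key in REQUIRED_CONFIG_FIELDS if key not in config]
    if missing:
        raise ValueError(f"configuration not logged: {missing}")
    text = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return SystemState(provider, system_name, model_version, deployment_version,
                       text, hashlib.sha256(text.encode("utf-8")).hexdigest())
```

Two decisions made with the same model name but different thresholds get different fingerprints, which is exactly the difference the model name alone would hide.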
Keep deployment, policy, prompt, retrieval, threshold, and model-change records.
Why this matters
AI audit evidence becomes weak if system behaviour changed but the organisation cannot show when, why, and under whose authority.
What good looks like
Relevant change logs exist.
Changes have dates and approvers.
Decision records can be connected to the system state at the time.
Common mistake
Updating prompts, rules, or models without preserving the earlier decision context.
Next action
Link decision records to the system version and change log.
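If each change-log entry carries the time it took effect, a decision can be linked to the system state in force when it was made. This minimal sketch assumes entries with an `effective_from` timestamp, a version label, and an approver; the structure and names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeLogEntry:
    effective_from: datetime
    system_version: str
    description: str   # what changed: prompt, rule, threshold, model, retrieval source
    approved_by: str   # who authorised the change


def version_at(change_log: list[ChangeLogEntry], decided_at: datetime) -> ChangeLogEntry:
    """Return the change-log entry that was in force when the decision was made."""
    entries = sorted(change_log, key=lambda e: e.effective_from)
    # bisect_right finds how many changes took effect at or before the decision time.
    idx = bisect_right([e.effective_from for e in entries], decided_at)
    if idx == 0:
        raise LookupError("decision predates every recorded system version")
    return entries[idx - 1]
```

A `LookupError` here is itself useful evidence: it shows the change log does not reach back far enough to cover the decision.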
Section 4
Output and human review
Preserve what the AI produced and what humans did with it.
Keep the actual AI output that influenced the decision.
Why this matters
If the output is not retained, the organisation may be unable to show whether the AI influenced the decision appropriately, incorrectly, unfairly, or at all.
What good looks like
The AI output is stored with the decision record.
Scores, labels, rankings, recommendations, generated text, or risk flags are retained where relevant.
The retained output can be matched to the input and model version.
Common mistake
Keeping only the final human decision and discarding the AI output that shaped it.
Next action
Add output capture to the workflow before AI affects decisions.
Record what the human reviewed, changed, accepted, rejected, escalated, or overrode.
Why this matters
A human-in-the-loop claim is weak if the record only shows that a human clicked approve. Meaningful oversight needs evidence of review, judgement, and authority.
What good looks like
The reviewer identity or role is recorded.
The reviewer saw the relevant input, output, and context.
Any acceptance, rejection, override, escalation, or modification is recorded.
Rubber-stamp review is detectable.
Common mistake
Claiming human oversight when the human routinely accepts the AI recommendation without documented reasoning.
Next action
Add a human review record that captures action and reasoning, not just approval.
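A review record becomes useful evidence when it can reveal rubber-stamping. The sketch below assumes the review tool can report whether the reviewer opened the input and the AI output and how long the review took; the record shape, the signals, and the 30-second floor are all hypothetical choices a workflow would need to set for itself.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewAction(Enum):
    ACCEPTED = "accepted"
    MODIFIED = "modified"
    REJECTED = "rejected"
    ESCALATED = "escalated"


@dataclass(frozen=True)
class HumanReview:
    decision_id: str
    reviewer_role: str
    action: ReviewAction
    reasoning: str
    viewed_input: bool     # did the reviewer open the input snapshot?
    viewed_output: bool    # did the reviewer open the AI output?
    seconds_spent: float


def rubber_stamp_signals(review: HumanReview, min_seconds: float = 30.0) -> list[str]:
    """Return reasons a review looks like a rubber stamp. `min_seconds` is an assumed floor."""
    signals = []
    if not review.reasoning.strip():
        signals.append("no reasoning recorded")
    if not (review.viewed_input and review.viewed_output):
        signals.append("reviewer did not open both input and AI output")
    if review.seconds_spent < min_seconds:
        signals.append("review time below floor")
    return signals


def acceptance_rate(reviews: list[HumanReview]) -> float:
    """Share of reviews that accepted the AI output unchanged; a rate near 1.0 warrants scrutiny."""
    if not reviews:
        return 0.0
    return sum(r.action is ReviewAction.ACCEPTED for r in reviews) / len(reviews)
```

These signals do not prove a review was meaningless; they mark which records an auditor should read first.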
Preserve when humans disagree with AI or when cases need escalation.
Why this matters
Overrides and escalations are important evidence of meaningful control. They also reveal failure modes, bias, uncertainty, and workflow pressure.
What good looks like
Override reasons are recorded.
Escalation triggers are defined.
Disagreements between AI and human judgement are retained for review.
Common mistake
Designing the workflow so that disagreement between the AI output and the human decision disappears from the final decision record.
Next action
Add override and escalation fields before the workflow goes live.
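Override and escalation fields can be enforced at the point of recording, so a disagreement cannot be stored without its reason. This is a minimal sketch; `ReviewOutcome` and its field names are hypothetical, and it assumes the AI recommendation and the human outcome can be compared as simple labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewOutcome:
    decision_id: str
    ai_recommendation: str
    human_outcome: str
    override_reason: Optional[str] = None
    escalated_to: Optional[str] = None        # role or team the case was escalated to
    escalation_trigger: Optional[str] = None  # which defined trigger fired

    @property
    def is_override(self) -> bool:
        return self.ai_recommendation != self.human_outcome

    def __post_init__(self) -> None:
        # Refuse to store an override or escalation whose reason would be lost.
        if self.is_override and not (self.override_reason or "").strip():
            raise ValueError("an override must record its reason")
        if self.escalated_to and not (self.escalation_trigger or "").strip():
            raise ValueError("an escalation must record the trigger that fired")


def disagreements(outcomes: list[ReviewOutcome]) -> list[ReviewOutcome]:
    """Every outcome stays on record; this only selects the overrides for periodic review."""
    return [o for o in outcomes if o.is_override]
```

Storing agreements as well as overrides matters: the override rate is only meaningful against the full set of reviewed cases.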
Section 5
Decision record
Ensure the final decision can be connected to the evidence that influenced it.
Connect the final outcome to inputs, AI output, human review, and decision rationale.
Why this matters
A final decision without its evidence trail is difficult to audit or challenge. The record should show how the decision was reached, not just what the outcome was.
What good looks like
The final decision record links to the input snapshot.
The AI output is linked.
Human review and rationale are linked.
The decision time and responsible role are recorded.
Common mistake
Storing the final decision in one system and the AI logs somewhere else with no stable connection.
Next action
Define a stable decision ID that links all evidence objects.
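A stable decision ID works best when it is issued once and every evidence object, wherever it is stored, is linked back to it. The sketch below assumes evidence lives in other systems and only references are indexed here; the `DEC-` prefix, the evidence kinds, and the `EvidenceIndex` class are hypothetical.

```python
import uuid
from collections import defaultdict

# Hypothetical: the evidence objects every AI-influenced decision must link to.
REQUIRED_EVIDENCE = ("input_snapshot", "ai_output", "human_review", "rationale", "final_decision")


def new_decision_id() -> str:
    """Issue the ID once, when the case enters the AI-influenced step, and reuse it everywhere."""
    return f"DEC-{uuid.uuid4()}"


class EvidenceIndex:
    """Maps each decision ID to references for evidence objects held in other systems."""

    def __init__(self) -> None:
        self._links: dict[str, dict[str, str]] = defaultdict(dict)

    def link(self, decision_id: str, kind: str, reference: str) -> None:
        # Never silently overwrite an existing link; that would break the trail.
        if kind in self._links[decision_id]:
            raise ValueError(f"{kind} already linked for {decision_id}")
        self._links[decision_id][kind] = reference

    def missing(self, decision_id: str) -> list[str]:
        """List the required evidence objects not yet linked to this decision."""
        linked = self._links.get(decision_id, {})
        return [kind for kind in REQUIRED_EVIDENCE if kind not in linked]
```

Running `missing()` before a decision is finalised turns the evidence trail from an aspiration into a gate.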
The record should explain why the outcome was reached.
Why this matters
A model score or generated output may not be enough for accountability. A human-readable rationale helps reviewers, auditors, affected people, and appeal handlers understand the basis of the outcome.
What good looks like
The rationale identifies the main reasons for the outcome.
The rationale distinguishes AI output from human judgement.
The rationale is clear enough for review.
Common mistake
Treating a score, label, or model explanation as a complete decision rationale.
Next action
Require a short rationale for AI-influenced decisions.
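A rationale requirement can be checked mechanically for its weakest failure: a "rationale" that only restates the score. This minimal sketch assumes the rationale is captured as separate fields for the main reasons, the AI contribution, and the human judgement; the structure and checks are hypothetical and cannot judge whether the reasons are good, only whether they exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRationale:
    decision_id: str
    main_reasons: tuple[str, ...]  # the reasons the outcome was reached, in plain language
    ai_contribution: str           # what the AI output contributed (score, flag, summary)
    human_judgement: str           # what the responsible person concluded and why


def rationale_problems(r: DecisionRationale) -> list[str]:
    """Return structural gaps in a rationale; an empty list means it is reviewable, not correct."""
    problems = []
    if not any(reason.strip() for reason in r.main_reasons):
        problems.append("no main reasons given")
    if not r.human_judgement.strip():
        problems.append("human judgement not recorded separately from the AI output")
    if r.main_reasons and all(reason.strip() == r.ai_contribution.strip() for reason in r.main_reasons):
        problems.append("rationale only restates the AI output")
    return problems
```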
Preserve low confidence, missing data, ambiguous results, and known limitations.
Why this matters
AI outputs often look more certain than the underlying evidence supports. The audit record should show when the output was uncertain, incomplete, borderline, or dependent on weak inputs.
What good looks like
Confidence or uncertainty is recorded where available.
Known limitations are retained.
Borderline or low-confidence cases trigger review or escalation.
Common mistake
Presenting uncertain AI outputs as clean, authoritative decisions.
Next action
Add uncertainty and limitation fields to the decision record.
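Uncertainty fields are most useful when they also drive routing, so borderline or low-confidence cases reach review automatically. The sketch assumes a numeric score compared against a decision threshold and an optional confidence value; the `margin` and `min_confidence` defaults are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UncertaintyRecord:
    decision_id: str
    score: float
    confidence: Optional[float]             # None when the system does not expose one
    known_limitations: tuple[str, ...] = ()
    missing_inputs: tuple[str, ...] = ()


def needs_review(rec: UncertaintyRecord, threshold: float, margin: float = 0.05,
                 min_confidence: float = 0.8) -> bool:
    """Route borderline, low-confidence, or incomplete cases to human review.

    `margin` and `min_confidence` are assumed values; each workflow must set its own.
    A missing confidence value is treated as low confidence rather than ignored.
    """
    borderline = abs(rec.score - threshold) <= margin
    low_confidence = rec.confidence is None or rec.confidence < min_confidence
    return borderline or low_confidence or bool(rec.missing_inputs)
```

Treating an absent confidence value as low confidence is a deliberate design choice: it stops a system that reports nothing from looking more certain than one that reports honestly.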
Section 6
Challenge and retention
Make sure the decision can be reviewed, challenged, explained, and recovered later.
Affected people and reviewers need a route to question AI-influenced outcomes.
Why this matters
Contestability is weak if there is no practical path to challenge the decision. The evidence trail should support review, correction, escalation, and explanation.
What good looks like
There is a defined review or appeal process.
The process can access the relevant evidence trail.
Correction or override outcomes are recorded.
Common mistake
Offering a generic complaints route that cannot access or interpret the AI decision evidence.
Next action
Define how an AI-influenced decision can be challenged and what evidence review will use.
Keep evidence long enough for audit, appeal, legal, regulatory, and operational needs.
Why this matters
Evidence that is deleted too soon cannot support audits, appeals, complaints, investigations, or regulatory review. Evidence retained too long may create privacy and governance problems.
What good looks like
Retention period is documented.
Retention aligns with the decision type and obligations.
Deletion, minimisation, and preservation rules are defined.
Common mistake
Letting logs expire before appeals, audits, or disputes could arise.
Next action
Define evidence retention for this AI workflow before deployment.
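A retention rule can be expressed so that open challenges and legal holds always win over scheduled deletion. This minimal sketch assumes a per-decision-type retention period in days; `RetentionRule`, `deletion_due`, and the 365-day figure in the test are hypothetical and would need to come from the organisation's actual obligations.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    decision_type: str
    retain_days: int   # set from the obligations for this decision type, not a generic default


def deletion_due(decided_on: date, rule: RetentionRule, open_challenge: bool,
                 legal_hold_until: Optional[date] = None) -> Optional[date]:
    """Earliest date the evidence may be deleted, or None while a challenge is open."""
    if open_challenge:
        return None  # preserve while an appeal, complaint, or dispute is unresolved
    due = decided_on + timedelta(days=rule.retain_days)
    # A legal or regulatory hold can only extend retention, never shorten it.
    if legal_hold_until is not None and legal_hold_until > due:
        due = legal_hold_until
    return due
```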
Decision evidence may contain personal, sensitive, confidential, or regulated information.
Why this matters
Strong evidence does not mean unlimited disclosure. AI audit records need access controls, minimisation, purpose limitation, and safe review paths.
What good looks like
Access to AI decision evidence is role-controlled.
Sensitive inputs and outputs are handled deliberately.
Disclosure boundaries are defined for audit, appeal, and external review.
Common mistake
Creating rich decision logs without controlling who can access or disclose them.
Next action
Define access controls and disclosure boundaries before retaining AI decision evidence at scale.
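Disclosure boundaries can be written as an explicit policy of which roles may see which evidence fields for which purpose, with every request logged. The policy below is entirely hypothetical and only illustrates the shape; real boundaries depend on the data involved and the organisation's legal obligations.

```python
from enum import Enum


class Purpose(Enum):
    AUDIT = "audit"
    APPEAL = "appeal"
    EXTERNAL_REVIEW = "external_review"


# Hypothetical policy: which roles may see which evidence fields, per purpose.
DISCLOSURE_POLICY = {
    Purpose.AUDIT: {"auditor": {"input_snapshot", "ai_output", "human_review", "rationale", "system_state"}},
    Purpose.APPEAL: {"appeal_handler": {"input_snapshot", "ai_output", "human_review", "rationale"},
                     "affected_person": {"ai_output", "rationale"}},
    Purpose.EXTERNAL_REVIEW: {"regulator": {"ai_output", "human_review", "rationale", "system_state"}},
}


def disclose(record: dict, role: str, purpose: Purpose, access_log: list) -> dict:
    """Return only the fields this role may see for this purpose, and log the request."""
    allowed = DISCLOSURE_POLICY.get(purpose, {}).get(role, set())
    # Log every request, including refused ones, so disclosure itself is auditable.
    access_log.append({
        "decision_id": record.get("decision_id"),
        "role": role,
        "purpose": purpose.value,
        "granted_fields": sorted(allowed & set(record)),
    })
    if not allowed:
        raise PermissionError(f"{role!r} has no access to this evidence for {purpose.value}")
    # The decision ID is always returned so the disclosed view stays traceable.
    return {k: v for k, v in record.items() if k in allowed or k == "decision_id"}
```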
Completion
What stronger AI audit evidence looks like
You are in a stronger position when every AI-influenced decision can be connected to the input snapshot, prompt or instruction, model and configuration, output, human review, final decision, rationale, uncertainty, challenge route, and retention rule.
Stronger position
The affected decision and AI role are clearly defined.
The input snapshot is retained or reconstructable.
Prompts, instructions, queries, rules, or system tasks are recorded.
Model, provider, system, version, and relevant configuration are recorded.
AI outputs, scores, labels, recommendations, or summaries are retained.
Human review is evidenced through action and reasoning.
The final decision links to the AI evidence trail.
Uncertainty and limitations are recorded.
A challenge or appeal route can access the decision evidence.
Retention, privacy, and access controls are defined.
Weak position
The organisation cannot define where AI affects the decision.
Only the final decision is retained.
Inputs, prompts, outputs, model versions, or configurations are missing.
Human oversight is asserted but not evidenced.
Overrides and escalations disappear from the record.
The affected person cannot meaningfully challenge the outcome.
Logs expire before audit, appeal, investigation, or regulatory review.
Evidence records contain sensitive data without proper access controls.
Next steps
What to do from here.
If the AI workflow is not live yet
Do not deploy into regulated or high-stakes decisions until the evidence trail is designed. Define the decision ID, input snapshot, output record, human review record, final decision link, challenge route, and retention rule first.
If the AI workflow is already live
Map the current evidence gaps immediately. Start with missing inputs, outputs, model versions, human review records, and final decision links. Do not pretend later governance paperwork fixes missing decision evidence.
If AI-influenced decisions are already being challenged
Preserve relevant records immediately. Avoid overwriting model logs, prompts, source data, outputs, review notes, decision rationales, and appeal records. Consider legal, compliance, and forensic support before making conclusions.
This checklist prepares evidence. It does not decide legal outcomes, prove compliance, certify model safety or fairness, or replace professional advice.