Data Governance

Purpose

This document defines how the AI Conveyor reviews data for initiatives: availability, quality, access, confidentiality, the legality of processing, and suitability for delivery.

Data is one of the most common causes of initiative failure. If data questions are deferred until delivery, the team often discovers too late that the required source does not exist, access cannot be obtained, quality is too low, or the data cannot be used in the chosen environment.

Core ideas

The data owner must be known. You cannot build an initiative on a source that no one is responsible for.
Data quality is checked before delivery. A preliminary check is enough at the assessment stage, but before delivery starts you need facts, not estimates.
Access must match the purpose. Access for analysis, training, validation, and production operation are different modes.
Sensitive data requires separate control. Personal data, banking secrecy, trade secrets, and customer information must not end up in unsuitable environments.
Minimization matters more than convenience. An initiative should use only the data that is truly needed for the result.

How it works

Data governance is embedded in the stage gates:

Stage	What is checked	Typical decision
New	whether it is clear which data might be needed	send to assessment or clarify the brief
Assessment	whether sources exist, who the owner is, whether there are constraints	choose a product, request access, defer, or reject
Delivery	quality, access, processing environment, masking, logging	allow development, restrict the environment, request rework
Awaiting impact	whether actual data is available to measure the result	confirm the impact or revise the methodology
Support	source stability, quality control, access changes	continue, reconsider, or stop the solution

Related sections: AI governance model, AI risks, architecture governance.

Minimum data card

For an initiative you need to record:

which data sources are used;
who owns each source;
what the data is used for;
which fields or datasets are needed;
whether there is personal data;
whether there is banking or trade secrecy;
where the data will be processed;
who gets access;
how long the data is retained;
how the data is deleted or anonymized;
which data quality metric is critical for the result.

This does not always need to be a separate large form. At early maturity, a set of mandatory fields on the initiative card plus clarification tasks is enough.
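To illustrate the "mandatory fields plus clarification tasks" approach, the card can be kept as a structured record in which every empty field becomes an open question. This is a minimal Python sketch. `DataCard`, its field names, and `missing_fields` are hypothetical and are not the schema of any particular tool.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class DataCard:
    """Hypothetical minimum data card; None or an empty collection means "not yet known"."""
    sources: list[str] = field(default_factory=list)      # data sources used
    owners: dict[str, str] = field(default_factory=dict)  # source -> owner
    purpose: Optional[str] = None                         # what the data is used for
    datasets: list[str] = field(default_factory=list)     # fields or datasets needed
    has_personal_data: Optional[bool] = None
    has_secrecy: Optional[bool] = None                    # banking or trade secrecy
    processing_location: Optional[str] = None             # where the data is processed
    access_holders: list[str] = field(default_factory=list)
    retention: Optional[str] = None                       # how long the data is kept
    deletion_method: Optional[str] = None                 # deletion or anonymization
    critical_quality_metric: Optional[str] = None


def missing_fields(card: DataCard) -> list[str]:
    """List the card entries that still need a clarification task."""
    missing = []
    for f in fields(card):
        value = getattr(card, f.name)
        # False is a valid answer ("no personal data"), so only None
        # and empty collections count as missing.
        if value is None or (isinstance(value, (list, dict)) and not value):
            missing.append(f.name)
    # Every listed source must have a named owner.
    for source in card.sources:
        if not card.owners.get(source):
            missing.append(f"owner:{source}")
    return missing
```

An answer of `False` (for example, "no personal data") counts as filled in. Only unknown values generate clarification tasks.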

Data classification

Minimum classification:

Class	Examples	Control
Open	public reference data, published materials	basic source verification
Internal	regulations, instructions, anonymized metrics	access for employees and permitted environments only
Confidential	management reporting, commercial data, contracts	access restriction, logging, owner sign-off
Sensitive	personal data, banking secrecy, customer transactions	separate sign-off, minimization, masking, closed environment
Critical	data affecting money, risk, legal actions, or security	extended control, independent review, rollback plan

The data class affects which AI product can be used, where the data may be processed, and which artifacts are needed to move forward.
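The classification can be made machine-checkable with an ordered enumeration. The sketch below assumes two rules that the text does not state explicitly. First, an initiative takes the class of its most sensitive source. Second, each environment or AI product is cleared up to some maximum class. Treating Critical as stricter than Sensitive is also an assumption, and all names are hypothetical.

```python
from enum import IntEnum


class DataClass(IntEnum):
    # Assumed order from least to most controlled, following the table rows.
    OPEN = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    SENSITIVE = 4
    CRITICAL = 5


# Controls taken from the "Control" column of the table.
CONTROLS = {
    DataClass.OPEN: ["basic source verification"],
    DataClass.INTERNAL: ["employee access only", "permitted environments only"],
    DataClass.CONFIDENTIAL: ["access restriction", "logging", "owner sign-off"],
    DataClass.SENSITIVE: ["separate sign-off", "minimization", "masking", "closed environment"],
    DataClass.CRITICAL: ["extended control", "independent review", "rollback plan"],
}


def initiative_class(source_classes: list[DataClass]) -> DataClass:
    """Assumption: the most sensitive source determines the initiative's class."""
    return max(source_classes, default=DataClass.OPEN)


def environment_allows(env_max_class: DataClass, data_class: DataClass) -> bool:
    """Hypothetical rule: an environment or AI product is cleared up to a maximum class."""
    return data_class <= env_max_class
```

For example, an initiative that combines an Internal source with a Sensitive one inherits the Sensitive controls. It can only run in an environment cleared at least to that class.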

Data quality check

Data quality is assessed relative to the task, not in the abstract.

Minimum criteria:

completeness — is there enough data for the solution;
timeliness — is the data out of date;
accuracy — how well the data reflects the real process;
stability — does the source structure change without warning;
linkability — can the data be matched across systems;
reproducibility — can the calculation or training be repeated;
availability — can the data be obtained at the required frequency.

If data quality is unknown, an initiative may go into assessment, but it should not go into full delivery without a data check task.
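The rule about unknown quality can be expressed as a small gate over the seven criteria. In this hypothetical sketch each criterion is recorded in one of three states: acceptable for the task, not acceptable, or not yet checked. Reading any failed criterion as "the hypothesis cannot be tested" is an assumption. In practice that judgment belongs to the team and the data owner.

```python
from typing import Optional

# The seven criteria listed above.
CRITERIA = (
    "completeness", "timeliness", "accuracy", "stability",
    "linkability", "reproducibility", "availability",
)


def quality_gate(results: dict[str, Optional[bool]], has_data_check_task: bool) -> str:
    """Decide what the quality results allow.

    results maps a criterion to True (acceptable for this task),
    False (not acceptable) or None (not yet checked); missing keys count as None.
    """
    values = [results.get(c) for c in CRITERIA]
    if any(v is False for v in values):
        # Assumption: a failed criterion means the hypothesis cannot be tested.
        return "blocked"
    if any(v is None for v in values):
        # Unknown quality: assessment is fine; delivery needs a data check task.
        return "delivery with data check task" if has_data_check_task else "assessment only"
    return "delivery"
```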

Access and environments

For each initiative you need to distinguish access modes:

viewing data for assessment;
exporting a limited set for validation;
processing in an isolated environment;
training or tuning the solution;
production access in operation;
access by the AI assistant or an agentic scenario.

The rule: the higher the data sensitivity and the impact of the solution, the fewer manual exports and the stricter the processing environment.
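Treating each access mode as a separate grant stops, for example, viewing rights from silently turning into export or training rights. The sketch below is a minimal illustration. `AccessMode`, `grant`, and the `max_level_for_export` threshold on an assumed 1 to 5 scale are all hypothetical. The document only says that exports should become rarer as sensitivity and impact grow; it does not say where the cut-off lies.

```python
from enum import Enum


class AccessMode(Enum):
    VIEW_FOR_ASSESSMENT = "view"
    LIMITED_EXPORT = "export"
    ISOLATED_PROCESSING = "isolated"
    TRAINING = "training"
    PRODUCTION = "production"
    ASSISTANT_OR_AGENT = "agent"


# Modes that move data out by hand; these should shrink as sensitivity and impact grow.
MANUAL_MODES = {AccessMode.LIMITED_EXPORT}


def grant(grants: set[tuple[str, AccessMode]], user: str, mode: AccessMode,
          sensitivity: int, impact: int, max_level_for_export: int = 2) -> None:
    """Record a per-mode grant; sensitivity and impact use an assumed 1..5 scale.

    The default threshold is hypothetical: above it, manual exports are refused
    and the work is expected to move into a stricter processing environment.
    """
    level = max(sensitivity, impact)
    if mode in MANUAL_MODES and level > max_level_for_export:
        raise PermissionError(
            f"{mode.value} not allowed at level {level}; use an isolated environment instead"
        )
    grants.add((user, mode))


def has_access(grants: set[tuple[str, AccessMode]], user: str, mode: AccessMode) -> bool:
    # A grant for one mode never implies access in another mode.
    return (user, mode) in grants
```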

For external or cloud services, you must separately verify:

whether data can be sent there;
where the information is physically processed;
who has access to the logs;
whether the data is used to train external models;
whether the data can be deleted on request;
whether there are contractual and regulatory constraints.
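The external-service questions can be recorded as a review in which any unanswered question keeps the service unverified. In this hypothetical sketch, a "no" to sending the data is a blocker by definition. Two further answers are also treated as blockers: use of the data for external model training, and inability to delete it on request. That is an assumption; each organization would set these rules for itself.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExternalServiceReview:
    # None means the question has not been answered yet.
    data_may_be_sent: Optional[bool] = None
    processing_location: Optional[str] = None   # where data is physically processed
    log_access: Optional[str] = None            # who can read the logs
    used_for_external_training: Optional[bool] = None
    deletable_on_request: Optional[bool] = None
    constraints: Optional[str] = None           # contractual/regulatory; "none" once checked

    def open_questions(self) -> list[str]:
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def blockers(self) -> list[str]:
        found = []
        if self.data_may_be_sent is False:
            found.append("data may not be sent to the service")
        # Assumption: the next two answers are treated as blockers in this sketch.
        if self.used_for_external_training:
            found.append("data would be used to train external models")
        if self.deletable_on_request is False:
            found.append("data cannot be deleted on request")
        return found

    def verified(self) -> bool:
        return not self.open_questions() and not self.blockers()
```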

What blocks a transition

An initiative should not move into delivery if:

a data source has not been chosen;
the data owner is unknown;
the data is sensitive but there is no decision on the processing environment;
there is no permission to use the data for the chosen purpose;
data quality does not allow the hypothesis to be tested;
the AI product is unsuitable for the data class;
security has refused sign-off;
the impact cannot be measured from the available sources.

An initiative may move forward with a restriction if the risk is clear, there is an exception owner, and compensating actions are assigned.
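The blocking list and the exception rule together form a gate decision. This is a minimal sketch under stated assumptions. The blocking conditions get short hypothetical IDs, and a hypothetical `ExceptionRecord` captures the risk, the exception owner, and the compensating actions. A blocker is lifted only by a complete exception. The document does not say whether some conditions, such as a negative security sign-off, can never be waived, so this sketch treats all of them as waivable.

```python
from dataclasses import dataclass, field

# Blocking conditions from the list above, keyed by hypothetical short IDs.
BLOCKERS = {
    "no_source": "a data source has not been chosen",
    "no_owner": "the data owner is unknown",
    "no_environment_decision": "sensitive data without a processing-environment decision",
    "no_permission": "no permission to use the data for the chosen purpose",
    "quality_insufficient": "data quality does not allow the hypothesis to be tested",
    "product_unsuitable": "the AI product is unsuitable for the data class",
    "security_refused": "security has refused sign-off",
    "impact_unmeasurable": "the impact cannot be measured from the available sources",
}


@dataclass
class ExceptionRecord:
    risk: str                                  # the risk, stated clearly
    owner: str                                 # the exception owner
    compensating_actions: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        return bool(self.risk and self.owner and self.compensating_actions)


def delivery_decision(active_blockers: set[str],
                      exceptions: dict[str, ExceptionRecord]) -> tuple[str, list[str]]:
    """Return ("allowed" | "allowed with restrictions" | "blocked", reasons)."""
    unknown = active_blockers - set(BLOCKERS)
    if unknown:
        raise ValueError(f"unknown blocker IDs: {sorted(unknown)}")
    uncovered = sorted(
        b for b in active_blockers
        if b not in exceptions or not exceptions[b].is_complete()
    )
    if uncovered:
        return "blocked", [BLOCKERS[b] for b in uncovered]
    if active_blockers:
        # Every blocker is covered by a complete exception.
        return "allowed with restrictions", [BLOCKERS[b] for b in sorted(active_blockers)]
    return "allowed", []
```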

The role of the AI assistant

The AI assistant can help to:

assemble a description of data sources;
ask questions about the owner and access;
prepare a draft classification;
point out processing risks;
create data check tasks;
prepare a description for security or architecture.

But the AI assistant must not authorize the use of sensitive data on its own. The decision rests with the data owner, security, and the accountable roles.
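This division of responsibility can be enforced by counting only human sign-offs from the required roles. A draft or recommendation from the AI assistant then never authorizes anything by itself. The role labels and the `SignOff` record below are hypothetical. An organization would add its other accountable roles to `REQUIRED_ROLES`.

```python
from dataclasses import dataclass

# Hypothetical labels for the roles that must approve sensitive data use.
REQUIRED_ROLES = {"data_owner", "security"}


@dataclass(frozen=True)
class SignOff:
    role: str
    signer: str
    is_human: bool


def sensitive_use_authorized(signoffs: list[SignOff]) -> bool:
    """Authorized only when every required role has a human sign-off.

    Drafts or recommendations produced by the AI assistant never count.
    """
    human_roles = {s.role for s in signoffs if s.is_human}
    return REQUIRED_ROLES <= human_roles
```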

Anti-patterns

Signs of poor data governance:

"we'll find the data later";
exports to personal computers;
training on data without a purpose or retention period;
no source owner;
using customer data in an unsuitable environment;
impact measured by a metric whose data is not accessible;
data quality checked after a prototype has been built.

Good data governance makes data part of the early assessment, not a late obstacle.
