Data Governance

Purpose

This document defines how the AI Conveyor reviews data for initiatives: availability, quality, access, confidentiality, the legality of processing, and suitability for delivery.

Data is one of the main sources of initiative failure. If the data question is deferred until delivery, the team often finds out too late that the required source does not exist, access is impossible, quality is low, or the data cannot be used in the chosen environment.

Core ideas

  • The data owner must be known. You cannot build an initiative on a source that no one is responsible for.
  • Data quality is checked before delivery. At the assessment stage a preliminary check is enough, but before adoption you need facts.
  • Access must match the purpose. Analysis, training, validation, and production operation are different access modes.
  • Sensitive data requires separate control. Personal data, banking secrecy, trade secrets, and customer information must not end up in unsuitable environments.
  • Minimization matters more than convenience. An initiative should use only the data that is truly needed for the result.

How it works

Data governance is embedded in the stage gates:

| Stage | What is checked | Typical decision |
| --- | --- | --- |
| New | whether it is clear which data might be needed | send to assessment or clarify the brief |
| Assessment | whether sources exist, who the owner is, whether there are constraints | choose a product, request access, defer, or reject |
| Delivery | quality, access, processing environment, masking, logging | allow development, restrict the environment, request rework |
| Awaiting impact | whether actual data is available to measure the result | confirm the impact or revise the methodology |
| Support | source stability, quality control, access changes | continue, reconsider, or stop the solution |

Related sections: AI governance model, AI risks, architecture governance.


Minimum data card

For an initiative you need to record:

  • which data sources are used;
  • who owns each source;
  • what the data is used for;
  • which fields or datasets are needed;
  • whether there is personal data;
  • whether there is banking or trade secrecy;
  • where the data will be processed;
  • who gets access;
  • how long the data is retained;
  • how the data is deleted or anonymized;
  • which data quality metric is critical for the result.

This does not always need to be a separate large form. At early maturity, a set of mandatory fields on the initiative card plus clarification tasks is enough.
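
As an illustration of "mandatory fields plus clarification tasks", the card can be sketched as a small record whose unfilled fields are listed for follow-up. This is a minimal sketch, not the AI Conveyor's actual schema: the class name `DataSourceCard`, the field names, and the helper `missing_fields` are assumptions made for the example.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataSourceCard:
    """Hypothetical minimum data card for one source (field names are illustrative)."""
    source: Optional[str] = None
    owner: Optional[str] = None
    purpose: Optional[str] = None
    fields_needed: Optional[List[str]] = None
    has_personal_data: Optional[bool] = None
    has_banking_or_trade_secrecy: Optional[bool] = None
    processing_location: Optional[str] = None
    access_granted_to: Optional[List[str]] = None
    retention_period: Optional[str] = None
    deletion_or_anonymization: Optional[str] = None
    critical_quality_metric: Optional[str] = None


def missing_fields(card: DataSourceCard) -> List[str]:
    """Return the names of fields that are still unanswered.

    False is a valid answer (for example "no personal data");
    None, an empty string, and an empty list are treated as unanswered.
    """
    missing = []
    for f in fields(card):
        value = getattr(card, f.name)
        if value is None or value == "" or value == []:
            missing.append(f.name)
    return missing


card = DataSourceCard(source="crm_contacts", owner="CRM team", has_personal_data=False)
todo = missing_fields(card)  # each entry could become a clarification task
```

Each name returned by `missing_fields` maps to one clarification task, so the card can start incomplete at assessment and be filled in before delivery.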


Data classification

Minimum classification:

| Class | Examples | Control |
| --- | --- | --- |
| Open | public reference data, published materials | basic source verification |
| Internal | regulations, instructions, anonymized metrics | access for employees and permitted environments only |
| Confidential | management reporting, commercial data, contracts | access restriction, logging, owner sign-off |
| Sensitive | personal data, banking secrecy, customer transactions | separate sign-off, minimization, masking, closed environment |
| Critical | data affecting money, risk, legal actions, or security | extended control, independent review, rollback plan |

The data class affects which AI product can be used, where the data may be processed, and which artifacts are needed to move forward.
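
One way to make the classification machine-readable is sketched below. The control lists are copied from the table. Two points are assumptions of this example and not stated in the document: that the classes are ordered from least to most strict in the order listed, and that an initiative inherits the strictest class among its sources. The names `DataClass`, `CONTROLS`, and `required_controls` are hypothetical.

```python
from enum import IntEnum


class DataClass(IntEnum):
    # Assumption: numeric order reflects increasing strictness.
    OPEN = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    SENSITIVE = 4
    CRITICAL = 5


# Controls per class, as listed in the classification table.
CONTROLS = {
    DataClass.OPEN: ["basic source verification"],
    DataClass.INTERNAL: ["access for employees and permitted environments only"],
    DataClass.CONFIDENTIAL: ["access restriction", "logging", "owner sign-off"],
    DataClass.SENSITIVE: ["separate sign-off", "minimization", "masking", "closed environment"],
    DataClass.CRITICAL: ["extended control", "independent review", "rollback plan"],
}


def required_controls(source_classes):
    """Return the strictest class among the sources and its listed controls."""
    strictest = max(source_classes)
    return strictest, CONTROLS[strictest]


data_class, controls = required_controls([DataClass.INTERNAL, DataClass.SENSITIVE])
```

Whether the controls of lower classes also apply cumulatively is a policy decision that this sketch leaves open.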


Data quality check

Data quality is assessed relative to the task, not in the abstract.

Minimum criteria:

  • completeness — is there enough data for the solution;
  • timeliness — is the data out of date;
  • accuracy — how well the data reflects the real process;
  • stability — does the source structure change without warning;
  • linkability — can the data be matched across systems;
  • reproducibility — can the calculation or training be repeated;
  • availability — can the data be obtained at the required frequency.

If data quality is unknown, an initiative may go into assessment, but it should not go into full delivery without a data check task.
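
A data check task can start with very simple measurements. The sketch below covers only two of the criteria, completeness and timeliness, on records represented as dictionaries. The function names and the idea of a maximum age in days are assumptions for illustration, and acceptable thresholds depend on the task.

```python
from datetime import date


def completeness(records, required_fields):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        1
        for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    )
    return complete / len(records)


def is_stale(last_updated, today, max_age_days):
    """Timeliness: True if the source was last updated more than max_age_days ago."""
    return (today - last_updated).days > max_age_days


sample = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 5},
    {"id": 4, "amount": ""},
]
share = completeness(sample, ["id", "amount"])
stale = is_stale(date(2024, 1, 1), date(2024, 3, 1), max_age_days=30)
```

The remaining criteria (accuracy, stability, linkability, reproducibility, availability) usually need knowledge of the source system and are not covered here.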


Access and environments

For each initiative you need to distinguish access modes:

  • viewing data for assessment;
  • exporting a limited set for validation;
  • processing in an isolated environment;
  • training or tuning the solution;
  • production access in operation;
  • access by the AI assistant or an agentic scenario.

The rule: the higher the data sensitivity and the impact of the solution, the fewer manual exports are allowed and the stricter the processing environment must be.
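
To keep the access modes distinct in tooling, they can be modeled as explicit values and compared with what has actually been granted. This is a sketch under assumptions: the enum `AccessMode` mirrors the list above, while the function `ungranted_modes` and the set-based representation of grants are hypothetical.

```python
from enum import Enum


class AccessMode(Enum):
    VIEW_FOR_ASSESSMENT = "viewing data for assessment"
    LIMITED_EXPORT = "exporting a limited set for validation"
    ISOLATED_PROCESSING = "processing in an isolated environment"
    TRAINING = "training or tuning the solution"
    PRODUCTION = "production access in operation"
    AI_ASSISTANT = "access by the AI assistant or an agentic scenario"


def ungranted_modes(granted, requested):
    """Return the requested modes that have no explicit grant, sorted by name."""
    return sorted(
        (mode for mode in requested if mode not in granted),
        key=lambda mode: mode.name,
    )


granted = {AccessMode.VIEW_FOR_ASSESSMENT, AccessMode.LIMITED_EXPORT}
gaps = ungranted_modes(granted, {AccessMode.LIMITED_EXPORT, AccessMode.TRAINING})
```

The point of the design is that a grant for one mode never implies another: permission to view data for assessment does not cover training or production use.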

For external or cloud services, you must separately verify:

  • whether data can be sent there;
  • where the information is physically processed;
  • who has access to the logs;
  • whether the data is used to train external models;
  • whether the data can be deleted on request;
  • whether there are contractual and regulatory constraints.

What blocks a transition

An initiative should not move into delivery if:

  • a data source has not been chosen;
  • the data owner is unknown;
  • the data is sensitive but there is no decision on the processing environment;
  • there is no permission to use the data for the chosen purpose;
  • data quality does not allow the hypothesis to be tested;
  • the AI product is unsuitable for the data class;
  • security has refused to sign off;
  • the impact cannot be measured from the available sources.

An initiative may move forward with a restriction if the risk is clear, there is an exception owner, and compensating actions are assigned.
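
The blocking conditions and the exception path can be expressed as a simple gate check. The sketch below is illustrative only: the record `DeliveryGateInput`, its field names, and both functions are assumptions, and the messages repeat the conditions listed above.

```python
from dataclasses import dataclass


@dataclass
class DeliveryGateInput:
    """Hypothetical answers collected before the delivery gate."""
    source_chosen: bool
    owner_known: bool
    is_sensitive: bool
    environment_decided: bool
    purpose_permitted: bool
    quality_sufficient: bool
    product_fits_data_class: bool
    security_rejected: bool
    impact_measurable: bool


def delivery_blockers(g: DeliveryGateInput):
    """Return the list of conditions that block the move into delivery."""
    checks = [
        (not g.source_chosen, "a data source has not been chosen"),
        (not g.owner_known, "the data owner is unknown"),
        (g.is_sensitive and not g.environment_decided,
         "the data is sensitive but there is no decision on the processing environment"),
        (not g.purpose_permitted, "there is no permission to use the data for the chosen purpose"),
        (not g.quality_sufficient, "data quality does not allow the hypothesis to be tested"),
        (not g.product_fits_data_class, "the AI product is unsuitable for the data class"),
        (g.security_rejected, "security gave a negative sign-off"),
        (not g.impact_measurable, "the impact cannot be measured from the available sources"),
    ]
    return [message for failed, message in checks if failed]


def may_proceed_with_restriction(risk_is_clear, exception_owner, compensating_actions):
    """Exception path: all three conditions from the text must hold."""
    return bool(risk_is_clear) and bool(exception_owner) and bool(compensating_actions)
```

An empty list from `delivery_blockers` means none of the listed conditions applies. A non-empty list does not authorize anything by itself: the exception path still needs a named owner and assigned actions.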


The role of the AI assistant

The AI assistant can help to:

  • assemble a description of data sources;
  • ask questions about the owner and access;
  • prepare a draft classification;
  • point out processing risks;
  • create data check tasks;
  • prepare a description for security or architecture.

But the AI assistant must not authorize the use of sensitive data on its own. The decision rests with the data owner, security, and the accountable roles.
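
The separation between drafting and authorizing can be enforced in code. In the sketch below the assistant may prepare a classification draft, but only human roles can approve it. The role names in `APPROVER_ROLES`, the class `ClassificationDraft`, and the requirement that both the data owner and security approve are assumptions made for this example. The document does not specify the exact sign-off procedure.

```python
from dataclasses import dataclass, field

# Hypothetical role identifiers; the assistant is deliberately not among them.
APPROVER_ROLES = {"data_owner", "security"}


@dataclass
class ClassificationDraft:
    source: str
    proposed_class: str
    prepared_by: str = "ai_assistant"
    approvals: set = field(default_factory=set)

    def approve(self, role: str) -> None:
        """Record an approval; reject any role that is not allowed to approve."""
        if role not in APPROVER_ROLES:
            raise PermissionError(f"role {role!r} cannot authorize data use")
        self.approvals.add(role)

    @property
    def is_authorized(self) -> bool:
        """True only when every approver role has signed off."""
        return APPROVER_ROLES <= self.approvals


draft = ClassificationDraft(source="crm_contacts", proposed_class="Sensitive")
draft.approve("data_owner")
# draft.is_authorized stays False until "security" also approves
```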


Anti-patterns

Signs of bad data governance:

  • "we'll find the data later";
  • exports to personal computers;
  • training on data without a purpose or retention period;
  • no source owner;
  • using customer data in an unsuitable environment;
  • impact measured by a metric that cannot be computed from accessible data;
  • data quality checked after a prototype has been built.

Good data governance makes data part of the early assessment, not a late obstacle.