Data Warehouse

Why this function belongs in the AI Operating Model

An AI initiative without data is a presentation, not a product. Yet the data usually lives in the data warehouse, the data lake, and dozens of source systems owned by a separate data function. If the AI Operating Model isn't connected to that function, a familiar failure follows: the initiative is approved and the budget is allocated, but the needed data is missing, is of poor quality, or can't be used for legal reasons.

The data warehouse function engages early, so that a data feasibility assessment is part of an initiative's intake qualification, not a surprise in the middle of delivery.

Where it engages

AI Operating Model stage | Role of the data warehouse / data team
Assessment | Confirms the availability, accessibility, and suitability of data
Delivery | Grants access to data marts, prepares datasets, sets up pipelines
Before production | Locks in the data contract and SLAs for freshness/quality
Impact confirmation | Supplies data to calculate metrics and impact

What the function receives as input

  • The use-case scenario: what data is needed, in what volume, and with what freshness.
  • Quality requirements and the processing regime, including the data classification agreed with InfoSec.
  • The purpose of the data: training, RAG context, analytics, impact calculation.
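
As an illustration, the inputs listed above could be captured in a structured intake record so that gaps are visible before the assessment starts. This is only a minimal sketch: `DataRequest`, `Purpose`, and every field name are hypothetical and not part of any defined template.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    """Purposes named in the intake inputs."""
    TRAINING = "training"
    RAG_CONTEXT = "rag_context"
    ANALYTICS = "analytics"
    IMPACT_CALCULATION = "impact_calculation"


@dataclass(frozen=True)
class DataRequest:
    """Hypothetical intake record an initiative hands to the data team."""
    use_case: str
    datasets: tuple[str, ...]        # what data is needed
    expected_rows: int               # rough volume
    max_age_hours: int               # required freshness
    classification: str              # agreed with InfoSec, e.g. "internal"
    purposes: tuple[Purpose, ...]

    def missing_fields(self) -> list[str]:
        """Return the names of inputs that were left empty."""
        missing = []
        if not self.use_case.strip():
            missing.append("use_case")
        if not self.datasets:
            missing.append("datasets")
        if not self.classification.strip():
            missing.append("classification")
        if not self.purposes:
            missing.append("purposes")
        return missing


# Example values are invented for illustration only.
request = DataRequest(
    use_case="Support-ticket triage assistant",
    datasets=("support_tickets", "customer_profiles"),
    expected_rows=2_000_000,
    max_age_hours=24,
    classification="",               # not yet agreed with InfoSec
    purposes=(Purpose.RAG_CONTEXT, Purpose.IMPACT_CALCULATION),
)
print(request.missing_fields())
```

A record like this lets the data team return an incomplete request at intake instead of discovering the gap during delivery.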

What the function delivers as output

  • Access to sources and data marts (or a justified refusal with an alternative).
  • An assessment of data readiness: completeness, quality, history, documentation.
  • A data contract — an agreed structure, freshness, owner, and SLA.

Key touchpoint artifacts

  • Data contract — what the data is, who owns it, and what quality and SLA are guaranteed.
  • Initiative/product data mart — a prepared data layer for RAG, training, or analytics.
  • Impact metrics — the data on which an initiative's confirmed impact is calculated.
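
A data contract is easier to enforce when it is machine-checkable. The following is a minimal sketch under stated assumptions: `DataContract`, `check_contract`, the column type names, and the example values are all hypothetical, and a real contract would typically also cover quality thresholds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: structure, owner, and freshness SLA."""
    dataset: str
    owner: str
    columns: dict[str, str]          # column name -> expected type name
    max_staleness: timedelta         # freshness SLA


def check_contract(contract, actual_columns, last_loaded_at, now):
    """Return a list of human-readable contract violations."""
    violations = []
    for name, expected in contract.columns.items():
        actual = actual_columns.get(name)
        if actual is None:
            violations.append(f"missing column: {name}")
        elif actual != expected:
            violations.append(f"type changed: {name} {expected} -> {actual}")
    if now - last_loaded_at > contract.max_staleness:
        violations.append("freshness SLA breached")
    return violations


# Invented example values, for illustration only.
contract = DataContract(
    dataset="support_tickets_mart",
    owner="data-team@example.com",
    columns={"ticket_id": "int", "category": "str", "created": "date"},
    max_staleness=timedelta(hours=24),
)

now = datetime(2024, 6, 2, 12, 0, tzinfo=timezone.utc)
violations = check_contract(
    contract,
    actual_columns={"ticket_id": "int", "category": "int"},
    last_loaded_at=datetime(2024, 6, 1, 6, 0, tzinfo=timezone.utc),
    now=now,
)
print(violations)
```

Running a check like this on every load turns a silent source change into an explicit violation addressed to the named owner.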

Anti-patterns

  • "We'll find the data along the way." The initiative starts without checking the data and stalls in delivery. The cure is a mandatory data assessment at the assessment stage.
  • Manual exports instead of data marts. The pilot runs on a one-off export that can't be reproduced in production.
  • No data contract. The source changes, the pipeline breaks, and no one is accountable for freshness and quality.
  • Nothing to calculate the impact from. The data for metrics wasn't planned in advance, so there is nothing to confirm the impact with.