Data Warehouse
Why this function belongs in the AI Operating Model
An AI initiative without data is a presentation, not a product. Yet data usually lives in the data warehouse, the data lake, and dozens of source systems owned by a separate data function. If the AI Operating Model isn't connected to the data warehouse, a familiar pattern follows: the initiative is approved and the budget is allocated, but the needed data is missing, is of poor quality, or can't be used for legal reasons.
The data warehouse team engages early, so that a data feasibility assessment is part of an initiative's intake qualification rather than a surprise in the middle of delivery.
Where it engages
| AI Operating Model stage | Role of the data warehouse / data team |
|---|---|
| Assessment | Confirms the availability, accessibility, and suitability of data |
| Delivery | Grants access to data marts, prepares datasets, sets up pipelines |
| Before production | Locks in the data contract and SLAs for freshness/quality |
| Impact confirmation | Supplies data to calculate metrics and impact |
What the function receives as input
- The use-case scenario: what data is needed, in what volume, and with what freshness.
- Quality requirements and the processing regime, with the data classification agreed together with InfoSec.
- The purpose of the data: training, RAG context, analytics, impact calculation.
What the function delivers as output
- Access to sources and data marts (or a justified refusal with an alternative).
- An assessment of data readiness: completeness, quality, history, documentation.
- A data contract — an agreed structure, freshness, owner, and SLA.
Key touchpoint artifacts
- Data contract — what the data is, who owns it, and what quality and SLA are guaranteed.
- Initiative/product data mart — a prepared data layer for RAG, training, or analytics.
- Impact metrics — the data on which an initiative's confirmed impact is calculated.
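A data contract is easier to enforce when it exists as a machine-checkable object and not only as a document. The sketch below shows one possible shape, under stated assumptions: the `DataContract` class, the `check_contract` function, the dataset name, and the owner address are all hypothetical, and the SLA is reduced to a single freshness limit plus a column-type check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataContract:
    dataset: str               # what the data is
    owner: str                 # who is accountable for it
    schema: dict               # column name -> expected Python type
    max_staleness: timedelta   # freshness SLA


def check_contract(contract, rows, last_loaded_at, now):
    """Return a list of violations; an empty list means the contract holds."""
    violations = []

    # Freshness: the last successful load must be within the agreed limit.
    if now - last_loaded_at > contract.max_staleness:
        violations.append(f"stale: last load {last_loaded_at.isoformat()}")

    # Structure: every row must carry the agreed columns with the agreed types.
    for i, row in enumerate(rows):
        missing = contract.schema.keys() - row.keys()
        if missing:
            violations.append(f"row {i}: missing columns {sorted(missing)}")
        for col, expected in contract.schema.items():
            if col in row and not isinstance(row[col], expected):
                violations.append(f"row {i}: {col} is not {expected.__name__}")
    return violations


# Hypothetical contract and data, for illustration only.
contract = DataContract(
    dataset="sales_mart.orders",
    owner="data-team@example.com",
    schema={"order_id": int, "amount": float},
    max_staleness=timedelta(hours=24),
)
now = datetime(2024, 6, 2, 12, 0)
ok = check_contract(contract, [{"order_id": 1, "amount": 9.5}],
                    datetime(2024, 6, 2, 3, 0), now)
broken = check_contract(contract, [{"order_id": "1"}],
                        datetime(2024, 5, 30, 3, 0), now)
```

Here `ok` is empty, while `broken` reports a stale load, a missing `amount` column, and a wrongly typed `order_id`. Running such a check in the pipeline means a change in the source surfaces as a named violation with a named owner.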
Anti-patterns
- "We'll find the data along the way." The initiative starts without checking the data and stalls in delivery. The cure is to make the data feasibility check mandatory at the Assessment stage.
- Manual exports instead of data marts. The pilot runs on a one-off export that can't be reproduced in production.
- No data contract. The source changes, the pipeline breaks, and no one is accountable for freshness and quality.
- No one to calculate the impact. The data for metrics wasn't planned in advance, and there's nothing to confirm impact with.