ML Platform

The ML Platform is an AI product that provides infrastructure for training, deploying, and monitoring classical machine learning models.


Purpose

The ML Platform standardizes how teams work with ML models. Instead of each team building its own infrastructure, the organization provides a single shared platform.

The platform team maintains the infrastructure; data scientists build models on top of it.


Platform Components

A typical ML Platform includes the following components:

  • Feature Store — centralized storage and management of features
  • Experiment Tracking — logging runs with their parameters and metrics (for example, MLflow or Weights & Biases)
  • Model Registry — a registry of model versions with metadata
  • Training Pipelines — automated, repeatable model training workflows
  • Serving Infrastructure — infrastructure for inference (batch and real-time)
  • Monitoring — monitoring data drift, quality degradation, and anomalies
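
To make the Model Registry component more concrete, here is a minimal in-memory sketch of a registry that stores model versions with metadata. It is an illustration only: the names `ModelVersion`, `ModelRegistry`, `register`, `latest`, and `get`, as well as the example artifact URI, are hypothetical and do not describe the API of any specific tool. A real platform would typically rely on an existing registry product rather than code like this.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ModelVersion:
    """One immutable registry entry (hypothetical structure)."""
    name: str
    version: int
    artifact_uri: str          # where the serialized model is stored (assumed)
    metrics: Dict[str, float]  # evaluation metrics captured at registration


@dataclass
class ModelRegistry:
    """In-memory registry: model name -> ordered list of versions."""
    _versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, artifact_uri: str,
                 metrics: Dict[str, float]) -> ModelVersion:
        # Version numbers start at 1 and increase by 1 per registration.
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, artifact_uri, dict(metrics))
        versions.append(entry)
        return entry

    def latest(self, name: str) -> ModelVersion:
        # Raises KeyError if the model name was never registered.
        return self._versions[name][-1]

    def get(self, name: str, version: int) -> ModelVersion:
        # Versions are 1-based, list indices are 0-based.
        if version < 1:
            raise ValueError("version numbers start at 1")
        return self._versions[name][version - 1]
```

Keeping every version addressable by number, instead of overwriting the current model, is what makes a later rollback to the previous version possible.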

Use Cases

The ML Platform serves the following classes of tasks:

  • Scoring models — credit scoring, risk assessment, customer scoring
  • Forecasting — demand forecasting, financial forecasting
  • Anomaly detection — fraud detection, transaction monitoring
  • Recommendation systems — offer personalization, next best action
  • Classification — categorization of documents, applications, and requests

Key Principles

The ML Platform is built on the following principles:

  • Reproducibility — any experiment can be reproduced; data, code, and parameters are recorded
  • Versioning — models, data, and configurations are versioned
  • Automatic retraining — models are retrained on a schedule or upon quality degradation
  • Monitoring — continuous quality control of models in production
  • Standardization — a unified process from experiment to deployment
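
The reproducibility and versioning principles can be sketched as a run record that captures the code version, a hash of the training data, and the parameters, and derives a single fingerprint from them. This is a minimal sketch under assumptions: `RunRecord`, `sha256_bytes`, and `fingerprint` are hypothetical names, and hashing the raw data bytes is only one possible way to version data.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class RunRecord:
    """What must be recorded to reproduce an experiment (hypothetical)."""
    code_version: str  # for example, a version-control commit id (assumed)
    data_hash: str     # digest of the training data snapshot
    params: dict       # hyperparameters and other configuration

    def fingerprint(self) -> str:
        # sort_keys makes the serialization independent of dict ordering,
        # so identical inputs always yield the same fingerprint.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return sha256_bytes(payload)
```

Two runs with the same fingerprint used the same code, data, and parameters; any difference in one of the three produces a different fingerprint.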

Delivery Model

The ML Platform is a platform product with a clear separation of responsibilities:

Role             Responsibility
Platform Team    Infrastructure, tooling, CI/CD for models
Data Scientists  Developing and training models
ML Engineers     Productionization, inference optimization
Data Engineers   Data preparation, feature pipelines

Model Monitoring

Monitoring in production includes:

  • Data drift — changes in the distribution of input data
  • Model drift — degradation of prediction quality
  • Performance — latency, throughput, error rate
  • Business metrics — the model's impact on business metrics

When degradation is detected, a retraining process or a rollback to the previous version is triggered.
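
As an illustration of a data drift check and the resulting decision, the sketch below computes the Population Stability Index (PSI), one common drift measure, between a reference sample and a production sample, then chooses an action. Everything specific here is an assumption and not something the platform prescribes: the function names `psi` and `decide_action`, the 0.25 PSI threshold (a widely used rule of thumb), the 0.05 quality-drop limit, and the policy of rolling back on a quality drop and retraining on drift alone.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index over equal-width bins of `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant data

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range go to the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def decide_action(psi_value: float, quality_drop: float,
                  psi_threshold: float = 0.25,
                  max_quality_drop: float = 0.05) -> str:
    """Illustrative policy (assumed): quality drop -> rollback, drift -> retrain."""
    if quality_drop > max_quality_drop:
        return "rollback"
    if psi_value > psi_threshold:
        return "retrain"
    return "none"
```

In practice the thresholds would be tuned per model, and the quality drop would come from whichever metric the team tracks against a baseline.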


Model Risk Management