ML Platform
ML Platform is an AI product that provides infrastructure for training, deploying, and monitoring classical machine learning models.
Purpose
The ML Platform standardizes how teams work with ML models. Instead of each team building its own infrastructure, the organization provides a single shared platform.
The platform team maintains the infrastructure; data scientists build models on top of it.
Platform Components
A typical ML Platform includes the following components:
- Feature Store — centralized storage and management of features
- Experiment Tracking — logging of runs, parameters, and metrics (e.g., MLflow, Weights & Biases)
- Model Registry — a registry of model versions with metadata
- Training Pipelines — automated, repeatable workflows for training models
- Serving Infrastructure — infrastructure for inference (batch and real-time)
- Monitoring — detection of data drift, quality degradation, and anomalies
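
To make the Model Registry component more concrete, here is a minimal in-memory sketch of a registry that stores model versions with metadata. The names `ModelVersion`, `InMemoryModelRegistry`, `register`, `latest`, and `get` are hypothetical and chosen for illustration only. A real platform would typically back this with a database or an existing tool such as MLflow.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class ModelVersion:
    """One registered version of a model (hypothetical structure)."""
    name: str
    version: int
    artifact_uri: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class InMemoryModelRegistry:
    """Illustrative registry kept in process memory; not a production design."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}

    def register(self, name: str, artifact_uri: str, **metadata: Any) -> ModelVersion:
        # Versions are numbered 1, 2, 3, ... per model name.
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, artifact_uri, dict(metadata))
        versions.append(entry)
        return entry

    def latest(self, name: str) -> ModelVersion:
        # Raises KeyError if the model name was never registered.
        return self._versions[name][-1]

    def get(self, name: str, version: int) -> ModelVersion:
        # Version numbers are 1-based, list indices are 0-based.
        if version < 1:
            raise IndexError("version numbers start at 1")
        return self._versions[name][version - 1]
```

Keeping every version addressable by number is what makes a later rollback to a previous version a simple lookup.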
Use Cases
The ML Platform serves the following classes of tasks:
- Scoring models — credit scoring, risk assessment, customer scoring
- Forecasting — demand forecasting, financial forecasting
- Anomaly detection — fraud detection, transaction monitoring
- Recommendation systems — offer personalization, next best action
- Classification — categorization of documents, applications, and requests
Key Principles
The ML Platform is built on the following principles:
- Reproducibility — any experiment can be reproduced; data, code, and parameters are recorded
- Versioning — models, data, and configurations are versioned
- Automatic retraining — models are retrained on a schedule or upon quality degradation
- Monitoring — continuous quality control of models in production
- Standardization — a unified process from experiment to deployment
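
The reproducibility principle (recording data, code, and parameters) can be sketched as a deterministic fingerprint of a run. This is an illustration under stated assumptions, not the platform's actual mechanism: the function name `experiment_fingerprint`, the record fields, and the use of a code version string (for example, a commit hash) are all hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def experiment_fingerprint(data: bytes, code_version: str, params: Dict[str, Any]) -> Dict[str, str]:
    """Build a record that identifies a run by its data, code, and parameters."""
    data_hash = hashlib.sha256(data).hexdigest()
    # Sorting keys makes the serialization independent of dict insertion order.
    params_json = json.dumps(params, sort_keys=True, separators=(",", ":"))
    combined = "|".join([data_hash, code_version, params_json]).encode("utf-8")
    return {
        "data_sha256": data_hash,
        "code_version": code_version,
        "params": params_json,
        "run_id": hashlib.sha256(combined).hexdigest()[:16],
    }
```

Two runs with the same `run_id` used the same data bytes, code version, and parameters. Any difference in one of the three produces a different identifier.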
Delivery Model
The ML Platform is a platform product with a clear separation of responsibilities:
| Role | Responsibility |
|---|---|
| Platform Team | Infrastructure, tooling, CI/CD for models |
| Data Scientists | Developing and training models |
| ML Engineers | Productionization, inference optimization |
| Data Engineers | Data preparation, feature pipelines |
Model Monitoring
Monitoring in production includes:
- Data drift — changes in the distribution of input data
- Model drift — degradation of prediction quality
- Performance — latency, throughput, error rate
- Business metrics — the model's impact on business metrics
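
One common way to quantify data drift for a single numeric feature is the Population Stability Index (PSI), which compares the binned distribution of a reference sample with that of recent production data. The source does not say which drift metric the platform uses, so PSI is an assumption here, as are the function name, the equal-width binning, and the `eps` floor for empty bins.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample (expected) and a recent sample (actual)."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has zero range")
    width = (hi - lo) / bins  # equal-width bins over the reference range

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # out-of-range values go to edge bins
            counts[idx] += 1
        # Floor at eps so that empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical distributions give a PSI of zero, and the value grows as the distributions diverge. The alerting threshold is a per-model choice and is not specified here.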
When degradation is detected, a retraining process or a rollback to the previous version is triggered.
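
The choice between retraining and rollback can be expressed as a small decision rule. The source does not state how the platform chooses between the two, so the policy below is purely an assumption for illustration: roll back if the previous version would score better on recent data, otherwise retrain. The names `Action` and `choose_action`, the `tolerance` parameter, and the convention that a higher metric is better are all hypothetical.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"
    RETRAIN = "retrain"
    ROLLBACK = "rollback"


def choose_action(
    current_metric: float,
    baseline_metric: float,
    previous_version_metric: Optional[float],
    tolerance: float = 0.02,
) -> Action:
    """Pick a response to degradation; assumes a higher metric is better.

    current_metric: quality of the deployed model on recent data.
    baseline_metric: quality of the deployed model at deployment time.
    previous_version_metric: quality of the previous version on the same
        recent data, or None if no previous version exists.
    """
    if current_metric >= baseline_metric - tolerance:
        return Action.NONE  # within tolerance, no degradation detected
    if previous_version_metric is not None and previous_version_metric > current_metric:
        return Action.ROLLBACK  # the older version does better right now
    return Action.RETRAIN
```

For example, `choose_action(0.70, 0.81, 0.78)` returns `Action.ROLLBACK`, while `choose_action(0.70, 0.81, None)` returns `Action.RETRAIN`.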