

Introduction
Machine learning has moved from research notebooks to mission‑critical services such as recommendation engines, fraud detectors, and predictive maintenance. Yet many teams still struggle to move models from experiment to production reliably. MLOps, the practice of applying DevOps rigor to the entire ML lifecycle, addresses this by automating, versioning, and monitoring every step.
This guide gives developers, data scientists, and platform engineers a practical, reproducible blueprint for building production‑grade ML systems.
The End‑to‑End MLOps Lifecycle
| Stage | Primary Goal | Typical Artifacts |
|---|---|---|
| 1️⃣ Data collection & ingestion | Gather raw signals, ensure lineage | Raw files, streaming topics, data catalog entries |
| 2️⃣ Feature engineering & storage | Transform raw data into model‑ready features | Feature definitions, feature store tables |
| 3️⃣ Model training & experimentation | Iterate quickly, track performance | Code, hyper‑parameters, metrics, artifacts |
| 4️⃣ Validation & testing | Verify functional & non‑functional requirements | Unit tests, bias checks, performance thresholds |
| 5️⃣ Model packaging & registration | Freeze a reproducible version | Docker image, model binary, registry entry |
| 6️⃣ Deployment (online/offline) | Serve predictions at scale | REST/GRPC endpoint, batch jobs, edge firmware |
| 7️⃣ Monitoring & observability | Detect drift, latency spikes, errors | Metrics dashboards, alert rules |
| 8️⃣ Automated retraining & rollback | Keep the model fresh and safe | Retraining triggers, CI pipelines, version rollbacks |
Key Insight: Every stage must be automated, versioned, and observable. Missing a link creates hidden technical debt that shows up as production incidents.
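To make "versioned" concrete, one lightweight convention is to tag every artifact (a raw extract, a feature table, a model binary) with a hash of its contents. Identical inputs then always map to the same version, and any change produces a new one; data‑versioning tools such as DVC track files by content hash in a similar spirit. The sketch below is a minimal standard‑library version of that idea. The `artifact_version` helper and the 12‑character tag length are illustrative choices, not part of any tool named in this guide.

```python
import hashlib
import tempfile
from pathlib import Path


def artifact_version(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a short SHA-256 content hash to use as an immutable version tag."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Stream the file so large datasets or model binaries need not fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:12]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "features.csv"
        data.write_text("CustomerID,Age\n1,45\n")
        print("features.csv version:", artifact_version(data))
```

Because the tag depends only on the bytes, it can be stored next to a model in the registry to record exactly which data produced it.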
Core MLOps Components & Recommended Open‑Source Tools
| Component | What It Solves | Popular Open‑Source Choices | Quick Pros / Cons |
|---|---|---|---|
| Data Management & Feature Store | Centralize raw data, enforce schema, enable feature reuse | Feast, Hopsworks | Feast is lightweight and integrates with Spark/Flink; Hopsworks adds UI and governance but is heavier. |
| Experiment Tracking & Model Registry | Log metrics, compare runs, store model binaries | MLflow | MLflow is open source, easy to self‑host, multi‑language support. |
| Pipeline Orchestration | Define reproducible DAGs for training, validation, deployment | Kubeflow Pipelines, Airflow | Kubeflow is native to Kubernetes and GPU‑friendly; Airflow is mature but less ML‑centric. |
| Model Serving | Low‑latency inference, scaling, A/B testing | KServe | KServe offers serverless on K8s and supports many frameworks. |
| Monitoring & Drift Detection | Track latency, accuracy, data drift, resource usage | Evidently, Prometheus | Evidently provides quick open‑source dashboards. |
| CI/CD for ML | Automate build‑test‑deploy cycles for models | Argo | Argo shines in Kubernetes‑centric environments. |
| Security & Governance | Enforce access controls, audit trails, data privacy | Open Policy Agent (OPA) | OPA enables policy‑as‑code across K8s, CI, and serving layers. |
Tip: Start with one tool per component that integrates well with your existing stack. You can replace or add tools later without re‑architecting the whole pipeline.
Hands‑On Minimal Production‑Ready Pipeline
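We’ll build a toy loan‑default predictor. The stack includes Spark for ingestion, Feast as a feature store, MLflow for tracking, Kubeflow Pipelines for orchestration, KServe for serving, and Prometheus + Evidently for monitoring. The code assumes a CSV at `data/creditcard.csv` with the columns `CustomerID`, `Age`, `Income`, `CreditScore`, `NumOpenAccounts`, `Default` (the 0/1 label), `Timestamp`, and `CreatedAt`. The public UCI "Default of Credit Card Clients" dataset does not have this schema (it has no income or credit‑score fields), so rename or derive columns, or use synthetic data, before running the examples.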
Prerequisites (Ubuntu 22.04, Python 3.11)
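```bash
# System packages (kind and kubectl are not in Ubuntu's default apt repositories)
sudo apt-get update && sudo apt-get install -y docker.io python3.11 python3.11-venv
sudo snap install kubectl --classic
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
chmod +x ./kind && sudo mv ./kind /usr/local/bin/kind

# Python environment
python3.11 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
# Feast uses SQLite as its default local online store, so no extra is needed
pip install \
  pandas pyarrow scikit-learn \
  mlflow feast \
  kfp==2.5.0 \
  kserve==0.11.0 \
  prometheus-client \
  evidently==0.4.2
```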
In production you’d use a managed K8s service (EKS, GKE, AKS) and remote Feast stores, for example Redis as the online store and BigQuery as the offline store.
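If you don't have a dataset with the columns the walkthrough uses (`CustomerID`, `Age`, `Income`, `CreditScore`, `NumOpenAccounts`, `Default`, `Timestamp`, `CreatedAt`), the sketch below writes a synthetic one to `data/creditcard.csv`. The value ranges and the risk formula behind the `Default` label are made up so that training has some signal to learn; the resulting AUC says nothing about real credit risk. By default the timestamps fall within the last hour, because on its first run Feast's `materialize_incremental` only looks back one `ttl` from now.

```python
import csv
import math
import random
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

COLUMNS = ["CustomerID", "Age", "Income", "CreditScore", "NumOpenAccounts",
           "Default", "Timestamp", "CreatedAt"]


def synthetic_credit_rows(n_rows: int = 1000, seed: int = 42,
                          start: Optional[datetime] = None) -> list[list]:
    """Generate reproducible toy rows matching COLUMNS (hypothetical data)."""
    rng = random.Random(seed)
    if start is None:
        start = datetime.now(timezone.utc).replace(microsecond=0) - timedelta(hours=1)
    rows = []
    for customer_id in range(1, n_rows + 1):
        age = rng.randint(21, 70)
        income = round(rng.uniform(20_000, 150_000), 2)
        credit_score = round(rng.uniform(300, 850), 1)
        open_accounts = rng.randint(0, 12)
        # Made-up risk model: low score, low income and many open accounts raise default odds
        logit = (-0.01 * (credit_score - 600) - 0.00002 * (income - 60_000)
                 + 0.15 * (open_accounts - 4) - 1.5)
        default = int(rng.random() < 1 / (1 + math.exp(-logit)))
        ts = (start + timedelta(seconds=customer_id)).isoformat()
        rows.append([customer_id, age, income, credit_score, open_accounts, default, ts, ts])
    return rows


def write_synthetic_credit_csv(path: Path, n_rows: int = 1000, seed: int = 42) -> None:
    """Write the synthetic rows, with a header, to a CSV file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(COLUMNS)
        writer.writerows(synthetic_credit_rows(n_rows, seed))


if __name__ == "__main__":
    write_synthetic_credit_csv(Path("data/creditcard.csv"))
```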
Step‑by‑Step Code Walkthrough
1️⃣ Ingest & Register Features with Feast
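```python
# feature_repo/feature_store.py
# Assumes feature_repo/feature_store.yaml exists (e.g. adapted from `feast init`)
from datetime import datetime, timedelta, timezone
from pathlib import Path

import pandas as pd
from feast import Entity, FeatureStore, FeatureView, Field, FileSource, ValueType
from feast.types import Float32, Int64

REPO_DIR = Path(__file__).resolve().parent
CSV_PATH = REPO_DIR.parent / "data" / "creditcard.csv"
PARQUET_PATH = REPO_DIR.parent / "data" / "creditcard.parquet"

# Define a source: FileSource reads Parquet, so the CSV is converted in __main__
source = FileSource(
    name="credit_source",
    path=str(PARQUET_PATH),
    timestamp_field="Timestamp",
    created_timestamp_column="CreatedAt",
)

# Define an entity: the primary key used to join features
customer = Entity(name="customer_id", join_keys=["CustomerID"], value_type=ValueType.INT64)

# Define a FeatureView (the feature set)
features = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=1),  # max age of a feature row relative to the lookup time
    schema=[
        Field(name="Age", dtype=Int64),
        Field(name="Income", dtype=Float32),
        Field(name="CreditScore", dtype=Float32),
        Field(name="NumOpenAccounts", dtype=Int64),
    ],
    source=source,
    online=True,
)

if __name__ == "__main__":
    # Convert the CSV to Parquet with real datetime columns
    df = pd.read_csv(CSV_PATH, parse_dates=["Timestamp", "CreatedAt"])
    df.to_parquet(PARQUET_PATH, index=False)

    # Register definitions and materialize to the online store (SQLite in this demo)
    store = FeatureStore(repo_path=str(REPO_DIR))
    store.apply([customer, features])
    store.materialize_incremental(end_date=datetime.now(timezone.utc))
```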
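Run `python feature_repo/feature_store.py` to convert the CSV to Parquet, register the definitions, and materialize the features into the local SQLite online store, which serves low‑latency feature lookups during inference. The script expects a `feature_store.yaml` in `feature_repo/`; its `online_store` `path` setting (the `feast init` template uses `data/online_store.db`) determines where the SQLite file is written.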
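The `ttl` on the FeatureView bounds how old a feature row may be relative to the lookup time: during point‑in‑time (historical) retrieval, Feast returns the latest row whose event timestamp is at or before the lookup time and no more than `ttl` earlier. The standard‑library sketch below mimics that rule for a single entity to make the semantics concrete. It is an illustration, not Feast's implementation, and it ignores details such as using `CreatedAt` to break ties between rows with the same event timestamp.

```python
from datetime import datetime, timedelta
from typing import Optional


def point_in_time_lookup(rows: list[dict], entity_id: int, as_of: datetime,
                         ttl: timedelta) -> Optional[dict]:
    """Latest row for entity_id with as_of - ttl <= Timestamp <= as_of, else None."""
    candidates = [
        r for r in rows
        if r["CustomerID"] == entity_id and as_of - ttl <= r["Timestamp"] <= as_of
    ]
    # Rows after as_of would leak future information; rows older than ttl are stale
    return max(candidates, key=lambda r: r["Timestamp"], default=None)


history = [
    {"CustomerID": 1, "Timestamp": datetime(2024, 1, 1, 9), "CreditScore": 640.0},
    {"CustomerID": 1, "Timestamp": datetime(2024, 1, 2, 9), "CreditScore": 655.0},
]

if __name__ == "__main__":
    print(point_in_time_lookup(history, 1, datetime(2024, 1, 2, 12), timedelta(days=1)))
```

A lookup at noon on 1 January sees only the 640.0 row (the 655.0 row is in its future), while a lookup on 4 January finds nothing, because both rows are older than one day.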
2️⃣ Experiment Tracking with MLflow
```python
# training/train.py
import argparse

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("loan-default-prediction")

FEATURES = ["Age", "Income", "CreditScore", "NumOpenAccounts"]
PARAMS = {"n_estimators": 200, "learning_rate": 0.05, "max_depth": 3}


def main():
    parser = argparse.ArgumentParser()
    # --features lets the Kubeflow step pass its Parquet output; CSV works for local runs
    parser.add_argument("--features", default="data/creditcard.csv")
    path = parser.parse_args().features
    df = pd.read_parquet(path) if path.endswith(".parquet") else pd.read_csv(path)

    X = df[FEATURES]
    y = df["Default"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    with mlflow.start_run():
        mlflow.log_params(PARAMS)
        model = GradientBoostingClassifier(**PARAMS, random_state=42)
        model.fit(X_train, y_train)
        # Probability of the positive (default) class
        preds = model.predict_proba(X_test)[:, 1]
        auc = roc_auc_score(y_test, preds)
        mlflow.log_metric("test_auc", auc)
        mlflow.sklearn.log_model(model, artifact_path="model")
        print(f"Run completed – Test AUC: {auc:.4f}")


if __name__ == "__main__":
    main()
```
Before running the script, start the MLflow server in another terminal:
```bash
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./mlruns \
  --host 0.0.0.0 --port 5000
```
After the script finishes, open http://localhost:5000 to compare runs, then register the best model:
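```python
import mlflow

# Point at the same server the training run logged to (the default is local ./mlruns)
mlflow.set_tracking_uri("http://localhost:5000")
run_id = "YOUR_RUN_ID"  # copy it from the MLflow UI

# Creates "loan-default-model" on first use and adds a new version on each call
version = mlflow.register_model(model_uri=f"runs:/{run_id}/model", name="loan-default-model")
print(f"Registered version {version.version}")
```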
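Hard‑coding `YOUR_RUN_ID` works for a demo, but an automated pipeline usually picks the best run and promotes it only if it beats the model already in production. The sketch below shows that decision logic on plain dictionaries so it runs without an MLflow server; in practice you would build the inputs from `MlflowClient().search_runs(...)` results, whose `run.info.run_id` and `run.data.metrics` fields carry the same information. The `min_gain` margin is a hypothetical guard against promoting a model on noise.

```python
from typing import Optional


def pick_best_run(runs: list[dict], metric: str = "test_auc") -> Optional[dict]:
    """Return the run with the highest value of `metric`, skipping runs that lack it."""
    scored = [r for r in runs if metric in r["metrics"]]
    return max(scored, key=lambda r: r["metrics"][metric], default=None)


def should_promote(candidate: float, production: Optional[float],
                   min_gain: float = 0.005) -> bool:
    """Promote if nothing is in production yet, or the candidate wins by at least min_gain."""
    if production is None:
        return True
    return candidate >= production + min_gain


if __name__ == "__main__":
    runs = [
        {"run_id": "a1", "metrics": {"test_auc": 0.71}},
        {"run_id": "b2", "metrics": {"test_auc": 0.78}},
    ]
    best = pick_best_run(runs)
    print(best["run_id"], should_promote(best["metrics"]["test_auc"], production=0.74))
```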
3️⃣ Orchestrate with Kubeflow Pipelines
```python
# pipeline/pipeline.py (KFP v2 SDK: ContainerOp was removed in kfp 2.x)
import kfp
from kfp import dsl
from kfp.dsl import Dataset, Input, Output


@dsl.component(base_image="python:3.11-slim", packages_to_install=["pandas", "pyarrow"])
def preprocess(raw_csv: Input[Dataset], features: Output[Dataset]):
    import pandas as pd

    df = pd.read_csv(raw_csv.path)
    df.to_parquet(features.path, index=False)


@dsl.container_component
def train(features: Input[Dataset]):
    # Placeholder image: it must contain the training/ package from step 2 and reach
    # your MLflow server (train.py points at localhost:5000; adjust for in-cluster runs)
    return dsl.ContainerSpec(
        image="registry.example.com/loan-default-train:latest",
        command=["python", "-m", "training.train"],
        args=["--features", features.path],
    )


@dsl.pipeline(
    name="loan-default-prediction",  # KFP v2 requires lowercase letters, digits and dashes
    description="From raw CSV → feature store → model registry → KServe endpoint",
)
def loan_default_pipeline(raw_csv_uri: str = "gs://my-bucket/data/creditcard.csv"):
    # Import the raw CSV from object storage as a pipeline artifact
    raw = dsl.importer(artifact_uri=raw_csv_uri, artifact_class=Dataset)
    prep = preprocess(raw_csv=raw.output)
    train(features=prep.outputs["features"])


if __name__ == "__main__":
    kfp.Client(host="http://localhost:8080").create_run_from_pipeline_func(
        loan_default_pipeline, arguments={}
    )
```
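Port‑forward the Kubeflow Pipelines UI with `kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80` and open http://localhost:8080. Running `python pipeline/pipeline.py` then submits a run through the same address, and you can follow it in the UI.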
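Kubeflow Pipelines derives the execution order from data dependencies: `train` runs after `preprocess` because it consumes the `features` output. The sketch below illustrates the same scheduling idea with Python's standard `graphlib` module, including which tasks could run in parallel. The extra stages (`validate`, `register`, `deploy`, `drift_baseline`) are hypothetical extensions of the two‑step pipeline above, and this is not how the KFP backend is implemented.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical stage names)
PIPELINE = {
    "preprocess": set(),
    "train": {"preprocess"},
    "validate": {"train"},
    "register": {"validate"},
    "deploy": {"register"},
    "drift_baseline": {"preprocess"},
}


def execution_order(dag: dict) -> list[str]:
    """One valid sequential order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def parallel_batches(dag: dict) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave have no mutual dependencies."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the wave finished so dependents become ready
    return batches


if __name__ == "__main__":
    for wave in parallel_batches(PIPELINE):
        print(wave)
```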
4️⃣ Serve the Model with KServe
```yaml
# kserve/loan-default.yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: loan-default
spec:
  predictor:
    sklearn:
      storageUri: "gs://my-bucket/mlflow/loan-default-model/1/model"
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
```
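Copy the registered model's artifact directory (it contains the pickled `model.pkl` that KServe's sklearn server loads) to storage the cluster can read, point `storageUri` at it, and apply the manifest with `kubectl apply -f kserve/loan-default.yaml`. Then test the endpoint: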
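```bash
# INGRESS_HOST, INGRESS_PORT and SERVICE_HOSTNAME depend on your ingress setup (see the KServe docs)
curl -X POST "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/loan-default:predict" \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d '{"instances": [[45, 72000, 680, 3]]}'
```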
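The response is JSON with a `predictions` list. By default KServe's sklearn server returns the output of `model.predict`, i.e. the 0/1 class label; if you need the default probability, check how your KServe version enables `predict_proba` for the sklearn runtime.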
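The curl call uses KServe's V1 inference protocol: a POST to `/v1/models/<name>:predict` with an `{"instances": [...]}` body, answered by a `{"predictions": [...]}` body. Below is a minimal standard‑library client for that contract, assuming you already know the ingress URL and, if your ingress routes by hostname, the service's `Host` header value. The helper names are illustrative.

```python
import json
import urllib.request
from typing import Optional


def build_predict_request(base_url: str, model_name: str, instances: list,
                          host_header: Optional[str] = None) -> urllib.request.Request:
    """Build a KServe V1 ':predict' request without sending it."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    headers = {"Content-Type": "application/json"}
    if host_header:
        # Ingress gateways often route on the Host header rather than the URL
        headers["Host"] = host_header
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_predictions(raw: bytes) -> list:
    """Extract the 'predictions' list from a V1 protocol response body."""
    return json.loads(raw)["predictions"]


def predict(base_url: str, model_name: str, instances: list,
            host_header: Optional[str] = None, timeout: float = 5.0) -> list:
    """Send the request and return the predictions (one entry per instance)."""
    req = build_predict_request(base_url, model_name, instances, host_header)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_predictions(resp.read())
```

Splitting request construction from sending keeps the payload logic testable without a cluster; against a live endpoint you would call `predict` with your own ingress address and `"loan-default"`.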
5️⃣ Monitoring, Drift Detection, and Automated Retraining
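- Prometheus Exporter: Add a sidecar to the KServe pod that emits request latency and error counters.
- Evidently Drift Dashboard: A nightly job compares the live feature distribution against the training baseline.

```python
# monitoring/drift_check.py (Evidently 0.4.x Report API; the legacy Dashboard API is not available)
import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

# Reading gs:// paths with pandas requires the gcsfs package
baseline = pd.read_parquet("gs://my-bucket/training_features.parquet")
current = pd.read_parquet("gs://my-bucket/live_features.parquet")

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=baseline, current_data=current)
report.save_html("drift_report.html")
```

- Retraining Trigger: A Prometheus alerting rule fires when `evidently_data_drift_score > 0.2` holds for 15 minutes (`for: 15m`, roughly three consecutive evaluations at a 5‑minute rule evaluation interval), and Alertmanager routes the alert to a webhook that restarts the Kubeflow pipeline. Evidently does not export this metric on its own, so the drift job has to publish it (for example via a Prometheus Pushgateway).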
```yaml
# prometheus/rules.yml (loaded by Prometheus via rule_files; Alertmanager routes the alert)
groups:
  - name: mlops-retrain
    rules:
      - alert: DataDriftDetected
        expr: evidently_data_drift_score > 0.2
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Data drift exceeds threshold"
          runbook_url: "https://github.com/yourorg/mlops-runbooks#retrain"
```
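Evidently chooses its own statistical tests per column, and `evidently_data_drift_score` has to be computed and published by your drift job, so the 0.2 threshold only means something once you fix how the score is defined. As one hypothetical definition, the sketch below computes the population stability index (PSI) for a single numeric feature. PSI is a common drift metric but is not what Evidently's `DataDriftPreset` reports. The sketch pairs it with an "above threshold for three consecutive checks" rule. It uses equal‑width bins over the combined range for simplicity; many PSI implementations bin on baseline quantiles instead.

```python
import math
from collections.abc import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-4) -> float:
    """Population stability index between a baseline and a live sample of one feature."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # avoid division by zero when all values are equal

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(scores: Sequence[float], threshold: float = 0.2,
                   consecutive: int = 3) -> bool:
    """True when the last `consecutive` drift scores all exceed `threshold`."""
    recent = list(scores)[-consecutive:]
    return len(recent) == consecutive and all(s > threshold for s in recent)


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    live = [0.5 + i / 100 for i in range(100)]
    score = psi(baseline, live)
    print(f"PSI = {score:.3f}, retrain: {should_retrain([0.25, 0.31, score])}")
```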
Tool Comparison Matrix
| Dimension | MLflow | Kubeflow | Airflow | Dagster |
|---|---|---|---|---|
| Primary Focus | Experiment tracking & model registry | End‑to‑end ML pipelines on K8s | General workflow orchestration | Data‑centric pipelines with strong typing |
| Kubernetes Native? | No (but container‑friendly) | Yes (full K8s integration) | No (can run on K8s via KubernetesExecutor) | Yes (via Dagster‑K8s) |
| UI for Run Comparison | ✅ Rich UI | ✅ Kubeflow Pipelines UI | ❌ Requires plugins | ✅ Dagster UI (formerly Dagit) |
| Built‑in Feature Store | ❌ | ⚠️ Add‑on (Feast) | ❌ | ❌ |
| CI/CD Friendly | ✅ CLI + REST | ✅ Pipeline as code | ✅ DAG as code | ✅ |
| Learning Curve | Low | Moderate‑High | Moderate | Moderate |
| Best For | Small‑to‑medium teams needing quick tracking | Large GPU‑heavy workloads on K8s | Legacy ETL + occasional ML | Teams that love typed pipelines and testing |
Recommendation: Begin with MLflow + Airflow if you don’t have Kubernetes. When you adopt K8s, migrate training pipelines to Kubeflow while keeping MLflow as the model registry.
Common MLOps Pitfalls & Mitigations
| Pitfall | Why It Happens | Mitigation |
|---|---|---|
| Data drift goes unnoticed | No systematic monitoring; reliance on ad‑hoc checks | Deploy automated drift dashboards (Evidently, WhyLabs) and set alert thresholds. |
| Model version confusion | Manual copy‑pasting of model files, ambiguous naming | Enforce a single source of truth: the model registry (MLflow, SageMaker). |
| Environment mismatch (dev vs prod) | Different library versions, hardware (CPU vs GPU) | Package every stage in Docker images and version‑control the Dockerfiles. |
| Unreliable CI pipelines | Flaky tests, hidden state (random seeds) | Pin random seeds, enable deterministic training, run smoke tests after each deployment. |
| Security gaps | Exposed model artifacts, unsecured data pipelines | Apply IAM policies, encrypt data at rest, use OPA for policy enforcement on K8s resources. |
| Cost runaway | Unlimited auto‑scaling of GPU pods | Set resource quotas, use spot instances for batch training, monitor cloud cost dashboards. |
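For the "pin random seeds" mitigation, a small helper that seeds every random number generator a training job touches keeps reruns comparable. The sketch below seeds Python's `random` module and NumPy (if installed). Setting `PYTHONHASHSEED` at runtime only affects subprocesses started afterwards, and scikit‑learn estimators should still receive an explicit `random_state`, as the training script above does. The helper names are illustrative.

```python
import os
import random


def set_global_seed(seed: int) -> None:
    """Seed the RNGs a typical Python training job uses."""
    random.seed(seed)
    # Only affects Python subprocesses started after this point, not the current interpreter
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
    except ImportError:
        return  # NumPy not installed; nothing else to seed
    np.random.seed(seed)


def sample_batch(seed: int, n: int = 5) -> list[float]:
    """Draw n numbers after seeding, to check that reruns are reproducible."""
    set_global_seed(seed)
    return [random.random() for _ in range(n)]


if __name__ == "__main__":
    print(sample_batch(42) == sample_batch(42))
```

A check like `sample_batch(42) == sample_batch(42)` makes a cheap CI smoke test that seeding has not been broken by a refactor.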
Career & Business Impact of MLOps
| Aspect | Impact |
|---|---|
| Speed to Market | Automated CI/CD shrinks model rollout from weeks to days. |
| Reliability | Proven rollback mechanisms cut downtime after a bad release. |
| Governance | Auditable pipelines satisfy GDPR, HIPAA, and other regulations. |
| Team Collaboration | Clear hand‑offs (data → model → ops) improve morale and reduce ownership friction. |
| Cost Efficiency | Resource‑aware pipelines avoid over‑provisioned GPUs; auto‑retraining runs only when needed. |
| Career Growth | MLOps expertise bridges data science and platform engineering—highly marketable in 2024‑2025. |
TL;DR – Actionable Checklist
- ✅ Define a feature store (Feast, Hopsworks) and register all raw‑to‑feature transformations.
- ✅ Install MLflow (or your preferred tracker) as the single source of truth for models.
- ✅ Choose an orchestrator (Kubeflow for K8s, Airflow otherwise) and codify the full lifecycle as a DAG/pipeline.
- ✅ Containerize training and inference code; store images in a private registry.
- ✅ Deploy with KServe (or Seldon) and expose Prometheus metrics.
- ✅ Add drift & bias monitoring (Evidently) and set up alerting for automated retraining.
- ✅ Document runbooks for rollback, data‑quality checks, and compliance audits.
Conclusion
MLOps is not a single tool—it’s a discipline that stitches together data engineering, model development, DevOps automation, and continuous monitoring. By adopting the practices and stack outlined above, teams can ship models faster, guarantee reproducibility, and detect problems early.
Start small, iterate, and let the feedback loop between data, models, and operations drive continuous improvement. Your future self—and your stakeholders—will thank you.
Meta Description: Learn how to bring DevOps discipline to machine learning with a 2025 MLOps guide, covering lifecycle, tools, hands‑on pipeline, pitfalls, and best practices.
Discover more from Susiloharjo