MLOps Guide 2025: Bringing DevOps Discipline to Machine Learning


Introduction

Machine learning has moved from research notebooks to mission‑critical services such as recommendation engines, fraud detectors, and predictive maintenance. Yet many teams still struggle to move models from experiment to production reliably. MLOps, the practice of applying DevOps rigor to the entire ML lifecycle, addresses this problem by automating, versioning, and monitoring every step.

This guide gives developers, data scientists, and platform engineers a practical, reproducible blueprint for building production‑grade ML systems.

The End‑to‑End MLOps Lifecycle

Stage | Primary Goal | Typical Artifacts
1️⃣ Data collection & ingestion | Gather raw signals, ensure lineage | Raw files, streaming topics, data catalog entries
2️⃣ Feature engineering & storage | Transform raw data into model‑ready features | Feature definitions, feature store tables
3️⃣ Model training & experimentation | Iterate quickly, track performance | Code, hyper‑parameters, metrics, artifacts
4️⃣ Validation & testing | Verify functional & non‑functional requirements | Unit tests, bias checks, performance thresholds
5️⃣ Model packaging & registration | Freeze a reproducible version | Docker image, model binary, registry entry
6️⃣ Deployment (online/offline) | Serve predictions at scale | REST/gRPC endpoint, batch jobs, edge firmware
7️⃣ Monitoring & observability | Detect drift, latency spikes, errors | Metrics dashboards, alert rules
8️⃣ Automated retraining & rollback | Keep the model fresh and safe | Retraining triggers, CI pipelines, version rollbacks

Key Insight: Every stage must be automated, versioned, and observable. Missing a link creates hidden technical debt that shows up as production incidents.
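One way to make a stage "versioned" is to derive an identifier from everything that determines its output. The sketch below is only an illustration of that idea: `stage_fingerprint` and `file_digest` are hypothetical helper names, not part of any tool covered in this guide, and the sample bytes stand in for a real data file.

```python
import hashlib
import json


def file_digest(data: bytes) -> str:
    """Content hash of a raw input (here passed in as bytes)."""
    return hashlib.sha256(data).hexdigest()


def stage_fingerprint(stage: str, params: dict, input_digests: list) -> str:
    """Return a short, deterministic ID for one pipeline stage run.

    The ID changes whenever the stage name, its parameters, or any
    upstream digest changes; identical inputs always map to the same ID.
    """
    payload = json.dumps(
        {"stage": stage, "params": params, "inputs": sorted(input_digests)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Chain the IDs so a change in the raw data propagates downstream.
raw_id = file_digest(b"CustomerID,Age\n1,45\n")
features_id = stage_fingerprint("features", {"ttl_days": 1}, [raw_id])
train_id = stage_fingerprint("train", {"n_estimators": 200}, [features_id])
print(features_id, train_id)
```

Storing such an ID next to each artifact makes it possible to tell whether two runs really used the same data and settings, which is the property the later registry and rollback stages rely on.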

Core MLOps Components & Recommended Open‑Source Tools

Component | What It Solves | Popular Choices (open source unless noted) | Quick Pros / Cons
Data Management & Feature Store | Centralize raw data, enforce schema, enable feature reuse | Feast (cloud‑agnostic); Hopsworks Feature Store; Databricks Feature Store (proprietary) | Feast is lightweight and integrates with Spark/Flink; Hopsworks adds UI and governance but is heavier.
Experiment Tracking & Model Registry | Log metrics, compare runs, store model binaries | MLflow Tracking + Registry; Weights & Biases (SaaS); Neptune.ai | MLflow is open source, easy to self‑host, and supports multiple languages.
Pipeline Orchestration | Define reproducible DAGs for training, validation, deployment | Kubeflow Pipelines; Apache Airflow (with ML plugins); Dagster | Kubeflow is native to Kubernetes and GPU‑friendly; Airflow is mature but less ML‑centric.
Model Serving | Low‑latency inference, scaling, A/B testing | TensorFlow Serving; Seldon Core; KServe (formerly KFServing) | KServe offers serverless on K8s and supports many frameworks.
Monitoring & Drift Detection | Track latency, accuracy, data drift, resource usage | Prometheus + Grafana; Evidently AI (drift & bias); WhyLabs (SaaS) | Evidently provides quick open‑source dashboards.
CI/CD for ML | Automate build‑test‑deploy cycles for models | GitHub Actions + MLflow; GitLab CI with DVC; Argo Workflows (K8s native) | Argo shines in Kubernetes‑centric environments.
Security & Governance | Enforce access controls, audit trails, data privacy | Open Policy Agent (OPA); HashiCorp Vault; MLflow + cloud IAM integrations | OPA enables policy‑as‑code across K8s, CI, and serving layers.

Tip: Start with one tool per component that integrates well with your existing stack. You can replace or add tools later without re‑architecting the whole pipeline.

Hands‑On Minimal Production‑Ready Pipeline

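We’ll build a toy loan‑default predictor using the public UCI Credit Card dataset. The stack includes Spark for ingestion, Feast as a feature store, MLflow for tracking, Kubeflow Pipelines for orchestration, KServe for serving, and Prometheus + Evidently for monitoring. The column names used in the code (CustomerID, Age, Income, CreditScore, NumOpenAccounts, Default, Timestamp, CreatedAt) are illustrative; map them to the columns of the file you actually download.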

Prerequisites (Ubuntu 22.04, Python 3.11)

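# System packages
sudo apt-get update && sudo apt-get install -y docker.io
# The default Ubuntu archive has no packages named kubectl or kind.
sudo snap install kubectl --classic
# Install kind from its release binaries (see the kind quick-start guide).

# Python environment (assumes a Python 3.11 interpreter is already installed)
python3.11 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
# SQLite is Feast's default local online store, so no extra is needed.
pip install \
    pandas scikit-learn pyarrow \
    mlflow feast \
    kfp==2.5.0 \
    kserve==0.11.0 \
    prometheus-client \
    evidently==0.4.2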

In production you’d use a managed K8s service (EKS, GKE, AKS) and a remote Feast backend such as Redis or BigQuery.

Step‑by‑Step Code Walkthrough

1️⃣ Ingest & Register Features with Feast

# feature_repo/feature_store.py
from datetime import datetime, timedelta, timezone
from pathlib import Path

from feast import Entity, FeatureStore, FeatureView, Field, FileSource, ValueType
from feast.types import Float32, Int64

# Define a source. Feast's FileSource reads Parquet, so convert the CSV first, e.g.
# pd.read_csv("data/creditcard.csv").to_parquet("data/creditcard.parquet")
source = FileSource(
    path="data/creditcard.parquet",
    timestamp_field="Timestamp",
    created_timestamp_column="CreatedAt",
)

# Define an entity: the primary key used to join and look up features
customer = Entity(name="customer_id", join_keys=["CustomerID"], value_type=ValueType.INT64)

# Define a FeatureView (the feature set)
features = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=1),  # how far back Feast looks for feature values
    schema=[
        Field(name="Age", dtype=Int64),
        Field(name="Income", dtype=Float32),
        Field(name="CreditScore", dtype=Float32),
        Field(name="NumOpenAccounts", dtype=Int64),
    ],
    source=source,
    online=True,
)

# Materialize to the online store (SQLite in this demo)
if __name__ == "__main__":
    # repo_path must point at the directory that holds feature_store.yaml
    store = FeatureStore(repo_path=str(Path(__file__).parent))
    store.apply([customer, features])
    store.materialize_incremental(end_date=datetime.now(timezone.utc))

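Run python feature_repo/feature_store.py to register the definitions and materialize features into a local SQLite online store (online_store.db), which serves low‑latency feature lookups during inference. This assumes feature_repo/ also contains a feature_store.yaml configured for the local provider.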
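At inference time the same features are read back from the online store. The sketch below shows one way that lookup might look, assuming the `customer_features` view and the `CustomerID` join key defined above; `feature_refs` and `lookup_features` are hypothetical helper names, and the customer ID in the usage comment is made up.

```python
FEATURE_VIEW = "customer_features"
FEATURE_NAMES = ["Age", "Income", "CreditScore", "NumOpenAccounts"]


def feature_refs(view, names):
    """Build Feast feature references of the form '<view>:<feature>'."""
    return [f"{view}:{name}" for name in names]


def lookup_features(store, customer_id):
    """Fetch one customer's features in the order the model expects.

    `store` is expected to be a feast.FeatureStore. The entity row is keyed
    by the join key ("CustomerID") declared on the entity.
    """
    response = store.get_online_features(
        features=feature_refs(FEATURE_VIEW, FEATURE_NAMES),
        entity_rows=[{"CustomerID": customer_id}],
    ).to_dict()
    # to_dict() maps each feature name to a list with one value per entity row.
    return [response[name][0] for name in FEATURE_NAMES]


# Usage (needs the feature repo created in this step):
#   from feast import FeatureStore
#   row = lookup_features(FeatureStore(repo_path="feature_repo"), customer_id=1001)
```

Keeping the feature order in one constant avoids a common source of training/serving skew, where the serving code passes columns in a different order than the training script used.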

2️⃣ Experiment Tracking with MLflow

# training/train.py
import mlflow
import mlflow.sklearn  # explicit import for mlflow.sklearn.log_model below
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("loan-default-prediction")

def main():
    df = pd.read_csv("data/creditcard.csv")
    X = df[["Age", "Income", "CreditScore", "NumOpenAccounts"]]
    y = df["Default"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    with mlflow.start_run():
        mlflow.log_params({"n_estimators": 200, "learning_rate": 0.05, "max_depth": 3})
        model = GradientBoostingClassifier(
            n_estimators=200, learning_rate=0.05, max_depth=3, random_state=42
        )
        model.fit(X_train, y_train)
        preds = model.predict_proba(X_test)[:, 1]
        auc = roc_auc_score(y_test, preds)
        mlflow.log_metric("test_auc", auc)
        mlflow.sklearn.log_model(model, artifact_path="model")
        print(f"Run completed – Test AUC: {auc:.4f}")

if __name__ == "__main__":
    main()

Start the MLflow server in another terminal:

mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 --port 5000

After the script finishes, open http://localhost:5000 to compare runs, then register the best model:

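from mlflow.tracking import MlflowClient

# Point the client at the same server the training script logged to;
# without this it falls back to MLFLOW_TRACKING_URI or a local ./mlruns.
client = MlflowClient(tracking_uri="http://localhost:5000")
run_id = "YOUR_RUN_ID"  # copy this from the MLflow UI
client.create_registered_model("loan-default-model")
client.create_model_version(
    name="loan-default-model",
    source=f"runs:/{run_id}/model",
    run_id=run_id,
)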
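"Register the best model" is easier to automate when the decision is written down as a rule. The following is a minimal sketch of such a gate, not something MLflow provides: `promotion_gate`, `GateDecision`, and the two thresholds (`min_auc`, `min_gain`) are hypothetical and would need to be tuned for a real project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateDecision:
    promote: bool
    reason: str


def promotion_gate(
    candidate_auc: float,
    production_auc: Optional[float],
    min_auc: float = 0.75,    # assumed absolute quality floor
    min_gain: float = 0.005,  # assumed minimum improvement over production
) -> GateDecision:
    """Decide whether a candidate run should be registered as a new version."""
    if candidate_auc < min_auc:
        return GateDecision(False, f"AUC {candidate_auc:.3f} is below the floor {min_auc:.3f}")
    if production_auc is None:
        return GateDecision(True, "no production model to compare against")
    if candidate_auc - production_auc < min_gain:
        return GateDecision(False, "improvement over production is below min_gain")
    return GateDecision(True, "candidate beats production by at least min_gain")


decision = promotion_gate(candidate_auc=0.82, production_auc=0.80)
print(decision.promote, "-", decision.reason)
```

A CI job could call this with the `test_auc` metric of the new run and of the currently deployed version, and only run the registration code when `decision.promote` is true.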

3️⃣ Orchestrate with Kubeflow Pipelines

# pipeline/pipeline.py
# Written for the KFP v2 SDK (kfp==2.5.0), which no longer provides ContainerOp.
import kfp
from kfp import dsl, kubernetes  # kubernetes requires: pip install kfp-kubernetes
from kfp.dsl import Dataset, Input, Output


@dsl.component(base_image="python:3.11-slim", packages_to_install=["pandas", "pyarrow"])
def preprocess(features: Output[Dataset]):
    import pandas as pd

    df = pd.read_csv("/data/creditcard.csv")  # file comes from the mounted PVC
    df.to_parquet(features.path)


@dsl.container_component
def train(features: Input[Dataset]):
    return dsl.ContainerSpec(
        # Placeholder image: it must contain the training package and its
        # dependencies, and train.py must accept a --features argument.
        image="your-registry/loan-default-train:latest",
        command=["python", "-m", "training.train"],
        args=["--features", features.path],
    )


@dsl.pipeline(
    name="loan-default-prediction",
    description="Preprocess the raw CSV, then train and log the model",
)
def loan_default_pipeline():
    preprocess_task = preprocess()
    kubernetes.mount_pvc(preprocess_task, pvc_name="data-pvc", mount_path="/data")
    train(features=preprocess_task.outputs["features"])


if __name__ == "__main__":
    kfp.Client(host="http://localhost:8080").create_run_from_pipeline_func(
        loan_default_pipeline, arguments={}
    )

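Port‑forward the Kubeflow Pipelines UI (kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80, assuming the default kubeflow namespace), run python pipeline/pipeline.py to submit the run, and follow its progress at http://localhost:8080.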
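The orchestrator's core job is to turn declared dependencies, such as training needing the preprocessed features, into a valid execution order. The standard-library sketch below illustrates that idea only; it is not how Kubeflow is implemented, and the `register` and `deploy` steps are hypothetical additions that make the ordering visible.

```python
from graphlib import CycleError, TopologicalSorter

# Each key maps a step to the set of steps it depends on.
dependencies = {
    "preprocess": set(),
    "train": {"preprocess"},
    "register": {"train"},
    "deploy": {"register"},
}


def execution_order(deps):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)

# A cycle (train needs deploy, deploy needs train) cannot be scheduled.
try:
    execution_order({"train": {"deploy"}, "deploy": {"train"}})
except CycleError:
    print("cycle detected: pipeline definition rejected")
```

Because the graph is declared rather than scripted, the orchestrator can also retry, cache, or parallelize individual steps without changes to the step code.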

4️⃣ Serve the Model with KServe

# kserve/loan-default.yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: loan-default
spec:
  predictor:
    sklearn:
      storageUri: "gs://my-bucket/mlflow/loan-default-model/1/model"
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"

Apply the manifest: kubectl apply -f kserve/loan-default.yaml. Test the endpoint:

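# INGRESS_HOST, INGRESS_PORT, and SERVICE_HOSTNAME are placeholders for your cluster's values
curl -X POST "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/loan-default:predict" \
  -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  -d '{"instances": [[45, 72000, 680, 3]]}'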

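The response is a JSON object with a predictions array. KServe's scikit‑learn runtime calls the model's predict method by default, so expect the predicted class rather than a default probability unless the runtime is configured to return probabilities.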
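The same request can be sent from application code. This is a minimal standard-library sketch, assuming the V1 prediction protocol used by the curl call (an `instances` list in, a `predictions` list out); `build_payload`, `parse_predictions`, and `predict` are hypothetical helper names, and the base URL and Host header depend on your cluster.

```python
import json
import urllib.request
from typing import Optional

# Must match the column order used when the model was trained.
FEATURE_ORDER = ["Age", "Income", "CreditScore", "NumOpenAccounts"]


def build_payload(rows):
    """Turn a list of feature dicts into a V1-protocol request body."""
    instances = [[row[name] for name in FEATURE_ORDER] for row in rows]
    return json.dumps({"instances": instances}).encode("utf-8")


def parse_predictions(body):
    """Extract the predictions list from a V1-protocol response body."""
    return json.loads(body)["predictions"]


def predict(base_url: str, rows, host_header: Optional[str] = None, timeout: float = 5.0):
    """POST the rows to the loan-default model and return its predictions."""
    headers = {"Content-Type": "application/json"}
    if host_header:  # often required when routing through an ingress gateway
        headers["Host"] = host_header
    request = urllib.request.Request(
        f"{base_url}/v1/models/loan-default:predict",
        data=build_payload(rows),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_predictions(response.read())


sample = [{"Age": 45, "Income": 72000, "CreditScore": 680, "NumOpenAccounts": 3}]
print(build_payload(sample).decode("utf-8"))
```

Building the payload from named fields instead of a bare list makes it harder to send features in the wrong order by accident.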

5️⃣ Monitoring, Drift Detection, and Automated Retraining

  1. Prometheus Exporter: Add a sidecar to the KServe pod that emits request latency and error counters.
  2. Evidently Drift Dashboard: Nightly job compares live feature distribution against the training baseline.
    # monitoring/drift_check.py
    # Uses the Report API; the older Dashboard/tabs API is not available in evidently 0.4.x.
    import pandas as pd  # reading gs:// paths also requires the gcsfs package
    from evidently.metric_preset import DataDriftPreset
    from evidently.report import Report
    
    baseline = pd.read_parquet("gs://my-bucket/training_features.parquet")
    current = pd.read_parquet("gs://my-bucket/live_features.parquet")
    
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=baseline, current_data=current)
    report.save_html("drift_report.html")
    
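  3. Retraining Trigger: A Prometheus alerting rule fires when evidently_data_drift_score stays above 0.2 for the whole for: window, and Alertmanager then invokes a webhook that restarts the Kubeflow pipeline. Size the window to cover three consecutive checks: the 15m below suits a check that runs every five minutes, so lengthen it for a nightly job. The metric is not emitted automatically; the drift job has to export it to Prometheus.
# alertmanager/rules.yml (loaded by Prometheus via rule_files; Alertmanager handles routing)
groups:
- name: mlops-retrain
  rules:
  - alert: DataDriftDetected
    expr: evidently_data_drift_score > 0.2
    for: 15m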
    labels:
      severity: critical
    annotations:
      summary: "Data drift exceeds threshold"
      runbook_url: "https://github.com/yourorg/mlops-runbooks#retrain"
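To make the "score above 0.2 for three consecutive checks" rule concrete, the sketch below computes a drift score for one numeric feature and applies the consecutive-check logic in plain Python. It uses the Population Stability Index as a stand-in, which is not the score Evidently computes; `psi` and `should_retrain` are hypothetical helpers, and the threshold is simply the value from the rule above.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of one numeric feature.

    Bin edges are taken from the baseline (expected) sample; values outside
    that range are counted in the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(count / len(values), 1e-6) for count in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(scores, threshold=0.2, consecutive=3):
    """True only if the most recent `consecutive` scores all exceed the threshold."""
    recent = scores[-consecutive:]
    return len(recent) == consecutive and all(score > threshold for score in recent)


baseline = [float(v) for v in range(100)]
shifted = [v + 50.0 for v in baseline]
history = [psi(baseline, baseline), psi(baseline, shifted), psi(baseline, shifted), psi(baseline, shifted)]
print(should_retrain(history))
```

Requiring several consecutive breaches is a simple guard against retraining on a single noisy batch; the Prometheus for: clause expresses the same idea in terms of time.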

Tool Comparison Matrix

Dimension MLflow Kubeflow Airflow Dagster
Primary Focus Experiment tracking & model registry End‑to‑end ML pipelines on K8s General workflow orchestration Data‑centric pipelines with strong typing
Kubernetes Native? No (but container‑friendly) Yes (full K8s integration) No Yes (via Dagster‑K8s)
UI for Run Comparison ✅ Rich UI ✅ Kubeflow Pipelines UI ❌ Requires plugins ✅ Dagster UI (formerly Dagit)
Built‑in Feature Store ✅ (Feast integration)
CI/CD Friendly ✅ CLI + REST ✅ Pipeline as code ✅ DAG as code
Learning Curve Low Moderate‑High Moderate Moderate
Best For Small‑to‑medium teams needing quick tracking Large GPU‑heavy workloads on K8s Legacy ETL + occasional ML Teams that love typed pipelines and testing

Recommendation: Begin with MLflow + Airflow if you don’t have Kubernetes. When you adopt K8s, migrate training pipelines to Kubeflow while keeping MLflow as the model registry.

Common MLOps Pitfalls & Mitigations

Pitfall | Why It Happens | Mitigation
Data drift goes unnoticed | No systematic monitoring; reliance on ad‑hoc checks | Deploy automated drift dashboards (Evidently, WhyLabs) and set alert thresholds.
Model version confusion | Manual copy‑pasting of model files, ambiguous naming | Enforce a single source of truth: the model registry (MLflow, SageMaker).
Environment mismatch (dev vs prod) | Different library versions, hardware (CPU vs GPU) | Package every stage in Docker images and version‑control the Dockerfiles.
Unreliable CI pipelines | Flaky tests, hidden state (random seeds) | Pin random seeds, enable deterministic training, run smoke tests after each deployment.
Security gaps | Exposed model artifacts, unsecured data pipelines | Apply IAM policies, encrypt data at rest, use OPA for policy enforcement on K8s resources.
Cost runaway | Unlimited auto‑scaling of GPU pods | Set resource quotas, use spot instances for batch training, monitor cloud cost dashboards.
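The "pin random seeds" mitigation can be as small as one helper called at the start of every training entry point. This is a minimal sketch with a hypothetical `seed_everything` function; it covers the standard library and NumPy (if installed) and does not by itself make GPU kernels or multi-threaded code deterministic.

```python
import os
import random


def seed_everything(seed: int = 42) -> None:
    """Seed the common sources of randomness for a training run."""
    # Only affects Python processes started after this point (e.g. workers).
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np  # optional dependency

        np.random.seed(seed)
    except ImportError:
        pass


def sample_run(seed: int):
    """Stand-in for a training run: returns three pseudo-random numbers."""
    seed_everything(seed)
    return [random.random() for _ in range(3)]


print(sample_run(42) == sample_run(42))
```

scikit-learn estimators and splitters should still receive an explicit random_state, as the training script in this guide does, so that results do not depend on global state.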

Career & Business Impact of MLOps

Aspect | Impact
Speed to Market | Automated CI/CD can shrink model rollout from weeks to days.
Reliability | Tested rollback mechanisms cut downtime after a bad release.
Governance | Auditable pipelines help demonstrate compliance with GDPR, HIPAA, and other regulations.
Team Collaboration | Clear hand‑offs (data → model → ops) improve morale and reduce ownership friction.
Cost Efficiency | Resource‑aware pipelines avoid over‑provisioned GPUs; auto‑retraining runs only when needed.
Career Growth | MLOps expertise bridges data science and platform engineering, a highly marketable combination in 2024‑2025.

TL;DR – Actionable Checklist

  • ✅ Define a feature store (Feast, Hopsworks) and register all raw‑to‑feature transformations.
  • ✅ Install MLflow (or your preferred tracker) as the single source of truth for models.
  • ✅ Choose an orchestrator (Kubeflow for K8s, Airflow otherwise) and codify the full lifecycle as a DAG/pipeline.
  • ✅ Containerize training and inference code; store images in a private registry.
  • ✅ Deploy with KServe (or Seldon) and expose Prometheus metrics.
  • ✅ Add drift & bias monitoring (Evidently) and set up alerting for automated retraining.
  • ✅ Document runbooks for rollback, data‑quality checks, and compliance audits.

Conclusion

MLOps is not a single tool—it’s a discipline that stitches together data engineering, model development, DevOps automation, and continuous monitoring. By adopting the practices and stack outlined above, teams can ship models faster, guarantee reproducibility, and detect problems early.

Start small, iterate, and let the feedback loop between data, models, and operations drive continuous improvement. Your future self—and your stakeholders—will thank you.


