MLOps Guide 2025: Bringing DevOps Discipline To Machine Learning Susiloharjo

Introduction

Machine Learning has moved from research notebooks to mission‑critical services such as recommendation engines, fraud detectors, and predictive maintenance. Yet many teams still struggle to move models from experiment to production reliably. MLOps—the practice of applying DevOps rigor to the entire ML lifecycle—solves this problem by automating, versioning, and monitoring every step.

This guide gives developers, data scientists, and platform engineers a practical, reproducible blueprint for building production‑grade ML systems.

The End‑to‑End MLOps Lifecycle

Stage	Primary Goal	Typical Artifacts
1️⃣ Data collection & ingestion	Gather raw signals, ensure lineage	Raw files, streaming topics, data catalog entries
2️⃣ Feature engineering & storage	Transform raw data into model‑ready features	Feature definitions, feature store tables
3️⃣ Model training & experimentation	Iterate quickly, track performance	Code, hyper‑parameters, metrics, artifacts
4️⃣ Validation & testing	Verify functional & non‑functional requirements	Unit tests, bias checks, performance thresholds
5️⃣ Model packaging & registration	Freeze a reproducible version	Docker image, model binary, registry entry
6️⃣ Deployment (online/offline)	Serve predictions at scale	REST/gRPC endpoint, batch jobs, edge firmware
7️⃣ Monitoring & observability	Detect drift, latency spikes, errors	Metrics dashboards, alert rules
8️⃣ Automated retraining & rollback	Keep the model fresh and safe	Retraining triggers, CI pipelines, version rollbacks

Key Insight: Every stage must be automated, versioned, and observable. Missing a link creates hidden technical debt that shows up as production incidents.
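One way to make the "versioned" part concrete is to store a content fingerprint of each dataset next to the model trained on it, so a later run can tell whether the data changed. The sketch below is a minimal illustration using only the standard library; the function name fingerprint_file and the chunk size are hypothetical choices made for this example, not part of any tool named in this guide.

```python
import hashlib
from pathlib import Path


def fingerprint_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Chunked reads keep memory use flat even for large datasets.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting digest could be logged as a run parameter or tag in whichever experiment tracker you use, which ties a model version to the exact bytes it was trained on.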

Core MLOps Components & Recommended Open‑Source Tools

Component	What It Solves	Popular Open‑Source Choices	Quick Pros / Cons
Data Management & Feature Store	Centralize raw data, enforce schema, enable feature reuse	Feast (cloud‑agnostic), Hopsworks Feature Store, Databricks Feature Store (proprietary)	Feast is lightweight and integrates with Spark/Flink; Hopsworks adds UI and governance but is heavier.
Experiment Tracking & Model Registry	Log metrics, compare runs, store model binaries	MLflow Tracking + Registry, Weights & Biases (SaaS), Neptune.ai	MLflow is open source, easy to self‑host, multi‑language support.
Pipeline Orchestration	Define reproducible DAGs for training, validation, deployment	Kubeflow Pipelines, Apache Airflow (with ML plugins), Dagster	Kubeflow is native to Kubernetes and GPU‑friendly; Airflow is mature but less ML‑centric.
Model Serving	Low‑latency inference, scaling, A/B testing	TensorFlow Serving, Seldon Core, KServe (formerly KFServing)	KServe offers serverless on K8s and supports many frameworks.
Monitoring & Drift Detection	Track latency, accuracy, data drift, resource usage	Prometheus + Grafana, Evidently AI (drift & bias), WhyLabs (SaaS)	Evidently provides quick open‑source dashboards.
CI/CD for ML	Automate build‑test‑deploy cycles for models	GitHub Actions + MLflow, GitLab CI with DVC, Argo Workflows (K8s native)	Argo shines in Kubernetes‑centric environments.
Security & Governance	Enforce access controls, audit trails, data privacy	Open Policy Agent (OPA), HashiCorp Vault, MLflow + cloud IAM integrations	OPA enables policy‑as‑code across K8s, CI, and serving layers.

Tip: Start with one tool per component that integrates well with your existing stack. You can replace or add tools later without re‑architecting the whole pipeline.

Hands‑On Minimal Production‑Ready Pipeline

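We’ll build a toy loan‑default predictor using the public UCI Credit Card dataset. The stack includes Spark for ingestion, Feast as a feature store, MLflow for tracking, Kubeflow Pipelines for orchestration, KServe for serving, and Prometheus + Evidently for monitoring. The column names used in the code (CustomerID, Age, Income, CreditScore, NumOpenAccounts, Default, Timestamp, CreatedAt) are illustrative; rename them to match the columns in your copy of the data.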

Prerequisites (Ubuntu 22.04, Python 3.11)

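# System packages
sudo apt-get update && sudo apt-get install -y docker.io
# kind and kubectl are not in Ubuntu's default apt repositories;
# install them by following their official installation guides.

# Python environment (assumes python3.11 and its venv module are installed;
# the default python3 on Ubuntu 22.04 is 3.10)
python3.11 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install \
    pandas scikit-learn \
    mlflow feast \
    kfp==2.5.0 kfp-kubernetes \
    kserve==0.11.0 \
    prometheus-client \
    evidently==0.4.2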

In production you’d use a managed K8s service (EKS, GKE, AKS) and a remote Feast backend such as Redis or BigQuery.

Step‑by‑Step Code Walkthrough

1️⃣ Ingest & Register Features with Feast

# feature_repo/feature_store.py
from datetime import datetime, timedelta, timezone
from pathlib import Path

from feast import Entity, FeatureStore, FeatureView, Field, FileSource, ValueType
from feast.types import Float32, Int64

# Define a source. Feast's FileSource reads Parquet, so convert the CSV first
# (for example with pandas.DataFrame.to_parquet).
source = FileSource(
    path="data/creditcard.parquet",
    timestamp_field="Timestamp",
    created_timestamp_column="CreatedAt",
)

# Define an entity (the primary key)
customer = Entity(name="customer_id", join_keys=["CustomerID"], value_type=ValueType.INT64)

# Define a FeatureView (the feature set)
features = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=1),  # feature values older than 1 day are not served
    schema=[
        Field(name="Age", dtype=Int64),
        Field(name="Income", dtype=Float32),
        Field(name="CreditScore", dtype=Float32),
        Field(name="NumOpenAccounts", dtype=Int64),
    ],
    source=source,
    online=True,
)

# Materialize to the online store (SQLite in this demo)
if __name__ == "__main__":
    # Expects a feature_store.yaml next to this file
    store = FeatureStore(repo_path=str(Path(__file__).parent))
    store.apply([customer, features])
    store.materialize_incremental(end_date=datetime.now(timezone.utc))

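Run python feature_repo/feature_store.py to apply the definitions and materialize features into a local SQLite online_store.db, which serves low‑latency feature lookups during inference. Feast also expects a feature_store.yaml in the repository directory that declares the project name, registry path, and online store.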
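To show what a low-latency lookup looks like at inference time, here is a minimal sketch of reading the materialized features back. It assumes the feature view and join key defined above; the helper names build_feature_refs and fetch_customer_features are hypothetical, and the store is passed in as an argument so the snippet itself does not need Feast installed.

```python
from typing import Any, Dict, List

FEATURE_VIEW = "customer_features"
FEATURE_NAMES = ["Age", "Income", "CreditScore", "NumOpenAccounts"]


def build_feature_refs(view: str, names: List[str]) -> List[str]:
    """Build Feast feature references of the form '<view>:<feature>'."""
    return [f"{view}:{name}" for name in names]


def fetch_customer_features(store: Any, customer_id: int) -> Dict[str, list]:
    """Look up the latest online feature values for one customer.

    `store` is expected to be a feast.FeatureStore (or an object with the
    same get_online_features method).
    """
    response = store.get_online_features(
        features=build_feature_refs(FEATURE_VIEW, FEATURE_NAMES),
        entity_rows=[{"CustomerID": customer_id}],
    )
    # to_dict() returns a mapping from column name to a list of values.
    return response.to_dict()
```

With Feast installed, a call might look like fetch_customer_features(FeatureStore(repo_path="feature_repo"), 1001), where 1001 is a made-up customer ID.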

2️⃣ Experiment Tracking with MLflow

# training/train.py
import argparse

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("loan-default-prediction")

PARAMS = {"n_estimators": 200, "learning_rate": 0.05, "max_depth": 3}

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--features", help="Parquet file from the pipeline's preprocess step")
    args = parser.parse_args()

    # Pipeline runs pass a Parquet file; local runs fall back to the raw CSV
    df = pd.read_parquet(args.features) if args.features else pd.read_csv("data/creditcard.csv")
    X = df[["Age", "Income", "CreditScore", "NumOpenAccounts"]]
    y = df["Default"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    with mlflow.start_run():
        mlflow.log_params(PARAMS)
        model = GradientBoostingClassifier(**PARAMS, random_state=42)
        model.fit(X_train, y_train)
        # Score with the probability of the positive (default) class
        preds = model.predict_proba(X_test)[:, 1]
        auc = roc_auc_score(y_test, preds)
        mlflow.log_metric("test_auc", auc)
        mlflow.sklearn.log_model(model, artifact_path="model")
        print(f"Run completed, test AUC: {auc:.4f}")

if __name__ == "__main__":
    main()

Start the MLflow server in another terminal before running the script:

mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 --port 5000

After the script finishes, open http://localhost:5000 to compare runs, then register the best model:

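from mlflow.tracking import MlflowClient

# Point the client at the same tracking server the training script used
client = MlflowClient(tracking_uri="http://localhost:5000")
run_id = "YOUR_RUN_ID"  # copy the run ID from the MLflow UI
client.create_registered_model("loan-default-model")
client.create_model_version(
    name="loan-default-model",
    source=f"runs:/{run_id}/model",
    run_id=run_id,
)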
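Registration is also a natural place for the validation stage from the lifecycle table: a simple gate can decide whether a candidate run is good enough to register at all. The following is a minimal sketch, not a prescribed policy; the function name should_promote and both thresholds (min_auc, min_gain) are hypothetical values chosen for illustration.

```python
from typing import Optional


def should_promote(
    candidate_auc: float,
    production_auc: Optional[float],
    min_auc: float = 0.70,
    min_gain: float = 0.005,
) -> bool:
    """Decide whether a candidate model may replace the production model.

    The candidate must clear an absolute AUC floor and, if a production
    model exists, beat it by at least `min_gain`.
    """
    if candidate_auc < min_auc:
        return False
    if production_auc is None:
        # Nothing is deployed yet, so the absolute floor is the only check.
        return True
    return candidate_auc - production_auc >= min_gain
```

In a pipeline, the registration call above would run only when this check returns True, using the test_auc metric logged by the training script as candidate_auc.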

3️⃣ Orchestrate with Kubeflow Pipelines

# pipeline/pipeline.py
# Written for the KFP v2 SDK (kfp==2.5.0); mounting PVCs additionally
# requires the kfp-kubernetes extension (pip install kfp-kubernetes).
import kfp
from kfp import dsl, kubernetes
from kfp.dsl import Dataset, Input, Output


@dsl.component(base_image="python:3.11-slim", packages_to_install=["pandas", "pyarrow"])
def preprocess(features: Output[Dataset]):
    # Imports live inside the function because KFP runs it in its own container
    import pandas as pd

    df = pd.read_csv("/data/creditcard.csv")
    df.to_parquet(features.path)


@dsl.container_component
def train(features: Input[Dataset]):
    # The image must contain the training package and its dependencies;
    # python:3.11-slim is kept here only as a placeholder.
    return dsl.ContainerSpec(
        image="python:3.11-slim",
        command=["python", "-m", "training.train"],
        args=["--features", features.path],
    )


@dsl.pipeline(
    name="loan-default-prediction",
    description="Preprocess the raw CSV and train the loan-default model",
)
def loan_default_pipeline():
    preprocess_task = preprocess()
    # Mount the PVC that holds the raw CSV into the preprocess step
    kubernetes.mount_pvc(preprocess_task, pvc_name="data-pvc", mount_path="/data")
    train(features=preprocess_task.outputs["features"])


if __name__ == "__main__":
    kfp.Client(host="http://localhost:8080").create_run_from_pipeline_func(
        loan_default_pipeline, arguments={}
    )

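Before running the script, port‑forward the Kubeflow Pipelines UI with kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80 (adjust the namespace if your installation uses a different one). The script then submits a run, which you can follow at http://localhost:8080.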

4️⃣ Serve the Model with KServe

# kserve/loan-default.yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: loan-default
spec:
  predictor:
    sklearn:
      storageUri: "gs://my-bucket/mlflow/loan-default-model/1/model"
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"

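Apply the manifest: kubectl apply -f kserve/loan-default.yaml. Test the endpoint, setting INGRESS_HOST to the host (and port, if any) through which your cluster exposes the InferenceService:

curl -X POST "http://${INGRESS_HOST}/v1/models/loan-default:predict" \
  -H "Content-Type: application/json" \
  -d '{"instances": [[45, 72000, 680, 3]]}'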

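The response is a JSON object with a predictions array. Note that the KServe scikit‑learn runtime calls predict by default, so each entry is the predicted class (1 for default) rather than a probability; returning probabilities requires configuring the runtime to use predict_proba.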
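The same request can be issued from Python, which is handy for smoke tests after a deployment. This is a minimal sketch using only the standard library; the helper names build_predict_request and predict are hypothetical, and the base URL is whatever address your cluster exposes for the InferenceService.

```python
import json
import urllib.request
from typing import List


def build_predict_request(
    base_url: str, model_name: str, instances: List[List[float]]
) -> urllib.request.Request:
    """Build a POST request for the V1 inference protocol's :predict route."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(
    base_url: str, model_name: str, instances: List[List[float]], timeout: float = 5.0
) -> list:
    """Send the request and return the 'predictions' list from the response."""
    request = build_predict_request(base_url, model_name, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["predictions"]
```

Keeping request construction separate from the network call makes the payload easy to unit-test without a running cluster.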

5️⃣ Monitoring, Drift Detection, and Automated Retraining

Prometheus Exporter: Add a sidecar to the KServe pod that emits request latency and error counters.

Evidently Drift Dashboard: Nightly job compares live feature distribution against the training baseline.

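# monitoring/drift_check.py
# Uses the Report API that matches the pinned evidently==0.4.2
import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

# Reading gs:// paths with pandas requires the gcsfs package
baseline = pd.read_parquet("gs://my-bucket/training_features.parquet")
current = pd.read_parquet("gs://my-bucket/live_features.parquet")

# Compare the live feature distribution against the training baseline
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=baseline, current_data=current)
report.save_html("drift_report.html")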
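To build intuition for what a single numeric drift score can mean, the sketch below computes the Population Stability Index (PSI) for one numeric feature. This is a different, simpler technique than the statistical tests Evidently runs, shown only as an illustration; the function name, bin count, and epsilon are assumptions of this example. A common rule of thumb treats PSI above roughly 0.2 as a significant shift.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Compute PSI between two samples using equal-width bins from the baseline."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline must contain at least two distinct values")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for value in values:
            # Values outside the baseline range are clamped into the edge bins.
            index = min(max(int((value - lo) / width), 0), bins - 1)
            counts[index] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(count / len(values), eps) for count in counts]

    expected = proportions(baseline)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))
```

A nightly job could compute such a score per feature and publish the maximum as the metric that the alert rule watches.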

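Retraining Trigger: A Prometheus alerting rule fires when evidently_data_drift_score stays above 0.2 for the rule's for window (15 minutes in the rule below), and Alertmanager forwards the alert to a webhook that restarts the Kubeflow pipeline. To require three consecutive nightly checks instead, lengthen the window so it spans those runs. The nightly drift job is assumed to publish this metric to Prometheus, for example through a Pushgateway.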

# alertmanager/rules.yml
groups:
- name: mlops-retrain
  rules:
  - alert: DataDriftDetected
    expr: evidently_data_drift_score > 0.2
    for: 15m
    labels:
      severity: critical
    annotations:
      summary: "Data drift exceeds threshold"
      runbook_url: "https://github.com/yourorg/mlops-runbooks#retrain"
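On the receiving side, the webhook needs to decide whether an incoming Alertmanager notification should actually start a retraining run. The sketch below shows only that decision, assuming the standard Alertmanager webhook JSON layout (a top-level alerts list whose entries carry status and labels); the function name should_retrain is hypothetical, and starting the pipeline run is left to the caller.

```python
from typing import Any, Dict

RETRAIN_ALERT = "DataDriftDetected"


def should_retrain(payload: Dict[str, Any], alert_name: str = RETRAIN_ALERT) -> bool:
    """Return True if the payload contains a firing alert with the given name."""
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        # Resolved notifications reuse the same alert name, so check status too.
        if alert.get("status") == "firing" and labels.get("alertname") == alert_name:
            return True
    return False
```

A small HTTP handler would parse the request body as JSON, call should_retrain, and, when it returns True, submit a new run with the same client call used in step 3.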

Tool Comparison Matrix

Dimension	MLflow	Kubeflow	Airflow	Dagster
Primary Focus	Experiment tracking & model registry	End‑to‑end ML pipelines on K8s	General workflow orchestration	Data‑centric pipelines with strong typing
Kubernetes Native?	No (but container‑friendly)	Yes (full K8s integration)	No	Yes (via Dagster‑K8s)
UI for Run Comparison	✅ Rich UI	✅ Kubeflow Pipelines UI	❌ Requires plugins	✅ Dagster UI (formerly Dagit)
Built‑in Feature Store	❌	✅ (Feast integration)	❌	❌
CI/CD Friendly	✅ CLI + REST	✅ Pipeline as code	✅ DAG as code	✅
Learning Curve	Low	Moderate‑High	Moderate	Moderate
Best For	Small‑to‑medium teams needing quick tracking	Large GPU‑heavy workloads on K8s	Legacy ETL + occasional ML	Teams that love typed pipelines and testing

Recommendation: Begin with MLflow + Airflow if you don’t have Kubernetes. When you adopt K8s, migrate training pipelines to Kubeflow while keeping MLflow as the model registry.

Common MLOps Pitfalls & Mitigations

Pitfall	Why It Happens	Mitigation
Data drift goes unnoticed	No systematic monitoring; reliance on ad‑hoc checks	Deploy automated drift dashboards (Evidently, WhyLabs) and set alert thresholds.
Model version confusion	Manual copy‑pasting of model files, ambiguous naming	Enforce a single source of truth: the model registry (MLflow, SageMaker).
Environment mismatch (dev vs prod)	Different library versions, hardware (CPU vs GPU)	Package every stage in Docker images and version‑control the Dockerfiles.
Unreliable CI pipelines	Flaky tests, hidden state (random seeds)	Pin random seeds, enable deterministic training, run smoke tests after each deployment.
Security gaps	Exposed model artifacts, unsecured data pipelines	Apply IAM policies, encrypt data at rest, use OPA for policy enforcement on K8s resources.
Cost runaway	Unlimited auto‑scaling of GPU pods	Set resource quotas, use spot instances for batch training, monitor cloud cost dashboards.
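The "pin random seeds" mitigation in the table above can be sketched as a small helper called at the start of every training entry point. This is a minimal example under stated assumptions: the function name set_global_seed is hypothetical, NumPy is treated as optional, and frameworks with their own random state (for example deep learning libraries) would need additional seeding.

```python
import os
import random


def set_global_seed(seed: int = 42) -> None:
    """Seed the common sources of randomness used in a training run."""
    # PYTHONHASHSEED only affects interpreters started after this point,
    # such as worker subprocesses; the current process is unaffected.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np  # optional dependency
    except ImportError:
        return
    # Seeds NumPy's legacy global generator.
    np.random.seed(seed)
```

Estimators such as the GradientBoostingClassifier used earlier still need an explicit random_state argument, since scikit-learn does not read a global seed when one is provided per estimator.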

Career & Business Impact of MLOps

Aspect	Impact
Speed to Market	Automated CI/CD shrinks model rollout from weeks to days.
Reliability	Proven rollback mechanisms cut downtime after a bad release.
Governance	Auditable pipelines help meet the traceability and audit requirements of GDPR, HIPAA, and other regulations.
Team Collaboration	Clear hand‑offs (data → model → ops) improve morale and reduce ownership friction.
Cost Efficiency	Resource‑aware pipelines avoid over‑provisioned GPUs; auto‑retraining runs only when needed.
Career Growth	MLOps expertise bridges data science and platform engineering—highly marketable in 2024‑2025.

TL;DR – Actionable Checklist

✅ Define a feature store (Feast, Hopsworks) and register all raw‑to‑feature transformations.
✅ Install MLflow (or your preferred tracker) as the single source of truth for models.
✅ Choose an orchestrator (Kubeflow for K8s, Airflow otherwise) and codify the full lifecycle as a DAG/pipeline.
✅ Containerize training and inference code; store images in a private registry.
✅ Deploy with KServe (or Seldon) and expose Prometheus metrics.
✅ Add drift & bias monitoring (Evidently) and set up alerting for automated retraining.
✅ Document runbooks for rollback, data‑quality checks, and compliance audits.

Conclusion

MLOps is not a single tool; it is a discipline that stitches together data engineering, model development, DevOps automation, and continuous monitoring. By adopting the practices and stack outlined above, teams can ship models faster, make results reproducible, and detect problems early.

Start small, iterate, and let the feedback loop between data, models, and operations drive continuous improvement. Your future self—and your stakeholders—will thank you.

