CI/CD for ML

🎯 Goal

After finishing this chapter, you will:

Know how ML CI/CD differs from classic backend CI/CD
Be able to write code tests, data tests, and model tests
Be able to build a Continuous Training (CT) pipeline
Be able to deploy ML with GitHub Actions and GitLab CI
Know how to use the CML (Continuous Machine Learning) tool

What to learn

CI vs CD vs CT (Continuous Training)
ML-specific testing — data, features, model
GitHub Actions for ML
GitLab CI/CD pipelines
CML (Continuous Machine Learning): a tool from the DVC team
Deployment strategies — blue-green, canary, shadow
Rollback mechanisms
Approval workflows: manual review before production

What makes ML CI/CD different

Classic DevOps CI/CD

Code change
  → Unit tests
  → Build Docker
  → Deploy

ML CI/CD

Code change          OR      Data change
  ↓                            ↓
  Unit tests              Data validation
  ↓                            ↓
  Train model             Retrain model
  ↓                            ↓
  Test model              Test model
  ↓                            ↓
  Deploy + Monitor        Deploy + Monitor
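The diagram has two entry points: a code change and a data change. A minimal sketch of how a pipeline could decide which stages to run from the list of changed files is shown below. The path conventions (`data/`, `*.dvc`, `dvc.lock`, `params.yaml`) and the stage names are assumptions for illustration, not a built-in feature of DVC or GitHub Actions.

```python
from fnmatch import fnmatch

# Hypothetical conventions: which changed paths count as a "data change"
DATA_PATTERNS = ["data/*", "*.dvc", "dvc.lock", "params.yaml"]

# Canonical stage order; each kind of change selects a subset of it
STAGE_ORDER = ["data_validation", "unit_tests", "train", "test_model", "deploy"]
CODE_STAGES = {"unit_tests", "train", "test_model", "deploy"}
DATA_STAGES = {"data_validation", "train", "test_model", "deploy"}  # train = retrain


def is_data_change(path: str) -> bool:
    return any(fnmatch(path, pattern) for pattern in DATA_PATTERNS)


def plan_stages(changed_files: list[str]) -> list[str]:
    """Return the stages to run, in canonical order, for a set of changed files."""
    needed: set[str] = set()
    for path in changed_files:
        needed |= DATA_STAGES if is_data_change(path) else CODE_STAGES
    return [stage for stage in STAGE_ORDER if stage in needed]
```

For example, `plan_stages(["src/model.py"])` skips data validation, while a change to `dvc.lock` skips unit tests; a commit touching both runs every stage once, in order.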

Three levels of testing

1. Code Tests (classic)

import pandas as pd
# preprocess() and add_features() are the project's own functions (not shown)

def test_preprocess_function():
    assert preprocess("Hello") == "hello"

def test_feature_engineering():
    df = pd.DataFrame({"price": [100, 200]})
    result = add_features(df)
    assert "price_log" in result.columns

2. Data Tests

import pandas as pd

def test_data_schema():
    df = pd.read_csv("data/train.csv")
    assert df.shape[1] == 20                 # expected number of columns
    assert df["age"].dtype == "int64"
    assert df["age"].min() >= 0

def test_data_quality():
    df = pd.read_csv("data/train.csv")
    assert df.isna().to_numpy().mean() < 0.05  # <5% of all cells missing
    assert df["target"].value_counts(normalize=True).max() < 0.95  # no class above 95%
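The same schema and quality checks can be written without pandas, which makes each rule explicit and collects every problem instead of stopping at the first failed `assert`. This is a minimal sketch over a list of row dicts; `validate_rows` is a hypothetical helper (not a Great Expectations API), and the column names and thresholds mirror the tests above.

```python
from collections import Counter

MAX_MISSING_FRACTION = 0.05  # fewer than 5% of all cells may be missing
MAX_CLASS_SHARE = 0.95       # no single target class may reach 95% of rows


def validate_rows(rows: list[dict], expected_columns: int) -> list[str]:
    """Return human-readable problems; an empty list means the data passes."""
    if not rows:
        return ["dataset is empty"]
    errors: list[str] = []

    # Schema: every row has the expected columns; ages are non-negative ints
    if any(len(row) != expected_columns for row in rows):
        errors.append(f"expected {expected_columns} columns in every row")
    ages = [row["age"] for row in rows if row.get("age") is not None]
    if any(not isinstance(age, int) for age in ages):
        errors.append("age must be an integer")
    elif any(age < 0 for age in ages):
        errors.append("age must be >= 0")

    # Quality: overall fraction of missing (None) cells
    total_cells = sum(len(row) for row in rows)
    missing = sum(value is None for row in rows for value in row.values())
    if total_cells and missing / total_cells >= MAX_MISSING_FRACTION:
        errors.append(f"missing fraction {missing / total_cells:.1%} is too high")

    # Quality: class imbalance in the target column
    counts = Counter(row["target"] for row in rows if row.get("target") is not None)
    if counts and max(counts.values()) / sum(counts.values()) >= MAX_CLASS_SHARE:
        errors.append("target is too imbalanced")
    return errors
```

Returning a list of messages rather than raising lets a CI job print every data problem in one run.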

3. Model Tests

import numpy as np

# model, X_train/y_train, X_test/y_test, X_sample, X, y, gender, train_model()
# and evaluate() are assumed to come from fixtures or project code
BASELINE_ACCURACY = 0.85

def test_model_performance():
    """The new model must beat the baseline."""
    model = train_model(X_train, y_train)
    accuracy = evaluate(model, X_test, y_test)
    assert accuracy > BASELINE_ACCURACY

def test_model_determinism():
    """The model must be deterministic: same input → same output."""
    pred1 = model.predict(X_sample)
    pred2 = model.predict(X_sample)
    np.testing.assert_array_equal(pred1, pred2)

def test_model_perturbation():
    """Small input change → small output change."""
    rng = np.random.default_rng(0)
    pred_original = model.predict(X_sample)
    pred_perturbed = model.predict(X_sample + rng.normal(0, 0.01, X_sample.shape))
    diff = np.abs(pred_original - pred_perturbed).mean()
    assert diff < 0.1  # tolerance threshold

def test_model_bias():
    """Fairness: accuracy should be similar across demographic groups."""
    male_acc = evaluate(model, X[gender == "M"], y[gender == "M"])
    female_acc = evaluate(model, X[gender == "F"], y[gender == "F"])
    assert abs(male_acc - female_acc) < 0.05  # under 5 percentage points apart
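The bias test above compares exactly two groups. A slightly more general sketch computes accuracy for any number of groups and reports the largest gap. `per_group_accuracy` and `group_accuracy_gap` are hypothetical helpers, and equal accuracy is only one of several possible fairness criteria.

```python
from collections import defaultdict


def per_group_accuracy(y_true, y_pred, groups) -> dict:
    """Accuracy for each group label, e.g. {"M": 0.9, "F": 0.85}."""
    correct: dict = defaultdict(int)
    total: dict = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def group_accuracy_gap(y_true, y_pred, groups) -> float:
    """Largest accuracy difference between any two groups."""
    accuracies = per_group_accuracy(y_true, y_pred, groups).values()
    return max(accuracies) - min(accuracies)
```

Inside a test this could read `assert group_accuracy_gap(y, model.predict(X), gender) < 0.05`.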

Code examples

GitHub Actions: a complete ML pipeline

# .github/workflows/ml-pipeline.yml
name: ML Pipeline

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

env:
  PYTHON_VERSION: "3.11"

jobs:
  # 1. Code quality
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - run: pip install ruff mypy
      - run: ruff check src/ tests/
      - run: mypy src/
  
  # 2. Unit tests
  test:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: pytest tests/ -v --cov=src --cov-report=xml
      - uses: codecov/codecov-action@v3
  
  # 3. Data + Model tests (data needed)
  ml-tests:
    runs-on: ubuntu-latest
    needs: test
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - run: pip install -r requirements.txt
      
      - name: Pull data via DVC
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          pip install "dvc[s3]"
          dvc pull
      
      - run: pytest tests/data/ tests/model/ -v
  
  # 4. Build Docker
  build:
    runs-on: ubuntu-latest
    needs: ml-tests
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      
      - name: Login to Docker registry
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USER }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            myregistry/ml-api:${{ github.sha }}
            myregistry/ml-api:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
  
  # 5. Deploy to staging
  deploy-staging:
    runs-on: ubuntu-latest
    needs: build
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-kubectl@v3
      
      - name: Configure kubectl
        run: |
          mkdir -p ~/.kube
          echo "${{ secrets.KUBE_CONFIG_STAGING }}" | base64 -d > ~/.kube/config
      
      - name: Deploy
        run: |
          kubectl set image deployment/ml-api api=myregistry/ml-api:${{ github.sha }} -n staging
          kubectl rollout status deployment/ml-api -n staging --timeout=5m
  
  # 6. Integration tests on staging
  integration-tests:
    runs-on: ubuntu-latest
    needs: deploy-staging
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - run: pip install pytest httpx
      - name: Run integration tests
        env:
          API_URL: https://ml-api-staging.example.com
        run: pytest tests/integration/ -v
  
  # 7. Deploy to production (manual approval)
  deploy-production:
    runs-on: ubuntu-latest
    needs: integration-tests
    environment: production  # add "Required reviewers" to this environment in GitHub settings
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-kubectl@v3
      
      - name: Configure kubectl
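        run: |
          mkdir -p ~/.kube
          echo "${{ secrets.KUBE_CONFIG_PROD }}" | base64 -d > ~/.kube/config
      
      - name: Canary deploy (~10% traffic)
        run: |
          # The traffic share depends on the canary/stable replica ratio (or ingress weights)
          kubectl set image deployment/ml-api-canary api=myregistry/ml-api:${{ github.sha }} -n production
          kubectl rollout status deployment/ml-api-canary -n production --timeout=5m
          # Observe for 10 minutes; a real pipeline should check canary metrics
          # (error rate, latency) here and fail the job instead of only sleeping
          sleep 600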
      
      - name: Full rollout
        run: |
          kubectl set image deployment/ml-api api=myregistry/ml-api:${{ github.sha }} -n production
          kubectl rollout status deployment/ml-api -n production --timeout=10m
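The `needs:` keys turn the seven jobs into a dependency graph: GitHub Actions starts a job only after every job it needs has succeeded, and skips it otherwise. That ordering can be reproduced with Python's standard `graphlib` module. The dictionary below is copied by hand from the workflow above, not parsed from the YAML.

```python
from graphlib import TopologicalSorter

# job -> jobs it `needs` (copied by hand from ml-pipeline.yml)
NEEDS = {
    "lint": [],
    "test": ["lint"],
    "ml-tests": ["test"],
    "build": ["ml-tests"],
    "deploy-staging": ["build"],
    "integration-tests": ["deploy-staging"],
    "deploy-production": ["integration-tests"],
}


def execution_waves(needs: dict[str, list[str]]) -> list[list[str]]:
    """Group jobs into waves; jobs in the same wave can run in parallel."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def skipped_after_failure(needs: dict[str, list[str]], failed: str) -> set[str]:
    """Jobs that never run because `failed` failed, directly or transitively."""
    skipped: set[str] = set()
    changed = True
    while changed:
        changed = False
        for job, deps in needs.items():
            if job not in skipped and any(d == failed or d in skipped for d in deps):
                skipped.add(job)
                changed = True
    return skipped
```

The same reasoning explains what happens on a pull request: `build` has an `if:` that is false for PRs, so it is skipped, and by default every job that needs it is skipped as well.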

CML — Continuous ML

# .github/workflows/cml.yml
name: CML Report

on: [pull_request]

jobs:
  train-and-report:
    runs-on: ubuntu-latest
    container: ghcr.io/iterative/cml:0-dvc2-base1
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so origin/main is available for diffs
      
      - name: Train model
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          pip install -r requirements.txt
          dvc pull
          dvc repro
      
      - name: Create CML report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
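          # Metrics comparison against main, as a markdown table
          echo "## Model Metrics" >> report.md
          echo "" >> report.md
          dvc metrics diff origin/main --md >> report.md
          
          # Plots: export a Vega spec, render it to PNG, embed it in the report
          dvc plots diff origin/main --targets target --show-vega > vega.json
          vl2png vega.json > plot.png
          echo "![](./plot.png)" >> report.md
          
          # Post the report as a PR comment (CML uploads referenced local images)
          cml comment create report.md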

This appears automatically as a PR comment:

## Model Metrics

| Metric    | Old    | New    | Change |
|-----------|--------|--------|--------|
| accuracy  | 0.85   | 0.89   | +0.04  |
| f1        | 0.82   | 0.87   | +0.05  |

[Confusion Matrix Plot]
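Conceptually, that comment is a diff of two metrics dictionaries rendered as a markdown table. A minimal sketch of the rendering, assuming metrics are flat `name → float` dicts; the real `dvc metrics diff --md` output uses its own column layout, including the metrics file path.

```python
from typing import Optional


def _fmt(value: Optional[float]) -> str:
    return "-" if value is None else f"{value:.2f}"


def metrics_diff_markdown(old: dict[str, float], new: dict[str, float]) -> str:
    """Render old vs new metrics as a markdown table with a signed change column."""
    lines = ["| Metric | Old | New | Change |", "|--------|-----|-----|--------|"]
    for name in sorted(old.keys() | new.keys()):
        old_value, new_value = old.get(name), new.get(name)
        if old_value is None or new_value is None:
            change = "n/a"  # metric was added or removed
        else:
            change = f"{new_value - old_value:+.2f}"
        lines.append(f"| {name} | {_fmt(old_value)} | {_fmt(new_value)} | {change} |")
    return "\n".join(lines)
```

Feeding it the numbers from the example table reproduces its rows, e.g. `| accuracy | 0.85 | 0.89 | +0.04 |`.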

Deployment strategies

1. Blue-Green deployment

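# blue (current production) is serving traffic
# green (new version) is deployed alongside it and warmed up
# Switch: change the Service selector so traffic goes to green

apiVersion: v1
kind: Service
metadata:
  name: ml-api
spec:
  selector:
    app: ml-api
    color: blue   # change to green for an instant switch (and back to roll back)
  ports:
    - port: 80          # assumed Service port
      targetPort: 8000  # assumed container port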
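The switch is a one-field change to the Service selector, which `kubectl patch` can apply. The sketch below only builds the command (it does not run it) and refuses to switch unless a health check of the idle color has passed. The health-check result is an assumed input, and the `ml-api` name and `production` namespace are defaults carried over from the examples above.

```python
import json


def other_color(color: str) -> str:
    """blue <-> green."""
    if color not in ("blue", "green"):
        raise ValueError(f"unknown color: {color}")
    return "green" if color == "blue" else "blue"


def switch_command(current: str, idle_is_healthy: bool,
                   service: str = "ml-api", namespace: str = "production") -> list[str]:
    """Build (not run) the kubectl command that points the Service at the idle color."""
    if not idle_is_healthy:
        raise RuntimeError("idle deployment failed its health check; not switching")
    # Strategic merge patch: only the `color` key of the selector changes
    patch = {"spec": {"selector": {"color": other_color(current)}}}
    return ["kubectl", "patch", "service", service,
            "-n", namespace, "-p", json.dumps(patch)]
```

The list can be passed to `subprocess.run(..., check=True)`. Rolling back is the same call with the colors reversed, which is why blue-green rollbacks are close to instant.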

2. Canary deployment

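# v1: 90% of traffic
# v2: 10% of traffic
# Gradually raise v2's share to 100%

# With Istio or an NGINX Ingress; for NGINX, annotate a second (canary) Ingress for the same host:
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # 10% of requests go to v2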
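Raising the weight step by step is usually paired with an automated check at each step. A minimal sketch, assuming the stable and canary error rates have already been measured at each step; the step weights and the 1.5× tolerance are illustrative values, not NGINX or Istio defaults.

```python
CANARY_STEPS = [10, 25, 50, 100]  # % of traffic sent to v2 (illustrative values)


def canary_is_healthy(stable_error_rate: float, canary_error_rate: float,
                      tolerance: float = 1.5, floor: float = 0.001) -> bool:
    """The canary passes if its error rate is at most `tolerance` times the stable one.

    `floor` keeps a near-zero stable error rate from making any error look fatal.
    """
    return canary_error_rate <= max(stable_error_rate, floor) * tolerance


def run_canary(observations: list[tuple[float, float]]) -> tuple[str, int]:
    """observations[i] = (stable, canary) error rate measured at CANARY_STEPS[i].

    Returns ("promoted", 100) or ("rolled_back", weight_at_which_it_failed).
    """
    if len(observations) < len(CANARY_STEPS):
        raise ValueError("need one observation per canary step")
    for weight, (stable, canary) in zip(CANARY_STEPS, observations):
        if not canary_is_healthy(stable, canary):
            return "rolled_back", weight
        # A real pipeline would now raise canary-weight to the next step
    return "promoted", CANARY_STEPS[-1]
```

Comparing against the stable version's live error rate, rather than a fixed number, keeps the check meaningful when overall traffic quality changes during the rollout.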

3. Shadow deployment

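# The production model's prediction is returned to the user
# The new (shadow) model also runs on the same input, but its output is never returned
# Both predictions are logged for comparison

from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel

app = FastAPI()
# production_model and shadow_predict() are assumed to be defined elsewhere

class Features(BaseModel):
    values: list[float]

@app.post("/predict")
async def predict(features: Features, background: BackgroundTasks):
    prod_pred = production_model.predict([features.values])[0].item()
    
    # Shadow runs after the response is sent; the user never sees it
    background.add_task(shadow_predict, features, prod_pred)
    
    return {"prediction": prod_pred}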
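The `shadow_predict` task used in the route is not shown. One possible sketch records how often the shadow model agrees with production, so it can be judged before promotion, and swallows its own errors so a broken shadow model can never affect users. The `ShadowStats` class and in-memory counters are assumptions; a real service would export these numbers to a metrics store.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("shadow")


@dataclass
class ShadowStats:
    total: int = 0
    agreed: int = 0
    errors: int = 0

    @property
    def agreement_rate(self) -> float:
        return self.agreed / self.total if self.total else 0.0


def shadow_predict(shadow_model, values: list[float], prod_pred, stats: ShadowStats) -> None:
    """Run the shadow model and record whether it agrees with production."""
    try:
        shadow_pred = shadow_model.predict([values])[0]
    except Exception:
        stats.errors += 1
        log.exception("shadow model failed")
        return
    stats.total += 1
    stats.agreed += int(shadow_pred == prod_pred)
    log.info("prod=%s shadow=%s", prod_pred, shadow_pred)
```

With this signature, the route would schedule it as `background.add_task(shadow_predict, shadow_model, features.values, prod_pred, stats)`.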

Continuous Training (CT)

# retrain.py, run on a schedule (Airflow or cron)
import logging
import subprocess

log = logging.getLogger(__name__)

# check_drift, load_latest_metrics, load_production_metrics and
# register_model_in_mlflow are project helpers (not shown)

def continuous_training_pipeline(reference_data, recent_production_data, current_accuracy):
    # 1. Check drift
    drift_score = check_drift(reference_data, recent_production_data)
    
    # 2. Decide whether retraining is needed
    if drift_score < 0.1 and current_accuracy > 0.85:
        log.info("No retraining needed")
        return
    
    # 3. Trigger retraining
    log.info("Drift or accuracy drop detected, starting retraining")
    
    # 4. Re-run the DVC pipeline (training stages can log to MLflow)
    subprocess.run(["dvc", "repro"], check=True)
    
    # 5. Validate the new model against production
    new_metrics = load_latest_metrics()
    old_metrics = load_production_metrics()
    
    if new_metrics["accuracy"] < old_metrics["accuracy"]:
        log.warning("New model is worse than the current one. Skipping deployment.")
        return
    
    # 6. Register in MLflow
    register_model_in_mlflow()
    
    # 7. Trigger the CI/CD deploy workflow (GitHub CLI)
    subprocess.run(["gh", "workflow", "run", "deploy.yml"], check=True)
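`check_drift` is left abstract in the pipeline. One common choice, swapped in here as an example, is the Population Stability Index (PSI) for a single numeric feature: PSI = Σ (actual_share − expected_share) · ln(actual_share / expected_share) over bins. A PSI below 0.1 is conventionally read as "no significant shift", which matches the 0.1 threshold above. Bin edges come from the reference data; the bin count and `eps` are assumptions.

```python
import math
from bisect import bisect_right


def population_stability_index(reference: list[float], current: list[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI of `current` against `reference` for one numeric feature."""
    if not reference or not current:
        raise ValueError("need non-empty samples")
    lo, hi = min(reference), max(reference)
    if lo == hi:
        raise ValueError("reference data has no spread")
    # Equal-width interior bin edges over the reference range
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range fall into the first/last bin
            counts[bisect_right(edges, v)] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    expected, actual = shares(reference), shares(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))
```

For several features, a simple policy is to compute PSI per feature and treat the maximum as the drift score.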

Testing patterns

# tests/test_model.py
import joblib
import numpy as np
import pytest

N_FEATURES = 10  # must match the model's training features (assumed here)

@pytest.fixture(scope="module")
def model():
    return joblib.load("models/model.pkl")

def test_model_accuracy(model):
    """Production threshold check."""
    X_test, y_test = load_test_data()  # project helper (not shown)
    accuracy = model.score(X_test, y_test)
    assert accuracy >= 0.85, f"Accuracy {accuracy} below threshold 0.85"

def test_model_latency(model, benchmark):
    """Single-row latency via pytest-benchmark."""
    X = np.random.randn(1, N_FEATURES)
    benchmark(model.predict, X)
    # pytest-benchmark does not fail slow code by default; assert a budget explicitly
    assert benchmark.stats.stats.mean < 0.05  # 50 ms budget (example value)

def test_model_handles_missing(model):
    """Edge case: missing values."""
    X = np.ones((1, N_FEATURES))
    X[0, 0] = np.nan
    pred = model.predict(X)
    assert not np.isnan(pred[0])

def test_model_handles_extreme_values(model):
    """Edge case: extreme inputs."""
    X = np.zeros((1, N_FEATURES))
    X[0, 0], X[0, 1] = 1e9, -1e9
    pred = model.predict(X)
    assert pred[0] in (0, 1)  # valid binary class label

@pytest.mark.parametrize("noise", [0.01, 0.05, 0.1])
def test_model_robustness_to_noise(model, noise):
    """Small noise → small change in predictions."""
    rng = np.random.default_rng(42)
    X_original = rng.standard_normal((100, N_FEATURES))
    pred_original = model.predict(X_original)
    
    X_noisy = X_original + rng.normal(0, noise, X_original.shape)
    pred_noisy = model.predict(X_noisy)
    
    flip_rate = (pred_original != pred_noisy).mean()
    assert flip_rate < noise * 5  # allowed label-flip rate grows with the noise level
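Without pytest-benchmark, a latency budget can also be checked with the standard library: time repeated calls and assert on a high percentile rather than the mean, since tail latency is what users notice. A minimal sketch; the 50 ms p95 budget, the run count, and the warm-up count are assumed example values.

```python
import statistics
import time
from typing import Callable


def latency_percentile_ms(fn: Callable[[], object], runs: int = 200,
                          percentile: int = 95, warmup: int = 10) -> float:
    """Call `fn` repeatedly and return the given latency percentile in milliseconds."""
    for _ in range(warmup):  # let caches and lazy initialisation settle first
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) returns 99 cut points; index p-1 is the p-th percentile
    return statistics.quantiles(timings, n=100)[percentile - 1]


def assert_latency_budget(fn: Callable[[], object], budget_ms: float = 50.0) -> None:
    p95 = latency_percentile_ms(fn)
    assert p95 < budget_ms, f"p95 latency {p95:.1f} ms exceeds {budget_ms} ms"
```

In a test this becomes `assert_latency_budget(lambda: model.predict(X))`. CI runners are noisy, so budgets should leave generous headroom.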

Backend integration

Pre-deployment validation gate

# .github/workflows/validate-model.yml
name: Validate New Model

on:
  workflow_call:
    inputs:
      model_version:
        required: true
        type: string

jobs:
  validate:
    runs-on: ubuntu-latest
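    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      
      - name: Load model from MLflow
        run: |
          python scripts/load_model.py --version ${{ inputs.model_version }}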
      
      - name: Run validation tests
        run: |
          # --model-version is a custom option registered in conftest.py (pytest_addoption)
          pytest tests/model_validation/ -v --model-version=${{ inputs.model_version }}
      
      - name: Check business metrics
        run: |
          python scripts/business_validation.py
          # This script checks e.g. false positive rate, revenue impact, etc.
      
      - name: Compare with production
        run: |
          python scripts/compare_models.py \
            --new-version ${{ inputs.model_version }} \
            --prod-version $(python scripts/get_prod_version.py)
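What `compare_models.py` actually decides is not shown. A minimal sketch of such a promotion gate, assuming both models' metrics are available as dicts: the candidate may not lose more than a small tolerance on the main metric, and business guardrails such as the false positive rate must stay under hard ceilings. The metric names, tolerance, and ceilings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GateResult:
    promote: bool
    reasons: list[str] = field(default_factory=list)


def promotion_gate(new: dict[str, float], prod: dict[str, float],
                   primary: str = "accuracy", tolerance: float = 0.005,
                   ceilings: Optional[dict[str, float]] = None) -> GateResult:
    """Decide whether the candidate model may replace the production model."""
    ceilings = ceilings if ceilings is not None else {"false_positive_rate": 0.05}
    reasons: list[str] = []

    if new[primary] < prod[primary] - tolerance:
        reasons.append(f"{primary} dropped from {prod[primary]:.3f} to {new[primary]:.3f}")
    for metric, limit in ceilings.items():
        # A missing guardrail metric counts as a failure
        if new.get(metric, float("inf")) > limit:
            reasons.append(f"{metric} above ceiling {limit}")
    return GateResult(promote=not reasons, reasons=reasons)
```

In the workflow, the script would print `reasons` and exit non-zero when `promote` is false, which fails the job and blocks deployment.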

Rollback workflow

# .github/workflows/rollback.yml
name: Emergency Rollback

on:
  workflow_dispatch:
    inputs:
      target_version:
        description: "Image tag to roll back to (e.g. a previous commit SHA)"
        required: true

jobs:
  rollback:
    runs-on: ubuntu-latest
    steps:
      - uses: azure/setup-kubectl@v3
      
      - name: Configure kubectl
        run: |
          mkdir -p ~/.kube
          echo "${{ secrets.KUBE_CONFIG_PROD }}" | base64 -d > ~/.kube/config
      
      - name: Rollback deployment
        run: |
          kubectl set image deployment/ml-api api=myregistry/ml-api:${{ inputs.target_version }} -n production
          kubectl rollout status deployment/ml-api -n production
      
      - name: Notify
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "🚨 Production rollback to ${{ inputs.target_version }}"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
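The rollback workflow expects a person to supply `target_version`. A sketch of how that value could be picked automatically: keep an ordered release history and choose the most recent release before the current one that passed its checks. The `Release` history format is an assumption. For simple cases, `kubectl rollout undo deployment/ml-api -n production` returns the Deployment to its previous revision without any history of your own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Release:
    version: str   # image tag, e.g. a commit SHA
    healthy: bool  # did this release pass its post-deploy checks?


def rollback_target(history: list[Release], current: str) -> Optional[str]:
    """Most recent healthy release older than `current` (history is oldest first)."""
    versions = [r.version for r in history]
    if current not in versions:
        raise ValueError(f"{current} not found in release history")
    older = history[: versions.index(current)]
    for release in reversed(older):
        if release.healthy:
            return release.version
    return None  # nothing safe to roll back to: escalate to a human
```

Skipping releases marked unhealthy avoids rolling back onto a version that was itself broken.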

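Resources

GitHub Actions docs: docs.github.com/en/actions
CML (Continuous Machine Learning): cml.dev
“Continuous Delivery for Machine Learning” (CD4ML): Danilo Sato, Arif Wider, Christoph Windheuser, on martinfowler.com
“The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction”: Breck et al., Google (testing rubric)
Great Expectations: data testing framework
Pytest docs: testing best practices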

🏋️ Exercises

🟢 Easy

A GitHub Actions pipeline that runs pytest.
Code quality (ruff, mypy) checks.
Docker build action.

🟡 Medium

Full ML pipeline: lint → test → train → docker → deploy (staging).
CML report: automatic metrics comparison posted to the PR.
Model validation: accuracy, latency, robustness tests.

🔴 Hard

Production CI/CD: blue-green or canary deployment (on a real cloud).
Continuous Training: drift detection → auto-retrain → auto-deploy (with approval).
Multi-environment: dev/staging/prod, each with its own config.

Capstone

.github/workflows/:

A complete ML CI/CD pipeline
Code → data → model tests
Build → deploy → integration tests
Production deployment with manual approval

✅ Checklist

I know what is specific about CI/CD for ML
Code, data, model testing
I can write an ML pipeline in GitHub Actions
PR reports with CML
Deployment strategies (blue-green, canary, shadow)
Continuous Training pipeline
Rollback mechanism

Next, on to Airflow and Prefect in the final chapter.


Backend to ML: 6-Month Roadmap