Month 2: Exercise Set
🟢 Easy
Algorithms
- Train 3 different models on each of load_iris(), load_wine() and load_breast_cancer(), and compare their accuracy.
- Evaluate LogisticRegression, KNN, SVM, DecisionTree and RandomForest, all with cross_val_score.
- For each model, determine whether feature scaling is needed (compare with and without Pipeline + StandardScaler).
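Here is a minimal sketch of the second and third tasks for one dataset. It assumes scikit-learn is installed. load_wine is used because its features sit on very different scales, and the same loop works for the other two loaders. KNN and SVM are distance- or margin-based, so they usually benefit from scaling. Tree-based models split on one feature at a time and should be roughly unaffected. The printed scores are what actually answers the question.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

models = {
    "LogisticRegression": LogisticRegression(max_iter=5000),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # Same model, once on raw features and once after standardization.
    # cross_val_score clones the estimator, so reusing `model` in both calls is safe.
    raw = cross_val_score(model, X, y, cv=5).mean()
    scaled = cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=5).mean()
    results[name] = (raw, scaled)
    print(f"{name:20s} raw={raw:.3f} scaled={scaled:.3f}")
```

The pipeline form matters: the scaler is fitted inside each CV fold, so no statistics leak from the validation fold.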
Metrics
- Compute the confusion matrix by hand and check it against sklearn.metrics.confusion_matrix.
- Compute Precision, Recall and F1 by hand from their formulas.
- Show the difference between predict and predict_proba, then change the decision threshold and observe how accuracy changes.
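Here is a pure-Python sketch of the three metric tasks on made-up labels and probabilities. y_proba plays the role of predict_proba(X)[:, 1].

- predict_proba returns a probability per class, while predict returns the hard label.
- For LogisticRegression, predict corresponds to a 0.5 cut on the positive-class probability. Moving the threshold therefore changes the confusion matrix and every metric derived from it.
- To check the hand-computed counts, compare confusion(...) with sklearn.metrics.confusion_matrix(y_true, y_pred).ravel(). It uses the same (tn, fp, fn, tp) order.

```python
# Made-up labels and positive-class probabilities for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_proba = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.55, 0.2, 0.7, 0.45]


def predict(probas, threshold=0.5):
    """Turn positive-class probabilities into 0/1 labels using a threshold."""
    return [1 if p >= threshold else 0 for p in probas]


def confusion(y_true, y_pred):
    """Return (tn, fp, fn, tp), the order of sklearn's confusion_matrix(...).ravel()."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall_f1(tn, fp, fn, tp):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(tn, fp, fn, tp):
    return (tp + tn) / (tn + fp + fn + tp)


for threshold in (0.3, 0.5, 0.7):
    cm = confusion(y_true, predict(y_proba, threshold))
    p, r, f = precision_recall_f1(*cm)
    print(f"t={threshold}: (tn, fp, fn, tp)={cm} acc={accuracy(*cm):.2f} "
          f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Lowering the threshold trades precision for recall, and the accuracy at each threshold shows the effect on this toy data.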
Pipeline
- Build a Pipeline with a scaler step and a model step, then call fit and predict on it.
- Process numeric and categorical columns separately with ColumnTransformer.
- Save the Pipeline with joblib.dump and load it back with joblib.load.
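Here is a minimal sketch covering all three Pipeline tasks. It assumes scikit-learn and pandas are installed; joblib ships as a scikit-learn dependency. The column names and values are hypothetical toy data. A classifier pipeline exposes fit and predict. fit_predict is only available when the last step provides it, for example a clusterer, so this sketch uses fit followed by predict.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns and one categorical column.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 28, 63, 39, 44],
    "monthly_charges": [20.0, 75.5, 90.1, 30.2, 55.0, 99.9, 65.3, 40.0],
    "contract": ["monthly", "yearly", "monthly", "two_year",
                 "monthly", "monthly", "yearly", "two_year"],
})
y = [0, 0, 1, 0, 0, 1, 1, 0]

preprocess = ColumnTransformer([
    # Numeric columns are standardized...
    ("num", StandardScaler(), ["age", "monthly_charges"]),
    # ...categorical columns are one-hot encoded; unseen categories become all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contract"]),
])

pipe = Pipeline([("preprocess", preprocess), ("model", LogisticRegression())])
pipe.fit(df, y)
preds = pipe.predict(df)

# Save the whole fitted pipeline (preprocessing + model) and load it back.
path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipe, path)
restored = joblib.load(path)
print(list(restored.predict(df)) == list(preds))
```

Saving the full pipeline rather than the bare model means the serving code cannot forget a preprocessing step.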
🟡 Medium
Real datasets
- Titanic: reach 80%+ accuracy with Pipeline + Random Forest.
- House Prices: compare Lasso and Ridge and reach an R² of 0.85+.
- Telco Churn: deal with the imbalanced data and reach an F1 of 0.6+.
- Wine Quality: compare a regression approach with a classification approach.
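One common way to handle the Telco Churn imbalance is class_weight="balanced", which reweights the loss so the minority class counts more. This is sketched with scikit-learn on synthetic data: make_classification with about 10% positives is an assumed stand-in for the real dataset. Resampling (for example SMOTE from imbalanced-learn) and threshold tuning are alternatives worth comparing on the same F1 metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for churn data: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0  # stratify keeps the 90/10 ratio in both splits
)

results = {}
for name, weight in [("plain", None), ("balanced", "balanced")]:
    model = LogisticRegression(class_weight=weight, max_iter=1000).fit(X_train, y_train)
    results[name] = f1_score(y_test, model.predict(X_test))
    print(f"{name:9s} F1={results[name]:.3f}")
```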
Feature Engineering
- NYC Taxi: create 10+ features from the datetime column and check how much the Random Forest's score improves.
- Text feature engineering: enrich one categorical (text) column with n-grams.
- Polynomial features: experiment with degree=2 and watch for overfitting.
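Here is a sketch of datetime feature extraction with pandas. It uses four hypothetical timestamps standing in for the NYC Taxi pickup time column. The rush-hour and night windows are illustrative assumptions, not definitions from the dataset. The sin/cos pair lets a model see that 23:00 and 00:00 are neighbours.

```python
import numpy as np
import pandas as pd

# Hypothetical pickup timestamps standing in for the dataset's datetime column.
trips = pd.DataFrame({"pickup_datetime": pd.to_datetime([
    "2016-01-01 00:15:00", "2016-03-14 08:45:00",
    "2016-07-04 17:30:00", "2016-12-24 23:05:00",
])})


def add_datetime_features(df, col):
    """Derive calendar and cyclical features from one datetime column (returns a copy)."""
    out = df.copy()
    dt = out[col].dt
    out["hour"] = dt.hour
    out["minute"] = dt.minute
    out["day"] = dt.day
    out["weekday"] = dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = dt.month
    out["quarter"] = dt.quarter
    out["day_of_year"] = dt.dayofyear
    out["week_of_year"] = dt.isocalendar().week.astype(int)
    out["is_weekend"] = (dt.dayofweek >= 5).astype(int)
    # Assumed windows, not official definitions.
    out["is_rush_hour"] = dt.hour.isin([7, 8, 9, 16, 17, 18, 19]).astype(int)
    out["is_night"] = ((dt.hour >= 22) | (dt.hour < 6)).astype(int)
    # Cyclical encoding so that 23:00 and 00:00 end up close together.
    out["hour_sin"] = np.sin(2 * np.pi * dt.hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * dt.hour / 24)
    return out


features = add_datetime_features(trips, "pickup_datetime")
print(features.drop(columns="pickup_datetime"))
```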
Hyperparameter Tuning
- GridSearchCV over 3 XGBoost parameters (about 100 combinations): how long does it take?
- The same search with RandomizedSearchCV: how do time and quality differ?
- 100 trials with Optuna: is it both the best and the fastest?
Ensembles
- RF vs XGBoost vs LightGBM vs CatBoost: compare them on the same dataset (as a table).
- Voting Classifier (3 models): does it beat each model on its own?
- Stacking: build base models plus a meta-model.
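Here is a sketch of the voting and stacking tasks with scikit-learn on load_breast_cancer. The three base models are an arbitrary, hypothetical choice.

- Soft voting averages the base models' predict_proba.
- StackingClassifier trains a logistic-regression meta-model on out-of-fold base predictions.

Whether either ensemble beats the best single model depends on the data, so compare the printed means rather than assuming it.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

base = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]

candidates = dict(base)
# Soft voting: average the three models' predicted probabilities.
candidates["voting"] = VotingClassifier(estimators=base, voting="soft")
# Stacking: out-of-fold base predictions become features for a meta-model.
candidates["stacking"] = StackingClassifier(
    estimators=base, final_estimator=LogisticRegression(), cv=5
)

scores = {name: cross_val_score(est, X, y, cv=5).mean() for name, est in candidates.items()}
for name, score in scores.items():
    print(f"{name:10s} {score:.4f}")
```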
🔴 Hard (Production)
1. Churn Prediction Service
Full requirements:
- Django REST Framework or FastAPI
- A customer table in PostgreSQL (50+ features)
- /api/v1/predict/churn/{customer_id}: fetch the features from the DB and return a prediction
- /api/v1/predict/churn/batch: CSV upload processed in the background with Celery
- /api/v1/feedback: submit the real outcome (for model improvement)
- /api/v1/metrics: Prometheus format
- Docker + docker-compose
- GitHub Actions CI/CD
2. AutoML Service
Upload a dataset and automatically get:
- An EDA report (ydata-profiling)
- A comparison of 5+ algorithms
- The best model saved
- A prediction endpoint, ready automatically
Inspiration: H2O AutoML, PyCaret.
3. A/B Testing Backend
- Serve two models (v1 and v2)
- Random traffic split (60/40 or configurable)
- Log every prediction to Postgres
- Automatically determine which model is better with a statistical test
- Slack notification: "Model v2 wins!"
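Here is a dependency-free sketch of two pieces of the A/B backend: random 60/40 routing and a two-proportion z-test for picking a winner. Several parts are assumptions:

- The success counts are hypothetical, for example predictions later confirmed correct through /api/v1/feedback.
- The 0.05 significance level is an assumed choice.
- In the real service, the counts would come from the prediction log in Postgres, and the print would be the Slack call.

```python
import math
import random


def assign_variant(rng, share_v1=0.6):
    """Randomly route one request: v1 with probability share_v1, otherwise v2."""
    return "v1" if rng.random() < share_v1 else "v2"


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two success rates (pooled variance)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, written with math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


rng = random.Random(42)
routed = [assign_variant(rng) for _ in range(10_000)]
observed_share = routed.count("v1") / len(routed)
print("v1 share:", observed_share)

# Hypothetical outcome counts per model version (6000 requests to v1, 4000 to v2).
z, p = two_proportion_z_test(successes_a=4_800, n_a=6_000, successes_b=3_300, n_b=4_000)
if p < 0.05 and z > 0:
    print(f"Model v2 wins! (z={z:.2f}, p={p:.4f})")
```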
4. Real-time Anomaly Detection
- Kafka consumer (transaction stream)
- Online anomaly detection with IsolationForest or DBSCAN
- Send anomalies to a separate Kafka topic
- Grafana dashboard
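Here is a sketch of the detection step only, assuming scikit-learn and NumPy. IsolationForest has no incremental partial_fit. A common pattern, assumed here, is to fit on a recent reference window and score each incoming micro-batch. The feature columns, the contamination value and the plain array standing in for a Kafka message batch are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical reference window of normal transactions:
# [amount, seconds since the customer's previous transaction].
history = np.column_stack([rng.normal(50, 10, 1000), rng.normal(3600, 600, 1000)])

detector = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
detector.fit(history)

# A micro-batch as it might arrive from the stream: two normal rows, one suspicious.
batch = np.array([[52.0, 3500.0], [47.0, 3700.0], [5000.0, 5.0]])
labels = detector.predict(batch)  # +1 = normal, -1 = anomaly
anomalies = batch[labels == -1]   # rows that would be published to the anomaly topic
print(labels, anomalies)
```

Refitting the detector on a sliding window, for example hourly, is one way to keep it close to the current traffic.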
Mini-projects
Mini-project 1: Spam Classifier
- SMS Spam dataset (UCI)
- TF-IDF + Logistic Regression / Naive Bayes
- FastAPI endpoint
- Streamlit UI
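Here is a minimal sketch of the model part of the spam classifier, assuming scikit-learn. The six messages are made up and stand in for the UCI SMS Spam Collection. TF-IDF with unigrams and bigrams feeds MultinomialNB. Swapping LogisticRegression in as the last step lets you compare the two.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up stand-in for the SMS Spam dataset: (label, text) pairs.
messages = [
    ("spam", "WINNER! Claim your free prize now, call 0800 123 456"),
    ("ham", "Are we still meeting for lunch tomorrow?"),
    ("spam", "Free entry to win cash, text WIN to 80086 now"),
    ("ham", "Can you send me the notes from today's lecture?"),
    ("spam", "Urgent! Your account has won a free holiday, claim now"),
    ("ham", "I'll be home late, start dinner without me"),
]
labels = [label for label, _ in messages]
texts = [text for _, text in messages]

# Unigram + bigram TF-IDF features feeding a multinomial Naive Bayes classifier.
clf = make_pipeline(TfidfVectorizer(lowercase=True, ngram_range=(1, 2)), MultinomialNB())
clf.fit(texts, labels)
pred = clf.predict(["Claim your free cash prize now", "See you at lunch tomorrow"])
print(pred)
```

The fitted pipeline is what the FastAPI endpoint and the Streamlit UI would load and call.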
Mini-project 2: Stock Price Direction
- Stock data via yfinance
- Feature engineering with technical indicators (RSI, MACD)
- Up/Down classification
- Backtesting
Mini-project 3: Recommendation System (Collaborative Filtering)
- MovieLens dataset
- Surprise library
- User-based and item-based
- API: /recommend/{user_id}
Mini-project 4: Time Series Forecasting
- Prophet or ARIMA
- Daily sales forecasting
- 30-day forecast
Quiz
ML Fundamentals
- What is the difference between supervised and unsupervised learning?
- Explain the bias-variance tradeoff with an example.
- How do you detect overfitting?
- Why is cross-validation needed?
- Why split the data into three sets (train/val/test)?
Algorithms
- Why does Logistic Regression have "regression" in its name? (Hint: log-odds)
- What does the k parameter affect in KNN?
- How do Random Forest and Gradient Boosting differ (parallel vs sequential)?
- What is the main difference between XGBoost and LightGBM?
- Why is CatBoost's handling of categorical features considered better?
Metrics
- Why is accuracy a poor metric for imbalanced classification?
- When do ROC-AUC and PR-AUC differ?
- What is the difference between F1 and F-beta?
- In regression, when do you use MAE and when MSE?
- Can R² be negative? Why?
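Here is a small pure-Python check for the last two questions, on made-up numbers. R² = 1 - SS_res / SS_tot compares the model against always predicting the mean of y_true. A model that does worse than that baseline therefore gets a negative score. MSE squares the errors, so a single large miss dominates it far more than it dominates MAE.

```python
def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, with the mean of y_true as the baseline."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
predictions = {
    "good": [3.5, 4.5, 7.5, 8.5],            # small errors everywhere
    "one_outlier": [3.0, 5.0, 7.0, 1.0],     # perfect except one big miss
    "predict_mean": [6.0, 6.0, 6.0, 6.0],    # the R² baseline itself
    "reversed": [9.0, 7.0, 5.0, 3.0],        # worse than the mean baseline
}
for name, y_pred in predictions.items():
    print(f"{name:13s} MAE={mae(y_true, y_pred):.2f} "
          f"MSE={mse(y_true, y_pred):.2f} R2={r2(y_true, y_pred):.2f}")
```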
Production
- What is the difference between joblib and pickle?
- How do you package an ML model in Docker?
- What is model drift and how is it detected? (preview of Month 6)
- Why is ONNX useful?
- What does statistical significance mean in A/B testing?
✅ End of Month 2 checklist
- I have tried most of the classical ML algorithms
- I have mastered scikit-learn Pipeline and ColumnTransformer
- I have worked with XGBoost/LightGBM (at least 1 competition)
- I have tuned hyperparameters with Optuna
- I have interpreted a model with SHAP or feature importance
- I have deployed an ML model to production with FastAPI
- I have made my first Kaggle submission (top 30%)
- I have published a capstone project on GitHub
- I have posted on LinkedIn (project + certificate)
Congratulations! Next up: Month 3, Deep Learning.