Introduction to Computer Vision
🎯 Goals
After reading this chapter, you will:
- Know the 5 main types of Computer Vision tasks
- Be able to choose suitable pretrained models for each task
- Understand the core concepts needed to work with images and video
- Be able to plan a CV pipeline that fits your domain
What to learn
- CV task types: classification, detection, segmentation, OCR, pose, generation
- Image fundamentals: pixels, channels, color spaces (RGB, BGR, HSV, Grayscale)
- Image formats: JPEG, PNG, WebP, TIFF
- The pretrained CV ecosystem: torchvision, timm, MMDetection, Detectron2, Ultralytics
- Edge cases: rotation, occlusion, lighting, scale
The 5 main types of CV tasks
1. Image Classification
- One image → one label (or several labels, in the multi-label case)
- Models: ResNet, EfficientNet, ViT, ConvNeXt
- Examples: spam image detection, disease type, product category
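To make the single-label versus multi-label distinction concrete, here is a minimal pure-Python sketch of how raw model scores (logits) are commonly turned into predictions. The label names and logit values are invented for illustration; in practice a model such as ResNet would supply the logits.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (single-label case)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Independent per-label probability (multi-label case)."""
    return 1.0 / (1.0 + math.exp(-x))

def top_k(labels, logits, k=1):
    """Return the k most likely (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

def multi_label(labels, logits, threshold=0.5):
    """Return every label whose sigmoid probability reaches the threshold."""
    return [lab for lab, x in zip(labels, logits) if sigmoid(x) >= threshold]

# Hypothetical labels and logits, for illustration only.
LABELS = ["electronics", "clothing", "food"]
LOGITS = [2.0, 0.5, -1.0]

best = top_k(LABELS, LOGITS, k=1)   # single-label: one winner
tags = multi_label(LABELS, LOGITS)  # multi-label: zero or more tags
```

Softmax makes the classes compete for one label, while per-label sigmoids let several labels be active at once. The 0.5 threshold is an assumption and is usually tuned per label.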
2. Object Detection
- One image → several bounding boxes, each with a label and a confidence score
- Models: YOLO, Faster R-CNN, DETR
- Examples: counting cars, spotting security threats
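A detection result can be handled as plain data. The sketch below assumes a hypothetical `Detection` record with corner coordinates (real libraries each use their own result types). It shows two common steps: Intersection over Union (IoU) between boxes, and counting confident detections of one class.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    # Box corners in pixels: (x1, y1) is top-left, (x2, y2) is bottom-right.
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    confidence: float

def iou(a: Detection, b: Detection) -> float:
    """Intersection over Union of two axis-aligned boxes, in [0, 1]."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def count_label(detections, label, min_confidence=0.5):
    """Count detections of one class at or above a confidence threshold."""
    return sum(
        1 for d in detections
        if d.label == label and d.confidence >= min_confidence
    )

# Made-up detections, for illustration only.
detections = [
    Detection(0, 0, 10, 10, "car", 0.92),
    Detection(5, 5, 15, 15, "car", 0.40),
    Detection(20, 20, 30, 30, "person", 0.85),
]
confident_cars = count_label(detections, "car")
```

IoU is also the usual basis for removing duplicate boxes (non-maximum suppression) and for scoring detectors against ground truth.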
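3. Semantic / Instance / Panoptic Segmentation
- Pixel-level classification: semantic segmentation assigns a class to every pixel, instance segmentation separates individual objects, and panoptic segmentation combines the two
- Models: U-Net, Mask R-CNN, SAM (Segment Anything Model)
- Examples: medical imaging, satellite analysis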
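"Pixel-level classification" can be illustrated without any model: given per-pixel class scores, the semantic mask is the argmax at each pixel, and a mask is compared with ground truth using per-class IoU. This is a toy sketch on nested lists with invented scores. Real pipelines would use array libraries.

```python
def argmax_mask(scores):
    """scores[y][x] is a list of per-class scores; return a mask of class ids."""
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in scores
    ]

def mask_iou(pred, target, class_id):
    """IoU of one class between two masks of identical shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == class_id
            in_target = t == class_id
            inter += int(in_pred and in_target)
            union += int(in_pred or in_target)
    return inter / union if union else 0.0

# A 2x2 "image" with 2 classes (0 = background, 1 = object); values invented.
scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.3, 0.7]],
]
predicted = argmax_mask(scores)  # one class id per pixel
ground_truth = [[0, 1], [1, 1]]
object_iou = mask_iou(predicted, ground_truth, class_id=1)
```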
4. OCR (Optical Character Recognition)
- Image → text
- Models and tools: Tesseract, EasyOCR, PaddleOCR, TrOCR
- Examples: ID cards, documents, receipts
5. Pose / Keypoint Estimation
- Locating keypoints on a human body or an object
- Models: MediaPipe, OpenPose, MMPose
- Examples: sports analytics, AR filters
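Once a pose model returns keypoints, much of the downstream logic (for example in sports analytics) is plain geometry. The sketch below computes the angle at a joint from three 2D keypoints. The coordinates are hypothetical pixel positions, not output from any specific library.

```python
import math

def joint_angle(a, b, c):
    """Angle at keypoint b, in degrees, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must be distinct")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))

# Hypothetical (x, y) pixel coordinates for one arm.
shoulder, elbow, wrist = (100, 100), (150, 100), (150, 50)
elbow_angle = joint_angle(shoulder, elbow, wrist)
```

An angle near 180 degrees means a straight limb. Tracking it over frames gives a simple signal for counting repetitions or checking form.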
6. Generative (bonus)
- Creating or modifying images
- Models: Stable Diffusion, DALL-E, ControlNet
- Examples: marketing assets, design tools
Image fundamentals
Pixels and channels
RGB image (3 channels):
  shape = (height, width, 3)
  each pixel: [R, G, B] values, each in [0..255] (uint8) or [0..1] (float)
Grayscale (1 channel):
  shape = (height, width)
  each pixel: [0..255] (brightness level)
OpenCV reads images as BGR, not RGB!
PIL and torchvision use RGB.
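The shapes above describe a row-major, channel-interleaved buffer, which is how a C-contiguous (height, width, 3) array is laid out in memory. The pure-Python sketch below uses a flat `bytes` object to show pixel addressing, the BGR to RGB swap, and uint8 to float scaling. The helper names are made up for this example.

```python
def pixel_offset(x, y, width, channels=3):
    """Offset of pixel (x, y) in a row-major, interleaved uint8 buffer."""
    return (y * width + x) * channels

def get_pixel(buf, x, y, width, channels=3):
    """Return the channel values of pixel (x, y) as a tuple."""
    start = pixel_offset(x, y, width, channels)
    return tuple(buf[start:start + channels])

def swap_bgr_rgb(buf, channels=3):
    """Reverse the channel order of every pixel (BGR <-> RGB)."""
    out = bytearray(len(buf))
    for i in range(0, len(buf), channels):
        out[i:i + channels] = buf[i:i + channels][::-1]
    return bytes(out)

def to_float(buf):
    """Scale uint8 values [0..255] to floats in [0..1]."""
    return [v / 255.0 for v in buf]

# A 2x2 image stored as BGR: row 0 is blue, red; row 1 is green, white.
WIDTH, HEIGHT = 2, 2
bgr = bytes([255, 0, 0,   0, 0, 255,
             0, 255, 0,   255, 255, 255])
rgb = swap_bgr_rgb(bgr)
red_pixel = get_pixel(rgb, x=1, y=0, width=WIDTH)  # red reads as (255, 0, 0) in RGB
```

Skipping the swap is a classic source of blue-tinted faces when OpenCV output is passed to a library that expects RGB.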
Color spaces
| Space | Channels | When to use |
|---|---|---|
| RGB | Red, Green, Blue | Default for display |
| BGR | Blue, Green, Red | OpenCV default |
| Grayscale | Brightness | Edge detection, classification (small models) |
| HSV | Hue, Saturation, Value | Color-based filtering |
| YCrCb | Luma (Y), red and blue chroma (Cr, Cb) | Video compression |
| LAB | Lightness, A, B | Color-aware processing |
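To show why HSV suits color-based filtering, the sketch below converts single RGB pixels with the standard-library `colorsys` module and rescales them to OpenCV's 8-bit convention (H in 0..179, S and V in 0..255). It also includes the luma-weighted grayscale formula. Results should be close to what `cv2.cvtColor` produces, though rounding may differ by one. The thresholds in `in_hue_range` are illustrative assumptions.

```python
import colorsys

def rgb_to_opencv_hsv(r, g, b):
    """Convert one 8-bit RGB pixel to OpenCV-style 8-bit HSV."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 180) % 180, round(s * 255), round(v * 255)

def rgb_to_gray(r, g, b):
    """Luma-weighted grayscale value (BT.601 weights)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def in_hue_range(pixel_rgb, h_min, h_max, s_min=50, v_min=50):
    """True if the pixel's hue is within [h_min, h_max] and it is not too gray or dark."""
    h, s, v = rgb_to_opencv_hsv(*pixel_rgb)
    return h_min <= h <= h_max and s >= s_min and v >= v_min

pure_green = (0, 255, 0)
green_hsv = rgb_to_opencv_hsv(*pure_green)
is_greenish = in_hue_range(pure_green, h_min=35, h_max=85)
```

In HSV a color family becomes a simple hue interval, whereas in RGB the same family is spread across all three channels and shifts with lighting.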
Image formats: which one, when?
| Format | Lossy? | Transparency | Use case |
|---|---|---|---|
| JPEG | Yes | No | Photos, web (small files) |
| PNG | No | Yes | Logos, screenshots |
| WebP | Both | Yes | Web (modern, small files) |
| TIFF | No (or Yes) | Yes | Print, scientific |
| HEIC | Yes | Yes | iPhone photos |
| NPY | No | N/A | ML pipelines (raw arrays) |
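File extensions and client-supplied content types can be wrong, so pipelines often check the leading "magic bytes" of a file instead. The sketch below recognizes most formats from the table by their well-known signatures. HEIC is left out because detecting it requires parsing the `ftyp` box. The function name is hypothetical.

```python
def sniff_image_format(data: bytes):
    """Guess the file format from its leading bytes, or return None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    # WebP is a RIFF container: "RIFF" + 4-byte size + "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "WebP"
    # Classic TIFF: little-endian ("II") or big-endian ("MM") header
    if data[:4] in (b"II*\x00", b"MM\x00*"):
        return "TIFF"
    if data.startswith(b"\x93NUMPY"):
        return "NPY"
    return None

# Only the first few bytes are needed, so this is cheap to run on uploads.
header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
detected = sniff_image_format(header)
```

A matching signature only says what the file claims to be. Full decoding can still fail on truncated or corrupt data.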
Core libraries
pip install opencv-python pillow numpy matplotlib
pip install torch torchvision timm
pip install ultralytics # YOLO
pip install easyocr paddleocr # OCR
pip install mediapipe # Pose, hand tracking
pip install albumentations # Augmentation
Code examples
Loading and inspecting an image
import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# OpenCV loads pixels in BGR order
img_cv = cv2.imread("photo.jpg")
if img_cv is None:  # cv2.imread returns None instead of raising on a bad path
    raise FileNotFoundError("photo.jpg could not be read")
print(img_cv.shape)  # (H, W, 3)
print(img_cv.dtype)  # uint8
img_rgb = cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB)

# PIL loads pixels in RGB order
img_pil = Image.open("photo.jpg")
print(img_pil.size)  # (W, H); note that the order differs from OpenCV's shape

# matplotlib expects RGB
plt.imshow(img_rgb)
plt.axis("off")
plt.show()
Basic image operations
# Resize: the target size is given as (width, height)
resized = cv2.resize(img_cv, (224, 224))

# Crop: NumPy slicing is rows first, then columns
cropped = img_cv[100:400, 200:500]  # [y1:y2, x1:x2]

# Rotation: positive angles rotate counter-clockwise around the center
h, w = img_cv.shape[:2]
M = cv2.getRotationMatrix2D((w/2, h/2), angle=45, scale=1.0)
rotated = cv2.warpAffine(img_cv, M, (w, h))

# Flip
flipped = cv2.flip(img_cv, 1)  # 1=horizontal, 0=vertical, -1=both

# Color conversion
gray = cv2.cvtColor(img_cv, cv2.COLOR_BGR2GRAY)
hsv = cv2.cvtColor(img_cv, cv2.COLOR_BGR2HSV)
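To see what the rotation step computes, the sketch below rebuilds the 2x3 affine matrix in the form given in the OpenCV documentation for `cv2.getRotationMatrix2D` and applies it to single points. It uses only `math`, and the 640x480 image size is an assumption for the example.

```python
import math

def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 affine matrix: rotate by angle_deg around center, then scale."""
    cx, cy = center
    theta = math.radians(angle_deg)
    alpha = scale * math.cos(theta)
    beta = scale * math.sin(theta)
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]

def apply_affine(matrix, point):
    """Map one (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )

# Assumed image size for illustration: 640x480, rotated 45 degrees.
w, h = 640, 480
M = rotation_matrix_2d((w / 2, h / 2), angle_deg=45)
center_after = apply_affine(M, (w / 2, h / 2))  # the center stays fixed
corner_after = apply_affine(M, (0, 0))          # the top-left corner moves
```

Here `corner_after` has a negative x coordinate, which is why warping into the original (w, h) canvas cuts off the corners of a rotated image. A larger output canvas or a smaller `scale` avoids that.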
Choosing a CV pipeline: decision tree
What is your task?
│
├── "What is in this image?"
│   → Image Classification (ResNet/EfficientNet/ViT)
│
├── "What is where in the image?"
│   → Object Detection (YOLO, Faster R-CNN)
│
├── "Which object does each pixel belong to?"
│   → Segmentation (U-Net, SAM)
│
├── "What text is written in this image?"
│   → OCR (Tesseract, EasyOCR, PaddleOCR)
│
├── "Find keypoints on a person"
│   → Pose Estimation (MediaPipe, OpenPose)
│
└── "Create or modify an image"
    → Generative (Stable Diffusion)
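In a service, the same decision tree often ends up as a small lookup table that routes a request to a model family. This is a minimal sketch. The task keys and the function name are hypothetical, and the model lists simply mirror the tree above.

```python
# Task key -> candidate model families, mirroring the decision tree.
TASK_TO_MODELS = {
    "classification": ["ResNet", "EfficientNet", "ViT"],
    "detection": ["YOLO", "Faster R-CNN"],
    "segmentation": ["U-Net", "SAM"],
    "ocr": ["Tesseract", "EasyOCR", "PaddleOCR"],
    "pose": ["MediaPipe", "OpenPose"],
    "generation": ["Stable Diffusion"],
}

def candidate_models(task: str):
    """Return candidate model families for a task, or raise on unknown tasks."""
    key = task.strip().lower()
    if key not in TASK_TO_MODELS:
        raise ValueError(
            f"Unknown task {task!r}; expected one of {sorted(TASK_TO_MODELS)}"
        )
    return TASK_TO_MODELS[key]

ocr_options = candidate_models(" OCR ")
```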
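Backend integration: common patterns
1. Image upload endpoint
from fastapi import FastAPI, HTTPException, UploadFile
from PIL import Image, UnidentifiedImageError
import io

app = FastAPI()

@app.post("/process-image")
async def process_image(file: UploadFile):
    # Validation: content_type is client-supplied and may be missing
    if not (file.content_type or "").startswith("image/"):
        raise HTTPException(status_code=400, detail="Not an image")
    # Read
    contents = await file.read()
    try:
        img = Image.open(io.BytesIO(contents))
    except UnidentifiedImageError:
        raise HTTPException(status_code=400, detail="Not an image")
    # Validate size before decoding the full image; img.size is (width, height)
    if img.size[0] > 4000 or img.size[1] > 4000:
        raise HTTPException(status_code=413, detail="Image too large")
    img = img.convert("RGB")
    # Process (CV pipeline)
    # ...
    return {"status": "ok", "size": img.size}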
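2. Loading an image from a URL
import httpx
from PIL import Image
import io

# Reuses the FastAPI `app` object from the upload example above
@app.post("/process-url")
async def process_url(url: str):
    async with httpx.AsyncClient(timeout=10) as client:
        response = await client.get(url)
        response.raise_for_status()  # fail on 4xx/5xx instead of decoding an error page
    img = Image.open(io.BytesIO(response.content)).convert("RGB")
    # ...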
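Fetching a user-supplied URL from the server can expose internal services (server-side request forgery), so a check before the request is a common precaution. The sketch below is a deliberately simple pre-filter using only the standard library. The function name is hypothetical. It does not resolve DNS, so a hostname that points to a private address would still pass.

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def is_probably_safe_url(url: str) -> bool:
    """Reject non-HTTP schemes, localhost, and literal non-public IP addresses."""
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme not in ALLOWED_SCHEMES or not host:
        return False
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A hostname rather than a literal IP; DNS is NOT checked here.
        return True
    return ip.is_global

ok = is_probably_safe_url("https://example.com/cat.jpg")
blocked = is_probably_safe_url("http://127.0.0.1/admin")
```

A handler could call this first and return a 400 response when it yields `False`. Production setups typically also limit redirects and response size.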
3. Stream/Video processing
import cv2

def process_video(video_path: str, output_path: str):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:  # end of stream or read error
            break
        # Apply model; some_model_inference is a placeholder that must
        # return a BGR frame of the same (w, h) size
        processed = some_model_inference(frame)
        out.write(processed)
    cap.release()
    out.release()
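Running a model on every frame is often unnecessary, so a common optimization is to process only every N-th frame. The sketch below computes that step from the source and target frame rates. The function names and the fallback for an unknown frame rate are assumptions (some sources may report an FPS of 0).

```python
def frame_step(source_fps: float, target_fps: float) -> int:
    """How many source frames to advance per processed frame (at least 1)."""
    if source_fps <= 0 or target_fps <= 0:
        return 1  # unknown rate: fall back to processing every frame
    return max(1, round(source_fps / target_fps))

def sampled_indices(total_frames: int, step: int):
    """Indices of the frames that would be sent to the model."""
    return list(range(0, total_frames, step))

# A 30 FPS video analyzed at about 5 FPS: every 6th frame is processed.
step = frame_step(source_fps=30.0, target_fps=5.0)
indices = sampled_indices(total_frames=20, step=step)
```

Inside a read loop like the one above, this would amount to keeping a frame counter and running inference only when `counter % step == 0`.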
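4. Async processing (Celery)
import uuid

import cv2
from celery import Celery
from fastapi import UploadFile

# Assumption: a local Redis broker; adjust the URL for your setup
celery_app = Celery("cv_tasks", broker="redis://localhost:6379/0")

@celery_app.task
def process_image_async(image_path: str):
    img = cv2.imread(image_path)
    # Heavy processing; run_yolo is a placeholder that returns an annotated image
    result = run_yolo(img)
    # Save result next to the input file
    output_path = image_path.replace(".jpg", "_processed.jpg")
    cv2.imwrite(output_path, result)
    return {"output": output_path}

# Reuses the FastAPI `app` object from the upload example above
@app.post("/process-async")
async def process_async(file: UploadFile):
    # Save uploaded file
    path = f"/tmp/{uuid.uuid4()}.jpg"
    with open(path, "wb") as f:
        f.write(await file.read())
    # Queue task and return immediately; the client polls or gets a callback later
    task = process_image_async.delay(path)
    return {"task_id": task.id}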
Resources
- PyImageSearch (pyimagesearch.com): a highly regarded, hands-on CV blog
- OpenCV docs (docs.opencv.org)
- CS231n (Stanford): CV theory
- Roboflow: datasets and training (no-code)
- MMDetection / Detectron2: production-grade detection frameworks
- HuggingFace Vision: pretrained vision models
🏋️ Exercises
🟢 Easy
- Load an image with both OpenCV and PIL, and print its shape and format.
- Convert RGB → Grayscale and RGB → HSV, then visualize the results.
- Resize an image to 224x224 and save it.
🟡 Medium
- Image gallery API: in FastAPI, implement image upload, thumbnail (200x200) creation, and EXIF metadata extraction.
- Color analysis: find the dominant colors of an image with K-Means (from Month 2).
- Pretrained classifier: produce top-5 predictions for an image with a torchvision model.
🔴 Hard
- CV Pipeline Service: FastAPI + Celery + Redis. Endpoints:
  - Upload image
  - Resize / convert format
  - Apply pretrained model (classification/detection)
  - Async processing with a webhook callback
- Real-time webcam: FastAPI WebSocket + browser webcam → YOLO on the server → return bounding boxes as JSON.
Capstone
notebooks/month-04/01_cv_intro.ipynb:
- Load a custom dataset (200+ images) or get one from Kaggle
- Run 5 different CV tasks on that single dataset:
  - Classification (pretrained)
  - Detection (YOLO)
  - Segmentation (SAM)
  - OCR (on images that contain text)
  - Pose (on images that contain people)
✅ Checklist
- I know the 5+ main CV tasks
- I know the differences between image formats and color spaces
- I know the difference between OpenCV and PIL
- I know when to use a pretrained model and which one to choose
- I can plan an async image processing pipeline
Next, we move on to working with OpenCV.