OpenAI and Anthropic APIs
🎯 Goal
After reading this chapter, you will:
- Know how to work with the OpenAI and Anthropic APIs
- Use streaming responses, function calling, and the vision APIs
- Know how to cut the cost of repeated input tokens by up to 90% with prompt caching
- Add retries, rate-limit handling, and error handling for production
What to learn
- OpenAI SDK: Python client
- Anthropic SDK: Python client
- Chat completions: the core API
- Streaming: real-time responses
- Function calling / tool use: structured actions
- Vision: working with images
- Embeddings: for semantic search
- Prompt caching (Anthropic): cached input tokens cost up to 90% less
- Batching: parallel async calls
- Rate limiting and retry strategies
- Token tracking and observability
Libraries
pip install openai anthropic
pip install instructor # structured output
pip install tenacity # retry logic
pip install backoff # exponential backoff
Code examples
OpenAI — basic chat
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # or os.getenv("OPENAI_API_KEY")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi! What is a list comprehension in Python?"},
    ],
    temperature=0.7,
    max_tokens=500,
)
print(response.choices[0].message.content)
print(f"Tokens: in={response.usage.prompt_tokens}, out={response.usage.completion_tokens}")
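The comment on the client constructor points at reading the key from the environment. A minimal sketch of that idea follows; `require_env` is a hypothetical helper name, not part of either SDK, and it only exists to fail early with a readable message when the variable is missing.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear error.

    Hypothetical helper for illustration; not part of the OpenAI or Anthropic SDKs.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this in place the client could be built as `OpenAI(api_key=require_env("OPENAI_API_KEY"))`, so a missing key surfaces at startup rather than on the first request.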
Anthropic — basic message
from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is a list comprehension in Python?"},
    ],
)
print(response.content[0].text)
print(f"Tokens: in={response.usage.input_tokens}, out={response.usage.output_tokens}")
Streaming — real-time
OpenAI streaming
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a long story"}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no choices or no text, so guard before printing
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
Anthropic streaming
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a long story"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
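Printing deltas is enough for a console demo, but most applications also need the full reply once the stream ends (for history or logging). The sketch below collects OpenAI-style chunks into one string. The `FakeChunk`, `FakeChoice`, and `FakeDelta` classes are stand-ins invented here so the example runs without a network call; they only mimic the `chunk.choices[0].delta.content` shape used above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class FakeDelta:  # stand-in for the SDK's delta object (assumption for the demo)
    content: Optional[str]


@dataclass
class FakeChoice:
    delta: FakeDelta


@dataclass
class FakeChunk:
    choices: list


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield only the non-empty text pieces from OpenAI-style stream chunks."""
    for chunk in chunks:
        if not chunk.choices:  # a chunk may arrive without any choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            yield piece


def collect_text(chunks: Iterable) -> str:
    """Join all streamed pieces into the complete reply."""
    return "".join(iter_text(chunks))


fake_stream = [
    FakeChunk([FakeChoice(FakeDelta(None))]),  # chunk with no text
    FakeChunk([FakeChoice(FakeDelta("Hel"))]),
    FakeChunk([FakeChoice(FakeDelta("lo"))]),
    FakeChunk([]),  # chunk with no choices
]
full_text = collect_text(fake_stream)
```

In a real handler the same `iter_text` generator could both print each piece and feed the accumulator, so the stream is consumed only once.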
Function Calling / Tool Use
OpenAI function calling
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Returns the weather for the given city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the weather like in Tashkent?"}],
    tools=tools,
)
# Execute the tool call (get_weather is your own function; it is not defined here)
message = response.choices[0].message
if message.tool_calls:  # the model may answer directly without calling a tool
    tool_call = message.tool_calls[0]
    if tool_call.function.name == "get_weather":
        args = json.loads(tool_call.function.arguments)
        weather = get_weather(args["city"], args.get("unit", "celsius"))
        # Send the result back to the LLM
        response2 = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "user", "content": "What is the weather like in Tashkent?"},
                message,
                {"role": "tool", "tool_call_id": tool_call.id, "content": str(weather)},
            ],
            tools=tools,
        )
        print(response2.choices[0].message.content)
Anthropic tool use
tools = [{
    "name": "get_weather",
    "description": "Returns the weather for the given city",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}]
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What is the weather like in Tashkent?"}],
)
# Execute the tool use (get_weather is your own function; it is not defined here)
for block in response.content:
    if block.type == "tool_use":
        if block.name == "get_weather":
            result = get_weather(**block.input)
            # Send the result back to the model
            response2 = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                tools=tools,
                messages=[
                    {"role": "user", "content": "What is the weather like in Tashkent?"},
                    {"role": "assistant", "content": response.content},
                    {"role": "user", "content": [{
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result),
                    }]},
                ],
            )
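Both tool examples branch on the tool name by hand, which gets unwieldy once there are several tools. One possible way to organise this is a small registry that maps tool names to Python callables. Everything below is an illustrative assumption: `get_weather` is a stub with made-up return data, and `TOOL_REGISTRY` and `run_tool` are hypothetical names, not SDK features.

```python
import json
from typing import Callable, Union


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical stub; a real version would call a weather service."""
    return {"city": city, "unit": unit, "temperature": 21}


# Map each tool name (as declared in the tool schema) to its implementation
TOOL_REGISTRY: dict[str, Callable[..., object]] = {"get_weather": get_weather}


def run_tool(name: str, arguments: Union[str, dict]) -> str:
    """Run a registered tool and return a JSON string to send back to the model.

    `arguments` may be a JSON string (OpenAI's `function.arguments`)
    or an already-parsed dict (Anthropic's `block.input`).
    """
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments) if isinstance(arguments, str) else dict(arguments)
        return json.dumps(handler(**kwargs))
    except (TypeError, ValueError) as exc:  # bad JSON or wrong argument names
        return json.dumps({"error": str(exc)})
```

Returning errors as JSON instead of raising lets the model see what went wrong and, in many cases, retry with corrected arguments.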
Vision API
OpenAI vision
import base64

def encode_image(image_path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What do you see in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{encode_image('photo.jpg')}"},
            },
        ],
    }],
)
Anthropic vision
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What do you see in this image?"},
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": encode_image("photo.jpg"),
                },
            },
        ],
    }],
)
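Both vision examples hardcode `image/jpeg`, which stops matching as soon as a PNG or WebP file is uploaded. A rough sketch of deriving the media type from the filename and producing both content-block shapes from the same bytes is shown below. `image_blocks` is a hypothetical helper name, and guessing by file extension is an assumption that does not verify the actual file contents.

```python
import base64
import mimetypes


def image_blocks(filename: str, data: bytes) -> tuple[dict, dict]:
    """Build (openai_block, anthropic_block) content parts for one image.

    The media type is guessed from the file extension only.
    """
    media_type, _ = mimetypes.guess_type(filename)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {filename}")
    b64 = base64.b64encode(data).decode("ascii")
    openai_block = {
        "type": "image_url",
        "image_url": {"url": f"data:{media_type};base64,{b64}"},
    }
    anthropic_block = {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": b64},
    }
    return openai_block, anthropic_block
```

Either block can then be placed in the `content` list next to the text part, exactly where the hand-written dictionaries sit in the examples above.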
Prompt Caching (Anthropic): cached input up to 90% cheaper
# A large system prompt is cached, so it is not billed at full price on every call
with open("docs.md") as f:
    LARGE_SYSTEM = f.read()  # e.g. ~50K tokens of docs
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LARGE_SYSTEM,
            "cache_control": {"type": "ephemeral"},  # mark this block as cacheable
        },
    ],
    messages=[{"role": "user", "content": "A question about the reference docs..."}],
)
# First call: cache write, billed at 1.25x the base input price
# Calls within the next 5 minutes: cache read at 0.1x the base input price (90% cheaper)
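The 90% figure applies per cached read, so the overall saving depends on how many calls reuse the cache. The arithmetic below works this through for the numbers in this section: a 50K-token prompt, the 1.25x write and 0.1x read multipliers, and an input price of $3.00 per 1M tokens. It is a simplified sketch that assumes every call lands inside the cache window and ignores the user question and output tokens; `input_cost_usd` is a hypothetical helper name.

```python
def input_cost_usd(prompt_tokens: int, calls: int, price_per_mtok: float, cached: bool) -> float:
    """Cost of sending the same large prompt `calls` times.

    Assumes one cache write (1.25x) followed by cache reads (0.1x),
    all within the cache lifetime.
    """
    base = prompt_tokens * price_per_mtok / 1_000_000  # one uncached send
    if not cached or calls == 0:
        return base * calls
    return base * 1.25 + base * 0.10 * (calls - 1)


plain = input_cost_usd(50_000, calls=10, price_per_mtok=3.00, cached=False)   # 1.50 USD
cached = input_cost_usd(50_000, calls=10, price_per_mtok=3.00, cached=True)   # 0.3225 USD
saving = 1 - cached / plain                                                   # 0.785
```

For 10 calls the saving on this prompt is about 78.5%; it approaches 90% only as the number of cached reads grows, and a single call with caching is actually 25% more expensive than without.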
Embeddings
OpenAI embeddings
response = client.embeddings.create(
    model="text-embedding-3-small",  # 1536-dim, $0.02 / 1M tokens
    input=["Salom dunyo", "Machine learning"],
)
embeddings = [d.embedding for d in response.data]
# Two vectors, each a list of 1536 floats
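Embedding vectors become useful for semantic search once they are compared. A common choice is cosine similarity; the sketch below implements it in plain Python so it runs without NumPy or an API call. The short vectors in the usage lines are toy data, not real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

With real data, the call would be `cosine_similarity(embeddings[0], embeddings[1])` on the two vectors returned above.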
Anthropic embeddings? Not available
Anthropic does not offer its own embeddings API. Options:
- OpenAI text-embedding-3-small
- Voyage AI (recommended by Anthropic)
- Cohere embeddings
- Sentence Transformers (local)
Retry + Rate Limiting
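from openai import (
    APIConnectionError, APITimeoutError, AsyncOpenAI, InternalServerError, RateLimitError,
)
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

async_client = AsyncOpenAI()

# Retry only transient failures. tenacity's `retry=` expects a retry predicate
# such as retry_if_exception_type, not a lambda over the exception.
@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=60),
    retry=retry_if_exception_type(
        (RateLimitError, APIConnectionError, APITimeoutError, InternalServerError)
    ),
    reraise=True,  # surface the original error after the last attempt
)
async def call_llm_with_retry(messages: list, model: str = "gpt-4o-mini"):
    response = await async_client.chat.completions.create(
        model=model,
        messages=messages,
    )
    return response.choices[0].message.content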
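To get a feel for how long a caller may wait under `multiplier=1, min=2, max=60` with five attempts, the schedule can be modelled by hand. This is a standalone approximation, assuming the delay doubles per attempt and is clamped to the `[min, max]` range; it is not tenacity's own code, and exact values may differ between tenacity versions. `backoff_delays` is a hypothetical helper name.

```python
def backoff_delays(max_attempts: int, multiplier: float = 1.0,
                   min_wait: float = 2.0, max_wait: float = 60.0) -> list[float]:
    """Delays (in seconds) between attempts for a clamped doubling backoff."""
    delays = []
    for attempt in range(1, max_attempts):  # no wait after the final attempt
        raw = multiplier * 2 ** (attempt - 1)
        delays.append(max(min_wait, min(raw, max_wait)))
    return delays


schedule = backoff_delays(5)      # [2.0, 2.0, 4.0, 8.0]
worst_case_wait = sum(schedule)   # 16.0 seconds of sleeping, excluding request time
```

Under these assumptions a request that fails every time spends about 16 seconds waiting before the error is raised, which is worth comparing against the timeout of whatever endpoint calls it.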
Async batching
import asyncio
from openai import AsyncOpenAI

async_client = AsyncOpenAI()

async def process_one(text: str):
    response = await async_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.choices[0].message.content

async def process_batch(texts: list[str], max_concurrent: int = 10):
    sem = asyncio.Semaphore(max_concurrent)  # caps the number of in-flight requests

    async def bounded(text):
        async with sem:
            return await process_one(text)

    return await asyncio.gather(*[bounded(t) for t in texts])

# 100 texts, at most 10 requests in flight at a time
texts = [f"Document {i}" for i in range(100)]  # placeholder inputs
results = asyncio.run(process_batch(texts, max_concurrent=10))
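The semaphore pattern can be checked without spending tokens by swapping the API call for a fake coroutine that records how many calls overlap. The sketch below does that; `run_bounded` and `fake_llm_call` are names invented for this demo, and the 10 ms sleep stands in for network latency.

```python
import asyncio


async def run_bounded(items, worker, max_concurrent: int):
    """Run `worker(item)` for every item, with at most `max_concurrent` running at once."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(item):
        async with sem:
            return await worker(item)

    # gather preserves input order in its result list
    return await asyncio.gather(*(bounded(i) for i in items))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_llm_call(text: str) -> str:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)  # record the highest observed overlap
        await asyncio.sleep(0.01)    # pretend to wait on the network
        in_flight -= 1
        return text.upper()

    items = [f"doc {i}" for i in range(20)]
    results = await run_bounded(items, fake_llm_call, max_concurrent=5)
    return results, peak


demo_results, demo_peak = asyncio.run(demo())
```

The results come back in input order even though the calls finish at different times, and the recorded peak never exceeds the semaphore limit.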
Cost tracking middleware
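import logging
from contextlib import contextmanager

logger = logging.getLogger("llm_costs")

# USD per 1M tokens as (input, output); check the providers' pricing pages before relying on these
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (1.00, 5.00),
}

@contextmanager
def track_llm_call(model: str, user_id: int | None = None):
    """Usage: with track_llm_call("gpt-4o-mini") as track: track(client.chat.completions.create(...))"""
    response_holder = {}

    def hook(response):
        response_holder["response"] = response
        return response

    yield hook
    response = response_holder.get("response")
    usage = getattr(response, "usage", None)
    if usage is not None:
        # OpenAI reports prompt_tokens/completion_tokens; Anthropic reports input_tokens/output_tokens
        in_tokens = getattr(usage, "prompt_tokens", None) or getattr(usage, "input_tokens", 0)
        out_tokens = getattr(usage, "completion_tokens", None) or getattr(usage, "output_tokens", 0)
        in_price, out_price = PRICES[model]
        cost = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
        logger.info(f"model={model} in={in_tokens} out={out_tokens} "
                    f"cost=${cost:.6f} user={user_id}")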
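OpenAI and Anthropic name their usage fields differently (`prompt_tokens`/`completion_tokens` versus `input_tokens`/`output_tokens`, as the basic examples show), so a cost function that serves both needs a small normalisation step. The sketch below is one way to do it as a pure function. `token_counts`, `estimate_cost`, and `PRICES_PER_MTOK` are hypothetical names, the prices are copied from this section's table, and `SimpleNamespace` objects stand in for real SDK responses.

```python
from types import SimpleNamespace

# USD per 1M tokens as (input, output), taken from the table in this section
PRICES_PER_MTOK = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4-6": (3.00, 15.00),
}


def token_counts(usage) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) for either provider's usage object."""
    if hasattr(usage, "prompt_tokens"):  # OpenAI-style field names
        return usage.prompt_tokens, usage.completion_tokens
    return usage.input_tokens, usage.output_tokens  # Anthropic-style field names


def estimate_cost(model: str, usage) -> float:
    """Estimated cost in USD for one call."""
    in_tokens, out_tokens = token_counts(usage)
    in_price, out_price = PRICES_PER_MTOK[model]
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000


openai_usage = SimpleNamespace(prompt_tokens=1000, completion_tokens=500)
anthropic_usage = SimpleNamespace(input_tokens=1000, output_tokens=500)

openai_cost = estimate_cost("gpt-4o-mini", openai_usage)              # 0.00045 USD
anthropic_cost = estimate_cost("claude-sonnet-4-6", anthropic_usage)  # 0.0105 USD
```

Keeping the calculation separate from logging makes it easy to unit-test and to reuse in a dashboard or a budget check.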
Backend integration
Streaming chat endpoint in FastAPI (SSE)
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
from pydantic import BaseModel

app = FastAPI()
client = AsyncOpenAI()

class ChatRequest(BaseModel):
    message: str
    session_id: str

async def stream_chat(messages: list):
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    async for chunk in stream:
        # Skip chunks that carry no choices or no text
        if chunk.choices and chunk.choices[0].delta.content:
            text = chunk.choices[0].delta.content
            yield f"data: {json.dumps({'text': text})}\n\n"
    yield "data: [DONE]\n\n"

@app.post("/chat/stream")
async def chat_stream(req: ChatRequest):
    # get_history is an app-specific helper (not shown) that returns the prior messages
    history = await get_history(req.session_id)
    messages = history + [{"role": "user", "content": req.message}]
    return StreamingResponse(
        stream_chat(messages),
        media_type="text/event-stream",
    )
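On the consuming side, the client has to undo the framing the endpoint produces: lines of the form `data: {...}` separated by blank lines, ending with `data: [DONE]`. A minimal parser for exactly that format might look like the sketch below. It is tied to this endpoint's conventions (a JSON object with a `text` key and a `[DONE]` sentinel) and is not a general Server-Sent Events implementation; `parse_sse` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text pieces from this endpoint's SSE lines until the [DONE] sentinel."""
    prefix = "data: "
    for line in lines:
        line = line.strip()
        if not line.startswith(prefix):  # blank separators and other fields are ignored
            continue
        payload = line[len(prefix):]
        if payload == "[DONE]":
            break
        yield json.loads(payload)["text"]


raw_lines = [
    'data: {"text": "Hel"}', "",
    'data: {"text": "lo"}', "",
    "data: [DONE]", "",
    'data: {"text": "ignored after DONE"}',
]
reply = "".join(parse_sse(raw_lines))
```

In practice `lines` would come from an HTTP client that iterates over the response body line by line.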
WebSocket chat
from anthropic import AsyncAnthropic
from fastapi import WebSocket, WebSocketDisconnect

anthropic_client = AsyncAnthropic()  # separate from the AsyncOpenAI `client` used for SSE

@app.websocket("/ws/chat")
async def chat_ws(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_json()
            messages = data["messages"]
            async with anthropic_client.messages.stream(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=messages,
            ) as stream:
                async for text in stream.text_stream:
                    await websocket.send_json({"type": "delta", "text": text})
            await websocket.send_json({"type": "done"})
    except WebSocketDisconnect:
        pass  # the client went away; nothing left to send
    except Exception as e:
        await websocket.send_json({"type": "error", "message": str(e)})
        await websocket.close()
Multi-provider abstraction
from abc import ABC, abstractmethod
from openai import AsyncOpenAI

class LLMProvider(ABC):
    @abstractmethod
    async def chat(self, messages: list, **kwargs) -> str: ...

class OpenAIProvider(LLMProvider):
    def __init__(self, model="gpt-4o-mini"):
        self.client = AsyncOpenAI()
        self.model = model

    async def chat(self, messages, **kwargs):
        response = await self.client.chat.completions.create(
            model=self.model, messages=messages, **kwargs)
        return response.choices[0].message.content

class AnthropicProvider(LLMProvider):
    def __init__(self, model="claude-sonnet-4-6"):
        from anthropic import AsyncAnthropic
        self.client = AsyncAnthropic()
        self.model = model

    async def chat(self, messages, **kwargs):
        # Anthropic takes the system prompt as a separate parameter, so split it out
        system = next((m["content"] for m in messages if m["role"] == "system"), None)
        msgs = [m for m in messages if m["role"] != "system"]
        if system is not None:
            kwargs["system"] = system  # omit the parameter when there is no system prompt
        response = await self.client.messages.create(
            model=self.model,
            max_tokens=kwargs.pop("max_tokens", 1024),
            messages=msgs,
            **kwargs,
        )
        return response.content[0].text

# Usage (inside an async function or a notebook cell, since `await` needs an event loop)
provider = OpenAIProvider("gpt-4o-mini")
# or
provider = AnthropicProvider("claude-haiku-4-5")
response = await provider.chat([{"role": "user", "content": "Hello"}])
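The Anthropic provider has to pull system messages out of an OpenAI-style message list, and that step is easy to test on its own. The sketch below extracts it into a pure function. Unlike taking only the first system message, it joins all of them with blank lines, which is a design choice made here for illustration; `split_system` is a hypothetical name.

```python
from typing import Optional


def split_system(messages: list[dict]) -> tuple[Optional[str], list[dict]]:
    """Separate system prompts from the conversation turns.

    Returns (system_text_or_None, remaining_messages). Multiple system
    messages are joined with a blank line between them.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    system = "\n\n".join(system_parts) or None
    return system, rest


chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "system", "content": "Answer briefly."},
]
system_text, turns = split_system(chat)
```

Returning `None` when there is no system prompt lets the caller decide whether to pass the `system` argument at all.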
Resources
- OpenAI docs — platform.openai.com/docs
- Anthropic docs — docs.anthropic.com
- OpenAI Cookbook — cookbook.openai.com
- Anthropic Cookbook — GitHub
- LiteLLM — universal LLM wrapper: litellm.ai
- OpenRouter (one API for many models): openrouter.ai
🏋️ Exercises
🟢 Easy
- "Hello World" with the OpenAI and Anthropic APIs: 5 question-and-answer pairs.
- Get a streaming response and print each character separately.
- Use embeddings to compute the similarity between two sentences.
🟡 Medium
- Function calling: an agent with 3 tools (weather, calculator, search).
- Vision: upload an image and extract structured data from it (Instructor + vision).
- Prompt caching: ask 10 questions against a large system prompt and compare the cost.
🔴 Hard
- Multi-provider chat: OpenAI/Anthropic/Google behind one abstraction, with auto-fallback.
- Cost-aware router: automatically pick a suitable model based on input complexity and context size.
- Streaming chatbot: FastAPI + WebSocket + Postgres history + Redis caching.
Capstone
notebooks/month-05/03_llm_apis.ipynb:
- Get fully familiar with 3 providers (OpenAI, Anthropic, OpenRouter)
- Multi-turn chatbot with streaming
- Function calling: 5 tools
- Vision: image classification
- Cost tracking dashboard
✅ Checklist
- I know the OpenAI and Anthropic APIs
- I use streaming responses
- Function calling / tool use
- Working with the vision API
- Computing and storing embeddings
- Prompt caching (Anthropic)
- Async batching
- Retry and rate limit handling
- Cost tracking and observability
Next, we move on to LangChain and LlamaIndex.