PROCESS MILLIONS.
PAY HALF.
Submit a file of requests, get results back when they're done. Async bulk inference at 50% off real-time pricing. Built for workloads that can trade latency for cost.
CONTACT SALES

JSONL IN. JSONL OUT.
One request per line. Each with a custom_id for result correlation. Upload the file, poll for completion, download the output. Same OpenAI-compatible format you already use.
GET STARTED

from openai import OpenAI
client = OpenAI(
base_url="https://api.haimaker.ai/v1",
api_key="your-api-key",
)
# Upload input file
batch_file = client.files.create(
file=open("requests.jsonl", "rb"),
purpose="batch",
)
# Create batch job
batch = client.batches.create(
input_file_id=batch_file.id,
endpoint="/v1/chat/completions",
completion_window="24h",
)

# Upload input file
curl https://api.haimaker.ai/v1/files \
-H "Authorization: Bearer your-api-key" \
-F purpose="batch" \
-F file="@requests.jsonl"
# Create batch job
curl https://api.haimaker.ai/v1/batch/jobs \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "file-abc123",
"endpoint": "/v1/chat/completions",
"completion_window": "24h"
}'

# requests.jsonl: one request per line
{"custom_id": "req-1", "body": {"model": "meta-llama/Llama-3.1-8B", "messages": [{"role": "user", "content": "Summarize this document..."}]}}
{"custom_id": "req-2", "body": {"model": "meta-llama/Llama-3.1-8B", "messages": [{"role": "user", "content": "Classify this ticket..."}]}}
{"custom_id": "req-3", "body": {"model": "meta-llama/Llama-3.1-8B", "messages": [{"role": "user", "content": "Extract entities from..."}]}}

50% COST SAVINGS
Batch jobs fill idle GPU capacity during off-peak hours. You get the same models and the same output quality at half the per-token price. The discount comes from scheduling flexibility, not from cutting corners.
SIMPLE FORMAT
JSONL in, JSONL out. Every request gets a custom_id that maps directly to its result. No new SDKs, no new APIs to learn. If you can write a for loop, you can build a batch job.
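That for loop can be as short as this sketch, which builds a requests.jsonl from a list of documents (the docs list is a stand-in for your own data; the model name comes from the example above):

```python
import json

docs = ["First document text...", "Second document text..."]  # stand-in data

with open("requests.jsonl", "w") as f:
    for i, doc in enumerate(docs):
        request = {
            "custom_id": f"req-{i + 1}",  # used to match results to inputs
            "body": {
                "model": "meta-llama/Llama-3.1-8B",
                "messages": [
                    {"role": "user", "content": f"Summarize this document: {doc}"}
                ],
            },
        }
        f.write(json.dumps(request) + "\n")
```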
BUILT FOR SCALE
50,000 requests per batch. 100MB file uploads. 24-hour best-effort SLA. Separate rate limits from real-time traffic so your batch jobs never starve your production endpoints.
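Going past the 50,000-request cap just means splitting into multiple batches; a minimal chunker (the limit value is from the numbers above, the helper name is ours):

```python
def chunk_requests(requests, max_per_batch=50_000):
    """Split a request list into batch-sized chunks, preserving order."""
    return [
        requests[i:i + max_per_batch]
        for i in range(0, len(requests), max_per_batch)
    ]
```

Each chunk then becomes its own JSONL file and batch job; keep the 100MB file-size limit in mind as a second bound if your requests are large.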
BUILT FOR
Workloads that value cost over latency.
BULK EVALUATION
Run eval suites across model versions, prompt variations, and parameter sweeps. Compare thousands of outputs without burning through your real-time budget.
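A sweep like that is one nested loop: cross model versions with prompt variants and encode each cell in the custom_id so results can be grouped later. All names here are illustrative, including the second model:

```python
import itertools
import json

models = ["meta-llama/Llama-3.1-8B", "meta-llama/Llama-3.1-70B"]  # versions under test
prompts = {
    "terse": "Answer in one sentence: {q}",
    "detailed": "Explain step by step: {q}",
}
question = "What causes tides?"

with open("eval_sweep.jsonl", "w") as f:
    for model, (label, template) in itertools.product(models, prompts.items()):
        request = {
            # custom_id encodes the sweep cell: model short name + prompt variant
            "custom_id": f"{model.split('/')[-1]}-{label}",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": template.format(q=question)}],
            },
        }
        f.write(json.dumps(request) + "\n")
```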
DOCUMENT PROCESSING
Summarize, classify, or extract from millions of documents overnight. Legal discovery, medical records, support tickets — anything that sits in a queue.
DATA ENRICHMENT
Add AI-generated annotations, embeddings, or metadata to your datasets. Enrich your data warehouse while your team sleeps.
READY TO PROCESS AT SCALE?
Tell us about your workload and we'll set up batch processing for your team.
CONTACT SALES