Wait Mode And Async Fallback For LLM Requests | ReqRun

The product tension

Developers like synchronous APIs because they are easy to use. Send a request, get a response, continue.

But LLM requests do not always fit neatly inside one HTTP timeout. Some calls are slow, some are rate-limited, and some should continue after the original client has stopped waiting.

wait=true keeps the happy path

With wait=true, ReqRun accepts the request, stores it, and waits briefly for completion. If the worker finishes within the configured timeout, the caller gets a normal OpenAI-style chat completion response.

The developer experience stays simple for fast requests.

TypeScript

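// Assumes `reqrun` is an already-configured ReqRun client.
const response = await reqrun.chat.completions.create({
  model: "gpt-5-nano",
  messages: [{ role: "user", content: "Summarize this incident." }],
  wait: true, // wait briefly for the worker before falling back to async
  idempotency_key: "incident-421-summary", // same key on retry, no duplicate work
});

if (response.object === "chat.completion.async") {
  // The wait timed out: the request is stored and still being processed.
  const request = await reqrun.requests.get(response.id);
  console.log(request.status, request.attempts);
} else {
  // Finished within the wait timeout: a normal OpenAI-style chat completion.
  console.log(response.choices[0].message.content);
}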

Async fallback keeps the work alive

If the request does not finish within the wait timeout, ReqRun returns an async response instead of holding the connection open. The request itself keeps running.

That async response contains the rr_ request id. The app can store that id, return it to the client, or check status later with GET /v1/requests/{id}.
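
Checking status later usually means polling with a budget. The sketch below is one way to do that, not part of any ReqRun SDK: the status values, the StoredRequest shape beyond status and attempts, and the fetchRequest callback (standing in for a call to GET /v1/requests/{id}) are all assumptions made for illustration.

```typescript
// Hypothetical shape of a stored request. Only `status` and `attempts`
// appear in the example above; the status values are assumptions.
type RequestStatus = "queued" | "running" | "completed" | "failed";

interface StoredRequest {
  id: string;
  status: RequestStatus;
  attempts: number;
}

// Fetches the current state of one request, e.g. via GET /v1/requests/{id}.
type FetchRequest = (id: string) => Promise<StoredRequest>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

function isTerminal(status: RequestStatus): boolean {
  return status === "completed" || status === "failed";
}

// Polls until the request reaches a terminal status or the poll budget
// runs out. Returns the last observed state either way, so the caller
// decides whether to show a result or keep the pending state.
async function pollUntilSettled(
  id: string,
  fetchRequest: FetchRequest,
  maxPolls = 10,
  intervalMs = 1000,
): Promise<StoredRequest> {
  let latest = await fetchRequest(id);
  for (let i = 1; i < maxPolls && !isTerminal(latest.status); i++) {
    await sleep(intervalMs);
    latest = await fetchRequest(id);
  }
  return latest;
}
```

Because the rr_ id is stored, running out of poll budget is not a failure: the app can simply try again later with the same id.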

Why not just increase the timeout

Longer timeouts can hide the problem, but they do not create durable state. If the process restarts or the network drops mid-request, the app has no record to consult and cannot tell whether the work finished, failed, or never started.

A bounded wait plus durable fallback is safer. It gives fast requests a direct response and slow requests a recovery path.
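
The bounded-wait half of that idea can be sketched as a race between the work and a timer. This is a generic illustration, not ReqRun's implementation: boundedWait, WaitResult, and the "completed"/"pending" labels are hypothetical names. The durable half (storing the request before waiting) is assumed to have happened already and is not shown.

```typescript
// Illustrative only: a generic bounded wait, not ReqRun's implementation.
type WaitResult<T> =
  | { kind: "completed"; value: T }
  | { kind: "pending"; requestId: string };

async function boundedWait<T>(
  requestId: string,
  work: Promise<T>,
  timeoutMs: number,
): Promise<WaitResult<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves with the "pending" shape once the wait budget is spent.
  const timeout = new Promise<WaitResult<T>>((resolve) => {
    timer = setTimeout(
      () => resolve({ kind: "pending", requestId }),
      timeoutMs,
    );
  });

  const completed = work.then(
    (value): WaitResult<T> => ({ kind: "completed", value }),
  );

  try {
    // Whichever settles first wins. The work is NOT cancelled on timeout;
    // it keeps running and its result can be looked up later by id.
    return await Promise.race([completed, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

The key property is that the timeout only ends the caller's wait. It never ends the work, which is what separates this pattern from simply raising an HTTP timeout.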

How to design your app around it

Use wait=true for user-facing actions where a fast answer is helpful. If ReqRun returns the async shape, show a pending state and poll or refresh later.
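
One way to keep UI code simple is to map both response shapes to a single view state. The sketch below assumes minimal response types: "chat.completion" is the standard OpenAI object value, "chat.completion.async" and the id field come from the earlier example, and ViewState and toViewState are hypothetical names.

```typescript
// Assumed minimal shapes; real responses carry more fields.
interface ChatCompletion {
  object: "chat.completion";
  id: string;
  choices: { message: { role: string; content: string | null } }[];
}

interface AsyncAccepted {
  object: "chat.completion.async";
  id: string; // the rr_ request id
}

type CreateResponse = ChatCompletion | AsyncAccepted;

// What the UI needs to know: either there is text to show,
// or there is an id to remember and check later.
type ViewState =
  | { state: "done"; text: string }
  | { state: "pending"; requestId: string };

function toViewState(response: CreateResponse): ViewState {
  if (response.object === "chat.completion.async") {
    return { state: "pending", requestId: response.id };
  }
  return {
    state: "done",
    text: response.choices[0]?.message.content ?? "",
  };
}
```

Discriminating on the object field lets the compiler check that both shapes are handled, so the pending state cannot be forgotten.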

Use wait=false for background jobs, agent tasks, and automation where you already expect the result to be checked later.

Use wait=true for interactive paths with a graceful pending state.
Use wait=false for background work.
Always store rr_ ids for operations that may need inspection later.
Use idempotency_key so client retries do not create duplicate work.
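
The last point only works if every retry of the same logical operation sends the same key. A minimal sketch of that, assuming the server treats a repeated key as the same request: idempotencyKeyFor and sendWithRetry are hypothetical helpers, and send stands in for whatever client call the app makes.

```typescript
// Builds a deterministic key from the operation itself, so a retry of the
// same logical operation always produces the same key,
// e.g. idempotencyKeyFor("incident", 421, "summary").
function idempotencyKeyFor(
  entity: string,
  entityId: string | number,
  action: string,
): string {
  return `${entity}-${entityId}-${action}`;
}

// Retries a send function, reusing ONE key across all attempts.
// Generating a fresh key per attempt would defeat deduplication.
async function sendWithRetry<T>(
  send: (idempotencyKey: string) => Promise<T>,
  idempotencyKey: string,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send(idempotencyKey);
    } catch (error) {
      // For example a dropped connection: try again with the same key.
      lastError = error;
    }
  }
  throw lastError;
}
```

Random keys generated at call time are the usual mistake here: they are unique per attempt, so each retry looks like new work.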