NVIDIA Grace Blackwell GB10 · Available Now

Private LLM
inference on
real hardware.

Pay-per-hour access to a local GB10 superchip running 70B+ parameter models. No queue. No cloud markup. OpenAI-compatible API — drop in your session token and go.

inference.py
from openai import OpenAI

client = OpenAI(
    base_url="https://gb10.studio/v1",
    api_key="st_your_session_token",
)

response = client.chat.completions.create(
    model="GB10",
    messages=[{
        "role": "user",
        "content": "Explain CUDA memory coalescing",
    }],
    stream=True,
)

for chunk in response:
    # delta.content is None on the final chunk; end="" keeps tokens on one line
    print(chunk.choices[0].delta.content or "", end="")
128 GB Unified Memory
1 PFLOP FP8 Performance
NVLink-C2C CPU–GPU Interconnect
70B+ Parameter Models
$1.00 Per Hour
01

OpenAI-Compatible API

Change the base URL and API key — nothing else changes. Works with LangChain, LlamaIndex, Cursor, and any OpenAI SDK client out of the box.
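Compatibility comes from the wire format: every OpenAI-style client sends the same JSON to the same `/chat/completions` path. A minimal sketch of that request using only the Python standard library — the token is a placeholder, and `chat` is an illustrative helper, not part of any SDK:

```python
import json
import urllib.request

BASE_URL = "https://gb10.studio/v1"
API_KEY = "st_your_session_token"  # placeholder — use your own session token

def chat(prompt: str) -> urllib.request.Request:
    """Build the raw /chat/completions request that any
    OpenAI-compatible client emits — which is why swapping
    the base URL and key is the only change needed."""
    payload = json.dumps({
        "model": "GB10",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
```

Send it with `urllib.request.urlopen(chat("hello"))`, or let any SDK build the identical request for you.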

02

No Queue. No Sharing.

Reserve a slot, start a session, get the full chip. Not a slice of a shared cluster. Not a rate-limited API wrapper. The whole machine is yours.

03

Pay by the Minute

Pre-load credits, run inference, stop when done. Billed per minute rounded up. No monthly commit. No egress fees. No surprise invoices.
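The arithmetic is simple enough to check yourself — a sketch of per-minute billing at the listed $1.00/hour rate (`session_cost` is an illustrative helper, not part of the service API):

```python
import math

RATE_PER_HOUR = 1.00  # the listed hourly rate

def session_cost(seconds: int) -> float:
    """Bill per minute, rounding a partial minute up."""
    minutes = math.ceil(seconds / 60)
    return round(minutes * RATE_PER_HOUR / 60, 4)

# A 90-second session bills as 2 minutes; a full hour bills as exactly $1.00.
```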