Private LLM inference on real hardware.
Pay-per-hour access to a local GB10 superchip running 70B+ parameter models. No queue. No cloud markup. OpenAI-compatible API — drop in your session token and go.
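from openai import OpenAI

# Point the standard OpenAI client at GB10 Studio with your session token.
client = OpenAI(
    base_url="https://gb10.studio/v1",
    api_key="st_your_session_token",
)

response = client.chat.completions.create(
    model="GB10",
    messages=[{
        "role": "user",
        "content": "Explain CUDA memory coalescing",
    }],
    stream=True,
)

# Print tokens as they arrive; some chunks carry no text, so fall back to "".
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)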
OpenAI-Compatible API
Change the base URL and API key — nothing else changes. Works with LangChain, LlamaIndex, Cursor, and any OpenAI SDK client out of the box.
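To make "OpenAI-compatible" concrete, here is a minimal sketch of the HTTP request an OpenAI-style client builds. It assumes GB10 Studio follows the usual OpenAI conventions (a `/chat/completions` path under the base URL and a bearer token in the `Authorization` header); the helper name `build_chat_request` is hypothetical, and the request is only constructed, never sent.

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    Assumption: the endpoint path and auth header follow the OpenAI convention.
    """
    payload = {
        "model": "GB10",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("https://gb10.studio/v1", "st_your_session_token", "Hello")
print(req.full_url)
```

Only the base URL and the token are specific to this service, which is why existing OpenAI SDK clients should need no other changes.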
No Queue. No Sharing.
Reserve a slot, start a session, get the full chip. Not a slice of a shared cluster. Not a rate-limited API wrapper. The whole machine is yours.
Pay by the Minute
Pre-load credits, run inference, stop when done. Billed per minute rounded up. No monthly commit. No egress fees. No surprise invoices.
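As a rough illustration of per-minute billing with round-up, the sketch below estimates what a session costs. It assumes the per-minute price is the slot's hourly rate divided by 60 and that charges are rounded to the nearest cent; neither detail is stated on this page, and `session_cost` is a hypothetical helper, not part of any GB10 Studio API.

```python
from decimal import Decimal, ROUND_HALF_UP


def session_cost(duration_seconds: int, hourly_rate: Decimal) -> Decimal:
    """Estimate a session charge, billing whole minutes rounded up.

    Assumptions: per-minute price = hourly_rate / 60; result rounded to cents.
    """
    # Integer ceiling division: any started minute is billed in full.
    billed_minutes = (duration_seconds + 59) // 60
    per_minute = hourly_rate / Decimal(60)
    return (billed_minutes * per_minute).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )


# A 61-second session at a $6.00/hour rate is billed as 2 minutes.
print(session_cost(61, Decimal("6.00")))
```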
Own a GB10? Put it to work.
Your Grace Blackwell sits idle most of the day. List it on the GB10 Studio marketplace and earn from every minute someone else runs inference on it. You set the rate and your hours — we handle billing, payouts, and the customer.
01 Apply & get verified
Tell us about your GB10. A $5 application fee keeps the marketplace serious; another $5 is held in escrow against chargebacks.
02 List your slot
Point us at your HTTPS endpoint, set an hourly rate and weekly availability. Your slot token is encrypted at rest and rotates on demand.
03 Earn & cash out
Keep 85% of every session served. Watch earnings accrue live and request a payout once you clear $25.
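For hosts estimating income, the arithmetic above can be sketched as follows. It assumes the 85% share applies to the gross session charge, that amounts are rounded to the nearest cent, and that "clear $25" means a balance of at least $25; the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

HOST_SHARE = Decimal("0.85")          # host keeps 85% of each session
PAYOUT_THRESHOLD = Decimal("25.00")   # minimum balance before cashing out


def host_earnings(session_charge: Decimal) -> Decimal:
    """Host's share of one session charge, rounded to cents (assumed)."""
    return (session_charge * HOST_SHARE).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )


def can_request_payout(balance: Decimal) -> bool:
    """True once the accrued balance reaches the payout threshold (assumed >=)."""
    return balance >= PAYOUT_THRESHOLD


# A $10.00 session yields $8.50 for the host.
print(host_earnings(Decimal("10.00")))
```

Under these assumptions, a host needs roughly $29.42 in gross session charges to reach the $25 payout threshold.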