Cloudy runs AI research jobs on GPUs. You give Cloudy a prompt and set time and cost limits, and it starts the right machines, runs the work, saves the files and logs, and retries jobs when capacity changes. You manage everything from one interface.
Example prompts Cloudy is built for:
- "Train a small draft model for speculative decoding on inference_traces.jsonl, using gpt-oss-20b as the draft for our larger gpt-oss-120b model. Test it in vLLM, measure acceptance rate, p50/p99 latency, throughput, and quality, then create the final inference script."
- "Replace our unfused RMSNorm + rotary embedding + QKV projection path with a fused Triton kernel. Benchmark it on H100s against the PyTorch baseline and report speedup, memory use, numerical drift, and failed tests."
- "Use Qwen2.5-Coder-32B as the base model and build a synthetic data curriculum from SWE-bench Verified, LiveCodeBench, HumanEval, and MBPP. Fine-tune with LoRA, run pass@1 and pass@10 evals, and keep the best checkpoint."
- "Run a long-context RLVR experiment on Kimi-K2 or GLM-4.5 using HotpotQA, NarrativeQA, and GPQA Diamond. Add dense rewards for evidence selection, compare against outcome-only rewards, and show benchmark deltas."
What Cloudy handles:
- Finding available spot or on-demand GPUs across providers
- Starting jobs and sandboxes with files, logs, and checkpoints
- Queuing, retrying, and stopping work when capacity changes
- Keeping experiments organized so you can run hundreds of them
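As a rough illustration of retrying and stopping work under a time and cost limit, here is a minimal Python sketch. It is not Cloudy's API, which this document does not show. Every name in it (`CapacityLost`, `Limits`, `run_with_limits`, `attempt_job`) is hypothetical, and it assumes a single flat hourly GPU rate and a job that reports how many hours each attempt used.

```python
from dataclasses import dataclass
from typing import Callable


class CapacityLost(Exception):
    """Hypothetical signal: the GPU was preempted or became unavailable."""

    def __init__(self, hours_used: float) -> None:
        super().__init__(f"capacity lost after {hours_used} h")
        self.hours_used = hours_used


@dataclass
class Limits:
    max_hours: float     # user-set time limit
    max_cost_usd: float  # user-set cost limit


def run_with_limits(
    attempt_job: Callable[[float], float],
    hourly_rate_usd: float,
    limits: Limits,
    max_attempts: int = 10,
) -> dict:
    """Retry a job until it finishes or a limit is reached.

    attempt_job(hours_left) returns the hours it used on success and
    raises CapacityLost when capacity disappears mid-run.
    Assumes hourly_rate_usd > 0 (illustrative flat pricing).
    """
    hours_spent = 0.0
    attempts = 0
    status = "stopped"
    while attempts < max_attempts:
        # Remaining hours under whichever limit is tighter: time or cost.
        hours_cap = min(limits.max_hours, limits.max_cost_usd / hourly_rate_usd)
        hours_left = hours_cap - hours_spent
        if hours_left <= 0:
            break  # budget exhausted: stop instead of retrying
        attempts += 1
        try:
            hours_spent += attempt_job(hours_left)
            status = "done"
            break
        except CapacityLost as lost:
            # Failed attempts still count against the budget.
            hours_spent += lost.hours_used
    return {
        "status": status,
        "attempts": attempts,
        "hours": hours_spent,
        "cost_usd": hours_spent * hourly_rate_usd,
    }
```

The sketch charges failed attempts against the same budget as successful ones, so a job that keeps losing capacity stops once either limit is reached rather than retrying indefinitely.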
Questions? Email hi@cloudy.so.