Blog

Latest updates, engineering insights, and product news from the VESSL AI team.

Don't tie a GPU to your agent

Same H100, same $5.10, 4× throughput. K=4 fan-out scheduling on VESSL Cloud collapses karpathy's autoresearch from 2 hours to 40 minutes.

VESSL AI

May 8, 2026|5 min

Tutorials

How to Fine-Tune Gemma 4 in 15 Minutes

Fine-tune Gemma 4 E4B on a cloud A100 in 15 minutes for $0.38. Real benchmarks, full code, and a storage strategy for team collaboration.

VESSL AI

Apr 20, 2026|8 min

Product

vesslctl: Manage VESSL Cloud from Your Terminal

The official VESSL Cloud CLI is here. Run your whole workflow from the terminal, with native MCP integration and a bundled Claude skill.

VESSL AI

Apr 20, 2026|5 min

Product

Your GPU Credit Lifesaver: Meet VESSL Cloud Job

Ever woken up to find credits drained on a workspace that finished hours ago? Batch Jobs submit, run, and auto-terminate — even failed runs don't keep bleeding money.

VESSL AI

Apr 20, 2026|7 min

Machine Learning

Building a Point-in-Time Finance LLM on 8×H100 + Qwen3.5-35B: A Full-Weight Recipe

We ran point-in-time continued pretraining of a 35B model on 8×H100 and measured the lookahead-bias leakage premium — honest result + reproducible recipe.

VESSL AI

Jun 5, 2026|21 min

Machine Learning

A100 vs H100 vs B200 for LoRA fine-tuning and inference: a cost benchmark

We fine-tuned and served Gemma-4-31B on A100, H100, and B200, then normalized everything to cost per token. With the right configuration, the three land within ~5% — the settings matter more than the chip.

VESSL AI

Jun 4, 2026|11 min

AI Infrastructure

What Is a Neocloud? The Fastest Way to Get GPU Access

"I need GPUs for my AI project, but the waitlist is weeks long." If you've ever tried to spin up high-end GPUs on a major cloud provider, you know the pain: long queues, complicated pricing, and surprise bills, all just to get the compute you need. There's a new category of cloud infrastructure built to solve exactly this: neoclouds. In this post, we'll break down what neoclouds are, how they compare to hyperscalers, and how to get started today.

VESSL AI

Apr 11, 2026|7 min

Machine Learning

2026 GPU Selection Guide — From L40S to B300

L40S, RTX Pro 6000, A100, H100, H200, B200, GB200, B300 — which GPU fits your workload? A side-by-side spec comparison with workload-based recommendations.

VESSL AI

Apr 9, 2026|4 min

Community

GTC 2026: GPU Infra Trends — Inference to Physical AI

At GTC 2026, Jensen Huang opened with a line that reframed the entire week: "2025 was the year of inference." And if 2025 was when inference arrived, GTC 2026 made one thing clear: what's accelerating it next is moving even faster. Three themes dominated the week: agentic tools that are compressing the AI development cycle from weeks to hours, Physical AI emerging as a genuinely continuous GPU workload, and a hardware roadmap built entirely around the assumption that inference de…

VESSL AI

Apr 3, 2026|7 min