NVIDIA H200 GPU Cloud
Run NVIDIA H200 SXM GPUs with 141GB of HBM3e: the same Hopper compute as the H100 with 76% more memory capacity and 43% more memory bandwidth, built for long-context LLMs and for serving larger models without sharding.

Technical specifications
- Architecture: Hopper
- GPU memory: 141GB HBM3e
- Memory bandwidth: 4.8 TB/s
- NVLink: 900 GB/s
- FP16/BF16 (Tensor): 1,979 TFLOPS*
- FP8 (Tensor): 3,958 TFLOPS*
- Max TDP: 700W
- GPUs per node: 8 (HGX H200)
*Tensor-core figures are peak performance with sparsity, per NVIDIA's official specifications; dense throughput is half these values. Final specs may vary by node configuration.
What's the H200 best for?
Long-context & large-model inference
141GB of HBM3e holds a 70B-class model in FP8 (about 70GB of weights) plus a large KV cache on a single GPU, so you can serve longer context windows without tensor-parallel sharding.
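To see why the extra memory matters, here is a rough sizing sketch. It assumes a hypothetical 70B model shaped like Llama 2 70B (80 layers, 8 grouped-query KV heads, head dimension 128), FP8 weights, an FP16 KV cache, and 10% of memory held back for activations and runtime overhead. None of these numbers come from VESSL or NVIDIA, and real serving engines budget memory differently.

```python
from dataclasses import dataclass

GB = 10**9  # decimal gigabytes, matching how GPU memory is quoted


@dataclass
class ModelShape:
    """Hypothetical transformer shape used only for this estimate."""
    params: float   # total parameter count
    layers: int     # number of transformer layers
    kv_heads: int   # key/value heads (grouped-query attention)
    head_dim: int   # dimension of each attention head


def kv_bytes_per_token(m: ModelShape, bytes_per_elem: int) -> int:
    # Each layer stores one key and one value vector per KV head for every token.
    return 2 * m.layers * m.kv_heads * m.head_dim * bytes_per_elem


def max_kv_tokens(m: ModelShape, gpu_bytes: float, weight_bytes_per_param: float,
                  kv_bytes_per_elem: int, reserve_frac: float = 0.1) -> int:
    """Tokens of KV cache that fit after the weights and an assumed overhead reserve."""
    usable = gpu_bytes * (1 - reserve_frac)  # assumed headroom for activations, CUDA context
    free_for_kv = usable - m.params * weight_bytes_per_param
    if free_for_kv <= 0:
        return 0  # the weights alone do not fit
    return int(free_for_kv // kv_bytes_per_token(m, kv_bytes_per_elem))


model = ModelShape(params=70e9, layers=80, kv_heads=8, head_dim=128)

for name, mem_gb in [("H100 (80GB)", 80), ("H200 (141GB)", 141)]:
    tokens = max_kv_tokens(model, mem_gb * GB, weight_bytes_per_param=1, kv_bytes_per_elem=2)
    print(f"{name}: room for ~{tokens:,} cached tokens (FP8 weights, FP16 KV cache)")
```

Under these assumptions each token costs 327,680 bytes of KV cache, so a single 128K-token context needs about 43 GB: it fits next to the weights on one H200 but not on one H100.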
Memory-bound training & fine-tuning
Larger batches and longer sequences fit in memory, and 4.8 TB/s of bandwidth keeps Hopper's FP8 tensor cores better fed in memory-bound workloads.
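Whether a kernel is memory-bound can be estimated with the standard roofline model. The sketch below assumes dense FP8 throughput is half the sparse peak in the spec table (1,979 TFLOPS) and that a matrix multiply reads each input and writes its output exactly once in FP8; both are simplifications, and the matrix sizes are hypothetical.

```python
PEAK_FP8_DENSE = 3958e12 / 2  # FLOP/s; assumed dense rate = half the sparse spec-sheet peak
HBM_BANDWIDTH = 4.8e12        # bytes/s, H200 HBM3e


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel becomes compute-bound."""
    return peak_flops / bandwidth


def attainable_flops(intensity: float, peak_flops: float, bandwidth: float) -> float:
    """Roofline: throughput is capped by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, intensity * bandwidth)


def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int) -> float:
    """FLOP/byte for C[m,n] = A[m,k] @ B[k,n], counting one read of A and B and one write of C."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


ridge = ridge_point(PEAK_FP8_DENSE, HBM_BANDWIDTH)
print(f"ridge point: {ridge:.0f} FLOP/byte")
for tokens in (1, 64, 4096):  # rows of A, e.g. tokens in a batch (hypothetical sizes)
    ai = gemm_intensity(tokens, 8192, 8192, bytes_per_elem=1)
    regime = "memory-bound" if ai < ridge else "compute-bound"
    tflops = attainable_flops(ai, PEAK_FP8_DENSE, HBM_BANDWIDTH) / 1e12
    print(f"{tokens:>5} tokens: {ai:7.1f} FLOP/byte, {regime}, <= {tflops:,.0f} TFLOPS")
```

Small batches, such as single-token decode steps, land far below the roughly 412 FLOP/byte ridge, which is the regime where extra bandwidth translates directly into throughput.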
Drop-in Hopper upgrade
Same CUDA, PyTorch, and NeMo stack as the H100, so you can move memory-constrained workloads over with no code changes and gain more headroom.
Compare NVIDIA data-center GPUs
| | H100 | H200 | B200 | B300 |
|---|---|---|---|---|
| Architecture | Hopper | Hopper | Blackwell | Blackwell |
| GPU memory | 80GB HBM3 | 141GB HBM3e | 192GB HBM3e | up to 288GB HBM3e |
| Memory bandwidth | 3.35 TB/s | 4.8 TB/s | 8 TB/s | 8 TB/s |
| FP8 (Tensor) | 3,958 TFLOPS | 3,958 TFLOPS | 9 PFLOPS | 10 PFLOPS |
| Access | from $2.39/hr | Available on request | Available on request | Available on request |
| Best for | Cost-efficient training & inference | Long-context & large-model inference | Frontier-scale training (FP4) | Largest models & reasoning inference |
Why industry-leading teams run GPUs on VESSL Cloud
No waitlists
Access capacity across clouds through one platform — skip quotas and procurement.
Scale to multi-node
Spin up a single GPU or scale to large multi-node clusters over high-speed InfiniBand — as much as you need.
Transparent pricing
Spot, on-demand, and reserved options with pay-as-you-go billing.
Enterprise-ready
SOC 2 Type II compliance, with dedicated support for production AI.
Frequently asked questions
Is the H200 available now?
H200 capacity is available on request. Talk to our team for current availability and pricing — we'll match capacity to your timeline.
What's the difference between the H100 and H200?
Both use the same Hopper compute (peak 1,979 TFLOPS FP16 and 3,958 TFLOPS FP8, with sparsity). The H200 carries 141GB of HBM3e at 4.8 TB/s versus the H100's 80GB of HBM3 at 3.35 TB/s, so it fits larger models, bigger batches, and longer context windows, and it runs memory-bound workloads faster.
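One way to read "fits larger models" is in GPU count. This sketch estimates the smallest power-of-two tensor-parallel degree that holds a given memory footprint, assuming 90% of each GPU's memory is usable and the footprint splits evenly across GPUs. The footprints in the example (a 70B-parameter model plus a hypothetical 40 GB KV cache) are illustrative, not benchmarks.

```python
import math

GB = 10**9


def tp_degree(footprint_bytes: float, gpu_mem_bytes: float,
              usable_frac: float = 0.9, max_gpus: int = 8) -> int:
    """Smallest power-of-two GPU count whose usable memory holds the footprint.

    Power-of-two degrees are assumed because tensor parallelism usually has to
    divide the number of attention heads evenly.
    """
    per_gpu = gpu_mem_bytes * usable_frac
    needed = math.ceil(footprint_bytes / per_gpu)
    degree = 1
    while degree < needed:
        degree *= 2
    if degree > max_gpus:
        raise ValueError("footprint does not fit on one 8-GPU node")
    return degree


kv_cache = 40 * GB  # hypothetical KV-cache budget
for label, weights in [("70B FP8", 70 * GB), ("70B BF16", 140 * GB)]:
    need = weights + kv_cache
    print(f"{label}: H100 x{tp_degree(need, 80 * GB)}, H200 x{tp_degree(need, 141 * GB)}")
```

Under these assumptions, a 70B model with FP8 weights drops from two H100s to a single H200, and the BF16 version drops from four H100s to two H200s.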
How much memory does the H200 have?
The H200 SXM has 141GB of HBM3e memory at 4.8 TB/s of bandwidth, about 76% more capacity and 43% more bandwidth than the H100 SXM's 80GB at 3.35 TB/s.
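The percentages follow directly from the spec figures. The sketch also applies a common back-of-envelope bound for memory-bound decoding: at batch size 1, each generated token reads every weight once, so tokens per second cannot exceed bandwidth divided by weight bytes. The 70GB FP8 model is a hypothetical example, and the bound ignores KV-cache reads and the gap between peak and achieved bandwidth.

```python
SPECS = {
    "H100": {"mem_gb": 80, "bw_bytes_s": 3.35e12},
    "H200": {"mem_gb": 141, "bw_bytes_s": 4.8e12},
}


def pct_more(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new / old - 1) * 100


def decode_ceiling(bw_bytes_s: float, weight_bytes: float) -> float:
    """Upper bound on batch-1 decode tokens/s when every step streams all weights once."""
    return bw_bytes_s / weight_bytes


h100, h200 = SPECS["H100"], SPECS["H200"]
print(f"capacity:  +{pct_more(h200['mem_gb'], h100['mem_gb']):.0f}%")
print(f"bandwidth: +{pct_more(h200['bw_bytes_s'], h100['bw_bytes_s']):.0f}%")

weights = 70e9  # hypothetical 70B-parameter model in FP8 (1 byte per parameter)
for name, spec in SPECS.items():
    print(f"{name}: <= {decode_ceiling(spec['bw_bytes_s'], weights):.0f} tokens/s at batch 1")
```

Because this bound scales linearly with bandwidth, the 43% bandwidth gain raises the batch-1 decode ceiling by the same 43%.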
Can I run multi-node H200 training?
Yes. We provision HGX H200 nodes (8 GPUs each) with high-speed InfiniBand for distributed training, with auto-checkpointing.
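Multi-node training leans on the interconnect because data-parallel gradients are typically synchronized with an all-reduce every step. This sketch uses the textbook ring all-reduce cost (a reduce-scatter plus an all-gather, each sending (N-1)/N of the payload per GPU). Libraries such as NCCL may choose other algorithms, the sketch treats every hop as running at one assumed rate even though real clusters combine NVLink inside a node with InfiniBand between nodes, and the 7B-parameter model and 50 GB/s per-GPU link rate are hypothetical.

```python
GPUS_PER_NODE = 8  # one HGX H200 node


def ring_allreduce_bytes_per_gpu(payload_bytes: float, world_size: int) -> float:
    """Bytes each GPU sends in a ring all-reduce (reduce-scatter + all-gather)."""
    if world_size < 2:
        return 0.0
    return 2 * (world_size - 1) / world_size * payload_bytes


grad_bytes = 7e9 * 2  # hypothetical 7B-parameter model, BF16 gradients
link_bytes_s = 50e9   # hypothetical per-GPU network rate (bytes/s)

for nodes in (2, 4, 8):
    n = nodes * GPUS_PER_NODE
    sent = ring_allreduce_bytes_per_gpu(grad_bytes, n)
    print(f"{nodes} nodes, {n} GPUs: {sent / 1e9:.2f} GB per GPU, "
          f"~{sent / link_bytes_s:.2f} s at the assumed link rate")
```

Per-GPU traffic stays close to twice the gradient size however many GPUs join, so in this model the bandwidth of the links, not the GPU count, largely sets the time spent synchronizing each step.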
Should I pick the H200 or a Blackwell B200?
The H200 is the memory-rich Hopper option for large-model inference today. If you need FP4 acceleration and 192GB+ for frontier-scale work, look at the Blackwell B200/B300 — talk to us and we'll help you choose.
Explore other GPUs
Different workload? Pick the GPU that fits your memory, throughput, and budget.
- H100: The proven Hopper workhorse, with the best price/performance for training, fine-tuning, and inference. From $2.39/hr.
- B200: Blackwell with 192GB HBM3e and FP4 acceleration, for frontier-scale training and high-throughput inference.
- B300: Blackwell Ultra with up to 288GB HBM3e, for the largest models and high-concurrency reasoning inference.
Stop chasing GPUs.
Start shipping AI.
Unified access to GPU capacity across providers. One platform, transparent pricing.
- Start in minutes
- Scale to multi-node clusters
- High availability built-in
- 24/7 support available