NVIDIA Hopper

NVIDIA H200 GPU Cloud

Run NVIDIA H200 SXM GPUs with 141GB of HBM3e: the same Hopper compute as the H100 with about 76% more memory and 43% more memory bandwidth, built for long-context LLMs and larger models that would otherwise need sharding.

Technical specifications

Architecture: Hopper
GPU memory: 141GB HBM3e
Memory bandwidth: 4.8 TB/s
NVLink: 900 GB/s
FP16/BF16 (Tensor): 1,979 TFLOPS*
FP8 (Tensor): 3,958 TFLOPS*
Max TDP: 700W
GPUs per node: 8 (HGX H200)

*Tensor TFLOPS figures are peak values with sparsity, per NVIDIA's official specifications. Delivered specs may vary by node configuration.

Pricing & availability

NVIDIA H200 SXM: Available on request
Talk to sales

What's the H200 best for?

Long-context & large-model inference

141GB of HBM3e fits a 70B-class model at FP8 (about 70GB of weights) plus a large KV cache on a single GPU, so you can serve longer context windows without tensor-parallel sharding.
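
As a rough check on the single-GPU sizing above, here is a minimal sketch that estimates weight and KV-cache memory. The model shape (80 layers, 8 KV heads, head dimension 128), the 32K context, the batch of 4, and the 10% headroom are all illustrative assumptions, not measurements from VESSL Cloud or NVIDIA; activations and framework overhead are only approximated by the headroom.

```python
H200_MEMORY_GB = 141.0  # from the spec table; 1 GB taken as 1e9 bytes


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for model weights in GB: billions of params x bytes per param."""
    return params_billion * bytes_per_param


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: float) -> float:
    """KV cache in GB: keys and values (factor 2) for every layer and token."""
    total_bytes = 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem
    return total_bytes / 1e9


def fits(total_gb: float, capacity_gb: float = H200_MEMORY_GB,
         usable_fraction: float = 0.9) -> bool:
    """True if the estimate fits while holding back 10% for other overhead."""
    return total_gb <= capacity_gb * usable_fraction


# Hypothetical 70B-class shape: 80 layers, 8 KV heads (GQA), head_dim 128.
kv = kv_cache_gb(layers=80, kv_heads=8, head_dim=128,
                 seq_len=32_768, batch=4, bytes_per_elem=2)  # FP16 cache

fp8_total = weights_gb(70, 1) + kv    # FP8 weights: 1 byte per parameter
fp16_total = weights_gb(70, 2) + kv   # FP16 weights: 2 bytes per parameter

print(f"KV cache: {kv:.1f} GB")
print(f"FP8 total: {fp8_total:.1f} GB, fits: {fits(fp8_total)}")
print(f"FP16 total: {fp16_total:.1f} GB, fits: {fits(fp16_total)}")
```

Under these assumptions the FP8 configuration fits on one H200 and the FP16 one does not, so whether a model fits on a single GPU depends on the precision you serve at.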

Memory-bound training & fine-tuning

Larger batches and longer sequences fit in memory; 4.8 TB/s bandwidth keeps Hopper FP8 tensor cores fed for memory-bound workloads.
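
To illustrate why bandwidth matters for memory-bound work, the sketch below uses a common back-of-envelope bound: in single-sequence decoding, every generated token has to stream the model weights from HBM at least once, so steps per second cannot exceed bandwidth divided by bytes read. The 70GB weight footprint is an assumed example, and the result is an upper bound that ignores compute time, KV-cache reads (unless passed in), and kernel overhead.

```python
def decode_steps_per_s_upper_bound(bandwidth_tb_s: float,
                                   weights_gb: float,
                                   kv_read_gb: float = 0.0) -> float:
    """Upper bound on decode steps/s when each step reads all weights once."""
    bytes_per_step = (weights_gb + kv_read_gb) * 1e9
    return bandwidth_tb_s * 1e12 / bytes_per_step


WEIGHTS_GB = 70.0  # assumed: a 70B-parameter model stored at 1 byte per parameter

h200 = decode_steps_per_s_upper_bound(4.8, WEIGHTS_GB)   # H200: 4.8 TB/s
h100 = decode_steps_per_s_upper_bound(3.35, WEIGHTS_GB)  # H100: 3.35 TB/s

print(f"H200 bound: {h200:.1f} steps/s")
print(f"H100 bound: {h100:.1f} steps/s")
print(f"Ratio: {h200 / h100:.2f}x")
```

The ratio between the two bounds equals the bandwidth ratio, which is why a bandwidth-limited workload can speed up on the H200 even though peak tensor throughput is unchanged.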

Drop-in Hopper upgrade

Same CUDA, PyTorch, and NeMo stack as the H100 — move memory-constrained workloads over with no code changes and more headroom.

Compare NVIDIA data-center GPUs

Spec | H100 | H200 (you're viewing) | B200 | B300
Architecture | Hopper | Hopper | Blackwell | Blackwell
GPU memory | 80GB HBM3 | 141GB HBM3e | 192GB HBM3e | up to 288GB HBM3e
Memory bandwidth | 3.35 TB/s | 4.8 TB/s | 8 TB/s | 8 TB/s
FP8 (Tensor) | 3,958 TFLOPS | 3,958 TFLOPS | 9 PFLOPS | 10 PFLOPS
Access | from $2.39/hr | Available on request | Available on request | Available on request
Best for | Cost-efficient training & inference | Long-context & large-model inference | Frontier-scale training (FP4) | Largest models & reasoning inference

Why industry-leading teams run GPUs on VESSL Cloud

No waitlists

Access capacity across clouds through one platform — skip quotas and procurement.

Scale to multi-node

Spin up a single GPU or scale to large multi-node clusters over high-speed InfiniBand — as much as you need.

Transparent pricing

Spot, on-demand, and reserved options with pay-as-you-go billing.

Enterprise-ready

SOC 2 Type II compliance, with dedicated support for production AI.

Frequently asked questions

Is the H200 available now?

H200 capacity is available on request. Talk to our team for current availability and pricing — we'll match capacity to your timeline.

What's the difference between the H100 and H200?

Both share the same Hopper compute (1,979 TFLOPS FP16 / 3,958 TFLOPS FP8, peak with sparsity). The H200 carries 141GB of HBM3e at 4.8 TB/s versus the H100's 80GB of HBM3 at 3.35 TB/s, so it fits larger models, bigger batches, and longer context windows.

How much memory does the H200 have?

The H200 SXM has 141GB of HBM3e memory at 4.8 TB/s bandwidth — about 76% more capacity and 43% more bandwidth than the H100.
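
The percentages above follow directly from the spec figures. A minimal sketch of the arithmetic (the helper name `pct_increase` is just for illustration):

```python
def pct_increase(new: float, old: float) -> int:
    """Percentage increase of new over old, rounded to the nearest whole percent."""
    return round((new / old - 1) * 100)


capacity_gain = pct_increase(141, 80)     # 141GB vs 80GB
bandwidth_gain = pct_increase(4.8, 3.35)  # 4.8 TB/s vs 3.35 TB/s

print(f"Capacity: +{capacity_gain}%")
print(f"Bandwidth: +{bandwidth_gain}%")
```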

Can I run multi-node H200 training?

Yes. We provision HGX H200 nodes (8 GPUs each) with high-speed InfiniBand for distributed training, with auto-checkpointing.
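
For planning a multi-node job, the sketch below totals GPUs and HBM across a number of 8-GPU HGX H200 nodes and maps a (node, local GPU) pair to a global rank. The rank formula is the layout commonly used by distributed launchers, taken here as an assumption; the helper names are hypothetical and not part of any VESSL API.

```python
GPUS_PER_NODE = 8      # HGX H200, from the spec table
HBM_PER_GPU_GB = 141   # from the spec table


def cluster_summary(nodes: int) -> dict:
    """Total GPU count and aggregate HBM (GB) for a given number of nodes."""
    gpus = nodes * GPUS_PER_NODE
    return {"nodes": nodes, "gpus": gpus, "hbm_gb": gpus * HBM_PER_GPU_GB}


def global_rank(node_rank: int, local_rank: int,
                gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Global rank under the usual node-major layout."""
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError("local_rank must be in [0, gpus_per_node)")
    return node_rank * gpus_per_node + local_rank


summary = cluster_summary(4)
print(summary)
print(global_rank(node_rank=2, local_rank=3))
```

Aggregate HBM is only a ceiling: how much of it is usable for a single model depends on the parallelism strategy and on what each GPU replicates.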

Should I pick the H200 or a Blackwell B200?

The H200 is the memory-rich Hopper option for large-model inference today. If you need FP4 acceleration and 192GB+ for frontier-scale work, look at the Blackwell B200/B300 — talk to us and we'll help you choose.

Stop chasing GPUs.
Start shipping AI.

Unified access to GPU capacity across providers. One platform, transparent pricing.

  • Start in minutes
  • Scale to multi-node clusters
  • High availability built-in
  • 24/7 support available