NVIDIA A100/H100 GPU Servers for AI, ML, and HPC Workloads

Deploy high-performance GPU servers with NVIDIA A100 (80GB) or H100 GPUs for AI/ML model training, inference, and scientific computing. Scale from single GPUs to multi-node GPU clusters on Swiss infrastructure.

Key Features

What sets Xelon Cloud apart

Why choose Xelon Cloud?

Overview

Xelon GPU Servers provide on-demand access to NVIDIA A100 and H100 GPUs, designed for deep learning model training, large language models (LLMs), computer vision, scientific simulations, and rendering workloads.

Key Highlights:

  • NVIDIA A100 80GB PCIe/SXM4 GPUs with NVLink for multi-GPU training

  • NVIDIA H100 80GB SXM5 GPUs (up to 3x faster than A100 for LLM training)

  • Flexible configurations: 1-8 GPUs per server, AMD EPYC CPUs, up to 2 TB RAM

  • Pre-installed ML frameworks: PyTorch 2.2, TensorFlow 2.15, JAX, CUDA 12.3 (verified in the sketch after this list)

  • High-speed NVMe storage with up to 100 TB capacity per server

  • 100 Gbps InfiniBand networking for distributed training (multi-node GPU clusters)
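
The pre-installed stack can be sanity-checked from Python as soon as a server is provisioned. A minimal sketch, assuming the PyTorch image listed above:

    # Verify GPUs, CUDA support, and peer-to-peer (NVLink) access.
    import torch

    print("CUDA available:", torch.cuda.is_available())
    print("Built against CUDA:", torch.version.cuda)   # toolkit PyTorch was compiled with

    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.0f} GB")

    if torch.cuda.device_count() > 1:
        # True on NVLink-connected SXM boards (and P2P-capable PCIe setups)
        print("P2P 0<->1:", torch.cuda.can_device_access_peer(0, 1))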

Use Cases:

  • LLM Training: Train GPT, LLaMA, Mistral models with multi-GPU parallelism (DeepSpeed, FSDP; see the FSDP sketch after this list)

  • Computer Vision: Object detection, image segmentation, video analysis (YOLOv8, Stable Diffusion)

  • Scientific Computing: Molecular dynamics, CFD simulations, genomics (GROMACS, OpenFOAM)

  • Rendering: 3D rendering, video transcoding, ray tracing (Blender, FFmpeg with NVENC)
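
For the FSDP option named above, the moving parts are small: one process per GPU, an NCCL process group, and a wrapped model. A minimal sketch with a placeholder model and synthetic data, launched via torchrun:

    # Minimal FSDP sketch: placeholder model, synthetic batches.
    # Launch with: torchrun --nproc_per_node=4 train_fsdp.py
    import os
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    dist.init_process_group("nccl")              # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(                 # stand-in for a real LLM
        torch.nn.Linear(4096, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 4096)
    ).cuda()
    model = FSDP(model)                          # shards params, grads, optimizer state

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for step in range(10):
        x = torch.randn(8, 4096, device="cuda")  # synthetic batch
        loss = model(x).pow(2).mean()            # dummy objective
        loss.backward()
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()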

GPU pricing starts at CHF 2.50 per hour for an A100 80GB (60-70% cheaper than an AWS p4d.24xlarge on a per-GPU basis).

Automated Backups Made Simple

Daily snapshots are included with Xelon Cloud instances by default.

Need longer retention? Flexible options offer 7-, 30-, or 365-day retention for compliance or business continuity.

Technical Specifications

GPU Options

  • NVIDIA A100 80GB PCIe: 6912 CUDA cores, 432 Tensor cores, 80 GB HBM2e memory, 2 TB/s memory bandwidth, 312 TFLOPS FP16, 624 TFLOPS with sparsity

  • NVIDIA A100 80GB SXM4: Same specs as PCIe, with NVLink 3.0 (600 GB/s GPU-to-GPU bandwidth for multi-GPU training)

  • NVIDIA H100 80GB SXM5: 16896 CUDA cores, 528 Tensor cores, 80 GB HBM3 memory, 3.35 TB/s memory bandwidth, 989 TFLOPS FP16, 1979 TFLOPS with sparsity, 3958 TFLOPS FP8 with sparsity (reduced-precision usage is sketched below)
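
The FP16/FP8 throughput figures above only apply when kernels actually run in reduced precision; in PyTorch the usual switch is autocast. A minimal BF16 sketch (FP8 on H100 typically requires an extra library such as NVIDIA Transformer Engine, which is not listed above and is only an assumption here):

    # Run a matmul in BF16 so it lands on the Tensor cores quoted above.
    import torch

    a = torch.randn(8192, 8192, device="cuda")
    b = torch.randn(8192, 8192, device="cuda")

    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        c = a @ b            # executed as a BF16 Tensor-core GEMM
    print(c.dtype)           # torch.bfloat16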

Server Configurations

  • Single GPU: 1x A100/H100, 16-32 vCPU (AMD EPYC 7543), 128-256 GB RAM, 2 TB NVMe SSD

  • Dual GPU: 2x A100/H100, 32-64 vCPU, 256-512 GB RAM, 4 TB NVMe SSD

  • Quad GPU: 4x A100/H100 SXM4/SXM5 with NVLink, 64-128 vCPU, 512 GB - 1 TB RAM, 8 TB NVMe SSD

  • Octa GPU: 8x A100/H100 SXM4/SXM5 with NVLink, 128 vCPU, 2 TB RAM, 16 TB NVMe SSD

CPU & Memory

  • CPU: AMD EPYC 7543 (32 cores), 7713 (64 cores), 9554 (64 cores, Zen 4)

  • RAM: 128 GB, 256 GB, 512 GB, 1 TB, or 2 TB ECC memory (DDR4-3200 on EPYC 7003-series hosts; EPYC 9004-series hosts use DDR5)

  • PCIe: PCIe Gen4 x16 per GPU for maximum bandwidth

Storage & Networking

  • Boot/Scratch Storage: 2-16 TB NVMe SSD RAID 0/1 for datasets and checkpoints

  • Shared Storage: Ceph RBD or NFS for multi-node training (100 Gbps RDMA networking)

  • Network: 25 Gbps Ethernet (standard), 100 Gbps InfiniBand HDR for multi-node GPU clusters
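
For multi-node jobs, NCCL normally detects the InfiniBand fabric on its own; when it needs steering, a few environment variables do the job. A sketch with commonly used NCCL settings; the adapter and interface names are hypothetical and depend on the actual hosts:

    # Point NCCL at the InfiniBand fabric before creating the process group.
    # "mlx5_0" and "eth0" are hypothetical names; check ibstat / ip link on the host.
    import os
    import torch.distributed as dist

    os.environ.setdefault("NCCL_IB_HCA", "mlx5_0")       # InfiniBand adapter to use
    os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")  # interface for bootstrap traffic
    os.environ.setdefault("NCCL_DEBUG", "INFO")          # logs which transport was chosen

    dist.init_process_group("nccl")  # look for "NET/IB" in the NCCL INFO output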

Software & Frameworks

  • OS: Ubuntu 22.04 LTS with NVIDIA GPU drivers 535.x, CUDA 12.3, cuDNN 8.9

  • Deep Learning Frameworks: PyTorch 2.2, TensorFlow 2.15, JAX 0.4.23, MXNet, PaddlePaddle

  • Distributed Training: DeepSpeed, FSDP (Fully Sharded Data Parallelism), Horovod, PyTorch DDP

  • Model Serving: vLLM, TensorRT-LLM, Triton Inference Server, TorchServe (vLLM sketched after this list)

  • Jupyter: JupyterLab with GPU support, pre-installed kernels for Python 3.10/3.11
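
Of the serving stacks listed, vLLM has the smallest Python surface, so it makes a compact example. A minimal offline-inference sketch; the model name is an illustrative Hugging Face identifier, not something bundled with the image:

    # Minimal vLLM offline inference; the model is fetched on first run.
    from vllm import LLM, SamplingParams

    llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2",  # example model id
              tensor_parallel_size=1)                      # raise for multi-GPU serving
    params = SamplingParams(temperature=0.7, max_tokens=128)

    outputs = llm.generate(["Explain NVLink in one paragraph."], params)
    print(outputs[0].outputs[0].text)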

Pricing Models

  • On-Demand: Per-hour billing (CHF 2.50/hour for A100 80GB, CHF 4.00/hour for H100 80GB)

  • Reserved Instances: 1-year or 3-year commitments with 30-50% discounts

  • Spot Instances: Up to 70% discount for interruptible workloads (preemptible GPUs)
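
To make the trade-offs concrete, here is the arithmetic for a hypothetical 100-hour run on four A100s, using only the listed rates and the upper ends of the quoted discount ranges:

    # Illustrative cost comparison from the listed rates (CHF per GPU-hour).
    A100_RATE = 2.50
    gpus, hours = 4, 100                   # hypothetical fine-tuning run

    on_demand = gpus * hours * A100_RATE   # 4 * 100 * 2.50 = CHF 1000
    reserved  = on_demand * (1 - 0.50)     # up to 50% off  -> CHF  500
    spot      = on_demand * (1 - 0.70)     # up to 70% off  -> CHF  300

    print(f"on-demand CHF {on_demand:.0f}, reserved CHF {reserved:.0f}, spot CHF {spot:.0f}")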

Optimized for Kubernetes & Cloud-Native Workloads

Our compute plans are designed to integrate seamlessly with CloudDeck and Xelon Kubernetes:


  • Native support for K8s node pools (see the pod sketch after this list)

  • Instant scaling

  • Multi-zone deployments

  • S3-compatible object storage for stateful workloads

  • SCION-secured networking for critical clusters
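
As a sketch of how a GPU node pool is consumed from the cluster side, the snippet below submits a pod that requests one GPU through the standard NVIDIA device-plugin resource name. Everything here is generic Kubernetes, assuming GPUs are exposed as nvidia.com/gpu; none of it is Xelon-specific API:

    # Submit a pod that claims one GPU via the standard device-plugin resource.
    # Assumes GPUs are exposed as "nvidia.com/gpu"; the image tag is an example.
    from kubernetes import client, config

    config.load_kube_config()   # or load_incluster_config() from inside the cluster

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="gpu-smoke-test"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[client.V1Container(
                name="cuda",
                image="nvidia/cuda:12.3.1-base-ubuntu22.04",
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"},   # schedules onto a GPU node
                ),
            )],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)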


Book an Appointment

Choose a time that works for you and connect with one of our cloud specialists for a personalised session — via Microsoft Teams or phone.

Request Meeting with our Solution Architect

Request Meeting with our Partner Manager

Get in touch

We’re here to help you with cloud strategy, technical questions, pricing, compliance, and tailored solutions for your organisation.

Take your cloud to the next level

Experience high-performance Swiss cloud infrastructure built for teams who want reliability, sovereignty, and simplicity.
