Modal: High-performance AI infrastructure

Bring your own code, and run CPU, GPU, and data-intensive compute at scale. The serverless platform for AI and data teams.

Created Aug 29, 2025
Updated May 31, 2026

What it is

Modal is a serverless compute platform designed for AI and high-performance Python workloads. It provides cloud infrastructure that lets developers run code at scale without managing servers or containers. The platform is aimed primarily at developers, data scientists, and ML engineers working on generative AI inference, model training, batch processing, and other computationally intensive tasks.

Main Features

Compute & Scaling

  • Sub-second container starts powered by a custom Rust-based container stack
  • Autoscaling to hundreds of GPUs within seconds
  • Scale down to zero when not in use
  • Support for NVIDIA GPUs including H100, A100, L40S, and T4

Development Experience

  • Zero configuration files required
  • Define hardware and container requirements using Python function decorators
  • Bring your own code and container images
  • Built-in debugging tools with interactive shell and breakpoints
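
The idea of declaring hardware requirements with function decorators can be illustrated with a toy example. This is a minimal sketch of the general mechanism only: it is not Modal's SDK, and the `requires` decorator, the `REGISTRY` dictionary, and the `embed` function are all hypothetical names invented for illustration.

```python
# Toy illustration of "requirements as decorators". NOT Modal's API:
# `requires`, `REGISTRY`, and `embed` are hypothetical names.

REGISTRY = {}  # function name -> declared resource requirements


def requires(gpu=None, cpu=1.0, memory_mib=512):
    """Record a function's resource needs next to its code."""
    def decorator(func):
        REGISTRY[func.__name__] = {
            "gpu": gpu,
            "cpu": cpu,
            "memory_mib": memory_mib,
        }
        return func  # the function itself is left unchanged
    return decorator


@requires(gpu="H100", cpu=4.0, memory_mib=16384)
def embed(text: str) -> int:
    # Placeholder for real model code: counts whitespace-separated tokens.
    return len(text.split())


if __name__ == "__main__":
    print(REGISTRY["embed"])
    print(embed("run this on a GPU"))
```

Because the requirements live beside the function, a platform could in principle read them at deploy time instead of parsing a separate configuration file, which is the point of the "zero configuration files" claim above.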

Storage & Integrations

  • Mount cloud storage from major providers (S3, R2, etc.)
  • Network volumes, key-value stores, and queues
  • Export logs to Datadog or any OpenTelemetry-compatible provider
  • Web endpoints with custom domains, streaming, and websockets

Job Management

  • Powerful job scheduling with cron jobs, retries, and timeouts
  • Simple fan-out parallelism scaling to thousands of containers
  • Batch processing optimized for high-volume workloads

How it works

Generative AI Inference

Users deploy AI models by decorating Python functions with the hardware and container requirements they need. The platform handles container initialization, model weight loading, and automatic scaling based on request volume. Both custom models and popular frameworks are supported, with cold-start times optimized by the platform.

Fine-tuning and Training

Developers provision GPU resources (A100/H100) in seconds for training workloads. The platform provides pre-configured environments with the necessary drivers and packages, so training can start immediately. Users can run multiple experiments in parallel and pay only for active compute time.

Batch Processing

For high-volume data processing, users can parallelize workloads across thousands of containers with simple Python syntax. The platform automatically manages resource allocation, scaling, and cost optimization for CPU- and memory-intensive tasks.
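
The fan-out pattern described here can be tried locally. This sketch uses Python's standard-library thread pool as a stand-in for remote containers; it is not Modal's SDK, and `process_record` and `fan_out` are hypothetical names. It only shows the shape of the code: one function, mapped over many inputs, with results collected in input order.

```python
# Local analogy for fan-out parallelism using only the standard library.
# Threads stand in for containers; this is NOT Modal's API.
from concurrent.futures import ThreadPoolExecutor


def process_record(record: int) -> int:
    """Stand-in for one unit of batch work (here: squaring a number)."""
    return record * record


def fan_out(records, max_workers=8):
    """Apply process_record to every record in parallel.

    Executor.map yields results in the same order as the inputs,
    regardless of which worker finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_record, records))


if __name__ == "__main__":
    print(fan_out(range(5)))
```

On a serverless platform the worker pool would presumably be replaced by containers that are started and stopped on demand, while the calling code keeps the same map-over-inputs structure.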

Web Service Deployment

Users can deploy web services and APIs by creating HTTPS endpoints from their Python functions. Modal handles domain management and SSL certificates, and supports streaming and WebSocket connections.

Key Points

  • Pay-per-use pricing, billed by the second for actual compute consumption
  • Provides $30 monthly free compute credit
  • Built on gVisor for secure sandboxed code execution
  • SOC 2 and HIPAA compliant with enterprise SSO support
  • Extensive documentation with practical examples for various use cases
  • Active developer community with Slack support

Additional Details

Pricing

  • GPU pricing ranges from $0.000164/sec (T4) to $0.001736/sec (B200)
  • CPU: $0.0000131/core/sec (minimum 0.125 cores per container)
  • Memory: $0.00000222/GiB/sec
  • Three tiers: Starter (individual developers), Team (startups), Enterprise (large organizations)
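
To make the per-second rates easier to reason about, the sketch below turns them into a rough cost estimate. It uses only the figures quoted in the list above. Two things are assumptions rather than facts from this page: that GPU, CPU, and memory charges are simply added together, and that the 0.125-core minimum is applied by rounding a smaller request up. The function name `estimate_cost` is hypothetical.

```python
# Rough cost estimator built from the per-second rates listed above (USD).
GPU_RATE_PER_SEC = {"T4": 0.000164, "B200": 0.001736}
CPU_RATE_PER_CORE_SEC = 0.0000131
MEM_RATE_PER_GIB_SEC = 0.00000222
MIN_CORES = 0.125  # minimum cores per container, per the pricing list


def estimate_cost(seconds, cores, mem_gib, gpu=None):
    """Estimate the cost in USD of one container running for `seconds`.

    Assumption: GPU, CPU, and memory charges are additive, and a request
    below MIN_CORES is billed as MIN_CORES.
    """
    billed_cores = max(cores, MIN_CORES)
    cost = seconds * billed_cores * CPU_RATE_PER_CORE_SEC
    cost += seconds * mem_gib * MEM_RATE_PER_GIB_SEC
    if gpu is not None:
        cost += seconds * GPU_RATE_PER_SEC[gpu]
    return cost


if __name__ == "__main__":
    one_hour = 3600
    t4_job = estimate_cost(one_hour, cores=2, mem_gib=8, gpu="T4")
    print(f"T4 + 2 cores + 8 GiB for 1 hour: ${t4_job:.4f}")
    print(f"Hours of that job covered by $30: {30 / t4_job:.1f}")
```

Under these assumptions, one hour on a T4 with 2 cores and 8 GiB of memory comes to roughly $0.75, of which the GPU accounts for about $0.59.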

Availability

  • Multi-region support
  • Free tier includes $30 monthly compute credit
  • No long-term commitments or upfront payments required

Requirements

  • Python-based development
  • Basic familiarity with Python decorators and cloud concepts
  • No infrastructure management or container expertise needed