
AI Storage
Storage that keeps every GPU saturated.

High-throughput, India-resident storage designed to hold GPUs at full utilisation through training, checkpointing, and inference. RDMA throughput, POSIX mount, predictable pricing.
Who this is for

Built for ML platform teams whose GPUs are I/O bound.

Four workload shapes where the storage tier, not the GPU, is what caps your throughput — and where restored throughput translates directly into GPU-hour ROI.

Dataset-heavy training

Stream billions of training samples at fabric speed. Parallel reads mean your PyTorch DataLoader never waits on I/O, and your GPUs never drop below 95% utilisation waiting for the next batch.
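The access pattern this serves can be sketched with nothing but the standard library: many read workers pull sample files in parallel so the consumer (standing in for the GPU) is never starved. PyTorch is not assumed here; the file names and sizes are illustrative stand-ins for a real shard layout.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dataset: many small sample files, as a DataLoader would see them.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(64):
    p = os.path.join(tmpdir, f"sample_{i:04d}.bin")
    with open(p, "wb") as f:
        f.write(os.urandom(4096))
    paths.append(p)

def read_sample(path):
    """One worker read; on a parallel filesystem these proceed concurrently."""
    with open(path, "rb") as f:
        return f.read()

# Overlap reads with consumption: the pool keeps decoded samples queued
# ahead of the consumer, the same role num_workers plays in a DataLoader.
with ThreadPoolExecutor(max_workers=8) as pool:
    batches = list(pool.map(read_sample, paths))

assert len(batches) == 64 and all(len(b) == 4096 for b in batches)
```

On a parallel filesystem the eight concurrent reads land on separate storage nodes, which is what keeps aggregate bandwidth scaling with worker count.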

Checkpoint storage

Write a multi-TB checkpoint from every rank in seconds, not minutes. Resume from a failed run without replaying a full training day of gradient updates.
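The write pattern behind that claim: each rank persists its own shard concurrently instead of serialising through one writer. A minimal stdlib sketch, with dummy state standing in for real model weights and a temp directory standing in for the mounted checkpoint volume:

```python
import os
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor

ckpt_dir = tempfile.mkdtemp()  # stand-in for the mounted checkpoint path
world_size = 8                 # stand-in for the job's rank count

def write_shard(rank):
    """Each rank writes its own shard file; no rank waits behind another.
    On a striped parallel filesystem these writes proceed concurrently."""
    shard = {"rank": rank, "weights": [float(rank)] * 1024}  # dummy state
    path = os.path.join(ckpt_dir, f"ckpt_rank{rank:03d}.pkl")
    with open(path, "wb") as f:
        pickle.dump(shard, f)
    return path

with ThreadPoolExecutor(max_workers=world_size) as pool:
    shard_paths = list(pool.map(write_shard, range(world_size)))

# Resume: a restarted rank loads only the shard it owns.
with open(shard_paths[3], "rb") as f:
    restored = pickle.load(f)
assert restored["rank"] == 3
```

The same one-shard-per-rank layout is what lets a resumed job reload in parallel rather than funnelling a multi-TB checkpoint through a single reader.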

Distributed cluster training

One namespace across dozens of GPU nodes. Your 128-GPU job sees a single coherent filesystem; your engineers see consistent paths and no sharded-data logistics.

Hybrid and multi-region pipelines

Mount the same namespace from GPU Cloud, Bare Metal, and Kubernetes. Replicate across regions without application changes or dataset sharding.

Throughput, not capacity theatre.

AI Storage is priced and designed for GPU-fed throughput, not shelf-GB. Every feature exists to keep accelerators at sustained high utilisation — not to pad a spec sheet.

Fabric-speed throughput

RDMA-accelerated reads deliver multi-GB/s per GPU under steady-state load. No per-metadata-op fees, no burst-credit cliffs, no throttling windows.

Parallel file system

Linear scaling across storage nodes with distributed metadata. Handles billion-file namespaces without the single-point-of-contention problem that bottlenecks NFS.

POSIX + S3 in one namespace

Mount as POSIX for training scripts, address as S3 for data pipelines. Same data, same bucket, two protocols — no copies, no sync jobs.
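Because both protocols address the same bytes, the relationship between a POSIX path and its S3 address is a pure naming translation. A sketch of that mapping — the mount point `/mnt/ai-storage` and bucket name `training-data` are hypothetical, not product defaults:

```python
from pathlib import PurePosixPath

MOUNT_ROOT = "/mnt/ai-storage"  # hypothetical POSIX mount point
BUCKET = "training-data"        # hypothetical bucket name

def posix_to_s3(path: str) -> str:
    """Map a path under the POSIX mount to its S3 address.
    Same bytes, two protocols: no copy or sync job is implied."""
    rel = PurePosixPath(path).relative_to(MOUNT_ROOT)
    return f"s3://{BUCKET}/{rel}"

def s3_to_posix(uri: str) -> str:
    """Inverse mapping, for pipelines that hand off to training scripts."""
    key = uri.removeprefix(f"s3://{BUCKET}/")
    return f"{MOUNT_ROOT}/{key}"

uri = posix_to_s3("/mnt/ai-storage/datasets/v3/shard-0001.tar")
assert uri == "s3://training-data/datasets/v3/shard-0001.tar"
assert s3_to_posix(uri) == "/mnt/ai-storage/datasets/v3/shard-0001.tar"
```

A data pipeline can write via S3 and a training script can read the result via `open()` the moment the write lands, with no intermediate copy step.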

AES-256 + customer-held keys

Encrypted at rest and in transit. Bring your own KMS key if your compliance team requires key separation from the storage provider.

Immutable dataset snapshots

Point-in-time snapshots of training sets. Re-run last month's experiment on the exact data it saw — reproducibility that survives a dataset refresh.

India-resident storage

Data stays within Indian jurisdiction by default. Customer-managed keys, audit trails, and in-country replication for regulated ML workloads.

IBEE AI Cloud

One namespace, every compute tier.

AI Storage mounts the same way across every IBEE compute product — so migrating between virtualised GPUs, dedicated bare metal, and long-term archival is a mount-point change, not a re-architecture.
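One convention that makes the mount-point change truly code-free: resolve the data root from a single environment variable. The variable name `AI_STORAGE_ROOT` and both paths below are hypothetical, chosen only to illustrate the pattern.

```python
import os

def data_root() -> str:
    """Training code resolves its data root from one environment variable,
    so moving between compute tiers changes the mount, never the code."""
    return os.environ.get("AI_STORAGE_ROOT", "/mnt/ai-storage")

def dataset_path(name: str) -> str:
    return f"{data_root()}/datasets/{name}"

os.environ["AI_STORAGE_ROOT"] = "/mnt/ai-storage"   # e.g. a GPU Cloud VM
assert dataset_path("imagenet") == "/mnt/ai-storage/datasets/imagenet"

os.environ["AI_STORAGE_ROOT"] = "/data/ai-storage"  # e.g. a bare-metal node
assert dataset_path("imagenet") == "/data/ai-storage/datasets/imagenet"
```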

Stop paying for GPUs that are waiting on I/O.

AI Storage is rolling out to early-access teams. Register interest to benchmark throughput on your training workload and get a capacity quote sized to your actual dataset shape.

Frequently Asked Questions

Who should evaluate AI Storage?

ML platform teams that are choosing between self-hosted Weka, Vast, Lustre, Ceph, or one of the hyperscaler parallel file offerings. If you're running multi-node training, your training data runs into the terabytes, and your GPUs sit idle waiting on I/O — this is the shape of storage built for that problem.
How is AI Storage different from IBEE Object Storage?

Object Storage is S3-compatible, priced for capacity, and designed for asynchronous reads — datasets, model artefacts, and backups. AI Storage is priced for throughput and latency: mounted as POSIX, RDMA on the hot path, and optimised for the read patterns of a PyTorch DataLoader feeding a GPU at fabric speed. Both are available, and datasets flow between them.
What throughput can I expect?

AI Storage is designed to deliver multi-GB/s per GPU under sustained training load, with burst capacity above that. Actual throughput depends on model architecture (CNNs are more I/O bound than LLMs), dataset layout, and DataLoader configuration. Benchmarking against your specific workload is part of the onboarding process for early-access customers — we publish the numbers your configuration hits, not a spec-sheet headline.
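If you want a first-order number before formal benchmarking, the measurement itself is simple. A minimal sketch of sequential-read throughput in GB/s — a real benchmark would use tools like fio, many parallel streams, and files far larger than the page cache; the tiny temp file here only demonstrates the measurement shape:

```python
import os
import tempfile
import time

def read_throughput_gbps(path: str, block: int = 1 << 20) -> float:
    """Sequentially read `path` in 1 MiB blocks and return observed GB/s."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(block):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / max(elapsed, 1e-9) / 1e9

# Tiny stand-in file; on a mounted volume you'd point this at a real shard.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(os.urandom(8 << 20))  # 8 MiB

gbps = read_throughput_gbps(path)
assert gbps > 0
```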
Do I need InfiniBand?

The fastest configuration uses InfiniBand between compute and storage, but AI Storage also supports RoCE over Ethernet for deployments where InfiniBand isn't practical. The exact interconnect is matched to the GPU pod you're running on — Bare Metal clusters ship with IB, GPU Cloud VMs use RoCE.
How is AI Storage priced?

Capacity and throughput are priced separately and predictably — no per-metadata-op fees, no egress surprises, no billing spikes from running a PyTorch DataLoader at full speed. Volume tiers kick in automatically at larger scales. Contact sales for a quote sized to your training workload.
Will my existing training scripts work without changes?

Yes — AI Storage exposes a POSIX interface, so any training script that runs against a mounted filesystem works without modification. Migration involves copying datasets across and updating mount points. The IBEE team assists with migration planning and benchmarking during early access.
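The copy step of such a migration reduces to a tree copy plus checksum verification before flipping the mount point. A stdlib sketch, with temp directories standing in for the old filesystem and the new mount:

```python
import hashlib
import os
import shutil
import tempfile

def sha256(path: str) -> str:
    """Checksum one file in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

src = tempfile.mkdtemp()  # stand-in for the old NFS/Lustre tree
dst = tempfile.mkdtemp()  # stand-in for the new AI Storage mount
for i in range(4):
    with open(os.path.join(src, f"shard-{i}.bin"), "wb") as f:
        f.write(os.urandom(2048))

# Copy the tree, then verify every file by checksum before the
# training job's data root is switched over to the new mount.
shutil.copytree(src, dst, dirs_exist_ok=True)
verified = all(
    sha256(os.path.join(src, name)) == sha256(os.path.join(dst, name))
    for name in os.listdir(src)
)
assert verified
```

In practice a large dataset would be moved with parallel copy tooling rather than a single `copytree`, but the verify-then-switch sequence is the same.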

Have more questions?

Contact Our Technical Team