For engineering teams at Indian audio platforms, podcast networks, music streaming companies, and media houses building their own audio delivery infrastructure.
Audio Infrastructure Is a Pure Egress Business
Unlike video, which trades quality for bitrate with adaptive streaming, audio has simpler delivery mechanics: a file is downloaded or progressively played from start to finish. The infrastructure design is correspondingly simpler — but the egress cost structure is identical. Every byte of audio delivered to a listener is egress from your storage origin or CDN.
India is among the fastest-growing podcast and audio content markets globally, driven by smartphone penetration, affordable data, and a surge in regional language content. An audio platform that builds its infrastructure on hyperscaler storage at seed stage and does not revisit the decision at 100,000 monthly listeners will have an egress bill that materially affects its unit economics before the Series A conversation begins.
The Audio Egress Calculation
A 45-minute podcast episode at 128kbps MP3 is roughly 43 MB; the calculations below round this to 40 MB. At 96kbps AAC (generally comparable perceived quality at a lower bitrate), it is approximately 32 MB.
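File size follows directly from bitrate multiplied by duration. A minimal sketch of that arithmetic, assuming constant-bitrate encoding and ignoring container and ID3 tag overhead (the function name is illustrative, not part of any tool mentioned here):

```python
def audio_size_mb(bitrate_kbps: float, minutes: float) -> float:
    """Approximate size in MB (10^6 bytes) of a constant-bitrate audio file."""
    total_bits = bitrate_kbps * 1000 * minutes * 60
    return total_bits / 8 / 1_000_000


print(round(audio_size_mb(128, 45), 1))  # 43.2
print(round(audio_size_mb(96, 45), 1))   # 32.4
```

A 45-minute episode therefore lands near 43 MB at 128 kbps and near 32 MB at 96 kbps, whichever codec produces that bitrate.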
One hundred thousand monthly listeners each downloading one 40 MB episode generate 4,000 GB of egress per episode per month. At the AWS S3 Mumbai rate of Rs.6.75/GB, that is Rs.27,000 per episode per month for every 100,000 listeners. A catalogue of 50 episodes, each downloaded by 100,000 listeners in a month, produces Rs.13.5 lakh per month in egress from one show.
At IBEE at Rs.2/GB, the same 4,000 GB is Rs.8,000 per episode per month. The 50-episode back catalogue costs Rs.4 lakh per month in egress — a Rs.9.5 lakh monthly difference on a single show.
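The comparison above can be reproduced with a few lines of arithmetic. This is a sketch under the same assumptions as the text: 40 MB per episode, decimal gigabytes, every listener downloading every episode once per month, and the two per-GB rates quoted above. The function name is illustrative.

```python
def monthly_egress_cost_rs(listeners: int, episode_mb: float,
                           episodes: int, rate_rs_per_gb: float) -> float:
    """Monthly egress cost in rupees if each listener downloads each episode once."""
    egress_gb = listeners * episode_mb * episodes / 1000  # decimal GB, as in the text
    return egress_gb * rate_rs_per_gb


hyperscaler = monthly_egress_cost_rs(100_000, 40, 50, 6.75)
ibee = monthly_egress_cost_rs(100_000, 40, 50, 2.0)
print(hyperscaler, ibee, hyperscaler - ibee)  # 1350000.0 400000.0 950000.0
```

Changing the listener count, episode size, or rate shows how sensitive the monthly bill is to each input.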
For a podcast network with 10 shows and a combined listenership of 500,000 per month, the same download pattern puts the infrastructure cost difference between hyperscaler and IBEE egress pricing at roughly Rs.40–50 lakh per month, about five times the Rs.9.5 lakh figure above.
Storage Structure for Audio Platforms
Raw Audio Bucket (Private)
Master recordings as delivered by creators or produced in-house: uncompressed WAV or high-bitrate MP3, before any processing. Private, never publicly accessible. Write-once preservation layer.
Key structure: raw-audio/{show-id}/{episode-id}/master.{ext}
Processed Audio Bucket (CDN Origin)
Transcoded delivery files: multiple bitrate variants of each episode, normalised for volume consistency. Typically 128kbps MP3 for broad compatibility (MP3 is the most widely supported format across podcast apps and RSS-based directories), 96kbps AAC for mobile streaming apps, and 64kbps AAC for low-bandwidth listeners.
Key structure: audio/{show-id}/{episode-id}/{variant}.{ext}
Example: audio/show-tech-decoded/ep-042/128kbps.mp3, audio/show-tech-decoded/ep-042/96kbps.aac
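The two key structures above can be generated by small helpers so that every service builds object keys the same way. This is a sketch only: the function names and the rule restricting segments to lowercase letters, digits, and hyphens are assumptions, not something the storage layer requires.

```python
import re

# Assumed naming rule: lowercase letters, digits, and hyphens only.
_SEGMENT = re.compile(r"^[a-z0-9][a-z0-9-]*$")


def _check(*segments: str) -> None:
    """Reject segments that could break the key layout (slashes, spaces, etc.)."""
    for segment in segments:
        if not _SEGMENT.match(segment):
            raise ValueError(f"invalid key segment: {segment!r}")


def raw_key(show_id: str, episode_id: str, ext: str) -> str:
    """Key for the private master recording."""
    _check(show_id, episode_id, ext)
    return f"raw-audio/{show_id}/{episode_id}/master.{ext}"


def delivery_key(show_id: str, episode_id: str, variant: str, ext: str) -> str:
    """Key for one transcoded variant in the CDN-origin bucket."""
    _check(show_id, episode_id, variant, ext)
    return f"audio/{show_id}/{episode_id}/{variant}.{ext}"


print(delivery_key("show-tech-decoded", "ep-042", "128kbps", "mp3"))
# audio/show-tech-decoded/ep-042/128kbps.mp3
```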
Artwork and Metadata Bucket
Show cover art, episode artwork, transcripts, chapter marker files, show notes images. Small files, high cache hit rate, relatively static.
Audio Processing Pipeline
When a raw episode upload arrives, the processing pipeline transcodes it into delivery formats, normalises loudness to the common podcast target of -16 LUFS integrated, and writes the output files to the processed audio bucket.
FFmpeg handles the transcoding and normalisation:
# First pass: measure integrated loudness (the JSON report is the last 12 lines of ffmpeg's output)
ffmpeg -i master.wav -af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json \
  -f null /dev/null 2>&1 | tail -12 > loudnorm_stats.json
# Second pass: read the measured values and build the correction filter once
INPUT_I=$(jq -r '.input_i' loudnorm_stats.json)
INPUT_TP=$(jq -r '.input_tp' loudnorm_stats.json)
INPUT_LRA=$(jq -r '.input_lra' loudnorm_stats.json)
INPUT_THRESH=$(jq -r '.input_thresh' loudnorm_stats.json)
TARGET_OFFSET=$(jq -r '.target_offset' loudnorm_stats.json)
LOUDNORM="loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=${INPUT_I}:measured_LRA=${INPUT_LRA}:measured_TP=${INPUT_TP}:measured_thresh=${INPUT_THRESH}:offset=${TARGET_OFFSET}:linear=true"
# 128kbps MP3 (podcast standard)
ffmpeg -i master.wav \
  -af "$LOUDNORM" \
  -c:a libmp3lame -b:a 128k -id3v2_version 3 \
  128kbps.mp3
# 96kbps AAC (streaming app variant), using the same measured values
ffmpeg -i master.wav \
  -af "$LOUDNORM" \
  -c:a aac -b:a 96k \
  96kbps.aac
Upload both variants to the processed audio bucket on IBEE with appropriate content-type headers and cache-control settings. Audio delivery files are immutable once published, so set Cache-Control: public, max-age=31536000, immutable.
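The header settings for an upload can be derived from the file extension. The sketch below only builds the header values and leaves the upload call itself to whatever storage client you use, since the client API is not specified here. The function and constant names are illustrative.

```python
import os

# Standard media types for the two delivery formats produced above.
CONTENT_TYPES = {".mp3": "audio/mpeg", ".aac": "audio/aac"}

# Published audio never changes, so it can be cached for a year and marked immutable.
IMMUTABLE_CACHE = "public, max-age=31536000, immutable"


def delivery_headers(filename: str) -> dict:
    """Return the Content-Type and Cache-Control headers for a delivery file."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in CONTENT_TYPES:
        raise ValueError(f"unsupported delivery format: {filename!r}")
    return {"Content-Type": CONTENT_TYPES[ext], "Cache-Control": IMMUTABLE_CACHE}


print(delivery_headers("128kbps.mp3"))
```

Setting the content type explicitly matters because some players refuse or mishandle audio served with a generic binary type.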
Podcast RSS Feed Architecture
The podcast RSS feed is the distribution mechanism to Apple Podcasts, Spotify, Google Podcasts, and every other podcast directory. The feed is an XML document that lists every episode with metadata and the direct audio file URL.
The feed itself is a dynamic document generated by your API or a static file updated on each publish. Because podcast directories fetch the feed frequently (often hourly), the feed URL must be consistently accessible and respond quickly. Serve the feed from your application server or from a CDN-cached static file in object storage — not from a slow database query on each request.
Audio URLs in the feed must be direct HTTP(S) links to the audio files, because most podcast directories and apps download the file directly from the URL in the feed. Presigned URLs to a private bucket are not suitable for podcast RSS feeds because they expire: directories that cache the feed and fetch episodes later will get expired URLs.
For podcast delivery, audio files should be in a publicly accessible bucket (or served via a CDN with public access). Presigned URLs are appropriate for premium, subscription-gated content in an app — not for standard RSS-distributed podcasts.
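A feed with a direct, non-expiring enclosure URL might be generated as follows. This is a minimal sketch using only the Python standard library: the cdn.example.com and example.com URLs, the episode fields, and the function name are placeholders, and a production feed would also carry the iTunes namespace tags that directories expect.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(show_title: str, show_link: str, description: str, episodes: list) -> str:
    """Build a bare RSS 2.0 feed with one <item> and <enclosure> per episode."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = show_title
    ET.SubElement(channel, "link").text = show_link
    ET.SubElement(channel, "description").text = description
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "guid", isPermaLink="false").text = ep["guid"]
        ET.SubElement(item, "pubDate").text = format_datetime(ep["published"])
        # The enclosure carries the public audio URL, its size in bytes, and its media type.
        ET.SubElement(item, "enclosure", url=ep["url"],
                      length=str(ep["size_bytes"]), type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed(
    "Tech Decoded", "https://example.com/tech-decoded", "Placeholder show description",
    [{
        "title": "Episode 42",
        "guid": "show-tech-decoded/ep-042",
        "published": datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc),
        "url": "https://cdn.example.com/audio/show-tech-decoded/ep-042/128kbps.mp3",
        "size_bytes": 43_200_000,
    }],
)
print(feed_xml)
```

Writing the resulting string to object storage on each publish gives the CDN-cached static feed described above.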
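Analytics tracking for podcast downloads usually routes download requests through an endpoint you control before redirecting to the actual audio file, so that each request can be logged and filtered according to the IAB podcast measurement guidelines; direct CDN delivery gives you no such record unless you process the CDN's own logs. The standard pattern: the RSS feed contains URLs pointing to your tracking endpoint (https://analytics.yourapp.com/track/ep-042/128kbps.mp3), which logs the request and immediately issues an HTTP 302 (temporary) redirect to the actual IBEE CDN URL. A temporary redirect matters here because clients may cache a 301 and skip the tracking endpoint on later requests. The logged request can then be counted as a download while the audio itself is still served from the CDN.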
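A tracking endpoint of this kind can be very small. The sketch below uses Python's built-in http.server and makes several assumptions: the CDN base URL is a placeholder, the episode-to-show lookup table is hypothetical (the tracking URL in the text carries no show ID), and it answers with a temporary (302) redirect so that clients do not cache the redirect and bypass the endpoint on later requests.

```python
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

CDN_BASE = "https://cdn.example.com/audio"        # placeholder CDN origin
EPISODE_SHOW = {"ep-042": "show-tech-decoded"}    # hypothetical episode -> show lookup
TRACK_PREFIX = "/track/"


def redirect_target(path: str):
    """Map /track/{episode-id}/{file} to the CDN URL, or None if unrecognised."""
    if not path.startswith(TRACK_PREFIX):
        return None
    parts = path[len(TRACK_PREFIX):].split("/")
    if len(parts) != 2 or not parts[1] or parts[0] not in EPISODE_SHOW:
        return None
    episode_id, filename = parts
    return f"{CDN_BASE}/{EPISODE_SHOW[episode_id]}/{episode_id}/{filename}"


class TrackingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path.split("?", 1)[0])
        if target is None:
            self.send_error(404)
            return
        # Record the fields that download filtering typically needs.
        logging.info("download ip=%s ua=%s range=%s path=%s",
                     self.client_address[0], self.headers.get("User-Agent"),
                     self.headers.get("Range"), self.path)
        self.send_response(302)  # temporary redirect: every request passes through here
        self.send_header("Location", target)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the tracking endpoint until interrupted."""
    HTTPServer(("", port), TrackingHandler).serve_forever()
```

Calling serve() starts the endpoint locally. Filtering bots and de-duplicating repeat requests would happen later, over the logged records, rather than in the request path.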
Caching Strategy for Audio Files
Audio files are large and immutable — ideal CDN caching targets. Once an episode is published and the audio file is written to the processed bucket, it never changes. Configure the CDN to cache audio files indefinitely (Cache-Control: public, max-age=31536000, immutable).
For popular episodes, CDN cache hit rates of 95%+ are achievable. The first listener's download fetches from the IBEE origin. Every subsequent listener within the CDN's cache TTL is served from cache — no origin hit. The origin egress cost per listener trends toward zero for popular content.
For the long tail of a large catalogue (episodes with low ongoing listener counts), the CDN may evict rarely requested files from cache, so those requests fall through to the origin. For these, direct origin delivery is acceptable. The listener count is low enough that origin egress cost is not material.
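The effect of cache hit rate on origin egress is simple to estimate. This sketch models only traffic from the storage origin to the CDN; CDN delivery to listeners is typically billed separately and is not included. The function name is illustrative.

```python
def origin_egress_gb(total_delivered_gb: float, cache_hit_ratio: float) -> float:
    """Egress that still reaches the origin after the CDN absorbs cache hits."""
    if not 0 <= cache_hit_ratio <= 1:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_delivered_gb * (1 - cache_hit_ratio)


# A popular episode: 4,000 GB delivered at a 95% hit rate.
print(round(origin_egress_gb(4000, 0.95), 1))  # 200.0
# A long-tail episode: 4 GB delivered with no cache hits at all.
print(round(origin_egress_gb(4, 0.0), 1))      # 4.0
```

Even with no caching, the long-tail episode costs the origin far less than the cache misses of the popular one, which is why direct origin delivery is acceptable there.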
Regional Language Audio and Data Sovereignty
Indian audio platforms producing content in Hindi, Tamil, Telugu, Kannada, Malayalam, Bengali, and other regional languages are building infrastructure that holds culturally significant content alongside user data. For platforms that capture listener behaviour, preferences, and payment data, the data residency of the storage layer has the same compliance dimensions as any other user-data application.
IBEE provides India-sovereign storage for the complete audio infrastructure stack — raw recordings, processed delivery files, user data — on infrastructure that is legally and physically within India.

