For product engineers, backend leads, and infrastructure teams at Indian e-commerce, marketplace, and content platforms managing large volumes of media assets.
Why a Single Bucket Is the Wrong Starting Point
Most product teams start by creating one S3 bucket and putting everything in it. Product images, user avatars, document uploads, marketing banners, generated reports — all in the same bucket, accessed by the same credentials, with no separation of concerns.
This works at small scale. At medium scale it creates three problems: access control is all-or-nothing (any service with bucket access can read any object, regardless of whether it should), lifecycle management becomes impossible to apply selectively (you cannot expire user-generated thumbnails without risk of expiring product images), and cost attribution is opaque (you cannot tell which product feature is driving storage growth).
The correct architecture separates media by type and access pattern from the beginning.
The Four-Bucket Media Architecture
Bucket 1 — Raw Uploads (Private)
Every user upload, every content team upload, every API-delivered file lands here first — in its original format, unmodified. This bucket is private. No public access. Write access for the upload service only, read access for the processing pipeline only.
The raw bucket is the archive layer. If a processing step produces incorrect results, reprocessing happens from raw. If a user disputes a moderation decision, the original unmodified file is available for review. Raw files are never deleted automatically — they are the ground truth record of what was received.
Key structure: raw/{year}/{month}/{upload-id}/{original-filename}
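A minimal sketch of how an upload service might build keys in this layout. The function name and the filename sanitisation rule are assumptions for illustration; the only part taken from the text is the raw/{year}/{month}/{upload-id}/{original-filename} structure.

```python
import re
from datetime import datetime, timezone


def build_raw_key(upload_id: str, original_filename: str, received_at: datetime) -> str:
    """Return a key shaped like raw/{year}/{month}/{upload-id}/{original-filename}."""
    # Keep only the last path component so a client-supplied name cannot add prefixes.
    basename = original_filename.replace("\\", "/").rsplit("/", 1)[-1]
    # Assumption: restrict the name to a conservative character set.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", basename).lstrip(".") or "unnamed"
    return f"raw/{received_at:%Y}/{received_at:%m}/{upload_id}/{safe_name}"


if __name__ == "__main__":
    when = datetime(2024, 3, 5, tzinfo=timezone.utc)
    print(build_raw_key("up-123", "../../My Photo.JPG", when))
```

Only the key is sanitised here; the stored file bytes stay untouched, which keeps the raw bucket usable as the ground truth record.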
Bucket 2 — Processed Assets (Private Origin, CDN-Served)
The output of the processing pipeline: resized images at standard dimensions, format-converted files (WebP, AVIF), compressed thumbnails, video stills, document previews. This bucket is the CDN origin. It is not directly publicly accessible — the CDN fetches from it, caches the result, and serves subsequent requests from cache.
Key structure: processed/{content-type}/{content-id}/{variant-name}.{ext}
Example: processed/products/prod-abc123/hero-800w.webp, processed/products/prod-abc123/thumb-200w.webp
Bucket 3 — User-Generated Content (Private)
User profile photos, review images, user-submitted documents — content with privacy and moderation implications. Separate from processed product assets because the access model differs: user-generated content often requires authentication-gated delivery via presigned URLs, while product images are public.
Lifecycle policies on this bucket can be tuned independently: delete thumbnail variants of deleted user accounts, expire temporary upload previews, apply different retention rules for user documents versus user photos.
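As an illustration of independently tuned rules, the sketch below builds a lifecycle configuration in the shape used by the S3 API. It assumes IBEE accepts S3-style lifecycle rules; the prefix, the tag name, and the day counts are hypothetical values, not recommendations.

```python
import json


def build_ugc_lifecycle_rules(preview_days: int = 7, deleted_account_days: int = 30) -> dict:
    """Build S3-style lifecycle rules for the user-generated content bucket."""
    return {
        "Rules": [
            {
                # Hypothetical prefix for temporary upload previews.
                "ID": "expire-temporary-upload-previews",
                "Status": "Enabled",
                "Filter": {"Prefix": "previews/tmp/"},
                "Expiration": {"Days": preview_days},
            },
            {
                # Hypothetical tag applied to objects when the owning account is deleted.
                "ID": "expire-content-of-deleted-accounts",
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "account-state", "Value": "deleted"}},
                "Expiration": {"Days": deleted_account_days},
            },
        ]
    }


if __name__ == "__main__":
    # With an S3-compatible SDK this dict would typically be passed to a call such as
    # boto3's put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...).
    print(json.dumps(build_ugc_lifecycle_rules(), indent=2))
```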
Bucket 4 — Marketing and Static Assets (Public)
Campaign images, brand assets, email template images, UI icons — content that is entirely public, changes infrequently, and is maintained by a non-technical team. Separate bucket means separate access credentials: the marketing team's deployment process can write to this bucket without any risk of touching product or user data.
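The separation described across the four buckets can be summarised as an access matrix. The sketch below is only a model for reasoning about who may touch what; the bucket and service names are hypothetical, and real enforcement would live in the storage provider's credentials and bucket policies rather than in application code.

```python
# Each service maps to the (bucket, action) pairs it is allowed to perform.
# All names are illustrative assumptions.
ACCESS_MATRIX = {
    "upload-service": {("raw-uploads", "write")},
    "processing-pipeline": {("raw-uploads", "read"), ("processed-assets", "write")},
    "cdn-origin-fetch": {("processed-assets", "read")},
    "app-api": {("user-content", "read")},  # used to issue presigned URLs
    "marketing-deploy": {("marketing-static", "write")},
}


def is_allowed(service: str, bucket: str, action: str) -> bool:
    """Return True only if the matrix explicitly grants the action."""
    return (bucket, action) in ACCESS_MATRIX.get(service, set())


if __name__ == "__main__":
    print(is_allowed("marketing-deploy", "marketing-static", "write"))
    print(is_allowed("marketing-deploy", "user-content", "read"))
```

With a single bucket, every row of this matrix collapses into one shared credential, which is the all-or-nothing problem described earlier.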
The Upload Pipeline
For user and content uploads, the server-side proxy pattern is a performance and cost antipattern at scale. When a user uploads a 10 MB product image, having that upload transit through your application server wastes server resources and adds latency.
Use presigned POST uploads instead. The upload flow:
1. Your API generates a presigned POST policy for the raw uploads bucket: a set of signed form fields that authorise the client to upload directly to IBEE for up to N seconds.
2. The API records a pending upload entry in the database with the expected key.
3. The client uploads directly to IBEE using the presigned policy.
4. IBEE accepts the upload and stores the file.
5. Your API confirms the upload is complete, either by checking that the database record has been marked done or via a periodic check on the expected key.
The presigned POST policy can include conditions: a maximum file size (content-length-range), allowed content types (a starts-with condition on Content-Type), and the exact object key the upload must use. Uploads that violate a condition are rejected, which prevents clients from using the presigned policy to upload to unexpected keys or with unexpected file types.
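To make the policy concrete, here is a standard-library sketch of how a presigned POST is assembled under AWS Signature Version 4, assuming IBEE accepts S3-style POST policies. The function name, region string, and default limits are assumptions. In practice an S3 SDK helper (for example boto3's generate_presigned_post) would normally produce these fields for you.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def build_presigned_post(access_key, secret_key, region, bucket, key, max_bytes,
                         content_type_prefix="image/", expires_in=300, now=None):
    """Return the signed form fields a client submits alongside the file."""
    now = now or datetime.now(timezone.utc)
    date_stamp = now.strftime("%Y%m%d")
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    credential = f"{access_key}/{date_stamp}/{region}/s3/aws4_request"

    policy = {
        "expiration": (now + timedelta(seconds=expires_in)).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            {"key": key},  # exact key: the client cannot choose another
            ["starts-with", "$Content-Type", content_type_prefix],
            ["content-length-range", 1, max_bytes],
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
        ],
    }
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")

    # SigV4 signing key: HMAC chain over date, region, service, terminator.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, "s3")
    k_signing = _hmac_sha256(k_service, "aws4_request")
    # For POST uploads the string to sign is the base64-encoded policy itself.
    signature = hmac.new(k_signing, policy_b64.encode("ascii"), hashlib.sha256).hexdigest()

    return {
        "key": key,
        "policy": policy_b64,
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": credential,
        "x-amz-date": amz_date,
        "x-amz-signature": signature,
    }
```

The client sends these fields in a multipart/form-data POST to the bucket endpoint, together with a Content-Type field that satisfies the starts-with condition and the file as the final field.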
For mobile apps, the AWS SDK's transfer manager handles multipart uploads transparently — large files are automatically split into parts and uploaded in parallel, with resume capability if the upload is interrupted.
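To show what a transfer manager does under the hood, the sketch below plans how a file is split into parts. It is not the SDK's implementation. The 5 MiB minimum part size and 10,000-part ceiling are AWS S3's documented limits and are assumed to apply to IBEE as well; the 8 MiB default is an arbitrary choice.

```python
MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum for every part except the last
MAX_PARTS = 10_000                # S3 maximum number of parts per upload


def plan_parts(total_size: int, part_size: int = 8 * 1024 * 1024):
    """Return a list of (part_number, offset, length) tuples covering the file."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    part_size = max(part_size, MIN_PART_SIZE)
    # Grow the part size if the file would otherwise exceed the part-count limit.
    smallest_allowed = -(-total_size // MAX_PARTS)  # ceiling division
    part_size = max(part_size, smallest_allowed)

    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


if __name__ == "__main__":
    for part in plan_parts(20 * 1024 * 1024):
        print(part)
```

Because each part has a fixed offset and length, an interrupted upload can resume by re-sending only the parts that were not acknowledged.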
The Processing Pipeline
When a new upload arrives in the raw bucket, the processing pipeline triggers. The correct trigger mechanism is an event-driven queue, not polling.
Configure the raw bucket to publish object creation events to a message queue (SQS-compatible). A processing worker consumes events from the queue and runs the appropriate pipeline for the content type: image resizing and format conversion for photos, thumbnail extraction for videos, text extraction for documents, virus scanning for any user upload.
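A minimal sketch of the consuming side, assuming the queue delivers messages in the S3 event notification JSON format (object keys arrive URL-encoded in that format). The extension-to-pipeline table and function names are hypothetical. A real worker would usually inspect the file's actual content type instead of trusting the extension.

```python
import json
from urllib.parse import unquote_plus

# Hypothetical routing table: file extension -> pipeline name.
PIPELINE_BY_EXTENSION = {
    "jpg": "image", "jpeg": "image", "png": "image", "webp": "image",
    "mp4": "video", "mov": "video",
    "pdf": "document", "docx": "document",
}


def parse_object_created(message_body: str):
    """Extract (bucket, key) pairs for object-creation records in one message."""
    event = json.loads(message_body)
    uploads = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletes and other event types
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        uploads.append((bucket, key))
    return uploads


def steps_for(key: str):
    """Every upload is virus scanned, then routed to a type-specific pipeline."""
    ext = key.rsplit(".", 1)[-1].lower() if "." in key else ""
    return ["virus_scan", PIPELINE_BY_EXTENSION.get(ext, "unsupported")]


if __name__ == "__main__":
    body = json.dumps({"Records": [{
        "eventName": "ObjectCreated:Post",
        "s3": {"bucket": {"name": "raw-uploads"},
               "object": {"key": "raw/2024/03/up-1/My+Photo.JPG"}},
    }]})
    for bucket, key in parse_object_created(body):
        print(bucket, key, steps_for(key))
```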
Processing worker pseudocode for image variants:
    def process_image(raw_bucket, raw_key, content_id):
        # Download the original image from the raw uploads bucket
        raw_image = download_from_ibee(raw_bucket, raw_key)

        # (variant name, target width in px, output format, quality)
        variants = [
            ('hero-1200w', 1200, 'webp', 85),
            ('hero-800w', 800, 'webp', 85),
            ('thumb-400w', 400, 'webp', 80),
            ('thumb-200w', 200, 'webp', 75),
        ]

        for name, width, fmt, quality in variants:
            resized = resize_and_convert(raw_image, width, fmt, quality)
            key = f"processed/products/{content_id}/{name}.{fmt}"
            upload_to_ibee('processed-assets', key, resized,
                           content_type=f'image/{fmt}',
                           cache_control='public, max-age=31536000, immutable')

        # Once all variants are uploaded, update the database record
        # with the processed variant paths
        mark_content_processed(content_id)

The immutable Cache-Control header on processed images tells the CDN and browser that this file will never change, so it can be cached for the full max-age (one year here) without revalidation. Because variant filenames include the content ID (which is stable), this is safe for product images as long as an image is never overwritten in place under the same key. For user content that can be updated (profile photos), use versioned keys that change when content changes rather than overwriting.
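One common way to get versioned keys is to embed a short hash of the file bytes in the key, so a changed file always produces a new URL. The sketch below assumes that approach; the function name and the 12-character digest length are illustrative choices, not something the pipeline above prescribes.

```python
import hashlib


def versioned_key(content_type: str, content_id: str, variant: str, ext: str, data: bytes) -> str:
    """Build a processed-asset key that changes whenever the bytes change."""
    # A truncated SHA-256 is enough to tell versions of one asset apart.
    digest = hashlib.sha256(data).hexdigest()[:12]
    return f"processed/{content_type}/{content_id}/{variant}.{digest}.{ext}"


if __name__ == "__main__":
    old = versioned_key("avatars", "user-42", "thumb-200w", "webp", b"old photo bytes")
    new = versioned_key("avatars", "user-42", "thumb-200w", "webp", b"new photo bytes")
    print(old)
    print(new)
```

The database record for the profile then points at the newest key, and older versions can be expired by a lifecycle rule.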
Responsive Image Delivery
Serving the correct image size to each device reduces page load times and egress costs. A 1200px hero image served to a mobile phone with a 390px display wastes bandwidth and slows rendering.
The HTML srcset attribute tells the browser about available sizes and lets it choose appropriately:
    <img
      src="https://cdn.example.com/processed/products/prod-abc123/hero-800w.webp"
      srcset="
        https://cdn.example.com/processed/products/prod-abc123/hero-1200w.webp 1200w,
        https://cdn.example.com/processed/products/prod-abc123/hero-800w.webp 800w,
        https://cdn.example.com/processed/products/prod-abc123/thumb-400w.webp 400w
      "
      sizes="(max-width: 600px) 400px, (max-width: 1024px) 800px, 1200px"
      loading="lazy"
      alt="Product image"
    />

The browser uses sizes to work out how wide the image will be displayed, scales that width by the device pixel ratio, and typically selects the smallest variant that is at least that wide. A mobile user on a standard-density screen receives the 400w thumbnail, while a high-density phone may fetch the 800w or 1200w variant. A desktop user receives the 1200w hero. The difference in file size, and therefore egress cost, is typically 5–10x between the smallest and largest variants.
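The selection rule can be modelled in a few lines. This is a simplified model rather than the browser's actual algorithm, since browsers may also weigh cache contents or network conditions. The slot_width function mirrors the sizes attribute in the markup above, and both function names are hypothetical.

```python
def slot_width(viewport_css_px: int) -> int:
    """Mirror of sizes="(max-width: 600px) 400px, (max-width: 1024px) 800px, 1200px"."""
    if viewport_css_px <= 600:
        return 400
    if viewport_css_px <= 1024:
        return 800
    return 1200


def pick_srcset_candidate(candidate_widths, slot_css_px: int, device_pixel_ratio: float = 1.0) -> int:
    """Pick the smallest candidate covering the slot at the device's pixel density."""
    needed_px = slot_css_px * device_pixel_ratio
    ordered = sorted(candidate_widths)
    for width in ordered:
        if width >= needed_px:
            return width
    return ordered[-1]  # nothing is large enough: fall back to the largest


if __name__ == "__main__":
    widths = [1200, 800, 400]
    for viewport, dpr in [(390, 1.0), (390, 2.0), (390, 3.0), (1440, 1.0)]:
        print(viewport, dpr, pick_srcset_candidate(widths, slot_width(viewport), dpr))
```

The 390px viewport at a pixel ratio of 2 or 3 shows why high-density phones can legitimately request the larger variants.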
Moderation Pipeline for User-Generated Content
User-uploaded content requires moderation before it is surfaced to other users. The architecture:
Uploads land in the user-generated content bucket with a status: pending tag. The processing pipeline triggers a moderation check — automated content classification (NSFW detection, text extraction from images, format validation) followed by human review for edge cases. Approved content gets a status: approved tag and is made accessible. Rejected content is flagged and the upload record is updated with a rejection reason.
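The automated step of this flow might be sketched as a triage function: clear cases are decided automatically and everything in between stays pending for human review. The score thresholds, function names, and the S3-style tag shape are assumptions for illustration.

```python
def triage(nsfw_score: float, format_valid: bool,
           approve_below: float = 0.2, reject_at_or_above: float = 0.9) -> str:
    """Return 'approved', 'rejected', or 'needs_review' for one upload."""
    if not format_valid:
        return "rejected"
    if nsfw_score >= reject_at_or_above:
        return "rejected"
    if nsfw_score < approve_below:
        return "approved"
    return "needs_review"  # edge case: hand over to a human moderator


def status_tag(decision: str) -> dict:
    """Map a triage decision to the object's status tag."""
    # Content awaiting human review keeps the pending status.
    status = "pending" if decision == "needs_review" else decision
    return {"Key": "status", "Value": status}


if __name__ == "__main__":
    for score in (0.05, 0.5, 0.95):
        decision = triage(score, format_valid=True)
        print(score, decision, status_tag(decision))
```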
Presigned URLs for user content are only generated for approved content. The application layer checks the content status before generating access URLs, ensuring pending and rejected content is never served to other users even if someone guesses the object key.
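A minimal sketch of that application-layer gate. The record fields, the exception name, and the sign_url callable are hypothetical stand-ins for your own database model and whichever presigning helper your SDK provides.

```python
from typing import Callable


class ContentNotAvailable(Exception):
    """Raised when content is pending, rejected, or otherwise not servable."""


def access_url_for(record: dict,
                   sign_url: Callable[[str, str, int], str],
                   expires_in: int = 300) -> str:
    """Return a short-lived URL, but only for approved content."""
    if record.get("status") != "approved":
        # Refuse before any URL exists, so guessing the key gains nothing.
        raise ContentNotAvailable(f"content is not approved: {record.get('status')!r}")
    return sign_url(record["bucket"], record["key"], expires_in)


if __name__ == "__main__":
    def fake_signer(bucket: str, key: str, ttl: int) -> str:
        # Placeholder only; a real signer would call the storage SDK.
        return f"https://storage.example.com/{bucket}/{key}?expires={ttl}&sig=placeholder"

    approved = {"bucket": "user-content", "key": "reviews/r-1/photo.webp", "status": "approved"}
    print(access_url_for(approved, fake_signer))
```

The gate only works if the bucket itself stays private, so that a signed URL is the only way to read an object.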
IBEE for Indian Media Platforms
For Indian e-commerce platforms, marketplaces, and content companies where product images, user photos, and generated assets are a high-egress workload, IBEE's combination of Rs.2/GB egress, sub-5ms latency for Indian users, and India-sovereign storage provides the cost and compliance foundation for a media asset architecture that scales with the business.
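For a rough sense of scale, the sketch below turns the quoted Rs.2/GB rate into a monthly estimate. The traffic figures are made-up inputs, not measurements. The calculation assumes decimal gigabytes and ignores CDN caching, request fees, and taxes.

```python
EGRESS_RS_PER_GB = 2.0          # rate quoted in the text above
BYTES_PER_GB = 1_000_000_000    # assumption: decimal gigabytes


def monthly_egress_cost_rs(image_views: int, avg_bytes_per_view: int) -> float:
    """Estimate monthly egress cost in rupees for a given image traffic level."""
    total_gb = image_views * avg_bytes_per_view / BYTES_PER_GB
    return total_gb * EGRESS_RS_PER_GB


if __name__ == "__main__":
    # Hypothetical month: 10 million image views.
    small = monthly_egress_cost_rs(10_000_000, 60_000)    # ~60 KB variant per view
    large = monthly_egress_cost_rs(10_000_000, 300_000)   # ~300 KB variant per view
    print(f"small variants: Rs.{small:,.0f}, large variants: Rs.{large:,.0f}")
```

Comparing the two inputs shows how serving the right variant size feeds directly into the egress bill.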
