For engineering leads and trust and safety teams at Indian social platforms, marketplaces, edtech platforms, and any product that accepts user-uploaded media.
Why Moderation Architecture Starts at the Storage Layer
Content moderation is often thought of as a classification problem: does this image contain prohibited content? But before classification can happen, the uploaded content must be stored somewhere that: prevents it from being served to other users before review, preserves it for legal hold even after deletion, supports the complete appeal and reinstatement workflow, and provides an audit trail for platform compliance obligations.
The storage architecture is the foundation of the moderation pipeline. Getting it wrong means either content is exposed before moderation (which creates legal and reputational risk) or approved content is lost during status transitions (which creates product failures). This guide covers the correct design.
The Three-Bucket Moderation Architecture
Bucket 1 — Upload Staging (Private)
Every user upload lands here first. This bucket is entirely private — no content in it is ever served to any other user. The upload service has write access. The moderation pipeline has read access. No other service has any access.
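A minimal sketch of that access split, assuming boto3 against an S3-compatible endpoint. The endpoint URL, bucket name (ugc-staging), account ID, and principal names are all placeholders, and how principals are expressed varies between S3-compatible providers, so treat the policy shape as illustrative rather than definitive.

```python
import json

import boto3

# Hypothetical endpoint and names; substitute your own deployment values.
s3 = boto3.client("s3", endpoint_url="https://s3.example-sovereign-region.in")

staging_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # The upload service can only write new objects.
            "Sid": "UploadServiceWriteOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::000000000000:user/upload-service"},
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::ugc-staging/*",
        },
        {
            # The moderation pipeline can read and update tags, nothing more.
            "Sid": "ModerationReadAndTag",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::000000000000:user/moderation-worker"},
            "Action": ["s3:GetObject", "s3:GetObjectTagging", "s3:PutObjectTagging"],
            "Resource": "arn:aws:s3:::ugc-staging/*",
        },
        # No statement grants public or CDN access: everything else is denied
        # by default, so staging content can never be served to users.
    ],
}

s3.put_bucket_policy(Bucket="ugc-staging", Policy=json.dumps(staging_policy))
```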
Content remains in the staging bucket until the moderation pipeline makes a final decision. For automated moderation, this may be seconds. For content that requires human review, it may be hours or days.
Object tagging tracks moderation status within this bucket. On upload, the object is tagged moderation-status: pending. The moderation pipeline updates the tag to moderation-status: approved, moderation-status: rejected, or moderation-status: human-review-required.
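A sketch of the tagging call, again with a hypothetical endpoint and bucket name. One caveat worth encoding: PutObjectTagging replaces the object's entire tag set, so a production version should merge in any other tags the object carries.

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example-sovereign-region.in")  # hypothetical

VALID_STATUSES = {"pending", "approved", "rejected", "human-review-required"}

def set_moderation_status(bucket: str, key: str, status: str) -> None:
    # PutObjectTagging overwrites the full tag set; merge other tags in production.
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown moderation status: {status}")
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": [{"Key": "moderation-status", "Value": status}]},
    )

# On upload completion, every object starts as pending:
set_moderation_status("ugc-staging", "uploads/user-123/photo.jpg", "pending")
```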
Bucket 2 — Approved Content (Private Origin, CDN-Served)
Content that has passed moderation — either automatically or after human review — is copied to this bucket. The approved bucket is the CDN origin: presigned URLs or public CDN paths serve content to users from here.
Only content in the approved bucket is ever served to users. The copy from staging to approved is an atomic operation: the content is either in approved (and servable) or it is not. There is no intermediate state where partially-approved content could be served.
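The promotion itself can be a single server-side copy. A sketch with hypothetical bucket names; note that on AWS-compatible APIs a single CopyObject call handles objects up to 5 GB, beyond which a multipart copy is needed.

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example-sovereign-region.in")  # hypothetical

def promote_to_approved(key: str) -> None:
    # Server-side copy: the object becomes visible in the approved bucket only
    # once the copy has fully succeeded, so there is no half-served state.
    # (CopyObject covers objects up to 5 GB; larger files need multipart copy.)
    s3.copy_object(
        Bucket="ugc-approved",
        Key=key,
        CopySource={"Bucket": "ugc-staging", "Key": key},
        TaggingDirective="REPLACE",
        Tagging="moderation-status=approved",
    )
```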
Bucket 3 — Evidence Archive (Private, Legal Hold)
All content that was moderated — whether approved or rejected — is preserved in the evidence archive. This includes rejected content that was never served to any user. The evidence archive supports: legal holds for content involved in law enforcement requests, appeal processing for rejected content that a user disputes, and platform compliance audits demonstrating that prohibited content was moderated correctly.
The evidence archive has no lifecycle expiry by default. Deletion requires explicit action with an audit log entry. For platforms subject to IT Rules 2021 (intermediary guidelines) or sector-specific compliance requirements, the evidence archive is the technical implementation of the record-keeping obligation.
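Explicit deletion with an audit trail might look like the sketch below. The audit sink is a placeholder (here it just prints JSON); in production it would be an append-only store. The legal-hold check anticipates the tagging scheme described under Legal Hold below.

```python
import datetime
import json

import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example-sovereign-region.in")  # hypothetical

def write_audit_log(entry: dict) -> None:
    # Placeholder: route to an append-only audit store in production.
    print(json.dumps(entry))

def delete_evidence(key: str, actor: str, reason: str) -> None:
    """Explicit, audited deletion; the bucket itself has no expiry rule."""
    tags = s3.get_object_tagging(Bucket="ugc-evidence", Key=key)["TagSet"]
    if any(t["Key"] == "legal-hold" and t["Value"] == "true" for t in tags):
        raise PermissionError(f"{key} is under legal hold; release it first")
    write_audit_log({
        "event": "evidence_deleted",
        "key": key,
        "actor": actor,
        "reason": reason,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    s3.delete_object(Bucket="ugc-evidence", Key=key)
```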
The Upload and Moderation Flow
Step 1 — Client-side validation before upload
Before accepting an upload, validate file type and size on the client. Return an error for obviously invalid uploads (non-image MIME type for a photo upload field, file larger than 50 MB for an avatar upload) without accepting the file at all. This reduces processing load and prevents common abuse patterns. Client-side checks are a convenience, not a security boundary: they can be bypassed, so the same limits must also be enforced server-side, which the presigned POST conditions in Step 2 handle.
Step 2 — Presigned POST to staging bucket
The API generates a presigned POST URL for the staging bucket — as described in the media asset architecture guide — and returns it to the client. The client uploads directly to the staging bucket. The API records the pending upload in the database with status: pending_moderation.
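A sketch of that slot-creation step, assuming boto3. The content-length-range and Content-Type conditions re-enforce the Step 1 limits server-side, so a client that bypasses the in-browser checks still cannot upload an oversized or mistyped file. The database call is a hypothetical placeholder.

```python
import uuid

import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example-sovereign-region.in")  # hypothetical

def create_upload_slot(user_id: str, content_type: str) -> dict:
    key = f"uploads/{user_id}/{uuid.uuid4()}"
    post = s3.generate_presigned_post(
        Bucket="ugc-staging",
        Key=key,
        Fields={"Content-Type": content_type},
        Conditions=[
            {"Content-Type": content_type},                 # must match exactly
            ["content-length-range", 1, 50 * 1024 * 1024],  # 50 MB server-side cap
        ],
        ExpiresIn=300,  # the client has five minutes to start the upload
    )
    # Hypothetical persistence call: record the pending upload before replying.
    # db.insert_upload(key=key, user_id=user_id, status="pending_moderation")
    return post  # {"url": ..., "fields": {...}} for the client's POST form
```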
Step 3 — Event-driven moderation trigger
The staging bucket emits an object creation event when the upload completes. A moderation worker consumes the event and begins the automated moderation pipeline:
File format validation — confirm the file is valid and not a disguised executable.
Virus scan — pass the file through ClamAV or equivalent.
Automated content classification — run the image or video through your content moderation model (self-hosted or a third-party classification API).
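A minimal worker sketch tying those checks together. The format check, virus scan, and classifier here are stubs (real implementations would inspect the full container format, stream the file through clamd, and call your model or a third-party API); route_decision is sketched under Step 4.

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example-sovereign-region.in")  # hypothetical

def is_valid_media_file(body: bytes) -> bool:
    # Stub: check magic bytes against the declared type (JPEG/PNG shown).
    return body[:3] == b"\xff\xd8\xff" or body[:8] == b"\x89PNG\r\n\x1a\n"

def passes_virus_scan(body: bytes) -> bool:
    # Stub: stream through ClamAV (e.g. clamd's INSTREAM command) in production.
    return True

def classify_content(body: bytes) -> dict:
    # Stub: call your self-hosted model or third-party classification API.
    return {"label": "clean", "confidence": 0.99}

def handle_object_created(event: dict) -> None:
    """Process one object-created notification from the staging bucket.
    The delivery mechanism (queue, webhook) is deployment-specific."""
    body = s3.get_object(Bucket=event["bucket"], Key=event["key"])["Body"].read()
    if not is_valid_media_file(body) or not passes_virus_scan(body):
        route_decision(event["key"], {"label": "prohibited", "confidence": 1.0})
        return
    route_decision(event["key"], classify_content(body))  # see Step 4
```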
Step 4 — Automated decision
If the automated pipeline returns a high-confidence clean result, copy the content to the approved bucket and update the database record to status: approved. Generate any derivative assets (thumbnails, variants) from the staging copy and write them to the approved bucket.
If the automated pipeline returns a high-confidence prohibited result, update the database record to status: rejected, copy the file to the evidence archive, and notify the user with the appropriate rejection reason.
If the automated pipeline returns a low-confidence result or the content falls into a category requiring human review, update the database record to status: human_review and add the item to the human review queue.
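The three branches reduce to one routing function. A sketch that reuses promote_to_approved from the Bucket 2 section; the 0.95 thresholds are illustrative, the database update is stubbed, and the evidence copy mirrors the staging-to-approved copy.

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example-sovereign-region.in")  # hypothetical

def copy_to_evidence_archive(key: str) -> None:
    # Rejected content is preserved in Bucket 3 for appeals and legal hold.
    s3.copy_object(
        Bucket="ugc-evidence",
        Key=key,
        CopySource={"Bucket": "ugc-staging", "Key": key},
    )

def update_db_status(key: str, status: str) -> None:
    # Stub: update the upload row in your database.
    print(f"{key} -> {status}")

def route_decision(key: str, verdict: dict) -> None:
    """Map a classifier verdict onto the three Step 4 outcomes.
    The confidence thresholds are illustrative; tune them to your model."""
    label, confidence = verdict["label"], verdict["confidence"]
    if label == "clean" and confidence >= 0.95:
        promote_to_approved(key)            # server-side copy, see Bucket 2
        update_db_status(key, "approved")
        # ...then generate thumbnails/variants from the staging copy.
    elif label != "clean" and confidence >= 0.95:
        copy_to_evidence_archive(key)
        update_db_status(key, "rejected")
        # ...then notify the user with the rejection reason.
    else:
        update_db_status(key, "human_review")
        # ...then add the item to the Step 5 review queue.
```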
Step 5 — Human review queue
The human review queue is a database-backed list of items requiring reviewer attention. Reviewers access a review interface that fetches presigned URLs for items in the staging bucket (so reviewers can view content without it being publicly accessible), displays the automated classification results and confidence scores, and provides approve/reject/escalate actions.
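Fetching review content through a short-lived presigned GET might look like this (hypothetical endpoint and bucket; the expiry window is arbitrary):

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example-sovereign-region.in")  # hypothetical

def review_item_url(key: str) -> str:
    # Short-lived presigned GET: the reviewer's browser fetches the file
    # directly from the private staging bucket, with no public exposure.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "ugc-staging", "Key": key},
        ExpiresIn=600,  # ten minutes is enough for one review session
    )
```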
Human decisions update the database record and trigger the same copy-to-approved or copy-to-archive flow as automated decisions.
Step 6 — Appeals workflow
When a user appeals a rejection, the appeal record is linked to the evidence archive entry for that content. The appeal reviewer retrieves the original file from the evidence archive (not from staging, which may have been cleaned up), reviews it with any additional context the user provided, and makes a final decision. If the appeal is upheld, the content is copied from the archive to the approved bucket and made accessible.
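Reinstatement is another server-side copy, this time sourced from the archive. A sketch with the same hypothetical names:

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example-sovereign-region.in")  # hypothetical

def reinstate_after_appeal(key: str) -> None:
    # Copy from the evidence archive, not staging, which may be cleaned up.
    s3.copy_object(
        Bucket="ugc-approved",
        Key=key,
        CopySource={"Bucket": "ugc-evidence", "Key": key},
        TaggingDirective="REPLACE",
        Tagging="moderation-status=approved",
    )
    # Then update the database record so the content is served again.
```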
Legal Hold and Evidence Preservation
Platforms that receive law enforcement or court requests for user-generated content need to produce the requested content reliably. The evidence archive is designed for this.
When a legal hold is placed on a specific user's content, add a legal-hold: true tag to all objects in the evidence archive associated with that user. Lifecycle policies must be configured to skip expiry for objects with a legal hold tag. The hold remains until explicitly released.
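One wrinkle: S3-style lifecycle filters are inclusive (a rule applies to objects that carry a given tag), so a common pattern is to tag every archived object legal-hold=false at write time, scope any expiry rule to that tag, and flip the tag to true when a hold is placed. A sketch, with an illustrative 365-day retention rule for the incident case described next:

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example-sovereign-region.in")  # hypothetical

def set_legal_hold(key: str, held: bool) -> None:
    # Flipping the tag to true removes the object from the expiry rule's
    # filter below. (PutObjectTagging replaces the full tag set; merge any
    # other tags the object carries in production.)
    s3.put_object_tagging(
        Bucket="ugc-evidence",
        Key=key,
        Tagging={"TagSet": [
            {"Key": "legal-hold", "Value": "true" if held else "false"},
        ]},
    )

# Retention applies only to unheld incident evidence; held objects never match.
s3.put_bucket_lifecycle_configuration(
    Bucket="ugc-evidence",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-unheld-incident-evidence",
            "Status": "Enabled",
            "Filter": {"Tag": {"Key": "legal-hold", "Value": "false"}},
            "Expiration": {"Days": 365},  # illustrative retention period
        }]
    },
)
```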
For content involved in reported incidents — harassment, threats, CSAM — the evidence is typically preserved for the duration of any investigation plus a defined retention period. This retention is managed through the same object tagging and lifecycle policy mechanism.
Under IT Rules 2021, significant social media intermediaries are required to retain certain categories of content for defined periods even after it has been removed from the platform. The evidence archive with legal hold capability is the technical implementation of this requirement.
IBEE for UGC Moderation Infrastructure
The moderation architecture requires private bucket access for staging and evidence, CDN delivery for approved content, and India-sovereign storage for evidence records. IBEE's S3-compatible API, presigned URLs, and object tagging cover all three. For Indian platforms with obligations under IT Rules 2021 or sector-specific regulations, India-sovereign storage for the evidence archive means legal hold requests and law enforcement responses involve Indian entities under Indian legal process — not cross-border data requests to US-incorporated cloud providers.
