For engineering leads and DevOps engineers responsible for cloud storage architecture and cost management at Indian businesses.
What Rightsizing Means for Storage
Rightsizing is a term most often applied to compute — shrinking an oversized VM to a size that matches actual CPU and memory utilisation. For storage, rightsizing means something different: ensuring that the data you are storing is on the storage tier and at the retention level that matches its actual access pattern and business value.
Unlike compute, object storage has no instance size you can shrink to save cost: you are charged per byte stored, whether the data is read every day or never. The rightsizing opportunity is therefore twofold: stop paying standard storage rates for data that should be archived, and stop paying anything at all for data that should have been deleted.
The Four Access Pattern Categories
Data in a production system typically falls into four categories. Most storage architectures treat all data as category one, paying premium rates for data that belongs in categories two, three, or four.
Category 1 — Hot data: accessed frequently, often daily. User-generated content actively being used by users, application assets served in real-time, data feeds consumed by running processes. This data belongs in standard active storage.
Category 2 — Warm data: accessed occasionally, a few times per month. Reports generated last quarter, user uploads from 90 days ago, log data from the last 60 days. This data is worth retaining but is not accessed frequently enough to justify active storage rates on every byte.
Category 3 — Cold data: accessed rarely, perhaps once or twice per year. Compliance records, historical archives, old backups, regulatory filing storage. This data must be retained but almost never needs to be retrieved quickly.
Category 4 — Dead data: no business purpose, but nobody deleted it. Orphaned uploads from deprecated features, test data from old experiments, duplicate copies created by misconfigured pipelines. This data should not exist.
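The four categories can be expressed as a small classifier. This is a minimal sketch: the function name and the numeric thresholds (100 and 12 reads per year) are illustrative assumptions, not figures from this guide, and should be tuned to your own workload.

```python
from enum import Enum


class AccessClass(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"
    DEAD = "dead"


# Assumed thresholds, in reads per year. They are placeholders for illustration.
HOT_MIN_READS_PER_YEAR = 100   # well above "a few times per month"
WARM_MIN_READS_PER_YEAR = 12   # roughly monthly or more often


def classify(reads_per_year: int, has_business_purpose: bool = True) -> AccessClass:
    """Map an observed read rate to one of the four categories."""
    if not has_business_purpose:
        # Dead data is defined by purpose, not by how often it is read.
        return AccessClass.DEAD
    if reads_per_year >= HOT_MIN_READS_PER_YEAR:
        return AccessClass.HOT
    if reads_per_year >= WARM_MIN_READS_PER_YEAR:
        return AccessClass.WARM
    return AccessClass.COLD
```

Note that the dead category cannot be inferred from read counts alone; it needs a human decision about business purpose, which is why it is a separate flag here.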
Step 1 — Classify Your Existing Data
Before rightsizing, you need to know what you have. Run a storage inventory across your buckets, recording each object's key, size, and LastModified timestamp.
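One way to run such an inventory is with boto3's ListObjectsV2 paginator. The sketch below assumes boto3 is installed and credentials are configured; the `endpoint_url` parameter is an assumption for pointing the client at an S3-compatible provider, and the helper names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional


class ObjectRecord(NamedTuple):
    key: str
    size: int
    last_modified: datetime


def iter_objects(bucket: str, endpoint_url: Optional[str] = None) -> Iterator[ObjectRecord]:
    """Yield one record per object in the bucket, following pagination."""
    import boto3  # imported lazily so summarise() works without boto3 installed

    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent from the response when a page has no objects.
        for obj in page.get("Contents", []):
            yield ObjectRecord(obj["Key"], obj["Size"], obj["LastModified"])


def summarise(records: Iterable[ObjectRecord]) -> dict:
    """Return the object count and total bytes for a set of records."""
    count = 0
    total_bytes = 0
    for record in records:
        count += 1
        total_bytes += record.size
    return {"objects": count, "bytes": total_bytes}
```

For example, `summarise(iter_objects("company-data"))` would return the object count and byte total for a bucket named `company-data`. For very large buckets, a provider-side inventory report may be cheaper than listing every object.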
Look at the distribution of LastModified timestamps. Objects that have not been modified in over a year are candidates for warm or cold classification. Objects that have not been modified in over three years with no known access requirement are candidates for deletion.
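The age thresholds above can be applied to an inventory with a few lines of Python. This is a sketch that assumes each inventory entry is a `(key, size, last_modified)` tuple with a timezone-aware timestamp; the bucket labels are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

ONE_YEAR = timedelta(days=365)
THREE_YEARS = timedelta(days=3 * 365)


def age_bucket(last_modified: datetime, now: datetime) -> str:
    """Label an object by how long ago it was last modified."""
    age = now - last_modified
    if age > THREE_YEARS:
        # Only a candidate: deletion still needs a check for access requirements.
        return "deletion-candidate"
    if age > ONE_YEAR:
        return "warm-or-cold-candidate"
    return "recent"


def bytes_by_age(objects, now=None) -> dict:
    """Sum object sizes per age bucket.

    objects: iterable of (key, size, last_modified) tuples.
    """
    now = now or datetime.now(timezone.utc)
    totals = {"recent": 0, "warm-or-cold-candidate": 0, "deletion-candidate": 0}
    for _key, size, last_modified in objects:
        totals[age_bucket(last_modified, now)] += size
    return totals
```

Summing bytes rather than counting objects keeps the focus on cost, since a handful of large old objects can matter more than thousands of small ones.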
For access patterns rather than modification dates, enable S3 server access logging and analyse request frequency per object key prefix. The access log tells you which prefixes are generating read requests and which have not been accessed in months.
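As a rough illustration, read requests can be counted per prefix directly from the access log lines. The sketch below assumes the space-delimited S3 server access log layout in which the operation is the eighth field and the object key the ninth (the bracketed timestamp occupies two fields), and that no earlier field contains a space. Verify this against your provider's log format before relying on it.

```python
from collections import Counter
from typing import Iterable


def read_counts_by_prefix(log_lines: Iterable[str], depth: int = 1) -> Counter:
    """Count object GET requests per key prefix.

    depth: number of leading path segments that make up the prefix.
    """
    counts: Counter = Counter()
    for line in log_lines:
        fields = line.split(" ")
        if len(fields) < 9:
            continue  # skip truncated or malformed lines
        operation, key = fields[7], fields[8]
        if operation != "REST.GET.OBJECT":
            continue  # only object reads indicate access
        # A key with fewer segments than depth is used whole as its own prefix.
        prefix = "/".join(key.split("/")[:depth])
        counts[prefix] += 1
    return counts
```

Prefixes that exist in the inventory but never appear in the result had no object reads during the logged period, which makes them candidates for the cold or dead categories.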
Step 2 — Identify and Delete Dead Data
Dead data is the highest-value rightsizing opportunity because deleting it reduces the cost of that data to zero, rather than merely reducing it to archive rates.
Common sources of dead data in production buckets:
Orphaned multipart uploads — use a lifecycle rule AbortIncompleteMultipartUpload: DaysAfterInitiation: 7 to clean these up automatically.
Duplicate processed outputs — data pipelines that ran with bugs sometimes produce multiple versions of the same processed file. Check for objects with similar key patterns and verify whether duplicates are intentional.
Development and test data in production buckets — test uploads from load tests, development fixtures, sample files from QA. These should never have been written to the production bucket, but often are.
Feature tombstones — files associated with features that have been removed from the product. The feature is gone but the storage remains.
For each candidate, verify with the owning team that deletion is safe. Then delete.
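The multipart cleanup rule mentioned above can be applied with boto3. This is a sketch under the assumption that your provider supports the S3 lifecycle API; the rule ID and function names are hypothetical, and `endpoint_url` is an assumed parameter for S3-compatible services.

```python
from typing import Optional


def abort_multipart_rule(days: int = 7) -> dict:
    """Build a lifecycle rule that aborts incomplete multipart uploads."""
    return {
        "ID": "abort-incomplete-multipart-uploads",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix applies the rule to the whole bucket
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
    }


def apply_rule(bucket: str, endpoint_url: Optional[str] = None) -> None:
    """Install the rule on a bucket.

    Caution: this call replaces the bucket's entire lifecycle configuration,
    so merge in any existing rules before using it on a real bucket.
    """
    import boto3  # imported lazily so the rule builder works without boto3

    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [abort_multipart_rule()]},
    )
```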
Step 3 — Implement Data Classification via Naming Conventions
Rightsizing at scale requires knowing the access pattern of data before it is written. This is achieved through bucket naming and key prefix conventions that encode the data class at write time.
Rather than writing all data to a single company-data bucket, structure storage by access class:
company-data-active/ — standard storage, no lifecycle transitions, short or no expiry for transient data.
company-data-warm/ — standard storage with lifecycle transitions to archive tiers after 90 days on providers that support them.
company-data-archive/ — data written with the intent of long-term retention, accessed rarely.
company-data-tmp/ — temporary processing data with a lifecycle expiry of 7–30 days.
When a pipeline writes data, it writes to the bucket that matches the data's intended lifecycle — not to a generic dump bucket.
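One way to enforce this convention is to make pipelines ask for a bucket by data class instead of hard-coding a name. The sketch below uses the bucket names from the list above; the class labels, function names, and the 7-day default are illustrative assumptions.

```python
# Data class -> bucket, following the naming convention described above.
BUCKETS = {
    "active": "company-data-active",
    "warm": "company-data-warm",
    "archive": "company-data-archive",
    "tmp": "company-data-tmp",
}


def bucket_for(data_class: str) -> str:
    """Return the bucket for a data class, rejecting unknown classes."""
    try:
        return BUCKETS[data_class]
    except KeyError:
        # Failing loudly stops data from drifting into a generic dump bucket.
        raise ValueError(f"unknown data class: {data_class!r}") from None


def tmp_expiry_rule(days: int = 7) -> dict:
    """Build an S3 lifecycle rule that expires objects after `days` days."""
    return {
        "ID": "expire-temporary-data",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # whole bucket
        "Expiration": {"Days": days},
    }
```

Because an unknown class raises an error, every new pipeline has to declare its intended lifecycle before it can write anything.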
Step 4 — Measure Storage Growth by Category
After classifying your data, instrument the storage layer to track growth by category. Tracking the bytes in each bucket week by week, for example with the CloudWatch BucketSizeBytes metric on AWS or an equivalent on IBEE, gives you visibility into which categories are growing and at what rate.
Sustained growth in the tmp bucket when it has a 7-day lifecycle policy is a signal that a pipeline is writing more than expected. Growth in archive at a steady, predictable rate is normal. Rapid growth in active for a category of data that should not be growing quickly warrants investigation.
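These checks can be automated once weekly byte totals are available. The sketch below assumes you already collect one sample per bucket per week; the function names and the per-bucket growth limits are hypothetical and should be set from your own baselines.

```python
from typing import Dict, List


def latest_growth(samples: List[int]) -> float:
    """Fractional growth between the last two weekly samples (0.25 means +25%)."""
    if len(samples) < 2 or samples[-2] == 0:
        return 0.0  # not enough history to compute a rate
    return (samples[-1] - samples[-2]) / samples[-2]


def flag_buckets(history: Dict[str, List[int]], limits: Dict[str, float]) -> List[str]:
    """Return buckets whose latest weekly growth exceeds their limit.

    history: bucket name -> weekly byte totals, oldest first.
    limits:  bucket name -> maximum acceptable fractional weekly growth.
    """
    flagged = []
    for bucket, samples in history.items():
        # Buckets without a configured limit are never flagged.
        if latest_growth(samples) > limits.get(bucket, float("inf")):
            flagged.append(bucket)
    return flagged
```

A tight limit on the tmp bucket and a looser one on archive reflects the expectation that temporary data should plateau while archives grow steadily.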
Rightsizing and IBEE's Flat Pricing
One of the operational advantages of IBEE's pricing model is that it does not have retrieval-penalised archive tiers. On AWS, moving data to Glacier reduces the storage rate but introduces retrieval fees and minimum storage duration charges, which complicates rightsizing: you must predict access frequency accurately before archiving, because a misclassified object that is read more often than expected can incur retrieval fees that exceed the storage savings.
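The trade-off can be made concrete with a break-even calculation. The sketch below is a simplified model: it ignores minimum storage duration charges and per-request fees, and all rates are parameters you must supply from your provider's price list. No real prices are assumed here.

```python
def net_monthly_saving_per_gb(
    standard_rate: float,
    archive_rate: float,
    retrieval_fee_per_gb: float,
    full_reads_per_month: float,
) -> float:
    """Net saving per GB-month from archiving; negative means archiving costs more.

    All monetary inputs must be in the same currency.
    full_reads_per_month: how many times per month the data is read in full.
    """
    storage_saving = standard_rate - archive_rate
    retrieval_cost = retrieval_fee_per_gb * full_reads_per_month
    return storage_saving - retrieval_cost


def break_even_reads_per_month(
    standard_rate: float, archive_rate: float, retrieval_fee_per_gb: float
) -> float:
    """Read frequency at which retrieval fees cancel out the storage saving."""
    return (standard_rate - archive_rate) / retrieval_fee_per_gb
```

With placeholder rates of 2.0 (standard), 0.5 (archive), and 3.0 (retrieval per GB), the break-even point is 0.5 full reads per month, so data read in full once a month would cost more in the archive tier than in standard storage.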
On IBEE, all data is in the same flat-priced storage tier at Rs.1.50/GB-month with no retrieval surcharges. The rightsizing exercise on IBEE is simpler: classify what should be retained versus deleted, and delete what does not need to be kept. The cost model does not punish you for accessing archived data when you need it.
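Under flat pricing, the value of a deletion is simple arithmetic. The sketch below uses the Rs.1.50/GB-month rate quoted above and assumes the rate stays constant over the year; the function names are hypothetical.

```python
IBEE_RATE_RS_PER_GB_MONTH = 1.50  # flat rate quoted in this guide


def monthly_cost_rs(stored_gb: float) -> float:
    """Monthly storage cost in rupees at the flat rate."""
    return stored_gb * IBEE_RATE_RS_PER_GB_MONTH


def annual_saving_from_deletion_rs(deleted_gb: float) -> float:
    """Rupees saved over 12 months by deleting `deleted_gb` of data."""
    return monthly_cost_rs(deleted_gb) * 12
```

For example, deleting 2,000 GB of dead data saves Rs.3,000 per month, or Rs.36,000 over a year.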
This means rightsizing on IBEE is a deletion discipline rather than a tier-management discipline, which is both easier to reason about and safer to implement.

