Automating AWS Cost Reduction: A Startup's Boto3-Powered Resource Cleanup Case Study

Key Takeaways
- Unchecked Cloud Waste: "Temporary" development resources like idle EC2 instances and orphaned EBS volumes often become a permanent, silent drain on your budget, with costs sometimes growing faster than revenue.
- Automate with Boto3 & Lambda: Manual cleanups are unsustainable. Using Python's Boto3 library and AWS Lambda, you can create a serverless "janitor" to automatically find, tag, and safely delete unused resources on a schedule.
- Beyond Savings: This approach not only slashed our dev environment costs by over 30% but also improved our security posture and fostered a culture of cost-awareness and infrastructure hygiene across the team.
I once spoke to a founder who said their startup’s AWS bill grew faster than their revenue. In one month, their bill for "temporary" development resources was enough to hire a junior engineer for a full year. That's not just a budget overrun; that's a silent killer for a bootstrapped company.
It's a story I hear all too often. It’s the reason I became obsessed with automating our way out of cloud cost chaos.
The Problem: How 'Temporary' Resources Became a Permanent Drain
When you're building fast, you're spinning up resources left and right. An EC2 instance for a quick test, an EBS volume for a database experiment, a NAT gateway for a new VPC configuration. The mantra is "move fast and break things," not "move fast and meticulously clean up after yourself."
The Shock of the Month-End Bill
It started small, with a few extra dollars here and there. But then came the bill—the one that makes your stomach drop. Our AWS costs had spiked 40% month-over-month with no corresponding increase in customer usage.
It was all coming from our dev and staging environments. The "temporary" infrastructure had become a permanent, and expensive, part of our architecture.
Identifying the Culprits: Orphaned EBS Volumes, Idle EC2s, and Unattached EIPs
We dug in using AWS Cost Explorer and found the culprits. They were the ghosts of projects past:
- Orphaned EBS Volumes: Dozens of them, unattached to any EC2 instance, silently racking up storage costs.
- Idle EC2 Instances: t2.micro instances spun up for a hotfix weeks ago and then forgotten, still running.
- Unattached Elastic IPs: Free when attached, but AWS charges you for them when they're just sitting in your account.
Why Manual Cleanup Wasn't a Sustainable Solution
Our first reaction was a manual cleanup spree. We spent an afternoon clicking through the AWS console, cross-referencing IDs, and cautiously hitting "terminate." We saved a few hundred dollars, but we knew it wasn't a real solution.
It was tedious, prone to human error, and didn't solve the underlying problem. Our team was still moving too fast to remember to clean up, so we needed a janitor who never slept.
Our Weapon of Choice: Automating Cleanup with Python and Boto3
If a human can click through the console, a script can do it better, faster, and more reliably. We turned to Boto3, the AWS SDK for Python.
Why Boto3? The Power of Scripting Your Infrastructure
Boto3 is one of the most powerful tools in a cloud engineer's arsenal. It turns the entire AWS API into a set of Python functions. Instead of manually searching, I can write a few lines of code to get a definitive list in seconds.
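For a feel of what that looks like, here is a minimal sketch that lists unattached Elastic IPs, one of our biggest culprits. The region is a placeholder, and this is an illustration rather than our exact script:

```python
import boto3

# A minimal sketch: list every Elastic IP that isn't associated with anything.
# The region is a placeholder; point it at wherever your dev resources live.
ec2 = boto3.client("ec2", region_name="us-east-1")

addresses = ec2.describe_addresses()["Addresses"]
# Unassociated addresses have no AssociationId, so they're costing money for nothing.
unattached = [a for a in addresses if "AssociationId" not in a]

for addr in unattached:
    print(addr["AllocationId"], addr["PublicIp"])
```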
Step 1: Setting Up a Secure IAM Role for Our Cleanup Script
First things first: security. We created a specific IAM role with the minimum required permissions like ec2:DescribeVolumes and ec2:DeleteVolume. This principle of least privilege is non-negotiable.
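Here is a sketch of what that scoped policy might look like when created via Boto3. The policy name is illustrative, and the action list should match whatever your own script actually touches:

```python
import json
import boto3

# A sketch of a least-privilege policy for the cleanup role.
# The policy name is hypothetical, and the action list is illustrative:
# trim it to exactly what your script needs.
cleanup_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:DescribeInstances",
                "ec2:DescribeAddresses",
                "ec2:CreateTags",
                "ec2:DeleteVolume",
                "cloudwatch:GetMetricStatistics",
            ],
            "Resource": "*",
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="dev-cleanup-janitor",  # hypothetical name
    PolicyDocument=json.dumps(cleanup_policy),
)
```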
Step 2: The Logic - How to Define 'Unused' Resources Safely
This was the most critical part. We started with a very conservative definition of "waste" to avoid accidentally deleting something important.
- An EBS Volume is 'unused' if: It is in the available state (not attached) AND does not have a specific tag, like "backup": "true".
- An EC2 Instance is 'idle' if: Its CPU utilization has been below 2% for the last 7 days AND it's in a non-production environment. (A sketch of that CloudWatch check follows this list.)
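Here is a rough sketch of how that idle check can be implemented with CloudWatch. The function name and hourly granularity are illustrative choices; the 2% threshold and 7-day window are the ones described above:

```python
from datetime import datetime, timedelta, timezone
import boto3

# Region is a placeholder; use the one your instances run in.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

def is_idle(instance_id, threshold=2.0, days=7):
    """Return True if average CPU stayed below `threshold`% for the last `days` days."""
    end = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Average"],
    )
    datapoints = stats["Datapoints"]
    if not datapoints:
        return False  # no data usually means the instance was stopped, not idle
    return max(dp["Average"] for dp in datapoints) < threshold
```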
Step 3: The Code - A Walkthrough of Our Boto3 Script for Finding and Tagging Waste
The core of our script was surprisingly simple. For EBS volumes, it followed a basic loop: get all volumes, check if a volume's state is available, and if so, check for a "do-not-delete" tag.
Initially, the script just tagged resources for deletion with a "cleanup-candidate" tag. This gave us a chance to review its choices before giving it the power to actually delete anything.
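A condensed sketch of that tag-first pass might look like the following. The protected tag keys are examples of our conventions, not a requirement, and the cleanup tag matches the grace-period scheme described later:

```python
from datetime import date
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder

# Tag keys we treat as "hands off" (illustrative; adapt to your own conventions).
PROTECTED_TAGS = {"do-not-delete", "backup"}

def tag_cleanup_candidates():
    """Find available (unattached) volumes and mark them as cleanup candidates."""
    paginator = ec2.get_paginator("describe_volumes")
    for page in paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}]):
        for vol in page["Volumes"]:
            tags = {t["Key"]: t["Value"] for t in vol.get("Tags", [])}
            if PROTECTED_TAGS & tags.keys():
                continue  # someone explicitly asked us to keep this one
            # Tag instead of delete, so a human can review the candidates first.
            ec2.create_tags(
                Resources=[vol["VolumeId"]],
                Tags=[{"Key": "cleanup-candidate-date", "Value": date.today().isoformat()}],
            )
            print(f"Tagged {vol['VolumeId']} as a cleanup candidate")
```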
Putting it into Production: From Cron Job to Serverless Lambda
A script on a laptop is an experiment. A Lambda function running on a schedule is a production tool.
Scheduling the Automation for Daily Sweeps
We packaged our Python script into a Lambda function and used Amazon EventBridge to trigger it every night at 2 AM. This serverless approach meant we weren't paying for an idle server to run a cron job. It was the epitome of cost-effective automation.
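Wiring up the schedule is a handful of API calls. Here is a sketch using Boto3; the rule name, function name, and ARN are all placeholders:

```python
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

# Nightly trigger at 02:00 UTC. Rule name is hypothetical.
rule = events.put_rule(
    Name="nightly-aws-janitor",
    ScheduleExpression="cron(0 2 * * ? *)",
    State="ENABLED",
)

# Allow EventBridge to invoke the function, then attach it as the rule's target.
# Function name and ARN below are placeholders for your own deployment.
lambda_client.add_permission(
    FunctionName="aws-janitor",
    StatementId="allow-eventbridge-nightly",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)
events.put_targets(
    Rule="nightly-aws-janitor",
    Targets=[{"Id": "janitor-lambda",
              "Arn": "arn:aws:lambda:us-east-1:123456789012:function:aws-janitor"}],
)
```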
Implementing a 'Grace Period' with Tags to Prevent Accidents
We built in a safety net. The first time the script finds an unused resource, it doesn't delete it. Instead, it applies a tag: cleanup-candidate-date: YYYY-MM-DD. The script only deletes a resource if that tag is more than 7 days old, giving the team a grace period to intervene.
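In code, the grace-period check is just a date comparison on that tag. A sketch, assuming the same cleanup-candidate-date tag format as above:

```python
from datetime import date, timedelta
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder
GRACE_PERIOD = timedelta(days=7)

def delete_expired_candidates():
    """Delete available volumes whose cleanup tag is more than 7 days old."""
    paginator = ec2.get_paginator("describe_volumes")
    filters = [
        {"Name": "status", "Values": ["available"]},
        {"Name": "tag-key", "Values": ["cleanup-candidate-date"]},
    ]
    for page in paginator.paginate(Filters=filters):
        for vol in page["Volumes"]:
            tags = {t["Key"]: t["Value"] for t in vol.get("Tags", [])}
            tagged_on = date.fromisoformat(tags["cleanup-candidate-date"])
            if date.today() - tagged_on > GRACE_PERIOD:
                ec2.delete_volume(VolumeId=vol["VolumeId"])
                print(f"Deleted {vol['VolumeId']} (tagged {tagged_on})")
```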
Adding Slack Notifications for Full Transparency
To close the loop, we integrated a webhook to post a daily summary to our team's Slack channel. This created visibility and built trust in the automation.
"AWS Janitor Report: Tagged 5 EBS volumes and 2 EIPs for cleanup. Deleted 3 resources that passed their grace period. Total estimated savings: $45/month."
The Results: Measurable Savings and Peace of Mind
The impact was immediate and dramatic.
By the Numbers: Charting Our 30% Reduction in Dev Environment Costs
Within the first month, our development and staging environment costs dropped by over 30%. This wasn't from complex architectural changes or expensive Savings Plans. This was purely from eliminating waste.
We improved our cost efficiency by tackling idle resources head-on. Even a healthy Effective Savings Rate (ESR) from commitments means little if you're paying for resources you aren't using at all.
Beyond Cost: Fostering a Culture of Infrastructure Hygiene
The daily Slack notifications had a fascinating side effect: they subtly fostered a culture of ownership and infrastructure hygiene. Nobody wanted their "temporary" test instance showing up on the cleanup list for a week straight.
The Unexpected Benefit: Improved Security Posture
Fewer running, unmonitored resources mean a smaller attack surface. By cleaning up old instances that weren't being patched, our automated janitor inadvertently became part of our security team.
Conclusion: How You Can Implement Your Own AWS Janitor
This journey from bill shock to automated control was transformative. We didn't just save money; we built a smarter, more efficient way to operate.
Key Lessons Learned on Our Journey
- Start with visibility, not deletion. Tag resources first to let your team get comfortable with the script's logic.
- Automate safely. A grace period and notifications are essential for building trust in the automation.
- Serverless is your friend. Lambda and EventBridge are the perfect, low-cost tools for this kind of scheduled task.
A Call to Action: Start with One Resource Type and Expand
Don't try to boil the ocean. Start with the easiest, most obvious source of waste in your account, which for most is unattached EBS volumes. Get that working, prove the value, and then expand to other resources.
Link to our GitHub Repo with the full script
To help you get started, I’ve cleaned up our scripts and posted them to a public GitHub repository. You can find the full code, along with deployment instructions, here: [Link to Your GitHub Repo Here]