How I Cut AWS Costs by 60% by Fixing Infrastructure Inefficiencies

When I became the sole backend and infra owner at MealPe, the AWS bill was higher than it needed to be. Not because of over-provisioning - but because of accumulated inefficiencies nobody had cleaned up.

I didn’t redesign the entire system. Instead, I analyzed the bill, audited the runtime patterns, and fixed what was actually broken. Here’s how that produced a roughly 60% reduction in monthly cloud spend while also improving LCP and request latency.

The Audit: Where Was the Money Going?

Before making any changes, I went through the AWS Billing console line by line. The monthly breakdown showed that compute overhead and log retention were far out of proportion to our active user base:

Service Category	Monthly Cost (Before)	Monthly Cost (After)	% Reduction	Primary Culprit
EC2 Compute	$620	$280	55%	Event-loop contention, static serving CPU overhead
S3 Storage & Transfer	$380	$90	76%	Indefinite log storage, massive image uploads
CloudWatch / I/O	$250	$110	56%	Verbose application request/response dumps
Total	$1,250	$480	62%	Overall Savings: $770/month
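
The percentage column can be recomputed from the dollar figures. A small sketch for checking the arithmetic (the helper names are illustrative, not part of any tooling we used):

```javascript
// Before/after monthly costs in USD, copied from the table above.
const costs = [
  { category: "EC2 Compute", before: 620, after: 280 },
  { category: "S3 Storage & Transfer", before: 380, after: 90 },
  { category: "CloudWatch / I/O", before: 250, after: 110 },
];

// Percentage reduction, rounded to one decimal place.
function percentReduction(before, after) {
  return Math.round(((before - after) / before) * 1000) / 10;
}

// Sum one column across all categories.
function total(rows, key) {
  return rows.reduce((sum, row) => sum + row[key], 0);
}

const totalBefore = total(costs, "before");
const totalAfter = total(costs, "after");

for (const { category, before, after } of costs) {
  console.log(`${category}: ${percentReduction(before, after)}%`);
}
console.log(
  `Total: $${totalBefore - totalAfter}/month saved, ` +
    `${percentReduction(totalBefore, totalAfter)}%`
);
```

Running the same calculation against each new bill is a cheap way to confirm the savings hold over time.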

1. Fixing Excessive Logging Inefficiencies

Our application logs were growing at an unsustainable rate. The Node.js application logged every inbound request verbosely, including full request and response bodies, heavy SQL trace output, and full stack traces on trivial input errors.

// ❌ Overly verbose pattern that flooded CloudWatch and S3
app.use((req, res, next) => {
  logger.info({
    url: req.url,
    headers: req.headers, // Can include cookies and auth tokens
    body: req.body, // Full request body, often a large JSON payload
    timestamp: new Date()
  });
  next();
});

These logs were being written to disk and continuously shipped to Amazon S3.

The Fix

I audited our log configuration and implemented Pino for structured JSON logging. I restricted verbose request/response logging only to non-production environments and defined clear log levels for our production cluster:

DEBUG/INFO: Used only for crucial system boot details and structured API transaction summaries (method, path, status, duration).
WARN/ERROR: Reserved for true exception states, database connection alerts, and 500-level codes.

This change alone decreased log volume from ~15 GB/day to under 800 MB/day (roughly a 95% reduction), which cut both disk write I/O and the cost of shipping log files.
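
A minimal sketch of what such a setup can look like, not our exact production code. The `LOG_LEVEL` override variable and both function names are assumptions for illustration, and `logger` can be any object with `info` and `error` methods, such as a Pino instance:

```javascript
// Pick a log level per environment: quieter in production, verbose elsewhere.
// LOG_LEVEL is a hypothetical override variable used for illustration.
function resolveLogLevel(env = process.env) {
  if (env.LOG_LEVEL) return env.LOG_LEVEL;
  return env.NODE_ENV === "production" ? "info" : "debug";
}

// Express-style middleware that logs one compact summary per request
// instead of full headers and bodies.
function requestSummaryLogger(logger) {
  return (req, res, next) => {
    const startedAt = process.hrtime.bigint();
    // "finish" fires once the response has been handed off to the OS.
    res.on("finish", () => {
      const elapsedNs = process.hrtime.bigint() - startedAt;
      const summary = {
        method: req.method,
        path: req.path, // Express path without the query string
        status: res.statusCode,
        durationMs: Math.round(Number(elapsedNs) / 1e6),
      };
      // 5xx responses are logged as errors, everything else as info.
      if (res.statusCode >= 500) logger.error(summary);
      else logger.info(summary);
    });
    next();
  };
}

// Possible wiring with Pino and Express (both assumed to be installed):
//   const pino = require("pino");
//   const logger = pino({ level: resolveLogLevel() });
//   app.use(requestSummaryLogger(logger));
```

Logging only the path keeps query strings, which sometimes carry tokens or personal data, out of the log stream.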

2. Setting Up Automated S3 Lifecycle Policies

S3 storage is inexpensive until files accumulate over months without rules. We were storing every daily server log, PDF bill copy, and redundant image backup indefinitely. There was no cleanup process in place.

The Fix

I configured strict S3 Lifecycle Rules on our logging and static resource buckets:

Daily Server Logs: Transition to S3 Standard-IA (Infrequent Access) after 14 days, move to S3 Glacier Flexible Retrieval after 30 days, and permanently expire after 90 days.
Temporary File Uploads: Automatically expire from the /tmp/ bucket prefix after 7 days.

Implementing these automated lifecycle states trimmed down our total active S3 storage footprint by over 70% in the first billing cycle.
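
Rules like these can be expressed as a lifecycle configuration object. The sketch below builds one in the shape accepted by S3's PutBucketLifecycleConfiguration API; the prefixes, rule IDs, and function name are hypothetical. One caveat worth checking against the current S3 documentation: S3 documents a 30-day minimum object age for lifecycle transitions into Standard-IA, so the sketch validates its thresholds and uses 30/60/90 as placeholder values rather than the 14/30/90 quoted above.

```javascript
// Build lifecycle rules for a log prefix and a temporary-upload prefix.
// All prefixes and rule IDs here are illustrative placeholders.
function buildLifecycleRules({
  logPrefix,
  tmpPrefix,
  iaDays,
  glacierDays,
  expireDays,
  tmpExpireDays,
}) {
  // S3 documents a 30-day minimum age before transitioning to STANDARD_IA.
  if (iaDays < 30) {
    throw new RangeError("Standard-IA transition needs iaDays >= 30");
  }
  if (!(iaDays < glacierDays && glacierDays < expireDays)) {
    throw new RangeError("expected iaDays < glacierDays < expireDays");
  }
  return {
    Rules: [
      {
        ID: "server-logs-tiering",
        Status: "Enabled",
        Filter: { Prefix: logPrefix },
        Transitions: [
          { Days: iaDays, StorageClass: "STANDARD_IA" },
          // "GLACIER" is the API name for Glacier Flexible Retrieval.
          { Days: glacierDays, StorageClass: "GLACIER" },
        ],
        Expiration: { Days: expireDays },
      },
      {
        ID: "tmp-uploads-expiry",
        Status: "Enabled",
        Filter: { Prefix: tmpPrefix },
        Expiration: { Days: tmpExpireDays },
      },
    ],
  };
}

const lifecycle = buildLifecycleRules({
  logPrefix: "logs/",
  tmpPrefix: "tmp/",
  iaDays: 30,
  glacierDays: 60,
  expireDays: 90,
  tmpExpireDays: 7,
});
console.log(JSON.stringify(lifecycle, null, 2));

// With the AWS SDK for JavaScript v3 installed, this object could be sent as
// the LifecycleConfiguration field of PutBucketLifecycleConfigurationCommand
// from "@aws-sdk/client-s3".
```

Keeping the rules in code rather than clicking through the console makes them reviewable and repeatable across buckets.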

3. Offloading Static Asset Serving from Node.js to NGINX

This was the most impactful architectural change. The Node.js backend process was serving static frontend assets directly. Whenever a client loaded our web app, the request for a JavaScript chunk, CSS bundle, SVG logo, or favicon went straight through our Express application.

Node.js runs application JavaScript on a single thread, so every static file request competes with API handlers for event-loop time. The file reads themselves are asynchronous, but the per-request routing, header handling, and stream bookkeeping still add up, leading to high CPU usage and queuing lag under load.
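
One way to check how much queuing the event loop is experiencing, before and after a change like this, is to sample event-loop delay with Node's built-in `perf_hooks` module. This is a sketch rather than the measurement setup we used; `startLoopDelayProbe` and `nsToMs` are hypothetical helper names:

```javascript
const { monitorEventLoopDelay } = require("node:perf_hooks");

// The histogram reports nanoseconds; convert to milliseconds (2 decimals).
function nsToMs(ns) {
  return Math.round(ns / 1e4) / 100;
}

// Start sampling event-loop delay. Returns a function that stops the
// sampling and reports summary statistics in milliseconds.
function startLoopDelayProbe(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return function stop() {
    histogram.disable();
    return {
      meanMs: nsToMs(histogram.mean),
      p99Ms: nsToMs(histogram.percentile(99)),
      maxMs: nsToMs(histogram.max),
    };
  };
}

// Example: sample for ten seconds under normal traffic, then print.
//   const stop = startLoopDelayProbe();
//   setTimeout(() => console.log(stop()), 10_000);
```

A high p99 delay while the process is mostly serving assets suggests API requests are waiting behind that work.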

The NGINX Solution

I configured NGINX as a reverse proxy in front of Node.js. NGINX is built from the ground up to serve static files with extremely low memory and CPU overhead. By handling static requests at the NGINX layer, we bypass the Node.js event loop completely:

graph TD
    Client[Web Client] -->|Inbound Requests| Nginx{NGINX Reverse Proxy}
    Nginx -->|/static/* CSS, JS, Images| Disk[(Local Static Files)]
    Nginx -->|/api/* REST Endpoints| Node[Node.js / PM2 Cluster]
    Node --> DB[(PostgreSQL Database)]

Configured NGINX Routing Block

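# NGINX config: serve static files directly, proxy only API requests
server {
    # On nginx 1.25.1+, prefer "listen 443 ssl;" plus a separate "http2 on;"
    listen 443 ssl http2;
    server_name app.mealpe.in;
    # TLS certificate directives (ssl_certificate, ssl_certificate_key) are not shown

    # Static assets served directly by NGINX, so these requests never reach Node.js
    location /static/ {
        root /var/www/mealpe-frontend; # /static/x resolves to /var/www/mealpe-frontend/static/x
        expires 30d; # Also emits Cache-Control: max-age=2592000
        # "immutable" is only safe when asset filenames are content-hashed
        add_header Cache-Control "public, no-transform, immutable";
        access_log off; # Turn off access logging for static assets to reduce disk I/O
    }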

    # API requests proxied to the PM2 Node.js server cluster
    location /api/ {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Offloading static serving to NGINX decreased our average EC2 instance memory consumption by 35% and CPU usage by 40%. That headroom let us safely downsize our EC2 compute tier, cutting monthly EC2 spend from $620 to $280.
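
On the Node.js side of the proxy config above, the process only needs to answer `/api/` requests and can read the client address from the headers NGINX sets. A minimal sketch using only the built-in `http` module (our real service uses Express; `clientIp` and `handleRequest` are hypothetical names):

```javascript
const http = require("node:http");

// Resolve the client IP from the headers set by the NGINX config above.
function clientIp(req) {
  const realIp = req.headers["x-real-ip"];
  if (realIp) return realIp;
  const forwarded = req.headers["x-forwarded-for"];
  if (forwarded) {
    // With a single proxy, the last entry is the one NGINX appended.
    // Earlier entries are client-supplied and can be forged.
    const parts = forwarded.split(",");
    return parts[parts.length - 1].trim();
  }
  return req.socket.remoteAddress;
}

// Answer only API routes; static paths are expected to stop at NGINX.
function handleRequest(req, res) {
  if (!req.url.startsWith("/api/")) {
    res.writeHead(404, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ error: "not found" }));
    return;
  }
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ ok: true, client: clientIp(req) }));
}

const server = http.createServer(handleRequest);

// Bind to loopback only, so the process is reachable solely through NGINX:
//   server.listen(3000, "127.0.0.1");
```

Binding to 127.0.0.1 matches the `proxy_pass` target and keeps port 3000 off the public interface.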

The Verdict

Optimizing infrastructure isn’t always about moving to serverless or re-architecting your entire system. Usually, the highest return on investment comes from cleaning up operational inefficiencies:

Logging: Keep production logs actionable, structured, and short.
Storage: Enforce S3 lifecycle rules from day one.
Separation of Concerns: Let NGINX serve your files, and let Node.js compute your API logic.
