When I became the sole backend and infra owner at MealPe, the AWS bill was higher than it needed to be. The cause was not over-provisioning but accumulated inefficiencies that nobody had cleaned up.
I didn’t redesign the entire system. Instead, I analyzed the bill, audited the runtime patterns, and fixed what was actually broken. Here’s how I cut monthly cloud spend by roughly 60% while also improving Largest Contentful Paint (LCP) and request latency.
The Audit: Where Was the Money Going?
Before making any changes, I went through our AWS Billing console line by line. The monthly breakdown showed that compute overhead and log retention were far out of proportion with our active user base:
| Service Category | Monthly Cost (Before) | Monthly Cost (After) | % Reduction | Primary Culprit |
|---|---|---|---|---|
| EC2 Compute | $620 | $280 | 55% | Blocked event loop, static serving CPU overhead |
| S3 Storage & Transfer | $380 | $90 | 76% | Indefinite log storage, massive image uploads |
| CloudWatch / I/O | $250 | $110 | 56% | Verbose application request/response dumps |
| Total | $1,250 | $480 | 62% | Overall Savings: $770/month |
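The percentages follow directly from the before and after columns. As a quick check, the arithmetic can be reproduced with a few lines of JavaScript; the `reductionPct` helper is a hypothetical name used only for this illustration, and it reports one decimal place.

```javascript
// Monthly costs in USD, copied from the table above.
const costs = [
  { service: 'EC2 Compute', before: 620, after: 280 },
  { service: 'S3 Storage & Transfer', before: 380, after: 90 },
  { service: 'CloudWatch / I/O', before: 250, after: 110 },
];

// Percentage reduction from `before` to `after`, rounded to one decimal place.
function reductionPct(before, after) {
  return Math.round(((before - after) / before) * 1000) / 10;
}

// Sum the per-service rows to get the overall totals.
const total = costs.reduce(
  (sum, c) => ({ before: sum.before + c.before, after: sum.after + c.after }),
  { before: 0, after: 0 }
);

for (const c of costs) {
  console.log(`${c.service}: ${reductionPct(c.before, c.after)}%`);
}
console.log(
  `Total: $${total.before} -> $${total.after}, ` +
    `saving $${total.before - total.after}/month ` +
    `(${reductionPct(total.before, total.after)}%)`
);
```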
1. Cutting Excessive Logging
Our application logs were growing at an unsustainable rate. The Node.js application logged every inbound request verbosely, including full request and response bodies, heavy SQL traces, and full stack traces on trivial input errors.
```javascript
// ❌ Deprecated verbose pattern that flooded CloudWatch and S3
app.use((req, res, next) => {
  logger.info({
    url: req.url,
    headers: req.headers,
    body: req.body, // Large raw JSON buffers
    timestamp: new Date()
  });
  next();
});
```
These logs were being written to disk and continuously shipped to Amazon S3.
The Fix
I audited our log configuration and adopted Pino for structured JSON logging. I restricted verbose request/response logging to non-production environments and defined clear log levels for our production cluster:
- DEBUG/INFO: Used only for crucial system boot details and structured API transaction summaries (method, path, status, duration).
- WARN/ERROR: Reserved for true exception states, database connection alerts, and 500-level codes.
This change alone cut log volume from ~15 GB/day to under 800 MB/day (a reduction of about 95%), lowering both write I/O costs and log-shipping overhead.
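The production-side summary logging described above could look roughly like the following. This is a minimal sketch, assuming an Express-style middleware signature. The helper names (`summarizeRequest`, `levelForStatus`, `requestSummaryLogger`) are hypothetical and not taken from the MealPe codebase, and the logger is anything exposing `info(obj, msg)` and `error(obj, msg)`, which is the call shape a Pino logger accepts.

```javascript
// Build a compact, structured summary: no headers, no bodies.
function summarizeRequest(req, res, durationMs) {
  return {
    method: req.method,
    path: req.path,
    status: res.statusCode,
    durationMs,
  };
}

// Assumption for this sketch: only 5xx responses are logged at ERROR;
// everything else, including 4xx input errors, stays at INFO.
function levelForStatus(status) {
  return status >= 500 ? 'error' : 'info';
}

// Middleware factory: logs one summary line when the response finishes.
function requestSummaryLogger(logger) {
  return function logRequestSummary(req, res, next) {
    const startedAt = Date.now();
    res.on('finish', () => {
      const summary = summarizeRequest(req, res, Date.now() - startedAt);
      logger[levelForStatus(res.statusCode)](summary, 'request completed');
    });
    next();
  };
}
```

In an Express app this would be registered with `app.use(requestSummaryLogger(logger))`. The logger itself might be created with something like `pino({ level: process.env.NODE_ENV === 'production' ? 'info' : 'debug' })`, though the exact level per environment is an assumption here.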
2. Setting Up Automated S3 Lifecycle Policies
S3 storage is inexpensive until files accumulate over months without rules. We were storing every daily server log, PDF bill copy, and redundant image backup indefinitely. There was no cleanup process in place.
The Fix
I configured strict S3 Lifecycle Rules on our logging and static resource buckets:
- Daily Server Logs: Transition to S3 Standard-IA (Infrequent Access) after 14 days, move to S3 Glacier Flexible Retrieval after 30 days, and permanently expire after 90 days.
- Temporary File Uploads: Automatically expire from the `/tmp/` bucket prefix after 7 days.
These lifecycle rules trimmed our total active S3 storage footprint by over 70% in the first billing cycle.
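As an illustration, the two expiration rules can be written as a lifecycle configuration object in the shape that S3's PutBucketLifecycleConfiguration API accepts. This is a sketch under assumptions: the rule IDs, the `logs/` prefix, and the `expirationRule` helper are hypothetical, and the temporary-upload prefix is written as `tmp/` without a leading slash. Transition steps are left out because S3 applies minimum-age constraints to them (objects must be at least 30 days old before moving to Standard-IA, for example), so the exact day counts are worth checking against the lifecycle documentation.

```javascript
// Build a single expiration-only lifecycle rule for a key prefix.
function expirationRule(id, prefix, days) {
  return {
    ID: id,
    Status: 'Enabled',
    Filter: { Prefix: prefix },
    Expiration: { Days: days },
  };
}

// Hypothetical rule IDs and prefixes; day counts follow the list above.
const lifecycleConfiguration = {
  Rules: [
    expirationRule('expire-server-logs', 'logs/', 90),
    expirationRule('expire-temp-uploads', 'tmp/', 7),
  ],
};

// Print as JSON so it can be saved to a file or inspected.
console.log(JSON.stringify(lifecycleConfiguration, null, 2));
```

The printed JSON can be saved and applied with `aws s3api put-bucket-lifecycle-configuration --lifecycle-configuration file://lifecycle.json`, or the object can be passed as `LifecycleConfiguration` to the AWS SDK's `PutBucketLifecycleConfigurationCommand`.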
3. Offloading Static Asset Serving from Node.js to NGINX
This was the most impactful architectural refinement. The Node.js backend process was serving static frontend assets directly. Whenever a client loaded our web app, the request for a JavaScript chunk, CSS bundle, SVG logo, or favicon went straight through our Express application.
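Node.js runs application JavaScript on a single thread, so every static file request competes with API handlers for event-loop time. The file reads themselves are asynchronous, but the per-request work (routing, header handling, stream management) adds up under load, leading to high CPU usage and queuing lag for actual API computations.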
The NGINX Solution
I configured NGINX as a reverse proxy in front of Node.js. NGINX is built from the ground up to serve static files with extremely low memory and CPU overhead. By handling static requests at the NGINX layer, we bypass the Node.js event loop completely:
```mermaid
graph TD
    Client[Web Client] -->|Inbound Requests| Nginx{NGINX Reverse Proxy}
    Nginx -->|/static/* CSS, JS, Images| Disk[(Local Static Files)]
    Nginx -->|/api/* REST Endpoints| Node[Node.js / PM2 Cluster]
    Node --> DB[(PostgreSQL Database)]
```
Configured NGINX Routing Block
```nginx
# NGINX config: serve static files directly, proxy only API requests
server {
    listen 443 ssl http2;
    server_name app.mealpe.in;
    # ssl_certificate and ssl_certificate_key directives are not shown here

    # Static assets served directly by NGINX - never touches the Node.js event loop
    location /static/ {
        root /var/www/mealpe-frontend;
        expires 30d;
        add_header Cache-Control "public, no-transform, immutable";
        access_log off;  # Turn off access logging for static assets to reduce disk I/O
    }

    # API requests proxied to the PM2 Node.js server cluster
    location /api/ {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
Offloading this work to NGINX decreased our average EC2 instance memory consumption by 35% and CPU usage by 40%. This allowed us to safely downsize our EC2 compute tier, bringing monthly EC2 spend down from $620 to $280.
The Verdict
Optimizing infrastructure isn’t always about moving to serverless or re-architecting your entire system. Usually, the highest return on investment comes from cleaning up operational inefficiencies:
- Logging: Keep production logs actionable, structured, and short.
- Storage: Enforce S3 lifecycle rules from day one.
- Separation of Concerns: Let NGINX serve your files, and let Node.js compute your API logic.