Optimize Cloud Run configuration for my portfolio website

😕 What's wrong?

After migrating my portfolio website and being satisfied with the performance of my previous blog (github or portfolio), I started searching for other things I could optimize from the infrastructure side. This time my goal is to reduce the infrastructure cost as much as possible. For control data, we'll use the cloud run cost in the past 10 days before the web framework migration (11 August 2024 - 21 August 2024) with resource configuration CPU Limit, CPU Request, and Memory Limit+Request set to 0.1m, 0.08m, and 128Mi. I mentioned it in my previous blog, but that resource configuration is the lowest Cloud Run allowed.

Cloud Run Cost in 10 days pre-migration

Based on the image above, running a container for 10 days with the lowest resource configuration cost me $1. In a year it'll cost me around $36 per container or IDR 565.000 (USD 1 = IDR 15.000). With the additional cost of domain renewal that cost me IDR 180.000 per year the total cost to run my portfolio website is IDR 745.000/year or IDR 62.083/month. That's about the same cost for a Basic tier Netflix subscription!

Netflix Subscription Price in Indonesia

So in the spirit of saving infrastructure costs for better usage (not Netflix of course), let's start our journey to optimize the Cloud Run cost.

🔧 Optimization

Let's start with the details of the Cloud Run cost data before the migration. Drill down into the detail, we can see the top cost contributor of Cloud Run cost come from:

Cloud Run Cost by SKU

Looks like with the default Cloud Run configuration, I set the Minimum Idle Container active and keep my Cloud Run billing running. Upon further checking of Cloud Run terragrunt module that I use, the default Minimum Idle Container was set to 1. Let's change it to 0 and run the Postman performance test.

	// pre-migration
	max_scale_instances = 2
	min_scale_instances = 1
	
	// post-migration
	max_scale_instances = 2
	min_scale_instances = 0

I ran the postman collection Run in my workstation and iterated over 1000 endpoint calls. We'll use the average response time from all endpoints from this Run.

From the Cloud Run metrics dashboard, I watched these metrics:

🚄 Result

Here is the test result (P for postman, CR for cloud run)

Metric Control Test Performance Difference Test Infra Difference Difference between test
(P) Average Response Time 93ms 50ms -46.2% 41ms -55.9% -9.7%
(CR) Request Count 11.32rps 18.1rps +59.9% 19.62rps +73.3% +13.4%
(CR) Request Latency p99 141.46ms 9.97ms -93.0% 9.9ms -93.0% 0%
(CR) Request Latency p90 55.35ms 9.57ms -82.7% 9.5ms -82.8% -0.1%
(CR) Request Latency p50 5.38ms 5.04ms -6.3% 5ms -7.1% -0.8%
(CR) Max Billable Container Instance Time 1.001s/s 1.001s/s 0% 0.9s/s -10.1% -10.1%
(CR) Container Startup Latency - - - 1.2901s +1.2901s +1.2901s

Above I added the Test Performance column and its Difference from the previous blog (github or portfolio) so we can compare it with the pre-migration portfolio website and post-migration portfolio website before Cloud Run optimization. From the postman Average Response Time and Cloud Run Request Count we get the better value after the optimization. There are not much difference on the latencies side but since the idle container was removed, it added latency called Container Startup Latency that took 1.2901s. This latency only shows up in the first request during the postman Run as we can see in the image below.

Postman result

Now let's talk about the Max Billable Container Instance Time metric. From the optimization, we can reduce this metric by 10.1% or theoretically reduce the Cloud Run monthly cost from $3 to $2.7 if the container always runs at peak. But that is not the daily case, instead most of the time the container will be downscaled to 0 as there is not much traffic to my portfolio website. So we can compare the trend of the Max Billable Container Instance Time metric pre-migration and post-optimization. In pre-migration, at least one container is always idle to serve the incoming traffic. However, in the post-optimization configuration, I don't have any idle containers hence my cost when there is no traffic is $0. Since I'm willing to sacrifice availability for cost reduction, this is a big win!

Billable Container Instance Time Comparison

Looking back to the Cloud Run cost dashboard a week after optimization, I don't see any Cloud Run cost any more in the dashboard. With this cost optimization effort, I can reduce my Cloud Run potential cost from $36 per container or IDR 565.000 to $0 or $IDR. With a big "IF" of course because I might change my mind in the future.

Cloud Run Cost in 10 days post-optimization

✅ Conclusion

Thank you so much for reading my last blog on portfolio web framework migration. I have so much fun learning new things and writing all the blogs. I'm not sure what the next thing I'll write about is. I have a plan to explore several features like SSO, OAuth, Feature Flag, and Observability with OpenTelemetry. Please wait for the next blog and as always thank you. Cheers!