Day 6 Recap

21st August 2025

Overview

Day 6 launch on FDF saw trading enabled for 10 new players in the market.

And it was insane.

It was great to see so many people enjoying themselves: the chats, the trading, the frenzy of action. But while many had fun, a large number of you struggled to participate because our services were unavailable.

We hear you. We felt it too.

This post is both a celebration of what we achieved together and a transparent look at why things went wrong, and how we’re making sure Day 7 is smoother.


The Numbers

Firstly, let’s just take a moment to appreciate the sheer scale of Day 6:

  • Buys: $702,925.18

  • Sells: $381,685.98

  • Total Volume: $1,084,611.15

  • Transactions: 11,080

  • Requests served in 24h: 16.27M

  • Requests served between 17:30–20:00 alone: 5.09M

That’s staggering growth in just 24 hours.

It feels strange to write a postmortem when the numbers are this strong, but as you know, being early comes with both upside and growing pains, and it’s personally frustrating that a large number of you had issues.


The lead up

Seeing the massive increase in user numbers leading up to the launch, and deposits into users’ wallets growing, the team had been proactively increasing resource capacity. We had seen similar traffic spikes in every previous launch. While performance had generally been good across the system, we were still observing strange behaviour in the containers during the initial spike in API traffic.

Our containers were experiencing widespread liveness probe failures. A liveness probe is the health check our orchestration system uses to confirm that a container is healthy; when a container is considered unhealthy, it gets restarted.

The combination of user growth and the issue we were seeing under heavy load led to our decision to do two things for the Day 6 launch:

  1. Increase API workload capacity by 20%

  2. Increase the health check timeout to 3 seconds
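
For context, change 2 corresponds to a liveness probe stanza on the Kubernetes deployment. A minimal sketch, assuming an HTTP health endpoint (the path, port, and other values here are illustrative, not our actual config):

```yaml
# Illustrative liveness probe on an API deployment (values are examples only)
livenessProbe:
  httpGet:
    path: /healthz     # hypothetical health endpoint
    port: 8080
  periodSeconds: 10    # how often Kubernetes probes the container
  timeoutSeconds: 3    # the pre-launch bump from the 1s default
  failureThreshold: 3  # consecutive failures before a restart
```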


What Happened

Plain and simple: our systems weren’t keeping up with your demand.

Every pod in our API layer was being restarted constantly. From your perspective, that looked like slow responses, timeouts, and failed trades.

  • 17:00 - Launch is delayed. We were observing an increase in container restarts and massive load on the database

  • 17:12 - Announcement is made to push the launch back by 30 minutes

  • 17:20 - API health resumes. The team was unable to explain the huge spike in database usage. We speculated that not showing the new players in the transfers list early was causing people to hit refresh repeatedly, creating more and more demand on the DB. Database load comes down and the traffic spike levels out

  • 17:30 - Players go live and we see a massive spike in demand. Containers begin restarting constantly

  • 18:00 - Team identifies a potential issue with the readiness check

  • 18:05 - Team applies a patch to the fdf-api deployment and starts the rollout

  • 18:07 - Rollout of the new API config completes across all replicas. Restart rates drop to 0

Why? (simplified)

Because of a misconfigured health check.

  • Kubernetes (our orchestration layer) checks each API server’s health to see if it’s alive.

  • We left the default 1-second timeout in place. That meant if your API call took just over 1s (for example because of a busy database or network blip), Kubernetes assumed the whole container was dead.

  • As load spiked, some responses slipped past the 1s mark → probes failed → Kubernetes marked pods unhealthy → pods restarted → traffic shifted to fewer pods → feedback loop of death 💀
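
The feedback loop above can be sketched as a toy simulation. All numbers here are hypothetical, not our real capacity figures; the point is only the shape of the cascade:

```python
# Toy model of the probe-failure feedback loop (illustrative numbers only).
def simulate(pods: int, total_load: float, capacity_per_pod: float,
             probe_timeout: float, base_latency: float = 0.9) -> list[int]:
    """Return the number of healthy pods after each of 10 ticks."""
    healthy = pods
    history = []
    for _ in range(10):
        if healthy == 0:
            history.append(0)
            continue
        # Load concentrates on the pods that are still healthy.
        load_per_pod = total_load / healthy
        # Crude latency model: response time grows as a pod saturates.
        latency = base_latency * max(1.0, load_per_pod / capacity_per_pod)
        if latency > probe_timeout:
            # Probe times out -> pod marked unhealthy -> pod restarted.
            healthy -= 1
        history.append(healthy)
    return history

# With a 1s timeout, pods are killed one by one as load concentrates:
print(simulate(pods=5, total_load=50, capacity_per_pod=8, probe_timeout=1.0))
# → [4, 3, 2, 1, 0, 0, 0, 0, 0, 0]

# With a 5s timeout, the same load never trips the probe:
print(simulate(pods=5, total_load=50, capacity_per_pod=8, probe_timeout=5.0))
# → [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```

The model leaves out restarts rejoining the pool, but it captures the core dynamic: the tighter the timeout, the faster the fleet collapses under the same load.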

A large number of trades did go through the system during the period when service was degraded, but it was pot luck: if your requests landed on a pod before it took too much traffic and was marked unhealthy, your trades made it through.


The Fix

Extended health check timeouts from 1s → 5s, and adjusted failure thresholds so pods tolerate small hiccups without being killed.
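
In Kubernetes terms, the fix looks roughly like the config below. The endpoint, port, and exact failure threshold are illustrative assumptions; only the 1s → 5s timeout change is from the account above:

```yaml
# Sketch of the patched liveness probe (values are examples only)
livenessProbe:
  httpGet:
    path: /healthz     # hypothetical health endpoint
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 5    # up from the 1s default
  failureThreshold: 5  # tolerate a few slow responses before restarting
```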

This change instantly resulted in a much more stable system under load. I believe this issue had been present across all launches and is the reason we’d been seeing the spike in container restart rates under load. The changes we made in preparation for more traffic magnified the problem.


Looking Ahead to Day 7

Day 6 showed us two things:

  1. The community’s energy is real. Over $1M in volume in one day is no small feat.

  2. We need to harden the platform faster. Scaling issues aren’t glamorous, but fixing them now means a smoother experience for everyone going forward.

Remember, we’re playing with live ammo here in the second week of launching the biggest app on Base.

Day 7 is coming, and it will be better.

See you on the pitch.

The FDF Team
