Like millions of others across America, we were on the edge of our seats hitting refresh as the election results trickled in Tuesday night (and Wednesday. And Thursday…). But as we hit that button over and over again, how many of us thought about the load this puts on the servers of all of these news organizations? Luckily for us, one of our clients happens to be in this space, and we have some thoughts about best practices for handling a surge in traffic. Over the years, we've received many questions from clients about how to prepare their servers for expected (and unexpected) high-traffic scenarios. Sometimes it's tied to a product announcement; other times it's an event they're covering. Either way, a surge in traffic is something many clients eventually deal with, so we wanted to share a mini case study of how we helped ensure our client's server infrastructure was ready for Election Day 2020.
When we first started working with this client more than a decade ago, they were a video news web startup. Since then, as the internet news landscape has evolved, they have added mobile apps and smart TV apps on all major platforms; most recently, they became a 24/7 cable TV news network. As their distribution channels have grown more complex, so has the server infrastructure that powers them behind the scenes.
Their technology stack now includes multiple distinct server clusters, including those powering their website, content management platforms, and the APIs used to serve up content and load their TV apps. All public-facing servers sit behind load balancers, which automatically distribute traffic across the cluster without stressing any individual server. Just as importantly, if one server fails for whatever reason, the others can pick up the slack.
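The core behavior of a load balancer can be sketched in a few lines. This is a simplified, illustrative model (round-robin with health checks), not how any particular load balancer product is implemented:

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round-robin load balancer: spread requests across servers,
    skipping any server that has failed a health check."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        """Simulate a failed health check: stop routing to this server."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Server recovered: resume routing to it."""
        self.healthy.add(server)

    def next_server(self):
        """Return the next healthy server in rotation."""
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app1", "app2", "app3"])
print([lb.next_server() for _ in range(3)])  # → ['app1', 'app2', 'app3']
lb.mark_down("app2")                         # one server fails...
print([lb.next_server() for _ in range(3)])  # ...the others pick up the slack
```

Real load balancers (hardware appliances, nginx/HAProxy, or cloud offerings) add connection draining, weighting, and active health probes, but the failover idea is the same.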
While theirs may be an extreme example of a media-heavy site, it serves as a useful illustration of how you can serve images, videos, and other files from a third-party service and spare your own server from bearing the load. In this case, we store static media files on Amazon's S3 service and serve streaming video from a dedicated service as well; these services are (practically) infinitely scalable and can easily handle any spikes. That way, we avoid high server loads and bandwidth usage and, as a bonus, improve page load speeds.
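In application code, offloading media often comes down to building asset URLs that point at the object-storage or CDN host instead of your own server. A minimal sketch (the bucket hostname below is a hypothetical example, not the client's actual bucket):

```python
# Hypothetical media host; in practice this would be your S3 bucket
# or CDN distribution domain.
MEDIA_HOST = "https://example-media.s3.amazonaws.com"


def media_url(path: str) -> str:
    """Build an absolute URL so the asset is fetched from object storage,
    not from the application server."""
    return f"{MEDIA_HOST}/{path.lstrip('/')}"


# Templates reference media_url(...) instead of local paths, so every
# image and video request bypasses the web server entirely.
print(media_url("/thumbnails/election-night.jpg"))
```

The payoff is that a traffic spike multiplies requests against S3's capacity rather than your own bandwidth and disk I/O.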
Overall, distributing your infrastructure is key to building a scalable environment. In this instance, we relied on a combination of cloud-based services and more traditional methods. There are always newer services that could make this even more manageable, and as they evolve, we consider how they might fit into the infrastructure as it exists today; continuous improvement applies not just to your website or application but also to the guts of the system underneath.
Proxying and Caching
Using third-party services to deliver static files is only one way to prevent high traffic from increasing server loads. Proxy services like Cloudflare and CloudFront route all of your traffic through their servers, which offers a few crucial advantages:
- They can cache copies of HTML, CSS, and other web output on their own servers. The next visitor is served the cached copy, and your server no longer needs to expend resources on those requests.
- Since the user no longer needs to wait for your server to respond, they can see the site load even faster.
- Since the proxy sits between the user and your server, your servers' location is hidden from bad actors, who can no longer attack the servers directly. Furthermore, the proxy service can absorb suspicious or malicious traffic that might otherwise compromise your server (like a brute-force attack) or take it down (like a DDoS attack).
- Many proxy services also include advanced firewall features that can further filter out and block malicious behavior before it ever reaches your server.
These services have evolved over the years and are now a key component of a successful, scalable infrastructure plan.
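The caching behavior described above can be sketched with a toy edge cache. This is an illustrative model of hit/miss/expiry logic under an assumed TTL, not how any specific provider implements caching:

```python
import time


class EdgeCache:
    """Toy edge cache: serve cached responses until they expire (TTL)."""

    def __init__(self, fetch_origin, ttl_seconds=60):
        self.fetch_origin = fetch_origin  # called only on a cache miss
        self.ttl = ttl_seconds
        self.store = {}                   # url -> (response, expiry_time)
        self.origin_hits = 0              # how often the origin was contacted

    def get(self, url, now=None):
        now = time.time() if now is None else now
        cached = self.store.get(url)
        if cached and cached[1] > now:
            return cached[0]              # cache hit: origin untouched
        response = self.fetch_origin(url)  # miss or expired: go to origin
        self.origin_hits += 1
        self.store[url] = (response, now + self.ttl)
        return response


cache = EdgeCache(lambda url: f"<html>page for {url}</html>", ttl_seconds=60)
cache.get("/results", now=0)   # miss: fetched from origin
cache.get("/results", now=10)  # hit: served from the edge
cache.get("/results", now=90)  # TTL expired: fetched again
print(cache.origin_hits)       # → 2 origin requests for 3 visits
```

On election night, the same idea means thousands of refreshes can collapse into a handful of origin requests per cache lifetime.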
24/7 Monitoring and Response
Sometimes, despite all the best planning, the unexpected happens. Traffic might be higher than expected, or a particular server function may consume more resources than usual. Our 24/7 monitoring service allows us to detect when a server goes down, or even when it is trending in that direction, and respond immediately. Our system administrators are prepared to respond within 15 minutes on weekends, holidays, and even in the middle of the night. They can immediately reboot servers, block traffic, or even upgrade or replace servers to get you back up and running.
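The detection side of this can be sketched with a simple rule: alert once a server fails several health checks in a row (a single blip shouldn't page anyone at 3 a.m.). The threshold here is a made-up example, not our actual alerting configuration:

```python
class HealthMonitor:
    """Toy monitor: fire an alert after N consecutive failed health checks."""

    def __init__(self, alert_after=3):
        self.alert_after = alert_after
        self.failures = 0

    def record(self, healthy: bool) -> bool:
        """Record one health-check result; return True when an alert fires."""
        self.failures = 0 if healthy else self.failures + 1
        return self.failures == self.alert_after


monitor = HealthMonitor(alert_after=3)
results = [True, True, False, False, False]  # server starts failing checks
alerts = [monitor.record(ok) for ok in results]
print(alerts)  # → [False, False, False, False, True]: page the on-call admin
```

A real monitoring stack layers in latency thresholds, resource metrics, and escalation policies, but the consecutive-failure rule is the basic debounce.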
Monitoring also familiarizes you with your servers' baseline performance and provides early warnings when resource usage trends toward capacity. This information is critical in planning for expected traffic spikes and determining what changes (for example, temporarily upgrading to more powerful servers) can help you meet the surge without downtime or degraded performance.
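That kind of early warning can be sketched as a trend extrapolation: fit a line through recent usage samples and estimate how long until capacity is hit. The sample data and 100% capacity figure below are made up for illustration:

```python
def hours_until_capacity(samples, capacity=100.0):
    """samples: list of (hour, percent_used) pairs.
    Fit a least-squares line and return the estimated hours (from the
    last sample) until usage reaches capacity, or None if usage is
    flat or falling."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in samples)
             / sum((x - mean_x) ** 2 for x, _ in samples))
    if slope <= 0:
        return None  # not trending toward capacity; nothing to warn about
    _, last_y = samples[-1]
    return (capacity - last_y) / slope


# Hypothetical disk-usage readings: growing 6% per hour.
usage = [(0, 40.0), (1, 46.0), (2, 52.0), (3, 58.0)]
print(hours_until_capacity(usage))  # → 7.0 hours before the disk fills
```

An alert threshold on this estimate (say, "warn when under 24 hours remain") turns a midnight outage into a scheduled upgrade.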
The point of monitoring and response: having excellent infrastructure is essential, but you also have to keep a watchful eye on its performance and have an escalation plan ready during surges, just in case something does go awry.
Preparing for surge traffic is complicated, and as you can see, it involves multiple variables. The best way to handle a surge is to make the right decisions early in the process, especially during initial development; this is why planning and architecture before a development project begins are essential. Even if you have a less-than-stellar system in place, though, there are always options for handling large-scale traffic requirements. Even a hectic election day!