Let’s start with caching. So many levels of caching.
Before we even get to your website, there’s the DNS. What IP does your domain point to? Usually your OS will cache this DNS lookup for you, failing that your DNS server (e.g. your ISP, 8.8.8.8 or 1.1.1.1) will. This isn’t usually something you need to worry about as a web-developer.
Next, the low-hanging fruit: CSS and JS. This is easy to cache. The content is static (rarely changes). We can easily set some HTTP headers via nginx or apache to cache these for a long time. But about when the content does change? Can we afford to wait a week or month before our customers’ browsers pick up the new copy? Can we ask them to press Ctrl+F5? This isn’t a good solution. The easiest and most robust way to handle this is to change the actual filename of the CSS and JS files. If the filename changes, the browser will be forced to download a new copy. Webpack can do this for you, but hooking up the generated filenames into your server-side templating language is a little bit trickier. Generally what it boils down to is using some kind of “webpack stats” plugin to spit out a JSON file listing all of your assets, then have you language of choice read this file and generate the necessary HTML. Not too bad.
Okay, but what about the HTML? Can we cache that too? That gets even harder. Usually the HTML contains data — information that changes frequently. Even if we tried to cache that, how would we cache-bust? Unlike CSS and JS, we can’t change the URL. What if we removed all of the data from the page, and just served templates? The page could run an AJAX request to pull in the data. Better yet, we can use a service worker so that our page loads, even when offline! We can check for new resources in the background and either load them when the user comes back, or encourage them to refresh at their convenience.
What about the data though? First we have to wait for the initial page to load before we can even send out our AJAX request? That’s not cool. Maybe HTTP/2 push can help here? Is it possible to cache the data? We can use IndexDB to store a local copy, and then sync any updates back to the server with Background Sync. But what about sensitive or shared data? We wouldn’t want to send all our data to the client. Even if we managed to send only the stuff they have access to, what would we do if their access is revoked? Too bad — all they have to do is turn-off wifi and they can keep using the app and accessing all the data thanks to PWA.
That’s about all the client-side caching we can do. What about the server though? Where does this data live? In a MySQL server? How long do those queries take? Can those be cached in memory? What if two users are requesting the exact same piece of information at the exact same time? Should we run the query twice? Can they be batched? What if one user is making many requests? Can we combine some of those HTTP requests? How do we know when the data becomes stale? Can we track mutations?
That’s it for caching. Now we have the perfect server setup, we’ve thought of everything, we have an infallible cache-busting strategy, our payloads are under 14KB, but our server just crashed. Oh-oh! I hope you have automatic fail-over. Serving static content from another server isn’t too hard, but data is kind of tricky. Keeping all our database servers in sync in real-time is hard enough, but database writes are even harder.
Oh, there’s also “critical path CSS”, “14KB packets”, optimizing JS for performance and parse time (size isn’t the only thing that’s important). We can put our JS into webworker threads to avoid blocking the main UI thread. Or we can rewrite all our JS in C and compile it to WebAssembly — but don’t do too much interop or you’ll negate the benefit (passing data back and forth isn’t free). What about when your web-app loses focus? Can we scale back some of the renders or network requests to play nice?
These are just some of the things off the top of my head that we as web-developers have to worry about when building a performant website today in 2018.