Holiday Slowdowns (& their Upsides!) at the AO3

This is the time of year when fans participate in fic exchanges and drink mulled cider and curse out the Archive of Our Own for being so @#$!%&# slow. Because after a day of fighting holiday crowds or waiting in line at department stores, nothing bums you out like getting “Error 502: Page did not respond in a timely fashion” on your favorite comfort fic.

Believe us: we know! We’re fans too, and we use the AO3, and after a long day fighting holiday crowds or coding a new feature, nothing bums us out like getting Error 502 on our favorite comfort fic, either.

We want you to know we’re doing something about it! We’re currently in the process of putting together a new systems architecture — including the imminent purchase of several very shiny new servers thanks to your support — that will boost performance substantially.

We could (and did) make lots of guesses going into our latest big deploy, which included the Rails 3 upgrade we needed to make before we could proceed with the systems revamp, and we did our best to put in performance improvements both leading up to and immediately after the deploy. But in the end there is no substitute for looking at actual results from real-world usage.

We’re now at 11,000 users and 120,000 works: we’ve grown much faster than any of us ever dreamed!

So what exactly is going to happen with this new design?

We are currently running on two servers: a single database server and a single Rails app server.

The new architecture will look something like this:

[Image: diagram of server topology]

Basically, we are moving to a design where we can easily plug in additional machines in the future as needed for performance improvements.

We are currently buying the new servers for this architecture, but we don’t think it’s a good idea to make a major system change like this right now. Implementing the new architecture isn’t trivial, and we plan to do it in a more controlled way, with a little more leeway for downtime, once things slow down in the new year. There’s a lot going on at the AO3 (not just Yuletide, but loads of other holiday fic exchanges), and we don’t really want the whole thing to go boom.

But Yuletide, our first and biggest test case, is also part of the long-term solution. Having thousands of users banging on the same parts of the archive all at once and stressing our system to its limits helps us identify where the biggest bottlenecks are, and that in turn tells us what our priorities need to be when we buy our hardware.

That means things are slow now. And we’re very sorry about that. It’s part of why the AO3 is still in beta. We won’t be taking the “beta” sign off until we have reasonable confidence that the system can handle potentially millions of users and stories, without falling down and going boom or having major slowdowns. But we are not there yet!

So we need to ask for your patience. Yuletide and other exchanges — and the resulting intense usage — are tremendously useful in the long run for the AO3 and its designers and sysadmins. It is all part of the beta process — so please understand it is all necessary for us to get the AO3 to where it needs to be!

Mirrored from an original post on the Archive of Our Own, where it is also available in Deutsch and Español.