We’ve made a slight change to how TrueAchievement scores are stored in the database
Preamble: Keeping TrueAchievements running as quickly as possible is a challenge
TrueAchievements saw a year-over-year increase in registrations in December when we launched My Year on Xbox, and of course with every registration comes more traffic as well as more gameplay data to process!
TrueAchievements is different from the vast majority of gaming sites, in that we offer customized versions of many of our pages that display a user’s progress in that game, achievement or walkthrough, plus of course a user’s profile page that collects not only their data, but also that of all their friends. This means that we cannot “cache” these pages in our CDN, and they must be generated quickly every time someone (or a web crawler or bot) visits a page.
To add to the complications, because of my initial “simple idea” that the TA score should reflect the rarity of each achievement in its game, we have to recalculate the scores of achievements, games, and players almost every day.
On top of that, we probably have the most complex Xbox leaderboard system on the planet, as you can find your scores on up to 17,000 different leaderboards, most of which need to be pre-created each day to make them quick enough to display on the site.
So, we have a huge amount of traffic (about a million page views per day from humans alone) viewing a lot of pages that we can’t cache, along with massive amounts of number crunching that is done against the data being viewed on those pages (we currently have 3,287,979,385 achievements tracked across 232,541,069 games in the database).
This resulted in significant site slowdown during December, and you may have noticed pages timing out, scans taking a long time, or just general sluggishness while browsing the site.
Of course, these challenges aren’t new – I’ve been rewriting or refactoring the code pretty much since the day it launched. A database designed for 5,000 users doesn’t work well when there are 1.2 million users. It’s also a lot of fun, as you can actually measure your changes and instantly see how much impact they have had. It gives me a sense of tremendous well-being, and then I feel happy for the rest of the day (Parklife!).
Performance changes we’ve made since December
In order to get things running smoothly again, I’ve made a lot of changes since mid-December:
The TrueAchievement score is now an integer in the database
This is probably the simplest change from a backend perspective, but it’s probably the one you’ll notice first. TA achievement scores were always stored to 4 decimal places, but then rounded before being displayed everywhere on the website. So an achievement may have appeared as 17 TA, but in the database it was stored as 17.3862. Now it is stored as 17.
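The effect of rounding at display time can be sketched in a few lines of Python. This is only an illustration: the two stored values are invented, and half-up rounding is an assumption, since the post does not say which rounding mode the site used.

```python
from decimal import Decimal, ROUND_HALF_UP


def displayed(score: Decimal) -> int:
    """Round a stored decimal score to the whole number shown on a page.

    ROUND_HALF_UP is an assumption made for this sketch.
    """
    return int(score.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Hypothetical stored values for two achievements (4 decimal places).
old_scores = [Decimal("16.4211"), Decimal("7.3862")]

# Old scheme: each achievement is rounded for display, but the game total
# is computed from the unrounded values and only then rounded.
shown_each = [displayed(s) for s in old_scores]
shown_total = displayed(sum(old_scores))

# New scheme: integers are stored, so the total is the sum of what is shown.
new_total = sum(shown_each)

print(shown_each, shown_total, new_total)
```

With these made-up values the old scheme shows achievements worth 16 and 7 next to a game total of 24, while integer storage gives a total of 23.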
Although you won’t notice this change at the achievement level, you may see your TA total for a game drop, as the fractional part of each achievement’s TA is removed from your total score for that game. We’re processing the games over the next week or so, and during that time you may see a difference between the max TA for a game and your personal TA in that game even if you’ve completed it. All of this should be sorted out by the weekend as we work through the games.
There are a large number of benefits that come from this change:
- It is no longer confusing for users. One of the first questions we get asked is “How did I unlock an achievement worth 16 TA and an achievement worth 7 TA but my score is 24 in the game?”. This was due to rounding on the back end, but having to constantly explain that is a pain
- Storage space has been reduced – integer storage is about a third of the decimal size we were previously using, and we typically have 3 sets of TA scores (due to DLC settings) for each game, player game, leaderboard, contest, and player, as well as each achievement
- Processing is faster when we add up all the scores each time someone is scanned
- We no longer have to update every player’s records for a game when an achievement’s score changes by less than 1 – this is probably the biggest performance gain from this tweak
- We no longer have to round values everywhere on the website
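The saving from skipping player updates can be sketched as follows. The function name and the values are hypothetical, and Python's built-in `round` stands in for whatever rounding the site applies; the point is only that a small change to an unrounded score usually leaves the stored integer untouched.

```python
def needs_player_update(old_stored, new_stored):
    """Player totals only need rewriting when the stored value changes."""
    return old_stored != new_stored


# Hypothetical recalculation: the unrounded score drifts slightly.
old_raw, new_raw = 17.3862, 17.4105

# Decimal storage: any drift changes the stored value, so every player
# who has this achievement needs their totals updated.
decimal_update = needs_player_update(old_raw, new_raw)

# Integer storage: both values are stored as 17, so nothing is rewritten.
# A drift that crosses a rounding boundary (say 17.4 to 17.6) would still
# trigger an update.
integer_update = needs_player_update(round(old_raw), round(new_raw))

print(decimal_update, integer_update)
```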
Recalculation of each game will take place over the coming days, during which time you may notice discrepancies between a game’s maximum score and your own score even if you have completed it. These scores will return to normal during this week.
Some site leaderboards now have minimum Gamerscore requirements
The time taken to create the site’s daily leaderboards has been increasing over time, reaching approximately 3 hours each day, and 5 hours when creating the weekly boards. The site slows down significantly while these boards are being created, so I’ve been looking for different ways to improve this process.
The first thing I did was to include on genre/platform leaderboards only those players with a Gamerscore of at least 20,000. Previously, we included about 350,000 players who were below this threshold. That was a huge amount of processing for players who, given their low Gamerscore, probably don’t care that they’re ranked 207,976th on the First Person Shooter leaderboard on Xbox One. If anything, they’d probably prefer not to know about it at all 🙂 And if they want to be listed on the leaderboards again, it’s very quick and easy to get over 20k GS these days.
Every registered player is still listed on all of the main site leaderboards.
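As a rough sketch of the Gamerscore threshold, assuming a leaderboard is simply players ranked by TA score (the `Player` fields, the `build_leaderboard` helper, and the sample data are all hypothetical):

```python
from dataclasses import dataclass

MIN_GAMERSCORE = 20_000  # threshold for genre/platform boards, from the post


@dataclass
class Player:
    gamertag: str
    gamerscore: int
    ta_score: int


def build_leaderboard(players, min_gamerscore=0):
    """Rank eligible players by TA score, highest first."""
    eligible = [p for p in players if p.gamerscore >= min_gamerscore]
    ranked = sorted(eligible, key=lambda p: p.ta_score, reverse=True)
    return [(rank, p.gamertag) for rank, p in enumerate(ranked, start=1)]


players = [
    Player("Alpha", 150_000, 310_000),
    Player("Bravo", 12_500, 20_100),
    Player("Charlie", 20_000, 41_000),
]

# Genre/platform boards drop players below the threshold before ranking.
genre_board = build_leaderboard(players, MIN_GAMERSCORE)

# Main site boards still include everyone.
main_board = build_leaderboard(players)

print(genre_board)
print(main_board)
```

Filtering before sorting is what saves the work: the players below the threshold never enter the ranking step at all.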
I’ve also rewritten a lot of the leaderboard build calls to make the summary tables smaller and faster, and made various tweaks to how leaderboard generation is distributed across server cores.
After all these tweaks, daily leaderboards are now generated in under 45 minutes, down from over 3 hours in December.
Player blogs are now cached
We have some very popular TA bloggers, and our bloggers often post huge lists of links to TA pages on their blogs. When someone viewed one of these blogs, we would analyze these links and check the viewer’s progress on any games or achievements they pointed to before displaying them. This added huge numbers of database calls (some blogs had around 1000 links!) just to show a single blog. So we decided to cache these blogs and no longer show viewer progress. This is a small loss of functionality, but it protects the site from what is effectively a DDoS attack whenever a popular blog post contains hundreds of links.
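The idea behind caching blogs can be sketched as a small time-based cache. Because the rendered page no longer depends on who is viewing it, one cached copy can be served to everyone. The ten-minute expiry, the `render_blog` stand-in, and the in-memory dictionary are assumptions for illustration only.

```python
import time

CACHE_TTL_SECONDS = 600  # hypothetical expiry time

_cache = {}        # blog_id -> (time rendered, html)
render_count = 0   # counts expensive renders, for demonstration


def render_blog(blog_id):
    """Stand-in for the expensive render that resolves every link."""
    global render_count
    render_count += 1
    return f"<article>blog {blog_id}</article>"


def get_blog_html(blog_id, now=None):
    """Return cached HTML if it is still fresh, otherwise re-render."""
    now = time.time() if now is None else now
    hit = _cache.get(blog_id)
    if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    html = render_blog(blog_id)
    _cache[blog_id] = (now, html)
    return html


first = get_blog_html(42, now=0)     # rendered
second = get_blog_html(42, now=30)   # served from the cache
third = get_blog_html(42, now=700)   # cache expired, rendered again

print(render_count)
```

Three views cost two renders here; under real traffic a popular post would be rendered once per expiry window, however many people read it.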
Some panels are now only available to view when logged in
Our traffic from robots and scrapers has increased dramatically over the past few years. According to our hosting company Cloudflare, in the last 24 hours we have received around 3.5 million requests for TrueAchievements from verified or suspected bots. We block some of the most obviously harmful ones, but some are actually useful for us to let through (search engines, Discord preview cards, Twitter, etc.). However, these helpful bots don’t need to see very complex panels (like your friends’ feed), so we’ve set some of these very heavy panels to only be viewable if you’re logged in to the site. This means that bots can still read pages but do not put as much pressure on the servers.
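In outline, gating a heavy panel on login status might look like this. The panel names and the `panels_for` helper are invented for the sketch; only the friends' feed is mentioned in the post.

```python
# Hypothetical panel names; the cheap ones are safe to show to anyone.
PUBLIC_PANELS = ["game_summary", "achievement_list"]
LOGGED_IN_ONLY_PANELS = ["friend_feed"]  # expensive to generate


def panels_for(is_logged_in):
    """Choose which panels to render for a request."""
    panels = list(PUBLIC_PANELS)
    if is_logged_in:
        panels.extend(LOGGED_IN_ONLY_PANELS)
    return panels


bot_view = panels_for(is_logged_in=False)
member_view = panels_for(is_logged_in=True)

print(bot_view)
print(member_view)
```

Bots are never logged in, so they still get a readable page without triggering the expensive queries.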
There may be more modifications in the future
The TGN development team dedicated the entire month of January to performance work to try to speed up the sites as much as possible. The vast majority of this work will not lead to any noticeable changes from the user’s perspective, other than a hoped-for increase in speed. If we make any additional tweaks to the functionality, you’ll be able to read about it first in the TA Discord server, so please join in if you haven’t already and want to stay up to date with our latest performance work.
Happy New Year!