Database Sharding Blog
Monday, January 14, 2008
Avoiding On-Line Application Scalability Disasters
Government failures in information technology are famous. This is somewhat unfair: government IT projects probably fail no more often than those in commercial enterprises; their failures are simply higher profile. For example, when the UK's Public Records Office published the 1901 census on-line, the database could not handle the workload, which amounted to more queries in an hour than had been expected in a day. The problem took 10 months to fix because the system had not been built on a database architecture that allowed read capacity to be increased. (This would be simple with database sharding, for example, because the data could be split into smaller databases on separate servers, each serving a share of the reads.)
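As a rough illustration of the sharding idea, here is a minimal sketch of hash-based shard routing. It assumes each record has a string key and that there is a fixed list of database servers; the shard names and the `shard_for_key` and `group_by_shard` functions are hypothetical and are not taken from the census system or any particular product. Each key maps deterministically to one shard, so read traffic is spread across all the servers instead of hitting a single database.

```python
import hashlib

# Hypothetical shard names: one database per server.
SHARDS = ["census-db-0", "census-db-1", "census-db-2", "census-db-3"]


def shard_for_key(key: str, shards: list[str] = SHARDS) -> str:
    """Return the shard that holds `key`, using a stable hash modulo the shard count."""
    # Use MD5 rather than Python's built-in hash(), which is randomized per
    # process for strings and would route the same key differently on each run.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(keys: list[str], shards: list[str] = SHARDS) -> dict[str, list[str]]:
    """Split a batch of lookups so each shard receives only its own keys."""
    batches: dict[str, list[str]] = {s: [] for s in shards}
    for key in keys:
        batches[shard_for_key(key, shards)].append(key)
    return batches


if __name__ == "__main__":
    lookups = [f"household-{i}" for i in range(1000)]
    for shard, batch in group_by_shard(lookups).items():
        print(shard, len(batch))
```

With a plain modulo scheme like this, changing the number of shards remaps most keys, so adding servers means moving data. Schemes such as consistent hashing or a lookup directory that maps keys to shards reduce how much data has to move.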
Even companies that are famous for the scalability of their infrastructure, such as Amazon, have had scalability failures. For example, in 2003, a sudden traffic increase at Amazon.co.uk due to incorrect pricing resulted in the entire site being taken down.
The online betting site sportingindex.com was not designed to cope with a rise in customer numbers, and its service was offline for an entire day just before one of the biggest betting events in world sport: the 2002 World Cup game between England and Brazil. The result was not just lost revenue but also lost customers, who moved to other betting sites.
For a company that operates only online, scalability is nothing short of business critical. For some companies, a scalability failure is not just a temporary setback; it can make a strategic difference that changes the company's fate. A business executive at Friendster cites the lack of scalability as a key reason the company lost its early market lead:
"we had millions of Friendster members begging us to get the site working faster so they could log in and spend hours social networking with their friends. I remember coming in to the office for months reading thousands of customer service emails telling us that if we didn’t get our site working better soon, they’d be 'forced to join' a new social networking site that had just launched called MySpace…the rest is history."
So what can engineers do to avoid on-line scalability disasters?
The most common mistake is failing to design on-line applications for scalability and performance, usually because problems are not anticipated. It helps a great deal if the technical and business sides of a project share a clear understanding of the capacity requirements. For example, business managers must understand the technical risks of a big marketing launch for a new on-line service. In all cases, project managers must allow sufficient time for testing.
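One way to give both sides a shared view of capacity requirements is to turn them into simple arithmetic. The sketch below is a hypothetical back-of-the-envelope estimate, not a method from this post: it assumes a peak-to-average traffic ratio, a per-server query rate measured in load testing, and a headroom fraction left unused at peak. All function names and figures are illustrative.

```python
import math


def servers_needed(expected_daily_queries: int,
                   peak_to_average: float,
                   per_server_qps: float,
                   headroom: float = 0.3) -> int:
    """Estimate how many servers (or shards) are needed to absorb peak load.

    Assumptions: per_server_qps comes from load testing, and headroom is the
    fraction of each server's capacity deliberately left unused at peak.
    """
    average_qps = expected_daily_queries / 86_400  # seconds in a day
    peak_qps = average_qps * peak_to_average
    usable_qps = per_server_qps * (1 - headroom)
    return max(1, math.ceil(peak_qps / usable_qps))


if __name__ == "__main__":
    planned = 8_640_000  # hypothetical plan: 100 queries/second on average
    print(servers_needed(planned, 3.0, 200.0, headroom=0.25))       # 2
    # "More queries in an hour than expected in a day" is at least 24x the plan.
    print(servers_needed(planned * 24, 3.0, 200.0, headroom=0.25))  # 48
```

The point of the exercise is less the exact numbers than making the assumptions explicit: if marketing expects a launch to multiply traffic, the same formula shows immediately how many extra servers or shards the architecture must be able to add.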