Database Sharding Blog

Tuesday, February 24, 2009

Jurriaan Persyn on Database Sharding at Netlog

Jurriaan Persyn has provided an excellent summary of the presentation he gave at FOSDEM 2009 in Brussels, Belgium. FOSDEM is one of Europe's largest annual conferences on open source software, with about 5,000 attendees. He also provides the slides used during his presentation.



Saturday, January 17, 2009

Database Sharding Schemes


Thursday, December 11, 2008

Fourth Part of Database Sharding Unraveled


Tuesday, November 4, 2008

Sharding Architectures

David Soria Parra of Sun Microsystems gave an interesting presentation on Sharding Architectures at the International PHP Conference 2008:



Tuesday, September 23, 2008

Trend: Single Application Databases

db geek has made a very astute observation about single application databases:

One interesting trend that I've noticed in many of the organizations that I've been into is that increasingly databases are being built to serve single applications. The early visions of databases shared amongst multiple applications is no longer the first choice. To a certain extent this has always been the case for certain operational systems, but now the reach of single application databases has grown. You'll even find data replicated across multiple multi-terabyte data warehouses to support different business intelligence solutions.

db geek attributes this trend to factors such as commodity hardware and the benefits of reducing the complexities created by multi-application databases.



Wednesday, August 13, 2008

Second Google Seattle Conference on Scalability

Presentations from the second Google Seattle Conference on Scalability have started to appear on YouTube.

For example, the technical lead on Google Maps for mobile discusses adapting Google Maps to work in a low-bandwidth, high-latency environment with a wide variety of networks and devices:



The project was very international and involved collaboration with teams in London, New York, Seattle, Tokyo, Beijing, and Cupertino.



Thursday, July 24, 2008

The Cloud Does Not Magically Deliver Scalability

Peter Wayner, an InfoWorld Test Center contributing editor, has conducted an interesting hands-on comparison of some of the leaders in cloud computing.

One interesting point that Wayner makes almost immediately in his report is that cloud computing, at least in the form of immediate availability of unlimited server capacity, does not deliver scalable applications:

After a few hours, the fog of hype starts to lift and it becomes apparent that the clouds are pretty much shared servers just as the Greek gods are filled with the same flaws as earthbound humans. Yes, these services let you pull more CPU cycles from thin air whenever demand appears, but they can't solve the deepest problems that make it hard for applications to scale gracefully. Many of the real challenges lie at the architectural level, and simply pouring more server cycles on the fire won't solve fundamental mistakes in design.


There are many benefits of cloud computing, especially the fact that hardware is now a commodity that is available on demand. However, there has been a general perception that using cloud computing providers such as Google or Amazon somehow magically delivers the same scalability as the applications provided by these cloud computing providers. It should be remembered that providers such as Google and Amazon are also famous for their scalability because they shard their applications.



Friday, July 11, 2008

Scalability Disaster: Cellular Company’s Online Application Crashes

IBM ran a television campaign a few years ago that showed a development team monitoring a newly launched Web-based application. The team was initially happy with the growing number of users until they realized their application would not scale. It seems that UK cellular company O2’s staff missed the IBM campaign. 3G iPhone pre-orders have crashed O2’s Web site. The scalability problem could have been anticipated. O2 had taken 200,000 email addresses of people interested in purchasing the 3G iPhone, so they should have anticipated the potential load on the order processing application when they notified so many people that they were ready to take orders. O2’s official response was that: "Though O2 had invested several million pounds to increase the order capacity of the site, at times the site still couldn't process the sheer weight of demand." The result is frustrated customers and negative press coverage in many UK newspapers. Would you trust your business communications to a company that cannot scale a Web application? Can O2 be trusted to scale mail servers?

It would be interesting to find out what really happened...



Wednesday, July 9, 2008

Addressing Scalability Before It Is Too Late

Poornima, a database engineer at mint.com with experience in performance and scalability monitoring and testing, wrote a very interesting blog post yesterday about addressing Web application scalability before it is too late. Poornima starts by making a point that is often overlooked by engineers - the starting point for scalability is 'a business perspective' - because scalability failures are business failures and because scalability requirements are business requirements.

Poornima provides a simple explanation of why Database Sharding works so well for database-driven Web applications:

I’m sure we’ve all learned from our intro computer architecture class that CPU bound processes are the fastest and can be parallelized, whereas I/O processes are the bottleneck. In the case of a website, accessing the DB is the slowest I/O process. However, you can speed up access to data by sharding the database. Sharding breaks up a large database into smaller pieces that contains redundant information or a parent db can map data to separate dbs.
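The two approaches mentioned in the quote, splitting data into smaller pieces and having a parent database map data to separate databases, can be sketched roughly as follows. This is only an illustration under stated assumptions: the shard names, the `shard_by_hash` function, and the `ParentDirectory` class are hypothetical and do not come from Poornima's post or from any particular product.

```python
import hashlib

# Hypothetical shard names; a real deployment would hold connection details.
SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]


def shard_by_hash(user_id):
    """Pick a shard from a stable hash of the key, so no lookup table is needed."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


class ParentDirectory:
    """In-memory stand-in for a 'parent db' that records which shard owns each key."""

    def __init__(self):
        self._owner = {}

    def assign(self, user_id, shard):
        # Record (or change) the shard that holds this user's rows.
        self._owner[user_id] = shard

    def lookup(self, user_id):
        # Raises KeyError if the user has never been assigned to a shard.
        return self._owner[user_id]


if __name__ == "__main__":
    directory = ParentDirectory()
    directory.assign(42, "users_db_1")
    print(shard_by_hash(42), directory.lookup(42))
```

The hash approach needs no extra lookup but makes it harder to move a key later; the directory approach costs one extra lookup per request but lets individual keys be reassigned between shards.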

Poornima concludes that scalability problems are often good - because they indicate a growing business - but you should plan for scalability from the start.



Thursday, June 26, 2008

Wikipedia's Scalability Architecture

Domas Mituzas presented Wikipedia's scalability strategy at Velocity 2008 this week (presentation is available here). Mituzas is a Wikipedia performance engineer and database administrator and a member of the Board of Trustees of the Wikimedia Foundation. Mituzas is also a MySQL (now Sun) employee and was not shy about reminding people that the entire site is driven from a MySQL database.

There was a big emphasis in the presentation on achieving results with minimal resources because the Wikimedia Foundation is a non-profit organization with a comparatively small budget.

The Wikipedia scalability statistics are impressive - 80,000 SQL queries per second, 18 million page objects in the English language version of the site, 220 million revisions, and 1.5 terabytes of compressed data.

Wikipedia uses Database Sharding to set up master-slave relationships between databases, which are logically based on use cases and languages. Mituzas points out that the Wikipedia team only found out that their database architecture was an example of Database Sharding after they implemented it. Mituzas said MySQL instances range from 200 to 300 gigabytes.
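As a rough illustration of what grouping databases by language with master-slave relationships can look like in application code, here is a minimal sketch. It is not Wikipedia's actual code or topology: the `Cluster` class, the `route` function, and all host names are hypothetical, and round-robin is just one possible way to spread reads.

```python
import itertools


class Cluster:
    """One master plus its read replicas for a single shard (here, a language)."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master for reads if the shard has no replicas.
        self._replicas = itertools.cycle(replicas or [master])

    def for_write(self):
        # All writes go to the master so replicas can copy from one source.
        return self.master

    def for_read(self):
        # Reads rotate across replicas to spread the query load.
        return next(self._replicas)


# Hypothetical layout: one cluster per language edition.
CLUSTERS = {
    "en": Cluster("en-master", ["en-replica-1", "en-replica-2"]),
    "de": Cluster("de-master", ["de-replica-1"]),
}


def route(language, is_write):
    """Return the host that should serve a query for the given language."""
    cluster = CLUSTERS[language]
    return cluster.for_write() if is_write else cluster.for_read()
```

Because each language has its own cluster, read capacity for a busy edition can be raised by adding replicas to that cluster alone.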



Monday, April 14, 2008

Scalability and Innovation

It's a common misconception that Google’s key intellectual property is its seemingly unique ability to deliver accurate search results. In fact, the patent for the famous PageRank algorithm is owned by Stanford University. Google listed the scalability of its vast infrastructure as its key asset during its Initial Public Offering.

Tim O’Reilly has explained that "it's no accident that Google’s system administration, networking, and load balancing techniques are perhaps even more closely guarded secrets than their search algorithms."

The business value of this infrastructure is discussed in this month's Harvard Business Review (HBR) in a detailed article about Google innovation called “Reverse Engineering Google’s Innovation Machine”.

HBR explains:

Google has spent billions of dollars creating its Internet-based operating platform and developing proprietary technology that allows the company to rapidly develop and roll out new services of its own or its partners’ devising.

However, Google not only has infrastructure that allows it to add new on-line services more quickly and easily than its competitors, it also has a management strategy that encourages staff to innovate by including innovation in job descriptions:

Much of what the company does is rooted in its legendary IT infrastructure, but technology and strategy at Google are inseparable and mutually permeable – making it hard to say whether technology is the DNA of its strategy or the other way around.

What would it mean for your organization if there were sufficient infrastructure resources to allow everyone to experiment with new ideas?



Monday, January 14, 2008

Avoiding On-Line Application Scalability Disasters

Web-based applications are different from normal enterprise applications – there can be unexpected usage spikes. There are many examples of on-line service scalability failures:

Government failures in information technology are famous. This is somewhat unfair because government failures are probably no more common than those in commercial enterprises; they are just higher profile. For example, when the UK's Public Record Office published the 1901 census on-line, the database was not able to handle the workload, which was more queries in an hour than were expected in a day. The resulting problem took 10 months to fix because they had not implemented a database architecture that allowed them to increase the capacity for read volumes (this would be simple with database sharding, for example, because the data could simply be sharded into smaller databases on separate servers).

Even companies that are famous for the scalability of their infrastructure, such as Amazon, have had scalability failures. For example, in 2003, a sudden traffic increase at Amazon.co.uk due to incorrect pricing resulted in the entire site being taken down.

The online betting site sportingindex.com was not designed to cope with an increase in customer numbers and service was offline for an entire day just before one of the biggest global betting sports events – the 2002 England versus Brazil World Cup game. The result was not just loss of revenue, but also loss of customers to other betting sites.

For a company that only operates online, the inability to scale is nothing short of business critical. For some companies, it is not just a temporary setback: scalability can make a strategic difference that changes the fate of a company. A business executive at Friendster blames the lack of scalability as a key reason for losing its early market lead:

"we had millions of Friendster members begging us to get the site working faster so they could log in and spend hours social networking with their friends. I remember coming in to the office for months reading thousands of customer service emails telling us that if we didn’t get our site working better soon, they’d be 'forced to join' a new social networking site that had just launched called MySpace…the rest is history."

So what can engineers do to avoid on-line scalability disasters?

The most common mistake is not designing on-line applications for scalability and performance – because problems are rarely anticipated. It helps a lot if there is a clear understanding between the technical and business sides of a project of the capacity requirements. For example, the business managers must understand the technical risks of a big marketing launch of a new on-line service. In all cases, project managers must allow sufficient time for testing.



Saturday, October 13, 2007

Google TechTalk on Scalability

A Google TechTalk by Marissa Mayer called "Scaling Google for Every User" is now available via YouTube:


The presentation was given last June in Seattle and has a strong focus on the end-user experience.



Wednesday, September 26, 2007

Google Scalability Presentation

Jeff Dean, an engineer and Google Fellow, presented "Building a Computing System for the World's Information" at the Seattle Conference on Scalability last June. After a general introduction to Google’s infrastructure, the presentation focuses on the famous elements of Google’s systems infrastructure: Google File System (GFS), MapReduce, and BigTable.
