Database Sharding Blog

Monday, June 30, 2008

The Economics of Database Sharding

Karl Seguin of Fuel Industries makes some interesting points about the economics of Database Sharding.

Seguin unkindly speculated that the major database vendors have ignored Database Sharding for commercial reasons:

There are a lot of expensive ways to scale your database – all of which are highly touted by the big three database vendors because, well, they want to sell you all types of really expensive stuff. Despite what an “engagement consultant” might tell you though, most of the high-traffic websites on the web (google, digg, facebook) rely on far cheaper and better strategies: the core of which is called sharding.

What’s really astounding is that sharding is database agnostic – yet only the MySQL crowd seem to really be leveraging it. The sales staff at Microsoft, IBM and Oracle are doing a good job selling us expensive solutions.
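Seguin's point that sharding is database agnostic follows from where the logic lives: the application decides which database holds a given row, so the engine behind each shard does not matter to the routing step. A minimal sketch, assuming four hypothetical shard hosts and a numeric user ID as the shard key:

```python
import hashlib

# Hypothetical shard hosts. The router only returns an identifier, so the
# same logic works whether each host runs MySQL, Oracle, SQL Server or DB2.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]


def shard_for(user_id: int, shards: list[str] = SHARDS) -> str:
    """Map a shard key to a shard by hashing it and taking the result modulo the shard count."""
    # A stable hash (unlike Python's per-process salted str hash) keeps the
    # mapping identical across application servers and restarts.
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for uid in (1, 2, 42, 1001):
        print(uid, "->", shard_for(uid))
```

Plain modulo placement is the simplest choice, but adding a shard changes the target for most existing keys; a lookup directory or consistent hashing reduces how much data has to move.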



Thursday, June 26, 2008

Wikipedia's Scalability Architecture

Domas Mituzas presented Wikipedia's scalability strategy at Velocity 2008 this week (the presentation is available online). Mituzas is a Wikipedia performance engineer and database administrator, and a member of the Board of Trustees of the Wikimedia Foundation. He is also a MySQL (now Sun) employee and was not shy about reminding people that the entire site is driven from a MySQL database.

The presentation placed heavy emphasis on achieving results with minimal resources, because the Wikimedia Foundation is a non-profit organization with a comparatively small budget.

The Wikipedia scalability statistics are impressive - 80,000 SQL queries per second, 18 million page objects in the English language version of the site, 220 million revisions, and 1.5 terabytes of compressed data.

Wikipedia uses Database Sharding, partitioning its databases logically by use case and language and organizing them into master-slave replication groups. Mituzas points out that the Wikipedia team only found out that their database architecture was an example of Database Sharding after they had implemented it. Mituzas said MySQL instances range from 200 to 300 gigabytes.
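A rough sketch of that kind of layout: each language edition maps to a replication group, writes go to the group's master, and reads are spread across its slaves. The host names, the languages that get their own group, and the shared default group are all hypothetical; this illustrates the general pattern, not Wikipedia's actual configuration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ReplicaGroup:
    """One shard: a single master that accepts writes, plus read-only slaves."""
    master: str
    slaves: list = field(default_factory=list)

    def for_write(self) -> str:
        return self.master

    def for_read(self) -> str:
        # Spread reads across slaves; fall back to the master if there are none.
        return random.choice(self.slaves) if self.slaves else self.master


# Hypothetical layout: large language editions get their own group,
# smaller ones share a default group.
GROUPS = {
    "en": ReplicaGroup("en-master", ["en-slave1", "en-slave2", "en-slave3"]),
    "de": ReplicaGroup("de-master", ["de-slave1"]),
}
DEFAULT_GROUP = ReplicaGroup("misc-master", ["misc-slave1"])


def route(language: str, write: bool) -> str:
    """Return the host that should serve a query for the given language edition."""
    group = GROUPS.get(language, DEFAULT_GROUP)
    return group.for_write() if write else group.for_read()


if __name__ == "__main__":
    print(route("en", write=True))   # en-master
    print(route("en", write=False))  # one of the en slaves
    print(route("fi", write=True))   # misc-master
```

Splitting by language keeps every query for one edition inside a single group, so no cross-shard joins are needed, while the slaves absorb the read-heavy traffic typical of an encyclopedia.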



Monday, June 16, 2008

Third Installment of Database Sharding Unraveled

Bogdan Nicolau has published the third article in his "Database Sharding Unraveled" series. He makes an interesting point about planning for database scalability from the start:

Before really diving into high scalability principles, I want to take a moment to talk about why database sharding has an important role even in small startups or medium sized web-sites (5 - 30k unique visitors/day).

It is equally important and benefic for a smaller web business to prepare itself from the beginning to tackle large amounts of users cheap. If it’s not obvious enough, think about what happens to a web-page that gets some plain old Digg attention. The server quickly collapses and the user experience immediately turns from positive to mega negative.

As I’ve explained before, the whole purpose of sharding is to be able to use an unlimited number of cheap machines topped by an open-source database. As experience taught me, the web server will rarely die. Instead, the DB server will choke easily when having to deal with many simultaneous connections.

The database doesn’t even have to be very big.


Bogdan's focus is building scalable database-driven Web sites, but his comments apply to other applications as well.



Friday, June 13, 2008

A Database Sharding Plan for Twitter

There's a very interesting post by Hank Williams, "Why does everything suck?: A Detailed Five Step Twitter Scaling Plan", that goes into great detail about how Twitter could solve its database scalability problems using Database Sharding.
