Database Sharding Blog

Wednesday, August 13, 2008

Second Google Seattle Conference on Scalability

Presentations from the second Google Seattle Conference on Scalability have started to appear on YouTube.

For example, the technical lead on Google Maps for mobile dicussues the adapt-
ing the Google Maps to work in a low bandwidth, high latency environment with a wide variety of networks and devices:



The project was very international and involved collaboration with teams in London, New York, Seattle, Tokyo, Beijing, and Cupertino.

Labels: ,


Monday, April 14, 2008

Scalability and Innovation

It's a common misconception that Google’s key intellectual property is its seemingly unique ability to delivery accurate search results. In fact, the patent for the famous PageRank algorithm is owned by Stanford University. Google listed the scalability of its vast infrastructure as its key asset during its Initial Public Offering.

Tim O’Reilly has explained that "it's not accident that Google’s system administration, networking, and load balancing techniques are perhaps even more closely guarded secrets than their search algorithms."

The business value of this infrastructure is discussed in this month's Harvard Business Review (HBR) in a detailed article about Google innovation called “Reverse Engineering Google’s Innovation Machine”.

HBR explains:

Google has spent billions of dollars creating its Internet-based operating platform and developing proprietary technology that allows the company to rapidly develop and roll out new services of its own or its partners’ devising.

However, Google not only has infrastructure that allows it to add new on-line services quicker and easier than its competitors, it also has a management strategy that encourages staff to innovate by including it in job descriptions:

Much of what the company does is rooted in its legendary IT infrastructure, but technology and strategy at Google are inseparable and mutually permeable – making it hard to say whether technology is the DNS of its strategy or the other way around.

What would it mean for your organization if there were sufficient infrastructure resources to allow everyone to experiment with new ideas?

Labels: , , , , ,


Monday, November 5, 2007

Data Management for Internet-Scale Single-Sign-On

Sharon Perl of Google and Margo Seltzer of Harvard University and Oracle Corporation presented a paper one year ago today on Data Management for Internet-Scale Single-Sign-On at the 3rd USENIX Workshop on Real, Large Distributed Systems in Seattle. The paper explains in detail the Database Sharding strategy used for the single-sign-on service:

The SSO service maps usernames to user account data and services to service-specific data. These mappings are stored in the SSO database, which is partitioned into hundreds of pieces (called shards) for load balancing and data localization. Each shard is a replicated Berkeley DB database composed of between 5 and 15 replicas, depending on the shard's purpose. The SSO data in each replica is stored in a single Berkeley DB Btree database 2.

Smaller shards have five full replicas, any of which is capable of becoming a master. All updates must go to the master. Consistent reads must also go to the master. We sometimes allow ``stale reads'', which may be slightly out-of-date by an amount of time that we control, and which can be performed at non-master replicas. The larger replication groups typically have five replicas capable of becoming masters (``electable replicas'') plus additional read-only replicas. Read-only replicas receive updates from the master, but do not participate in elections or contribute to commit quorums for updates, so the number of read-only replicas and their distance from other replicas does not affect the latency or availability of operations. When the system is running well (the normal case) the state of read-only replicas will be fairly closely synchronized with the master. A shard can have a master as long as more than half its electable replicas are up and communicating.

We spread replicas across multiple, geographically distributed data centers for availability in the face of failures of machines, networks, or data centers. At the same time, we try to keep replicas within a shard fairly close to one another because the communication latency between replicas affects how long it takes to commit a write operation to a shard or to elect a new master. The set of shards is geographically dispersed for data locality. We try to assign new users to shards based on where their data is likely to be accessed. This becomes tricky when the user data is shared by a variety of services that also may be spread over geographically dispersed data centers. We could do more optimization of data placement than we currently do, however it has not turned out to be a high priority for system performance.

As illustrated in Figure 1, there are logically two different kinds of shards. The vast majority of shards are independent databases that map a set of userids to account data and service ids to user-independent service data. The remaining shards implement the ID-map, which maps usernames to userids and userids to shards.

The ID-map is used for login, e-mail delivery, and at other times when we need to find a user's account data given a username. The ID-map shards are chained together in a doubly-linked list to store an extensible map, for scalability. Each shard in the chain handles a sub-range of the key space. Adjacent shards store adjacent ranges. Client library code keeps hints for the names of the component shards of the ID-map and their corresponding key ranges, so that we do not have to traverse the list for each key access. If the keys get rebalanced among the shards (which they can using offline tools), clients of the storage system will notice the changes and adjust their cached locations.


The paper is available in PDF format here.

Labels: , , ,


Thursday, August 2, 2007

Google Podcast on Database Sharding

Google Developer Podcast number 6 discusses sharding at Google, database sharding, and Hibernate Shards.

Some topics covered in the postcast include:

How sharding and database sharding is a commonly used term at Google.

What and when to shard databases and data - for scalable performance.

How Hibernate Shards can be used unobtrusively with Hibernate

Database Sharding strategies - getting it right first time

How database sharding compares with horizontal partitioning at the database level using new features in MySQL and PostgreSQL.

The postcast is available here (MP3 file, 23.2 MB, 34 minutes).

Labels: ,