dbShards is the industry’s first software product that allows database sharding to be applied to existing applications and databases with little or no modification to existing application code. Applications using dbShards connect to multiple databases (shards) where data is partitioned based on one or more sharding strategies.
Database Sharding Overview
The following diagram provides an overview of an application using dbShards. Each application server (AS) is running dbs/Client (dbShards provides database drivers for popular languages including Java, PHP, Python and Ruby). Each shard server (S1 through S4) is running a database server such as MySQL as well as dbShards replication agents and query agents (to support parallel query functionality). In this example, data is partitioned using customer ID as a shard key. A modulus algorithm is used to determine which shard each customer ID belongs to.
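The modulus routing described above can be sketched in a few lines of Python. This is only an illustration of the technique, not the dbShards driver itself: the shard names `S1` through `S4` come from the example, while the function name `shard_for_customer` and the zero-based mapping of remainders to shards are assumptions made for this sketch.

```python
# Hypothetical shard names mirroring S1 through S4 in the example above.
SHARDS = ["S1", "S2", "S3", "S4"]


def shard_for_customer(customer_id: int, shards=SHARDS) -> str:
    """Return the shard that owns customer_id using a plain modulus.

    Assumption: remainder 0 maps to the first shard, 1 to the second, etc.
    """
    if customer_id < 0:
        raise ValueError("customer_id must be non-negative")
    return shards[customer_id % len(shards)]


if __name__ == "__main__":
    # Consecutive customer IDs rotate through the shards in order.
    for cid in (1000, 1001, 1002, 1003, 1004):
        print(cid, "->", shard_for_customer(cid))
```

Because the remainder depends on the number of shards, changing the shard count changes where most keys land, which is one reason adding shards later requires planning.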
Why Do I Need Database Sharding?
Instead of storing application data in a single database on a single server with shared CPU, memory and disk, database sharding allows the database to be divided into a number of smaller “shards”, each of which can be hosted on an independent server with dedicated CPU, memory and disk, which greatly reduces resource contention. Because each shard is small, the database server can keep far more of its indexes and query caches in memory, resulting in significantly improved performance. Database performance tends to degrade sharply, rather than gradually, once a database grows beyond the limits of a single server, so splitting it into shards that each fit comfortably on their own server can result in better-than-linear performance gains.
The bottom line is that database sharding is the only shared-nothing solution to horizontal scaling of relational databases allowing better-than-linear scalability.
Why Do I Need a Database Sharding Product?
Many companies implement sharding by re-architecting their application to be shard-aware, so that each time a database operation is required the application determines which shard to connect to. This can easily result in a 3-6 month rewrite effort. Although sharding seems like a simple concept at first, there are a number of complexities that need to be considered, for example:
* Do I need to shard all of my tables or just the big tables?
* How do I ensure my data is evenly distributed between shards?
* How does sharding affect referential integrity constraints?
* How do I use auto increment values and ensure unique values across all shards?
* How do I perform joins between my sharded tables and non-sharded tables?
* How do I run aggregate queries that need data from multiple shards?
* What if I need to add more shards later on or change the sharding strategy?
* How do I perform the initial sharding of my existing data?
* What about joins between shards and transactions involving multiple shards?
* How do I ensure data is going to the correct shard?
* How do I implement HA in a sharded environment?
* How does sharding affect my backup/recovery procedures?
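To make one of these questions concrete, the sketch below shows one common way to keep auto-increment style values unique across shards: each shard hands out IDs from its own interleaved sequence. This is a general technique (MySQL exposes a similar idea through its `auto_increment_increment` and `auto_increment_offset` settings) and is not a description of how dbShards solves the problem; the function name `shard_id_sequence` is hypothetical.

```python
import itertools


def shard_id_sequence(shard_index: int, shard_count: int):
    """Return an endless iterator of IDs reserved for one shard.

    Assumption for this sketch: shards are numbered from 0, and shard i
    issues i + 1, i + 1 + shard_count, i + 1 + 2 * shard_count, ...
    so no two shards can ever produce the same value.
    """
    if not 0 <= shard_index < shard_count:
        raise ValueError("shard_index must be in range(shard_count)")
    return itertools.count(start=shard_index + 1, step=shard_count)


if __name__ == "__main__":
    for index in range(4):
        ids = list(itertools.islice(shard_id_sequence(index, 4), 3))
        print(f"shard {index}: {ids}")
```

The trade-off is that the step is fixed by the shard count, so adding shards later means choosing a new step or reserving spare offsets up front.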
Why Should I Use dbShards?
dbShards addresses the complexity of sharding in two ways. First, it provides tools to perform the initial sharding of data based on a shard strategy. Second, it provides runtime infrastructure that makes the shards appear as a single database to applications: a custom database driver, plus agent processes that provide parallel query capabilities.
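The idea behind a parallel query across shards can be illustrated with a small scatter-gather sketch. It is not the dbShards agent implementation: in-memory SQLite databases stand in for the shard servers, and the `orders` table and the helper names `make_shard`, `partial_totals` and `global_average` are assumptions made for this example. Each shard returns a partial count and sum, and the caller combines them, since averaging the per-shard averages would give the wrong answer when shards hold different numbers of rows.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_shard(rows):
    """Create an in-memory SQLite database standing in for one shard."""
    # check_same_thread=False lets a worker thread use this connection;
    # each connection is still used by only one thread at a time.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return conn


def partial_totals(conn):
    """Run the per-shard part of the query: row count and sum of amounts."""
    return conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders"
    ).fetchone()


def global_average(shards):
    """Query every shard in parallel and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_totals, shards))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return total / count if count else None


if __name__ == "__main__":
    shards = [
        make_shard([(4, 10.0), (8, 20.0)]),
        make_shard([(1, 30.0)]),
        make_shard([]),
        make_shard([(3, 40.0), (7, 50.0), (11, 60.0)]),
    ]
    print(global_average(shards))
```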
dbShards Enterprise is ideal for installation on both physical hardware in traditional data centers and cloud systems such as Amazon EC2.