Grid Databases – The Future of Database Technology?

Back in the 1990s, a group of German engineers put together the world's first grid computing network, with over 100 PCs running the first version of the Linux operating system. It was a great success, and everybody called it the dawn of a new technology that would change the computing world forever.

What these engineers didn't understand was how database engines worked at the time and how they actually set the trends for hardware development. The database engines were calling for Massively Parallel Processing (MPP) systems that offered dozens of CPUs in a single server platform. Just when we thought grid computing had finished off the mainframe, we witnessed the birth of the open systems mainframe.
To get away from these monsters, software architects created a middleware layer that offloaded work from the MPP systems. The result was MPP systems at the center of the universe, with all these middleware servers dancing around them.

For years to come, companies filled their data centers with hundreds of middleware servers and dozens of MPP systems. Cooling and power supplies were running at their peak and data center managers didn't know how to support this hardware excess into the future.

This became the birthplace of server virtualization. Within a few years, server virtualization became the main focus for every company searching for infrastructure savings. Consolidating middleware servers was an easy task and a huge success story shared with pride by the project managers.

But one area was not so successful in consolidating hardware. MPP systems didn't go away quietly. It turned out that the database systems were too much to handle for the server virtualization frenzy. Yes, countless efforts have been undertaken to move the databases off these monsters, but the virtual world couldn't provide the performance needed to support the databases.

Remember our grid computing story? The vision of sharing an army of small computers to produce the same computing power as the massive MPP systems seemed lost forever, until the ugly truth about MPP systems became obvious. Running huge MPP systems is not only energy intensive; the associated maintenance costs are also burdensome. Buying a replacement MPP system was the only option, yet budgets have grown increasingly tight over the past several years. This all became a Catch-22: you needed to spend more to reduce your costs!

Database vendors to the rescue: grid database technology seems to be the way out from under the massive weight of these MPP systems. Take a couple of low-cost, powerful dual- or quad-core servers and spread the workload across them. Not only do you get instant high availability, but you also gain scalability beyond your MPP platform's physical limitations. There are two major methodologies for building grid databases: shared everything and shared nothing.

The shared everything category is dominated by Oracle and Sybase. Both systems can instantly fail over database processes should a participating server go down, which is what provides high availability. Both can dynamically scale their CPU power by adding more servers to the grid, and both can balance the workload among all participating servers. Oracle RAC and Sybase ASE Cluster Edition are the most sophisticated systems available today. If you want to squeeze the maximum out of your existing hardware, or if you are seeking to replace energy-wasting and maintenance-fee-eating MPP monsters, these two databases are the weapons of choice.
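To make the three properties above concrete (failover, scaling by adding servers, and workload balancing), here is a minimal Python sketch of a shared everything cluster. It is a toy model under stated assumptions, not how Oracle RAC or Sybase ASE Cluster Edition work internally: the `Node` and `Cluster` classes, the round-robin dispatch, and the in-memory dictionary standing in for shared storage are all hypothetical names and simplifications chosen for illustration.

```python
class Node:
    """A hypothetical database server that reads and writes shared storage."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage  # the same dict object is shared by every node
        self.alive = True

    def execute(self, key, value):
        self.storage[key] = value
        return self.name


class Cluster:
    """Toy shared everything grid: one storage pool, many interchangeable nodes."""

    def __init__(self, names):
        self.storage = {}  # stands in for the shared disk subsystem
        self.nodes = [Node(name, self.storage) for name in names]
        self._next = 0  # round-robin cursor

    def add_node(self, name):
        # Scaling out: a new node immediately sees all existing data.
        self.nodes.append(Node(name, self.storage))

    def submit(self, key, value):
        # Load balancing: try nodes in round-robin order.
        # Failover: skip any node that is marked as down.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node.alive:
                return node.execute(key, value)
        raise RuntimeError("no live nodes in the cluster")


cluster = Cluster(["node1", "node2"])
served = [cluster.submit(f"k{i}", i) for i in range(4)]

cluster.nodes[0].alive = False  # simulate a server failure
after_failure = [cluster.submit(f"f{i}", i) for i in range(2)]

cluster.add_node("node3")  # add capacity while the cluster is running
after_scaling = [cluster.submit(f"s{i}", i) for i in range(2)]
```

Because every node works against the same storage, the surviving node can pick up requests without any data being copied or moved. That is the core appeal of the shared everything design, at the cost of needing coordination between nodes that this sketch leaves out entirely.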

The second methodology is shared nothing. Microsoft SQL Server 2008 Federation Data Store represents the leader in this category. Unlike the shared everything technology, the shared nothing approach has a clear distinction between local and global data. The data federation approach allows combining data stored locally on multiple individual databases. It acts as an aggregator of multiple databases. This is not as sophisticated as the shared everything approach, but it gets the job done as well.
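The aggregator idea can be sketched in a few lines of Python. This is only an illustration of the shared nothing pattern, assuming three independent in-memory SQLite databases as stand-ins for the individual servers; it does not use or imitate any SQL Server feature, and the `orders` table and the function names are hypothetical.

```python
import sqlite3


def make_shard(rows):
    """Create an independent in-memory database holding only its local rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return conn


def federated_totals(shards):
    """Run the same aggregate on every shard, then merge the partial results."""
    query = "SELECT region, SUM(amount) FROM orders GROUP BY region"
    totals = {}
    for conn in shards:
        for region, subtotal in conn.execute(query):
            totals[region] = totals.get(region, 0) + subtotal
    return totals


# Each shard owns its local data; nothing is shared between them.
shards = [
    make_shard([("east", 10), ("west", 5)]),
    make_shard([("east", 7)]),
    make_shard([("west", 3), ("north", 4)]),
]
result = federated_totals(shards)
```

Each database answers only for its local data, and the global view exists solely in the aggregation step. That also shows the trade-off: the aggregator has to know how to combine partial results, which is simple for sums but harder for joins across servers.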

There's a third contender: Sybase IQ Multiplex. This system uses a hybrid approach, with shared data but no shared cache. This is unique, and no other database vendor has anything like it. Sybase IQ is Sybase's data warehouse engine, a column-vector database that set new performance benchmark records by having just one distinct writer node in the cluster and a nearly unlimited number of reader nodes. But there's a caution: never try to run an OLTP application on this system. It is built for data warehousing and massive analytical reporting, a perfect match for data-hungry BI tools.
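The hybrid layout described above, one writer, many readers, shared data, and private caches, can be modeled roughly as follows. This is a sketch under assumptions and not a description of Sybase IQ internals: the class names are hypothetical, and the version counter used to invalidate each reader's private cache is an invented mechanism chosen only to keep the example small.

```python
class SharedStore:
    """Stands in for the shared disk storage visible to every node."""

    def __init__(self):
        self.columns = {}  # column name -> list of values (column-oriented)
        self.version = 0   # bumped on every committed write


class WriterNode:
    """The single node that is allowed to modify the shared data."""

    def __init__(self, store):
        self.store = store

    def append(self, column, values):
        self.store.columns.setdefault(column, []).extend(values)
        self.store.version += 1


class ReaderNode:
    """A read-only node with its own cache that no other node can see."""

    def __init__(self, store):
        self.store = store
        self._cache = {}  # column name -> (store version, cached total)

    def column_sum(self, column):
        cached = self._cache.get(column)
        if cached is not None and cached[0] == self.store.version:
            return cached[1]  # served from this node's private cache
        total = sum(self.store.columns.get(column, []))
        self._cache[column] = (self.store.version, total)
        return total


store = SharedStore()
writer = WriterNode(store)
readers = [ReaderNode(store) for _ in range(3)]

writer.append("sales", [10, 20, 30])
first_totals = [reader.column_sum("sales") for reader in readers]

writer.append("sales", [40])
refreshed_total = readers[0].column_sum("sales")
```

With only one writer there are no write conflicts to resolve, so reader nodes can be added freely for analytical queries. The same design explains the caution about OLTP: every write funnels through a single node.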

A few other database vendors also offer grid database technology of some flavor. This article is not meant to be a competitive analysis of all these systems, but a starting point to get your imagination going. The bottom line is that conserving energy is becoming more and more important, and software vendors are providing solutions that make the most of low-cost, low-energy hardware. The future belongs to grid databases.