SQL vs NoSQL
Because both approaches have plenty of proponents, Network World invited two influencers to share their views on SQL vs NoSQL. Ryan Betts, CTO of VoltDB, is a staunch proponent of SQL and therefore takes the structured side. Bob Wiederhold, CEO of Couchbase, on the other hand, holds that when evaluating SQL vs NoSQL in light of Big Data, NoSQL is clearly the better choice. We will try to sum up the points on both sides.
SQL
SQL is a well-established, long-running technology that is now deployed at the likes of Google, Facebook and Cloudera, all companies with a lot of clout in the Big Data sphere. Beyond pointing out that SQL is the tried and tested language, Betts provides four reasons why it is the more appropriate choice for dealing with Big Data; two of them are summarized here.
SQL opens up the insight potential of Big Data to a much wider community of people, who do not necessarily have a software development background. Because the user only states the query and leaves the decision of how to execute it most efficiently to the database engine, analysts, managers and other employees can run large-scale queries without understanding the underlying computational processes.
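The division of labour described above, where the user states what is wanted and the engine decides how to compute it, can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table, its columns and the sample rows are made up for this example, and SQLite merely stands in for whatever SQL engine is actually in use.

```python
import sqlite3

# Hypothetical query: total order amount per region.
# It states WHAT is wanted, not HOW to compute it.
QUERY = "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"


def build_db(rows):
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)", rows)
    return conn


def totals_by_region(conn):
    """Run the declarative query; the engine chooses the execution strategy."""
    return conn.execute(QUERY).fetchall()


def query_plan(conn):
    """Ask the engine how it decided to execute the query."""
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


if __name__ == "__main__":
    conn = build_db([("north", 120.0), ("south", 80.0), ("north", 30.0)])
    print(totals_by_region(conn))
    print(query_plan(conn))
    conn.close()
```

The analyst only ever writes the text in `QUERY`; the plan printed by `query_plan` is the engine's own choice and can change (for example after an index is added) without the query being rewritten.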
One of the main arguments against SQL has been its supposed lack of scalability, but Betts strongly disagrees, noting that companies such as Facebook would not be using SQL if it were not able to handle their petabytes of data.
NoSQL
NoSQL, an umbrella term for non-relational databases rather than a single language, typically uses a distributed file system* and is able to handle data that comes in non-standard shapes and sizes. Because the data is spread over many machines, many users can access it at the same time, which in turn means that the dataset can be immense without causing any issue.
Scalability without NoSQL might be possible, but it is unnecessarily costly because it requires ever more expensive hardware. NoSQL, on the other hand, can run across a large number of cheap nodes that, combined, offer the same power at a much lower cost. Adding capacity to the cluster is therefore easy and light on the wallet.
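How data might be spread over many cheap nodes can be sketched as follows. This is a deliberately simple hash-and-modulo scheme chosen for illustration, not the placement algorithm of any particular NoSQL product; the node names and keys are hypothetical.

```python
import hashlib


def node_for(key, nodes):
    """Pick the node that stores `key`, using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribute(keys, nodes):
    """Return a mapping of node name to the list of keys placed on it."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    keys = ["user:%d" % i for i in range(10)]
    # Growing the cluster only means passing a longer list of node names.
    print(distribute(keys, ["node-a", "node-b"]))
    print(distribute(keys, ["node-a", "node-b", "node-c"]))
```

One caveat of this naive scheme is that adding a node changes the placement of many existing keys; real systems tend to use more careful techniques, such as consistent hashing, to limit how much data has to move.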
NoSQL does not try to squeeze information into rows and columns that are, in turn, identified by further rows and columns, all of which need to be accessed and collated during each read/write operation. This makes no crucial difference with smaller datasets but, as they grow, these operations demand more and more computing power and time. Its distributed nature makes NoSQL much faster. It may duplicate data in the process, but since storage is comparatively cheap, the extra storage cost is small compared with the speed gained.
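The trade-off between collating rows on every read and storing duplicated, self-contained records can be sketched in plain Python. The data and function names below are made up for illustration and do not represent the API of any real database.

```python
import json

# Relational-style layout: one order is spread over two "tables".
customers = {1: {"name": "Ada"}}
orders = [{"order_id": 10, "customer_id": 1, "total": 25.0}]


def order_view_relational(order_id):
    """Collate the order from both tables on every read (a join)."""
    order = next(o for o in orders if o["order_id"] == order_id)
    customer = customers[order["customer_id"]]
    return {
        "order_id": order["order_id"],
        "customer": customer["name"],
        "total": order["total"],
    }


# Document-style layout: the customer name is duplicated into each order,
# and documents need not share the same shape (note the extra "gift_note").
documents = {
    "order:10": json.dumps({"order_id": 10, "customer": "Ada", "total": 25.0}),
    "order:11": json.dumps(
        {"order_id": 11, "customer": "Ada", "total": 5.0, "gift_note": "Enjoy"}
    ),
}


def order_view_document(order_id):
    """Fetch one self-contained document by key; nothing is collated."""
    return json.loads(documents["order:%d" % order_id])


if __name__ == "__main__":
    print(order_view_relational(10))
    print(order_view_document(10))
    print(order_view_document(11))
```

Both views return the same order, but the document version pays for its single lookup by storing the customer name once per order, which is the duplication mentioned above.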
Wiederhold’s clearest and most convincing argument is that most of the data collected today is unstructured. As a result, he argues, only a NoSQL database (like CouchDB) is able to deal with it.
Join in on the debate of SQL vs NoSQL here.
*See here for a definition of a distributed file system and here for how one works.
Image Credits: owenjell / Flickr