Skip to main content

A Journey From SQL to NoSQL to NewSQL


Two years back I wrote that the primary challenge with NoSQL is that it's not SQL. SQL has played a huge rule in making relational databases popular for the last forty years or so. Whenever the developers wanted to design an(y) application they put an RDBMS underneath and used SQL from all possible layers. Over a period of time, the RDBMS grew in functions and features such as binary storage, faster access, clusters, sophisticated access control etc. and the applications reaped these benefits. The traditional RDBMS became a non-fit for cloud-scale applications that fundamentally required scale at whole different level. Traditional RDBMS could not support this scale and even if they could it became prohibitively expensive for the developers to use it. Traditional RDBMS also became too restrictive due to their strict upfront schema requirements that are not suitable for modern large scale consumer web and mobile applications. Due to these two primary reasons and a lot more other reasons we saw the rise of NoSQL. The cloud movement further fueled this growth and we started to see a variety of NoSQL offerings.

Each NoSQL store is unique in which how a programmer would access it. NoSQL did solve the scalability and flexibility problems of a traditional database, but introduced a set of new problems, primary ones being lack of ubiquitous access and consistency options, especially for OLTP workload, for schema-less data stores.

This has now led to the movement of NewSQL (a term initially coined by Mat Aslett in 2011) whose working definition is: "NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for OLTP workloads while still maintaining the ACID guarantees of a traditional single-node database system." NewSQL's focus appears to be on gaining performance and scalability for OLTP workload by supporting SQL as well as custom programming models and eliminating cumbersome error-prone management tasks such as manual sharding without breaking the bank. It's a good first step in the direction of a scalable distributed database that supports SQL. It doesn't say anything about mixed OLTP and OLAP workload which is one of the biggest challenges for the organizations who want to embrace Big Data.

From SQL to NoSQL to NewSQL, one thing that is common: SQL.

Let's not underestimate the power of a simple non-procedural language such as SQL. I believe the programmers should focus on what (non-procedural such as SQL) and not how. Exposing "how" invariably ends up making the system harder to learn and harder to use. Hadoop is a great example of this phenomenon. Even though Hadoop has seen widespread adoption it's still limited to silos in organizations. You won't find a large number of applications that are exclusively written for Hadoop. The developers first have to learn how to structure and organize data that makes sense for Hadoop and then write an extensive procedural logic to operate on that dataset. Hive is an effort to simplify a lot of these steps but it still hasn't gained desired populairty. The lesson here for the NewSQL vendors is: don't expose the internals to the applications developers. Let a few developers that are closer to the database deal with storing and configuring the data but provide easy ubiquitous access to the application developers. The enterprise software is all about SQL. Embracing, extending, and augmenting SQL is a smart thing to do. I expect all the vendors to converge somewhere. This is how RDBMS and SQL grew. The initial RDBMS were far from being perfect but SQL always worked and the RDBMS eventually got better.

Distributed databases is just one part of the bigger puzzle. Enterprise software is more about mixing OLAP and OLTP workload. This is the biggest challenge. SQL skills and tools are highly prevalent in this ecosystem and more importantly people have SQL mindset that is much harder to change. The challenge to vendors is to keep this abstraction intact and extend it without exposing the underlying architectural decisions to the end users.

The challenge that I threw out a couple of years back was:

"Design a data store that has ubiquitous interface for the application developers and is independent of consistency models, upfront data modeling (schema), and access algorithms. As a developer you start storing, accessing, and manipulating the information treating everything underneath as a service. As a data store provider you would gather upstream application and content metadata to configure, optimize, and localize your data store to provide ubiquitous experience to the developers. As an ecosystem partner you would plug-in your hot-swappable modules into the data stores that are designed to meet the specific data access and optimization needs of the applications."

We are not there, yet, but I do see  signs of convergence. As a Big Data enthusiast I love this energy. Curt Monash has started his year blogging about NewSQL. I have blogged about a couple of NewSQL vendors, NimbusDB (NuoDB) and GenieDB, in the past and I have also discussed the challenges with the OLAP workload in the cloud due to its I/O intensive nature. I am hoping that NewSQL will be inclusive of OLAP and keep SQL their first priority. The industry is finally on to something and some of these start-ups are set out to disrupt in a big way.

Photo Courtesy: Liz

Comments

Popular posts from this blog

Emergent Cloud Computing Business Models

The last year I wrote quite a few posts on the business models around SaaS and cloud computing including SaaS 2.0 , disruptive early stage cloud computing start-ups , and branding on the cloud . This year people have started asking me – well, we have seen PaaS, IaaS, and SaaS but what do you think are some of the emergent cloud computing business models that are likely to go mainstream in coming years. I spent some time thinking about it and here they are: Computing arbitrage: I have seen quite a few impressive business models around broadband bandwidth arbitrage where companies such as broadband.com buys bandwidth at Costco-style wholesale rate and resells it to the companies to meet their specific needs. PeekFon solved the problem of expensive roaming for the consumers in Eurpoe by buying data bandwidth in bulk and slice-it-and-dice-it to sell it to the customers. They could negotiate with the operators to buy data bandwidth in bulk because they made a conscious decision not to st...

Reveiw: Celluon Epic Laser Keyboard

The Celluon Epic is a Bluetooth laser keyboard. The compact device projects a QWERTY keyboard onto most flat surfaces. (Glass tabletops being the exception) You can connect the Epic to vertically any device that supports Bluetooth keyboards including devices running iOS , Android , Windows Phone, and Blackberry 10. On the back of the device there is a charging port and pairing button. Once you have the Epic paired with your device it acts the same as any other keyboard. For any keyboard the most important consideration is the typing experience that it provides. The virtual keyboard brightness is adjustable and is easy to see in most lighting conditions. Unfortunately the brightness does not automatically adjust based on ambient light. With each keystroke a beeping sound is played which can be turned down. The typing experience on the Epic is mediocre at best. Inadvertently activating the wrong key can make typing frustrating and tiring. Even if you are a touch typist you'll still ...

Rise Of Big Data On Cloud

Growing up as an engineer and as a programmer I was reminded every step along the way that resources—computing as well as memory—are scarce. The programs were designed on these constraints. Then the cloud revolution happened and we told people not to worry about scarce computing. We saw rise of MapReduce, Hadoop, and countless other NoSQL technology. Software was the new hardware. We owe it to all the software development, especially computing frameworks, that allowed developers to leverage the cloud—computational elasticity—without having to understand the complexity underneath it. What has changed in the last two to three years is a) the underlying file systems and computational frameworks have matured b) adoption of Big Data is driving the demand for scale out and responsive I/Os in the cloud. Three years back, I wrote a post, The Future Of The BI In Cloud  where I had highlighted two challenges of using cloud as a natural platform for Big Data. The first one was to create a lar...