Challenging Stonebraker’s Assertions On Data Warehouses - Part 2

Check out Part 1 if you haven’t already read it, to better understand the context and my disclaimer. This is Part 2, covering assertions 6 through 10.

Assertion 6: Appliances should be "software only."

“In my 40 years of experience as a computer science professional in the DBMS field, I have yet to see a specialized hardware architecture—a so-called database machine—that wins.”

This is a black swan effect; just because someone hasn’t seen an event occur in his or her lifetime doesn’t mean it won’t happen. This statement could also be rewritten as “In my 40 years of experience, I have yet to see a social network that is used by 500 million people.” You get the point. I would be the first one to vote in favor of commodity hardware over specialized hardware, but there are very specific reasons why specialized hardware makes sense in some cases.

“In other words, one can buy general purpose CPU cycles from the major chip vendors or specialized CPU cycles from a database machine vendor.”

Specialized machines don’t necessarily mean specialized CPU cycles. I hope the term “CPU cycles” is used as a metaphor and not in its literal meaning.

“Since the volume of the general purpose vendors are 10,000 or 100,000 times the volume of the specialized vendors, their prices are an order of magnitude under those of the specialized vendor.”

This isn’t true. The vendors who make general-purpose hardware also make specialized hardware, and no, it’s not an order of magnitude more expensive.

“To be a price-performance winner, the specialized vendor must be at least a factor of 20-30 faster.”

It’s a wrong assumption that BI vendors use specialized hardware just for performance reasons. The “specialized” part of an appliance is, in many cases, simply a specialized configuration. The appliance vendors also leverage their relationship with the hardware vendors to fine-tune the configuration based on their requirements, negotiate a hefty discount, and execute a joint go-to-market strategy.

Enterprise software follows value-based pricing, not cost-based pricing. The price difference between a commodity system and a specialized appliance is not just the difference in the cost of the hardware it runs on.

“However, every decade several vendors try (and fail).”

I am not sure what success criteria this assertion uses to declare someone a winner or a failure. The acquisitions of Netezza, Greenplum, and Kickfire are recent examples of how well the appliance companies have performed. The incumbent appliance vendors are doing great, too.

“Put differently, I think database appliances are a packaging exercise”

The appliances are far more than a packaging exercise. Beyond making sure that the appliance software works on the selected hardware, commoditized or otherwise, they give customers a black-box lifecycle management approach. The upfront cost of an appliance is a small fraction of the overall money that customers end up spending during the entire lifecycle of an appliance and the related BI efforts. Customers do welcome an approach where they are responsible for managing one appliance instead of five different systems at ten different levels with fifteen different technology stack versions.

Assertion 7: Hybrid workloads are not optimized by "one-size fits all."

Yes, I agree, but that’s not the point. It’s difficult to optimize hybrid workloads for a row store or a column store, but it’s not as difficult if it’s a hybrid store.
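To make the row-versus-column trade-off concrete, here is a minimal sketch in Python with made-up table and column names; it only illustrates why point lookups favor a row layout while aggregates favor a column layout, not any particular vendor’s storage engine.

```python
# Minimal sketch of the row-store vs. column-store trade-off.
# The table and its columns are hypothetical, for illustration only.

rows = [
    {"order_id": 1, "customer": "acme", "amount": 120.0, "region": "east"},
    {"order_id": 2, "customer": "bolt", "amount": 75.5,  "region": "west"},
    {"order_id": 3, "customer": "acme", "amount": 210.0, "region": "east"},
]

# Row layout: each record is stored together -- good for OLTP point lookups.
def lookup_order(order_id):
    return next(r for r in rows if r["order_id"] == order_id)

# Column layout: each attribute is stored together -- good for OLAP scans,
# because an aggregate over "amount" never touches the other columns.
columns = {key: [r[key] for r in rows] for key in rows[0]}

def total_revenue():
    return sum(columns["amount"])

print(lookup_order(2))   # reads one contiguous record
print(total_revenue())   # reads one contiguous column: 405.5
```

A hybrid store keeps both representations (or something in between) so neither access pattern pays the full penalty of the other.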

“Put differently, two specialized systems can each be a factor of 50 faster than the single "one size fits all" system in solution 1.”

Once again, I agree, but it does not apply to all situations. As I discussed earlier, performance is not the only criterion that matters in the BI world. In fact, I would argue the opposite. Precisely because OLTP and OLAP systems are orthogonal, vendors compromised everything else to gain performance. Now that’s changing. Let’s take the example of an operational report. This is the kind of report that only has value if it is consumed in real time. For such reports, the users can’t wait until the data is extracted out of the OLTP system, cleaned up, and transferred into the OLAP system. Yes, it could be 50 times faster, but it is completely useless, since you missed the boat.

The hybrid systems, the ones that combine OLTP and OLAP, are fairly new, but they promise to solve a very specific problem, which is true real-time BI. While the hybrid systems evolve, the computational capabilities of OLTP and OLAP systems have started to change as well. I now see OLAP systems supporting write-backs with reasonable throughput and OLTP systems with good BI-style query performance, all of this achieved through modern hardware and clever use of architectural components.

Let’s not forget what optimization really means: desired functionality at reasonable performance. A real-time report that takes ten seconds to run could be far more valuable than a report that runs in under ten milliseconds but arrives three days later.

“A factor of 50 is nothing to sneeze at.”

Yes, point taken. :-)

Assertion 8: Essentially all data warehouse installations want high availability (HA).

No, they don’t. This is like saying all customers want a five-nines SLA in the cloud. I don’t underestimate the business criticality of a DW going down, but not all DWs are used 24x7 or are mission-critical. One size doesn’t fit all. And if your DW is not required to be highly available, you need to ask yourself whether it is fair for you to pay the architectural cost of HA you don’t want. Tiered SLAs are not new, and tiered HA is not a terrible idea.
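As a back-of-the-envelope illustration of why tiered HA is worth pricing separately, here is a small sketch that converts common availability tiers into allowed downtime per year; the tiers are standard arithmetic, and which tier a given DW actually needs is the business decision.

```python
# Allowed downtime per year for common availability tiers.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for availability in (0.99, 0.999, 0.9999, 0.99999):
    downtime_min = MINUTES_PER_YEAR * (1 - availability)
    print(f"{availability:.3%} uptime -> ~{downtime_min:,.1f} minutes of downtime/year")
```

Going from 99.9% to 99.999% is the difference between roughly 525 minutes and roughly 5 minutes of downtime a year, and that last step is where most of the architectural cost lives.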

Let’s talk about the DWs that do need to be highly available.

“Moreover, there is no reason to write a DBMS log if this is going to be the recovery tactic. As such, a source of run-time overhead can be avoided.”

I am a little confused by how this is worded. Which logs are we referring to, the source systems’ or the target systems’? The source systems are beyond the control of a BI vendor. There are newer approaches to designing an OLTP system without a log, but that’s not up for discussion in this assertion. If the assertion is referring to the logs of the target system, how does that become a run-time overhead? Traditional DW systems are read-only at runtime; they don’t write logs back to the system. If he is referring to the logs written while the data is being moved into the DW, that’s not really run-time, unless we are treating it as a hot transfer.

There is one more approach, NoSQL, where eventual consistency is achieved over a period of time and the concept of a “corrupted system” is going away. Incomplete data is expected behavior, and people should plan for it. That’s the norm, regardless of whether a system is HA or not. Recently Netflix moved some of its applications to the cloud, where it designed a background data fixer to deal with data inconsistencies.
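A background data fixer of this kind is essentially an anti-entropy job: periodically compare replicas and repair the ones that drifted. Below is a minimal, hypothetical sketch of that idea; the in-memory replicas and the last-writer-wins rule are assumptions for illustration, not Netflix’s actual design.

```python
# Hypothetical sketch of a background data fixer (anti-entropy repair).
# Two replicas of a key-value store may drift apart; a periodic sweep
# reconciles them using a last-writer-wins rule on a version number.

replica_a = {"user:1": ("alice", 10), "user:2": ("bob", 7)}    # (value, version)
replica_b = {"user:1": ("alice", 10), "user:2": ("bobby", 9)}  # drifted replica

def repair(a, b):
    for key in set(a) | set(b):
        va = a.get(key, (None, -1))
        vb = b.get(key, (None, -1))
        winner = va if va[1] >= vb[1] else vb  # keep the newer version
        a[key] = b[key] = winner

repair(replica_a, replica_b)
assert replica_a == replica_b  # eventually consistent after the sweep
```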

HA is not black and white, and there are many more approaches, beyond the logs, to achieve the desired outcome.

Assertion 9: DBMSs should support online reprovisioning.

“Hardly anybody wants to take the required amount of down time to dump and reload the DBMS. Likewise, it is a DBA hassle to do so. A much better solution is for the DBMS to support reprovisioning, without going offline. Few systems have this capability today, but vendors should be encouraged to move quickly to provide this feature.”

I agree, and I would add one thing. Even today, the vendors have trouble supporting offline reprovisioning to cater to increasing load. Online reprovisioning is not trivial, since in many cases it requires them to re-architect their systems. The vendors typically get away with this, since most customers don’t do capacity planning in real time. Unfortunately, traditional BI systems are not commodity systems where customers can plug in more blades when they want and take them out when they don’t.

This is the fundamental reason why the cloud makes a great BI platform to address such reprovisioning issues with elastic computing. Read my post “The Future Of BI In The Cloud” if you are inclined to understand how horizontal scale-out systems can help.
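One common technique that makes online reprovisioning tractable in scale-out systems is consistent hashing: when a node is added, only a small fraction of the data has to move, so the rest of the system keeps serving queries. The sketch below illustrates the idea; it is not any particular vendor’s implementation, and the node and key names are made up.

```python
import bisect
import hashlib

# Minimal consistent-hashing sketch: adding a node moves only the keys that
# fall into the new node's slice of the ring, not the whole dataset.

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self.ring, (_hash(key), "")) % len(self.ring)
        return self.ring[idx][1]

    def add_node(self, node: str):
        bisect.insort(self.ring, (_hash(node), node))

before = Ring(["node-1", "node-2", "node-3"])
after = Ring(["node-1", "node-2", "node-3"])
after.add_node("node-4")

keys = [f"row-{i}" for i in range(10_000)]
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved after adding one node")
```

With naive modulo partitioning, adding a node would reshuffle nearly everything and force the kind of dump-and-reload downtime the assertion complains about; ring-based placement is one way to avoid it.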

Assertion 10: Virtualization often has performance problems in a DBMS world.

This assertion, and the one before it, made me write the post “The Future Of BI In The Cloud”. I won’t repeat what I wrote there, but I will quickly highlight what is relevant.

“Until better and cheaper networking makes remote I/O as fast as local I/O at a reasonable cost, one should be very careful about virtualizing DBMS software.”

Virtualizing I/O is not a solution for a large DW with complex queries. However, as I wrote in that post, a good solution is not to make the remote I/O faster, but rather to tap into the innovation of software-only SSD block I/O that is local.
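To see why I/O locality dominates for large scans, here is a tiny back-of-the-envelope sketch; the throughput figures are illustrative, round-number assumptions, not measurements of any specific product or network.

```python
# Back-of-the-envelope: scanning 1 TB from local SSDs vs. over a network link.
# The throughput numbers are assumed for illustration, not benchmark results.

scan_bytes = 1 * 1024**4            # 1 TB table scan
local_ssd_bps = 2 * 1024**3         # assume ~2 GB/s aggregate local SSD reads
remote_link_bps = 10 * 1024**3 / 8  # assume a 10 Gbit/s link, ~1.25 GB/s

print(f"local scan : {scan_bytes / local_ssd_bps / 60:.1f} minutes")
print(f"remote scan: {scan_bytes / remote_link_bps / 60:.1f} minutes")
```

The gap grows with more local spindles or SSDs per node, while the shared network link becomes the bottleneck that every virtualized I/O path has to cross.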

“Of course, the benefits of a virtualized environment are not insignificant, and they may outweigh the performance hit. My only point is to note that virtualizing I/O is not cheap.”

This is what a disruption initially looks like. You start seeing good-enough value in an approach for certain types of solutions, while it still seems expensive for another set of solutions. Over a period of time, rapid innovation and economies of scale remove this price barrier. I think that’s where virtualization stands today. Organizations have started to use the cloud for IaaS and SaaS for a variety of solutions, including good-enough self-service BI and performance optimization solutions. I expect to see more and more innovation in this area, where traditional large DWs will be able to get enough value out of the cloud, even after paying the virtualization overhead.
