Skip to main content

Commoditizing Data Science



My ongoing conversations with several people continue to reaffirm my belief that Data Science is still perceived to be a sacred discipline and data scientists are perceived to be highly skilled statisticians who walk around wearing white lab coats. The best data scientists are not the ones who know the most about data but they are the ones who are flexible enough to take on any domain with their curiosity to unearth insights. Apparently this is not well-understood. There are two parts to data science: domain and algorithms or in other words knowledge about the problem and knowledge about how to solve it.

One of the main aspects of Big Data that I get excited about is an opportunity to commoditize this data science—the how—by making it mainstream.

The rise of interest in Big Data platform—disruptive technology and desire to do something interesting about data—opens up opportunities to write some of these known algorithms that are easy to execute without any performance penalty. Run K Means if you want and if you don't like the result run Bayesian linear regression or something else. The access to algorithms should not be limited to the "scientists," instead any one who wants to look at their data to know the unknown should be able to execute those algorithms without any sophisticated training, experience, and skills. You don't have to be a statistician to find a standard deviation of a data set. Do you really have to be a statistician to run a classification algorithm?

Data science should not be a sacred discipline and data scientists shouldn't be voodoos.

There should not be any performance penalty or an upfront hesitation to decide what to do with data. People should be able to iterate as fast as possible to get to the result that they want without worrying about how to set up a "data experiment." Data scientists should be design thinkers.

So, what about traditional data scientists? What will they do?

I expect people that are "scientists" in a traditional sense would elevate themselves in their Maslow's hierarchy by focusing more on advanced aspects of data science and machine learning such as designing tools that would recommend algorithms that might fit the data (we have already witnessed this trend for visualization). There's also significant potential to invent new algorithms based on existing machine learning algorithms that have been into existence for a while. What algorithms to execute when could still be a science to some extent but that's what the data scientists should focus on and not on sampling, preparing, and waiting for hours to analyze their data sets. We finally have Big Data for that.

Image courtesy: scikit-learn

Comments

Popular posts from this blog

Emergent Cloud Computing Business Models

The last year I wrote quite a few posts on the business models around SaaS and cloud computing including SaaS 2.0 , disruptive early stage cloud computing start-ups , and branding on the cloud . This year people have started asking me – well, we have seen PaaS, IaaS, and SaaS but what do you think are some of the emergent cloud computing business models that are likely to go mainstream in coming years. I spent some time thinking about it and here they are: Computing arbitrage: I have seen quite a few impressive business models around broadband bandwidth arbitrage where companies such as broadband.com buys bandwidth at Costco-style wholesale rate and resells it to the companies to meet their specific needs. PeekFon solved the problem of expensive roaming for the consumers in Eurpoe by buying data bandwidth in bulk and slice-it-and-dice-it to sell it to the customers. They could negotiate with the operators to buy data bandwidth in bulk because they made a conscious decision not to st...

A Data Scientist's View On Skills, Tools, And Attitude

I recently came across this interview (thanks Dharini for the link!) with Nick Chamandy, a statistician a.k.a a data scientist at Google. I would encourage you to read it; it does have some great points. I found the following snippets interesting: Recruiting data scientists: When posting job opportunities, we are cognizant that people from different academic fields tend to use different language, and we don’t want to miss out on a great candidate because he or she comes from a non-statistics background and doesn’t search for the right keyword. On my team alone, we have had successful “statisticians” with degrees in statistics, electrical engineering, econometrics, mathematics, computer science, and even physics. All are passionate about data and about tackling challenging inference problems. I share the same view. The best scientists I have met are not statisticians by academic training. They are domain experts and design thinkers and they all share one common trait: they love data!...

Reveiw: Celluon Epic Laser Keyboard

The Celluon Epic is a Bluetooth laser keyboard. The compact device projects a QWERTY keyboard onto most flat surfaces. (Glass tabletops being the exception) You can connect the Epic to vertically any device that supports Bluetooth keyboards including devices running iOS , Android , Windows Phone, and Blackberry 10. On the back of the device there is a charging port and pairing button. Once you have the Epic paired with your device it acts the same as any other keyboard. For any keyboard the most important consideration is the typing experience that it provides. The virtual keyboard brightness is adjustable and is easy to see in most lighting conditions. Unfortunately the brightness does not automatically adjust based on ambient light. With each keystroke a beeping sound is played which can be turned down. The typing experience on the Epic is mediocre at best. Inadvertently activating the wrong key can make typing frustrating and tiring. Even if you are a touch typist you'll still ...