
Chasing Qualitative Signal In Quantitative Big Data Noise


Joey Votto, who plays for the Cincinnati Reds, is one of the best hitters in MLB. Lately he has received a lot of criticism for not swinging at strikes when there are runners on base. FiveThirtyEight decided to test this criticism against the data. They found it to be true: his swings at pitches in the strike zone, especially fastballs, have declined significantly. But they all agree that Votto is still a great player. This is how I see many Big Data stories go: you can explain the "what" but you can't explain the "why." In this story, no one (that I know of) actually went and asked Votto, "Hey, why are you not swinging at all those fastballs in the strike zone?"

This is not just about sports. I see it every day in my work in enterprise software, where I help customers with Big Data scenarios such as optimizing promotion forecasts in retail, predicting customer churn in telco, or managing risk exposure in banks.

What I find is that adding more data creates a lot more noise in these quantitative analyses rather than getting you closer to a signal. On top of this noise, people expect there to be a perfect model to optimize and predict. Quantitative analysis alone doesn't find the needle in the haystack, but it does help identify which part of the haystack the needle could be hiding in.

"In many walks of life, expressions of uncertainty are mistaken for admissions of weakness." - Nate Silver

I subscribe to and strongly advocate Nate Silver's philosophy of treating "predictions" as a series of scenarios with probabilities attached, as opposed to a deterministic model. If you are looking for a precise binary prediction, you're most likely not going to get one. Fixating on a model and perfecting it leads you to over-fit it to past data. In other words, you spend too much time on signal or knowledge that already exists instead of using it as a starting point (the Bayesian approach) and staying open to running as many experiments as you can to refine your models as you go. The context that turns your (quantitative) information into knowledge (signal) is your qualitative aptitude and attitude toward that analysis. If you are willing to ask a lot of "why"s once your model tells you "what," you are more likely to get closer to the signal you're chasing.
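To make the "scenarios with probabilities" idea concrete, here is a minimal sketch of a Bayesian update for a customer churn rate. Everything in it is an assumption for illustration: the Beta prior, the experiment counts, the 10% and 20% cutoffs, and the helper names `update_beta` and `scenario_probabilities` are all hypothetical, not taken from any real customer or from Nate Silver's work.

```python
import random

def update_beta(alpha, beta, events, non_events):
    """Bayesian update of a Beta(alpha, beta) belief about a rate."""
    return alpha + events, beta + non_events

def scenario_probabilities(alpha, beta, cutoffs, draws=20000, seed=0):
    """Estimate the probability that the rate lands in each band between cutoffs."""
    rng = random.Random(seed)
    counts = [0] * (len(cutoffs) + 1)
    for _ in range(draws):
        rate = rng.betavariate(alpha, beta)
        # Band index = number of cutoffs the sampled rate reaches or exceeds.
        band = sum(rate >= c for c in cutoffs)
        counts[band] += 1
    return [c / draws for c in counts]

# Hypothetical prior: churn of roughly 10%, held loosely (Beta(2, 18)).
prior = (2.0, 18.0)
# Hypothetical experiment: 30 of 200 customers churned.
posterior = update_beta(*prior, events=30, non_events=170)

for label, belief in (("prior", prior), ("posterior", posterior)):
    low, mid, high = scenario_probabilities(*belief, cutoffs=[0.10, 0.20])
    print(f"{label}: P(<10%)={low:.2f}  P(10-20%)={mid:.2f}  P(>20%)={high:.2f}")
```

The output is a set of scenarios with probabilities rather than a single number, and each new experiment shifts those probabilities instead of replacing the model. The "why" behind the shift still has to come from asking people.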

Not all quantitative analyses need to be followed by a qualitative exercise to look for a signal. Validating an existing hypothesis is one of the biggest Big Data weapons developers use, since SaaS has made it relatively easy for them not only to instrument their applications to gather and analyze all kinds of usage data but also to trigger a change to influence users' behaviors. Facebook's recent psychology experiment to test whether emotions are contagious has attracted a lot of criticism. Setting aside the ethical and legal issues, including accusations that Facebook manipulated 689,003 users' emotions for science, this quantitative analysis validates an existing phenomenon in a different world. Priming is a well-understood and proven concept in psychology, but we didn't know of a published test proving the same in a large online social network. The objective here was not to chase a specific signal but to validate a hypothesis, a "what," for which the "why" has been well understood in a different domain.

About the photo: Laplace transforms are one of my favorite mathematical tools, since they recast complex problems (exponential equations) in a simpler form that is relatively easy to solve. They help reframe problems in your endeavor to get to the signal.
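As a small worked illustration of that reframing (my own example, not necessarily the equation in the photo), take a first-order decay problem with an assumed constant rate $a$ and starting value $y_0$. The Laplace transform turns the differential equation into plain algebra in $s$, and inverting the result gives back the exponential solution:

```latex
\begin{aligned}
y'(t) + a\,y(t) &= 0, \qquad y(0) = y_0 \\
\mathcal{L}\{y'\}(s) &= s\,Y(s) - y_0 \\
s\,Y(s) - y_0 + a\,Y(s) &= 0 \\
Y(s) &= \frac{y_0}{s + a} \\
y(t) &= y_0\, e^{-a t}
\end{aligned}
```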
