
Proxies Are As Useful As Real Data


Last year I ran a highly unscientific experiment. Late every Monday afternoon I would drop a DVD into an open mail bin in my office to mail it back to Netflix, and I would count the Netflix DVDs other people had put in that bin. Over time I observed a steady, consistent decline in the number of DVDs. I compared my results with the numbers released by Netflix. They matched. I'm not surprised. Even though this was an unscientific experiment on a very small sample with many uncontrolled variables, it still gave me insight into the overall real data that I otherwise had no access to.

Proxies are as useful as real data.

When Uber decides to launch a service in a new city, or when it assesses demand in an existing city, it uses crime data as a surrogate for neighborhood activity. This measurement is a basic input in calculating demand. There are many scenarios and applications where access to a real dataset is either prohibitively expensive or impossible. But a proxy is almost always available, and in many cases it is good enough to make decisions that can eventually be validated with real data. This approach, simple as it is, is ignored by many product managers and designers. Big Data does not necessarily solve the problem of getting access to the particular dataset you need to design your product or make decisions, but it certainly opens up an opportunity that didn't exist before: the ability to analyze proxy data and use algorithms to correlate it with your own domain.
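Once some real data does become available, one simple way to check whether a proxy tracks it is to compute a correlation coefficient between the two series. The following is a minimal sketch, not a method taken from any company mentioned here: the `pearson` helper, the variable names, and all of the numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up numbers for illustration only.
proxy_counts = [14, 12, 11, 9, 8, 6]    # e.g. items counted in a bin each period
reported = [100, 92, 85, 74, 66, 55]    # e.g. an official figure for the same periods

r = pearson(proxy_counts, reported)
print(f"correlation between proxy and reported figures: {r:.3f}")
```

A value close to 1 or -1 suggests the proxy moves with the real figure over the observed periods; it says nothing about why, and with a sample this small it is only a hint about where to look next.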

As I have argued before, the data external to an organization is probably far more valuable than the data it has internally. Until now, organizations barely had the capability to analyze even a subset of their internal data. They could not even think of doing anything interesting with external data. This is going to change rapidly as more and more organizations dip their toes into Big Data. Don't discriminate against any data source, internal or external.

Probably the most popular proxy is per-capita GDP as a measure of the standard of living. The Hemline Index is another example: the belief that women's skirts become shorter (a higher hemline) during good economic times and longer during not-so-good economic times.

(Image source: xkcd)
A proxy is just the beginning of how you can correlate several data sources. But be careful. As wise statisticians will tell you, correlation doesn't imply causation. One of my personal favorite examples is the correlation between the Yankees winning the World Series and a Democratic president in the Oval Office. Correlation doesn't guarantee causation, but it gives you insight into where to begin, what question to ask next, and which dataset might hold the key to that answer.

This iterative approach simply wasn't feasible before. By the time people got an answer to their first question, it was too late to ask the second. The ability to go after any dataset anytime you want opens up a lot more opportunities. At the same time, once Big Data tools, computing, and access to external public data sources become a commodity, it will come down to human intelligence prioritizing the right questions to ask. As Peter Skomoroch, a principal data scientist at LinkedIn, puts it "'Algorithmic Intuition' is going to be as important a skill as 'Product Sense' in the next decade."
