Skip to main content

Proxies Are As Useful As Real Data


Last year I ran a highly unscientific experiment. I would regularly put a DVD in an open mail bin in my office to mail it back to Netflix, every late Monday afternoon. I would also count the total number of Netflix DVDs put inside that bin by other people. Over a period of time I observed a continuous and consistent decline in the number of DVDs. I compared my results with the numbers released by Netflix. They matched. I'm not surprised. Even though this was an unscientific experiment on a very small sample size with a high degree of variables, it still gave me insights into the overall real data, that I otherwise had no access to.

Proxies are as useful as real data.

When Uber decides to launch a service in a new city or when they are assessing demand in an existing city they use crime data as surrogate to measure neighborhood activity. This measurement is a basic input in calculating the demand. There are many scenarios and applications where access to a real dataset is either prohibitively expensive or impossible. But, a proxy is almost always available and it is good enough in many cases to make certain decisions that eventually can be validated by real data. This approach, even though simple, is ignored by many product managers and designers. Big Data is not necessarily solving the problem of access to a certain data set that you may need, to design your product or make decisions, but it is certainly opening up an opportunity that didn't exist before: ability to analyze proxy data and use algorithms to correlate them with your own domain.

As I have argued before, the data external to an organization is probably far more valuable than the data that they internally have. Until now the organizations barely had capabilities to analyze a subset of their all internal data. They could not even think of doing anything interesting with the external data. This is rapidly going to change as more and more organizations dip their toes in Big Data. Don't discriminate any data sources, internal or external.

Probably the most popular proxy is the per-capita GDP to measure the standard of living. The Hemline Index is yet another example where it is believed that the women's skirts become shorter (higher hemline) during good economic times and longer during not-so-good economic times.

Source: xkcd
Proxy is just a beginning of how you could correlate several data sources. But, be careful. As wise statisticians will tell you, correlation doesn't imply causation. One of my personal favorite example is the correlation between the Yankees winning the worldseries and a democratic president in the oval office. Correlation doesn't guarantee causation, but it gives you insights into where to begin, what question to ask next, and which dataset might hold a key to that answer.This iterative approach wasn't simply feasible before. By the time people got an answer to their first question, it was too late to ask the second question. Ability to go after any dataset anytime you want opens up a lot more opportunities. At the same time when Big Data tools, computing, and access to several external public data sources become a commodity it would come down to human intelligence prioritizing the right questions to ask. As Peter Skomoroch, a principal data scientist at LinkedIn, puts it "'Algorithmic Intuition' is going to be as important a skill as 'Product Sense' in the next decade."

Comments

Popular posts from this blog

15 YEARS OLD GIRL IMPREGNATED AND MAN RESPONSIBLE FOR IT TOOK FOR AN ABORTION THAT FAILED

BBI FACILITATE ARREST OF 35 YEARS OLD FOR DEFILEMENT, IMPREGNATING 15 YEARS OLD GIRL AND ABORTING FIVE MONTHS PREGNANCY IN ANAMBRA STATE. Today, at 1:26pm, We received a complaint from a concerned citizen who informed us of a 15yrs old girl brought into a hospital for medical treatment. Our intelligence team led by Director General Gwamnishu Emefiena Harrison Kenneth Nwaobi Ezika Kene and others left Asaba and arrived Ogidi Anambra state for investigation. 35yrs Chris Azuoma took the victim to hospital where she was injected and given abortion pills. She bled heavily and had complications and so decided to take her to a specialist hospital to evacuate the foetus. Getting to the hospital, we met the management and identified ourselves as Human rights group and they granted us permission to interview the victim. She confirmed the story and the perpetrator confessed forcefully having unprotected sexual intercourse with the victim. 2015 Administration of Criminal Justice permit private per

Hacking Into The Indian Education System Reveals Score Tampering

Debarghya Das has a fascinating story on how he managed to bypass a silly web security layer to get access to the results of 150,000 ISCE (10th grade) and 65,000 ISC (12th grade) students in India. While lack of security and total ignorance to safeguard sensitive information is an interesting topic what is more fascinating about this episode is the analysis of the results that unearthed score tampering. The school boards changed the scores of the students to give them "grace" points to bump them up to the passing level. The boards also seem to have tampered some other scores but the motive for that tampering remains unclear (at least to me). I would encourage you to read the entire analysis and the comments , but a tl;dr version is: 32, 33 and 34 were visibly absent. This chain of 3 consecutive numbers is the longest chain of absent numbers. Coincidentally, 35 happens to be the pass mark. Here's a complete list of unattained marks - 36, 37, 39, 41, 43, 45, 47, 49, 51, 53,

Reveiw: Celluon Epic Laser Keyboard

The Celluon Epic is a Bluetooth laser keyboard. The compact device projects a QWERTY keyboard onto most flat surfaces. (Glass tabletops being the exception) You can connect the Epic to vertically any device that supports Bluetooth keyboards including devices running iOS , Android , Windows Phone, and Blackberry 10. On the back of the device there is a charging port and pairing button. Once you have the Epic paired with your device it acts the same as any other keyboard. For any keyboard the most important consideration is the typing experience that it provides. The virtual keyboard brightness is adjustable and is easy to see in most lighting conditions. Unfortunately the brightness does not automatically adjust based on ambient light. With each keystroke a beeping sound is played which can be turned down. The typing experience on the Epic is mediocre at best. Inadvertently activating the wrong key can make typing frustrating and tiring. Even if you are a touch typist you'll still