This week, instead of deconstructing another data set, I want to talk about why my last data post was wrong. The goal of this post is to cover a few basic points to consider when either analyzing data or reading someone else’s analysis, using my own mistake as the example.
If you want to make the most out of this post, it’s probably a good idea to first read the last one.
With data, it’s usually easy for anyone to make you believe something simply by showing evidence. We see it all the time with presidential poll predictions, where big news channels repeatedly (thankfully) fail to make the right predictions. It’s usually the classic case of things that “seem right” but aren’t, just like my previous data blog post.
Having said that, let’s go straight into three places where people usually go wrong in data analysis:
1. Correlation and causation – So much has already been written about this point, yet it is still the easiest one to get wrong, probably because it’s very easy to infer causation casually. That is exactly what I did in my last analysis. The three countries I was looking at (India, Ethiopia and Indonesia) had child mortality rates that were falling during the period when the US loaned them money, so I casually assumed that US assistance had an effect. A fatal error.
Here’s a simple rule: if we are asking whether one thing influences another, we are asking about causation. Correlation can hint at causation, but more often than not it does not imply it. Establishing causation requires more rigorous statistical techniques.
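To make the rule concrete, here is a minimal Python sketch using simulated numbers, not the real figures for India, Ethiopia or Indonesia. It assumes that both aid and child mortality simply follow a shared time trend, and by construction aid has no effect on mortality at all. The raw correlation still comes out strongly negative (more aid, lower mortality), which is exactly the pattern that looks like “aid worked”; once the shared trend is removed from both series, the correlation should shrink towards zero.

```python
import numpy as np

# Simulated, hypothetical data (NOT real figures for any country).
rng = np.random.default_rng(seed=0)
t = np.arange(51)  # years since the start of the period

# Both series are driven by the same time trend. By construction, aid has
# NO effect on mortality: the mortality equation does not contain aid at all.
aid = 10 + 0.5 * t + rng.normal(0, 2, size=t.size)
mortality = 200 - 3 * t + rng.normal(0, 5, size=t.size)


def detrend(y, x):
    """Return the residuals of y after removing a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)


raw_corr = np.corrcoef(aid, mortality)[0, 1]
detrended_corr = np.corrcoef(detrend(aid, t), detrend(mortality, t))[0, 1]

print(f"correlation of raw series:    {raw_corr:+.2f}")
print(f"correlation after detrending: {detrended_corr:+.2f}")
```

The common trend here stands in for anything that moves with time (income growth, vaccines, sanitation); it is an illustrative assumption, not a claim about what actually drove mortality in those countries.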
The actual solution: in my case, the right approach would be to look at all the aid the three countries received, run a suitable form of regression, and then prove or disprove the hypothesis statistically. There is no shortcut to arriving at these conclusions. That may be another reason why the media gets it wrong so often: proving causation requires a certain level of statistical understanding, and the people drawing these conclusions may not have that technical competence.
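As a rough illustration of what “a suitable form of regression” might look like, here is a sketch of ordinary least squares written with NumPy only. It uses the same kind of simulated data as a hypothetical stand-in (a shared time trend, and a `TRUE_AID_EFFECT` parameter set to zero), and compares a naive regression of mortality on aid with one that also controls for time. The `ols` helper and the choice of time as the only control are assumptions for illustration; a real analysis would need every aid source and many more control variables.

```python
import numpy as np


def ols(y, X):
    """Ordinary least squares with an intercept.

    Returns coefficients, standard errors and t-statistics; index 0 is the
    intercept, the remaining entries follow the columns of X.
    """
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]   # residual degrees of freedom
    sigma2 = resid @ resid / dof    # estimated error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, beta / se


# Hypothetical simulated data: a common time trend, plus an aid effect we
# control directly (set to zero, so aid truly does nothing here).
TRUE_AID_EFFECT = 0.0
rng = np.random.default_rng(seed=1)
t = np.arange(51, dtype=float)
aid = 10 + 0.5 * t + rng.normal(0, 2, size=t.size)
mortality = 200 - 3 * t + TRUE_AID_EFFECT * aid + rng.normal(0, 5, size=t.size)

# Naive model: mortality ~ aid
naive_beta, naive_se, naive_t = ols(mortality, aid)
# Model that also controls for the time trend: mortality ~ aid + t
ctrl_beta, ctrl_se, ctrl_t = ols(mortality, np.column_stack([aid, t]))

print(f"naive:      aid coef {naive_beta[1]:+.2f}, t = {naive_t[1]:+.1f}")
print(f"controlled: aid coef {ctrl_beta[1]:+.2f}, t = {ctrl_t[1]:+.1f}")
```

In the naive model the aid coefficient looks large and highly significant; once time is controlled for, its t-statistic should drop to noise level, because the simulated aid has no true effect. The sketch only shows the mechanics; choosing the right controls is the hard part.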
2. The goal of proving a hypothesis – This second point has more to do with the impatience of human intuition. Many times during an analysis we want to prove a hypothesis, so we keep searching for evidence that supports it instead of being willing to discard the original hypothesis altogether. Things sometimes seem logically true but turn out to be completely inaccurate. These kinds of mistakes happen when our goal becomes proving our intuition right rather than finding the truth.
In my case, I was looking for evidence that all the money the US had been lending had worked. It may not have, and I came to a hurried conclusion.
3. Interdependence – It seems very logical to say that aid affects child mortality, but what if it also works the other way around?
For instance, there might be a cut-off point in the data: once child mortality drops to a certain threshold, aid is cut off. That might have been the reason the US cut off its aid as well.
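The threshold scenario can also be simulated. In this hypothetical sketch, mortality falls at the same rate regardless of aid, and an assumed donor rule (the `THRESHOLD` value of 100 and the aid amount of 20 are illustrative, not real figures) keeps lending only until mortality is first observed below the threshold. Causation runs purely from mortality to aid, yet the two series are still strongly correlated.

```python
import numpy as np

THRESHOLD = 100.0  # hypothetical cut-off, e.g. deaths per 1,000 live births
rng = np.random.default_rng(seed=2)
t = np.arange(51)

# Mortality falls on its own; aid does not appear in this equation.
mortality = 200 - 3 * t + rng.normal(0, 5, size=t.size)

# Hypothetical donor rule: keep lending through the first year mortality is
# observed below the threshold, then stop for good from the following year.
first_below = np.flatnonzero(mortality < THRESHOLD)[0]
aid = np.where(t <= first_below, 20.0, 0.0)

corr = np.corrcoef(aid, mortality)[0, 1]
slope_during_aid = np.polyfit(t[aid > 0], mortality[aid > 0], deg=1)[0]
slope_after_aid = np.polyfit(t[aid == 0], mortality[aid == 0], deg=1)[0]

print(f"aid stops after year {first_below}")
print(f"correlation(aid, mortality): {corr:+.2f}")
print(f"mortality trend during aid:  {slope_during_aid:+.2f} per year")
print(f"mortality trend after aid:   {slope_after_aid:+.2f} per year")
```

Mortality keeps falling at roughly the same rate before and after the aid stops, which is one simple check that can hint the aid was not what drove the decline; with real data that comparison would still need the other controls a proper regression includes.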
That’s it. I hope this helps you, when reading an analysis, to understand why it may be wrong or why it may have jumped to conclusions too quickly.
I will be back next week, or by mid next week at the latest, with a new data set.
Side Note: A huge thank you to Shreemoy Mishra for his feedback!