I’m often asked how I decide which approach to adopt for a particular data analysis problem. In my experience, the answer is always another question: WHAT IS YOUR GOAL?
There are basically six goals, or types of question, to consider when faced with a statistical problem, namely:
- Descriptive – as the name implies, a description of the available data. It involves showing or summarizing data in a meaningful way. This is what you get from publications by Nigeria's National Bureau of Statistics, which basically report detailed descriptions of the data they collect. Most of my posts are descriptive; you can see examples here and here.
- Exploratory – here you try to find connections, relationships, trends, etc. It determines the direction for future analysis. A hospital visit is a useful analogy. Consider a child who is sick and is taken to the hospital. The doctor quickly examines the child and asks some questions about the child’s condition (exploratory), as this will determine the direction for further investigations and tests while the current symptoms are being managed. So with exploratory data analysis (EDA), generalizations are not made until the “test results are out”. Always remember that “correlation does not imply causation”. See my post on EDA with WhatsApp messages.
- Inferential – use a representative sample of data to say something about a bigger population, i.e., make generalizations about a population based on a sample by estimating a certain quantity with some degree of uncertainty. For example, by sampling filling stations in your area, I can say something intelligent, with some degree of uncertainty, about the difference between the quantity of petrol you pay for and the amount that actually ends up in your gallon.
- Predictive – use data on some objects to predict values for another object. For example, predict the price of a cup of ice cream using the size of the cup, whether it was served with or without toppings, and whether or not it’s a signature mix.
- Causal – determine what happens to one variable when you make changes to another variable. The aim is to determine whether a particular variable really affects another variable and to estimate the magnitude of that effect if any.
- Mechanistic – the study of deterministic systems. Mostly applicable in physical/engineering sciences. The goal is to understand exact changes in variables that lead to exact changes in other variables. Industrial engineers use deterministic models to predict the behavior of the processes they are designing.
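To make the inferential goal a bit more concrete, here is a minimal sketch of the petrol example. The measurements are made up for illustration, `mean_shortfall_interval` is a hypothetical helper, and the interval uses a normal approximation (with a sample this small, a t-based interval would be somewhat wider).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical data: litres actually dispensed when 10 litres were paid for,
# one measurement per sampled filling station (made-up numbers).
PAID_LITRES = 10.0
dispensed = [9.6, 9.8, 9.5, 10.0, 9.7, 9.4, 9.9, 9.6, 9.8, 9.7]


def mean_shortfall_interval(measurements, paid, confidence=0.95):
    """Return (estimate, lower, upper) for the mean shortfall in litres.

    Uses a normal approximation for the interval, which is an assumption
    of this sketch rather than the only reasonable choice.
    """
    shortfalls = [paid - m for m in measurements]
    estimate = mean(shortfalls)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_error = stdev(shortfalls) / sqrt(len(shortfalls))
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return estimate, estimate - z * std_error, estimate + z * std_error


estimate, lower, upper = mean_shortfall_interval(dispensed, PAID_LITRES)
print(f"Estimated mean shortfall: {estimate:.2f} L "
      f"(approx. 95% interval: {lower:.2f} to {upper:.2f} L)")
```

The point is the shape of the answer: an estimate for the whole population of filling stations, plus a range that expresses how uncertain that estimate is.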
Finally, I would like to add that it is always good practice to do descriptives and EDA before any other analysis. This step is crucial as it guides subsequent decisions.
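As a rough illustration of that first step, the sketch below summarizes two variables (descriptive) and then checks how strongly they move together (exploratory). The ice-cream data is hypothetical, and `describe` and `pearson` are illustrative helpers written for this post, not functions from any particular library.

```python
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical data: cup size (ml) and price for eight cups of ice cream.
sizes = [100, 150, 150, 200, 250, 250, 300, 350]
prices = [300, 400, 450, 500, 650, 600, 750, 800]


def describe(values):
    """Descriptive step: summarize a single variable."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Exploratory step: Pearson correlation between two variables."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


print("sizes: ", describe(sizes))
print("prices:", describe(prices))
print(f"correlation(size, price) = {pearson(sizes, prices):.3f}")
```

A strong correlation here only suggests where to look next; on its own it does not justify a causal or predictive claim.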