In my previous blog, I wrote about Descriptive Analytics – “what happened in the past” – as the route to Business Intelligence. In a later blog, I covered the framework of analytics problem solving, where Data Summarisation, Data Visualisation, and simple Statistics are important for Business Intelligence. In this blog, let’s understand Business Intelligence and Descriptive Analytics further.
One of the questions executives face in this age is why they should rely on data. One of the appeals of a great leader has been the ability to cut through the haze and take decisions.
Why is data so important to a business? Haven’t many leaders delivered stellar results solely on gut-based decisions?
Harvard Business Review (HBR) states, “The choices facing managers and the data requiring analysis have multiplied even as the time for analyzing them has shrunk”.
It turns out that bad decisions can lead to billion-dollar failures. Increasing volumes and quality of data, a fast-changing business environment, and the need to take instant decisions are pushing leaders to rely more and more on data.
Whilst there might be a romantic allure to gut-based decisions, leaders are increasingly leaning on technology to help with the decision-making process. Successful leaders now combine their intuition with insights to make informed decisions.
“Data-Informed, Gut Decisions” is emerging as a key differentiator.
And, as AllBusiness.com puts it, “Business Intelligence bridges the Heart and the Head”.
3 key components of Business Intelligence
The key purpose of Business Intelligence is to dive deep into historical data and look for insights. Descriptive Analytics, which uses methods and models to understand that data and then visualises it for consumption, is the route to Business Intelligence.
There are 3 main components to Business Intelligence:
- Technology: BI platforms, Enterprise Data Warehouse (collecting the data)
- Methodology: Descriptive Analytics (understanding the data)
- Output: Reports, Charts, and Dashboards (visualising the data)
There are many technology platforms (1) available to choose from, and I will not focus on them in this blog. With the understanding that organisations have a variety of ways to collect the data, we will move our focus to “understanding the data”.
Understanding the data
In the context of business intelligence, understanding refers to knowing the qualitative and quantitative aspects of the data. This is a fundamental activity and an important one. A variety of methods are applied to the data to gain insights from it. The universe of data analysis comprises the aspects covered below.
In my previous blog, I introduced the “types” of data – Structured, Semi-Structured, and Unstructured data. The data can be further sub-categorised into:
|Cross-sectional data||Data collected on many variables, at the same time||The status of an aircraft at a specific time. Variables are “Speed, Altitude, Head-Winds, Tail-Winds, Outside Temperature etc”|
|Time-series data||Data collected on a single variable, at various time intervals||The speed of the aircraft between 10am and 11am, measured every 5 minutes|
|Panel data||Data collected on many variables, at various time intervals||The “speed and altitude” of the aircraft between 10am and 11am, measured every 5 minutes|
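As a minimal sketch of how these three shapes differ, the snippet below holds the aircraft example in plain Python structures. The field names and readings are made-up, illustrative values and not real flight data.

```python
# Illustrative (made-up) aircraft readings; names and values are assumptions.

# Cross-sectional: many variables, at one point in time.
cross_sectional = {
    "time": "10:00",
    "speed_kmh": 850,
    "altitude_m": 10600,
    "outside_temp_c": -52,
}

# Time series: a single variable, at many points in time.
time_series = {"10:00": 850, "10:05": 855, "10:10": 848}

# Panel: many variables, at many points in time.
panel = [
    {"time": "10:00", "speed_kmh": 850, "altitude_m": 10600},
    {"time": "10:05", "speed_kmh": 855, "altitude_m": 10650},
    {"time": "10:10", "speed_kmh": 848, "altitude_m": 10640},
]

# Slicing the panel by time gives a cross-section;
# slicing it by variable gives a time series.
cross_section_at_1005 = next(row for row in panel if row["time"] == "10:05")
speed_series = {row["time"]: row["speed_kmh"] for row in panel}

print(cross_section_at_1005)
print(speed_series)
```

Seen this way, panel data is simply the combination of the other two: each row is a cross-section, and each column is a time series.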
The data set represents “how much” of the data collected or available is being used for the purposes of analytics. One unique set of data is referred to as an “observation”. Terms like records, cases, and data-points are used interchangeably with observations.
|Population||All possible observations available. This is the universal set of data|
|Sample||A few observations, drawn from the population based on certain logic. This is a sub-set of the universal set|
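To make the distinction concrete, here is a small sketch using Python's standard library. The "population" of customer ages is randomly generated for illustration, and simple random sampling is just one possible "logic" for choosing a sample.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: the ages of all 1,000 customers (made-up values).
population = [random.randint(18, 70) for _ in range(1000)]

# Sample: 50 observations drawn at random, without replacement.
sample = random.sample(population, k=50)

population_mean = statistics.mean(population)  # calculated from every observation
sample_mean = statistics.mean(sample)          # an estimate of the population mean

print(f"Population mean: {population_mean:.1f}")
print(f"Sample mean:     {sample_mean:.1f}")
```

The two means will usually be close but not identical, which is why a value computed from a sample is treated as an estimate.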
Scales of measurement
The data can be either numeric or alpha-numeric, and therefore, various measurement scales are applicable, depending on the data.
|Nominal scale||Qualitative variables, for which mathematical operations are either not possible or meaningless||Names, categories (1,2,3…), marital status (S, M, D)|
|Ordinal scale||Variables are captured in an ordered set; comparisons are meaningful, but arithmetic operations are not||Rating on a 1-5 scale. While 5 > 4 has a meaning, 5 - 4 is meaningless|
|Interval scale||Variables are captured on a scale with equal intervals but an arbitrary zero; differences are meaningful, but ratios are not||Temperature in °C: 40 - 20 is meaningful, but 40 °C is not “twice as hot” as 20 °C|
|Ratio scale||Variables with a true zero, for which ratios can be computed and are meaningful||Comparing salary, sales, consumption|
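The difference between the interval and ratio scales is easiest to see with numbers. The sketch below uses made-up temperatures and salaries; the conversion to kelvin is added here only to show what a temperature scale with a true zero looks like.

```python
# Interval scale: differences are meaningful, ratios are not,
# because the zero point (0 °C) is arbitrary.
temp_a_c, temp_b_c = 20.0, 40.0
difference_c = temp_b_c - temp_a_c   # 20 degrees warmer: meaningful
naive_ratio = temp_b_c / temp_a_c    # 2.0, but this does NOT mean "twice as hot"

# The same temperatures in kelvin, which has a true zero (a ratio scale).
temp_a_k = temp_a_c + 273.15
temp_b_k = temp_b_c + 273.15
true_ratio = temp_b_k / temp_a_k     # about 1.07

# Ratio scale: salary has a true zero, so ratios are meaningful.
salary_a, salary_b = 50_000, 100_000
salary_ratio = salary_b / salary_a   # 2.0: genuinely twice the salary

print(f"Celsius ratio: {naive_ratio:.2f}, kelvin ratio: {true_ratio:.2f}")
print(f"Salary ratio: {salary_ratio:.2f}")
```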
Measures of Central Tendency
These are familiar measures for most business users – the Mean, Median, and Mode. Each of these measures describes the data set using a SINGLE value.
|Mean value||Average value of the data||Population mean = calculated value; sample mean = estimated value|
|Median or Mid value||The value that divides the data into 2 equal parts||When the values are arranged in ascending order, the mid-point is the median value|
|Mode value||The most frequently occurring value in the data set||The only one of the three that can be used for both quantitative and qualitative (nominal) data|
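A quick sketch with Python's built-in `statistics` module shows all three measures side by side. The sales figures and marital-status codes are made-up values, chosen so that one outlier pulls the mean away from the median.

```python
import statistics

# Made-up monthly sales figures (in units), for illustration only.
sales = [12, 15, 15, 18, 20, 22, 95]

mean_value = statistics.mean(sales)      # pulled upwards by the outlier 95
median_value = statistics.median(sales)  # middle value of the sorted data
mode_value = statistics.mode(sales)      # most frequently occurring value

# The mode also works for qualitative (nominal) data.
marital_status = ["S", "M", "M", "D", "M", "S"]
status_mode = statistics.mode(marital_status)

print(f"Mean: {mean_value:.2f}, Median: {median_value}, Mode: {mode_value}")
print(f"Most common marital status: {status_mode}")
```

Because a single extreme value can move the mean a long way, the median is often the safer summary when the data contains outliers.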
Quantiles divide the observations in a population (or sample) into equal-sized groups. For example, the Median divides the data into two equal halves. Other commonly used quantiles are:
|Percentile||Quantiles that divide the data into 100 equal parts are called percentiles||P5 is the value below which 5% of the data fall|
|Decile||Quantiles that divide the data into 10 equal parts are called deciles||The second decile is the value below which the first 20% of the data fall|
|Quartile||Quantiles that divide the data into 4 equal parts are called quartiles||Quartile 3 is the value below which 75% of the data fall|
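As a rough illustration, `statistics.quantiles` from Python's standard library returns the cut points for any number of groups. The data set here (the integers 1 to 99) is an assumption, chosen only so that the cut points are easy to verify by eye; note that different tools use slightly different interpolation rules, so results on real data can differ a little between packages.

```python
import statistics

# Illustrative data: the integers 1 to 99, so the cut points are easy to check.
data = list(range(1, 100))

quartiles = statistics.quantiles(data, n=4)      # 3 cut points: Q1, Q2 (median), Q3
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

q3 = quartiles[2]    # 75% of the data fall below this value
d2 = deciles[1]      # 20% of the data fall below this value
p5 = percentiles[4]  # 5% of the data fall below this value

print(f"Quartiles: {quartiles}")
print(f"Second decile: {d2}, fifth percentile: {p5}")
```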
Measures of Variation
Understanding variation is one of the primary objectives of analytics. Understanding the variability in the data is the key to many insights.
|Range||Difference between the max and min values of the data||Helps understand the data spread. E.g., age spread in a group|
|Inter-quartile distance||The distance between Quartile 3 & Quartile 1||IQD is useful for identifying outliers in the data|
|Variance||The average of the squared deviations of the data from their mean value||Helps understand how far a set of numbers is spread out from its mean value|
|Standard Deviation||The square root of the variance; SD quantifies the variation in the same units as the data||Low SD = data points are close to the mean; High SD = data is widely spread out from the mean|
|Variance & SD are important concepts to understand for analytics. A good (and simplified) example is here|
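The measures in this table can be sketched in a few lines of Python. The ages are made-up values, and the 1.5 × IQD cut-off used to flag outliers is a common rule of thumb that I am assuming here, not something prescribed above.

```python
import statistics

# Made-up ages in a group, for illustration only.
ages = [22, 25, 27, 30, 31, 33, 35, 38, 41, 79]

data_range = max(ages) - min(ages)

q1, _, q3 = statistics.quantiles(ages, n=4)
iqd = q3 - q1  # inter-quartile distance

# Rule of thumb (an assumption): values more than 1.5 * IQD
# beyond the quartiles are flagged as potential outliers.
lower_fence = q1 - 1.5 * iqd
upper_fence = q3 + 1.5 * iqd
outliers = [a for a in ages if a < lower_fence or a > upper_fence]

population_variance = statistics.pvariance(ages)  # treats ages as the whole population
sample_variance = statistics.variance(ages)       # treats ages as a sample (divides by n - 1)
sample_sd = statistics.stdev(ages)                # square root of the sample variance

print(f"Range: {data_range}, IQD: {iqd}, outliers: {outliers}")
print(f"Sample variance: {sample_variance:.2f}, sample SD: {sample_sd:.2f}")
```

Note that the variance comes in two flavours, depending on whether the data is treated as the whole population or as a sample drawn from it.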
Measures of Shape
The shape of the data distribution in each set (population or sample) is used to explain the variation in the data.
|Symmetry||Many naturally occurring things (like heights of people, blood pressure etc) approximately follow a normal distribution (symmetrical in shape). In a normal distribution, Mean = Median = Mode||The “infamous” Bell Curve in the corporate world is a Normal Distribution|
|Skewness||Skewness is a measure of asymmetry – whether the distribution of the data set is symmetrical or not||Used to measure how different the actual data set is, from the normal or expected|
|Kurtosis||Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution||Used to measure how different the actual data set is, from the normal or expected|
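Python's standard library has no built-in skewness or kurtosis functions, so the sketch below computes the simple moment-based (population) versions by hand. The function names and the two small data sets are my own, for illustration; statistical packages often apply sample-size corrections, so their numbers may differ slightly.

```python
import statistics


def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric data set."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3


symmetric = [1, 2, 3, 4, 5]          # mirror image around the mean
right_skewed = [1, 1, 2, 2, 3, 10]   # long tail towards high values

print(f"Symmetric skewness:    {skewness(symmetric):.2f}")
print(f"Right-skewed skewness: {skewness(right_skewed):.2f}")
print(f"Symmetric excess kurtosis: {excess_kurtosis(symmetric):.2f}")
```

A positive skewness points to a longer tail on the right, a negative one to a longer tail on the left. A negative excess kurtosis indicates lighter tails than a normal distribution, and a positive one indicates heavier tails.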
Visualising the data
Given the complexity of the calculations that go into some of these methods, visualisation plays a vital role in enabling business users to consume the insights.
In the upcoming blog, I will cover the “visualising the data” aspect.