In my previous blog, I briefly touched upon the concept of a hypothesis. Let’s look at it in detail in this one. In Machine Learning, hypothesis testing is an important concept, especially in regression modelling.

**What is a hypothesis?**

In business, a company often makes a claim and then works to find out whether it is right. For example, if a business wants to grow its revenue and “thinks or believes” that additional spend on marketing will deliver that growth, a claim (or hypothesis) has been made.

**Hypothesis:** More spend on marketing = revenue growth

Using data (or opinions, or gut feel, or all of them together), the business then works towards validating the claim. In some cases, revenue projections are made from the available data or from what-if scenarios. In other cases, the business runs pilot programs to measure the effectiveness of the marketing spend and to check whether it does indeed deliver the results.

With Machine Learning gaining popularity, predictive models are used to establish the business case. In this example, the modelling tests whether a positive relationship exists between marketing spend and revenue growth. This is called hypothesis testing.

**What is hypothesis testing?**

The basic premise of hypothesis testing is to check the VALIDITY of the NULL & the ALTERNATE HYPOTHESIS. Let’s take it one at a time.

In the business example above, our objective is to validate the relationship between marketing spend and revenue.

When we start the hypothesis testing, two competing hypotheses are stated:

- NULL HYPOTHESIS = There is NO relationship between marketing spend and revenue
- ALTERNATE HYPOTHESIS = There exists a relationship between marketing spend and revenue

Hypothesis testing starts with the belief that the NULL hypothesis is true, i.e., “there is NO relationship between marketing spend and revenue”. At the end of the test, using statistical methods, the NULL hypothesis is either REJECTED or NOT REJECTED. When the NULL hypothesis is rejected, the data support the ALTERNATE hypothesis. When it is not rejected, the NULL hypothesis has not been proven true; the data simply do not provide enough evidence against it.

**The framework for hypothesis testing**

How does one go about hypothesis testing? Well, as with everything else, it starts with an idea. A framework from idea to reality looks like this:

**Describe the Idea/problem:** It starts with describing the problem or the need. In our example, it starts with “Our business needs to grow”. From this, many hypotheses will emerge. One such hypothesis is “Increase the spend on marketing”, which the business decided to test out. Our objective is to find whether a “positive” relation exists between marketing spend and revenue. By increasing the spend on marketing, would the revenue grow?

**Define the NULL & ALT Hypothesis:** The next step is to define the NULL & ALT hypotheses. As a standard practice, the NULL hypothesis assumes there is NO relationship between the variables (marketing spend is the input variable, and revenue is the output variable). In our example, it will be:

NULL Hypothesis: There is NO relationship between marketing spend and revenue

ALT Hypothesis: There exists a relationship between marketing spend and revenue

Note: A rule of thumb is that the NULL hypothesis is assumed to be true. The test starts with that premise, and its objective is to decide whether the data give enough evidence to reject the NULL hypothesis. If they do not, we “fail to reject” it, which is not the same as proving it true.

**Define the ACC & REJ criteria:** At what point do we reject the NULL hypothesis? This is determined by the **significance level** (denoted by the symbol **α**), which is chosen before looking at the results. Its value depends on the context; 0.05 and 0.1 are common choices.

For example, if the business deems that a cut-off of 10% is appropriate for rejecting the NULL hypothesis, then **α** = 0.1. What it means is that “we are willing to accept **no more than a 10%** chance of concluding that marketing spend and revenue are related when they are in fact **un-related**”.
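To make the meaning of **α** concrete, here is a minimal simulation sketch. It is not part of the marketing example: it assumes a simple two-sided Z-test on a sample mean with a known standard deviation, and the function names, sample size and number of trials are all hypothetical choices made for illustration. The data are generated so that the NULL hypothesis is true, so every rejection is a false alarm, and the share of false alarms should land close to α.

```python
import random
from statistics import NormalDist, mean


def two_sided_z_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: population mean == mu0, with known sigma."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_rejection_rate(alpha, trials=5000, n=30, seed=0):
    """Fraction of trials that reject H0 although H0 is true by construction."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    rejections = 0
    for _ in range(trials):
        # Draw data from a population whose mean really is 0 (H0 holds).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_z_p_value(sample, mu0=0.0, sigma=1.0) <= alpha:
            rejections += 1
    return rejections / trials


rate = false_rejection_rate(alpha=0.1)
print(f"False rejection rate at alpha = 0.1: {rate:.3f}")
```

The printed rate should sit near 0.1. Choosing a smaller α makes false alarms rarer, at the cost of making real relationships harder to detect.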

**Identify the testing methods:** There are a variety of testing methods applied, depending on the context. The most common ones are the Z-Test, T-Test, and Chi-Square Tests. The objective of these testing methods is to check the validity of the NULL hypothesis, based on the evidence from the data sets/observations.

**Calculate the probability:** The probability value (or the p-value) is the statistical evidence obtained from the observed data set via the test. The p-value is the probability of observing a test statistic at least as extreme as the one computed from the data, assuming the NULL hypothesis is true. It is not the probability that the NULL hypothesis itself is true.
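One way to see what a p-value measures is a permutation test, sketched below. This is a different technique from the Z-Test, T-Test and Chi-Square Tests mentioned above, chosen here only because it needs nothing beyond the Python standard library. The spend and revenue figures are made up for illustration, and the function names are hypothetical. Shuffling the revenue values breaks any link with spend, which mimics a world where the NULL hypothesis is true; the p-value is the share of shuffles whose correlation is at least as strong as the observed one.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_permutations=10_000, seed=0):
    """Two-sided p-value for H0: no relationship between xs and ys."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # destroy any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up illustrative figures, not real business data.
marketing_spend = [10, 12, 15, 17, 20, 22, 25, 27, 30, 32]
revenue = [105, 110, 118, 121, 130, 133, 142, 144, 155, 158]

p_value = permutation_p_value(marketing_spend, revenue)
print(f"Observed correlation: {pearson_r(marketing_spend, revenue):.3f}")
print(f"Permutation p-value:  {p_value:.4f}")
```

Because the made-up revenue rises almost in lockstep with spend, very few shuffles match the observed correlation, so the p-value comes out small.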

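**Decide:** Finally, the **p-value** is compared with the **significance level** (**α**). If the p-value is less than or equal to α, the NULL hypothesis is rejected; otherwise it is not rejected. This outcome then feeds the final decision on whether to spend on marketing to improve the revenues.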
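The usual decision rule can be written in a few lines. The sketch below follows the standard convention of rejecting the NULL hypothesis when the p-value is at most α; the function name `decide` is hypothetical, and the inputs 0.23 and 0.10 are the p-value and α used in the worked example later in this post.

```python
def decide(p_value, alpha):
    """Return the test decision for the NULL hypothesis."""
    if not (0.0 <= p_value <= 1.0) or not (0.0 < alpha < 1.0):
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    # Reject only when the data would be unusual enough under the NULL.
    return "reject NULL" if p_value <= alpha else "fail to reject NULL"


print(decide(p_value=0.23, alpha=0.10))  # fail to reject NULL
print(decide(p_value=0.04, alpha=0.10))  # reject NULL
```

Note that “fail to reject” is deliberately weaker than “accept”: a large p-value means the evidence is insufficient, not that the NULL hypothesis has been shown to be true.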

**The big picture**

Let’s bring it all together and observe what happens.

OBJECTIVE | “Our business needs to grow” |

CHOSEN HYPOTHESIS | “Increase spend on marketing, to increase revenue” |

NULL HYPOTHESIS | There is NO relation between marketing spend and revenue |

ALT HYPOTHESIS | There EXISTS a relation between marketing spend and revenue |

SIGNIFICANCE LEVEL (α) (to reject null hypothesis) | 10% (0.1): we tolerate at most a 10% chance of rejecting the NULL hypothesis when marketing spend and revenue are in fact un-related |

p-VALUE (from the tests) | 23% (0.23): if marketing spend and revenue were un-related, data showing a relationship at least as strong as the one observed would still turn up 23% of the time (Confused? Look at the objective of null hypothesis, and what hypothesis is being tested) |

REJECTION CRITERIA FOR NULL HYPOTHESIS | p-value ≤ α |

OBSERVED VALUE | p-value (23%) > α (10%) |

INFERENCE | The observed data are not unusual enough under the NULL hypothesis. Note that the p-value does NOT mean that marketing spend and revenue are UNRELATED 23% of the time, or RELATED 77% of the time |

FINAL DECISION | NULL HYP = Not rejected; ALT HYP = Not supported by this evidence |

BUSINESS DECISION | The test alone does not justify the additional marketing spend. The business may collect more data, run a pilot program, or decide on other grounds |

**Further reading & understanding of the concepts**

If you are interested in getting deeper into the topic, these are a few useful resources to start off with.

Introduction to hypothesis | https://www.youtube.com/watch?v=VK-rnA3-41c |

When to use Z-test & T-test | https://www.youtube.com/watch?v=YsalXF5POtY |

One-tail Z-Test | https://www.youtube.com/watch?v=FU9UR9XVZwc |

Two-tail Z-test | https://www.youtube.com/watch?v=aiRVUkM92os |

Chi-square test | https://www.youtube.com/watch?v=2QeDRsxSF9M |

Thank you for reading this far, and for encouraging me to continue writing. In the next blog, let’s get an understanding of regression models, a commonly used Machine Learning method.
