The Science of Hypothesis Testing: Unlocking the Power of Data Analysis

data science Oct 05, 2023
[Thumbnail image: hypothesis testing blog from BigDataElearning]

Have you ever had a heated debate with your friends on a topic? 

Picture this: you confidently declare that drinking green tea will reduce stress levels…

...but your friend disagrees and insists that it won't make a difference.

Well, congratulations! You've just engaged in hypothesis testing 😀

You had a hypothesis, and your friend had a competing one.

And by arguing back and forth, you were essentially testing each other's hypotheses to determine which one was supported by the evidence (or lack thereof).



With this in mind, let's look further at what hypothesis testing means, along with a few examples of hypotheses and null hypotheses.

What is Hypothesis Testing?

First, let's start with what hypothesis testing means.

"Hypothesis testing is a statistical method that is used to determine whether a claim or hypothesis about a population is likely to be true or not."

In our example, using a set of observed results to evaluate whether drinking green tea reduces stress levels is hypothesis testing.
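To make this concrete, here is a minimal Python sketch of one way the green tea claim could be tested: a one-sided permutation test. The stress scores below are made-up numbers for illustration only (not real data), and the permutation test is just one of several reasonable choices; a t-test is another common option.

```python
import random

# Hypothetical stress scores (0-100, lower = less stressed), invented for illustration.
tea_group = [42, 38, 45, 40, 36, 41, 39, 44]
no_tea_group = [47, 44, 50, 43, 46, 49, 45, 48]


def mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """One-sided permutation test.

    H0: green tea makes no difference (group labels are interchangeable).
    H1: group a has a lower mean stress score than group b.
    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # If H0 is true, shuffling the labels should not matter.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff <= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value from being exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


observed, p_value = permutation_test(tea_group, no_tea_group)
print(f"observed difference: {observed:.3f}, p-value: {p_value:.4f}")
```

A p-value below the chosen significance level would lead you to reject the "no difference" hypothesis for these invented numbers; with real data, the outcome could go either way.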

 

Understanding Hypotheses:

In the above analogy “drinking green tea reduces stress levels” is the hypothesis that you made.  

A hypothesis is simply a proposed explanation based on limited evidence. It is a starting point for further investigation.

Null Hypothesis and Alternative Hypothesis:

 In hypothesis testing, there are two types of hypotheses: 

  • null hypothesis (H0) and 
  • alternative hypothesis (H1). 

"The null hypothesis is the default hypothesis, which assumes that there is no significant difference between the sample and the population (in other words, no effect)."

In our analogy, your friend's position, that drinking green tea makes no difference to stress levels, is the null hypothesis.

"The alternative hypothesis, on the other hand, is the opposite of the null hypothesis and assumes that there is a significant difference between the sample and the population."

The alternative hypothesis is your claim that drinking green tea reduces stress levels. The test asks whether the evidence is strong enough to reject the "no difference" default in favor of your claim.
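For reference, most statistics texts write the hypotheses formally, with the "no effect" position as H0 and the claim of an effect as H1. Using μ_tea and μ_no tea as illustrative symbols (they are not part of the original example) for the average stress level of green tea drinkers and non-drinkers, a one-sided version of our example could be written as:

```latex
\begin{aligned}
H_0 &: \mu_{\text{tea}} = \mu_{\text{no tea}} && \text{(green tea makes no difference)} \\
H_1 &: \mu_{\text{tea}} < \mu_{\text{no tea}} && \text{(green tea lowers average stress)}
\end{aligned}
```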

Another Interesting Analogy:

Think of hypothesis testing as a court case. The client is assumed to be innocent until proven guilty, and the defense lawyer's job is to argue that there isn't enough evidence to prove their client's guilt.

In this analogy, the null hypothesis is "the client is innocent" and the alternative hypothesis is "the client is guilty."

Arguing that there isn't enough evidence of guilt is essentially arguing that there isn't enough evidence to reject the null hypothesis. Note that a "not guilty" verdict does not prove innocence; likewise, failing to reject the null hypothesis does not prove that it is true.

Now that you have a grasp of what the null and alternative hypotheses mean, keep in mind that you also need to state how confident you want to be in your conclusion.

This is where the level of confidence comes into play.

Level of Confidence

"The level of confidence refers to the degree of certainty that the true population parameter lies within a specified range of values."

Suppose you want to test a null hypothesis that the average height of 5th grade students falls within the range of 5 feet to 5 feet 2 inches, with a 90% confidence level. 

This means that if you repeated the study many times with different random samples of 5th grade students, and built an interval from each sample in the same way, about 9 out of 10 of those intervals would contain the true average height.
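One way to see what a 90% confidence level means is a small simulation. This is a hypothetical setup rather than part of the original example: it assumes heights are normally distributed with a made-up true mean of 61 inches and a known standard deviation of 2 inches, draws many samples of 30 students, builds a 90% z-interval from each, and counts how often the interval captures the true mean.

```python
import random
from statistics import NormalDist


def ci_coverage(true_mean=61.0, sigma=2.0, n=30, confidence=0.90, trials=10_000, seed=1):
    """Estimate how often a z-based confidence interval contains the true mean.

    Assumes normally distributed heights (in inches) with a known standard deviation.
    """
    rng = random.Random(seed)
    # Two-sided critical value, about 1.645 for a 90% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw a fresh random sample of n students' heights.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Does this sample's interval capture the true population mean?
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials


print(f"coverage: {ci_coverage():.3f}")  # expected to be close to 0.90
```

Changing `confidence` to 0.95 should push the coverage toward 95%, at the cost of wider intervals.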

Level of Significance

"The significance level, also known as the alpha level (α), is the probability of rejecting the null hypothesis when it is actually true."

The level of significance comes into play when you set a threshold for the probability of making a type I error, which is the error of rejecting a true null hypothesis. 

For example, suppose you set the level of significance at 0.10 (or 10%). You would then reject the null hypothesis only if results at least as extreme as the ones you observed would occur less than 10% of the time when the null hypothesis is true (that is, when the p-value is below 0.10). In the long run, if the null hypothesis really is true, you will wrongly reject it in roughly 1 out of every 10 tests.
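A quick simulation can show what a 0.10 significance level controls. This sketch makes simplifying assumptions that are not in the original example: the null hypothesis is taken to be "the true mean height is exactly 61 inches" rather than a range, heights are normally distributed with a known standard deviation of 2 inches, and a two-sided z-test is used. Because every sample is generated with the null hypothesis true, every rejection is a Type I error.

```python
import random
from statistics import NormalDist


def type_one_error_rate(alpha=0.10, mu0=61.0, sigma=2.0, n=30, trials=10_000, seed=2):
    """Share of tests that wrongly reject a TRUE null hypothesis (mean == mu0).

    Uses a two-sided z-test with known sigma; all data are simulated under H0.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the sample comes from a population with mean mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p_value < alpha:
            rejections += 1
    return rejections / trials


print(f"Type I error rate: {type_one_error_rate():.3f}")  # expected to be close to 0.10
```

With α = 0.10 the rejection rate should land near 10%; lowering α to 0.05 should bring it near 5%.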

It's important to note that the significance level and the level of confidence are complementary. The level of significance is denoted α, and the level of confidence is 1 - α, so α = 1 - (level of confidence).

If you set the significance level at 0.10, this means you have a confidence level of 90% (or 1-0.10). 

Conversely, if you set the significance level at 0.05, you have a confidence level of 95% (or 1-0.05).
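The relationship between the two levels is simple arithmetic, sketched below in Python. As an added illustration that goes beyond the original text, it also prints the standard normal cutoff a two-sided test would use at each α (computed with `statistics.NormalDist`); the helper names are hypothetical.

```python
from statistics import NormalDist


def confidence_level(alpha):
    """Level of confidence is the complement of the significance level: 1 - alpha."""
    return 1 - alpha


def two_sided_critical_z(alpha):
    """Standard normal cutoff that leaves alpha / 2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05):
    print(f"alpha={alpha:.2f} -> confidence={confidence_level(alpha):.0%}, "
          f"critical z={two_sided_critical_z(alpha):.3f}")
# alpha=0.10 -> confidence=90%, critical z=1.645
# alpha=0.05 -> confidence=95%, critical z=1.960
```

A smaller α means a higher confidence level and a larger cutoff, so stronger evidence is needed before rejecting the null hypothesis.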



Conclusion

As we finish learning about hypothesis testing, let's summarize what we've learned:

  1. Importance of Hypothesis Testing: We saw how important hypothesis testing is in data science. It helps us decide if the results we see support a specific idea.

  2. Two Types of Hypotheses: There are two main types. The null hypothesis says there is no significant difference, and the alternative hypothesis challenges that, suggesting there is a difference.

  3. Confidence and Significance Levels: We learned about the confidence level, which describes how sure we can be that our findings reflect the whole population, and the significance level (α), which sets the threshold for rejecting the null hypothesis and equals 1 minus the confidence level.

  4. Making Smart Choices: Finally, understanding these basics helps us make better data-driven decisions and draw more reliable conclusions.

Knowing the basics of hypothesis testing helps us understand how important our data is and makes our decisions better.

Remember, the next time you hear a claim or hypothesis, ask yourself: “Has it been tested?”, “What is the level of confidence in the result?”, and “What is the acceptable level of significance, that is, the chance of wrongly rejecting the null hypothesis?” 🙂

Question For You

What is the null hypothesis?

A) The hypothesis that is typically supported by evidence. 

B) The hypothesis that there is a significant difference between the sample and the population.

C) The default hypothesis that assumes no significant difference between the sample and the population.

D) The hypothesis that supports an alternative hypothesis.

Tell me in the comments which option (A, B, C, or D) best describes the null hypothesis.
