We’ve taken a quick look at the basics of hypothesis testing in the previous session. Now, let’s dive deeper into hypothesis formulation, which sets the stage for the analysis and determines the direction of the statistical test.
What Is a Hypothesis?
A hypothesis is a statement that makes a claim about a population parameter, such as a mean or proportion, that you want to test using sample data. A hypothesis is said to be well-formulated when it is clear, testable, and directly aligned with the research question or business problem you are trying to solve.
In the context of inferential statistics, there are usually two competing, mutually exclusive hypotheses:
Null Hypothesis (H0)
This is the status quo, or default position. The null hypothesis is the claim the data is tested against, and it serves as the starting point or baseline of the test. Unless there is sufficient evidence against it, the null hypothesis stands. It represents the default assumption that there is no effect, no difference, or no change. It can also be a statement of no relationship between variables.
Alternative Hypothesis (H1 or Ha)
This is the claim you’re trying to provide evidence for, or what needs to be established using data. As the rival claim to the null hypothesis, the alternative hypothesis serves as the research hypothesis or the improvement target. It states that there is an effect, a difference, or a change.
Before proceeding further, let’s take a look at examples of hypothesis formulation in action. Observe the common elements that null and alternative hypotheses share.
Example 1
Our first situation concerns a manufacturing company that has introduced a new production process. The company wants to determine whether this new process produces fewer defects than the old process. Historically, the old process had an average defect rate of 5%. With this in mind, we have two hypotheses:
Null Hypothesis (H0) – The new production process does not reduce the defect rate compared to the old process. (Defect rate = 5%)
Alternative Hypothesis (H1) – The new production process reduces the defect rate compared to the old process. (Defect rate < 5%)
Procedure:
- Collect a sample of products produced using the new process.
- Calculate the defect rate in the sample.
- Perform a statistical test (e.g., a one-sample Z-test for a proportion, or an exact binomial test for small samples) to determine whether the sample defect rate is significantly lower than 5%.
- Based on the p-value and significance level (e.g., 𝛼 = 0.05), decide whether to reject H0 in favor of H1.
Outcome: If the p-value is less than the significance level, the company would reject the null hypothesis and conclude that the new process significantly reduces the defect rate.
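The procedure above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions: the sample counts are hypothetical, the function name `one_sample_proportion_ztest` is our own, and the test relies on the normal approximation to the binomial distribution. Because H1 claims the rate is lower than 5%, only the left tail counts as evidence against H0.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_ztest(defects: int, n: int, p0: float) -> tuple[float, float]:
    """Left-tailed one-sample Z-test for a proportion (hypothetical helper).

    H0: p = p0  (the new process does not reduce the defect rate)
    H1: p < p0  (the new process reduces the defect rate)
    Returns (z statistic, one-sided p-value).
    """
    p_hat = defects / n                  # sample defect rate
    se = sqrt(p0 * (1 - p0) / n)         # standard error computed under H0, using p0
    z = (p_hat - p0) / se
    p_value = NormalDist().cdf(z)        # left tail: P(Z <= z) if H0 were true
    return z, p_value


# Hypothetical sample: 1,000 products from the new process, 35 of them defective.
alpha = 0.05
z, p_value = one_sample_proportion_ztest(defects=35, n=1000, p0=0.05)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

As a common rule of thumb, this normal approximation is considered reasonable when n·p0 and n·(1 − p0) are both at least about 10; for smaller samples, an exact binomial test is the safer choice.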
Example 2
We have a company that wants to compare the effectiveness of two marketing campaigns, Campaign A and Campaign B, in terms of customer conversion rates. The company randomly assigns customers to either Campaign A or Campaign B and tracks the number of conversions for each campaign. Our hypotheses are:
Null Hypothesis (H0) – There is no difference in the conversion rates between Campaign A and Campaign B. (Conversion rate A = Conversion rate B)
Alternative Hypothesis (H1) – There is a difference in the conversion rates between Campaign A and Campaign B. (Conversion rate A ≠ Conversion rate B)
Procedure:
- Collect data on the number of conversions from both campaigns.
- Calculate the conversion rates for Campaign A and Campaign B.
- Perform a two-proportion Z-test (or a chi-square test on the 2×2 table of conversions) to compare the conversion rates between the two campaigns.
- Based on the p-value and significance level (e.g., 𝛼=0.05), decide whether to reject H0 in favor of H1.
Outcome: If the p-value is less than the significance level, the company would reject the null hypothesis and conclude that there is a significant difference in effectiveness between the two marketing campaigns.
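A comparable sketch for the campaign comparison uses a pooled two-proportion Z-test. It is two-sided because H1 only claims a difference, not a direction. The conversion counts and the function name `two_proportion_ztest` are hypothetical, chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion Z-test (hypothetical helper).

    H0: p_A = p_B   (no difference in conversion rates)
    H1: p_A != p_B  (a difference in either direction)
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both campaigns share one rate, so pool the samples to estimate it.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided: a large |z| in either direction counts as evidence against H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: 2,000 customers assigned to each campaign.
alpha = 0.05
z, p_value = two_proportion_ztest(conv_a=120, n_a=2000, conv_b=150, n_b=2000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

With these hypothetical counts (6.0% vs. 7.5%), the p-value lands just above 0.05, so H0 would not be rejected at 𝛼 = 0.05: a gap between sample rates is not, by itself, evidence of a real difference.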
In these examples, notice that the null hypothesis serves as the baseline or default position that you test against. It reflects what is already known or assumed, and it is written with a relation that includes equality (=, ≤, or ≥). Meanwhile, the alternative hypothesis is what you aim to support with your data. It makes a claim that goes beyond what is currently known, and it is written as a strict inequality (≠, >, or <). Keep these details in mind for the next part of the lesson: how to formulate hypotheses.
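Because the relation in H1 decides which tail of the distribution counts as evidence against H0, a small helper can make that mapping explicit. This sketch assumes a test statistic that is approximately standard normal under H0, as in the Z-tests mentioned in the examples; the function name `p_value_from_z` is hypothetical, and ASCII `!=` stands in for ≠.

```python
from statistics import NormalDist


def p_value_from_z(z: float, alternative: str) -> float:
    """Convert a standard-normal test statistic into a p-value.

    `alternative` is the relation stated in H1:
      "<"  -> left-tailed test  (H0 uses >= or =)
      ">"  -> right-tailed test (H0 uses <= or =)
      "!=" -> two-tailed test   (H0 uses =)
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "<":
        return std_normal.cdf(z)
    if alternative == ">":
        return 1 - std_normal.cdf(z)
    if alternative == "!=":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError(f"unsupported alternative: {alternative!r}")


# The same statistic leads to different conclusions depending on how H1 is stated.
for alt in ("<", ">", "!="):
    print(alt, round(p_value_from_z(-2.0, alt), 4))
```

With z = −2.0, only the left-tailed and two-tailed versions fall below 𝛼 = 0.05. This is one reason the direction of H1 is normally fixed before the data is examined.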
How Do You Formulate Hypotheses?
Early on, it’s important to determine which hypothesis should be the null and which should be the alternative. The following step-by-step procedure will help you through this process.
Step 1: Define the Research Question
Start with a clear research question or business problem you want to solve. Are you trying to see whether there is a difference, an effect, or a change in a particular variable? For example, you might ask, “Does a new marketing strategy increase customer engagement?” This question will guide the formulation of your hypotheses.
Step 2: Identify the Population Parameter of Interest
Next, determine what population parameter (mean, proportion, variance, etc.) you are interested in. Let’s say that you’re interested in the mean customer engagement rate.
Step 3: Formulate the Null Hypothesis (H0)
Remember that the null hypothesis typically states that there is no effect or no difference. It is the hypothesis the researcher looks for evidence against, and it’s often formulated as an equality. In our example, the null hypothesis would be that the mean customer engagement rate after the new marketing strategy is equal to the mean engagement rate before the strategy.
Step 4: Formulate the Alternative Hypothesis (H1 or Ha)
The alternative hypothesis is what you want to provide evidence for. It states that there is an effect or a difference, and it’s typically formulated as an inequality. In contrast to the null hypothesis above, our alternative hypothesis would be that the mean customer engagement rate after the new marketing strategy is greater than the mean engagement rate before the strategy.
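The four steps can be captured in a small data structure. This is a hypothetical sketch: the `Hypotheses` class and its field names are illustrative, not part of any library. It writes H0 with the relation complementary to H1 (≤, ≥, or =, the forms listed earlier, using ASCII `<=`, `>=`, `!=`); many texts write a one-sided null simply with =, as Step 3 does, and the test is carried out at that boundary value either way.

```python
from dataclasses import dataclass

# H0 takes the complementary relation to H1, so the two claims cannot both hold.
NULL_RELATION = {">": "<=", "<": ">=", "!=": "="}


@dataclass(frozen=True)
class Hypotheses:
    question: str     # Step 1: the research question
    parameter: str    # Step 2: the population parameter of interest
    baseline: str     # the value or group the parameter is compared with
    alternative: str  # Step 4: relation claimed by H1 (">", "<", or "!=")

    def __post_init__(self) -> None:
        if self.alternative not in NULL_RELATION:
            raise ValueError("alternative must be '>', '<', or '!='")

    @property
    def null(self) -> str:
        # Step 3: H0 states the relation that includes equality
        return f"H0: {self.parameter} {NULL_RELATION[self.alternative]} {self.baseline}"

    @property
    def alt(self) -> str:
        return f"H1: {self.parameter} {self.alternative} {self.baseline}"


engagement = Hypotheses(
    question="Does a new marketing strategy increase customer engagement?",
    parameter="mean engagement (after)",
    baseline="mean engagement (before)",
    alternative=">",
)
print(engagement.null)
print(engagement.alt)
```

Deriving H0 from H1 in one place keeps the two statements mutually exclusive by construction, which mirrors the order of the steps: decide what you want to show, then state its complement as the null.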
Let’s use the steps mentioned above to come up with null and alternative hypotheses for the following situations.
Example 1
Let’s say we have a laboratory that is testing a new drug’s effectiveness.
Research Question: Does a new drug lower blood pressure more effectively than the existing standard treatment?
- Null Hypothesis (H0) – There is no difference in effectiveness between the new drug and the existing treatment. In terms of the parameters, the mean reduction in blood pressure for the new drug = the mean reduction for the existing treatment.
- Alternative Hypothesis (H1) – The new drug is more effective in lowering blood pressure than the existing treatment. In terms of the parameters, the mean reduction in blood pressure for the new drug > the mean reduction for the existing treatment.
Here, H0 assumes there is no difference between the two treatments. The alternative hypothesis H1 is what the researchers hope to demonstrate—that the new drug is more effective.
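The lesson does not name a test for this example. As a hedged sketch, assuming large, independent samples (so a normal approximation applies) and using hypothetical summary statistics, a one-sided two-sample Z-test on the mean reductions could look like this. With small samples, a Welch t-test would be the usual choice instead. The function name `two_mean_ztest_greater` is our own.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_ztest_greater(mean_new: float, sd_new: float, n_new: int,
                           mean_old: float, sd_old: float, n_old: int) -> tuple[float, float]:
    """Right-tailed large-sample Z-test for a difference in means (hypothetical helper).

    H0: mu_new = mu_old  (no difference in mean blood-pressure reduction)
    H1: mu_new > mu_old  (the new drug lowers blood pressure more)
    Returns (z statistic, one-sided p-value).
    """
    se = sqrt(sd_new**2 / n_new + sd_old**2 / n_old)  # unpooled standard error
    z = (mean_new - mean_old) / se
    p_value = 1 - NormalDist().cdf(z)  # right tail: P(Z >= z) if H0 were true
    return z, p_value


# Hypothetical trial summaries: mean reduction in systolic blood pressure (mmHg).
alpha = 0.05
z, p_value = two_mean_ztest_greater(mean_new=12.0, sd_new=8.0, n_new=150,
                                    mean_old=10.0, sd_old=8.5, n_old=150)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```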
Example 2
We want to compare customer satisfaction levels between two products.
Research Question: Is there a difference in customer satisfaction between Product A and Product B?
- Null Hypothesis (H0) – There is no difference in customer satisfaction between Product A and Product B. Here, the mean satisfaction score for Product A = the mean satisfaction score for Product B.
- Alternative Hypothesis (H1) – There is a difference in customer satisfaction between Product A and Product B. In terms of the parameters, the mean satisfaction score for Product A ≠ the mean satisfaction score for Product B.
In this situation, the null hypothesis H0 suggests that there is no difference in customer satisfaction between the two products, while the alternative hypothesis H1 suggests that there is a difference, regardless of direction.
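The lesson does not prescribe a test for this comparison either. As one alternative to the Z- and t-tests mentioned earlier (a swapped-in technique, not the lesson’s method), a permutation test checks H0 directly: if the product labels did not matter, randomly reshuffling them should often produce a gap in means as large as the observed one. Taking the absolute gap makes the test two-sided, matching H1. The ratings and the function name `permutation_test_two_sided` are hypothetical.

```python
import random
from statistics import mean


def permutation_test_two_sided(a: list[float], b: list[float],
                               n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in means (hypothetical helper).

    H0: mean_A = mean_B   (group labels do not matter)
    H1: mean_A != mean_B  (a difference in either direction)
    Returns an estimated p-value.
    """
    rng = random.Random(seed)               # seeded for reproducible results
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)                 # reassign labels at random, as H0 allows
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:                # absolute value makes the test two-sided
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical 1-5 satisfaction ratings from two customer samples.
product_a = [4, 5, 3, 4, 4, 5, 3, 4, 5, 4]
product_b = [3, 4, 3, 3, 4, 2, 3, 4, 3, 3]
p_value = permutation_test_two_sided(product_a, product_b)
print(f"p-value = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

Because the p-value is estimated by random resampling, it varies slightly with `seed` and `n_resamples`; more resamples give a more stable estimate.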
By carefully crafting your null and alternative hypotheses, you can ensure that your statistical analysis is meaningful and relevant to your objectives. Stay tuned as we go deeper into the basic concepts of hypothesis testing and performing a hypothesis test in the next sessions.