In the previous article on inferential statistics, we assumed that conclusions drawn from sample data can be extended to the actual population, provided the sample represents that population. The question then arises: how do we know whether a hypothesis about the population is true? This is where hypothesis testing comes into the picture.
Hypothesis testing determines whether a hypothesis is supported by the sample data, by drawing conclusions from the result of a statistical test.
Steps for hypothesis testing:
Hypothesis testing involves five steps:
1. Stating the null and alternate hypothesis
2. Specifying the level of significance (α)
3. Finding the critical values
4. Selecting a test statistic
5. Drawing the conclusions
1. Types of hypothesis:
There are two types of hypotheses:
A.Null hypothesis (H0):
● The null hypothesis is the default (status quo) statement about the population. It is assumed to be true until the sample data gives enough evidence against it.
● This is the hypothesis that is tested.
● If the sample data is inconsistent with the null hypothesis, we state that the null hypothesis is rejected.
● Otherwise, we must not state that we accept the null hypothesis. We must state that we failed to reject the null hypothesis.
● The null hypothesis always contains an equality relation. You’ll find symbols like =, ≤ and ≥.
B.Alternate hypothesis (H1):
● This is the statement that contradicts the null hypothesis.
● Unlike the null hypothesis, the alternate hypothesis is not tested directly.
● The alternate hypothesis is considered when the null hypothesis is rejected.
● In this hypothesis, there will be no equality relation between the variables. You will find symbols like <, > and ≠.
A company has claimed that its total product sales by the end of the month are at least 5000.
In this example,
Null hypothesis: total sales ≥ 5000
Alternate hypothesis: total sales < 5000
A company has claimed that its total product sales by the end of the month are more than 5000.
In this example, the claim has no equality relation, so it becomes the alternate hypothesis:
Null hypothesis: total sales ≤ 5000
Alternate hypothesis: total sales > 5000
2. Level of significance (α):
● The level of significance is the probability of rejecting the null hypothesis when it is actually true.
● The level of significance must be chosen carefully. If it is high, there is more chance that we reject the null hypothesis even when it is true. Therefore, the probability of a type 1 error (discussed below) is high.
● If the level of significance is too low, we rarely reject the null hypothesis, even when it is false. So, the probability of a type 2 error (discussed below) increases.
● α = 5% is the most widely used level of significance, as a conventional compromise between type 1 and type 2 errors.
● Choosing α = 5% corresponds to a confidence level of 95%.
● This means that if we repeatedly draw random samples and build a confidence interval from each one, about 95% of the intervals contain the true population mean and the remaining 5% do not.
There is only one population mean, known as the true population mean. In most cases, we never know its actual value.
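The meaning of a 95% confidence level can be checked with a small simulation. The sketch below is only an illustration: the population values (true mean 50, standard deviation 10), the sample size and the function name `coverage_rate` are assumptions made for this example, and the population standard deviation is treated as known.

```python
import random
from statistics import NormalDist

def coverage_rate(true_mean=50.0, sigma=10.0, n=30, trials=2000,
                  alpha=0.05, seed=0):
    """Fraction of sample confidence intervals that contain the true mean."""
    rng = random.Random(seed)                  # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    half_width = z * sigma / n ** 0.5          # assumes sigma is known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Does this sample's interval contain the true mean?
        if x_bar - half_width <= true_mean <= x_bar + half_width:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The printed share should be close to 0.95, and it moves towards 1 – α when a different `alpha` is passed.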
Z-table: http://www.z-table.com/ (this table should be used below for calculations)
3. Finding the critical values:
● Now we must find the region where the null hypothesis is rejected and the region where we fail to reject the null hypothesis.
● If the alternate hypothesis is of the form,
H1 ≠, then it is a two tailed test
H1 > or H1 <, then it is a one tailed test
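For a two-tailed test, let α = 5% or 0.05. Here α is divided by 2 to cover the lower and upper rejection regions, i.e. αu = 0.025 (upper) and αl = 0.025 (lower).
For a one-tailed test, let α = 5% or 0.05. Here the whole of α lies in a single tail.
● To find these regions, we have 2 methods: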
a. Critical value method:
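● Zc is the critical value: the Z value whose cumulative probability in the Z-table is (1 – α) for a one-tailed test, or (1 – α/2) for a two-tailed test.
● Find the upper and lower critical value using,
UCV (upper boundary) = µ + (Zc * σ x̅)
LCV (lower boundary) = µ - (Zc * σ x̅)
µ 🡪 Population mean (as stated in the null hypothesis)
Zc 🡪 critical value
σ x̅ 🡪 standard deviation of the sample mean (standard error), σ /√n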
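The critical value method can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes that Zc is read as the Z value with cumulative probability 1 – α (one-tailed) or 1 – α/2 (two-tailed), that σ x̅ is the standard error σ /√n, and the function name `critical_bounds` and the numbers used are hypothetical.

```python
from statistics import NormalDist

def critical_bounds(mu, sigma, n, alpha=0.05, two_tailed=True):
    """Return (LCV, UCV) around the hypothesized population mean mu."""
    # A two-tailed test splits alpha between both tails.
    tail_area = alpha / 2 if two_tailed else alpha
    z_c = NormalDist().inv_cdf(1 - tail_area)   # replaces the Z-table lookup
    std_error = sigma / n ** 0.5                # sigma of the sample mean
    return mu - z_c * std_error, mu + z_c * std_error

# Hypothetical numbers: mu = 100, sigma = 15, n = 36, two-tailed, alpha = 0.05
lcv, ucv = critical_bounds(mu=100, sigma=15, n=36)
print(round(lcv, 2), round(ucv, 2))
```

For a one-tailed test, only the bound on the side of the alternate hypothesis is used: UCV for H1 > and LCV for H1 <.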
b. P value method:
This method is the most widely used one. It works with any test statistic; here we use it with the Z-test.
● Find the Z score.
● Find the p value for that Z score, i.e. the tail area beyond it, from the Z-table.
● If the problem is a one-tailed test, then the p value remains the same.
● If the problem is a two-tailed test, then we must consider both the upper and lower boundary, so multiply the obtained value by 2.
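The p value steps above can be sketched as follows. The function name `p_value` and its `tail` argument are assumptions made for this illustration; the standard normal distribution from Python's `statistics` module stands in for the Z-table.

```python
from statistics import NormalDist

def p_value(z, tail="right"):
    """p value of a Z score for a right-, left- or two-tailed test."""
    std_normal = NormalDist()                   # mean 0, standard deviation 1
    if tail == "right":                         # H1 uses >
        return 1 - std_normal.cdf(z)
    if tail == "left":                          # H1 uses <
        return std_normal.cdf(z)
    if tail == "two":                           # H1 uses ≠: double one tail
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left' or 'two'")

print(round(p_value(1.96, "right"), 4))
print(round(p_value(1.96, "two"), 4))
```

The two-tailed result is exactly twice the one-tailed result for the same Z score, which is the "multiply by 2" step.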
4. Selecting a test statistic:
● There are various statistical methods. Here we will be learning about the first two methods, i.e. Z-test and T-test.
A. Z -test method
B. T test method
C. Chi-square method etc.
● Z -test method
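▪ Formula for calculating the Z score is,
z = (x̅ - µ0) / (σ /√n)
x̅ 🡪 sample mean
µ0 🡪 population mean
σ 🡪 standard deviation of population
n 🡪 sample size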
● T test method
Formula for calculating the T score is,
t = (x̅ - µ0) / (s /√n)
x̅ 🡪 sample mean
µ0 🡪 population mean
s 🡪 standard deviation of sample
n 🡪 sample size
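Both scores can be computed with a few lines of Python. This sketch assumes the usual Z score form, z = (x̅ - µ0) / (σ /√n) with a known population standard deviation σ, alongside the T score formula above; the function names and the sample values are hypothetical.

```python
from statistics import mean, stdev

def z_score(x_bar, mu0, sigma, n):
    """Z score: population standard deviation sigma is known."""
    return (x_bar - mu0) / (sigma / n ** 0.5)

def t_score(sample, mu0):
    """T score: standard deviation s is estimated from the sample."""
    n = len(sample)
    s = stdev(sample)                    # sample standard deviation (n - 1)
    return (mean(sample) - mu0) / (s / n ** 0.5)

# Hypothetical data: five measurements tested against mu0 = 12.0
sample = [12.1, 11.8, 12.4, 12.0, 12.3]
print(round(z_score(x_bar=105, mu0=100, sigma=15, n=36), 3))
print(round(t_score(sample, mu0=12.0), 3))
```

The only difference between the two functions is where the standard deviation comes from: it is given for the Z score and estimated from the data for the T score.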
5. Drawing the conclusions:
● With the critical value method, if the Z score or T score lies in the upper or lower critical region, we can reject the null hypothesis.
● With the critical value method, if the Z score or T score does not lie in the upper or lower critical region, we fail to reject the null hypothesis.
● With the p value method, if the p value is ≤ level of significance, we can reject the null hypothesis.
● With the p value method, if the p value is > level of significance, we fail to reject the null hypothesis.
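The four rules above reduce to two small decision functions. This is a sketch with hypothetical names; `critical` is assumed to be the positive critical value (for example 1.65 or 1.96) and `tail` names the direction of the alternate hypothesis.

```python
def decide_by_p_value(p, alpha=0.05):
    """Reject H0 when the p value is at most the level of significance."""
    return "reject H0" if p <= alpha else "fail to reject H0"

def decide_by_critical_value(score, critical, tail="right"):
    """Reject H0 when the Z or T score falls in the critical region."""
    if tail == "right":
        in_region = score > critical
    elif tail == "left":
        in_region = score < -critical
    elif tail == "two":
        in_region = abs(score) > critical
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return "reject H0" if in_region else "fail to reject H0"

print(decide_by_p_value(0.03))                       # 0.03 <= 0.05
print(decide_by_critical_value(1.2, 1.96, "two"))    # |1.2| < 1.96
```

Note that neither function ever returns "accept H0": the only outcomes are rejecting or failing to reject the null hypothesis.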
Types of errors:
There are two types of errors that may occur in hypothesis testing:
1. Type 1 (α):
● Type 1 error occurs when we reject the null hypothesis, even when the null hypothesis is true or valid.
● A type 1 error is represented with alpha (α).
2. Type 2 (β):
● Type 2 error occurs when we fail to reject the null hypothesis, when the null hypothesis is false or invalid.
● A type 2 error is represented with beta (β).
● In some cases, having a type 2 error is more dangerous when compared to type 1.
A pharmaceutical company is producing a medicine to cure cancer and the company claims that the medicine is effective.
In this example,
Null hypothesis: medicine produced is effective.
Alternate hypothesis: medicine produced is not effective.
Type 1 (α): medicine is effective, but we rejected the null hypothesis.
Type 2 (β): medicine is not effective, but we failed to reject the null hypothesis
🡪In the above example, a type 1 error does not directly harm people, whereas a type 2 error keeps an ineffective medicine in use, which may cause side effects or even lead to death.
🡪This is the reason a type 2 error can be more dangerous than a type 1 error.
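Both error rates can be estimated by simulation. The sketch below uses a hypothetical right-tailed Z-test (H0: µ ≤ 100, σ = 15, n = 36) rather than the medicine example; all numbers and the function name `rejection_rate` are assumptions made for illustration.

```python
import random
from statistics import NormalDist

def rejection_rate(true_mean, mu0=100.0, sigma=15.0, n=36,
                   alpha=0.05, trials=4000, seed=1):
    """Share of simulated samples in which H0: mu <= mu0 is rejected."""
    rng = random.Random(seed)                   # fixed seed for repeatability
    z_c = NormalDist().inv_cdf(1 - alpha)       # right-tailed critical value
    std_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if (x_bar - mu0) / std_error > z_c:
            rejections += 1
    return rejections / trials

# Type 1: H0 is true (true mean equals mu0) but we reject it.
type1 = rejection_rate(true_mean=100)
# Type 2: H0 is false (true mean is 105) but we fail to reject it.
type2_at_5pct = 1 - rejection_rate(true_mean=105, alpha=0.05)
type2_at_1pct = 1 - rejection_rate(true_mean=105, alpha=0.01)
print(type1, type2_at_5pct, type2_at_1pct)
```

The estimated type 1 rate should sit near α, and lowering α from 5% to 1% raises the type 2 rate, which is the trade-off behind choosing the level of significance.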
An insurance company is reviewing its current policy rates. When originally setting the rates, they believed that the average claim amount would be at most Rs1,80,000. They are concerned that the true mean is actually higher than this, because they could potentially lose a lot of money. They randomly selected 40 claims and found the sample mean to be Rs1,95,000. Assuming that the standard deviation of claims is Rs50,000 and setting the level of significance α = 0.05 (5%), test whether the insurance company should be concerned.
Population standard deviation = 50000
Mean of the sample data = 195000
Sample size = 40
Step 1: consider the null and alternate hypothesis
● Null hypothesis (H0) = µ ≤ 180000
● Alternate hypothesis (H1) = µ > 180000
● From the above, we can say that it is a right tailed test.
Step 2: consider the level of significance (α) and calculate the critical value
● Significance level = 5%
● Critical value = Z value of (1 – 0.05 = 0.950) = 1.65
● Common critical values for each level of significance (α), rounded to two decimals:
Level of significance (α) | Critical value (1 tailed) | Critical value (2 tailed)
0.10 | 1.28 | 1.65
0.05 | 1.65 | 1.96
0.01 | 2.33 | 2.58
● Upper critical value = 180000 + (1.65 * 50000 /√40) ≈ 193044. The sample mean (195000) lies above this value.
Step 3: select suitable test method and evaluate the value
● Here we choose the Z- test method. Z value = 1.897 (using Z test formula)
Step 4: compare the obtained Z value with critical value
● Z value(1.897) > critical value(1.65)
Step 5: drawing conclusions
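● As the obtained Z value lies in the critical region, we reject the null hypothesis.
● So the data suggests that the true mean claim amount is higher than Rs1,80,000, and the insurance company should be concerned.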
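The same steps can be reproduced in Python as a check on the arithmetic. This is a minimal sketch that uses the figures from the problem statement; the variable names are chosen for this illustration, and `NormalDist` stands in for the Z-table lookup.

```python
from statistics import NormalDist

mu0 = 180_000      # mean under the null hypothesis (H0: mu <= 180000)
x_bar = 195_000    # sample mean
sigma = 50_000     # population standard deviation
n = 40             # sample size
alpha = 0.05       # level of significance

std_error = sigma / n ** 0.5                  # standard error of the mean
z = (x_bar - mu0) / std_error                 # Z-test statistic
z_critical = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
p = 1 - NormalDist().cdf(z)                   # right-tailed p value

print(f"z = {z:.3f}, critical value = {z_critical:.3f}, p value = {p:.4f}")
print("reject H0" if z > z_critical else "fail to reject H0")
```

Both methods agree here: the Z score of about 1.897 exceeds the critical value, and the p value of roughly 0.029 is below α = 0.05.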
Chief Executive Officer @Himanshu Bahmani