Making Sense of the Two-Sample T-Test
30 Nov 2017, posted by JC in Data Science.
The two-sample t-test is one of the most commonly used hypothesis tests in Lean Six Sigma work. It compares the averages of two groups and helps determine whether the groups are significantly different or whether the observed difference is merely due to random chance.
For example, it helps answer questions such as whether the average success rate is higher after implementing a new sales tool than before, or whether the test results of patients who received a drug are better than those of patients who received a placebo.
Here is an example that starts with the absolute basics of the two-sample t-test. The question is whether there is a significant (or only random) difference in the average cycle time to deliver a pizza from Pizza Company A versus Pizza Company B. Figure 1 shows the data collected from a sample of deliveries by Company A and Company B.
To perform this test, the following steps must be taken:
1. Plot the Data
Every statistical analysis should be accompanied by a graphical representation of the data. Several tools are available for this purpose, including the popular stratified histogram, dotplot and boxplot.
The boxplot in Figure 2 shows that the delivery time for Pizza Company B seems to be lower than for Company A. However, the two data sets overlap to some degree, so concluding from this plot alone that there is a statistically significant difference between the average delivery times of the two companies would be risky. A statistical test quantifies the risk of this decision.
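A boxplot summarizes each sample by five numbers: minimum, first quartile, median, third quartile and maximum. The Python sketch below computes these values and checks whether the two interquartile boxes overlap, the visual cue discussed above. The function names are hypothetical, and quartile definitions differ slightly between statistical packages, so the values may not match Figure 2 exactly.

```python
import statistics


def five_number_summary(sample):
    """Return (minimum, Q1, median, Q3, maximum): the values a boxplot draws."""
    data = sorted(sample)
    # "inclusive" interpolates between observations; other packages may use
    # a different quartile definition and report slightly different values
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def boxes_overlap(summary_a, summary_b):
    """Return True if the interquartile ranges (the boxes) of two samples overlap."""
    _, a_q1, _, a_q3, _ = summary_a
    _, b_q1, _, b_q3, _ = summary_b
    return a_q1 <= b_q3 and b_q1 <= a_q3
```

For example, five_number_summary([5, 1, 4, 2, 3]) returns (1, 2.0, 3.0, 4.0, 5). Overlapping boxes only signal that the plot alone cannot settle the question; deciding significance is the job of the test in the following steps.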
2. Formulate the Hypotheses for the Two-Sample t-Test
In this case, the parameter of interest is the average, so the null hypothesis is
$H_0: \mu_A = \mu_B$,
with $\mu_A$ and $\mu_B$ being the population mean delivery times of the two companies.
The alternative hypothesis is therefore
$H_A: \mu_A \neq \mu_B$.
3. Decide on the Acceptable Risk
Since there is no reason to deviate from the commonly used acceptable risk of 5% (α = 0.05), we use it as the threshold for our decision.
4. Select the Right Tool
To compare two means, the usual choice is the two-sample t-test, also known as Student's t-test.
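For reference, the classic Student's version of the test pools the two sample variances. With sample means $\bar{x}_A, \bar{x}_B$, sample standard deviations $s_A, s_B$ and sample sizes $n_A, n_B$, the test statistic and its degrees of freedom are given below. The article does not state whether SigmaXL used this pooled form or the unequal-variance (Welch) form for its output, so treat this as the textbook formula rather than a description of that output.

```latex
t = \frac{\bar{x}_A - \bar{x}_B}{s_p \sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}},
\qquad
s_p^2 = \frac{(n_A - 1)\,s_A^2 + (n_B - 1)\,s_B^2}{n_A + n_B - 2},
\qquad
\nu = n_A + n_B - 2
```

Under $H_0$, $t$ follows a t-distribution with $\nu$ degrees of freedom, and the two-sided p-value is $P(|T_\nu| \ge |t|)$.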
5. Test the Assumptions
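Finally, a key prerequisite for applying the two-sample t-test is that the data in both groups are approximately normally distributed. The two samples must also be independent of each other, and Student's version of the test additionally assumes equal variances in both groups. Therefore, we generated descriptive statistics, including a normality test, for both samples (Company A and Company B).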
Normality is checked with the Anderson-Darling test, whose null hypothesis is “Data are normally distributed” and whose alternative hypothesis is “Data are not normally distributed.” Since both samples have a p-value above 0.05 (5 percent), we cannot reject normality for either sample, so treating both as normally distributed is reasonable.
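The Anderson-Darling check can be sketched in Python with the standard library alone. The function below estimates the mean and standard deviation from the sample, computes the A² statistic with the usual small-sample adjustment, and converts it to a p-value using the piecewise approximation tabulated by D'Agostino and Stephens. It is an illustration under these assumptions, not a reproduction of SigmaXL's implementation; the function name and the minimum sample size of 8 are choices made for this sketch.

```python
import math
from statistics import NormalDist, fmean, stdev


def anderson_darling_normal(sample):
    """Anderson-Darling normality test with mean and SD estimated from the sample.

    Returns (adjusted A^2 statistic, approximate p-value).
    H0: data are normally distributed; HA: data are not normally distributed.
    """
    x = sorted(sample)
    n = len(x)
    if n < 8:  # guard chosen for this sketch; the approximation targets moderate n
        raise ValueError("need at least 8 observations")
    mean, sd = fmean(x), stdev(x)
    if sd == 0:
        raise ValueError("all observations are identical")
    eps = 1e-15  # keeps the logarithms finite for extreme observations
    std_normal = NormalDist()
    cdf = [min(max(std_normal.cdf((v - mean) / sd), eps), 1 - eps) for v in x]
    # A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]
    s = sum((2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1 - cdf[n - i]))
            for i in range(1, n + 1))
    a2 = -n - s / n
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # adjustment for estimated mean and SD
    # piecewise p-value approximation (D'Agostino and Stephens)
    if a < 0.2:
        p = 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)
    elif a < 0.34:
        p = 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    elif a < 0.6:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    else:
        # beyond a ~ 10 the p-value is already below 1e-20; capping the input
        # stops the quadratic term from turning the curve upward again
        capped = min(a, 10.0)
        p = math.exp(1.2937 - 5.709 * capped + 0.0186 * capped ** 2)
    return a, min(1.0, max(0.0, p))
```

Applied to each company's delivery times, a p-value above 0.05 means normality cannot be rejected, matching the reasoning above. Other packages may use different approximations and report slightly different p-values.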
6. Conduct the Test
Running the two-sample t-test in the statistics software SigmaXL produces the output shown in Figure 4.
Since the p-value is 0.289, i.e., greater than 0.05 (5 percent), we cannot reject $H_0$.
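Step 6 can likewise be sketched without statistical software. The Python code below implements the pooled (Student's) two-sample t-test for $H_0: \mu_A = \mu_B$ against $H_A: \mu_A \neq \mu_B$ and obtains the two-sided p-value by numerically integrating the t density with Simpson's rule, to stay within the standard library. The function names are hypothetical, and the result will match Figure 4 only if SigmaXL also used the pooled-variance form; the delivery times from Figure 1 are not reproduced here, so no output for that data set is shown.

```python
import math
from statistics import fmean, variance


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|) for a t-distribution with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must be even).
    """
    # log of the normalizing constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2))
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3  # P(0 <= T <= |t|); by symmetry P(|T| <= |t|) = 2 * area
    return min(1.0, max(0.0, 1.0 - 2.0 * area))


def student_two_sample_t_test(a, b):
    """Pooled-variance two-sample t-test. Returns (t, degrees of freedom, two-sided p)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (fmean(a) - fmean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df, t_two_sided_p(t, df)


def decide(p_value, alpha=0.05):
    """Apply the decision rule from step 3: reject H0 only if p <= alpha."""
    return "reject H0" if p_value <= alpha else "cannot reject H0"
```

For instance, with two small made-up samples [1, 2, 3, 4, 5] and [3, 4, 5, 6, 7], student_two_sample_t_test returns t of about -2.0 with 8 degrees of freedom and a two-sided p-value of about 0.08, so decide() reports "cannot reject H0" at α = 0.05, the same kind of conclusion as in the pizza example.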
7. Make a Decision
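As a result, not rejecting $H_0$ means that there is not enough evidence to conclude that the average delivery times differ. This does not prove that the means are equal; the data simply do not show a significant difference. A p-value of 0.289 means that, if there were truly no difference, a difference at least as large as the one observed would arise by random chance about 28.9% of the time, far above the 5% risk we decided to accept.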