
Making Sense of Test For Equal Variances

Three teams compete in our CHL business simulation: CHL Red, CHL Green and CHL Blue. After completing Day One, the teams appear to perform very differently (Figure 1). Although the means look very similar, the variation is strikingly different. This is surprising, since all teams start with exactly the same prerequisites. To test this assumption of different variability among the teams, the Test for Equal Variances is deployed.


Figure 1: Data of CHL Blue, CHL Green and CHL Red in a Time Series Plot

Finding a significant difference in the variation of different data sets, i.e. in their variances (the square of the standard deviation), is important. These data sets could stem from different teams performing the same job. If they show a different variation, it usually means that different procedures are used to perform the same job. Looking into this may offer opportunities for improvement by learning from the best. However, this only makes sense if the difference is proven, i.e. statistically significant.

To perform the Test for Equal Variances, we take the following steps:

1. Plot the Data


Figure 2: Box Plot for CHL Blue, CHL Green and CHL Red

For any statistical application, it is essential to combine the test with a graphical representation of the data. Several tools are available for this purpose, including the popular stratified histogram, the dotplot and the boxplot. The Time Series Plot in Figure 1 does not clearly show the variability of the three teams.

The boxplot in Figure 2 shows the difference in variation between the three groups much more clearly. A statistical test helps to calculate the risk associated with this decision.
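For readers who want to reproduce this kind of plot outside SigmaXL, a minimal sketch in Python with matplotlib is shown below. The cycle-time values are hypothetical placeholders, not the actual CHL data.

```python
import matplotlib.pyplot as plt

# Hypothetical cycle times in minutes - placeholders, not the actual CHL data
blue  = [12.1, 18.4, 9.7, 15.2, 20.3, 11.8, 16.5]
green = [14.2, 15.1, 13.8, 14.9, 15.4, 14.0, 14.6]
red   = [10.5, 19.2, 13.1, 17.8, 9.9, 16.4, 12.7]

# One box per team makes the spread of the three groups directly comparable
plt.boxplot([blue, green, red])
plt.xticks([1, 2, 3], ["CHL Blue", "CHL Green", "CHL Red"])
plt.ylabel("Cycle time [min]")
plt.title("Box Plot for CHL Blue, CHL Green and CHL Red")
plt.show()
```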

2. Formulate the Hypothesis for Test For Equal Variances

In this case, the parameter of interest is the variation, i.e. the null hypothesis is

H0: σBlue = σGreen = σRed,

where the σ are the population standard deviations of the three teams.

This means the alternative hypothesis is

HA: At least one σ is different from at least one other σ.

3. Decide on the Acceptable Risk

Since there is no reason to change the commonly used acceptable risk of 5%, i.e. 0.05, we use this risk as the threshold for our decision.

4. Select the Right Tool


Figure 3: Descriptive Statistics for CHL Blue, CHL Green and CHL Red

When variances need to be compared, at least three popular tests are available:

  1. For two normally distributed data sets: F-test,
  2. For more than two normally distributed data sets: Bartlett’s test and
  3. For two or more non-normally distributed data sets: Levene’s test.

Since tests that are based on a specific distribution are usually sharper, we need to check whether we have normal data. Figure 3 reveals that CHL Blue does not show normality, based on the p-value of the Anderson-Darling Normality Test. Therefore, we need to run a Test for Equal Variances using Levene's test.
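As a sketch of this selection logic, the normality check can also be done with scipy. Note that scipy's Anderson-Darling implementation reports critical values rather than a p-value, so the sketch below compares the test statistic with the 5% critical value; the data are the same hypothetical placeholders as above, so the outcome may differ from the article's.

```python
from scipy import stats

# Hypothetical cycle times in minutes - placeholders, not the actual CHL data
teams = {
    "CHL Blue":  [12.1, 18.4, 9.7, 15.2, 20.3, 11.8, 16.5],
    "CHL Green": [14.2, 15.1, 13.8, 14.9, 15.4, 14.0, 14.6],
    "CHL Red":   [10.5, 19.2, 13.1, 17.8, 9.9, 16.4, 12.7],
}

for name, sample in teams.items():
    result = stats.anderson(sample, dist="norm")
    # Critical value that corresponds to the 5% significance level
    crit_5pct = result.critical_values[list(result.significance_level).index(5.0)]
    normal = result.statistic < crit_5pct
    print(f"{name}: AD = {result.statistic:.3f}, "
          f"5% critical value = {crit_5pct:.3f}, normal = {normal}")

# If every team passed the check, Bartlett's test would be the sharper choice;
# otherwise Levene's test is the robust fallback - the route taken in the
# article, since CHL Blue is not normal.
```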

5. Test the Assumptions

Beyond the considerations above, there are no further prerequisites for running Levene's test.

6. Conduct the Test


Figure 4: Levene’s Test for Equal Variances

Running the Test for Equal Variances in the statistics software SigmaXL generates the output in Figure 4.

Since the p-value for Levene's test statistic is 0.0000, i.e. less than 0.05 (or 5 percent), we reject H0 and accept HA.
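Outside SigmaXL, the same decision rule can be sketched with scipy's levene function, which defaults to the median-centred (Brown-Forsythe) variant of the test. The data are again the hypothetical placeholder lists, not the actual CHL values.

```python
from scipy import stats

alpha = 0.05  # acceptable risk decided in step 3

# Hypothetical cycle times in minutes - placeholders, not the actual CHL data
blue  = [12.1, 18.4, 9.7, 15.2, 20.3, 11.8, 16.5]
green = [14.2, 15.1, 13.8, 14.9, 15.4, 14.0, 14.6]
red   = [10.5, 19.2, 13.1, 17.8, 9.9, 16.4, 12.7]

statistic, p_value = stats.levene(blue, green, red)
print(f"Levene statistic = {statistic:.3f}, p-value = {p_value:.4f}")

if p_value < alpha:
    print("Reject H0: at least one team's variation differs significantly.")
else:
    print("Fail to reject H0: no significant difference in variation proven.")
```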

7. Make a Decision

With this, Levene's test tells us that there is at least one significant difference in variation.

Additionally, the pairwise statistics show which team is different from which other team. The p-value for the Levene pairwise probabilities is 0.0000 both between CHL Blue and CHL Green and between CHL Green and CHL Red, i.e. there is a significant difference between CHL Blue and CHL Green as well as between CHL Green and CHL Red. The boxplot shows the direction of this difference.
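A rough equivalent of this pairwise comparison is to run Levene's test on each pair of teams, as sketched below; SigmaXL may compute its pairwise probabilities somewhat differently, so this only approximates the table in Figure 4. The blue, green and red lists are the hypothetical placeholders from the sketch above.

```python
from itertools import combinations
from scipy import stats

# Hypothetical cycle times in minutes - placeholders, not the actual CHL data
samples = {
    "CHL Blue":  [12.1, 18.4, 9.7, 15.2, 20.3, 11.8, 16.5],
    "CHL Green": [14.2, 15.1, 13.8, 14.9, 15.4, 14.0, 14.6],
    "CHL Red":   [10.5, 19.2, 13.1, 17.8, 9.9, 16.4, 12.7],
}

# Compare every pair of teams to see which variances differ from which
for (name_a, a), (name_b, b) in combinations(samples.items(), 2):
    _, p = stats.levene(a, b)
    verdict = "different" if p < 0.05 else "not proven different"
    print(f"{name_a} vs {name_b}: p = {p:.4f} -> variation {verdict}")
```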

Finally, the statistics indicate that CHL Green seems to have a significantly better way of running the simulation, with much less variation: a standard deviation of 1.09 min compared to 3.92 min and 4.17 min, respectively. After looking further into the procedures, we recognise that CHL Green organises packages in First-In-First-Out (FIFO) order, whereas CHL Blue and CHL Red do not ensure FIFO.


Interested in the stats? Read here.