In this article
- Overview of Significance Tests
- Testing for Differences in Counts, Proportions, and Percentages (Chi-Square Test)
- Testing for Difference Between Two Means (Student’s t-test)
- Weighting and Effective Sample Size
- Comparisons Between Overlapping Groups and with Total
- Multiple Comparison Considerations
1. Overview of Significance Tests
The standard significance tests used to compare values within Harmoni follow the conventional null hypothesis significance testing (NHST) approach. This proceeds by initially assuming there is no difference between the values (the null hypothesis), and then calculating how likely a difference at least as large as the one observed would be under that assumption. If the resulting probability is small (typically under 0.05), the null hypothesis is “rejected” and we conclude there is a significant difference between the values.
Usually, a confidence level of 95% is applied for determining significant differences. This corresponds to a significance level (or alpha) of 0.05 and means if the result of the test yields a probability (or p-value) less than the significance level, then the result is “statistically significant”.
In Harmoni, chi-square tests are used for comparisons between discrete categorical variables, that is, comparisons of counts and of proportions or percentages of counts. For continuous variables (averages, scores, or values such as volume data), Student’s t-test is used.
All tests are two-tailed. For testing differences in proportions, the chi-square test is equivalent to a Z test on proportions. The chi-square tests all apply a continuity correction.
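The equivalence between the chi-square test and the Z test on proportions can be checked numerically. The sketch below is a minimal illustration using made-up counts and hypothetical function names; it omits the continuity correction, so it shows only the uncorrected equivalence and is not Harmoni's implementation.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2 using the pooled proportion (no continuity correction)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


def two_tailed_p_from_z(z):
    """Two-tailed p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: 30 of 100 own Brand A in group 1, 50 of 100 in group 2.
z = two_proportion_z(30, 100, 50, 100)
p_value = two_tailed_p_from_z(z)
print(z, z * z, p_value)
```

Here z squared is about 8.33, which is the same value the uncorrected chi-square statistic takes for the corresponding 2x2 table of owners and non-owners.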
2. Testing for Differences in Counts, Proportions, and Percentages (Chi-Square Test)
In tables with count data, it is often desired to test the hypothesis that the frequencies of occurrences in the various categories of one variable are independent of the frequencies in the second variable. Such data can be tabulated and said to be arranged in a contingency table.
If the columns (or the rows) of a contingency table represent random samples from independent populations, then the test (null hypothesis H0) is typically phrased as a comparison of proportions. For example, is the frequency of ownership of Brand A the same in two different age groups? This may be expressed as H0: p1 = p2, where p1 is the proportion of ownership in group 1 and p2 is the proportion of ownership in group 2. In such a case the contingency table has 2 rows and 2 columns and is referred to as a 2x2 contingency table.
The most common procedure for analyzing contingency table data is the chi-square statistic, which is computed from observed and expected frequencies. In a contingency table, we have two variables under consideration, and we denote the observed frequency in row i and column j as $f_{ij}$ (e.g. the value in row 1, column 1 is $f_{11}$). The total frequency in row i of the table is denoted $R_i$ and is the sum of the frequencies in that row. Similarly, $C_j$ denotes the total frequency in column j.
For chi-square analysis of contingency tables the standard formula is:
$$\chi^2 = \sum_{i} \sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
- where $f_{ij}$ = the observed frequency in row i column j
- and $e_{ij}$ = the expected frequency in row i column j
The expected frequency in a cell of a contingency table is
$$e_{ij} = \frac{R_i \, C_j}{n}$$
- where n = grand total
Note that the row totals of the expected frequencies equal the row totals of the observed frequencies, and the column totals of the expected frequencies equal the column totals of the observed frequencies.
Once $\chi^2$ (the chi-square statistic) has been obtained, its significance can be found by evaluating the chi-square distribution at that value for the appropriate degrees of freedom. By comparing the statistic to the theoretical distribution, we can determine the probability of getting this (or a more extreme) difference if the null hypothesis were true. This probability is called a ‘p-value’. If the p-value is smaller than some predefined criterion (say, less than 0.05), then we declare that the difference between the two values is statistically significant.
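The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the counts are hypothetical, no continuity correction is applied, and the p-value helper relies on a shortcut that holds only for 1 degree of freedom (a 2x2 table). The degrees of freedom are taken as (rows - 1) x (columns - 1), the standard rule for contingency tables.

```python
import math


def chi_square(table):
    """Return (statistic, degrees_of_freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)  # grand total
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # e_ij = R_i * C_j / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_df1(statistic):
    """Upper-tail chi-square probability; valid only for 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical 2x2 table: rows = owns Brand A / does not, columns = age group 1 / 2.
observed = [[30, 50],
            [70, 50]]
statistic, df = chi_square(observed)
print(statistic, df, p_value_df1(statistic))
```

For these made-up counts the expected frequencies are 40 and 60 in each column, the statistic is about 8.33 on 1 degree of freedom, and the p-value falls well below 0.05.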
Correction for Continuity in 2x2 Contingency Tables
As mentioned above, when two proportions are directly compared, the chi-square test is based on a 2x2 contingency table, which has 1 degree of freedom.
Chi-square values obtained from actual (count) data belong to a discrete distribution, in that they can only take on certain values. However, the theoretical $\chi^2$ distribution is a continuous distribution in which all values are possible. Evaluating the continuous distribution therefore gives only an approximate probability for a calculated $\chi^2$, and our conclusions are not made exactly at the significance level we set. In the case where the degrees of freedom = 1 (i.e. a 2x2 contingency table), it is usually recommended to use a correction for continuity.
Various continuity correction methods have been proposed by Yates (1934), Cochran (1942, 1952), and more recently Haber (1980). The Cochran/Haber (1980) continuity correction is applied to chi-square tests in Harmoni.
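To illustrate the general idea of a continuity correction, the sketch below applies the simplest of these, the Yates correction, which reduces each absolute deviation between observed and expected frequency by 0.5 before squaring. This is not the Cochran/Haber correction that Harmoni applies, whose details are not given here; the function name and counts are hypothetical.

```python
def yates_chi_square_2x2(table):
    """Chi-square for a 2x2 table with the Yates correction (illustration only)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Shrink the deviation by 0.5, but never below zero.
            deviation = max(abs(table[i][j] - expected) - 0.5, 0.0)
            statistic += deviation ** 2 / expected
    return statistic


# Hypothetical 2x2 table: rows = owns Brand A / does not, columns = age group 1 / 2.
corrected = yates_chi_square_2x2([[30, 50], [70, 50]])
print(corrected)
```

For these counts the corrected statistic is about 7.52, compared with about 8.33 without the correction, so the correction makes the test slightly more conservative.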
3. Testing for Difference Between Two Means (Student’s t-test)
For comparisons of continuous variables (measures or average values), the statistical test used by Harmoni to determine significant differences is a t-test.
In this case, we are comparing the mean values from two independent samples to infer whether a difference exists between the two sampled populations. In a two-tailed test, the null hypothesis is that there is no difference between the means of the two populations, i.e. H0: μ1 – μ2 = 0. If the two samples came from normal populations with equal variances, then Student’s t-test may be applied. The t statistic for testing the hypothesis concerning the difference between two population means is
$$t = \frac{m_1 - m_2}{s_{m_1 - m_2}} \qquad \text{(Equ. 1)}$$
The quantity $m_1 - m_2$ is simply the difference between the two sample means, and $s_{m_1 - m_2}$ is the standard error of that difference. The quantity $s_{m_1 - m_2}$ (and the variance of the difference, $s^2_{m_1 - m_2}$) can be calculated from the sample data and are estimates of the population parameters $\sigma_{m_1 - m_2}$ (and $\sigma^2_{m_1 - m_2}$). It can be shown mathematically that the variance of the difference between two independent variables is equal to the sum of the variances of the two variables. Furthermore, the independence assumption implies there is no correlation between the two variables, and the assumption of equal variances implies the two sample variances are both estimates of the same population variance, $\sigma^2$. Therefore, in the t-test, we compute the pooled variance, $s^2_p$ (the average of the two sample variances, weighted by their degrees of freedom $n_1 - 1$ and $n_2 - 1$), which is then used as the best estimate of $\sigma^2$.
Equation 1 then becomes
$$t = \frac{m_1 - m_2}{\sqrt{s^2_p / n_1 + s^2_p / n_2}} \qquad \text{(Equ. 2)}$$
- where $n_1$ and $n_2$ are the sizes of the two samples and the degrees of freedom are $n_1 + n_2 - 2$
In Student’s t-test, the t statistic follows a Student’s t-distribution. By evaluating the t statistic against the t-distribution with the determined degrees of freedom, we obtain a p-value (the probability of getting this or a more extreme difference if the null hypothesis were true). If the p-value is smaller than some predefined criterion (say, less than 0.05), then we reject the null hypothesis and declare that the difference between the two mean values is statistically significant.
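The pooled-variance t statistic of Equ. 2 can be sketched as follows. The function name and the sample values are hypothetical, and the sketch stops at the t statistic and degrees of freedom: turning these into a p-value needs a t-distribution routine, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Return (t, degrees_of_freedom) for a two-sample t-test with pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean(sample1), mean(sample2)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    s2p = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(s2p / n1 + s2p / n2)
    return (m1 - m2) / standard_error, n1 + n2 - 2


# Hypothetical scores from two independent groups.
group1 = [5.0, 6.0, 7.0, 8.0, 9.0]
group2 = [2.0, 3.0, 4.0, 5.0, 6.0]
t, df = pooled_t(group1, group2)
print(t, df)

# If SciPy is available (an assumption, not used in this sketch), a two-tailed
# p-value could be obtained with: 2 * scipy.stats.t.sf(abs(t), df)
```

With these made-up values both sample variances are 2.5, the standard error of the difference is 1, and the result is t = 3 on 8 degrees of freedom.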
Violations of the Assumptions
The two-sample Student’s t-test assumes, from its underlying theory, that both samples come at random from normal populations with equal variances. Numerous studies have shown that the t-test is robust enough to stand considerable departures from its theoretical assumptions, especially if the sample sizes are equal or nearly equal, and especially when two-tailed hypotheses are considered. The larger the samples, the more robust the test. The comparison of two means from normal populations without assuming equal variances is known as Welch’s t-test. Welch’s t-test is not currently supported in Harmoni.
4. Weighting and Effective Sample Size
When weighting has been applied, the statistical tests are performed on the weighted percentages (proportions) or values. In other words, the comparisons are between the weighted values in the cells. However, within the statistical tests, the standard error is calculated using either the unweighted sample size or the effective sample size (effective base).
Generally, weighting causes a decrease in the statistical significance of results. The effective sample size is an estimate of the equivalent sample size from an unweighted simple random sample. Roughly speaking, it is a measure of the precision of the survey. For example, if a weighted sample of 1000 people has an effective sample size of 800, that indicates the weighted sample is no more robust than an unweighted simple random sample of 800.
Each cell shown on a table potentially has a different effective sample size. This is because the impact of weights can differ by the sample base of the cell. For example, if a study over-sampled Females and under-sampled Males, then the imbalance may be corrected by down-weighting Females and up-weighting Males. Although the raw sample size is simply the sum of the two genders, the effect of the weighting is to reduce the precision of the cell statistic.
The Effective Sample Size is calculated from the weights (i.e. respondent weighting factors) using the Kish formula as follows:
$$\text{Effective Sample Size} = \frac{\left(\sum w\right)^2}{\sum w^2}$$
Note that Σw is the normal (weighted) total. If the analysis is unweighted then w is 1 for all cases and the Effective Sample Size is the same as the Unweighted Sample Size.
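As a quick illustration of the Kish formula, the sketch below computes the effective sample size for two small sets of made-up weights. The function name is hypothetical and this is not Harmoni's code.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total * total / total_of_squares


# Unweighted case: every weight is 1, so the result equals the raw sample size.
unweighted = effective_sample_size([1, 1, 1, 1])

# Hypothetical weighted case: two respondents down-weighted, two up-weighted.
weighted = effective_sample_size([0.5, 0.5, 1.5, 1.5])
print(unweighted, weighted)
```

The four equal weights give an effective sample size of 4, while the unequal weights give 16 / 5 = 3.2, showing how uneven weights reduce the effective base even though the weighted total is unchanged.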
In general, it is better to use the effective sample size in significance tests, especially when extreme weighting has been applied.
5. Comparisons Between Overlapping Groups and with Total
The tests in Harmoni assume independence of the comparison groups and this assumption does not hold if the groups overlap.
This is a particular issue with all "compare with Total" tests (i.e., Total is set as the Reference) because the Total usually includes the test group. So, it is better to compare, for example, Males with Females than Males with Total because usually the Males and Total groups have a high overlap.
A similar situation arises with rolling averages in time series data. If, for example, rolling 3-month averages are calculated, then when testing for significant differences, the reference and test cells should be at least 3 months apart to avoid overlapping samples.
However, the effect of overlapping groups is small if the test group is small compared to the reference, or Total, group.
6. Multiple Comparison Considerations
In standard Harmoni tables, significance tests are performed solely between a single test cell and a single selected reference cell. In other words, each test is conducted without reference to, or consideration of, any other cells within the table. This means the outcome of a significance test is always consistent, regardless of how a table is configured or viewed. But it also means there is no correction for multiple comparisons across all the cells of a table, and users should take this into account when viewing tables as a whole.
However, note that for multiple reference significance tests (M-SIG), where there are several simultaneous pairwise comparisons, a correction for multiple comparisons is applied; see Multi-Reference Significance Difference Statistics.
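For standard tables, where no correction is applied, a rough sense of the issue comes from the chance of seeing at least one false positive across many tests. The sketch below assumes the tests are independent and that every null hypothesis is true, which will rarely hold exactly for the cells of a real table, so treat the result as an indication only. The function name is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


# Twenty independent tests, each at the usual 0.05 significance level.
rate = familywise_error_rate(0.05, 20)
print(rate)
```

With 20 such tests the chance of at least one spuriously significant result is roughly 64%, so a few isolated significant cells in a large table should be interpreted with caution.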