Is There a Difference?
By the end of this chapter, students will be able to:

- Recall that in most experiments, the null hypothesis means that no relationship, association, or difference exists between the study groups or samples.

So how can we test to see whether there really is no difference? That's what we will discuss here: an actual test to see whether there is a statistically significant difference.
In this chapter, we will talk about the chi-square (χ²) test, which is appropriate when working with independent samples and an outcome or dependent variable that is nominal- or ordinal-level data. You already know that nominal-level data tell you that there is a difference in the quality of a variable, whereas ordinal-level data have a rank order so that one level is greater than or less than another.
For example, suppose you are an operating room nurse. You want to see whether there is a difference in the need for postoperative transfusion for patients who have surgery in the morning and patients who have surgery in the afternoon. Your independent variable identifies the groups you want to compare, a sample of morning patients and a sample of afternoon patients. Your outcome or dependent variable is postoperative transfusion, which is measured at a nominal level (yes or no). Now you want to see whether the frequencies you observe differ from the frequencies you would expect if the variables were independent (that is, not related).
Let's formulate a null hypothesis and an alternative hypothesis using the standard notation. Here are our hypotheses:

H0: There is no difference in the need for postoperative transfusion between morning and afternoon surgical patients.

HA: There is a difference in the need for postoperative transfusion between morning and afternoon surgical patients.
Your next step is to set up another 2 × 2 table. Statisticians love these! See Table 8-1.
Before you determine statistical significance, you need to determine the degrees of freedom (df), which refers to the number of cell values that are free to vary once the row and column totals of a 2 × 2 contingency table are known. With a chi-square test, the degrees of freedom is equal to the number of rows minus one times the number of columns minus one:
df = (2 - 1) × (2 - 1)
df = 1 × 1 = 1
All 2 × 2 tables have 1 degree of freedom. In other words, once you know the row and column totals and one additional cell value in the table, you can figure out the rest of the cell values in the table, and they do not change unless the original cell value changes. This is why there is only one value that is free to be unknown.
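To see why only one cell is free, here is a short sketch in Python (the counts are made up for illustration): given the margins and any single cell, the rest of a 2 × 2 table is forced.

```python
# Illustration (hypothetical counts): once the row totals, column totals,
# and one cell of a 2 x 2 table are known, every other cell is determined.
# That single free cell is the table's 1 degree of freedom.

def complete_2x2(row_totals, col_totals, a):
    """Rebuild a 2 x 2 table from its margins and the top-left cell a."""
    b = row_totals[0] - a        # remainder of row 1
    c = col_totals[0] - a        # remainder of column 1
    d = row_totals[1] - c        # forced by the remaining margins
    return [[a, b], [c, d]]

# Made-up margins: 80 and 120 patients per group, 100 yes / 100 no.
table = complete_2x2(row_totals=[80, 120], col_totals=[100, 100], a=30)
print(table)  # [[30, 50], [70, 50]]
```

Change the one free cell (`a`) and every other cell shifts with it, which is exactly the "1 degree of freedom" idea.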
Once you put your data into a statistical program, it will compute the expected values for each cell, assuming that the two variables are independent. You will then need to apply the chi-square test to see whether the observed values are significantly different from the expected values at 1 degree of freedom.
Also, note that the chi-square test doesn't tell you the direction of the relationship or difference. Suppose the p-value for your χ² is significant. In that case, all you know is that you can reject the null and that there is a statistically significant difference in your outcome variable between the two samples. As the statistical wizard you are, you then look again at your data to determine the direction of that difference. For instance, if you had a statistically significant χ² in the example about surgery time and postoperative transfusions, you could go back and look at which group had more transfusions. In this sample, a larger proportion of the afternoon patients needed transfusions. Given your statistically significant result, you could conclude that, for this sample, patients who had surgery in the afternoon were more likely to need a postoperative transfusion than patients who had surgery in the morning.
The chi-square test is one of the simplest tests available in a subset of statistics called categorical data analysis. The majority of theories relating to categorical data analysis started to be developed around the turn of the 20th century. Karl Pearson (1857-1936) was a very important statistician from England who is responsible for first developing the chi-square distribution. Pearson was an arrogant man who frequently butted heads with colleagues. He specifically disputed the intelligence and merits of a young statistician named R. A. Fisher (1890-1962), who has since become established as one of the most important scientists of all time. The two men argued over many different things. Fisher was concerned about what would happen to Pearson's chi-square statistic when sample sizes were extremely low, and Pearson didn't think it was a problem.
Fisher contributed a number of fundamentals of statistical science, and he developed the Fisher exact test. This test is now commonly used in place of Pearson's chi-square test when any cell in the data has an expected count of less than 5 (because, in the end, Fisher's arguments with Pearson proved correct). Fisher also used properties derived from Pearson's chi-square distribution to show that Gregor Mendel, the eminent geneticist and Augustinian priest who theorized about the inheritance of genetic traits using peas, most likely derived many of his theories from fabricated data. (Fisher remained convinced that one of Mendel's assistants was responsible for the fabrication; Mendel is still considered a very gifted geneticist.) In any event, the two tests, Pearson's chi-square test and Fisher's exact test, are now very common statistics to use in clinical trials and scientific research. Specifically, their use is very popular when a researcher wants to test whether a new treatment or therapy is better than the so-called gold standard already in use.
The null hypothesis that is tested in both these tests is as follows:
H0: The proportions being compared are equal in the population.
Here is a motivating example that's slightly more difficult than the one in the main text. This time we'll use a variable that has more than just two categories.
Suppose we want to study the relationship between a partner's occupation and a spouse's marital happiness in a clinically relevant population. Our null hypothesis is as follows:

H0: Partner's occupation is not related to spouse's marital happiness in this subset of occupations in the population we sampled.
To conduct the study, we enroll 1,868 spouses and ask them to report their partner's occupation and to rate the happiness of their marriage on the following 4-point scale: very happy, pretty happy, happy, or not too happy.
The results are shown in Table 8-2. Now, if you plug these values into any statistical program (or use the From the Statistician: Methods section from this chapter to hand-calculate the values), you'll see that the p-value for the difference between the happiness of the spouses of statisticians versus those of supermodels is so low that it is estimated at 0. So now you know that there is an association between the partner's occupation and the spouse's marital happiness (because your p-value is less than alpha, meaning it is significant). So you reject the null hypothesis that there isn't a relationship between a partner's occupation and a spouse's marital happiness.
Partner's Occupation

| Spouse's Marital Happiness | Statistician | Supermodel | Total |
|---|---|---|---|
| Very happy | 800 | 25 | 825 |
| Pretty happy | 706 | 10 | 716 |
| Happy | 200 | 25 | 225 |
| Not too happy | 2 | 100 | 102 |
| Total | 1,708 | 160 | 1,868 |
But what else might you like to know? Let's say that a friend is dating both a statistician and a supermodel and that each intends to propose this evening. What advice would you give your friend? Which might be the better choice to ensure long-term marital satisfaction? Remember that the chi-square test tells you only that a difference exists, not where it lies. But you remember from your college stats class that you can determine which is better by looking at the data. The proportion of spouses married to statisticians who reported being happy or higher was 99.9% (1706 ÷ 1708), whereas the proportion of spouses married to supermodels who reported being happy or higher was 37.5% (60 ÷ 160). Assuming that your friend wants to get married in the first place, which proposal should your friend accept? (Hint: In the statistics books, the statisticians always live happily ever after!)
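The direction check described above is plain arithmetic; a quick sketch using the counts from Table 8-2:

```python
# "Happy or higher" means very happy + pretty happy + happy (Table 8-2).
happy_statistician = 800 + 706 + 200   # 1706 of 1708 spouses
happy_supermodel = 25 + 10 + 25        # 60 of 160 spouses

p_statistician = happy_statistician / 1708
p_supermodel = happy_supermodel / 160

print(round(p_statistician, 3), round(p_supermodel, 3))  # 0.999 0.375
```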
Note: All the anecdotal information regarding Pearson and Fisher comes from Agresti's (2002) landmark book on the subject, Categorical Data Analysis.
You might be inclined to use a chi-square test anytime you have an outcome or dependent variable at the nominal level, but the test wouldn't always be a good choice. This is because the chi-square test includes some additional assumptions (in addition to requiring a nominal-level outcome or dependent variable), which must be met for the test to be used appropriately.
All cells within the 2 × 2 table must have an expected value greater than or equal to 5. If at least one cell in your 2 × 2 table has an expected value of less than 5, you should use the Fisher exact test instead. You should also note that if any of the cells in the frequency table have more than 5 but fewer than 10 expected observations, you can still use the chi-square test, but you need to apply a Yates continuity correction as well. The really nice thing in this day and age is that many statistical programs automatically make this correction when this condition occurs, saving you the time and trouble of doing it manually. You might want to look for it on your next Intellectus Statistics printout.
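For small cells, the Fisher exact test works directly from the hypergeometric distribution. Here is a minimal sketch using only the Python standard library (a teaching illustration, not a production implementation); the two-sided p-value sums the probabilities of every table with the same margins that is no more likely than the observed one:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    p_observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum every table at least as extreme (no more probable) as observed.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_observed + 1e-12)

# Fisher's classic "lady tasting tea" table: p = 34/70, about 0.486.
p = fisher_exact_2x2(3, 1, 1, 3)
```

With a p-value near 0.49, this tiny table gives no evidence against the null, which is exactly why small samples need the exact test rather than the chi-square approximation.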
The sample should be random and independent. Here's an example of a violation of this assumption: Your study involved measuring the need for postoperative transfusion among brothers and sisters who underwent a particular procedure. (Because these subjects are related to each other, they are not independent: once you included the brother in the study, the sister was included as well, so her participation was dependent on her brother's selection.) In this case, the sample is not independent and random. Instead, the sample is now matched or paired, and the McNemar test is the correct choice to use for the analysis. (Both the Fisher and the McNemar tests are based on the same idea as the chi-square, but they have mathematical adjustments to accommodate the violation of the assumptions of the chi-square test.)
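For matched pairs, the McNemar statistic uses only the discordant pairs (pairs in which the two members had different outcomes). A minimal standard-library sketch, with hypothetical counts that are not from the text:

```python
from math import erfc, sqrt

def mcnemar(b, c):
    """Uncorrected McNemar test; b and c are the discordant-pair counts."""
    chi2 = (b - c) ** 2 / (b + c)     # statistic with df = 1
    p = erfc(sqrt(chi2 / 2))          # chi-square(1) upper-tail probability
    return chi2, p

# Hypothetical data: in 15 sibling pairs only the brother needed a
# transfusion; in 5 pairs only the sister did.
chi2, p = mcnemar(15, 5)
print(chi2)  # 5.0
```

Concordant pairs (both siblings transfused, or neither) drop out of the statistic entirely, which is the mathematical adjustment for the paired design.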
Our discussion has included independent and dependent variables. We are now also talking about independent and dependent samples, which sometimes confuses students. Independent and dependent are just the adjectives describing the attribute of whatever noun they are modifying.
Make sure you dont stop at independent and dependent. Consider what attributes they are describing to understand a question or article.
| Sample Type | Level of Dependent or Outcome Variable | Test | Research Example |
|---|---|---|---|
| Two independent samples | Nominal or ordinal | Chi-square | Is there a difference in the level of nursing education achieved for married and divorced nurses? (The independent variable is nominal, married/divorced; the dependent variable is ordinal, with four levels: AD, BS, MS, and DNP.) |
| Two dependent samples | Nominal or ordinal | McNemar's test | In motor vehicle accidents involving a passenger and a driver, is the driver or the passenger more likely to experience a head injury? (Driver and passenger are related because they are in the same car. This independent variable creates two dependent samples to compare; the dependent variable, head injury, is yes or no, so it is at the nominal level.) |
Note: The levels of the independent variable can create samples or groups. For example, marital status may be your independent variable. The groups you are interested in comparing are married people and people who are divorced, creating two samples or groups to compare.
Table 8-3 is a repeat of Table 8-2, for easier reference.
The results are shown in Table 8-4.
Partner's Occupation

| Spouse's Marital Happiness | Statistician | Supermodel | Total |
|---|---|---|---|
| Very happy | 800 | 25 | 825 |
| Pretty happy | 706 | 10 | 716 |
| Happy | 200 | 25 | 225 |
| Not too happy | 2 | 100 | 102 |
| Total | 1,708 | 160 | 1,868 |
| Expected Frequencies | Statistician | Supermodel |
|---|---|---|
| Very happy | (825 × 1708) ÷ 1868 = 754.34 | (825 × 160) ÷ 1868 = 70.66 |
| Pretty happy | (716 × 1708) ÷ 1868 = 654.67 | (716 × 160) ÷ 1868 = 61.33 |
| Happy | (225 × 1708) ÷ 1868 = 205.73 | (225 × 160) ÷ 1868 = 19.27 |
| Not too happy | (102 × 1708) ÷ 1868 = 93.26 | (102 × 160) ÷ 1868 = 8.74 |
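The rule behind these expected frequencies, expected count = (row total × column total) ÷ grand total, is easy to sketch in Python:

```python
def expected_counts(observed):
    """Expected cell counts under independence: row x column / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Observed counts from Table 8-3 (columns: statistician, supermodel).
observed = [[800, 25], [706, 10], [200, 25], [2, 100]]
expected = expected_counts(observed)
print(round(expected[0][0], 2))  # 754.34, matching Table 8-4
```

Note that the expected counts always share the observed table's margins, so their grand total is still 1,868.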
The big sigma character (Σ) just means to sum everything over all the cells; in our case, we calculate χ² = Σ (O - E)² ÷ E, where O is the observed count and E is the expected count in each cell. The degrees of freedom are:

(Number of rows - 1) × (Number of columns - 1)

In this case, the degrees of freedom are:

3 × 1 = 3
p < 0.0001
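The whole calculation can be checked with a short standard-library script; for df = 3 the chi-square upper-tail probability has a closed form, so no statistics package is needed. (The closed-form line is a mathematical identity, not something from the text.)

```python
from math import erfc, exp, pi, sqrt

# Observed counts from Table 8-3 (rows: very happy through not too happy;
# columns: statistician, supermodel).
observed = [[800, 25], [706, 10], [200, 25], [2, 100]]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum of (O - E)^2 / E over all eight cells.
chi2 = sum(
    (obs - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i, row in enumerate(observed)
    for j, obs in enumerate(row)
)

# Upper-tail probability for a chi-square with df = 3 (closed form).
p = erfc(sqrt(chi2 / 2)) + sqrt(2 * chi2 / pi) * exp(-chi2 / 2)
```

The statistic comes out enormous and the p-value is effectively 0, matching the "estimated at 0" remark in the worked example.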
There are two main points to review in this chapter. First, you should understand the concept of the null hypothesis. The null hypothesis means that no relationship, association, or difference exists between the variables of interest. Second, the chi-square test is used to look for a statistically significant difference or relationship when you have a nominal- or ordinal-level dependent or outcome variable.
Don't forget to use your decision tree! See Figure 8-1.
Last, the chi-square test does not tell you the direction of the relationship; only you can make that interpretation.
That wraps up this chapter. Not too bad, right?
Questions 1-11: A study is completed to examine the relationship between gender identification and sports participation. It is conducted by randomly surveying ninth graders at Smith High School. The collected data are shown in Table 8-5.
| | Male | Not male | Total |
|---|---|---|---|
| No sports | 30 | 50 | 80 |
| Sports participation | 70 | 50 | 120 |
| Total | 100 | 100 | 200 |
| | Male | Not male | Total |
|---|---|---|---|
| No sports | 60 | 20 | 80 |
| Sports participation | 140 | 180 | 320 |
| Total | 200 | 200 | 400 |
Chi-square = 25, p < 0.0001.
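The reported statistic is easy to verify by hand, since the expected counts under independence are 40/40 in the first row and 160/160 in the second:

```python
from math import erfc, sqrt

# (O - E)^2 / E for each of the four cells, then summed.
chi2 = ((60 - 40) ** 2 / 40 + (20 - 40) ** 2 / 40
        + (140 - 160) ** 2 / 160 + (180 - 160) ** 2 / 160)
p = erfc(sqrt(chi2 / 2))   # upper-tail probability for df = 1

print(chi2)  # 25.0
```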
| | Pruritus | No Pruritus | Total |
|---|---|---|---|
| Screen positive | 98 | 12 | 110 |
| Screen negative | 53 | 651 | 704 |
| Total | 151 | 663 | 814 |
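For reference, the standard screening-test formulas applied to this table look like this (the cell labels are read from the row and column headers; the answer key below reports the same values):

```python
# Cells read from the pruritus table: rows are screen result,
# columns are true pruritus status.
tp, fp = 98, 12      # screen positive: with pruritus / without
fn, tn = 53, 651     # screen negative: with pruritus / without

sensitivity = tp / (tp + fn)    # 98 / 151
specificity = tn / (fp + tn)    # 651 / 663
npv = tn / (fn + tn)            # 651 / 704, negative predictive value
```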
You are asked to review a proposal for a grant-funded project. When you read the following, you immediately identify a problem. What is it?
The researchers plan to examine the relationship between shift (days/nights) and end-of-shift fatigue scores (0-100). They will randomly select five hospitals in the state and then randomly select 100 nurses to sample at each hospital. They will utilize an alpha of 0.10 for this pilot study. The analysis of how the shift affects the average fatigue score will be computed with a chi-square test, and subsequent p-values will be reported.
It is really helpful to start reading professional research articles to see how these concepts are used in actual study scenarios. So give it a try with this article. This study is a nurse-led initiative to help prevent the late detection of patient deterioration using continuous physiological monitoring. Early detection of patient deterioration can help prevent poor patient outcomes and extended length of stay.
Stellpflug, C., Pierson, L., Roloff, D., Mosman, E., Gross, T., Marsh, S., Willis, V., & Gabrielson, D. (2021). Continuous physiological monitoring improves patient outcomes. American Journal of Nursing, 121(4), 40-46.
Chi-Square Test of Independence
Chi-Square Distributions and Chi-Square Test of Independence
Chi-Square Test of Independence (2)
Open your Kidney Data Set project.
3. Nominal, mode, mode = participating in sports for males and for the total sample
5. H0: There is no relationship between gender and sports participation.
7. See Table 8-8.
| | Male | Not male |
|---|---|---|
| No sports | (80 × 100) ÷ 200 = 40 | (80 × 100) ÷ 200 = 40 |
| Sports participation | (120 × 100) ÷ 200 = 60 | (120 × 100) ÷ 200 = 60 |
If your alpha is 0.05, then yes, sports participation is significantly different for male and not-male students, because the p-value is less than alpha.
11. The outcome variable is nominal/ordinal. It is an independent sample, and the cell values are all >5.
13. Yes, females are more likely.
df = (Number of rows - 1) × (Number of columns - 1) = 1 × 1 = 1
17. 20 = true positives, 5 = false positives, 10 = false negatives, 215 = true negatives
19. 215/220 = 98%. If the subject does not have the disease, there is a 98% chance the screen will be negative. A specific screen is good at identifying those without the disease.
21. 215/225 = 96%. If the screening test is negative, it is probable that the subject does not have the disease.
23. You would need to use Fisher's exact test because of the small cell size.
25. Type of mammogram, nominal
29. Fail to reject the null, p > alpha
33. No, these would be dependent samples and would require McNemar's test. The chi-square test requires independent samples.
39. Because his screen is negative, we know there is an 84% chance he does not have oropharyngeal cancer.
45. Unable to determinethere is a significant difference, but you need more information to determine where.
49. Sensitivity = 98/151 = 65%
51. Specificity = 651/663 = 98.2%
53. We will have to find another study. Mine was a nonrandomized study of pregnant people. Generalizing these results to your situation would not be advisable, but I love that you are so interested in my work! Thanks, Dad!
1. The interventions implemented in 2016 improved critical metrics but still left gaps between the vital sign checks every 4 to 8 hours, in which patient deterioration was missed.
3. The use of continuous monitoring is not associated with improved patient surveillance and early identification of patient deterioration.
5. Sampling bias could be introduced. By eliminating the subjects where there was an issue with the equipment, the results may be skewed toward showing a greater effect than what was actually there while also masking a limitation of the impact of poorly functioning equipment.
7. A 53% decline (55/572 to 26/547), chi-square = 10.3931, p < 0.006. Yes, it was significant; reject the null.
9. A type II error, likely due to inadequate sample size to detect the effect size
11. Unable to determine because of the short time frame of the intervention period and the limited number of patient transfers to the ICU, but trends positive for financial savings
Answers to Data Analysis Application questions can be found in the Instructor Resource package accompanying this text.