From this point on, we will be building on earlier principles. In this chapter, we will move on to statistical concepts related to measuring factors that are of interest to nurse clinicians and investigators. This chapter will prepare you to:
Investigators define variables and collect data to answer their research questions. A variable is a trait or characteristic that varies or changes, and data are the values of variables when they vary. For example, systolic blood pressure is a variable because it is a characteristic that fluctuates, both from one person to another and at different times within the same person (Figure 3-1). Each blood pressure measurement is a data value. A collection of these data values is a data set.
Figure 3-1. Variables and data. A diagram shows the variable (systolic blood pressure), a single data value (120), and a data set containing the values 120, 130, 112, 101, and 160.
Investigators classify variables according to a variety of characteristics that suggest how variables may be used in research and evidence-based practice studies. Variables may be qualitative or quantitative. Qualitative variables have values that are nonnumeric, and quantitative variables have values that are numeric. Systolic blood pressure may be either qualitative—high, normal, or low—or quantitative—120 mmHg. In this text, we will confine ourselves to numeric variables, as these are amenable to statistical analyses.
Numeric variables may be either discrete or continuous. Discrete variables have values that are countable but do not include the fractions between countable categories. Continuous variables can take every possible value on a continuum. Gender can be counted (for example, there are 10 women in a waiting area) and is an example of a discrete variable because, in the real world, there is no such thing as 10.5 women. In contrast, systolic blood pressure ranging from 0 to 200 mmHg is an example of a continuous variable because we could measure the pressure anywhere between 0 and 200 mmHg. Conceivably, we could measure systolic blood pressure to the nearest hundredth mmHg, for example 120.05 mmHg.
Variables can also be characterized as independent or dependent when an investigator is investigating the interaction between variables for statistical hypothesis testing. The variable that is manipulated by the investigator or affects another variable is the independent variable. The dependent variable is affected by an independent variable. For example, suppose an investigator is examining whether hepatitis B antigen affects liver function test results. The presence or absence of the hepatitis B antigen is the independent variable, as it affects the liver function test results, and the liver function test result is the dependent variable, as it is affected by hepatitis B antigen.
Consider another example. An investigator is studying the effectiveness of a newly developed medicine to treat constipation. The investigator devises an experiment in which the treatment group receives the new medication and the control group receives a placebo. The investigator measures the number of days between taking the drug and the first bowel movement among participants in both control and treatment groups. Here, the group assignment—treatment or control—is the independent variable because it is manipulated by the investigator and it affects the length of time until the first bowel movement. The number of days until the first bowel movement is the dependent variable because it is affected by the group assignment or whether the participant received the new drug or the placebo.
In the previous two examples, it seems clear how independent variables differ from dependent variables. However, determining the number of independent variables in a study can be confusing. Suppose an investigator is studying the licensure examination passing rates of graduating nurses from the classes 2018, 2019, and 2020 at a public university. In this case, the graduating class is the independent or grouping variable and the passing rate is a dependent variable. However, it can initially seem as if the three graduating classes are three different independent variables, whereas the graduating class is actually a single independent variable with three levels (Figure 3-2).
Figure 3-2. Types of variables. A diagram shows the three graduating classes represented either as three separate independent variables or as a single independent variable with three levels.
Evaluating the Use of Variables: Has the investigator explicitly identified and defined the variables in the study?
In a different example, we can see that defining the independent and dependent variables is related to how the investigator postulates the relationship between the variables. An investigator is studying the relationship between social support and quality of life in older adults in assisted-living environments. The investigator hypothesizes that strong social support influences the quality of life of these older adults. In this example, it seems clear that social support is the independent variable and quality of life is the dependent variable. However, the investigator could propose the equally valid hypothesis that quality of life influences social support, in which case the independent and dependent variables are reversed. In high-quality studies, the investigator provides a logical argument for defining the independent and dependent variables and the hypothesized relationship between them. The nurse using research for evidence-based practice must know how to evaluate such arguments and decide on the legitimacy of the approach used.
Understanding what variables are, how they are classified, and how they are related to one another is crucial in deciding what statistical method is appropriate for analyzing data. Regularly using statistics is an important part of becoming comfortable with evidence-based practice, and nurses in advanced practice and leadership roles will have to work with many examples to become proficient.
Wound healing is an important indicator of nursing care across a variety of settings. Let us consider an example where we are interested in implementing a wound-healing intervention based on research evidence found in an article. The authors report that wounds healed 50% faster with the intervention than with another commonly used treatment. To evaluate the effectiveness of this intervention (independent variable), we need to know how wound healing (the dependent variable) was measured to determine if the new intervention is an improvement over other approaches to treatment.
After defining the variables of interest, the investigator must think about how to measure the variables. There are four common levels of measurement: nominal, ordinal, interval, and ratio (Table 3-1). It is important to understand the level of measurement because different statistical procedures require different levels of measurement on the variables of interest. Measurement is also important for application of research to practice. Let us discuss each level of measurement, one by one, and see how they differ from each other.
Table 3-1. Examples of the Four Levels of Measurement

Nominal | Ordinal | Interval | Ratio |
---|---|---|---|
Gender | Pain scale (0‐10) | Temperature | Age |
Ethnicity | Age groups (18‐25, 26‐35, etc.) | IQ | Height |
ZIP code | Grade (A, B, C, D, and F) | SAT score | Weight |
Religious affiliation | Histological rating (-, ±, +, ++, +++) | Depression score | Blood pressure |
Medical diagnosis | Patient satisfaction scale (poor, acceptable, good) | Time of day | Years of work experience |
Names of medicines | Nurse performance (below average, average, above average) | Dates (years) | Time to complete a task |
In nominal level of measurement, data are classified into mutually exclusive categories, and no ranking or ordering is imposed on the categories. The word nominal simply means to name or categorize. Common examples at this level of measurement are gender and ethnicity; an investigator can classify subjects by gender (e.g., men, women, or transgender) and by ethnic group (e.g., Caucasian, African American, Asian, or Hispanic). However, no ranking or ordering can be imposed on any of those categories, as we cannot say that one gender or ethnic group is superior or inferior to, or more or less than, the other groups. Other examples of nominal level of measurement include religious affiliation (e.g., Christian, Catholic, or Buddhist), political party affiliation (e.g., Democrat, Independent, or Republican), and hair color (e.g., black, brown, or blond). Nominal measurement is often used in health-related research to characterize a wide variety of variables, such as treatment results (improvement or recurrence) and signs and symptoms (present or not present).
In ordinal level of measurement, data are also classified into mutually exclusive categories; however, a ranking or ordering is imposed on the categories. A common example at this level of measurement is grouped age. People can be categorized into one of the following groups: (1) 18 and under, (2) 19–30, (3) 31–49, and (4) 50 and above. Here, we not only have distinct categories with no overlap (mutually exclusive categories), but there is also a clear ranking or ordering among the categories. People in category (2) are older than those in category (1) but younger than those in categories (3) and (4). Other examples of ordinal level of measurement include letter grades (A, B, C, D, F), Likert-type scales (strongly disagree, somewhat disagree, neutral, somewhat agree, strongly agree), ranking in a race (first, second, third, etc.), and histological ratings (-, ±, +, ++, +++). A common pain scale, ranging from 0 to 10, is a good example of ordinal level of measurement in health care, where 0 means no pain and 10 means severe pain. Although these data can be ordered, we cannot accurately determine the distance between two categories. That is, we cannot say that the interval between 1 and 2 on a pain scale is exactly the same as the interval between 3 and 4.
In interval level of measurement, data are classified into categories with rankings and are mutually exclusive as in ordinal level of measurement. In addition, specific meanings are applied to the distances between categories. These distances are assumed to be equal and can be measured. Temperature, for example, is measured on categories with equal distance, and any value is possible; the distance or interval between 35°F and 40°F is the same as the distance or interval between 55°F and 60°F. However, in interval level of measurement, there is no absolute value of “zero.” Zero degrees Fahrenheit is different from zero degrees Celsius. Therefore, there is no absolute or unconditional meaning of zero. In addition, we cannot say 25°F is three times as cold as 75°F—that is, there is no concept of ratio, or equal proportion, in interval level of measurement. Other examples of interval level of measurement include standardized tests such as intelligence quotient (IQ) or educational achievement tests. In health care, we often use interval level of measurement for clinical purposes.
In ratio level of measurement, all characteristics of interval level of measurements are present; in addition, there is a meaningful zero, and ratio or equal proportion is present. For example, income is measured on scales with equal distance and a meaningful zero. The measurement of income for someone will be zero if they have no source of livelihood. We may also say that someone making $60,000 a year makes exactly twice as much as someone making $30,000 a year. Blood pressure is another example of ratio level of measurement, as it is possible to have a blood pressure of zero, and a systolic pressure of 100 mmHg is twice that of 50 mmHg. Other examples of ratio level of measurement are age, height, and weight.
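To make these distinctions concrete, the following is a minimal sketch, assuming Python with the pandas library, of how the four levels of measurement might be represented when a data set is prepared for analysis. The column names and values are illustrative only and are not drawn from a real study.

```python
import pandas as pd

# Illustrative data set with one column per level of measurement
df = pd.DataFrame({
    "gender": ["female", "male", "female"],      # nominal: named categories, no order
    "pain": [2, 7, 5],                           # ordinal: ranked 0-10, unequal intervals
    "temperature_f": [98.6, 101.2, 99.1],        # interval: equal distances, no true zero
    "systolic_bp": [120, 160, 101],              # ratio: true zero, ratios are meaningful
})

# Nominal: an unordered categorical variable
df["gender"] = pd.Categorical(df["gender"])

# Ordinal: an ordered categorical variable (0-10 pain scale)
df["pain"] = pd.Categorical(df["pain"], categories=list(range(11)), ordered=True)

# Ratio: statements of proportion make sense (160 is one-third higher than 120)
print(df["systolic_bp"].iloc[1] / df["systolic_bp"].iloc[0])
```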
Level of measurement is important because it directs what statistical tests may be used to analyze the data sets collected by the investigator. Clinicians who understand the relationship between level of measurement and choice of statistical test are able to evaluate the strength of any given study. Table 3-2 gives some examples of statistical tests per level of measurement.
Table 3-2. Examples of Statistical Tests by Level of Measurement

Independent Variable | Dependent Variable | Statistical Test to Be Utilized |
---|---|---|
Nominal (control/patient) | Ratio (systolic blood pressure) | Independent sample t-test, paired sample t-test |
Nominal or ordinal (low/middle/high systolic blood pressure) | Ratio (liver function) | One-way analysis of variance (ANOVA) |
More than one nominal or ordinal (systolic blood pressure group + gender) | Ratio (liver function) | Factorial ANOVA |
Nominal (control/patient) | Nominal (nondiabetic/diabetic) | Chi-square test of association |
Nominal + ratio (control/patient + age) | Nominal (nondiabetic/diabetic) | Logistic regression |
Ratio (weight) | Ratio (systolic blood pressure) | Correlation, simple linear regression, multiple linear regression (if more than one independent variable) |
Nominal with ratio to control (control/patient with age to control) | Ratio (systolic blood pressure) | Analysis of covariance (ANCOVA) |
One or more nominal or ordinal (systolic blood pressure group) | More than one ratio (liver function + depression) | Multivariate analysis of variance (MANOVA) |
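To give a feel for how two of the tests in Table 3-2 are carried out in practice, here is a minimal sketch assuming Python with SciPy. The group labels, blood pressure values, and counts are invented purely for illustration and do not come from any study cited in this chapter.

```python
from scipy import stats

# Nominal independent variable (control vs. patient) and ratio dependent
# variable (systolic blood pressure): independent-samples t-test.
control_bp = [118, 122, 115, 130, 121]
patient_bp = [135, 142, 128, 150, 138]
t_stat, p_value = stats.ttest_ind(control_bp, patient_bp)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Nominal independent and dependent variables (group vs. diabetic status):
# chi-square test of association on a 2 x 2 contingency table.
table = [[30, 10],   # control group: nondiabetic, diabetic
         [18, 22]]   # patient group: nondiabetic, diabetic
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

The other tests listed in the table follow the same pattern: the levels of measurement of the independent and dependent variables determine which procedure is appropriate.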
There are times when an investigator may choose to reclassify or transform a variable's level of measurement. For example, blood pressure that is originally measured at the ratio level may be transformed to an ordinal level of measurement if the investigator categorizes the blood pressure in intervals of 40 (i.e., 0–40, 41–80, 81–120, and 121–160), or transforms it into categories of "high" and "low." There may be sound reasons for such transformations, such as wanting to make comparisons between people in those categories. However, transformation from a higher level of measurement to a lower level (ratio to ordinal, for example) has two consequences. First, it always results in the loss of information, as everyone with blood pressure between 41 and 80 is categorized into a single group. Second, it limits analysis to those statistical tests appropriate for categorical measurements. To return to our earlier discussion of variables, variables measured at the nominal and ordinal levels of measurement are discrete or categorical variables, and those measured at the interval and ratio levels of measurement are continuous variables.
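A minimal sketch of such a transformation, assuming Python with pandas, is shown below; the cut points and labels follow the blood pressure example above, and the individual values are invented for illustration.

```python
import pandas as pd

# Ratio-level systolic blood pressure values (illustrative)
bp = pd.Series([72, 101, 118, 135, 150])

# Transform to ordinal categories in intervals of 40
bp_ordinal = pd.cut(bp, bins=[0, 40, 80, 120, 160],
                    labels=["0-40", "41-80", "81-120", "121-160"])

# Or collapse further into "low" and "high"; either way, the exact values are lost
bp_high_low = pd.cut(bp, bins=[0, 120, 300], labels=["low", "high"])

print(bp_ordinal.tolist())
print(bp_high_low.tolist())
```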
In addition to determining what level of measurement will be needed for any given variable, the investigator designing a research or evidence-based practice study needs to choose the best measurement tools. A tool or instrument is a device for measuring variables. Examples include paper-and-pencil surveys or tests, scales for measuring weight, and an eye chart for estimating visual acuity. There may be one or more measurement tools available for the variables of interest, or there may be none, in which case a tool will need to be created. Whether you use an existing tool or create one, you should make sure that the measurement tool is the best approach for measuring the variable of interest. Strong instruments will reduce the likelihood of measurement error.
Measurement error is the difference between the measured value and the true value. Measurement error is unavoidable in research and can be either random or systematic. Systematic errors occur consistently because of known causes, whereas random errors occur by chance and are the result of unknown causes. One common source of systematic error is the incorrect use of tools or instruments. For example, suppose that you need to measure depression in older adults, and you find an instrument, the Beck Depression Inventory (BDI) (Beck et al., 1961). Would you start measuring older adults' depression levels using the BDI right away? Probably not! First, you would want to make sure that the BDI is a good measurement tool for assessing depression in older adults. The concepts of reliability and validity help us to make that decision.
Reliability tells us whether or not a test or tool can consistently measure a variable. If a patient scores 35 on the BDI over and over again, the BDI is reliable, because it is measuring depression consistently at different times. Whether you are engaged in research, evidence-based practice, quality improvement, or process improvement, choosing a dependable measurement tool is important.
There are three commonly used statistical evaluations of reliability, and they are all correlation coefficients: internal consistency, test–retest reliability, and interrater reliability. Internal consistency is used to measure whether items within a tool, such as a depression scale, measure the same thing (i.e., are they consistent with one another?). Cronbach's alpha, the most commonly used coefficient, ranges from 0 to 1, with a higher coefficient indicating that the items are consistently measuring the same variable. Note that Cronbach's alpha is normally used when the level of measurement is interval or ratio, but the Kuder–Richardson (KR-20) coefficient is used when the level of measurement is nominal or ordinal. Test–retest reliability is used to address the consistency of the measurement from one time to another. If the tool is reliable, the subjects' scores should be similar at different times of measurement. Investigators commonly correlate measurements taken at different times to see if they are consistent. The higher the correlation coefficient, the stronger the test–retest reliability. Interrater reliability is used to determine the degree of agreement between individuals' scores on ratings (i.e., are they giving consistent ratings?). Cohen's kappa is commonly used and ranges from 0 to 1, with a coefficient of 1 signifying perfect agreement. For example, pressure ulcers are often scored on a scale reflecting depth, area, color, and drainage. If two nurses are using a rating scale to score the seriousness of pressure ulcers, we would want to know how consistent the scores are between the nurses. Ideally, both nurses would score the same pressure ulcer very closely.
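For readers who want to see what these coefficients compare, here is a minimal sketch assuming Python with NumPy and scikit-learn. The item responses, scores, and nurse ratings are invented for illustration, and Cronbach's alpha is computed directly from its standard formula rather than from any particular software routine.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Internal consistency: Cronbach's alpha for a 4-item scale (rows = respondents)
items = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])
k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)
total_variance = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")

# Test-retest reliability: correlate scores from two administrations
time1 = np.array([10, 14, 9, 20, 16])
time2 = np.array([11, 13, 10, 19, 17])
print(f"Test-retest r = {np.corrcoef(time1, time2)[0, 1]:.2f}")

# Interrater reliability: Cohen's kappa for two nurses rating the same ulcers
nurse_a = ["stage 2", "stage 3", "stage 2", "stage 1", "stage 3"]
nurse_b = ["stage 2", "stage 3", "stage 3", "stage 1", "stage 3"]
print(f"Cohen's kappa = {cohen_kappa_score(nurse_a, nurse_b):.2f}")
```

In practice these coefficients are usually reported by statistical software; the sketch is only meant to show what each one is comparing.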
Instrument reliability is influenced by several factors. For surveys or inventories such as the BDI, the length of the tool influences reliability. The shorter the tool is, the less reliable it will be because it will be more difficult to include all aspects of the variable under study. The second factor is the clarity of expression of each question or item. Confusing questions/items introduce measurement error. The third factor is the time allowed for completing the measurement tool. If the investigator does not allow participants enough time, reliability will decline. The fourth factor is the condition of test takers on the measurement day. If a test taker is ill, tired, or distracted, these conditions can negatively affect the reliability of the measurement tool. The fifth factor is the difficulty of the measurement tool. If the tool is not designed at an appropriate level for the target audience, it can affect the reliability both positively and negatively; reliability will be inflated if the tool was too easy for the target audience, and deflated if the tool was too difficult for the target audience. Lastly, the investigator must consider the homogeneity of the subjects, that is, how similar the participants in a study are to one another. If the subjects in a group are very alike, they will tend to respond similarly to the instrument and produce similar scores, resulting in high reliability. If the subjects in a group are heterogeneous, their scores will range widely, and reliability will be lower.
Reliability in its simplest form may be thought of as consistency or stability, and this is an important element to consider when choosing a measurement tool. However, consistency/reliability does not imply accuracy. For example, if we have a thermometer that consistently measures temperature two degrees above the actual temperature, it is a reliable thermometer, but not particularly accurate. In health care and nursing, accuracy of measurement is very important, and instruments must be assessed for this element in reports of research.
Validity tells us whether a tool, an instrument, or a scale measures the variable that it is supposed to measure. There are three main types of validity: content validity, criterion-related validity, and construct validity. Content validity has to do with whether a measurement tool measures all aspects of the idea of interest. For example, the BDI would not be a valid measure of depression if it did not include somatic symptoms of depression. Criterion-related validity is about how well a tool is related to a particular standard or benchmark. For example, suppose you wanted to validate the usefulness of a new depression scale, the Patient Health Questionnaire depression scale (PHQ-9) (Kroenke & Spitzer, 2002). An investigator might administer both the BDI, a known and accurate measure of depression, and the new instrument, the PHQ-9, to a group of patients with depression and compute a correlation coefficient for the two scales. A strong association between the BDI and the new PHQ-9 would establish the criterion-related validity of the PHQ-9. Construct validity is the extent to which scores of a measurement tool are correlated with the construct that we wish to measure. A construct may be thought of as an idea or concept. For example, an investigator might ask: "Am I really measuring depression with the BDI, or could it also be measuring anxiety?"
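As an illustration of how criterion-related validity might be examined statistically, here is a minimal sketch assuming Python with SciPy; the BDI and PHQ-9 scores are invented for illustration only.

```python
from scipy import stats

# Scores from the same (hypothetical) patients on the established scale (BDI)
# and the newer scale (PHQ-9)
bdi_scores = [12, 25, 8, 30, 18, 22, 5, 27]
phq9_scores = [9, 18, 6, 21, 13, 16, 4, 20]

# A strong correlation supports the criterion-related validity of the new scale
r, p_value = stats.pearsonr(bdi_scores, phq9_scores)
print(f"r = {r:.2f}, p = {p_value:.3f}")
```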
The quality of measurement in a study has a direct influence on the strength of the findings or inferences that we can make. Internal validity is the extent to which we can say with any certainty that the independent and dependent variables are related to each other. The strength of the internal validity of a study is often evaluated based on whether there is any uncontrolled or confounding variable that may influence this relationship between independent and dependent variables. Such confounding variables may include outside events that happened during the study, in addition to the variable under study. These confounding events can actually cause a change in scores or measurement and result in less accurate findings. Changes in the participants because of aging or history may also introduce an element of inaccuracy. In longitudinal studies, for example, past experience with the measurement tool may also confound results, as merely having been exposed to the tool previously may influence the performance of the subjects on the later measurements. The choice of sampling, random or nonrandom, will also influence internal validity. Random sampling is designed to ensure that all participants are equal in every way, reducing the likelihood of confounding variables influencing the study results. Finally, human beings are prone to change their behavior when they know that they are being studied (i.e., the Hawthorne effect) and this introduces bias that is difficult to quantify and explain.
External validity is about whether the results of a study can be generalized beyond the study itself. Can we make accurate inferences about the population from the sample we have selected? Can we verify with any confidence the hypothesis that we are testing? External validity is influenced by the quality of the sample. If the characteristics of the sample used in the study do not represent the population, the results from the study should not be generalized or inferred to the population. External validity is also influenced by measurement. If our measures are unreliable or inaccurate measures of the variables of interest, then we cannot make useful inferences about those variables.
Reliability and validity are two important concepts to consider whenever using a measurement tool. You should make sure either that the tool used is both reliable and valid, or that its limitations are discussed if the tool is not proved to be reliable or valid. One thing to note is that reliability always precedes validity (Figure 3-3). An instrument can be reliable but not valid, but an instrument cannot be valid without being reliable. Unless you ensure that the tool has the appropriate reliability and validity for your sample, you cannot make inferences following statistical tests.
Figure 3-3. Reliability and validity. Four diagrams depict the difference between reliability and validity.
Evaluating Measures: Are the measures congruent with the variables identified in the study?
Data from Hagerty, B. M., & Bathish, M. A. (2018). Testing the relationship between a self-management intervention for recurrent depression and health outcomes. Applied Nursing Research, 44, 76–81. doi:10.1016/j.apnr.2018.10.001
In 2018, Hagerty and Bathish reported the findings of a correlational, longitudinal study that aimed to test the relationship between a self-management intervention and several health outcomes in adults with major depressive disorder. The self-management intervention, based on self-regulation and metacognitive theory, was delivered to 23 adults over three sessions using a workbook. At a 6-month follow-up, the researchers found that depressive symptoms were significantly reduced, while self-efficacy and quality of life were substantially improved. The researchers used well-known, reliable, and valid instruments to measure the dependent variables: depression (Beck Depression Inventory II), self-efficacy (Self-Efficacy for Managing Chronic Disease Scale, Stanford University), quality of life (Quality of Life Depression Scale), and social support (Medical Outcomes Study Social Support Survey).
This chapter is intended to introduce the essentials of measurement, including the definition of variables and data, different levels of measurement, reliability and validity of the measurements, and how they influence the quality of a study by promoting internal and external validity.
Variables are the characteristics or traits that are assumed to vary, and data are the values of variables. There are four levels of data: nominal, ordinal, interval, and ratio. Nominal and ordinal data are both made up of categorical/discrete variables; however, ordinal data have ranking or ordering between/among those categories, whereas nominal data do not. Interval and ratio data are generated from continuous variables with equal distance between intervals, but only ratio data have a meaningful zero and allow for proportionate understanding of the measure.
Reliability and validity may be thought of as the consistency and the accuracy of any given tool or instrument for measuring variables. Reliability and validity of measures influence the internal validity (strength of the relationship between independent and dependent variables) and the external validity (ability to generalize from sample to population) of a study.
Reproduced with permission from Horne, C. E., Johnson, S., & Crane, P. B. (2019). Comparing comorbidity measures and fatigue post myocardial infarction. Applied Nursing Research, 45, 1–5.
Abstract The purpose of this study was to examine comorbidity measures that may relate to the symptom of fatigue post myocardial infarction (MI): self-reported comorbidities, medication-validated comorbidities, weighted comorbidities for fatigue, and number of comorbidities. Using a cross-sectional design, we interviewed a convenience sample of 98 adults, 65 and older, who were 6 to 8 months post MI. Participants self-reported their comorbidities using a list of 23 comorbid conditions. All medications were visually inspected, and medications were reviewed by a geriatric pharmacist for a common side effect of fatigue. The Revised Piper Fatigue Scale was used to measure fatigue. The mean age of the participants was 76 (SD = 6.3), and most of the sample were White (84%). Neither medication-validated comorbidities nor those medications with fatigue as a common side effect explained fatigue. When controlling for age, sex, and marital status, self-reported comorbidities explained 10% of the variance in fatigue (F(4, 93) = 2.65; p = 0.04). Having five or more self-reported comorbidities explained 7% of the variance in fatigue scores (F(1, 96) = 7.53; p = 0.007). Comorbidities are associated with fatigue post MI. Adults post MI with five or more comorbidities should be screened for fatigue.
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561–571.
Hagerty, B. M., & Bathish, M. A. (2018). Testing the relationship between a self-management intervention for recurrent depression and health outcomes. Applied Nursing Research, 44, 76–81. doi:10.1016/j.apnr.2018.10.001
Horne, C. E., Johnson, S., & Crane, P. B. (2019). Comparing comorbidity measures and fatigue post myocardial infarction. Applied Nursing Research, 45, 1–5.
Kroenke, K., & Spitzer, R. L. (2002). The PHQ-9: A new depression diagnostic and severity measure. Psychiatric Annals, 32(9), 509–515.