Test Characteristics

A. Definitions
[Figure] "Sensitivity and Specificity"

  1. Sensitivity (SEN)
    1. Ability of a test to detect the disease
    2. SEN = TP/(TP+FN), where TP = true positives and FN = false negatives (true positives divided by all patients with disease)
  2. Specificity (SPE)
    1. Ability of a test to rule out the disease (absence of disease)
    2. SPE = TN/(TN+FP), where TN = true negatives and FP = false positives (true negatives divided by all patients without disease)
  3. Predictive Values: Positive (PPV) and Negative (NPV)
  4. PPV
    1. Likelihood that a positive test indicates the presence of disease
    2. PPV = TP/(TP+FP)
    3. PPV = (prevalence x SEN)/[(prevalence x SEN) + (1-prevalence) x (1-SPE)]
  5. NPV
    1. Likelihood that a negative test indicates the absence of disease
    2. NPV = TN/(TN+FN)
  6. Prevalence
    1. The number of persons with a disease divided by the population being studied
    2. Prevalence is the same as the pretest probability of disease
    3. Also, the proportion of patients who have the target disorder before the test is carried out
    4. Calculated as (TP+FN)/(TP+FN+FP+TN)
  7. Post-Test Probability
    1. Proportion of patients with that particular test result who have the target disorder
    2. Calculated as post-test odds/(1+post-test odds)
  8. Pretest Odds
    1. Odds that the patient has the target disorder before test is carried out
    2. Calculated as pretest probability/(1-pretest probability)
  9. Post-Test Odds
    1. Odds that the patient has the target disorder after the test is carried out
    2. Calculated as pretest odds x likelihood ratio (see below)
  10. Accuracy
    1. Overall, how good a test is (both positive and negative results together)
    2. Calculated as (TN+TP)/Total Number of Subjects = (TN+TP)/(TP+FN+FP+TN)
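
The definitions above can be collected into a small Python sketch. It assumes a generic 2x2 table of counts (tp, fn, fp, tn are my labels for true positives, false negatives, false positives, and true negatives); the sample counts used at the end are the ones worked out in Example 1 below.

    def test_characteristics(tp, fn, fp, tn):
        """Compute the section A metrics from a 2x2 table of counts."""
        total = tp + fn + fp + tn
        sen = tp / (tp + fn)                 # sensitivity: ability to detect disease
        spe = tn / (tn + fp)                 # specificity: ability to rule out disease
        prevalence = (tp + fn) / total       # pretest probability of disease
        return {
            "SEN": sen,
            "SPE": spe,
            "PPV": tp / (tp + fp),           # positive predictive value
            "NPV": tn / (tn + fn),           # negative predictive value
            "prevalence": prevalence,
            "pretest odds": prevalence / (1 - prevalence),
            "accuracy": (tp + tn) / total,   # overall correctness
        }

    # Counts corresponding to Example 1 below (prevalence 40%, SEN 98%, SPE 99%)
    print(test_characteristics(tp=392, fn=8, fp=6, tn=594))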

B. Use of Test Characteristics

  1. Tests are imprecise tools which may be used to detect and/or rule out diseases
  2. The characteristics defined above can (and should) help determine whether a test should be done (that is, whether it will alter the management of the patient)
  3. Test Results and Patient Care [2]
    1. New tests should add independent information about risk or prognosis
    2. A good new measure should account for a substantial proportion (>1.2X) of the outcome or risk
    3. Measure should be reproducible
    4. As a diagnostic test, the measure should have high SEN and SPE, and a high PPV and/or NPV
  4. SEN and SPE are purely characteristics of a given test and the cutoff points (for positive and negative test values) chosen
  5. Disease Prevalence and Testing
    1. The NPV and PPV depend on the prevalence of a disease in the population studied [3]
    2. Even highly sensitive and specific tests can be difficult to interpret in low-prevalence groups
    3. Thus, one must either know or guess the disease prevalence in the population to which the patient belongs before deciding whether to do the test [3]
  6. Accuracy is a global measure of the value of a diagnostic test
    1. The efficiency of a test at correctly detecting a disease depends on cutoffs chosen
    2. Cutoffs can be chosen sequentially, and the SEN and SPE plotted
    3. This generates a receiver operating characteristic (ROC) curve (see the sketch after this list)
  7. ROC
    [Figure] "Receiver Operating Curve"
    1. For a test which is no better than chance, the ROC is a diagonal line
    2. Better tests have curves close to upper left corner in standard plot
    3. The most efficient point for a test can be determined from ROC by finding the maximal distance from the diagonal (chance) line to the actual operating curve
  8. Test characteristics are different for screening versus prognostic tests
    1. Screening tests must have a minimum of false positives [4]
    2. This is because screening tests are done on asymptomatic patients
    3. False positives (low specificity) lead to many unwarranted followup tests
    4. False negatives are concerning, but tests can be repeated at given intervals
    5. Prognostic tests must have minimum of false negatives
  9. Test morbidity (cost) must be weighed against the potential gain in knowledge for each individual patient
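
As an illustration of item 7, the sketch below sweeps cutoffs over two invented score distributions (diseased patients scoring higher on average), which traces out an ROC curve, and then reports the cutoff whose point lies farthest above the chance diagonal (equivalently, the maximum of Youden's J = SEN + SPE - 1). The score distributions are hypothetical, chosen only to make the example run.

    import random

    random.seed(0)
    # Hypothetical continuous test scores: diseased patients tend to score higher
    diseased = [random.gauss(2.0, 1.0) for _ in range(500)]
    healthy = [random.gauss(0.0, 1.0) for _ in range(500)]

    def sen_spe(cutoff):
        """SEN and SPE if 'score >= cutoff' is called a positive test."""
        sen = sum(s >= cutoff for s in diseased) / len(diseased)
        spe = sum(s < cutoff for s in healthy) / len(healthy)
        return sen, spe

    # Sweep cutoffs sequentially and keep (cutoff, SEN, SPE) for each; plotting
    # SEN against (1 - SPE) would give the ROC curve described above.
    roc = [(c / 10, *sen_spe(c / 10)) for c in range(-30, 51)]

    # The most efficient cutoff: maximal distance above the diagonal (chance) line
    best = max(roc, key=lambda point: point[1] + point[2] - 1)
    print("cutoff %.1f  SEN %.2f  SPE %.2f" % best)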

C. Likelihood Ratios [7]
[Figure] "Sensitivity and Specificity"

  1. LR is ratio of probability of test result among patients with target disorder to probability of that same test result among patients without the disorder
    1. Abnormal test results should be much more common in ill persons compared with healthy
    2. Normal test results should be much more common in healthy than ill persons
    3. Likelihood ratios near unity (1) are of little value
    4. Very high or low LRs are helpful in clinical decision making
  2. Positive LR (LR for positive test) is calculated as SEN/(1-SPE)
  3. Negative LR (LR for negative test) is calculated as (1-SEN)/SPE
  4. Bayes Factor (BF) is a likelihood ratio (LR) [5]
    1. Comparison of how well two hypotheses predict the data
    2. BF = (probability of the data under the null hypothesis) ÷ (probability of the data under the alternative hypothesis)
    3. These BF are really "pretest probabilities" that a disease is present or not
    4. They can be used to "contextualize" a given test
    5. High disease prevalence populations have higher BF than low disease populations
    6. Both positive and negative LRs can be calculated
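
A short sketch of the likelihood-ratio arithmetic (items 2-3 above, combined with the odds conversions from section A). The 70%/90% test characteristics and the 50% pretest probability are taken from the ETT example in section G below.

    def likelihood_ratios(sen, spe):
        """Positive LR = SEN/(1 - SPE); negative LR = (1 - SEN)/SPE."""
        return sen / (1 - spe), (1 - sen) / spe

    def post_test_probability(pretest_prob, lr):
        """Pretest probability -> pretest odds -> post-test odds -> probability."""
        pretest_odds = pretest_prob / (1 - pretest_prob)
        post_test_odds = pretest_odds * lr
        return post_test_odds / (1 + post_test_odds)

    lr_pos, lr_neg = likelihood_ratios(sen=0.70, spe=0.90)    # 7.0 and ~0.33
    print(post_test_probability(0.50, lr_pos))                # ~0.875 after a positive test
    print(post_test_probability(0.50, lr_neg))                # ~0.25 after a negative test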

D. Example 1: High Disease Prevalence

  1. Disease prevalence 40% (such as COPD in a group of long-term smokers)
  2. Test is 98% sensitive, 99% specific (such as pulmonary function tests)
  3. Of 1000 long-term smokers, 400 will actually have the disease (40% prevalence)
  4. The test will detect 392 of the 400 with the disease (98% sensitivity)
  5. The test will incorrectly label 1% of the 600 without disease (6 patients) as positive (99% specificity)
  6. The PPV = 392 ÷ (392 + 6) ≈ 98%
  7. The NPV = 594 ÷ (594 + 8) ≈ 99%
  8. Comments
    1. This is a very good test
    2. The prevalence of the disease was high
    3. Doing the test did not change our prediction very much in terms of number with disease
    4. Lyme Disease testing provides an excellent example where PPV and NPV are critical [5]
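
A minimal sketch that works this example's hypothetical cohort of 1000 patients through the test and recovers the PPV and NPV; the function name predictive_values is my own.

    def predictive_values(prev, sen, spe, n=1000):
        """Push a cohort of n patients through a test with the given characteristics."""
        diseased = n * prev
        healthy = n - diseased
        tp = diseased * sen              # cases detected
        fn = diseased - tp               # cases missed
        tn = healthy * spe               # correctly ruled out
        fp = healthy - tn                # false alarms
        return tp / (tp + fp), tn / (tn + fn)     # (PPV, NPV)

    ppv, npv = predictive_values(prev=0.40, sen=0.98, spe=0.99)
    print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")    # 98.5% and 98.7%, i.e. ~98% and ~99%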

E. Example 2: Low Disease Prevalence

  1. Disease prevalence 2% (such as COPD in a group of non-smokers from suburban areas)
  2. Same test characteristics as above (98% sensitive, 99% specific)
  3. Of 1000 non-smokers, 20 will actually have the disease (2% prevalence)
  4. The test will detect essentially all of the patients with disease (19.6 of the 20)
  5. The test will incorrectly label 1% of 980 patients w/o disease as positive (~10 patients)
  6. The PPV = 19.6 ÷ (19.6 + 9.8) ≈ 67%
  7. The NPV ~ 100%
  8. Comments
    1. Although this is a very good test, the PPV is relatively poor because the disease is not prevalent
    2. Even when a positive result is obtained with this test, we aren't sure if it's real
    3. The lower the disease prevalence, the better the test must be to be useful
    4. For very rare diseases, calculation of the PPV can be problematic [6]
    5. Bayesian approach can be used to estimate the PPV for tests for such rare diseases
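
The same point can be made with the prevalence form of the PPV from section A, item 4 (the analogous NPV rearrangement is not written out in section A but follows the same pattern). Sweeping the prevalence shows the PPV collapsing for rarer diseases even though the test itself is unchanged; the 0.1% row is an added illustration, not from the text.

    def ppv(prev, sen, spe):
        """PPV = (prev x SEN) / [(prev x SEN) + (1 - prev) x (1 - SPE)]."""
        return (prev * sen) / (prev * sen + (1 - prev) * (1 - spe))

    def npv(prev, sen, spe):
        """NPV = ((1 - prev) x SPE) / [((1 - prev) x SPE) + prev x (1 - SEN)]."""
        return ((1 - prev) * spe) / ((1 - prev) * spe + prev * (1 - sen))

    for prev in (0.40, 0.02, 0.001):     # high prevalence, low prevalence, very rare
        print(f"prevalence {prev:.1%}: PPV {ppv(prev, 0.98, 0.99):.1%}, "
              f"NPV {npv(prev, 0.98, 0.99):.2%}")
    # prevalence 40.0%: PPV 98.5%, NPV 98.67%
    # prevalence 2.0%: PPV 66.7%, NPV 99.96%
    # prevalence 0.1%: PPV 8.9%, NPV 100.00%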

F. Using Tests to Alter Further Diagnostics [1]

  1. In general, tests should be ordered only if the result will alter the patient's management or the assessment of prognosis
    1. How will the results of this test change what you do?
    2. How likely is it that the test result will change your risk assessment of disease and/or the prognosis of this patient?
    3. What is the risk to the patient from the test, and does the potential benefit justify that risk?
  2. If one is already certain of the diagnosis, a mediocre-quality test can only decrease that certainty
  3. Of course, some tests are also used to quantify the extent of disease in addition to serving as diagnostics
  4. In the next example, we consider a common problem:
    1. What is the likelihood that the patient has coronary artery disease (CAD)?
    2. The ETT (± Thallium) is often used to help assess the likelihood of CAD
    3. A positive ETT often prompts coronary angiography
    4. Is this a reasonable conclusion? When should an ETT be done in the first place?

G. Example: Mediocre Test in Different Prevalence Populations

  1. Consider an Exercise Treadmill Test (ETT) with Thallium Test to rule out CAD
  2. Sensitivity ~70%, Specificity ~90% (sensitivity lower in women)
  3. Now let's take 3 different patients
    1. Patient A - pre-test probability of CAD 20%
    2. Patient B - pre-test probability of CAD 50%
    3. Patient C - pre-test probability of CAD 80%
  4. The pre-test probability of these patients is equivalent to the "Prevalence of disease" in the population to which the patient belongs.
  5. Calculation of the post-test (ETT) likelihood of disease for each patient (a worked sketch follows this list)
    1. Patient A: positive ETT/Thallium ~64%, negative ETT/Thallium ~8%
    2. Patient B: positive ~88%, negative ~25%
    3. Patient C: positive ~97%, negative ~57%
    4. Note: the positive-test values are the PPV; the negative-test values are 1 - NPV, the probability of disease despite a negative test
  6. The above information suggests the following:
    1. For patient A and a positive test, we're still not sure whether or not disease is present
    2. For patient A and a negative test, we've confirmed our suspicion: CAD isn't present
    3. For patient C, regardless of the test result, we still have a high suspicion for CAD
    4. The test is most helpful for Patient B, whom we were unsure about to begin with
    5. Conclude that the test is most helpful in patients with a moderate suspicion of disease
    6. If a disease is highly suspected, then one generally needs to assess the degree (extent) of the disease, rather than presence or absence
  7. Similar issues arise in multiple areas, for example with Lyme ELISA testing [3]
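
A sketch that reproduces the three post-test probabilities using the likelihood-ratio machinery from section C, under the stated SEN ~70% / SPE ~90%. Note that a negative test leaves patient C at roughly 57%, which is why the result does not change management for that patient.

    def post_test_prob(pretest, lr):
        """Pretest probability -> odds, multiply by LR, convert back to probability."""
        post_odds = pretest / (1 - pretest) * lr
        return post_odds / (1 + post_odds)

    SEN, SPE = 0.70, 0.90                             # ETT/Thallium characteristics above
    LR_POS, LR_NEG = SEN / (1 - SPE), (1 - SEN) / SPE

    for name, pretest in (("A", 0.20), ("B", 0.50), ("C", 0.80)):
        pos = post_test_prob(pretest, LR_POS)
        neg = post_test_prob(pretest, LR_NEG)
        print(f"Patient {name}: positive ETT ~{pos:.0%}, negative ETT ~{neg:.0%}")
    # Patient A: positive ETT ~64%, negative ETT ~8%
    # Patient B: positive ETT ~88%, negative ETT ~25%
    # Patient C: positive ETT ~97%, negative ETT ~57%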

H. Conclusions

  1. Knowledge of test characteristics, along with a pre-test likelihood of a condition, allows for a prediction of whether the test will help decide whether the patient has the condition
  2. Strategies for responding to either a positive or a negative test result should be discussed before the test is ordered
  3. Test characteristics must be evaluated in setting of test use (screening or other) [4]


References

  1. Sox HC. 1996. Annu Rev Med. 47:463
  2. Manolio T. 2003. NEJM. 349(17):1587
  3. Tugwell P, Dennis DT, Weinstein A, et al. 1997. Ann Intern Med. 127(12):1109
  4. Barratt M, Irwig L, Glasziou P, et al. 1999. JAMA. 281(21):2029
  5. Goodman SN. 1999. Ann Intern Med. 130(12):1005
  6. Smith JE, Winkler RL, Fryback DG. 2000. Ann Intern Med. 132(10):804
  7. Grimes DA and Schulz KF. 2005. Lancet. 365(9469):1500