A. Definitions
[Figure] "Sensitivity and Specificity"
- Sensitivity (SEN)
- Ability of a test to detect the disease
- SEN = True Positives (TP)/All with Disease = TP/(TP+FN), where FN = false negatives
- Specificity (SPE)
- Ability of a test to rule out the disease (absence of disease)
- SPE = True Negatives (TN)/All without Disease = TN/(FP+TN), where FP = false positives
- Predictive Values: Positive (PPV) and Negative (NPV)
- PPV
- Likelihood that a positive test indicates the presence of disease
- PPV = TP/(TP+FP)
- PPV = (prevalence x SEN)/{(prevalence x SEN) + (1-prevalence) x (1-SPE)}
- NPV
- Likelihood that a negative test indicates the absence of disease
- NPV = TN/(TN+FN)
- Prevalence
- The number of persons with a disease divided by the population being studied
- Prevalence is the same as the pretest probability of disease
- Also, the proportion of patients who have the target disorder before the test is carried out
- Calculated as (TP+FN)/(TP+FN+FP+TN)
- Post-Test Probability
- Proportion of patients with that particular test result who have the target disorder
- Calculated as post-test odds/(1+post-test odds)
- Pretest Odds
- Odds that the patient has the target disorder before test is carried out
- Calculated as pretest probability/(1-pretest probability)
- Post-Test Odds
- Odds that the patient has the target disorder after the test is carried out
- Calculated as pretest odds x likelihood ratio (see below)
- Accuracy
- Overall, how good a test is (both positive and negative results together)
- Calculated as (TN+TP)/Total Number of Subjects = (TN+TP)/(TP+FN+FP+TN)
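The definitions above can be collected into one small function. This is a minimal sketch, not a validated tool: the function name `diagnostic_measures` and the example counts are hypothetical, chosen only to show how each measure falls out of the four cells of a 2x2 table.

```python
def diagnostic_measures(tp, fn, fp, tn):
    """Return the section A measures from the four cells of a 2x2 table."""
    total = tp + fn + fp + tn
    return {
        "SEN": tp / (tp + fn),            # true positives / all with disease
        "SPE": tn / (fp + tn),            # true negatives / all without disease
        "PPV": tp / (tp + fp),            # true positives / all positive tests
        "NPV": tn / (tn + fn),            # true negatives / all negative tests
        "prevalence": (tp + fn) / total,  # all with disease / all subjects
        "accuracy": (tp + tn) / total,    # correct results / all subjects
    }

# Hypothetical counts, for illustration only
result = diagnostic_measures(tp=90, fn=10, fp=30, tn=170)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Note that SEN and SPE are each computed within one column of the table (diseased or non-diseased), while PPV and NPV are computed within one row (test positive or test negative), which is why only the latter shift with prevalence.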
B. Use of Test Characteristics
- Tests are imprecise tools which may be used to detect and/or rule out diseases
- The above five characteristics of a test or disease can (should) help determine whether the test should be done (that is, whether it will alter the management of the patient)
- Test Results and Patient Care [2]
- New tests should add independent information about risk or prognosis
- Good new measures should account for a large proportion (>1.2X) of outcome or risk
- Measure should be reproducible
- As diagnostic test, measure should have high SEN, SPE, and high PPV and/or NPV
- SEN and SPE are purely characteristics of a given test and the cutoff points (for positive and negative test values) chosen
- Disease Prevalence and Testing
- The NPV and PPV depend on the prevalence of a disease in the population studied [3]
- Even highly SEN and SPE tests will be difficult to interpret in low prevalence groups
- Thus, one must either know or guess the disease prevalence in the population to which the patient belongs before deciding whether to do the test [3]
- Accuracy is a global measure of the value of a diagnostic test
- The efficiency of a test at correctly detecting a disease depends on cutoffs chosen
- Cutoffs can be varied sequentially, and SEN plotted against (1-SPE) for each cutoff
- This generates a receiver operating characteristic (ROC) curve
- ROC
[Figure] "Receiver Operating Curve"
- For a test which is no better than chance, the ROC is a diagonal line
- Better tests have curves close to upper left corner in standard plot
- The most efficient point for a test can be determined from ROC by finding the maximal distance from the diagonal (chance) line to the actual operating curve
- Test characteristics are different for screening versus prognostic tests
- Screening tests must have a minimum of false positives [4]
- This is because screening tests are done on asymptomatic patients
- False positives (low specificity) lead to many unwarranted follow-up tests
- False negatives are concerning, but tests can be repeated at given intervals
- Prognostic tests must have minimum of false negatives
- Test morbidity (cost) compared with potential gain in knowledge are considered individually
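The ROC construction described above can be sketched in a few lines. This is an illustrative sketch only: the test values are hypothetical, a result at or above the cutoff is assumed to count as positive, and the "most efficient point" is taken as the cutoff that maximizes SEN - (1-SPE), commonly called Youden's index. That is the vertical distance above the chance diagonal, and it picks the same cutoff as the perpendicular distance.

```python
def roc_points(diseased, healthy, cutoffs):
    """For each cutoff return (cutoff, SEN, 1 - SPE); values >= cutoff are positive."""
    points = []
    for c in cutoffs:
        sen = sum(x >= c for x in diseased) / len(diseased)
        spe = sum(x < c for x in healthy) / len(healthy)
        points.append((c, sen, 1 - spe))
    return points

def best_cutoff(points):
    """Point farthest above the chance diagonal: maximizes SEN - (1 - SPE)."""
    return max(points, key=lambda p: p[1] - p[2])

# Hypothetical test values for patients with and without the disease
diseased = [6, 6, 7, 7, 8, 9, 9, 10]
healthy = [1, 2, 3, 3, 4, 4, 5, 8]

points = roc_points(diseased, healthy, cutoffs=range(1, 11))
for cutoff, sen, fpr in points:
    print(f"cutoff {cutoff:2d}: SEN = {sen:.3f}, 1-SPE = {fpr:.3f}")

best = best_cutoff(points)
print("most efficient cutoff:", best[0])
```

Whether this cutoff is the right one clinically still depends on the relative cost of false positives and false negatives, as the screening versus prognostic distinction above makes clear.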
C. Likelihood Ratios [7]
[Figure] "Sensitivity and Specificity"
- LR is ratio of probability of test result among patients with target disorder to probability of that same test result among patients without the disorder
- Abnormal test results should be much more common in ill persons compared with healthy
- Normal test results should be much more common in healthy than ill persons
- Likelihood ratios near unity (1) are of little value
- Very high or low LRs are helpful in clinical decision making
- Positive LR (LR for positive test) is calculated as SEN/(1-SPE)
- Negative LR (LR for negative test) is calculated as (1-SEN)/SPE
- Bayes Factor (BF) is a likelihood ratio (LR) [5]
- Comparison of how well two hypotheses predict the data
- BF = (Probability of data given null hypothesis) ÷ (Probability of data given alternative hypothesis)
- These BF are really "pretest probabilities" that a disease is present or not
- They can be used to "contextualize" a given test
- High disease prevalence populations have higher BF than low disease populations
- Both positive and negative LRs can be calculated
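The positive and negative LR formulas above are easy to compare across tests. The sketch below applies them to the two SEN/SPE pairs used in this document's worked examples, plus a hypothetical test that is no better than chance. The function name `likelihood_ratios` is an assumption made for illustration.

```python
def likelihood_ratios(sen, spe):
    """Return (positive LR, negative LR) for a test with the given SEN and SPE."""
    lr_pos = sen / (1 - spe)   # how much a positive result raises the odds
    lr_neg = (1 - sen) / spe   # how much a negative result lowers the odds
    return lr_pos, lr_neg

tests = {
    "98% SEN / 99% SPE": (0.98, 0.99),
    "70% SEN / 90% SPE": (0.70, 0.90),
    "50% SEN / 50% SPE (chance)": (0.50, 0.50),  # hypothetical uninformative test
}

for label, (sen, spe) in tests.items():
    lr_pos, lr_neg = likelihood_ratios(sen, spe)
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

The chance-level test gives both LRs equal to 1, so multiplying the pretest odds by either leaves them unchanged. This is the sense in which LRs near unity are of little value.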
D. Example 1: High Disease Prevalence
- Disease prevalence 40% (such as COPD in a group of long-term smokers)
- Test is 98% sensitive, 99% specific (such as pulmonary function tests)
- Of 1000 long-term smokers, 400 will actually have the disease (40% prevalence)
- The test will detect 392 of the 400 with disease (98% sensitivity)
- The test will incorrectly label 1% of the 600 (6) without disease as positive: 99% specific
- The PPV = 392÷ (392+6) = 98%
- The NPV = 594÷ (594+8) = 99%
- Comments
- This is a very good test
- The prevalence of the disease was high
- Doing the test did not change our prediction very much in terms of number with disease
- Lyme Disease testing provides an excellent example where PPV and NPV are critical [3]
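The arithmetic in this example can be reproduced by filling in the expected 2x2 table for the cohort of 1000. This is a minimal sketch using the prevalence, sensitivity, and specificity stated above. The function name `cohort_table` is hypothetical.

```python
def cohort_table(n, prevalence, sen, spe):
    """Expected 2x2 counts when n people from the population are tested."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sen   # diseased, test positive
    fn = diseased - tp    # diseased, test negative
    tn = healthy * spe    # healthy, test negative
    fp = healthy - tn     # healthy, test positive
    return tp, fn, fp, tn

tp, fn, fp, tn = cohort_table(n=1000, prevalence=0.40, sen=0.98, spe=0.99)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)

print(f"TP = {tp:.0f}, FN = {fn:.0f}, FP = {fp:.0f}, TN = {tn:.0f}")
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```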
E. Example 2: Low Disease Prevalence
- Disease prevalence 2% (such as COPD in group of non-smokers from suburban areas)
- Same test characteristics as above (98% sensitive, 99% specific)
- Of 1000 non-smokers, 20 will actually have the disease (2% prevalence)
- The test will detect essentially all of the patients with disease (19.6 of the 20)
- The test will incorrectly label 1% of 980 patients w/o disease as positive (~10 patients)
- The PPV = 19.6÷ (19.6+9.8) = 66.7%
- The NPV ~ 100%
- Comments
- Although this is a very good test, the PPV is relatively poor because the disease is not prevalent
- Even when a positive result is obtained with this test, we aren't sure if it's real
- The lower the disease prevalence, the better the test must be to be useful
- For very rare diseases, calculation of the PPV can be problematic [6]
- Bayesian approach can be used to estimate the PPV for tests for such rare diseases
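To see how quickly the PPV falls as a disease becomes rarer, the prevalence form of the PPV formula from the definitions can be evaluated over a range of prevalences. The sketch below keeps the test fixed at 98% sensitive and 99% specific. The NPV expression is the analogous formula for negative results, and the prevalences other than 40% and 2% are hypothetical values added for illustration.

```python
def ppv(prevalence, sen, spe):
    """PPV = (prev x SEN) / {(prev x SEN) + (1 - prev) x (1 - SPE)}."""
    true_pos = prevalence * sen
    false_pos = (1 - prevalence) * (1 - spe)
    return true_pos / (true_pos + false_pos)

def npv(prevalence, sen, spe):
    """NPV = ((1 - prev) x SPE) / {((1 - prev) x SPE) + prev x (1 - SEN)}."""
    true_neg = (1 - prevalence) * spe
    false_neg = prevalence * (1 - sen)
    return true_neg / (true_neg + false_neg)

SEN, SPE = 0.98, 0.99

# 0.40 and 0.02 match Examples 1 and 2; 0.10 and 0.001 are hypothetical
for prevalence in (0.40, 0.10, 0.02, 0.001):
    print(f"prevalence {prevalence:6.1%}: "
          f"PPV = {ppv(prevalence, SEN, SPE):.1%}, "
          f"NPV = {npv(prevalence, SEN, SPE):.2%}")
```

With the same test, the PPV drops from near certainty at 40% prevalence to well under one in ten at 0.1% prevalence, while the NPV moves in the opposite direction.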
F. Using Tests to Alter Further Diagnostics [1]
- Tests should be ordered, in general, only if they alter patient's management or prognosis
- How will the results of this test change what you do?
- How likely is it that the test result will change your risk assessment of disease and/or the prognosis of this patient?
- What is the risk to the patient from the test, and is it worth the risk vs. benefit?
- If one is certain of the diagnosis, then mediocre quality tests can only decrease certainty
- Obviously certain tests can be used to quantitate level of disease in addition to use as diagnostics
- In the next example, we consider a common problem:
- What is the likelihood that the patient has coronary artery disease (CAD)?
- The ETT (±Thallium) is often used to help assess the likelihood of CAD
- A positive ETT test often suggests that coronary angiography be done.
- Is this a reasonable conclusion? When should an ETT be done in the first place?
G. Example: Mediocre Test in Different Prevalence Populations
- Consider an Exercise Treadmill Test (ETT) with Thallium Test to rule out CAD
- Sensitivity ~70%, Specificity ~90% (sensitivity lower in women)
- Now let's take 3 different patients
- Patient A - pre-test probability of CAD 20%
- Patient B - pre-test probability of CAD 50%
- Patient C - pre-test probability of CAD 80%
- The pre-test probability of these patients is equivalent to the "Prevalence of disease" in the population to which the patient belongs.
- Calculation of the post-test (ETT) likelihood of disease for each patient
- Patient A - post-test probability of CAD: Positive ETT/Thal ~64%, Negative ETT/Thal ~8%
- Patient B - Positive ~88%, Negative ~25%
- Patient C - Positive ~97%, Negative ~57%
- Note: these likelihoods for positive tests are the PPV; for negative tests, they are the probability of disease despite a negative result (1-NPV)
- The above information suggests the following:
- For patient A and a positive test, we're still not sure whether or not disease is present
- For patient A and a negative test, we've confirmed our suspicion: CAD isn't present
- For patient C, regardless of the test result, we still have a high suspicion for CAD
- The test is most helpful for Patient B, whom we were unsure about to begin with
- Conclude that the test is most helpful in patients with a moderate suspicion of disease
- If a disease is highly suspected, then one generally needs to assess the degree (extent) of the disease, rather than presence or absence
- Similar issues arise in multiple areas, for example with Lyme ELISA testing [3]
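The post-test probabilities for the three patients can be recomputed with the odds method from the definitions: convert the pretest probability to odds, multiply by the likelihood ratio, and convert back. This is a sketch under the stated assumptions of 70% sensitivity and 90% specificity. The function name `post_test_probability` is hypothetical.

```python
SEN, SPE = 0.70, 0.90
LR_POS = SEN / (1 - SPE)   # LR for a positive ETT
LR_NEG = (1 - SEN) / SPE   # LR for a negative ETT

def post_test_probability(pretest_probability, lr):
    """Pretest probability -> pretest odds -> post-test odds -> probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * lr
    return post_test_odds / (1 + post_test_odds)

results = {}
for patient, pretest in [("A", 0.20), ("B", 0.50), ("C", 0.80)]:
    after_positive = post_test_probability(pretest, LR_POS)
    after_negative = post_test_probability(pretest, LR_NEG)
    results[patient] = (after_positive, after_negative)
    print(f"Patient {patient}: pretest {pretest:.0%}, "
          f"after positive {after_positive:.0%}, "
          f"after negative {after_negative:.0%}")
```

The gap between the two post-test values is widest for the patient with the intermediate pretest probability, which is the quantitative basis for the conclusion that the test helps most when suspicion is moderate.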
H. Conclusions
- Knowledge of test characteristics, along with a pre-test likelihood of a condition, allows for a prediction of whether the test will help decide whether the patient has the condition
- Strategies to follow after either a positive or a negative result should be discussed before the test is done
- Test characteristics must be evaluated in setting of test use (screening or other) [4]
References
1. Sox HC. 1996. Annu Rev Med. 47:463
2. Manolio T. 2003. NEJM. 349(17):1587
3. Tugwell P, Dennis DT, Weinstein A, et al. 1997. Ann Intern Med. 127(12):1109
4. Barratt M, Irwig L, Glasziou P, et al. 1999. JAMA. 281(21):2029
5. Goodman SN. 1999. Ann Intern Med. 130(12):1005
6. Smith JE, Winkler RL, Fryback DG. 2000. Ann Intern Med. 132(10):804
7. Grimes DA, Schulz KF. 2005. Lancet. 365(9469):1500