A. Studies Found in Medical Literature
- Descriptive Studies
- Observational Studies
- Interventional Studies - Clinical Trials
- Meta-Analysis
- Need to consider internal and external validity in evaluating trials
- Internal Validity
- Role of chance - hypothesis testing and confidence intervals
- Bias - systematic error in allocation or measurement of exposure or outcome
- Confounding - are there other uncontrolled features associated with outcome or exposure that could explain the results?
- External Validity - can the findings be generalized?
- Levels of Evidence to Support Treatment Decisions [1]
- Level 1: randomized clinical trials (or "all or none" data)
- Level 2: cohort studies, outcomes research
- Level 3: case-control studies
- Level 4: case series
- Level 5: expert opinions without explicit critical appraisal; opinions based on physiology, bench research, or pathophysiological principles used to direct clinical practice
- "All or none" data are data that show all patients died before availability of a certain treatment; after the treatment became available, some or all of the patients survived
B. Descriptive Studies
- Characterize disease in terms of person, place, and time
- Useful for alerting medical community, hypothesis generation, resource allocation
- No comparison group - cannot be used to assess cause and effect relationship
- Case Report or Case Series
- Reports an atypical condition or disease for an individual or series of individuals
- May also include a typical condition with atypical presentation(s)
- Correlational study
- Population level data used to associate disease and exposure
- People with disease may not be people who were exposed
- Subtle relationships may not be apparent (for example, "J" curve-type responses)
- Cross-Sectional study
- Population examined at a defined time to assess exposure and disease status
- Useful for inalterable characteristics such as blood type
- Does not provide information about duration of disease or exposure
- Level 4 evidence
C. Observational Studies
- Case Control
- Subjects (cases) selected based on disease status / prevalence
- Controls are similar to cases but do not have disease; prior exposures of the two groups are then compared
- Good for rare diseases, diseases with long latency, analysis of multiple exposures
- Recall bias regarding exposures
- Selection bias because disease and exposure occur before study begins
- Level 3 evidence
- Retrospective Cohort
- Exposure and disease have already occurred
- Subjects selected for exposure to agent being studied
- Controls selected to be similar except for exposure being studied
- Good for rare exposures with high attributable risk to outcomes
- Subject to recall bias regarding degree of exposure and disease
- Subject to selection bias because disease and exposure occur before study begins
- Reduced selection bias if cohort is predefined
- Less resistant to confounders than prospective cohort
- Level 2 evidence
- Prospective Cohort
- Exposure has occurred but disease has not
- Subjects selected for exposure to agent being studied
- Controls selected to be similar except for exposure being studied
- Good for evaluating multiple effects of exposure
- Subject to losses to follow-up which can undermine validity of study
- Subject to ascertainment bias
- Expensive and time consuming to conduct
- Level 2 evidence
- Nested Case Control
- Case control study within a predefined cohort
- Reduces sampling bias but selection bias may occur (if cohort defined later)
- Example is Nurses' Health Study case control groups (not originally chosen)
- Level 4 evidence
D. Interventional Studies - Clinical Trials
- Studies in which subjects are allocated to receive 1 of 2 or more of the exposures being investigated, exposed, then observed
- Can only be performed when exposure being investigated is suspected to be beneficial and not suspected to be harmful (or much more beneficial than harmful)
- Allocation may be random or systematic
- Random allocation is always preferable because it controls for both known and unknown confounders if the sample size is large enough
- Selection bias may be a problem when allocation is systematic
- Study may be conducted blinded or double blinded
- Effects of Blinding
- Single blinding minimizes recall bias from the subject, but the investigator remains aware of allocation
- Double-blinding minimizes both interviewer and recall bias
- Double-blinding increases internal validity of trial and is always preferred
- Considerations in Evaluating Results of an Interventional Trial
- Number of subjects in study - enough for results to be statistically significant?
- Number of endpoints achieved - enough to draw conclusions?
- Power of study - a function of the magnitude of the exposure's effect and the number of endpoints
- Losses to follow-up (did people who left the trial share any characteristics?)
- Characteristics of study participants compared to decliners
- Characteristics of subjects allocated to exposure compared to those allocated to placebo - especially important when allocation is not randomized
- Compliance of exposed group compared to unexposed group
- Magnitude of placebo effect - must be subtracted from effect observed for exposure to determine effect attributable to exposure being studied
- Level 1 evidence
E. Meta-Analysis [1]
- Attempt to combine results of 2 or more studies looking at a specific disease
- Used to improve power of studies, particularly in low prevalence diseases
- Problems with combining data often arise
- Discrepancies with randomized, prospective trials have been observed [2]
- However, number of these analyses is increasing
- The overall kappa statistic for agreement between meta-analyses and randomized, controlled studies has been in the 0.35 range, indicating borderline agreement
F. Interpreting Statistics in a Medical Study
Appropriate Statistical Tests by Use, Data Type, and Distribution of Data Being Considered
(Abbreviated List of Common Tests and Their Uses)
Use | Continuous data (measurements), Gaussian (normal) | Continuous data (measurements), non-Gaussian (non-normal) | Discrete data (categorical) |
---|---|---|---|
1. Describe a sample | mean, standard error, standard deviation | percentile, median, extremes, range | proportion, confidence interval around proportion |
2a. Compare 2 groups, non-paired data | non-paired t test | Wilcoxon rank sum test | chi-squared test, confidence interval of difference between 2 proportions, Fisher's exact probability |
2b. Compare 2 groups, paired data | paired t test | Wilcoxon signed-rank test | McNemar's test for correlated proportions |
3. Compare >2 groups | analysis of variance | nonparametric analysis of variance | chi-squared test |
4. Correlate 2 variables for subjects in same group | Pearson correlation coefficient, linear regression | Spearman rank correlation | logistic regression |
5. Correlate >2 variables for subjects in same group | multiple correlation coefficients, multiple regression | | multiple logistic regression |
G. Statistics Found Frequently in Medical Literature
- P (Probability) Value [3]
- Indicates the probability of obtaining results at least as extreme as those observed if chance alone were operating (that is, if there were no true effect)
- By convention, a P value of 0.05 or less is considered significant
- Significant means the result would be unlikely if chance alone were operating
- P values are affected by sample size and magnitude of effect
- If a sample is very small, the P value may be >0.05 although the magnitude of the effect is large
- If a sample is very large, the P value may be <0.05 although the magnitude of the effect is small
- Caution must be exercised in interpretation of P values without consideration for scientific and medical issues relevant to a study [3]
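The dependence of the P value on both sample size and effect magnitude can be illustrated with a minimal sketch. It assumes a pooled two-proportion z-test, which is one common choice and only an approximation for small samples (where Fisher's exact test would usually be preferred). The function name and the trial numbers are hypothetical and not taken from any cited study.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided P value for a difference between two proportions (pooled z-test)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical small trial: large effect (40% vs 20%), 10 subjects per group.
small_trial = two_proportion_p_value(4, 10, 2, 10)

# Hypothetical large trial: small effect (11% vs 10%), 50,000 subjects per group.
large_trial = two_proportion_p_value(5500, 50000, 5000, 50000)

print(f"small trial P = {small_trial:.3f}")
print(f"large trial P = {large_trial:.2e}")
```

In this sketch the small trial does not reach P < 0.05 despite a twofold difference in event rates, while the large trial does despite a one percentage point difference.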
- Confidence Intervals (CI)
- Usually a 95% confidence interval, but can be set at higher or lower levels
- Means that if an experiment is repeated 100 times, about 95 of the 95% confidence intervals calculated will contain the true value
- Often loosely described as a range surrounding the result within which the true value lies with 95% probability
- CI are affected by both the sample size and the magnitude of the effect
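As a rough illustration of how sample size changes the width of a confidence interval, the sketch below computes an approximate 95% interval around a proportion. It assumes the simple normal-approximation (Wald) formula, which is only one of several methods and behaves poorly for very small samples or proportions near 0 or 1. The function name and the counts are hypothetical.

```python
import math

def proportion_ci_95(events, n):
    """Approximate (Wald) 95% confidence interval for a proportion."""
    p = events / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    # Clamp to the valid range for a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Same observed proportion (30%), two hypothetical sample sizes.
low_small, high_small = proportion_ci_95(6, 20)
low_large, high_large = proportion_ci_95(600, 2000)

print(f"n=20:   {low_small:.3f} to {high_small:.3f}")
print(f"n=2000: {low_large:.3f} to {high_large:.3f}")
```

With the same observed proportion, the interval from 2000 subjects is about one tenth as wide as the interval from 20 subjects.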
- Clinical Efficacy of Therapies [4]
- Experimental Event Rate (EER) and Placebo Control Event Rate (CER) are measured
- Relative Risk Reduction (RRR) = |EER - CER| / CER is the effect of intervention (I) relative to control (C) rate (| | is absolute value)
- Absolute Risk Reduction (ARR) = |EER - CER| for interventions which reduce bad outcomes
- Absolute Benefit Increase (ABI) = |EER - CER| for interventions which increase good outcomes
- Number Needed to Treat (NNT) to have one (1) additional good outcome = 1/ABI (equivalently, 1/ARR to prevent one additional bad outcome)
- In general, ARR and NNT are the most helpful in assessing the clinical efficacy
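A short worked example may make these measures concrete. The sketch below takes the absolute difference between EER and CER as the absolute risk reduction, divides it by CER for the relative risk reduction, and takes its reciprocal for NNT. Rounding NNT up to the next whole patient is a common convention assumed here, and the event rates are hypothetical.

```python
import math

def efficacy_measures(eer, cer):
    """Return RRR, ARR, and NNT from experimental and control event rates."""
    arr = abs(eer - cer)          # absolute risk reduction
    rrr = arr / cer               # relative risk reduction
    nnt = math.ceil(1 / arr)      # rounded up to a whole patient (assumed convention)
    return {"RRR": rrr, "ARR": arr, "NNT": nnt}

# Hypothetical trial: 20% event rate on placebo, 12% on treatment.
result = efficacy_measures(eer=0.12, cer=0.20)
print(result)
```

Here a 40% relative reduction corresponds to an absolute reduction of 8 percentage points, so about 13 patients would need to be treated to prevent one additional event.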
- Kappa Statistic
- Measures agreement beyond that due to chance alone
- Value <0.40 considered fair-to-slight agreement
- Maximal value is 1.0
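The idea of agreement beyond chance can be sketched for two raters (or two methods) making a yes/no call on the same subjects. This assumes the kappa meant here is Cohen's kappa, computed as (observed agreement - expected agreement) / (1 - expected agreement). The function name and the counts are hypothetical.

```python
def cohen_kappa(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Cohen's kappa for two raters and a yes/no classification."""
    total = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / total
    a_yes = (both_yes + a_yes_b_no) / total
    b_yes = (both_yes + a_no_b_yes) / total
    # Agreement expected by chance if the two raters were independent.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)

# Hypothetical data: raters agree on 70 of 100 subjects.
kappa = cohen_kappa(both_yes=40, a_yes_b_no=10, a_no_b_yes=20, both_no=30)

# Perfect agreement gives the maximal value.
kappa_perfect = cohen_kappa(both_yes=50, a_yes_b_no=0, a_no_b_yes=0, both_no=50)

print(f"kappa = {kappa:.2f}, perfect agreement = {kappa_perfect:.2f}")
```

In this example 70% raw agreement shrinks to a kappa of 0.40 because 50% agreement would be expected by chance alone.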
- Likelihood Ratios (LR) [5,6]
- LR is ratio of probability of test result among patients with target disorder to probability of that same test result among patients without the disorder
- Positive LR (LR for positive test) is calculated as sensitivity/(1-specificity)
- Negative LR (LR for negative test) is calculated as (1-sensitivity)/specificity
- LR has also been called the Bayes Factor
- Probabilities and Odds Ratios
- Pretest probability is prevalence of disease: proportion of patients who have the target disorder before the test is carried out
- Post-test probability is proportion of patients with that particular test result who have the target disorder
- Pretest odds: odds that the patient has the target disorder before test is carried out, calculated as pretest probability/(1-pretest probability)
- Post-test odds: odds that the patient has the target disorder after the test is carried out, calculated as pretest odds x LR
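The likelihood ratio and odds calculations above can be chained into a single worked example. The sketch below uses the formulas as listed and adds one step not stated in the outline: converting post-test odds back to a probability as odds / (1 + odds). The function names and the test characteristics (sensitivity 0.90, specificity 0.80, pretest probability 0.20) are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a diagnostic test."""
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    return positive_lr, negative_lr

def post_test_probability(pretest_probability, lr):
    """Convert pretest probability to post-test probability via odds x LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * lr
    # Odds back to probability (step added for illustration).
    return post_test_odds / (1 + post_test_odds)

lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)

# Hypothetical pretest probability (prevalence) of 20%.
after_positive = post_test_probability(0.20, lr_pos)
after_negative = post_test_probability(0.20, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"post-test probability after positive test = {after_positive:.3f}")
print(f"post-test probability after negative test = {after_negative:.3f}")
```

With these assumed values, a positive test raises the probability of disease from 20% to about 53%, and a negative test lowers it to about 3%.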
References
- Dodson TB, Caruso PA, Nielsen GP. 2004. NEJM. 350(3):267 (Case Record)

- LeLorier J, Gregoire G, Benhaddad A, et al. 1997. NEJM. 337(8):536

- Borzak S and Ridker PM. 1995. Ann Intern Med. 123(11):873

- Goodman SN. 1999. Ann Intern Med. 130(12):995

- Sackett DL and Haynes RB. 1997. ACP Journal Club. 127(1):A15

- Goodman SN. 1999. Ann Intern Med. 130(12):1005

- Katz MH. 2003. Ann Intern Med. 138(8):644
