Interpreting Clinical Trials
Dermatologists are constantly exposed to clinical trial data at meetings, in the news, and in the medical literature. At first glance, this information, usually presented in easy-to-read bar graphs and charts, may seem straightforward. What may not be so apparent are the background issues that influenced the study design, data that have not been presented, and nuances of design or interpretation that may be crucially important but easily overlooked.
Understanding clinical trial design and how outcomes are measured is therefore critical to interpreting the results of studies and ultimately to applying these findings in practice. While the randomized double-blind study is the gold standard in clinical trials, there are other valid and meaningful approaches. This article will review some of the features that are important to consider when evaluating results from a clinical trial.
First, good trial design requires a clear and concise statement of the central research question or hypothesis. Analyses of secondary endpoints should be specified ahead of time to avoid erroneous conclusions drawn from statistical "fishing trips" or data mining. If enough ad hoc tests are run, some will appear statistically significant by chance alone, and unfocused analysis increases this possibility.
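To see how quickly this multiplicity problem grows, consider a minimal sketch (hypothetical, in Python) of the chance of at least one false-positive finding when several independent tests are each run at the conventional .05 level and no true effects exist:

```python
# Chance of at least one false-positive result when k independent
# tests are each run at alpha = 0.05 and no true effect exists.
alpha = 0.05
for k in (1, 5, 10, 20):
    familywise_error = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests -> P(at least one 'significant' result) = {familywise_error:.2f}")
```

With 20 unplanned comparisons, the chance of at least one spurious "significant" result is roughly 64%.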
The two most common design forms are the randomized trial and the observational study. Randomization attempts to remove potential bias in the allocation of subjects to different testing groups. When done correctly, it should produce intervention and control groups that are, on average, evenly balanced in terms of both known prognostic factors (such as age and gender) and other, unknown characteristics. Randomization is, however, by definition random, and the groups are unlikely to be exactly the same, although chance imbalances shrink as group size grows.
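A brief simulation illustrates this last point (a hypothetical sketch; the age distribution and group sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical: randomize subjects with a known prognostic factor (age)
# into two equal arms, and measure the chance imbalance between arm means.
for n_per_arm in (10, 50, 500):
    imbalances = []
    for _ in range(2000):                      # repeat the randomization
        ages = rng.normal(45, 12, 2 * n_per_arm)
        rng.shuffle(ages)
        imbalances.append(abs(ages[:n_per_arm].mean() - ages[n_per_arm:].mean()))
    print(f"n = {n_per_arm:3d} per arm: average age imbalance = {np.mean(imbalances):.2f} years")
```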
Observational studies monitor a group or several groups as they pass through time. There are obvious limitations to this design: the groups being followed may not be comparable at baseline, and there may be bias in evaluating the outcome when such studies are not blinded. Nonetheless, observational studies can be effective in generating some kinds of information, and they may be the only practical way to study some diseases. One large analysis has suggested that results from observational studies are often similar to those of randomized studies. Not surprisingly, however, observational studies usually overestimate the magnitude of any beneficial effect.
There are many advantages to placebo-controlled trials. They demonstrate absolute efficacy and safety and allow adverse events due to the drug to be distinguished from those due to the underlying disease or background noise. They also detect treatment effects with a smaller sample size than any other type of concurrently controlled study, while minimizing the effect of subject and investigator expectations. The major disadvantage is that, if other effective treatments are available, patients may have to forgo them for some time in order to participate. (Regulatory agencies in Europe recently mandated that studies compare a new drug with the available standard-of-care treatment rather than with a placebo, in contrast to most trials in the United States.)
Blinding is important as well. In an unblinded, or open-label, trial, both the subject and the investigator know which intervention the subject has been assigned. In a single-blind study, only one of the two, either the investigator or the subject, knows which intervention the subject is receiving. In a double-blind study, neither the subject nor the investigator knows the treatment or group assignment. Given the magnitude of the effect of the vehicle alone in topical studies, the placebo effect overall, and the tendency of investigators to optimistically grade changes in a positive way, blinding can be the key feature that differentiates a scientific assessment from clinical observation.
The open-label extension is a study design in which, after the blinded portion of the study has been completed, the investigator knows which intervention is being given to which participant. Some open-label studies are randomized, but most do not include a comparison group. Open-label designs also vary in enrollment: some take all comers, while others limit enrollment to responders or nonresponders. These choices significantly change the patient pool being evaluated and may have an important impact on the outcome.
An intent-to-treat (ITT) analysis is a comparison between two or more groups assigned to receive different treatments that includes all enrolled subjects, regardless of whether they dropped out of the study because of noncompliance, intolerance, or a concomitant, unrelated illness. Studies that do not use this approach may yield results that appear more promising than they really are. For example, if many people leave a study because their disease worsens and they are excluded from the analysis, the percentage of successes will appear greater than it truly is. Under a strict ITT analysis, patients who drop out, even for unrelated reasons, are counted as failures.
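The arithmetic can be made concrete with hypothetical numbers (a sketch, not data from any actual trial):

```python
# Hypothetical: 100 subjects randomized to active drug; 80 complete
# the study, of whom 60 achieve clearing, and 20 drop out.
randomized = 100
completers = 80
responders = 60

per_protocol_rate = responders / completers   # dropouts simply excluded
itt_rate = responders / randomized            # dropouts counted as failures

print(f"Per-protocol response rate: {per_protocol_rate:.0%}")   # 75%
print(f"Intent-to-treat response rate: {itt_rate:.0%}")         # 60%
```

The same 60 responders yield a 75% success rate when dropouts are ignored but only 60% under ITT.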
Clinical trials should have sufficient statistical power to detect differences between groups, and this requirement is an important part of the determination of sample size. The calculation of sample size is based on the nature of the condition, the desired precision of the answer, the degree of improvement expected, the availability of alternative treatments, the knowledge of the intervention being studied, and the availability of participants. If a study is not "powered" appropriately, important findings may be missed. The type of comparison being made should also figure into this calculation: for example, concluding that two drugs have "equivalent efficacy" may not be valid if the study was too small to show a small but clinically meaningful difference between them.
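As an illustration of such a calculation, here is a hypothetical sketch using the statsmodels library (the response rates, power, and alpha below are assumptions, not values from any particular trial):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical: detect an improvement in response rate from 40% (vehicle)
# to 55% (active) with 80% power at a two-sided alpha of 0.05.
effect = proportion_effectsize(0.55, 0.40)
n_per_arm = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                         power=0.80, alternative='two-sided')
print(f"About {n_per_arm:.0f} subjects are needed per arm")
```

Under these assumptions, roughly 86 subjects per arm are required; a markedly smaller trial could miss a real 15-percentage-point difference entirely.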
A P value is the probability that a difference at least as large as the one observed would occur by chance alone if there were truly no difference between groups; by convention, P less than .05 is the cutoff used in studies to reject the null hypothesis. In other words, a significant result is one whose difference between groups would have occurred by chance alone less than 1 time in 20. Clinical significance, on the other hand, is a matter of judgment, and clearly, some results that are statistically significant may be clinically insignificant.
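For a concrete sense of how such a value arises, a minimal sketch with hypothetical response counts, using the Fisher exact test from scipy:

```python
from scipy.stats import fisher_exact

# Hypothetical outcome: 30/50 responders on active drug vs 18/50 on vehicle.
table = [[30, 20],   # active:  responders, nonresponders
         [18, 32]]   # vehicle: responders, nonresponders
odds_ratio, p_value = fisher_exact(table)
print(f"Two-sided P = {p_value:.3f}")  # below .05 here, so the difference
                                       # is unlikely to be chance alone
```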
More than just primary efficacy data are collected during clinical trials. Adverse drug reactions (ADRs) are recorded, but so are all adverse events (AEs). Adverse events include all untoward occurrences, regardless of whether a causal relationship with the intervention is suspected, in order to detect any signals of unexpected problems. Although a study may enroll enough patients to show a difference in efficacy, it may not enroll enough to show differences in adverse events, especially rare ones, so adverse event reports must be interpreted with caution.
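One useful rule of thumb here is the "rule of three": if an adverse event is never observed among n subjects, the upper bound of its plausible true rate (the 95% confidence limit) is roughly 3/n. A sketch with hypothetical trial sizes:

```python
# "Rule of three": zero events among n subjects still leaves an upper
# 95% confidence bound of roughly 3/n on the true event rate.
for n in (100, 500, 3000):
    print(f"0 events in {n:4d} subjects -> true rate may still be "
          f"as high as ~1 in {n / 3:.0f}")
```

A trial of 500 patients with no cases of a given event is thus still compatible with that event striking 1 patient in 167.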
Comparing results of individual drugs or drug combinations assessed in different trials poses several limitations: entry criteria, methods of analysis, and degrees of adherence may all differ, making such comparisons difficult to interpret. Meta-analysis faces some of these same problems because it combines several studies, but it uses a quantitative method for analyzing the pooled results of more than a single study to improve power, especially when results from different studies are inconsistent.
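One common pooling technique is fixed-effect inverse-variance weighting; a minimal sketch with made-up effect estimates (the log odds ratios and standard errors below are hypothetical):

```python
import math

# Fixed-effect inverse-variance pooling of hypothetical results
# from three small trials: (log odds ratio, standard error).
studies = [(0.45, 0.30), (0.30, 0.25), (0.60, 0.40)]

weights = [1 / se**2 for _, se in studies]               # precision weights
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled OR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f}-{math.exp(high):.2f})")
```

In this invented example, none of the three trials is individually significant, yet the pooled estimate is, which is precisely the gain in power that meta-analysis seeks.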
Of course, the analysis of study findings does not end with the data. The final level of analysis is a continued critique of whether the authors' conclusions are consistent with the study findings, whether those conclusions stay within the parameters of the study design, and whether clinical decisions can be based on them. An understanding of how the researchers sought to answer their questions can illuminate whether their conclusions are indeed worthwhile.