Having trained in Dublin, Galway and Cambridge, David Healy is now Professor of Psychiatry at Cardiff University (North Wales Department of Psychological Medicine, Hergest Unit, Bangor LL57 2PW, UK. Email: healy_hergest{at}compuserve.com). He is a past Secretary of the British Association of Psychopharmacology. His current research interests include studies of mental health service utilisation, aspects of the history of psychopharmacology and research into the benefits and risks of psychotropic agents.
|
|
|---|
Q: If a 95% confidence interval of an odds ratio contains the number 1.0, what does this mean?
A: It means that the odds ratio is not significant (p. 177).
Many psychiatrists would consider that there is little basis for applying to clinical practice findings that are not significant. Doing so might even be dangerous.
In response to the controversy about fluoxetine and suicidality in the early 1990s, Eli Lilly and Company analysed their clinical trials for suicidal acts (Beasley et al, 1991). The analysis gave the relative risk of suicidal acts for patients taking fluoxetine compared with placebo as 1.9 (95% CI 0.216.0). This led Beasley et al to state Analysis of the incidence of suicidal acts (suicidal attempts and completions) revealed no statistically significant differences in the act rates between fluoxetine-treated and placebo-treated patients. And the conclusion they drew from this lack of significance was that Data from these trials do not show that fluoxetine is associated with an increased risk of suicidal acts or emergence of substantial suicidal thoughts among depressed patients (Beasley et al, 1991).
The problem in this thinking can be brought out by considering where confidence intervals (CIs) come from. Figure 1
shows a P-value function, which gives the point estimate for a relative risk and a distribution of potential values around this point (Poole, 1987). The P-value is the point at which the distribution of values intersects with the vertical axis running through x = 1. This graphical representation makes clear that, although a 95% CI may be a desirable threshold, at values less than this the overwhelming bulk of the data may well fall clearly on one side or the other of 1.0, which is the value for no effect.
![]() View larger version (22K): [in a new window] |
Fig. 1 A P-value function showing the distribution of all confidence intervals.
|
![]() View larger version (25K): [in a new window] |
Fig. 2 P-value functions for drugs A (dashed black curve) and B (solid red curve) with differing relative risks and differing levels of significance.
|
The recent paper by Hommes and colleagues reports a meta-analysis of six randomised trials comparing subcutaneous heparin with continuous intravenous heparin for the initial treatment of deep vein thrombosis... The result of our calculation was an odds ratio of 0.61 (95% CI 0.298 to 1.251; P > 0.05); this figure differs greatly from the value reported by Hommes and associates (odds ratio, 0.62; 95% CI, 0.39 to 0.98; P < 0.05)... Based on our recalculation of the overall odds ratio we concluded that subcutaneous heparin is not more effective than intravenous heparin, exactly the opposite to that of Hommes and colleagues.
Graphical representation of the two datasets (Fig. 3
) reveals the problem in the Messori et al interpretation that the written version may conceal. It simply is not right that both groups have found exactly the opposite thing. Opposite findings would fall on either side of the vertical axis through x=1. This example again brings out the point that a rigid adherence to a 95% CI may result in a nonsensical outcome. Yet programmes to foster critical appraisal typically invite interpretations of this sort by questioning whether the interpretation of the data would be the same at the upper and lower ends of the confidence interval (Oxman et al, 1994).
![]() View larger version (29K): [in a new window] |
Fig. 3 P-value functions for Hommes et al (solid red curve) and Messori et al (dashed black curve) data (redrawn with permission from Rothman et al, 1993).
|
|
|
|---|
In a comparable vein, Khan et al(2003) looked at suicide rates in clinical trials of selective serotonin reuptake inhibitors (SSRIs), comparators and placebo from a complete set of trials lodged with the US Food and Drug Administration, reporting on the outcomes for 48 277 people with depression. Although the percentage of suicides was 0.15% among those taking SSRIs, 0.20% for those on other antidepressants and 0.10% for those on placebo, they stated that there was no statistical difference in suicide rate among patients assigned to SSRIs, other antidepressants or placebo, and went on to say the only possible conclusion supported by the present data is that prescription of SSRI antidepressants is not associated with a greater risk of completed suicide. However, even a secondary school student could see from these figures that the risk on active treatment is greater than on placebo. The relative risk of a suicide on SSRIs compared with placebo is 1.4 (95% CI 0.563.62), and comparing all anti-depressants with placebo gives a relative risk of 1.62 (95% CI 0.664.02).
When the crisis regarding suicidality in paediatric SSRI trials emerged in 2003, a task force for the American College of Neuropsychopharmacology, on the basis that there were no statistically significant increases in suicidal behaviour and suicidal thinking, concluded that taking SSRIs or other new generation antidepressant drugs does not increase the risk of suicidal thinking or suicide attempts (Mann et al, 2005). But the task force itself presented a relative risk of a suicide-related event on SSRIs that was 1.4 times greater than that on placebo.
|
|
|---|
But in the current climate there is no simple way to know when signals become evidence. For example, when recent paediatric trials of antidepressants showed a significant increase of suicidal acts on active treatment compared with placebo at the 95% CI, this was also portrayed as a signal. One can argue that the clinical trial signal in this case only becomes significant when data from several different antidepressants are pooled, and there is therefore no evidence that any individual antidepressant causes suicidality.
Thus, guidelines for parents prepared by the American Psychiatric Association (APA) and American Academy of Child and Adolescent Psychiatry (AACAP) can still pose the question: Do anti-depressants increase the risk of suicide? and answer it: There is no evidence that antidepressants increase the risk of suicide (American Psychiatric Association & American Academy of Child and Adolescent Psychiatry, 2005). Indeed, a spokesperson for the APA and AACAP could state It does appear that these medications may affect the likelihood that a patient will actually tell someone about their suicidal thoughts or even suicide attempt. From my perspective as a child and adolescent psychiatrist, this is actually a good thing, because it means you have the opportunity to intervene and keep the child safe (Fassler, 2005).
Anxiety: a suicide risk?
One of the most striking instances of the unwillingness to think that signals might offer evidence of a hazard posed by a therapy came from Khan et al in a 2002 review of deaths by suicide of adults participating in clinical trials of fluoxetine, paroxetine, sertraline, clomipramine, fluvoxamine, clonazepam and venlafaxine for obsessivecompulsive disorder, post-traumatic stress disorder, social phobia, generalised anxiety disorder or panic disorder. From this dataset, they concluded: We found that suicide risk among patients with anxiety disorders is higher than in the general population by a factor of 10 or more. Such a finding was unexpected... The sample of patients selected was considered at minimal risk of suicide (Khan et al, 2002). Anxiety disorders, they suggested, posed a risk of suicide.
In fact, this dataset on 12 914 patients taking active treatment and 3875 patients taking placebo reported 11 suicides in the active treatment groups and no suicides in the placebo groups. The data on suicidal acts were incomplete but combined data for suicidal acts and suicides show a relative risk of 1.65 for active treatment over placebo. It takes a prior judgement that antidepressants could not trigger suicide to interpret such data as evidence for the suicide risk posed by anxiety disorders rather than by the antidepressants used to treat them.
Reading the signals
Anthony (2005), of behalf of the Pharmaceutical Research and Manufacturers of America (PhRMA), states a concern some may have with equating signals with evidence:
FDA will send out this information which they concede is just early signal information ... it sounds very good in principle. But I want you to think about it in terms of your reputation. It is really the reputation of a brand that is being signalled. Imagine yourself, someone reporting that they had early information that you may be a child molester. I know that sounds extreme but it is that type of thing... It is just an allegation, and everyone agrees that ... [However,] that is what people will remember, and that is the reason there is a lot of concern about presenting early signal information when you dont really have any proof. It is very different than the kind of rigorous process we had in the past, where you had to do a trial and it had to be statistically significant before you presented that.
What must be clear from the above is that one way to ensure that signals do not become significant is to keep trials underpowered. This highlights the fact that the decision-making tool of statistical significance is nested within another set of decisions namely, whether to explore the significance of hazards or not. In terms of traditional interpretations of statistics on the adverse events of therapies, patients and physicians have been hostages to power. I suggest a way around this problem in the final section of this article.
In fact, although the story of SSRIs and suicide is traditionally couched in terms of anecdotal signals that originated in case reports from the 1990s, had a cumulative meta-analysis of published SSRI trials been undertaken since 1988, it would have shown a relative risk of suicidal acts on SSRIs compared with placebo of 2.93 (95% CI 0.4518.9) (Fergusson et al, 2005). For every year since 1988, the relative risk of suicidal acts on SSRIs has been double that of placebo, even though many of the published trials either did not report suicidal acts or reported them as having happened on placebo rather than on active treatment.
|
|
|---|
In the wake of the FDAs decision to put a black box warning on antidepressants, the APA responded that this might have a chilling effect on appropriate prescribing for patients and said The American Psychiatric Association believes that antidepressants save lives (American Psychiatric Association, 2004). The British regulator, MHRA, appears to take a similar position as regards warnings the only options being that antidepressants are available without suicide warnings (for adults) or effectively banned (for children).
Assay or efficacy: choose your word carefully
There is no direct evidence base in clinical trial data to support the position that antidepressants save lives. A possible indirect support might lie in evidence that antidepressants work. If they work it can be presumed that if patients who are depressed continue to take them this will reduce their risk of suicide.
The issue of whether antidepressants work opens up further questions regarding the meaning of the antidepressant figures. When published trials for antidepressants are meta-analysed (e.g. Kirsch & Sapirstein, 1998), the results suggest that about 50% of patients on an active drug respond whereas 40% on placebo respond. Results of this sort are conventionally taken as evidence that the drugs work.
However, many, including the regulators who approve the drugs on the basis of such trials, regard antidepressant trials as assay systems aimed at demonstrating a treatment signal from which a presumption of efficacy can be drawn, rather as efficacy trials. In typical antidepressant assays, a selected set of rating scale outcomes rather than real-life outcomes is followed. If these trials are simply assay systems, it can be reasonable to discount and leave unpublished evidence from failed trials in which an active treatment fails to distinguish from placebo, on the basis that the trial lacked assay sensitivity. But the corollary of this is that we have little solid evidence that antidepressants actually work (Stang et al, 2005).
Alternatively, if we regard antidepressant trials as efficacy trials then both those demonstrating and those not demonstrating a treatment effect should be thrown into the meta-analytic hopper, and if this is done the degree of superiority of active treatment over placebo for adults may be little more than 5%, or a mean of 2 points on Hamilton Rating Scale for Depression scores, or no greater than it was shown to be in paediatric antidepressant trials (Khan et al, 2000; Kirsch et al, 2002).
There are further problems with claims of efficacy. First, randomised controlled trials (RCTs) broadly speaking were created to check unfounded therapeutic claims rather than to provide fuel for a therapeutic bandwagon. It is for this reason that they are constructed on the basis of a null hypothesis, and a 95% significance level is set up to exclude false positives. When the null hypothesis turns out to be unfounded the result is not evidence that the drug works but rather an indication that it is not possible to say that the treatment does nothing. In an ideal world this would not be the signal for a vigorous marketing campaign but the point at which studies began to determine just which patients did best on this as opposed to other treatments.
Second, aside from the debate as to whether rating scale data should be read as indicative of treatment effect or of treatment efficacy, there is a problem in reading the difference on rating scale scores between active treatment and placebo in a first past the post, winner takes all manner. Whether we are dealing with an effect or efficacy, trials enable us to quantify the contribution from the drug as a component of the therapeutic response where it is not readily possible to quantify the other components of that response. From the RCT data cited above, it appears that when people improve during antidepressant trials, 8090% of the response seen can be attributed to the natural history of the disorder, or to the effect of seeking help, or to the benefit of any lifestyle advice or problem-solving offered by the clinician, or to what has been called transference or related aspects of the therapeutic encounter. These components combined are traditionally subsumed under the heading of the placebo response. They cannot readily be separated out and weighed, in the way the drug component can be, although Kirsch & Sapirstein (1998) have offered some evidence that the natural history of depression may account for up to 33% of the placebo response in other words, more than the specific drug effect.
The psychiatrists role
The ambiguities regarding just what antidepressant trials show are deeply problematic if we expect the culture and money in mental health services to follow evidence of efficacy and to advocate otherwise would be irrational. In psychiatry, RCTs of penicillin for general paralysis of the insane would probably quantify the drug effect as a component of the therapeutic response at 90% or more, and no one would argue about money and culture following such evidence. But what should happen if the combined non-drug components contribute four times more of the eventual response to treatment in standard cases than does the active drug? If the money and culture are to follow the evidence in this scenario, where should they go?
One possibility is to modify the APA statement to say that psychiatrists rather than antidepressants can save lives. For example, we might expect lives to be saved in the case of clinical practice, informed by the evidence, that restricts antidepressant use to cases in which it is clear that the condition has not resolved of its own accord, efforts at problem-solving have not led to a resolution, and hazards such as suicide arising from the severity of the condition have shifted the riskbenefit ratio in favour of a closely monitored drug intervention with informed patients, rather than non-intervention.
Aside from the scientific and clinical merits of this position, there is a political case for reading the data this way, in that if there is no evidence that antidepressants pose risks and if antidepressants rather than physicians save lives, then in a brave new world in which healthcare is being segmented, it is not difficult to foresee a future in which depression screening and treatment might be undertaken by non-medical personnel.
|
|
|---|
Statistical methods today are seen as methods to summarise data rather than methods that speak to the nature of reality. When statistical methods were introduced into the evaluation of therapies, it was to safeguard both patients and therapists. Fishers relatively strict standard of a 95% significance level was adopted for demonstrations of treatment efficacy to eliminate the harm stemming from false-positive claims made by quacks aiming to make money from vulnerable patients. Neymans concern was that Fishers methods might miscategorise true hypotheses as false and thus close down research that should continue. His primary tool to manage the risk of false negatives was the confidence interval, the aimed of which was to profile hypotheses in a way that might point toward further lines of research. Confidence intervals were an alternative to, rather than a different form of, significance testing. Lack of significance should not lead to the conclusion that nothing had been found, and it was not something to be solved simply by increasing power. In the case of the antidepressants, smaller trials but with instruments sensitive to the emergence of suicidal ideation would have helped establish the characteristics of the probably small group of patients vulnerable to such developments.
But the statistical wires seem to have got crossed when applied to therapeutics. In the case of putative antidepressant benefits, Fishers stringent standard is relaxed so that evidence of a treatment effect in as few as one in three trials is taken as a sufficient signal of potential effectiveness to permit a drug to be marketed. In contrast, when it comes to adverse effects, unless confidence intervals exclude 1.0 the results are dismissed in a manner that is at odds with the rationale for using these intervals.
Coming to the point (estimate)
It may be that the problem stems from the bias of therapists to see a treatments benefits and to miss its hazards. But this alone offers a good argument for divorcing data on the relative benefits and risks of treatments from automatic significance testing. After all, the point estimate drawn from available evidence offers the most probable value for a risk. If significance testing were taken out of the frame, then an increased point estimate for risk would ordinarily indicate an increased risk rather than evidence of no risk. Adopting this approach does not require any philosophical or methodological justification in that an increased point estimate offers a commonsense basis to lay bets on the outcome. The burden of persuading us that white is really black would face drug companies rather than physicians or patients.
Furthermore, presenting a point estimate for the benefits of treatment drawn from all trials of an agent rather than demonstrations of a treatment effect in selected trials would, in the case of antidepressants, make it clear that these agents are not the equivalent of penicillin for depression.
|
|
|---|
|
|
|---|
Theme: Relative risk
An analysis of the incidence of suicidal acts by patients with bulimia nervosa treated with fluoxetine compared with placebo shows a relative risk for fluoxetine compared with placebo of 1.5 (95% CI 0.36.9).
Options
Theme: Outcome measures
When balancing the benefits of treatment with fluoxetine against the risk of suicide from depression trials in which the risk of a suicidal act is 1.9 times greater on fluoxetine than on placebo (95% CI 0.216.0):
Options
Theme: Interpretation of results
You are a researcher given access to data on 15 000 patients, 11 000 exposed to active treatment and 4000 to placebo, and you have 11 suicides on active agent and none on placebo for a condition not usually linked to suicide.
Options
Theme: Confidence intervals
Confidence intervals are consistent with all possible values. You, however, are faced with an analysis of the incidence of suicidal acts in patients with bulimia nervosa treated with fluoxetine compared with placebo that shows a relative risk of suicidal acts on fluoxetine of 1.5 compared with placebo with a 95% CI of 0.36.9.
Options
EMI correct matchings
|
For a commentary on this article see pp. 327328, this issue. Extended matching item (EMI) questions appear in Part 1 of the Royal College of Psychiatrists membership examination (the MRCPsych). Over the coming issues of APT we intend to offer EMIs instead of MCQs in some articles. For further examples of EMIs and information on the MRCPsych examination go to http://www.rcpsych.ac.uk/training/examinations/regulationsandcurricula/examregulations/examformat.aspx or follow the links from the Royal College of Psychiatrists homepage (http://www.rcpsych.ac.uk) through the training pages to exam format. Ed.
Now in a revised and updated 3rd edition: Brown, T. & Wilkinson, G. (2005) Critical Reviews in Psychiatry. London: Gaskell. Ed. ![]()
|
|
|---|

Related articles in APT:
This article has been cited by other articles:
![]() |
D. Healy Trussed in Evidence? Ambiguities at the Interface between Clinical Evidence and Clinical Practice Transcultural Psychiatry, March 1, 2009; 46(1): 16 - 37. [Abstract] [PDF] |
||||
![]() |
D. Healy Media care and patient pressure J Psychopharmacol, August 1, 2007; 21(6): 668 - 669. [PDF] |
||||
![]() |
J. Geddes Providing the best available evidence: INVITED COMMENTARY ON... THE ANTIDEPRESSANT TALE Adv. Psychiatr. Treat., September 1, 2006; 12(5): 327 - 328. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||