Background The CHIC study (COVID-19 High-intensity Immunosuppression in Cytokine storm syndrome) is a quasi-experimental treatment study exploring immunosuppressive treatment versus supportive treatment only in patients with COVID-19 with life-threatening hyperinflammation. Causal inference provides a means of investigating causality in non-randomised experiments. Here we report 14-day improvement as well as 30-day and 90-day mortality.
Patients and methods The first 86 patients (period 1) received optimal supportive care only; the second 86 patients (period 2) received methylprednisolone and (if necessary) tocilizumab, in addition to optimal supportive care. The main outcomes were 14-day clinical improvement and 30-day and 90-day survival. An 80% decline in C reactive protein (CRP) was recorded on or before day 13 (CRP >100 mg/L was an inclusion criterion). Non-linear mediation analysis was performed to decompose CRP-mediated effects of immunosuppression (defined as natural indirect effects) and non-CRP-mediated effects attributable to natural prognostic differences between periods (defined as natural direct effects).
Results The natural direct (non-CRP-mediated) effects for period 2 versus period 1 showed an OR of 1.38 (38% better) for 14-day improvement and an OR of 1.16 (16% better) for 30-day and 90-day survival. The natural indirect (CRP-mediated) effects for period 2 showed an OR of 2.27 (127% better) for 14-day improvement, an OR of 1.60 (60% better) for 30-day survival and an OR of 1.49 (49% better) for 90-day survival. The number needed to treat was 5 for 14-day improvement, 9 for survival on day 30, and 10 for survival on day 90.
Conclusion Causal inference with non-linear mediation analysis further substantiates the claim that a brief but intensive treatment with immunosuppressants in patients with COVID-19 and systemic hyperinflammation adds to rapid recovery and saves lives. Causal inference is an alternative to conventional trial analysis, when randomised controlled trials are considered unethical, unfeasible or impracticable.
- antirheumatic agents
Data availability statement
Data are available upon reasonable request. The database contains deidentified (anonymous) participant data. Requests for access can be directed to email@example.com.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
What is already known about this subject?
We have previously shown in a quasi-experimental treatment study with non-equivalent treatment arms (the CHIC study, COVID-19 High-intensity Immunosuppression in Cytokine storm syndrome) that a brief course of high-dose methylprednisolone plus tocilizumab if necessary may accelerate recovery and save lives of patients with COVID-19-associated cytokine storm syndrome.
The CHIC study was a non-randomised study, which leaves room for potential selection biases explaining treatment differences and precludes causal conclusions.
What does this study add?
Causal inference with non-linear mediation analysis allows under certain circumstances a causal interpretation of the results of quasi-randomised experiments with a control group.
We showed that the beneficial effects of 5–7 days of intense immunosuppressive therapy in patients with COVID-19-associated CSS on 14-day improvement and 90-day survival are mediated via an immediate and profound C reactive protein response.
How might this impact on clinical practice?
Results of quasi-experimental studies conducted under difficult circumstances or time pressure, or when randomised controlled trials are considered unfeasible or unethical, may gain credibility by application of the principles of causal inference.
Patients with COVID-19 pneumonia may develop a syndrome characterised by severe systemic hyperinflammation, sometimes called cytokine storm syndrome (CSS). Patients with COVID-19-associated CSS are at an increased risk of severe pulmonary thrombosis, respiratory insufficiency and death.1
Recently, we have shown in a quasi-experimental study2 with matched historical control patients (the so-called non-equivalent group design with matched controls, NEGD) that a strategy of immunosuppression with high-dose methylprednisolone (MP) plus the interleukin 6 receptor blocker tocilizumab (TCZ) for those who did not immediately respond may accelerate recovery and reduce mortality in patients with predefined signs of CSS.3 All study patients fulfilled the criteria for hyperinflammation, but the patients of the historical control group were admitted between 7 March and 31 March (period 1) and the patients of the immunosuppressive treatment group starting from 1 April (period 2). None of the patients in period 1 received (components of) the immunosuppressive treatment strategy, while all of the eligible patients in period 2 were treated according to this strategy. This strict assurance of period specificity excluded bias by indication. Further, conventional confounding analysis revealed that the treatment contrast between the intervention group and the historical control group remained fully intact after adjustment for baseline differences between both groups.3
It was argued by ourselves3 and by others4 that this quasi-experimental design did not preclude the scenario that the prognosis for patients in period 1 was worse than for patients in period 2. This is a valid argument, since hospitals were overwhelmed by critically ill patients in the first, most hectic weeks of the pandemic and optimal supportive care for COVID-19 had yet to be fully developed. Only a randomised controlled trial (RCT), it was argued, would have provided unambiguous proof. However, in the design phase of the study, we felt that ethical and practical arguments precluded the conduct of a randomised clinical experiment with an untreated control arm in our setting.
In the absence of RCTs, causal inference has been proposed for studies with NEGD to further elaborate potentially causal relationships.5 Here we have applied the principles of causal inference to further substantiate the effects of immunosuppressive therapy in COVID-19-induced hyperinflammation. We have used non-linear mediation analysis, a form of causal inference, to decompose the effects due to differences in prognosis between different time periods and the effects due to the treatment itself, by introducing a mediator variable CRP response. The hypothesis, based on prior knowledge, was that part of the observed beneficial effect of immunosuppressive therapy during period 2 was mediated by active suppression of inflammation and another part by a better prognosis in period 2 than in period 1. The analysis described here is an example of how, in the absence of a randomised controlled experiment, the credibility of quasi-experimental studies can be increased by the principles of causal reasoning.
The experimental study
Patients from the CHIC study (COVID-19 High-intensity Immunosuppression in Cytokine storm syndrome), described elsewhere in more detail,3 were analysed. The CHIC study was a quasi-experimental treatment study that used matched historical control patients. In brief, patients had to have COVID-19 pneumonia, complicated by signs of hyperinflammation, as defined by rapid respiratory deterioration on or during admission, plus fulfilment of at least two out of three biomarker criteria (C reactive protein (CRP) >100 mg/L, serum ferritin >900 μg/L, D-dimer >1500 μg/L). Treatment was strictly time-separated. Patients assigned to the historical control group had to be admitted in period 1 and were selected retrospectively using the same above-mentioned (biomarker) criteria. Patients eligible for the intervention had to be admitted in period 2 and received treatment with MP 250 mg intravenously on day 1, followed by MP 80 mg intravenously for at least 4 consecutive days. Single-dose TCZ, 8 mg/kg body weight, maximum of 800 mg, was administered intravenously if the clinical situation did not improve, or instead worsened, between day 2 and day 5.
In the original CHIC study report, three primary outcome variables were defined: discharge from the hospital or at least two stages of improvement according to the WHO improvement scale designed for the purpose of influenza pneumonia; hospital mortality; and the need to start invasive ventilation.3 Outcomes were analysed on the basis of time-to-event. In comparison with patients in period 1 (on standard care), patients in period 2 (on treatment) had a 79% higher likelihood of two-stage WHO improvement, 65% less hospital mortality and 71% less need for mechanical ventilation.
For this causal inference analysis, three outcomes were analysed: WHO improvement on day 14 of CSS, survival on day 30 and survival on day 90. We refrained from analysing mechanical ventilation as an outcome since mechanical ventilation was considered another intermediate outcome variable between WHO improvement (in hospital) and survival on day 30 (partially in hospital) and day 90 (outside the hospital).
Causal inference, or causal reasoning, is a process of drawing conclusions based on the conditions of the occurrence of a certain effect.6 7 Causal inference shares the elements of a Bayesian analysis8 and incorporates prior knowledge and information about the pathophysiology of a condition, as well as knowledge about the mechanism of action that is presumably at the basis of a certain effect.6 According to Pearl and MacKenzie6, causal inference presumes an association between exposure (X) and outcome (Y) (the question of correlation, forming the first layer of causal hierarchy), insight into how an intervention on X (reflected by the operator do(X)) changes Y (the intervention question, forming the second layer of causal hierarchy), and a theory about—or insight into—how a different intervention on X than was really applied would have modified Y (the counterfactual question, forming the third layer of causal hierarchy).6 In an RCT, the counterfactual question is addressed by interpreting the effect in the control group (sometimes placebo). In studies with NEGD, other sources for counterfactual information should be sought and justified. In the CHIC study, counterfactual information was obtained using the untreated patients in period 1.
The structural causal model
The model of interest in this study is the relation between the COVID-19-induced state of hyperinflammation (X) on the one hand, and worsening of disease followed by respiratory insufficiency and death (outcome Y) on the other hand. According to Pearl and MacKenzie6, causality assumes that an intervention on X (do(X)) leads to a change in Y. Intensive immunosuppressive treatment with high doses of MP (or TCZ) administered intravenously is widely known to suppress inflammation in systemic rheumatological diseases, such as rheumatoid arthritis, systemic lupus erythematosus and vasculitis, and to improve their outcomes.9 10 The most widely available and sensitive biomarker to follow the state of systemic inflammation (X) is the patient’s serum CRP level. CRP can respond quickly (within days) to high-dose immunosuppressive treatment. A true causal relationship therefore not only requires an association between the treatment and the CRP response, but also between the CRP response and the outcome (Y). In addition, causality requires irreversibility, implying here that drug-induced suppression of inflammation (a positive CRP response) precedes a change in the clinical situation and not vice versa. The reverse order, in which the clinical situation of the patient precedes and determines the choice of treatment, would cause bias by indication.
The hypothesised structural causal graph (or directed acyclic graph) is visualised in figure 1A.
The association between X and Y (figure 1A) can be influenced by a set of variables (Ci) that can impact both X and Y and may confound the effect of X on Y. Some of these confounders have been measured and adjusted for; others have not and can be responsible for residual confounding. One consequence of applying the causal inference theory is that this back-door via Ci is closed by conditioning on do(X) by a phenomenon called d-separation11: arrows descending from do(X) and C1 collide in X (X is the collider), and the effects of C1 for the interpretation of the effects of X on Y can be ignored. Another (imaginary) set of variables (Ui) can be formulated that may have an impact on both do(X) and Y, and for which d-separation is irrelevant because there are no common causes. Examples of such disturbing variables, often found in studies with NEGD, are variables reflecting the severity of the disease, which physicians take into consideration when deciding about the start, change or intensity of a treatment. Such variables may cause confounding by indication. In the CHIC study (figure 1B), classic confounding by indication can be ruled out since treatment assignment was strictly time-separated (treatment=0 for period 1 and treatment=1 for period 2), but prognostic differences between period 1 and period 2 (a period effect) could still exist, and confounding of this type may contribute to explaining the observed treatment effects. In order to decompose the specific effect of treatment and the non-specific effects of period (1 and 2) (referred to by Pearl12 as front-door adjustment), a mediator variable (ΔCRP) is introduced that best reflects the pathophysiological mechanism by which immunosuppressive treatment is supposed to reduce hyperinflammation (X) and influence the outcome (Y).
Non-linear mediation analysis (according to Pearl12 and VanderWeele13) is used to estimate what part of the total observed effect is attributable to period (natural direct effect, NDE) and what part is attributable to treatment-induced suppression of inflammation, mediated via ΔCRP (natural indirect effect, NIE) (see online supplemental text S1 and S2 for a more detailed discussion).
The mediator variable
It was hypothesised that active treatment (do(X)=1) in patients with hyperinflammation (X) will lead to a measurable CRP response (M), which will in turn lead to a change in the outcome (Y). The discriminating capacity of M in predicting Y was tested by conventional receiver operating characteristic analysis. The optimal CRP response threshold, measured up to 13 days after inclusion and chosen as the best balance between sensitivity and specificity for discriminating between clinical improvement or not (assessed after 14 days) and between survival or not (after 30 days), appeared to be an 80% decrease from baseline (data not shown).
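The threshold selection described above can be illustrated with a minimal sketch. It assumes that "best balance between sensitivity and specificity" is operationalised as the maximum Youden index (sensitivity + specificity − 1), which the text does not state explicitly. The data, the candidate thresholds and the function names (`sens_spec`, `best_threshold`) are hypothetical and do not come from the CHIC database.

```python
from typing import List, Sequence, Tuple


def sens_spec(declines: Sequence[float], outcomes: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity of the rule 'CRP decline >= threshold'
    for predicting a favourable outcome (1 = favourable, 0 = not)."""
    pairs = list(zip(declines, outcomes))
    tp = sum(1 for d, y in pairs if d >= threshold and y == 1)
    fn = sum(1 for d, y in pairs if d < threshold and y == 1)
    tn = sum(1 for d, y in pairs if d < threshold and y == 0)
    fp = sum(1 for d, y in pairs if d >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(declines: Sequence[float], outcomes: Sequence[int],
                   candidates: List[float]) -> Tuple[float, float]:
    """Return the candidate threshold with the highest Youden index J."""
    best_t, best_j = candidates[0], float("-inf")
    for t in candidates:
        se, sp = sens_spec(declines, outcomes, t)
        j = se + sp - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical toy data: percentage CRP decline from baseline by day 13,
# and whether the patient had improved on day 14.
crp_decline = [95, 90, 85, 82, 88, 50, 78, 60, 40, 30, 20, 10]
improved = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

threshold, youden = best_threshold(crp_decline, improved, [50, 60, 70, 80, 90])
print(f"best threshold: {threshold}% decline, Youden J = {youden:.2f}")
```

In practice a statistical package would evaluate every observed value as a candidate cut-off and report the full curve; the fixed candidate list here only keeps the example short.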
Definition of natural indirect and direct effects
NIE is defined here as the effect that is mediated via M (the CRP response). NDE is defined here as all other effects (positive or negative) that are not mediated via M. Such effects may include adverse effects of the treatment (eg, patients on immunosuppressive treatment may get complications such as bacterial infections) as well as unmeasured prognostic differences between patients in period 1 and patients in period 2, either existing at baseline or occurring during follow-up (unmeasured confounding) and not related to the mediator.
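To make these definitions concrete, the sketch below decomposes a total effect into NDE and NIE on the odds ratio scale for a binary exposure (period), a binary mediator (CRP response) and a binary outcome. It is a simplified, unadjusted illustration of the mediation formula and not the authors' actual analysis, which is described in the supplemental text. The mediator probabilities are taken from the counts reported in the results (8 of 80 responders in period 1; 65 of 86 in period 2); the outcome probabilities in `P_Y` are hypothetical, and the function names are illustrative.

```python
from typing import Dict, Tuple


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def nested_risk(p_y: Dict[Tuple[int, int], float], p_m: Dict[int, float],
                x_outcome: int, x_mediator: int) -> float:
    """P(Y=1) if the exposure were x_outcome while the mediator were
    distributed as it is under exposure x_mediator."""
    pm1 = p_m[x_mediator]
    return p_y[(x_outcome, 1)] * pm1 + p_y[(x_outcome, 0)] * (1.0 - pm1)


def decompose(p_y: Dict[Tuple[int, int], float],
              p_m: Dict[int, float]) -> Tuple[float, float, float]:
    """Return (OR_NDE, OR_NIE, OR_total) for exposure 1 versus 0."""
    y00 = nested_risk(p_y, p_m, 0, 0)  # period 1, mediator as in period 1
    y10 = nested_risk(p_y, p_m, 1, 0)  # period 2, mediator as in period 1
    y11 = nested_risk(p_y, p_m, 1, 1)  # period 2, mediator as in period 2
    or_nde = odds(y10) / odds(y00)     # change the period, hold the mediator
    or_nie = odds(y11) / odds(y10)     # hold the period, change the mediator
    or_total = odds(y11) / odds(y00)
    return or_nde, or_nie, or_total


# P(CRP response | period): 0 = period 1, 1 = period 2 (counts from the text).
P_M = {0: 8 / 80, 1: 65 / 86}
# P(favourable outcome | period, CRP response): HYPOTHETICAL values.
P_Y = {(0, 0): 0.40, (0, 1): 0.75, (1, 0): 0.50, (1, 1): 0.80}

or_nde, or_nie, or_total = decompose(P_Y, P_M)
print(f"OR_NDE={or_nde:.2f}  OR_NIE={or_nie:.2f}  OR_total={or_total:.2f}")
```

On the odds ratio scale the total effect is the product of the two components (OR_total = OR_NDE × OR_NIE). The sketch omits covariate adjustment and uncertainty estimates, both of which a real analysis would need.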
Modelling of the impact of treatment
After decomposition of the NDE and NIE, the ORs for the NIE (ORNIE in table 2) were assumed to reflect the unbiased effects of immunosuppressive treatment, and the ORs for the NDE (ORNDE in table 2) were assumed to reflect the unbiased effect of changed (improved) prognosis in period 2 versus period 1. Only ORNIE was used to calculate the post-treatment likelihood for all three outcomes, as a function of their pretreatment likelihood (probability plots), using Bayes’ rule.14 Bayes’ rule implies that the post-treatment likelihood (expressed as odds) equals the product of the OR for the treatment effect and the pretreatment likelihood (expressed as odds):
$$\text{post-treatment odds} = \mathrm{OR}_{\mathrm{NIE}} \times \text{pretreatment odds}$$
Observed pretreatment odds were assumed to be the odds on the outcomes (Y) actually observed in period 1 (the control period). The number needed to treat (NNT) with intensive immunosuppressive treatment in order to find one additional patient with a favourable outcome (Y) was calculated as NNT=1/absolute risk reduction (ARR). ARR was defined as the likelihood of Y in period 2 minus the likelihood of Y in period 1.
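The two calculations above can be sketched as follows. The ORNIE values are those reported in this article; the pretreatment probability of 0.50 and the probabilities passed to `nnt` are hypothetical placeholders, since the observed period 1 and period 2 rates are given in the tables and not in the text. Function names are illustrative.

```python
def prob_to_odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_to_prob(o: float) -> float:
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def post_treatment_prob(pre_prob: float, or_nie: float) -> float:
    """post-treatment odds = OR_NIE x pretreatment odds, returned as a probability."""
    return odds_to_prob(or_nie * prob_to_odds(pre_prob))


def nnt(p_treated: float, p_control: float) -> float:
    """Number needed to treat = 1 / absolute risk reduction (ARR), where ARR is
    the probability of a favourable outcome with treatment minus without."""
    arr = p_treated - p_control
    if arr <= 0:
        raise ValueError("NNT is only defined here for a positive ARR")
    return 1.0 / arr


# OR_NIE values as reported in the article.
OR_NIE = {
    "14-day improvement": 2.27,
    "30-day survival": 1.60,
    "90-day survival": 1.49,
}

pre = 0.50  # HYPOTHETICAL pretreatment probability, for illustration only
for outcome, or_nie in OR_NIE.items():
    post = post_treatment_prob(pre, or_nie)
    print(f"{outcome}: pretreatment {pre:.2f} -> post-treatment {post:.2f}")

# HYPOTHETICAL observed rates: 70% favourable with treatment vs 50% without.
print(f"NNT = {nnt(0.70, 0.50):.1f}")
```

Because the OR acts on the odds scale, the same OR translates into a different absolute gain at each pretreatment probability, which is why the modelled effects are shown as curves across pretreatment levels.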
Due to missing CRP values at follow-up, a proper CRP response could not be calculated in 6 of the 86 patients from period 1. Data on a CRP response were available for all 86 patients from period 2. In total, 166 patients were included in this analysis. The mean (SD) baseline CRP level was 167 (98) mg/L for patients in period 1 and 159 (73) mg/L for patients in period 2. The mean (SD) CRP response was 20% (51%) in period 1 and 85% (83%) in period 2. Table 1 shows that the likelihood of WHO clinical improvement on day 14 and survival on days 30 and 90 was strongly associated with the occurrence of an early CRP response (last column); those with a CRP response had better outcomes than those without. This effect was found both in period 1 (the period without treatment) and in period 2 (with treatment) (see columns 3 and 4). However, the likelihood of a positive CRP response was far higher for patients in period 2 (who had received treatment) than for patients in period 1 (who had not received treatment). Still, a few positive CRP responses were documented in period 1 (8 of 80 patients), while not all patients in period 2 had a positive CRP response (21 of 86 patients did not).
When looking in the strata of ‘no CRP response’ (the lines with ΔCRPno), there was also some effect of period 2 versus period 1, independent of CRP response, but this contrast was smaller than the contrast between positive CRP response and no positive CRP response for all three outcomes. So far, it looks as if immunosuppressive treatment had an effect on outcome via CRP response (an indirect effect), but that there was also a period effect (a direct effect).
Direct and indirect effects were further decomposed and quantified using non-linear mediation analysis (table 2). For all three outcomes, direct (CRP response-independent) period effects could be demonstrated. The likelihood of non-treatment-related WHO clinical improvement on day 14 was 38% higher (OR=1.38) for patients in period 2 than in period 1, and survival was 16% higher (OR=1.16) (both on day 30 and day 90). This means that, irrespective of immunosuppressive treatment, the overall prognosis of period 2 patients for improvement and survival was better than of period 1 patients.
For all three outcomes, substantial indirect effects could also be demonstrated. The likelihood of CRP response-mediated clinical improvement on day 14 was 127% (OR=2.27) higher for patients in period 2 than in period 1, 30-day survival was 60% (OR=1.60) higher and 90-day survival was 49% (OR=1.49) higher.
Figure 2 extrapolates the impact of the treatment effects (the decomposed NIE) to different virtual pretreatment levels. Note that the figure reflects modelled data based on measured effects in the CHIC study and not real data. The treatment effects visualised in this graph can be considered as unbiased. All curves lying above the diagonal represent beneficial treatment effects. The more a curve deviates from the diagonal, the larger the treatment effect. Treatment effects are larger for 14-day clinical improvement than for 30-day and 90-day survival. The symbols representing the actually measured data in the CHIC study are plotted in the graph. The NNTs calculated on the basis of these observed data are 5 for 14-day clinical improvement, 9 for 30-day survival, and 10 for 90-day survival.
The data indicate that, in comparison with period 1, the immunosuppressive therapy administered to patients in period 2 has increased the 14-day clinical improvement rate over and above the level that could be expected based on the prognostic advantages that patients in period 2 (apparently) had. Similar effects were found for (30-day and 90-day) survival. It was possible to decompose the period effect (direct) from the CRP-mediated effect (indirect) by using mediation analysis. The period effect includes non-CRP-related improvements in management over time, such as better ventilatory support techniques,15 more focus on anticoagulation,16 and putatively a better natural prognosis. While it cannot be entirely excluded that these improvements have also had some impact on CRP response, it is hard to believe that they could reduce baseline CRP levels by more than 80% within 13 days. The estimate of the immunosuppressive treatment effect is nevertheless conservative. It is possible that part of the specific effect by immunosuppressive treatment was not captured by us as indirect effect. As said, the threshold of 80% improvement in CRP before a CRP response was counted as present was quite high, and it cannot be excluded that some patients with a beneficial outcome due to immunosuppressive treatment had in reality a CRP response slightly lower than 80%. If true, this would mean that the real contribution of immunosuppressive therapy is even higher than estimated by us.
The difference in prognosis between period 1 and period 2 (the period effect), here reflected by an ORNDE >1 in the decomposition analysis, was obvious. The primary (conventional) analysis of the CHIC study already showed that the existing (measured) differences at baseline could not be held responsible for the observed treatment effects.3 Still, as brought up by others, differences in unmeasured variables can be responsible for differences in prognosis in a study with NEGD. Not only patients’ baseline prognosis may have differed, prognostic differences could also have occurred during follow-up through non-inflammation-mediated mechanisms (eg, differences in mechanical ventilation or anticoagulation therapy, or adverse effects due to immunosuppressive therapy).
At the roots of this causal analysis was the critique that the real efficacy of immunosuppressive therapy could not be estimated properly because of these prognostic differences (see figure 1: the proposed structural causal model). Formally, the CHIC study lacks the internal validity of an RCT. That there were prognostically relevant differences between period 1 patients and period 2 patients has been proven by this analysis, but the mediation analysis also has made clear that these prognostic benefits only account for part of the observed differences between period 1 and period 2. In fact, the positive effects of immunosuppressive therapy outweigh the positive effects of a better prognosis over time.
In the primary analysis, it has already been stipulated, using conventional statistical analysis, that the observed treatment contrast was robust to adjustment for measured (known) confounders. The structural causal model further adds credibility to this observation. Formally, by changing the state of hyperinflammation (X) through administering immunosuppressants (do(X)), by ruling out confounding of the relationship of X on Y by d-separation, and by proving that changing X has a positive impact on the outcome (Y) mediated by CRP (M), the relation between the ‘parent’ (X) and its ‘descendant’ (Y) proves to be causal.6 This means immunosuppressive therapy can be considered an effective means of improving outcome via reducing inflammation, as measured by CRP response. The requirement of time precedence, relevant for causation, is fulfilled since a CRP response up to day 13 follows the intervention on day 0 and precedes the clinical outcomes on days 14, 30 and 90, respectively. The formal requirement of ignorability, meaning that treatment is independent of the outcome, assured in an RCT by random treatment assignment, is met here by the strict time separation of period 1 and period 2 and period exclusivity for control (period 1) and immunosuppressive (period 2) treatment. We had already stipulated in the primary publication that bias by indication could be ruled out as an explanation for observed treatment differences.3
The potential implications for the treatment of patients with COVID-19 are noteworthy. Figure 2 shows the potential impact of immunosuppressive treatment on top of optimal supportive care in appropriately selected patients, namely those with hyperinflammation or CSS. Note that these effects should be considered free of bias since we separated them from the period effects. At first glance the effects visualised in figure 2 do not seem to be very impressive, but the intervention is only brief and temporary (5–7 days), applied during the acute phase of COVID-19, while its benefit in terms of improvement and better survival extends until at least 90 days. The brief intervention is assumed to help patients recover from a temporary but dangerous period of hyperinflammation, invoked by the viral infection followed by dysregulation of the immune system.1 We used the CHIC study data to estimate NNTs, which were understandably better for clinical improvement than for survival. These estimated NNTs compare very well with those that can be deduced from the RECOVERY trial, which compared the effects of the glucocorticoid dexamethasone on top of supportive care versus supportive care alone.17 The beneficial effect of dexamethasone on 28-day mortality was rather small in the entire study population (we calculated an NNT of 36). Looking at the patients who needed mechanical ventilation, more likely those with hyperinflammation, the NNT in the RECOVERY trial was 8–9, very close to what we estimated for 30-day and 90-day survival in the CHIC study. Unfortunately, the RECOVERY investigators did not look at the subgroup of patients with increased CRP at baseline. Taking both studies together, though, NNTs of approximately 10 for mortality/survival are very reasonable and broadly considered clinically relevant. It means that 10 patients with COVID-19-induced hyperinflammation should be treated in order to save one additional life.
The structural causal model we have proposed here applies to patients with COVID-19 who fulfil our criteria for hyperinflammation (generalisability). This means not all patients with COVID-19 admitted to the hospital should be treated with immunosuppressive drugs, even though independent RCTs have now claimed benefits for both components of our strategy, when administered separately.17 18 The RECOVERY investigators also concluded that dexamethasone did not provide benefits when administered to patients who did not need oxygen to be supplied (the less severe patients)17 and TCZ seemed to be less effective in trials with patients without signs of hyperinflammation.19 20
The structural causal model as worked out here predicts the beneficial effects of immunosuppressive drugs are non-specific and depend on the rapid suppression of hyperinflammation (measured by a rapid and profound CRP response). It does not suggest or support a compound-specific treatment effect. Further, the structural causal model implies that immunosuppressive therapy only improves the outcome on top of optimal supportive care (ventilation, anticoagulation and so on), which should not be omitted.
Acceptability of clinical research, measured as clinical implementation, depends on a combination of several factors, including scientific rigour, comprehension and face validity. It is obvious that the results of RCTs are more credible than those of quasi-experimental studies, which is why the clinical community as well as guideline committees always wait for the trials. While the logic of causal inference is not fundamentally different from that underpinning RCTs, causal inference will be mistrusted, maybe because clinicians consider it less statistical (this article does not contain one p value), and medical journals will be reluctant to accept it. One should, for instance, accept that a conclusion of causality depends on the acceptability of using counterfactual information (the third layer of causation). For the CHIC study, this means accepting the estimation of how patients in period 1 would have responded to immunosuppressive therapy, based on a biologically plausible mechanism (suppression of inflammation). In reality, however, no one in period 1 ever received immunosuppressive therapy. Clinicians and guideline committees, used to interpreting RCT results that provide a global estimation of counterfactuals on a silver platter (the control arm), will find this difficult to accept, but counterfactuals have become common in many fields of science and facets of society, such as artificial intelligence and machine learning, social sciences and econometrics.21–23 Societies cannot even function properly anymore without accepting counterfactual information and principles of causal inference.
We still think conventional RCTs should be done whenever possible to resolve the most burning clinical questions and especially in situations of maximum uncertainty (equipoise). However, clinical scenarios may exist in which RCTs are unrealistic, unethical or impractical. In such circumstances, causal inference provides a basis for accepting the added value of certain treatments that have not passed the filter of RCTs yet, or will never do so.
The design and performance of the CHIC study were approved by the ethics committee of the Zuyderland Medical Center, Heerlen/Sittard.
Contributors All authors are responsible for the intellectual content of the manuscript and designed it together. RL wrote the manuscript, SR did the analyses, and all authors approved the final version of this manuscript.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.