Pearson and colleagues have presented data highlighting the use of the paediatric index of mortality (PIM) score as a tool for auditing paediatric intensive care unit (PICU) performance.1 Whilst we agree with the authors' message that PIM has many advantages over other scoring systems, we feel that urgent recalibration is needed before this tool is adopted as a benchmark for performance in the UK. The PIM variables were developed predominantly from an Australian data set (one British PICU, Birmingham, participated) over 1994–95; the data used in Pearson's validation come from five UK PICUs, including our own, over the period 1998–99.1 PIM continues to discriminate between death and survival reasonably well, giving an area under the ROC curve of 0.840 (95% CI 0.819–0.853),1 marginally less than the figure of 0.90 reported in the original paper.2 However, in the four years between development and validation the model's calibration has deteriorated, as evidenced by two pieces of information from Pearson's study.1
First, the overall standardised mortality ratio (SMR) is 0.87 (95% CI 0.81–0.94); this figure is remarkably concordant across four of the five PICUs. Second, from table 2,1 it is possible to calculate the Hosmer-Lemeshow statistic: chi-squared = 37.41, p < 0.0001. This implies poor calibration (good calibration is traditionally represented by a p value > 0.10).
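The Hosmer-Lemeshow calculation referred to above can be sketched as follows. This is a minimal illustration only: the risk-band figures below are invented for demonstration and are not the figures from table 2 of Pearson's paper.

```python
def hosmer_lemeshow(groups):
    """Hosmer-Lemeshow chi-square from risk-band summaries.

    groups: list of (n, observed_deaths, expected_deaths) tuples,
    one per band of predicted mortality risk (typically deciles).
    The statistic sums (O - E)^2 / (n * p * (1 - p)) across bands,
    where p is the mean predicted risk in the band (E / n).
    """
    chi2 = 0.0
    for n, observed, expected in groups:
        p = expected / n  # mean predicted risk in this band
        chi2 += (observed - expected) ** 2 / (n * p * (1 - p))
    return chi2

# Purely illustrative risk bands, NOT the published data:
bands = [(100, 2, 4.0), (100, 5, 8.0), (100, 12, 15.0)]
print(round(hosmer_lemeshow(bands), 2))  # prints 2.97
```

The resulting statistic is compared against a chi-square distribution (conventionally with g − 2 degrees of freedom for g bands) to obtain the p value quoted above.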
The reasons for the loss of calibration are unclear. One possible, perhaps over-optimistic, explanation is that the UK units in the latter study were all “overperforming”, given that individual units demonstrated SMRs of between 0.83 and 0.89. However, it is unlikely that such a quantum leap in the quality of paediatric intensive care delivery has occurred over the four years between 1994 and 1998, given that no major treatment breakthroughs or radical service reorganisations have occurred in this time.
More recent data from our PICU highlight the trend towards poorer calibration: the PIM-derived SMR from 910 patients seen during the 2000 calendar year is 0.54 (95% CI 0.39–0.69). The authors acknowledge these shortcomings and state that a revised version of PIM will soon be available. However, recalibration is only worthwhile if a very broad sample of UK units participates. The UK PICOS study (paediatric intensive care outcome study) will attempt to address this by collecting the data used in the calculation of several scoring systems across the whole of the UK over a one year period commencing March 2001. From this study it is hoped that an optimal indicator of PICU performance will be derived.
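For readers unfamiliar with the SMR figures quoted throughout this exchange, the calculation is simply observed deaths divided by the number of deaths the model predicts, with a confidence interval commonly obtained by treating the observed count as Poisson. The counts below are an assumption chosen for illustration, not figures taken from either letter.

```python
from math import sqrt

def smr_with_ci(observed, expected, z=1.96):
    """Standardised mortality ratio with a normal-approximation 95% CI.

    Treats the observed death count as Poisson, so SE(O) ~ sqrt(O);
    the interval is (O +/- z*sqrt(O)) / E.
    """
    smr = observed / expected
    half_width = z * sqrt(observed) / expected
    return smr, smr - half_width, smr + half_width

# Illustrative counts only (assumed, not from the published data):
smr, low, high = smr_with_ci(observed=50, expected=92)
print(f"SMR {smr:.2f} (95% CI {low:.2f}-{high:.2f})")
```

An SMR below 1 with an upper confidence limit also below 1, as in the UK data above, indicates significantly fewer deaths than the model predicts.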
Dr Tibby and Dr Murdoch note that, in our study of paediatric intensive care units (PICUs) in the UK,1 PIM discriminated well between children who died and children who survived, with an area under the ROC curve of 0.84. However, they are concerned that PIM had “poor calibration” because the standardised mortality ratio (SMR) in the UK units was 0.87 (95% CI 0.81–0.94)—that is, the actual number of deaths was only 87% of the number predicted by PIM. In fact, this figure is almost identical to the PIM SMR for all PICUs in Australia in 1997–99, where the SMR was also 0.87 (95% CI 0.81–0.92). It is very encouraging that PIM gives such similar results in Australia and the leading PICUs in the UK, as it suggests that standards are comparable between the two groups of units and that PIM performs similarly in Australian and UK children.
It is normal for SMRs to fall with time as intensive care improves, and for mortality prediction models to need recalibration. This has happened with PRISM,2 MPM3 and APACHE,4 as well as PIM. Despite Dr Tibby and Dr Murdoch's reservations, the fact that the SMR has fallen by a similar amount in both Australia and the UK suggests that standards of care have improved in PICUs in those countries in recent years.
Dr Tibby and Dr Murdoch point out that the Hosmer-Lemeshow test gives a low p value for PIM's performance in the UK data. This test divides the sample into 10 groups, ranging from very low to very high risk of death, and compares the actual number of survivors and non-survivors in each group with the number predicted by PIM. Because PIM predicts too many deaths in the leading units in the UK, it follows that the number of actual deaths differs from the number predicted—so the Hosmer-Lemeshow p value is low. However, table 2 in our paper shows that the ratio of observed to expected deaths was similar across the 10 groups,1 so that the recalibrated model is likely to fit well. The fact that the Hosmer-Lemeshow test gives a low p value does not necessarily mean that a model (such as PIM) is invalid—it often means only that the standard of care in the test PICUs differs from that in the units in which the model was derived.
The PICUs that contributed the data from which the PIM score was derived were all leading units that deliver a high standard of care, so the score reflects best practice in 1994–96 when the data were collected. We are recalibrating PIM using data from units in the UK and Australia, and the new model will be available this year. Unfortunately, the quality of paediatric intensive care is not uniform in the UK, and there is evidence that some units do not perform at an optimal standard.5–7 Surely it would be preferable for the UK to use an international standard based on best practice (such as PIM), rather than the average of good and not-so-good units from the whole of the UK (PICOS). The UK should aim for best practice rather than being content with average practice.