Aims: To compare the predictive performance of clinical risk factor assessment and pre-discharge bilirubin measurement as screening tools for identifying infants at risk of developing significant neonatal hyperbilirubinaemia (post-discharge total serum bilirubin (TSB) >95th centile).
Methods: Retrospective cohort study of term and near term infants born in an urban community teaching hospital in Pennsylvania (1993–97). A clinical risk factor scoring system was developed and its predictive performance compared to a pre-discharge TSB expressed as a risk zone on a bilirubin nomogram. Main outcome measures were prediction model discrimination, range of predicted probabilities, and sensitivity, specificity, positive and negative predictive values, and likelihood ratios for various positivity criteria.
Results: The clinical risk factor scoring system developed included birth weight, gestational age <38 weeks, oxytocin use during delivery, vacuum extraction, breast feeding, and combination breast and bottle feeding. The pre-discharge bilirubin risk zone had better discrimination (c = 0.83; 95% CI 0.80 to 0.86) than the clinical risk factor score (c = 0.71; 95% CI 0.66 to 0.76) and predicted risk of significant hyperbilirubinaemia as high as 59% compared with a maximum of 44% for the clinical risk factor score. Neither the risk score nor the pre-discharge TSB risk zone predicted the outcome with ⩾0.98 sensitivity without significantly compromising specificity (0.13 and 0.21, respectively). Multi-level clinical risk factor scores and TSB risk zones produced likelihood ratios of 0.15–3.25 and 0.05–9.43, respectively.
Conclusions: The pre-discharge bilirubin expressed as a risk zone on an hour specific bilirubin nomogram is more accurate and generates wider risk stratification than a clinical risk factor score.
- AAP, American Academy of Pediatrics
- AGA, appropriate for gestational age
- BW, birth weight
- GA, gestational age
- LGA, large for gestational age
- SGA, small for gestational age
- TSB, total serum bilirubin
- clinical prediction rules
- neonatal hyperbilirubinaemia
Recognising the challenges of identifying infants with significant hyperbilirubinaemia after hospital discharge, the American Academy of Pediatrics (AAP) and other healthcare quality monitoring groups have recommended that prior to hospital discharge, providers screen all newborn infants for their risk of developing significant hyperbilirubinaemia. The two screening strategies that have been recommended are clinical risk factor assessment and bilirubin measurement prior to discharge.1,2 Precise risk factor assessment utilises clinical prediction rules that incorporate risk factors for hyperbilirubinaemia obtainable from the medical history and physical examination.3 The bilirubin measurement screening strategy utilises the infant’s pre-discharge bilirubin expressed as a percentile on an hour specific bilirubin nomogram.4 The objective of this study was to assess and compare the predictive performance of these two screening strategies in order to identify a preferred clinical strategy.
Design, setting, and study group
For the clinical risk factor screening strategy we developed a clinical risk factor based logistic regression model and then used that model to derive a clinical risk factor score to predict significant hyperbilirubinaemia. For the bilirubin measurement strategy, we evaluated the predictive accuracy of the pre-discharge bilirubin expressed as a risk zone on an hour specific bilirubin nomogram. The study group was drawn from infants who were the subjects of a previous study to develop the hour specific bilirubin nomogram.4 The base population was 13 003 full or near term infants born at an urban community teaching hospital in Pennsylvania during 1993–97. In order to exclude preterm infants with inaccurate (overestimated) gestational age, the original study restricted subjects to infants with birth weight (BW) ⩾2000 g if gestational age (GA) ⩾36 weeks and BW ⩾2500 g if GA ⩾35 weeks. The study sample was further limited to infants who participated in the hospital’s early discharge follow up programme (n = 2976), which was offered to all mothers discharged within 48 hours after vaginal deliveries or 72 hours after caesarean sections who could not have their infant seen by a provider within a day or two. The early discharge follow up programme was introduced in 1993 in response to the relatively new but increasingly common practice of early postpartum discharge at that time. Infants who required phototherapy during the birth hospitalisation (n = 18) or were treated in the intensive care nursery for any period of time prior to discharge (n = 118) were excluded, leaving 2840 infants as potential subjects for analysis (fig 1).
As part of the early discharge follow up programme, an effort was made to obtain TSB levels in the hospital for all infants prior to discharge and on follow up. Throughout 1993–94, most infants had pre-discharge TSBs but only about 40% had both pre- and post-discharge TSBs. However, with the exception of 6 months, throughout 1995–97 more than 75% of all infants in the programme had both pre- and post-discharge TSB levels obtained. In order to limit verification bias (which occurs when patients with a positive test result preferentially have the result confirmed with a reference standard test) we further restricted the study sample to infants born during the months in 1995–97 when ⩾75% of the infants had both pre- and post-discharge TSBs measured (n = 996) (fig 1). A total of 899 infants (90%) had both pre- and post-discharge TSBs in this time frame. The 97 infants who did not were not included in the development and evaluation of the risk assessment strategies. Institutional review boards of the participating institutions approved the study.
Data sources and collection
Demographic and laboratory data were obtained from the research database developed for the original bilirubin nomogram study. Research assistants reviewed the original maternal and infant birth hospitalisation records of all 2840 infants from that study to collect additional data about clinical risk factors not recorded as part of that study. Additional demographic information (such as maternal race and age) was abstracted from standard administrative forms. Maternal and infant clinical information, such as medical history and physical findings, was drawn primarily from standard admission, intrapartum, and discharge forms. When clinical information on these forms was either missing or ambiguous, research assistants reviewed physician progress notes and nursing flow sheets for additional information. Two research assistants performed the chart reviews and entered data into a Microsoft Access (Redmond, WA) relational database.
Data elements and definitions
The dependent variable for the prediction models was development of significant hyperbilirubinaemia, defined as a post-discharge TSB >95th centile on an hour specific bilirubin nomogram.4 The 95th centile on the bilirubin nomogram is nearly identical to the phototherapy threshold curve recommended for “medium risk” infants in the AAP’s 2004 clinical practice guideline, and thus is a clinically meaningful outcome to predict.2
We identified risk factors from the medical history, physical examination, or laboratory evaluation that were previously reported as predictors of significant neonatal hyperbilirubinaemia.3,5–8 These are grouped into infant factors, maternal factors, and pregnancy events/delivery characteristics in table 1. Because the presence and degree of jaundice often cannot be accurately or reliably estimated9 and because it was not systematically or consistently documented in the hospital record, we did not include it as a predictor in the development of the clinical risk factor score.
Pre/post discharge TSB and risk zone designation
When an infant had more than one pre-discharge TSB, the TSB measurement closest to the time of hospital discharge was selected as the predictor TSB. In general, TSBs obtained prior to and after hospital discharge were defined as pre- and post-discharge TSBs, respectively. However, to approximate commonly occurring newborn lengths of stay, for a minority of cases we used two rules developed a priori to reassign pre-discharge TSB to post-discharge TSB, and vice-versa. First, when an infant remained in the hospital for more than two days (for example, born by caesarean section) and had multiple TSB values obtained prior to discharge, but none after discharge, if the last TSB value was obtained after 40 hours of age, then the TSBs obtained prior to 40 hours were designated pre-discharge TSBs and the subsequent TSBs were designated post-discharge TSBs. Second, when an infant was discharged early and had no pre-discharge TSB but did have multiple post-discharge TSBs, if there was at least one TSB obtained before 72 hours, then the first TSB was designated a pre-discharge TSB and all subsequent TSBs were defined as post-discharge TSBs. Overall, we reassigned a pre-discharge TSB to a post-discharge TSB for 213 (7.5%) infants and a post-discharge TSB to a pre-discharge TSB for 96 (3.4%) infants.
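The two reassignment rules above can be sketched in code. This is an illustrative reading of the rules, not the authors' implementation: the `Tsb` record, the `split_pre_post` function, and the `long_stay` and `early_discharge` flags are hypothetical names, and the caller is assumed to decide what counts as a stay of more than two days or an early discharge.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tsb:
    """One bilirubin measurement (hypothetical record layout)."""
    age_hours: float        # infant age when the sample was drawn
    value_mg_dl: float      # total serum bilirubin
    after_discharge: bool   # True if drawn after hospital discharge


def split_pre_post(
    tsbs: List[Tsb], long_stay: bool, early_discharge: bool
) -> Tuple[Optional[Tsb], List[Tsb], List[Tsb]]:
    """Return (predictor TSB, pre-discharge TSBs, post-discharge TSBs)."""
    ordered = sorted(tsbs, key=lambda t: t.age_hours)
    pre = [t for t in ordered if not t.after_discharge]
    post = [t for t in ordered if t.after_discharge]

    # Rule 1: long stay, several in-hospital TSBs, none after discharge,
    # and the last one drawn after 40 hours of age.
    if long_stay and len(pre) > 1 and not post and pre[-1].age_hours > 40:
        post = [t for t in pre if t.age_hours >= 40]
        pre = [t for t in pre if t.age_hours < 40]
    # Rule 2: early discharge, no in-hospital TSB, several after discharge,
    # and at least one drawn before 72 hours of age.
    elif early_discharge and not pre and len(post) > 1 and post[0].age_hours < 72:
        pre, post = post[:1], post[1:]

    # The predictor is the pre-discharge TSB closest to discharge, i.e. the
    # latest one; it is None when no pre-discharge value exists.
    predictor = pre[-1] if pre else None
    return predictor, pre, post
```

The text does not say what happens under the first rule when every value was drawn after 40 hours; in this sketch such an infant simply ends up without a predictor TSB.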
Univariate associations between the individual predictors and the development of a post-discharge hour specific TSB >95th centile were examined using t tests for continuous predictors and Pearson χ² and Fisher’s exact test (when appropriate) for categorical predictors. Clinical risk factors (not including the pre-discharge TSB or risk zone) that were associated with the outcome in univariate analyses at the p < 0.2 level of significance were considered for inclusion in a clinical risk factor based multivariable logistic regression model. In cases where the candidate variables were highly correlated (for example, birth weight and discharge weight), the most clinically relevant and reliably ascertained variable was chosen for inclusion in the model building procedure. The multivariable model was developed using the best subset selection method10,11 and compared to those developed using forward selection, backward elimination, and stepwise procedures. To translate the final logistic regression model into a clinical risk factor score, each infant was assigned points equal to the sum of the odds ratios (ORs) (multiplied by 2 and rounded down) corresponding to risk factors present for that infant.3 Scoring systems based on the logistic coefficients and non-multiplied ORs produced identical results but involved adding fractions, which we felt was cumbersome for clinicians.
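The conversion from odds ratios to points can be illustrated with a short sketch. It assumes that each odds ratio is doubled and rounded down before the points are summed (the text could also be read as rounding the sum). The function names are hypothetical, and the point values are the ones quoted for the example infant in the discussion, not the full scoring table (table 2).

```python
import math
from typing import Dict, Iterable


def points_from_odds_ratio(odds_ratio: float) -> int:
    """Points for one risk factor: the odds ratio times 2, rounded down."""
    return math.floor(2 * odds_ratio)


def risk_score(points_by_factor: Dict[str, int], present: Iterable[str]) -> int:
    """Sum the points of the risk factors present for one infant."""
    return sum(points_by_factor[factor] for factor in present)


# Point values quoted for the example infant in the discussion; the factor
# names are illustrative labels, not variable names from the study database.
EXAMPLE_POINTS = {
    "ga_37_weeks": 5,
    "vacuum_extraction": 4,
    "oxytocin": 4,
    "birth_weight_3300_g": 6,
    "exclusive_breast_feeding": 5,
}

if __name__ == "__main__":
    print(points_from_odds_ratio(2.6))                 # an OR of 2.6 gives 5 points
    print(risk_score(EXAMPLE_POINTS, EXAMPLE_POINTS))  # all five factors present
```

Doubling before rounding keeps more of the difference between factors than rounding the raw odds ratios would, while still giving clinicians whole numbers to add.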
The hour specific bilirubin nomogram was used to translate the pre-discharge predictor TSB values into corresponding centile based risk zones. The pre-discharge risk zone (low, 0–40th centile; low-intermediate, 41st–75th centile; high-intermediate, 76th–95th centile; and high, >95th centile) was then used to predict the outcome of interest.
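The zone boundaries can be written as a small lookup. The sketch below assumes the hour specific centile has already been read off the nomogram; the nomogram curves themselves are not reproduced here, and `risk_zone` is a hypothetical helper name.

```python
def risk_zone(centile: float) -> str:
    """Map an hour specific TSB centile (0-100) to a nomogram risk zone."""
    if not 0 <= centile <= 100:
        raise ValueError("centile must lie between 0 and 100")
    if centile <= 40:
        return "low"
    if centile <= 75:
        return "low-intermediate"
    if centile <= 95:
        return "high-intermediate"
    return "high"
```

For example, `risk_zone(80)` returns `"high-intermediate"`, while any value above the 95th centile falls in the `"high"` zone.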
The accuracy of the clinical risk factor score and the pre-discharge TSB risk zone as predictors of subsequent significant hyperbilirubinaemia was compared in three ways. First we used the c-statistic12 to calculate model discrimination, that is, the models’ ability to distinguish between patients with and without the outcome of interest (development of significant hyperbilirubinaemia). The c-statistic is equivalent to the area under the receiver operating characteristic (ROC) curve,13 which we plotted for each model. The discrimination of the models was compared using an algorithm developed by DeLong and colleagues.14 Second, we determined and compared the spectrum of predicted probabilities generated by each model. Third, we calculated the sensitivity, specificity, positive and negative predictive values, and likelihood ratios for alternative positivity criteria for each prediction rule. The Hosmer-Lemeshow goodness-of-fit statistic15 was used to determine each prediction rule’s calibration, that is, the degree of agreement between observed and predicted rates of the outcome along the spectrum of predicted risk.
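As an illustration of these measures, the sketch below computes the c-statistic as the proportion of case/non-case pairs in which the case has the higher score (ties counted as half), and derives the test characteristics from a 2×2 table. It is a minimal pure-Python version for small data sets, not the software used in the study; the function names are hypothetical, and the DeLong comparison and Hosmer-Lemeshow test are not shown.

```python
from typing import Dict, Sequence


def c_statistic(scores: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Probability that a random case scores higher than a random non-case."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    non_cases = [s for s, y in zip(scores, outcomes) if not y]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    # Concordant pairs count 1, tied pairs count 0.5.
    concordant = sum((c > n) + 0.5 * (c == n) for c in cases for n in non_cases)
    return concordant / (len(cases) * len(non_cases))


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Test characteristics for one positivity criterion (2x2 table counts)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Undefined (division by zero) when specificity is exactly 1 or 0.
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }
```

A c-statistic of 0.5 corresponds to chance discrimination and 1.0 to perfect separation of infants with and without the outcome.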
Approximately 11% (n = 98) of infants developed significant hyperbilirubinaemia (a post-discharge TSB >95th centile). In univariate analyses, factors that were associated with an increased risk of developing significant hyperbilirubinaemia (p < 0.20) included gestational age (GA) <38 weeks and ⩾40 weeks, large for gestational age (LGA), higher pre-discharge TSB risk zone, higher birth weight, breast feeding, combined breast and bottle feeding, maternal diabetes, vacuum extraction, prolonged rupture of membranes, and oxytocin use. Small for gestational age (SGA), parity, and caesarean section were associated with a decreased risk of developing significant hyperbilirubinaemia (table 1).
Clinical risk factor score
The clinical risk factor scoring system derived from the multivariable logistic regression included the following risk factors: birth weight, GA <38 weeks, oxytocin use during delivery, vacuum extraction, breast feeding, and combination breast and bottle feeding (table 2). Model building using forward selection, backward elimination, and stepwise procedures identified the same risk factors, except the combined breast and bottle feeding variable did not remain in the models. The clinical risk factor score had a c-statistic of 0.71 (95% CI 0.66 to 0.76) (fig 2). The range of predicted probabilities generated by the scoring system was relatively narrow (0.01–0.44).
Table 3 shows the predictive properties of the clinical risk factor score using various cut-off points for what score is considered a positive test. If the goal of screening is to identify all true positives, then a relatively low score needs to be used as the positivity criterion. For example, using a score of 8 as the positivity criterion identified 98% of infants who went on to develop significant hyperbilirubinaemia (that is, sensitivity = 0.98), but incorrectly labelled as high risk 87% of infants who did not develop significant hyperbilirubinaemia (that is, specificity = 0.13). More stringent positivity criteria (that is, higher score thresholds) result in improved specificity (fewer false positives), but also in a rapid decline in sensitivity (more false negatives) (table 3). The likelihood ratios for specific clinical risk factor score intervals ranged from 0.15 to 3.25 (table 4). The Hosmer-Lemeshow goodness of fit statistic for the clinical risk factor score was non-significant (p = 0.47), suggesting good calibration.
Pre-discharge TSB risk zone
The pre-discharge TSB expressed as a risk zone on the hour specific bilirubin nomogram had good discrimination (c = 0.83; 95% CI 0.80 to 0.86) (fig 2) and generated predicted probabilities ranging from 0.01 to 0.59. Using the 40th centile as a threshold to designate infants at risk of developing significant hyperbilirubinaemia identified all but one infant who went on to develop significant hyperbilirubinaemia (that is, sensitivity = 0.99) (table 3) but incorrectly labelled 79% of infants without significant hyperbilirubinaemia as being at risk for developing the outcome (that is, specificity = 0.21). The number of false positives generated using the 40th centile as the positivity criterion is lower than the number generated using a risk score of 8, but the positive predictive value is not much better (13%). This may be tolerated if the clinician’s primary objective is to avoid false negatives (maximise sensitivity and negative predictive value). The likelihood ratios for specific risk zones ranged from 0.05 to 9.43 (table 4). The Hosmer-Lemeshow goodness of fit statistic for the pre-discharge TSB risk zone was non-significant (p = 0.25).
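The low positive predictive value follows directly from the low specificity and the roughly 11% outcome rate. The sketch below recomputes it from the rounded sensitivity and specificity quoted in the text and the 98 of 899 infants who developed the outcome. Because the inputs are rounded, the results are approximate, and the function names are illustrative.

```python
def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Share of screen-positive infants who develop the outcome."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Share of screen-negative infants who do not develop the outcome."""
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


PREVALENCE = 98 / 899  # infants with a post-discharge TSB >95th centile

if __name__ == "__main__":
    # 40th centile as the positivity criterion (sensitivity 0.99, specificity 0.21)
    print(round(positive_predictive_value(0.99, 0.21, PREVALENCE), 2))
    # Risk score of 8 as the positivity criterion (sensitivity 0.98, specificity 0.13)
    print(round(positive_predictive_value(0.98, 0.13, PREVALENCE), 2))
```

With these inputs the positive predictive value is about 0.13 for the 40th centile threshold and about 0.12 for a score of 8, which is consistent with the statement that the two are close.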
In this study we found that the pre-discharge TSB risk zone was superior to a clinical risk factor score for assessing risk of developing significant neonatal hyperbilirubinaemia. Neither screening strategy could predict the outcome with ⩾0.98 sensitivity without compromising specificity (0.13 and 0.21, respectively), but the pre-discharge risk zone approach had better discrimination, and multi-level risk zones provided a wider range of likelihood ratios (and predicted probabilities) for risk stratification than multi-level risk scores. For example, a 37 week gestational age (5 points) infant delivered with vacuum assistance (4 points) and oxytocin augmentation (4 points) weighing 3300 g at birth (6 points) and exclusively breast fed at discharge (5 points) would have a clinical risk score of 24, placing her in the highest risk stratum with a 29% predicted probability (PPV) of developing significant hyperbilirubinaemia (a post-discharge TSB >95th centile). If that same infant had a pre-discharge TSB in the high risk zone (>95th centile) a clinician could actually expect that infant to have a significantly higher probability of developing significant hyperbilirubinaemia after discharge (PPV = 54%).
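The link between the likelihood ratios and these predicted probabilities can be checked with the odds form of Bayes' theorem. The sketch below takes the overall outcome rate (98 of 899 infants) as the pre-test probability and assumes that the largest reported likelihood ratios (3.25 and 9.43) belong to the highest score stratum and the high risk zone, which the text implies but does not state. The function name is hypothetical.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability and a likelihood ratio to a post-test probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


PRE_TEST = 98 / 899  # overall rate of significant hyperbilirubinaemia in the sample

if __name__ == "__main__":
    for label, lr in [("highest score stratum", 3.25), ("high risk zone", 9.43)]:
        print(label, round(post_test_probability(PRE_TEST, lr), 2))
```

Under these assumptions the high risk zone gives a post-test probability of about 0.54, matching the reported 54%, and the highest score stratum gives about 0.28, within a percentage point of the reported 29%; the small gap is plausibly due to rounding of the published likelihood ratio.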
Our clinical risk factor scoring system is similar to clinical risk factor based prediction rules developed by others. Stevenson and colleagues16 found end tidal carbon monoxide levels, method of feeding, and birth weight were predictive of TSB ⩾95th centile on the hour specific bilirubin nomogram at 96±12 hours of life (c-statistic not reported). Chou et al identified maternal race, breast feeding, and GA <38 weeks as consistent predictors of TSB ⩾20 mg/dl (c = 0.79) and TSB ⩾ age specific AAP criteria for considering phototherapy (c = 0.69).17 Newman and colleagues’ risk index,3 which includes exclusive breast feeding, family history of jaundice in a newborn, neonatal bruising, cephalohaematoma, gender, gestational age, and maternal race and age ⩾25, had better discrimination than ours (c = 0.83), but was developed using a different study design (nested case-control) to predict a different outcome (absolute TSB >25 mg/dl (428 µmol/l)). Comparison of the pre-discharge TSB risk zone to a slightly modified version of Newman’s risk index18 (developed to predict a post-discharge TSB >20 mg/dl (342 µmol/l)) found that the discrimination of the TSB risk zone (c = 0.79) was superior to the clinical risk factor score (c = 0.69).
Our clinical risk factor score differs from others in that it includes oxytocin exposure, a risk factor for hyperbilirubinaemia first recognised more than 25 years ago.19–23 Oxytocin exposure may be a surrogate marker for infant size, cephalopelvic disproportion, and nulliparity—all risk factors for hyperbilirubinaemia—although oxytocin may exert some, as yet uncharacterised, direct effect on neonatal bilirubin metabolism. Our rule also includes vacuum extraction, which may be a surrogate marker for large infant size, but also perhaps is a more reliable indicator of bruising and/or cephalohaematoma (known risk factors for hyperbilirubinaemia) than the detection and documentation of these findings in routine clinical care. Finally, similar to Stevenson and colleagues,16 we found a direct linear relation between greater birth weight and significant hyperbilirubinaemia. The finding in univariate analyses that being small for gestational age (GA) greatly decreased (OR 0.1; 95% CI 0.02 to 0.9) while being large for GA increased the risk of significant hyperbilirubinaemia (OR 1.6; 95% CI 0.9 to 2.9) suggests that size for GA is also a factor in determining risk. The relation of higher birth weight to hyperbilirubinaemia may be mediated by its association with maternal diabetes and subsequent neonatal polycythaemia, or the increased risk of bruising and cephalohaematoma incurred by larger infants.
Our study has some limitations. It is possible that the initial restriction of the study sample to infants who participated in the hospital sponsored early discharge follow up programme introduced spectrum bias (a form of selection bias). That is, if infants enrolled in the programme were at higher risk of developing significant hyperbilirubinaemia, then the results of the prediction models evaluated may only be generalisable to higher risk infants. Two factors argue against the presence of spectrum bias: (1) the decision to offer participation in the early discharge follow up programme was based entirely on the duration of hospital stay, regardless of risk of developing hyperbilirubinaemia; and (2) the demographics of the study sample closely resemble those of the base population. The fact that not all study infants had a post-discharge TSB performed may have introduced some verification bias. That is, it may have decreased the number of infants with negative test results (low pre-discharge TSB risk zone or clinical risk score) which would have had the effect of over-estimating the sensitivity and under-estimating the specificity of both the TSB and clinical risk factor based prediction models. However, unlike previous studies,3,16,17 we restricted the sample used to derive the clinical risk factor score to infants born during months when >75% of infants had post-discharge bilirubin measurements, which should have minimised verification bias. Finally, the paucity of Asians in our study population did not allow discernment of the known association between this ethnic group and the development of significant hyperbilirubinaemia.
What is already known on this topic
Multiple healthcare quality monitoring organisations have recommended pre-discharge screening of newborns for risk of subsequent hyperbilirubinaemia
Two screening strategies have been recommended—clinical risk factor assessment and determination of hour specific bilirubin values—but their relative predictive performance is not known
In employing either bilirubin screening or risk factor assessment, clinicians must make efforts to avoid measurement error that can affect predictive performance. Methods to improve bilirubin measurement accuracy and reduce inter-laboratory variability include routine calibration of instruments and strict adherence to quality assurance procedures.24,25 Providers using clinical risk factor prediction rules to assess risk must be sure to collect accurate information about clinical risk factors using the same definitions as those used in developing the prediction rules. Deviations from those original definitions will compromise predictive accuracy. For this reason, risk assessment strategies that utilise objective factors (such as use of oxytocin or vacuum extraction) will likely produce more generalisable results compared with those that utilise observer dependent factors (such as bruising, cephalohaematoma, and extent of jaundice) and factors that are ambiguous or difficult to define (such as maternal or infant race).
Finally, clinicians should not rely exclusively on the predictions provided by risk assessment strategies at the time of discharge. Events occurring after the prediction has been made, such as inadequate feeding or haemolysis due to cephalohaematoma, may increase the risk of severe hyperbilirubinaemia and will not be reflected in the pre-discharge risk assessment. For example, the one infant in this study whose TSB jumped from the low risk zone prior to discharge to the high risk zone after discharge was breast fed and had his pre-discharge bilirubin measured at 14 hours of life, which was probably too early to reflect all the factors that result in significant hyperbilirubinaemia, most notably breast feeding problems. For this reason all infants discharged prior to 72 hours should be examined by a health care professional in the first few days after discharge to assess infant well being and the presence of jaundice. The predicted probabilities generated by both risk assessment strategies can be used to determine the timing and frequency of follow up visits for these infants.2 For example, infants in the higher risk zones (>75th centile), 26% of whom developed significant hyperbilirubinaemia in this study, need to be seen closer to 72 hours after birth, while those in the lower risk zones (⩽75th centile), 2% of whom developed significant hyperbilirubinaemia, could probably be seen closer to 120 hours, unless other risk factors and feeding issues necessitate earlier post-discharge follow up. If follow up within 12–24 hours cannot be assured, infants in the highest risk zone (>95th centile), who had a 54% rate of significant hyperbilirubinaemia, should remain in the hospital until their bilirubin trajectory is elucidated.
What this study adds
This study shows that the hour specific bilirubin expressed as a risk zone on a bilirubin nomogram is more accurate than a clinical risk factor scoring system for assessing risk of significant hyperbilirubinaemia
We thank Lois Johnson, MD for sharing data on the bilirubin nomogram and for useful advice on the study design, Emidio Sivieri, MS for his work on the research database, Jennifer Baldwin and Michele Tereschuk for their assistance with medical record reviews and Chris Feudtner, MD, MPH, PhD, Harold Sox, MD, MPH, and Tracy Lieu, MD, MPH for their comments on earlier versions of the manuscript.
Funding: Dr Keren was supported by grant number K23 HD043179 from the National Institute of Child Health and Human Development, Bethesda, MD, USA
Competing interests: none declared