Field trials of the Baby Check score card in hospital

The Baby Check score card was used by junior paediatric doctors to assess 262 babies under 6 months old presenting to hospital. The duty registrar and two consultants independently graded the severity of each baby's illness without knowledge of the Baby Check score. The registrars assessed the babies at presentation while the consultants reviewed the notes. The consultants and registrars agreed about the need for hospital admission only about 75% of the time. The score's sensitivity and predictive values were similar to those of the registrars' grading. The score's specificity was 87%. Babies with serious diagnoses scored high, while minor illnesses scored low. The predictive value for requiring hospital admission increased with the score, rising to 100% for scores of 20 or more. The appropriate use of Baby Check should improve the detection of serious illness. It could also reduce the number of babies admitted with minor illness, without putting them at increased risk.

Baby Check is a score card developed to help parents and health professionals assess the severity of acute illness in babies under 6 months old.1 2 It consists of 19 checks (seven questions about symptoms in the previous 24 hours and 12 simple examination signs), each carrying a score. The scores for positive checks are added together. The higher the total score, the sicker the baby.
Two versions have been produced: an illustrated booklet for parents and a card for doctors, health visitors, and midwives.1 3 The card includes definitions, a table showing the chance of the baby being well or mildly ill, moderately ill, or seriously ill at different scores, and notes about low scoring conditions which require attention.
For parents, the total score is divided into four groups: a score between 0 and 7 means that the baby is well or only mildly ill and is unlikely to need medical attention at present; a score between 8 and 12 means that the baby is unwell but not seriously ill, and that advice should be sought from a doctor, health visitor, or midwife; a score between 13 and 19 means that the baby is ill and needs to be seen by a doctor; and a score of 20 or more means that there is a high chance of serious illness and the baby needs to be seen by a doctor straight away. These cut offs were identified using Receiver Operating Characteristic curves and predictive values.
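The four score groups described above can be written as a simple lookup. The following is a minimal sketch only: the function name `baby_check_group` is hypothetical, the wording of each group paraphrases the text, and the 19 individual checks and their weights are not reproduced here because they are not given in this paper.

```python
def baby_check_group(total_score: int) -> str:
    """Map a total Baby Check score to the parent-guidance group given in the text.

    Hypothetical helper for illustration; cut offs are 0-7, 8-12, 13-19, and 20 or more.
    """
    if total_score < 0:
        raise ValueError("total score cannot be negative")
    if total_score <= 7:
        return "well or only mildly ill; unlikely to need medical attention at present"
    if total_score <= 12:
        return "unwell but not seriously ill; seek advice from a doctor, health visitor, or midwife"
    if total_score <= 19:
        return "ill; needs to be seen by a doctor"
    return "high chance of serious illness; needs to be seen by a doctor straight away"


# Example totals, one from each group
for score in (3, 10, 15, 28):
    print(score, "->", baby_check_group(score))
```

The cut offs are inclusive at both ends of each band, so a score of 7 falls in the lowest group and a score of 20 in the highest.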
The score card simulates the clinical judgment of the original assessor, an experienced paediatrician, accurately. However, the predictive values, sensitivity, and specificity of the score apply only to the original study population. Before Baby Check can be adopted, it must be shown to be accurate in identifying seriously ill babies in other populations. This paper reports a field trial in which Baby Check was used to score babies presenting to hospital with an acute illness.

Methods
During 44 weeks, 13 paediatric house officers at Addenbrooke's Hospital were asked to score every baby under 26 weeks old presenting for assessment of an acute illness. They received no instruction in the use of the score card. As soon after presentation as possible, without knowledge of the score, the duty paediatric registrar graded each baby's illness on a seven point scale, ranging from: 'Baby needs urgent hospital treatment for a life threatening condition' to: 'Well baby not requiring any special care or treatment'. The registrars' grading reflected the baby's state at the time of presentation. Two consultant paediatricians reviewed each baby's notes after discharge, using the same scale and without knowledge of the score. Their gradings took into account the investigation results, diagnosis, treatment, and outcome. For the analyses, these gradings were simplified into four categories, shown in table 1.
Baby Check's performance in identifying the babies the consultants considered needed admitting for observation or treatment was compared with that of the registrars' grading. Differences in sensitivity, specificity, and predictive accuracy between the score and the registrars' grading were explored using χ² analyses.
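As an illustration of the kind of χ² comparison described, the sketch below computes a Pearson χ² statistic for a 2x2 table, using the admission sensitivity counts for consultant A reported in the Results (registrars 103/138, score of 13 or more 82/138). It is an assumption that this resembles the authors' calculation: the paper does not state whether a continuity correction or a paired test was used, and this sketch treats the two sets of classifications as independent, which is a simplification. The function name is hypothetical.

```python
def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Babies graded by consultant A as requiring admission (n=138), counts from the Results
registrar_detected, registrar_missed = 103, 138 - 103
score_detected, score_missed = 82, 138 - 82

statistic = chi_squared_2x2(registrar_detected, registrar_missed,
                            score_detected, score_missed)

# Critical value of chi-squared with 1 degree of freedom at the 5% level
CRITICAL_5_PERCENT_DF1 = 3.841

print(f"chi-squared = {statistic:.2f}")
print("exceeds 5% critical value:", statistic > CRITICAL_5_PERCENT_DF1)
```

Under these assumptions the statistic exceeds the 5% critical value, which is in line with the reported p<0.05 for this comparison.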

Results
During the study 303/357 (85%) babies presenting to casualty were seen by the paediatric team and were thus eligible. The house officers scored 243 (80%). Nineteen babies presenting to the wards were also scored. Of these 262, 196 (75%) had their illnesses graded by the registrars: 172 (88%) were seen shortly after presentation, and in the remainder the registrar graded the illness from the notes a few hours later, using only the information available at the time of presentation. Fourteen babies who were sent home and 52 who were admitted did not have their illness graded by the registrar. Two hundred and fifty nine (99%) babies had their illness graded by consultant A and 260 (99%) by consultant B.

SEVERITY OF THE BABIES' ILLNESSES
Of the babies scored, 227 (87%) were admitted (seven to intensive care) and 34 (13%) sent home (one not recorded). The median stay was two days, ranging from a few hours to 99 days; none died. The babies had a broad range of diagnoses, from minor complaints such as nappy rash to serious illnesses such as meningitis. Table 1 shows the registrars' and consultants' gradings of illness severity.

PROFILE OF SCORES
The scores ranged from 0 to 57 (fig 1), with a median of 12 (10th and 90th centiles 0, 34). The median score for babies sent home was 3 (0, 7), and for those admitted to paediatric wards and to intensive care, 13 (3, 34) and 30 respectively. Those graded by the registrars as needing hospital treatment (grade 1) had a median score of 27 (10, 46). Those graded as well or mildly ill (grade 4) had a median score of 4 (0, 16).

The score groups
Of the 262 scores, 100 (38%) were between 0 and 7, 40 (15%) 8 to 12, 51 (20%) 13 to 19, and 71 (27%) 20 or more. Table 2 shows the diagnoses and gradings of illness severity for a random selection of babies in each score group. Table 3 shows the distribution of scores for some well defined diagnoses. The highest scores (52, 57) were for babies with bacterial pneumonia.

THE SCORE COMPARED WITH THE GRADINGS OF ILLNESS SEVERITY
A registrar and both consultants graded the illness in 193 (74%) cases. Table 4 shows comparisons of the gradings. Table 5 shows the score groups compared with the gradings. Data for 193 babies are presented to facilitate comparisons using the same cohort throughout. Using all the available data gave very similar results.
Concordance between the paediatricians
The three paediatricians' gradings were compared for concordance using the percentage

Concordance between the score and the paediatricians
The same procedure was used to compare the score and the illness gradings, using the same cut off between grades 2 and 3, and a score of 13 or more. The results were: score compared with registrars, 67%, κ=0.34; score compared with consultant A, 64%, κ=0.28; and score compared with consultant B, 71%, κ=0.42.
Thus the agreement between the score and the paediatricians was slightly less good than among the paediatricians.

SENSITIVITY
Babies requiring hospital treatment (grade 1)
There was no significant difference between the sensitivity of the score and that of the registrar in identifying babies the consultants graded as needing hospital treatment. The sensitivity of the registrars was 37/94 (39%) for consultant A and 30/63 (48%) for consultant B. The sensitivity of a score of 20 or more was 43/94 (46%) and 38/63 (60%) respectively.

Babies requiring hospital admission (grades 1 and 2)
A score of 13 or more identified significantly fewer of the babies graded by the consultants as requiring admission than the registrar did (p<0.05). The sensitivity of the registrars was 103/138 (75%) for consultant A and 104/126 (83%) for consultant B. The sensitivity of a score of 13 or more was 82/138 (59%) and 83/126 (66%) respectively.

SPECIFICITY
The specificity of the registrars identifying babies the consultants graded as mildly ill (grade 4) was 8/12 (67%) and 9/17 (53%) respectively. The specificity of a score of 0 to 7 was 11/12 (92%) for consultant A and 14/17 (82%) for consultant B.
A low score thus had a higher specificity than the registrars' grading. The difference was not significant due to the small numbers. Figure 2 shows the predictive values of each score for each grade of illness, averaged over the two consultants (n=259). The higher the score the sicker the baby. The chance of needing hospital admission or treatment increased with the score. For example, the predictive value for needing hospital treatment was low at low scores, rising to 67% at a score of 20 and 100% at a score of 28.

Individual scores
Few babies scoring over 8 were graded as mildly ill. The number graded as needing careful observation at home (grade 3) also decreased as the score increased. The predictive values of a score between 0 and 7 were 11/68 (16%) and 14/68 (21%) respectively (table 5). These results were not significantly different.
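The sensitivity and specificity proportions reported in the preceding sections can be recomputed directly from the counts given in the text. The sketch below does this for consultant A as the reference grading; the function and variable names are illustrative, and "specificity" follows the paper's usage (the proportion of babies graded as mildly ill who were also classified as such by the registrar or by a score of 0 to 7).

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of reference-positive babies that the test also flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of reference-negative babies that the test also cleared."""
    return true_neg / (true_neg + false_pos)


# Counts from the text, with consultant A's grading as the reference.
# Each entry is (number correctly classified, number in the reference group).
grade1_registrar = (37, 94)       # registrar graded 1, of 94 consultant grade 1
grade1_score_20_plus = (43, 94)   # score of 20 or more, same 94 babies
admit_registrar = (103, 138)      # registrar graded 1 or 2, of 138
admit_score_13_plus = (82, 138)   # score of 13 or more, same 138 babies
mild_registrar = (8, 12)          # registrar graded 4, of 12 consultant grade 4
mild_score_0_to_7 = (11, 12)      # score of 0 to 7, same 12 babies

rows = [
    ("sensitivity, grade 1, registrar", sensitivity, grade1_registrar),
    ("sensitivity, grade 1, score >= 20", sensitivity, grade1_score_20_plus),
    ("sensitivity, grades 1-2, registrar", sensitivity, admit_registrar),
    ("sensitivity, grades 1-2, score >= 13", sensitivity, admit_score_13_plus),
    ("specificity, grade 4, registrar", specificity, mild_registrar),
    ("specificity, grade 4, score 0-7", specificity, mild_score_0_to_7),
]

for label, measure, (correct, total) in rows:
    value = measure(correct, total - correct)
    print(f"{label}: {correct}/{total} = {value:.0%}")
```

The same two functions apply to consultant B's figures by substituting the corresponding counts.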

FALSE NEGATIVES AND FALSE POSITIVES
These were identified using the registrars' grading, because this was recorded at the same stage of the illness as the score. Misclassifications arising from the consultants' gradings were very similar.
False negatives: babies with low scores requiring hospital treatment (grade 1)
Three babies (7%) graded 1 had scores between 0 and 7. Their diagnoses were apnoeic episode, hydrocephalus, and pyloric stenosis (only scored for vomiting). Five (11%) scored 8 to 12. Their diagnoses were thyroglossal cyst, proctitis, viral infection (initially thought to be meningitis), pyloric stenosis, and staphylococcal skin and eye infection.

Figure 2 Predictive value of each score for each grade of illness (n=259). Three babies did not have their illness graded by both paediatricians; they scored 16, 34 and 39. The four grades of illness are defined as: Grade 1, has a serious illness needing hospital treatment. Grade 2, requires hospital admission for observation due to uncertainty about the severity of the illness. Grade 3, needs careful observation and treatment. Could be managed at home by a capable mother. Grade 4, mildly ill or well. Could be managed at home by any mother.

False positives: babies with high scores graded as well or mildly ill (grade 4)
Three babies (10%) graded 4 had scores of 13 or more. None scored over 19. The diagnoses made were: proctitis, upper respiratory tract infection, and jaundice.

Discussion
These results show that Baby Check provides an accurate means of grading the severity of acute systemic illness in babies. The predictive value for serious illness increases with the score. Babies with serious diagnoses score high and those with minor illnesses score low. The score has a high specificity. The sensitivity and predictive value are similar to a paediatric registrar's grading of illness severity.
There is no gold standard to measure illness severity, and the field trial shows how difficult it is, even for experienced paediatricians. They only agreed about whether a baby needed admitting three quarters of the time. Many of the babies had non-specific illnesses which were difficult to assess, and about two thirds of admissions were because of uncertainty about the severity of the illness. The agreement between the score and the paediatricians was only slightly less good than that between the paediatricians themselves.
In a study of babies presenting to hospital it might seem strange that over half scored less than 13. But many of the babies were not seriously ill. About 40% were graded as suitable for home management, provided the mother could cope. The number admitted reflects hospital policy: babies presenting for acute assessment are admitted for observation unless they are obviously well.
Although 'cut offs' are necessary to explore the score's sensitivity and specificity, they are a crude test of its accuracy. The score provides a continuous measure of illness severity. The risk of serious illness increases with the score (fig 2). The irregularities in the predictive values are due partly to the small numbers of babies at the higher scores. The predictive values from the original population shown on the professionals' version of Baby Check are smoothed.1
The common diagnoses gave rise to a range of scores and paediatricians' gradings, reflecting the variation in the severity of disease. Baby Check is thus useful in grading illness severity in variable conditions such as bronchiolitis. Serious conditions such as meningitis and pneumonia always scored 20 or more. Diagnoses such as jaundice and convulsions were associated with lower scores, reflecting the fact that they often have few systemic signs.
Scores of 20 or more had a predictive value for being graded as needing admission of over 95%, with over 70% graded as needing hospital treatment. Scores in this range were as effective as the registrars in identifying the sickest babies, those graded as requiring treatment for a known serious condition. Most babies with scores over 19 had serious diagnoses.
A score of 13 or more was slightly less sensitive than the registrars in identifying babies graded as needing admission (grades 1 and 2), identifying approximately two thirds, with a positive predictive value of around 85%.
Baby Check must not miss seriously ill babies. The low scoring babies graded as needing admission fell into two groups: some had specific conditions such as convulsions, abscesses, and injuries that have few systemic signs but still need investigation. The score card is not designed to assess such conditions, which are easily recognisable. It includes a warning that they may give a low score but still warrant careful assessment. The second group comprised those admitted because of uncertainty about how the illness might progress. These babies had a wide range of illnesses and scores. These two factors also accounted for the poor negative predictive value of the lower scores.
One fifth of the babies graded by the registrars as needing hospital treatment (grade 1) scored less than 13. According to the consultants, however, the registrars overestimated the number needing treatment in a quarter of babies.
The specificity of the score was higher than that of the registrars. Some 87% of the well or mildly ill babies scored less than 8.
The false positive rate was low. Only three babies graded by the registrars as mildly ill scored over 12. None scored over 19. All were graded more seriously by the consultants.
Predictive values and sensitivities apply only to the population from which they were derived. In this trial Baby Check was tested by multiple observers in an environment that differed from the original one. The predictive values from the original population, shown on Baby Check,1 and the values observed in this study differ mainly at the lower scores. The original study included a large community cohort,1 2 so that most of the low scoring babies were well.
Baby Check is designed for use by parents at home, general practitioners, health visitors, and midwives in the community and junior doctors in hospital. In each of these environments the prevalence of illness is different, which will affect the interpretation of the lower scores. At home most babies are well and achieve low scores.3 A baby with a low score at home has more than a 90% chance of being well or mildly ill.1 2 In hospital many low scoring babies have conditions (such as a convulsion) which obviously need attention, as demonstrated by the poor predictive value of the low scores in this study.
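The dependence of predictive values on prevalence can be illustrated with a short calculation. The sketch below holds sensitivity and specificity fixed and varies only the prevalence of illness; the figures used (sensitivity 0.6, specificity 0.9, prevalences of 5% and 50%) are hypothetical values chosen for illustration and are not taken from this study or the original one.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (positive predictive value, negative predictive value).

    Derived from sensitivity, specificity, and prevalence by Bayes' theorem.
    """
    if not (0 < prevalence < 1):
        raise ValueError("prevalence must be strictly between 0 and 1")
    if not (0 <= sens <= 1 and 0 <= spec <= 1):
        raise ValueError("sensitivity and specificity must lie between 0 and 1")

    true_pos = sens * prevalence
    false_neg = (1 - sens) * prevalence
    true_neg = spec * (1 - prevalence)
    false_pos = (1 - spec) * (1 - prevalence)

    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined for these inputs")

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics, identical in both settings
SENS, SPEC = 0.6, 0.9

# Hypothetical prevalences: low (community-like) and high (hospital-like)
for setting, prevalence in (("low prevalence", 0.05), ("high prevalence", 0.50)):
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"{setting}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With the test characteristics unchanged, the negative predictive value falls as prevalence rises, which is the pattern described for low scores at home compared with low scores in hospital.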
Although many of the babies scoring 13 to 19 were graded as requiring admission once they had presented to hospital, few were thought to need treatment for a known serious illness, and if seen by a general practitioner some of these babies might have been managed at home with careful review. Scores of 20 or more are rare in the community,1-3 5 however, and this study confirms that babies who achieve them warrant urgent assessment wherever they are seen.
The score provides an objective means of grading a baby's illness, which can be used in association with clinical judgment to decide what action should be taken. It should help doctors assess the severity of acute systemic illness in babies. The score is of particular value in identifying the well and the most seriously ill babies, and should assist in the management of the large group of babies who are currently admitted because of uncertainty about the severity of the illness. Provided the home circumstances and previous history were satisfactory, some of those with low scores could be sent home and kept under review by the mother and the general practitioner using Baby Check,3 6 returning to hospital if the score increased.
The appropriate use of Baby Check in hospital and in the community should improve the detection of serious illness. It could also reduce the number of babies admitted with mild illness, without putting them at increased risk.