
Validation of a clinical algorithm to identify neonates with severe illness during routine household visits in rural Bangladesh
  1. Gary L Darmstadt1,
  2. Abdullah H Baqui1,
  3. Yoonjoung Choi1,
  4. Sanwarul Bari2,
  5. Syed M Rahman2,
  6. Ishtiaq Mannan1,
  7. A S M Nawshad Uddin Ahmed3,
  8. Samir K Saha4,
  9. Habibur Rahman Seraji2,
  10. Radwanur Rahman2,
  11. Peter J Winch1,
  12. Stephanie Chang1,
  13. Nazma Begum2,
  14. Robert E Black1,
  15. Mathuram Santosham1,
  16. Shams El Arifeen2 for the Bangladesh Projahnmo-2 (Mirzapur) Study
  1. Department of International Health, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
  2. Public Health Sciences Division, ICDDR,B, Dhaka, Bangladesh
  3. Department of Pediatrics, Kumudini Women's Medical College, Mirzapur, Bangladesh
  4. Department of Microbiology, Bangladesh Institute of Child Health, Dhaka Shishu Hospital, Dhaka, Bangladesh
  1. Correspondence to Dr Gary L Darmstadt, Family Health Division, Global Health Program, Bill & Melinda Gates Foundation, PO Box 23350, Seattle, WA 98102, USA; gary.darmstadt{at}


Background To validate a clinical algorithm for community health workers (CHWs) during routine household surveillance for neonatal illness in rural Bangladesh.

Methods Surveillance was conducted in the intervention arm of a trial of newborn interventions. CHWs assessed 7587 neonates on postnatal days 0, 2, 5 and 8 and identified neonates with very severe disease (VSD) using an 11-sign algorithm. A nested prospective study was conducted to validate the algorithm (n=395). Physicians evaluated neonates to determine whether newborns with VSD needed referral. The authors calculated algorithm sensitivity and specificity in identifying (1) neonates needing referral and (2) mortality during the first 10 days of life.

Results The 11-sign algorithm had sensitivity of 50.0% (95% CI 24.7% to 75.3%) and specificity of 98.4% (96.6% to 99.4%) for identifying neonates needing referral-level care. A simplified 6-sign algorithm had sensitivity of 81.3% (54.4% to 96.0%) and specificity of 96.0% (93.6% to 97.8%) for identifying referral need and sensitivity of 58.0% (45.5% to 69.8%) and specificity of 93.2% (92.5% to 93.7%) for screening mortality. Compared to our 6-sign algorithm, the Young Infant Study 7-sign (YIS7) algorithm with minor modifications had similar sensitivity and specificity.

Conclusion Community-based surveillance for neonatal illness by CHWs using a simple 6-sign clinical algorithm is a promising strategy to effectively identify neonates at risk of mortality and needing referral to hospital. The YIS7 algorithm was also validated with high sensitivity and specificity at community level, and is recommended for routine household surveillance for newborn illness. Trial registration no. NCT00198627.



An effective strategy to decrease neonatal mortality in low-resource settings is to introduce community-level interventions with linkages to the healthcare system for treatment of severe illness.1 2 In many rural resource-poor settings, trained community health workers (CHWs) can: promote essential newborn care practices at home3–5; improve care seeking for severe neonatal illness by educating parents to recognise signs of illness; identify signs of illness through direct assessments at routine surveillance visits and refer sick infants to a health facility6 7; and manage illness at home when a referral is not complied with or facility-level referral is not feasible.3 8–10

What is already known on this topic

  • Community health workers (CHWs) are capable of validly identifying sick newborns.

  • The Young Infant Study 7-sign algorithm (YIS7) can be used by healthcare workers at health facilities to identify sick newborns needing urgent care in hospital.

  • No validated clinical algorithm exists for assessment of newborns by frontline workers during routine household surveillance.

What this study adds

  • We identified a simple, valid, 6-sign clinical algorithm for use by CHWs to assess newborns for illness during routine household visits.

  • The algorithm was valid for identifying newborns who were sick and at risk of death.

  • The YIS7 algorithm performed similarly and is also recommended for screening young infants for illness during routine home visits.

Accurate assessment by CHWs is an important prerequisite for successful community-level management of neonatal illness, and programmes have used various clinical algorithms for illness identification.3 7 8 11 WHO Integrated Management of Childhood Illness (IMCI) protocols have been evaluated at facility level and include an algorithm for children <2 months of age.12 However, protocols evaluated at the facility level may not necessarily have the same validity, or be programmatically feasible, in community settings where the algorithm will be applied to a large proportion of well children.12 Recently, we validated the ability of CHWs to assess newborns for the presence of clinical signs and classification of illness against assessment by physicians in populations with relatively moderate13 and high burdens of disease.14

Building on these analyses, the primary purpose of this paper was to validate the clinical algorithm itself that was used during routine household surveillance in Mirzapur, Bangladesh7 in identifying neonates needing urgent referral-level evaluation for severe illness. Study objectives were (1) to validate the clinical algorithm against physicians' algorithm-independent judgement of need for referral-level evaluation as a gold standard, (2) to compare the validity of current and further simplified algorithms and (3) to assess the validity of the algorithms in identifying neonatal mortality.

Methods

Study design and CHW surveillance

The clinical algorithm validation study was a prospective study nested within the PROJAHNMO-2 trial in Mirzapur, Bangladesh, described previously.7 13 15 In the intervention arm, CHWs conducted surveillance to identify pregnant women in a population of approximately 4000, made two prenatal visits at home to promote birth and newborn care preparedness, and visited the mother and newborn infant on the day of delivery and each live born infant at home on postnatal days 2, 5 and 8. During the postnatal visits, CHWs assessed neonates for the presence of severe illness using a clinical algorithm adapted from the Bangladesh Young Infant IMCI protocol for the management of sick children <2 months of age at first-level health facilities, and recommended urgent referral of neonates with severe illness to Kumudini Hospital, a 750-bed, private, referral-level hospital. CHWs ascertained the presence of 16 historical factors and 28 clinical signs, and conducted a detailed assessment of breastfeeding, as described previously.13

Mirzapur CHW clinical algorithm

The primary purpose of the clinical algorithm was to identify neonates with very severe disease (VSD) requiring urgent referral to the hospital for further evaluation and treatment. Initially, a neonate was categorised as having VSD if she/he had one or more of eight signs observed by a CHW (see online supplementary table 1). In 2005, based on high case death rates from preliminary analyses, the algorithm was revised to include three additional signs (weak, abnormal or absent cry; lethargic or less than normal movement; and not able to feed or not able to suck at all based on the breastfeeding assessment), for a total of 11 signs, for the classification of VSD (online supplementary table 1).13 Previously, we showed that CHWs' classification of neonates with severe illness using the 11-sign algorithm had high sensitivity and specificity compared to classification by physicians.13
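The classification rule described above (VSD if one or more algorithm signs is present) can be sketched as a simple set check. This is a minimal illustration, not the study's software: the sign identifiers are hypothetical names, and only the three signs added in 2005 are listed because the remaining eight appear only in online supplementary table 1.

```python
# Hypothetical identifiers for the three signs added in 2005; the other
# eight signs of the 11-sign algorithm are not named in the text and are
# deliberately left out rather than guessed.
ADDED_SIGNS_2005 = frozenset({
    "weak_abnormal_or_absent_cry",
    "lethargic_or_less_than_normal_movement",
    "not_able_to_feed_or_suck",
})


def has_vsd(observed_signs, algorithm_signs):
    """Return True if at least one sign of the algorithm was observed."""
    return not algorithm_signs.isdisjoint(observed_signs)


def vsd_at_any_visit(visits, algorithm_signs):
    """Return True if VSD was identified at one or more assessments."""
    return any(has_vsd(signs, algorithm_signs) for signs in visits)
```

Representing an algorithm as a set of signs makes the later simplifications (dropping or substituting signs) a matter of changing the set rather than the logic.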

The CHW clinical algorithm validation study

Study physicians from Kumudini Hospital randomly selected one of the project CHWs each day, and conducted a complete assessment of all the neonates seen by that CHW in a 24 h period, except those being seen in follow-up by the CHW after a hospital visit. Physicians completed the same standardised newborn assessment as the CHWs, except for the feeding assessment which was conducted by a female nurse if the physician was male, due to cultural sensitivity. In addition, physicians categorised neonates as to whether they needed urgent referral-level evaluation based on their clinical discretion, independent of the algorithm (hereafter referred to as referral need). Physicians assessed neonates less than 12 h after the CHWs' assessments either at home (for well babies and referral failures) or at the hospital (for successfully referred neonates) and were blinded to the CHWs' evaluation results. The average time between CHW and physician assessment was 3.0 h (SD 1.6 h, median 2.8 h), and 96% of the neonates were assessed at home by both physicians and CHWs. All neonates had complete assessments by both a CHW and a physician. We did not measure interobserver reliability in assessment among physicians or CHWs.

Data and analysis

A target sample size of 395 was calculated to achieve a predetermined agreement between CHW (n=44) and physician (n=8) assessments.16 We assumed a VSD prevalence of 5% identified by CHWs, 5% prevalence of neonatal illness requiring referral-level evaluation/management as determined by physicians, and a κ statistic of 0.90 with ±0.1 precision. Given the calculated sample size, the expected 95% CIs for a conservatively estimated sensitivity of 70% and specificity of 80% would be 50% to 90% and 76% to 84%, respectively.
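The expected confidence intervals quoted above can be checked with a short calculation. The sketch below assumes a normal-approximation (Wald) interval, which the text does not state explicitly, and assumes that 5% of the 395 neonates (about 20) would need referral; under those assumptions the quoted ranges are recovered.

```python
from math import sqrt


def wald_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion p observed in n subjects."""
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


n_total = 395
prevalence = 0.05                                # assumed referral-need prevalence
n_need_referral = round(n_total * prevalence)    # denominator for sensitivity
n_no_referral = n_total - n_need_referral        # denominator for specificity

sens_ci = wald_ci(0.70, n_need_referral)   # roughly 0.50 to 0.90
spec_ci = wald_ci(0.80, n_no_referral)     # roughly 0.76 to 0.84
```

The sensitivity interval is much wider than the specificity interval because sensitivity is estimated only from the small number of neonates who need referral.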

During the validation study period (November 2005 to December 2006), 4226 live births occurred and 3038 of them were assessed by CHWs at least once. A total of 395 neonates were randomly selected for validation of the clinical algorithm in identifying neonates requiring referral-level care (figure 1). To assess the validity of the algorithm in identifying neonates at risk of mortality, we analysed data from 6924 neonates who were assessed by CHWs at least once during the first 10 days of life throughout the entire study period (figure 1).

Figure 1

Flowchart of the surveillance (January 2004 to December 2006) and selection of the validation study sample. CHW, community health worker.

Modifications of the algorithm for VSD

We further modified the revised 11-sign VSD algorithm in order to explore simplified, alternate algorithms for identifying VSD by CHWs, since an 11-sign algorithm poses significant challenges for training and supervision of CHWs. Since the algorithm was adopted from a widely used IMCI algorithm, we aimed to explore modifications of the current revised 11-sign VSD algorithm rather than conduct an exhaustive examination of associations between referral need and all individual signs and symptoms.12 17 Thus, the algorithm was modified based on clinical significance, assessed using case death rates (results not shown), and the practicality of assessment of the signs. Online supplementary table 1 presents signs and symptoms included in seven sequentially modified algorithms. We also applied a community-based algorithm which was used by the Society for Education, Action and Research in Community Health (SEARCH) to screen for neonates with suspected serious infection11 and a recently validated WHO Young Infant Study 7-sign (YIS7) algorithm used to screen for severe illness requiring hospital admission, excluding jaundice, among those who visited outpatient facilities (online supplementary table 1).12

Statistical analysis

The validation study sample of 395 neonates was analysed to assess associations between physicians' judgement of referral need and a computed VSD categorisation based on individual signs and symptoms assessed by CHWs; the validity of a computed VSD categorisation using assessment of signs and symptoms by physicians was also examined and found to produce similar results. Consistent with the YIS7 used to define IMCI guidelines,12 and to avoid the influence of treatment, we utilised physicians' judgement of need for referral to hospital as the gold standard to calculate sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). κ statistics were also calculated to determine agreement between the computed VSD classification and physicians' judgement, and were considered as poor (<0), slight (0.0–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80) and almost perfect agreement (0.81–1.00).18 95% CIs were calculated for all estimates, and we compared 95% CIs across algorithms in order to examine statistical significance in differential estimates.
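The validity measures named above all derive from a 2×2 table of algorithm result against physician judgement. The following sketch shows the arithmetic. The example counts are an assumption: they are back-calculated from the percentages reported for the 11-sign algorithm (16 of 395 neonates needing referral, 8 of them detected, 6 false positives) and are not taken from the paper's tables.

```python
def screening_metrics(tp, fp, fn, tn):
    """Validity measures for a screening test against a gold standard."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


def agreement_label(kappa):
    """Agreement categories as listed in the text."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Illustrative, back-calculated counts (an assumption, see lead-in).
m = screening_metrics(tp=8, fp=6, fn=8, tn=373)
```

With a low prevalence of referral need, PPV depends heavily on the small number of false positives, which is one reason specificity matters for the referral burden even when sensitivity drives the choice of algorithm.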

In addition, to assess validity in identifying mortality, various VSD algorithms were applied to the 6924 neonates who were assessed by CHWs at least once during the first 10 days of life (figure 1), using mortality at the end of the 10-day period as the gold standard outcome. A neonate was categorised as having VSD if he/she had VSD at any one or more of CHW assessments during the period. We estimated sensitivity, specificity, PPV and NPV. We further calculated the likelihood ratio for a positive result, a measure useful in clinical settings, summarising both sensitivity and specificity. In addition, bivariate analyses were conducted to estimate the OR of mortality based on logistic regression models and the population attributable fraction (PAF) of mortality risk by each VSD classification. PAF was further calculated for selected individual signs and symptoms. PAF estimates the proportion of deaths that would be prevented following elimination of a condition, assuming the condition is causal.19 20
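The likelihood ratio for a positive result and the PAF can be sketched as follows. Two assumptions apply. First, the PAF formula shown is the case-based form using a crude risk ratio; the text does not say which formula the authors used. Second, the counts passed to the PAF function are approximations back-calculated from the reported sensitivity (58.0%), specificity (93.2%) and totals (69 deaths among 6924 neonates), not figures from the paper's tables.

```python
def likelihood_ratio_positive(sensitivity, specificity):
    """LR+ = P(test positive | outcome) / P(test positive | no outcome)."""
    return sensitivity / (1 - specificity)


def paf_from_counts(deaths_exposed, n_exposed, deaths_unexposed, n_unexposed):
    """Population attributable fraction, case-based form: pc * (RR - 1) / RR.

    pc is the proportion of deaths that occurred among the exposed
    (here, neonates classified as having VSD); RR is the crude risk ratio.
    """
    risk_exposed = deaths_exposed / n_exposed
    risk_unexposed = deaths_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    pc = deaths_exposed / (deaths_exposed + deaths_unexposed)
    return pc * (rr - 1) / rr


# Reported sensitivity and specificity of the 6-sign algorithm for mortality.
lr_pos = likelihood_ratio_positive(0.58, 0.932)

# Back-calculated, approximate counts (an assumption, see lead-in):
# about 40 of 69 deaths had VSD; about 466 of 6855 survivors had VSD.
paf = paf_from_counts(deaths_exposed=40, n_exposed=506,
                      deaths_unexposed=29, n_unexposed=6418)
```

Under these assumptions the results fall within the ranges reported in the Results (a likelihood ratio of 8–12 and a PAF of 53–56%).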

Binomial exact 95% CIs were calculated for all proportion estimates. We compared 95% CIs across algorithms in order to examine statistical significance in differential estimates. STATA 9.0 statistical software (Stata, College Station, Texas, USA) was used for all analyses. This study was approved by the Committee on Human Research at the Johns Hopkins Bloomberg School of Public Health, and the Ethical Review Committee and Research Review Committee at ICDDR,B, and was registered (no. NCT00198627).
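A binomial exact (Clopper–Pearson) interval can be computed with the standard library alone, as in the sketch below. It is not the Stata routine the authors used, and the example counts (8 of 16 and 13 of 16) are assumptions back-calculated from the reported sensitivities of 50.0% and 81.3%. The intervals given in the abstract, 24.7% to 75.3% and 54.4% to 96.0%, are the values to compare against.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, iterations=100):
    """Find p in [0, 1] with g(p) == target, for g increasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials."""
    # Lower bound: P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


ci_11_sign = exact_ci(8, 16)    # assumed counts behind the 50.0% sensitivity
ci_6_sign = exact_ci(13, 16)    # assumed counts behind the 81.3% sensitivity
```

Exact intervals are preferable to the normal approximation here because the sensitivity estimates rest on very few neonates needing referral.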

Results

Validity in identifying neonates with VSD

The validation sample of newborns was comparable to the overall population of newborns in the parent trial (data not shown). About 71% and 98% of the validation sample were assessed during the first 7 and 10 days of life, respectively.

Study physicians reported a referral need prevalence of 4.1%. Based on CHWs' assessments, the revised 11-sign algorithm correctly identified 50.0% of neonates whom physicians judged to need referral and 98.4% of those judged not to need referral (table 1). The YIS7 algorithm produced slightly higher sensitivity (62.5%) but slightly lower specificity (95.8%). The SEARCH algorithm could not be evaluated, since no observation in our data set met the VSD criteria of the SEARCH algorithm. There was no statistically significant difference in validity measures between computed VSD classifications using CHWs' and physicians' assessment results.

Table 1

Sensitivity, specificity, PPV and NPV of different screening algorithms for VSD (n=395)

The inclusion of jaundice (modification A, see online supplementary table 1) improved sensitivity (table 2). Substituting observed specific feeding problems with a history of a generic feeding problem did not change sensitivity (modification B). Expanding fever and hypothermia cut-offs (from >101.0°F to ≥100.0°F and from <95.5°F to <97.5°F, respectively) increased sensitivity substantially (modification C). Eliminating respiratory rate did not change sensitivity (modification F). Finally, using WHO IMCI fever and hypothermia cut-offs (≥99.5°F and <95.9°F, respectively) decreased sensitivity (modification G). Specificity remained high and relatively similar throughout modifications, ranging from 94% to 98%, implying that the choice of the optimal algorithm would be largely based on sensitivity.
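The effect of the temperature cut-offs can be made concrete with a small sketch. The three sets of thresholds are those quoted above; the class and function names are hypothetical and the code is only an illustration of how the same reading is classified under each set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TempCutoffs:
    """Fever and hypothermia thresholds in degrees Fahrenheit."""
    fever_f: float
    fever_inclusive: bool   # True for ">=", False for ">"
    hypothermia_f: float    # hypothermia is always a strict "<" in the text


ORIGINAL = TempCutoffs(fever_f=101.0, fever_inclusive=False, hypothermia_f=95.5)
MODIFICATION_C = TempCutoffs(fever_f=100.0, fever_inclusive=True, hypothermia_f=97.5)
WHO_IMCI = TempCutoffs(fever_f=99.5, fever_inclusive=True, hypothermia_f=95.9)


def temperature_sign(temp_f, cutoffs):
    """Return True if the reading counts as fever or hypothermia."""
    if cutoffs.fever_inclusive:
        fever = temp_f >= cutoffs.fever_f
    else:
        fever = temp_f > cutoffs.fever_f
    return fever or temp_f < cutoffs.hypothermia_f


# A reading of 97.0°F is flagged only under modification C.
flags = {name: temperature_sign(97.0, c)
         for name, c in [("original", ORIGINAL),
                         ("modification_c", MODIFICATION_C),
                         ("who_imci", WHO_IMCI)]}
```

The wider hypothermia band in modification C captures mildly hypothermic neonates that the other two threshold sets miss, which is consistent with its higher sensitivity.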

Table 2

Sensitivity, specificity, PPV and NPV of various modifications to the revised Mirzapur 11-sign algorithm for VSD (n=395)

We applied similar modifications to YIS7 algorithms (table 2); results were qualitatively comparable to those for the Mirzapur 11-sign algorithm modifications. The final modification (modification Z: including jaundice, excluding fast breathing and altering fever and hypothermia cut-offs (online supplementary table 1)) showed a sensitivity of 81%, comparable to that of the Mirzapur algorithm modification F, as the two algorithms are identical except that history of convulsion is included in modification Z.

Finally, sensitivity analyses using only 380 neonates who were assessed at home resulted in similar, non-significantly lower estimates of validity, compared to the estimates using the full sample (results not shown).

Validity in identifying deaths

Among 6924 neonates who were assessed by CHWs during the first 10 days in the parent trial, 69 died within the 10-day period. About 86% of the deaths occurred within 3 days following the last CHW assessment. Sensitivity and specificity did not vary significantly across the Mirzapur 11-sign and modified algorithms (results not shown). The Mirzapur and the YIS7 algorithms showed remarkably comparable results (table 3) with a sensitivity of 57–58% and specificity of 93–95%. The likelihood ratio for a positive result suggested that neonates who died within the 10-day period were 8–12 times more likely to have been identified with VSD using the algorithms, compared to those who survived (table 3). The SEARCH algorithm showed lower sensitivity (2.9% for the algorithm requiring the presence of two signs) and slightly higher specificity (99.6%) compared to the Mirzapur or the YIS7 algorithms. The Mirzapur 11-sign and the Young Infant Study (YIS) algorithms had a PAF of mortality risk of 53–56% (table 4). The sign ‘moderate to severe hypothermia’ alone had a PAF of 46%.

Table 3

Sensitivity, specificity, likelihood ratio positive, PPV and NPV of identifying mortality during the first 10 days of life by VSD algorithm (n=6924)

Table 4

OR from univariable logistic regression models and PAF of mortality during the first 10 days of life by VSD classification and selected individual signs (n=6924)

Discussion

IMCI protocols for young infants under 2 months of age have been validated in the past at facility level, thus potentially introducing care seeking bias in study samples.12 21 Moreover, the recent, multicentre, facility-based YIS was largely affected by two sites (Dhaka and Karachi), and the authors highlighted the importance of external validation of the algorithm for neonates in the first week of life, particularly for use during routine household surveillance.12 In our study, algorithms were validated not only at the community level during routine household surveillance, but also primarily among neonates in the first week of life.

The Mirzapur 11-sign clinical algorithm administered by CHWs had a sensitivity of 50% for identifying neonates needing referral to hospital. Since a screening algorithm is aimed at identifying subjects who could potentially benefit from further identification and management of illness, sensitivity is of paramount importance, and thus, the sensitivity of the 11-sign algorithm was deemed to be unacceptably low, prompting further analysis to identify potential improvements to the algorithm. Modifications of the algorithm increased sensitivity but did not affect the initial high specificity, indicating potential improvement of the algorithm in identifying severely ill neonates without burdening the healthcare system with falsely identified, non-ill newborns. Sensitivity was 81% in a simplified algorithm with only six signs and symptoms which are relatively easy to ascertain. Exclusion of respiratory rate measurement and a detailed feeding assessment, and reliance instead on maternal reporting of feeding problems,13 did not compromise algorithm performance and would reduce the time and complexity of neonatal assessment substantially.

The revised 11-sign and further simplified 6-sign algorithms had a sensitivity of 58% and specificity of about 93–95% in identifying mortality during the first 10 days. Further, neonates identified as having VSD, across all algorithm modifications, had significantly increased odds of neonatal death compared to those not meeting criteria for VSD. The PAF analyses imply that about 55% of deaths during the first 10 days might be averted if VSD can be identified and successfully managed. Although the algorithm was developed primarily to identify neonates with severe morbidity (ie, VSD) in the context of an intervention promoting facilitated referral,13 the algorithm included signs associated with high mortality risk and may be a useful tool in identifying risk for mortality as well, although this needs further validation.

The WHO YIS7 algorithm also performed well in a community setting. Moreover, the performance of the YIS7 algorithm was further improved with minor modifications. Considering the cost of introducing a new algorithm and the cost and complexity of training on the relatively minor variations between the Mirzapur and YIS7 algorithms, the WHO YIS7 algorithm developed for use at primary healthcare level appeared suitable in its current form for use at community level. Further validation of the YIS7 algorithm is needed in other settings, however, particularly when used during routine household surveillance.

We applied the SEARCH algorithm to our data since it is, to our knowledge, the only widely recognised community-based algorithm for use by CHWs to identify neonatal illness. The algorithm showed low sensitivity in our study sample, but it was developed to identify ‘death due to probable sepsis’, which is more specific than ‘neonates with VSD needing urgent referral’.11 Moreover, the two study populations had different population-to-worker ratios and distinctively different epidemiological characteristics among the study neonates assessed by CHWs that influence their performance (online supplementary table 2),7 11 22 highlighting challenges in developing a community-level clinical algorithm for diverse populations and varying programme designs. Overall, however, we believe that the study design in Mirzapur may more closely resemble programmes in which skilled attendance at birth is not ensured, which remains the case in many high mortality areas. Moreover, as births and care seeking for illness increasingly take place in health facilities, the Mirzapur and YIS7 algorithms become increasingly germane.

There are two major limitations of the study. First, there was about a 3 h interval between the CHW and physician assessments due to logistical reasons, during which clinical signs might have changed. Second, while physicians were blinded to the results of CHWs' assessments, the location of the assessment could have biased the physician's judgement. However, the vast majority (96%) were assessed in the community by both physicians and CHWs. When we excluded the 15 newborns who were assessed by physicians at the hospital, the results (data not presented) showed that sensitivity and PPV were slightly lower across algorithms in the subsample (n=380) compared to those in the entire study sample of 395 neonates. However, relative validity among the various algorithms was comparable with the main results. Further, given the interval between assessments, it would have been unethical to require sick neonates to stay home to wait for the study physicians' arrival.

In conclusion, considering the simplicity of having the same algorithm for community and facility use, we recommend the YIS7 algorithm for use at the community level in screening for neonates who need referral-level care. Further community-based validation of the YIS7 algorithm in populations with different disease burdens will be needed.


This study was supported by the Wellcome Trust–Burroughs Wellcome Fund Infectious Disease Initiative 2000 and the Office of Health, Infectious Diseases and Nutrition, Global Health Bureau, US Agency for International Development through the Global Research Activity Cooperative agreement with the Johns Hopkins Bloomberg School of Public Health (award HRN-A-00-96-90006-00). Support for data analysis and manuscript preparation was provided by the Saving Newborn Lives programme through a grant by the Bill & Melinda Gates Foundation to Save the Children, US. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.




  • Bangladesh Projahnmo-2 (Mirzapur) Study Group A S M Nawshad Uddin Ahmed, Saifuddin Ahmed, Nabeel Ashraf Ali, Abdullah H Baqui, Nazma Begum, Robert E Black, Sanwarul Bari, Atique Iqbal Chowdhury, Gary L Darmstadt, Shams El-Arifeen, A K M Fazlul Haque, Zahid Hasan, Amnesty LeFevre, Ishtiaq Mannan, Anisur Rahman, Radwanur Rahman, Syed Moshfiqur Rahman, Taufiqur Rahman, Samir K Saha, Mathuram Santosham, Habibur Rahman Seraji, Ashrafuddin Siddik, Hugh Waters, Peter J Winch and K Zaman.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Ethics approval The study was approved by the Committee on Human Research at the Johns Hopkins Bloomberg School of Public Health, and the Ethical Review Committee and Research Review Committee at ICDDR,B.

  • Contributors GLD, AHB, PJW, REB, MS and SEA were primarily responsible for study design and securing funding for the study. SB, SMR, IM, ASMNUA, HRS and RR were responsible for day-to-day management of the project, including data collection. SC and NB were responsible for project data management. YC was primarily responsible for data analysis, and YC and GLD were primarily responsible for preparing the manuscript. All authors reviewed and approved the manuscript.