Paper 3
Rating early child development outcome measurement tools for routine health programme use
  1. Dorothy Boggs1,2,
  2. Kate M Milner1,3,
  3. Jaya Chandna4,
  4. Maureen Black5,6,
  5. Vanessa Cavallera7,
  6. Tarun Dua7,
  7. Guenther Fink8,
  8. Ashish KC9,
  9. Sally Grantham-McGregor10,
  10. Jena Hamadani11,
  11. Rob Hughes12,13,
  12. Karim Manji14,
  13. Dana Charles McCoy15,
  14. Cally Tann1,16,
  15. Joy E Lawn1
  1. 1 Maternal, Adolescent, Reproductive and Child Health Centre, London School of Hygiene and Tropical Medicine, London, UK
  2. 2 International Centre for Evidence in Disability, London School of Hygiene and Tropical Medicine, London, UK
  3. 3 Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
  4. 4 Institute of Translational Medicine, University of Liverpool, Liverpool, UK
  5. 5 University of Maryland School of Medicine, Baltimore, Maryland, USA
  6. 6 Research Triangle Park, RTI International, Durham, USA
  7. 7 Department of Mental Health and Substance Abuse, World Health Organisation, Geneva, Switzerland
  8. 8 Swiss Tropical and Public Health Institute and University of Basel, Basel, Switzerland
  9. 9 International Maternal and Child Health, Department of Women’s and Children’s Health, Uppsala University, Uppsala, Sweden
  10. 10 Institute of Child Health, Faculty of Population Health Sciences, University College London, London, UK
  11. 11 Maternal and Child Health Division, International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh
  12. 12 Children’s Investment Fund Foundation, London, UK
  13. 13 Maternal & Child Health Intervention Research Group, Department of Population Health, London School of Hygiene and Tropical Medicine, London, UK
  14. 14 Department of Paediatrics and Child Health, Muhimbili University of Allied Health Sciences, Dar es Salaam, Tanzania
  15. 15 Harvard Graduate School of Education, Harvard University, Massachusetts, USA
  16. 16 Neonatal Medicine, University College Hospitals NHS Trust, London, UK
  1. Correspondence to Dorothy Boggs, London School of Hygiene & Tropical Medicine, London, UK; dorothy.boggs{at}


Abstract

Background Identification of children at risk of developmental delay and/or impairment requires valid measurement of early child development (ECD). We systematically assess ECD measurement tools for accuracy and feasibility for use in routine services in low-income and middle-income countries (LMIC).

Methods Building on World Bank and peer-reviewed literature reviews, we identified available ECD measurement tools for children aged 0–3 years used in ≥1 LMIC and matrixed these according to when (child age) and what (ECD domains) they measure at population or individual level. Tools measuring <2 years and covering ≥3 developmental domains, including cognition, were rated for accuracy and feasibility criteria using a rating approach derived from Grading of Recommendations, Assessment, Development and Evaluations.

Results 61 tools were initially identified, 8% (n=5) population-level and 92% (n=56) individual-level screening or ability tests. Of these, 27 tools covering ≥3 domains beginning <2 years of age were selected for rating accuracy and feasibility. Recently developed population-level tools (n=2) rated highly overall, particularly in reliability, cultural adaptability, administration time and geographical uptake. Individual-level tool (n=25) ratings were variable, generally highest for reliability and lowest for accessibility, training, clinical relevance and geographical uptake.

Conclusions and implications Although multiple measurement tools exist, few are designed for multidomain ECD measurement in young children, especially in LMIC. No available tools rated strongly across all accuracy and feasibility criteria with accessibility, training requirements, clinical relevance and geographical uptake being poor for most tools. Further research is recommended to explore this gap in fit-for-purpose tools to monitor ECD in routine LMIC health services.

  • low and middle income countries
  • health systems
  • early child development tools
  • maternal, newborn and child health
  • metrics

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Key findings

  1. WHY? Multiple tools: of the 100 tools that exist for early child development (ECD) outcome measurement, 27 met criteria for rating (measurement started <2 years and covered at least three developmental domains); however, few are fit-for-purpose for use in routine health systems.

  2. WHAT IS NEW? Remit and range of tools: of the tools identified, few adequately address multiple domains required for monitoring ECD, with the majority omitting vision and hearing. The two population-level tools rated highest in reliability, cultural adaptability, administration time and geographical uptake. The individual-level screening and ability tools rated highest for reliability and lowest for accessibility, training, clinical relevance and geographical uptake.

  3. WHAT TO DO? Accuracy and feasibility of tools: few existing tools are both accurate (ie, valid, reliable) and feasible for training and routine use (eg, time, cost, accessibility) in LMIC settings.

  4. KEY GAPS? The population-level tools (Caregiver Reported Early Development Instruments and Indicators of Infant and Young Child Development), along with the D-Score, are being harmonised into the WHO-led Global Scales for Early Development for population and programmatic level measurement. An optimal individual-level tool remains a gap. Additional research on tool assessment is needed to improve reporting, links to action and utility in planning and evaluating early intervention.


Introduction

The Sustainable Development Goals (SDGs) and Global Strategy for Women’s, Children’s and Adolescents’ Health 2016–2030 envision a world where every child can survive and ‘thrive’, reaching their full developmental potential.1 2 Global policy in early child development (ECD) is encapsulated within the WHO, UNICEF and World Bank Nurturing Care Framework (NCF), and low-income and middle-income countries (LMIC) have increasingly supported this ‘beyond survival’ agenda, with 45% (68 countries) having national level ECD policies and programmes.3 4

Birth to 3 years is well established as the critical period for ECD, when returns on investment are greatest.5–8 Seizing this window requires early identification of children with developmental difficulties, particularly through existing large-scale maternal, newborn and child health (MNCH) programmes such as health surveillance, immunisation and growth monitoring.3 9 10 Developmental monitoring in high-income countries (HIC) has been shown to improve early identification and access to intervention for children at risk of developmental delay and/or impairment.11–14

As highlighted in this series, challenges exist in monitoring and evaluation of ECD programmes, and also in measurement of outcomes in routine systems, despite a plethora of tools.15 In recent years, there have been several reviews of ECD measurement.16–20 The most recent and comprehensive, the World Bank’s Toolkit for Measuring Early Child Development in Low-income and Middle-income Countries, updated the previous toolkit and was published alongside the ECD Measurement Inventory, which summarised a total of 147 tools for children up to 8 years of age based on reviews of peer-reviewed and grey literature up to 2017 (hereafter referred to as the World Bank’s Toolkit and Inventory, respectively).16 19 21

In this paper, we systematically evaluate multidomain measurement of ECD in LMIC with a new and specific focus on those tools that measure a range of domains and could be applied for young children from 0 to 3 years of age through routine health services.

Scope and structure of series

This paper is the third in a series examining evidence to inform design and implementation of ECD interventions at national and subnational level in LMIC. The series is structured around a programme cycle including key processes and decision points (figure 1). This paper focuses on potential ECD monitoring and evaluation tools for routine health services. Other papers have reviewed overall design decisions,9 monitoring and evaluation,15 financing22 and overall process to scale-up.23

Figure 1

Programme cycle for design, implementation and scaling of early child development programmes.

Aim and objectives

We review ECD measurement tools for children 0–3 years of age and systematically assess appropriateness for use in routine health services in LMIC.

Our objectives are to:

  1. Identify existing ECD measurement tools covering ages 0–3 years according to initial selection criteria (ie, including ≥2 domains, used in at least one LMIC).

  2. Matrix these ECD measurement tools according to when (age) and what (domains) are included.

  3. Rate accuracy and feasibility of selected tools that meet further eligibility criteria (commencing under 2 years of age and including ≥3 domains, one of which is cognition) according to a systematic rating approach for these tool characteristics.


Methods

Objective 1: identify existing ECD measurement tools covering ages 0–3 years

The World Bank’s Toolkit and Inventory is the most comprehensive review of ECD measurement tools for use in LMIC.16 21 The latest World Bank toolkit, published in December 2017, involved reviews of peer-reviewed literature regarding child development measurement tools in LMIC through keyword searches of PubMed, Google Scholar, PsycINFO and other databases, as well as grey literature including other collections.16–19 21 24 We also reviewed recent reviews17 18 24 and consulted experts including the coauthors on this paper to identify relevant tools not included in the Inventory. Tools were categorised according to purpose (population and individual levels) and type of measurement (ability, screening and both) as defined in supplementary web appendix 1.

Objective 2: matrix ECD measurement tools according to when (age) and what (domains) are included

A matrix was developed to cross tabulate when measurement is performed (child age bands) and what developmental domains are measured. The age bands were based on the early years, considering likely opportunities for measurement within existing MNCH programmes in LMIC (eg, immunisation, growth monitoring). The domains were selected based on standard domains measured in global burden of disease assessments, which are also consistent in most clinical assessments.16 21 25 We considered those domains used by the World Bank, such as motor, cognition, and others, notably vision and hearing.

From all the tools identified in objective 1, we mapped onto the matrix those tools which had been used in at least one LMIC as defined by the World Bank Country Income Groups and covered ≥2 developmental domains.26–29 We used the cut-off ≥2 developmental domains as the standard clinical definition for global developmental delay or impairment.27–29

Objective 3: rate accuracy and feasibility of selected tools

Tools measuring <2 years of age and including ≥3 domains, one of which is cognition, were selected for rating to ensure earlier multidomain measurement alongside health surveillance, immunisation and growth monitoring.
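The two-stage selection described above (matrix inclusion, then eligibility for rating) can be sketched as a pair of filters. This is a minimal illustration only: the `Tool` record, its field names, the domain labels and the four example tools are hypothetical and are not drawn from the World Bank Inventory.

```python
from dataclasses import dataclass


# Hypothetical record structure; field names are illustrative only.
@dataclass(frozen=True)
class Tool:
    name: str
    start_age_months: int   # youngest age at which the tool can be used
    domains: frozenset      # developmental domains the tool measures
    used_in_lmic: bool      # used in at least one LMIC


def eligible_for_matrix(tool: Tool) -> bool:
    """Objective 2 criteria: used in >=1 LMIC and covers >=2 domains."""
    return tool.used_in_lmic and len(tool.domains) >= 2


def eligible_for_rating(tool: Tool) -> bool:
    """Objective 3 criteria: starts <2 years, >=3 domains, one being cognition."""
    return (
        eligible_for_matrix(tool)
        and tool.start_age_months < 24
        and len(tool.domains) >= 3
        and "cognitive" in tool.domains
    )


# Invented example tools, for illustration only.
tools = [
    Tool("Tool A", 0, frozenset({"cognitive", "language", "motor"}), True),
    Tool("Tool B", 0, frozenset({"language", "motor", "socioemotional"}), True),
    Tool("Tool C", 30, frozenset({"cognitive", "language", "motor"}), True),
    Tool("Tool D", 6, frozenset({"motor"}), True),
]

mapped = [t.name for t in tools if eligible_for_matrix(t)]
rated = [t.name for t in tools if eligible_for_rating(t)]
print(mapped)  # ['Tool A', 'Tool B', 'Tool C']
print(rated)   # ['Tool A']
```

In this sketch, "Tool B" is dropped at the rating stage only because cognition is not listed as a formal domain, which is the kind of exclusion examined separately for four tools.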

Rating of tool characteristics focused on a minimum of seven distinct criteria, informed by the literature and agreed by the author group. Accuracy criteria (ie, ‘Does the tool work?') were informed by developmental measurement literature, including available literature focused on LMIC, and covered validity, reliability and cultural adaptability.20 30 31 Feasibility criteria (ie, ‘Can the tool be delivered?') were particularly informed by the work of Fischer et al, who assessed feasibility of ECD screening tools for use by community health workers in LMIC, and covered tool accessibility, training, administration time and geographical uptake.18 20 29 An eighth criterion, clinical relevance and utility, was included for individual-level tools only, since population-level tools are not intended for individual-level assessment. Rating criteria for assessing early child development measurement tool accuracy and feasibility for use in routine programmes are presented in table 1 (online supplementary appendix 1).

Table 1

Rating criteria for assessing early child development measurement tool accuracy and feasibility for use in routine programmes

Rating of tools for each of these characteristics was informed by the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system as a guiding framework.32 The GRADE system is widely used including by WHO and scores four levels of evidence quality (high, moderate, low and very low) and classifies recommendations as strong or weak.33 34 Hence for each of the characteristics in our rating tool, we consistently applied a descending scale of ‘3’ to ‘0’ (ie, four levels), according to strength and/or utility of each tool characteristic based on available evidence.

To apply the rating, two authors independently rated the tools (agreement 92%), and consensus was reached with KMM by reviewing the evidence for ratings that were not in agreement. To identify evidence for rating each tool, one of the authors searched PubMed, Google Scholar and Google for easily available peer-reviewed and grey literature, including individual tool manuals and test websites (supplementary web appendix 2).
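As an illustration of how agreement between two independent raters might be summarised, the sketch below computes simple percentage agreement and lists the items needing consensus review. It assumes that agreement means identical ratings on the 0 to 3 scale; the source does not state how the 92% figure was calculated, and the ratings shown are invented.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Invented ratings (0 to 3) for eight tool characteristics.
rater_1 = [3, 2, 0, 1, 3, 2, 2, 0]
rater_2 = [3, 2, 0, 1, 3, 1, 2, 0]

agreement = percent_agreement(rater_1, rater_2)
# Positions where the raters differ would go to a third reviewer for consensus.
disputed = [i for i, (a, b) in enumerate(zip(rater_1, rater_2)) if a != b]

print(agreement)  # 87.5
print(disputed)   # [5]
```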

In view of the potential bias introduced by excluding tools for not measuring the ‘cognitive’ domain (as per the World Bank Inventory’s definition), we also rated four excluded tools, analysing at least one tool from each of the three groups (population, or individual screening or ability).


Results

Objective 1: identify existing ECD measurement tools covering ages 0–3 years

Out of the 147 tools included in World Bank’s Inventory, 99 tools covered the ages 0–3 years (figure 2 and supplementary web appendix 3).21 The WHO Indicators of Infant and Young Child Development (IYCD), released after the World Bank’s review, was identified separately by experts.35–37

Figure 2

Early child development (ECD) tools flow chart for multidomain matrix mapping and grading. IYCD, Infant and Young Child Development; LMIC, low- and middle-income countries.

Objective 2: matrix ECD measurement tools according to when (age) and what (domains) are included

Matrix development

Our matrix included the following age intervals: 3-month intervals up to the first year, then 6-month intervals until 3 years of age (figure 3). Two additional age groups of children aged >3 years were also included for ongoing developmental monitoring. Nine developmental domains, as defined in table 2 according to the World Bank’s Inventory, were included, including two learning domains, which are typically tested from 2.5 years onwards and noting that some domains were more constructs.16 21 Hearing and vision, particularly critical given their importance in broader development and lack of universal screening for sensory impairments in LMIC, were also included for a total of 11 domains.16–18 24 38 39
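The structure of such a matrix can be sketched as a count of tools per age band and domain cell. The sketch assumes bands start at birth, run in 3-month steps to 12 months and in 6-month steps to 36 months, and omits the two additional bands for children aged >3 years; the function names and the two example tools are hypothetical.

```python
from collections import Counter


def age_bands(first_year_step=3, later_step=6, end=36):
    """Half-open age bands in months: 3-month steps to 12, then 6-month steps."""
    bands = [(lo, lo + first_year_step) for lo in range(0, 12, first_year_step)]
    bands += [(lo, lo + later_step) for lo in range(12, end, later_step)]
    return bands


def matrix_counts(tools, bands):
    """Count tools per (age band, domain) cell of the matrix."""
    counts = Counter()
    for start, end, domains in tools:
        for lo, hi in bands:
            if start < hi and end > lo:  # tool's age range overlaps this band
                for domain in domains:
                    counts[((lo, hi), domain)] += 1
    return counts


bands = age_bands()
# Invented tools: (start month, end month, domains measured).
tools = [(0, 36, {"motor", "language"}), (24, 36, {"motor"})]
counts = matrix_counts(tools, bands)

print(bands)
print(counts[((24, 30), "motor")])  # 2
print(counts[((0, 3), "motor")])    # 1
```

Dividing each cell count by the number of tools in a group gives the proportions that a heat map of this kind would display.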

Figure 3

Heat map matrix of early child development measurement tools 0–3 inclusion of identified times, ages and domains. (A) Population-level. (B) Individual-level (screening, ability and both screening and ability tools). (C) Screening tools. (D) Ability tools.

Table 2

Nine developmental domains up to 3 years of age from the World Bank’s Inventory.16

Tool mapping

Sixty-one tools met criteria for inclusion onto the matrix. The majority of tools (92%, n=56) were individual-level screening (n=22) or ability tests (n=33), while the remaining 8% (n=5) were population-level tools (figure 2). One tool, the Movement Assessment Battery for Children, was identified as an individual-level screening and ability test so is counted in both categories (figure 2 and supplementary web appendix 3).40
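The counts above can be reconciled with a short arithmetic check. This is one reading of the figures, assuming that the 22 screening and 33 ability tools exclude the single tool classified as both; variable names are illustrative.

```python
screening_only = 22   # individual-level screening tools
ability_only = 33     # individual-level ability tests
both = 1              # tool classed as both screening and ability
population = 5        # population-level tools

individual = screening_only + ability_only + both
total = individual + population


def pct(n, d):
    """Percentage rounded to the nearest whole number."""
    return round(100 * n / d)


print(individual, total)                           # 56 61
print(pct(individual, total))                      # 92
print(pct(population, total))                      # 8
# Group sizes when the dual-purpose tool is counted in both categories.
print(screening_only + both, ability_only + both)  # 23 34
```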

Cognitive, motor and language domains were most commonly included across all tool groups. At the population-level, 60% (n=3) of tools included these three domains from 24 to 36 months, and all five tools (100%) measured motor and language at 36 months. At the individual-level (n=56), 55% (n=31) of tools measured all three domains. Specifically, in screening tools (n=23), the motor domain was measured from 24 to <30 months of age in 96% (n=22) of tools. In ability tools (n=34), both motor and language were measured from 24 to <30 months in 62% (n=21) of tools.

There were noticeable measurement gaps in most other domains for all tool types, especially in ages 0–3 years. No population-level tool covered personal-social/adaptive, disability screener, vision and hearing domains (figure 3A). In individual-level tools, there were noticeable gaps in measurement of attention/executive function, disability, academic/preacademic, approaches to learning, vision and hearing domains, with fewer than 25% (n=12) measuring each domain from 0 to 3 years (figure 3B). No screening tool measured attention/executive function or approaches to learning, and fewer than 40% (n=9) measured each of the socioemotional/temperament, personal-social/adaptive, disability screener, academic/preacademic, vision and hearing domains across 0–3 years (figure 3C). Fewer than 36% (n=12) of ability tools measured each of the remaining eight domains from 0 to 3 years, with disability, approaches to learning, vision or hearing only measured by one ability tool each across a very limited age range (figure 3D).

Individual tool mappings can be found in online web appendix 4.

Objective 3: rate accuracy and feasibility of selected tools

Forty-eight per cent (n=27) of tools met criteria for inclusion for rating of accuracy and feasibility (figure 2 and supplementary web appendix 3). Total ratings (figure 4) were analysed for the 27 tools for each characteristic and tool with recommendations classified as strong to weak (figures 5 and 6).32–34

Figure 4

Heat map of accuracy and feasibility ratings for selected early child development (ECD) measurement tools.

Figure 5

Rating criteria characteristic heat map for early child development tools 0–3 years. (A) Population-level tools. (B) Individual-level screening tools. (C) Individual-level ability tools.

Figure 6

Early child development 0–3 tool overall rating mapped by each accuracy and feasibility criteria. (A) Population-level tools. (B) Individual-level screening tools. (C) Individual-level ability tools.

Population-level tools (n=2)

Two population-level tools rated strongly for both accuracy and feasibility criteria, with high ratings in cultural adaptability as well as in accessibility, administration time and geographical uptake.35 37 41 42 Caregiver-Reported Early Child Development Instruments (CREDI) rated strongest within the population-level tools, rating strongly in validity and reliability in well documented multicountry studies and moderately for training.41 The IYCD rated very low in training, low in validity and moderate in reliability, as the tool’s complete psychometric results are forthcoming.35 37 42
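The overall ratings reported for these two tools (20 for CREDI and 15 for IYCD) are consistent with a simple sum of the criterion ratings on the 0 to 3 scale. The sketch below is illustrative only: the numeric mapping of labels, including scoring both ‘very low’ and ‘not known’ as 0, and the assignment of ‘strong’ to each criterion described as rating highly are assumptions inferred from the text rather than taken from table 1.

```python
# Assumed label-to-score mapping on the descending 3-to-0 scale.
SCALE = {"strong": 3, "moderate": 2, "low": 1, "very low": 0, "not known": 0}


def total_rating(ratings):
    """Sum the numeric scores across all rated criteria for one tool."""
    return sum(SCALE[label] for label in ratings.values())


# Criterion labels as described in the text; 'strong' is assumed wherever
# the text reports a high rating. Seven criteria apply to population-level tools.
credi = {
    "validity": "strong",
    "reliability": "strong",
    "cultural adaptability": "strong",
    "accessibility": "strong",
    "training": "moderate",
    "administration time": "strong",
    "geographical uptake": "strong",
}
iycd = {
    "validity": "low",
    "reliability": "moderate",
    "cultural adaptability": "strong",
    "accessibility": "strong",
    "training": "very low",
    "administration time": "strong",
    "geographical uptake": "strong",
}

print(total_rating(credi))  # 20
print(total_rating(iycd))   # 15
```

Under this mapping, the maximum possible total would be 21 for population-level tools (seven criteria) and 24 for individual-level tools (eight criteria).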

Individual-level screening tools (n=14)

These demonstrated great variability, with total ratings between 0 and 20. The Guide for Monitoring Child Development (GMCD) rated strongest within the individual-level tools, followed by Parents’ Evaluation of Developmental Status (PEDS) and then Ages and Stages Questionnaire (ASQ).43–45 The Developmental Screening Questionnaire rated lowest, with all characteristics rating either very low (n=2) or not known (n=6).46 Overall, this tool group rated strongest for administration time and reliability, with 50% (n=7) rating strongly (ie, 3) for reliability. Accessibility was ‘not known’ or ‘very low’ for 71% (n=10) and geographical uptake was also ‘very low’ for 29% (n=4). Half (n=7) of the tools in this group rated ‘not known’ for cultural adaptability and clinical relevance and utility.

Individual-level ability tools (n=11)

Ratings for ability tests also varied widely (ie, ratings 3–16) with Intergrowth 21st Neurodevelopment Assessment (INTER-NDA) rating highest and The Oxford Neurodevelopment Assessment (OX-NDA) rating lowest.47 48 Overall, this tool group rated highest on psychometrics including reliability and then validity, although fewer than 20% (n=2 and n=1, respectively) were rated as ‘strong’ in each characteristic. 55% (n=6) rated ‘not known’ for cultural adaptability and ‘very low’ in accessibility, training and geographical uptake. 73% (n=8) rated either ‘very low’ or ‘not known’ for clinical relevance and utility.

Rating of tools excluded in World Bank document

To address potential exclusion bias, four tools that had been excluded because they did not formally measure ‘cognition’ as per the World Bank’s Inventory were rated (box 1). The Malawi Developmental Assessment Tool (MDAT) rated strongly, tying with INTER-NDA for the highest rating of 16 among the individual-level ability tools.

Box 1

Consideration of tools excluding the World Bank’s ‘cognitive’ domain

In the World Bank’s inventory, the cognitive domain was defined as ‘the test assesses cognitive development, including general intellectual ability, problem-solving, conceptual development, reasoning, visual-spatial ability, memory, learning, etc'.16 21 Although tools were usefully categorised as ‘yes’ if they explicitly measured this domain, other tools were categorised as ‘no’ despite measuring cognition implicitly alongside other child development domains. This was often because the tool measured aspects of cognition but did not list it as one of the formal domains measured.

On review of the tools that were excluded when the filter for the three selected domains (cognitive, language and motor) was applied in objective 3 (online supplementary web appendix 3), it was noted that many of these tools do in fact measure cognition. Given this finding, four tools were selected across the three tool categories (population-level, individual-level screening and individual-level ability) for rating, to address this possible exclusion bias and to compare these ratings with those of the 27 tools that were initially rated. The methodology was similar to that outlined in the main paper, except that KMM was the second reviewer for MDAT.

The ratings are shown below:

All four of these tools demonstrated good rating potential with evidence available.

MICS ECDI, the population-level tool, rated ‘10’, which is lower than the CREDI and IYCD ratings of 20 and 15, respectively. MICS ECDI rated strongly on administration time and geographical uptake; however, it has a noticeable psychometric gap, with validity and reliability unknown.

The Screening Test Battery for Assessment of Psychosocial Development, the individual-level screening tool, rated low, with a total of 5. This tool rated strongly in reliability, which was consistent with this tool group; however, it rated either ‘very low’ or ‘not known’ in six of the eight tool characteristics.

However, it is the individual-level ability tool category that is most striking. The MDAT rated highest in this supplemental analysis with a rating of 16, tying with INTER-NDA for the highest overall rating in this tool category, whereas the KDC rated 9, which is more similar to other tools in this category (figure 4). MDAT rated moderately or strongly in seven of the tool characteristics, and evidence was available for all criteria. It is also noted that MDAT covers a much broader age range (0–8 years) than INTER-NDA (22–26 months).

Although the recent World Bank’s Toolkit and Inventory have advanced the ECD field, this finding indicates that caution might need to be applied when applying the filters with their respective definitions for further analysis.

MICS, Multiple Indicator Cluster Surveys; ECDI, Early Child Development Index; INTER-NDA, Intergrowth 21st Neurodevelopment Assessment; KDC, Kilifi Developmental Checklist; MDAT, Malawi Developmental Assessment Tool; NK, not known.


Discussion

This paper systematically rates the accuracy and feasibility of multidomain ECD 0–3 measurement tools with an explicit focus on routine use within the health sector in LMIC. Despite a plethora of ECD tools, our results indicate that none covers all domains while also being accurate and feasible.24 Among the 27 ECD tools that were rated, no tool adequately covered the majority of the domains or rated strongly for all accuracy and feasibility grading characteristics. However, at least one tool rated relatively highly in each group: CREDI for population-level tools, GMCD for individual-level screening tools followed closely by PEDS and ASQ, and INTER-NDA for individual-level ability tools. These results have important implications for ECD measurement within health programmes by identifying existing tools that can be used and are reliable yet more feasible, for example, requiring shorter administration time or less complex training.

Cognitive, language and motor domains were most frequently measured, with gaps across other domains. Vision, hearing and disability screener were missing in all population-level tools, along with the personal-social/adaptive domain, and <20% (n=9/56) of individual-level tools measured these domains. Vision, hearing and disability screening are critical at population-level and individual-level for early identification of developmental delay and/or impairment and to ensure referrals and/or follow-up for children identified. The academic/preacademic and approaches to learning domains, typically measured from age 2.5 years onwards, as well as higher-level attention and executive functions, were perhaps understandably not frequently assessed given our age-restricted inclusion criteria.

Overall, information on accuracy characteristics was the most difficult to obtain for rating, with validity evidence rarely detailed. Generally, all three tool groups rated more strongly in reliability than validity, with 10 tools rating a ‘3’ for reliability and only three tools rating a ‘3’ for validity. More research is required to better test and document psychometric properties in LMIC, in order to meet more rigorous validity criteria, such as the ‘strong’ rating, which requires predictive validity in different contexts.28 Since using HIC norms is not optimal, tools need to have a local comparison group of reference or control children for standardisation.28 49 Furthermore, a noticeable documentation gap in accuracy characteristics was cultural adaptability, with half of all tools rated as ‘not known’. Often studies cited which items were modified during translation/back translation but did not discuss the process and/or the complexity of implementing this process.44 46 48 50–62 An example of good documentation is the study by Gladstone et al, which detailed the adaptation process of creating a culturally relevant developmental assessment tool in rural Africa.63 In future, accuracy should be reported in a more standardised way and the adaptation process better documented.30 64 65

Feasibility information was easier to locate compared with other criteria, although ratings were typically lower. Administration time and geographical uptake characteristics rated highest across all tools and both were well documented in the World Bank Inventory, although those authors acknowledged that the country list was not exhaustive.21 No tool rated strongly on training criteria indicating a need to aim for shorter tool trainings by non-specialist trainers.

Only the two population-level tools CREDI and IYCD rated strongly for accessibility, indicating the majority of ECD tools are not readily and freely accessible online with app availability for use, highlighting another area for future improvement.

Almost half of all individual-level tools rated clinical relevance as ‘not known’ and only a quarter of individual-level tools rated this highly; considering clinical relevance is usually the basis for referral and follow-up, this highlights a critical gap for frontline workers. It is important that this criterion is easy to interpret, with clear thresholds for action and structured counselling responses, and it is essential that accessible service infrastructure for assessment of children who screen positively is in place.

For comparison, the review by Fischer et al recommended Ten Questions Questionnaire (TQQ), GMCD and MDAT for feasibility of use in LMIC health settings. Although our analysis rated GMCD highly, TQQ and MDAT were excluded for grading as they respectively do not measure development for children <2 years or formally document measuring cognition as per World Bank’s definition. Box 1 shows results of rating four tools which were excluded, where MDAT rated 16, tied with INTER-NDA for highest individual-level ability tool rate, indicating a limitation of the filter definitions.

Strengths and limitations

The World Bank’s Toolkit and Inventory were recently published and provided crucial input for our work, and identified 106 new tools for a total of 147 ECD tools 0–8 years.16 21 However, this framing may also have had limitations. For example, filters based on the Inventory provided a useful way to categorise tool content; however, this categorisation also limited analysis of domains, such as personal-social/adaptive which could be measured through other domains, and of other tools, such as those that did not adhere to the specific ‘cognitive’ domain definition (box 1). When imposing filters the tools’ ‘country used’ information is not exhaustive, and the vision and hearing domains may not be comprehensive. Newer, lesser known tools and those not available in English, or used in one country may also be under-represented, as well as specific tools measuring multidomain disabilities or impairments in young children. This was due to the World Bank’s primary focus on ECD 0–8 years and less for early identification of children with multidomain disabilities or impairments; however, it is important to note that most LMIC cannot afford separate screening systems.

Tools with the highest rates were generally more widely used with more documentation in the public domain; hence these higher rates might reflect increased use as much as, or more than, accuracy and feasibility. Also, although some tools rated low on certain criteria, it is acknowledged that they may be suitable for purposes beyond health.

Finally, this review prioritised looking at ECD multidomain tools in young children; however, it is acknowledged that home context is extremely important alongside this measurement. As highlighted in the paper by Milner et al, contextual tools that measure both maternal/caregiver mental health as well as caregiver capabilities, caregiver-child interactions and/or the home environment and long-term educational outcomes need to be considered.9 66–68

Further research

This exercise highlights that ECD tool characteristics are inconsistently reported in literature and overall rated weakly on accuracy and feasibility characteristics. Following the development of the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) statement in response to inadequate reporting of observational studies, several extensions to STROBE have been created to provide more nuanced field-specific guidance for authors, such as for newborn infection.69 70 Development of a STROBE extension checklist could establish standards for reporting on ECD tools and core data.69 A systematic way to document these characteristics would reduce such inconsistencies. One step could be to expand on the Quick checklist for appraising a CDAT by Sabanathan et al, which provided five key questions for consideration of an assessment tool.24

In addition, the tools could be further examined according to mode of test administration (ie, caregiver report vs direct child observation). This ECD tool mapping and rating exercise demonstrated both the strengths and limitations of employing a ‘domain lens’ approach to ECD tools. Although this provided a helpful classification for measurement and analysis for the purposes of this paper, it also highlights the need for widely agreed and established definitions for each domain, and the limits of imposing a siloed perspective and approach across tools (box 1), especially when domain specificity is not strong in infants and young children. Therefore, it is recommended that the ECD sector look more holistically at the child’s functioning and environment when assessing and measuring children’s abilities through health and other intersectoral areas such as education. Examples of moving away from a siloed domain perspective are the UNICEF and Washington Group on Disability Statistics’ Child functioning module for children aged 2 to 17 years, which assesses functional difficulties for censuses and national surveys,71–75 and a recent review by Oberklaid et al, which highlights a move away from universal developmental surveillance using structured tools towards broader conversations and support with families in HIC.76

Finally, given the large number of tools available, there is a need for fit-for-purpose population-level and individual-level screening and ability tools that could better meet accuracy and feasibility criteria to monitor ECD in routine LMIC health services. Joint work at the population level is currently in progress. The CREDI and IYCD teams have come together with the Global Child Development Group (developers of the Developmental ‘D-Score’ Growth Chart) to form the Global Scales for Early Development (GSED).36 The GSED will include a single set of open-access metrics for capturing population-level ECD for children under 3 years, as well as a programme evaluation measure. As part of this process, this group is considering many of the issues outlined above, including reliability, validity, cross-cultural applicability and feasibility, greatly enhancing ECD measurement and monitoring at the population level. Following on from this work, an individual-level fit-for-purpose tool is equally needed for both global screening and ability testing purposes. It is recommended that these approaches are aligned and adhere to a similar process, giving particular consideration to the accuracy and feasibility criteria.


Improved measurement of ECD in routine maternal, newborn and child health services is urgently needed to ensure that programme implementation and monitoring are aligned with The Global Strategy and the SDG targets, especially in terms of reaching the most vulnerable young children at highest risk of developmental delays and/or impairment. Although multiple tools exist for measuring ECD outcomes in children aged 0–3 years, few adequately meet accuracy and feasibility criteria for use at either population or individual levels. Recently developed population-level child development measurement tools are promising, but further research is required to develop accurate and feasible individual-level tools for use in routine health programmes at scale in LMIC. In addition, more consistent reporting of studies of the development and use of ECD tools is necessary to allow comparisons and more rapid learning.


The authors would like to thank the World Bank team for their extensive and helpful review. The authors would like to thank Dr Melissa Gladstone for her contributions and review of this paper. The authors would also like to thank Victoria Ponce Hardy for compiling and formatting figures and references, and Claudia da Silva and Fion Hay for administrative assistance.



  • Contributors Technical oversight of the series was led by JEL and KMM. The first draft of the paper and analysis was undertaken by DB, with input from KMM and JEL. JC was the second scorer. The Early Child Development Expert Advisory Group (Pia Britto, TD, Esther Goh, SG-McG, MG, JH, RH, KMM, Jamie Radner, Muneera Rasheed, Karlee Silver, Arjun Upadhyay) contributed to the conceptual process throughout. All authors gave input to scoring criteria and reviewed the manuscript.

  • Funding This supplement has been made possible by funding support from the Bernard van Leer Foundation. The Saving Brains® impact and process evaluation was funded by Grand Challenges Canada®.

  • Disclaimer The authors alone are responsible for the views expressed in this article and they do not necessarily represent the views, decisions or policies of the institution with which they are affiliated.

  • Competing interests The following authors on this paper have had intellectual input into, and leadership roles for, some of the tools reviewed: MDAT (JC), IYCD (VC, TD) and CREDI (DCM and GF). None of these authors rated any of these tools.

  • Provenance and peer review Commissioned; externally peer reviewed.

  • Patient consent for publication Not required.
