Background Standardised developmental tests are now widely used in neurodevelopmental assessments of infants and children. In 2006, the revised and updated version of the Bayley Scales of Infant and Toddler Development (version III) replaced the previous version and is now widely used in neonatal developmental follow-up clinics. Several papers from Australia have highlighted underestimation of developmental impairment up to age 2 using this revised version. We aimed to ascertain how a cohort of healthy 3-year-old children performed compared to the standardised norms of the Bayley Scales of Infant and Toddler Development (version-III).
Method Healthy term newborn control infants from the prospective Development after Infant Surgery (DAISy) study were included. At 3 years of age, the mean scores on each of the five subscales for 156 children were compared with the standardised norms.
Results At 3 years of age, the mean scores were higher than the standardised norms on four of the subscales: cognition (p<0.05), and receptive language, expressive language and fine motor (p<0.001). There was no significant difference in the gross motor scale (p=0.435).
Conclusions Healthy term Australian children have a statistically significantly higher mean score on the Bayley Scales of Infant and Toddler Development (version-III) compared with the standardised means in four of the subtests, with the greatest difference in receptive language. This has implications for the assessment of children as the test may miss those with a minor delay and not reflect the severity of delay of infants that it does identify. We recommend that consideration ought to be given to re-standardising this assessment on Australian children.
What is already known
The Bayley Scales of Infant Development (BSID) and its revisions are the most widely reported measures.
The Bayley III version has, however, been standardised on an American paediatric population.
A few recent papers have questioned the validity of the Bayley III version, and the accuracy of its estimates, in different geographic populations.
What this study adds
Standardised scores of the Bayley III version on 3-year-old Australian children underestimate developmental delay.
Consideration should be given to re-standardising the Bayley III assessment on specific local geographic populations.
More study is required to determine the best interpretation of Bayley III scores.
Developmental surveillance and intervention in the earliest years of life provide the greatest medical, social and economic benefits to the individual, their family and the wider community.1 As a consequence of the passage of ‘The Individuals With Disabilities Education Act’ (IDEA), the emphasis of screening has shifted to identifying disabilities at a younger age, with the current focus being on infants and children from birth through 2 years of age.2 Standardised developmental tests are widely used and undergo extensive testing for validity, reliability and accuracy and are standardised using children and families who represent the cultural, linguistic and economic diversity of the intended population. Good developmental screening tests have sensitivities and specificities of 70–80%3; however, these tests have inherent limitations. High false-positive rates increase the cost of screening and are associated with lingering parental anxiety, casting doubt on the viability of universal developmental screening efforts.4–6
While there is no ideal assessment of development,7,8 the Bayley Scales of Infant Development (BSID)9 and its revisions10,11 are the most widely reported measures. The second version has been extensively used in assessing the developmental outcome of children,12,13 especially in the preterm cohort.14–17 In 2006, the Bayley Scales of Infant and Toddler Development (version-III) (Bayley III) was published18 as a revised and updated version of the BSID-II. It was designed for use with children from 16 days to 42 months 15 days of age. The revision has resulted in improvements in administration and a simplified scoring system. The normative data were also updated and address performance on five distinct scales, compared with the three of the previous version.11
The Bayley III version has, however, been standardised on an American paediatric population.11 A few recent papers have questioned the validity of this test in the Australian context suggesting that this version may be overestimating development and as such, underestimating developmental delay.19–21 We have previously reported a significant difference between our 1-year control outcomes and the normative means of the Bayley III.20 As it is preferable to assess and follow any child up until at least 3 years of age, it is important to assess whether these differences remained at 3 years.
The aim of this study was to compare the performance of a cohort of healthy 3-year-old Australian children with the standardised normative means of the revised Bayley Scales of Infant and Toddler Development (version-III).
Healthy newborn infants were enrolled from 1 August 2006 to 31 July 2008 as the control group in a prospective population-based cohort study comparing the neurodevelopmental outcome of infants in New South Wales (NSW) who had undergone early major surgery with that of control infants.22 Infants with a known chromosomal anomaly were excluded from the control group. Healthy term infants were selected from the maternity units co-located with the children's hospitals in NSW using a previously established method by Draper et al,23 which ensured random selection.
Infants were assessed at 3 years of age using the current version of The Bayley Scales of Infant and Toddler Development (version-III, BSID-III).11 This assessment consists of five scales: Cognition, Receptive Language, Expressive Language, Fine Motor and Gross Motor. The previous version has been used widely in Australia and in our preliminary work.18 The composite scores of the BSID-III combine expressive with receptive language and gross with fine motor. The five subscales were used independently as they identify differences in the language and the motor scales which are not evident with the composite scores. As this is a standardised test, all assessors had completed the training and were experienced in the use of this assessment.
The analysis consisted of comparing the mean score on each scale for the control infants against the standardised norms of the BSID-III assessment using t tests. Developmental test results were compared with published norms, and delay was defined by deviation below the normative mean. This standardised test of infant development is age normed (ie, normative scaled scores are derived depending on the age of the child) to have a mean of 10 and an SD of 3. Mild developmental delay was defined as a score from >−2 SD to −1 SD relative to the mean, moderate delay as >−3 SD to −2 SD and severe delay as ≤−3 SD. Data were analysed using SPSS for Windows, V.19.0 (SPSS Inc, Chicago, Illinois, USA).
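The scoring rules in this paragraph can be sketched in a few lines of Python. This is an illustrative sketch only, not the SPSS analysis used in the study: the function names are hypothetical, the comparison is assumed to be a one-sample t test against the normative mean of 10, and the summary values passed in the usage lines are invented for illustration rather than taken from the cohort.

```python
import math

NORM_MEAN = 10.0  # normative scaled-score mean stated in the text
NORM_SD = 3.0     # normative scaled-score SD stated in the text


def classify_delay(scaled_score, mean=NORM_MEAN, sd=NORM_SD):
    """Map one scaled score to a delay band using the SD cut-offs in the text."""
    z = (scaled_score - mean) / sd
    if z <= -3:
        return "severe"    # at or below -3 SD
    if z <= -2:
        return "moderate"  # above -3 SD, down to and including -2 SD
    if z <= -1:
        return "mild"      # above -2 SD, down to and including -1 SD
    return "none"


def one_sample_t(sample_mean, sample_sd, n, mu0=NORM_MEAN):
    """Return (t statistic, degrees of freedom) for a sample mean versus a fixed norm.

    Assumption: a one-sample t test; the text says only that t tests were used.
    """
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error, n - 1


# Usage with hypothetical inputs (not study data).
print(classify_delay(7))  # mild
print(classify_delay(4))  # moderate
print(classify_delay(1))  # severe
t_stat, df = one_sample_t(sample_mean=11.0, sample_sd=3.0, n=156)
print(round(t_stat, 2), df)
```

With a normative SD of 3, the three cut-offs correspond to scaled scores of 7, 4 and 1, so a small upward shift in the true population mean moves children across these bands.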
The Research and Ethics Committees of the Sydney Children's Hospital Network at the Children's Hospital at Westmead, and Westmead Hospital, Westmead, Australia, approved this study. Written informed consent was obtained directly from parents (figure 1).
Of the total controls enrolled in the study, 280 were assessed at 1 year of age. Of these, 225 were singletons, greater than 36 weeks’ gestation and had no chromosomal anomalies affecting development. At 3 years of age, 168 of the 225 infants assessed at 1 year of age were re-assessed using the Bayley III. Of these, two were excluded having had minor surgery and 10 had incomplete assessments (due to the children refusing to participate), and thus, the results of 156 infants were analysed. The mean age at assessment was 36 months 21 days (SD=29 days). The infants were all term, with a mean gestational age of 39.5 weeks (SD=1.1 weeks) and a mean birth weight of 3553 g (SD=500 g). Fifty-five per cent of the cohort was male.
At 3 years of age, the mean standardised scores were significantly higher than the standardised norms on four of the subscales: cognition (p<0.05), and receptive language, expressive language and fine motor (p<0.001) (see table 1). There was no statistically significant difference in the gross motor subscale (p=0.435). Australian infants scored higher on the cognitive, language and fine motor subscales than the norms published in the Bayley III manual.
In this study of 156 infants, we found that healthy 3-year-old Australian children scored higher on the Bayley III assessments on all subscales excluding gross motor. Standardised scores of the Bayley III on 3-year-old Australian children thus underestimate developmental delay, which is of concern as the test may miss those with a minor delay and not reflect the severity of delay of infants that it does identify. Most of the neonatal units within NSW are using the Bayley III for developmental assessment in hospital-based newborn follow-up clinics.20 As this assessment has not been re-standardised on a local population, contemporary Australian normative data are not available.
Possible explanations for these results include demographic and cultural differences. Higher mean scores in Australian children tested using US norms have been reported on a number of developmental and intelligence tests, and it has been suggested that cultural issues may alter the rate of maturation as well as the performance on individual test items.24,25 The test's publishers suggest that the elevated scores are possibly due to demographic changes.26 Our cohort was randomly selected and comprised infants from Caucasian and Asian backgrounds, with only a few Hispanic and no African–American infants, which reflects the population of NSW (ABS 2009).20
On the cognitive subscale, the mean score for the Australian children was significantly higher than the Bayley III normative mean. Although the Bayley III manual states that the cognitive scores are higher than those of the previous version (Bayley II), other studies have found even higher cognitive scores in children evaluated at 18–24 months of age with the Bayley III.19,21 The reason for this difference is not entirely clear. One possible reason is that Australian children may have more exposure to the types of puzzles and games involved in the Bayley III assessment.
Language and cognitive scores were separated in the revised Bayley III version to minimise the effect of language delay on the cognitive assessment. We found higher mean scores on the expressive and receptive language scales, the greatest difference being in the receptive scale. We have already reported a similar finding at 1 year of age.20 The difference was even more significant at 3 years of age. This may be due to increased utility of the instrument in administration of the language test and standardisation of the data. Another interesting finding is that our Australian children scored significantly higher on the fine motor scale but not on the gross motor scale. The reason for this difference is not clear but might be due to cultural differences in child-rearing practices. The impact of culture and language on test performance has to be considered. The use of norms from disparate populations may obscure a child's level of risk. More specifically, overestimating a child's functional ability in comparison with a regionally inappropriate reference group could result in the child not qualifying for early intervention and follow-up services, the lack of which may negatively influence their neurodevelopmental trajectory.
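The way a reference group can hide risk may be easier to see with a small worked example. The sketch below is illustrative only: the local mean of 11 is invented, since no re-standardised Australian norms exist, and the function names and the −1 SD cut-off applied to a single scaled score are assumptions made for the example.

```python
PUBLISHED_MEAN = 10.0           # normative mean of the scaled scores
HYPOTHETICAL_LOCAL_MEAN = 11.0  # invented value, for illustration only
REFERENCE_SD = 3.0              # normative SD, assumed unchanged locally


def z_score(scaled_score, reference_mean, reference_sd=REFERENCE_SD):
    """Distance of a score from the reference mean, in SD units."""
    return (scaled_score - reference_mean) / reference_sd


def flagged_as_delayed(scaled_score, reference_mean):
    """True when the score is at or below -1 SD of the chosen reference."""
    return z_score(scaled_score, reference_mean) <= -1.0


child_score = 8
print(flagged_as_delayed(child_score, PUBLISHED_MEAN))           # False
print(flagged_as_delayed(child_score, HYPOTHETICAL_LOCAL_MEAN))  # True
```

Under these assumptions the same child is within the normal range against the published norm but falls in the mild delay band against the higher local mean, which is the mechanism by which eligibility for early intervention could be missed.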
Our study is limited by the possibility of selection bias as we only included complete assessments; however, our results have been supported by other Australian studies. We believe that our findings are not due to administrative or scoring bias, as all our assessors were trained and highly experienced in administration of the Bayley III assessment.
Although the Bayley III is widely recognised as an excellent developmental assessment, culture-specific normative data are necessary for clinical and research application. We recommend that consideration ought to be given to urgently re-standardising this assessment on Australian children to determine the reference values for specifically defined age group bands.
Contributors SC contributed to the analysis and interpretation of the data and wrote the manuscript. KW contributed to the design, assessed the infants, contributed to the analysis and interpretation and critically reviewed the paper. RH, NB and AL-F contributed to the analysis and interpretation and critically reviewed the paper. All co-authors have contributed to and reviewed the manuscript and approved the version to be published.
Funding March of Dimes Birth Defects Foundation, Project Grant #12-FYO6-232.
Competing interests None.
Patient consent Obtained.
Ethics approval The Research and Ethics Committees of the Sydney Children's Hospital Network at the Children's Hospital at Westmead, and Westmead Hospital, Westmead, Australia.
Provenance and peer review Not commissioned; externally peer reviewed.