Misclassification due to age grouping in measures of child development
  1. Scott Veldhuizen (1,2,3),
  2. Christine Rodriguez (2),
  3. Terrance J Wade (3),
  4. John Cairney (2,3)
  1. Centre for Addiction and Mental Health, Toronto, Ontario, Canada
  2. Department of Family Medicine & Psychiatry and Behavioural Neurosciences, McMaster University, Hamilton, Ontario, Canada
  3. Department of Community Health Sciences, Brock University, St. Catharines, Ontario, Canada
  1. Correspondence to Scott Veldhuizen, Social and Epidemiological Research Department, Centre for Addiction and Mental Health, 33 Russell St., Suite T308, Toronto, ON, Canada M5S 2S1; scott.veldhuizen{at}


Purpose Screens for developmental delay generally provide a set of norms for different age groups. Development varies continuously with age, however, and applying a single criterion for an age range will inevitably produce misclassifications. In this report, we estimate the resulting error rate for one example: the cognitive subscale of the Bayley Scales of Infant and Toddler Development (BSID-III).

Design Data come from a general population sample of 594 children (305 male) aged 1 month to 42.5 months who received the BSID-III as part of a validation study. We used regression models to estimate the mean and variance of the cognitive subscale as a function of age. We then used these results to generate a dataset of one million simulated participants and compared their status before and after division into age groups. Finally, we applied broader age bands used in two other instruments and explored likely validity limitations when different instruments are compared.
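The simulation step described in the Design can be illustrated with a minimal sketch. It is not the authors' code: the growth curve `mean_score`, the constant `SD`, the cutoff `Z_CUT`, and the rule that each age band applies the cutoff at its midpoint age are all hypothetical assumptions chosen for illustration, not values fitted to the BSID-III data.

```python
import random

# Hypothetical growth curve and spread (NOT the fitted BSID-III model).
def mean_score(age_months):
    return 10.0 + 1.8 * age_months

SD = 5.0       # hypothetical constant standard deviation of scores
Z_CUT = -1.0   # hypothetical cutoff: 1 SD below the age-expected mean


def simulate_misclassification(n, band_width, seed=0, min_age=1.0, max_age=42.5):
    """Compare continuous-age status with status after grouping ages into bands.

    Assumption: every band applies the cutoff that is correct at its midpoint age.
    Returns the share of true cases missed and the share of flagged cases
    that are false positives.
    """
    rng = random.Random(seed)
    true_cases = flagged = both = 0
    for _ in range(n):
        age = rng.uniform(min_age, max_age)
        score = rng.gauss(mean_score(age), SD)

        # Status when age is treated as continuous (the reference standard).
        is_true = score < mean_score(age) + Z_CUT * SD

        # Status when one cutoff is applied to the whole age band.
        band_index = int((age - min_age) // band_width)
        band_mid = min_age + band_width * (band_index + 0.5)
        is_flagged = score < mean_score(band_mid) + Z_CUT * SD

        true_cases += is_true
        flagged += is_flagged
        both += is_true and is_flagged

    return {
        "missed": 1 - both / max(true_cases, 1),
        "false_positive_share": 1 - both / max(flagged, 1),
    }


if __name__ == "__main__":
    for width in (1.0, 3.0, 6.0):
        print(width, simulate_misclassification(100_000, width))
```

Under these assumptions, widening `band_width` increases both error rates, because children far from the band midpoint are judged against a cutoff meant for a different age.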

Results When BSID-III age groups are used, 15% of cases are missed and 15% of apparent cases are false positives. Wider age groups produced error rates from 27% to 46%. Comparison of different age groups suggests that sensitivity in validation studies would be limited, under certain assumptions, to 70% or less.

Implications The use of age groups produces a large number of misclassifications. Although affected children will usually be close to the threshold, this may lead to misreferrals. Results may help to explain the poor measured agreement of development screens. Scoring methods that treat child age as continuous would improve instrument accuracy.
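One way to read "scoring methods that treat child age as continuous" is as an age-adjusted z-score, sketched below. The functions `expected_mean` and `expected_sd` and the cutoff `z_cut` are hypothetical placeholders, not published norms; a real instrument would substitute curves estimated from its own normative sample.

```python
# Hypothetical norm curves (NOT published BSID-III norms).
def expected_mean(age_months):
    return 10.0 + 1.8 * age_months


def expected_sd(age_months):
    return 4.0 + 0.05 * age_months


def age_adjusted_z(score, age_months):
    """Standardize a raw score against the norm for the child's exact age."""
    return (score - expected_mean(age_months)) / expected_sd(age_months)


def below_cutoff(score, age_months, z_cut=-1.0):
    """Flag a child whose age-adjusted z-score falls below the cutoff."""
    return age_adjusted_z(score, age_months) < z_cut


if __name__ == "__main__":
    # The same raw score is judged differently at two nearby ages.
    for age in (18.0, 20.9):
        print(age, round(age_adjusted_z(40.0, age), 2), below_cutoff(40.0, age))
```

Because the cutoff moves smoothly with age, two children with the same raw score are no longer classified identically merely because they fall in the same age band.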

  • Screening
  • Measurement
  • Neurodevelopment
