Agreement between routine and research measurement of infant height and weight
  1. M Bryant1,2,
  2. G Santorelli2,
  3. L Fairley2,
  4. E S Petherick2,
  5. R Bhopal3,
  6. D A Lawlor4,5,
  7. K Tilling4,5,
  8. L D Howe4,5,
  9. D Farrar2,
  10. N Cameron6,
  11. M Mohammed7,
  12. J Wright2,
  13. the Born in Bradford Childhood Obesity Scientific Group
  1. 1Clinical Trials Research Unit, University of Leeds, Leeds, UK
  2. 2Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Trust, Bradford, UK
  3. 3Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
  4. 4MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
  5. 5School of Social and Community Medicine, University of Bristol, Bristol, UK
  6. 6School of Sport, Exercise and Health Sciences, Loughborough University, Loughborough, UK
  7. 7School of Health Studies, University of Bradford, Bradford, UK
  1. Correspondence to Dr Maria Bryant, Clinical Trials Research Unit, University of Leeds, Leeds LS2 9JT, UK; m.j.bryant{at}


In many countries, routine data relating to growth of infants are collected as a means of tracking health and illness up to school age. These have potential to be used in research. For health monitoring and research, data should be accurate and reliable. This study aimed to determine the agreement between length/height and weight measurements from routine infant records and researcher-collected data.

Methods Height/length and weight at ages 6, 12 and 24 months from the longitudinal UK birth cohort (Born in Bradford; n=836–1280) were compared with routine data collected by health visitors within 2 months of the research data (n=104–573 for different comparisons). Data were age adjusted and compared using Bland–Altman plots.

Results There was agreement between data sources, albeit weaker for height than for weight. Routine data tended to underestimate length/height at 6 months (0.5 cm (95% CI −4.0 to 4.9)) and overestimate it at 12 (−0.3 cm (95% CI −0.5 to 4.0)) and 24 months (0.3 cm (95% CI −4.0 to 3.4)). Routine data slightly overestimated weight at all three ages (range −0.04 kg (95% CI −1.2 to 0.9) to −0.04 (95% CI −0.7 to 0.6)). Limits of agreement were wide, particularly for height. Differences were generally random, although routine data tended to underestimate length in taller infants and underestimate weight in lighter infants.

Conclusions Routine data can provide an accurate and feasible method of data collection for research, though wide limits of agreement between data sources may be observed. Differences could be due to methodological issues, but may relate to variability in clinical practice. Continued provision of appropriate training and assessment is essential for health professionals responsible for collecting routine data.

  • Growth
  • Monitoring
  • routine data
  • PCHR
  • research

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


What is already known on this topic

  • Additional to providing a comprehensive health record, the Personal Child Health Record (PCHR) has the potential to be an excellent source of data for use in research settings.

  • Use of routine data for research has many advantages (readily available, large samples, feasible, cost effective), but these data may be less reliable and/or valid.

  • Infant height and weight data collected in the early 1990s in a predominantly Caucasian sample suggest agreement between researcher and PCHR data.

What this study adds

  • Although this study found general agreement between Personal Child Health Record (PCHR) data and those collected by researchers (average of 0.5 cm difference), in many instances, wide limits of agreement were observed, with differences in length up to 5 cm in some children.

  • Differences could be due to methodological issues (eg, collected on different days); however, they may relate to inaccuracies caused by variability in clinical practice.

  • Results indicate that routine data can be an excellent and accurate resource for researchers. However, health visitors need regular training and quality assurance to ensure that height and weight data are collected accurately, both for clinical practice and for increased use by researchers.


Routine data on growth in childhood are widely collected in many countries and used to monitor populations and individuals with respect to health and development.1 ,2 Increasingly, these data are also used in research. Knowing how these data compare with similar measurements collected in more controlled research settings is important for their use in clinical/public health practice and research.

Use of routine data in research has many advantages: data describing varied outcomes are readily available, they enable inclusion of large samples of the population, and they can provide a feasible, cost-effective method to collect information in a way that is more acceptable to patients.1 Use of routine data also offers lower burden to investigators for research conduct and ethical approval. However, there may be drawbacks to using routine data. It may be assumed that they are less reliable and valid than data that have been collected by researchers who have been trained in methods to improve accuracy and reliability and to avoid bias.2 Routine data are not usually collected blindly and may, therefore, be influenced by knowledge of an individual and/or their care. Additionally, there is more likely to be variability in the methods employed, the training and attributes of the administrators of the test, and the equipment used, compared to data collected as part of a research study, where stricter procedures have to be applied for reliability and accuracy. Significantly, the importance and purpose for which data are collected differ substantially between a focused data collection made purely to generate data for research and those collected clinically, as part of routine practice, alongside other competing priorities.1

Similarly to other routine data, Personal Child Health Record (PCHR) measurements are relatively quick and straightforward to take. Data should be robust if appropriate training is provided for those obtaining measures. However, unlike research data, the reliability of the data collected is unlikely to be regularly quality tested.3 ,4 Inaccuracy in measurement impacts clinical decision making and care, and may result in unreliable data. It is therefore important to assess the quality and reliability of routine data. Evaluation of consistency of infant length/height and weight measurements taken by health professionals indicates good reliability.4 However, to our knowledge, there are no contemporary data comparing length/height and weight in PCHR measurements with those collected within a research setting. This study aims to determine the degree of agreement between weight and length/height measurement data in PCHR records and data collected by researchers.



Born in Bradford (BiB) is a longitudinal multiethnic birth cohort study aiming to examine the impact of environmental, psychological and genetic factors on maternal and child health and well-being.5 Bradford is a city in the North of England with high levels of socioeconomic deprivation and ethnic diversity. Women were recruited at the Bradford Royal Infirmary during a routine hospital appointment at 26–28 weeks gestation. For those consenting, a baseline questionnaire was completed. The full BiB cohort recruited 12 453 women comprising 13 776 pregnancies between 2007 and 2010, and the cohort is broadly characteristic of the city's maternal population. A subsample of the BiB cohort (BiB1000), recruited between August 2008 and March 2009, was invited to participate in more detailed follow-up assessments. Of the 1917 women eligible for this substudy, 1735 consented and were included. Of these women, 1707 had a singleton birth and 28 had twin births. A full account of the methods is published elsewhere.5–7

Height and weight measurements

BiB1000 measurements: Weight and length/height were collected by approximately 10 BiB study administrators during home or research clinic visits when the infant was aged 6, 12, 18, 24 and 36 months. Study administrators were trained during a training day, run by expert community researchers, and measurements were assessed for inter-rater reliability.8 All training was conducted in accordance with written guidelines, a training manual was provided for all researchers, and measurements were taken with on-going support.

Measurements were excluded here if they were taken more than 2 months before or after the target age. Weight was measured using Seca baby scales (Harlow Healthcare, London, UK) to the last 0.1 kg; length (to 18 months) was measured to the last 0.5 cm using a standard issue neonatometer (Harlow Healthcare, London, UK); height (at 24 and 36 months) was measured using a Seca Leicester height measure (Harlow Healthcare, South Shields, UK). Both measurements were performed with infant clothes and nappy removed.

Routine data collection (PCHR data): In the UK, data are recorded in a Personal Child Health Record (PCHR or ‘red book’), which is given to parents and is aimed at improving communication between parents and health professionals, enhancing continuity of care and helping parents understand their child's health and development. This method is endorsed by the National Service Framework for Children2 and is a key component of Health for All Children, a guidance document for all professionals who support children and young people's health and development.9 It is recommended that length/height and weight data are collected by health visitors at birth to 28 days, 6–8 weeks, 7–9 months and at 2 years of age.10 For the current study, these data were obtained from Bradford and Airedale Primary Care Trust, and were compared with BiB1000 height and weight measurements within the same age range (±2 months). Where there was more than one PCHR measurement, the one closest in age to the BiB1000 measurement was chosen. Unlike BiB1000 measurements, how regularly data are collected by health visitors depends on voluntary attendance by parents, resulting in greater variability in measurement frequency and infant age. For both the research clinic and routine (PCHR) measurements of weight and height, values were excluded if they were taken more than 2 months before or after the target ages of 6, 12, 18, 24 and 36 months.
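The matching rule described above (choose the routine measurement closest in age to the research measurement, and discard it if the gap exceeds 2 months) can be sketched as follows. This is a minimal illustration only, not the study's actual code; the function name, the use of age in months as the matching variable and the example ages are all assumptions.

```python
from typing import Optional, Sequence

MAX_GAP_MONTHS = 2.0  # matching window described in the text


def closest_routine_age(research_age: float,
                        routine_ages: Sequence[float],
                        max_gap: float = MAX_GAP_MONTHS) -> Optional[float]:
    """Return the routine measurement age closest to the research age.

    Returns None when no routine measurement falls within max_gap months.
    """
    candidates = [a for a in routine_ages if abs(a - research_age) <= max_gap]
    if not candidates:
        return None
    return min(candidates, key=lambda a: abs(a - research_age))


# Hypothetical ages in months, for illustration only.
print(closest_routine_age(6.2, [1.5, 5.4, 7.9]))   # 5.4
print(closest_routine_age(12.1, [7.0, 8.5]))       # None
```

Because routine visits are recommended at 7–9 months rather than at 12 months, a rule of this kind would be expected to leave many research measurements without a match, in line with the smaller samples reported for some comparisons.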


All data were converted to age-adjusted and sex-adjusted z-scores relative to the WHO 2006 growth standard.11 Mean differences (with SDs) in z-scores were calculated by subtracting PCHR measurements from BiB1000 measurements. A mean z-score difference of zero indicates agreement between BiB1000 and PCHR measurements, whereas positive values reflect a lower PCHR measurement and negative values a higher PCHR measurement. Agreement between PCHR and BiB1000 measurements was assessed graphically using Bland–Altman plots12 to plot the difference between BiB1000 and PCHR z-scores against their mean, with lines indicating the mean difference and the 95% limits of agreement (mean difference ±2 SD of the difference). We also calculated the mean difference in the age at which the BiB1000 research and PCHR measurements were undertaken, together with 95% limits of agreement, so that we could determine whether any variation with age in the differences between the two sources was due to larger age differences between data sources at specific assessments. To account for such differences, predicted length/height and weight at the target questionnaire age (ie, normalised for differences in the age when PCHR and BiB1000 data were collected) are presented as mean difference and 95% limits of agreement (mean difference ±2 SD of the difference). We examined correlations between the differences in routine and researcher-collected data for height and the differences observed for weight, to indicate whether there was any systematic error in measurements.
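The agreement statistics described above can be illustrated with a short sketch. It follows the sign convention in the text (research minus routine, so positive values mean a lower routine measurement) and the stated limits of mean difference ±2 SD. The function names and example values are hypothetical, and the conversion to WHO z-scores is assumed to have been done beforehand.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def limits_of_agreement(research: Sequence[float],
                        routine: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean difference, lower limit, upper limit).

    Differences are research minus routine, so a positive mean difference
    means the routine value is lower on average.
    """
    if len(research) != len(routine) or len(research) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    diffs = [r - p for r, p in zip(research, routine)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample SD of the paired differences
    return d_mean, d_mean - 2 * d_sd, d_mean + 2 * d_sd


def bland_altman_points(research: Sequence[float],
                        routine: Sequence[float]) -> List[Tuple[float, float]]:
    """(pair mean, pair difference) coordinates for a Bland-Altman plot."""
    return [((r + p) / 2, r - p) for r, p in zip(research, routine)]


# Hypothetical z-scores, for illustration only.
research_z = [0.2, -0.5, 1.1, 0.0]
routine_z = [0.0, -0.4, 0.8, 0.1]
print(limits_of_agreement(research_z, routine_z))
```

Plotting the points from `bland_altman_points` with horizontal lines at the three returned values gives the kind of display summarised in figure 2.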

Exploratory multivariable linear regressions were performed to explore whether the following factors were potentially predictive of differences between PCHR and BiB1000 measurements: ethnicity (self-assigned by the mother using the same classification as the 2001 UK census13); maternal education; infant gender; preterm delivery; and low birth weight. Ethnicity and education were self-reported at baseline (28 weeks gestation), and other data were obtained from maternity records. As differences in measurements were modelled, positive coefficients from the multivariable regressions indicate a lower PCHR measurement compared to the BiB1000 measurement, and negative coefficients indicate a higher PCHR measurement. We used an approach which could model scale and shape parameters (Generalized Additive Models for Location Scale and Shape) as implemented in the GAMLSS library in R.14 We examined the following mother and baby-related covariates: mother’s ethnicity, mother’s education, infant’s gender, z-score for weight, z-score for height, gestation (<37 weeks or ≥37 weeks) and low birth weight. We allowed the SD parameter of each GAMLSS model to vary with the differences in z-scores for weight plus z-scores for height.
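To make the sign convention for coefficients concrete, the sketch below fits a single-predictor ordinary least-squares line of the paired difference (research minus routine) on the pair mean. This is a deliberately simplified stand-in: the study used multivariable models and GAMLSS in R, which are not reproduced here, and the function name and example values are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def ols_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (intercept, slope) for y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical pair means (cm) and differences (research minus routine, cm).
pair_mean_cm = [64.0, 66.0, 68.0, 70.0]
difference_cm = [-0.5, 0.2, 0.9, 1.8]
intercept, slope = ols_line(pair_mean_cm, difference_cm)
# A positive slope means the routine value falls further below the research
# value as the infant's mean length increases.
print(slope > 0)
```

A slope near zero would instead suggest that the differences do not depend on the size of the infant.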

Analyses were performed in Stata/IC V.12.1 (StataCorp, College Station, Texas, USA) and GAMLSS in R, V.4.2.8 (R Development Core Team, 2013, London).


Figure 1 shows the study sample at each target age. The numbers of children with both BiB1000 and PCHR measurements of height and weight, respectively, included at each assessment were: 6 months (n=158 and 560), 12 months (n=101 and 166), 18 months (n=7 and 23), 24 months (n=307 and 434) and 36 months (n=33 and 56). Measurements at 18 and 36 months were excluded from further analyses because of the small sample sizes, which resulted from a lack of routine data available within 2 months of researcher-collected data. Table 1 shows the mean (SD) age, height/length and weight at each assessment. Children tended to be younger at the 6 month BiB1000 assessment and older at the 12 and 24 month assessments compared to PCHR measurements. This table also presents the mean differences and 95% limits of agreement for differences in the age at which the BiB1000 and PCHR measurements were conducted. Mean differences were less than 1 month at all ages.

Table 1

Mean age, height and weight of BiB1000 and PCHR measurements

Figure 1

Study sample flow chart.

There was agreement between data collected from the BiB1000 team and those collected for the PCHR, though this was weaker for height than for weight (table 2; figure 2). Height was somewhat underestimated in routine compared with research data at 6 months, and overestimated at 12 and 24 months. Weight was slightly overestimated in routine compared with research data at all three ages. Limits of agreement were wide, particularly for height (table 2; figure 2). Correlations between differences in routine and researcher-collected z-score data for height and weight measurements (ie, between differences in z-scores for height and differences in z-scores for weight) indicated that measurement error was non-systematic, with r values of 0.07 at 6 months, 0.19 at 12 months and 0.07 at 24 months. This suggested that differences observed between the two sources of data for height were independent of those seen for weight.
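The correlation check reported above can be sketched as below, assuming a Pearson correlation coefficient (the text reports r values but does not name the coefficient). The function name and example values are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("one sequence has no variation")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-child differences in z-scores (research minus routine).
height_diff_z = [0.3, -0.1, 0.5, 0.0, -0.4]
weight_diff_z = [0.0, 0.1, -0.1, 0.2, 0.0]
print(pearson_r(height_diff_z, weight_diff_z))
```

A value close to zero, as reported at 6 and 24 months, is what would be expected if an error in a child's height measurement says little about the error in the same child's weight measurement.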

Table 2

Mean (SD) differences for z-scores of height and weight, and mean differences with 95% limits of agreement for predicted height (cm) and weight (kg) between BiB1000 and PCHR data

Figure 2

Agreement between PCHR and BiB1000.

Multivariable analysis to explore whether known characteristics (eg, ethnicity, maternal education, infant gender, gestation, birth weight and height/length and weight) impacted on measurement error showed evidence of systematic bias in relation to mean measurements for length/height (see online supplementary table S1): routine PCHR data underestimated the infant's length by 0.54 cm (95% CI 0.18 to 0.89) more for every additional 1 cm mean height of an infant at 6 months (ie, PCHR underestimated length more in taller children). The overestimation of height at 12 and 24 months was random with respect to mean height. PCHR height data were significantly lower in Pakistani infants at 6 months compared to Caucasian infants (0.83 cm (95% CI 0.09 to 1.57), p=0.03). Multivariable analyses also showed an impact of gestation on the agreement in weight data at 6 and 12 months, with higher PCHR weight in preterm infants (gestation <37 weeks) at the 6-month measurement (0.82 kg greater (95% CI −1.01 to −0.64), p<0.0001) compared to gestation ≥37 weeks, but lower PCHR weight in preterm infants at 12 months (0.64 kg (95% CI 0.36 to 0.93), p<0.001). At 12 months, PCHR weight data were greater in the infants of mothers with A level education by 0.18 kg (95% CI −0.31 to −0.04, p=0.014) compared to infants of mothers with ≥5 GCSEs. Other differences in weight measurement appeared random and were not associated with any other characteristics (see online supplementary table S1).

When we repeated the analyses while only including infants with a routine data measurement within 1 month of the research sample, there did not appear to be strong evidence that the agreement was better compared to the main analyses in which we allowed a 2-month difference, though the sample sizes were small (N=38–170 for different analyses). At 6 months, the mean z-score differences were 0.27 for height and −0.02 for weight (n=49 and 296); at 12 months they were −0.19 for height and −0.12 for weight (n=38 and 87); and at 24 months they were −0.14 for height and −0.08 for weight (n=170 and 264) (results not shown).


Bland–Altman plots from the current study and those of Howe et al15 suggest that routine data collection of height and weight shows little evidence of bias when compared to research data. However, both demonstrated relatively wide limits of agreement around the estimate. So, while population averages appear accurate, there were substantial differences for some individuals, with predicted differences between PCHR and research data ranging from −0.4 to 4.91 cm for height and from −1.19 to 0.98 kg for weight. This may be related to measurement error, but may also be attributable to differences in the dates on which assessments were made. The current study only included PCHR data that were collected within 2 months of the research-collected data (with mean differences all less than 1 month), similar to the methodology applied previously by Howe et al.15 Using this approach, data were normalised for differences in the actual date that the data were collected; however, normalisation relies on the assumption that infants track uniformly along a growth curve at a constant rate. In reality, individual variation in growth means that many infants do not track uniformly along growth chart trajectories. We conducted an additional sensitivity analysis to explore this bias, including only data collected within 1 month of each other. Findings suggest similar agreement in mean values but wider limits of agreement; however, the sample size was substantially reduced.

In the BiB1000 protocol, there was a requirement for repeated measures (taken on the same day) to be within 0.5 cm and 0.1 kg of each other. Guidance from the International Fetal and Newborn Growth consortium (INTERGROWTH-21st)16 suggests that the maximum allowed difference in repeated measures should be 0.7 cm for length and 0.5 kg for weight. Fifty per cent of the BiB1000 data were outside both these parameters. However, these protocols/guidelines refer to measurements conducted on the same child on the same day and are, therefore, not as relevant to the present study, which compares measurements taken by different people that are unlikely to have taken place on the same day.
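As a rough illustration of how paired measurements could be screened against tolerances such as those quoted above (0.5 cm and 0.7 cm for length), the sketch below computes the share of pairs whose absolute difference exceeds a threshold. The function name and data are hypothetical, and the study's own calculation may have differed.

```python
from typing import Iterable, Tuple

EPS = 1e-9  # guards against floating-point noise at the boundary


def share_outside(pairs: Iterable[Tuple[float, float]], tolerance: float) -> float:
    """Fraction of (first, second) pairs that differ by more than tolerance."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs supplied")
    outside = sum(1 for a, b in pairs if abs(a - b) > tolerance + EPS)
    return outside / len(pairs)


# Hypothetical paired lengths in cm (research, routine), for illustration only.
length_pairs = [(65.0, 65.5), (70.0, 71.0), (68.0, 68.0), (72.5, 71.5)]
print(share_outside(length_pairs, 0.5))  # protocol threshold for length
print(share_outside(length_pairs, 0.7))  # guidance threshold quoted in the text
```

As the paragraph notes, thresholds designed for same-day repeat measurements are a strict benchmark for measurements taken up to 2 months apart, so a high share outside tolerance is not by itself evidence of poor measurement practice.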

In this population, routinely collected health visitor data tended to underestimate height for measurements taken when infants were aged 6 months, with agreement improving with age. Poorer agreement for length data collected on younger infants is likely to reflect a number of factors. Collection of these data is more difficult in a very young infant, who has to be kept still and measured while stretched out. Growth velocity is greatest at this time and, therefore, might cause real differences even when the two measurements being compared are close in time. In addition, the mean difference in age was greatest at this age in absolute terms and, in relative terms, a difference of, for example, 1 month at 6 months of age is likely to have a greater impact on size differences than a difference of 1 month at 24 months of age. It is also likely that the motivations of clinicians and researchers differ: during routine measurement, clinicians are more likely to focus on identifying outliers of height and weight rather than accurate clinical assessment of all children, particularly in instances when infants and, consequently, parents are distressed during the taking of measurements. Other characteristics that appeared to impact on the agreement between weight measures at 6 and 12 months included education, preterm birth and birth weight. All these factors are related to birth size and growth, but it is unclear why they impacted on weight measurement. They may be related to healthcare provision/interaction, but it is not possible to test this with these BiB data.

Agreement of length/height and weight measurements in PCHRs has been previously assessed by comparing PCHR data to data collected by researchers as part of the ALSPAC cohort study in the UK, using data that were collected almost 20 years ago. In comparison with the BiB data used here, in which the sample is multiethnic (predominantly of Pakistani or White British origin), the ALSPAC cohort is a predominantly White British population.15 Similar patterns in the differences between height and weight z-scores over time were observed, compared to the present study, although mean differences were higher in data presented by Howe et al15 with the exception of height at 24 months. Similar to the present study, Howe et al observed greatest disagreements in height at younger ages. Unlike Howe et al, the current study was also able to examine any potential influence of ethnicity, and showed a greater likelihood of underestimation of length in Pakistani children at 6 months.

Routine data collected for the PCHR are essential in clinical practice, for monitoring, to help provide continuity of care, and as a means to improve communication with parents.15 Results of the current study indicate that there is also scope to use routine data in research, either as separate or linked data. This will enable further investigation of the influence of child growth on the health and well-being of children in larger samples. This is clearly demonstrated in the existing use of the PCHRs, which have already been successfully used in studies examining infant growth, growth trajectories,17 and risk of obesity4 ,18 in BiB and in other birth cohorts (Howe et al18), which have the potential to strengthen our understanding of early life growth and how it links to health outcomes. Importantly, routine data have the capacity to complement research data through data linkage processes.19 However, the wide limits of agreement identified here also suggest that researchers should exert a degree of caution. Poor agreement in individual cases may be clinically important and may or may not relate to disparities in the timing of data collection, variability in clinical practice or inaccuracy in measurement. Though we would advocate use of these data for research purposes, such considerations should not be overlooked in the interpretation of findings. Some of the routine data analysed in the current study were taken by health workers who had received training as part of a reliability study4 (though this could have been up to 3 years prior to data collection at age 24 months). Thus, agreement may be better than in other cities where regular training has not occurred. Those intending to use routine data for research purposes are therefore encouraged to ensure that staff are trained and, if using data collected retrospectively, to investigate whether training and/or validation has occurred (with caution applied if necessary).


We are grateful to all the families who took part in this study, to the midwives for their help in recruiting them, the paediatricians and health visitors, and to the Born in Bradford team which included interviewers, data managers, laboratory staff, clerical workers, research scientists, volunteers and managers.


  • Collaborators The Born in Bradford Childhood Obesity Scientific Group; Amanda Farrin, Helen Ball, Carolyn Summerbell, Sally Barber, Andrew Hill, Neil Small, Pauline Raynor and Rosie McEachan.

  • Contributors MB led the design, conduct and writing of the study and manuscript. GS and MM wrote the statistical analysis plan, and cleaned and analysed the data. LF, KT, ESP, LDH, MM and DAL all provided expertise in the study design and analysis plan and contributed towards interpretation of the findings. NC, DF, MM and JW also contributed to the interpretation of findings, providing additional clinical insight. All authors reviewed and revised the manuscript. All members of the Born in Bradford Childhood Obesity Scientific Group designed and managed the cohort study from which the data were derived and provided direct expertise to the submitted study.

  • Funding This work was funded by an NIHR CLAHRC implementation grant and an NIHR applied programme grant (RP-PG-0407-10044). This paper presents independent research commissioned by the National Institute for Health Research (NIHR) under the CLAHRC programme. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health. No funding bodies had any role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

  • Competing interests All authors had some financial support from an NIHR CLAHRC implementation grant and/or an NIHR applied programme grant (RP-PG-0407-10044) for the submitted work, but have had no financial relationships with any organisations that might have an interest in the submitted work in the previous 3 years, and no other relationships or activities that could appear to have influenced the submitted work. LDH is funded by a UK Medical Research Council fellowship (G1002375). DAL, KT and LDH work in a unit that receives core funding from the UK Medical Research Council and the University of Bristol (MC_UU_12013/9).

  • Ethics approval Bradford Research Ethics Committee (Ref 07/H1302/112).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Authors have provided a non-identifiable dataset linked to the current analysis, which will be uploaded as a web based file. For further information on sharing of Born in Bradford data, see information provided at
