
Inter-rater reliability in the Paediatric Observation Priority Score (POPS)
  1. Lisa Langton1,
  2. Adam Bonfield1,
  3. Damian Roland1,2
  1. 1 Paediatric Emergency Medicine Leicester Academic (PEMLA) Group, University Hospitals of Leicester NHS Trust, Leicester, UK
  2. 2 SAPPHIRE Group, Health Services, University of Leicester, Leicester, UK
  1. Correspondence to Dr Adam Bonfield, Children’s Emergency Department, University Hospitals of Leicester NHS Trust, Leicester LE1 7RH, UK; ab798{at}


Objective The primary objective of this study was to determine the level of inter-rater reliability between nursing staff for the Paediatric Observation Priority Score (POPS).

Design Retrospective observational study.

Setting Single-centre paediatric emergency department.

Participants 12 participants from a convenience sample of 21 nursing staff.

Interventions Participants were shown video footage of three pre-recorded paediatric assessments and asked to record their own POPS for each child. The participants were blinded to the original, in-person POPS. Further data were gathered in the form of a questionnaire to determine the level of training and experience the candidate had using the POPS score prior to undertaking this study.

Main outcome measures Inter-rater reliability among participants’ scoring of the POPS.

Results Overall kappa value for case 1 was 0.74 (95% CI 0.605 to 0.865), case 2 was 1 (perfect agreement) and case 3 was 0.66 (95% CI 0.58 to 0.744).

Conclusion This study suggests there is good inter-rater reliability between different nurses’ use of POPS in assessing sick children in the emergency department.

  • nursing
  • inter-rater reliability
  • early warning score
  • emergency severity index


What is already known on this topic?

  • Inter-rater reliability has previously been shown to be good for both adult and paediatric triage systems.

  • The Paediatric Observation Priority Score, an assessment system as opposed to a formal triage process, has been internally and externally validated in emergency care settings.

  • There are limited studies evaluating the reliability of assessment systems specifically designed for paediatric emergency care.

What this study adds?

  • The inter-rater reliability of the POPS is acceptable in children with abnormal physiological parameters.

  • The use of pre-recorded video of patient scenarios is an effective tool for assessing educational and improvement initiatives.


Triage scores are traditionally the method used in emergency departments (EDs) to risk assess and manage the flow of patients entering the system. These are designed to identify sick patients and allocate a priority sequence. Triage categories do not absolutely quantify the severity of illness or accurately predict the length of stay or eventual outcome.1

The Paediatric Observation Priority Score (POPS) is a risk stratification tool designed to aid assessment of children presenting to the emergency and urgent care setting. POPS balances the need to identify the most unwell patients rapidly while avoiding poor sensitivity, which results in low thresholds for hospital admission, poor resource management and patient and family care.2 It also facilitates discharge of patients with mild symptoms to primary care.3

The POPS (figure 1) incorporates physiological parameters, such as heart rate, as well as more subjective observations, for example, level of concern. Its derivation has previously been described4 and it has undergone preliminary external validation.5 Physiological parameters within the normal range score a 0. Variation outside the norm scores 1 or 2 depending on degree of derangement. This gives a total score of 0 to 16.
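The scoring rule described above can be sketched in Python. This is a minimal illustration rather than the authors' implementation: the parameter names are assumptions inferred from the eight parameters discussed later in this article (the matrix in figure 1 is not reproduced here), and the sub-scores in the example are made up.

```python
# Hypothetical parameter names, inferred from the parameters discussed in the text.
POPS_PARAMETERS = (
    "oxygen_saturation", "work_of_breathing", "avpu", "gut_feeling",
    "other", "pulse", "respiratory_rate", "temperature",
)


def total_pops(sub_scores):
    """Sum per-parameter sub-scores (each 0, 1 or 2) into a total of 0 to 16."""
    missing = set(POPS_PARAMETERS) - set(sub_scores)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    for name in POPS_PARAMETERS:
        if sub_scores[name] not in (0, 1, 2):
            raise ValueError(f"{name} must score 0, 1 or 2")
    return sum(sub_scores[name] for name in POPS_PARAMETERS)


# Illustrative sub-scores only; not taken from the study data.
example = dict.fromkeys(POPS_PARAMETERS, 0)
example.update(oxygen_saturation=2, other=2, pulse=1)
print(total_pops(example))  # 5
```

Eight parameters, each capped at 2, give the stated maximum of 16.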

Figure 1

Paediatric Observation Priority Score (POPS) matrix. RR, respiratory rate; PMH, past medical history.

POPS was designed to reduce cognitive load and aid decision-making.2 There was a specific intent at its inception not to link POPS scores with actions, as opposed to a Paediatric Early Warning System, which is designed for this purpose. Given the wide range of outcomes in emergency and urgent care, and that the majority of children are likely to go home, a prescriptive approach was felt to be unhelpful. The only local guidance we have recommended is that for very high scores (8+; less than 1% of attendances6), the child should be considered for the emergency room/resuscitation area (if they are not already there).

POPS was specifically designed for use in the emergency and urgent care setting by registered nurses, regardless of their level of clinical and/or paediatric experience. An initial training package involving group teaching and competency packs familiarises staff with the use of POPS.

The primary objective of this study is to establish if variation exists between different nurses’ use of the POPS assessment tool.


Video footage of three pre-recorded paediatric assessments was shown to the participants, who were asked to record the POPS of each child. These videos had previously been piloted on experienced nursing staff not involved in the study to determine face validity of the recordings.

Blinding was ensured between all the participants and the in-person POPS. Participants recorded data on standard documentation with access to the POPS matrix, as used in normal clinical practice. Further data were gathered in the form of a questionnaire to determine the level of training and experience the candidate previously had using the POPS. No extra training was given to the candidates prior to this study.

Once data had been collected, these were analysed manually using Microsoft Excel 2007. The data sets were checked for completeness and errors. The Fleiss kappa statistic was used for the calculation of inter-rater reliability. This calculates the proportion of agreement for each parameter, in addition to giving an overall kappa value between participants case by case. The values were cross-referenced using the online calculator Recal 3.7
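For readers unfamiliar with the statistic, the standard Fleiss kappa formula can be sketched as follows. This is an illustrative implementation of the textbook calculation, not the authors' spreadsheet; the function name and the example ratings are assumptions and are not study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of ratings")

    # Per-item agreement: share of rater pairs that agree on the item.
    p_items = [
        sum(c * (c - 1) for c in row) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall share of ratings in each category.
    n_categories = len(counts[0])
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        # All ratings fall in one category: the formula gives 0/0.
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


# Made-up example: 4 items, 12 raters, score categories 0, 1 and 2.
ratings = [
    [12, 0, 0],
    [0, 12, 0],
    [0, 1, 11],
    [4, 2, 6],
]
print(round(fleiss_kappa(ratings), 3))
```

When every rating falls in a single category, the formula is undefined; this sketch returns NaN in that situation, and other tools may adopt a different convention.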

Ethics approval was sought from and granted by the DeMontfort University Ethics Committee. Written consent for video recording of patients was sought prior to their POPS being measured. Participants were recruited voluntarily with information leaflets stating that competency was not being assessed in the research.


A convenience sample of 21 paediatric ED nursing staff was invited to participate in this study. Of these, 12 nurses consented to the study, each completing a POPS for three cases, giving a total of 36 data sets.

Participants included a variety of nursing experience (UK equivalent band 5 to band 7), with an IQR of 8 years 8 months qualified (median 9 years 11 months).

This sample is representative of the staff profile across the department. Table 1 gives a profile of the participants.

Table 1

Profile of participants

There were three video cases for the participants to score. Case 1 is described below. The background and data for the other two cases can be found in the online supplementary appendix.

Supplementary file 1

Table 2 gives an overview of the participants’ overall POPS for each video and a comparison with the POPS given to the child on admission. The child subject for case 1 is an 8-week-old girl. She was brought into the ED by her mother with increased work of breathing. She had a medical history of congenital heart disease. The paediatric advanced nurse practitioner (PANP) assessing her recorded the following observations:

Table 2

Overview of video cases and the participants’ overall Paediatric Observation Priority Score (POPS)

Oxygen saturations 70% (her normal is 70%–85%);

Work of breathing, mild;


Level of concern, low;

Medical history of congenital heart disease;

Pulse 168;

Respiratory rate 55;

Temperature 36.5;

Table 3 shows the data sets of the POPS given by the participants for case 1. The data sets for cases 2 and 3 can be found in the online supplementary appendix. The proportion of agreement between the 12 respondents is shown at the bottom of each observational score. The overall kappa value for case 1 was 0.74 (95% CI 0.605 to 0.865).

Table 3

Data set of Paediatric Observation Priority Score for case 1

Data shown in the online supplementary appendix demonstrate that the overall kappa value for case 2 was 1, representing perfect agreement, and the overall kappa value for case 3 was 0.66 (95% CI 0.58 to 0.744).


Inter-rater reliability

It has been suggested that a kappa value in the range of 0.6 to 0.8 demonstrates good inter-rater reliability.8 Therefore, analysis of the results of this study indicates inter-rater reliability between nurses for the POPS score to be ‘good’. Cases 1 and 3 featured unwell children, with abnormal physiological parameters. Kappa values for these cases were 0.74 and 0.66, respectively. Case 2 featured a child with observations within normal limits and clinical appearance of a well child. Perfect agreement was reached in this case with a kappa value of 1.

Learning from the individual cases

Case 1

Total POPS ranged from 6 to 10. One participant scored a 6 (observer 2) and another scored a 10 (observer 3). On analysis of these individual data sets, the scoring appears acceptable within the context of the cases and the other scores; therefore, these are not believed to be outlying data.

Oxygen saturations achieved the lowest proportion of agreement with a value of 0.333. A feature in case 1 is that the normal range of oxygen saturation levels for this child is 70%–85% due to a background of congenital heart disease. This is outside the normal range defined by the POPS matrix. Oxygen saturations varied throughout the video, resulting in documented values in the data sets ranging from 68% to 73%. There are differences between participants’ response to this. Four nurses gave a score of 0 indicating observation is within normal range, two staff gave a score of 1 indicating a mild deviation from normal, and six staff gave a score of 2 indicating significant derangement. It is important to note that, regardless of a background baseline oxygen saturation, the POPS score should be implemented based on the observations recorded. This demonstrates an area that could be targeted in educating staff members on complex patients presenting to the ED.
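The reported value of 0.333 is consistent with the per-item agreement term of the Fleiss calculation, that is, the share of rater pairs who gave the same score. The sketch below assumes this is how the proportion of agreement was computed (the article does not state the formula); the function name is hypothetical.

```python
from math import comb


def proportion_of_agreement(category_counts):
    """Share of rater pairs giving the same score for one parameter."""
    n_raters = sum(category_counts)
    agreeing_pairs = sum(comb(c, 2) for c in category_counts)
    return agreeing_pairs / comb(n_raters, 2)


# Case 1 oxygen saturations: four nurses scored 0, two scored 1, six scored 2.
print(round(proportion_of_agreement([4, 2, 6]), 3))  # 0.333
```

Of the 66 possible pairs among 12 raters, 6 + 1 + 15 = 22 agree, giving 22/66 = 0.333. The same calculation for a parameter on which 11 of 12 raters agree gives 55/66 = 0.833.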

The observational parameter of gut feeling also demonstrated a lower level of agreement, with a value of 0.591. Defined parameters for scoring gut feeling are not stated within the POPS tool. This was an active decision by the tool’s designers to ensure that staff would likely over-score, that is, junior nursing and medical staff would likely score 1 or 2 when a more experienced staff member may score 0. This would ensure patients remained safe and also alert senior staff to the relative competencies of those undertaking the observations.

All other parameters score highly on observer agreement. Work of breathing and respiratory rate have a proportion of agreement of 0.833, each with only 1 of the 12 participants scoring differently. However, previous work using a similar methodology to assess difficulty in breathing has demonstrated poor reliability between clinicians.9 The small sample size is a potential limitation in drawing conclusions from our result.

Perfect agreement was demonstrated in AVPU, other factors (cardiac history was recognised and scored a 2 in all cases), pulse and temperature. It is expected that perfect agreement would be seen in the temperature parameter because this was given to participants in the film.

Case 2

The PANP recorded all parameters within normal limits, giving an overall POPS of 0. All of the participants also recorded the POPS as 0, demonstrating complete agreement across the cohort. This result indicates that when a child is physiologically well, the POPS will identify the child as well, regardless of the assessor’s experience. This is important from a triaging perspective as identifying well children early should correlate with shorter duration of admission and a reduction in resource use, ultimately increasing efficiency. This would be a useful area of future research to confirm whether this conclusion can be drawn.

Case 3

The total POPS for case 3 ranges from 3 to 5. The range is close, with no obvious outlying values. This case presented a child with an oxygen saturation of 90%. The POPS matrix states that >95% should score a 0, 90–94% a 1 and <90% a 2. Two observers made an error, scoring an oxygen saturation level of 90% as a 2. Observer 2 recorded oxygen saturations as 96%, giving a corresponding parameter score of 0, even though the saturations never displayed this value. It is difficult to state whether this is participant error or a limitation of the video methodology.
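The oxygen saturation bands quoted above can be written as a small lookup, which makes the expected score for a reading of 90% explicit. This is a sketch only: the function name is hypothetical, and the treatment of a reading of exactly 95% is an assumption, because the bands as quoted do not cover it.

```python
def oxygen_saturation_score(spo2_percent):
    """Score an oxygen saturation reading using the bands quoted in the text."""
    if spo2_percent < 90:
        return 2
    if spo2_percent < 95:
        return 1
    # Assumption: exactly 95% scores 0; the quoted bands (>95, 90-94, <90)
    # do not say how a reading of 95% is scored.
    return 0


print(oxygen_saturation_score(90))  # 1
print(oxygen_saturation_score(96))  # 0
print(oxygen_saturation_score(89))  # 2
```

Under these bands a reading of 90% scores 1, so a score of 2 for that reading is an error, while the recorded value of 96% would score 0.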

The lowest proportion of agreement is in gut feeling, with a value of 0.561. All other parameters scored highly on observer agreement. ‘Oxygen Saturations’ and ‘Other’ produced a proportion of agreement of 0.651 and 0.682, respectively. The parameters AVPU, respiratory rate and pulse show near complete agreement, all with values of 0.833. Work of breathing and temperature have perfect agreement. This is expected in the temperature recording, as this observation is given to participants in the film. However, work of breathing assessment is a subjective observation. The good level of agreement shown between nurses’ scoring of work of breathing in all cases is encouraging, although, as stated previously, the small sample size limits the ability to draw conclusions regarding this.

Comparison with the literature

With reference to inter-observer studies looking at other types of Emergency Severity Index (ESI), this study demonstrates that the level of inter-rater reliability with the use of POPS is comparable. Table 4 presents a summary of the findings from studies identified in the literature.

Table 4

Summary of literature pertaining to Emergency Severity Index (ESI) scores

Inter-rater reliability in the use of POPS lies within the same range as in studies such as Jordi et al10 and is higher than in studies such as Allen et al.11 An absolute value at which inter-rater reliability confers a gold standard reliability does not exist; therefore, favourable comparison with other studies is an encouraging finding.

Previous research into ESI inter-rater reliability has used written scenarios10–16 or direct patient observation with simultaneous12 15 or consecutive17 assessments performed by the nurse and the researcher.

The justification for selecting video footage as a data collection instrument is to overcome the flaws identified in previous research. For example, it has been questioned how well subjective elements can be captured in written scenarios,18 and clinical parameters may change between assessments in direct observation.19 Video footage, by contrast, can be accessed by the researcher and staff at their convenience, without the requirement to intervene in the running of the ED or in the patient journey. Multiple cases can be viewed by numerous staff, enhancing the quantity of data collected. These aspects convey similar advantages to the written scenario but also give the participant the experience of patient factors, such as pain, emotion and distress, which are recognised to influence clinical decision making.18 Furthermore, the condition of the patient remains unaffected by the passage of time, overcoming the problem that arises with successive assessments in direct patient observation.


The validity of the findings of this study is limited by the small sample size. This was determined by access to a maximum of 21 participants. The small scale of the study restricts the ability to compare these findings with other larger studies within the literature. However, in the absence of definitive measures of inter-rater reliability, comparison with other studies is used as a frame of reference.

It has been suggested that staff from dedicated paediatric EDs had greater accuracy in triage practice compared with other generic facilities.12 As this study was conducted exclusively in a paediatric department, generalisability of the findings is limited to the specific population studied, and possibly similar departments with an equivalent staff and patient profile.

Video footage has not previously been used for inter-observer studies in this context; thus, the reliability of this approach has not yet been established. However, it has been demonstrated that actual video quality may be a less important part of decision-making than perceived quality20; therefore, the study is likely to be replicable even if the video display presented to participants is not identical. Practically, the skill and dexterity of the participant in using equipment to measure pulse, oxygen saturation levels and temperature are not tested. This is likely to lead to a higher inter-rater reliability when using video recordings compared with participants performing the physiological measurements themselves. Finally, a higher inter-rater reliability may also be seen due to the videos being observed away from the clinical area, without the distractions and competing demands of a busy ED.


This study has produced evidence of good inter-rater reliability between different nurses’ use of POPS in assessing potentially unwell children in the ED. Inter-rater reliability is comparable with that for other ESIs. The addition of this evidence to existing work done in validation of POPS supports its continued use, evaluation and uptake to an increasingly wider network of emergency care facilities. Further work on the use of video cases as a means of standardising assessment and deployment of scoring systems should now be considered.


We would like to acknowledge the work of paediatric advanced nurse practitioner Vickie Wells for her help in the video recording assessment of the cases.



  • Twitter @damian_roland

  • Contributors LL undertook the design and research of this project while being supervised by DR. All named authors have contributed to the data interpretation and writing up of the article for publication.

  • Competing interests None declared.

  • Patient consent Written consent was sought prior to the POPS being recorded for each case.

  • Ethics approval DeMontfort University Ethics Committee.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement All data gathered have been presented within the article submitted for publication.