Background A portfolio is a compilation of evidence that, through critical reflection on its contents, demonstrates personal and professional development and achievement. Portfolios are increasingly used for summative purposes within the medical profession and have been highlighted as potential tools for assessing professional competence. The most frequently cited limitation of reflective portfolios is the lack of reliability with which they can be assessed.
Aims To design a portfolio assessment tool and investigate its reliability, assessing both intra- and inter-observer reliability.
Methods The study took place over 5 months. We studied nine e-portfolios belonging to Specialist Trainees in Paediatrics within a specific Deanery. Appropriate consent and ethical approval were obtained. We asked Consultant Paediatricians who are educational supervisors to mark each of these portfolios using a newly designed assessment tool. The marks were collated anonymously. We then examined the consistency of the marks awarded to each portfolio and used statistical analysis to determine the reliability of the assessment tool.
Results Nine portfolios were assessed by eight assessors. The results showed low inter-rater reliability of the assessment tool. Against a target mean difference (bias) close to zero, the inter-rater bias ranged from 3.6% to 19%, with standard deviations ranging from 6.3 to 10.2. Intra-observer reliability was better (bias of 1.1%, SD of 5). Against a target kappa score of >0.8 for summative assessments, kappa scores ranged from 0.2 to 0.72 for inter-rater reliability, and the kappa score for intra-rater reliability was 0.59.
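The two kinds of statistic reported above can be illustrated with a minimal sketch. It assumes that bias is the mean of the paired differences between two sets of marks for the same portfolios, and that kappa is the unweighted Cohen's kappa on categorical grades; the abstract does not state which kappa variant was used, so this is an assumption. The function names and all marks and grades below are hypothetical and are not study data.

```python
from collections import Counter
from statistics import mean, stdev


def bias_and_sd(marks_a, marks_b):
    """Mean paired difference (bias) and sample SD of the differences."""
    diffs = [a - b for a, b in zip(marks_a, marks_b)]
    return mean(diffs), stdev(diffs)


def cohen_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two raters grading the same items.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(grades_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Agreement expected by chance from each rater's grade frequencies.
    freq_a, freq_b = Counter(grades_a), Counter(grades_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical percentage marks from two raters for five portfolios.
rater_a_marks = [60, 70, 80, 50, 90]
rater_b_marks = [55, 70, 70, 55, 80]
bias, sd = bias_and_sd(rater_a_marks, rater_b_marks)

# Hypothetical pass/fail grades from two raters for six portfolios.
rater_a_grades = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater_b_grades = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohen_kappa(rater_a_grades, rater_b_grades)

print(f"bias = {bias:.1f}, SD = {sd:.1f}, kappa = {kappa:.2f}")
```

A bias near zero with a small SD indicates that two sets of marks agree closely, while kappa corrects the raw agreement rate for the agreement expected by chance.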
Conclusion Judging the quality of reflective portfolios is becoming increasingly important as they are used in summative assessment and revalidation. Our study has shown that individual assessments using our portfolio tool have poor inter-rater reliability and are untrustworthy for high-stakes assessment. Improved rater training and assessment by multiple raters are likely to improve reliability, but further research is needed to confirm this.