Performance assessment requires careful thought and planning
This paper outlines the principles of good assessment, including the importance of defining the purpose of assessment as well as what should be assessed. It then considers how SpR assessment should be undertaken, including possible tools for assessment such as peer ratings, patient assessment, mini-CEX, and portfolios. It concludes with a brief discussion of how to draw together the various aspects discussed and some advice on remediation.
There is a requirement for annual assessment of all specialist registrars (SpRs).1 However, this is undertaken on an ad hoc basis, with wide variation in practice both between and within specialties. Little of the assessment undertaken to date has been sufficiently robust to withstand legal challenge. Annual assessment is soon to be extended from SpRs to senior house officers (SHOs) through Modernising Medical Careers (MMC), and public, political, and professional pressure to demonstrate adequate self regulation has been important in driving revalidation forward. A priority for the newly established Postgraduate Medical Education and Training Board (PMETB) has been to provide a principles and standards framework for assessment within postgraduate medical training2 (box 1). Annual assessment for trainees will be used to show continuing fitness to practise within the revalidation framework. Assessment is therefore increasingly recognised as a priority; what remains unclear is how it should be done and how success will be measured.
Box 1: PMETB assessment principles
The assessment system must be fit for a range of purposes
The content of the assessment (sample of knowledge, skills, and attitudes) will be based on curricula for postgraduate training which themselves are referenced to all of the areas of Good Medical Practice
The methods used within the programme will be selected in the light of the purpose and content of that component of the assessment framework
The methods used to set standards for classification of the trainee’s performance/competence must be transparent and in the public domain
Assessments must provide relevant feedback
Assessors/examiners will be recruited against criteria for performing the tasks they undertake
There will be lay input in the development of assessment
Documentation will be standardised and accessible nationally
There will be resources sufficient to support assessment
The authors are part of a team responsible for implementing performance assessment for paediatricians in training on behalf of the Royal College of Paediatrics and Child Health (RCPCH). This article provides guidance on good practice in postgraduate assessment, with particular reference to SpRs in paediatrics, although the principles discussed are generic. Consensus and clarity regarding the purpose of assessment (why), the content of assessment (what), and the method of assessment (how) are essential. There is also a need for a clear framework to address problems for the minority of doctors whose assessment raises concerns about their ability to practise effectively.
PRINCIPLES OF GOOD ASSESSMENT
Key characteristics of assessment tools include reliability, validity, educational impact, feasibility, and cost effectiveness. The relative importance of these characteristics varies according to the purpose of the assessment.
Reliability is a measure of reproducibility: would you get the same results if you administered the assessment again? Possible influences on reliability include differences between observers (inter-observer reliability), variation within observers (intra-observer reliability), the nature of the test itself (test-retest reliability), and the nature of the problem itself (case specificity). Case specificity is a particular problem for clinical assessment of all types. It is important to realise that subjectivity and reliability are not incompatible. Subjective judgements, while not reliable on an individual basis, may be reliable if sufficiently widely sampled. Within medical education, reliability is increasingly being evaluated using generalisability theory, a technique based on analysis of variance.3 By analysing components of variance it makes use of all the data to quantify known sources of error without requiring multiple experiments. Identifying the sources of error is important because it allows sampling to take place across them. It is also possible to model mathematically the circumstances that would be required (in terms of number of observations and number of independent observers) to achieve a given reliability. This means that assessment can be planned in a way that ensures adequate reliability will be achieved. Conventionally, a reliability coefficient of 0.8 is desirable for high stakes assessments such as certification procedures, although a lower reliability may be acceptable for widespread screening assessments.
Validity is a measure of how completely an assessment tool measures what it purports to measure. There are a number of different types of validity, including construct, content, and criterion validity.4 Reliability is essential to the defensibility of an assessment, but demonstrated validity is also a fundamental requirement. However reliable a test is, it is not worth using if it does not actually assess the area of interest.
Evaluation of feasibility is essential. Assessment methods that are highly feasible on a small scale may prove very difficult to implement on a larger scale in a range of different settings. Consultants are already hard pressed and the time they have available to commit to assessment is limited. Use of peers and patients for assessment, as well as other health professionals, is an important part of optimising feasibility.
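The kind of modelling described above can be sketched in a few lines of Python. This is a minimal illustration, not the full generalisability analysis: it assumes the simplest possible design, in which each doctor is scored on a number of interchangeable observations and all unwanted variation is pooled into a single error component. The function names and the variance components (1.0 and 4.0) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def projected_reliability(var_person, var_error, n_obs):
    """Reliability of a mean over n_obs observations (single-facet design)."""
    if n_obs < 1:
        raise ValueError("n_obs must be at least 1")
    return var_person / (var_person + var_error / n_obs)


def observations_needed(var_person, var_error, target):
    """Smallest number of observations whose mean reaches the target reliability."""
    if not 0 < target < 1:
        raise ValueError("target must lie strictly between 0 and 1")
    exact = (target / (1 - target)) * (var_error / var_person)
    # Small tolerance so floating point noise does not add a spurious observation.
    return max(1, math.ceil(exact - 1e-9))


# Hypothetical variance components: true differences between doctors (1.0)
# and all other sources of variation in a single observation (4.0).
VAR_DOCTOR, VAR_ERROR = 1.0, 4.0

for target in (0.7, 0.8):
    n = observations_needed(VAR_DOCTOR, VAR_ERROR, target)
    achieved = projected_reliability(VAR_DOCTOR, VAR_ERROR, n)
    print(f"target {target}: {n} observations give reliability {achieved:.2f}")
```

A real analysis would estimate separate variance components for each source of error (raters, cases, occasions) and project reliability across combinations of them; the single pooled error term here is a deliberate simplification.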
Tools used should generate educational feedback that informs professional development planning. Because assessment drives learning, robust assessment which is seen to be valid and feasible is likely to have a greater educational impact and be more acceptable than one which is not. Tools which have low feasibility on a wide scale may be very useful in the context of remediation for doctors about whom there is concern.
PURPOSE OF ASSESSMENT
Fundamental to the development of any assessment process is clarifying the purpose of the assessment process. For trainees this means determining readiness to progress to the next stage of their training. The terms summative and formative are widely used to describe the difference between assessments which concentrate on making pass-fail judgements (summative) and those which concentrate on providing feedback on an individual’s strengths and weaknesses (formative). Summative assessment may produce feedback which clarifies areas of concern or excellence for doctors, but it is not intrinsic to the process (as it is for a formative assessment process). Assessment is the most powerful stimulus to learning—we should use this strategically. By assessing the areas of practice that we consider to be the most important we will inevitably stimulate learning in these areas.
WHAT SHOULD BE ASSESSED?
Guidance on SpR annual assessment to inform the Record of In-Training Assessment (RITA) focuses on the assessment of practice, rather than knowledge.1 For doctors in practice there is a trend away from competence assessment towards performance assessment. Competence assessment is a measure of what a practitioner is capable of doing (the best he/she can do under controlled circumstances), whereas performance assessment is a measure of what he or she actually does in daily practice.5 Competence assessment does not necessarily predict performance.6 Miller provides a useful framework for conceptualising the difference between performance (does) and competence (shows how)7 (fig 1). However, there continues to be confusion about the use of the terms competence and performance; workplace based assessment may be a better term as it avoids such confusion. Miller’s pyramid also emphasises the fact that performance is built on a foundation of knowledge—without adequate knowledge it will not be possible to perform satisfactorily across a range of situations. Workplace based assessment provides an authentic representation of the way in which a doctor functions within a complex environment where there are many potential influences on their behaviour. Establishing assessment procedures requires careful thought and a detailed and structured plan is necessary if this is to be done properly.8 There is currently a paucity of adequately evaluated performance assessment tools, and this is acknowledged worldwide.5 Tool development must be mapped to domains of competence, and guidance on these is available.5 Within the UK, Good Medical Practice (GMP) provides the framework for defining what a doctor is expected to be able to do.9 Mapping of SpR assessment to GMP will ensure that the requirements for revalidation are fulfilled by the process. 
Details of SpRs’ practice profiles in terms of both the distribution of tasks (emergency versus outpatient work, for example) and relative frequency of different clinical problems are not, however, currently available.
HOW SHOULD WE BE DOING SPR ASSESSMENT?
The principles of developing a rigorous performance assessment programme for SpRs, together with some specific tools, are illustrated here by describing how the problem has been approached within one specific postgraduate setting: annual assessment for paediatric SpRs in the UK.
Areas on which to focus initial efforts in relation to tool development have been informed by a range of sources (box 2). It is intended that all paediatric SpRs will be assessed using high feasibility “screening” tools such as peer ratings.10,11 Where a potential problem is identified, more detailed assessment in the area of concern can be undertaken (fig 2). This is important both to ensure that a real problem exists in this area and to provide a detailed profile of the nature of the problem to inform planning of remediation.
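The two level approach can be illustrated with a short Python sketch. Everything specific in it is an assumption made for illustration: the function name `triage`, the domain names, the rating scale, the threshold of 5, and the minimum of 11 raters are hypothetical and are not taken from any of the tools discussed in this article.

```python
from statistics import mean


def triage(ratings_by_domain, threshold, min_raters):
    """Split screened domains into those flagged for detailed (level 2)
    assessment and those with too few ratings to judge."""
    flagged, undersampled = [], []
    for domain, ratings in ratings_by_domain.items():
        if len(ratings) < min_raters:
            # Too few ratings for the mean to be dependable: collect more first.
            undersampled.append(domain)
        elif mean(ratings) < threshold:
            # Potential problem: follow up with a more detailed tool.
            flagged.append(domain)
    return flagged, undersampled


# Hypothetical screening results for one trainee.
example = {
    "communication": [3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3],
    "teamwork": [7, 8, 7, 8, 7, 8, 7, 8, 7, 8, 7],
    "record keeping": [6, 7],
}

flagged, undersampled = triage(example, threshold=5, min_raters=11)
print("needs detailed assessment:", flagged)
print("needs more ratings:", undersampled)
```

Keeping undersampled domains separate from flagged ones reflects the point that a screening result should only trigger detailed assessment when it rests on enough observations to be reliable.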
Box 2: Areas on which to focus initial efforts in relation to tool development for paediatric SpRs
These have been informed by a range of sources:
Detailed review of performance assessment literature
Areas recognised as being central to the practice of most physicians, in particular the consultation
Good Medical Practice (GMP)9
Agreed international domains for performance assessment (which map to GMP)5
Areas recognised as being common areas of complaint/poor performance (communication and teamwork in particular)
Consensus exercise with paediatric tutors identifying key features of an effective (paediatric) consultation
Areas identified in GMC pilot for revalidation as being areas where doctors found it difficult to provide evidence of adequate performance (implying a deficiency of tools)—in particular, patient feedback and teamwork
Areas complementary to work being undertaken by the RCP, to avoid unnecessary duplication of effort
Possible sources of evidence for performance assessment can be divided into two broad groups: generic and specific skills. Examples of possible tools in both groups are discussed.
POTENTIAL TOOLS FOR PERFORMANCE (WORKPLACE BASED) ASSESSMENT
Tools suitable for widespread screening (level 1, fig 2)
Peer ratings are judgements made by other health professionals about a doctor’s performance. They are broadly equivalent to 360 degree feedback, which has been used in industry for many years. They usually consist of a questionnaire with a rating scale against which the doctor is judged in a number of areas. Peer rating is an attractive means of assessing a broad range of competencies for doctors in practice. It has huge potential as a high feasibility tool that is reliable and is able to assess areas that are otherwise difficult to assess, such as teamworking. Ramsey studied the clinical performance of physicians using written questionnaires mailed to professional associates of the physicians (both doctors and nurses).10,11 Ramsey’s questionnaire consists of 11 categories, and the rater is asked to score the physician in each category from 1 to 9 (or score UA if they feel unable to comment in a particular category). Eleven raters are needed to achieve a reliability coefficient of 0.7.
Other workers have evaluated the use of peer ratings in different settings, mainly within the USA or Canadian healthcare system.12–14 In one of these studies 71% of surgeons followed up three months after administration of the instrument contemplated or initiated change on the basis of the multi-source feedback (based on self reporting).14
The GMC and PMETB advocate the use of peer ratings for work based assessment, and the RCPCH intends to utilise peer ratings as part of the standardised SpR assessment process. A peer assessment tool developed and evaluated with paediatric SpRs has been shown to have good reliability and validity in a pilot study. The tool consists of a 25 point questionnaire (Sheffield Peer Assessment Tool, SPRAT) mapped to GMP. For SpRs 11 raters across a range of health professionals are needed to achieve a reliability of 0.7 (generalisability analysis). SPRAT performs particularly well in the areas of team working and communication, areas which are traditionally difficult to assess. It is feasible and generates feedback which can be used to inform personal development planning (fig 3).
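As a rough illustration of what a requirement of 11 raters implies, the Spearman-Brown relationship can be rearranged to estimate the reliability of a single rater. This assumes interchangeable raters and a single pooled source of error, which is simpler than the generalisability analysis actually reported; the symbols $\rho_1$ (single-rater reliability) and $\rho_n$ (reliability of the mean of $n$ raters) are introduced here for illustration only.

```latex
\rho_n = \frac{n\,\rho_1}{1 + (n-1)\,\rho_1}
\quad\Longrightarrow\quad
\rho_1 = \frac{\rho_n}{n - (n-1)\,\rho_n}
       = \frac{0.7}{11 - 10 \times 0.7}
       = \frac{0.7}{4}
       = 0.175
```

On these assumptions a single rater's judgement would have a reliability of roughly 0.18, which is consistent with the earlier point that subjective judgements are unreliable individually but can become reliable when sampled widely.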
Performance assessment based on written records
Correspondence between professionals
Correspondence between health professionals is an important record of a healthcare event. From April 2004 it will be mandatory to copy correspondence between professionals to patients/carers, and many doctors are already doing this. A validated tool for the assessment of outpatient correspondence has been developed (Sheffield Assessment Instrument for Letters, SAIL). It has been shown to have good reliability and feasibility,15 although further work is needed to determine how best to utilise SAIL in the context of widespread SpR assessment. Other SpRs could potentially be used as raters, an approach that would be highly feasible and have good educational impact. A small pilot study has shown a significant improvement in letter writing following training with SAIL.16
Patient and parent feedback
Patients and/or their carers are ideally placed to provide feedback on how well a doctor communicates with them. A vast number of patient satisfaction tools are available. However, there has been almost no work undertaken which robustly evaluates how to use the patient’s perspective as a defensible component of a rigorous assessment process, a problem highlighted by Sitzia in his review of patient satisfaction data.17 The American Board of Internal Medicine (ABIM) has done work to evaluate a patient assessment tool.18 A recent study of 351 paediatric consultations using the Sheffield Patient Assessment Tool (SHEFFPAT) has shown that 25 consultations are sufficient for parents’ feedback to meet the criteria required for inclusion in a performance assessment programme (reliability of >0.8 evaluated using generalisability).19 Utilising patients/carers not only meets criteria for performance assessment, but also fits well with the GMC and PMETB requirements and the concept of the expert patient.
Tools better suited for more detailed testing (level 2, fig 2)
Performance assessment based on written records
Medical records have huge potential for assessment, but ward based records may not represent the performance of the doctor making decisions (for example, an SpR or consultant ward round recorded by an SHO).20 Furthermore, assessment of the medical record alone does not allow an assessment of decision making/patient management skills as record keeping is simply too inconsistent. It may be important to assess an individual’s ability to keep adequate written records, but this does not necessarily reflect their decision making skills. A potential way of using medical records to test more complex skills such as decision making, however, is through chart stimulated recall.
Chart stimulated recall
Chart stimulated recall (CSR) consists of assessment of performance through structured interviews, for which a selection of medical records from a physician’s caseload acts as the focus. Assessment may be based on the quality of data acquisition, patient evaluation, the clinician’s choices about patient management, and knowledge base. The GMC and College of Physicians and Surgeons of Ontario (CPSO) Canada incorporate CSR within their procedures for evaluating poorly performing doctors. Charts are reviewed by two assessors, and used as the focus for discussions with the physicians. Because case specificity (performance of an individual which is dependent on the nature of the medical case around which the interview is based) is a problem with this type of assessment tool, CSR interviews would need to be conducted on several occasions during training, in order to ensure that the method has adequate sampling validity.
Norman and Salvatori have shown satisfactory reliability using CSR, and found it to be feasible.21,22 Data on the use of CSR in the UK setting for performance assessment are not available, but CSR has the potential to allow assessment of areas which are otherwise very difficult to assess.
Clinical and technical skills
Although the assessment of communication, teamworking, and other generic skills is clearly essential, it is also important that core clinical skills are assessed. Possible tools available include mini-CEX.23 Mini-CEX is utilised by the ABIM to assess residents in training. Its aim is to assess residents’ clinical skills, attitudes, and behaviours using a structured rating form completed by a senior member of faculty while observing a clinical encounter. It takes on average 20 minutes per encounter. Scores on the mini-CEX are influenced by the difficulty of the clinical case encountered as well as the nature of the problem. Wide sampling is essential and it is a method that is relatively time consuming for senior doctors, potentially limiting its feasibility. Work is being undertaken by the Royal College of Physicians to evaluate its measurement characteristics in a UK training grade setting.24
Assessment of core technical skills is clearly also important. A range of potential methodologies for doing this have been developed within the surgical specialties, and these could be modified for a paediatric setting.25 The use of video also has potential for the assessment of technical skills.26
Video has also been used for the assessment of consultation skills, most widely within a primary care setting within the UK, and also in the Netherlands and Australia.27–29 This has tremendous potential for providing structured feedback that can inform personal development planning for doctors. Pilot work is underway evaluating a paediatric consultation assessment tool based around existing models of the consultation. It may be appropriate, however, that the use of video for assessment purposes is reserved for those doctors who appear to have difficulty in the area of communication identified by higher feasibility screening tools such as patient assessment and peer rating. All trainees could, however, valuably be exposed to review of videos of their consultations as part of their training.
Many other attributes and skills make up a doctor’s practice and have not been discussed here. These include, for example, teaching and training, research skills, and critical appraisal. Portfolios offer potential as a means of assessing a doctor’s overall professional profile, including these aspects of practice, as well as, importantly, an individual’s ability to learn from experience. However, while their usefulness as a means of supporting professional development is widely acknowledged, their role in performance assessment is controversial.30
Putting it all together
None of these individual assessments on their own will be sufficient. In order to obtain a representative picture of a doctor’s overall practice, a number of assessments sampling widely across the doctor’s practice will be essential. These should be planned well in advance, and the nature and timing of the assessment processes made explicit to the trainees being assessed as well as to the assessors.
Training for assessors is essential and is likely to improve the reliability of the process. The nature and purpose of the assessments and the criteria against which judgements are being made should be made explicit to all participants. Quality assurance processes must be built into the process and should evaluate reliability and validity as well as ensuring that the overall assessment programme is in line with the PMETB assessment standards framework.2
Detailed assessment should be undertaken of possible problem areas identified by screening tools such as peer ratings.
Remediation: principles and planning the process
Remediation should be offered to all doctors about whom significant concern in any area of their practice has been raised, and the adequacy (or otherwise) of remediation should be confirmed by appropriate assessment processes. Ideally each SpR and SHO programme should have an individual who is responsible for remediation. A written, individualised framework should be produced in line with a nationally agreed outline framework for remediation. It is also particularly important to attempt to measure an individual’s degree of insight into areas of concern, as this may significantly affect the success of a remediation programme. Tools to assess insight are not yet well developed, but this is recognised as a priority area for performance assessment development.31
Appropriate rigorous assessment of doctors is a challenge. Development and implementation of performance assessment requires careful thought and planning and a considerable investment of resources, both time and money. However, such investment is essential. Assessment is not only mandatory, it is the most powerful stimulus to learning. We should ensure that it is undertaken rigorously, but also that we assess not simply what is easiest, but what is most important, so that assessment has a real influence on doctors’ practice and hence the quality of care they provide for patients.
Special thanks are extended to Dr Vin Diwakar for his thoughtful comments and input. Thanks also are due to Dr Julian Archer, Dr Jim Crossley, and Miss Judith Ellis.