This paper reports the reliability of assessments of a series of portfolios assembled by a cohort of participants attending a course for prospective general practice trainers. Initial individual assessments are compared with composite scores produced through open discussion between randomly paired assessors, and agreement is analysed using kappa statistics. Overall reliability of a global pass/refer judgement improved from a kappa of 0.26 (fair) with individual assessment to 0.50 (moderate) with paired discussants.
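As a minimal sketch of the statistic used here, the following computes Cohen's kappa for a binary pass/refer judgement from two raters' observed agreement and their chance-expected agreement. The portfolio judgements below are entirely hypothetical illustration data, not results from the study.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c]
              for c in set(rater_a) | set(rater_b)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical pass/refer judgements on ten portfolios
a = ["pass", "pass", "refer", "pass", "refer",
     "pass", "pass", "refer", "pass", "pass"]
b = ["pass", "refer", "refer", "pass", "pass",
     "pass", "pass", "refer", "refer", "pass"]
print(round(cohen_kappa(a, b), 2))  # fair agreement on the usual kappa scale
```

On the conventional interpretation scale referenced in the abstract, values around 0.2-0.4 are "fair" and 0.4-0.6 "moderate".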