Review Article
The dependence of Cohen's kappa on the prevalence does not matter
Introduction
Cohen's κ is today the standard tool for analyzing agreement on a binary outcome between two observers or two methods of measurement. However, many authors have pointed out difficulties in interpreting the κ-statistic that are related to a dependence of κ on the prevalence. The editors of the Encyclopedia of Biostatistics (published in 1998) ranked this topic so high that it has an entry of its own [1]. The present article reviews this dependence.
Actually, one can identify two different types of dependence on the prevalence discussed in the literature. The first is a dependence of κ on the observed marginal prevalences, keeping the observed agreement rate fixed. This dependence is discussed, for example, by Feinstein and Cicchetti [2], but is also mentioned in textbooks (e.g., [3]). The second type is a dependence of κ on the prevalence of the true latent binary variable, keeping sensitivity and specificity of the measurement methods fixed. This dependence was first considered by Thompson and Walter [4], but is also mentioned in textbooks and review articles (e.g., in the above-mentioned entry in the Encyclopedia of Biostatistics [1]). We will investigate both types of dependence in order to clarify whether they really constitute a major drawback of κ.
We start by recapitulating the basic motivation behind the definition of Cohen's κ. Table 1 shows the hypothetical results of an agreement study comparing an interview-based and a questionnaire-based method to assess the smoking status of schoolchildren. Among 153 children, we observe that both methods indicate smoker for 81 children and both methods indicate nonsmoker for 43 children. Hence, the raw agreement rate is a = (81 + 43)/153 = 0.81. At first glance, this looks rather impressive; however, we must remember that we would observe some agreement by chance even if the two methods produced random results. To quantify the expected agreement by chance, we can look at the marginal frequencies of positive results. For the interview method, we have a marginal frequency of p1 = 98/153 = 0.64, and for the questionnaire method we have a marginal frequency of p2 = 93/153 = 0.61. Hence, if the two methods act in an independent manner, we would expect a positive result from both methods by chance with probability p1p2, and a negative result from both methods by chance with probability (1 − p1)(1 − p2). We would therefore expect an agreement rate of e = p1p2 + (1 − p1)(1 − p2). In our example, we obtain e = 0.64 × 0.61 + 0.36 × 0.39 = 0.53, so we have to judge the observed agreement rate of a = 0.81 relative to the expected agreement rate by chance of e = 0.53. One way is to compare the difference between a and e with the maximally possible difference in the case of perfect agreement, that is, with 1 − e. This results in the κ-statistic defined as

κ = (a − e)/(1 − e).
In our example, we obtain κ = (0.81 − 0.53)/(1 − 0.53) = 0.60, which looks less impressive than the raw agreement rate of 0.81.
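The computation above can be collected into a short function. A minimal sketch in Python; the cell counts are taken from the text, and the function name `cohens_kappa` is ours:

```python
# Worked κ computation for the hypothetical smoking-status study (Table 1).
# Counts from the text: both positive = 81, both negative = 43,
# marginal positives 98 (interview) and 93 (questionnaire), n = 153.

def cohens_kappa(n11, n00, m1, m2, n):
    """Cohen's kappa from the two agreeing cells and the marginal positive counts.

    n11    : number of subjects positive by both methods
    n00    : number of subjects negative by both methods
    m1, m2 : marginal positive counts of method 1 and method 2
    n      : total number of subjects
    """
    a = (n11 + n00) / n                    # raw agreement rate
    p1, p2 = m1 / n, m2 / n                # marginal prevalences
    e = p1 * p2 + (1 - p1) * (1 - p2)      # agreement expected by chance
    return (a - e) / (1 - e)

kappa = cohens_kappa(n11=81, n00=43, m1=98, m2=93, n=153)
print(round(kappa, 2))  # prints 0.6
```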
Section snippets
The dependence of κ on the observed prevalences for fixed agreement rates
In Table 2, we consider hypothetical results of two agreement studies. In both studies, we observe an agreement rate of 0.8. In the first study, however, we have a marginal prevalence of 80% for both methods, and in the second a marginal prevalence of 50%. Consequently, the expected amount of agreement by chance e and the values of the κ-statistic differ.
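This dependence can be made concrete with a small sketch. It assumes, purely for illustration, that both methods share the same marginal prevalence p; Table 2's exact cell counts are not reproduced here:

```python
# κ as a function of the (common) marginal prevalence p when the agreement
# rate a is held fixed. Illustrative assumption: equal marginals for both
# methods, so chance agreement is e = p^2 + (1 - p)^2.

def kappa_fixed_agreement(a, p):
    e = p * p + (1 - p) * (1 - p)   # chance agreement for equal marginals p
    return (a - e) / (1 - e)

# For a fixed agreement rate of 0.8, κ shrinks as the marginals move away from 0.5:
for p in (0.5, 0.6, 0.7, 0.8):
    print(f"p = {p:.1f}  kappa = {kappa_fixed_agreement(0.8, p):.3f}")
```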
One can investigate this type of dependence more systematically by considering κ as a function of the agreement rate a and the marginal
The dependence of κ on the true prevalence for fixed sensitivity and specificity
Thompson and Walter [4] considered another, but related type of dependence. They assume that there exists a latent binary variable reflecting the true status of a subject, and that we regard the two methods as imperfect measurements of the true status. Then we can describe the accuracy of any of the two methods in measuring the true status by its sensitivity Sei (i.e., the conditional probability of a positive finding in method i given the true status is positive) and its specificity Spi (i.e.,
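The Thompson–Walter setting can be sketched numerically. The sketch additionally assumes that the two methods err independently given the true status (conditional independence), and the parameter values are illustrative only:

```python
# κ between two imperfect binary measurements as a function of the true
# prevalence pi, with the sensitivity (se) and specificity (sp) of each
# method held fixed. Assumes conditional independence of the two methods
# given the true status; this is an illustrative assumption.

def kappa_latent(pi, se1, sp1, se2, sp2):
    # probability that both methods are positive / both negative
    p_both_pos = pi * se1 * se2 + (1 - pi) * (1 - sp1) * (1 - sp2)
    p_both_neg = pi * (1 - se1) * (1 - se2) + (1 - pi) * sp1 * sp2
    a = p_both_pos + p_both_neg                  # agreement rate
    q1 = pi * se1 + (1 - pi) * (1 - sp1)         # marginal prevalence, method 1
    q2 = pi * se2 + (1 - pi) * (1 - sp2)         # marginal prevalence, method 2
    e = q1 * q2 + (1 - q1) * (1 - q2)            # chance agreement
    return (a - e) / (1 - e)

# With Se = Sp = 0.9 for both methods, κ falls off toward extreme prevalences:
for pi in (0.1, 0.3, 0.5):
    print(f"pi = {pi:.1f}  kappa = {kappa_latent(pi, 0.9, 0.9, 0.9, 0.9):.3f}")
```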
The dependence of κ on a shift in the underlying population
Let us assume that we have a latent continuous variable X and a threshold c, such that the true status S of a subject is positive if and only if X > c. In the schoolchildren example, we can identify X with the time since the start of smoking (such that negative values indicate the time until the start of smoking) and c with 0. If μ and σ² denote the mean and variance of X, and Fμ,σ² denotes the distribution function of X, then the prevalence π of S can be expressed as

π = P(X > c) = 1 − Fμ,σ²(c).
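If we assume, for illustration, that X is normally distributed, the prevalence π = 1 − Fμ,σ²(c) can be computed directly; the normal family here is our assumption, the text only requires some distribution F:

```python
# Prevalence implied by a normal latent variable X ~ N(mu, sigma^2) and a
# threshold c: pi = P(X > c) = 1 - F(c). The normal distribution is an
# illustrative assumption.

from statistics import NormalDist

def prevalence(mu, sigma, c=0.0):
    return 1.0 - NormalDist(mu, sigma).cdf(c)

# Shifting the population mean changes the prevalence of the true status:
for mu in (-1.0, 0.0, 1.0):
    print(f"mu = {mu:+.1f}  pi = {prevalence(mu, sigma=2.0):.3f}")
```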
Changing
The dependence of κ on the composition of the population
We can conclude from Fig. 3 that the variance of the latent variable X is much more important in determining κ than the prevalence π, and that κ shares this property with sensitivity and specificity. This is also intuitively clear: the smaller the variance, the closer the population is to the critical value c. Because we fix the error variances in our considerations, this implies that we increase the number of subjects for whom the decision is difficult, and hence sensitivity, specificity, and
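The effect of the population variance can be illustrated with a small Monte-Carlo sketch: each method observes the latent value plus an independent error of fixed variance and classifies by the threshold. The model (normal X, additive normal errors) and all parameter values are our illustrative assumptions, not the article's:

```python
# Monte-Carlo sketch: kappa between two methods that each observe
# Y_i = X + eps_i (fixed error sd tau) and classify Y_i > c. Widening the
# population (larger sigma) while keeping the error variance fixed raises
# kappa, independently of any shift in prevalence.

import random

def simulate_kappa(mu, sigma, tau=1.0, c=0.0, n=100_000, seed=1):
    rng = random.Random(seed)
    n11 = n00 = m1 = m2 = 0
    for _ in range(n):
        x = rng.gauss(mu, sigma)             # latent value
        r1 = x + rng.gauss(0.0, tau) > c     # method 1 rating
        r2 = x + rng.gauss(0.0, tau) > c     # method 2 rating (independent error)
        m1 += r1
        m2 += r2
        n11 += r1 and r2
        n00 += (not r1) and (not r2)
    a = (n11 + n00) / n
    p1, p2 = m1 / n, m2 / n
    e = p1 * p2 + (1 - p1) * (1 - p2)
    return (a - e) / (1 - e)

# Doubling sigma (wider population, same error variance) raises kappa sharply:
print(simulate_kappa(mu=0.0, sigma=1.0))
print(simulate_kappa(mu=0.0, sigma=2.0))
```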
Discussion and conclusion
We have reviewed in this article two types of dependences of κ on the prevalence, to clarify whether they imply limitations in using κ as a tool to measure the agreement among binary measurement methods. The first type is a dependence of κ on the observed marginal prevalences, keeping the agreement rate fixed. We conclude that this dependence is a direct consequence of the definition of κ and its aim to adjust a raw agreement rate with respect to the expected amount of agreement under chance
Acknowledgments
The author is grateful to Poul Flemming Høilund-Carlsen for his constructive comments.
References (11)
- Feinstein and Cicchetti. High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol (1990)
- Thompson and Walter. A reappraisal of the kappa coefficient. J Clin Epidemiol (1988)
- Bias and prevalence effects on kappa viewed in terms of sensitivity and specificity. J Clin Epidemiol (2000)
- et al. Kappa coefficients in epidemiology: an appraisal of a reappraisal. J Clin Epidemiol (1988)
- Kappa and its dependence on marginal rates