Review Article
The dependence of Cohen's kappa on the prevalence does not matter

https://doi.org/10.1016/j.jclinepi.2004.02.021

Abstract

Background and Objective

The dependence of Cohen's κ on the prevalence has been a major concern in the literature. Indeed, it indicates a serious limitation with respect to comparing κ-values among studies with varying prevalences.

Study Design and Setting

The basic arguments used by different authors are reviewed.

Results

Two types of dependence can be distinguished: a dependence on the observed marginal prevalences and a dependence on the prevalence of a latent binary variable, representing the true status. The first dependence is simply a consequence of the purpose of κ, which is to improve the interpretation of agreement rates, and so does not constitute a real argument against κ. The second occurs only if one can change the prevalence without changing sensitivity and specificity. Typically, in agreement studies a change in prevalence implies also a change in sensitivity and specificity, and we show that in such a framework the dependence on the prevalence becomes negligible.

Conclusion

We should stop criticizing κ for its dependence on the prevalence. Instead, we should focus on its dependence on the composition of the population with respect to subjects easy or difficult to agree on.

Introduction

Cohen's κ is today the standard tool for the analysis of agreement on a binary outcome between two observers or two methods of measurement; however, many authors have pointed out difficulties in interpreting the κ-statistic, difficulties related to a dependence of κ on the prevalence. The editors of the Encyclopedia of Biostatistics (published in 1998) ranked this topic so high that it has an entry of its own [1]. The present article reviews this dependence.

Actually, one can identify two different types of dependence on the prevalence discussed in the literature. The first is a dependence of κ on the observed marginal prevalences, keeping the observed agreement rate fixed. This dependence is discussed, for example, by Feinstein and Cicchetti [2], but is also mentioned in textbooks (e.g., [3]). The second type is a dependence of κ on the prevalence of the true latent binary variable, keeping sensitivity and specificity of the measurement methods fixed. This dependence was first considered by Thompson and Walter [4], but is also mentioned in textbooks and review articles (e.g., in the above-mentioned entry in the Encyclopedia of Biostatistics [1]). We will investigate both types of dependence in order to clarify whether they really constitute a major drawback of κ.

We start by recapitulating the basic motivation behind the definition of Cohen's κ. Table 1 shows the hypothetical results of an agreement study comparing an interview-based and a questionnaire-based method to assess the smoking status of schoolchildren. Among 153 children, we observe that both methods indicate smoker for 81 children and both methods indicate nonsmoker for 43 children. Hence, the raw agreement rate is a = (81 + 43)/153 = 0.81. At first glance, this looks rather impressive; however, we must remember that we would observe some agreement by chance even if the two methods produced random results. To quantify the expected agreement by chance, we can look at the marginal frequencies of positive results. For the interview method, we have a marginal frequency of p1 = 98/153 = 0.64, and for the questionnaire method we have a marginal frequency of p2 = 93/153 = 0.61. If the two methods act independently, we would expect a positive result from both methods by chance with probability p1p2, and a negative result from both methods by chance with probability (1 − p1)(1 − p2). Hence, we would expect an agreement rate of e = p1p2 + (1 − p1)(1 − p2). In our example, we obtain e = 0.64 × 0.61 + 0.36 × 0.39 = 0.53, so we must judge the observed agreement rate of a = 0.81 relative to the expected chance agreement rate of e = 0.53. One way is to compare the difference between a and e with the maximally possible difference in the case of perfect agreement, that is, with 1 − e. This results in the κ-statistic, defined as κ = (a − e)/(1 − e).

In our example, we obtain κ = (0.81 − 0.53)/(1 − 0.53) = 0.60, which looks less impressive than the raw agreement rate of 0.81.
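The computation above can be reproduced in a few lines of Python (a sketch; the helper name `cohens_kappa` is ours, and the cell counts are those of Table 1):

```python
# kappa for the smoking-status example of Table 1:
# both methods "smoker": 81, both "nonsmoker": 43, n = 153,
# marginal positives: interview 98, questionnaire 93.
def cohens_kappa(both_pos, both_neg, pos_1, pos_2, n):
    a = (both_pos + both_neg) / n        # raw agreement rate
    p1, p2 = pos_1 / n, pos_2 / n        # marginal prevalences
    e = p1 * p2 + (1 - p1) * (1 - p2)    # agreement expected by chance
    return (a - e) / (1 - e)

print(f"{cohens_kappa(81, 43, 98, 93, 153):.2f}")  # 0.60
```

The exact value is 0.596; the article's 0.60 reflects rounding of the intermediate quantities a and e.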


The dependence of κ on the observed prevalences for fixed agreement rates

In Table 2, we consider hypothetical results of two agreement studies. In both studies, we observe an agreement rate of 0.8. In the first study, however, we have a marginal prevalence of 80% for both methods, and in the second a marginal prevalence of 50%. Consequently, the expected amount of agreement by chance e and the values of the κ-statistic differ.
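Given only an agreement rate and a common marginal prevalence for both methods, e and κ can be computed directly; a small sketch (`kappa_from_marginals` is our own name, with the values of the two hypothetical studies above):

```python
def kappa_from_marginals(a, p1, p2):
    """kappa from the observed agreement rate a and the two marginal prevalences."""
    e = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    return (a - e) / (1 - e)

# Study 1: a = 0.8, both marginals 0.8 -> e = 0.68, kappa ~ 0.375
# Study 2: a = 0.8, both marginals 0.5 -> e = 0.50, kappa = 0.60
print(kappa_from_marginals(0.8, 0.8, 0.8))
print(kappa_from_marginals(0.8, 0.5, 0.5))
```

The same agreement rate of 0.8 thus yields quite different κ-values once the marginals move away from 50%.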

One can investigate this type of dependence more systematically by considering κ as a function of the agreement rate a and the marginal

The dependence of κ on the true prevalence for fixed sensitivity and specificity

Thompson and Walter [4] considered another, related type of dependence. They assume that there exists a latent binary variable reflecting the true status of a subject, and that the two methods are regarded as imperfect measurements of this true status. Then we can describe the accuracy of either method in measuring the true status by its sensitivity Se_i (i.e., the conditional probability of a positive finding in method i given the true status is positive) and its specificity Sp_i (i.e.,
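This framework can be sketched numerically. For illustration we assume the two methods err independently given the true status (the usual conditional-independence assumption); the function and parameter names below are ours, not Thompson and Walter's notation:

```python
def expected_kappa(pi, se1, sp1, se2, sp2):
    """Expected kappa for true prevalence pi, given the sensitivities and
    specificities of two conditionally independent measurement methods."""
    both_pos = pi * se1 * se2 + (1 - pi) * (1 - sp1) * (1 - sp2)
    both_neg = pi * (1 - se1) * (1 - se2) + (1 - pi) * sp1 * sp2
    a = both_pos + both_neg                   # expected agreement rate
    p1 = pi * se1 + (1 - pi) * (1 - sp1)      # marginal prevalence, method 1
    p2 = pi * se2 + (1 - pi) * (1 - sp2)      # marginal prevalence, method 2
    e = p1 * p2 + (1 - p1) * (1 - p2)         # agreement expected by chance
    return (a - e) / (1 - e)

# With Se = Sp = 0.9 held fixed for both methods, kappa still varies with pi:
for pi in (0.1, 0.3, 0.5):
    print(pi, round(expected_kappa(pi, 0.9, 0.9, 0.9, 0.9), 3))
```

κ shrinks toward extreme prevalences even though sensitivity and specificity are held fixed, which is exactly the dependence at issue in this section.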

The dependence of κ on a shift in the underlying population

Let us assume that we have a latent continuous variable X and a threshold c, such that the true status S of a subject is positive if and only if X > c. In the schoolchildren example, we can identify X with the time since start of smoking (such that negative values indicate the time until start of smoking) and c with 0. If μ and σ² denote the mean and variance of X and F_{μ,σ²} denotes the distribution function of X, then the prevalence π of S can be expressed as π = P(S positive) = P(X > c) = 1 − F_{μ,σ²}(c).
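For illustration, assume X is normally distributed (the argument itself needs only some distribution function F_{μ,σ²}); the prevalence then follows from the normal CDF:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    # CDF of N(mu, sigma^2) via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def prevalence(mu, sigma, c=0.0):
    # pi = P(X > c) = 1 - F_{mu,sigma^2}(c)
    return 1 - normal_cdf(c, mu, sigma)

# Shifting the population mean shifts the prevalence:
print(prevalence(mu=0.5, sigma=1.0))   # ~0.69
print(prevalence(mu=-0.5, sigma=1.0))  # ~0.31
```

A shift in μ therefore changes π, and, because the population moves relative to the threshold c, it changes sensitivity and specificity at the same time.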

Changing

The dependence of κ on the composition of the population

We can conclude from Fig. 3 that the variance of the latent variable X is much more important in determining κ than the prevalence π and that κ shares this property with sensitivity and specificity. This is also intuitively clear: the smaller the variance, the closer is the population to the critical value c. Because we fix the error variances in our considerations, this implies that we increase the number of subjects for whom the decision is difficult, and hence sensitivity, specificity, and

Discussion and conclusion

We have reviewed in this article two types of dependences of κ on the prevalence, to clarify whether they imply limitations in using κ as a tool to measure the agreement among binary measurement methods. The first type is a dependence of κ on the observed marginal prevalences, keeping the agreement rate fixed. We conclude that this dependence is a direct consequence of the definition of κ and its aim to adjust a raw agreement rate with respect to the expected amount of agreement under chance

Acknowledgments

The author is grateful to Poul Flemming Høilund-Carlsen for his constructive comments.

