When to Use Scott’s π or Krippendorff's α, If Ever?


**Equation 9**

$$\alpha = \frac{a_o - a_c}{1 - a_c}$$

**Equation 10**

$$\alpha = 1 - \frac{1 - a_o}{1 - a_c}$$

Equation 9 and Equation 10 are equivalent to each other, for the same reason that Equation 2 and Equation 3 are equivalent to each other. The difference is in $a_c$, expected chance agreement. To estimate $a_c$, Scott (1955) multiplied the average positive rate (*M/N*) by itself, and multiplied the average negative rate (*W/N*) by itself, as shown in Equation 8, which can also be expressed as:

**Equation 11**

$$a_c = \frac{M}{N} \times \frac{M}{N} + \frac{W}{N} \times \frac{W}{N}$$

By contrast, Krippendorff (1980) subtracted 1 from the two multipliers’ numerators and

denominators:

**Equation 12**

$$a_c = \frac{M}{N} \times \frac{M-1}{N-1} + \frac{W}{N} \times \frac{W-1}{N-1}$$
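The two chance-agreement estimates can be compared numerically. The sketch below is not from the paper; the function names are ours, *M* is the total count of "yes" decisions by both coders, *W* the total count of "no" decisions, *N* = *M* + *W*, and the counts passed in are hypothetical.

```python
# Scott's chance agreement (Equation 11) vs. Krippendorff's (Equation 12),
# illustrative sketch; M = total "yes" decisions, W = total "no" decisions,
# N = M + W (all decisions by both coders).

def chance_scott(M, W):
    N = M + W
    # Equation 11: each rate multiplied by itself
    return (M / N) * (M / N) + (W / N) * (W / N)

def chance_kripp(M, W):
    N = M + W
    # Equation 12: 1 subtracted from the second multiplier's
    # numerator and denominator in each product
    return (M / N) * ((M - 1) / (N - 1)) + (W / N) * ((W - 1) / (N - 1))

# Hypothetical counts: 150 "yes" and 50 "no" decisions
print(chance_scott(150, 50))   # 0.625
print(chance_kripp(150, 50))   # slightly smaller, approx. 0.6231
```

As the example suggests, the two estimates differ only slightly for any sizable *N*, and the difference vanishes as *N* grows.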

Equation 9, Equation 10, and Equation 12 constitute Krippendorff's α for a binary scale with two coders. With multiple coders and multi-category nominal scales, Krippendorff's α takes a more complicated form (Hayes & Krippendorff, 2007; Krippendorff, 2004a, 2004b). While this paper focuses on a binary scale with two coders to outline the boundaries of legitimate application of the two indicators, these boundaries should also apply to more categories and more coders.
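Putting the pieces together, the shared (observed − chance)/(1 − chance) form yields Scott's π when paired with Equation 11 and Krippendorff's α when paired with Equation 12. The sketch below is ours, not the authors'; the function name and input counts are hypothetical (n_yy = cases both coders coded "yes", n_nn = both "no", n_dis = cases of disagreement).

```python
# Scott's pi and Krippendorff's alpha for a binary scale, two coders:
# the (a_o - a_c)/(1 - a_c) form with Equation 11 (pi) or Equation 12 (alpha).

def pi_and_alpha(n_yy, n_nn, n_dis):
    n = n_yy + n_nn + n_dis          # number of cases coded
    N = 2 * n                        # total decisions by both coders
    M = 2 * n_yy + n_dis             # total "yes" decisions
    W = 2 * n_nn + n_dis             # total "no" decisions
    a_o = (n_yy + n_nn) / n          # observed agreement rate
    ac_pi = (M / N) ** 2 + (W / N) ** 2                  # Equation 11
    ac_al = (M * (M - 1) + W * (W - 1)) / (N * (N - 1))  # Equation 12
    pi = (a_o - ac_pi) / (1 - ac_pi)
    alpha = (a_o - ac_al) / (1 - ac_al)
    return pi, alpha

# Hypothetical skewed sample: 95 cases both "yes", none both "no",
# 5 disagreements -- 95% raw agreement, yet both indices come out negative.
print(pi_and_alpha(95, 0, 5))
```

The skewed example previews the behavior examined in the paradoxes below: high raw agreement on a lopsided distribution can coexist with low, even negative, π and α.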

**III. Fourteen Paradoxes of π and α **

While Scott’s π and Krippendorff's α have been recommended and used as general indicators of reliability, their behavior often deviates from what is expected of such an indicator. Here we list fourteen paradoxes.

**Paradox 1**: *High agreement rate, low π and α.* Suppose two coders coded 1,000 magazine

advertisements for cigarettes in the United States. Their task was to see whether the Surgeon General's

warning had been inserted as required by law. Suppose each coder found 999 “yes” and one “no,” with