Inter-Rater Reliability

Also known as: Inter-Coder Reliability, Inter-Annotator Agreement, IRR

A statistical measure of the degree to which two or more independent raters or coders agree when assessing or classifying the same data. In accessibility research, inter-rater reliability is used to validate qualitative coding of user study data, annotation of accessibility violations, and evaluation of AI-generated content. Common metrics include Cohen's kappa (for two raters) and Fleiss' kappa (for more than two raters); both adjust raw percent agreement for the agreement expected by chance. Values above 0.80 are conventionally read as strong agreement, although thresholds vary by field. High inter-rater reliability indicates that research findings do not depend on any single researcher's interpretation, which is particularly important when coding subjective phenomena such as the presence of disability stereotypes or the quality of accessible content.
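
As an illustration of the two-rater case, here is a minimal Python sketch of Cohen's kappa, computed as (observed agreement - chance agreement) / (1 - chance agreement). The function name `cohens_kappa`, the labels, and the ten example codes are hypothetical and made up for this illustration; they are not taken from any study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label for every item; kappa is
        # undefined here, and this sketch assumes 1.0 is an acceptable result.
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical codes from two raters for ten items
# ("S" = stereotype present, "N" = none).
rater_a = ["S", "S", "S", "S", "N", "N", "N", "N", "N", "N"]
rater_b = ["S", "S", "S", "N", "N", "N", "N", "N", "N", "S"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")  # observed 0.80, chance 0.52, kappa about 0.58
```

In this made-up example the raters agree on 8 of 10 items, yet kappa is only about 0.58, because roughly half of that agreement would be expected by chance given how often each rater used each label. That gap is why kappa is usually reported instead of raw percent agreement.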

Category: research methods

Related: Qualitative Coding · Think-Aloud Protocol

Sources

https://en.wikipedia.org/wiki/Inter-rater_reliability