Evaluating Inter-rater Reliability of Indicators to Assess Performance of Medicines Management in Health Facilities in Uganda

By: Belinda Blick, Stella Nakabugo, Laura F. Garabedian, Morries Seru, Birna Trap
Publication: Journal of Pharmaceutical Policy and Practice, May 2018, vol. 11, p. 11. DOI: https://doi.org/10.1186/s40545-018-0137-y.



To build capacity in medicines management, the Uganda Ministry of Health introduced a nationwide supervision, performance assessment and recognition strategy (SPARS) in 2012. Medicines management supervisors (MMS) assess performance using 25 indicators to identify problems, focus supervision, and monitor improvement in medicines stock and storage management, ordering and reporting, and prescribing and dispensing. Although the indicators are well recognized and used internationally, little was known about their reliability. An initial assessment of inter-rater reliability (IRR), which measures agreement among raters (i.e., MMS), showed poor IRR; we subsequently implemented efforts to improve it. The aim of this study was to assess IRR for SPARS indicators at two subsequent time points to determine whether IRR increased following those efforts to improve reproducibility.


IRR was assessed in 2011 and again, after efforts to improve IRR, in 2012 and 2013. Efforts included targeted training, providing detailed guidelines and job aids, and refining indicator definitions and response categories. In the assessments, teams of three MMS measured 24 SPARS indicators in 26 facilities. We calculated IRR as a team agreement score (i.e., the percentage of MMS teams in which all three MMS gave the same score). Two-sample tests for proportions were used to compare IRR scores for each indicator, for each domain, and overall between the initial assessment and the two subsequent assessments. We also compared the IRR scores for indicators classified as simple (binary) versus complex (multi-component). Logistic regression was used to identify supervisor group characteristics associated with domain-specific and overall IRR scores.
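
The team agreement score and the comparison of proportions described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the toy scores, and the choice of a pooled two-sided z-test are assumptions made for illustration, since the abstract does not state which variant of the two-sample test was used.

```python
from math import erfc, sqrt


def team_agreement_score(team_scores):
    """Percent of teams in which every rater gave the same score.

    team_scores: one tuple of rater scores per team (hypothetical layout).
    """
    if not team_scores:
        raise ValueError("need at least one team")
    agreeing = sum(1 for scores in team_scores if len(set(scores)) == 1)
    return 100.0 * agreeing / len(team_scores)


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equal proportions, using a pooled standard error.

    x1 of n1 teams agreed in the first assessment, x2 of n2 in the second.
    The pooled form is an assumption; the abstract does not specify it.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; test is undefined")
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Toy data, invented for illustration: scores from four teams of three
# raters on one binary indicator. Three of the four teams agree fully.
toy_scores = [(1, 1, 1), (1, 0, 1), (0, 0, 0), (1, 1, 1)]
print(team_agreement_score(toy_scores))  # 75.0
```

Under this definition a team counts as agreeing only when all three raters match exactly, so a single dissenting rater lowers the score as much as complete disagreement does.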


Initially, only five (21%) indicators had acceptable reproducibility, defined as an IRR score ≥ 75%. At the initial assessment, prescribing quality indicators had the lowest IRR and stock management indicators had the highest. By the third IRR assessment, 12 (50%) indicators had acceptable reproducibility, and the overall IRR score had improved from 57% to 72%. The IRR of simple indicators was consistently higher than that of complex indicators across the three assessment periods. We found no correlation between IRR scores and MMS experience or professional background.


Assessments of indicator reproducibility are needed to improve IRR. Using simple indicators is recommended.