Conceptualisation and Annotation of Drug Nonadherence Information for Knowledge Extraction from Patient-Generated Texts

Anja Belz¹, Richard Hoile², Elizabeth Ford³, Azam Mullick¹
¹University of Brighton, ²Sussex Partnership NHS Foundation Trust, ³Brighton and Sussex Medical School


Approaches to knowledge extraction (KE) in the health domain often start by annotating text to indicate the knowledge to be extracted, and then use the annotated text to train systems to perform the KE. This works well where the items to be annotated are named entities or other contiguous noun phrases (e.g. drugs and some drug effects), but becomes increasingly difficult when items tend to be expressed across multiple, possibly noncontiguous, syntactic constituents (e.g. most descriptions of drug effects in user-generated text). It is also not always clear how annotations map to actionable insights, or how they scale up to, or can form part of, more complex KE tasks. This paper reports our efforts to develop an approach to extracting knowledge about drug nonadherence from health forums. These efforts led us to conclude that development cannot proceed in separate steps: all aspects, from conceptualisation through annotation scheme development, annotation and KE system training to knowledge graph instantiation, are interdependent and need to be co-developed. Our aim in this paper is two-fold. First, we describe a generally applicable framework for developing a KE approach. Second, we present a specific KE approach, developed with the framework, for the task of gathering information about antidepressant drug nonadherence, and report its conceptualisation, annotation scheme and annotated corpus, together with an analysis of the annotated texts.