by Jonathan Brown

Purpose/Problem

This service evaluation project (8622) investigated whether natural language processing (NLP) algorithms could interpret free-text reports within electronic patient records (EPR) to characterise a countywide inflammatory bowel disease (IBD) cohort.

Patients with IBD are likely to undergo multiple endoscopic procedures over their lifetime, and these generate histopathological reports. Managing these patients requires clinicians to build a phenotypic overview from numerous episodes and diverse sources, a process that can be time-consuming, incomplete and subjective.

Method

118,108 lower GI endoscopic procedure reports (2002-2017) and 62,051 lower GI histology reports (2008-2017) from GRH were pseudo-anonymised with hexadecimal GUIDs and imported into an SQL database. Text processing was carried out in Python pandas DataFrames and involved conversion to lower case, keyword spelling correction, sentence tokenisation and regular-expression identification of diagnoses together with supporting or negating text.
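The processing steps above can be sketched as a minimal pandas pipeline, assuming Python 3.9+ and using only the standard `re` and `uuid` modules. The spelling-correction map, diagnosis patterns, negation cues and column names (`patient_id`, `report_text`) are hypothetical placeholders, since the project's actual rules are not published; this is an illustration of the general technique, not the author's implementation.

```python
import re
import uuid

import pandas as pd

# HYPOTHETICAL keyword spelling fixes; the project's real list is not published.
SPELLING_FIXES = {
    r"\bcrohns\b": "crohn's",
    r"\bchrons\b": "crohn's",
    r"\bcolitus\b": "colitis",
}

# HYPOTHETICAL diagnosis patterns, applied to lower-cased text.
DIAGNOSIS_PATTERNS = {
    "crohns": re.compile(r"\bcrohn's\b"),
    "colitis": re.compile(r"\bcolitis\b"),
    "ibd_unclassified": re.compile(r"\bibd[- ]?u\b|\bibd unclassified\b"),
}

# HYPOTHETICAL negation cues; a sentence containing any of them is ignored.
NEGATION = re.compile(r"\b(?:no|not|without|negative for)\b")

# Naive sentence tokeniser: split after ., ! or ? followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def pseudonymise(df: pd.DataFrame, id_col: str) -> tuple[pd.DataFrame, dict]:
    """Replace each distinct identifier with a random hexadecimal GUID."""
    mapping = {pid: uuid.uuid4().hex for pid in df[id_col].unique()}
    out = df.copy()
    out[id_col] = out[id_col].map(mapping)
    return out, mapping


def classify_report(text: str) -> set[str]:
    """Return the diagnosis labels that have at least one non-negated mention."""
    text = text.lower()  # step 1: conversion to lower case
    for pattern, fix in SPELLING_FIXES.items():  # step 2: keyword spelling correction
        text = re.sub(pattern, fix, text)
    supported = set()
    for sentence in SENTENCE_SPLIT.split(text):  # step 3: sentence tokenisation
        if NEGATION.search(sentence):  # step 4a: skip sentences with negating text
            continue
        for label, pattern in DIAGNOSIS_PATTERNS.items():  # step 4b: match diagnoses
            if pattern.search(sentence):
                supported.add(label)
    return supported


# Invented example reports, for illustration only.
reports = pd.DataFrame({
    "patient_id": ["A1", "A1", "B2"],
    "report_text": [
        "Active Crohns disease in the terminal ileum. Biopsies taken.",
        "No evidence of colitis. Normal mucosa.",
        "Findings consistent with ulcerative colitis.",
    ],
})

pseudo, _ = pseudonymise(reports, "patient_id")
pseudo["diagnoses"] = pseudo["report_text"].apply(classify_report)
print(pseudo[["patient_id", "diagnoses"]])
```

Discarding any sentence that contains a negation cue keeps the sketch short but is crude: a sentence such as "No stricture, but active colitis in the sigmoid." would be wrongly ignored. A production rule set would need to scope each negation to the phrase it modifies.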

Results

A 64-bit desktop computer took 11 minutes to identify 2,119 patients with colitis, 1,166 with Crohn's disease and 231 with IBD unclassified. The algorithms were 100% sensitive and specific at distinguishing index cases from follow-up procedures, 100% sensitive at identifying IBD in linked histology reports and 98% specific at rejecting diagnoses other than IBD.
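The sensitivity and specificity figures follow the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal sketch of computing them from paired algorithm and reference labels is shown below; the labels are invented, since the poster does not report the size of the validation sample or how the reference standard was established.

```python
def confusion_counts(predicted: list[bool], reference: list[bool]) -> tuple[int, int, int, int]:
    """Count true positives, false positives, false negatives and true negatives."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    fp = sum(p and not r for p, r in pairs)
    fn = sum(not p and r for p, r in pairs)
    tn = sum(not p and not r for p, r in pairs)
    return tp, fp, fn, tn


def sensitivity(tp: int, fn: int) -> float:
    """Proportion of reference-positive cases that the algorithm detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of reference-negative cases that the algorithm rejected."""
    return tn / (tn + fp)


# Invented labels: algorithm output vs. a hypothetical reference standard.
predicted = [True, True, False, False, True]
reference = [True, True, False, False, False]
tp, fp, fn, tn = confusion_counts(predicted, reference)
print(f"sensitivity={sensitivity(tp, fn):.2f}, specificity={specificity(tn, fp):.2f}")
# prints "sensitivity=1.00, specificity=0.67"
```

Read this way, a specificity of 98% means that 2% of the reviewed non-IBD diagnoses were wrongly labelled as IBD.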

Conclusion

NLP offers a powerful tool for the automated characterisation of IBD cohorts from the text of semi-structured endoscopy and histology reports. The potential for scheduling surveillance and for linkage to other systems, such as primary care prescribing, is clear. In the context of an EPR, the technology could be applicable to many other chronic disease cohorts.