Phenotyping in Clinical Text with Unsupervised Numerical Reasoning for Patient Stratification
Profiling a patient using the phenotypes from clinical text has various applications in the healthcare domain, including patient stratification for hard-to-diagnose diseases. One of the key challenges in phenotyping is numerical reasoning which involves determining a phenotype based on numerical measurements. This is an open problem, and current state-of-the-art generic phenotyping methods perform poorly. Our study addresses this issue by presenting a novel unsupervised methodology that substantially outperforms the generic phenotyping methods. Our methodology is helpful for our research community as it requires no supervision, saving a lot of cost and time, as well as it can be generalized to other biomedical tasks requiring numerical reasoning. We demonstrate the impact of the work by building patient stratification systems for several diseases by using these phenotypes which can improve accuracy and substantially lessen the manual effort and time involved in screening of patients. In addition, the work provides a new solution to improve clinical data quality by imputing missing values into structured data.