Marco Carone, an assistant professor of biostatistics and the Norman Breslow Endowed Faculty Fellow at the University of Washington School of Public Health, recently received a $2.7 million research grant from the National Institutes of Health to develop novel statistical tools to more effectively describe the health effects, both intended and unintended, of common medical therapies using data from electronic health records (EHRs).
Though clinical trials remain the gold standard for establishing evidence of health effects, EHR data are playing an increasingly important role in complementing the information gathered in trials. Once dismissed for their poor suitability for research, EHR data have been more readily embraced by researchers and regulatory agencies recently, according to Carone. This is in part because the modernization and standardization of health information systems allow EHRs to be linked, thereby creating larger networks of data from which the effectiveness and safety of existing and emerging treatments can be learned.
"EHRs represent a largely untapped treasure trove of information on health effects that are otherwise very difficult – and sometimes impossible – to capture," Carone says. "In view of the sheer number of individuals included in EHR datasets, they are ideal, for example, to study rare but possibly important adverse health events that may be missed in clinical trials."
There are several challenges with the use of EHR data. As with other sources of observational data, the treatment a patient is observed to have received is often associated with patient characteristics such as acute indications, sex, age, socioeconomic status and health history, and these factors must be carefully accounted for to assess treatment effects. Additionally, data are often missing in EHRs, and the records of certain types of patients may be more likely to have missing data than those of others. Careful adjustment for such phenomena is needed to produce reliable inferences, Carone says. The field of machine learning provides powerful and data-driven approaches to do just this. However, it can be difficult to accurately quantify the uncertainty of findings when using machine learning tools.
In this NIH-funded project, Carone and a multidisciplinary team with expertise in biostatistics, health informatics and pharmaco-epidemiology will tackle this challenge by developing novel techniques for leveraging machine learning tools to make robust, well-calibrated inference on treatment effects. They will also develop efficient online learning techniques for use with data collected on an ongoing basis (e.g., weekly reporting of safety events). Finally, the team will develop statistical methods for using prior information (e.g., demographic, epidemiologic or pharmacodynamic knowledge) to make inference in the context of rare outcomes, a notoriously difficult statistical problem at the heart of pharmacological safety surveillance.
Their work will build upon recent tools from statistics, causal inference and computer science. In addition to generating better tools for inference on health effects using EHR data, Carone hopes the team's work will "help draw further attention to the immense potential of EHR data and thus to stimulate an even greater push for standardization and stringent quality control in EHRs."