Analysis of data on 53,581 individuals from diverse backgrounds offers insights into population health
Just over a year after the first confirmed case of COVID-19 in the United States was identified in the Seattle area, the novel coronavirus continues to dominate news headlines, and scientific jargon like ‘variants’ and ‘genome sequencing’ has become part of everyday conversations.
Now, a flagship paper co-authored by scores of researchers from across the University of Washington – including scientists from the UW Schools of Public Health and Medicine and the Brotman Baty Institute for Precision Medicine – shows just how valuable whole-genome sequencing data, and the variants they reveal, are to the diagnosis, treatment and prevention of heart, lung, blood and sleep disorders.
The paper, published online today in Nature, shares insights from an analysis of sequencing data representing the entire genomes of more than 53,000 individuals from diverse backgrounds.
The analysis is part of a multi-institution effort called the Trans-Omics for Precision Medicine (TOPMed) program, funded by the National Heart, Lung and Blood Institute, part of the National Institutes of Health. The national Data Coordinating Center for TOPMed, one of the largest whole-genome sequencing projects in the world, is housed in the UW School of Public Health’s Department of Biostatistics and coordinated by its Genetic Analysis Center.
“TOPMed is generating whole genome sequence data on a large and diverse set of study participants with detailed, and in some cases decades-long, records of health and disease,” said study co-author Sarah Nelson, a project manager for the Data Coordinating Center at the UW. “This allows researchers to discover genetic variation that impacts disease risk, progression and/or treatment options, which can help realize precision medicine and prevention approaches.”
The volume and diversity of TOPMed data have also enabled the detection of a huge number of genetic variations not reported in prior research efforts, according to Nelson, also a research scientist in the biostatistics department. In this early analysis, researchers identified more than 400 million genetic variations, meaning differences in DNA sequence between individuals or populations. Among these are extremely rare variants occurring in less than 1% of the population. Roughly half of these 400 million variants were seen in just one individual. These “singletons” were more likely to disrupt genes known to be either associated with human disease or essential to basic cell function.
TOPMed data have led to various discoveries related to a range of health conditions, including atherosclerosis, sickle cell disease, chronic obstructive pulmonary disease, blood pressure and asthma. A recent study, led by researchers in Colorado in collaboration with UW biostatisticians and others, defined airway responses to common coronavirus infections in children and revealed reasons why some people are more prone to infection than others.
Over the last six years, the TOPMed program has grown to include the genomic data of more than 180,000 participants from over 80 studies. The Data Coordinating Center provides logistical support to more than 1,300 investigators affiliated with the TOPMed program.
“We have pioneered innovative ways to share data, enabling investigators from different studies to efficiently pool their data and improve power to make new discoveries,” said study co-author Kenneth Rice, who co-led the Data Coordinating Center with corresponding author Cathy Laurie until her retirement in 2020. Susanne May, a professor of biostatistics in the School of Public Health, serves as the center’s new director.
The center’s researchers have developed new statistical methods and software, providing investigators with novel ways to analyze the huge datasets. “Our data-cleaning and quality-control work has been invaluable,” said Rice, also a professor of biostatistics.
“With such a large and diverse set of studies, it is inevitable that mistakes happen, mixing up data samples and labels, for example,” he said. “Through our rigorous evaluation of trait and genetic data across studies – which no other part of TOPMed can do – we have been able to catch and fix problems that would otherwise have invalidated analyses.”
Scientists in the center have also identified and addressed issues at the intersection of genetics research and ethics. They continue to draft guidelines on the use of race, ancestry and genetics in TOPMed and lead TOPMed’s Ethical, Legal, and Social Issues Committee.