Enough Health Care Data for an Army: The Million Veteran Program
Researchers have enlisted U.S. veterans to help customize medicine.
Competing With Data & Analytics
The goal of the Million Veteran Program is to understand how genes affect our health. As the project steams ahead, it is reducing the cost of research, shortening the time it takes to generate research results, and building a data infrastructure to allow for new kinds of analysis that will eventually lead to individual gene-based therapies for all veterans.
Dr. J. Michael Gaziano and Dr. Saiju Pyarajan, two of the program’s lead scientists, explained the protocols and infrastructure of the project in a conversation with Sam Ransbotham, associate professor of information systems at the Carroll School of Management at Boston College and guest editor for the Data and Analytics Big Idea Initiative for the MIT Sloan Management Review. Managing complex security and risk is perhaps their greatest challenge, and they are addressing it on several fronts: All personal data collected for the study is stored behind a firewall in VA data centers at secure VA facilities; different data domains are separated using a system that prevents cross-referencing; and each participant’s data is secured both in motion and at rest, making it nearly impossible to identify an individual’s personal data across data domains.
Gaziano, national co-principal investigator for the project, described what the research team wants to accomplish. “Our hope is to set in motion a process or a project that will allow for the exploration of very complex questions with very sophisticated data-curation tools to begin to unlock the mysteries of the health care universe.”
Would you start us off by giving us some context for the MVP initiative and explaining what it is?
I can give you a 10,000-foot view of what we’re trying to do and then discuss that in the context of what others in the same space are trying to do.
Several years ago, a number of us decided that we wanted to analyze a large cohort to better understand how genes and environment affect disease.