A new study shows that the microbial communities we carry in and on our bodies, known as the human microbiome, have the potential to uniquely identify individuals, much like a fingerprint. Harvard T.H. Chan School of Public Health researchers and colleagues demonstrated that personal microbiomes contain enough distinguishing features to identify an individual over time from among a research study population of hundreds of people. The study, the first to rigorously show that identifying people from microbiome data is feasible, suggests that we have surprisingly unique microbial inhabitants, but it could also raise privacy concerns for subjects enrolled in human microbiome research projects.
The study appears online May 11, 2015, in the journal PNAS.
“Linking a human DNA sample to a database of human DNA ‘fingerprints’ is the basis for forensic genetics, which is now a decades-old field. We’ve shown that the same sort of linking is possible using DNA sequences from microbes inhabiting the human body—no human DNA required. This opens the door to connecting human microbiome samples between databases, which has the potential to expose sensitive subject information—for example, a sexually-transmitted infection, detectable from the microbiome sample itself,” said lead author Eric Franzosa, research fellow in the Department of Biostatistics at Harvard Chan.
Franzosa and colleagues used publicly available microbiome data produced through the Human Microbiome Project (HMP), which surveyed microbes in the stool, saliva, skin, and other body sites from up to 242 individuals over a months-long period. The authors adapted a classical computer science algorithm to combine stable and distinguishing sequence features from individuals’ initial microbiome samples into individual-specific “codes.” They then compared the codes to microbiome samples collected from the same individuals at follow-up visits and to samples from independent groups of individuals.
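To make the idea of an individual-specific code concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: this article does not name the algorithm, so the sketch assumes a simple greedy selection that keeps adding features from one person's initial sample until no other person in the population carries all of them. The function names `build_code` and `matches`, the feature labels, and the toy data are all hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set


def build_code(target: str, samples: Dict[str, Set[str]]) -> Optional[FrozenSet[str]]:
    """Greedily choose features of `target` that no other individual has in full.

    Assumption: each sample is reduced to a set of present features.
    Returns None when the target cannot be told apart from someone else.
    """
    own = samples[target]
    # Individuals who still carry every feature chosen so far.
    confusable = {name for name in samples if name != target}
    code: Set[str] = set()
    while confusable:
        candidates = sorted(own - code)  # sorted for deterministic tie-breaking
        if not candidates:
            return None
        # Prefer the feature missing from the most still-confusable individuals.
        best = max(candidates, key=lambda f: sum(f not in samples[n] for n in confusable))
        excluded = {n for n in confusable if best not in samples[n]}
        if not excluded:
            return None
        code.add(best)
        confusable -= excluded
    return frozenset(code)


def matches(code: FrozenSet[str], sample: Set[str]) -> bool:
    """A sample matches a code when it contains every feature in the code."""
    return code <= sample


# Toy first-visit samples (hypothetical feature labels).
first_visit = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2"},
    "C": {"f2", "f3"},
    "D": {"f1", "f3", "f4"},
}
code_a = build_code("A", first_visit)
follow_up_a = {"f1", "f2", "f3", "f9"}  # same person, later visit, one new feature
```

In this toy population, the code built for individual A still matches A's follow-up sample and matches none of the other individuals, while no code can be built for B because every feature B carries is also present in A. Real microbiome features are noisy and change over time, which is why the stability of the chosen features matters as much as their distinctiveness.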
The results showed that the codes were unique among hundreds of individuals, and that a large fraction of individuals’ microbial “fingerprints” remained stable over a one-year sampling period. The codes constructed from gut samples were particularly stable, with more than 80% of individuals identifiable up to a year after the sampling period.
“Although the potential for any data privacy concerns from purely microbial DNA is very low, it’s important for researchers to know that such issues are theoretically possible,” said senior author Curtis Huttenhower, associate professor of computational biology and bioinformatics at Harvard Chan School. “Perhaps even more exciting are the implications of the study for microbial ecology, since it suggests our unique microbial residents are tuned to the environment of our body—our genetics, diet, and developmental history—in such a way that they stick with us and help to fend off less-friendly microbial invaders over time.”
This work was funded in part by NIH NIAID contract HHSN272200900018C (DG), NIH NHGRI U54HG004969 (to Broad Institute, DG), NIH NIGMS P50GM098911 (BJB), NIH NIAID R01 AI101018 (KPL), Danone Research PLF-5972-GD (to Wendy Garrett, CH), NIH NHGRI R01HG005969 (CH), ARO W911NF 11-1-0473 (CH), and NSF CAREER DBI-1053486 (CH).
“Identifying personal microbiomes using metagenomic codes,” Eric A. Franzosa, Katherine Huang, James F. Meadow, Dirk Gevers, Katherine P. Lemon, Brendan J. M. Bohannan, Curtis Huttenhower, PNAS, online May 11, 2015, doi:10.1073/pnas.1423854112