Researchers have found more than half of the public datasets provided with scientific papers are incomplete, which prevents reproducibility tests and follow-up studies.
However, slight improvements to research practices could make a big difference.
Lead researcher Dr Dominique Roche from The Australian National University (ANU) said many peer-reviewed biological journals now require authors to publicly archive their data when a paper is published.
“Unfortunately, our study suggests that many public datasets may be unusable,” Dr Roche said.
Making research data available improves the transparency and reproducibility of research results and avoids unnecessary duplication of data collection.
A survey of 100 papers published in leading ecology and evolution journals found that more than 50 per cent of the associated datasets were incomplete, either because data were missing or because essential information needed to interpret the data was not provided.
Dr Roche said that making the data public is extremely useful, but that the process is often compromised by simple errors made by researchers.
“Many scientists, including myself, lack proper training in public data archiving and open science practices. These are new practices for most researchers,” he said.
“Biologists often deal with large and complex datasets that require good organisational skills to present in ways that others can use them. The archived datasets can be just as important as the published paper.
“Fortunately, many of the problems we encountered in our study can be fixed relatively quickly and easily.”
The study, published in PLOS Biology, makes a number of suggestions. These include providing basic but complete data descriptors, using standard file formats such as comma-separated values (CSV) rather than PDFs or Excel files, and archiving datasets in an established, searchable online database instead of as an appendix to the research paper.
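As a rough illustration of the first two suggestions, the Python sketch below saves a small table as CSV, builds a matching file of data descriptors, and flags any column that has no description. It is not taken from the study: the column names, values, and function names are all hypothetical, and the descriptor layout is only one plausible way to do it.

```python
import csv
import io

# Hypothetical example records; column names and values are illustrative only.
ROWS = [
    {"individual_id": "A01", "body_mass_g": "12.4", "capture_date": "2015-03-02"},
    {"individual_id": "A02", "body_mass_g": "", "capture_date": "2015-03-03"},
]

# Data descriptors: one entry per column, stating its meaning and units.
DESCRIPTORS = {
    "individual_id": "Unique identifier for each animal (text)",
    "body_mass_g": "Body mass in grams; an empty cell means not measured",
    "capture_date": "Date of capture, ISO 8601 (YYYY-MM-DD)",
}


def undocumented_columns(rows, descriptors):
    """Return column names that appear in the data but have no descriptor."""
    seen = []
    for row in rows:
        for name in row:
            if name not in seen:
                seen.append(name)
    return [name for name in seen if name not in descriptors]


def to_csv(rows, fieldnames):
    """Serialise the records as CSV text with a header row."""
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def descriptors_to_csv(descriptors):
    """Serialise the descriptors as a two-column CSV data dictionary."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["column", "description"])
    for name, description in descriptors.items():
        writer.writerow([name, description])
    return out.getvalue()


if __name__ == "__main__":
    missing = undocumented_columns(ROWS, DESCRIPTORS)
    if missing:
        raise SystemExit(f"Columns without a descriptor: {missing}")
    # The two text blocks would be saved as separate files and archived together.
    print(to_csv(ROWS, list(DESCRIPTORS)))
    print(descriptors_to_csv(DESCRIPTORS))
```

Saving the two outputs as plain-text files side by side means the data can be opened without proprietary software, and anyone reusing it can look up what each column means.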
Co-author Professor Loeske Kruuk from the ANU Research School of Biology said the paper recommended rewarding researchers who work transparently and collaboratively.
“Journals and databases don’t have the resources to check whether archived datasets are adequate,” she said.
“The quality of the archived datasets relies on researchers’ goodwill.”