On Institutional vs Central Repositories and Open Access

SUMMARY: The target content of the global Open Access (OA) movement is the 2.5 million articles published yearly in the planet’s 25,000 peer-reviewed journals. The natural and optimal locus to deposit these articles to make them OA is the author’s institutional repository. That way deposit mandates from both institutions and research funders collaborate and converge, covering all research output. Unmandated central repositories are no more successful in getting themselves filled with their target content than unmandated institutional repositories. The critical variable is the mandate, not the repository’s centrality or size. The denominator — the total target content relative to which we are trying to calculate, for a given repository, what proportion of it is being deposited — is far bigger for a central disciplinary repository than for an institutional repository.

(1) Repository size and “infrastructure” do not generate content.
(2) Empty repositories are useless.
(3) The only way to fill them is to mandate deposit.
(4) Not all or most research is funded.
(5) But all research originates from institutions.
(6) Institutions’ interests are served by hosting and managing their own research assets.
(7) Hence both institutional and funder mandates should converge on institutional deposit.
(8) Any central collections can then be harvested from the global distributed network of institutional repositories.

Critique of: Romary, L & Armbruster, C. (2009) Beyond Institutional Repositories.

(The following are excerpts from the critique of Romary & Armbruster (R&A).)

R&A: “The argument proceeds as follows: firstly, while institutional open access mandates have brought some content into open access, the important mandates are those of the funders”

This “argument” is demonstrably incorrect.

Not all, or even most, research is funded, whereas all research originates from institutions. Hence institutional mandates cover all research, whereas funder mandates cover only funded research.

The NIH, RCUK and ERC funder mandates were indeed important because they set an example for other funders to follow (and many are indeed following); but that still only covers funded research. Funder mandates do not scale up to cover all research.

The Harvard, Stanford and MIT institutional mandates were hence far more important, because they set an example for other institutions to follow (and many are indeed following); and this does cover all research output, because institutions are the universal providers of all research output, whether funded or unfunded, across all disciplines.

R&A: “[Funder mandates] are best supported by a single infrastructure and large repositories, which incidentally enhances the value of the collection (while a transfer to institutional repositories would diminish the value).”

This is again profoundly incorrect. The only “value enhancement” that empty collections need is their missing content. (Nor are we talking about “transfer” yet, since the target contents are not being deposited. We are talking about mandating deposit.)

Funder mandates can be fulfilled just as readily by depositing in institutional repositories or central ones. Repository size and locus of deposit are completely irrelevant. All OAI-compliant repositories are interoperable. The OAI-PMH allows central harvesting from distributed repositories. In addition, transfer protocols like SWORD allow direct, automatic repository-to-repository transfer of contents.
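As a rough illustration of the central harvesting that OAI-PMH makes possible, the sketch below builds a ListRecords request and reads the record identifiers and the resumption token out of a response. The repository URL, the record identifiers and the sample response are invented for illustration; only the `verb`, `metadataPrefix` and `resumptionToken` arguments and the XML namespace are taken from the OAI-PMH 2.0 protocol. It is a minimal sketch of one harvesting step, not a complete harvester.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by OAI-PMH 2.0.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for one repository.

    When a resumption token is supplied it is sent on its own,
    because the protocol treats resumptionToken as an exclusive argument.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response from a hypothetical institutional repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:1001</identifier></header>
    </record>
    <record>
      <header><identifier>oai:repository.example.edu:1002</identifier></header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://repository.example.edu/oai")
record_ids, next_token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("https://repository.example.edu/oai",
                            resumption_token=next_token)
```

A central collection would repeat this step for each institutional repository it covers, following the resumption token until none is returned. The locus of deposit stays local while the collection is assembled centrally.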

Hence there is no functional advantage whatsoever to direct central deposit, since central harvesting from institutional repositories achieves exactly the same functional result. Instead, direct central deposit mandates have the great disadvantage that they compete with institutional mandates instead of facilitating them.

Both the natural and the optimal locus of deposit is the institutional repository, for both institutions and funders. That way funder mandates and institutional mandates collaborate and converge, covering all research output.

Summary:
(1) Repository size and “infrastructure” do not generate content.
(2) Empty repositories are useless.
(3) The only way to fill them is to mandate deposit.
(4) Not all or most research is funded.
(5) But all research originates from institutions.
(6) Institutions’ interests are served by hosting and managing their own research assets.
(7) Hence both institutional and funder mandates should converge on institutional deposit.
(8) Any central collections can then be harvested from the global distributed network of institutional repositories.

And now an important correction of a widespread misinterpretation of the relative success of institutional and central repositories in capturing their target content:

The Denominator Fallacy. With one prominent exception — which has absolutely nothing to do with the fact that the exceptional repository in question, the physics Arxiv, happens to be central rather than institutional — unmandated central repositories (and there are many) are no more successful in getting themselves filled with their target content than unmandated institutional repositories. The critical causal variable is the mandate, not the repository’s centrality or size.

The way to arrive at a clear understanding of this fundamental fact is to note that the denominator — i.e., the total target content relative to which we are trying to reckon, for a given repository, what proportion of it is being deposited — is far bigger for a central disciplinary repository than for an institutional repository.

For an institutional repository, its denominator is the total number of refereed journal articles, across all disciplines, produced by that institution annually.

For a central disciplinary repository, its denominator is the total number of refereed journal articles, across all institutions worldwide, published in that discipline annually. (For a national repository, like HAL, its denominator is the total research output of all the nation’s institutions, across all disciplines.)

So it is no wonder that central repositories are “larger” than institutional ones: Their total target content is much larger. But this difference in absolute size is not only irrelevant but deeply misleading. For the proportion of their total annual target content that unmandated central repositories are actually capturing is every bit as minuscule as the proportion that unmandated institutional repositories are capturing. And although the total size of a mandated institutional repository remains much smaller than that of an unmandated central repository, the mandated institutional repositories are capturing (or nearly capturing) their total target output, whereas the unmandated central repositories are far from capturing theirs.
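The arithmetic behind the Denominator Fallacy can be made concrete with a small sketch. All of the figures below are invented purely for illustration and are not measurements of any real repository; the class and field names are likewise hypothetical. The point is only that ranking repositories by absolute deposits and ranking them by the proportion of target content captured can give opposite answers.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    annual_deposits: int       # numerator: articles actually deposited per year
    annual_target_output: int  # denominator: articles the repository is meant to capture per year

    def capture_rate(self) -> float:
        """Proportion of the annual target content that is deposited."""
        return self.annual_deposits / self.annual_target_output


# Hypothetical figures, chosen only to illustrate the argument.
repositories = [
    Repository("unmandated central", 15_000, 100_000),
    Repository("unmandated institutional", 300, 2_000),
    Repository("mandated institutional", 1_800, 2_000),
]

# Ranking by absolute size favours the central repository...
largest = max(repositories, key=lambda r: r.annual_deposits)

# ...but ranking by capture rate favours the mandated one.
most_successful = max(repositories, key=lambda r: r.capture_rate())

for repo in repositories:
    print(f"{repo.name}: {repo.annual_deposits} deposits, "
          f"{repo.capture_rate():.0%} of target content")
```

With these made-up numbers the central repository holds fifty times more articles than the unmandated institutional one, yet both capture the same small fraction of their respective targets. Only the mandated repository comes close to capturing all of its target content.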

Arxiv is a special case not because it is a central repository but because the physicists who began depositing in Arxiv back in 1991, with no need whatsoever of a mandate to impel them to do so, had already long been doing much the same thing on paper (at the CERN and SLAC paper depositories). They did so centrally out of necessity: in the paper medium there is no way to send one’s paper to “everyone,” nor to get everyone to access or “harvest” each new paper from each author’s own institutional depository (if there had been such a thing).

All of that is over now. And if physicists had made the transition from paper preprint deposit to online preprint deposit directly today rather than in 1991, in the OAI-PMH era of repository interoperability and harvesting, there is no doubt that they would have deposited in their own respective institutional repositories, and CERN, SLAC and Arxiv would simply have harvested the metadata automatically from there (with the obvious computational alerting mechanisms set up for harvesting, export and import).

But that longstanding cultural practice of preprint deposit among physicists would be just as anomalous if physicists had begun it all by depositing institutionally rather than centrally, for no other (unmandated) central repository (or discipline) is capturing the high proportion of its annual total target content that the physics Arxiv is capturing (in certain preprint-sharing subfields of physics) and has been capturing ever since 1991, in the absence of any deposit mandate.

So the centrality, size and success of Arxiv is completely irrelevant to the problem of how to fill all other unmandated repositories, whether central or institutional, large or small, in any other discipline, and regardless of the “robustness” of the repository’s “infrastructure.” Only the mandated repositories are successfully capturing their target content, and there is no longer any need to deposit directly in central repositories: In the OAI-compliant OA era, central “repositories” need only be collections, harvested from the distributed local repositories of the universal research providers: the institutions.

R&A: “Secondly, we compare and contrast a system based on central research publication repositories with the notion of a network of institutional repositories to illustrate that across central dimensions of any repository solution the institutional model is more cumbersome and less likely to achieve a high level of service.”

The assumption is made here — with absolutely no supporting evidence, and with all existing evidence (other than the single special case of Arxiv, discussed above) flatly contradicting it — that researchers are more likely to deposit their refereed journal articles in big central repositories than in their own institutional repositories.

All evidence is that researchers are equally unlikely to deposit in either kind of repository unless deposit is mandated, in which case it makes no difference whether the repository is institutional or central — except that if both funders and institutions mandate institutional deposit then their mandates converge and reinforce one another, whereas if funders mandate central deposit and institutions mandate institutional deposit then their mandates diverge and compete with one another.

(And of course the natural direction for harvesting is from local to central, not vice versa: We all deposit on our institutional websites and Google harvests from there; it would be absurd for everyone to deposit in Google and then back-harvest to their own institutional website. The same is true for any central OAI harvesting service.)

R&A: “Next, three key functions of publication repositories are reconsidered, namely a) the fast and wide dissemination of results; b) the preservation of the record; and c) digital curation for dissemination and preservation.”

Again, these functions in no way distinguish central and institutional repositories (both can and do provide them) and have no bearing whatsoever on the real problem, which is the absence of the target content — for which the remedy is to mandate deposit. Otherwise there’s nothing to curate, preserve and disseminate.

R&A: “Fourth, repositories and their ecologies are explored with the overriding aim of enhancing content and enhancing usage.”

You cannot enhance content if the content is not there. And you cannot enhance the usage of absent content. Hence it is not enhancements that are needed but deposit mandates to generate the nonexistent content for which all these enhancements are being contemplated…

R&A: “Fifth, a target scheme is sketched, including some examples.”

The target scheme includes a suggestion that publishers should do the depositing, of their own proprietary version of the refereed article. This is perhaps the worst suggestion of all. Just when institutions are at last realizing that after decades of outsourcing it to publishers, they can now host and manage their own research output by mandating that their researchers deposit their final refereed drafts in their own institutional repositories, Romary & Armbruster instead suggest “consolidated” central “publication repositories” in which publishers do the depositing. (The question to contemplate is: If it requires a mandate to induce researchers to deposit, what will it require to induce publishers to deposit — other than paying them to do it? And if so, who will pay how much for what, out of what money — and why?)

Most of the rest of R&A’s suggestions are superfluous, and fail completely to address the real problem: the absence of OA’s target content. You can’t go “beyond” institutional repositories until you first succeed in filling them.

Stevan Harnad
