COS goes FOSS
The sorry state of scientific publishing and how we could move to an open and resilient infrastructure
29 January 2026
Gáspár Jékely
Centre for Organismal Studies, Heidelberg University
The crisis of publishing: symptoms
- reproducibility crisis
- only a small fraction of primary data available
- even smaller fraction of code
- open access, where it exists, is very expensive and maintains the profits of legacy publishers
- antiquated, dysfunctional system that rewards prestige/hype over quality/integrity
- scholarly workflows rely on closed-source software not built for science (MS Office, Adobe, Prism etc.)
- sharing, integration, automation and collaboration are difficult (who can use Git?)
- final product of years of research: a pdf file (1990s tech) behind a paywall
- data, code and text not searchable, reusable, discoverable
Most source data collected by scientists are not available
Code is very often not shared or not shared stably
- a study assessed the effectiveness of Science's code-sharing policy
- random sample of 204 Science papers
- authors provided the underlying artifacts (data/code) for 44%
- the findings could be reproduced for only 26%
“The data files remains our property and are not deposited for free access.”
“When you approach a PI for the source codes and raw data, you better explain who you are, whom you work for, why you need the data and what you are going to do with it.”
“I have to say that this is a very unusual request without any explanation! Please ask your supervisor to send me an email with a detailed, and I mean detailed, explanation.”
“We do not typically share our internal data or code with people outside our collaboration.”
Flipped protein structures due to a buggy program
The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right).
- a buggy, unpublished in-house program swapped two data columns, inverting the handedness of the electron-density map
- program was inherited from another lab
- mistake repeated in several papers
- led to five retractions (three in Science)
Gene name errors are widespread in the scientific literature
Most scientists use software developed for accounting
- Excel silently converts gene symbols such as SEPT1 or MARCH1 into dates
- to escape this, the symbol MARCH1 has now become MARCHF1
- SEPT1 has become SEPTIN1, and so on (a defensive sketch follows below)
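A minimal defensive sketch (hypothetical data; pandas shown as one common choice): reading gene tables with every column typed as a plain string prevents any spreadsheet-style reinterpretation of symbols as dates.

```python
import pandas as pd
from io import StringIO

# Hypothetical input that a spreadsheet would mangle: opened in Excel,
# "SEPT1" silently becomes the date "1-Sep".
csv = StringIO("gene,expression\nSEPT1,4.2\nMARCH1,1.7\n")

# dtype=str keeps every value a plain string, so no symbol is ever
# reinterpreted as a date or number.
genes = pd.read_csv(csv, dtype=str)
print(genes)
```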
Reporting and citation bias
- The cumulative impact of reporting and citation biases on the evidence base for antidepressants
- 50% of randomized controlled trials have never been published
- trials with statistically significant findings are more likely to be published
- citation bias -> studies with positive results receive more citations than negative studies
Majority of high-impact cancer studies fail to replicate
- The Reproducibility Project: Cancer Biology (RP:CB)
- replications could not be completed for 30 of 53 papers published by Science, Nature, and Cell from 2010 to 2012
- credibility of preclinical cancer biology?
- need for authors to share more details of their experiments
- vague protocols and uncooperative authors
- one-third of contacted authors declined or did not respond
An epidemic of retractions
- steep increase in retractions
- monitoring retractions: http://retractionwatch.com
- the majority of all retractions are due to misconduct
Perverse incentives, publish or perish
(Björn Brembs)
- chasing ‘stories’ and IF instead of integrity, hypothesis-testing, rigour, openness
- under the spell of glamour journals
- “If I get this result, this will be a Nature paper!”
- reporting bias (only positive results are reported)
- low statistical power (at typical power levels, a result at p = 0.05 replicates only ~50% of the time; see the sketch after this list)
- in worst cases data are fabricated
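A toy simulation of that 50% figure (all numbers hypothetical; scipy's standard two-sample t-test used for illustration): when studies run at roughly 50% power, a just-significant finding replicates in an identical follow-up study only about half of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n, effect, sims = 20, 0.65, 20_000   # n per group; d = 0.65 gives ~50% power
hits = replications = 0
for _ in range(sims):
    a, b = rng.normal(effect, 1, n), rng.normal(0, 1, n)
    if stats.ttest_ind(a, b).pvalue < 0.05:            # original study "works"
        hits += 1
        a2, b2 = rng.normal(effect, 1, n), rng.normal(0, 1, n)
        replications += stats.ttest_ind(a2, b2).pvalue < 0.05  # exact repeat
print(f"replication rate among significant results: {replications / hits:.2f}")
```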
Impact factor - not a metric of quality
- IF = citations in a given year to the articles a journal published in the previous two years (the numerator), divided by the number of citable articles it published in those two years (the denominator); written out after this list
- calculated by Clarivate (formerly Thomson Reuters)
- originally created to help librarians, not as a measure of quality
- yet, emerged as a pervasive metric of quality
- in some cases not calculated but negotiated (the denominator, e.g. Curr Biol)
- removing editorials/News-and-Views articles from the denominator (so called “front-matter”) can dramatically alter the resulting IF
- not reproducible, not open (calculated from proprietary data)
- a composite of multiple, highly diverse article types
- comparison of journals not mathematically sound
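Written out (the standard two-year definition described in the first bullet above):

```latex
\mathrm{JIF}_{y} =
  \frac{\text{citations received in year } y \text{ by items published in years } y-1 \text{ and } y-2}
       {\text{citable items published in years } y-1 \text{ and } y-2}
```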
IF - statistically flawed
- highly skewed distributions
- distorted by outliers (see Nature)
- comparing journal IFs means comparing the means of two populations
- that is only valid if the distributions are normal!
- simple ranking by the mean is incorrect
- the median would be better, or a rank-based test (e.g. Kruskal–Wallis; see the sketch after this list)
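A quick illustration with synthetic, heavily skewed (log-normal) citation counts: the means are dragged by the tail, the medians are stable, and scipy's Kruskal–Wallis test compares the two journals without any normality assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-article citation counts; real citation
# distributions are similarly heavy-tailed.
journal_a = rng.lognormal(mean=1.0, sigma=1.2, size=2000)
journal_b = rng.lognormal(mean=1.1, sigma=1.2, size=2000)

print("means:  ", journal_a.mean(), journal_b.mean())   # outlier-driven
print("medians:", np.median(journal_a), np.median(journal_b))

# A rank-based test makes no assumption about the distribution:
h, p = stats.kruskal(journal_a, journal_b)
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.3g}")
```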
IF - strongly biased by outliers
- fitting an exponential function to the citation distribution instead of taking the raw mean
- a journal impact factor can then be calculated from the parameters of the fit (sketch after this list)
- Science JIF = 25.3 instead of the reported 34
- Nature JIF = 26.8 instead of the reported 37
- a few highly cited papers have a substantial effect on the mean, but much less on the fitted exponential
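A sketch of the idea (synthetic data, not the published analysis): fit an exponential to the citation histogram and read off the mean it implies; a handful of mega-cited outliers inflates the raw mean but barely moves the fit.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
# Hypothetical journal: mostly modest citation counts plus a few
# extremely cited outliers.
citations = np.concatenate([
    rng.exponential(scale=8.0, size=5000),
    rng.uniform(500, 2000, size=10),          # the outliers
])

def expo(x, a, b):
    # simple exponential model of the citation histogram
    return a * np.exp(-b * x)

counts, edges = np.histogram(citations, bins=200)
centres = (edges[:-1] + edges[1:]) / 2
(a, b), _ = curve_fit(expo, centres, counts, p0=(counts.max(), 0.1))

print("raw mean (JIF-style):", citations.mean())   # inflated by outliers
print("mean implied by fit :", 1 / b)              # robust to the outliers
```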
JIF does not correlate with quality metrics (e.g. statistical power)
- no association between statistical power and journal IF
But: JIF correlates with retractions
- ‘journal rank’ is a strong predictor of the rate of retractions
- (also of Excel errors :))
Current system is hugely wasteful
Robert Maxwell in 1985. Photograph: Terry O’Neill/Hulton/Getty
- worldwide sales > USD 19 billion
- dominated by five large publishing houses: Elsevier, Wiley-Blackwell, Taylor & Francis, Springer Nature and SAGE
- Elsevier has a profit margin around 40 % (higher than Microsoft, Google and Coca Cola)
- about USD 6 billion per year goes to profits = 2 CERNs/year
- APCs can be as high as $12,000
Kleptistan (Binjistan) - I
- an oligopoly of legacy publishers
- Elsevierstan
- Wileystan
- Taylorfrancistan
- Springerstan
- …
Kleptistan (Binjistan) - II
- workflow monopoly
- tools to cover the entire academic workflow (e.g. Elsevier)
- high risk of vendor lock-in
- totalizing, homogenising workflows, extractive of research communities
Kleptistan (Binjistan) - III
or
How to milk the same cow multiple times?
- scientists provide the content for free
- scientists peer review for free
- scientists buy back the over-priced product via APCs, subscriptions or ‘transformative’ deals
- the publisher (now a ‘data analytics company’) sells entire workflows to scientists
- the publisher tracks scientists on its platforms
- and sells the data to their employers (e.g. for quality assessment) or to third parties
Unacceptable practices of data tracking by publishers
‘Data gathering is an essential process, and most companies use it for their success.’
- tracking site visits via authentication systems
- detailed real-time data on the information behaviour of individuals and institutions
- page visits, accesses, clicks, downloads, etc.
- assembly of granular profiles of academic behaviour
- without user consent
- selling the data: RELX, the parent company of Elsevier, deploys Pure at universities around the world
- to provide ‘insights’ into the entire research cycle
- RELX now also sells data to ICE…
The problem is the system
- journal publishing system is fundamentally broken
- a legacy system that prevents science from meeting its true potential for society
- about 40,000 journals
- public trust problem
- science publishing must be built anew
- illusion of truth and finality
- artificial scarcity
- narrow formats
- incomplete information
- prestige and journal-rank fallacies
What would a better system look like?
- data, code and text are shared, indexed, archived and discoverable
- analyses and workflows are also shared
- reagents (e.g. plasmids), strains (e.g. mutants) shared
- open-source software
- maximise reproducibility
- text + data + code = publication
- publications openly accessible, not behind a paywall
- affordable publishing (not hijacked by corporate for-profit publishers)
- preprints = publication, followed by post-publication peer review
One example - the UniProt database
- a comprehensive resource for protein sequence and annotation data
- entries uniquely identified by a stable URL
- rich metadata that is both human-readable and machine-readable (see the fetch sketch after this list)
- shared vocabularies and ontologies
- interlinking with more than 150 different databases
- © 2002 – 2026 UniProt consortium
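A minimal fetch sketch (accession P02649, human ApoE, chosen arbitrarily; field names as the current REST API returns them, to the best of my reading of its documentation):

```python
import json
import urllib.request

# Every UniProt entry lives at a stable URL and is also served as
# machine-readable JSON.
url = "https://rest.uniprot.org/uniprotkb/P02649.json"
with urllib.request.urlopen(url) as response:
    entry = json.load(response)

print(entry["primaryAccession"])
print(entry["proteinDescription"]["recommendedName"]["fullName"]["value"])
```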
A history of Open Access — The Budapest Open Access Initiative
- https://www.budapestopenaccessinitiative.org/
February 14, 2002
Budapest, Hungary
- removing barriers to literature
- free and unrestricted online availability = open access
- the costs of providing OA to literature are far lower than the costs of traditional publishing (printed press)
- opportunity to save money and expand the scope of dissemination
- recommendations: self-archiving (I.) and a new generation of open-access journals (II.)
The launch of PLoS
Paywalls and the story of Aaron Swartz
- in 2011, the 24-year-old internet hacktivist Aaron Swartz was arrested at MIT
- he downloaded several million articles from an online archive (JSTOR)
- legal troubles
- Swartz committed suicide in 2013
- the internet was created so that scientists could communicate their research results with each other
- billions of videos of cats for free, research results behind paywalls
- (GPT-4o showed an 82% recognition rate for paywalled content)
The politicians weigh in - Plan S
- Plan S was initiated in 2018 as a political solution to the OA problem
- funders mandate immediate open access
- proposal to cap APC (article processing charge)
- publishers whine that this would hurt their profit
- scrap the cap
- let the ‘market’ solve it (it didn’t)
- ‘prestige’ journals can charge as much as they like (Nature-$12,690; Cell-$11,400)
OA has been hijacked by publishers
- Diamond: OA journal without an APC
- Green: not openly accessible from the publisher website but a free copy is accessible via a repository
- Gold: OA journal with APC (profits can remain high!)
- Hybrid: some papers OA others not (profits can remain high!)
- Bronze: free to read, no identifiable licence
- ‘transformative agreements’
Towards some solutions…
What should we do now?
What should scientists and institutions do?
for the realist:
Safest bet: buy RELX stocks
(RELX: parent company of Elsevier)
We have the solution, but not the balls to implement it
- under ‘closed’ models, institutions spend a lot of money on publishing
- transitioning those funds to community-led diamond OA could fully fund a global shift to OA
- strengthen scholarly infrastructure for code, data, interoperability etc.
- huge potential for cost savings (Schimmer et al. 2015)
- publishing in the hands of public institutions
- USD 6 billion/year is a lot of money for that
- would also solve the problem of predatory publishers
What should scientists and institutions do?
for the idealist:
Taking back control
- Public institutions (universities, libraries, funders etc.) should take back control of the digital scholarly infrastructure
- create conditions of open competition for the private sector (not an oligopoly of a few publishers)
- control data, text, code, citation metrics, scholarly workflows, databases, standards etc.
- cancel all subscriptions and use money to fund databases, libraries, publishing etc.
- support initiatives like OpenAIRE
- build community and the commons -> publishing as community and care
New approaches to research assessment
- the San Francisco Declaration on Research Assessment (DORA)
- eliminate the use of journal-based metrics (IF) in funding, appointment, and promotion decisions
- assess research on its own merits rather than based on the journal
- capitalize on the opportunities provided by online publication (e.g. relax limits on the number of words, figures, and references)
Chasing False Metrics — the Prestige Game
Harold Varmus
“We need to get away from false metrics and return to the task of looking at our colleagues’ work closely.”
- we believe the most important work is published in so-called ‘high-impact’ journals
- ceding judgments to journal editors
- we have to end the current situation in which the fate of researchers and their trainees depends on publishing in certain journals
eLife
- funded by HHMI, the Wellcome Trust, the Max Planck Society and the Knut and Alice Wallenberg Foundation
- https://elifesciences.org/
- The eLife process has five steps:
- Submission or transfer of a preprint from bioRxiv
- Peer review (eLife editors - who are all active researchers - discuss new submissions and decide which will be peer reviewed)
- Publication of Reviewed Preprint
- Publication of revised version
- Publication of Version of Record
- papers published together with eLife Assessment
- eLife has no IF! (good!!)
Sharing code in an ideal world - federated GitLab servers
- institutions should host their own GitLab servers for code
- (GitLab is a database-backed web application running git)
- (git is a distributed version control system)
- servers should be federated
- European (-> world-wide) network of research/education institutions and libraries
- code shared upon publication in a permanent repo with DOI
Code with a persistent DOI
- Permanent repository for data, text and code (Zenodo; see the sketch after this list)
- integration with GitHub
- version control
- Safe — your research is stored safely for the future in CERN’s Data Centre for as long as CERN exists
- citeable
- usage statistics
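A hedged sketch against Zenodo's public REST API (the search term is arbitrary; response fields as I understand the API): every deposited record comes back with its citable DOI.

```python
import json
import urllib.request

# Query Zenodo's public records endpoint; each hit carries a DOI.
url = "https://zenodo.org/api/records?q=connectome&size=3"
with urllib.request.urlopen(url) as response:
    records = json.load(response)

for hit in records["hits"]["hits"]:
    print(hit["doi"], "-", hit["metadata"]["title"])
```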
What is the solution? - The Fediverse for Science
- a federated infrastructure
- run by public institutions (universities, libraries etc.)
- for communication (microblogging = Mastodon; see the WebFinger sketch after this list)
- for code (GitLab), data (e.g. OMERO), text (preprint servers) etc.
- taking back control of scholarly infrastructure
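One concrete open standard underneath such a federation: WebFinger (RFC 7033), which is how one Fediverse server locates an account hosted on another (the account queried below is Mastodon's own flagship account).

```python
import json
import urllib.request

# Ask mastodon.social where the account @Mastodon lives; any federated
# server can be queried the same way.
url = ("https://mastodon.social/.well-known/webfinger"
       "?resource=acct:Mastodon@mastodon.social")
with urllib.request.urlopen(url) as response:
    print(json.dumps(json.load(response), indent=2))
```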
Towards a new, federated scholarly infrastructure
- plan for a federated scholarly information network
- a system that cannot be taken over by corporations
- designed redundantly
- open standards
- “a decentralized, resilient, evolvable network that is interconnected by open standards and open-source norms under the governance of the scholarly community”
One example for publishing - Open Research Europe (ORE)
![]()
- open access publishing venue for EC-funded researchers
- no author or reader fees
- Diamond (but authors need to be EC funded)
- maintained by the European Commission
- Wellcome Open Research (https://wellcomeopenresearch.org/) is similar, maintained by the Wellcome Trust
- but too centralised and no community behind it
- open up access methods, results, publications, data, software, materials, tools and peer reviews
- standard tender process held regularly
- no lock-in with a single publisher
- regular procurement processes, no monopoly, fair prices
A European Infrastructure for Open Science
https://open-science-cloud.ec.europa.eu/
- Available Services:
- File Sync & Share
- Interactive Notebooks
- Large File Transfer
- Virtual Machines
- Cloud Container Platform
- Bulk Data Transfer
…still early days
While we wait…
- individual labs can change behaviour
- my lab has switched completely to preprints and OA-only, not-for-profit journals
- raise your voice in hiring/promotion committees for DORA principles
Further reading
Samuel A. Moore: Publishing Beyond the Market
US agricultural library forced to cancel journal subscriptions (except Elsevier etc.): https://www.science.org/content/article/doge-order-leads-journal-cancellations-u-s-agricultural-library
Dutch parliament calls for moving away from US cloud software: https://www.reuters.com/world/europe/dutch-parliament-calls-end-reliance-us-software-2025-03-18/