COVID-19 machine learning effort: Preprints are key

Cold Spring Harbor Laboratory’s preprint servers bioRxiv and medRxiv are critical components of the COVID-19 Open Research Dataset (CORD-19), a newly announced US government-led project created in response to the COVID-19 pandemic. The plan is to create a machine-readable database of research papers that can be mined with semantic search strategies and other AI techniques for new knowledge about the pandemic. Preprint servers matter in this evolving pandemic because they make the latest research available before it appears in journals.

Screenshot of COVID-19 preprints from bioRxiv and medRxiv, taken on 3/25/20. Visit bioRxiv for the most up-to-date list of preprints on COVID-19 and SARS-CoV-2.

CORD-19 is organized by the US Office of Science and Technology Policy. It brings together the Chan Zuckerberg Initiative (CZI, a financial supporter of bioRxiv), Microsoft, the National Library of Medicine (NLM), and the Allen Institute for AI to create a large database of machine-readable preprints and papers. Georgetown University coordinated the collaborators. Microsoft identified and brought together worldwide scientific efforts and results, CZI Meta identified relevant bioRxiv and medRxiv pre-publication content, NLM provided access to journal literature, and the Allen AI team assembled the database.

Cold Spring Harbor Laboratory has made available a webpage containing a free, continually updated collection of COVID-19 and SARS-CoV-2 preprints from bioRxiv and medRxiv. The collection holds nearly 800 papers and began in mid-January with the first reports from the outbreak center in Wuhan, China. These are among the 13,000 machine-readable papers in the CORD-19 database, which also contains papers from PubMed Central (a US government repository of published papers) and a pandemic-related collection of articles maintained by the World Health Organization.

John Inglis, a Cold Spring Harbor Laboratory faculty member and co-founder of bioRxiv and medRxiv, says, “The pandemic has produced a tsunami of new research findings on the biology of the SARS-CoV-2 virus, the disease it provokes, and its clinical course. Many of these observations were first reported on preprint servers and we are delighted to assist this important mission to harness machine discovery in the search for new understanding that will assist patients throughout the world.”

Written by: Eliene Augenbraun, Creative Director | augenbr@cshl.edu | 516-367-5055
