The scientific community is increasingly using computational notebooks to execute, manage, and share workflows and analyses, often on cloud-based computing clusters. Popular notebook tools include Jupyter and R Markdown, along with tools such as Binder that integrate and execute them. These environments bring a project's work together in one place: the data manipulation leading to the final results, the software used in that analysis and manipulation, and the narrative that in effect forms the methodology, protocols, and results of a study. Currently, these elements are then extracted, with loss of functionality and integration, to form a research publication, which is still primarily text-based.
Providing notebooks as available, curated research outputs would therefore greatly enhance the transparency and reproducibility of research while integrating with computational workflows. Notebooks allow deeper investigation of a study and richer display of its results because they dynamically link data and software with what are often the final figures and plots. Unfortunately, current peer-review and publication workflows across the sciences do not readily support notebooks as research outputs or encourage their use and curation. Few publishers currently allow them as linked supplements. AGU recently developed author instructions (Erdmann, 2021) for depositing them in repositories. Notebooks are not included in the paper peer-review workflow, which inhibits deeper evaluation by reviewers of the data processing and thus of the results.
We propose to develop a better approach: an end-to-end scholarly publishing workflow that would treat notebooks, both Jupyter and R Markdown, as a primary element of the scientific record. In this approach the notebook is the submitted product and is available natively for peer review. We intend to transform the publication process in a way that elevates transparent and reproducible work by authors, where data and software, together with narrative, are efficiently documented and shared, where access to computation is more equitable, and where new forms of credit can be extended to the wider research community, including research software engineers (RSEs). Metadata will be extracted from notebooks to support well-established publication and discovery services. We envision that certain standards around notebooks will be needed to enable such an end-to-end workflow, for example around copy-editing, production, platforms, and configuration. The goal is to maximize functionality and simplify author requirements. We expect that new publication platforms and methods may be needed, or that current platforms will need to evolve in significant ways.
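To make the metadata-extraction step concrete, here is a minimal sketch of reading a few discovery-oriented fields from a Jupyter notebook, which is stored as a JSON document. It is an illustration under stated assumptions, not the workflow this proposal will design: the function name `extract_notebook_metadata`, the choice of fields, and the convention of taking the first level-one Markdown heading as the title are all hypothetical. It covers only the `.ipynb` format; R Markdown files keep their metadata in a YAML header and would need separate handling.

```python
import json


def extract_notebook_metadata(notebook_json: str) -> dict:
    """Collect a few descriptive fields from a Jupyter notebook (.ipynb) JSON string.

    Hypothetical helper for illustration; the returned fields are assumptions,
    not a publisher-defined schema.
    """
    nb = json.loads(notebook_json)
    meta = nb.get("metadata", {})
    cells = nb.get("cells", [])

    # Assumed convention: the first "# " heading in a Markdown cell is the title.
    title = None
    for cell in cells:
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # A cell's source may be a single string or a list of strings.
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            if line.startswith("# "):
                title = line[2:].strip()
                break
        if title is not None:
            break

    return {
        "title": title,
        "kernel": meta.get("kernelspec", {}).get("name"),
        "language": meta.get("language_info", {}).get("name"),
        "nbformat": nb.get("nbformat"),
        "code_cells": sum(1 for c in cells if c.get("cell_type") == "code"),
        "markdown_cells": sum(1 for c in cells if c.get("cell_type") == "markdown"),
    }


# A tiny made-up notebook used only to exercise the function.
SAMPLE_NOTEBOOK = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3"},
        "language_info": {"name": "python"},
    },
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Example analysis\n", "Narrative text."]},
        {"cell_type": "code", "metadata": {}, "source": "print(1)",
         "outputs": [], "execution_count": None},
    ],
})

if __name__ == "__main__":
    print(extract_notebook_metadata(SAMPLE_NOTEBOOK))
```

A production workflow would likely map such fields onto an established publication metadata schema and validate the notebook first; both steps are outside this sketch.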
The signatories to this proposal form a Steering Committee, drawn from the key stakeholders, that will envision and guide the development of this end-to-end workflow. We propose to engage a larger set of stakeholders (~70 in-person, ~20 virtual) to develop this model in a series of three workshops (one in-person/hybrid, two virtual), with several workstreams in between focused on steps in the process (e.g., (pre-)submission, review, publication). These workshops will help align the group's collective guidance, requirements, and common solutions in support of the full-workflow vision, and will help secure buy-in. This proposal is aimed at visioning and design, and it will prepare us for the important steps of implementation. The planned deliverable is a complete design for an end-to-end workflow that includes all stages of the publication process needed for an interactive notebook.
All deliverables, documentation, and methods, including the pilot project work, will be provided openly and designed around open standards for broad adoption. A standard model for publishing notebooks would allow all publishers to support the growing notebook development community across all scientific disciplines.
- Posted on: January 1, 2022