Sharing

Ending an environmental data science project brings a mixture of excitement and nervousness. Is the analysis “finished”? What story have you decided to tell? Sharing your project with the wider world involves more than just “writing it up”.

We start with a chapter on a common approach for moving from the tangled mess of possibilities that your project's code and data represent at the very beginning to a polished product ready for assessment and use by others. Developing code for a broader audience (18 From the code workshop to the world) is a crucial skill that many self-taught scientist-programmers lack. In our decades of experience, the analysis often ends as soon as the key plot is generated. We advocate for a more continuous approach, in which initial outputs inform refinements to the code and documentation, which in turn produce improved outputs, until this positive cycle of polishing yields something that is ready to ‘step out of the sandbox’ and provide value to others beyond yourself.

The key parts of this phase of a project include taking the time to add sufficient documentation to your code (19 Sharing your code the open way) and data (20 Sharing your data the open way). Without proper documentation and metadata, you can deposit your data and code in as many public repositories as you like, but they still won’t be as broadly used or understood as they would be with a proper user’s guide, so to speak. We talk through some of the many places where code and data can be shared openly, and why you might pick one repository over another given the specifics of your project. Now that many journals and funding agencies around the world require deposition of code and data in open repositories, understanding the lay of this landscape is key to meeting funder or editor guidelines. Remember, too, that even if you don’t plan to publish, documenting your code and data and making them publicly available means you will be able to re-engage with the project much more easily in the future, should you need to. The repositories handle the backup for you, and your helpful notes make it easy to get back up to speed.

Once your code and data are squared away, you may be wondering how best to write it all up. We start (21 Writing up your work the open way) by highlighting some practical tips for making forward progress on writing up your results (many programmers would rather be writing more code instead!). We also cover citation management software, the different ways the peer review process might unfold at different journals, and different models of journal access (paid or open). Finally, we highlight the various ways you can integrate your prose write-up directly with the analysis code that produces your findings, using tools like LaTeX, Quarto, R Markdown, or Jupyter notebooks (CITEME). These machine-executable writing approaches keep all the parts in one place and help ensure that they work together as expected.

Since modern scientific projects are rarely created by a single individual, deciding how to share credit for the work is essential to positive collaborations and can be a high-stakes element of career progression for many. The last chapter in this section describes different models of authorship (22 Building durable and open data walls) and some of the pros and cons of the approaches promoted by different organizations.

Implementing these approaches will allow you to create durable, open environmental data science projects. You’ll help those who want to understand your insights, and you’ll help your future self remember what you did and how you did it. Both strengthen the urgent, ongoing, and increasingly collaborative process of modern research in the environmental sciences.