22  Building durable and open data walls

Congratulations! You have finished a project, documented your code, and are perhaps ready to take the next step: publishing your work. Publishing can mean several different things depending on context.

No matter how you publish the work, a key question you will need to address is authorship. One of the principles in D’Ignazio and Klein’s Data Feminism is “What gets counted counts” (D’Ignazio and Klein 2020). For a given project, who should be considered an author, and what standards are used to assess authorship? Developing clear criteria for authorship, and a shared understanding of it among everyone involved in a project, is essential.

While Chapter 9 identified how to tear down the data wall, this chapter focuses on fortifying the remaining wall in an open way, emphasizing the sometimes thorny issue of authorship and how to equitably define it. Let’s begin.

22.1 Deciding authorship

For a scientist, the currency is publication: careers, especially in academia, are measured through research outputs shared in journals (Rawat and Meena 2014). The number of scientific studies published each year is in the millions (Evelyth 2014; National Science Board 2023), and the number of authors per paper is trending upward (Duffy 2017; Thelwall and Maflahi 2022). That’s a lot of information, and not all papers are authored by a single person.1

In the environmental sciences, there are around 7 authors per paper for a given environmental science journal (Sijp 2018). Because multi-authored papers are the default, it is imperative to address questions of authorship. Authorship disputes are very real and tend to wall out many from a successful career in science. Unfortunately, this burden has fallen on historically marginalized groups (Settles et al. 2024).

One viewpoint for authorship is the requirement that everyone contributes equally to the work. For the journal Nature, being an author means you completely understand the data, experiment, results and conclusions, had full editorial control, and contributed to all stages of the writing process (Nature Publishing Group 2023).2 For multi-authored papers or synthesis papers this standard could be challenging to meet. Both Naupaka and John work with undergraduates and first-generation populations. In a given project, undergraduates may cycle on and off or participate for a short term such as a summer. Assuming they are 100% responsible for all parts of a finished manuscript is too heavy a responsibility - and an impossible one, since many projects span multiple years.

22.2 Inclusive authorship models

D’Ignazio and Klein (2020) argue that work should be credited at all stages of the workflow process. One possible solution is a shift towards contributions, elaborated through the CRediT or Contributor Roles Taxonomy. See Table 22.1 for a description of the different roles within the project.

Table 22.1: Description of aspects of authorship from the Contributor Roles Taxonomy (CRediT). See https://credit.niso.org/ for more details.
| Term | Definition |
|---|---|
| Conceptualization | Ideas; formulation or evolution of overarching research goals and aims |
| Methodology | Development or design of methodology; creation of models |
| Software | Programming, software development; designing computer programs; implementation of the computer code and supporting algorithms; testing of existing code components |
| Validation | Verification, whether as a part of the activity or separate, of the overall replication/reproducibility of results/experiments and other research outputs |
| Formal analysis | Application of statistical, mathematical, computational, or other formal techniques to analyze or synthesize study data |
| Investigation | Conducting a research and investigation process, specifically performing the experiments, or data/evidence collection |
| Resources | Provision of study materials, reagents, materials, patients, laboratory samples, animals, instrumentation, computing resources, or other analysis tools |
| Data curation | Management activities to annotate (produce metadata), scrub data and maintain research data (including software code, where it is necessary for interpreting the data itself) for initial use and later reuse |
| Writing - original draft | Preparation, creation and/or presentation of the published work, specifically writing the initial draft (including substantive translation) |
| Writing - review & editing | Preparation, creation and/or presentation of the published work by those from the original research group, specifically critical review, commentary or revision, including pre- or post-publication stages |
| Visualization | Preparation, creation and/or presentation of the published work, specifically visualization/data presentation |
| Supervision | Oversight and leadership responsibility for the research activity planning and execution, including mentorship external to the core team |
| Project administration | Management and coordination responsibility for the research activity planning and execution |
| Funding acquisition | Acquisition of the financial support for the project leading to this publication |

Perhaps the criteria for authorship in the journal Nature (someone who completely understands the data, experiment, results and conclusions, has full editorial control, and contributes to all stages of the writing process) are too limiting. Arguably, the contributor roles presented in Table 22.1 recognize a more inclusive range of contributors over the lifecycle of a project.

At the onset of any project, Table 22.1 is a useful guide to understand each person’s roles; we suggest creating a tracking sheet or live document based on Table 22.1 that is updated as the project progresses. It is the responsibility of the corresponding author to understand how each contributor’s piece fits into the project as a whole. In our experience, the Contributor Roles Taxonomy helps to articulate the scope of participation and contributions. It does not diminish the value of any contribution; rather, it defines manageable tasks commensurate with the range of experiences someone may have on a project. As an example, we published a study that spanned several years, with different cohorts of undergraduates contributing to code development and data collection (Zobitz et al., n.d.). Table 22.1 was extremely helpful for characterizing the important contributions that everyone made - and is more nuanced than a list of authors.
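As one way to picture the tracking sheet suggested above, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a tool used by the authors: the class name `ContributionTracker`, its methods, and the placeholder contributor names are all hypothetical. Only the role names come from Table 22.1.

```python
from collections import defaultdict

# The 14 CRediT roles, in the order they appear in Table 22.1.
CREDIT_ROLES = [
    "Conceptualization",
    "Methodology",
    "Software",
    "Validation",
    "Formal analysis",
    "Investigation",
    "Resources",
    "Data curation",
    "Writing - original draft",
    "Writing - review & editing",
    "Visualization",
    "Supervision",
    "Project administration",
    "Funding acquisition",
]


class ContributionTracker:
    """Hypothetical in-memory tracking sheet: contributor -> CRediT roles."""

    def __init__(self):
        self._roles = defaultdict(set)

    def record(self, contributor, role):
        """Note that a contributor took on a role; reject non-CRediT roles."""
        if role not in CREDIT_ROLES:
            raise ValueError(f"Unknown CRediT role: {role!r}")
        self._roles[contributor].add(role)

    def roles_for(self, contributor):
        """Roles held by one contributor, in Table 22.1 order."""
        held = self._roles.get(contributor, set())
        return [role for role in CREDIT_ROLES if role in held]

    def unassigned_roles(self):
        """Roles nobody has been credited with yet."""
        assigned = set().union(*self._roles.values())
        return [role for role in CREDIT_ROLES if role not in assigned]

    def statement(self):
        """One line per contributor, sorted by name, listing their roles."""
        return "\n".join(
            f"{name}: {', '.join(self.roles_for(name))}"
            for name in sorted(self._roles)
        )


# Placeholder names only; update the sheet as the project progresses.
tracker = ContributionTracker()
tracker.record("Student A", "Software")
tracker.record("Student A", "Investigation")
tracker.record("Mentor B", "Supervision")
print(tracker.statement())
print("Not yet credited:", tracker.unassigned_roles())
```

Listing the roles nobody holds yet is a deliberate choice in this sketch: it gives the corresponding author a quick prompt to ask whether a contribution has gone unrecorded. A shared spreadsheet with the same rows and columns would serve equally well.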

22.3 Author order

Another consideration is understanding authorship order, which can vary by discipline. In environmental sciences, first author indicates the primary or corresponding author, whereas the last author indicates senior roles (Duffy 2017). In the mathematical sciences, large multi-authored papers tend to be the exception rather than the rule; author order is traditionally alphabetical (American Mathematical Society 2004). If you wanted to go rogue, authorship order could be determined through athletic prowess (kidding!).

While the above sections focused on authorship and author order for academic journals, we argue these considerations matter however a project is shared (on a website, GitHub, or Zenodo). An environmental data scientist works across several disciplines and therefore has to be culturally competent in understanding the nuances of authorship in different disciplines (and in the journal that one is publishing in). We again borrow inspiration from Frost’s “Mending Wall” (Frost 2022):

   Before I built a wall I’d ask to know
   What I was walling in or walling out,
   And to whom I was like to give offense.


Determining authorship and publication order fortifies your data wall, and arguably is foundational to a successful study. Let the Contributor Roles Taxonomy in Table 22.1 and the other considerations discussed in this chapter serve as tools to help build up a durable data wall.

22.3.1 Acknowledgements

While this chapter has focused on authorship, there is also a role for everyone in your support system, which could include:

  • lab managers
  • information technology support specialists
  • students who perhaps did a rotation through your lab
  • anonymous reviewers who provided critical feedback that you begrudgingly know made the manuscript better.

While individual people in your support system may not have had a large role in Table 22.1, they are important nonetheless. Most studies have a section for acknowledgments that will allow you to publicly thank those people. Doing science is a collective effort, and sharing gratitude is always important.

Tip: Acknowledging fun

While science can be serious business, the acknowledgments section can be a place to have (some) fun. In some of John’s publications he has acknowledged people such as Ben S. Chelton or B. N. Acheson. As far as he knows, these aren’t real people but rather anagrams for his family.

22.4 Exercises

  1. Select one or two journals you are interested in publishing in. How do they define authorship? Is usage of the Contributor Roles Taxonomy encouraged?

  2. Audit a project you are working on now using the Contributor Roles Taxonomy.

  3. Depending on your primary academic discipline, investigate standards for authorship order.

  4. At institutions that require publication in academic journals, how is authorship order weighed in evaluation and review processes?


  1. A UNESCO report estimated there are approximately 8-9 million scientists worldwide (Schneegans et al. 2021). Given that scientific publications are often peer-reviewed, by the numbers it seems there should be enough potential reviewers for manuscripts.

  2. We chose the journal Nature simply because its family of journals consistently has the highest impact factor.