Staging

Beginning an environmental data science project is always exciting: who doesn’t love exploring a fresh dataset to see what new associations there are to find? Or perhaps the feeling is more one of anxiety or trepidation, and you may be asking yourself, “Where do I even start?”

In both cases, it can be helpful to lay out the tools and structures you think you’ll need before you dive into the analysis. This is particularly important in environmental data science because much of the data we work with is inherently messy, and we don’t know exactly where our analyses are going to take us when we start out. Data science has a name for this kind of work: exploratory data analysis. It is distinct from analyzing a dataset whose structure you already fully understand in order to answer a question you already have in mind. Exploratory data analysis is very much an iterative process because we don’t know exactly where we’re going, and the journey to our destination can often create quite a mess. It is not too far off from the state of creative chaos in an artist’s studio, mid-painting.

As with any project, a proper organizational structure is important. Just as a chef prepares their workstation before the dinner rush by mincing the garnishes and making sure the ingredients are available in their proper amounts, or a plumber checks that they have their pipe wrench, solder, and enough fuel for their blowtorch before setting out for a job, computational data scientists often find a lot of value in properly setting up their computational project before diving in.

The first part of this book works to build your understanding of what environmental data science is relative to related disciplines (1 Why environmental data science? and 2 Data for environmental science), how to stage your computational environment and workflows for success (3 Core languages for environmental data science and 5 Version control and file management), and why ethical considerations for working with environmental data are important at all levels of a project, particularly at the early stages (4 Ethics in environmental data science). With so many tools and approaches available, preparing your project from the outset is an important and impactful choice. We hope these next few chapters help reduce the anxiety and amplify the excitement of setting off to work on a new project.