12 AI for Modern Environmental Data Science
AI spans a spectrum from logistic regression to general intelligence. What is AI for environmental data science? It includes LLMs as well as methods traditionally thought of as machine learning, such as random forest models and support vector machines. A modern data scientist needs an understanding of this spectrum of machine learning tools and where each can be applied. There is enormous value here. As with Moore’s Law for computing, innovation is ongoing, so the examples in this chapter are current as of the time of publication.
12.1 Introduction
The use of machine learning and ‘Artificial Intelligence’ (AI) has exploded in recent years, but it can be challenging to keep up, not only because of rapid tool and method development, but also because definitions of what constitutes AI differ even among experts in the field.
Some say xyz, others pqr, etc
This chapter will give you an overview of some of the approaches and terminology in this rapidly evolving area and point you to some resources to learn more about the many tools and techniques available.
12.2 What do people mean when they say AI?
12.2.1 Statistics
12.2.2 Machine Learning
12.2.3 Deep Learning
12.2.4 Large Language Models (LLMs)
12.2.4.1 Strengths of LLMs
Summarizing text, lowering barriers to translation across languages and cultures, personalized feedback, help with coding
12.2.4.2 Weaknesses of LLMs
Hallucinations, environmental cost, data security/privacy issues
12.2.4.3 Types of LLMs
Model size, raw API vs chatbot, ‘thinking’ LLMs
12.2.4.4 Local and Open LLMs
Ollama, Hugging Face, etc
12.2.5 Case studies of Machine Learning and AI in the Environmental Sciences
12.2.5.1 Code translation from one language to another
Let’s examine differences between common environmental data science languages using the power of ChatGPT.
Generative AI is an offshoot of machine learning methods from data science, and so provides a good case study for examining differences between data science environments. These artificial intelligence tools (e.g., ChatGPT and others) have rapidly transformed our daily lives (especially post-2023) and how we interact with the internet. In scientific research, disclosing the use of generative AI tools is recognized as part of maintaining scientific integrity (Bockting et al. 2023; American Journal Experts (AJE), n.d.; Bertolo and Antonelli 2024). Let’s use them here to contrast how the different environments in Table 3.1 produce output.
If we chose R, a prompt to ChatGPT might look like the one in Figure 12.1:
Figure 12.2 shows its response, which is (admittedly) a well-organized (and documented!) explanation of starter code:
The provided code loads the correct library (tidyverse), converts the time to the POSIXct format (which makes working with dates and times easier), and generates a well-labeled plot. Not too shabby. Based on our knowledge of R, we would also award extra credit points for using the tidyverse pipe (%>%) in the code, but perhaps not full credit, because base R now has its own native pipe (|>).
Now let’s give the same prompt with Python (Figure 12.3):
The code reads like a beat-for-beat rehash of the R version, but now in Python (much like how Star Wars: The Force Awakens was reviewed; please don’t @ us!). There are some differences to note:
- Similar to R, in Python libraries are loaded at the start of the code (with the command import). However, those libraries need to be referred to whenever you use a command from a particular library (e.g., to_datetime is a function in the pandas library). Thankfully, in Python you can define shortcuts to these libraries, such as the code that says import pandas as pd, using whichever name makes sense for you. (NOTE: In R, if you only want to use a particular function in a library, refer to it with the double colon, ::, e.g. PACKAGE::FUNCTION.)
- Python doesn’t have a native pipe operator (|>) like R. The assignment operator is equals (=) versus R’s left-facing arrow (<-).
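The points in the list above can be seen in a few lines of Python. This is only a minimal sketch, not the code from Figure 12.3: it uses modules from Python's standard library (statistics and datetime) in place of pandas so that it runs without installing anything, and the timestamps and temperature values are made up for illustration.

```python
# Import a whole module under a short alias; this is the same pattern
# as `import pandas as pd`, shown here with a standard-library module.
import statistics as st

# Import a single name from a module so it can be used without a prefix.
from datetime import datetime

# Hypothetical hourly temperature readings (illustrative values only).
timestamps = ["2024-06-01 00:00", "2024-06-01 01:00", "2024-06-01 02:00"]
temps_c = [14.2, 13.8, 13.5]

# Assignment uses `=` (R would typically use `<-`). Functions from an
# aliased module are called as alias.function().
mean_temp = st.mean(temps_c)

# There is no pipe operator, so each step is written on its own line
# (or nested) instead of being chained with |> or %>%.
parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M") for t in timestamps]
elapsed = parsed[-1] - parsed[0]
span_hours = elapsed.total_seconds() / 3600

print(f"Mean temperature: {mean_temp:.2f} C over {span_hours:.0f} hours")
```

With pandas installed, the same alias pattern is what makes a call such as pd.to_datetime possible: the prefix tells Python which library the function comes from.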
Finally, we gave ChatGPT the same prompt, this time with Julia (Figure 12.4):
Can you spot the differences (and similarities) between the Julia output in Figure 12.4 and the R (Figure 12.2) or Python (Figure 12.3) output? For all practical purposes, the choice comes down to preference: which one you are more familiar with.
12.2.5.2 Image Processing/Classification/Annotation
12.2.5.3 Audio Processing/Classifying
12.2.5.4 Other
12.2.6 Environmental concerns with AI use
- Training models
- Running models
- Power and water requirements
- Carbon footprint
- What types of models use more or less energy and resources to run
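One common back-of-the-envelope way to connect power requirements to a carbon footprint is to multiply the energy a job uses by the carbon intensity of the electricity grid that supplies it. The sketch below assumes that simple relationship. The function name, its parameters, and every input value are hypothetical placeholders chosen for illustration, not measurements of any real model or data center, and the sketch does not cover water use.

```python
def estimate_emissions_kg(power_kw, hours, pue, grid_kg_co2_per_kwh):
    """Rough estimate of energy use and emissions for one compute job.

    power_kw: average power draw of the hardware (kW)
    hours: how long the job runs (h)
    pue: power usage effectiveness, a multiplier for data center
         overhead such as cooling (1.0 means no overhead)
    grid_kg_co2_per_kwh: carbon intensity of the electricity supply
    """
    energy_kwh = power_kw * hours * pue
    emissions_kg = energy_kwh * grid_kg_co2_per_kwh
    return energy_kwh, emissions_kg


# Placeholder inputs for illustration only (NOT real measurements).
energy, emissions = estimate_emissions_kg(
    power_kw=0.5,
    hours=10,
    pue=1.2,
    grid_kg_co2_per_kwh=0.4,
)
print(f"Energy: {energy:.1f} kWh, emissions: {emissions:.1f} kg CO2")
```

Because the estimate is a simple product, halving the run time or moving to a grid with half the carbon intensity halves the estimated emissions. That makes it a convenient first check when comparing training a model against running one.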
12.3 Sources
Generative AI link: LINK