Prepare climate data

Last updated: 2023-07-26

Checks: 7 passed, 0 failed

Knit directory: ms_mariposas_pheno/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20230601)

The command set.seed(20230601) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 7c33a30

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 7c33a30. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/raw_data/.DS_Store

Untracked files:
    Untracked:  data/best_models_prec.csv
    Untracked:  data/best_models_prec.xlsx
    Untracked:  data/best_models_temp.csv
    Untracked:  data/best_models_temp.xlsx
    Untracked:  data/doy_medio_sp.csv
    Untracked:  data/doy_medio_sp.xlsx
    Untracked:  data/models_prec_scaled.csv
    Untracked:  data/models_prec_scaled.xlsx
    Untracked:  data/models_tmed_scaled.csv
    Untracked:  data/models_tmed_scaled.xlsx

Unstaged changes:
    Modified:   analysis/climate_sensibility.Rmd
    Modified:   data/models_prec.csv
    Modified:   data/models_prec.xlsx
    Modified:   data/models_tmed.csv
    Modified:   data/models_tmed.xlsx
    Modified:   data/selected_species.csv
    Modified:   data/transectos_climate.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/prepare_climate.Rmd) and HTML (docs/prepare_climate.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	7c33a30	ajpelu	2023-07-26	add orcid
html	32f6fa3	ajpelu	2023-07-26	Build site.
Rmd	00ea503	ajpelu	2023-07-26	climate data
Rmd	4e72a27	ajpelu	2023-07-11	prepare climate data
html	baaf783	ajpelu	2023-07-10	Build site.
Rmd	1a4104b	ajpelu	2023-07-10	prepare data climate with new data
html	a834041	ajpelu	2023-06-26	Build site.
Rmd	3600071	ajpelu	2023-06-26	get data from 1st oct to sep+1
Rmd	03899a4	ajpelu	2023-06-04	add plots climate
html	59c4576	ajpelu	2023-06-01	Build site.
Rmd	2f9f824	ajpelu	2023-06-01	change name
html	b4c84e5	ajpelu	2023-06-01	Build site.
Rmd	55a33ae	ajpelu	2023-06-01	add prepare climate data

Introduction

library(here)
library(tidyverse)
library(purrr)
library(runner)
library(kableExtra)
library(DT)

We used the climate data from REDIAM (500 x 500 grid covering the whole Andalusian territory). The gridded data are derived from meteorological stations. For each pixel, monthly maximum, minimum and mean temperatures are available from 1971 to 2021, and monthly rainfall from 1951 to 2022.
For each transect, we extracted the values of all pixels that intersect the transect (see code/get_climate_transect.R) and then averaged each variable by transect and month.
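As a minimal illustration of the averaging step, the base-R sketch below uses hypothetical pixel values (not REDIAM data) and aggregate() to average all pixels touching each transect by variable, year and month. The actual extraction is done in code/get_climate_transect.R.

```r
# Hypothetical pixel-level values (not real data): two pixels touch
# transect 1 and one pixel touches transect 2, for one variable and month.
pixels <- data.frame(
  transectid = c(1, 1, 2),
  var        = "tmed",
  year       = 2008,
  month      = 1,
  value      = c(8.0, 9.0, 6.5)
)

# Average all pixels of each transect, separately for each
# variable, year and month.
transect_avg <- aggregate(
  value ~ transectid + var + year + month,
  data = pixels,
  FUN  = mean
)
transect_avg
```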

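# Per-transect climate files (see code/get_climate_transect.R)
files <- list.files("data/raw_data/climate_transects", pattern = "transect.csv", full.names = TRUE)

# Read one file and reshape it to long format (one row per pixel, variable,
# year and month). Column names are split into var, year, month and a
# trailing "cog" component, which is dropped together with the pixel
# coordinates (x, y).
custom_f <- function(x) { 
  read_csv(x) |> 
    dplyr::select(-ID) |> 
    pivot_longer(-c("transectid", "x", "y")) |> 
    separate(name, into = c("var", "year", "month", "cog")) |> 
    dplyr::select(-cog, -x, -y)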
  }
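separate() splits each raw column name into variable, year and month. A minimal base-R sketch of that parsing, assuming (hypothetically) column names of the form <variable>_<year>_<month>_cog; the real file headers may differ:

```r
# Hypothetical raw column names, assumed to follow
# <variable>_<year>_<month>_<suffix>; the real headers may differ.
cols <- c("tmed_2007_01_cog", "tm2_2007_01_cog", "p_2007_02_cog")

# Split on any run of non-alphanumeric characters (the default
# separator of tidyr::separate()) and keep the first three pieces.
parts <- do.call(rbind, strsplit(cols, "[^[:alnum:]]+"))
parsed <- data.frame(
  var   = parts[, 1],
  year  = parts[, 2],
  month = parts[, 3]
)

# Rename the temperature codes as the recode() step does.
parsed$var[parsed$var == "tm2"] <- "tmin"
parsed$var[parsed$var == "tm3"] <- "tmax"
parsed
```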


raw <- files |> 
  map_df(~custom_f(.)) |> 
  mutate(var = recode(var, 
                      "tm2" = "tmin", 
                      "tm3" = "tmax"))


# Average all pixels of each transect by variable, year and month,
# keeping data from January 2007 onwards
climate_transect <- raw |> 
  group_by(transectid, var, year, month) |> 
  summarise(avg_transect = mean(value), .groups = "drop") |> 
  mutate(date = as.Date(paste(year, month, "01", sep = "-"))) |> 
  filter(date >= as.Date("2007-01-01")) |>
  # %b is locale-dependent; an English locale gives Jan, Feb, ...
  mutate(month_names = strftime(date, "%b")) 

write_csv(climate_transect, "data/raw_data/climate_transects/climate_transect_all_avg.csv")

climate_transect <- read_csv("data/raw_data/climate_transects/climate_transect_all_avg.csv")

A preview of the resulting table:

Compute month, bi-month and three-month data

For each transect, we first selected the monthly data and then computed the two-month (bimonthly) and three-month aggregates.

Prepare hydrological months

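aux <- climate_transect |> 
  mutate(across(c(month, year), as.numeric)) |> # convert to numeric 
  mutate(
    # the hydrological year runs from October (hydro_month 1)
    # to September (hydro_month 12)
    hydro_month = case_when(
      month >= 10 ~ (month - 9), 
      TRUE ~ month + 3), 
    # Oct-Dec belong to the following hydrological year ...
    hydro_year = case_when(
      month >= 10 ~ (year + 1), 
      TRUE ~ year), 
    # ... and are labelled with a "_pre" suffix (e.g. Oct_pre)
    name_m = case_when(
      month >= 10 ~ paste0(month_names, "_pre"), 
      TRUE ~ month_names
    )
  ) |> 
  # chronological order within each transect and variable,
  # which the rolling windows below rely on
  arrange(transectid, var, year, month)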

# |> 
#   filter(hydro_year > 2007)
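The hydrological calendar above can be sketched as a small standalone function. This is an illustration only: to_hydro() is a hypothetical helper, and it uses R's built-in English month.abb instead of the locale-dependent strftime() labels of the analysis code.

```r
# Hypothetical helper: map calendar year/month to the hydrological
# calendar (October to September). Oct-Dec become hydrological months
# 1-3 of the following hydrological year and get a "_pre" suffix.
to_hydro <- function(year, month) {
  post_oct <- month >= 10
  data.frame(
    year        = year,
    month       = month,
    hydro_month = ifelse(post_oct, month - 9, month + 3),
    hydro_year  = ifelse(post_oct, year + 1, year),
    name_m      = ifelse(post_oct,
                         paste0(month.abb[month], "_pre"),
                         month.abb[month])
  )
}

h <- to_hydro(year = c(2007, 2007, 2008, 2008), month = c(10, 12, 1, 9))
h
```

All four rows fall in hydrological year 2008: October and December 2007 are its months 1 and 3, and September 2008 is its last month.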

Temperature data

Monthly data

We selected the temperature data from the climate_transect dataframe. For each variable (tmax, tmed and tmin), we pivoted the data to wide format to obtain one column per month for each transect (transectid) and year (from 2007 to 2021). The resulting dataframe is called d1mt (data-1-month-temperature).

d1mt <- aux |> 
  filter(var != "p") |> 
  dplyr::select(-month, -date, -month_names, -year, -hydro_month) |> 
  pivot_wider(values_from = avg_transect, 
              names_from = name_m)
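The same long-to-wide pivot can be expressed with base R's reshape(). A minimal sketch with hypothetical values for two transects:

```r
# Hypothetical long-format monthly values for two transects.
long <- data.frame(
  transectid   = c(1, 1, 2, 2),
  hydro_year   = 2008,
  name_m       = c("Oct_pre", "Jan", "Oct_pre", "Jan"),
  avg_transect = c(15.1, 8.2, 14.7, 7.9)
)

# One row per transect and hydrological year, one column per month.
wide <- reshape(
  long,
  idvar     = c("transectid", "hydro_year"),
  timevar   = "name_m",
  v.names   = "avg_transect",
  direction = "wide"
)

# reshape() names the new columns "avg_transect.<month>"; keep the month only.
names(wide) <- sub("^avg_transect\\.", "", names(wide))
wide
```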

Explore the results

Bimonthly data

To aggregate the temperature over two-month windows, we used the runner() function from the runner package, which applies a function over a sliding window of k consecutive observations. For temperature, we used the two-month average. As with the monthly data, we pivoted this dataframe to obtain, for each variable, transect and year, the two-month average temperature. Each new variable is named by concatenating the two averaged months, e.g. JanFeb. The resulting dataset is called d2mt (data-2-month-temperature).

aux2 <- aux |> 
  filter(var != "p") |> 
  group_by(transectid, var) |> # windows must not span two transects
  mutate(
    avg_two_months = runner::runner(
      x = avg_transect,
      k = 2,
      f = mean,
      na_pad = TRUE
    )) |> 
  mutate(
    month_names2 = runner::runner(
      x = name_m, 
      k = 2, 
      f = paste, 
      na_pad = TRUE, 
      collapse="")
  ) |> 
  ungroup()

d2mt <- aux2 |> 
  filter(var != "p") |> 
  dplyr::select(-month, -date, -month_names, -year, -hydro_month, -avg_transect, -name_m) |> 
  filter(!(is.na(month_names2))) |>
  pivot_wider(values_from = avg_two_months, 
              names_from = month_names2)
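For readers unfamiliar with runner, the trailing window computed with na_pad = TRUE can be approximated in base R. trailing_window() is a hypothetical helper, not part of runner, and the temperatures are made-up values:

```r
# Hypothetical base-R approximation of a trailing (right-aligned) window
# of length k, similar in spirit to runner::runner(x, k, f, na_pad = TRUE):
# positions without k previous values are returned as NA.
trailing_window <- function(x, k, f, ...) {
  out <- vector(mode = "list", length = length(x))
  for (i in seq_along(x)) {
    out[[i]] <- if (i < k) NA else f(x[(i - k + 1):i], ...)
  }
  unlist(out)
}

temp   <- c(10.2, 11.0, 8.4, 7.6)                  # made-up monthly means
months <- c("Oct_pre", "Nov_pre", "Dec_pre", "Jan")

trailing_window(temp, k = 2, f = mean)                  # NA 10.6 9.7 8.0
trailing_window(months, k = 2, f = paste, collapse = "")
```

The second call produces the concatenated labels (Oct_preNov_pre, Nov_preDec_pre, Dec_preJan) used as column names.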

See the results

Some tests were performed to check the results:

d2008 <- aux |>
  filter(var == "tmax") |>
  filter(transectid == 14) |>
  filter(hydro_year == 2008)

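# test 1: JanFeb for hydrological year 2008 should equal the mean of the
# Jan and Feb rows (rows 4 and 5; rows 1-3 are Oct_pre, Nov_pre, Dec_pre)
janfeb <- mean(d2008$avg_transect[4:5])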

test_aux2 <- aux2 |>
  filter(var == "tmax") |>
  filter(transectid == 14) |>
  filter(hydro_year %in% c(2007, 2008))

identical(janfeb, (test_aux2 |> 
  filter(hydro_year == 2008) |> 
  filter(month_names2 == "JanFeb") |> pull(avg_two_months)))

[1] TRUE

# test 2: Dec_preJan should equal the mean of the Dec_pre and Jan rows (rows 3 and 4)
Dec_pre_jan <- mean(d2008$avg_transect[3:4])

identical(Dec_pre_jan, (test_aux2 |> 
  filter(hydro_year == 2008) |> 
  filter(month_names2 == "Dec_preJan") |> pull(avg_two_months)))

[1] TRUE

Trimonthly data

To aggregate the temperature over three-month windows, we again used the runner() function. As with the monthly data, we pivoted this dataframe to obtain, for each variable, transect and year, the three-month average temperature. Each new variable is named by concatenating the three averaged months, e.g. JanFebMar. The resulting dataset is called d3mt (data-3-month-temperature).

aux3 <- aux |> 
  filter(var != "p") |>
  group_by(transectid, var) |> # windows must not span two transects
  mutate(
    avg_three_months = runner::runner(
      x = avg_transect,
      k = 3,
      f = mean,
      na_pad = TRUE
    )
  ) |> 
  mutate(
    month_names3 = runner::runner(
      x = name_m, 
      k = 3, 
      f = paste, 
      na_pad = TRUE, 
      collapse="")
  ) |> ungroup()

d3mt <- aux3 |> 
  filter(var != "p") |> 
  dplyr::select(-month, -date, 
                -month_names, -year, 
                -hydro_month, -avg_transect, -name_m) |> 
  filter(!(is.na(month_names3))) |> 
  pivot_wider(values_from = avg_three_months, 
              names_from = month_names3)
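Rolling windows computed on a table that stacks several transects need to restart at each transect; otherwise the first window of one transect would include the last months of the previous one. A minimal base-R sketch with hypothetical toy data, using ave() to apply a trailing 3-month mean within each transect (roll3() is an illustrative helper):

```r
# Hypothetical toy data: two transects with three consecutive months each.
toy <- data.frame(
  transectid = c(1, 1, 1, 2, 2, 2),
  value      = c(10, 12, 14, 20, 22, 24)
)

# Trailing 3-month mean; incomplete windows stay NA.
roll3 <- function(x) {
  out <- rep(NA_real_, length(x))
  if (length(x) >= 3) {
    for (i in 3:length(x)) out[i] <- mean(x[(i - 2):i])
  }
  out
}

# ave() applies roll3() separately within each transect, so the first
# windows of transect 2 never borrow months from transect 1.
toy$avg_three_months <- ave(toy$value, toy$transectid, FUN = roll3)
toy
```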

See the results:

We then joined the monthly, 2-month and 3-month data into a single dataframe:

climate_temp <- d1mt |> 
  inner_join(d2mt, by = c("transectid", "var", "hydro_year")) |> 
  inner_join(d3mt, by = c("transectid", "var", "hydro_year"))

Rainfall data

As in the previous section, we also computed the monthly, bimonthly and three-month data for rainfall, but in this case we used the cumulative precipitation (not the average), so the value for JanFeb corresponds to the total rainfall of January and February.
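A minimal sketch of the cumulative (sum) windows on hypothetical monthly rainfall values; rolling_sum() is an illustrative helper, not part of the analysis code:

```r
# Hypothetical monthly rainfall (mm) for one transect.
rain <- c(Jan = 55, Feb = 40, Mar = 32, Apr = 60)

# Cumulative rainfall over trailing windows of k months; the first
# k - 1 positions have no complete window and stay NA.
rolling_sum <- function(x, k) {
  out <- rep(NA_real_, length(x))
  if (length(x) >= k) {
    for (i in k:length(x)) out[i] <- sum(x[(i - k + 1):i])
  }
  out
}

rolling_sum(rain, k = 2)   # NA, JanFeb = 95, FebMar = 72, MarApr = 92
rolling_sum(rain, k = 3)   # NA, NA, JanFebMar = 127, FebMarApr = 132
```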

auxp <- aux |> 
  filter(var == "p")


d1mp <- aux |> 
  filter(var == "p") |> 
  dplyr::select(-month, -date, -month_names, -year, -hydro_month) |> 
  pivot_wider(values_from = avg_transect, 
              names_from = name_m) 


aux2p <- aux |> 
  filter(var == "p") |> 
  group_by(transectid) |> # windows must not span two transects
  mutate(
    sum_two_months = runner(
      x = avg_transect,
      k = 2,
      f = sum,
      na_pad = TRUE
    )) |> 
  mutate(
    month_names2 = runner(
      x = name_m, 
      k = 2, 
      f = paste, 
      na_pad = TRUE, 
      collapse="")
  ) |> ungroup()


d2mp <- aux2p |> 
  dplyr::select(-month, -date, -month_names, 
                -year, -hydro_month, -avg_transect, -name_m) |> 
  filter(!(is.na(month_names2))) |> 
  pivot_wider(values_from = sum_two_months, 
              names_from = month_names2) 

aux3p <- aux |> 
  filter(var == "p") |> 
  group_by(transectid) |> # windows must not span two transects
  mutate(
    sum_three_months = runner(
      x = avg_transect,
      k = 3,
      f = sum, # cumulative rainfall, not the average
      na_pad = TRUE
    )
  ) |> 
  mutate(
    month_names3 = runner(
      x = name_m, 
      k = 3, 
      f = paste, 
      na_pad = TRUE, 
      collapse="")
  ) |> ungroup()

d3mp <- aux3p |> 
  dplyr::select(-month, -date, -month_names, 
                -year, -hydro_month, -avg_transect, -name_m) |> 
  filter(!(is.na(month_names3))) |> 
  pivot_wider(values_from = sum_three_months, 
              names_from = month_names3) 
    

climate_prec <- d1mp |> 
  inner_join(d2mp, by = c("transectid", "var", "hydro_year")) |> 
  inner_join(d3mp, by = c("transectid", "var", "hydro_year"))

Generated data

The temperature and rainfall data were combined (stacked by rows) and exported.

climate_data  <- bind_rows(
  climate_temp, climate_prec)

# Drop windows that span two hydrological years (e.g. Sep + Oct_pre)
climate_data <- climate_data |> 
  dplyr::select(-SepOct_pre, -AugSepOct_pre, -SepOct_preNov_pre)
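The three columns dropped above are exactly the 2- and 3-month windows that start in one hydrological year and end in the next (September is followed by Oct_pre). A short sketch that derives those labels from the hydrological month order; crossing_windows() is a hypothetical helper:

```r
# Month labels in hydrological order (Oct-Dec carry the "_pre" suffix).
hydro_names <- c(paste0(month.abb[10:12], "_pre"), month.abb[1:9])

# Windows of length k ending in the first k - 1 hydrological months
# necessarily start in the previous hydrological year. Build their labels
# by placing the previous year's months before the current year's.
crossing_windows <- function(k) {
  two_years <- c(hydro_names, hydro_names)   # previous year, then current
  ends <- length(hydro_names) + seq_len(k - 1)
  vapply(ends, function(e) paste(two_years[(e - k + 1):e], collapse = ""),
         character(1))
}

crossing_windows(2)  # "SepOct_pre"
crossing_windows(3)  # "AugSepOct_pre" "SepOct_preNov_pre"
```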

climate_data |>
  mutate(across(-c(transectid, var, hydro_year), ~round(., digits = 2))) |>
  DT::datatable()

sessionInfo()

R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur ... 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DT_0.26          kableExtra_1.3.4 runner_0.4.2     lubridate_1.9.2 
 [5] forcats_1.0.0    stringr_1.5.0    dplyr_1.1.0      purrr_1.0.1     
 [9] readr_2.1.4      tidyr_1.3.0      tibble_3.1.8     ggplot2_3.4.1   
[13] tidyverse_2.0.0  here_1.0.1       workflowr_1.7.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10       svglite_2.1.0     getPass_0.2-2     ps_1.7.1         
 [5] rprojroot_2.0.3   digest_0.6.31     utf8_1.2.2        R6_2.5.1         
 [9] evaluate_0.19     httr_1.4.4        pillar_1.8.1      rlang_1.1.0      
[13] rstudioapi_0.14   whisker_0.4       callr_3.7.3       jquerylib_0.1.4  
[17] rmarkdown_2.19    webshot_0.5.4     htmlwidgets_1.6.2 bit_4.0.4        
[21] munsell_0.5.0     compiler_4.2.3    httpuv_1.6.8      xfun_0.39        
[25] pkgconfig_2.0.3   systemfonts_1.0.4 htmltools_0.5.4   tidyselect_1.2.0 
[29] fansi_1.0.3       viridisLite_0.4.1 crayon_1.5.2      tzdb_0.3.0       
[33] withr_2.5.0       later_1.3.0       grid_4.2.3        jsonlite_1.8.4   
[37] gtable_0.3.1      lifecycle_1.0.3   git2r_0.30.1      magrittr_2.0.3   
[41] scales_1.2.1      vroom_1.6.3       cli_3.6.0         stringi_1.7.8    
[45] cachem_1.0.6      fs_1.6.2          promises_1.2.0.1  xml2_1.3.3       
[49] bslib_0.4.2       ellipsis_0.3.2    generics_0.1.3    vctrs_0.6.0      
[53] tools_4.2.3       bit64_4.0.5       glue_1.6.2        crosstalk_1.2.0  
[57] hms_1.1.2         processx_3.7.0    parallel_4.2.3    fastmap_1.1.0    
[61] yaml_2.3.7        timechange_0.1.1  colorspace_2.0-3  rvest_1.0.3      
[65] knitr_1.41        sass_0.4.5