5 min read

Shoot growth modeling 5

This update is focused on data processing, priors, reparameterization, and visualization.

Filling in censored data

For each individual, shoot measurements stop after the shoot length plateaus. Now I fill in the censored data after sampling stopped with the last observed shoot length. I make sure each site and year have the same end date of sampling.

Note that there are also NAs or inconsistent start dates. I did not fill those in.

# dat_shoot <- tidy_shoot(extend = F)
plot_shoot(dat_shoot %>% filter(year == 2016, site == "hwrc"), speciesoi = "queru")
dat_shoot_extend <- tidy_shoot(extend = T)
plot_shoot(dat_shoot_extend %>% filter(year == 2016, site == "hwrc"), speciesoi = "queru")
dat_shoot_extend_cov <- tidy_shoot(extend = T, covariate = T)

(Deprecated) Making closed canopy the baseline and open canopy a treatment

I attempted to change the coding of the closed and open canopy conditions to make closed canopy the baseline and infer the effects of open canopy. Thought it might be useful for inference of the effects of canopy opening on phenology.

However, because closed canopy does not have drying treatment, it would disrupt the full factorial design and make the previously designed interaction terms difficult to infer.

Decided to keep open canopy as a baseline.

dat_all <- dat_shoot_extend_cov %>%
  filter(doy > 90, doy <= 210) %>%
  filter(shoot > 0) %>%
  drop_na(barcode) %>%
  group_by(species) %>%
  mutate(group = str_c(site, year, sep = "_") %>% factor() %>% as.integer()) %>%
  ungroup() %>%
  mutate(heat_trt = factor(heat_name, levels = c("ambient", "+1.7C", "+3.4C"), labels = c(0, 1, 2)) %>% as.character() %>% as.integer()) %>%
  mutate(water_trt = factor(water_name, levels = c("ambient", "reduced"), labels = c(0, 1)) %>% as.character() %>% as.integer()) %>%
  mutate(canopy_code = factor(canopy, levels = c("open", "closed"), labels = c(0, 1)) %>% as.character() %>% as.integer())

(Deprecated) Setting independent priors for each species

In the previous version, models for all species shared the same priors. This may lead to built-in correlation of parameters among species. I tried to use uniform priors to fit an exploratory model for each species using data from the control, and use the mean and sd of inferred min, start, end, and speed in priors for mu_min, mu_start, mu_end, and mu_speed, respectively.

dat_14 <- dat_all %>%
  filter(species == "queru")
df_MCMC_14 <- calc_bayes_fit(
  data = dat_14 %>%
    filter(
      canopy == if_else("closed" %in% (dat_14 %>% pull(canopy) %>% unique()), "closed", "open"),
      heat == "_",
      water == "_"
    ) %>%
    mutate(tag = "training"),
  version = 14 # uniform prior, reparameterized
)
write_rds(df_MCMC_14, "alldata/intermediate/shootmodeling/df_MCMC_14.rds")
df_MCMC_14 <- read_rds("alldata/intermediate/shootmodeling/df_MCMC_14.rds")

mat_prior <- summ_mcmc(df_MCMC_14,
  option = "all",
  stats = "mean"
) %>%
  column_to_rownames(var = "param") %>%
  as.matrix()

mat_prior
##               mean          sd
## min     1.49398157 0.074656741
## start 121.98624641 3.051821367
## end   168.89969484 2.910885103
## speed   0.05330639 0.005509204
## sigma   1.09351416 0.043747358

Although these empirical informative priors are close to the informative priors I previously prescribed, this is not a standard practice because 1) I was using the same data twice and 2) I used a different model for exploration from the model for inference.

Uniform prior instead of informative normal priors

I also tried the same broad uniform priors for all species, now coded in Models 18, 19, 20.

dat_18 <- dat_all %>%
  filter(species == "queru")
df_MCMC_18 <- calc_bayes_fit(
  data = dat_18 %>% mutate(tag = "training"),
  version = 18, # with uniform priors, regular parameterization
  num_iterations = 50000,
  nthin = 5
)
write_rds(df_MCMC_18, "alldata/intermediate/shootmodeling/df_MCMC_18.rds")
df_MCMC_18 <- read_rds("alldata/intermediate/shootmodeling/df_MCMC_18.rds") %>% tidy_mcmc(dat_18)
p_bayes_diagnostics <- plot_bayes_diagnostics(df_MCMC = df_MCMC_18)
p_bayes_diagnostics$p_MCMC
p_bayes_diagnostics$p_coefficient

Mixing and inference very similar to those from normal priors. Can consider.

(Deprecated) Reparameterizing with start, end, and speed

In previous models, the logistic function was parameterized with min, asym, xmid, and logk. I tried to reparameterize the logistic function with min, start, end, and speed. This was intended to make the inference more interpretable and to break the built-in correlations between treatment effects on the derived parameters e.g., start and end, duration and speed.

The reparameterization compiled much slower, had worse mixing, did not improve the model fit, and the inferred parameters did not look reasonable. I decided to keep the original parameterization with asym, xmid, and logk and proceed with post-hoc analyses.

Here I show the poor mixing from the reparameterized models (15, 16, and 17).

dat_15 <- dat_all %>%
  filter(species == "queru")
df_MCMC_15 <- calc_bayes_fit(
  data = dat_15 %>% mutate(tag = "training"),
  mat_prior = mat_prior,
  version = 15, # with intuitive parameterization
  num_iterations = 50000,
  nthin = 5
)
write_rds(df_MCMC_15, "alldata/intermediate/shootmodeling/df_MCMC_15.rds")
df_MCMC_15 <- read_rds("alldata/intermediate/shootmodeling/df_MCMC_15.rds") %>% tidy_mcmc(dat_15)
p_bayes_diagnostics <- plot_bayes_diagnostics(df_MCMC = df_MCMC_15)
p_bayes_diagnostics$p_MCMC
p_bayes_diagnostics$p_posterior
p_bayes_diagnostics$p_coefficient

Improving visualization of treatment effects

Now I make marginal predictions for each treatment combination, regardless of site and year. I hide 1.7C warming treatments and hide predictive intervals for a simpler plot. I also predict over regular grids of 91, 96, …, 210 DOYs and predicted 1,000 iterations to make the plot smoother.

df_pred_18_marginal <- calc_bayes_predict(
  data = dat_18 %>%
    distinct(heat_trt, water_trt, canopy_code, doy) %>%
    group_by(heat_trt, water_trt, canopy_code) %>%
    complete(doy = seq(91, 210, 5)) %>%
    ungroup(),
  df_MCMC = df_MCMC_18,
  version = 18,
  marginal = T,
  num_iter = 1,
  random = F
)
## [1] "1 out of 1"
p_bayes_predict_marginal <- plot_bayes_predict(
  data = dat_18,
  data_predict = df_pred_18_marginal,
  marginal = T,
  fewer_heat = T,
  vis_log = T,
  vis_ci = F
)

p_bayes_predict_marginal$p_predict

Generalize to all species

calc_bayes_all(
  data = dat_all,
  independent_priors = F, # do not use species-specific empirical informative priors
  uniform_priors = T, # use uniform priors
  intui_param = F, # regular parameterization with asym, xmid, logk
  num_iterations = 50000,
  nthin = 5,
  path = "alldata/intermediate/shootmodeling/uni/",
  num_cores = 20
)
calc_bayes_derived(path = "alldata/intermediate/shootmodeling/uni/")
plot_bayes_all(path = "alldata/intermediate/shootmodeling/uni/")

Here I only summarize the results from species with full factorial design (2 canopy conditions, 2 water treatments).

df_bayes_all <- read_bayes_all(path = "alldata/intermediate/shootmodeling/uni/", full_factorial = T, derived = T)

p_bayes_summ <- plot_bayes_summary(df_bayes_all, plot_corr = F)
p_bayes_summ$p_coef_line