active_ecosites.Rmd

---
title: "active_ecosites"
author: "Nathan Roe"
date: "2/22/2022"
output: html_document
params:
  MLRA: 'placeholder'
  MLRA2: 'placeholer'
---

This script is designed to determine which ecosites are actively correlated to a soil component
(only major components and non-miscellaneous area components) in your MLRA. It builds off of Step 2
in the 'EDIT ecosite data' work flow (https://github.com/natearoe/EDIT_ecosite_data). 

You will need to identify the MLRAs of interest on line 25.

library
```{r, message=FALSE, warning=FALSE}
library(dplyr)
library(stringr)
```

Concatenate your MLRAs of interest. There can be one or more. It should have three numeric digits. For example, MLRA 18 would be 018. MLRA 22A would be 022A. MLRA 109 would be 109.
```{r}
MLRAs <- c("022", "018")

NASIS_report <- read.csv("./ecosite_report.csv")
```

non-miscellaneous areas
```{r}
NASIS_report_reduced <- NASIS_report  %>% filter(compkind != "miscellaneous area")
```

which ecosites are active, and how many correlations are they associated with?
```{r}
ecosite_df <- NASIS_report_reduced |> dplyr::group_by(ecosite_id) |> 
  dplyr::summarise(Numb.Components = n(), 
                   Maj.Components = sum(majcompflag == "yes"),
                   Min.Components = sum(majcompflag == "no")) |> dplyr::arrange(desc(Numb.Components))

print.data.frame(ecosite_df)
```

are components from your MLRA of interest correlated to ecosites from other MLRAs?
```{r}
ecosite_MLRAs <- unique(str_sub(ecosite_df$ecosite_id, start = 2L, end = 4L))
ecosite_MLRAs
```

Reduce to ecosites in MLRAs of interest
```{r}
MLRAs_vector <- NULL

 for(i in seq(MLRAs)){
  MLRAs_new <- stringr::str_subset(ecosite_df$ecosite_id, pattern = paste("(?<=R|F{1})", MLRAs[i], sep = ""))
  MLRAs_vector <- c(MLRAs_vector, MLRAs_new)
 }

```

create concatenation of ecosites for use in the next step of the 'EDIT ecosite data' workflow. 
```{r}
options(useFancyQuotes = FALSE)
ecosite_concat <- paste("c(", paste(sQuote(MLRAs_vector), collapse = ", "), ")")
ecosite_concat
```

This list of ecosites is going to be important in the 'EDIT ecosite data' process. You should select/copy the concatenated output from
c( to ). For example c('F018XC201CA', 'F018XI205CA', 'R018XI163CA'). You do not want the outer most quotation marks. 

Important: Perform QC on this list of ecosites. Sometimes, you might find that a component was correlated to two ecosites. This could result in something like, 'R018XI163CA & R018XD076CA'. These should be removed. Any other ecosites that are clearly erroneous should be removed as well.