Regression #16

laljeet · 2022-07-08T13:32:00Z

Predictor Variables:

Mean, max, Min Temperature
Extreme temp days (Above 30 degrees C)
PPT
Number of dry days
Gini Coeff
Shannon Diversity Index

Response Variable:

Could be any of

DEQ_Irrigation_withdrawals (Currently exploring) If meteorology affects reporting
Unreported withdrawals
All withdrawals

Equation used (confirm with Dr.Shortridge) :

lmm2 <- lmer(DEQ_Irrigation_withdrawals ~ PPT + Irrigation+Min_Temp+Mean_Temp+Max_Temp+Zero_PPT_days+ Extreme_days+SDI+GC+(1+YEAR|COUNTYFP),
data=dat2, REML = FALSE)

The effect of YEAR will vary between COUNTYFP. Random intercepts for YEAR, random slopes for COUNTYFP influenced by YEAR.??

rburghol · 2022-07-15T13:24:55Z

@laljeet is the DEQ irrigation here the one normalized by median historic withdrawal? Is it by county total? Thanks!

laljeet · 2022-07-15T13:28:11Z

In the above equation (lmer), it is by county. No transformations.

rburghol · 2022-07-15T13:32:12Z

Cool. Can we see a plot of this? Can we test for leverage of big or small values?

laljeet · 2022-07-25T18:20:46Z

Log Transformation of Response variables

Correlation plot for predictor variables.

Irrigation and precip are correlated, which is expected.

The same is confirmed by VIF.

vif.mer(lmm2)
lmm2 here is:
lmer(log(Total_irrigation_withdarwals_calculated) ~ PPT + Irrigation+Min_Temp+Mean_Temp+Max_Temp+Zero_PPT_days+ Extreme_days+SDI+GC+(YEAR|COUNTYFP), data=dat2)

rburghol · 2022-07-28T12:26:17Z

Thanks for sharing this.

By plot, I'm looking for DEQ irrigation withdrawals on one axis, and predicted withdrawals on the other. Scatterplot.
The other thing that I would ask is are we not normalizing by median ag census area reported, no?
Do the stats on PPT as compared to irrigation suggest that they are highly autocorrelated -- The numbers appear virtually identical? That is, irrigation is just (crop need - effective_PPT) * area, or am I thinking of a different data set?

laljeet · 2022-08-04T16:05:42Z

julieshortridge · 2022-08-12T15:37:16Z

Model form comparison - as a best practice, we should use an AIC criteria (I believe you can either do this within lme4 or the MuMin package) to investigate model form and eliminate unnecessary variables. Model forms to compare:

all variables, but removing mean_temp to keep VIF below 3 (what you've already done)
4-parameter model: PPT, T_max, Zero_PPT_days, and Extreme_days to capture average and extreme conditions for P+T
2-parameter model: PPT and T_max
1-parameter model: PPT (or irrigation, should be identical except for beta value but you can confirm this).

Do this for the Total Withdrawals and VDEQ Reported models. We'll mainly use the VDEQ reported results but should keep total just so we're consistent with what we scoped to USGS. We don't need to do the full scatterplot visualization and variable significance test for each form, but as a first step just create a table showing AIC values for each combination of parameters (all, 4, 2, and 1) and response variable (total and VDEQ reported withdrawals)

We can also test forms where we replace log(W) as the response variable with (W_obs)/(W_mean), where W_mean is the long-term average value for a county. In this case, we probably can replace the mixed effects model form with a simple OLS regression because all of the intercept terms would be equal to 1.

rburghol · 2022-08-16T01:00:46Z

Thanks for laying these out @julieshortridge -- @laljeet I can't recall if you were computing deficit on the same rolling 2-day window as effective precip, or aggregating it seasonally. Or did you do a variation of Irr where you're adding daily deficit? If so, it can quite possibly be different math.

FWIW, daily adding is certainly more appropriate imo in that if plants get no precip for 6 weeks then double precip for 3 weeks they won't make up that deficit.

As I mentioned in the meeting, my preference would simply be to accumulate running total (ET-rain) and then have the irrigation facilities pump that value. To elaborate, I think the model would work best if we accumulate deficit on a 3-7 day frequency (for computing total), and have irrigators pump 1-2 days per week. This may or may not fit well with this approach but I wanted to lay out more detail on how I think of that process.

laljeet · 2022-08-18T19:22:54Z

Regression Parameters: Based on VIF (VIF < 3) for DEQ and Total WIthdarwals model
Parameters selected:

PPT
Min_Temp
Zero_PPT_days
Extreme_days
SDI
GC

Selection of best model using AIC (lower AIC better model)
Five models were used (mentioned by Dr.Shortridge above) (T min was used instead of Tmax)

Using builtin AIC command: Model 2 using four parameters gave the best results.

	df	AIC
Deq_withdrawals_model1	12	3329.54
**Deq_withdrawals_model2	10	3326.949**
Deq_withdrawals_model3	8	3356.182
Deq_withdrawals_model4	7	3412.221
Deq_withdrawals_model5	7	3412.144

Total Withdrawals Model:

	df	AIC
Total_Withdrawals_Model_1	12	2641.302
**Total_Withdrawals_Model_2	10	2623.623**
Total_Withdrawals_Model_3	8	2748.969
Total_Withdrawals_Model_4	7	2885.136
Total_Withdrawals_Model_5	7	2884.645

Also, I tried the dropt1 command it highlights the significant terms but doesn't provide AIC values. The significant model is the same four-parameter model.

I also found another function in lmerTest that uses backward reduction: Here, the best-selected model was (Four parameter model without the random effect of (1|County)
log(DEQ_Irrigation_withdrawals) ~ PPT + Min_Temp + Zero_PPT_days + Extreme_days + (YEAR | COUNTYFP)

laljeet · 2022-08-18T19:38:50Z

Normalized model:
I normalized DEQ withdrawals with long-term county averages
The same model parameters were used for the normalized model:
glm(Norm_DEQ_withdarwals ~ PPT +Min_Temp+Zero_PPT_days+ Extreme_days+SDI+GC,
data=dat2)

used stepwise elimination for best fit
Best fit model:

Norm_DEQ_withdarwals ~ PPT + Min_Temp + Extreme_days + GC

Modeled withdrawals over predicts in this case:

laljeet · 2022-08-18T19:42:22Z

@rburghol Currently, here and before, we are using the crop water demand for the whole cropping season. We did look into it, but we didn't change methodology for the calculation of unreported withdrawals.

julieshortridge · 2022-08-18T21:12:57Z

Thanks Lal, and good work on the variable selection piece. To compare the lmer and normalized models, I'd suggest calculating an R2 value for each using the withdrawal values in MG (ie, don't calculate on the log values or the normalized values, and don't use the R2 values from the lmer or glm output directly). That will be a good complement to the scatterplots.
Rob, thank you for clarifying what you were thinking regarding the accumulating deficit. I see what you're saying now, and you're right that if deficit is calculated that way, it wouldn't just be a linear transformation of rainfall. I can see what you're saying about that being a better estimate of actual growing season water demand. However, I do have concerns about the feasibility of accomplishing that given the time frame we have before the project ending and Lal's graduation. But we can discuss more tomorrow if you'd like.

rburghol · 2022-08-19T10:38:08Z

Thanks:

@laljeet "We did look into it, but we didn't change methodology for the calculation of unreported withdrawals."
- totally got it, just wanted to make sure I was clear on the method in use/limitations.
@julieshortridge "I do have concerns about the feasibility of accomplishing that given the time frame"
- most definitely. If it were already done this way, it would be a different story.
is the only difference in the most recent model the use of normalized WD data? So does this tell us that if we consider all by normalizing to make all counties data weight equally, those counties that traditionally have smaller irrigation totals reported have worse performance? This is interesting as somewhere in my brain I thought Accomack had a high level of error in a previous step (tho that recollection is fuzzy)
@julieshortridge question: why do you prefer the $R^2$ for the non-normalized over the normalized? I am thinking that both would be interesting?
what is striking about the normalized scatter plot is that it does not under-predict. I would like to compare both methods ( irm vs non-norm) side by side in non-log form to see clearly on implications. Thoughts:
- A model that avoids under-predicting, i.e. a no greater than model, can be extremely valuable in demand forecasting provided that it's not ludicrous in terms of over-predicting.
- Is it the normalizing or the switch from Max_temp to Min_temp that causes the over-prediction?
- A test of "is this a valuable no greater than estimate or is it just ludicrous" would be: does the model predict greater than the historical max in any single year?
- this technique could be extended to the facility/intake level quite easily and would be of interest.

Looking forward to the 1 variable model as well.

laljeet · 2022-08-25T14:31:12Z

Last week's meeting notes:
The best Liner mixed effects model: 4 parameter model
lmer(log(DEQ_Irrigation_withdrawals) ~ PPT +Min_Temp+Zero_PPT_days+ Extreme_days+(YEAR|COUNTYFP)+(1|COUNTYFP),
data=Deq_dat2)

R square = 0.87

Parameter	VIF
PPT	1.921367
Min_Temp	1.157851
Zero_PPT_days	1.810487
Extreme_days	1.276739

________________________________________________________________________________________________________________

Normalized Withdrawals Model: (Normalized by mean withdrawals)
glm(Norm_DEQ_withdarwals ~ PPT + Min_Temp + Extreme_days + GC, data = dat2)

Predicted withdrawals in (mgd)
dat2$predict = predict(Best_model)*dat2$Mean_DEQ_withdarwals

R Square = 0.91

Predicted normalized Model (Million gallons)

Predict normalized coefficient
dat2$predict =predict(Best_model)
R Square = 0.04

One parameter model reduced this R square to 0.03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regression #16

Regression #16

laljeet commented Jul 8, 2022 •

edited

Loading

rburghol commented Jul 15, 2022

laljeet commented Jul 15, 2022

rburghol commented Jul 15, 2022

laljeet commented Jul 25, 2022

rburghol commented Jul 28, 2022

laljeet commented Aug 4, 2022

julieshortridge commented Aug 12, 2022 •

edited

Loading

rburghol commented Aug 16, 2022

laljeet commented Aug 18, 2022

laljeet commented Aug 18, 2022

laljeet commented Aug 18, 2022

julieshortridge commented Aug 18, 2022

rburghol commented Aug 19, 2022 •

edited

Loading

laljeet commented Aug 25, 2022

Regression #16

Regression #16

Comments

laljeet commented Jul 8, 2022 • edited Loading

rburghol commented Jul 15, 2022

laljeet commented Jul 15, 2022

rburghol commented Jul 15, 2022

laljeet commented Jul 25, 2022

Log Transformation of Response variables

Correlation plot for predictor variables.

The same is confirmed by VIF.

rburghol commented Jul 28, 2022

laljeet commented Aug 4, 2022

julieshortridge commented Aug 12, 2022 • edited Loading

rburghol commented Aug 16, 2022

laljeet commented Aug 18, 2022

laljeet commented Aug 18, 2022

laljeet commented Aug 18, 2022

julieshortridge commented Aug 18, 2022

rburghol commented Aug 19, 2022 • edited Loading

laljeet commented Aug 25, 2022

laljeet commented Jul 8, 2022 •

edited

Loading

julieshortridge commented Aug 12, 2022 •

edited

Loading

rburghol commented Aug 19, 2022 •

edited

Loading