Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VIF #26

Merged
merged 16 commits into from
Sep 6, 2023
Merged

VIF #26

merged 16 commits into from
Sep 6, 2023

Conversation

palday
Copy link
Member

@palday palday commented Sep 5, 2023

as discussed here: JuliaStats/GLM.jl#428 (comment)

@codecov-commenter
Copy link

codecov-commenter commented Sep 5, 2023

Codecov Report

Patch has no changes to coverable lines.

Files Changed Coverage
src/regressionmodel.jl ø

📢 Thoughts on this report? Let us know!.

src/regressionmodel.jl Outdated Show resolved Hide resolved
is not particularly informative anyway.
"""
function vif(model::RegressionModel)
mm = Statistics.cov2cor!(vcov(model), stderror(model))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the model stores its covariance matrix directly such that vcov(model) is non-copying, this call could effectively invalidate the model object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a copy step

src/regressionmodel.jl Outdated Show resolved Hide resolved
src/regressionmodel.jl Outdated Show resolved Hide resolved
Comment on lines 171 to 173
# This generic function is owned by StatsModels.jl, which is the sole provider
# of the default definition. StatsModels needs to own the default definition
# because it depends on the @formula interface.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should only define gvif in StatsModels? It seems unlikely that modeling packages will define custom gvif functions without depending on StatsModels given that it depends on formulas.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine with me -- and we can always move it here if we decide that's appropriate in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm almost tempted to move vif to StatsModels too. We have a tradition of defining simple generic fallbacks in StatsAPI which is somewhat unfortunate as StatsAPI is supposed to be an interface package. Currently the two largest functions are r2 and adjr2, which are still relatively short. vif would be about as large. Opinions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable to me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kleinschmidt thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine by me, that seems like a reasonable home for it given that StatsModels is not trying to wrap things without being a direct dependency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that having teh name owned by StatsAPI is good though, just some basic implementation in StatsModels

test/statisticalmodel.jl Outdated Show resolved Hide resolved
@palday palday changed the title VIF and GVIF VIF Sep 6, 2023
@palday palday requested a review from nalimilan September 6, 2023 07:51
src/regressionmodel.jl Outdated Show resolved Hide resolved
Comment on lines 171 to 173
# This generic function is owned by StatsModels.jl, which is the sole provider
# of the default definition. StatsModels needs to own the default definition
# because it depends on the @formula interface.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm almost tempted to move vif to StatsModels too. We have a tradition of defining simple generic fallbacks in StatsAPI which is somewhat unfortunate as StatsAPI is supposed to be an interface package. Currently the two largest functions are r2 and adjr2, which are still relatively short. vif would be about as large. Opinions?

@palday
Copy link
Member Author

palday commented Sep 6, 2023

@nalimilan The name vif should still be owned by StatsAPI, but I could see moving the implementation to StatsBase. We don't actually need any of the StatsModels machinery for normal VIF and StatsModels is a fairly big package. But also the implementation here is not much code at all and only depends on Statistics, which I'm guessing most users have loaded if they're using this package.

@nalimilan
Copy link
Member

StatsBase doesn't contain code related to models anymore (it just reexports definitions from StatsAPI for historical reasons), and anyway most of it will be merged into Statistics as soon as we make it an upgradable stdlib. So I'd either put this in StatsAPI or in StatsModels.

We were precisely discussing the acceptability of having StatsAPI depend on Statistics with @bkamins at JuliaStats/Statistics.jl#87 (comment).

@palday
Copy link
Member Author

palday commented Sep 6, 2023

@nalimilan if the dependency on Statistics is part of the big question here, then it's easy enough to the row normalization directly and not use cov2cor! In fact, it could be more efficient than the current Statistics function because stderror is computed using vcov, so we have an extra evaluation here.

src/regressionmodel.jl Outdated Show resolved Hide resolved
src/regressionmodel.jl Outdated Show resolved Hide resolved
src/regressionmodel.jl Outdated Show resolved Hide resolved
@palday
Copy link
Member Author

palday commented Sep 6, 2023

After the discussion here and further discussion with @kleinschmidt, I think StatsAPI should have both the vif and gvif stubs so that people can extend those methods without taking on StatsModels as a dependency. That said, the default implementations for RegressionModel will live in StatsModels. It's not piracy, it's privateering.

src/StatsAPI.jl Outdated Show resolved Hide resolved
src/regressionmodel.jl Outdated Show resolved Hide resolved
src/regressionmodel.jl Outdated Show resolved Hide resolved
end # module TestRegressionModel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also be able to revert changes to this file and the next one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the missing terminal newlines? I think those should be required FWIW

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@kleinschmidt kleinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on board with defining the names here and the generic implementations in StatsModels.jl

palday and others added 2 commits September 6, 2023 15:38
@palday palday merged commit 20b38e1 into main Sep 6, 2023
@palday palday deleted the pa/vif branch September 6, 2023 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants