Tidy rules #16
Comments
This looks great! A PR would be very welcome. One thing: |
Thanks Max.
About the design:
Please suggest. |
With other functions that they share, I add them to So try to write the functions so that they share as much common code as possible, then add those common functions and the |
@topepo Please review this draft before PR submission.
|
It looks really good. Some recommendations/comments:
> cubist(x = train_pred[, 1:2], y = train_resp)$model %>% cat()
id="Cubist 2.07 GPL Edition 2018-09-01"
prec="1" globalmean="22.41485" extrap="1" insts="0" ceiling="95" floor="0"
att="outcome" mean="22.41" sd="9.284727" min="5" max="50"
att="crim" mean="3.789463" sd="8.553482" min="0.00906" max="88.9762"
att="zn" mean="11.38" sd="23.47519" min="0" max="100"
entries="1"
rules="2"
conds="1" cover="54" mean="12.27" loval="5" hival="27.9" esterr="3.96"
type="2" att="crim" cut="9.2322998" result=">"
coeff="13.25" att="crim" coeff="-0.11" att="zn" coeff="0.009"
conds="1" cover="350" mean="23.98" loval="8.1" hival="50" esterr="5.59"
type="2" att="crim" cut="9.2322998" result="<="
coeff="21.79" att="crim" coeff="-0.62" att="zn" coeff="0.105"
> cubist(x = train_pred[, 1:2], y = train_resp) %>% summary()
Call:
cubist.default(x = train_pred[, 1:2], y = train_resp)
Cubist [Release 2.07 GPL Edition] Sat Sep 1 17:43:44 2018
---------------------------------
Target attribute `outcome'
Read 404 cases (3 attributes) from undefined.data
Model:
Rule 1: [54 cases, mean 12.27, range 5 to 27.9, est err 3.96]
if
crim > 9.2323
then
outcome = 13.25 - 0.11 crim + 0.009 zn
Rule 2: [350 cases, mean 23.98, range 8.1 to 50, est err 5.59]
if
crim <= 9.2323
then
outcome = 21.79 - 0.62 crim + 0.105 zn
Evaluation on training data (404 cases):
Average |error| 5.65
Relative |error| 0.85
Correlation coefficient 0.50
Attribute usage:
Conds Model
100% 100% crim
100% zn
Time: 0.0 secs
Some testing code:
library(Cubist)
library(AmesHousing)
library(tidymodels)
ames <- make_ames()
ames2 <-
  ames %>%
  dplyr::rename(`Gr Liv Area` = Gr_Liv_Area) %>%
  mutate(
    Overall_Qual = gsub("_", " ", as.character(Overall_Qual)),
    MS_SubClass = gsub("_", " ", as.character(MS_SubClass))
  )
cb_mod <-
  cubist(
    x = ames2 %>% dplyr::select(-Sale_Price),
    y = log10(ames2$Sale_Price),
    committees = 3
  )
tr <- tidy_rules(cb_mod)
|
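For readers comparing the raw model text with the printed rules above, here is a minimal sketch of how the condition lines could be turned into rule strings. It is not the code from the draft or the PR. The helper `get_field` is hypothetical, the input lines are copied from the `$model` output shown earlier in this thread, and it assumes every condition line has the `type="2"` continuous-split layout seen there.

```r
# Lines copied from the cubist(...)$model output shown in this thread
model_txt <- c(
  'conds="1" cover="54" mean="12.27" loval="5" hival="27.9" esterr="3.96"',
  'type="2" att="crim" cut="9.2322998" result=">"',
  'coeff="13.25" att="crim" coeff="-0.11" att="zn" coeff="0.009"',
  'conds="1" cover="350" mean="23.98" loval="8.1" hival="50" esterr="5.59"',
  'type="2" att="crim" cut="9.2322998" result="<="',
  'coeff="21.79" att="crim" coeff="-0.62" att="zn" coeff="0.105"'
)

# Hypothetical helper: pull the first key="value" pair for `key` from each line
get_field <- function(x, key) {
  m <- regmatches(x, regexpr(paste0(key, '="[^"]*"'), x))
  sub(paste0("^", key, '="([^"]*)"$'), "\\1", m)
}

# Keep only the condition lines (assumed to start with type="2")
cond_lines <- grep('^type="2"', model_txt, value = TRUE)

# Assemble one LHS string per rule: attribute, comparison operator, cut point
lhs <- paste(
  get_field(cond_lines, "att"),
  get_field(cond_lines, "result"),
  get_field(cond_lines, "cut")
)

rules <- data.frame(rule = seq_along(lhs), LHS = lhs)
print(rules)
```

Rules with several conditions, categorical splits, or multiple committees would need more bookkeeping than this sketch does.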
Thanks Max,
I will submit a new PR shortly. edit: PR is here |
I think that you solved it with |
Hi Max,
I end up using C5 often for its speed and rules. Thanks for the package!
Although the summary function prints the rules in a handy way, it might sometimes be preferable to have them in a tidy format.
Rules displayed on calling the summary function:
Output of the tidying function:
Note that the LHS is a string that can be parsed as an R expression. Hence, it can simply be pasted into dplyr::filter.
Here is the code to tidy the rules and the code snippet to run an example.
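As a small illustration of the point about dplyr::filter (a sketch only, not the linked code): the rule string below is taken from the Cubist summary output in this thread, while the data frame `toy` and its values are made up. It assumes the dplyr and rlang packages are installed.

```r
library(dplyr)

# Made-up data with the two predictors that appear in the rules in this thread
toy <- data.frame(
  crim = c(0.5, 12.0, 9.0, 30.1),
  zn   = c(18, 0, 0, 0)
)

# A rule's LHS stored as a plain string, as in a tidy rules table
lhs <- "crim > 9.2323"

# Parse the string into an expression and splice it into filter()
covered <- toy %>% filter(!!rlang::parse_expr(lhs))
print(covered)
```

In base R, `toy[eval(parse(text = lhs), toy), ]` selects the same rows without the extra packages.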
Please suggest whether it might be a good idea to include this in the broom package instead of here. Otherwise, let me know if you are open to a PR.
Suggestions are welcome!
Regards,
Srikanth KS