Add hyperparameter tuning plot functionality (and maybe other plots) #122

Closed
BTopcuoglu opened this issue Jul 22, 2020 · 12 comments

Comments

@BTopcuoglu
Collaborator

As @zenalapp suggested, it would be a good idea to export a figure that shows users whether the hyperparameter search range has been explored fully (e.g. trying cost values across the whole range until we see a global maximum in AUROC).

@zenalapp changed the title from "Add hyperparameter tuning plot functionality" to "Add hyperparameter tuning plot functionality (and maybe other plots)" on Sep 9, 2020
@zenalapp
Collaborator

zenalapp commented Sep 9, 2020

@kelly-sovacool and I discussed having plots in the package that plot the output of multiple ML runs. Current ideas:

  • Definitely a hyperparameter tuning plot (1 hyperparameter and 2 hyperparameters).
  • Maybe a boxplot of AUROCs.
  • Maybe a boxplot of AUPRCs.
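
A rough sketch of what the one-hyperparameter version could look like. Everything here is an assumption for illustration: the input is a hypothetical data frame with one row per seed and hyperparameter value (columns `seed`, `cost`, `auroc`), and `summarise_hp()` and `plot_hp()` are made-up names, not existing package functions. The two-hyperparameter case is not covered.

```r
library(ggplot2)

# Hypothetical helper: mean and SD of AUROC for each `cost` value across seeds.
# The column names `cost` and `auroc` are assumptions.
summarise_hp <- function(results) {
  means <- tapply(results$auroc, results$cost, mean)
  sds <- tapply(results$auroc, results$cost, sd)
  data.frame(
    cost = as.numeric(names(means)),
    mean_auroc = as.vector(means),
    sd_auroc = as.vector(sds)
  )
}

# Hypothetical helper: line plot of mean AUROC against cost on a log scale
plot_hp <- function(hp_summary) {
  ggplot(hp_summary, aes(x = cost, y = mean_auroc)) +
    geom_line() +
    geom_point() +
    geom_errorbar(aes(ymin = mean_auroc - sd_auroc,
                      ymax = mean_auroc + sd_auroc),
                  width = 0.1) +
    scale_x_log10() +
    labs(x = "cost (log scale)", y = "Mean AUROC across seeds") +
    theme_bw()
}

# Toy data, made up for illustration only
toy <- data.frame(
  seed = rep(1:3, each = 4),
  cost = rep(c(0.01, 0.1, 1, 10), times = 3),
  auroc = c(0.60, 0.70, 0.80, 0.75,
            0.62, 0.72, 0.78, 0.77,
            0.58, 0.68, 0.82, 0.73)
)
hp_summary <- summarise_hp(toy)
hp_plot <- plot_hp(hp_summary)

# If the best value sits at the edge of the grid, the range is probably too narrow
best_cost <- hp_summary$cost[which.max(hp_summary$mean_auroc)]
at_edge <- best_cost %in% range(hp_summary$cost)
```

In this toy example the peak falls in the middle of the grid, which is the pattern the plot is meant to make visible. A peak at either end would suggest widening the range.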

@BTopcuoglu
Collaborator Author

I have some ggplot code to make nice dotplots with mean/median stats for AUPRC and AUROC values that we can implement if we want.
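
For reference, a minimal sketch of that kind of dotplot. This is not the code mentioned above; the `perf` data frame, its column names, and the values in it are made up for illustration.

```r
library(ggplot2)

# Toy data: one row per seed and metric (values are invented)
perf <- data.frame(
  seed = rep(1:5, times = 2),
  metric = rep(c("AUROC", "AUPRC"), each = 5),
  value = c(0.80, 0.82, 0.78, 0.85, 0.79,
            0.70, 0.66, 0.72, 0.69, 0.75)
)

perf_plot <- ggplot(perf, aes(x = metric, y = value)) +
  # One dot per seed, jittered horizontally only
  geom_jitter(width = 0.1, height = 0, alpha = 0.6) +
  # Horizontal bar at the median (crossbar needs ymin and ymax as well)
  stat_summary(fun = median, fun.min = median, fun.max = median,
               geom = "crossbar", width = 0.4, colour = "steelblue") +
  # Diamond at the mean
  stat_summary(fun = mean, geom = "point", shape = 18, size = 4,
               colour = "firebrick") +
  labs(x = NULL, y = "Performance across seeds") +
  theme_bw()

# The same medians as numbers, e.g. for a caption or table
medians <- tapply(perf$value, perf$metric, median)
```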

@kelly-sovacool
Member

@BTopcuoglu do you want to get started on making a dotplot function based on your code then?

@BTopcuoglu self-assigned this on Sep 10, 2020
@BTopcuoglu
Collaborator Author

Now that I've thought about this a little, this might be a better fit for the Snakemake workflow, because the tuning results would not mean much if they come from only one seed. The best hyperparameter in one data split might not be the best in another. Similarly, the AUROC plots would make sense for averages/medians over 100 data splits but not for a single data split.

@kelly-sovacool
Member

Any plots which are better for multiple seeds should take a dataframe with each row as the result from one seed. We should probably include a function to merge results like the merge_results rule in https://github.com/SchlossLab/mikRopML-snakemake-workflow.
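
A minimal sketch of such a merge step, assuming each seed's run wrote its results to a CSV file with identical columns. The function name `merge_seed_results()` and the file layout are hypothetical; this is not the workflow's actual `merge_results` rule.

```r
# Hypothetical helper: row-bind per-seed result files into one data frame,
# one row per seed. Assumes every file has the same columns.
merge_seed_results <- function(files) {
  results <- lapply(files, read.csv)
  do.call(rbind, results)
}

# Toy example: write two fake per-seed result files, then merge them
tmp <- tempdir()
for (s in 1:2) {
  write.csv(data.frame(seed = s, auroc = 0.7 + s / 10),
            file.path(tmp, paste0("results_seed", s, ".csv")),
            row.names = FALSE)
}
files <- file.path(tmp, paste0("results_seed", 1:2, ".csv"))
merged <- merge_seed_results(files)
```

The merged data frame can then be passed straight to any plotting function that expects one row per seed.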

@kelly-sovacool
Member

@BTopcuoglu have you pushed the progress you've made?

@BTopcuoglu
Collaborator Author

I do have some code for hyperparameter tuning too, but it looks pretty bad right now :) https://github.com/SchlossLab/Topcuoglu_ML_mBio_2020/blob/master/code/learning/FigureS2.R

@kelly-sovacool
Member

Does anyone have example code for feature importance plots? Would be nice to show an example in the Snakemake workflow, regardless of whether we include it in the package.

@BTopcuoglu
Collaborator Author

BTopcuoglu commented Oct 9, 2020

library(dplyr)
library(ggplot2)

# `data` has a `names` column with the name of each feature (or group of features).
# `data` has an `auc_diff` column with (real AUC - permuted AUC) for each datasplit.

perm_top10 <- data %>%
  group_by(names) %>%
  summarise(median = median(auc_diff),
            iqr_AUC = IQR(auc_diff),
            mean = mean(auc_diff),
            se = sd(auc_diff) / sqrt(n())) %>%
  # Arrange from highest to lowest median difference
  arrange(desc(median)) %>%
  # Keep only the 10 features with the largest median difference
  head(n = 10)

######################################################################
# Plot the feature importances based on permutation importance       #
######################################################################

# ggplot2 bar plot of the mean difference, with standard-error bars
plot <- ggplot(perm_top10, aes(reorder(names, mean), mean)) +
  geom_col(width = 0.25, fill = "steelblue") +
  geom_hline(yintercept = 0, color = "black") +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) +
  xlab("Features") +
  ylab("Mean difference between test and permuted AUROC") +
  coord_flip() +
  # theme_bw() goes first: a complete theme added later would reset these tweaks
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text.x = element_text(size = 10, colour = "black"),
        axis.text.y = element_text(size = 10, colour = "black"),
        axis.title.x = element_text(size = 12, vjust = 0),
        axis.title.y = element_text(size = 12, vjust = 0.5),
        legend.text = element_text(size = 13))

@kelly-sovacool added the "feature" label (A new feature request or enhancement) on Oct 16, 2020
@kelly-sovacool
Member

@BTopcuoglu: @pschloss was asking about when we might have plots for hyperparameter tuning incorporated. Would be helpful for @courtneyarmour's project.

I know we'll also need to document tuning better (#201).

@zenalapp
Collaborator

Made a draft in branch iss-122_hp-plot. @BTopcuoglu feel free to modify if you'd like!

@BTopcuoglu
Collaborator Author

> Made a draft in branch iss-122_hp-plot. @BTopcuoglu feel free to modify if you'd like!

Working on it now.
