Milestone 1 Feedback #39
Comments
Thank you @andytai7 for taking the time to write a detailed review of our milestone 1 submission; we really appreciate it. As a team, we are learning a lot from this group project, including its collaborative practices. However, we have difficulty understanding some of your observations. We would like clarification on them, and we think a closer look from both your side and ours is needed to address the gaps:
What about class balance? There could be an imbalance in the classes, in which case you would have to undersample or oversample. Which one will you use?
What about missing data? How will you handle the missing data?
Why only use linear regression? Have you thought of using wrapper algorithms (boruta algorithm) for feature selection?
Will you do cross-validation?
What about metrics? That was not touched upon in the project proposal.
5. Exploratory data analysis in a literate code document: VIZ
This needs a lot more work to flesh out why you are doing certain EDA and data visualization. In addition, these reasons should inform how you proceed in terms of methods (e.g., data transformation, data cleansing).
Given all of the above, we request that you take another look at our proposal and EDA and, if possible, regrade these sections.
Hi @shivajena, thank you for your comments! For future reference, all the TAs will include a "suggestion" section, where we give suggestions that fall outside the basic rubric. As you may know, these suggestions help with students' brainstorming and further development of their projects. Please keep this in mind when reading the comments you disagree with, as your group may not have lost ANY marks for the suggestions I have given. Below, I try my best to clear up some of my comments.
"Our project answers a prediction question: predicting giant pumpkin weights, which we have stated very explicitly in the project proposal in our README file. We have described our approach to data preparation for EDA, some preliminary but important EDA observations, and the link to the detailed EDA report. For example, we have mentioned the distributions of some of the attributes that could be our potential features, along with input and output forms, as per the rubric indicators for milestone 1. Therefore, we request that you take another look at this."
"To reiterate, we are dealing with a prediction problem for a continuous variable: giant pumpkin weight. This is not a classification problem, and hence we are not sure what exactly you wanted to convey about class balance."
"Yes, and in fact we have touched upon this, as well as the method of using pipe operators along with hyperparameter optimisation, in the predictive modelling section of our proposal. We request that you kindly take another look."
"Again, we are answering a prediction question, and ROC-AUC is a classification metric, not a regression metric. For the regression context, we have explicitly mentioned metrics such as R-square score and accuracy as our initial scoring metrics. We request that you take another look."
"We have provided figure captions and highlighted important observations in our EDA report. We can discuss the specifics further if needed."
When I look at your PDF file, there are no figure captions for your EDA. In addition, your EDA report does not reference any figures. Please see the attached screenshot for reference.
"For prediction problems where the number of features is small, data transformation and cleaning are very important, and that is why we have included detailed observations on these aspects in the data summary part of our EDA report. Had it been an inferential question or a prediction problem with a large number of features, it would have been a perfect case for incorporating specific EDA along with explanations. But as we understand it, this is not applicable to our project given such a limited number of features."
I have noticed that you mentioned your proposal. I don't see this document, even in your milestone 2. Thank you for your hard work, and I hope I have shed light on some of the confusion this group might have had.
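The cross-validation and regression-metric points raised in the quoted replies above can be illustrated with a minimal sketch. It assumes a single numeric feature, a hand-written least-squares fit, and synthetic data; the names `fit_line`, `k_fold_scores`, and the generated values are hypothetical and are not taken from the project's repository or its pumpkin dataset.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def k_fold_scores(xs, ys, k=5, seed=0):
    """Return one (R^2, RMSE) pair per held-out fold."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        preds = [intercept + slope * xs[i] for i in held_out]
        truth = [ys[i] for i in held_out]
        scores.append((r_squared(truth, preds), rmse(truth, preds)))
    return scores


# Synthetic, made-up data: a linear trend plus Gaussian noise.
rng = random.Random(42)
feature = [rng.uniform(200, 500) for _ in range(100)]
target = [3.0 * x - 250.0 + rng.gauss(0, 40) for x in feature]

scores = k_fold_scores(feature, target, k=5)
for fold, (r2, err) in enumerate(scores, start=1):
    print(f"fold {fold}: R^2 = {r2:.3f}, RMSE = {err:.1f}")
```

Each fold is scored only on rows the model did not see during fitting, which is the point of cross-validation; R^2 and RMSE are both defined for a continuous target, unlike ROC-AUC.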
Thanks Andy, it's much clearer now. We will discuss and implement the ideas. Actually, we thought the proposal was to be written in the README as per the instructions for milestone 1. But we do get your point, and we will try to be a bit more explicit in the README about our proposal plan, maybe with a proposal section. As for the rest, we will definitely discuss them within our group and update you. We really appreciate the time you have taken.
Sorry Andy, we thought the proposal was to be written in the README.md rather than in another document, as suggested in the Milestone 1 instructions here. The empty pumpkin.Rmd is just a placeholder for the final report in the first milestone, there only to show our proposed project structure. This should have been mentioned in the README.md as well. Please advise if we have to create another proposal document.
No, this should be fine. Thank you.
3. Project proposal: reasoning
Comments
What sort of EDA will you do? What types of plots? Why? Any hypotheses?
What about class balance? There could be an imbalance in the classes, in which case you would have to undersample or oversample. Which one will you use?
What about missing data? How will you handle the missing data?
Why only use linear regression? Have you thought of using wrapper algorithms (boruta algorithm) for feature selection?
Will you do cross-validation?
What about metrics? That was not touched upon in the project proposal.
One suggested metric for assessing the performance of your models is the Area Under the Curve (AUC), which measures the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve. Also, look into SHAP (SHapley Additive exPlanations), which explains the direction of each variable's effect on the outcome variable.
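To make the AUC suggestion concrete, here is a minimal sketch of the rank-based definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the example labels and scores are illustrative assumptions. The metric needs discrete class labels, so it would apply only if the target were framed as a classification problem.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 0/1 class labels; scores: classifier scores, higher = more positive.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores, for illustration only.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of examples, which is fine for a sketch but not for large datasets.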
5. Exploratory data analysis in a literate code document: VIZ
Comments
If figure captions are not provided, the plot should be clearly explained in the text. I would recommend using figure captions. The plots are also missing legends and x- and y-axis labels.
This needs a lot more work to flesh out why you are doing certain EDA and data visualization. In addition, these reasons should inform how you proceed in terms of methods (e.g., data transformation, data cleansing).
5. Exploratory data analysis in a literate code document: REASONING
Comments
The rationale is acceptable, but I don't know what plots relate to what.