Milestone 1 Review #36

Open
mohamad-amin opened this issue Dec 1, 2021 · 4 comments

@mohamad-amin

Good job! Here is my feedback for the milestone 1 assessment.

  1. Project proposal: reasoning
    You might need to pay more attention to these parts:
  • "Clearly state the research question and any natural sub-questions you need to address, and their type." In your proposal, have you analyzed the different situations that might arise when working with textual data? Why do you use logistic regression if you are facing a classification problem? If you are doing classification rather than regression, why do you report an AUC score? Moreover, details such as the AUC score are not very accessible to a less technical reader.
  • What about data visualization? What specifically are you going to do?
  • For these algorithms, what packages will you use? Have you thought of using wrapper algorithms (such as the Boruta algorithm) for feature selection?

  2. Exploratory data analysis in a literate code document: VIZ
    Have you looked at the HTML report file that you provided? It does not open on GitHub. First, your report should open on GitHub so that everyone can see it. Second, you don't need to convert it to HTML; that is why it breaks. Please do not convert your notebooks to HTML files again.

  3. Exploratory data analysis in a literate code document: QUALITY

  • It's nice that you used the pandas profiling tool, but what is the motivation for the things you did? How do you want to handle the missing values? What did you infer from your analysis? Just plotting the results without any interpretation seems a bit pointless.
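As an illustration of the missing-value question above, here is a minimal sketch of summarizing and then handling missing values with plain pandas. The data and the column names (`brand`, `style`, `stars`) are hypothetical and not taken from the project, and dropping or filling are only two of several possible strategies.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "brand": ["A", "B", None, "D"],
    "style": ["Cup", None, None, "Pack"],
    "stars": [3.5, 4.0, None, 5.0],
})

# Count and percentage of missing values per column, worst first.
missing = pd.DataFrame({
    "n_missing": df.isna().sum(),
    "pct_missing": df.isna().mean() * 100,
}).sort_values("n_missing", ascending=False)
print(missing)

# Option 1: drop rows where the target (here assumed to be "stars") is missing.
dropped = df.dropna(subset=["stars"])

# Option 2: fill the remaining categorical gaps with an explicit placeholder.
filled = dropped.fillna({"brand": "unknown", "style": "unknown"})
print(filled)
```

A table like `missing` is static, so it displays directly in a notebook viewed on GitHub, and stating which option was chosen and why gives the reader the motivation behind the step.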
@PANDASANG1231
Owner

PANDASANG1231 commented Dec 2, 2021

@mohamad-amin Hey, thank you for the feedback. It is really helpful.

I think your points are very clear. Just one question: I am not sure I understand this sentence correctly.
"Why do you use logistic regression if you are facing a classification problem?" Although logistic regression has "regression" in its name, it is not actually a regression algorithm: it passes the linear output through a sigmoid (logistic) function, which makes it a binary classification algorithm. So do you mean we should try other classification algorithms besides LR, or do you think LR is not a good algorithm for classification? Thanks.
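To make the point concrete, here is a minimal pure-Python sketch of how logistic regression acts as a binary classifier and why an AUC score applies to it. The weights, bias, and toy data are made up for illustration, and the helper names (`predict_proba`, `predict_label`, `auc`) are hypothetical rather than taken from the project.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one example."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_label(weights, bias, features, threshold=0.5):
    """Turn the probability into a 0/1 class label."""
    return int(predict_proba(weights, bias, features) >= threshold)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up model parameters and toy data, for illustration only.
weights, bias = [2.0, -1.0], -0.5
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
y = [1, 0, 1, 0]

probs = [predict_proba(weights, bias, x) for x in X]
labels = [predict_label(weights, bias, x) for x in X]
print(labels)
print(auc(y, probs))
```

The AUC is computed from the predicted probabilities, not from the 0/1 labels, which is why it is a natural metric for a probabilistic classifier like this one.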

@mohamad-amin
Author

Hey, sorry, isn't your problem inherently a regression problem (the ramen ratings)?
I assumed it would be a numerical rating. Am I wrong?

@PANDASANG1231
Owner

Yeah, in the end we changed it into a binary classification problem. Maybe we can state that more clearly in the summary.
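For readers of the summary, a minimal sketch of what turning a numerical rating into a binary target can look like. The threshold of 3.5 and the function name `binarize_ratings` are assumptions chosen for illustration; the thread does not say which cutoff the project uses.

```python
def binarize_ratings(ratings, threshold):
    """Label a rating 1 if it is at or above the threshold, else 0.

    Missing ratings (None) are skipped rather than guessed.
    """
    return [int(r >= threshold) for r in ratings if r is not None]


# Hypothetical ratings and cutoff, for illustration only.
ratings = [5.0, 3.25, None, 3.5, 1.0]
targets = binarize_ratings(ratings, threshold=3.5)
print(targets)
```

Stating the cutoff explicitly in the proposal would make it clear why a classifier, and a metric such as AUC, is being used on what looks like numerical data.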

@datallurgy
Collaborator

Hi @mohamad-amin!

Re: Comment 5: Pandas-profiling does not render in the ipynb file and only exports to HTML and JSON (see the pandas-profiling to_file documentation). I understand it's not ideal, since the HTML does not render on GitHub because it's interactive, but the file is easily downloadable and you can open it in a browser. It doesn't print nicely to PDF either; we had considered uploading a PDF of the EDA as well.

What would be your recommendation for rendering pandas-profiling reports?
