A-Hitchhikers-Guide-To-Presenting-Modern-Data-Solutions

Topic: A Hitchhiker's Guide To Presenting Modern Data Solutions

Why this topic?: Tabular data still accounts for a major share of data science analysis in applied work, and data science practitioners are often unable to devise a framework for presenting their findings and analysis while digging into a dataset. This guide provides building blocks for an effective narrative that addresses all stakeholders involved.

Methodology: A typical step-by-step approach is to:
– Take tabular data as input and provide a data glossary along with a preview in data frame format.
– Perform exploratory data analysis to understand trends.
– Conduct statistical tests using hypothesis testing on a sample of the dataset.
– Engineer features for machine learning training.
– Analyze the model results via SHAP values.
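The first and third steps above can be sketched in a few lines of pandas and NumPy. This is a minimal illustration, not the project's actual code: the toy dataset, the column names, the `descriptions` mapping, and the helpers `build_glossary` and `permutation_test` are all hypothetical, and a permutation test on the difference in means is used here as one simple choice of hypothesis test, since the guide does not say which tests it applies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset standing in for the real tabular input.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "group": ["A"] * 50 + ["B"] * 50,
    "spend": np.concatenate([rng.normal(100, 10, 50), rng.normal(110, 10, 50)]),
})

# Hypothetical column descriptions; a real glossary comes from the data owner.
descriptions = {
    "group": "Customer segment label",
    "spend": "Spend per customer",
}


def build_glossary(frame, descriptions):
    """Return one row per column: name, dtype, missing count, description."""
    return pd.DataFrame({
        "column": list(frame.columns),
        "dtype": [str(t) for t in frame.dtypes],
        "missing": frame.isna().sum().to_numpy(),
        "description": [descriptions.get(c, "") for c in frame.columns],
    })


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    perm_rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    extreme = 0
    for _ in range(n_permutations):
        shuffled = perm_rng.permutation(pooled)
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Step 1: data glossary plus a preview in data frame format.
glossary = build_glossary(df, descriptions)
preview = df.head()

# Step 3: hypothesis test on a sample of the dataset.
sample = df.sample(n=60, random_state=0)
p_value = permutation_test(
    sample.loc[sample["group"] == "A", "spend"],
    sample.loc[sample["group"] == "B", "spend"],
)

print(glossary)
print(preview)
print(f"p-value for difference in mean spend: {p_value:.4f}")
```

In a Streamlit application, the `glossary` and `preview` frames could be shown with `st.dataframe`, with the p-value reported next to a plain-language statement of the null hypothesis for non-technical stakeholders.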

Lessons Learned: Using SweetViz and YData_Profiling (formerly Pandas Profiling) for EDA, hypothesis testing, navigating Scikit-Learn’s documentation, building an end-to-end Streamlit application, and using SHAP values for model interpretation.