datatales-with-pankaj/A-Hitchhikers-Guide-To-Presenting-Modern-Data-Solutions

A-Hitchhikers-Guide-To-Presenting-Modern-Data-Solutions

Topic: A Hitchhiker's Guide To Presenting Modern Data Solutions

Why this topic?: Tabular data still accounts for a large share of applied data science work, yet practitioners often struggle to devise a framework for presenting their findings as they dig into a dataset. This guide provides building blocks for turning that analysis into a narrative that addresses all stakeholders involved.

Methodology: A typical step-by-step approach is to:

- Take tabular data as input and provide a data glossary along with a preview in data frame format.
- Perform exploratory data analysis to understand trends.
- Conduct statistical tests using hypothesis testing on a sample of the dataset.
- Engineer features for machine learning training.
- Analyze the model results via SHAP values.
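The hypothesis-testing step can be illustrated with a minimal sketch. The guide does not say which test it uses, so this example assumes a two-sided permutation test for a difference in means and uses only the Python standard library. The `rows` table, its `plan` and `spend` columns, and the `permutation_test` helper are hypothetical and stand in for a sample drawn from a real dataset.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (mean of A minus mean of B) and an
    estimated p-value under the null hypothesis that group labels are
    exchangeable.
    """
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values and re-split them into two groups.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical sample of a tabular dataset (one dict per row).
rows = [
    {"plan": "basic", "spend": 12.0},
    {"plan": "basic", "spend": 15.5},
    {"plan": "basic", "spend": 11.0},
    {"plan": "basic", "spend": 14.0},
    {"plan": "basic", "spend": 13.5},
    {"plan": "premium", "spend": 18.0},
    {"plan": "premium", "spend": 21.5},
    {"plan": "premium", "spend": 19.0},
    {"plan": "premium", "spend": 22.0},
    {"plan": "premium", "spend": 20.5},
]

basic = [r["spend"] for r in rows if r["plan"] == "basic"]
premium = [r["spend"] for r in rows if r["plan"] == "premium"]

observed_diff, p_value = permutation_test(basic, premium)
print(f"observed difference in mean spend: {observed_diff:.2f}")
print(f"estimated p-value: {p_value:.4f}")
```

A permutation test is used here because it makes no distributional assumptions and needs no extra libraries. A parametric test from a statistics package may be a better fit when its assumptions hold.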

Lessons Learned: Using SweetViz and ydata-profiling (formerly Pandas Profiling) for EDA, hypothesis testing, navigating Scikit-Learn's documentation, building an end-to-end Streamlit application, and using SHAP values for model interpretation.
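To give a feel for what SHAP values measure, the sketch below computes exact Shapley values for a tiny hand-written model by averaging each feature's marginal contribution over all feature orderings. It does not use the `shap` library or its API, which relies on faster algorithms for real models. The `toy_model` scorer, the feature names, and the single `baseline` reference row are all assumptions made for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values relative to a single baseline row.

    For every ordering of the features, switch each feature from its
    baseline value to the instance value and record how much the
    prediction changes. The average change per feature is its Shapley
    value. Cost grows factorially, so this only suits a few features.
    """
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(x):
    # Hypothetical linear scorer standing in for a trained model.
    return 2.0 * x["tenure"] + 0.5 * x["spend"] - 1.0 * x["tickets"]


instance = {"tenure": 3.0, "spend": 40.0, "tickets": 2.0}
baseline = {"tenure": 1.0, "spend": 20.0, "tickets": 0.0}

phi = shapley_values(toy_model, instance, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.2f}")

# The contributions add up to the gap between the two predictions.
gap = toy_model(instance) - toy_model(baseline)
print(f"sum of contributions: {sum(phi.values()):.2f}, prediction gap: {gap:.2f}")
```

For a linear model each contribution reduces to the coefficient times the feature's distance from the baseline, which makes the output easy to verify by hand.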