Topic: A Hitchhiker's Guide To Presenting Modern Data Solutions
Why this topic?: Tabular data still accounts for a major share of data science analysis in applied work, and practitioners often struggle to devise a framework for presenting their findings while digging into a dataset. This guide provides building blocks for shaping that analysis into an effective narrative that addresses all stakeholders involved.
Methodology: A typical step-by-step approach is to:
– Take tabular data as input and provide a data glossary along with a preview in data frame format.
– Perform exploratory data analysis to understand trends.
– Conduct statistical tests using hypothesis testing on a sample of the dataset.
– Engineer features for machine learning training.
– Analyze the model results via SHAP values.
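The hypothesis-testing step can be illustrated with a minimal sketch. The methodology does not name a specific test, so a two-sided permutation test on the difference in means is assumed here purely for illustration. The toy rows, the group labels, and the names `ROWS`, `split_by_group`, and `permutation_test` are all hypothetical, and only the Python standard library is used.

```python
import random
import statistics

# Hypothetical toy table: (group label, numeric metric). Made-up values.
ROWS = [
    ("A", 12.1), ("A", 11.8), ("A", 12.5), ("A", 12.9),
    ("A", 11.6), ("A", 12.3), ("A", 12.7), ("A", 12.0),
    ("B", 10.2), ("B", 10.9), ("B", 9.8), ("B", 10.5),
    ("B", 10.1), ("B", 10.7), ("B", 9.9), ("B", 10.4),
]


def split_by_group(rows, label_a, label_b):
    """Collect the metric values for two groups from the table."""
    sample_a = [value for label, value in rows if label == label_a]
    sample_b = [value for label, value in rows if label == label_b]
    return sample_a, sample_b


def permutation_test(sample_a, sample_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: group labels are exchangeable, i.e. both samples
    come from the same distribution. Returns the observed difference
    and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistics.mean(sample_a) - statistics.mean(sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group membership at random and recompute the statistic.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


sample_a, sample_b = split_by_group(ROWS, "A", "B")
observed, p_value = permutation_test(sample_a, sample_b)
print(f"observed difference in means: {observed:.3f}")
print(f"estimated p-value: {p_value:.4f}")
```

A permutation test is chosen here because it needs no distributional assumptions and no third-party packages. In practice a parametric test from a statistics library may be a better fit, depending on the data and on which stakeholders need to follow the result.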
Lessons Learned: Using SweetViz & YData_Profiling (formerly Pandas Profiling) for EDA, hypothesis testing, navigating Scikit-Learn's documentation, building an end-to-end Streamlit application, and using SHAP values for model interpretation.