Data Science Portfolio

A repository containing a portfolio of data science projects I completed for academic, self-learning, and hobby purposes, presented as Jupyter notebooks and R Markdown files (published at RPubs).

For a more visually pleasing way to browse the portfolio, check out sajalsharma.com.

The R portfolio is located here.

Note: The data used in the projects (found in the data directory) is for demonstration purposes only.

Instructions for Running Python Notebooks Locally

Install dependencies using requirements.txt.
Run the notebooks as usual using a Jupyter Notebook server, VS Code, etc.

Machine Learning
- Predicting Boston Housing Prices: A model that predicts the value of a given house in the Boston real estate market using various statistical analysis tools, and uses machine learning to identify the best price at which a client can sell their house.
- Supervised Learning: Finding Donors for CharityML: Testing out several different supervised learning algorithms to build a model that accurately predicts whether an individual makes more than $50,000, to identify likely donors for a fictional non-profit organisation.
- Unsupervised Learning: Creating Customer Segments: Analyzing a dataset of customers' annual spending amounts (reported in monetary units) across diverse product categories to discover internal structure, patterns, and knowledge.
- Reinforcement Learning: Training a Smartcab to Drive: Creating an optimized Q-Learning driving agent that will navigate a Smartcab through its environment towards a goal.
- Deep Learning: Digit Sequence Recognition using CNNs: Designing and implementing a Convolutional Neural Network that learns to recognize sequences of digits using synthetic data generated by concatenating images from MNIST.
Tools: scikit-learn, Pandas, Seaborn, Matplotlib, Pygame
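The Smartcab project is built around Q-learning. As a rough illustration of the update rule such an agent relies on, here is a minimal tabular Q-learning sketch in plain Python. It is not the project's code: the action list, the state tuples, and the hyperparameter defaults (`alpha`, `gamma`, `epsilon`) are assumptions chosen for the example.

```python
import random
from collections import defaultdict

# Hypothetical action set; the real Smartcab environment defines its own.
ACTIONS = [None, "forward", "left", "right"]


class QLearningAgent:
    """Minimal tabular Q-learning agent (illustrative sketch, not the project's agent)."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> estimated value, defaults to 0.0
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor for future rewards
        self.epsilon = epsilon       # probability of taking a random (exploratory) action
        self.rng = random.Random(seed)

    def best_action(self, state):
        # Greedy choice: highest Q-value, ties broken by the order of ACTIONS.
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def choose_action(self, state):
        # Epsilon-greedy policy: explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return self.best_action(state)

    def update(self, state, action, reward, next_state):
        # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

A state here might be a hypothetical tuple such as `("green", "forward")` (traffic light, next waypoint). Keeping the table in a dictionary keyed by `(state, action)` makes the sketch dependency-free; decaying `epsilon` during training is a common way to shift from exploration to exploitation.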
Natural Language Processing
- Disaster Message Classifier: A multilabel classification model to predict the categories of a disaster message. Includes an ETL pipeline for data processing, an ML pipeline to train the model, and a web app with visualizations where the model can be used to classify messages. Tools: NLTK, Scikit-learn, XGBoost, Flask, Plotly
- 3-way Sentiment Analysis for Tweets: 3-way polarity (positive, negative, neutral) classification system for tweets, without using NLTK's sentiment analysis engine.
- Cross Language Information Retrieval: A cross-language information retrieval (CLIR) system that, given a query in German, searches text documents written in English.
Tools: NLTK, scikit-learn
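One simple way to build a CLIR system like the one described above is query translation: map each German query term to English candidates with a bilingual dictionary, then rank the English documents by TF-IDF cosine similarity. The sketch below assumes that approach; the tiny `DE_EN` dictionary and the helper names are hypothetical, and the notebook may translate or rank differently.

```python
import math
import re
from collections import Counter

# Hypothetical toy German -> English dictionary; a real system would use a
# full bilingual lexicon or a translation model.
DE_EN = {
    "hund": ["dog"],
    "katze": ["cat"],
    "haus": ["house", "home"],
    "groß": ["big", "large"],
}


def tokenize(text):
    # Lowercase and split on word characters (Unicode-aware, so "groß" survives).
    return re.findall(r"\w+", text.lower())


def translate_query(query):
    # Replace each German term with all of its English translations;
    # terms missing from the dictionary are dropped.
    terms = []
    for tok in tokenize(query):
        terms.extend(DE_EN.get(tok, []))
    return terms


def rank(query, docs):
    """Return document indices sorted by TF-IDF cosine similarity to the translated query."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in doc_tokens for t in set(toks))  # document frequency
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # +1 keeps common terms non-zero

    def vec(tokens):
        tf = Counter(t for t in tokens if t in idf)
        return {t: c * idf[t] for t, c in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(translate_query(query))
    scores = [cosine(q, vec(toks)) for toks in doc_tokens]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```

For example, with `docs = ["The dog sleeps in the house.", "A cat sat on the roof."]`, `rank("Hund", docs)` puts the first document first. Exact dictionary lookup misses inflected forms such as "großer", so stemming or lemmatising the query before lookup would be a natural next step.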
Data Analysis and Visualisation
- Python
  - Scalable Walkability Analysis of Melbourne: An analysis of the walkability of suburbs in Melbourne, Victoria, and its implications.
  - Titanic Dataset - Exploratory Analysis: Exploratory analysis of the passengers on board the RMS Titanic using Pandas and Seaborn visualisations.
  - Stock Market Analysis for Tech Stocks: Analysis of technology stocks including change in price over time, daily returns, and stock behaviour prediction.
  - 2016 US General Election Poll Data Analysis: A simple analysis of 2016 US General Election poll data.
  - 911 Calls - Exploratory Analysis: Exploratory Data Analysis of the 911 calls dataset hosted on Kaggle. Demonstrates extraction of useful features from different variables.
Tools: Pandas, Folium, Seaborn and Matplotlib
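The stock market analysis above looks at daily returns. The usual definition is the simple return r_t = P_t / P_(t-1) - 1 on consecutive closing prices; the plain-Python sketch below computes it (the helper names are hypothetical). In pandas, the same series is typically obtained with `Series.pct_change()` on a closing-price column.

```python
def daily_returns(closes):
    """Simple daily returns r_t = P_t / P_(t-1) - 1 for a list of closing prices."""
    if len(closes) < 2:
        return []  # no returns without at least two prices
    return [curr / prev - 1.0 for prev, curr in zip(closes, closes[1:])]


def cumulative_return(closes):
    """Total return from the first to the last closing price."""
    return closes[-1] / closes[0] - 1.0
```

For closes of 100, 110 and 99, the daily returns are +10% and -10%, yet the cumulative return is -1%, which is why returns are compounded rather than summed.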
- R
  - Behavioral Risk Factor Surveillance System (BRFSS) 2013: Exploratory Data Analysis: Exploratory analysis of the BRFSS 2013 data set, investigating the relationships between education and eating habits, between sleep and mental health, and between smoking, drinking, and a person's general health.
  - Inferential Statistics: Do men or women oppose sex education?: Using the GSS (General Social Survey) dataset to infer whether, in 2012, men aged 18 or over in the United States were more likely than women to oppose sex education in public schools.
  - Data Visualization: Corruption and Human Development: A scatter plot for the relationship between the 'Human Development Index' and the 'Corruption Perceptions Index' of countries.
  - Moneyball: Analysing and replacing lost players: Exploration of baseball data for the year 2001 to look at replacements for key players lost by the Oakland A's in 2001. Inspired by the book/movie: Moneyball.
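The GSS question above compares two proportions: the share of men versus the share of women opposing sex education in public schools. A one-sided two-proportion z-test with a pooled proportion is one standard way to frame it; the sketch below, written in Python to match the other examples, shows the arithmetic. It is not necessarily the method used in the R notebook, and the counts in the usage note are made up.

```python
import math


def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_ztest(x1, n1, x2, n2):
    """One-sided test of H0: p1 = p2 against HA: p1 > p2, using a pooled proportion.

    x1, x2 are counts of "successes" (e.g. respondents opposing) out of n1, n2.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # estimate of the common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 1.0 - normal_cdf(z)  # upper-tail probability
```

With hypothetical counts, `two_proportion_ztest(60, 100, 40, 100)` gives z ≈ 2.83 and a one-sided p-value of about 0.002.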
Micro Projects:
- Python
  - ML with Logistic Regression: Using Logistic Regression to predict whether an internet user clicked an ad or not.
  - ML with K Nearest Neighbours: Using KNN to classify instances from a fake dataset into two target classes, while choosing the best value for K using the elbow method.
  - ML with Decision Trees and Random Forests: Using Decision Trees and Random Forests to predict whether a borrower will pay back their loan. Uses publicly available data from LendingClub.com.
  - Movie Recommendations using Recommender Systems: A micro project to build a recommendation system that makes movie recommendations based on user review similarities.
- R
  - ML Logistic Regression: Predicting a person's salary class using logistic regression.
  - ML Decision Trees and Random Forests: Using Decision Trees and Random Forests to classify schools as Private or Public.
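The movie recommender above is described as working from user review similarities. A common item-based approach correlates each movie's ratings with the target movie's ratings across the users who rated both; the sketch below does this with Pearson correlation in plain Python. The ratings layout and function names are hypothetical, and the notebook may compute similarity differently (pandas, for instance, offers `DataFrame.corrwith` for column-wise correlations).

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def similar_movies(target, ratings, min_common=2):
    """Rank other movies by rating correlation with `target`.

    `ratings` maps user -> {movie: rating} (a hypothetical layout).
    Only users who rated both movies contribute to each correlation.
    """
    movies = {m for user_ratings in ratings.values() for m in user_ratings}
    scores = []
    for other in movies - {target}:
        pairs = [(r[target], r[other]) for r in ratings.values()
                 if target in r and other in r]
        if len(pairs) < min_common:
            continue  # too few shared raters to trust the correlation
        xs, ys = zip(*pairs)
        scores.append((other, pearson(xs, ys)))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

The `min_common` threshold matters in practice: a correlation over only two shared raters is always +1 or -1, so real recommenders usually demand many more co-ratings before trusting a similarity.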

I also dabble in many other kinds of technology. You can find my general portfolio here.

If you liked what you saw, or want to chat with me about the portfolio, work opportunities, or collaboration, shoot me an email at [email protected].

Support My Work

If this project inspired you, gave you ideas for your own portfolio or helped you, please consider buying me a coffee ❤️.

sajal2692/data-science-portfolio