Install Jupyter Notebook
- Go to AppData (Run -> %appdata%) >> Local >> Programs >> Python >> Scripts
- Open CMD at that location
- pip install jupyter
- After installation, run jupyter notebook
Use Google Colab
- Go to Colab
- Start a new notebook
- Import a file in Colab:
from google.colab import files
files.upload()
- Pre-processing: CountVectorizer
- ML model: MultinomialNB (naive_bayes)
- ~99% accuracy
- Show performance metrics: confusion matrix, classification report
- K-fold cross-validation for getting accuracy scores
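The CountVectorizer + MultinomialNB steps above can be sketched with sklearn; the tiny inline corpus and its spam/ham labels here are made up purely for illustration, not the project's real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix, classification_report

# Made-up toy corpus: 1 = spam, 0 = ham
texts = ["win a free prize now", "free money win now", "meeting at noon",
         "lunch tomorrow?", "claim your free prize", "see you at the meeting"]
labels = [1, 1, 0, 0, 1, 0]

# Pre-processing: bag-of-words counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Model: multinomial naive Bayes
model = MultinomialNB()
model.fit(X, labels)

pred = model.predict(X)
print(confusion_matrix(labels, pred))
print(classification_report(labels, pred))

# K-fold cross-validation (cv=3 because the toy set is tiny)
scores = cross_val_score(MultinomialNB(), X, labels, cv=3)
print(scores.mean())
```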
Profit Prediction
- Null-value checking and resolving
- One-hot encoding for data pre-processing
- Concatenating the one-hot encoded values to the table
- Testing on a single input and on a multiple-row test set
- R-squared value for model accuracy testing
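A minimal sketch of that workflow with pandas and sklearn; the `spend`/`state`/`profit` columns are made-up stand-ins for the real profit dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up toy data (column names are illustrative only)
df = pd.DataFrame({
    "spend":  [10, 20, 30, 40, 50, 60],
    "state":  ["NY", "CA", "NY", "TX", "CA", "TX"],
    "profit": [15, 28, 35, 44, 58, 63],
})

# Null-value checking
print(df.isnull().sum())

# One-hot encode the text column and concat it back to the table
dummies = pd.get_dummies(df["state"], drop_first=True)
df = pd.concat([df.drop(columns="state"), dummies], axis=1)

X = df.drop(columns="profit")
y = df["profit"]

model = LinearRegression().fit(X, y)

# R-squared value for model accuracy testing
print(model.score(X, y))

# Testing on a single input (columns must match X's order)
print(model.predict(X.iloc[[0]]))
```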
House Price Prediction
1. Supervised learning
2. No null values in the dataset
3. Linear Regression
4. Random Forest
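Comparing the two models above side by side, on synthetic data generated here in place of the real house-price table:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data: price driven by area and number of rooms
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, 200)
rooms = rng.integers(1, 6, 200)
price = 100 * area + 5000 * rooms + rng.normal(0, 1000, 200)
X = np.column_stack([area, rooms])

X_train, X_test, y_train, y_test = train_test_split(X, price, random_state=0)

# Fit both models and compare their R-squared on held-out data
for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```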
Predicting Heart Disease using Random Forest
- Dataset: ~70k rows
- Supervised learning to predict whether a patient has heart disease
- Find coefficients
- f_classif for ranking and removing the least related features
- Use seaborn to plot the data
- Try RandomForestClassifier and DecisionTreeClassifier
- Feature importance using ExtraTreesClassifier
- Univariate feature selection: SelectKBest
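The two feature-selection steps above (univariate f_classif via SelectKBest, and tree-based importances via ExtraTreesClassifier) sketched on a synthetic stand-in for the heart-disease table:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in: 6 features, only 2 actually informative
X, y = make_classification(n_samples=300, n_features=6, n_informative=2,
                           n_redundant=0, random_state=0)

# Univariate feature selection: keep the k best features by ANOVA F-score
selector = SelectKBest(score_func=f_classif, k=3)
X_best = selector.fit_transform(X, y)
print("kept feature indices:", np.where(selector.get_support())[0])

# Feature importance from a tree ensemble
forest = ExtraTreesClassifier(random_state=0).fit(X, y)
print("importances:", forest.feature_importances_.round(3))
```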
Customer churn prediction
1. Supervised learning
2. No null values in the dataset
3. Visualize data using seaborn countplots
4. LabelEncoder to convert the text-valued data columns to numbers
5. Scale the data using StandardScaler (reduces calculation later during training)
6. Logistic Regression
7. accuracy_score
8. confusion_matrix generation
9. classification_report
10. Performance metrics
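Steps 4-9 above in one runnable sketch; the `tenure`/`contract`/`churn` table below is invented for illustration, not the real churn dataset.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Made-up churn table (column names are illustrative)
df = pd.DataFrame({
    "tenure":   [1, 24, 3, 40, 2, 36, 5, 50, 4, 30],
    "contract": ["monthly", "yearly", "monthly", "yearly", "monthly",
                 "yearly", "monthly", "yearly", "monthly", "yearly"],
    "churn":    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# LabelEncoder turns the text column into integers
df["contract"] = LabelEncoder().fit_transform(df["contract"])

X = df[["tenure", "contract"]]
y = df["churn"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# StandardScaler: fit on train only, apply the same transform to test
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

model = LogisticRegression().fit(X_train_s, y_train)
pred = model.predict(X_test_s)

print(accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
```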
ML 12. KNN - Data pre-processing: LabelEncoder; Model: KNN (KNeighborsClassifier)
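A minimal KNN sketch; the bundled iris dataset stands in here (its features are already numeric, so the LabelEncoder step from the notes isn't needed for it):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# KNN classifies by majority vote among the k nearest training points
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.score(X_test, y_test))
```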
Must-check links
- vaex: https://pypi.org/project/vaex/
- Vaex API docs: https://vaex.readthedocs.io/en/latest/api.html
- https://www.kaggle.com/lavanyashukla01/how-i-made-top-0-3-on-a-kaggle-competition
- Feature engineering: https://www.youtube.com/watch?v=6WDFfaYtN6s&list=PLZoTAELRMXVPwYGE2PXD3x0bfKnR0cJjN&ab_channel=KrishNaik
- ML projects: https://www.youtube.com/channel/UCX802rmp2Sg2ddyLU7Qo1bQ/videos
- Feature selection: https://www.youtube.com/watch?v=YaKMeAlHgqQ&ab_channel=DataSchool
- Web scraping: https://www.youtube.com/watch?v=r_xb0vF1uMc&list=PL5-da3qGB5IDbOi0g5WFh1YPDNzXw4LNL&ab_channel=DataSchool
Uses of Naive Bayes
- Real time prediction
- Recommendation system
- Text classifier
- Multi class predictor
- Sklearn's naive Bayes variants: GaussianNB, BernoulliNB, MultinomialNB (skip MultinomialNB if any feature value is negative)
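The caveat above can be demonstrated directly: MultinomialNB expects count-like, non-negative features and raises a ValueError on negatives, while GaussianNB handles continuous (possibly negative) values. The tiny arrays below are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

y = [1, 0, 1, 0]

# Count-like, non-negative features: fine for MultinomialNB
X_counts = np.array([[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 2, 1]])
MultinomialNB().fit(X_counts, y)  # OK

# Continuous features (may be negative): use GaussianNB instead
X_cont = np.array([[-1.2, 0.5], [0.3, -0.7], [-0.9, 1.1], [0.8, -0.2]])
GaussianNB().fit(X_cont, y)       # OK

# MultinomialNB rejects negative feature values
try:
    MultinomialNB().fit(X_cont, y)
except ValueError as e:
    print("MultinomialNB rejected negatives:", e)
```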
13. K-means clustering
- Usage: showing similar products in e-commerce, market basket analysis, biology, finding distinct classes in the data
- Unsupervised learning, so the data is unlabeled
- Elbow method to find the optimal number of clusters
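The elbow method above can be sketched like this: plot (or print) inertia against k and look for the point where the drop flattens. The blob data is synthetic, generated with a known 3 clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 distinct groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Elbow method: inertia (within-cluster sum of squares) vs number of clusters k
inertias = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

for k, sse in zip(range(1, 8), inertias):
    print(k, round(sse, 1))
# The "elbow" (where the drop in inertia flattens) suggests the cluster count
```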
Deep Learning
1. Handwritten digit classifier
- Use MNIST
- Keras
- Normalization
- Flattening of the image array
- Show an image of a matrix using matshow
- Train a multi-layer model
- Optimize the model using 'adam'
- Evaluate the model
- Generate a confusion matrix
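The normalization and flattening steps above, sketched with numpy on random stand-in images (real MNIST comes via keras.datasets.mnist); the Keras model part is only outlined in comments since training is beyond a short snippet:

```python
import numpy as np

# Stand-in for MNIST: a batch of 8-bit grayscale 28x28 images
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Normalization: scale pixel values into [0, 1]
normalized = images.astype("float32") / 255.0

# Flattening: each 28x28 image becomes a 784-long vector for a dense layer
flattened = normalized.reshape(len(normalized), -1)
print(flattened.shape)  # (5, 784)

# A Keras model would then take input_shape=(784,), e.g. (sketch only):
# model = keras.Sequential([
#     keras.layers.Dense(100, activation='relu', input_shape=(784,)),
#     keras.layers.Dense(10, activation='softmax'),
# ])
# model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
#               metrics=['accuracy'])
```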
- Customer churn prediction with Impact Learning: this was done earlier using ML algorithms, but a newly proposed method named Impact Learning was used to redo it. Docs: https://pypi.org/project/ImpactLearning/
- Data augmentation : https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Save a model
-> Using pickle
1. import pickle
2. Save the model:
   with open('svm_model', 'wb') as f:
       pickle.dump(model, f)
3. Load the model from the file:
   with open('svm_model', 'rb') as f:
       loadedModel = pickle.load(f)
-> Using joblib
- import joblib
- joblib.dump(model, 'model_joblib')  # save the model
- svm = joblib.load('model_joblib')  # load the model
Location to run scripts: C:\Users\PKS\AppData\Local\Programs\Python\Python37\Scripts

Handling imbalanced datasets
- Over-sampling: when the dataset is not huge
- Under-sampling: when the dataset is huge; it makes the outcome variable more balanced
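Both resampling strategies above can be sketched with sklearn's `resample` (dedicated libraries like imbalanced-learn exist too); the 8-vs-2 toy table is invented for illustration:

```python
import pandas as pd
from sklearn.utils import resample

# Imbalanced toy data: 8 majority-class (0) rows vs 2 minority-class (1) rows
df = pd.DataFrame({"x": range(10), "y": [0] * 8 + [1] * 2})
majority = df[df.y == 0]
minority = df[df.y == 1]

# Over-sampling: duplicate minority rows (with replacement) up to majority size
over = resample(minority, replace=True, n_samples=len(majority), random_state=0)
balanced_over = pd.concat([majority, over])
print(balanced_over.y.value_counts().to_dict())

# Under-sampling: drop majority rows down to minority size
under = resample(majority, replace=False, n_samples=len(minority), random_state=0)
balanced_under = pd.concat([under, minority])
print(balanced_under.y.value_counts().to_dict())
```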
- Unsupervised learning using sklearn
- sklearn pipelining
- import pickle
- call from frontend
- Beautiful Soup
- graphx
- hadoop