
This repository contains R code executed on various datasets in RStudio. I hope it will be helpful for anyone who wants to build a career in data science and big data.


SeonaDabre18/DataScience_R_Codes

 
 


DataScience_R_Codes


You will need R and RStudio to run the code, so install them first and then work through the code below. RStudio is available as a free download from Posit (formerly RStudio, PBC).

To begin with the basics of data science, go through the Practice(Basics) folder in the repository.

Practice(Basics)

| No. | Name | File |
|---|---|---|
| 1 | Basics | practice.r |
| 2 | Confidence Interval | Confidence_Interval.r |
| 3 | Probability | Probability.r |
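The confidence-interval file itself is not reproduced here. As a minimal sketch, assuming a t-based 95% interval for a population mean, the interval is the sample mean ± t(0.975, n - 1) × s/√n. The built-in `sleep$extra` column stands in for real data, and `t.test()` is used as a cross-check.

```r
# 95% t-based confidence interval for a mean.
# `sleep$extra` (built into R) is used only as illustrative data.
x <- sleep$extra
n <- length(x)
xbar <- mean(x)
se <- sd(x) / sqrt(n)              # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # two-sided 95% critical value
ci_manual <- c(lower = xbar - t_crit * se, upper = xbar + t_crit * se)
print(ci_manual)

# t.test() reports the same interval, which makes a useful cross-check
ci_builtin <- t.test(x, conf.level = 0.95)$conf.int
print(ci_builtin)
```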

Next, we carry out descriptive statistics as part of exploratory data analysis (EDA).

Descriptive Statistics - Exploratory Data Analysis (EDA)

| No. | Dataset Name | File |
|---|---|---|
| 1 | Carbon Dioxide (CO2) | Descriptive_Stats_CO2.r |
| 2 | Air Quality | Descriptive_Stats_airquality.r |
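Both EDA datasets, `CO2` and `airquality`, ship with R, so a quick first pass needs no downloads. The sketch below is an assumption about what such a pass typically includes (structure, summaries, missing values, a group-wise mean, correlations), not a copy of the repository's scripts.

```r
# Exploratory summary of two datasets that ship with R: airquality and CO2
str(airquality)                            # column types and first values
print(summary(airquality))                 # quartiles, means and NA counts per column

na_counts <- colSums(is.na(airquality))    # missing values per column
print(na_counts)

ozone_stats <- c(mean = mean(airquality$Ozone, na.rm = TRUE),
                 sd   = sd(airquality$Ozone, na.rm = TRUE))
print(ozone_stats)

# Group-wise summary: mean CO2 uptake for each plant origin (Quebec vs Mississippi)
uptake_by_type <- tapply(CO2$uptake, CO2$Type, mean)
print(uptake_by_type)

# Correlations between the numeric columns, using only rows with no missing values
print(round(cor(airquality[, 1:4], use = "complete.obs"), 2))
```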

Now let's go through the various algorithms.

1. Hypothesis Testing

| No. | Name | File |
|---|---|---|
| 1 | Hypothesis Testing | Hypothesis Testing.r |
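A hypothesis test states a null hypothesis (H0), computes a test statistic and compares its p-value with a significance level. As a minimal sketch, assuming a two-sample comparison of means and using the built-in `mtcars` data in place of the repository's examples:

```r
# Two-sample t-test on the built-in mtcars data:
# H0: automatic (am = 0) and manual (am = 1) cars have the same mean mpg
# H1: the means differ (two-sided)
res <- t.test(mpg ~ am, data = mtcars)   # Welch test (unequal variances) by default
print(res)

alpha <- 0.05                            # significance level
decision <- if (res$p.value < alpha) "Reject H0" else "Fail to reject H0"
print(decision)
```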

2. Linear Regression

A. Simple Linear Regression

| No. | Name | Dataset | File |
|---|---|---|---|
| 1 | Newspaper Data | NewspaperData.CSV | Newspaper_LinearRegression.r |
| 2 | Waist Circumference - Adipose Tissue | WC-AT.csv | WC-AT_LinearRegression.r |
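The CSV files above are not reproduced here, so the sketch below uses the built-in `cars` data (stopping distance vs. speed) to show the same `lm()` workflow. With the repository's files, the formula would name their own columns instead; those column names are not shown in this README.

```r
# Simple linear regression: stopping distance (dist) explained by speed,
# using the built-in cars dataset as stand-in data
model <- lm(dist ~ speed, data = cars)
print(summary(model))        # coefficients, standard errors, p-values, R-squared

coefs <- coef(model)         # intercept and slope
r2 <- summary(model)$r.squared

# Predict stopping distance for new speed values
new_speeds <- data.frame(speed = c(10, 20))
pred <- predict(model, newdata = new_speeds)
print(pred)

# Root mean squared error of the fitted values
rmse <- sqrt(mean(residuals(model)^2))
print(rmse)
```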

B. Multiple Linear Regression

| No. | Name | Dataset | File |
|---|---|---|---|
| 1 | Cars | Cars.csv | Cars_Multi_Linear_Regression.r |
| 2 | Corolla | Toyota_Corolla.csv | Toyota_Multi_Linear_Regression.r |
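Multiple linear regression adds predictors to the same `lm()` call. A minimal sketch, assuming the built-in `mtcars` data as a stand-in for Cars.csv: it fits two predictors, checks how correlated they are (collinearity can make coefficients unstable), and compares against a one-predictor model with an F-test.

```r
# Multiple linear regression on the built-in mtcars data:
# fuel efficiency (mpg) explained by weight (wt) and horsepower (hp)
model <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(model))

# Correlation among the response and predictors (a quick collinearity check)
print(round(cor(mtcars[, c("mpg", "wt", "hp")]), 2))

adj_r2 <- summary(model)$adj.r.squared
coefs <- coef(model)

# Does adding hp improve on weight alone? Nested-model F-test
small <- lm(mpg ~ wt, data = mtcars)
print(anova(small, model))
```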

3. Logistic Regression

| No. | Name | Dataset | File |
|---|---|---|---|
| 1 | Claimants | Claimants.csv | Logistic Regression.r |
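Logistic regression models a binary outcome through `glm()` with `family = binomial`. The sketch below uses the built-in `mtcars` data (manual vs. automatic transmission) instead of Claimants.csv, whose columns are not shown here; the 0.5 probability cut-off is an assumption.

```r
# Logistic regression on the built-in mtcars data:
# probability that a car has a manual transmission (am = 1) given its weight
model <- glm(am ~ wt, family = binomial, data = mtcars)
print(summary(model))

prob <- predict(model, type = "response")     # fitted probabilities in [0, 1]
pred_class <- ifelse(prob > 0.5, 1, 0)        # 0.5 cut-off (assumed)

confusion <- table(Predicted = pred_class, Actual = mtcars$am)
print(confusion)
accuracy <- mean(pred_class == mtcars$am)
print(accuracy)
```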

4. Association Rule

| No. | Name | Dataset | File |
|---|---|---|---|
| 1 | Titanic | Titanic.csv | Titanic_Association_Rule.r |

5. Principal Component Analysis (PCA) - Combines Related Columns into Components

| No. | Name | Dataset | File |
|---|---|---|---|
| 1 | Cat | Cat.jpg | Example1_PCA.r |
| 2 | University | Universities.csv | Universities_PCA.r |
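PCA replaces correlated columns with a smaller set of uncorrelated components that keep most of the variance. As a minimal sketch, assuming the built-in `USArrests` data in place of Universities.csv and an 85% cumulative-variance threshold (an arbitrary choice), `prcomp()` does the work:

```r
# PCA on the built-in USArrests data (4 numeric columns, 50 states).
# scale. = TRUE standardises columns so no variable dominates because of its units.
pca <- prcomp(USArrests, scale. = TRUE)
print(summary(pca))                      # proportion of variance per component

prop_var <- pca$sdev^2 / sum(pca$sdev^2) # variance explained by each PC
cum_var <- cumsum(prop_var)

print(round(pca$rotation, 2))            # loadings: how each column feeds each PC

# Keep enough components to explain at least 85% of the variance (assumed threshold)
k <- which(cum_var >= 0.85)[1]
scores <- pca$x[, 1:k, drop = FALSE]     # the reduced set of columns
```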

6. Clustering - Combining Related Rows

| No. | Name | Dataset | Hierarchical Clustering | K-Means Clustering |
|---|---|---|---|---|
| 1 | Universities | Univesities.csv | Universities_Heirarchical_Clustering.r | K-Means_Clustering.r |
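Clustering assigns each row to a group of similar rows. A minimal sketch of both methods in the table, assuming the built-in `USArrests` data instead of the universities file and an illustrative choice of 4 clusters:

```r
# Clustering the rows (states) of the built-in USArrests data
x <- scale(USArrests)                    # standardise so each column contributes equally

# Hierarchical clustering: Euclidean distances + complete linkage, cut into 4 groups
d <- dist(x, method = "euclidean")
hc <- hclust(d, method = "complete")
hc_groups <- cutree(hc, k = 4)
print(table(hc_groups))

# K-means with 4 centres; nstart = 25 restarts reduce sensitivity to the random start
set.seed(123)
km <- kmeans(x, centers = 4, nstart = 25)
print(km$size)

# Elbow check: total within-cluster sum of squares for k = 1..8
wss <- sapply(1:8, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)
print(round(wss, 1))
```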

7. Survival Analysis

| No. | Name | Dataset | File |
|---|---|---|---|
| 1 | Unemployment | Survival_Unemployment.csv | Survival_Unemployment.r |
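Survival analysis models time until an event (for unemployment data, the end of a spell) while allowing for censored observations. The columns of Survival_Unemployment.csv are not shown, so this sketch assumes the `survival` package (bundled with standard R installations) and its `lung` data, which has the same time-plus-event structure:

```r
library(survival)   # recommended package bundled with standard R installations

# Kaplan-Meier survival curves on the `lung` data shipped with survival.
# time = days of follow-up; status: 1 = censored, 2 = dead (Surv() accepts this coding)
fit <- survfit(Surv(time, status) ~ sex, data = lung)
print(fit)                                   # median survival per group
print(summary(fit, times = c(180, 365)))     # survival probability at ~6 and ~12 months

# Log-rank test: do the two survival curves differ?
lr <- survdiff(Surv(time, status) ~ sex, data = lung)
p_value <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)
print(p_value)
```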

Now let's look at various supervised machine learning algorithms (techniques).

1. Decision Tree

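Two ensemble techniques build on decision trees: bagging (many trees fitted to bootstrap samples of the data, with their predictions combined by voting or averaging) and boosting (trees fitted one after another, each concentrating on the errors of the previous ones).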

| No. | Name | Decision Tree | Bagging | Bagging and Boosting |
|---|---|---|---|---|
| 1 | Example 1 | DecisionTree.r | Decision_tree_Bagging.r | Decision_Tree_Bagging_Boosting.r |
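The bagging idea can be sketched by hand. This is a minimal illustration, assuming the `rpart` package (bundled with standard R installations) and the built-in `iris` data rather than the repository's example; the number of trees (25) is arbitrary. Each tree is fitted on a bootstrap sample and the trees vote on each test row.

```r
library(rpart)   # recursive partitioning trees; a recommended package bundled with R

set.seed(42)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

n_trees <- 25
# Bagging: fit each tree on a bootstrap sample (rows drawn with replacement)
trees <- lapply(seq_len(n_trees), function(b) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  rpart(Species ~ ., data = boot, method = "class")
})

# Every tree votes on every test row: a matrix of test rows x trees
votes <- sapply(trees, function(tree) as.character(predict(tree, test, type = "class")))

# The majority class across trees is the bagged prediction
bagged_pred <- apply(votes, 1, function(v) names(which.max(table(v))))
accuracy <- mean(bagged_pred == as.character(test$Species))
print(accuracy)

# For comparison: a single tree fitted on the whole training set
single <- rpart(Species ~ ., data = train, method = "class")
single_acc <- mean(as.character(predict(single, test, type = "class")) ==
                     as.character(test$Species))
print(single_acc)
```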

2. K-Nearest Neighbour (KNN)

| No. | Name | Dataset | File |
|---|---|---|---|
| 1 | Cancer | KNN.csv | K-Nearest_Neighbour.r |
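KNN classifies a row by majority vote among its k nearest training rows, so features are usually rescaled first so that no column dominates the distance. A minimal sketch, assuming the `class` package (bundled with standard R installations), min-max scaling and the built-in `iris` data instead of KNN.csv; k = 5 is an arbitrary choice.

```r
library(class)   # provides knn(); a recommended package bundled with R

# Min-max scaling to [0, 1] so every feature contributes comparably to distances
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
features <- as.data.frame(lapply(iris[, 1:4], normalize))

set.seed(1)
idx <- sample(nrow(iris), 105)               # 70% of rows for training
train_x <- features[idx, ]
test_x  <- features[-idx, ]
train_y <- iris$Species[idx]
test_y  <- iris$Species[-idx]

# Each test row gets the majority class of its 5 nearest training rows
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)
print(table(Predicted = pred, Actual = test_y))
accuracy <- mean(pred == test_y)
print(accuracy)
```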

3. Random Forest

| No. | Name | Dataset | File |
|---|---|---|---|
| 1 | Iris | Built into R (datasets package) | random_forest.r |

4. Artificial Neural Networks

| No. | Name | Dataset | File |
|---|---|---|---|
| 1 | Concrete | concrete.csv | Concrete_Neural_Network.r |

5. Support Vector Machine (SVM)

| No. | Name | Dataset | File |
|---|---|---|---|
| 1 | Letter Data | LetterData.csv | LetterData_Support_Vector_Machine.r |

6. Naive Bayes Classifier

| No. | Name | Dataset | File |
|---|---|---|---|
| 1 | SMS Spam | sms_spam.csv | Naive_Bayes_Sms_Spam.r |

7. Forecasting Analysis

| No. | Name | Dataset | Prediction File | Code File |
|---|---|---|---|---|
| 1 | Amtrak | Amtrak.csv | Predict_new.xlsx | Amtrak_Forecasting.r |
| 2 | Aviation | Aviation.csv | --- | Aviation_Exponential_Smooting_Forecasting.r |
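Exponential smoothing forecasts weight recent observations more heavily, and Holt-Winters adds trend and seasonal terms. A minimal sketch, assuming base R's `HoltWinters()` and the built-in monthly `AirPassengers` series in place of Aviation.csv, with the last year held out to measure forecast error (MAPE):

```r
# Holt-Winters exponential smoothing on the built-in AirPassengers series
# (monthly airline passengers, 1949-1960), used as stand-in data.
train <- window(AirPassengers, end = c(1959, 12))
test  <- window(AirPassengers, start = c(1960, 1))

# Multiplicative seasonality suits a series whose seasonal swings grow with the level
hw <- HoltWinters(train, seasonal = "multiplicative")

fc <- predict(hw, n.ahead = 12)              # forecast the 12 held-out months

# Mean absolute percentage error on the hold-out year
actual <- as.numeric(test)
mape <- mean(abs((actual - as.numeric(fc)) / actual)) * 100
print(round(mape, 2))
```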

8. NLP - Natural Language Processing (Text Mining)

There are two approaches: emotion mining and sentiment analysis.

Both require lists of positive and negative words to score the text.

| No. | Name | Dataset | File |
|---|---|---|---|
| 1 | Emotion Mining | Amazon Nokia Lumia Reviews.txt | Emotion_Mining_Amazon.r |
| 2 | Sentiment Analysis | McD_Small.csv | Sentiment Analysis_McD.r |
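Lexicon-based scoring counts how many words of a text appear in the positive list and subtracts the negative matches. The sketch below uses tiny hypothetical word lists and made-up reviews purely for illustration; a real analysis would read full positive and negative word lists from files.

```r
# Hypothetical mini-lexicons; real analyses load full word lists from files
positive_words <- c("good", "great", "love", "excellent", "fast")
negative_words <- c("bad", "poor", "slow", "hate", "broken")

# Made-up example reviews
reviews <- c("Great phone, love the camera",
             "Battery is poor and the screen arrived broken",
             "Good value but slow charging")

score_review <- function(text) {
  # Lower-case and split on anything that is not a letter or apostrophe
  words <- tolower(unlist(strsplit(text, "[^A-Za-z']+")))
  words <- words[words != ""]
  # Score = positive matches minus negative matches
  sum(words %in% positive_words) - sum(words %in% negative_words)
}

scores <- vapply(reviews, score_review, integer(1), USE.NAMES = FALSE)
print(scores)
```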

Web Scraping

To extract the reviews of a particular product from Amazon, run the code below in RStudio.

This code works only for product pages on Amazon.

Page structure, and therefore the code, varies from site to site.

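# Run once to install the package (rvest also re-exports the %>% pipe)
install.packages("rvest")

library(rvest)

# Amazon Reviews #############################
# URL of the product's reviews page. The page number is appended as "=<i>",
# so this URL is expected to end with the page-number query parameter name.
aurl <- "URL of Product Reviews page"
amazon_reviews <- character(0)
for (i in 1:10) {
  page <- read_html(paste0(aurl, "=", i))      # download and parse page i
  page_reviews <- page %>%
    html_nodes(".review-text") %>%              # CSS selector for review bodies
    html_text(trim = TRUE)                      # extract text, strip whitespace
  amazon_reviews <- c(amazon_reviews, page_reviews)
  Sys.sleep(1)                                  # pause between requests
}
length(amazon_reviews)                          # number of reviews collected
write.table(amazon_reviews, "apple.txt", row.names = FALSE)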

I have used this code to extract reviews of the Apple MacBook Air; do check it out.


Now that we have covered the basics, we will apply the algorithms to different datasets.

Implementation of Algorithms on Datasets

1. Hypothesis Testing

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Buyer Ratio | .pptx | BuyerRatio.csv | BuyerRatio.r |
| 2 | Customer Order Form | .pptx | Customer+OrderForm.csv | Customer+OrderForm.r |
| 3 | Cutlet Diameter | .pptx | Cutlets.csv | Cutlet_Hyp_Test.r |
| 4 | Fantaloons | .pptx | Fantaloons.csv | Fantaloons.r |
| 5 | Lab | .pptx | LabTAT.csv | Lab_Hyp_Anova_test.r |
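The Lab file name points to a one-way ANOVA, which tests whether three or more group means are equal. LabTAT.csv is not shown, so this sketch uses the built-in `PlantGrowth` data. If a dataset stores one column per group (wide format), `stack()` converts it to the long value-plus-group format that `aov()` expects.

```r
# One-way ANOVA on the built-in PlantGrowth data (stand-in for LabTAT.csv):
# H0: mean plant weight is equal across the ctrl, trt1 and trt2 groups
fit <- aov(weight ~ group, data = PlantGrowth)
print(summary(fit))

p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
decision <- if (p_value < 0.05) "Reject H0" else "Fail to reject H0"
print(decision)

# If H0 is rejected, Tukey's HSD shows which pairs of groups differ
print(TukeyHSD(fit))
```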

2. Linear Regression

A. Simple Linear Regression

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Calories Consumed | .txt | Calories_Consumed.csv | Calories_Simple_Linear.r |
| 2 | Delivery Time Data | .txt | Delivery_Time.csv | Delivery_Simple_Linear_Regression.r |
| 3 | Employee Data | .txt | Emp_Data.csv | Emp_Simple_Linear.r |
| 4 | Salary Data | .txt | Salary_Data.csv | Salary_Simple_Linear.r |

B. Multiple Linear Regression

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | 50 Startups | .txt | 50_Startups.csv | 50_Startup_Multi_Linear.r |
| 2 | Computer Data | .txt | Computer_Data.csv | Computer_Data_Multi_Linear.r |
| 3 | Toyota Corolla | .txt | ToyotaCorolla.csv | ToyotaCorolla_Multi_Linear.r |

3. Logistic Regression

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Bank | .txt | Bank-Full.csv | Bank_logistic_Regression.r |
| 2 | Credit Card | .txt | Creditcard.csv | Creditcard_Logistic_regression.r |

4. Association Rule

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Books | .txt | Book.csv | Book.r |
| 2 | Groceries | .txt | Groceries.csv | Groceries.r |
| 3 | Movies | .txt | My_Movies.csv | My_Movies.r |

5. Clustering

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Crime Data | .txt | Crime_Data.csv | Crime_Data_Clustering.r |
| 2 | East West Airlines | .txt | EastWestAirlines.xlsx | EastWestAirlines_Cluster.r |

6. Principal Component Analysis (PCA)

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Wine | .txt | Wine.csv | Wine_PCA.r |

Supervised Machine Learning Algorithms

1. Decision Tree

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Company Data | .txt | Company_Data.csv | Company_Data.r |
| 2 | Fraud Check | .txt | Fraud_Check.csv | Fraud_Check.r |
| 3 | Iris | .pdf | Built into R (datasets package) | Iris_ctree.r |

2. Random Forest

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Company Data | .txt | Company_Data.csv | Company_Data.r |
| 2 | Fraud Check | .txt | Fraud_Check.csv | Fraud_Check.r |
| 3 | Iris | .pdf | Built into R (datasets package) | Iris.r |

3. K-Nearest Neighbour (KNN) Classifier

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Glass Data | .txt | Glass.csv | Glass.r |
| 2 | Zoo | .txt | Zoo.csv | Zoo.r |

4. Artificial Neural Network (NN)

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | 50 Startups | .txt | 50_Startups.csv | 50_Startups.r |
| 2 | Concrete | .txt | Concrete.csv | Concrete.r |
| 3 | Forest Fires | .txt | Forestfires.csv | Forestfires.r |

5. Support Vector Machine (SVM)

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Forest Fires | .txt | Forestfires.csv | Forestfires.r |
| 2 | Salary Data | .txt | Salary_Data_Train.csv, Salary_Data_Test.csv | SalaryData.r |

6. Naive Bayes Classifier

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Salary Data | .txt | SalaryData_Train.csv, SalaryData_Test.csv | SalaryData.r |
| 2 | SMS Data | .txt | Sms_Raw_NB.csv | Sms_Raw_NB.r |

7. Forecasting Analysis

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Airlines Data | .txt | Airlines+Data.xlsx | Airlines+Data.r |
| 2 | Coca Cola Sales | .txt | CocaCola_Sales_Rawdata.xlsx | CocaCola_Sales_Rawdata.r |
| 3 | Plastic Sales | .txt | PlasticSales.csv | PlasticSales.r |

8. NLP - Natural Language Processing (Text Mining)

You will need lists of positive words, negative words and stop words for this analysis.

| No. | Name | Problem Statement | Dataset | File |
|---|---|---|---|---|
| 1 | Amazon HP Review | .txt | HP Reviews.txt | Amazon_HP_Reviews.r |
| 2 | IMDB Paatal Lok Web Series Review | .txt | Paatal_Lok_Reviews.txt | IMDB_Paatal_Lok.r |
Thank you!
