SFG Project

This is the repository for a small data science project for SFG 2020.

Download the repository

Manual download:

git CLI:

   git clone https://github.com/Software-Focus-Group/SFG2020_Project

Run the code

  pip install -r requirements.txt 
  python solution.py

The solution is printed in the terminal, and the script opens several plots and visualizations along the way. Execution pauses while a plot window is open; close the window to let the code continue.

AIM :

Fish Weight prediction

How to approach a Data Science problem :

Define the problem
- First, it’s necessary to accurately define the data problem that is to be solved. The problem should be clear, concise, and measurable.
Decide on an approach
- There are many data science algorithms that can be applied to data, and they can be roughly grouped into the following families:
  - Two-class classification: useful for any question that has just two possible answers.
  - Multi-class classification: answers a question that has multiple possible answers.
  - Anomaly detection: identifies data points that are not normal.
  - Regression: gives a real-valued answer and is useful when looking for a number instead of a class or category.
  - Multi-class classification as regression: useful for questions that occur as rankings or comparisons.
  - Two-class classification as regression: useful for binary classification problems that can also be reformulated as regression.
  - Clustering: answers questions about how data is organized by seeking to separate a data set into intuitive chunks.
Collect data
- It’s important to understand that collected data is seldom ready for analysis right away. Most data scientists spend much of their time on data cleaning, which includes removing missing values, identifying duplicate records, and correcting incorrect values.
Analyze data
- The next step after data collection and cleanup is data analysis. At this stage, there’s a certain chance that the selected data science approach won’t work. This is to be expected and accounted for. Generally, it’s recommended to start by trying the basic machine learning approaches, as they have fewer parameters to tune.
Interpret results
- After data analysis, it’s finally time to interpret the results. The most important thing to consider is whether the original problem has been solved. You might discover that your model is working but producing subpar results. One way to deal with this is to add more data and keep retraining the model until you are satisfied with it.
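
The steps above can be sketched end to end on a toy problem. This is a minimal illustration and not the code in solution.py: the numbers are made up, and a single-feature least-squares fit with NumPy stands in for whichever regression approach the project actually uses.

```python
import numpy as np

# Made-up measurements for illustration only: one feature (length in cm)
# and one target (weight in g). NaN marks a missing value.
raw = np.array([
    [20.0, 110.0],
    [22.0, 135.0],
    [22.0, 135.0],    # duplicate record
    [25.0, np.nan],   # missing target value
    [27.0, 210.0],
    [30.0, 265.0],
    [33.0, 330.0],
    [35.0, 380.0],
])

# Collect data: drop rows with missing values, then drop duplicate rows.
clean = raw[~np.isnan(raw).any(axis=1)]
clean = np.unique(clean, axis=0)

# Analyze data: fit weight = slope * length + intercept by least squares.
length, weight = clean[:, 0], clean[:, 1]
design = np.column_stack([length, np.ones_like(length)])
(slope, intercept), *_ = np.linalg.lstsq(design, weight, rcond=None)

# Interpret results: R^2 is the share of the variance in weight
# that the fitted line explains (1.0 would be a perfect fit).
predicted = slope * length + intercept
ss_res = np.sum((weight - predicted) ** 2)
ss_tot = np.sum((weight - weight.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"rows kept: {len(clean)} of {len(raw)}")
print(f"weight ~ {slope:.2f} * length + {intercept:.2f}, R^2 = {r_squared:.3f}")
```

If the R^2 value is low, that is the signal described in the last step: revisit the approach or add more data before trusting the predictions.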

Content [ Fish.csv ]:

This dataset is a record of 7 common fish species sold at a fish market. It can be used to build a predictive model that estimates the weight of a fish from its measurements.

Columns :

Species : species name of fish
Weight : weight of fish in grams (g)
Length1 : vertical length in cm
Length2 : diagonal length in cm
Length3 : cross length in cm
Height : height in cm
Width : diagonal width in cm
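
As a rough sketch of how these columns could feed a weight prediction, the example below fits a multiple linear regression with scikit-learn. It is not taken from solution.py: the rows are made-up values in the Fish.csv column layout, the use of pandas is an assumption, and `new_fish` is a hypothetical measurement. With the real file, `pd.read_csv("Fish.csv")` would replace the inline sample.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up rows in the Fish.csv column layout, for illustration only.
sample_csv = io.StringIO(
    "Species,Weight,Length1,Length2,Length3,Height,Width\n"
    "Bream,240,23.0,25.2,30.0,11.5,4.0\n"
    "Bream,340,24.0,26.5,31.2,12.4,4.6\n"
    "Roach,120,19.0,20.5,22.8,6.2,3.3\n"
    "Roach,160,20.5,22.0,24.0,6.6,3.6\n"
    "Perch,110,19.0,21.0,22.5,5.7,3.5\n"
    "Perch,250,25.0,27.3,28.9,7.3,4.5\n"
    "Pike,300,31.7,34.0,37.8,5.7,4.2\n"
    "Pike,430,35.5,38.0,40.5,7.3,4.6\n"
)
fish = pd.read_csv(sample_csv)

# Use the five numeric measurements as features and Weight as the target.
feature_columns = ["Length1", "Length2", "Length3", "Height", "Width"]
X = fish[feature_columns]
y = fish["Weight"]

model = LinearRegression().fit(X, y)

# Predict the weight of one hypothetical fish from its measurements.
new_fish = pd.DataFrame([[26.0, 28.0, 31.0, 9.0, 4.3]], columns=feature_columns)
predicted_weight = float(model.predict(new_fish)[0])
print(f"predicted weight: {predicted_weight:.1f} g")
```

Species is left out here to keep the sketch short; a categorical column like this would need encoding (for example with `pd.get_dummies`) before it could be used as a feature.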

Example Screenshots :

Important Links:

scikit-learn user guide
NumPy documentation and user guide
Matplotlib.pyplot documentation and user guide
Seaborn reference
Simple and Multiple Linear regression in Python
Beginners guide to Linear Regression
How to think like a data scientist
