GitHub - Software-Focus-Group/SFG2020_Project

SFG Project

This is the repository for a small data science project for SFG 2020.

Download the repository

Manual download:

git CLI:

   git clone https://github.com/Software-Focus-Group/SFG2020_Project

Run the code

  pip install -r requirements.txt 
  python solution.py

The script prints the solution in the terminal and opens several plots and visualizations along the way. Execution pauses while a plot window is open; close the window to let the script continue.

AIM :

Fish Weight prediction

How to approach a Data Science problem :

Define the problem
- First, it’s necessary to accurately define the data problem that is to be solved. The problem should be clear, concise, and measurable.
Decide on an approach
- There are many data science algorithms that can be applied to data, and they can be roughly grouped into the following families:
  - Two-class classification: useful for any question that has just two possible answers.
  - Multi-class classification: answers a question that has multiple possible answers.
  - Anomaly detection: identifies data points that are not normal.
  - Regression: gives a real-valued answer and is useful when looking for a number instead of a class or category.
  - Multi-class classification as regression: useful for questions that occur as rankings or comparisons.
  - Two-class classification as regression: useful for binary classification problems that can also be reformulated as regression.
  - Clustering: answers questions about how data is organized by separating a data set into intuitive groups.
Collect data
- It’s important to understand that collected data is seldom ready for analysis right away. Most data scientists spend much of their time on data cleaning, which includes removing missing values, identifying duplicate records, and correcting incorrect values.
Analyze data
- The next step after data collection and cleanup is data analysis. At this stage, the selected data science approach may turn out not to work. This is to be expected and should be planned for. Generally, it’s recommended to start by trying the basic machine learning approaches, since they have fewer parameters to tune.
Interpret results
- After data analysis, it’s finally time to interpret the results. The most important thing to consider is whether the original problem has been solved. You might discover that your model works but produces subpar results. One way to deal with this is to add more data and retrain the model until the results are satisfactory.
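
The cleaning work described in the "Collect data" step can be sketched in a few lines. This is a minimal sketch, not the code in solution.py: it uses only the Python standard library, the sample rows are made-up values in the Fish.csv column layout, and the helper name `clean_rows` is hypothetical. For simplicity it drops rows with an impossible weight instead of correcting them.

```python
import csv
import io

# Made-up sample in the Fish.csv column layout (illustrative values only).
RAW = """Species,Weight,Length1,Length2,Length3,Height,Width
Bream,242,23.2,25.4,30.0,11.52,4.02
Bream,242,23.2,25.4,30.0,11.52,4.02
Roach,0,19.0,20.5,22.8,6.47,3.80
Perch,,8.4,8.8,11.2,2.11,1.41
Pike,200,30.0,32.3,34.8,5.57,3.35
"""

NUMERIC = ["Weight", "Length1", "Length2", "Length3", "Height", "Width"]


def clean_rows(text):
    """Drop rows with missing values, exact duplicates, and non-positive weights."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Missing value: any empty or absent field.
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        # Duplicate record: identical to a row already seen.
        key = tuple(row.values())
        if key in seen:
            continue
        seen.add(key)
        record = {"Species": row["Species"]}
        record.update({name: float(row[name]) for name in NUMERIC})
        # Incorrect value: a fish cannot weigh zero grams or less.
        if record["Weight"] <= 0:
            continue
        cleaned.append(record)
    return cleaned


rows = clean_rows(RAW)
print(len(rows), [r["Species"] for r in rows])
```

Of the five sample rows, only the first Bream row and the Pike row survive: the second Bream row is a duplicate, the Roach row has a zero weight, and the Perch row has no weight at all.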

Content [ Fish.csv ]:

This dataset records 7 common fish species sold in fish markets. Its measurements are in a machine-friendly numeric form, so a predictive model can be trained on them to estimate the weight of a fish.

Columns :

Species : species name of fish
Weight : weight of fish in grams (g)
Length1 : vertical length in cm
Length2 : diagonal length in cm
Length3 : cross length in cm
Height : height in cm
Width : diagonal width in cm
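
Given these columns, a first baseline for the weight prediction task could regress Weight on a single length column. The sketch below fits an ordinary least squares line by hand with the standard library only. It is an illustration under stated assumptions, not the method used in solution.py: the (Length3, Weight) pairs are made up, and the function names `fit_line` and `r_squared` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up (Length3 in cm, Weight in g) pairs, for illustration only.
length3 = [22.0, 26.0, 30.0, 34.0, 38.0]
weight = [110.0, 200.0, 340.0, 500.0, 700.0]

slope, intercept = fit_line(length3, weight)
score = r_squared(length3, weight, slope, intercept)
print(f"Weight ~ {intercept:.1f} + {slope:.1f} * Length3 (R^2 = {score:.3f})")
```

A straight line is only a starting point. Weight tends to grow faster than length, so a model on transformed values (for example, the logarithm of both) or one that uses several columns may fit better; comparing R^2 on held-out rows is one way to check.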

