SFG Project

This repository contains a small data science project for SFG 2020.

Download the repository
  • Manual download: download as zip file
  • git CLI:
       git clone https://github.com/Software-Focus-Group/SFG2020_Project
    
Run the code
  •   pip install -r requirements.txt 
      python solution.py 
  • The solution is printed to the terminal, and the script may open plots and visualizations along the way. Close each plot window to let the script continue.

AIM :

Fish Weight prediction

How to approach a Data Science problem :
  1. Define the problem

    • First, it’s necessary to accurately define the data problem that is to be solved. The problem should be clear, concise, and measurable.
  2. Decide on an approach

    • There are many data science algorithms that can be applied to data, and they can be roughly grouped into the following families:

      • Two-class classification: useful for any question that has just two possible answers.
      • Multi-class classification: answers a question that has multiple possible answers.
      • Anomaly detection: identifies data points that are not normal.
      • Regression: gives a real-valued answer and is useful when looking for a number instead of a class or category.
      • Multi-class classification as regression: useful for questions that occur as rankings or comparisons.
      • Two-class classification as regression: useful for binary classification problems that can also be reformulated as regression.
      • Clustering: answers questions about how data is organized by separating a data set into intuitive chunks.
  3. Collect data

    • It’s important to understand that collected data is seldom ready for analysis right away. Most data scientists spend much of their time on data cleaning, which includes removing missing values, identifying duplicate records, and correcting incorrect values.
  4. Analyze data

    • The next step after data collection and cleanup is data analysis. At this stage, there’s a certain chance that the selected data science approach won’t work. This is to be expected and accounted for. Generally, it’s recommended to start by trying the basic machine learning approaches, as they have fewer parameters to tune.
  5. Interpret results

    • After data analysis, it’s finally time to interpret the results. The most important thing to consider is whether the original problem has been solved. You might discover that your model is working but producing subpar results. One way to deal with this is to add more data and keep retraining the model until you are satisfied with it.
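
As a rough illustration of steps 3 to 5, the sketch below cleans a handful of records, fits a one-variable least-squares line, and scores it with R². It is not taken from solution.py: the function names (clean, fit_line, r_squared) and the sample records are hypothetical, and a single-feature linear fit is assumed only to keep the example short.

```python
# Illustrative sketch only: the records below are made up, not taken from Fish.csv.


def clean(rows):
    """Step 3: drop records with missing values and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


def fit_line(xs, ys):
    """Step 4: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Step 5: share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical (length in cm, weight in g) pairs with one gap and one duplicate.
raw = [(10.0, 25.0), (20.0, 45.0), (20.0, 45.0), (30.0, None), (40.0, 85.0)]
records = clean(raw)
lengths = [r[0] for r in records]
weights = [r[1] for r in records]
slope, intercept = fit_line(lengths, weights)
print(slope, intercept, r_squared(lengths, weights, slope, intercept))
```

An R² close to 1 suggests the line explains most of the variation. A low value is a hint to revisit the approach or the data, as described in step 5.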

Content [ Fish.csv ]:

This dataset records 7 common fish species sold at a fish market. The machine-friendly data can be used to build a predictive model that estimates the weight of a fish.

Columns :

  • Species : species name of fish
  • Weight : weight of fish in grams (g)
  • Length1 : vertical length in cm
  • Length2 : diagonal length in cm
  • Length3 : cross length in cm
  • Height : height in cm
  • Width : diagonal width in cm
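
The column layout above could be read with the standard library roughly as follows. This is a minimal sketch, assuming the file has a header row whose names match the list exactly. The helper load_fish and the two sample rows are hypothetical, and solution.py may load the data differently.

```python
import csv
import io

# Measurement columns as documented above; Weight is the value to predict.
NUMERIC_COLUMNS = ["Weight", "Length1", "Length2", "Length3", "Height", "Width"]


def load_fish(fileobj):
    """Parse Fish.csv-style rows, converting the measurement columns to float."""
    rows = []
    for record in csv.DictReader(fileobj):
        row = {"Species": record["Species"]}
        for column in NUMERIC_COLUMNS:
            row[column] = float(record[column])
        rows.append(row)
    return rows


# Made-up sample in the documented column layout; not real rows from Fish.csv.
sample = io.StringIO(
    "Species,Weight,Length1,Length2,Length3,Height,Width\n"
    "SpeciesA,100.0,10.0,11.0,12.0,5.0,2.0\n"
    "SpeciesB,50.0,8.0,9.0,10.0,3.0,1.5\n"
)
fish = load_fish(sample)
targets = [row["Weight"] for row in fish]
features = [[row[c] for c in NUMERIC_COLUMNS[1:]] for row in fish]
print(targets, features)
```

To read the real data, pass an open file instead of the in-memory sample, for example `with open("Fish.csv", newline="") as f: fish = load_fish(f)`, assuming Fish.csv sits in the working directory.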

Example Screenshots :

  • Example 1
  • Example 2
  • Example 3

Important Links: