markmacwan/Early-Stage-Diabetes-UCI-ML

Early Stage Diabetes Prediction using Machine Learning Algorithms

Diabetes is one of the fastest-growing chronic, life-threatening diseases and has already affected 422 million people worldwide, according to a 2018 report by the World Health Organization (WHO).

The dataset contains 16 features and 520 records which were collected using direct questionnaires from the patients of Sylhet Diabetes Hospital in Sylhet, Bangladesh by M. M. Faniqul Islam, Rahatara Ferdousi, Sadikur Rahman, and Humayra Yasmin Bushra.

Some of the binary (Yes or No) features include:

  • Polyuria : Polyuria is urine output of > 3 L/day; it must be distinguished from urinary frequency, which is the need to urinate many times during the day or night but in normal or less-than-normal volumes.
  • Polydipsia : Polydipsia is a medical name for the feeling of extreme thirstiness. Polydipsia is often linked to urinary conditions that cause you to urinate a lot.
  • Polyphagia : Polyphagia is the medical term used to describe excessive hunger or increased appetite and is one of the 3 main signs of diabetes.
  • Genital Thrush : Thrush (or candidiasis) is a common condition caused by a type of yeast called Candida.
  • Visual Blurring : Lack of sharpness of vision with, as a result, the inability to see fine detail.
  • Irritability : Irritability is the excitatory ability that living organisms have to respond to changes in their environment. The term is used for both the physiological reaction to stimuli and for the pathological, abnormal or excessive sensitivity to stimuli.
  • Partial Paresis : Paresis is the medical term for weakened muscle movement. It is also sometimes referred to as “mild paralysis” or “partial paralysis”.
  • Alopecia : Alopecia is the term used for loss of hair, either diffuse or patchy, due to a structural or functional defect in the follicle or to a change in the hair itself.

Target Feature: Likelihood of Diabetes (Positive or Negative)
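Because most features are Yes/No answers and the target is Positive/Negative, they usually need to be mapped to numbers before modelling. The snippet below is a minimal sketch of that step using only the standard library. The column names (`Age`, `Polyuria`, `Polydipsia`, `class`), the sample record, and the helper `encode_record` are assumptions for illustration and are not taken from this repository's code.

```python
# Hypothetical mappings; the exact labels in the data file are an assumption.
BINARY_MAP = {"Yes": 1, "No": 0}
TARGET_MAP = {"Positive": 1, "Negative": 0}


def encode_record(record, target_key="class"):
    """Return (features, label) with Yes/No and Positive/Negative mapped to 1/0."""
    features = {}
    for key, value in record.items():
        if key == target_key:
            continue
        # Values that are not Yes/No (for example a numeric age) pass through unchanged.
        features[key] = BINARY_MAP.get(value, value)
    label = TARGET_MAP[record[target_key]]
    return features, label


# Made-up example record, not a row from the real dataset.
sample = {"Age": 40, "Polyuria": "No", "Polydipsia": "Yes", "class": "Positive"}
features, label = encode_record(sample)
print(features, label)
```

With a real data file, the same mapping would be applied to every row before the features are passed to a classifier.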

These clinically meaningful features are analyzed with the Naive Bayes, Logistic Regression, and Random Forest algorithms, using cross-validation with shuffle split to find the best hyper-parameters and the highest accuracy.
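The model comparison described above could be sketched with scikit-learn roughly as follows. This is an illustration under stated assumptions, not the repository's actual code: the data is a synthetic stand-in with the same shape as the dataset (520 rows, 16 columns), the Naive Bayes variant (`BernoulliNB`) is a guess, and the hyper-parameter grids are deliberately small and arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in data (NOT the real dataset): 520 rows, 16 binary columns.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(520, 16))
# Label loosely tied to the first two columns so the models have signal to find.
y = ((X[:, 0] + X[:, 1] + rng.integers(0, 2, size=520)) >= 2).astype(int)

# Candidate models with small, illustrative hyper-parameter grids (assumptions).
candidates = {
    "naive_bayes": (BernoulliNB(), {"alpha": [0.1, 1.0]}),
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
}

# Shuffle split: 5 random 80/20 train/test partitions, reused for every model.
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

results = {}
for name, (model, grid) in candidates.items():
    # Grid search scores every parameter combination by mean accuracy over the splits.
    search = GridSearchCV(model, grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)
    print(f"{name}: best params {search.best_params_}, mean accuracy {search.best_score_:.3f}")
```

Sharing one `ShuffleSplit` object across the three searches means every model is scored on the same random partitions, which keeps the accuracy comparison fair.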

References

Islam M.M.F., Ferdousi R., Rahman S., Bushra H.Y. (2020) Likelihood Prediction of Diabetes at Early Stage Using Data Mining Techniques. In: Gupta M., Konar D., Bhattacharyya S., Biswas S. (eds) Computer Vision and Machine Intelligence in Medical Image Analysis. Advances in Intelligent Systems and Computing, vol 992. Springer, Singapore. https://doi.org/10.1007/978-981-13-8798-2_12