readme.txt

General Information:
####################
The task is to do exploratory data analysis and build a classifier for
the provided data set. (You'll figure out the target variable.)

Provide a zip file containing a Jupyter notebook (Python 3.x.x) with all your
code, graphs, etc. needed to reproduce your results. Also include trained
models, or anything else you deem necessary to run your notebook
(probably a requirements.txt with all Python dependencies) on my machine.
Ideally I just unzip, run pip install -r requirements.txt, and run your notebook.
If I can't run your notebook, we'll figure something out; no need to panic over
a broken dependency etc.
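
If you want to check your environment against requirements.txt before
zipping, something like the sketch below could help. It is only an
illustration, not part of the task: the function names are made up here,
and it assumes the file contains simple "name==version" lines (any other
pip syntax is skipped). Versions are compared as plain strings.

```python
# Sketch: report packages pinned in requirements.txt that are missing or
# installed in a different version. Only "name==version" lines are handled.
from importlib import metadata


def parse_requirements(text):
    """Return (name, version) pairs for simple 'name==version' lines."""
    pins = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # blank line or unsupported syntax (e.g. '>=')
        name, version = line.split("==", 1)
        pins.append((name.strip(), version.strip()))
    return pins


def find_mismatches(pins):
    """Return (name, wanted, installed) for pins that are missing or differ."""
    problems = []
    for name, wanted in pins:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            problems.append((name, wanted, installed))
    return problems


def check_file(path="requirements.txt"):
    """Print one line per mismatch found for the given requirements file."""
    with open(path) as fh:
        pins = parse_requirements(fh.read())
    for name, wanted, installed in find_mismatches(pins):
        print(f"{name}: wanted {wanted}, found {installed}")
```

Calling check_file() from the folder that holds requirements.txt prints
nothing when every pinned package is installed in the pinned version.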


In the interview you will get 10-15 minutes to talk
us through your notebook interactively and use it to communicate your
findings, thought process, etc.
We also include a PDF of a relevant research paper by the group that
collected the data.


Notes:
a) You have to submit your Jupyter notebook within 48 hours
   of your selected start date via email to andre.bieler@parashift.io
b) At the interview we will be using the notebook you submitted via email.
c) We also value clean code and valuable insights, not only pure performance
   metrics of your model.
d) There is no hold-out data set on which we will evaluate your model.
e) The notebook should also include your code for training the models.
f) Please don't cheat; solve this on your own.
 

Data Set Information:
#####################
The data used in this study were gathered from 188 patients with PD
(107 men and 81 women) with ages ranging from 33 to 87 (65.1±10.9)
at the Department of Neurology in Cerrahpaşa Faculty of Medicine,
Istanbul University. The control group consists of 64 healthy
individuals (23 men and 41 women) with ages varying between 41 and 82 (61.1±8.9).
During the data collection process, the microphone was set to 44.1 kHz and,
following the physician's examination, the sustained phonation of
the vowel /a/ was collected from each subject with three repetitions.
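
The counts above imply a rough size and class balance for the data set.
The short sketch below just does that arithmetic. It assumes that every
subject really contributed all three recordings, which the description
suggests but which you should verify against the actual file.

```python
# Counts taken from the data set description above.
pd_subjects = 188       # patients with PD
control_subjects = 64   # healthy controls
repetitions = 3         # assumption: three recordings for every subject

subjects = pd_subjects + control_subjects
recordings = subjects * repetitions
pd_share = pd_subjects / subjects

print(f"subjects: {subjects}")                    # 252
print(f"recordings: {recordings}")                # 756
print(f"share of PD subjects: {pd_share:.3f}")    # 0.746
```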


Attribute Information:
######################
Various speech signal processing algorithms, including
Time Frequency Features,
Mel Frequency Cepstral Coefficients (MFCCs),
Wavelet Transform based Features,
Vocal Fold Features, and TQWT (tunable Q-factor wavelet transform) features,
have been applied to the speech recordings of Parkinson's Disease (PD)
patients to extract clinically useful information for PD assessment.