This project is a solution to a Kaggle competition.
The data is a synthetic data set with 40 features, representing objects from two classes (labeled 0 or 1). The training set has 1000 samples and the test set has 9000.
Libraries used:
NumPy
pandas
scikit-learn (sklearn)
This solution uses two algorithms. First, a Gaussian mixture model clusters the data; then a Random Forest classifier assigns each sample to class 0 or 1.
For Gaussian Mixture Models: https://pdfs.semanticscholar.org/734b/07b53c23f74a3b004d7fe341ae4fce462fc6.pdf
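The two-stage approach described above can be sketched roughly as follows. This is a minimal illustration, not the exact code of this project: it assumes the Gaussian mixture's per-component membership probabilities are used as the input features for the Random Forest, and it generates stand-in data with make_classification instead of loading the competition files. The helper names fit_gmm_rf and predict_gmm_rf, the number of mixture components, and the forest size are all hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 1000 samples, 40 features, two classes.
# Replace with the competition's training data when available.
X, y = make_classification(
    n_samples=1000, n_features=40, n_informative=10, n_redundant=0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def fit_gmm_rf(X_train, y_train, n_components=4, random_state=0):
    """Fit a Gaussian mixture, then a Random Forest on its cluster posteriors."""
    # Step 1: unsupervised clustering of the feature vectors.
    gmm = GaussianMixture(
        n_components=n_components, covariance_type="full",
        random_state=random_state,
    )
    gmm.fit(X_train)
    # Step 2: supervised classification on the soft cluster memberships
    # (assumed design; the original may combine the two stages differently).
    rf = RandomForestClassifier(n_estimators=200, random_state=random_state)
    rf.fit(gmm.predict_proba(X_train), y_train)
    return gmm, rf


def predict_gmm_rf(gmm, rf, X):
    """Map samples to cluster posteriors, then predict class 0 or 1."""
    return rf.predict(gmm.predict_proba(X))


gmm, rf = fit_gmm_rf(X_train, y_train)
predictions = predict_gmm_rf(gmm, rf, X_test)
print("held-out accuracy:", np.mean(predictions == y_test))
```

Because the Gaussian mixture does not need labels, it could also be fitted on the training and test features together before training the forest; whether that helps depends on the data.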