the Random Forest classifies everything to be 1 #68

Open
AlanSpencer2 opened this issue Dec 28, 2021 · 2 comments


AlanSpencer2 commented Dec 28, 2021

I am new to thundergbm and am just trying to get a simple Random Forest classifier going. But the classifier predicts 1 for every single sample: not one of the 188244 samples is classified as 0. No other classifier I have tried behaves like this. I also tried different numbers of trees, different depths, etc., but it still classifies everything as 1. Is there something wrong with the following code?

from thundergbm import TGBMClassifier
clf = TGBMClassifier(depth=6, n_trees = 1, n_parallel_trees=100, bagging=1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# y_pred contains only 1s: every sample in the test set (X_test) is predicted as class 1.

@AlanSpencer2 AlanSpencer2 changed the title the Random Forest doesn't work the Random Forest classifies everything to be 1 Dec 28, 2021
Collaborator

Kurt-Liuhf commented Dec 29, 2021

@AlanSpencer2 Hi, I fitted the classifier with the same parameters on the covtype data set from sklearn, but I could not reproduce your results. The predictions seem to be correct. Could you provide a subset of your data set so that I can investigate further?
Thanks.

Author

AlanSpencer2 commented Dec 29, 2021

Hi, the problem occurs with binary classification, that is, when the target variable takes only two values (0 or 1, True or False). Can you please try a binary classification problem? (Not regression, and not multi-class classification.)

Here is the Iris dataset with 3 different flower types. The target/label variable is 1 if the flower is Setosa, and 0 for the other 2 flower types:
iris_data.csv

--------------------------------Python code------------------------------------
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from thundergbm import TGBMClassifier

# Load the attached Iris data; 'Label' is 1 for Setosa and 0 for the other two flower types.
df = pd.read_csv(r'C:\Python\iris_data.csv', encoding='ISO-8859-1', low_memory=False, index_col=0)

# Use the four measurements as features and the binary label as target.
X = df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']]
y = df['Label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Same classifier parameters as in the first post.
clf = TGBMClassifier(depth=6, n_trees=1, n_parallel_trees=100, bagging=1)
clf.fit(X_train, y_train)
pred_test = clf.predict(X_test)
# The predictions are never a mixture of 1s and 0s: they are either all 0s or all 1s.
--------------------------------end of code------------------------------------

P.S. I have tried several other datasets, and they all show the same issue.
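To make the symptom easier to compare across datasets, here is a minimal sketch of a check that counts the predicted classes and flags constant output. It uses only the standard library; the helper names (`summarize_labels`, `is_constant`, `report`) are made up for this illustration and are not part of thundergbm. It assumes the labels and predictions can be converted with `int()` (for example 0/1, 0.0/1.0, or True/False) and that `y_true` is not empty.

```python
from collections import Counter


def summarize_labels(values):
    """Return {label: count} for any iterable of class labels."""
    # Cast to int so that 1, 1.0 and True all land in the same bucket.
    return dict(Counter(int(v) for v in values))


def is_constant(predictions):
    """Return True when every prediction is the same class."""
    return len(summarize_labels(predictions)) <= 1


def report(y_true, y_pred):
    """Summarize the class balance, prediction balance, and accuracy."""
    true_counts = summarize_labels(y_true)
    pred_counts = summarize_labels(y_pred)
    n = sum(true_counts.values())
    # Accuracy obtained by always predicting the most frequent true class.
    majority_count = max(true_counts.values())
    correct = sum(int(t) == int(p) for t, p in zip(y_true, y_pred))
    return {
        "true_counts": true_counts,
        "pred_counts": pred_counts,
        "constant_predictions": len(pred_counts) <= 1,
        "majority_baseline_accuracy": majority_count / n,
        "accuracy": correct / n,
    }


# Toy data standing in for y_test and pred_test (hypothetical values).
toy_true = [1, 0, 0, 1, 0, 0]
toy_pred = [1.0] * 6
print(report(toy_true, toy_pred))
```

With the code above, `report(y_test, pred_test)` should show a single key in `pred_counts` when the problem occurs. If `accuracy` is below `majority_baseline_accuracy`, the model is doing worse than always guessing the most common class.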
