LightGBM consistently segfaults #21

Open
BradNeuberg opened this issue Oct 10, 2022 · 5 comments

Comments

@BradNeuberg

I've been trying to get "AI Scoring" to work, but it consistently segfaults. I manually annotate a few heavy cloud and shadow pixels, then hit "A", and get the following output followed by a segfault:

/Users/bradneuberg/src/iris/iris_env/lib/python3.9/site-packages/lightgbm/sklearn.py:598: UserWarning: 'silent' argument is deprecated and will be removed in a future release of LightGBM. Pass 'verbose' parameter via keyword arguments instead.
  _log_warning("'silent' argument is deprecated and will be removed in a future release of LightGBM. "
/Users/bradneuberg/src/iris/iris_env/lib/python3.9/site-packages/lightgbm/sklearn.py:726: UserWarning: 'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.
  _log_warning("'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. "
/Users/bradneuberg/src/iris/iris_env/lib/python3.9/site-packages/lightgbm/sklearn.py:736: UserWarning: 'verbose' argument is deprecated and will be removed in a future release of LightGBM. Pass 'log_evaluation()' callback via 'callbacks' argument instead.
  _log_warning("'verbose' argument is deprecated and will be removed in a future release of LightGBM. "
Segmentation fault: 11

It segfaults at this line:

gbm.fit(
    inputs[train_indices, :], train_labels,
    eval_set=[(inputs[val_indices, :], val_labels)],
    early_stopping_rounds=4, verbose=0
)
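Because the crash happens in native code, Python prints no traceback. A minimal sketch using only the standard-library `faulthandler` module to confirm which line is executing when the process dies; with `all_threads=True` it also dumps the stacks of other threads (for example Flask's request threads), which may help with the threading theory raised later in this thread. Where to call it in Iris is an assumption: as early as possible in whatever script starts the server. Running `python -X faulthandler ...` or setting `PYTHONFAULTHANDLER=1` has the same effect without code changes.

```python
import faulthandler
import sys

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, write the Python traceback of
# every thread to stderr before the process exits.
faulthandler.enable(file=sys.stderr, all_threads=True)
```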

I've tried bumping the versions of LightGBM and its supporting libraries as follows, but it doesn't fix things:

-numpy==1.22.0
+numpy==1.23.3
 pyyaml==5.4.1
-lightgbm==3.3.0
+lightgbm==3.3.2
 rasterio==1.2.10
 requests==2.26.0
-scipy==1.7.1
+scipy==1.9.1

I've also tried removing the Sentinel-1 functionality from the demo config file, since I thought that leaving Sentinel-1 pixels unlabeled might be causing invalid inputs to be passed to LightGBM. This did not fix things.

I'm running Iris in a virtualenv using Python 3.9.2 on Mac OS X 12.6.
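When comparing environments (for example this Mac against a Linux machine), it can help to paste the exact interpreter, OS and package versions that are actually loaded. A small standard-library sketch; the package list is an assumption based on the pins above plus Flask, which comes up later in this thread, and distribution names may need adjusting:

```python
import platform
import sys
from importlib import metadata

# Assumed list: the pinned packages above plus Flask.
PACKAGES = ["numpy", "scipy", "lightgbm", "Flask", "rasterio", "PyYAML", "requests"]


def environment_report(packages=PACKAGES):
    """Return interpreter, OS and installed package versions as text."""
    lines = [
        f"Python {platform.python_version()} ({sys.executable})",
        f"Platform {platform.platform()}",
    ]
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return "\n".join(lines)


print(environment_report())
```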

@BradNeuberg
Author

I've serialized the training and validation data to disk with NumPy:

train_indices, val_indices, train_labels, val_labels = train_test_split(
    user_indices, user_labels, stratify=user_labels,
    test_size=0.3, random_state=42
)

# Training split: features in X.npy, labels in X_labels.npy
np.save("X", inputs[train_indices, :])
np.save("X_labels", train_labels)

# Validation split: features in y.npy, labels in y_labels.npy
np.save("y", inputs[val_indices, :])
np.save("y_labels", val_labels)

# Full feature array
np.save("inputs", inputs)

gbm = lgb.LGBMClassifier(
    num_leaves=config['ai_model']['n_leaves'],
    max_bin=128,
    max_depth=config['ai_model']['max_depth'],
    # min_data_in_leaf=1000,
    # bagging_fraction=0.2,
    # boosting_type='dart',
    tree_learner='data',
    learning_rate=0.05,
    n_estimators=config['ai_model']['n_estimators'],
    silent=True,
    # n_jobs=10,
)
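`np.save` writes one uncompressed `.npy` file per array (appending the `.npy` extension when it is missing), so the calls above produce five separate files. If a single `.npz` archive is more convenient to share, here is a sketch using `np.savez` and `np.load`; the file name and array keys are hypothetical:

```python
import os
import tempfile

import numpy as np


def save_split(path, X_train, y_train, X_val, y_val):
    # np.savez stores each keyword argument as a named array inside one
    # uncompressed .npz archive (np.savez_compressed compresses it instead).
    np.savez(path, X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val)


def load_split(path):
    # Each array is read into memory when its key is accessed; the context
    # manager closes the underlying file afterwards.
    with np.load(path) as data:
        return data["X_train"], data["y_train"], data["X_val"], data["y_val"]


# Round-trip check with small dummy arrays.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
y = rng.integers(0, 2, size=10)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gbm_debug.npz")
    save_split(path, X[:7], y[:7], X[7:], y[7:])
    X_train, y_train, X_val, y_val = load_split(path)
```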

I then created a reduced Jupyter notebook that loads exactly the same data and makes the same calls with the same config, and it does not crash. So presumably something else loaded into the Python environment is causing this issue (perhaps a threading problem?).
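One way to narrow down the "works in a notebook, crashes in Iris" difference is to run the reduced script in a fresh interpreter from the same virtual environment and check how it exits. A sketch, assuming a POSIX system such as macOS or Linux, where a negative return code -N means the child was killed by signal N; the function name is hypothetical:

```python
import signal
import subprocess
import sys


def run_isolated(script: str, timeout: float = 600) -> str:
    """Run `script` in a fresh interpreter and classify how it exited."""
    # sys.executable is the current environment's interpreter, so the child
    # sees the same venv and the same installed LightGBM build. -X faulthandler
    # makes the child print a Python traceback if it crashes.
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", script],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a return code of -N means the child was killed by signal N.
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    return f"error (exit code {proc.returncode}): {proc.stderr.strip()[-500:]}"
```

For example, `run_isolated(open("repro.py").read())`, where `repro.py` is a hypothetical script holding the notebook's code. If it reports `"ok"` in isolation while the server still crashes, that would be consistent with the suspicion that something else loaded in the server process is involved.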

Here's a Google Drive folder where I've put the *.npz files and my Jupyter notebook, gbm.ipynb:
https://drive.google.com/drive/folders/1cv1n-kCGZzPlVdQq59cgTEafcU4YvJP7?usp=sharing

@BradNeuberg
Author

I tried earlier versions of Python to see if that was causing the issue. Python 3.6 is impossible because of version skew between imageio and numpy. I was able to install Python 3.7 with earlier versions of numpy and everything else, and LightGBM still segfaulted. I suspect there is a threading issue between Flask and LightGBM on Mac OS X. I tried updating to the latest version of Flask to see if that resolved things, but the Flask API has changed and is now incompatible with Iris.

Next, I'll spin up a Google Cloud server running Ubuntu, serve Iris from there, and see whether it works in that environment.

BTW, when you start up Iris it reports that it's running on a debug, non-production server. Are there any recommendations for deploying Iris in production so that multiple users can access it?

@aliFrancis
Collaborator

I've been meaning to do more testing on the Mac OS X side, as I don't use it regularly. I'll do some digging too, but let me know how you get on and whether you find a workable solution. For my own work with IRIS, I always use Python 3.9 in a conda environment on Linux or (sometimes) Windows, and this tends to work consistently for me.

In the meantime, I've created a PR that adds a WSGI production server using the gevent package. Take a look and see what you think. I'm not an expert on deployment and web development, so I'm happy to hear your opinion on whether what I've done is useful and sensible: #22
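For reference, serving a Flask app with gevent's WSGI server generally looks like the sketch below. This is not the code from #22; `app` is a hypothetical stand-in for the Iris Flask application, and the host, port and `/health` route are assumptions:

```python
from flask import Flask
from gevent.pywsgi import WSGIServer

# Hypothetical stand-in for the Iris Flask application object.
app = Flask(__name__)


@app.route("/health")
def health():
    return "ok"


def serve(host="0.0.0.0", port=5000):
    """Serve `app` with gevent's WSGI server instead of Flask's dev server."""
    http_server = WSGIServer((host, port), app)
    http_server.serve_forever()  # blocks until the process is stopped


# Call serve() to start the server; here the route is only exercised in-process.
client = app.test_client()
response = client.get("/health")
```

gevent handles requests in cooperative greenlets, so a CPU-bound call such as LightGBM training still blocks other requests while it runs.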

@BradNeuberg
Author

Thanks!

@aliFrancis
Collaborator

Any updates on this? Regarding the production server, I've merged PR #22.
