LightGBM consistently segfaults #21

BradNeuberg · 2022-10-10T21:27:30Z

I've tried to get "AI Scoring" to work, but it consistently segfaults. I manually annotate a few heavy cloud and shadow pixels, then press "A", and get the following segfault:

/Users/bradneuberg/src/iris/iris_env/lib/python3.9/site-packages/lightgbm/sklearn.py:598: UserWarning: 'silent' argument is deprecated and will be removed in a future release of LightGBM. Pass 'verbose' parameter via keyword arguments instead.
  _log_warning("'silent' argument is deprecated and will be removed in a future release of LightGBM. "
/Users/bradneuberg/src/iris/iris_env/lib/python3.9/site-packages/lightgbm/sklearn.py:726: UserWarning: 'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.
  _log_warning("'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. "
/Users/bradneuberg/src/iris/iris_env/lib/python3.9/site-packages/lightgbm/sklearn.py:736: UserWarning: 'verbose' argument is deprecated and will be removed in a future release of LightGBM. Pass 'log_evaluation()' callback via 'callbacks' argument instead.
  _log_warning("'verbose' argument is deprecated and will be removed in a future release of LightGBM. "
Segmentation fault: 11
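The bare `Segmentation fault: 11` does not say which Python frames or threads were active when the process died. A minimal sketch using the standard-library `faulthandler` module, which prints the Python traceback of every thread when the process receives a fatal signal such as SIGSEGV. Calling it at the top of the Iris entry point is an assumption, and `enable_crash_tracebacks` and its `log_path` option are hypothetical names:

```python
import faulthandler
import sys

# faulthandler writes to the file descriptor at crash time, so a log file
# opened here must stay open (referenced) for the lifetime of the process.
_crash_log = None


def enable_crash_tracebacks(log_path=None):
    """Dump all threads' Python tracebacks on SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.

    log_path is a hypothetical option; by default tracebacks go to stderr.
    """
    global _crash_log
    if log_path is None:
        faulthandler.enable(file=sys.stderr, all_threads=True)
    else:
        _crash_log = open(log_path, "w")
        faulthandler.enable(file=_crash_log, all_threads=True)


if __name__ == "__main__":
    # Call as early as possible, before the Flask app and LightGBM are imported.
    enable_crash_tracebacks()
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same behaviour is available without code changes by starting the interpreter with `python -X faulthandler` or by setting the environment variable `PYTHONFAULTHANDLER=1`.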

It segfaults at this line:

gbm.fit(
    inputs[train_indices, :], train_labels,
    eval_set=[(inputs[val_indices, :], val_labels)],
    early_stopping_rounds=4, verbose=0
)

I've tried upgrading LightGBM and supporting libraries as follows, but that does not fix the crash:

-numpy==1.22.0
+numpy==1.23.3
 pyyaml==5.4.1
-lightgbm==3.3.0
+lightgbm==3.3.2
 rasterio==1.2.10
 requests==2.26.0
-scipy==1.7.1
+scipy==1.9.1

I've also tried removing the Sentinel-1 functionality from the demo config file, in case leaving the Sentinel-1 pixels unlabeled was causing invalid inputs to be passed to LightGBM. This did not fix things.
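A more direct way to rule out invalid inputs, independent of which sensors are configured, is to inspect the arrays immediately before `gbm.fit(...)`. A minimal sketch; `check_fit_inputs` is a hypothetical helper, not part of Iris or LightGBM. LightGBM treats NaN as a missing value by default, so a NaN report is a hint to look closer rather than proof of a bug.

```python
import numpy as np


def check_fit_inputs(X, y):
    """Return a list of human-readable problems found in (X, y); empty if none."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim != 2:
        return [f"X should be 2-D, got shape {X.shape}"]
    problems = []
    if X.shape[0] != y.shape[0]:
        problems.append(f"row mismatch: X has {X.shape[0]} rows, y has {y.shape[0]}")
    if X.size == 0:
        problems.append("X is empty")
    if np.issubdtype(X.dtype, np.floating):
        n_nan = int(np.isnan(X).sum())
        n_inf = int(np.isinf(X).sum())
        if n_nan:
            problems.append(f"X contains {n_nan} NaN values")
        if n_inf:
            problems.append(f"X contains {n_inf} infinite values")
    elif not np.issubdtype(X.dtype, np.number):
        problems.append(f"X has non-numeric dtype {X.dtype}")
    classes = np.unique(y)
    if classes.size < 2:
        problems.append(f"y contains only {classes.size} class(es): {classes.tolist()}")
    return problems
```

In the training code this would be called as, for example, `print(check_fit_inputs(inputs[train_indices, :], train_labels))` just before the `fit` call.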

I'm running Iris in a virtualenv using Python 3.9.2 on Mac OS X 12.6.
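When comparing environments (for example Mac OS X against Linux or Windows), exact versions make reports easier to compare. A small standard-library sketch; `environment_report` is a hypothetical helper, and the package list is an assumption based on the requirements diff above plus Flask:

```python
import platform
import sys
from importlib import metadata

# Packages from the requirements diff above, plus Flask (assumed relevant).
PACKAGES = ["numpy", "scipy", "lightgbm", "flask", "rasterio", "pyyaml", "requests"]


def environment_report(packages=PACKAGES):
    """Return interpreter, OS, CPU architecture and package versions as a dict."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "executable": sys.executable,
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```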


BradNeuberg · 2022-10-11T00:12:29Z

I've serialized the training and validation data to disk in NumPy format:

train_indices, val_indices, train_labels, val_labels = train_test_split(
    user_indices, user_labels, stratify=user_labels,
    test_size=0.3, random_state=42
)

# np.save appends ".npy" to each name, e.g. "X" is written as X.npy
np.save("X", inputs[train_indices, :])  # training inputs
np.save("X_labels", train_labels)

np.save("y", inputs[val_indices, :])  # validation inputs
np.save("y_labels", val_labels)

np.save("inputs", inputs)  # full input array

gbm = lgb.LGBMClassifier(
    num_leaves=config['ai_model']['n_leaves'],
    max_bin=128,
    max_depth=config['ai_model']['max_depth'],
    # min_data_in_leaf=1000,
    # bagging_fraction=0.2,
    # boosting_type='dart',
    tree_learner='data',
    learning_rate=0.05,
    n_estimators=config['ai_model']['n_estimators'],
    silent=True,
    # n_jobs=10,
)
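`np.save` writes one `.npy` file per call. If a single archive is preferred, like the `*.npz` files mentioned for the Drive folder below, `np.savez_compressed` stores several named arrays in one `.npz` file. A minimal sketch, with hypothetical function and key names, including the reload step a standalone notebook would use:

```python
import numpy as np


def save_split(path, X_train, y_train, X_val, y_val):
    """Store the train/validation split in one compressed .npz archive.

    np.savez_compressed appends ".npz" to `path` if it has no such suffix.
    """
    np.savez_compressed(path, X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val)


def load_split(path):
    """Load the arrays back as a tuple (X_train, y_train, X_val, y_val)."""
    # NpzFile is a context manager; closing it releases the underlying file.
    with np.load(path) as data:
        return data["X_train"], data["y_train"], data["X_val"], data["y_val"]
```

Keeping all four arrays in one archive also makes it harder to pair, say, the validation inputs with the training labels by mistake when reloading.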

I then created a reduced Jupyter notebook that uses exactly the same data, config, and calls, and it does not crash. So something else imported into the Python environment must be causing this issue (perhaps a threading problem?).
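Since the same data and calls do not crash in a standalone notebook, one way to narrow this down is to launch the training step in a fresh interpreter from inside the server process and check how that child exits. A minimal standard-library sketch; `run_isolated` is a hypothetical helper. On POSIX systems a negative return code `-N` means the child was killed by signal `N`, so `-11` corresponds to SIGSEGV on both Linux and macOS.

```python
import signal
import subprocess
import sys


def run_isolated(script, env=None, timeout=600):
    """Run Python source `script` in a fresh interpreter; return (returncode, details).

    The child runs with -X faulthandler, so a crash also prints a Python traceback.
    `env` optionally replaces the child's environment variables.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", script],
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # Negative return code: the child was terminated by a signal.
        sig_name = signal.Signals(-proc.returncode).name
        return proc.returncode, f"killed by {sig_name}\n{proc.stderr}"
    return proc.returncode, proc.stderr
```

The `script` argument could, for instance, load the saved `.npy` files and repeat the `fit` call. Passing `env={**os.environ, "OMP_NUM_THREADS": "1"}` would show whether limiting OpenMP threads changes the outcome.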

Here's a Google Drive folder where I've put the *.npz files and my Jupyter notebook, gbm.ipynb:
https://drive.google.com/drive/folders/1cv1n-kCGZzPlVdQq59cgTEafcU4YvJP7?usp=sharing

BradNeuberg · 2022-10-11T00:51:44Z

I tried earlier versions of Python to see whether that was causing the issue. Python 3.6 is impossible due to version skew between imageio and numpy. I was able to set up Python 3.7 with earlier versions of numpy and the other dependencies, and LightGBM still segfaulted. I suspect a threading issue between Flask and LightGBM on Mac OS X. I tried updating to the latest version of Flask to see if that resolved things, but the Flask API has changed and is now incompatible.
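To test the Flask threading suspicion directly, the server could log which Python threads exist, and which thread is calling `gbm.fit(...)`, right before training. A minimal standard-library sketch; `describe_threads` is a hypothetical helper. It only sees Python-level threads, not native OpenMP worker threads started inside LightGBM.

```python
import threading


def describe_threads():
    """Summarize live Python threads and whether the caller is the main thread."""
    current = threading.current_thread()
    return {
        "caller": current.name,
        "caller_is_main": current is threading.main_thread(),
        "threads": sorted(t.name for t in threading.enumerate()),
    }
```

Printing `describe_threads()` in whichever request handler calls `gbm.fit(...)` would show whether training runs on a request-handling thread rather than the main thread, which the standalone notebook would not reproduce.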

Next, I'll spin up a Google Cloud server running Ubuntu, serve Iris from there, and see if it works in that environment.

BTW, when you start up Iris it reports that it's running on a debug, non-production server. Are there any recommendations for deploying Iris in production for access by multiple users?

aliFrancis · 2022-10-11T12:14:23Z

I've been meaning to do a bit more testing on the Mac OS X side, as I don't use it regularly. I'll do some digging too, but let me know how you get on and whether you find a workable solution. For my own work with IRIS, I always use Python 3.9 in a conda environment on Linux or (sometimes) Windows, and this tends to work consistently for me.

In the meantime, I've created a PR that adds a WSGI production server using the gevent package. Take a look and see what you think. I'm not an expert on deployment/web development, so I'm happy to hear your opinion on whether what I've done is useful/sensible: #22
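For reference, serving a Flask app with gevent's WSGI server generally looks like the minimal sketch below. It assumes Flask and gevent are installed; `app` is a hypothetical stand-in for the Iris Flask application, the `/health` route and `make_server` helper are illustrative, and host and port are placeholders. It is not necessarily what #22 implements.

```python
from flask import Flask
from gevent.pywsgi import WSGIServer

# Hypothetical stand-in for the Iris Flask application object.
app = Flask(__name__)


@app.route("/health")
def health():
    return "ok"


def make_server(host="0.0.0.0", port=5000):
    """Build (but do not start) a gevent WSGI server wrapping `app`."""
    return WSGIServer((host, port), app)


if __name__ == "__main__":
    # serve_forever() blocks; gevent handles each connection in its own greenlet.
    make_server().serve_forever()
```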

BradNeuberg · 2022-10-12T17:44:08Z

Thanks!

aliFrancis · 2022-11-08T09:06:19Z

Any updates on this? Regarding the production server, I've merged PR #22.

aliFrancis mentioned this issue on Oct 11, 2022 in Production server #22 (Merged)

aliFrancis mentioned this issue on Sep 26, 2023 in TypeError: fit() got an unexpected keyword argument 'early_stopping_rounds' #36 (Open)
