Shale Gas Well Productions Prediction

https://www.datascience-contest.com

The Korea National Oil Corporation was interested in purchasing shale gas wells in the United States and wanted to predict their production in order to select the wells that maximize profit.

Production is predicted with a combination of LightGBM regression and exponential smoothing. Well selection is formulated as a 0-1 integer program and solved with Gurobi to maximize profit. Performance is evaluated with sMAPE (symmetric Mean Absolute Percentage Error). Our team scored an sMAPE of 25.54%, among the best results; the top score was 19.49%.

Problem Description

Data

The train and exam datasets are confidential, so they are not included in this repository.

  • trainSet.csv - Data of 280 shale gas wells for training models
  • examSet.csv - Data of 44 shale gas wells for prediction

Predicting Gas Production

The task is to predict the monthly average gas production of each of the 44 shale gas wells in examSet.csv over the next 6 months.

Performance evaluation is based on sMAPE (symmetric Mean Absolute Percentage Error), computed from the following quantities:

  • Fi - predicted monthly average gas production of ith gas well over the next 6 months
  • Ai - actual monthly average gas production of ith gas well over the next 6 months
  • n - number of gas wells (44 in this problem)
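
As a minimal sketch of the metric, the function below computes sMAPE from the quantities listed above. It assumes the common variant whose denominator is the mean of |Ai| and |Fi|. The contest's exact variant is not shown here, so the helper name `smape` and that choice of denominator are assumptions.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent, assuming the (|A| + |F|) / 2 denominator."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("at least one well is required")

    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        if denom == 0:
            # Both values are zero: treat the term as zero error.
            continue
        total += abs(f - a) / denom
    return 100.0 * total / len(actual)


# Hypothetical values for two wells (Mcf per month), for illustration only.
score = smape([100.0, 200.0], [50.0, 200.0])
```

Under this variant the score ranges from 0% (perfect forecast) to 200%.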

Investment Decision

A budget of $15,000,000 is allocated. After predicting the monthly average gas production of each well, the task is to select the subset of the 44 wells that maximizes profit. The problem is stated in terms of:

  • Ai - actual monthly average gas production of ith gas well over the next 6 months
  • Pi - price of ith gas well
  • Ps - shale gas price ($5 per 1 Mcf)
  • Ci - monthly operation cost of ith gas well
  • Xi - decision variable to purchase ith gas well (if purchasing ith gas well: Xi = 1, else: Xi = 0)
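
To make the selection problem concrete, here is a small brute-force sketch of a 0-1 selection under a budget. It is not the Gurobi model used in this project. It assumes that the profit of well i over 6 months is `6 * (Ps * Ai - Ci) - Pi` and that the budget limits the total purchase price; both are assumptions about the exact formulation, and all names and numbers are hypothetical.

```python
from itertools import product


def select_wells(production, price, op_cost, budget, gas_price=5.0, months=6):
    """Enumerate every 0-1 purchase vector and keep the most profitable one.

    Assumed per-well profit: months * (gas_price * production - op_cost) - price.
    Assumed constraint: total purchase price must not exceed the budget.
    """
    n = len(production)
    profit = [
        months * (gas_price * production[i] - op_cost[i]) - price[i]
        for i in range(n)
    ]

    best_x, best_profit = [0] * n, 0.0  # buying nothing is always feasible
    for x in product((0, 1), repeat=n):
        spend = sum(price[i] * x[i] for i in range(n))
        if spend > budget:
            continue
        total = sum(profit[i] * x[i] for i in range(n))
        if total > best_profit:
            best_x, best_profit = list(x), total
    return best_x, best_profit


# Hypothetical three-well instance, for illustration only.
x, total_profit = select_wells(
    production=[1000, 3000, 500],   # Mcf per month
    price=[20000, 60000, 10000],    # purchase price in dollars
    op_cost=[500, 1000, 2000],      # dollars per month
    budget=70000,
)
```

Enumeration only works for a handful of wells: with 44 wells there are 2^44 candidate vectors, which is why an integer programming solver is the practical choice.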

Solution Approach

The wells are divided into new wells and old wells. New wells do not have data on gas production, non-gas production and hours operated per month. This data is available for old wells.

Therefore, regression is used to predict the monthly average production of new wells over their first 6 months, and exponential smoothing is used to predict the monthly average production of old wells over the next 6 months.

After EDA (Exploratory Data Analysis) and feature engineering, the following decision-tree-based regression models are tested:

  • BaggingRegressor
    • n_estimators=50
  • RandomForestRegressor
    • n_estimators=50
  • XGBRegressor
    • max_depth=5
    • objective='reg:squarederror'
  • LGBMRegressor
  • VotingRegressor
    • estimators=[bagging, random_forest, xgb, lgbm]
    • n_jobs=-1

Models are compared on a hold-out split: train_test_split(test_size=0.2, random_state=42)

LGBMRegressor performs best, with the lowest sMAPE.

LGBMRegressor hyperparameters after grid-search tuning with Ray Tune:

  • boosting_type='gbdt'
  • learning_rate=0.1
  • max_bin=250
  • max_depth=-1
  • min_data_in_leaf=20
  • num_iterations=100
  • num_leaves=20
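
For reference, the tuned values above can be collected into a parameter dictionary and passed to the estimator. This is a sketch that assumes the `lightgbm` package is installed; `LGBM_PARAMS` and `build_model` are hypothetical names, and the tuning code itself is not reproduced here.

```python
# Tuned values copied from the list above.
# min_data_in_leaf and num_iterations are native LightGBM parameter names;
# the scikit-learn wrapper forwards them as extra keyword arguments.
LGBM_PARAMS = {
    "boosting_type": "gbdt",
    "learning_rate": 0.1,
    "max_bin": 250,
    "max_depth": -1,          # -1 means no depth limit
    "min_data_in_leaf": 20,
    "num_iterations": 100,
    "num_leaves": 20,
}


def build_model():
    """Return an unfitted LGBMRegressor configured with the tuned values."""
    # Imported lazily so the dictionary can be used without lightgbm installed.
    from lightgbm import LGBMRegressor

    return LGBMRegressor(**LGBM_PARAMS)
```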

GPU is leveraged.

The following exponential smoothing models are tested:

  • SimpleExpSmoothing
    • smoothing_level=0.2
    • smoothing_level=0.6
    • optimized smoothing level
  • Holt
    • Additive model
    • Multiplicative model
    • Damped additive model
    • Damped multiplicative model
  • ExponentialSmoothing
    • use_boxcox=True
      • Additive model
      • Damped additive model

For each well, the model with the minimum SSE (sum of squared errors) is selected, so different wells may be forecast by different models.
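
The per-well selection rule can be sketched as follows. The model names in the list above appear to be statsmodels classes. To stay dependency-free, this sketch hand-rolls simple exponential smoothing only and chooses between the two fixed smoothing levels by one-step-ahead SSE. The function names and the sample series are hypothetical.

```python
def ses_fit(series, alpha):
    """Simple exponential smoothing: return (one-step-ahead SSE, final level)."""
    level = series[0]  # initialize the level with the first observation
    sse = 0.0
    for y in series[1:]:
        sse += (y - level) ** 2               # error of the one-step forecast
        level = alpha * y + (1 - alpha) * level
    return sse, level


def pick_by_sse(series, alphas=(0.2, 0.6)):
    """Return (alpha, sse, forecast) for the candidate with the smallest SSE."""
    best_alpha = min(alphas, key=lambda a: ses_fit(series, a)[0])
    sse, level = ses_fit(series, best_alpha)
    # The SES forecast is flat, so the final level is also the
    # monthly average forecast over any future horizon.
    return best_alpha, sse, level


# Hypothetical declining production history for one well (Mcf per month).
alpha, sse, forecast = pick_by_sse([100.0, 90.0, 80.0, 70.0, 60.0])
```

On a steadily declining series like this one, the larger smoothing level tracks recent months more closely and gets the lower SSE. Trend-aware models such as Holt would be compared in the same way.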

The following 0-1 integer programming model is used: