Proposal: split train.py into train.py and train_aml.py (#219)
This change splits train.py into two files.

The new train.py is standalone and has no references to AzureML. It defines three functions: split_data, which splits a dataframe into test and train data; train_model, which takes the test/train data and a parameter object and trains the model; and get_model_metrics, which evaluates metrics about the model. The script can also be run locally, in which case it loads a dataset from a file.

The second file, train_aml.py, contains reasonably general AzureML logic. It reads data from a dataset, then calls the split_data function from train.py. It loads input parameters from a config file and logs them, then calls train_model from train.py. Finally, it uploads the model and logs any metrics returned by get_model_metrics.

The hope with these changes is to demonstrate a simple interface for integrating an existing ML script with MLOpsPython, and to provide an example of how the core ML functionality can be invoked in multiple ways for development purposes.

Co-authored-by: Bryan J Smith <[email protected]>
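The interface described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code from this commit: the `RidgeLine` class, the `run_training` wrapper, the `log` callback, and the `test_fraction` and `seed` parameters are hypothetical names introduced here, and a closed-form one-feature ridge regression is assumed as a stand-in for whatever model the real train.py fits. Only the three function names, the `{"alpha": ...}` parameter object, and the nested `{"train": {...}, "test": {...}}` data shape come from the commit and its tests. Inputs are flat lists here, whereas the tests pass 2-D NumPy arrays.

```python
# Illustrative sketch only; see the assumptions listed in the lead-in.
import random


def split_data(rows, test_fraction=0.2, seed=0):
    """Split (x, y) rows into the nested dict shape used by the tests."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    return {
        "train": {"X": [x for x, _ in train], "y": [y for _, y in train]},
        "test": {"X": [x for x, _ in test], "y": [y for _, y in test]},
    }


class RidgeLine:
    """One-feature ridge regression fitted in closed form (assumed model)."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, xs, ys):
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        # The penalty shrinks the slope; the intercept is not penalised.
        self.slope = sxy / (sxx + self.alpha)
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, xs):
        return [self.intercept + self.slope * x for x in xs]


def train_model(data, params):
    """Train on data["train"] only; no logging or cloud dependencies."""
    train = data["train"]
    return RidgeLine(params["alpha"]).fit(train["X"], train["y"])


def get_model_metrics(model, data):
    """Evaluate on data["test"] and return the metrics as a plain dict."""
    preds = model.predict(data["test"]["X"])
    ys = data["test"]["y"]
    mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
    return {"mse": mse}


def run_training(rows, params, log):
    """Hypothetical wrapper in the spirit of train_aml.py."""
    data = split_data(rows)
    for name, value in params.items():
        log(name, value)  # log the input parameters first
    model = train_model(data, params)
    for name, value in get_model_metrics(model, data).items():
        log(name, value)  # then log whatever metrics came back
    return model
```

Because the core functions return values instead of logging them, the same three functions can be called from a local script, from a unit test, or from a wrapper that forwards the results to a tracking service through a callback such as `log`.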
Showing 6 changed files with 799 additions and 155 deletions.
@@ -1,27 +1,32 @@
 import numpy as np
-from azureml.core.run import Run
-from unittest.mock import Mock
-from diabetes_regression.training.train import train_model
+from diabetes_regression.training.train import train_model, get_model_metrics
 
 
 def test_train_model():
     X_train = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
     y_train = np.array([10, 9, 8, 8, 6, 5])
+    data = {"train": {"X": X_train, "y": y_train}}
+
+    reg_model = train_model(data, {"alpha": 1.2})
+
+    preds = reg_model.predict([[1], [2]])
+    np.testing.assert_equal(preds, [9.93939393939394, 9.03030303030303])
+
+
+def test_get_model_metrics():
+
+    class MockModel:
+
+        @staticmethod
+        def predict(data):
+            return ([8.12121212, 7.21212121])
+
     X_test = np.array([3, 4]).reshape(-1, 1)
     y_test = np.array([8, 7])
-    data = {"train": {"X": X_train, "y": y_train},
-            "test": {"X": X_test, "y": y_test}}
+    data = {"test": {"X": X_test, "y": y_test}}
 
-    run = Mock(Run)
-    reg = train_model(run, data, alpha=1.2)
+    metrics = get_model_metrics(MockModel(), data)
 
-    _, call2 = run.log.call_args_list
-    nameValue, descriptionDict = call2
-    name, value = nameValue
-    description = descriptionDict['description']
-    assert (name == 'mse')
-    np.testing.assert_almost_equal(value, 0.029843893480257067)
-    assert (description == 'Mean squared error metric')
-
-    preds = reg.predict([[1], [2]])
-    np.testing.assert_equal(preds, [9.93939393939394, 9.03030303030303])
+    assert 'mse' in metrics
+    mse = metrics['mse']
+    np.testing.assert_almost_equal(mse, 0.029843893480257067)