Commit 3205a2d: fix guide

scne committed Mar 5, 2021
1 parent a59578e commit 3205a2d

Showing 2 changed files with 42 additions and 102 deletions.
2 changes: 1 addition & 1 deletion docs/source/guide/new_alg.rst
@@ -1,4 +1,4 @@
Create a new Recommendation Model
=================================

Elliot integrates, to date, 50 recommendation models partitioned into two sets. The first set includes 38 popular models implemented in at least two of the frameworks reviewed in this work (i.e., adopting a framework-wise popularity notion).
Work in progress
142 changes: 41 additions & 101 deletions docs/source/guide/quick_start.rst
@@ -56,111 +56,51 @@ to top\_k, unless otherwise noted.
top_k: 10
Basic Configuration
------------------------

In the first scenario, the experiments require comparing a group of RSs whose parameters are optimized via a grid-search.

The configuration specifies the data loading information (e.g., semantic feature source files), in addition to the filtering and splitting strategies.

In particular, the latter supplies an entirely automated way of preprocessing the dataset, a phase that is often time-consuming
and hard to reproduce.

The simple_metrics field allows computing accuracy and beyond-accuracy metrics, with two top-k cut-off values (5 and 10),
by merely inserting the list of desired measures, e.g., [Precision, nDCG, ...].
The knowledge-aware recommendation model AttributeItemKNN is compared against two baselines, Random and ItemKNN,
along with a user-implemented external model, external.MostPop.

The configuration exploits Elliot's support for grid-search-based hyperparameter optimization: it suffices to pass
a list of possible hyperparameter values, e.g., neighbors: [50, 70, 100].

The reported models are selected according to nDCG@10.
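
A minimal sketch of what such a configuration may look like follows. Its structure mirrors the advanced configuration reported below; the dataset name and path, the spelling of the cutoffs field, the similarity and save_recs parameters, and the omission of the side-information loader needed by AttributeItemKNN are illustrative assumptions, so please refer to the linked configuration file for the authoritative version.

.. code:: yaml

    experiment:
      dataset: movielens_1m                 # illustrative dataset name
      data_config:
        strategy: dataset
        dataset_path: ../data/movielens_1m/dataset.tsv
      top_k: 10
      evaluation:
        cutoffs: [5, 10]                    # assumed spelling of the multi-cut-off field
        simple_metrics: [Precision, nDCG]
      models:
        Random:
          meta:
            save_recs: True
        ItemKNN:
          meta:
            save_recs: True
          neighbors: [50, 70, 100]          # grid search over the neighborhood size
          similarity: cosine
        AttributeItemKNN:
          meta:
            save_recs: True
          neighbors: [50, 70, 100]
          similarity: cosine
        external.MostPop:
          meta:
            save_recs: True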

**To see the full configuration file please visit the following** `link_basic <https://github.com/sisinflab/elliot/blob/master/config_files/basic_configuration.yml>`_

**To run the experiment use the following** `script_basic <https://github.com/sisinflab/elliot/blob/master/sample_basic.py>`_

Advanced Configuration
------------------------

The second scenario depicts a more complex experimental setting. The configuration file loads the MovieLens dataset from a specific path,
and Elliot performs an exhaustive iterative k-core prefiltering on both users and items, retaining only those with a minimum of 10 interactions.
A splitting strategy with both test and validation splits is then adopted: the test set is obtained via random subsampling with 1 fold
and a ratio of 20% of the data, while the validation portion is computed via cross-validation with 5 folds.
The user thus specifies an elaborate data splitting strategy, i.e., random_subsampling (for test splitting)
and random_cross_validation (for model selection), by setting a few splitting configuration fields.
In this way, the models declared in the following section are trained 5 times (once per train-validation pair)
to estimate the validation performance.
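
The relevant fragment of the full configuration reported below is:

.. code:: yaml

    prefiltering:
      strategy: iterative_k_core
      core: 10
    splitting:
      save_folder: ../data/movielens_1m/splitting/
      test_splitting:
        strategy: random_subsampling
        folds: 1
        test_ratio: 0.2
      validation_splitting:
        strategy: random_cross_validation
        folds: 5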

The next section of the configuration file is devoted to declaring the evaluation metrics and the cut-offs at which Elliot has
to compute them. The framework accepts both simple metrics (metrics that do not require external files or configurations)
and complex metrics (such as metrics related to bias or fairness investigations). Note that
Elliot has a top_k parameter, used to produce recommendation lists with a specific number of items, while
the evaluation can adopt its own cut-offs.
If the configuration does not provide a cut-off value, the top_k value is assumed as the cut-off.
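
For instance, in the configuration reported below, top_k is set to 50, the evaluation cut-off to 10, and the simple_metrics list is complemented by complex metrics that carry their own arguments:

.. code:: yaml

    top_k: 50
    evaluation:
      cutoff: 10
      simple_metrics: [nDCG, ACLT, APLT, ARP, PopREO]
      complex_metrics:
        - metric: UserMADrating
          clustering_name: Happiness
          clustering_file: ../data/movielens_1m/u_happy.tsv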

The third part of this YAML-structured file explicitly declares which models Elliot has to train and evaluate.
This section is the most expressive one because each model can be equipped with a specific hyperparameter exploration strategy.
Specifically, this file shows how NeuMF and MultiVAE adopt a Bayesian optimization strategy named TPE (Tree-structured Parzen Estimator),
which samples 5 different model configurations from the search spaces defined for the parameters of both models.
Moreover, the evaluation section includes the UserMADrating metric, which Elliot treats as a complex metric since it requires additional arguments.
The user also wants to implement a more advanced hyperparameter tuning optimization. For instance, regarding NeuMF,
Bayesian optimization with the Tree-structured Parzen Estimator is required (i.e., hyper_opt_alg: tpe), with log-uniform
sampling for the learning-rate search space.

.. code:: yaml

experiment:
dataset: movielens_1m
data_config:
strategy: dataset
dataset_path: ../data/movielens_1m/dataset.tsv
prefiltering:
strategy: iterative_k_core
core: 10
splitting:
save_folder: ../data/movielens_1m/splitting/
test_splitting:
strategy: random_subsampling
folds: 1
test_ratio: 0.2
validation_splitting:
strategy: random_cross_validation
folds: 5
top_k: 50
evaluation:
cutoff: 10
simple_metrics: [nDCG, ACLT, APLT, ARP, PopREO]
complex_metrics:
- metric: UserMADrating
clustering_name: Happiness
clustering_file: ../data/movielens_1m/u_happy.tsv
- metric: ItemMADrating
clustering_name: ItemPopularity
clustering_file: ../data/movielens_1m/i_pop.tsv
- metric: REO
clustering_name: ItemPopularity
clustering_file: ../data/movielens_1m/i_pop.tsv
- metric: RSP
clustering_name: ItemPopularity
clustering_file: ../data/movielens_1m/i_pop.tsv
- metric: BiasDisparityBD
user_clustering_name: Happiness
user_clustering_file: ../data/movielens_1m/u_happy.tsv
item_clustering_name: ItemPopularity
item_clustering_file: ../data/movielens_1m/i_pop.tsv
relevance_threshold: 1
gpu: 1
models:
NeuMF:
meta:
hyper_max_evals: 5
hyper_opt_alg: tpe
validation_rate: 5
lr: [loguniform, -10, -1]
batch_size: [128, 256, 512]
epochs: 50
mf_factors: [quniform, 8, 32, 1]
mlp_factors: [8, 16]
mlp_hidden_size: [(32, 16, 8), (64, 32, 16)]
prob_keep_dropout: 0.2
is_mf_train: True
is_mlp_train: True
MultiVAE:
meta:
hyper_max_evals: 5
hyper_opt_alg: tpe
validation_rate: 5
lr: [0.0005, 0.001, 0.005, 0.01]
epochs: 50
batch_size: [128, 256, 512]
intermediate_dim: [300, 400, 500]
latent_dim: [100, 200, 300]
dropout_pkeep: 1
reg_lambda: [0.1, 0.0, 10]
BPRMF:
meta:
hyper_max_evals: 5
hyper_opt_alg: rand
validation_rate: 5
lr: [0.0005, 0.001, 0.005, 0.01]
batch_size: [128, 256, 512]
epochs: 50
embed_k: [10, 50, 100]
bias_regularization: 0
user_regularization: [0.0025, 0.005, 0.01]
positive_item_regularization: [0.0025, 0.005, 0.01]
negative_item_regularization: [0.00025, 0.0005, 0.001]
update_negative_item_factors: True
update_users: True
update_items: True
update_bias: True
Moreover, Elliot allows exploring complex neural architecture search spaces by inserting lists of tuples. For instance,
(32, 16, 8) indicates that the neural network consists of three hidden layers with 32, 16, and 8 units, respectively.
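
As a side note on the sampling keywords used in the search spaces above: their exact semantics are not spelled out in this guide, but assuming they follow HyperOpt-style conventions (a plausible assumption given the TPE optimizer), they can be read as follows:

.. code:: yaml

    # A hedged reading of the search-space notation, assuming HyperOpt-style semantics:
    lr: [loguniform, -10, -1]         # lr = exp(u) with u ~ Uniform(-10, -1), i.e., lr in [~4.5e-5, ~0.37]
    mf_factors: [quniform, 8, 32, 1]  # round(uniform(8, 32) / 1) * 1, i.e., integer values from 8 to 32
    batch_size: [128, 256, 512]       # a plain list denotes a discrete set of candidate values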


**To see the full configuration file please visit the following** `link_advanced <https://github.com/sisinflab/elliot/blob/master/config_files/advanced_configuration.yml>`_

**To run the experiment use the following** `script_advanced <https://github.com/sisinflab/elliot/blob/master/sample_advanced.py>`_
