This is the official implementation of the paper "Do We Really Need to Drop Items with Missing Modalities in Multimodal Recommendation?", accepted at CIKM 2024 as a short paper.
Install the required packages:
pip install -r requirements.txt
pip install -r requirements_torch_geometric.txt
Download the Office, Music, and Beauty datasets from the original repository:
- Office:
- Music:
- Beauty:
Place each of them in the corresponding dataset folder under ./data/<dataset_name>/. Then, run the following script:
python prepare_datasets.py --data <dataset_name>
This will create files for the item metadata and the user-item reviews, and save the product images (check the corresponding dataset folder). It will also display statistics about each dataset (e.g., the number of items with missing modalities).
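For reference, the reported missing-modality statistics boil down to counting items without a description or an image. A minimal sketch (the file name, column names, and the `image_ids` set are illustrative assumptions; `prepare_datasets.py` implements the actual logic):

```python
import csv

def missing_modality_stats(meta_path, image_ids):
    """Count items lacking a description (textual modality) or an
    image (visual modality). The TSV layout and column names are
    assumptions, not the exact output of prepare_datasets.py."""
    missing_text = missing_visual = total = 0
    with open(meta_path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f, delimiter='\t'):
            total += 1
            if not (row.get('description') or '').strip():
                missing_text += 1
            if row['asin'] not in image_ids:  # no saved product image
                missing_visual += 1
    return {'items': total, 'missing_text': missing_text,
            'missing_visual': missing_visual}
```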
After that, we need to extract the visual and textual features from the item metadata and images. To do so, we use the Ducho framework, running the following configuration file for each dataset:
dataset_path: ./data/<dataset_name>
gpu list: 0
visual:
  items:
    input_path: images
    output_path: visual_embeddings
    model: [
      { model_name: ResNet50, output_layers: avgpool, reshape: [224, 224], preprocessing: zscore, backend: torch },
    ]
textual:
  items:
    input_path: final_meta.tsv
    item_column: asin
    text_column: description
    output_path: textual_embeddings
    model: [
      { model_name: sentence-transformers/all-mpnet-base-v2, output_layers: 1, clear_text: False, backend: sentence_transformers },
    ]
where <dataset_name> should be substituted accordingly. This will extract the visual and textual features for each dataset, saved under the corresponding dataset folder.
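Once extraction finishes, the embeddings can be loaded for a quick sanity check. A sketch assuming one `<item_id>.npy` file per item (the exact output layout depends on Ducho's settings):

```python
import numpy as np
from pathlib import Path

def load_embeddings(folder):
    """Stack per-item .npy embeddings into a single (n_items, dim)
    matrix. Assumes one <item_id>.npy file per item; the real layout
    may differ depending on Ducho's output configuration."""
    ids, rows = [], []
    for f in sorted(Path(folder).glob('*.npy')):
        ids.append(f.stem)              # item id from the file name
        rows.append(np.load(f).ravel())
    return ids, np.vstack(rows)
```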
First, we perform imputation through traditional machine learning methods (zeros, random, mean). To do so, run the following scripts:
python run_split.py --data <dataset_name>
python impute.py --data <dataset_name> --gpu <gpu_id> --method <zeros_random_mean>
This will create, under the corresponding dataset folder, an additional folder containing only the imputed features, both visual and textual.
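Conceptually, the three traditional strategies fill each missing feature row with zeros, random noise, or the mean of the observed rows. A numpy sketch (the function signature is illustrative, not the actual `impute.py` interface):

```python
import numpy as np

def impute_features(feats, missing_mask, method='mean', seed=42):
    """Fill the rows of `feats` flagged in `missing_mask` (one boolean
    per item). Mirrors the three traditional strategies in spirit;
    names and arguments here are illustrative."""
    feats = feats.copy()
    rng = np.random.default_rng(seed)
    if method == 'zeros':
        feats[missing_mask] = 0.0
    elif method == 'random':
        feats[missing_mask] = rng.standard_normal(
            (int(missing_mask.sum()), feats.shape[1]))
    elif method == 'mean':
        # mean over items whose features are observed
        feats[missing_mask] = feats[~missing_mask].mean(axis=0)
    else:
        raise ValueError(method)
    return feats
```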
Before running the imputation through the graph-aware methods (neigh_mean, feat_prop, pers_page_rank), we need to split the data into train/val/test and map everything processed so far to numeric ids. To do so, run the following script:
python to_id.py --data <dataset_name> --method <zeros_random_mean>
This will create, for each dataset/modality/imputation folder, a new folder with the mapped (indexed) data.
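The mapping step conceptually assigns contiguous integer indices to the raw user and item identifiers. A pandas sketch (column names are illustrative):

```python
import pandas as pd

def map_to_ids(interactions):
    """Map raw user/item identifiers (e.g., reviewer ids and asins) to
    contiguous integer indices, in the spirit of to_id.py; the 'user'
    and 'item' column names are assumptions."""
    users = {u: i for i, u in enumerate(interactions['user'].unique())}
    items = {a: i for i, a in enumerate(interactions['item'].unique())}
    out = interactions.copy()
    out['user'] = out['user'].map(users)
    out['item'] = out['item'].map(items)
    return out, users, items
```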
Now we can run the imputation with the graph-aware methods:
# For NeighMean
python impute_all_neigh_mean.py --data <dataset_name> --gpu <gpu_id>
chmod +777 impute_all_neigh_mean_<dataset_name>.sh
./impute_all_neigh_mean_<dataset_name>.sh
# For MultiHop (feat_prop)
python impute_all_feat_prop_pers_page_rank.py --data <dataset_name> --gpu <gpu_id> --method feat_prop
chmod +777 impute_all_feat_prop_<dataset_name>.sh
./impute_all_feat_prop_<dataset_name>.sh
# For PersPageRank
python impute_all_feat_prop_pers_page_rank.py --data <dataset_name> --gpu <gpu_id> --method pers_page_rank
chmod +777 impute_all_pers_page_rank_<dataset_name>.sh
./impute_all_pers_page_rank_<dataset_name>.sh
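At their core, these graph-aware methods reconstruct a missing item's features from its graph neighbourhood; FeatProp, for instance, repeatedly averages neighbour features while keeping the observed features fixed. A minimal dense numpy sketch (names and the item-item graph are illustrative; the actual scripts operate on the user-item interaction graph):

```python
import numpy as np

def feature_propagation(adj, feats, known_mask, iters=3):
    """Sketch of multi-hop feature propagation: diffuse features via a
    row-normalised adjacency, resetting items with observed features
    after each hop. `adj` is a dense (n, n) matrix for clarity only."""
    deg = adj.sum(axis=1, keepdims=True)
    norm = adj / np.clip(deg, 1.0, None)      # row-normalise
    x = np.where(known_mask[:, None], feats, 0.0)
    for _ in range(iters):
        x = norm @ x                          # average over neighbours
        x[known_mask] = feats[known_mask]     # keep observed rows fixed
    return x
```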
Now we are all set to run the experiments. We use Elliot to train/evaluate the multimodal recommender systems.
To obtain the results in the dropped setting, run the following scripts:
python run_split.py --data <dataset_name> --dropped yes
python data/<dataset_name>/to_id_final.py
python run_dropped.py --data <dataset_name>
Then, we can compute the performance for the imputed setting. In the case of traditional machine learning imputation, we have:
chmod +777 do_all_zeros_random_mean.sh
./do_all_zeros_random_mean.sh <dataset_name> <zeros_random_mean>
python run_lightgcn_sgl.py --data <dataset_name>
For the graph-aware imputation methods, we run:
# For NeighMean
python neigh_mean.py --data <dataset_name> --gpu <gpu_id> --model <multimodal_recommender>
chmod +777 run_multimodal_all_neigh_mean_<dataset_name>_<multimodal_recommender>.sh
./run_multimodal_all_neigh_mean_<dataset_name>_<multimodal_recommender>.sh
# For MultiHop
python feat_prop.py --data <dataset_name> --gpu <gpu_id> --model <multimodal_recommender>
chmod +777 run_multimodal_all_feat_prop_<dataset_name>_<multimodal_recommender>.sh
./run_multimodal_all_feat_prop_<dataset_name>_<multimodal_recommender>.sh
# For PersPageRank
python pers_page_rank.py --data <dataset_name> --gpu <gpu_id> --model <multimodal_recommender>
chmod +777 run_multimodal_all_pers_page_rank_<dataset_name>_<multimodal_recommender>.sh
./run_multimodal_all_pers_page_rank_<dataset_name>_<multimodal_recommender>.sh
To collect all results for MultiHop and PersPageRank (the settings with the highest number of configurations), run the following:
chmod +777 collect_results.sh
./collect_results.sh <dataset_name> <method> <model>
python collect_results.py --data <dataset_name> --model <multimodal_recommender> --method <method> --metric <metric_name>
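Conceptually, collecting results amounts to scanning the per-configuration result files and keeping the best row for the chosen metric. A sketch (file layout and metric name are assumptions; `collect_results.py` implements the actual logic):

```python
import csv
from pathlib import Path

def best_config(results_dir, metric='Recall'):
    """Scan per-configuration result TSVs and return the file name and
    the highest value of `metric`. The *.tsv layout and metric column
    are illustrative, not the actual Elliot output format."""
    best_name, best_value = None, float('-inf')
    for tsv in sorted(Path(results_dir).glob('*.tsv')):
        with open(tsv, newline='', encoding='utf-8') as f:
            for row in csv.DictReader(f, delimiter='\t'):
                value = float(row[metric])
                if value > best_value:
                    best_name, best_value = tsv.name, value
    return best_name, best_value
```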