Kiwhan Song* · Mohamed Ali Dhraief* · Muhua Xu · Locke Cai · Xuhao Chen · Arvind · Jie Chen

MIT · MIT-IBM Watson AI Lab · IBM · IBM Research
This is the official repository for the paper Identifying Money Laundering Subgraphs on the Blockchain. We provide the code for RevTrack, RevClassify, and RevFilter, together with the code for the experiments in the paper and model checkpoints. See the instructions below.
```bibtex
@inproceedings{song2024revtrack,
  title={Identifying Money Laundering Subgraphs on the Blockchain},
  author={Kiwhan Song and Mohamed Ali Dhraief and Muhua Xu and Locke Cai and Xuhao Chen and Arvind and Jie Chen},
  booktitle={Proceedings of the Fifth ACM International Conference on AI in Finance},
  year={2024},
}
```
Create a new conda environment and install the required packages:
```bash
conda create python=3.10 -n revtrack
conda activate revtrack
pip install -r requirements.txt
```
We use Weights & Biases (wandb) for logging and checkpointing. Sign up for a wandb account, run `wandb login` to log in, and set the wandb entity and project in `configurations/config.yaml` to your wandb account and desired project name.
The original Elliptic2 dataset is available here. For convenience, we provide a preprocessed version of the dataset in the `data/elliptic/raw` directory. Because the node embeddings are large, we serve them separately on Google Drive. Please download the node embedding file (`raw_emb.pt`) and place it in the `data/elliptic/raw` directory.
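Before launching an experiment, it can help to confirm that the embeddings landed in the right place. The snippet below is a minimal sketch and not part of the repository: `missing_raw_files` and `REQUIRED_FILES` are hypothetical names, and only `raw_emb.pt` is checked because it is the one file name given here.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical list: raw_emb.pt is the only file name stated in this README;
# the other preprocessed files shipped in data/elliptic/raw are not checked.
REQUIRED_FILES = ("raw_emb.pt",)


def missing_raw_files(repo_root: str | Path = ".") -> list[str]:
    """Return the required files that are absent from data/elliptic/raw."""
    raw_dir = Path(repo_root) / "data" / "elliptic" / "raw"
    return [name for name in REQUIRED_FILES if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print(f"Missing from data/elliptic/raw: {', '.join(missing)}")
    else:
        print("Node embeddings found.")
```

Run it from the repository root; an empty result means the embedding file is where the data loaders are expected to look for it.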
All model checkpoints are located in the `checkpoints` directory. For RevTrack (DS variant), we provide three models each for the finetuned and non-finetuned versions. We also provide three models for each baseline (MLP, NGCF, LightGCN) in the subgraph recommendation task.
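The checkpoint file names can be enumerated programmatically. This is a sketch under stated assumptions: the naming is inferred from the RevFilter commands at the end of this README (`checkpoints/RevTrack/0.ckpt` and `checkpoints/RevTrack/0_tuned.ckpt`), and the seeds 0, 1, 2 for the three models are a guess, so check the directory listing before relying on it. `revtrack_checkpoints` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path


def revtrack_checkpoints(finetuned: bool, seeds=(0, 1, 2), root: str = "checkpoints") -> list[Path]:
    """Expected RevTrack checkpoint paths.

    Assumption: files are named <seed>.ckpt (non-finetuned) and
    <seed>_tuned.ckpt (finetuned), and the three models use seeds 0, 1, 2.
    """
    suffix = "_tuned" if finetuned else ""
    return [Path(root) / "RevTrack" / f"{seed}{suffix}.ckpt" for seed in seeds]


if __name__ == "__main__":
    # Report which of the expected finetuned checkpoints are actually present.
    for path in revtrack_checkpoints(finetuned=True):
        status = "found" if path.is_file() else "missing"
        print(f"{path}: {status}")
```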
To run all the experiments in the paper, we use wandb sweeps, which allow us to search over hyperparameters or to run a set of experiments with different seeds or settings. We used a single V100 GPU for all experiments in the paper.
For every experiment, we provide a YAML configuration file in the `configurations/sweep` directory. You can run it with the following commands:
```bash
# Initialize your sweep:
wandb sweep --project <project> --entity <entity> <path_to_yaml_file>
# Your terminal will output a sweep ID.
# Run the sweep by launching the sweep agent:
wandb agent <entity>/<project>/<sweep_id>
# (Launch on multiple terminals to parallelize, if you want)
```
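If you prefer to script these two steps, the helpers below build the same `wandb sweep` and `wandb agent` command lines as argument lists for `subprocess`. This is a sketch: the function names and the example entity/project values are hypothetical, and the sweep ID still has to be copied from the output of `wandb sweep`.

```python
from __future__ import annotations

import shlex
import subprocess


def sweep_command(project: str, entity: str, yaml_path: str) -> list[str]:
    """argv for `wandb sweep`, which registers the sweep and prints its ID."""
    return ["wandb", "sweep", "--project", project, "--entity", entity, yaml_path]


def agent_command(entity: str, project: str, sweep_id: str) -> list[str]:
    """argv for `wandb agent`, which takes the path entity/project/sweep_id."""
    return ["wandb", "agent", f"{entity}/{project}/{sweep_id}"]


def launch_agents(entity: str, project: str, sweep_id: str, n_agents: int = 1) -> list[int]:
    """Start n_agents agents in parallel and return their exit codes."""
    cmd = agent_command(entity, project, sweep_id)
    procs = [subprocess.Popen(cmd) for _ in range(n_agents)]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Hypothetical names; print the commands instead of running them.
    yaml_path = "configurations/sweep/subgraph_classification/tuning/DS.yaml"
    print(shlex.join(sweep_command("revtrack", "my-entity", yaml_path)))
    print(shlex.join(agent_command("my-entity", "revtrack", "<sweep_id>")))
```

Starting several agents with `launch_agents` mirrors opening several terminals. On a single GPU the agents share the device, so parallelism mainly helps when the runs are small or more GPUs are available.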
Note that the YAML files reference our pre-trained model checkpoints in the `checkpoints` directory. If you want to evaluate your own checkpoints, modify the `parameters.load.values` field in the YAML files.
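To point a sweep at your own checkpoints without hand-editing, you can rewrite that field after parsing the YAML. The sketch below works on the already-parsed dictionary; `set_sweep_checkpoints` is a hypothetical helper that assumes the W&B sweep layout named above (`parameters`, then `load`, then `values`). Entries can be wandb run IDs or checkpoint paths, matching the `load=` argument used in the RevFilter commands.

```python
import copy


def set_sweep_checkpoints(sweep_config: dict, checkpoints: list) -> dict:
    """Return a copy of a parsed sweep config whose parameters.load.values
    lists the given checkpoints (wandb run IDs or .ckpt paths)."""
    updated = copy.deepcopy(sweep_config)  # leave the caller's dict untouched
    load = updated.setdefault("parameters", {}).setdefault("load", {})
    load.pop("value", None)  # a sweep parameter uses either `value` or `values`
    load["values"] = list(checkpoints)
    return updated
```

With PyYAML, you would read the file with `yaml.safe_load`, pass the result through `set_sweep_checkpoints`, and write it back with `yaml.safe_dump`.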
RevTrack is an algorithm that identifies potential senders and receivers of each subgraph. The provided dataset is already preprocessed using RevTrack, as mentioned above. We will also share the RevTrack preprocessing code soon.
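Since the RevTrack preprocessing code is not yet released, the sketch below is *not* RevTrack. It only illustrates what "senders" and "receivers" of a directed transaction subgraph can mean, using the simplest stand-in we could think of: a node with no incoming edge inside the subgraph is treated as a sender, and a node with no outgoing edge as a receiver. `naive_senders_receivers` is a hypothetical name.

```python
from collections import defaultdict


def naive_senders_receivers(edges):
    """Return (senders, receivers) for a directed edge list.

    Source/sink heuristic, NOT RevTrack: senders have no incoming edge
    inside the subgraph, receivers have no outgoing edge.
    """
    nodes = set()
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for src, dst in edges:
        nodes.update((src, dst))
        out_degree[src] += 1
        in_degree[dst] += 1
    senders = {n for n in nodes if in_degree[n] == 0}
    receivers = {n for n in nodes if out_degree[n] == 0}
    return senders, receivers


# Funds split from "a" through "b" and "d" and merge again at "c".
print(naive_senders_receivers([("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")]))
```

A cycle such as `x -> y -> x` leaves this heuristic with no sender and no receiver at all, a limitation of degree counting alone.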
We have two variants of RevClassify: RevClassifyBP and RevClassifyDS. Test metrics are logged as `final_test/f1` and `final_test/prauc`.
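For reference, the two classification metrics can be written out in a few lines. This is an illustrative sketch rather than the repository's code: it treats label 1 as the positive class and computes PR-AUC as average precision (the mean precision at the rank of each positive). The repository's own implementation may rely on a library and can differ slightly, for example in how it handles ties or interpolates the curve.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pr_auc(y_true, scores):
    """PR-AUC as average precision: mean precision at each positive's rank.

    Items are ranked by descending score; ties keep their input order.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos
```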
Task | Variant | Sweep YAML File |
---|---|---|
Hyperparameter Tuning | RevClassifyBP | configurations/sweep/subgraph_classification/tuning/BP.yaml |
Hyperparameter Tuning | RevClassifyDS | configurations/sweep/subgraph_classification/tuning/DS.yaml |
Subgraph Classification (Full-shot) | RevClassifyBP | configurations/sweep/subgraph_classification/full_shot/BP.yaml |
Subgraph Classification (Full-shot) | RevClassifyDS | configurations/sweep/subgraph_classification/full_shot/DS.yaml |
Subgraph Classification (Few-shot) | RevClassifyBP | configurations/sweep/subgraph_classification/few_shot/BP.yaml |
Subgraph Classification (Few-shot) | RevClassifyDS | configurations/sweep/subgraph_classification/few_shot/DS.yaml |
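The sweep paths in this table follow a regular pattern (task folder, then variant file). A small hypothetical helper can build them, which is convenient when scripting several sweeps; it only encodes the six entries listed above.

```python
from pathlib import PurePosixPath

SWEEP_ROOT = PurePosixPath("configurations/sweep/subgraph_classification")
TASKS = ("tuning", "full_shot", "few_shot")
VARIANTS = {"RevClassifyBP": "BP", "RevClassifyDS": "DS"}


def classification_sweep(task: str, variant: str) -> str:
    """Sweep YAML path for a (task folder, RevClassify variant) pair."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {tuple(VARIANTS)}")
    return str(SWEEP_ROOT / task / f"{VARIANTS[variant]}.yaml")


# All six sweep files from the table, e.g. to pass to `wandb sweep` one by one.
all_sweeps = [classification_sweep(t, v) for t in TASKS for v in VARIANTS]
```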
We have four experiments for evaluating RevFilter. Each experiment has a corresponding folder of sweep YAML files. Test metrics are logged as `final_test/HR` and `final_test/NDCG`.
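The two ranking metrics can be sketched as follows, assuming binary relevance: HR@k is 1 if any relevant item appears in the top k, and NDCG@k divides the discounted gain of the top k by the best achievable gain. In this sketch a query is any ranked candidate list paired with a set of held-out relevant items, and the logged value would be the mean over queries; how RevFilter defines its queries and candidates is not reproduced here, so treat the function names and conventions as hypothetical.

```python
import math


def hit_rate_at_k(ranked_items, relevant, k):
    """1.0 if any relevant item appears in the top-k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked_items[:k]) else 0.0


def ndcg_at_k(ranked_items, relevant, k):
    """Binary-relevance NDCG@k: DCG of the top-k divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked_items[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```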
Task | Sweep YAML Files |
---|---|
Baseline comparison on multiple settings | configurations/sweep/subgraph_recommendation/multisettings/{RevFilter, MLP, NGCF, LightGCN}.yaml |
Studying the impact of sparsity | configurations/sweep/subgraph_recommendation/sparsity/{RevFilter, MLP, NGCF, LightGCN}.yaml |
Studying the impact of k | configurations/sweep/subgraph_recommendation/top_k/{RevFilter, MLP, NGCF, LightGCN}.yaml |
Ablation study | configurations/sweep/subgraph_recommendation/ablations/{default, no_finetuning, no_iter, no_keep_mult}.yaml |
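Each entry in this table uses brace notation as shorthand for one YAML file per method or setting. If you want the concrete file names, for instance to loop over them when creating sweeps, a hypothetical `expand_braces` helper can expand the single brace group that each entry contains:

```python
import re


def expand_braces(pattern: str) -> list:
    """Expand the first {a, b, c} group in a path pattern into one path per option."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    # Options are comma-separated; surrounding spaces are dropped.
    return [head + option.strip() + tail for option in match.group(1).split(",")]


sparsity_sweeps = expand_braces(
    "configurations/sweep/subgraph_recommendation/sparsity/{RevFilter, MLP, NGCF, LightGCN}.yaml"
)
```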
To pretrain, finetune, and evaluate RevFilter directly (without a sweep), run:

```bash
python -m main +name=RevFilter_pretrain dataset=elliptic_recommendation algorithm=iterative_filtering experiment=exp_edge_recommendation 'experiment.tasks=[training]' experiment.validation.test_during_training=False
python -m main +name=RevFilter_finetune dataset=elliptic_recommendation algorithm=iterative_filtering experiment=exp_edge_recommendation 'experiment.tasks=[training]' experiment.training.early_stopping.enabled=False experiment.validation.test_during_training=False experiment.training.max_epochs=300 dataset.augment.enabled=True seed=0 load=<your_pretrained_wandb_id or checkpoints/RevTrack/0.ckpt>
python -m main +name=RevFilter_eval dataset=elliptic_recommendation algorithm=iterative_filtering experiment=exp_edge_recommendation 'experiment.tasks=[test]' experiment.test.batch_size=16 seed=0 load=<your_finetuned_wandb_id or checkpoints/RevTrack/0_tuned.ckpt> +shortcut=<your_setting>
# <your_setting> has the form a+b@k, e.g., 10+1000@100
```
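When sweeping over several evaluation settings, a small parser catches malformed `a+b@k` strings before a run starts. The README does not spell out what `a` and `b` denote, so this sketch treats all three parts as opaque integers; `parse_shortcut` and `shortcut_override` are hypothetical helpers that only mirror the format shown in the comment above.

```python
import re

# Matches strings of the form a+b@k, where a, b and k are non-negative integers.
_SHORTCUT = re.compile(r"(\d+)\+(\d+)@(\d+)")


def parse_shortcut(setting: str) -> tuple:
    """Split an a+b@k setting string into its three integers."""
    match = _SHORTCUT.fullmatch(setting.strip())
    if match is None:
        raise ValueError(f"expected a+b@k, e.g. 10+1000@100; got {setting!r}")
    a, b, k = (int(group) for group in match.groups())
    return a, b, k


def shortcut_override(a: int, b: int, k: int) -> str:
    """Build the +shortcut=... command-line override for the eval command."""
    return f"+shortcut={a}+{b}@{k}"
```

Round-tripping a setting through `parse_shortcut` and `shortcut_override` yields the exact argument to append to the evaluation command.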
This research was sponsored by MIT-IBM Watson AI Lab.
This repo is forked from Boyuan Chen's research template repo. By its MIT license, you must keep the above sentence in `README.md` and the `LICENSE` file to credit the author. By directly reading the template repo's `README.md`, you can learn how this repo is structured and how to use it.