This repo provides a reference implementation of TSeqE as described in the paper:
J. Yang, W. Zhou, W. Qian, J. Han and S. Hu, "Topic Sequence Embedding for User Identity Linkage from Heterogeneous Behavior Data," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 2590-2594.
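- CONFIG.py: experiment parameter configuration.
- embedding_learner.py: topic representation learning module. It provides the Embedding_Learner class, whose member function fit() is the training entry point.
- zhihu/ml_data_preprocess.py (that is, zhihu_data_preprocess.py and ml_data_preprocess.py): data preprocessing.
- zhihu/ml_main.py (that is, zhihu_main.py and ml_main.py): experiment entry points, covering dataset splitting, training, and testing.
- process_pool.py: multiprocessing module.
- validation.py: distance and accuracy computations used during testing.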
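
As a rough illustration of the kind of distance and accuracy computation that validation.py is described as providing, here is a minimal sketch. It is not the code from this repo: the function names `cosine_distance` and `top_k_accuracy`, the choice of cosine distance, and the toy embeddings are all assumptions made for the example.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: a zero vector is treated as orthogonal
    return 1.0 - dot / (norm_u * norm_v)


def top_k_accuracy(source, target, truth, k=1):
    """Fraction of source users whose true match is among the k nearest targets.

    source, target: dicts mapping user id -> embedding (list of floats).
    truth: dict mapping source user id -> matching target user id.
    """
    if not truth:
        return 0.0
    hits = 0
    for src_id, true_tgt_id in truth.items():
        # Rank every candidate on the target platform by distance to this user.
        ranked = sorted(
            target,
            key=lambda tgt_id: cosine_distance(source[src_id], target[tgt_id]),
        )
        if true_tgt_id in ranked[:k]:
            hits += 1
    return hits / len(truth)


# Hypothetical toy embeddings for users on two platforms.
source = {"a1": [1.0, 0.0], "a2": [0.0, 1.0]}
target = {"b1": [0.9, 0.1], "b2": [0.2, 0.8], "b3": [-1.0, 0.0]}
truth = {"a1": "b1", "a2": "b3"}

acc_at_1 = top_k_accuracy(source, target, truth, k=1)
acc_at_3 = top_k_accuracy(source, target, truth, k=3)
print(acc_at_1, acc_at_3)
```

In this toy setup only `a1` is linked correctly at k=1, while both users are covered once all three candidates are allowed. The actual metric and distance used in the experiments should be taken from validation.py and the paper.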
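Before running, organize the data and code into the following structure:
Note that this code was tidied up after the fact and has not been run in full, so it is not guaranteed to work end to end.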
cd ./TSeq
# preprocess the Zhihu dataset
python zhihu_data_preprocess.py
# run the model on the Zhihu dataset
python zhihu_main.py
# preprocess the MovieLens dataset
python ml_data_preprocess.py
# run the model on the MovieLens dataset
python ml_main.py
The datasets can be found at the following links:
If you find TSeqE useful for your research, please consider citing us:
@INPROCEEDINGS{TSeqE,
author={Yang, Jinzhu and Zhou, Wei and Qian, Wanhui and Han, Jizhong and Hu, Songlin},
booktitle={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Topic Sequence Embedding for User Identity Linkage from Heterogeneous Behavior Data},
year={2021},
volume={},
number={},
pages={2590-2594},
doi={10.1109/ICASSP39728.2021.9415111}
}