This repo serves as the official implementation of the ACL 2021 Findings paper "Code Summarization with Structure-induced Transformer".
If you have any questions, feel free to email me.
```
pip install -r requirements.txt
```
For Python, we follow the pipeline in https://github.com/wanyao1992/code_summarization_public.
For Java, we fetch the data from https://github.com/xing-hu/TL-CodeSum.
In the paper, we wrote the scripts to parse code into ASTs on our own, but it is a tough task. We are trying to find a cleaner way to do this and will then rerun SiT on it.
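For a rough sense of what AST parsing looks like on the Python side, here is a minimal sketch using only the standard `ast` module; this is not the script used in the paper, just an illustration of turning a snippet into a tree of nodes:

```python
import ast

# Parse a small Python snippet into an AST.
code = "def add(a, b):\n    return a + b\n"
tree = ast.parse(code)

# Walk the tree and print each node type with its direct children.
for node in ast.walk(tree):
    children = [type(child).__name__ for child in ast.iter_child_nodes(node)]
    print(type(node).__name__, "->", children)
```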
To just reproduce the results, you can download the data we used directly from here and put both `python` and `java` in the `data` directory.
The `adjacency` files are too large to load on my personal server, so I allocate a GUID for each code snippet in `.guid` and retrieve them one by one. What you need to do is:
```
cd sit3
unzip adjacency.zip
```
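As a minimal sketch of the one-by-one retrieval idea, the snippet below assumes the unzipped archive stores one adjacency matrix per snippet as a `.npy` file named by its GUID; the paths `python.guid` and `adjacency/` are illustrative, so check the actual layout after unzipping:

```python
import os

import numpy as np


def load_adjacency(guid, adjacency_dir="adjacency"):
    """Load the adjacency matrix for one code snippet by its GUID.

    Hypothetical layout: one .npy file per snippet, named by its GUID.
    """
    return np.load(os.path.join(adjacency_dir, f"{guid}.npy"))


# Stream GUIDs from the .guid file and fetch adjacency matrices one at a time,
# so the full set never has to sit in memory at once.
with open("python.guid") as f:  # file name is illustrative
    for guid in (line.strip() for line in f):
        adj = load_adjacency(guid)
        # ... hand `adj` to the data loader for this snippet ...
```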
**Training**
```
cd main
python train.py --dataset_name python --model_name YOUR_MODEL_NAME
```
You can follow the training log with:
```
vi ../modelx/YOUR_MODEL_NAME.txt
```
In the paper, we train SiT for 150 epochs. For example, on Java:
```
01/18/2021 01:12:25 PM: [ dev valid official: Epoch = 150 | bleu = 44.89 | rouge_l = 55.25 | Precision = 61.14 | Recall = 57.81 | F1 = 56.95 | examples = 8714 | valid time = 58.93 (s) ]
```
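If you want to pull the validation numbers out of the log programmatically, a small sketch like the following works for entries in the format shown above (the log path is just an example):

```python
import re

# Matches "Epoch = 150 | bleu = 44.89 | rouge_l = 55.25 | ..." style entries.
PATTERN = re.compile(r"Epoch = (\d+) \| bleu = ([\d.]+) \| rouge_l = ([\d.]+)")

with open("../modelx/YOUR_MODEL_NAME.txt") as f:
    for line in f:
        match = PATTERN.search(line)
        if match:
            epoch, bleu, rouge_l = match.groups()
            print(f"epoch {epoch}: BLEU {bleu}, ROUGE-L {rouge_l}")
```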
**Testing**
```
python test.py --dataset_name python --beam_size 5 --model_name YOUR_MODEL_NAME
```
**Issue**
For Python, we do not follow the original data split in Wei's paper and consequently rerun both SiT and the Transformer baseline on our own split. This is a potential drawback of the paper when comparing against other LSTM baselines. If you want the original split, please refer to https://github.com/GoneZ5/SCRIPT. Thank you.
Acknowledgement: The implementation is based on https://github.com/wasiahmad/NeuralCodeSum.
```
@inproceedings{hongqiu2021summarization,
  author    = {Wu, Hongqiu and Zhao, Hai and Zhang, Min},
  booktitle = {Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021},
  title     = {Code Summarization with Structure-induced Transformer},
  year      = {2021}
}
```