GPTCloneBench

GPTCloneBench is a clone detection benchmark based on SemanticCloneBench [1] and GPT [2,3,4,5]. This work has been accepted at the ICSME 2023 conference. We published another study that follows a similar methodology, "Unveiling the Potential of Large Language Models in Generating Semantic and Cross-Language Clones", at IWSC 2023.

The semantic clones (stand-alone and system-injected clones) are available here: https://doi.org/10.5281/zenodo.10198952

Cross-language clones are included in this Git repository in the following two files:

cross_language.zip
cross_language_part_2.zip

Cross-language clones are provided as stand-alone clones; they are not injected into a system.
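If you want to unpack both cross-language archives in one go, the following Python sketch uses the standard-library zipfile module. It assumes both archives sit in the repository root, as listed above; the function name extract_cross_language and the output folder are hypothetical, and the internal layout of the archives is not documented here.

```python
import zipfile
from pathlib import Path

# Archive names as listed in this README.
CROSS_LANGUAGE_ARCHIVES = ["cross_language.zip", "cross_language_part_2.zip"]


def extract_cross_language(repo_dir: str, out_dir: str) -> list[str]:
    """Extract both cross-language archives into out_dir.

    Returns the sorted list of extracted file paths, relative to out_dir.
    Hypothetical helper; not part of the repository.
    """
    repo = Path(repo_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in CROSS_LANGUAGE_ARCHIVES:
        archive = repo / name
        if not archive.is_file():
            raise FileNotFoundError(f"missing archive: {archive}")
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out)
            # Keep file entries only; directory entries end with "/".
            extracted.extend(n for n in zf.namelist() if not n.endswith("/"))
    return sorted(extracted)
```

For example, extract_cross_language(".", "cross_language_clones") unpacks both archives into a cross_language_clones folder. If the two archives contain files with the same relative path, the second archive overwrites the first.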

System requirements

To install the necessary libraries, run the following command:

pip install -r requirements.txt
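After installing, one way to confirm that every package in requirements.txt is present is to query installed distribution metadata. The sketch below is a hypothetical helper (missing_requirements) built on importlib.metadata; its parser is deliberately simplified and only reads the package name from each line, so version pins are not checked.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(requirements_file: str = "requirements.txt") -> list[str]:
    """Return names of packages listed in requirements_file that are not installed.

    Simplified parser: reads only the distribution name from each line and
    ignores version pins, extras, environment markers, comments, and pip options.
    """
    missing = []
    for raw in Path(requirements_file).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank line, comment, or pip option such as "-r other.txt"
        # The name ends at the first character not allowed in a distribution name.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

An empty list means every listed package name resolved to an installed distribution.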

To run NiCad on the generated clones, you need to install TXL and NiCad.

Download TXL from this URL: http://txl.ca/txl-download.html
Download NiCad from this URL: http://txl.ca/txl-nicaddownload.html
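Once TXL and NiCad are installed, NiCad is run from the command line on a directory of source files. The Python sketch below wraps that call with subprocess. The executable name nicad6 and the argument order (granularity, language, system directory, configuration) are assumptions based on NiCad 6; check the documentation shipped with your NiCad version. This is not the repository's run_nicad.py, and both helper names are hypothetical.

```python
import shutil
import subprocess


def build_nicad_command(system_dir: str, language: str = "java",
                        granularity: str = "functions",
                        config: str = "default-report",
                        executable: str = "nicad6") -> list[str]:
    """Build a command line of the assumed form
    '<executable> <granularity> <language> <system_dir> <config>'."""
    return [executable, granularity, language, system_dir, config]


def run_nicad(system_dir: str, **kwargs) -> int:
    """Run NiCad on system_dir and return its exit code."""
    cmd = build_nicad_command(system_dir, **kwargs)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found on PATH; install TXL and NiCad first")
    return subprocess.run(cmd, check=False).returncode
```

Keeping command construction separate from execution makes it easy to inspect the exact command before running NiCad on a large set of generated clones.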

To generate clones, you need a copy of SemanticCloneBench [1], which can be downloaded from: https://drive.google.com/open?id=1KicfslV02p6GDPPBjZHNlmiXk-9IoGWl

To manually validate the GPT clones, we used the ValidateClones tool by Jeffrey Svajlenko: https://github.com/jeffsvajlenko/ValidateClones

You need an OpenAI API key to run the system. The following link explains how to obtain one: https://www.maisieai.com/help/how-to-get-an-openai-api-key-for-chatgpt

Follow the link to generate your own secret API key.
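Rather than pasting the key into a script, you can keep it in an environment variable. The sketch below assumes the variable name OPENAI_API_KEY, which the official openai Python library reads by default; the helper load_openai_api_key is hypothetical, and the repository's scripts may ask for the key interactively instead.

```python
import os


def load_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the OpenAI API key from an environment variable.

    Raises RuntimeError if the variable is unset or contains only whitespace.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; create a secret key in your OpenAI account "
            f"and export it, for example: export {env_var}=<your key>"
        )
    return key
```

Keeping the key out of source files also avoids committing it to a Git repository by accident.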

Generate and validate GPTCloneBench

To generate semantic clones, follow these steps:

1. Clone this repository.
2. Copy SemanticCloneBench into the repository folder.
3. Run python create_clones_for_gptclonebench.py and follow the prompts to generate clones.
4. Run python file_creation_for_validateClones.py to create the input file for manual validation.
5. Run python crossL_file_creation_for_validateClones.py to create the input file for manual validation of cross-language clones.
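The script-running steps above can be chained with a small driver. This is a convenience sketch, not part of the repository: run_pipeline is a hypothetical helper that runs the three scripts in order with the current Python interpreter and stops at the first failure. It assumes SemanticCloneBench has already been copied into the repository folder, and because the scripts may prompt for input, they inherit the terminal's stdin and stdout.

```python
import subprocess
import sys
from pathlib import Path

# Script names taken from the steps in this README, in the order given.
PIPELINE = [
    "create_clones_for_gptclonebench.py",
    "file_creation_for_validateClones.py",
    "crossL_file_creation_for_validateClones.py",
]


def run_pipeline(repo_dir: str = ".", dry_run: bool = False) -> list[list[str]]:
    """Run each script in PIPELINE from repo_dir and return the commands used.

    With dry_run=True the commands are only returned, not executed.
    """
    repo = Path(repo_dir)
    commands = [[sys.executable, str(repo / script)] for script in PIPELINE]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(cmd, check=True, cwd=repo)
    return commands
```

run_pipeline(dry_run=True) returns the commands without executing them. If you only need semantic clones, remove the cross-language script from PIPELINE.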

Benchmark Validators (Undergraduate Interns):

Chi Phuong Vu

GitHub ID: 115325256, Email: [email protected]
Olaoluwa Dayo-Olaide

Email: [email protected]
Souvik Ukil

Email: [email protected]
Aryan Mehta

GitHub ID: 90737338, Email: [email protected] or [email protected]
Dipika Ayshi

Email: [email protected]
Chi Cai

GitHub ID: 68583124, Email: [email protected]

License

Benchmark: The benchmark is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives license. This license covers the benchmark database and its derivatives. For attribution, please cite this page and our publications below. The data is provided free of charge for non-commercial and academic benchmarking and experimentation. If you would like to contribute to the benchmark, or if you believe your intended use may be restricted by the license, please contact us and we can discuss the possibilities. BibTeX entries for GPTCloneBench (initial version) and "Unveiling the Potential of Large Language Models in Generating Semantic and Cross-Language Clones":

@inproceedings{gptclonebench2023,
  title={GPTCloneBench: A comprehensive benchmark of semantic clones and cross-language clones using GPT-3 model and SemanticCloneBench},
  author={Alam, Ajmain Inqiad and Roy, Palash Ranjan and Al-omari, Farouq and Roy, Chanchal Kumar and Roy, Banani and Schneider, Kevin},
  booktitle={Proceedings of the 39th International Conference on Software Maintenance and Evolution (ICSME 2023)},
  year={2023},
  organization={October 2023, Bogota, Colombia (to appear)}
}

@INPROCEEDINGS{10473618,
  author={Roy, Palash R. and Alam, Ajmain I. and Al-omari, Farouq and Roy, Banani and Roy, Chanchal K. and Schneider, Kevin A.},
  booktitle={2023 IEEE 17th International Workshop on Software Clones (IWSC)}, 
  title={Unveiling the Potential of Large Language Models in Generating Semantic and Cross-Language Clones}, 
  year={2023},
  volume={},
  number={},
  pages={22-28},
  keywords={Computer languages;Codes;Statistical analysis;Conferences;Semantics;Cloning;Linguistics;Language Models;Software Clone;Semantic Clone;Cross-language Clone;GPT;Semantic-CloneBench;Software Engineering},
  doi={10.1109/IWSC60764.2023.00011}}

Contact

Ajmain Inqiad Alam: [email protected] / [email protected]

Palash Ranjan Roy: [email protected] / [email protected]

Farouq Al-omari: [email protected]

Chanchal K. Roy: [email protected]

Banani Roy: [email protected]

Kevin Schneider: [email protected]

BibTeX Citation

1. @inproceedings{al2020semanticclonebench,
    title={SemanticCloneBench: A semantic code clone benchmark using crowd-source knowledge},
    author={Al-Omari, Farouq and Roy, Chanchal K and Chen, Tonghao},
    booktitle={2020 IEEE 14th International Workshop on Software Clones (IWSC)},
    pages={57--63},
    year={2020},
    organization={IEEE}
}

2. @article{brown2020language,
    title={Language models are few-shot learners},
    author={Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others},
    journal={Advances in neural information processing systems},
    volume={33},
    pages={1877--1901},
    year={2020}
}

3. @misc{morrison_2022,
    title={GPT-3 developer OpenAI releases new Davinci Generative Text Model},
    url={https://techmonitor.ai/technology/ai-and-automation/gpt-3-openai-davinci-generative-text},
    journal={Tech Monitor},
    author={Morrison, Ryan},
    year={2022},
    month={Nov}
}

4. @misc{jain_2022,
    title={OpenAI turns to Davinci to make GPT-3 Better},
    url={https://analyticsindiamag.com/openai-turns-to-davinci-to-make-gpt-3-better/},
    journal={Analytics India Magazine},
    author={Jain, Ayush},
    year={2022},
    month={Nov}
}

5. @misc{monge_2022,
    title={New GPT-3 model: Text-DAVINCI-003 is awesome},
    url={https://medium.com/technology-hits/new-gpt-3-model-text-davinci-003-is-awesome-ada11ef660a9},
    journal={Medium},
    publisher={Technology Hits},
    author={Monge, Jim Clyde},
    year={2022},
    month={Dec}
}
