GPTCloneBench is a clone detection benchmark based on SemanticCloneBench and GPT.

srlabUsask/GPTCloneBench

GPTCloneBench

GPTCloneBench is a clone detection benchmark based on SemanticCloneBench [1] and GPT [2,3,4,5]. This work was accepted at the ICSME 2023 conference. A related study that follows a similar methodology, "Unveiling the potential of large language models in generating semantic and cross-language clones", was published at IWSC 2023.

The semantic clones (stand-alone and system-injected) are available here: https://doi.org/10.5281/zenodo.10198952

The cross-language clones are in the following two files in this Git repository:

  • cross_language.zip
  • cross_language_part_2.zip

Cross-language clones are provided as stand-alone clones; they are not injected into a system.
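As a minimal sketch of unpacking the two archives listed above, the following Python snippet extracts each one into its own subfolder. The archive names come from this README; the output folder name `cross_language_clones` and the helper `extract_archives` are illustrative assumptions, not part of the repository.

```python
import zipfile
from pathlib import Path

# Archive names as listed in this README.
ARCHIVES = ["cross_language.zip", "cross_language_part_2.zip"]


def extract_archives(archives, source_dir=".", target_dir="cross_language_clones"):
    """Extract each archive found in source_dir into its own subfolder of target_dir.

    The default target_dir is a hypothetical name chosen for illustration.
    Returns the list of folders that were written; missing archives are skipped.
    """
    extracted = []
    for name in archives:
        archive_path = Path(source_dir) / name
        if not archive_path.is_file():
            print(f"Skipping {name}: not found in {source_dir}")
            continue
        # Use the archive name without ".zip" as the subfolder name.
        out_dir = Path(target_dir) / archive_path.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(out_dir)
        extracted.append(out_dir)
    return extracted


if __name__ == "__main__":
    for folder in extract_archives(ARCHIVES):
        print(f"Extracted to {folder}")
```

Keeping each archive in a separate subfolder avoids file name collisions if the two parts happen to contain files with the same names.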

System requirements

To install the necessary libraries, run the following command:

pip install -r requirements.txt

To run NiCad on the generated clones, you need to install TXL and NiCad.
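One way to confirm that both tools are reachable before running anything is to look for them on the `PATH`. The sketch below assumes the executables are named `txl` and `nicad`; the actual names depend on the installation and version (NiCad, for instance, may be installed under a versioned name), so adjust the list as needed.

```python
import shutil

# Executable names are assumptions; they vary by installation and version.
REQUIRED_TOOLS = ["txl", "nicad"]


def missing_tools(tool_names):
    """Return the names in tool_names that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```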

To generate clones, you need SemanticCloneBench [1], which can be downloaded from: https://drive.google.com/open?id=1KicfslV02p6GDPPBjZHNlmiXk-9IoGWl

To manually validate GPT clones, we used the ValidateClones tool by Jeffrey Svajlenko: https://github.com/jeffsvajlenko/ValidateClones

You need an OpenAI API key to run the system. This link provides details on how to obtain one: https://www.maisieai.com/help/how-to-get-an-openai-api-key-for-chatgpt

Please follow the link to generate your own secret API key.
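Because the key is a secret, it is generally safer to keep it out of source files. The sketch below reads it from an environment variable; the variable name `OPENAI_API_KEY` and the helper `load_openai_api_key` are assumptions for illustration, and this README does not state how the repository's scripts expect the key to be supplied.

```python
import os


def load_openai_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption; change it to match your setup.
    Raises RuntimeError if the variable is unset or empty.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "export your OpenAI API key before running the scripts."
        )
    return key


if __name__ == "__main__":
    api_key = load_openai_api_key()
    # Avoid printing the key itself; only confirm that it was loaded.
    print(f"Loaded an API key of length {len(api_key)}.")
```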

Generate and validate GPTCloneBench

To generate semantic clones, follow these steps:

  1. Clone this repository.
  2. Copy SemanticCloneBench into this folder.
  3. Run python create_clones_for_gptclonebench.py and follow the prompts to generate clones.
  4. Run python file_creation_for_validateClones.py to create the input file for manual validation.
  5. Run python crossL_file_creation_for_validateClones.py to create the input file for manual validation of cross-language clones.
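Before running the steps above, a small pre-flight check can confirm that the repository folder contains what the steps rely on. In the sketch below, the script names are taken from this README, while the folder name `SemanticCloneBench` and the helper `preflight` are assumptions; use whatever folder name your copy of the benchmark actually has.

```python
from pathlib import Path

# Script names as given in the steps of this README.
SCRIPTS = [
    "create_clones_for_gptclonebench.py",
    "file_creation_for_validateClones.py",
    "crossL_file_creation_for_validateClones.py",
]

# Assumed name of the copied benchmark folder; adjust to match your copy.
BENCHMARK_DIR = "SemanticCloneBench"


def preflight(repo_root="."):
    """Return a list of problems found in repo_root (empty if none)."""
    root = Path(repo_root)
    problems = []
    if not (root / BENCHMARK_DIR).is_dir():
        problems.append(f"missing folder: {BENCHMARK_DIR}")
    for script in SCRIPTS:
        if not (root / script).is_file():
            problems.append(f"missing script: {script}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    if issues:
        print("Pre-flight check failed:")
        for issue in issues:
            print(f"  - {issue}")
    else:
        print("Pre-flight check passed.")
```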

Benchmark Validators (Undergraduate Interns):

  1. Chi Phuong Vu

    GitHub ID: 115325256, Email: [email protected]

  2. Olaoluwa Dayo-Olaide

    Email: [email protected]

  3. Souvik Ukil

    Email: [email protected]

  4. Aryan Mehta

    GitHub ID: 90737338, Email: [email protected] or [email protected]

  5. Dipika Ayshi

    Email: [email protected]

  6. Chi Cai

    GitHub ID: 68583124, Email: [email protected]

License

Benchmark: The benchmark is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives license. This license includes the benchmark database and its derivatives. For attribution, please cite this page and our publications below. This data is provided free of charge for non-commercial and academic benchmarking and experimentation use. If you would like to contribute to the benchmark, please contact us. If you believe your intended usage may be restricted by the license, please contact us, and we can discuss the possibilities.

BibTeX entries for GPTCloneBench (initial version) and "Unveiling the potential of large language models in generating semantic and cross-language clones":

@inproceedings{gptclonebench2023,
  title={GPTCloneBench: A comprehensive benchmark of semantic clones and cross-language clones using GPT-3 model and SemanticCloneBench},
  author={Alam, Ajmain Inqiad and Roy, Palash Ranjan and Al-omari, Farouq and Roy, Chanchal Kumar and Roy, Banani and Schneider, Kevin},
  booktitle={Proceedings of the 39th International Conference on Software Maintenance and Evolution (ICSME 2023)},
  year={2023},
  organization={October 2023, Bogota, Colombia (to appear)}
}

@INPROCEEDINGS{10473618,
  author={Roy, Palash R. and Alam, Ajmain I. and Al-omari, Farouq and Roy, Banani and Roy, Chanchal K. and Schneider, Kevin A.},
  booktitle={2023 IEEE 17th International Workshop on Software Clones (IWSC)}, 
  title={Unveiling the Potential of Large Language Models in Generating Semantic and Cross-Language Clones}, 
  year={2023},
  volume={},
  number={},
  pages={22-28},
  keywords={Computer languages;Codes;Statistical analysis;Conferences;Semantics;Cloning;Linguistics;Language Models;Software Clone;Semantic Clone;Cross-language Clone;GPT;Semantic-CloneBench;Software Engineering},
  doi={10.1109/IWSC60764.2023.00011}}

Contact

Ajmain Inqiad Alam: [email protected] / [email protected]

Palash Ranjan Roy: [email protected] / [email protected]

Farouq Al-omari: [email protected]

Chanchal K. Roy: [email protected]

Banani Roy: [email protected]

Kevin Schneider: [email protected]

BibTeX Citation

1. @inproceedings{al2020semanticclonebench,
    title={Semanticclonebench: A semantic code clone benchmark using crowd-source knowledge},
    author={Al-Omari, Farouq and Roy, Chanchal K and Chen, Tonghao},
    booktitle={2020 IEEE 14th International Workshop on Software Clones (IWSC)},
    pages={57--63},
    year={2020},
    organization={IEEE}
  }

2. @article{brown2020language,
    title={Language models are few-shot learners},
    author={Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others},
    journal={Advances in neural information processing systems},
    volume={33},
    pages={1877--1901},
    year={2020}
}

3. @misc{morrison_2022, 
    title={GPT-3 developer OpenAI releases new Davinci Generative Text Model}, 
    url={https://techmonitor.ai/technology/ai-and-automation/gpt-3-openai-davinci-generative-text}, 
    journal={Tech Monitor}, 
    author={Morrison, Ryan}, 
    year={2022}, 
    month={Nov}
 }

4. @misc{jain_2022,
    title={OpenAI turns to Davinci to make GPT-3 Better},
    url={https://analyticsindiamag.com/openai-turns-to-davinci-to-make-gpt-3-better/},
    journal={Analytics India Magazine},
    author={Jain, Ayush},
    year={2022},
    month={Nov}
} 

5. @misc{monge_2022,
    title={New GPT-3 model: Text-DAVINCI-003 is awesome},
    url={https://medium.com/technology-hits/new-gpt-3-model-text-davinci-003-is-awesome-ada11ef660a9},
    journal={Medium},
    publisher={Technology Hits},
    author={Monge, Jim Clyde},
    year={2022},
    month={Dec}
} 
