This repository has been archived by the owner on Jul 6, 2021. It is now read-only.

mast-group/codemining-treelm


codemining-treelm

codemining-treelm contains code for language models that work on trees.

codemining.ast contains code to convert ASTs into language-agnostic TreeNodes.

codemining.lm contains an implementation of probabilistic context-free grammars (PCFGs) and tree substitution grammars (TSGs), as well as some idiom-related code.
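
To make the idea of a language model over trees concrete, here is a minimal sketch of a PCFG that assigns a log-probability to a tree from maximum-likelihood production counts. It is an illustration only: the class ToyTreePcfg, its Node type, and the method names are hypothetical and are not the codemining.lm or codemining.ast API, and the sketch deliberately omits smoothing.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy PCFG over generic trees; NOT the codemining.lm API. */
public class ToyTreePcfg {

    /** Minimal tree node (hypothetical; not codemining's TreeNode). */
    public static final class Node {
        public final String symbol;
        public final List<Node> children;

        public Node(String symbol, Node... children) {
            this.symbol = symbol;
            this.children = Arrays.asList(children);
        }
    }

    // Counts of each production "parent -> c1 c2 ..." and of each parent symbol.
    private final Map<String, Integer> ruleCounts = new HashMap<>();
    private final Map<String, Integer> parentCounts = new HashMap<>();

    /** Renders the production used at node n as a string key. */
    private static String rule(Node n) {
        StringBuilder sb = new StringBuilder(n.symbol).append(" ->");
        for (Node c : n.children) {
            sb.append(' ').append(c.symbol);
        }
        return sb.toString();
    }

    /** Adds the production of every internal node of the tree to the counts. */
    public void train(Node root) {
        if (root.children.isEmpty()) {
            return; // leaves are terminals and contribute no production
        }
        ruleCounts.merge(rule(root), 1, Integer::sum);
        parentCounts.merge(root.symbol, 1, Integer::sum);
        for (Node c : root.children) {
            train(c);
        }
    }

    /** Sum of log P(production | parent) over all internal nodes, using MLE counts. */
    public double logProb(Node root) {
        if (root.children.isEmpty()) {
            return 0.0;
        }
        int ruleCount = ruleCounts.getOrDefault(rule(root), 0);
        if (ruleCount == 0) {
            return Double.NEGATIVE_INFINITY; // unseen production; no smoothing in this sketch
        }
        double lp = Math.log((double) ruleCount / parentCounts.get(root.symbol));
        for (Node c : root.children) {
            lp += logProb(c);
        }
        return lp;
    }

    public static void main(String[] args) {
        Node small = new Node("Expr", new Node("Num"), new Node("Op"), new Node("Num"));
        Node nested = new Node("Expr",
                new Node("Expr", new Node("Num"), new Node("Op"), new Node("Num")),
                new Node("Op"), new Node("Num"));

        ToyTreePcfg model = new ToyTreePcfg();
        model.train(small);
        model.train(nested);

        System.out.println("logProb(small)  = " + model.logProb(small));
        System.out.println("logProb(nested) = " + model.logProb(nested));
    }
}
```

A PCFG scores each node's expansion independently given its parent symbol. A TSG generalizes this by letting a single rule cover a larger tree fragment, which is what makes it suitable for capturing idioms.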

The project depends on three internal Maven modules:

a) codemining-utils
b) codemining-core
c) codemining-sequencelm

All other dependencies are declared as Maven dependencies of the project.

Idiom Mining

This repository contains the code related to the paper:

@inproceedings{allamanis2014mining,
  title={Mining Idioms from Source Code},
  author={Allamanis, Miltiadis and Sutton, Charles},
  booktitle={Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering},
  pages={472--483},
  year={2014},
  organization={ACM}
}

To train a TSG for Java, run the main class codemining.lm.tsg.tui.java.SampleBlockedTSG with the arguments

/path/to/folder binaryvariables  filterblock 1.0 50

These arguments run the TSG training as in the "Mining Idioms from Source Code" paper. For other options, please explore the code.
