sushil edited this page Sep 13, 2010 · 17 revisions

The codesearch module in Sourcerer includes a tool for evaluating retrieval schemes that locate APIs. This page lists the resources used to run that evaluation.

A paper titled “Leveraging Usage Similarity for Effective Retrieval of Examples in Code Repositories” will appear at the FSE 2010 conference. Here are pointers to the entire data set used for the evaluation in that paper. We hope this will be useful to others who want to replicate or extend the study. The files used for the FSE 2010 paper are located in the following folder in this repository: (Link). Links to the individual files and information on other resources are given below.

See also:

  • Sourcerer API Search uses the retrieval schemes mentioned above and the snippet generation technique described in the FSE paper.

Candidate Queries and the Oracle

  • A list of 20 queries and the solutions used to make relevance judgments. (Download File)

Evaluation Data

  • Two files in TREC format store the ranked hits for each query and the relevance judgment made for each result.
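
For readers unfamiliar with TREC-format files, the sketch below shows one way to read such data and compute average precision for a query. It assumes the two files follow the conventional TREC layouts (a run file with `query_id Q0 doc_id rank score run_tag` per line and a relevance file with `query_id iteration doc_id relevance` per line); the exact column layout of the files on this page is not confirmed here. The function names are hypothetical, and this is not the Galagosearch evaluation tool used in the study.

```python
from collections import defaultdict


def parse_run(lines):
    """Parse TREC run lines (assumed: qid Q0 docid rank score tag).

    Returns {qid: [docid, ...]} with documents ordered by rank.
    """
    hits = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, docid, rank, _score, _tag = parts
        hits[qid].append((int(rank), docid))
    return {qid: [doc for _, doc in sorted(ranked)] for qid, ranked in hits.items()}


def parse_qrels(lines):
    """Parse TREC relevance lines (assumed: qid iteration docid relevance).

    Returns {qid: set of docids judged relevant (relevance > 0)}.
    """
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docid, rel = parts
        if int(rel) > 0:
            relevant[qid].add(docid)
    return dict(relevant)


def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant docids."""
    if not relevant:
        return 0.0
    found = 0
    total = 0.0
    for position, docid in enumerate(ranked, start=1):
        if docid in relevant:
            found += 1
            total += found / position  # precision at this relevant hit
    return total / len(relevant)
```

Given the two parsed dictionaries, `average_precision(run[qid], qrels.get(qid, set()))` yields the per-query score, and the mean over all queries gives mean average precision.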

Repository information

The repository was created from the JARs in the plugins folder of a standard installation of Eclipse 3.5.1.

Tools used:

  • Sourcerer Feature Extractor
  • Sourcerer’s Index Creator
  • Sourcerer’s Usage Calculator and Similarity Calculator, which generate similarity information based on Hamming distance and the Tanimoto coefficient
  • Services
    • Sourcerer File Repository Service
    • SourcererDB
  • Sourcerer Code Search
    • Search Adapter
    • Snippet generator and Evaluation tool
  • Galagosearch’s evaluation tool for calculating metrics
  • R (to generate plots and statistical results)
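
The two similarity measures named in the tool list can be illustrated with a small sketch. It assumes each code entity's API usage is represented as a binary vector (1 if a given API element is used, 0 otherwise); this representation and the function names are assumptions for illustration, not a description of how Sourcerer's Similarity Calculator is implemented.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def tanimoto_coefficient(a, b):
    """Shared 1-bits divided by positions where either vector has a 1-bit.

    Returns 0.0 when both vectors are all zeros (a convention chosen here).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Hypothetical usage vectors over five API elements.
usage_a = [1, 1, 0, 1, 0]
usage_b = [1, 0, 0, 1, 1]
print(hamming_distance(usage_a, usage_b))
print(tanimoto_coefficient(usage_a, usage_b))
```

A lower Hamming distance and a higher Tanimoto coefficient both indicate more similar usage; the Tanimoto coefficient ignores API elements that neither entity uses, whereas Hamming distance counts only the positions that differ.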