Dataset #2
Comments
@0bserver07 @guxd Could you share the raw method names/descriptions?
@matanpugach That's what I meant! I already downloaded the files from Gdrive, but they are preprocessed. Everyone using your dataset is limited to your features (tokens, API sequence, name tokens) and a vocabulary limit of 10,000. One cannot restore the original "RawCode" to "Documentation" mapping from your dataset in order to, for example, try new features.
I have raw Python data. How can I preprocess it the same way you did for Java? I would like to build code search for Python code.
@gauravkoradiya You should use a Python code parser. Python provides an
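For anyone landing here with the same question, here is a minimal sketch of parser-based preprocessing for Python sources. It assumes the standard-library `ast` module (Python 3.8+) as the parser, which is one option and not necessarily what was suggested above or what the authors used for Java. The function name `extract_functions`, the helper `split_name`, and the record keys are hypothetical, chosen to mirror the features mentioned in this thread (name tokens, API sequence, description) while also keeping the raw code.

```python
import ast
import re

# Hypothetical sample input, used only for illustration.
SAMPLE = '''
def read_lines(path):
    """Read all lines from a file."""
    with open(path) as handle:
        return handle.readlines()
'''


def split_name(name):
    """Split a snake_case or camelCase identifier into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return [part for part in spaced.lower().split("_") if part]


def extract_functions(source):
    """Return one record per function definition found in `source`."""
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Collect the names of called functions and methods as a rough
        # stand-in for an "API sequence". ast.walk is breadth-first, so
        # the order is not guaranteed to match source order.
        calls = []
        for sub in ast.walk(node):
            if isinstance(sub, ast.Call):
                if isinstance(sub.func, ast.Name):
                    calls.append(sub.func.id)
                elif isinstance(sub.func, ast.Attribute):
                    calls.append(sub.func.attr)
        records.append({
            "name": node.name,
            "name_tokens": split_name(node.name),
            "docstring": ast.get_docstring(node) or "",
            "calls": calls,
            # Raw source text, so the code/documentation pair stays recoverable.
            "code": ast.get_source_segment(source, node),
        })
    return records


if __name__ == "__main__":
    for record in extract_functions(SAMPLE):
        print(record["name"], record["name_tokens"], record["calls"])
```

Keeping the raw `code` next to the `docstring` means new features can be derived later without going back to the original files.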
Awesome, thank you. I got it.
Could you share the original datasets without any pickled or preprocessed files?
It's a pity the authors do not release the original dataset.
The raw code datasets are available at
Hi guys, awesome project. Would you mind releasing the original training and testing dataset, without any pickled or preprocessed files?