How can I access commit ID's for the ReVeal dataset? #13

Open
bstee615 opened this issue Jul 21, 2021 · 2 comments

Comments

@bstee615

Hi, how can I access the commit IDs for the code in the ReVeal dataset (https://drive.google.com/drive/folders/1KuIYgFcvWUXheDhT--cBALsfy1I4utOy)? The field "hash" is present, but it doesn't look like a commit ID, so I can't figure out how to track down the original commit that the code was taken from. Any help is appreciated. Thanks!

Example from ReVeal, where the commit ID seems to be missing:

{
  "code": "...",
  "hash": -8228664527580018723,
  "project": "debian",
  "size": 26
}

Example from Devign, where commit_id can be used to check out the repository version for the code:

{
  "project": "FFmpeg",
  "commit_id": "973b1a6b9070e2bf17d17568cbaf4043ce931f51",
  "target": 0,
  "func": "..."
}
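
To make the difference between the two records concrete, here is a minimal sketch that looks for a value shaped like a full Git SHA-1 commit ID (40 hexadecimal characters). The helper name `find_commit_id` is hypothetical and not part of either dataset's tooling, and the check only tests the shape of a value, not whether the commit actually exists in a repository.

```python
import re
from typing import Optional

# A full Git SHA-1 object name is 40 hexadecimal characters.
SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def find_commit_id(record: dict) -> Optional[str]:
    """Return the first string value shaped like a full SHA-1, else None.

    Hypothetical helper for illustration only; it checks the format of each
    value and knows nothing about the real dataset schemas.
    """
    for value in record.values():
        if isinstance(value, str) and SHA1_HEX.fullmatch(value):
            return value
    return None


# The two records quoted above, with the elided fields left as "...".
reveal_record = {
    "code": "...",
    "hash": -8228664527580018723,
    "project": "debian",
    "size": 26,
}
devign_record = {
    "project": "FFmpeg",
    "commit_id": "973b1a6b9070e2bf17d17568cbaf4043ce931f51",
    "target": 0,
    "func": "...",
}

print(find_commit_id(devign_record))  # the Devign commit_id string
print(find_commit_id(reveal_record))  # None: no field has the SHA-1 shape
```

With the Devign value, something like `git checkout 973b1a6b9070e2bf17d17568cbaf4043ce931f51` inside a clone of the project would restore that version; the ReVeal record offers nothing equivalent.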

Splines commented Aug 18, 2021

Same question here... I can't even figure out what kind of hash this is. It is a signed integer made up only of decimal digits, with no letters in it. I don't know of a hashing algorithm that produces output like that.
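
One thing that can be checked directly is the size of the value. The sketch below (the helper name `fits_signed_64bit` is made up for illustration) shows that the ReVeal "hash" fits in a signed 64-bit integer, whereas a Git SHA-1 such as the Devign commit_id is a 160-bit number. That range would be consistent with, for example, Python's built-in `hash()` on a 64-bit build, but this is only a guess: nothing in the dataset or the notebook confirms how the value was produced.

```python
# Bounds of a signed 64-bit integer.
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def fits_signed_64bit(value: int) -> bool:
    """Hypothetical helper: True if value is representable as a signed 64-bit int."""
    return INT64_MIN <= value <= INT64_MAX


# "hash" value from the ReVeal example quoted in this issue.
reveal_hash = -8228664527580018723

# commit_id from the Devign example, read as a hexadecimal number.
devign_commit = int("973b1a6b9070e2bf17d17568cbaf4043ce931f51", 16)

print(fits_signed_64bit(reveal_hash))    # True
print(fits_signed_64bit(devign_commit))  # False
print(devign_commit.bit_length())        # 160
```

If the value did come from Python's `hash()` applied to a string, it could not be mapped back to a commit anyway, since string hashes are randomized per process unless `PYTHONHASHSEED` is fixed.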

As their process_data Jupyter notebook shows, the hash is already present in a file called debian_data.csv.

But I don't know where this data comes from either, and there is already a separate GitHub issue about that.


asejfia commented Nov 2, 2021

+1 for the question. I'd appreciate any help.
