-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data preprocessing code. #1
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a .gitignore
entry for this
Edit: Oh there already is one let's stop tracking it then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol. I noticed after the first push. It is added on .gitignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a rename of question.json
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is an artifact. No longer needed. I should have removed both question.json
and naep_meta.json
. Just need to run preprocess_data.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll review the change. I push some more code yesterday night. We need to keep question.json since I added more attributes for each question which can help me merge some variables. I will write a document to clarify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kikumaru818 Oh, i see the context_var
. I'll apply that to the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not only "context_var", I have more. Maybe modify it after we meet.
I think keeping the question.json is ok. Definitely we if had a code version that can produce question.json is also great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think having question.json
by typing through our hands is a good idea. I made a function add_additional_info
that takes care of additional infos you want to add to the json file. After discussing with you, I'll make the version that creates the json format you want. FYI, i just renamed it into meta_info.json
and placed under the data
directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we should have a code version to generate it. I just mean it's ok to keep question.json file. Don't need to generate it every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah. I'll add the code that loads the json file if it exists.
LGTM! |
@kikumaru818 @wmcnicho @wmcnicho @alexscarlatos
Hi everyone, made a preprocessing code on top of mengxue's implementation.
Just let me know if something is wrong.
Thanks.
mkdir data
/ and put the all_items_train.txt in it.python3 preprocess_data.py