Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data preprocessing code. #1

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Data preprocessing code. #1

wants to merge 3 commits into from

Conversation

jwlee-jake
Copy link

@jwlee-jake jwlee-jake commented Apr 10, 2023

@kikumaru818 @wmcnicho @wmcnicho @alexscarlatos
Hi everyone, made a preprocessing code on top of mengxue's implementation.
Just let me know if something is wrong.
Thanks.

  1. mkdir data/ and put the all_items_train.txt in it.
  2. python3 preprocess_data.py

Copy link
Contributor

@wmcnicho wmcnicho Apr 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a .gitignore entry for this

Edit: Oh there already is one let's stop tracking it then

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol. I noticed after the first push. It is added on .gitignore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rename of question.json right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is an artifact. No longer needed. I should have removed both question.json and naep_meta.json. Just need to run preprocess_data.py

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll review the change. I push some more code yesterday night. We need to keep question.json since I added more attributes for each question which can help me merge some variables. I will write a document to clarify it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kikumaru818 Oh, i see the context_var. I'll apply that to the code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only "context_var", I have more. Maybe modify it after we meet.
I think keeping the question.json is ok. Definitely we if had a code version that can produce question.json is also great.

Copy link
Author

@jwlee-jake jwlee-jake Apr 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think having question.json by typing through our hands is a good idea. I made a function add_additional_info that takes care of additional infos you want to add to the json file. After discussing with you, I'll make the version that creates the json format you want. FYI, i just renamed it into meta_info.json and placed under the data directory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we should have a code version to generate it. I just mean it's ok to keep question.json file. Don't need to generate it every time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah. I'll add the code that loads the json file if it exists.

@wmcnicho
Copy link
Contributor

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants