Question about testing on new data #30
Thank you. Please share it with me! Really appreciate it!
On Wed, Jul 31, 2019 at 10:18 AM Xuanyu Zhou wrote:
1. The speed is slow on non-cached Wikipedia titles, especially on CPUs, because it runs multiple ELMo inferences to generate a title's representation. I could provide a huge SQLite file (~72GB) that contains all the Wikipedia titles; do you want me to share it? With that file, you could use this function <https://github.com/CogComp/zoe/blob/master/zoe_utils.py#L39> instead of load_cached_embeddings. Furthermore, it is recommended to cache your test set as well, i.e. store which candidates are found for each instance so that you can tune your type inference at a low cost. To do this, I would suggest storing the results in a map and pickling that map.
2. Everything should work fine if you have your type mapping (inference) part working. The previous point only speeds things up, without any impact on the results.
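The test-set caching suggested in point 1 could look roughly like the sketch below. It is only an illustration, not ZOE's own code: `find_candidates` is a hypothetical stand-in for whatever slow step produces the candidates for one instance, and the file name `candidates.pickle` is an assumption.

```python
import os
import pickle


def load_cache(path):
    """Return the pickled map if it exists, otherwise an empty dict."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}


def save_cache(cache, path):
    """Write the whole map to disk so later runs can reuse it."""
    with open(path, "wb") as f:
        pickle.dump(cache, f)


def get_candidates(key, cache, find_candidates):
    """Return candidates for one instance, computing them only on a cache miss.

    `find_candidates` is a hypothetical callable standing in for the slow,
    ELMo-based candidate search; `key` must be hashable (e.g. a sentence
    string or a (sentence, mention) tuple).
    """
    if key not in cache:
        cache[key] = find_candidates(key)
    return cache[key]
```

A run would call `load_cache("candidates.pickle")` once at start, go through `get_candidates` for every test instance, and call `save_cache` at the end. Later runs that only change the type inference then skip the expensive candidate search entirely.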
Updated the file "elmo_cache_correct.db" in the Google Drive https://drive.google.com/drive/u/1/folders/1fD6WfCEPQICGPhxqlwuVmf8uOot-jQq8?ths=true. Sorry for the delay, it's a huge file to upload. To use it, please refer to the function pointer above, and set
Thank you. Downloading it now, will bother you more if there are any further problems!
Hi, I'm trying to run ZOE on a new dataset and the following questions came up:
1. In main.py, should I comment out runner.elmo_processor.load_cached_embeddings("target.min.embedding.pickle", "wikilinks.min.embedding.pickle")? If yes, could you show me how these two files are generated and what the format of their raw versions is? Currently, running on new data is extremely slow (30 sentences processed after one night). Any idea how I can speed things up?
2. Are there any other files/data I need to generate for testing on a new dataset (maybe vocab_test.txt)?
Thank you!