How to add more fulltext sources #62

Closed
talentoscope opened this issue Sep 20, 2016 · 7 comments

Comments

@talentoscope

This software is simply astounding.

I'd love to know, as would many others, exactly how to add multiple sources of information, ideally by adding more documents to Solr for indexing, such as other wikis, Project Gutenberg texts, etc. I assume these would all be processed by the fulltext search using Solr, whether or not there is a DBpedia clue?

Please help.

@pasky
Member

pasky commented Sep 21, 2016

Documenting this is the subject of #17, but it is not possible out of the box, only in principle. (The architecture allows it, but there is no explicit code support for querying multiple Solr indices. As a starting point, you could make sure the IDs are unique across sources and index everything in a single Solr collection.)
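
A minimal sketch of that starting point, in Python: prefix each document ID with the name of its source so that IDs from different corpora cannot collide, then merge everything into one list that could be sent to a single Solr collection. The field names (`id`, `source`, `title`, `text`) and the source names are assumptions for illustration only; they are not taken from YodaQA's actual Solr schema, which would need to be checked before indexing.

```python
import json


def make_solr_docs(source, records):
    """Build Solr-style dicts from (local_id, title, text) tuples.

    The id is prefixed with the source name so that two sources may
    reuse the same local id without clashing. Field names are assumed.
    """
    docs = []
    for local_id, title, text in records:
        docs.append({
            "id": f"{source}:{local_id}",
            "source": source,
            "title": title,
            "text": text,
        })
    return docs


def merge_sources(sources):
    """Combine documents from several sources, failing on any duplicate id."""
    seen = set()
    merged = []
    for source, records in sources.items():
        for doc in make_solr_docs(source, records):
            if doc["id"] in seen:
                raise ValueError(f"duplicate id: {doc['id']}")
            seen.add(doc["id"])
            merged.append(doc)
    return merged


def to_update_payload(docs):
    """Serialize the documents as a JSON array of documents."""
    return json.dumps(docs, ensure_ascii=False)
```

The resulting JSON array is the shape accepted by Solr's JSON update handler, so it could be posted to the collection's `/update` endpoint. Whether YodaQA's fulltext search would then pick up the new documents without code changes is not something this sketch establishes.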

@pasky pasky closed this as completed Sep 21, 2016
@talentoscope
Author

Thanks for the reply. I will look at doing that, and maybe play with the code to add multiple instances of Solr (or multiple collections).

@pasky
Member

pasky commented Sep 21, 2016 via email

@talentoscope
Author

Will definitely be contributing code back if I come up with anything; I'm just getting up to speed with the code. Not used to Java.

On Wed, 21 Sep 2016, 20:34 Petr Baudis wrote:

Any contributions to the code or just to the documentation will be welcome!



@talentoscope
Author

Having looked at the code, I really don't think I'm going to be much use there. Instead, I will create a good dataset of questions sourced from many places, and curate it with the question, answer, LAT type and anything else you feel is necessary to help train the system.

@pasky
Member

pasky commented Sep 22, 2016 via email

@talentoscope
Author

Made a start on this last night; up to about 250 questions so far. There are a few inference-based ones in there too, but it shouldn't be too hard to find the answers using the current system. I'm also benchmarking against base YodaQA, recording the position of the correct answer and its confidence. Hopefully that extra info will help with diagnosis, and it is easily removable. :)
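
For anyone wanting to follow along, one possible layout for such a dataset is sketched below in Python. It is only an assumed format, not YodaQA's own dataset format: the record type `QuestionRecord`, the field names, and the choice of JSON Lines are all hypothetical. The two benchmark fields (position of the correct answer and confidence from a baseline run) are optional, and an export flag drops them, which is one way to keep that extra info easily removable.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Fields that come from a benchmark run rather than from curation (assumed names).
BENCHMARK_FIELDS = ("baseline_rank", "baseline_confidence")


@dataclass
class QuestionRecord:
    """One curated question; every field name here is hypothetical."""
    question: str
    answer: str
    lat: str                                   # lexical answer type, e.g. "city"
    source: str                                # where the question came from
    baseline_rank: Optional[int] = None        # position of the correct answer
    baseline_confidence: Optional[float] = None


def to_jsonl(records, include_benchmark=True):
    """Serialize records as JSON Lines, optionally dropping benchmark fields."""
    lines = []
    for rec in records:
        row = asdict(rec)
        if not include_benchmark:
            for name in BENCHMARK_FIELDS:
                row.pop(name, None)
        lines.append(json.dumps(row, ensure_ascii=False))
    return "\n".join(lines)
```

Calling `to_jsonl(records, include_benchmark=False)` gives a clean training file, while the default keeps the diagnostic columns alongside each question.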

2 participants