Query language? #38

Open
isoboroff opened this issue Mar 25, 2022 · 10 comments

Comments

isoboroff commented Mar 25, 2022

How does Patapsco parse queries? In particular, when you send a query to the web service, is it parsed as a Lucene query or as something else?

The context is that I'm thinking about ways to handle queries on a combined traditional and simplified Chinese corpus.

Are the web service's retrieval parameters controlled by the "queries" and "retrieve" sections of the config file?

cash (Member) commented Mar 25, 2022

@isoboroff when you run the web service like so:

patapsco-web --run path/to/run --port 9090

it reads the configuration file saved in the run directory, takes the query language from the topics section, and takes the retrieval parameters from the retrieve section.
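A minimal sketch of the behavior described above, assuming the saved run config has already been parsed into a Python dict with the same shape as the YAML fragment quoted later in this thread. The helper `web_service_settings` is hypothetical and is not part of Patapsco's API.

```python
from typing import Any


def web_service_settings(config: dict[str, Any]) -> dict[str, Any]:
    """Pick out the parts of a run config that drive the web service.

    Hypothetical helper for illustration; Patapsco's real code may differ.
    """
    # The query language comes from the topics section of the saved config.
    lang = config["topics"]["input"]["lang"]
    # Retrieval parameters are taken from the retrieve section as-is.
    retrieve = dict(config["retrieve"])
    return {"lang": lang, "retrieve": retrieve}


# A parsed config mirroring the minimal YAML fragment quoted in this thread.
run_config = {
    "topics": {"input": {"lang": "fas"}},
    "retrieve": {"name": "bm25", "number": 10},
}

settings = web_service_settings(run_config)
print(settings)  # {'lang': 'fas', 'retrieve': {'name': 'bm25', 'number': 10}}
```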

I think you're asking for the ability to override parts of the config on the command line. Is that right?

isoboroff (Author) commented

My main question is how the queries are parsed. The answer seems to be that they are parsed the same way as in batch mode. I think that's just word tokens with no operators or anything, right?
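To illustrate what "word tokens with no operators" means in practice, here is a sketch of plain bag-of-words tokenization. The `bag_of_words` helper is hypothetical and much cruder than Patapsco's language-specific tokenizers; it only shows how operator syntax is lost when a query is treated as a list of terms.

```python
import re


def bag_of_words(query: str) -> list[str]:
    """Lowercase and split on non-word characters, with no operator handling.

    Hypothetical illustration only; this is not Patapsco's tokenizer.
    """
    return [tok for tok in re.split(r"\W+", query.lower()) if tok]


# "AND" becomes an ordinary term; the "-" and the quotes are simply dropped.
print(bag_of_words('moscow AND -kremlin "red square"'))
# ['moscow', 'and', 'kremlin', 'red', 'square']
```

Splitting on non-word characters also does nothing useful for Chinese, which is written without spaces between words; that is one reason language-specific tokenization matters for collections like the ones mentioned here.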

I'm adapting my collection search tool, which currently uses Elasticsearch, to use the Patapsco web service, on the hypothesis that Patapsco is better at tokenizing the languages I'm working with (Russian, Farsi, Chinese). Elasticsearch offers a lot of web service functionality, such as highlighting, faceting, and pagination, that is nice to have when building an interactive search tool, and it also makes it easy to use Lucene query syntax, which supports some common operators.

isoboroff (Author) commented Mar 28, 2022

When I add just the minimum configuration:

topics:
  input:
    lang: fas
retrieve:
  name: bm25
  number: 10

I get this error:

patapsco.error.ConfigError: 3 validation errors in configuration
  topics.input.format - missing field
  topics.input.source - missing field
  topics.input.path - missing field

These fields, of course, don't make sense for interactive queries. Does this mean the query endpoint expects a JSON object, as in a batch query?

(Edited: removed a bad stand-in config. I was missing a basic "queries" section.)
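The error suggests a validator that checks `topics.input` for required keys before the service starts. Here is a minimal sketch of that kind of check, assuming the required fields are exactly the three named in the error message. `REQUIRED_INPUT_FIELDS` and `missing_input_fields` are hypothetical names, and this is not Patapsco's actual validation code.

```python
from typing import Any

# Hypothetical: the required keys are taken from the error message above.
REQUIRED_INPUT_FIELDS = ("format", "source", "path")


def missing_input_fields(config: dict[str, Any]) -> list[str]:
    """Return one 'topics.input.<field> - missing field' message per absent key."""
    topic_input = config.get("topics", {}).get("input", {})
    return [
        f"topics.input.{name} - missing field"
        for name in REQUIRED_INPUT_FIELDS
        if name not in topic_input
    ]


minimal = {"topics": {"input": {"lang": "fas"}}, "retrieve": {"name": "bm25", "number": 10}}
errors = missing_input_fields(minimal)
print(f"{len(errors)} validation errors in configuration")
for line in errors:
    print("  " + line)
```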

dlawrie (Collaborator) commented Mar 28, 2022 via email

isoboroff (Author) commented Mar 28, 2022

Frankly, I'm trying to run the web service and send some queries from the command line so I can understand the request and response formats.

Your JS doesn't show the query format, and you appear to use a custom URL, which may mean you have a per-language proxy layer or your own web service app.

isoboroff (Author) commented

I see in patapsco/topic.py that there seem to be hooks for Lucene query processing; I'll start poking through that.

cash (Member) commented Mar 28, 2022

@isoboroff Yes, query/topic processing in the web service is controlled by the configuration file used to create the index. Most people use term-based queries or PSQ. I added support for Lucene syntax, but it has to be enabled in the configuration and cannot be combined with PSQ. The only documentation I have on this is here: https://github.com/hltcoe/patapsco/blob/master/docs/config.md#lucene-classic-query-parsing
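For comparison with plain term queries, here is a sketch of how Lucene's classic syntax treats the `+` (required) and `-` (prohibited) prefixes, with bare terms treated as optional under the default OR operator. It covers only that one corner of the syntax. `ParsedQuery` and `parse_prefix_operators` are hypothetical names, and this is not how Patapsco's Lucene support (configured as described in the linked docs) is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    must: list[str] = field(default_factory=list)      # "+term": document must match
    must_not: list[str] = field(default_factory=list)  # "-term": document must not match
    should: list[str] = field(default_factory=list)    # bare term: optional, affects score


def parse_prefix_operators(query: str) -> ParsedQuery:
    """Classify whitespace-separated terms by Lucene-style +/- prefixes.

    Hypothetical sketch: phrases, fields, AND/OR/NOT keywords, wildcards,
    and grouping from the full classic syntax are deliberately ignored.
    """
    parsed = ParsedQuery()
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            parsed.must.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            parsed.must_not.append(token[1:])
        else:
            parsed.should.append(token)
    return parsed


print(parse_prefix_operators("+moscow -kremlin metro"))
# ParsedQuery(must=['moscow'], must_not=['kremlin'], should=['metro'])
```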

lizekui commented Sep 16, 2022

Hi @dlawrie, your JS code looks elegant and concise. Could you share your JS project for beginners like me? Thanks!

#38 (comment)

dlawrie (Collaborator) commented Oct 11, 2022 via email

dlawrie (Collaborator) commented Oct 11, 2022 via email
