Backend based indexing #193

Open
berndmoos opened this issue Mar 1, 2024 · 1 comment

Comments

@berndmoos
Member

Currently, the indexer does not use the backend interface but is handed the ISO/TEI files going into the index directly. I don't see why this has to be so, although I know that AGD indexing is based not on the transcripts linked to the backend but on a transformed / enriched version of them.
Running the indexer via the backend would be more consistent and transparent (for other users, for documentation). It will certainly do no harm to have an additional indexer which uses methods from BackendInterface to iterate over transcripts. The requirement that ISO/TEI transcripts have to be pre-processed before being handed to the indexer could be handled via an abstract method:

public abstract Transcript preProcess(Transcript transcriptFromCorpus);
There may be performance issues, but they will be much less pronounced for anything smaller than FOLK or ZW (i.e. for almost all corpora). For the challenging cases, the current indexer would still be there (but maybe in the specific application, not in the "general API"?).
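
To make the idea concrete, here is a minimal sketch of what such an indexer could look like. Only the preProcess signature is taken from the proposal above; everything else is an assumption for illustration. In particular, TranscriptSource is a hypothetical, reduced stand-in for BackendInterface (its method names are not claimed to match the real interface), Transcript is a stub, and the "index" is just an in-memory map where a real implementation would write to the search index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stub; the real Transcript type has a much richer API.
final class Transcript {
    final String id;
    final String xml;

    Transcript(String id, String xml) {
        this.id = id;
        this.xml = xml;
    }
}

// Hypothetical, reduced view of the backend: only what the indexer needs.
// The real BackendInterface may name and shape these methods differently.
interface TranscriptSource {
    List<String> getTranscriptIDs(String corpusID);

    Transcript getTranscript(String transcriptID);
}

// Iterates over transcripts via the backend and indexes the pre-processed result.
abstract class BackendBasedIndexer {
    private final TranscriptSource backend;
    // Placeholder for the real index: transcript ID -> indexed content.
    private final Map<String, String> index = new LinkedHashMap<>();

    BackendBasedIndexer(TranscriptSource backend) {
        this.backend = backend;
    }

    // Hook from the proposal: transform / enrich a transcript before indexing.
    public abstract Transcript preProcess(Transcript transcriptFromCorpus);

    // Indexes every transcript of the corpus and returns how many were indexed.
    public int indexCorpus(String corpusID) {
        int count = 0;
        for (String transcriptID : backend.getTranscriptIDs(corpusID)) {
            Transcript processed = preProcess(backend.getTranscript(transcriptID));
            index.put(processed.id, processed.xml);
            count++;
        }
        return count;
    }

    Map<String, String> indexedDocuments() {
        return Collections.unmodifiableMap(index);
    }
}

// Tiny in-memory backend holding a single corpus, for trying the sketch out.
final class InMemorySource implements TranscriptSource {
    private final String corpusID;
    private final Map<String, Transcript> transcripts = new LinkedHashMap<>();

    InMemorySource(String corpusID, List<Transcript> items) {
        this.corpusID = corpusID;
        for (Transcript t : items) {
            transcripts.put(t.id, t);
        }
    }

    @Override
    public List<String> getTranscriptIDs(String requestedCorpusID) {
        // Unknown corpora simply yield no transcripts.
        return corpusID.equals(requestedCorpusID)
                ? new ArrayList<>(transcripts.keySet())
                : new ArrayList<>();
    }

    @Override
    public Transcript getTranscript(String transcriptID) {
        return transcripts.get(transcriptID);
    }
}
```

A corpus that needs no enrichment would implement preProcess by returning its argument unchanged; an AGD-style indexer would do its transformation there. Since all transcript access goes through the backend in this design, the per-transcript fetch is where the performance cost mentioned above would show up.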

Wondering what @EleFri thinks :-)

@EleFri
Collaborator

EleFri commented Jul 9, 2024

You are right. We need a simple backend-based indexer with a preProcess method that also adds metadata from the backend.

@EleFri EleFri assigned EleFri and berndmoos and unassigned berndmoos and EleFri Jul 9, 2024