Backend based indexing #193

berndmoos · 2024-03-01T14:34:49Z

Currently, the indexer does not use the backend interface; instead, it is handed the ISO/TEI files that go into the index directly. I don't see why this has to be so, although I know that AGD indexing is based not on the transcripts linked to the backend but on a transformed / enriched version of them.
Running the indexer via the backend would be more consistent and transparent (for other users, for documentation). It will certainly do no harm to have an additional indexer which uses methods from BackendInterface to iterate over transcripts. The requirement that ISO/TEI transcripts be pre-processed before being handed to the indexer could be handled via an abstract method:

public abstract Transcript preProcess(Transcript transcriptFromCorpus);
There may be performance issues, but they will be much less pronounced for anything smaller than FOLK or ZW (i.e. for almost all corpora). For the challenging cases, the current indexer would still be there (but maybe in the specific application, not in the "general API"?).
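
To make the idea concrete, here is a rough sketch of what such an indexer could look like. Only `preProcess` is taken from the proposal above. Everything else is a hypothetical stand-in: `Backend` (with `getTranscriptIDs` and `getTranscript`) is a stripped-down placeholder and not the real BackendInterface signature, `Transcript` is reduced to two getters, and `SimpleTranscript`, `addToIndex` and `indexCorpus` are names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal stand-in for the real Transcript type.
interface Transcript {
    String getID();
    String getText();
}

// Hypothetical stand-in for BackendInterface; the real method names
// and signatures may differ.
interface Backend {
    List<String> getTranscriptIDs(String corpusID);
    Transcript getTranscript(String transcriptID);
}

// Trivial Transcript implementation, used only for illustration.
class SimpleTranscript implements Transcript {
    private final String id;
    private final String text;

    SimpleTranscript(String id, String text) {
        this.id = id;
        this.text = text;
    }

    @Override
    public String getID() { return id; }

    @Override
    public String getText() { return text; }
}

// Iterates over the transcripts of a corpus via the backend and hands
// each one to the index after pre-processing.
abstract class BackendBasedIndexer {
    private final Backend backend;

    BackendBasedIndexer(Backend backend) {
        this.backend = backend;
    }

    // Hook from the proposal: transform / enrich a transcript before indexing.
    public abstract Transcript preProcess(Transcript transcriptFromCorpus);

    // Hypothetical hook: writes one pre-processed transcript to the index.
    protected abstract void addToIndex(Transcript transcript);

    // Indexes all transcripts of one corpus; returns how many were indexed.
    public int indexCorpus(String corpusID) {
        int count = 0;
        for (String transcriptID : backend.getTranscriptIDs(corpusID)) {
            Transcript fromCorpus = backend.getTranscript(transcriptID);
            addToIndex(preProcess(fromCorpus));
            count++;
        }
        return count;
    }
}
```

An application like AGD would then subclass this and put its transformation / enrichment step into `preProcess`, while a corpus that needs no pre-processing could simply return the transcript unchanged.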

Wondering what @EleFri thinks :-)

EleFri · 2024-07-09T07:32:26Z

You are right. We need a simple backend-based indexer with a preProcess method that also adds metadata from the backend.

berndmoos added Base architecture ZuRecht labels Mar 1, 2024

berndmoos self-assigned this Mar 1, 2024

EleFri assigned EleFri and berndmoos and unassigned berndmoos and EleFri Jul 9, 2024
