Adds a few more details to "Syncing Data into Typesense" #167

Open: wants to merge 2 commits into master

alexander-zierhut

Change Summary

  • Adds another data syncing strategy (Query Parsing)
  • Adds some advice about state recovery

PR Checklist

@@ -46,10 +55,10 @@ Read more about how to deploy Airbyte, and set it up [here](https://airbytehq.gi
## Sync real-time changes

In addition to the above, if you have a use case where you want to update some records in real time, maybe because you want a user's edit to a record to be immediately reflected in the search results (and not after, say, 10s or whatever your sync interval is in the above process),
- you can also use the <RouterLink :to="`/${$site.themeConfig.typesenseLatestVersion}/api/documents.html#index-a-single-document`">Single Document Indexing API</RouterLink>.
+ you can also use the <RouterLink :to="`/${$site.themeConfig.typesenseLatestVersion}/api/documents.html#index-a-single-document`">Single Document Indexing API</RouterLink> each time a record change event happens. You may want to buffer these events in a queue for situations where real-time synchronization cannot be achieved, e.g. due to server load.
Member review comment:

The additions in this chunk feel a little redundant when taking the next paragraph into account.
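The buffering idea in the added line above can be sketched as a change-event handler that first tries the single-document endpoint and, if that call fails, puts the event on a queue for later bulk import. This is a minimal sketch under stated assumptions: `handle_change_event` and `index_single` are hypothetical names, `index_single` stands in for whatever code you use to call the Single Document Indexing API, and documents are assumed to be plain dicts.

```python
import queue
from typing import Any, Callable, Dict

# Assumption: a document is a plain dict with an "id" field.
Document = Dict[str, Any]


def handle_change_event(
    doc: Document,
    index_single: Callable[[Document], None],
    backlog: "queue.Queue[Document]",
) -> bool:
    """Index one changed record immediately, or buffer it if that fails.

    `index_single` is a hypothetical wrapper around the Single Document
    Indexing API. Returns True if the document was indexed right away and
    False if it was put on the backlog queue instead.
    """
    try:
        index_single(doc)
        return True
    except Exception:
        # Real-time indexing failed (for example because the server is
        # overloaded); keep the event so a background consumer can pick it
        # up later and send it through the bulk import endpoint.
        backlog.put(doc)
        return False
```

In a real integration you would probably catch only your client's timeout and HTTP errors rather than every exception, so that genuine bugs are not silently queued.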


Note, however, that the bulk import endpoint is much more performant and uses less CPU than the single document indexing endpoint for the same number of documents.
- So you want to use the bulk import endpoint as much as possible, even if that means reducing your sync interval for the process above to as low as, say, 2s.
+ So you want to use the bulk import endpoint as much as possible, even if that means reducing your sync interval for the process above to as low as, say, 2s. When using the aforementioned buffering strategy, your consumer can simply wait for at most 2s to gather events before importing them.
Member review comment:

The additions in this chunk feel a little redundant when taking the preceding paragraph into account.
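The consumer described in the added sentence above (wait at most 2s to gather buffered events, then send them in one import) could look roughly like this. It is a sketch, not Typesense client code: `drain_batch`, `consume_once` and `bulk_import` are hypothetical names, `bulk_import` stands in for your own call to the bulk import endpoint, and the `max_batch` cap of 1000 is an illustrative assumption.

```python
import queue
import time
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


def drain_batch(
    backlog: "queue.Queue[Document]",
    max_wait: float = 2.0,
    max_batch: int = 1000,
) -> List[Document]:
    """Gather queued events until `max_wait` seconds pass or `max_batch` is reached."""
    batch: List[Document] = []
    deadline = time.monotonic() + max_wait
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Block for at most the time left before the deadline.
            batch.append(backlog.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived before the deadline
    return batch


def consume_once(
    backlog: "queue.Queue[Document]",
    bulk_import: Callable[[List[Document]], None],
    max_wait: float = 2.0,
    max_batch: int = 1000,
) -> int:
    """Drain one batch and send it in a single bulk import call.

    `bulk_import` is a hypothetical wrapper around the bulk import endpoint.
    Returns the number of documents imported (0 if the queue stayed empty).
    """
    batch = drain_batch(backlog, max_wait, max_batch)
    if batch:
        bulk_import(batch)
    return len(batch)
```

A long-running consumer would call `consume_once` in a loop. A production version would also need to retry or re-queue a batch whose import fails, since this sketch drops it.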

Instead, you want to do client-side batching: control the number of documents in each import API call, and potentially make multiple API calls in parallel.
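Client-side batching with parallel import calls can be sketched with a thread pool, which suits this I/O-bound work. The names `chunked`, `import_in_parallel` and `bulk_import` are hypothetical; `bulk_import` stands in for your own call to the bulk import endpoint, and the defaults of 1000 documents per batch and 4 concurrent calls are illustrative assumptions, not tuned or recommended values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterator, List

Document = Dict[str, Any]


def chunked(docs: List[Document], size: int) -> Iterator[List[Document]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def import_in_parallel(
    docs: List[Document],
    bulk_import: Callable[[List[Document]], Any],
    batch_size: int = 1000,
    parallelism: int = 4,
) -> List[Any]:
    """Split `docs` into batches and run several bulk imports concurrently.

    `bulk_import` is a hypothetical wrapper around the bulk import endpoint;
    `batch_size` and `parallelism` are illustrative defaults, not tuned values.
    Results are returned in batch order.
    """
    if batch_size < 1 or parallelism < 1:
        raise ValueError("batch_size and parallelism must be at least 1")
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map preserves input order, even though calls run concurrently.
        return list(pool.map(bulk_import, chunked(docs, batch_size)))
```

Keeping `parallelism` small bounds how many import requests hit the server at once, which is the point of batching on the client rather than sending everything in one call.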

### Routines for restoring state
Member review comment:

This section feels like it's stating something that users might already have: a way to backfill data. It makes the article a bit verbose.

Labels
None yet
Projects
None yet

2 participants