Allow for delay in Writer discovering new columns #377

Merged: 2 commits, May 9, 2024

istreeter
Contributor

After the loader alters the table to add new columns, it immediately opens a new Writer and expects the Writer to be aware of the new columns. However, we have found that the Writer might get opened with no awareness of the newly added columns, presumably because of the asynchronous nature of BigQuery's architecture.

This fix retries opening the Writer until it eventually gets opened with awareness of the new columns.
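
As a rough illustration of the retry idea described above, here is a minimal Python sketch. It is not the loader's actual code: `open_writer`, `visible_columns`, `ColumnsNotVisibleError`, and the attempt and delay defaults are all hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, Iterable, Set


class ColumnsNotVisibleError(Exception):
    """Raised when no opened writer ever reports the expected columns."""


def open_writer_with_retry(
    open_writer: Callable[[], object],                    # hypothetical: opens a fresh writer
    visible_columns: Callable[[object], Iterable[str]],   # hypothetical: columns the writer knows
    expected: Set[str],
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Reopen the writer until it is aware of every expected column."""
    missing: Set[str] = set(expected)
    for attempt in range(1, max_attempts + 1):
        writer = open_writer()
        missing = expected - set(visible_columns(writer))
        if not missing:
            return writer
        # The writer was opened against a stale view of the table: discard it.
        close = getattr(writer, "close", None)
        if callable(close):
            close()
        if attempt < max_attempts:
            sleep(delay_seconds)
    raise ColumnsNotVisibleError(
        f"columns still not visible after {max_attempts} attempts: {sorted(missing)}"
    )
```

The sketch bounds the number of attempts so that a column which never becomes visible surfaces as an error instead of looping forever; whether the real fix bounds its retries is not stated here.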

When a schema evolves, e.g. when a new nested field is added to an entity, the loader should explicitly try to alter the underlying BigQuery schema. This works correctly for self-describing events (unstruct columns), but it turns out that for contexts the schema is not modified: the alter is skipped, which results in bad data. This commit should fix the problem.
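
To make the failure mode concrete, the following Python sketch shows one way a loader could decide whether an alter is needed when a nested field appears. It is an assumption-laden illustration, not the commit's implementation: schemas are modelled as plain nested dictionaries, and `missing_fields` and the column name `contexts_com_acme_user_1` are hypothetical.

```python
from typing import Dict, List, Union

# Hypothetical model: a field maps either to a type name or to a nested record.
Schema = Dict[str, Union[str, "Schema"]]


def missing_fields(required: Schema, existing: Schema, prefix: str = "") -> List[str]:
    """Return dotted paths of fields present in `required` but absent from `existing`."""
    missing: List[str] = []
    for name, spec in required.items():
        path = f"{prefix}{name}"
        if name not in existing:
            # Whole column (or nested field) is new.
            missing.append(path)
        elif isinstance(spec, dict) and isinstance(existing[name], dict):
            # Column exists: descend so that new nested fields are not overlooked.
            missing.extend(missing_fields(spec, existing[name], prefix=path + "."))
    return missing


existing_table = {"contexts_com_acme_user_1": {"id": "STRING"}}
required_by_batch = {"contexts_com_acme_user_1": {"id": "STRING", "email": "STRING"}}

# A non-empty result means the table must be altered before loading.
needs_alter = missing_fields(required_by_batch, existing_table)
```

A check that compared only top-level column names would report nothing to alter in this example, which matches the symptom described above: the column already exists, so the new nested field is skipped.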
pondzix merged commit b193fcd into handle_too_many_columns_v2 on May 9, 2024
3 checks passed