🐛 Mongo to BigQuery: Long (Int64) is serialized as float-like (scientific) number #9590
Comments
Triage: Airbyte did recognize the Mongo schema as number (see screenshot). FYI, the field with large values is the … Subsequently, I hypothesize the issue might be in the serialization to JSON (source), i.e., an issue with the library. Or, upon deserialization in the destination connector, the value is assumed to be float-like because it is large.
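For reference, the float-like form matches Java's default `Double.toString` output once a long has been widened to a double somewhere along the way. A minimal plain-Java sketch (not Airbyte code) reproducing the symptom:

```java
// Once a long is widened to double, Java's default string representation
// switches to scientific notation for values >= 1e7.
public class LongToDoubleDemo {
    public static void main(String[] args) {
        long id = 3241201784L;        // the example value from this issue
        System.out.println(id);       // prints: 3241201784
        double widened = id;          // lossless here, but the type changed
        System.out.println(widened);  // prints: 3.241201784E9  <- float-like output
    }
}
```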
Hey, does it make sense to change it to a string and handle whatever format you need via custom dbt? Does this sound good?
If I can instruct Airbyte to treat it as a string, that would be great! Is that possible?
I noticed a PR was merged which should fix this issue (#14362). After my report, two other issues (#12606 and #12057) were opened and supposedly closed by that PR. I just tested this again, and unfortunately the data is still synced as floats instead of integers. I created a new connection to ensure schema discovery ran again. Can this issue be investigated again? Meanwhile we are running: …
Ah, that was my bad: I didn't notice MongoDB in the list of sources. Unfortunately, Mongo isn't a JDBC source (it's not even an RDBMS), so it wasn't solved by #14362. I did a little digging; updating this mapping is a good starting point. Under the hood, what's happening is that …
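To make the idea concrete, here is a hypothetical sketch of the kind of mapping change meant above, using the BSON and Jackson APIs the Java connectors are built on. The class and method names are illustrative only, not Airbyte's actual code:

```java
import org.bson.BsonValue;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;

// Hypothetical sketch: branch on the BSON type so INT64 stays an exact long
// instead of falling through to a generic double conversion.
public class BsonNumberMapping {
    private static final JsonNodeFactory NODES = JsonNodeFactory.instance;

    static JsonNode toJsonNumber(BsonValue value) {
        switch (value.getBsonType()) {
            case INT32:
                return NODES.numberNode(value.asInt32().getValue());
            case INT64:
                // Exact: emitted as 3241201784, not 3.241201784E9.
                return NODES.numberNode(value.asInt64().getValue());
            case DOUBLE:
                return NODES.numberNode(value.asDouble().getValue());
            default:
                return NODES.textNode(value.toString());
        }
    }
}
```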
Normalization and custom dbt are deprecated features and will soon be removed from the project. For that reason I'm closing this issue, as it won't be implemented.
Environment
Current Behavior
In our Mongo source, we have large numeric data (i.e., of type long). For instance, the value:
3241201784
(> 2.1B). However, somewhere during the sync (not sure which component), the value is serialized as a float-like type to 3.241201784E9. This is a problem because the value is actually a numeric ID, so I need the exact value, not a float-like approximation.
For the records I checked, the value as-is preserves the same precision (i.e., the same number of significant digits is present), so theoretically I can convert the values back to long/int64. However, I'm not confident that this is always the case and/or will remain so when we roll over to >10B records.
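As an aside, doubles represent every integer up to 2^53 (about 9.0e15) exactly, so values in the low billions do survive the round trip. A small plain-Java check (my own sketch, not part of any connector) for recovering the exact long:

```java
// Parses a float-like string and converts it back to long, rejecting any
// value that cannot be represented exactly.
public class ExactLongRecovery {
    static long toExactLong(String floatLike) {
        double d = Double.parseDouble(floatLike);  // "3.241201784E9" -> 3.241201784E9
        long l = (long) d;
        if ((double) l != d) {
            throw new IllegalArgumentException("lossy value: " + floatLike);
        }
        return l;
    }

    public static void main(String[] args) {
        System.out.println(toExactLong("3.241201784E9"));  // prints: 3241201784
    }
}
```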
Expected Behavior
Instead, I'd expect one of the following behaviors: …
Steps to Reproduce
Are you willing to submit a PR?
Sure, but I could use some guidance on how to debug the full flow to see where the serialization goes wrong.