documentation for different record formats #632

Open
saikumare-a opened this issue Jul 5, 2023 · 1 comment
Labels
question Further information is requested

Comments

@saikumare-a

Question

Could you explain the differences between the following record formats, and in which scenarios each one should be used?

  1. .option("record_format", "D")
  2. .option("record_format", "D2")
@saikumare-a saikumare-a added the question Further information is requested label Jul 5, 2023
@yruslan
Collaborator

yruslan commented Jul 7, 2023

.option("record_format", "D") should be used for most ASCII use cases with the latest Cobrix.

Basically, 'D' uses the Hadoop ASCII file splitter, which is faster. Until recently, though, Cobrix didn't support ASCII charsets other than UTF-8 with that splitter, so we implemented our own and called it 'D2'. Now 'D' supports other ASCII charsets as well, so the two record formats are almost identical. Please use 'D' with the latest Cobrix.
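
For illustration, here is a minimal sketch of how the choice might look from PySpark. It assumes the Cobrix `spark-cobol` package is on the classpath and that the data is a text file with one record per line. The helper names `cobrix_text_options` and `read_text_records` are hypothetical, and the `copybook` and `ascii_charset` option names are assumptions to check against the README of the Cobrix version in use.

```python
def cobrix_text_options(copybook_path, charset=None, use_d2=False):
    """Build reader options for an ASCII text file with one record per line.

    Hypothetical helper: 'D' is the default, 'D2' is kept only as a
    fallback for older Cobrix versions.
    """
    options = {
        "copybook": copybook_path,
        "record_format": "D2" if use_d2 else "D",
    }
    if charset is not None:
        # Assumed option name for a non-UTF-8 charset; verify for your version.
        options["ascii_charset"] = charset
    return options


def read_text_records(spark, data_path, copybook_path, charset=None):
    """Load the file through the "cobol" data source using record format 'D'."""
    opts = cobrix_text_options(copybook_path, charset)
    return spark.read.format("cobol").options(**opts).load(data_path)
```

With a recent Cobrix, `read_text_records(spark, "/data/records.txt", "/data/schema.cpy", charset="ISO-8859-1")` would be the usual call (both paths are placeholders). Passing `use_d2=True` to `cobrix_text_options` should only be needed on older versions where 'D' could not handle other charsets.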
