[Java] Document how to convert JDBC Adapter result into a Parquet file #316
Changes from 2 commits
52a7034
7d448c0
b81312a
95edea4
991f40b
@@ -579,3 +579,95 @@ Reading and writing dictionary-encoded data requires separately tracking the dic

    Dictionary-encoded data recovered: [0, 3, 4, 5, 7]
    Dictionary recovered: Dictionary DictionaryEncoding[id=666,ordered=false,indexType=Int(8, true)] [Andorra, Cuba, Grecia, Guinea, Islandia, Malta, Tailandia, Uganda, Yemen, Zambia]
    Decoded data: [Andorra, Guinea, Islandia, Malta, Uganda]

Customize Logic to Read Dataset
===============================

If you need to implement a custom dataset reader, consider extending the `ArrowReader`_ class.

The ``ArrowReader`` class can be extended as follows:

1. Implement the schema-reading logic in ``readSchema()``.
2. If you do not want to define schema-reading logic, override ``getVectorSchemaRoot()`` instead.
3. Once (1) or (2) is done, implement ``loadNextBatch()``.
4. Finally, implement ``closeReadSource()`` to close the underlying read source.

For example, let's create a custom ``JDBCReader``.

.. code-block:: java

    import java.io.IOException;

    import org.apache.arrow.adapter.jdbc.ArrowVectorIterator;
    import org.apache.arrow.adapter.jdbc.JdbcToArrowConfig;
    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.vector.VectorSchemaRoot;
    import org.apache.arrow.vector.ipc.ArrowReader;
    import org.apache.arrow.vector.types.pojo.Schema;

    class JDBCReader extends ArrowReader {
        private final ArrowVectorIterator iter;
        private final JdbcToArrowConfig config;
        private VectorSchemaRoot root;
        private boolean firstRoot = true;

        public JDBCReader(BufferAllocator allocator, ArrowVectorIterator iter, JdbcToArrowConfig config) {
            super(allocator);
            this.iter = iter;
            this.config = config;
        }

        @Override
        public boolean loadNextBatch() throws IOException {
            if (firstRoot) {
                // The first batch is expected to have been loaded by getVectorSchemaRoot().
                firstRoot = false;
                return true;
            } else {
                if (iter.hasNext()) {
                    // Guard against a null root so neither branch can throw a NullPointerException.
                    if (root != null) {
                        if (!config.isReuseVectorSchemaRoot()) {
                            root.close();
                        } else {
                            root.allocateNew();
                        }
                    }
                    root = iter.next();
                    return root.getRowCount() != 0;
                } else {
                    return false;
                }
            }
        }

        @Override
        public long bytesRead() {
            // Byte counts are not tracked by this reader.
            return 0;
        }

        @Override
        protected void closeReadSource() throws IOException {
            if (root != null && !config.isReuseVectorSchemaRoot()) {
                root.close();
            }
        }

        @Override
        protected Schema readSchema() throws IOException {
            // Not used: getVectorSchemaRoot() is overridden instead.
            return null;
        }

        @Override
        public VectorSchemaRoot getVectorSchemaRoot() throws IOException {
            if (root == null) {
                root = iter.next();
            }
            return root;
        }
    }

.. _`ArrowReader`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/ipc/ArrowReader.html

Review comment on this section: Just maintain the steps needed to implement a data reader, and reference the JDBC page as an example.
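To make the reader lifecycle in the steps above easier to follow, here is a minimal, dependency-free sketch of how a consumer might drive such a reader. It deliberately does not use Arrow: `SimpleBatchReader`, `ReaderLifecycleDemo`, `getRoot()`, and `close()` are hypothetical stand-ins for an `ArrowReader` subclass, `getVectorSchemaRoot()`, and `closeReadSource()`, and a "batch" is just a list of integers. The point is only to show why a `firstRoot`-style flag is needed when the first batch is loaded eagerly by the root accessor.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical consumer: fetches the root once, then loops on loadNextBatch(). */
class ReaderLifecycleDemo {
  static List<List<Integer>> drain(SimpleBatchReader reader) {
    List<List<Integer>> seen = new ArrayList<>();
    reader.getRoot(); // eagerly loads the first batch, like getVectorSchemaRoot() in the diff
    while (reader.loadNextBatch()) {
      // Copy the current batch, since the reader replaces it on the next call.
      seen.add(new ArrayList<>(reader.getRoot()));
    }
    reader.close();
    return seen;
  }

  public static void main(String[] args) {
    List<List<Integer>> batches = List.of(List.of(1, 2), List.of(3), List.of(4, 5, 6));
    SimpleBatchReader reader = new SimpleBatchReader(batches.iterator());
    System.out.println(drain(reader)); // prints [[1, 2], [3], [4, 5, 6]]
  }
}

/** Hypothetical stand-in for an ArrowReader subclass; not part of the Arrow API. */
class SimpleBatchReader {
  private final Iterator<List<Integer>> iter;
  private List<Integer> root;
  private boolean firstRoot = true;
  boolean closed = false;

  SimpleBatchReader(Iterator<List<Integer>> iter) {
    this.iter = iter;
  }

  /** Returns the current batch, loading the first one on demand. */
  List<Integer> getRoot() {
    if (root == null && iter.hasNext()) {
      root = iter.next();
    }
    return root;
  }

  /** The first call reports the batch that getRoot() already loaded; later calls advance. */
  boolean loadNextBatch() {
    if (firstRoot) {
      firstRoot = false;
      return getRoot() != null;
    }
    if (!iter.hasNext()) {
      return false;
    }
    root = iter.next();
    return !root.isEmpty();
  }

  /** Stands in for closeReadSource(): release whatever the reader holds. */
  void close() {
    closed = true;
  }
}
```

Without the `firstRoot` flag, the first call to `loadNextBatch()` would advance past the batch that the root accessor already loaded, and the consumer would silently skip it.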
Can we move this to io.rst? That's where "Read parquet" is.

Currently, io.rst redirects to dataset.rst for reading Parquet. What about adding "Write Parquet" to io.rst so that it also redirects to dataset.rst?

Hmm, I think it's actually better to put "Write Parquet" examples in io.rst. The dataset.rst examples are primarily for querying (reading) data.

Changed