
[Java] Document how to convert JDBC Adapter result into a Parquet file #316

Open · wants to merge 5 commits into main

Conversation

davisusanibar (Contributor)

In this example, we take the JDBC adapter result and write it into a Parquet file.

Current workaround:

  • Allocator creation is intentionally kept outside the try-with-resources block while we review how to create/release the allocator properly.
  • If the allocator is created inside the try-with-resources block, errors appear such as: Closed with outstanding buffers allocated / RefCnt has gone negative.
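
A minimal sketch of this flow, with the allocator kept outside try-with-resources as described above (the class name, the H2 connection string, the query, and the file:///tmp/... target are illustrative placeholders, and JDBCReader is the adapter class discussed later in this thread):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

import org.apache.arrow.adapter.jdbc.JdbcToArrow;
import org.apache.arrow.adapter.jdbc.JdbcToArrowConfig;
import org.apache.arrow.adapter.jdbc.JdbcToArrowConfigBuilder;
import org.apache.arrow.adapter.jdbc.JdbcToArrowUtils;
import org.apache.arrow.dataset.file.DatasetFileWriter;
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;

public class WriteJdbcResultToParquet {
    public static void main(String[] args) throws Exception {
        // Workaround: the allocator lives OUTSIDE try-with-resources. Closing it
        // while the native dataset writer still holds buffers is what produces
        // "RefCnt has gone negative" / "closed with outstanding buffers allocated".
        final BufferAllocator allocator = new RootAllocator();
        try (Connection connection = DriverManager.getConnection("jdbc:h2:mem:demo");
             ResultSet resultSet = connection.createStatement()
                 .executeQuery("SELECT * FROM demo_table")) {
            JdbcToArrowConfig config = new JdbcToArrowConfigBuilder(
                    allocator, JdbcToArrowUtils.getUtcCalendar())
                .setTargetBatchSize(2)
                .setReuseVectorSchemaRoot(true)
                .build();
            // JDBCReader adapts the JDBC ArrowVectorIterator to the ArrowReader
            // interface that DatasetFileWriter.write(...) expects; it closes the
            // iterator when the reader itself is closed.
            try (JDBCReader reader = new JDBCReader(allocator,
                     JdbcToArrow.sqlToArrowVectorIterator(resultSet, config))) {
                DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET,
                    "file:///tmp/jdbc-to-parquet");
            }
        }
        // allocator.close() is deliberately not called here; see the workaround above.
    }
}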

@davisusanibar (Contributor, Author)

To close #315

@danepitkin (Member) left a comment

Thanks for adding more examples!

===================

Go to :doc:`JDBC Adapter - Write ResultSet to Parquet File <jdbc>` for an example.
Member

Ideally we would have a Parquet example here that doesn't include things like JDBC. Do you think it would be best to add it as part of this PR?
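
For reference, a JDBC-free "write Parquet" example could be as small as this sketch (in-memory data is round-tripped through an IPC stream just to obtain the ArrowReader the dataset writer needs; the class name and output URI are illustrative):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.util.Arrays;

import org.apache.arrow.dataset.file.DatasetFileWriter;
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class MinimalWriteParquet {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema(Arrays.asList(
            Field.nullable("id", new ArrowType.Int(32, true))));
        try (BufferAllocator allocator = new RootAllocator();
             VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
            // Populate a single three-row batch.
            IntVector id = (IntVector) root.getVector("id");
            id.allocateNew(3);
            id.set(0, 1); id.set(1, 2); id.set(2, 3);
            id.setValueCount(3);
            root.setRowCount(3);

            // Serialize to an in-memory IPC stream so we can hand
            // DatasetFileWriter the ArrowReader it expects.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (ArrowStreamWriter writer =
                     new ArrowStreamWriter(root, null, Channels.newChannel(out))) {
                writer.start();
                writer.writeBatch();
                writer.end();
            }
            try (ArrowStreamReader reader = new ArrowStreamReader(
                     new ByteArrayInputStream(out.toByteArray()), allocator)) {
                DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET,
                    "file:///tmp/minimal-parquet");
            }
        }
    }
}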

protected Schema readSchema() {
    return schema;
  }
}
Member

Could we add whitespace to the code below so it's organized into sections? I think it will be easier to read.

Comment on lines 311 to 315
Write ResultSet to Parquet File
===============================

In this example, we take the JDBC adapter result and write it
into a Parquet file.
Member

Hmm, was this specific example requested? I think it would be better to include a minimal read/write parquet example in dataset.rst and remove this one. The jdbc.rst already has an example for converting ResultSet to VectorSchemaRoot. What do you think?

@danepitkin (Member)

Ah I see the related issue now. I think it would be best if we had "read/write parquet" examples in dataset.rst and then added a very minimal example of why/how to extend the ArrowReader class for JDBC. What do you think?

@davisusanibar (Contributor, Author)

Ah I see the related issue now. I think it would be best if we had "read/write parquet" examples in dataset.rst and then added a very minimal example of why/how to extend the ArrowReader class for JDBC. What do you think?

That makes sense; let me split it up that way.

@davisusanibar (Contributor, Author)

Hi @lidavidm, do you have any recommendations on where I could search/review to track down this issue?

Did you see this error when you were working with DatasetFileWriter.write, or any related errors?

Current error messages:

07:52:55.995 [main] INFO org.apache.arrow.memory.BaseAllocator - Debug mode enabled.
07:52:55.999 [main] INFO org.apache.arrow.memory.DefaultAllocationManagerOption - allocation manager type not specified, using netty as the default type
07:52:56.001 [main] INFO org.apache.arrow.memory.CheckAllocator - Using DefaultAllocationManager at memory-netty/13.0.0-SNAPSHOT/arrow-memory-netty-13.0.0-SNAPSHOT.jar!/org/apache/arrow/memory/DefaultAllocationManagerFactory.class
07:52:56.020 [main] DEBUG io.netty.util.internal.logging.InternalLoggerFactory - Using SLF4J as the default logging framework
07:52:56.020 [main] DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetection.level: simple
07:52:56.020 [main] DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetection.targetRecords: 4
07:52:56.039 [main] DEBUG io.netty.util.internal.PlatformDependent0 - -Dio.netty.noUnsafe: false
07:52:56.039 [main] DEBUG io.netty.util.internal.PlatformDependent0 - Java version: 11
07:52:56.041 [main] DEBUG io.netty.util.internal.PlatformDependent0 - sun.misc.Unsafe.theUnsafe: available
07:52:56.041 [main] DEBUG io.netty.util.internal.PlatformDependent0 - sun.misc.Unsafe.copyMemory: available
07:52:56.042 [main] DEBUG io.netty.util.internal.PlatformDependent0 - sun.misc.Unsafe.storeFence: available
07:52:56.042 [main] DEBUG io.netty.util.internal.PlatformDependent0 - java.nio.Buffer.address: available
07:52:56.042 [main] DEBUG io.netty.util.internal.PlatformDependent0 - direct buffer constructor: unavailable: Reflective setAccessible(true) disabled
07:52:56.043 [main] DEBUG io.netty.util.internal.PlatformDependent0 - java.nio.Bits.unaligned: available, true
07:52:56.043 [main] DEBUG io.netty.util.internal.PlatformDependent0 - jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable: class io.netty.util.internal.PlatformDependent0$7 cannot access class jdk.internal.misc.Unsafe (in module java.base) because module java.base does not export jdk.internal.misc to unnamed module @d4342c2
07:52:56.044 [main] DEBUG io.netty.util.internal.PlatformDependent0 - java.nio.DirectByteBuffer.<init>(long, {int,long}): unavailable
07:52:56.044 [main] DEBUG io.netty.util.internal.PlatformDependent - sun.misc.Unsafe: available
07:52:56.060 [main] DEBUG io.netty.util.internal.PlatformDependent - maxDirectMemory: 8589934592 bytes (maybe)
07:52:56.060 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.tmpdir: /var/folders/d6/cz55k4qj52b40dmdvfjc_stm0000gn/T (java.io.tmpdir)
07:52:56.060 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.bitMode: 64 (sun.arch.data.model)
07:52:56.061 [main] DEBUG io.netty.util.internal.PlatformDependent - Platform: MacOS
07:52:56.063 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.maxDirectMemory: -1 bytes
07:52:56.063 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.uninitializedArrayAllocationThreshold: -1
07:52:56.063 [main] DEBUG io.netty.util.internal.CleanerJava9 - java.nio.ByteBuffer.cleaner(): available
07:52:56.063 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.noPreferDirect: false
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numHeapArenas: 32
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numDirectArenas: 32
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.pageSize: 8192
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxOrder: 9
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.chunkSize: 4194304
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.smallCacheSize: 256
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.normalCacheSize: 64
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxCachedBufferCapacity: 32768
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.cacheTrimInterval: 8192
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.cacheTrimIntervalMillis: 0
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.useCacheForAllThreads: false
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxCachedByteBuffersPerChunk: 1023
07:52:56.069 [main] DEBUG io.netty.util.internal.InternalThreadLocalMap - -Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
07:52:56.069 [main] DEBUG io.netty.util.internal.InternalThreadLocalMap - -Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
07:52:56.086 [main] DEBUG io.netty.buffer.AbstractByteBuf - -Dio.netty.buffer.checkAccessible: true
07:52:56.086 [main] DEBUG io.netty.buffer.AbstractByteBuf - -Dio.netty.buffer.checkBounds: true
07:52:56.087 [main] DEBUG io.netty.util.ResourceLeakDetectorFactory - Loaded default ResourceLeakDetector: io.netty.util.ResourceLeakDetector@72057ecf
07:52:56.105 [main] DEBUG org.apache.arrow.memory.rounding.DefaultRoundingPolicy - -Dorg.apache.memory.allocator.pageSize: 8192
07:52:56.105 [main] DEBUG org.apache.arrow.memory.rounding.DefaultRoundingPolicy - -Dorg.apache.memory.allocator.maxOrder: 11
07:52:56.625 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxCapacityPerThread: 4096
07:52:56.625 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.ratio: 8
07:52:56.625 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.chunkSize: 32
07:52:56.625 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.blocking: false
07:52:56.625 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.batchFastThreadLocalOnly: true
07:52:56.630 [main] DEBUG io.netty.util.internal.PlatformDependent - org.jctools-core.MpscChunkedArrayQueue: available
07:52:56.637 [main] DEBUG org.apache.arrow.memory.util.MemoryUtil - Constructor for direct buffer found and made accessible
07:52:56.637 [main] DEBUG org.apache.arrow.memory.util.MemoryUtil - direct buffer constructor: available
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.arrow.memory.util.MemoryUtil (file:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-memory-core/13.0.0-SNAPSHOT/arrow-memory-core-13.0.0-SNAPSHOT.jar) to field java.nio.Buffer.address
WARNING: Please consider reporting this to the maintainers of org.apache.arrow.memory.util.MemoryUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 1
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 8, length: 8
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 16, length: 1
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 24, length: 1
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 32, length: 1
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 40, length: 16
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 56, length: 1
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 64, length: 12
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 80, length: 32
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 112, length: 1
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 120, length: 12
07:52:57.642 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 136, length: 1
07:52:57.642 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 144, length: 20
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 8, length: 8
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 16, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 24, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 32, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 40, length: 16
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 56, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 64, length: 12
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 80, length: 32
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 112, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 120, length: 12
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 136, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 144, length: 20
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 8, length: 4
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 16, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 24, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 32, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 40, length: 8
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 48, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 56, length: 8
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 64, length: 16
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 80, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 88, length: 8
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 96, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 104, length: 4
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 8, length: 4
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 16, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 24, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 32, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 40, length: 8
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 48, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 56, length: 8
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 64, length: 16
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 80, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 88, length: 8
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 96, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 104, length: 4
Exception in thread "Thread-8" java.lang.IllegalStateException: RefCnt has gone negative
	at org.apache.arrow.util.Preconditions.checkState(Preconditions.java:458)
	at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:130)
	at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:104)
	at org.apache.arrow.vector.BaseValueVector.releaseBuffer(BaseValueVector.java:117)
	at org.apache.arrow.vector.BaseFixedWidthVector.clear(BaseFixedWidthVector.java:248)
	at org.apache.arrow.vector.BaseFixedWidthVector.close(BaseFixedWidthVector.java:238)
	at org.apache.arrow.util.AutoCloseables.close(AutoCloseables.java:97)
	at org.apache.arrow.vector.VectorSchemaRoot.close(VectorSchemaRoot.java:247)
	at org.apache.arrow.vector.ipc.ArrowReader.close(ArrowReader.java:143)
	at org.apache.arrow.vector.ipc.ArrowReader.close(ArrowReader.java:131)
	at org.apache.arrow.c.ArrayStreamExporter$ExportedArrayStreamPrivateData.close(ArrayStreamExporter.java:97)
	Suppressed: java.lang.IllegalStateException: RefCnt has gone negative
		... 11 more
	Suppressed: java.lang.IllegalStateException: RefCnt has gone negative
		at org.apache.arrow.util.Preconditions.checkState(Preconditions.java:458)
		at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:130)
		at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:104)
		at org.apache.arrow.vector.BaseValueVector.releaseBuffer(BaseValueVector.java:117)
		at org.apache.arrow.vector.complex.BaseRepeatedValueVector.clear(BaseRepeatedValueVector.java:247)
		at org.apache.arrow.vector.complex.ListVector.clear(ListVector.java:624)
		at org.apache.arrow.vector.BaseValueVector.close(BaseValueVector.java:77)
		... 5 more
Exception in thread "main" java.lang.IllegalStateException: RefCnt has gone negative
	at org.apache.arrow.util.Preconditions.checkState(Preconditions.java:458)
	at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:130)
	at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:104)
	at org.apache.arrow.vector.BaseValueVector.releaseBuffer(BaseValueVector.java:117)
	at org.apache.arrow.vector.BaseFixedWidthVector.clear(BaseFixedWidthVector.java:248)
	at org.apache.arrow.vector.BaseFixedWidthVector.close(BaseFixedWidthVector.java:238)
	at org.apache.arrow.util.AutoCloseables.close(AutoCloseables.java:97)
	at org.apache.arrow.vector.VectorSchemaRoot.close(VectorSchemaRoot.java:247)
	at org.apache.arrow.vector.ipc.ArrowReader.close(ArrowReader.java:143)
	at org.apache.arrow.vector.ipc.ArrowReader.close(ArrowReader.java:131)
	at dataset.domingo.WriteArrowObjectsToParquet.main(WriteArrowObjectsToParquet.java:70)
	Suppressed: java.lang.IllegalStateException: RefCnt has gone negative
		... 11 more
	Suppressed: java.lang.IllegalStateException: RefCnt has gone negative
		... 11 more
	Suppressed: java.lang.IllegalStateException: RefCnt has gone negative
		at org.apache.arrow.util.Preconditions.checkState(Preconditions.java:458)
		at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:130)
		at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:104)
		at org.apache.arrow.vector.BaseValueVector.releaseBuffer(BaseValueVector.java:117)
		at org.apache.arrow.vector.BaseVariableWidthVector.clear(BaseVariableWidthVector.java:270)
		at org.apache.arrow.vector.BaseVariableWidthVector.close(BaseVariableWidthVector.java:261)
		... 5 more
	Suppressed: java.lang.IllegalStateException: Allocator[allocatorParquetWrite] closed with outstanding buffers allocated (12).
Allocator(allocatorParquetWrite) 0/17746/51748/9223372036854775807 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 12
    ledger[101] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383362784444..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[155], address:140388662068096, capacity:128
    ledger[111] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383364210168..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[165], address:140388662068608, capacity:128
    ledger[98] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383362357658..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[152], address:140388661985520, capacity:2
    ledger[95] allocator: allocatorParquetWrite), isOwning: , size: , references: 0, life: 183383362099977..183383372675946, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[148], address:140388662035456, capacity:512
    ledger[103] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383363021510..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[157], address:140388662068352, capacity:128
    ledger[102] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383362877116..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[156], address:140388662068224, capacity:128
    ledger[100] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383362665191..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[154], address:140388662067968, capacity:128
    ledger[104] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383363121777..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[158], address:140388662068480, capacity:128
    ledger[105] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383363250885..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[159], address:140388661985536, capacity:8
    ledger[110] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383364066585..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[164], address:140388661985600, capacity:8
    ledger[99] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383362510994..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[153], address:140388662059136, capacity:64
    ledger[96] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383362167890..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[149], address:140388662140928, capacity:16384
  reservations: 0

		at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:445)
		at dataset.domingo.WriteArrowObjectsToParquet.main(WriteArrowObjectsToParquet.java:34)
	Suppressed: java.lang.IllegalStateException: Allocator[allocatorJDBC] closed with outstanding buffers allocated (8).
Allocator(allocatorJDBC) 0/49760/99536/9223372036854775807 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 8
    ledger[4] allocator: allocatorJDBC), isOwning: , size: , references: 2, life: 183382320693061..0, allocatorManager: [, life: ] holds 3 buffers. 
        ArrowBuf[13], address:140388662017544, capacity:504
        ArrowBuf[11], address:140388662001664, capacity:16384
        ArrowBuf[12], address:140388662001664, capacity:15880
    ledger[3] allocator: allocatorJDBC), isOwning: , size: , references: 2, life: 183382316692606..0, allocatorManager: [, life: ] holds 3 buffers. 
        ArrowBuf[8], address:140388661993472, capacity:32
        ArrowBuf[9], address:140388661993472, capacity:24
        ArrowBuf[10], address:140388661993496, capacity:8
    ledger[1] allocator: allocatorJDBC), isOwning: , size: , references: 2, life: 183382299608346..0, allocatorManager: [, life: ] holds 3 buffers. 
        ArrowBuf[2], address:140388661985280, capacity:16
        ArrowBuf[3], address:140388661985280, capacity:8
        ArrowBuf[4], address:140388661985288, capacity:8
    ledger[2] allocator: allocatorJDBC), isOwning: , size: , references: 2, life: 183382315070903..0, allocatorManager: [, life: ] holds 3 buffers. 
        ArrowBuf[7], address:140388661985304, capacity:8
        ArrowBuf[5], address:140388661985296, capacity:16
        ArrowBuf[6], address:140388661985296, capacity:8
    ledger[8] allocator: allocatorJDBC), isOwning: , size: , references: 2, life: 183382324122274..0, allocatorManager: [, life: ] holds 3 buffers. 
        ArrowBuf[19], address:140388662058504, capacity:504
        ArrowBuf[18], address:140388662042624, capacity:15880
        ArrowBuf[17], address:140388662042624, capacity:16384
    ledger[9] allocator: allocatorJDBC), isOwning: , size: , references: 1, life: 183382325184402..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[20], address:140388661993504, capacity:32
    ledger[6] allocator: allocatorJDBC), isOwning: , size: , references: 1, life: 183382323054940..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[15], address:140388662018048, capacity:16384
    ledger[7] allocator: allocatorJDBC), isOwning: , size: , references: 1, life: 183382323374059..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[16], address:140388662034432, capacity:512
  reservations: 0

		at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:445)
		at dataset.domingo.WriteArrowObjectsToParquet.main(WriteArrowObjectsToParquet.java:34)
	Suppressed: java.util.ConcurrentModificationException
		at java.base/java.util.IdentityHashMap$IdentityHashMapIterator.nextIndex(IdentityHashMap.java:737)
		at java.base/java.util.IdentityHashMap$KeyIterator.next(IdentityHashMap.java:828)
		at org.apache.arrow.memory.BaseAllocator.print(BaseAllocator.java:693)
		at org.apache.arrow.memory.BaseAllocator.print(BaseAllocator.java:689)
		at org.apache.arrow.memory.BaseAllocator.toString(BaseAllocator.java:501)
		at org.apache.arrow.memory.RootAllocator.toString(RootAllocator.java:29)
		at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:432)
		at org.apache.arrow.memory.RootAllocator.close(RootAllocator.java:29)
		at dataset.domingo.WriteArrowObjectsToParquet.main(WriteArrowObjectsToParquet.java:34)
07:52:57.697 [main] DEBUG org.apache.arrow.memory.BaseAllocator - closed allocator[allocatorReader].

@lidavidm (Member)

lidavidm commented Aug 2, 2023

Have you isolated the problem? Looked at a debugger? Enabled allocation tracing?
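
For example, allocation tracing can be turned on with the arrow-memory debug property, and outstanding buffers dumped before close; a small sketch using the standard arrow-memory facilities (the class under test is the one from the logs above):

// Run with the debug allocator so each buffer's allocation/release history is recorded:
//   java -Darrow.memory.debug.allocator=true ... dataset.domingo.WriteArrowObjectsToParquet
// Then, just before closing the allocator, dump the outstanding ledgers:
System.out.println(allocator.toVerboseString());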

@davisusanibar davisusanibar marked this pull request as ready for review August 11, 2023 19:34
@davisusanibar (Contributor, Author)

Hi @danepitkin, the changes were added as requested.

@danepitkin (Member)

Nice work! I left a couple more comments. Let me know what you think.

@davisusanibar (Contributor, Author)

Nice work! I left a couple more comments. Let me know what you think.

What are those comments?



Write Parquet Files
Member

Can we move this to io.rst? That's where "Read Parquet" is.

Contributor Author

Can we move this to io.rst? That's where "Read Parquet" is.

Currently, io.rst redirects to dataset.rst for reading Parquet.

What about adding "write Parquet" to io.rst and having it also redirect to dataset.rst?

Member

Hmm, I think it's actually better to put "Write Parquet" examples in io.rst. The dataset.rst examples are primarily for querying (reading) data.

Contributor Author

Changed

@@ -579,3 +579,95 @@ Reading and writing dictionary-encoded data requires separately tracking the dic
Dictionary-encoded data recovered: [0, 3, 4, 5, 7]
Dictionary recovered: Dictionary DictionaryEncoding[id=666,ordered=false,indexType=Int(8, true)] [Andorra, Cuba, Grecia, Guinea, Islandia, Malta, Tailandia, Uganda, Yemen, Zambia]
Decoded data: [Andorra, Guinea, Islandia, Malta, Uganda]

Customize Logic to Read Dataset
Member

Can we move this to jdbc.rst? I think it fits better there since it's directly applicable.

Contributor Author

I now keep just the steps needed to implement a data reader, and reference the JDBC page as the example.

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;

class JDBCReader extends ArrowReader {
Member

Could we somehow delete the duplicate code here and reuse the other one? Or combine the two?

Contributor Author

Only the JDBC page maintains this demo example now.

Member

Awesome!

@danepitkin (Member)

I forgot to hit "Submit Review" 😅 sorry!

@davisusanibar (Contributor, Author)

I would appreciate your help with a new code review, @danepitkin.

@danepitkin (Member) left a comment

Overall LGTM! Thank you @davisusanibar

import org.apache.arrow.vector.util.ByteArrayReadableSeekableByteChannel;

// read arrow demo data
Path uriRead = Paths.get("./thirdpartydeps/arrowfiles/random_access.arrow");
Member

Should we add a comment describing what's in this file? Looks like it's three row groups of 3 rows each based on the output.

Contributor Author

Added

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;

class JDBCReader extends ArrowReader {
Member

Awesome!

}
}

((Logger) LoggerFactory.getLogger("org.apache.arrow")).setLevel(Level.TRACE);
Member

Why are we fiddling with loggers and adding logback to the example? I don't think we need any of that?

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;

class JDBCReader extends ArrowReader {
Member

Explain that we need this because writing a dataset takes an ArrowReader, so we have to adapt the JDBC ArrowVectorIterator to the ArrowReader interface
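
A sketch of that adaptation, assuming setReuseVectorSchemaRoot(true) so the iterator refills a single VectorSchemaRoot (the field names and first-batch bookkeeping are illustrative, not the PR's exact code):

class JDBCReader extends ArrowReader {
    private final ArrowVectorIterator iterator;
    private VectorSchemaRoot root;           // owned by the iterator when reuse is enabled
    private boolean firstBatchConsumed = false;

    JDBCReader(BufferAllocator allocator, ArrowVectorIterator iterator) {
        super(allocator);
        this.iterator = iterator;
    }

    @Override
    public VectorSchemaRoot getVectorSchemaRoot() throws IOException {
        // Expose the iterator's root directly instead of letting ArrowReader
        // allocate a fresh one from readSchema().
        if (root == null) {
            root = iterator.next();          // materializes the first batch
        }
        return root;
    }

    @Override
    public boolean loadNextBatch() throws IOException {
        if (!firstBatchConsumed) {
            // The first batch was already pulled in by getVectorSchemaRoot().
            firstBatchConsumed = true;
            return getVectorSchemaRoot().getRowCount() > 0;
        }
        if (iterator.hasNext()) {
            iterator.next();                 // refills `root` in place (reuse enabled)
            return root.getRowCount() > 0;
        }
        return false;
    }

    @Override
    public long bytesRead() {
        return 0;                            // bytes read are not tracked for JDBC sources
    }

    @Override
    protected void closeReadSource() throws IOException {
        iterator.close();                    // releases the underlying JDBC resources
    }

    @Override
    protected Schema readSchema() throws IOException {
        return getVectorSchemaRoot().getSchema();
    }
}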

JdbcToArrowUtils.getUtcCalendar())
.setTargetBatchSize(2)
.setReuseVectorSchemaRoot(true)
.setArraySubTypeByColumnNameMap(
Member

In the interest of keeping examples concise, let's use sample data that doesn't require us to deal with all of this in the first place.

@pronzato commented Sep 28, 2023 via email

@davisusanibar (Contributor, Author)

Hi David, when I try to run JDBCReader I get "URI has empty scheme":

java.lang.RuntimeException: URI has empty scheme: '/tmp
	at org.apache.arrow.dataset.file.JniWrapper.writeFromScannerToFile(Native Method)
	at org.apache.arrow.dataset.file.DatasetFileWriter.write(DatasetFileWriter.java:46)
	at org.apache.arrow.dataset.file.DatasetFileWriter.write(DatasetFileWriter.java:59)

Any idea what could be causing this? Regards, GP

Hi @pronzato, this project also uses the JDBC reader: https://github.com/davisusanibar/java-python-by-cdata.git.

Could you please try it and confirm whether it also fails?
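
Also, as far as I can tell, the "URI has empty scheme" message means the destination was passed as a bare filesystem path, while DatasetFileWriter expects a full URI. A one-line sketch (the path is illustrative):

// A bare "/tmp/parquet-out" fails with "URI has empty scheme"; include the file:// scheme:
DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET, "file:///tmp/parquet-out");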
