Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Dataset Failed to update reservation while freeing bytes: JNIEnv was not attached to current thread #20379

Closed
asfimport opened this issue Aug 24, 2022 · 6 comments · Fixed by #40101

Comments

@asfimport
Copy link
Collaborator

Using Dataset NativeMemoryPool.getDefault() work very well with any size of data input as you see in:

 

import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
import org.apache.arrow.dataset.jni.DirectReservationListener;
import org.apache.arrow.dataset.jni.NativeMemoryPool;
import org.apache.arrow.dataset.scanner.ScanOptions;
import org.apache.arrow.dataset.scanner.ScanTask;
import org.apache.arrow.dataset.scanner.Scanner;
import org.apache.arrow.dataset.source.Dataset;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowReader;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ReadingMultipleParquetFiles {
    public static void main(String[] args) {
        //https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-01.parquet
        File file = new File("src/main/resources/parquetfiles/yellow_tripdata_2022-01.parquet");
        List<VectorSchemaRoot> schemaRoots = new ArrayList<>();
        try(BufferAllocator allocator = new RootAllocator();
            NativeMemoryPool aDefault = NativeMemoryPool.getDefault();
            FileSystemDatasetFactory fileSystemDatasetFactory = new FileSystemDatasetFactory(
                    allocator, aDefault,
                    FileFormat.PARQUET, file.toURI().toString());
            Dataset dataset = fileSystemDatasetFactory.finish();
            Scanner scanner = dataset.newScan(new ScanOptions(1000))
            ){
            System.out.println(DirectReservationListener.instance().getCurrentDirectMemReservation());
            for (ScanTask scanTask : scanner.scan()) {
                try(ArrowReader execute = scanTask.execute()){
                    while(execute.loadNextBatch()){
                        schemaRoots.add(execute.getVectorSchemaRoot());
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(schemaRoots.size());
    }
} 

 

 

In case we decided to use NativeMemoryPool.createListenable we are seeing this error message: /Users/runner/work/crossbow/crossbow/arrow/java/dataset/src/main/cpp/jni_util.cc:78: Failed to update reservation while freeing bytes: JNIEnv was not attached to current thread:

import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
import org.apache.arrow.dataset.jni.DirectReservationListener;
import org.apache.arrow.dataset.jni.NativeMemoryPool;
import org.apache.arrow.dataset.scanner.ScanOptions;
import org.apache.arrow.dataset.scanner.ScanTask;
import org.apache.arrow.dataset.scanner.Scanner;
import org.apache.arrow.dataset.source.Dataset;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowReader;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ReadingMultipleParquetFiles {
    public static void main(String[] args) {
        //https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-01.parquet
        File file = new File("src/main/resources/parquetfiles/yellow_tripdata_2022-01.parquet");
        List<VectorSchemaRoot> schemaRoots = new ArrayList<>();
        try(BufferAllocator allocator = new RootAllocator();
            NativeMemoryPool listenable = NativeMemoryPool.createListenable(
                    DirectReservationListener.instance());
            FileSystemDatasetFactory fileSystemDatasetFactory = new FileSystemDatasetFactory(
                    allocator, listenable,
                    FileFormat.PARQUET, file.toURI().toString());
            Dataset dataset = fileSystemDatasetFactory.finish();
            Scanner scanner = dataset.newScan(new ScanOptions(1000))
            ){
            System.out.println(DirectReservationListener.instance().getCurrentDirectMemReservation());
            for (ScanTask scanTask : scanner.scan()) {
                try(ArrowReader execute = scanTask.execute()){
                    while(execute.loadNextBatch()){
                        schemaRoots.add(execute.getVectorSchemaRoot());
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(schemaRoots.size());
    }
}
 

Log stack trace:

/Users/runner/work/crossbow/crossbow/arrow/java/dataset/src/main/cpp/jni_util.cc:78: Failed to update reservation while freeing bytes: JNIEnv was not attached to current thread
0   jnilib-1263766398115565476.tmp      0x000000013f46fc0c _ZN5arrow4util7CerrLogD2Ev + 204
1   jnilib-1263766398115565476.tmp      0x000000013f46fb2e _ZN5arrow4util7CerrLogD0Ev + 14
2   jnilib-1263766398115565476.tmp      0x000000013f464de2 _ZN5arrow4util8ArrowLogD1Ev + 34
3   jnilib-1263766398115565476.tmp      0x000000013e54d96d _ZN5arrow7dataset3jni31ReservationListenableMemoryPool4Impl4FreeEPhx + 237
4   jnilib-1263766398115565476.tmp      0x000000013f78c035 _ZN5arrow10PoolBufferD2Ev + 69
5   jnilib-1263766398115565476.tmp      0x000000013f78bd0e _ZN5arrow10PoolBufferD0Ev + 14
6   jnilib-1263766398115565476.tmp      0x000000013f70e1ce _ZN5arrow9ArrayDataD2Ev + 222
7   jnilib-1263766398115565476.tmp      0x000000013f5b8fde _ZN5arrow17SimpleRecordBatchD2Ev + 206
8   jnilib-1263766398115565476.tmp      0x000000013e5785b8 _ZNO5arrow6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE14ThenOnCompleteIZNS_23DefaultIfEmptyGeneratorIS4_EclEvEUt_NS5_17PassthruOnFailureIS9_EEEclERKNS_6ResultIS4_EE + 168
9   jnilib-1263766398115565476.tmp      0x000000013f46b486 _ZN5arrow18ConcreteFutureImpl21RunOrScheduleCallbackERKNSt3__110shared_ptrINS_10FutureImplEEEONS3_14CallbackRecordEb + 230
10  jnilib-1263766398115565476.tmp      0x000000013f46b2bd _ZN5arrow18ConcreteFutureImpl22DoMarkFinishedOrFailedENS_11FutureStateE + 189
11  jnilib-1263766398115565476.tmp      0x000000013e82dc82 _ZN5arrow6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE14DoMarkFinishedENS_6ResultIS4_EE + 290
12  jnilib-1263766398115565476.tmp      0x000000013e82d8e8 _ZN5arrow6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE12MarkFinishedENS_6ResultIS4_EE + 88
13  jnilib-1263766398115565476.tmp      0x000000013e82e8b0 _ZN5arrow8internal6FnOnceIFvRKNS_10FutureImplEEE6FnImplINS_6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE21WrapResultyOnComplete8CallbackINS_6detail16MarkNextFinishedISD_SD_Lb0ELb0EEEEEE6invokeES4_ + 160
14  jnilib-1263766398115565476.tmp      0x000000013f46b486 _ZN5arrow18ConcreteFutureImpl21RunOrScheduleCallbackERKNSt3__110shared_ptrINS_10FutureImplEEEONS3_14CallbackRecordEb + 230
15  jnilib-1263766398115565476.tmp      0x000000013f46b2bd _ZN5arrow18ConcreteFutureImpl22DoMarkFinishedOrFailedENS_11FutureStateE + 189
16  jnilib-1263766398115565476.tmp      0x000000013e82dc82 _ZN5arrow6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE14DoMarkFinishedENS_6ResultIS4_EE + 290
17  jnilib-1263766398115565476.tmp      0x000000013e82d8e8 _ZN5arrow6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE12MarkFinishedENS_6ResultIS4_EE + 88
18  jnilib-1263766398115565476.tmp      0x000000013e5c93fe _ZNK5arrow6detail14ContinueFutureclINS_24SerialReadaheadGeneratorINSt3__110shared_ptrINS_11RecordBatchEEEE11ErrCallbackEJRKNS_6StatusEENS_6ResultIS7_EENS_6FutureIS7_EEEENS4_9enable_ifIXaaaantsr3std7is_voidIT1_EE5valuentsr9is_futureISI_EE5valueoontsrT2_8is_emptysr3std7is_sameISI_SA_EE5valueEvE4typeESJ_OT_DpOT0_ + 110
19  jnilib-1263766398115565476.tmp      0x000000013e5c9335 _ZNO5arrow6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE14ThenOnCompleteINS_24SerialReadaheadGeneratorIS4_E8CallbackENS8_11ErrCallbackEEclERKNS_6ResultIS4_EE + 293
20  jnilib-1263766398115565476.tmp      0x000000013f46b486 _ZN5arrow18ConcreteFutureImpl21RunOrScheduleCallbackERKNSt3__110shared_ptrINS_10FutureImplEEEONS3_14CallbackRecordEb + 230
21  jnilib-1263766398115565476.tmp      0x000000013f46b2bd _ZN5arrow18ConcreteFutureImpl22DoMarkFinishedOrFailedENS_11FutureStateE + 189
22  jnilib-1263766398115565476.tmp      0x000000013e82dc82 _ZN5arrow6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE14DoMarkFinishedENS_6ResultIS4_EE + 290
23  jnilib-1263766398115565476.tmp      0x000000013e82d8e8 _ZN5arrow6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE12MarkFinishedENS_6ResultIS4_EE + 88
24  jnilib-1263766398115565476.tmp      0x000000013e5c7e46 _ZNK5arrow6detail14ContinueFutureclINS_6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE17PassthruOnFailureIZNS_7dataset16SlicingGeneratorclEvEUlRKS7_E_EEJRKNS_6StatusEENS_6ResultIS7_EES8_EENS4_9enable_ifIXaaaantsr3std7is_voidIT1_EE5valuentsr9is_futureISM_EE5valueoontsrT2_8is_emptysr3std7is_sameISM_SG_EE5valueEvE4typeESN_OT_DpOT0_ + 102
25  jnilib-1263766398115565476.tmp      0x000000013e5c7d9e _ZNO5arrow6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE14ThenOnCompleteIZNS_7dataset16SlicingGeneratorclEvEUlRKS4_E_NS5_17PassthruOnFailureISB_EEEclERKNS_6ResultIS4_EE + 222
26  jnilib-1263766398115565476.tmp      0x000000013f46b486 _ZN5arrow18ConcreteFutureImpl21RunOrScheduleCallbackERKNSt3__110shared_ptrINS_10FutureImplEEEONS3_14CallbackRecordEb + 230
27  jnilib-1263766398115565476.tmp      0x000000013f46b2bd _ZN5arrow18ConcreteFutureImpl22DoMarkFinishedOrFailedENS_11FutureStateE + 189
28  jnilib-1263766398115565476.tmp      0x000000013e82dc82 _ZN5arrow6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE14DoMarkFinishedENS_6ResultIS4_EE + 290
29  jnilib-1263766398115565476.tmp      0x000000013e82d8e8 _ZN5arrow6FutureINSt3__110shared_ptrINS_11RecordBatchEEEE12MarkFinishedENS_6ResultIS4_EE + 88
30  jnilib-1263766398115565476.tmp      0x000000013e62ad98 _ZN5arrow8internal6FnOnceIFvRKNS_10FutureImplEEE6FnImplINS_6FutureINS0_5EmptyEE21WrapStatusyOnComplete8CallbackIZNS_15MergedGeneratorINSt3__110shared_ptrINS_11RecordBatchEEEE5State14MarkFinalErrorERKNS_6StatusENS8_ISH_EEEUlSM_E_EEE6invokeES4_ + 56
31  jnilib-1263766398115565476.tmp      0x000000013f46b486 _ZN5arrow18ConcreteFutureImpl21RunOrScheduleCallbackERKNSt3__110shared_ptrINS_10FutureImplEEEONS3_14CallbackRecordEb + 230
32  jnilib-1263766398115565476.tmp      0x000000013f46b2bd _ZN5arrow18ConcreteFutureImpl22DoMarkFinishedOrFailedENS_11FutureStateE + 189
33  jnilib-1263766398115565476.tmp      0x000000013f4bb658 _ZN5arrow6FutureINS_8internal5EmptyEE14DoMarkFinishedENS_6ResultIS2_EE + 152
34  jnilib-1263766398115565476.tmp      0x000000013f4afbc1 _ZN5arrow6FutureINS_8internal5EmptyEE12MarkFinishedIS2_vEEvNS_6StatusE + 81
35  jnilib-1263766398115565476.tmp      0x000000013e626fda _ZN5arrow15MergedGeneratorINSt3__110shared_ptrINS_11RecordBatchEEEE5State20MarkFinishedAndPurgeEv + 58
36  jnilib-1263766398115565476.tmp      0x000000013e62b505 _ZN5arrow15MergedGeneratorINSt3__110shared_ptrINS_11RecordBatchEEEE13OuterCallbackclERKNS_6ResultINS1_8functionIFNS_6FutureIS4_EEvEEEEE + 1173
37  jnilib-1263766398115565476.tmp      0x000000013f46b486 _ZN5arrow18ConcreteFutureImpl21RunOrScheduleCallbackERKNSt3__110shared_ptrINS_10FutureImplEEEONS3_14CallbackRecordEb + 230
38  jnilib-1263766398115565476.tmp      0x000000013f46b2bd _ZN5arrow18ConcreteFutureImpl22DoMarkFinishedOrFailedENS_11FutureStateE + 189
39  jnilib-1263766398115565476.tmp      0x000000013e6220fa _ZN5arrow6FutureINSt3__18functionIFNS0_INS1_10shared_ptrINS_11RecordBatchEEEEEvEEEE14DoMarkFinishedENS_6ResultIS8_EE + 282
40  jnilib-1263766398115565476.tmp      0x000000013e621ef3 _ZN5arrow6FutureINSt3__18functionIFNS0_INS1_10shared_ptrINS_11RecordBatchEEEEEvEEEE12MarkFinishedENS_6ResultIS8_EE + 51
41  jnilib-1263766398115565476.tmp      0x000000013e78715b _ZN5arrow8internal6FnOnceIFvRKNS_10FutureImplEEE6FnImplINS_6FutureINSt3__18functionIFNS8_INS9_10shared_ptrINS_11RecordBatchEEEEEvEEEE21WrapResultyOnComplete8CallbackINS_6detail16MarkNextFinishedISH_SH_Lb0ELb0EEEEEE6invokeES4_ + 59
42  jnilib-1263766398115565476.tmp      0x000000013f46b486 _ZN5arrow18ConcreteFutureImpl21RunOrScheduleCallbackERKNSt3__110shared_ptrINS_10FutureImplEEEONS3_14CallbackRecordEb + 230
43  jnilib-1263766398115565476.tmp      0x000000013f454dcb _ZN5arrow18ConcreteFutureImpl11AddCallbackENS_8internal6FnOnceIFvRKNS_10FutureImplEEEENS_15CallbackOptionsE + 139
44  jnilib-1263766398115565476.tmp      0x000000013f454cfd _ZN5arrow10FutureImpl11AddCallbackENS_8internal6FnOnceIFvRKS0_EEENS_15CallbackOptionsE + 29
45  jnilib-1263766398115565476.tmp      0x000000013e786f78 _ZN5arrow8internal6FnOnceIFvvEE6FnImplINSt3__16__bindINS_6detail14ContinueFutureEJRNS_6FutureINS5_8functionIFNS9_INS5_10shared_ptrINS_11RecordBatchEEEEEvEEEEERFSH_PNS0_8ExecutorENSB_IN7parquet5arrow12_GLOBAL__N_114FileReaderImplEEEiRKNS5_6vectorIiNS5_9allocatorIiEEEEERSK_RSP_RKiSV_EEEE6invokeEv + 184
46  jnilib-1263766398115565476.tmp      0x000000013f453445 _ZNSt3__1L14__thread_proxyINS_5tupleIJNS_10unique_ptrINS_15__thread_structENS_14default_deleteIS3_EEEEZN5arrow8internal10ThreadPool21LaunchWorkersUnlockedEiE3$_3EEEEEPvSC_ + 693
47  libsystem_pthread.dylib             0x00007fff2072f8fc _pthread_start + 224
48  libsystem_pthread.dylib             0x00007fff2072b443 thread_start + 15Process finished with exit code 134 (interrupted by signal 6: SIGABRT) 

 

Reporter: David Dali Susanibar Arce / @davisusanibar

Note: This issue was originally created as ARROW-17508. Please see the migration documentation for further details.

@asfimport
Copy link
Collaborator Author

David Li / @lidavidm:
The SO question is just a general "Dataset scanning takes a lot of memory" and is not particularly specific to Java. We just don't expose the configuration knobs. @westonpace has been improving this and I think this will continue to improve.

For here: why do we need the NativeMemoryPool? IMO, now that we have C Data, we could just get rid of it. We just need to solve ARROW-16673.

@fb64
Copy link
Contributor

fb64 commented Feb 11, 2024

Error spotted when trying to use Arrow-dataset to add support of parquet file in Kotlin Dataframe : Kotlin/dataframe#577 (comment)

@pitrou
Copy link
Member

pitrou commented Feb 13, 2024

Here is the C++-demangled version of the stack trace from the issue description:

/Users/runner/work/crossbow/crossbow/arrow/java/dataset/src/main/cpp/jni_util.cc:78: Failed to update reservation while freeing bytes: JNIEnv was not attached to current thread
0   jnilib-1263766398115565476.tmp      0x000000013f46fc0c arrow::util::CerrLog::~CerrLog() + 204
1   jnilib-1263766398115565476.tmp      0x000000013f46fb2e arrow::util::CerrLog::~CerrLog() + 14
2   jnilib-1263766398115565476.tmp      0x000000013f464de2 arrow::util::ArrowLog::~ArrowLog() + 34
3   jnilib-1263766398115565476.tmp      0x000000013e54d96d arrow::dataset::jni::ReservationListenableMemoryPool::Impl::Free(unsigned char*, long long) + 237
4   jnilib-1263766398115565476.tmp      0x000000013f78c035 arrow::PoolBuffer::~PoolBuffer() + 69
5   jnilib-1263766398115565476.tmp      0x000000013f78bd0e arrow::PoolBuffer::~PoolBuffer() + 14
6   jnilib-1263766398115565476.tmp      0x000000013f70e1ce arrow::ArrayData::~ArrayData() + 222
7   jnilib-1263766398115565476.tmp      0x000000013f5b8fde arrow::SimpleRecordBatch::~SimpleRecordBatch() + 206
8   jnilib-1263766398115565476.tmp      0x000000013e5785b8 arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::ThenOnComplete<arrow::DefaultIfEmptyGenerator<std::__1::shared_ptr<arrow::RecordBatch> >::operator()()::{unnamed type#1}, arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::PassthruOnFailure<{unnamed type#1}> >::operator()(arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> > const&) && + 168
9   jnilib-1263766398115565476.tmp      0x000000013f46b486 arrow::ConcreteFutureImpl::RunOrScheduleCallback(std::__1::shared_ptr<arrow::FutureImpl> const&, arrow::FutureImpl::CallbackRecord&&, bool) + 230
10  jnilib-1263766398115565476.tmp      0x000000013f46b2bd arrow::ConcreteFutureImpl::DoMarkFinishedOrFailed(arrow::FutureState) + 189
11  jnilib-1263766398115565476.tmp      0x000000013e82dc82 arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::DoMarkFinished(arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> >) + 290
12  jnilib-1263766398115565476.tmp      0x000000013e82d8e8 arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::MarkFinished(arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> >) + 88
13  jnilib-1263766398115565476.tmp      0x000000013e82e8b0 arrow::internal::FnOnce<void (arrow::FutureImpl const&)>::FnImpl<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::WrapResultyOnComplete::Callback<arrow::detail::MarkNextFinished<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >, arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >, false, false> > >::invoke(arrow::FutureImpl const&) + 160
14  jnilib-1263766398115565476.tmp      0x000000013f46b486 arrow::ConcreteFutureImpl::RunOrScheduleCallback(std::__1::shared_ptr<arrow::FutureImpl> const&, arrow::FutureImpl::CallbackRecord&&, bool) + 230
15  jnilib-1263766398115565476.tmp      0x000000013f46b2bd arrow::ConcreteFutureImpl::DoMarkFinishedOrFailed(arrow::FutureState) + 189
16  jnilib-1263766398115565476.tmp      0x000000013e82dc82 arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::DoMarkFinished(arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> >) + 290
17  jnilib-1263766398115565476.tmp      0x000000013e82d8e8 arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::MarkFinished(arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> >) + 88
18  jnilib-1263766398115565476.tmp      0x000000013e5c93fe std::__1::enable_if<((!std::is_void<arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> > >::value)&&(!is_future<arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> > >::value))&&((!arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::is_empty)||std::is_same<arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> >, arrow::Status>::value), void>::type arrow::detail::ContinueFuture::operator()<arrow::SerialReadaheadGenerator<std::__1::shared_ptr<arrow::RecordBatch> >::ErrCallback, arrow::Status const&, arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> >, arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> > >(arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >, arrow::SerialReadaheadGenerator<std::__1::shared_ptr<arrow::RecordBatch> >::ErrCallback&&, arrow::Status const&) const + 110
19  jnilib-1263766398115565476.tmp      0x000000013e5c9335 arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::ThenOnComplete<arrow::SerialReadaheadGenerator<std::__1::shared_ptr<arrow::RecordBatch> >::Callback, arrow::SerialReadaheadGenerator<std::__1::shared_ptr<arrow::RecordBatch> >::ErrCallback>::operator()(arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> > const&) && + 293
20  jnilib-1263766398115565476.tmp      0x000000013f46b486 arrow::ConcreteFutureImpl::RunOrScheduleCallback(std::__1::shared_ptr<arrow::FutureImpl> const&, arrow::FutureImpl::CallbackRecord&&, bool) + 230
21  jnilib-1263766398115565476.tmp      0x000000013f46b2bd arrow::ConcreteFutureImpl::DoMarkFinishedOrFailed(arrow::FutureState) + 189
22  jnilib-1263766398115565476.tmp      0x000000013e82dc82 arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::DoMarkFinished(arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> >) + 290
23  jnilib-1263766398115565476.tmp      0x000000013e82d8e8 arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::MarkFinished(arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> >) + 88
24  jnilib-1263766398115565476.tmp      0x000000013e5c7e46 std::__1::enable_if<((!std::is_void<arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> > >::value)&&(!is_future<arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> > >::value))&&((!arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::is_empty)||std::is_same<arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> >, arrow::Status>::value), void>::type arrow::detail::ContinueFuture::operator()<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::PassthruOnFailure<arrow::dataset::SlicingGenerator::operator()()::{lambda(std::__1::shared_ptr<arrow::RecordBatch> const&)#1}>, arrow::Status const&, arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> >, arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> > >(arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >, arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::PassthruOnFailure<arrow::dataset::SlicingGenerator::operator()()::{lambda(std::__1::shared_ptr<arrow::RecordBatch> const&)#1}>&&, arrow::Status const&) const + 102
25  jnilib-1263766398115565476.tmp      0x000000013e5c7d9e arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::ThenOnComplete<arrow::dataset::SlicingGenerator::operator()()::{lambda(std::__1::shared_ptr<arrow::RecordBatch> const&)#1}, arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::PassthruOnFailure<arrow::dataset::SlicingGenerator::operator()()::{lambda(std::__1::shared_ptr<arrow::RecordBatch> const&)#1}> >::operator()(arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> > const&) && + 222
26  jnilib-1263766398115565476.tmp      0x000000013f46b486 arrow::ConcreteFutureImpl::RunOrScheduleCallback(std::__1::shared_ptr<arrow::FutureImpl> const&, arrow::FutureImpl::CallbackRecord&&, bool) + 230
27  jnilib-1263766398115565476.tmp      0x000000013f46b2bd arrow::ConcreteFutureImpl::DoMarkFinishedOrFailed(arrow::FutureState) + 189
28  jnilib-1263766398115565476.tmp      0x000000013e82dc82 arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::DoMarkFinished(arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> >) + 290
29  jnilib-1263766398115565476.tmp      0x000000013e82d8e8 arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >::MarkFinished(arrow::Result<std::__1::shared_ptr<arrow::RecordBatch> >) + 88
30  jnilib-1263766398115565476.tmp      0x000000013e62ad98 arrow::internal::FnOnce<void (arrow::FutureImpl const&)>::FnImpl<arrow::Future<arrow::internal::Empty>::WrapStatusyOnComplete::Callback<arrow::MergedGenerator<std::__1::shared_ptr<arrow::RecordBatch> >::State::MarkFinalError(arrow::Status const&, arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> >)::{lambda(arrow::Status const&)#1}> >::invoke(arrow::FutureImpl const&) + 56
31  jnilib-1263766398115565476.tmp      0x000000013f46b486 arrow::ConcreteFutureImpl::RunOrScheduleCallback(std::__1::shared_ptr<arrow::FutureImpl> const&, arrow::FutureImpl::CallbackRecord&&, bool) + 230
32  jnilib-1263766398115565476.tmp      0x000000013f46b2bd arrow::ConcreteFutureImpl::DoMarkFinishedOrFailed(arrow::FutureState) + 189
33  jnilib-1263766398115565476.tmp      0x000000013f4bb658 arrow::Future<arrow::internal::Empty>::DoMarkFinished(arrow::Result<arrow::internal::Empty>) + 152
34  jnilib-1263766398115565476.tmp      0x000000013f4afbc1 void arrow::Future<arrow::internal::Empty>::MarkFinished<arrow::internal::Empty, void>(arrow::Status) + 81
35  jnilib-1263766398115565476.tmp      0x000000013e626fda arrow::MergedGenerator<std::__1::shared_ptr<arrow::RecordBatch> >::State::MarkFinishedAndPurge() + 58
36  jnilib-1263766398115565476.tmp      0x000000013e62b505 arrow::MergedGenerator<std::__1::shared_ptr<arrow::RecordBatch> >::OuterCallback::operator()(arrow::Result<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> > ()> > const&) + 1173
37  jnilib-1263766398115565476.tmp      0x000000013f46b486 arrow::ConcreteFutureImpl::RunOrScheduleCallback(std::__1::shared_ptr<arrow::FutureImpl> const&, arrow::FutureImpl::CallbackRecord&&, bool) + 230
38  jnilib-1263766398115565476.tmp      0x000000013f46b2bd arrow::ConcreteFutureImpl::DoMarkFinishedOrFailed(arrow::FutureState) + 189
39  jnilib-1263766398115565476.tmp      0x000000013e6220fa arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> > ()> >::DoMarkFinished(arrow::Result<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> > ()> >) + 282
40  jnilib-1263766398115565476.tmp      0x000000013e621ef3 arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> > ()> >::MarkFinished(arrow::Result<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> > ()> >) + 51
41  jnilib-1263766398115565476.tmp      0x000000013e78715b arrow::internal::FnOnce<void (arrow::FutureImpl const&)>::FnImpl<arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> > ()> >::WrapResultyOnComplete::Callback<arrow::detail::MarkNextFinished<arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> > ()> >, arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> > ()> >, false, false> > >::invoke(arrow::FutureImpl const&) + 59
42  jnilib-1263766398115565476.tmp      0x000000013f46b486 arrow::ConcreteFutureImpl::RunOrScheduleCallback(std::__1::shared_ptr<arrow::FutureImpl> const&, arrow::FutureImpl::CallbackRecord&&, bool) + 230
43  jnilib-1263766398115565476.tmp      0x000000013f454dcb arrow::ConcreteFutureImpl::AddCallback(arrow::internal::FnOnce<void (arrow::FutureImpl const&)>, arrow::CallbackOptions) + 139
44  jnilib-1263766398115565476.tmp      0x000000013f454cfd arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void (arrow::FutureImpl const&)>, arrow::CallbackOptions) + 29
45  jnilib-1263766398115565476.tmp      0x000000013e786f78 arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> > ()> >&, arrow::Future<std::__1::function<arrow::Future<std::__1::shared_ptr<arrow::RecordBatch> > ()> > (&)(arrow::internal::Executor*, std::__1::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, int, std::__1::vector<int, std::__1::allocator<int> > const&), arrow::internal::Executor*&, std::__1::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>&, int const&, std::__1::vector<int, std::__1::allocator<int> > const&> >::invoke() + 184
46  jnilib-1263766398115565476.tmp      0x000000013f453445 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3> >(void*) + 693
47  libsystem_pthread.dylib             0x00007fff2072f8fc _pthread_start + 224
48  libsystem_pthread.dylib             0x00007fff2072b443 thread_start + 15Process finished with exit code 134 (interrupted by signal 6: SIGABRT) 

@pitrou
Copy link
Member

pitrou commented Feb 13, 2024

The problem is quite clear and should be relatively easy to fix. It comes from here:

arrow::Status OnRelease(int64_t size) override {
JNIEnv* env;
if (vm_->GetEnv(reinterpret_cast<void**>(&env), JNI_VERSION) != JNI_OK) {
return arrow::Status::Invalid("JNIEnv was not attached to current thread");
}
env->CallObjectMethod(java_reservation_listener_, unreserve_memory_method, size);
RETURN_NOT_OK(arrow::dataset::jni::CheckException(env));
return arrow::Status::OK();
}

See the JNI APIs:
https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#AttachCurrentThread

@fb64
Copy link
Contributor

fb64 commented Feb 14, 2024

Thanks for the hint @pitrou , I'll try to add AttachCurrentThread to fix the issue if I manage to make the compilation of the JNI libraries work on my computer..

arrow::Status OnRelease(int64_t size) override {
  JNIEnv* env;
  if (vm_->GetEnv(reinterpret_cast<void**>(&env), JNI_VERSION) != JNI_OK) {
      if (AttachCurrentThread(vm_,&env) != JNI_OK) {
          return arrow::Status::Invalid("JNIEnv was not attached to current thread");
      }
  }
  env->CallObjectMethod(java_reservation_listener_, unreserve_memory_method, size);
  RETURN_NOT_OK(arrow::dataset::jni::CheckException(env));
  return arrow::Status::OK();
}

fb64 added a commit to fb64/arrow that referenced this issue Feb 16, 2024
…eing bytes: JNIEnv was not attached to current thread
fb64 added a commit to fb64/arrow that referenced this issue Feb 16, 2024
…eing bytes: JNIEnv was not attached to current thread
fb64 added a commit to fb64/arrow that referenced this issue Feb 16, 2024
…eing bytes: JNIEnv was not attached to current thread
fb64 added a commit to fb64/arrow that referenced this issue Feb 19, 2024
…eing bytes: JNIEnv was not attached to current thread
lidavidm pushed a commit that referenced this issue Feb 20, 2024
…ytes (#40101)

### Rationale for this change
Better controls JNI Thread management in java dataset module to fix #20379 
Re-use the same code found in the java arrow-c-data module : https://github.com/apache/arrow/blob/main/java/c/src/main/cpp/jni_wrapper.cc#L107

May JNIEnvGuard class code can be put in a common place for both libraries ...

### What changes are included in this PR?
N/A

### Are these changes tested?
These changes has been tested with :  https://gist.github.com/fb64/71880cde297bc5234b02b68b785670fd 
on Linux X86_64 architecture

### Are there any user-facing changes?
N/A

* Closes: #20379

Authored-by: Florian Bernard <[email protected]>
Signed-off-by: David Li <[email protected]>
@lidavidm lidavidm added this to the 16.0.0 milestone Feb 20, 2024
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Feb 28, 2024
…eing bytes (apache#40101)

### Rationale for this change
Better controls JNI Thread management in java dataset module to fix apache#20379 
Re-use the same code found in the java arrow-c-data module : https://github.com/apache/arrow/blob/main/java/c/src/main/cpp/jni_wrapper.cc#L107

May JNIEnvGuard class code can be put in a common place for both libraries ...

### What changes are included in this PR?
N/A

### Are these changes tested?
These changes has been tested with :  https://gist.github.com/fb64/71880cde297bc5234b02b68b785670fd 
on Linux X86_64 architecture

### Are there any user-facing changes?
N/A

* Closes: apache#20379

Authored-by: Florian Bernard <[email protected]>
Signed-off-by: David Li <[email protected]>
thisisnic pushed a commit to thisisnic/arrow that referenced this issue Mar 8, 2024
…eing bytes (apache#40101)

### Rationale for this change
Better controls JNI Thread management in java dataset module to fix apache#20379 
Re-use the same code found in the java arrow-c-data module : https://github.com/apache/arrow/blob/main/java/c/src/main/cpp/jni_wrapper.cc#L107

May JNIEnvGuard class code can be put in a common place for both libraries ...

### What changes are included in this PR?
N/A

### Are these changes tested?
These changes has been tested with :  https://gist.github.com/fb64/71880cde297bc5234b02b68b785670fd 
on Linux X86_64 architecture

### Are there any user-facing changes?
N/A

* Closes: apache#20379

Authored-by: Florian Bernard <[email protected]>
Signed-off-by: David Li <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants