Error creating dataset from get_subset() #3

Open
glitt13 opened this issue Oct 4, 2024 · 2 comments

glitt13 commented Oct 4, 2024

Recent updates to hfsubsetR now generate errors in unit tests that previously passed. Here's an example of a call that no longer works:

nldi_feat <- list(featureSource = "comid", featureID = "1520007")

hfsubsetR::get_subset(
  nldi_feature = nldi_feat,
  outfile = "./hydrofab_network__1520007.gpkg",
  type = "reference",
  lyrs = "network",
  overwrite = TRUE
)

Which generates the following:

Error in `arrow::open_dataset()`:
! IOError: Error creating dataset. Could not read schema from 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/vpuid=01/part-0.parquet'. Is this a 'parquet' file?: Could not open Parquet input source 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/vpuid=01/part-0.parquet': AWS Error ACCESS_DENIED during GetObject operation: Access Denied

Traceback here:

> rlang::last_trace()
<error/rlang_error>
Error in `arrow::open_dataset()`:
! IOError: Error creating dataset. Could not read schema from 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/vpuid=01/part-0.parquet'. Is this a 'parquet' file?: Could not open Parquet input source 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/vpuid=01/part-0.parquet': AWS Error ACCESS_DENIED during GetObject operation: Access Denied
---
Backtrace:
     ▆
  1. └─hfsubsetR::get_subset(...)
  2.   └─hfsubsetR::findOrigin(...)
  3.     ├─dplyr::slice_min(...)
  4.     ├─dplyr::collect(...)
  5.     ├─dplyr::distinct(...)
  6.     ├─dplyr::select(...)
  7.     ├─hfsubsetR:::findOriginQuery(.query, network)
  8.     ├─hfsubsetR:::findOriginQuery.nldi_feature(.query, network)
  9.     ├─base::NextMethod()
 10.     └─hfsubsetR:::findOriginQuery.comid(.query, network)
 11.       ├─dplyr::filter(arrow::open_dataset(network), hf_id == !!comid)
 12.       └─arrow::open_dataset(network)
Run rlang::last_trace(drop = FALSE) to see 6 hidden frames.
> rlang::last_trace(drop = FALSE)
<error/rlang_error>
Error in `arrow::open_dataset()`:
! IOError: Error creating dataset. Could not read schema from 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/vpuid=01/part-0.parquet'. Is this a 'parquet' file?: Could not open Parquet input source 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/vpuid=01/part-0.parquet': AWS Error ACCESS_DENIED during GetObject operation: Access Denied
---
Backtrace:
     ▆
  1. └─hfsubsetR::get_subset(...)
  2.   └─hfsubsetR::findOrigin(...)
  3.     ├─dplyr::slice_min(...)
  4.     ├─dplyr::collect(...)
  5.     ├─dplyr::distinct(...)
  6.     ├─dplyr::select(...)
  7.     ├─hfsubsetR:::findOriginQuery(.query, network)
  8.     ├─hfsubsetR:::findOriginQuery.nldi_feature(.query, network)
  9.     ├─base::NextMethod()
 10.     └─hfsubsetR:::findOriginQuery.comid(.query, network)
 11.       ├─dplyr::filter(arrow::open_dataset(network), hf_id == !!comid)
 12.       └─arrow::open_dataset(network)
 13.         └─base::tryCatch(...)
 14.           └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 15.             └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 16.               └─value[[3L]](cond)
 17.                 └─arrow:::augment_io_error_msg(e, call, format = format)
 18.                   └─rlang::abort(msg, call = call)

glitt13 commented Oct 17, 2024

@mikejohnson51 This issue persists with the latest changes on the main branch, but the error message has changed to the following:

Error: IOError: Path does not exist 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/'. Detail: [errno 2] No such file or directory

What's the anticipated resolution timeframe? Having an estimate will help me decide how to prioritize tasks. If you need a hand with anything, reach out and I'm happy to help dig further.

> traceback()
13: dataset___FileSystemDatasetFactory__Make(filesystem, selector, 
        format, fsf_options(factory_options, partitioning))
12: FileSystemDatasetFactory$create(path_and_fs$fs, selector, NULL, 
        format, partitioning, factory_options)
11: DatasetFactory$create(sources, partitioning = partitioning, format = format, 
        schema = schema, hive_style = hive_style, factory_options = factory_options, 
        ...)
10: arrow::open_dataset(network)
9: dplyr::filter(arrow::open_dataset(network), hf_id == !!comid)
8: findOriginQuery.comid(.query, network)
7: findOriginQuery(.query, network)
6: dplyr::select(findOriginQuery(.query, network), id, toid, vpuid, 
       topo, hydroseq)
5: dplyr::distinct(dplyr::select(findOriginQuery(.query, network), 
       id, toid, vpuid, topo, hydroseq))
4: dplyr::collect(dplyr::distinct(dplyr::select(findOriginQuery(.query, 
       network), id, toid, vpuid, topo, hydroseq)))
3: dplyr::slice_min(dplyr::collect(dplyr::distinct(dplyr::select(findOriginQuery(.query, 
       network), id, toid, vpuid, topo, hydroseq))), hydroseq, with_ties = TRUE)
2: findOrigin(network = glue("{hook}_network"), id = id, comid = comid, 
       hl_uri = hl_uri, poi_id = poi_id, nldi_feature = nldi_feature, 
       xy = xy)
1: hfsubsetR::get_subset(comid = comid, outfile = fp_cat, lyrs = lyrs, 
       overwrite = overwrite, type = "nextgen")

@mikejohnson51
Member

Hey Guy,

If you check Lynker Spatial, you'll see that v2.2 is not yet up in Parquet form, so the file you're aiming at truly doesn't exist. Once we're sure we've resolved the file corruption issue (the original root of this issue), we'll upload it all there.

Thanks!

Mike
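
For anyone whose unit tests depend on the remote data in the meantime, here is a minimal sketch of one way to tell the two failures quoted in this thread apart, so a test can be skipped instead of failing. The helpers `classify_remote_error()` and `try_remote()` are hypothetical names made up for illustration; they are not part of hfsubsetR or arrow, and the matching relies only on the message text shown above.

```r
# Hypothetical helper (not part of hfsubsetR or arrow): map the error text
# quoted in this thread to a short label.
classify_remote_error <- function(msg) {
  if (grepl("ACCESS_DENIED", msg, fixed = TRUE)) {
    "access_denied"
  } else if (grepl("Path does not exist", msg, fixed = TRUE)) {
    "missing_path"
  } else {
    "other"
  }
}

# Hypothetical wrapper: evaluate `expr` and return either its value or a
# classified failure. `expr` is forced lazily inside tryCatch(), so errors
# raised while evaluating it are caught here.
try_remote <- function(expr) {
  tryCatch(
    list(ok = TRUE, value = expr, reason = NA_character_),
    error = function(e) {
      list(
        ok = FALSE,
        value = NULL,
        reason = classify_remote_error(conditionMessage(e))
      )
    }
  )
}

# Stand-in for the failing call, reusing the second message from this thread.
res <- try_remote(
  stop("IOError: Path does not exist 'lynker-spatial/hydrofabric/v2.2/reference/conus_network/'")
)
print(res$reason)
```

In a real test, the `stop()` call would be replaced by the `hfsubsetR::get_subset(...)` call, and a `"missing_path"` or `"access_denied"` result could be used to skip the test until the v2.2 files are uploaded.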
