Documentation on how to add a new transformer #17

ansaardollie · 2024-03-09T21:14:11Z

Hi there,

Just wondering if you could provide documentation on how to go about creating custom storage, loaders, and writers. The documentation just says to implement the required functions; however, when I use the following code

function storage(provider::DataStorage{:rijl}, ::Type{IO}; write::Bool)
    println("Inside storage(provider::DataStorage{:rijl}, ::Type{IO}; write::Bool)")
    qry = getquery(provider)

    fn = createname(qry)

    # cache_dir = DataToolkitBase.config_get("cache_dir")
    cache_dir = "./cache"

    tpath = joinpath(cache_dir, fn)

    if isfile(tpath) && write
        cached_bytes = open(tpath, "w")
        return cached_bytes
    elseif isfile(tpath) && !(write)
        cached_bytes = open(tpath, "r")
        return cached_bytes
    else
        return run_query_and_cache(qry, tpath; write=write)
    end

end

function load(loader::DataLoader{:rijl}, source::Type{IO}, as::Type{DataFrame})
    println("Inside load(loader::DataLoader{:rijl}, source::Type{IO}, as::Type{DataFrame})")
    pds = Parquet2.Dataset(source)
    abf = IOBuffer()

    Arrow.write(abf, pds)

    abb = take!(abf)

    df = Arrow.Table(abb) |> DataFrame

    return df
end

And then use the following dataset in the Data.toml file

data_config_version = 0
uuid = "f812338f-4069-46dc-8bb8-dba7cb5e1ae5"
name = "RiData"
plugins = ["store", "defaults", "memorise"]


[[Test]]
uuid = "3fb5d56a-63d2-4474-b4ae-4d824a2d6b2a"

[[Test.storage]]
driver = "rijl"
type = "DataStorage{:rijl}"
query = "SELECT * FROM TABLE"

[[Test.loader]]
driver = "rijl"
type = "DataStorage{:rijl}"

I get the following error


ERROR: UnsatisfyableTransformer: There are no storages for "Test" that can provide a .
 The defined storages are as follows:
   DataStorage{rijl}(DataStorage{:rijl})

Could you please help me? I really love the idea of this package and want to incorporate it into a few different data pipelines, but I cannot seem to get the basics down.


tecosaur · 2024-04-07T04:45:14Z

Hi @ansaardollie, sorry for the delay but I'd be happy to help!

The main problem I see with the code you've shared is the value you've given the type parameter in the TOML. It should be the Julia type of the information produced by the loader/storage backend, e.g. IO, String, DataFrame. When there's only one option, you can omit it entirely.
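As a rough sketch of that advice applied to the Data.toml from the question: this assumes the storage hands back an IO and the loader produces a DataFrame, as the method signatures in the question suggest. It has not been checked against the package, so treat the exact values as illustrative.

```toml
[[Test]]
uuid = "3fb5d56a-63d2-4474-b4ae-4d824a2d6b2a"

[[Test.storage]]
driver = "rijl"
# Assumption: the Julia type the storage backend produces, not the transformer type
type = "IO"
query = "SELECT * FROM TABLE"

[[Test.loader]]
driver = "rijl"
# Assumption: the Julia type the loader produces
type = "DataFrame"
```

If each transformer only supports a single type, the type lines could presumably be dropped altogether, per the note above.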

tecosaur added the documentation and question labels and removed the documentation label · May 16, 2024
