Documentation on how to add a new transformer #17

Open
ansaardollie opened this issue Mar 9, 2024 · 1 comment
Labels
documentation (Improvements or additions to documentation), question (Further information is requested)

Comments

@ansaardollie

ansaardollie commented Mar 9, 2024

Hi there,

I was wondering if you could provide documentation on how to create custom storage, loaders, and writers. The documentation just says to implement the required functions. However, when I use the following code

function storage(provider::DataStorage{:rijl}, ::Type{IO}; write::Bool)
    println("Inside storage(provider::DataStorage{:rijl}, ::Type{IO}; write::Bool)")
    qry = getquery(provider)

    fn = createname(qry)

    # cache_dir = DataToolkitBase.config_get("cache_dir")
    cache_dir = "./cache"

    tpath = joinpath(cache_dir, fn)

    if isfile(tpath) && write
        cached_bytes = open(tpath, "w")
        return cached_bytes
    elseif isfile(tpath) && !(write)
        cached_bytes = open(tpath, "r")
        return cached_bytes
    else
        return run_query_and_cache(qry, tpath; write=write)
    end

end
function load(loader::DataLoader{:rijl}, source::Type{IO}, as::Type{DataFrame})
    println("Inside load(loader::DataLoader{:rijl}, source::Type{IO}, as::Type{DataFrame})")
    pds = Parquet2.Dataset(source)
    abf = IOBuffer()

    Arrow.write(abf, pds)

    abb = take!(abf)

    df = Arrow.Table(abb) |> DataFrame

    return df
end

And then use the following dataset in the Data.toml file

data_config_version = 0
uuid = "f812338f-4069-46dc-8bb8-dba7cb5e1ae5"
name = "RiData"
plugins = ["store", "defaults", "memorise"]


[[Test]]
uuid = "3fb5d56a-63d2-4474-b4ae-4d824a2d6b2a"

[[Test.storage]]
driver = "rijl"
type = "DataStorage{:rijl}"
query = "SELECT * FROM TABLE"

[[Test.loader]]
driver = "rijl"
type = "DataStorage{:rijl}"

I get the following error


ERROR: UnsatisfyableTransformer: There are no storages for "Test" that can provide a .
 The defined storages are as follows:
   DataStorage{rijl}(DataStorage{:rijl})

Could you please help? I really love the idea of this package and want to incorporate it into a few different data pipelines, but I cannot seem to get the basics down.

@tecosaur
Owner

tecosaur commented Apr 7, 2024

Hi @ansaardollie, sorry for the delay but I'd be happy to help!

The main problem I see with the code you've shared is the value of the type parameter in the TOML. It should be set to the Julia type of the information produced by the loader/storage backend, e.g. IO, String, or DataFrame. When there's only one option, you can also omit it entirely.
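
To illustrate, here is a rough sketch of how the two transformer entries from the Data.toml above might look with that change. It assumes the storage backend yields an IO and the loader produces a DataFrame, which is only inferred from the method signatures in the question (`::Type{IO}` and `as::Type{DataFrame}`), not verified against the package:

```toml
[[Test.storage]]
driver = "rijl"
# Assumed: the storage method in the question is defined for ::Type{IO}
type = "IO"
query = "SELECT * FROM TABLE"

[[Test.loader]]
driver = "rijl"
# Assumed: the load method in the question is defined for as::Type{DataFrame}
type = "DataFrame"
```

Since each of these transformers appears to offer only one output type, the `type` lines could presumably be dropped altogether, per the note above.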

@tecosaur added the documentation and question labels and removed the documentation label on May 16, 2024