diff --git a/Project.toml b/Project.toml
index 9f2d36c..8b0da36 100644
--- a/Project.toml
+++ b/Project.toml
@@ -1,7 +1,7 @@
 name = "PrefectInterfaces"
 uuid = "25d49962-0f22-42a0-bb44-b427e1ded1d4"
 authors = ["mahiki "]
-version = "0.3.0"
+version = "0.3.1"

 [deps]
 AWS = "fbe9abb3-538b-5e4e-ba9e-bc94f4f92ebc"
diff --git a/docs/src/developers.md b/docs/src/developers.md
index c2de087..8212bbc 100644
--- a/docs/src/developers.md
+++ b/docs/src/developers.md
@@ -1,5 +1,6 @@
 # Developers
-* Develop and test with/without `just` taskrunner.
+* (Optional) Install the `just` taskrunner as a dev tool for convenience; see [Justfile](@ref).
+* From the repo root, run `just info` for hints.
 * Documenter.jl `doctest()` included in `runtests.jl`

 ## Test, Build Docs with Justfile
diff --git a/docs/src/usage-and-explanation.md b/docs/src/usage-and-explanation.md
index 86e2284..2edde34 100644
--- a/docs/src/usage-and-explanation.md
+++ b/docs/src/usage-and-explanation.md
@@ -1,5 +1,5 @@
 # Usage and Design Explanation
->A Data Scientist or Analyst User Story
+>A data scientist or analyst with orchestrated jobs and productionized reports.

 The problem is to manage routine data ETL or pipeline processing with Prefect and the Python API, while calling Julia fuctions for expressive dataframe transformations or niche high performance custom code.
 Prefect doesn't provide a Julia SDK (yet), so this package provides components for julia operations that are called from a Prefect orchestration environment.
@@ -53,5 +53,5 @@ The julia environment does not need to be aware of project environment, because
 **Managing dev/prod environment with dev/main git branches:** When both main/dev are local, there will be two local prefect DB with different PREFECT_API_URL defined by the Prefect `profiles.toml` profile. The python side of the application will need to distinguish the dev/prod PREFECT_HOME environment variables to define different locations for the prefect DB (which is just a sqlite file).
 I prefer to do this in a task runner outside of the python application, something like Github Actions, Make, or `just`.

-## Justfile
-I've found when managing a Prefect orchestrator it is helpful to have a taskrunner program that documents development tasks and executes them for you as well. I use [`just`](https://just.systems/) to launch `dev/main` Prefect DB local servers and manage tasks like Prefect deployment builds ßand running tests before merging and deploying. If you, like most data scientists, like to develop and test on the main branch please ignore this part of the package.
+## Why Just Taskrunner
+When managing a Prefect orchestrator, I've found it's best to have a taskrunner program that codifies and smooths out repetitive tasks. I use [`just`](https://just.systems/) to launch `dev/main` Prefect DB local servers and to manage tasks like Prefect deployment builds and running tests before merging and deploying. The justfile also serves as self-documentation as the workflow evolves.
diff --git a/justfile b/justfile
index da0ac75..71496a9 100644
--- a/justfile
+++ b/justfile
@@ -10,19 +10,11 @@ default:

 # info for developing/testing this package
 info:
-    @echo "Optional on setup:"
+    @echo "Setting up Prefect Demo [Optional]:"
     @echo " cd prefect/; just init"
-    @echo " * this intalls poetry package and get prefect local server running"
+    @echo " * this installs the poetry package and gets the prefect local server running"
+    @echo " * see the 'Prefect Installation' section of the docs"
     @echo
-    @echo "Typical dev workflow:"
-    @echo " git checkout -b issue-3/s3-read-write"
-    @echo " just repl; ] instantiate; add PKGS # as neeeded"
-    @echo " * code, write/edit tests *"
-    @echo " just build - this runs the server, tests, doctest, builds docs"
-    @echo " * now debug until its clean *"
-    @echo " git commit 'closes #3: s3 read/write'"
-    @echo " ... git merge"
-    @echo " vim Project.toml -> bump version number, commit."

 # pass thru command
 run *args:
@@ -52,3 +44,25 @@ kill:

 # full cycle of launch server, test, docs, kill server
 build: launch test docs kill
+
+# dev workflow steps, a reminder
+workflow:
+    @echo "Dev workflow:"
+    @echo " git checkout -b issue-3/s3-read-write"
+    @echo " just repl; ] instantiate; add PKGS # as needed"
+    @echo " code, write/edit tests"
+    @echo " 'just build' - this runs the server, tests, doctest, builds docs"
+    @echo " debug"
+    @echo " vim Project.toml -> bump version number"
+    @echo " git commit -m 'closes #3: s3 read/write'"
+    @echo " => pull request"
+    @echo " => git merge; git push"
+    @echo
+    @echo " Registrator & TagBot on merge commit"
+    @echo " comment on the merge commit as follows to trigger a release:"
+    @echo " @JuliaRegistrator register"
+    @echo
+    @echo " Release Notes:"
+    @echo
+    @echo " # Markdown Notes Here"
+    @echo " - blah blah"
diff --git a/src/Datasets/Datasets.jl b/src/Datasets/Datasets.jl
index 3b22ebe..eff4faa 100644
--- a/src/Datasets/Datasets.jl
+++ b/src/Datasets/Datasets.jl
@@ -85,7 +85,8 @@ end
     read(ds::Dataset)

 Returns a `DataFrame` by calling `CSV.read` on a filepath defined by the Dataset type.
-*NOTE:* A prefect server must be available.
+
+*NOTE:* A Prefect server must be available to use the Dataset `read` function.

 # Examples
 ```julia
@@ -115,6 +116,8 @@ end
     write(ds::Dataset, df::DataFrame)

 Writes a `DataFrame` via `CSV.write` to a filepath defined by the `Dataset` type.
+
+*NOTE:* A Prefect server must be available to use the Dataset `write` function.
 """
 function write(
     ds::Dataset
diff --git a/src/config.jl b/src/config.jl
index a244c83..a357ab0 100644
--- a/src/config.jl
+++ b/src/config.jl
@@ -1,5 +1,7 @@
 """
     PrefectAPI(url::String, key::SecretString) <:AbstractPrefectInterface
+    PrefectAPI(url::String)
+    PrefectAPI()

 Mutable struct tha stores the Prefect server api endpoint. All `PrefectInterface` operations depend on connecting to a running Prefect server to pull block information.
 Constructor with no arguments assigns env variables `PREFECT_API_URL`, `PREFECT_API_KEY`
diff --git a/src/prefectblock/prefectblocktypes.jl b/src/prefectblock/prefectblocktypes.jl
index 4b864bc..8ef862c 100644
--- a/src/prefectblock/prefectblocktypes.jl
+++ b/src/prefectblock/prefectblocktypes.jl
@@ -127,6 +127,33 @@ struct CredentialPairBlock <: AbstractPrefectBlock
     )
 end

+"""
+    S3BucketBlock(
+        blockname, blocktype, bucket_name, bucket_folder
+        , region_name, aws_access_key_id, aws_secret_access_key)
+
+Corresponds to the Prefect `S3Bucket` block in the prefect-aws integration. Attached functions:
+
+    read_path("path/to/object.csv")
+    write_path("path/to/object.csv", df::AbstractDataFrame)
+
+Reads a `DataFrame` from, or writes one to, a CSV object at a key relative to the
+block-defined prefix, e.g. `s3://bucket_name/bucket_folder/path/to/object.csv`.
+
+# Examples
+```julia
+# pull hypothetical existing block from Prefect DB server
+
+julia> s3block = PrefectBlock("s3-bucket/willowdata")
+S3BucketBlock("s3-bucket/willowdata", "s3-bucket", "willowdata", "data-folder/dev", "us-west-2"
+, "AKIAEXAMPLEXXX", ####Secret####, ...)
+
+julia> df = s3block.block.read_path("extracts/csv/dataset=test_table/rundate=2023-05-25/data.csv");
+
+julia> s3block.block.write_path("testfolder/xanadu-test.csv", df)
+p"s3://willowdata/data-folder/dev/testfolder/xanadu-test.csv"
+```
+"""
 struct S3BucketBlock <: AbstractPrefectBlock
     blockname::String
     blocktype::String