diff --git a/blog/2023-11-24-data-pipeline-orchestrator/index.mdx b/blog/2023-11-24-data-pipeline-orchestrator/index.mdx index 1110dfd5f..5dea86bcb 100644 --- a/blog/2023-11-24-data-pipeline-orchestrator/index.mdx +++ b/blog/2023-11-24-data-pipeline-orchestrator/index.mdx @@ -22,9 +22,9 @@ An ETL is nothing else than a [DAG](https://en.wikipedia.org/wiki/Directed_acycl Windmill enables building fast, powerful, reliable, and easy-to-build data pipelines: -- The DX in Windmill allows you to quickly [assemble flows](/docs/flows/flow_editor) that can process data step by step in a visual and easy-to-manage way ; -- You can control [parallelism](/docs/flows/flow_branches#branch-all) between individual steps, and set [concurrency limits](/docs/flows/concurrency_limit) in case external resources need are fragile or rate limited ; -- [Windmill flows can be restarted from any step](../2023-11-24-restartable-flows/index.mdx), making the iteration process of building a pipeline (or debugging one) smooth and efficient ; +- The DX in Windmill allows you to quickly [assemble flows](/docs/flows/flow_editor) that can process data step by step in a visual and easy-to-manage way; +- You can control [parallelism](/docs/flows/flow_branches#branch-all) between individual steps, and set [concurrency limits](/docs/flows/concurrency_limit) in case external resources are fragile or rate limited; +- [Windmill flows can be restarted from any step](../2023-11-24-restartable-flows/index.mdx), making the iteration process of building a pipeline (or debugging one) smooth and efficient; - Monitoring is made easy with [error and recovery handlers](/docs/core_concepts/error_handling). +
+ +
## Windmill integration with Polars and DuckDB for data pipelines diff --git a/docs/core_concepts/38_object_storage_in_windmill/index.mdx b/docs/core_concepts/38_object_storage_in_windmill/index.mdx index ef21ff42f..25e742d8d 100644 --- a/docs/core_concepts/38_object_storage_in_windmill/index.mdx +++ b/docs/core_concepts/38_object_storage_in_windmill/index.mdx @@ -20,7 +20,7 @@ Once you've created an [S3 or Azure Blob resource](../../integrations/s3.mdx) in ![S3 storage workspace settings](../11_persistent_storage/workspace_settings.png) -From now on, Windmill will be connected to this bucket and you'll have easy access to it from the code editor and the job run details. If a script takes as input a `s3object`, you will see in the input form on the right a button helping you choose the file directly from the bucket. +From now on, Windmill will be connected to this bucket and you'll have easy access to it from the code editor and the job run details. If a script [takes as input](#take-a-file-as-input) an `S3Object`, you will see a button in the input form on the right that helps you choose the file directly from the bucket. Same for the result of the script. If you return an `s3object` containing a [key](../19_rich_display_rendering/index.mdx#s3) `s3` pointing to a file inside your bucket, in the result panel there will be a button to open the bucket explorer to visualize the file. S3 files in Windmill are just pointers to the S3 object using its key. As such, they are represented by a simple JSON: @@ -33,6 +33,8 @@ S3 files in Windmill are just pointers to the S3 object using its key. As such, ![Workspace object storage infographic](../11_persistent_storage/s3_infographics.png 'Workspace object storage infographic') +### Resource permissions + The resource can be set to be public with the toggle "S3 resource details can be accessed by all users of this workspace". In this case, the [permissions](../16_roles_and_permissions/index.mdx#path) set on the resource will be ignored when users interact with the S3 bucket via Windmill. Note that when the resource is public, the users might be able to access all of its details (including access keys and secrets) via some Windmill endpoints. @@ -61,7 +63,19 @@ export async function main() { ![S3 list of files download](../19_rich_display_rendering/s3_array.png "S3 list of files download") -### Read a file from S3 within a script +### Read a file from S3 or object storage within a script + +`S3Object` is a type that represents a file in S3 or object storage. + +S3 files in Windmill are just pointers to the S3 object using its key. As such, they are represented by a simple JSON: + +```json +{ + "s3": "path/to/file" +} +``` + +You can read a file from S3 or object storage within a script using the `loadS3File` and `loadS3FileStream` functions from the [TypeScript client](../../advanced/2_clients/ts_client.mdx) and the `wmill.load_s3_file` and `wmill.load_s3_file_reader` functions from the [Python client](../../advanced/2_clients/python_client.md). 
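+In TypeScript, the shape of this type can be sketched as follows. This is a simplified sketch for illustration: the `s3` key and the optional `storage` field both appear in examples on this page, but the actual type exported by `windmill-client` may carry additional optional fields.
+
+```ts
+// Simplified sketch of the S3Object shape used throughout this page.
+type S3Object = {
+	s3: string; // key of the object inside the bucket, e.g. 'path/to/file'
+	storage?: string; // optional name of a secondary storage (see 'Secondary storage' below)
+};
+```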
@@ -71,16 +85,21 @@ export async function main() { import * as wmill from 'windmill-client'; import { S3Object } from 'windmill-client'; -export async function main(input_file: S3Object) { +export async function main() { + + const example_file: S3Object = { + s3: 'path/to/file' + }; + // Load the entire file_content as a Uint8Array - const file_content = await wmill.loadS3File(input_file); + const file_content = await wmill.loadS3File(example_file); const decoder = new TextDecoder(); const file_content_str = decoder.decode(file_content); console.log(file_content_str); // Or load the file lazily as a Blob - let fileContentBlob = await wmill.loadS3FileStream(input_file); + let fileContentBlob = await wmill.loadS3FileStream(example_file); console.log(await fileContentBlob.text()); } ``` @@ -91,18 +110,23 @@ export async function main(input_file: S3Object) { ```ts import * as wmill from 'npm:windmill-client@1.253.7'; -import S3Object from 'npm:windmill-client@1.253.7'; +import { S3Object } from 'npm:windmill-client@1.253.7'; + +export async function main() { + + const example_file: S3Object = { + s3: 'path/to/file' + }; -export async function main(input_file: S3Object) { // Load the entire file_content as a Uint8Array - const file_content = await wmill.loadS3File(input_file); + const file_content = await wmill.loadS3File(example_file); const decoder = new TextDecoder(); const file_content_str = decoder.decode(file_content); console.log(file_content_str); // Or load the file lazily as a Blob - let fileContentBlob = await wmill.loadS3FileStream(input_file); + let fileContentBlob = await wmill.loadS3FileStream(example_file); console.log(await fileContentBlob.text()); } ``` @@ -115,13 +139,16 @@ export async function main(input_file: S3Object) { import wmill from wmill import S3Object -def main(input_file: S3Object): - # Load the entire file_content as a bytes array - file_content = wmill.load_s3_file(input_file) +def main(): + + example_file = S3Object(s3='path/to/file') + + # Load the entire file_content as a bytes array + file_content = wmill.load_s3_file(example_file) print(file_content.decode('utf-8')) # Or load the file lazily as a Buffered reader: - with wmill.load_s3_file_reader(input_file) as file_reader: + with wmill.load_s3_file_reader(example_file) as file_reader: print(file_reader.read()) ``` @@ -130,11 +157,81 @@ def main(input_file: S3Object): ![Read S3 file](../18_files_binary_data/s3_file_input.png) -:::info Certain file types, typically parquet files, can be [directly rendered by Windmill](../19_rich_display_rendering/index.mdx). -::: -### Create a file in S3 within a script +### Take a file as input + +Scripts can accept an `S3Object` as input. 
+ + + + + +```ts +import * as wmill from 'windmill-client'; +import { S3Object } from 'windmill-client'; + +export async function main(input_file: S3Object) { + +// rest of the code + +} +``` + + + + + +```ts +import * as wmill from 'npm:windmill-client@1.253.7'; +import { S3Object } from 'npm:windmill-client@1.253.7'; + +export async function main(input_file: S3Object) { + +// rest of the code + +} +``` + + + + + +```python +import wmill +from wmill import S3Object + +def main(input_file: S3Object): + +# Rest of the code +``` + + + + +The [auto-generated UI](../6_auto_generated_uis/index.mdx) will display a file uploader: + +![S3 File Uploader](./s3_file_input.png "S3 File Uploader") + +or you can fill in the path manually if you enable 'Raw S3 object input': + +![S3 Raw Object Input](./s3_raw_object_input.png "S3 Raw Object Input") + +and access the bucket explorer if [resource permissions](#resource-permissions) allow it: + +![S3 Bucket Explorer](./s3_bucket_explorer.png "S3 Bucket Explorer") + +This is also the recommended way to [pass](../../flows/16_architecture.mdx) S3 files as input to steps within [flows](../../flows/1_flow_editor.mdx). + +![S3 file input in flow](./s3_file_input_in_flow.png "S3 file input in flow") + +![S3 file input in flow 1](./s3_file_input_in_flow_1.png "S3 file input in flow 1") + +![S3 file input in flow 2](./s3_file_input_in_flow_2.png "S3 file input in flow 2") + +### Create a file in S3 or object storage within a script + +You can create a file in S3 or object storage within a script using the `writeS3File` function from the [TypeScript client](../../advanced/2_clients/ts_client.mdx) and the `wmill.write_s3_file` function from the [Python client](../../advanced/2_clients/python_client.md). @@ -162,7 +259,7 @@ export async function main(s3_file_path: string) { ```ts import * as wmill from 'npm:windmill-client@1.253.7'; -import S3Object from 'npm:windmill-client@1.253.7'; +import { S3Object } from 'npm:windmill-client@1.253.7'; export async function main(s3_file_path: string) { const s3_file_output: S3Object = { @@ -200,7 +297,7 @@ def main(s3_file_path: str): For more info on how to use files and S3 files in Windmill, see [Handling files and binary data](../18_files_binary_data/index.mdx). -### Secondary S3 Storage +### Secondary storage Read and write from a storage that is not your main storage by specifying it in the S3 object as "secondary_storage" with its name. @@ -212,7 +309,7 @@ Then from script, you can specify the secondary storage with an object with prop const file = {s3: 'folder/hello.txt', storage: 'storage_1'} ``` -Here is an example of the [Create](#create-a-file-in-s3-within-a-script) then [Read](#read-a-file-from-s3-within-a-script) a file from S3 within a script with secondary storage named "storage_1": +Here is an example that [creates](#create-a-file-in-s3-or-object-storage-within-a-script) and then [reads](#read-a-file-from-s3-or-object-storage-within-a-script) a file from S3 within a script, with a secondary storage named "storage_1": ```ts import * as wmill from 'windmill-client'; @@ -259,7 +356,7 @@ Under [Enterprise Edition](/pricing), Instance object storage offers advanced fe ![S3/Azure for Python/Go cache & large logs](../../core_concepts/20_jobs/s3_azure_cache.png "S3/Azure for Python/Go cache & large logs") -### Large logs management with S3 +### Large job logs management To optimize log storage and performance, Windmill leverages S3 for log management. 
This approach minimizes database load by treating the database as a temporary buffer for up to 5000 characters of logs per job. @@ -267,7 +364,7 @@ For jobs with extensive logging needs, Windmill [Enterprise Edition](/pricing) u This allows the handling of large-scale logs with minimal database impact, supporting more efficient and scalable workflows. -For large logs storage (and display) and cache for distributed Python jobs, you can [connect your instance to a bucket](../20_jobs/index.mdx#large-logs-management-with-s3). This feature is at the Instance-level, and has no overlap with the Workspace object storage. +For large log storage (and display) and cache for distributed Python jobs, you can [connect your instance to a bucket](../20_jobs/index.mdx#large-job-logs-management). This feature is at the instance level and has no overlap with the Workspace object storage. ### Instance object storage distributed cache for Python, Rust, Go @@ -294,7 +391,7 @@ The first time a dependency is seen by a worker, if it is not cached locally, th 1. If it is not, install the dependency from pypi, then take a snapshot of the installed dependency, tar it and push it to S3 (we call this a "piptar"). 2. If it is, simply pull the "piptar" and extract it instead of installing from pypi. This is much faster because the S3 bucket is much closer to your workers than pypi and because there is no installation step: a simple tar extraction is sufficient, which takes almost no compute. -### Log storage +### Service logs storage [Logs are stored in S3](../36_service_logs/index.mdx) if S3 instance object storage is configured. This option provides more scalable storage and is ideal for larger-scale deployments or where long-term log retention is important. diff --git a/docs/core_concepts/38_object_storage_in_windmill/s3_bucket_explorer.png b/docs/core_concepts/38_object_storage_in_windmill/s3_bucket_explorer.png new file mode 100644 index 000000000..1bae67acc Binary files /dev/null and b/docs/core_concepts/38_object_storage_in_windmill/s3_bucket_explorer.png differ diff --git a/docs/core_concepts/38_object_storage_in_windmill/s3_file_input.png b/docs/core_concepts/38_object_storage_in_windmill/s3_file_input.png new file mode 100644 index 000000000..370cdaa95 Binary files /dev/null and b/docs/core_concepts/38_object_storage_in_windmill/s3_file_input.png differ diff --git a/docs/core_concepts/38_object_storage_in_windmill/s3_file_input_in_flow.png b/docs/core_concepts/38_object_storage_in_windmill/s3_file_input_in_flow.png new file mode 100644 index 000000000..9b11ebce4 Binary files /dev/null and b/docs/core_concepts/38_object_storage_in_windmill/s3_file_input_in_flow.png differ diff --git a/docs/core_concepts/38_object_storage_in_windmill/s3_file_input_in_flow_1.png b/docs/core_concepts/38_object_storage_in_windmill/s3_file_input_in_flow_1.png new file mode 100644 index 000000000..98bf70968 Binary files /dev/null and b/docs/core_concepts/38_object_storage_in_windmill/s3_file_input_in_flow_1.png differ diff --git a/docs/core_concepts/38_object_storage_in_windmill/s3_file_input_in_flow_2.png b/docs/core_concepts/38_object_storage_in_windmill/s3_file_input_in_flow_2.png new file mode 100644 index 000000000..fe0949abd Binary files /dev/null and b/docs/core_concepts/38_object_storage_in_windmill/s3_file_input_in_flow_2.png differ diff --git a/docs/core_concepts/38_object_storage_in_windmill/s3_raw_object_input.png b/docs/core_concepts/38_object_storage_in_windmill/s3_raw_object_input.png new file mode 100644 
index 000000000..3c65eaf2a Binary files /dev/null and b/docs/core_concepts/38_object_storage_in_windmill/s3_raw_object_input.png differ diff --git a/docs/core_concepts/39_http_routing/index.mdx b/docs/core_concepts/39_http_routing/index.mdx index e407e8946..2bdadb8ca 100644 --- a/docs/core_concepts/39_http_routing/index.mdx +++ b/docs/core_concepts/39_http_routing/index.mdx @@ -4,7 +4,7 @@ import TabItem from '@theme/TabItem'; # Custom HTTP routes Windmill supports custom HTTP routes to trigger a script or flow. -They can only be created by workspace admins. +They can only be created from the [admins workspace](../../advanced/18_instance_settings/index.mdx#admins-workspace). All properties of the route apart from the http path can be modified by any user with write access to the route. ## How to use @@ -39,7 +39,7 @@ Preprocessors can only be written in TypeScript (Bun/Deno) or Python. In scripts, you need to export an additional `preprocessor` function. It takes a special argument called `wm_trigger` in addition to the request body arguments and should return the transformed arguments for the main function of the script. The `wm_trigger` contains the kind of the trigger (http, email or webhook) and an http object with the details of the request when the script is triggered via an HTTP route. -Here are examples of a preprocessor function in TypeScript and Python: +Here are examples of a preprocessor function in [TypeScript](../../getting_started/0_scripts_quickstart/1_typescript_quickstart/index.mdx) and [Python](../../getting_started/0_scripts_quickstart/2_python_quickstart/index.mdx): diff --git a/docs/core_concepts/index.mdx b/docs/core_concepts/index.mdx index 518dd8f52..64a5ee223 100644 --- a/docs/core_concepts/index.mdx +++ b/docs/core_concepts/index.mdx @@ -122,6 +122,11 @@ On top of its editors to build endpoints, flows and apps, Windmill comes with a description="Scripts and flows can be triggered by email messages sent to a specific email address." href="/docs/advanced/email_triggers" /> + diff --git a/docs/getting_started/8_trigger_scripts/index.mdx b/docs/getting_started/8_trigger_scripts/index.mdx index f0944569a..b6eeab54c 100644 --- a/docs/getting_started/8_trigger_scripts/index.mdx +++ b/docs/getting_started/8_trigger_scripts/index.mdx @@ -2,7 +2,7 @@ import DocCard from '@site/src/components/DocCard'; # Triggering scripts -Scripts can be triggered in 9 ways. +Scripts can be triggered in the following ways. On-demand triggers: @@ -15,7 +15,9 @@ On-demand triggers: Triggers from external events: -- [API](#trigger-from-api) & [Webhooks](#webhooks), including from [Slack](#webhooks-trigger-scripts-from-slack) or [Emails](#webhooks-trigger-scripts-from-emails) +- [API](#trigger-from-api) & [Webhooks](#webhooks), including from [Slack](#webhooks-trigger-scripts-from-slack) +- [Emails](#emails) +- [Custom HTTP routes](#custom-http-routes) - [Scheduling + Trigger Scripts](#scheduling--trigger-scripts) :::info Scripts in Windmill @@ -224,37 +226,27 @@ Windmill uses Slack to trigger scripts and flows by establishing Slackbots and c /> -#### Webhooks: Trigger Scripts from Emails - -One use case of webhooks is [triggering scripts via inbound emails using Mailchimp](../../integrations/mailchimp_mandrill.md). -
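+A minimal sketch of such a preprocessor in TypeScript follows. The `preprocessor`/`main` contract, the trigger kinds and the presence of an `http` object come from the description above; the exact fields of the `http` object used here (`route` and `params`) are illustrative assumptions.
+
+```ts
+// Sketch only: beyond its existence, the field names on `http` are assumptions.
+export async function preprocessor(
+	wm_trigger: {
+		kind: 'http' | 'email' | 'webhook';
+		http?: {
+			route: string; // assumed: the route path the request matched
+			params: Record<string, string>; // assumed: path parameters
+		};
+	},
+	// request body arguments follow the special wm_trigger argument
+	message: string
+) {
+	// Return the transformed arguments for the main function
+	return {
+		message,
+		user_id: wm_trigger.http?.params['user_id'] ?? 'unknown',
+	};
+}
+
+export async function main(message: string, user_id: string) {
+	return `Received "${message}" for user ${user_id}`;
+}
+```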