Update docs for indexer new setup and features (#780)

* Update docs for indexer new setup and features
* Update index.mdx
* Update service logs screenshot
windmill-labs · Dec 5, 2024 · f09830e
1 parent 8289695
commit f09830e
Showing 9 changed files with 142 additions and 80 deletions.
diff --git a/docs/core_concepts/35_search_bar/index.mdx b/docs/core_concepts/35_search_bar/index.mdx
@@ -39,6 +39,10 @@ With a special prefix key, you can search accros:
 
 ### Searching runs
 
+:::info
+Full text search is a Windmill EE feature. It is disabled by default in the example docker compose: to enable full text search on logs and completed jobs, you need to spin up the indexer service ([learn how](../../misc/18_full_text_search/index.mdx)).
+:::
+
 You can search through completed runs by selecting the option on the search menu, or by prefixing your search with `>`.
 
 ![Full text job search](./search_jobs.png)
@@ -100,80 +104,4 @@ By using the `#` prefix, you can use windmill's [Content search](../26_content_s
 
 ### Searching logs
 
-Searching through Windmill's service logs as well as audit logs is comming soon.
-
-## Setup
-
-### Setup using docker compose
-
-On the Windmill's docker-compose.yml there is an example of how to setup the indexer container to enable full text search, just make sure to change replicas from 0 to 1.
-
-```yml
-  # The indexer powers full-text job and log search, an EE feature.
-  windmill_indexer:
-    image: ${WM_IMAGE}
-    pull_policy: always
-    deploy:
-      replicas: 1 # set to 1 to enable full-text job and log search
-    restart: unless-stopped
-    expose:
-      - 8001
-    environment:
-      - PORT=8001
-      - DATABASE_URL=${DATABASE_URL}
-      - MODE=indexer
-      - TANTIVY_MAX_INDEXED_JOB_LOG_SIZE__MB=1 # job logs bigger than this will be truncated before indexing
-      - TANTIVY_S3_BACKUP_PERIOD__S=3600 # how often to backup the index into object storage
-      - TANTIVY_INDEX_WRITER_MEMORY_BUDGET__MB=100 # higher budget for higher indexing throughput
-      - TANTIVY_REFRESH_INDEX_PERIOD__S=300 #how often to start indexing new jobs
-      - TANTIVY_DOC_COMMIT_MAX_BATCH_SIZE=100000 #how many documents to batch in one commit
-      - TANTIVY_SHOW_MEMORY_EVERY=10000 #log memory usage and progress every so many documents indexed
-    depends_on:
-      db:
-        condition: service_healthy
-    volumes:
-      - windmill_index:/tmp/windmill/search
-```
-
-The indexer is in charge of both indexing new jobs and answering search queries. Because of this,
-we also need to redirect search requests to this container instead of the normal windmill server.
-This is what it looks like if you're using Caddy:
-
-```Caddyfile
-{$BASE_URL} {
-        bind {$ADDRESS}
-        reverse_proxy /ws/* http://lsp:3001
-        # reverse_proxy /ws_mp/* http://multiplayer:3002
-        reverse_proxy /api/srch/* http://windmill_indexer:8001
-        reverse_proxy /* http://windmill_server:8000
-        # tls /certs/cert.pem /certs/key.pem
-}
-```
-
-Redirecting requests prefixed by /api/srch to port 8001 (same port as in the docker-compose.yml)
-
-### Setup using helm charts
-
-
-## Configuration
-
-
-### Environment variables
-
-The index can be configured through a couple of environment variables.
-
-| env var                                | default | description |
-| -------------------------------------- | ------- | ----------- |
-| TANTIVY_MAX_INDEXED_JOB_LOG_SIZE__MB   | 1       | Job logs bigger than this will be truncated before indexing, This is to reduce the index size and to improve indexing performance. |
-| TANTIVY_S3_BACKUP_PERIOD__S            | 3600    | How often to backup the index into object storage |
-| TANTIVY_INDEX_WRITER_MEMORY_BUDGET__MB | 100     | How much memory the writer can use before writing to disk. Increasing it can improve indexing throughput. |
-| TANTIVY_DOC_COMMIT_MAX_BATCH_SIZE      | 100000  | Every `TANTIVY_REFRESH_INDEX_PERIOD__S`, the indexer will commit the jobs written, making them searchable. This variable sets an amount of jobs after which to early-commit, to make the jobs searchable. This is mainly relevant for the first run (when running the whole completed_job table). Lower values can hurt indexing throughput. |
-| TANTIVY_REFRESH_INDEX_PERIOD__S        | 300     | Adding new jobs to the index. Jobs are only available to be searched after they are indexed and commited |
-| TANTIVY_SHOW_MEMORY_EVERY              | 10000   | Log memory usage and indexing progress every this many jobs |
-
-
-### Index persistence
-
-There are two ways to make the index persistent (and avoid reindexing all jobs at every restart). The recommended way is to setup an object storage such as Amazon S3, and the index will automatically be backed up and pulled from there. This can be done by setting up [S3/Azure for python cache and large logs](../38_object_storage_in_windmill/index.mdx#large-job-logs-management).
-
-It is also possible to store the index in a volume attached to the indexer container. The docker-compose.yml serves as an example of how to set it up (on `/tmp/windmill/search`).
+If you use the `!` prefix and type any string, you will get an option to search for that query in the Windmill [Service Logs](../36_service_logs/index.mdx#log-search) tool.
diff --git a/docs/core_concepts/36_service_logs/index.mdx b/docs/core_concepts/36_service_logs/index.mdx
@@ -4,6 +4,7 @@ View logs from any [workers](../9_worker_groups/index.mdx) or servers directly w
 
 ![Service logs](./service_logs.png "Service logs")
 
+
 Windmill provides direct access to the logs emitted by containers, removing the need to manually retrieve them from Docker or other container platforms. This enables monitoring services without needing to leave the Windmill environment.
 
 You can view them from the [Workers](../9_worker_groups/index.mdx) page, in particular:
@@ -14,7 +15,42 @@ You can view them from the [Workers](../9_worker_groups/index.mdx) page, in part
   - **Error Count**: Displayed in red, highlighting errors separately for quick identification.
 - **Separation by Type**: Logs are organized by type. You can see logs for workers, servers and indexers.
 
-In future releases, we plan to introduce a powerful search capability for logs, integrated directly into the search modal. This feature will be exclusive to the Enterprise Edition, enabling users to quickly search across logs for specific events, errors, or patterns.
+
+On the left menu, you can navigate to Service logs.
+
+![Service Logs On Menu](./service_logs_menu.png)
+
+## Log search
+
+:::info
+Full text search is a Windmill EE feature. It is disabled by default in the example docker compose: to enable full text search on logs and completed jobs, you need to spin up the indexer service ([learn how](../../misc/18_full_text_search/index.mdx)).
+:::
+
+You can type any string in the search bar to query logs. The hosts that matched your query are shown in the left pane, along with the number of lines that matched.
+
+If you select a host, you will get the most recent lines that match your query (limited to 1000). This is very similar to what you would expect from a Grafana setup.
+
+![Search results](./service_log_search_results.png)
+
+
+Queries are parsed by Tantivy's [QueryParser](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html), which lets you build relatively complex and useful queries. For example, you can try searching:
+
+```
+worker_group:default ping
+```
+This limits the search to workers in the default worker group.
+
+The fields that are indexed and can be used for this kind of search are:
+
+| Field name   | Type | Description |
+| ------------ | ---- | ----------- |
+| host         | TEXT | The hostname, e.g. windmill-workers-7cbf97c994-lptqj |
+| mode         | TEXT | The mode: `worker`, `server` or `indexer` |
+| worker_group | TEXT | The associated worker group (if applicable) |
+| timestamp    | DATE | The timestamp of the log file used to store the logs internally, so it can be inaccurate by up to a minute. |
+| file_name    | TEXT | Name of the associated file in S3 or on disk. |
+| logs         | TEXT | The log lines themselves. `logs:<query>` is equivalent to `query` |
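+
+A few more illustrative queries over these fields, as a sketch only: they assume the standard Tantivy query syntax (`field:term`, `AND`, quoted phrases) is passed through unchanged, and the search terms themselves are made up for the example.
+
+```
+mode:server AND error
+worker_group:default AND logs:"connection refused"
+```
+
+The first query looks for `error` in logs emitted by servers only. The second looks for the exact phrase `connection refused` in logs from workers in the default worker group.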
+
 
 ## Log Storage
 
@@ -24,4 +60,8 @@ Windmill provides a flexible solution for log storage depending on your setup:
   Logs are stored locally on disk. For this to work, there must be a dedicated log disk, which is pre-configured in the Docker Compose setup.
 
 - **Enterprise Edition** (with [Instance object storage](../38_object_storage_in_windmill/index.mdx#instance-object-storage)):
-  For users with the Enterprise Edition (EE), logs can be stored in S3 if instance object storage is configured. This option provides more scalable storage and is ideal for larger-scale deployments or where long-term log retention is important.
+  For users with the Enterprise Edition (EE), logs can be stored in S3 if instance object storage is configured. This option provides more scalable storage and is ideal for larger-scale deployments or where long-term log retention is important.
+
+### Log retention
+
+Windmill retains logs for up to two weeks. You can decide whether this retention policy also applies to the logs stored on S3 through the Instance Settings.
diff --git a/docs/core_concepts/36_service_logs/service_log_search_results.png b/docs/core_concepts/36_service_logs/service_log_search_results.png
diff --git a/docs/core_concepts/36_service_logs/service_logs.png b/docs/core_concepts/36_service_logs/service_logs.png
diff --git a/docs/core_concepts/36_service_logs/service_logs_menu.png b/docs/core_concepts/36_service_logs/service_logs_menu.png
diff --git a/docs/core_concepts/index.mdx b/docs/core_concepts/index.mdx
@@ -634,4 +634,9 @@ All details & features on [Pricing page](/pricing).
 		description="Windmill offers white labeling capabilities, allowing you to embed and customize the Windmill platform to align with your brand."
 		href="/docs/misc/white_labelling"
 	/>
+	<DocCard
+		title="Full text search"
+		description="Full text search on jobs and service logs, allowing quick access and good observability out of the box. Learn how to set it up."
+		href="/docs/misc/full_text_search"
+	/>
 </div>
diff --git a/docs/misc/18_full_text_search/index.mdx b/docs/misc/18_full_text_search/index.mdx
@@ -0,0 +1,88 @@
+# Full text search on jobs and logs
+
+
+Windmill offers the functionality to do full-text search on jobs (across args, logs, results, ...) and service logs.
+
+In order to access this functionality, the instance must be running the windmill indexer service.
+
+
+## How to run the indexer service
+
+### Setup using docker compose
+
+Windmill's docker-compose.yml includes an example of how to set up the indexer container to enable full text search. Just make sure to change replicas from 0 to 1.
+
+:::warning
+The replicas should be set to exactly one and not more. Only one index writer can exist at a time, and running several will not result in the expected behavior.
+:::
+
+```yml
+  # The indexer powers full-text job and log search, an EE feature.
+  windmill_indexer:
+    image: ${WM_IMAGE}
+    pull_policy: always
+    deploy:
+      replicas: 1 # set to 1 to enable full-text job and log search
+    restart: unless-stopped
+    expose:
+      - 8001
+    environment:
+      - PORT=8001
+      - DATABASE_URL=${DATABASE_URL}
+      - MODE=indexer
+    depends_on:
+      db:
+        condition: service_healthy
+    volumes:
+      - windmill_index:/tmp/windmill/search
+```
+
+The indexer is in charge of both indexing new jobs and answering search queries. Because of this,
+we also need to redirect search requests to this container instead of the normal windmill server.
+This is what it looks like if you're using Caddy:
+
+```Caddyfile
+{$BASE_URL} {
+        bind {$ADDRESS}
+        reverse_proxy /ws/* http://lsp:3001
+        # reverse_proxy /ws_mp/* http://multiplayer:3002
+        reverse_proxy /api/srch/* http://windmill_indexer:8001
+        reverse_proxy /* http://windmill_server:8000
+        # tls /certs/cert.pem /certs/key.pem
+}
+```
+
+This redirects requests prefixed by `/api/srch` to port 8001 (the same port as in the docker-compose.yml).
+
+### Setup using helm charts
+
+This section is a work in progress.
+
+## Configure the indexer service
+
+### Indexer settings
+
+The index can be configured in the Instance Settings. Note that the default values should work for most use cases.
+
+
+![Indexer Settings](./indexer_settings.png)
+
+| Setting name                           | default  | description |
+| -------------------------------------- | -------- | ----------- |
+| Index writer memory budget (MB)        | 300 MB  | How much memory the writer can use before writing to disk. Increasing it can improve indexing throughput. |
+| Commit max batch size                  | 100000  | How many documents to include at most per commit. This is mostly relevant for the first time indexing. A large value will result in fewer commits, i.e. faster and more efficient indexing, but results will be available only once their commits are completed.  |
+| Refresh index period (s)               | 300s    | The indexer will periodically fetch the latest jobs and write them to the index. A shorter period means new jobs/logs are available for search sooner, but also results in more frequent writes to S3. |
+| Max indexed job log size (MB)          | 1 MB    | Job logs bigger than this will be truncated before indexing. |
+
+
+### Index persistence
+
+There are two ways to make the index persistent (and avoid reindexing all jobs at every restart). The recommended way is to set up object storage such as Amazon S3: the index will then automatically be backed up to and pulled from there. This can be done by setting up [S3/Azure for python cache and large logs](../../core_concepts/38_object_storage_in_windmill/index.mdx#large-job-logs-management).
+
+It is also possible to store the index in a volume attached to the indexer container. The docker-compose.yml serves as an example of how to set it up (on `/tmp/windmill/search`).
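+
+As a minimal sketch of the volume option, assuming the service definition shown earlier (which mounts the named volume `windmill_index`): Docker Compose requires a named volume to be declared under the top-level `volumes` key of the same file, so the compose file would also need an entry along these lines.
+
+```yml
+# Top-level volumes section of docker-compose.yml (not nested under a service).
+volumes:
+  windmill_index: null # named volume mounted by windmill_indexer on /tmp/windmill/search
+```
+
+With this in place, the index survives container restarts as long as the volume itself is not removed.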
+
+## Using full text search
+
+Learn how full text search can be used to find [completed jobs](../../core_concepts/35_search_bar/index.mdx#searching-runs) and [service logs](../../core_concepts/36_service_logs/index.mdx#log-search).
+
+
diff --git a/docs/misc/18_full_text_search/indexer_settings.png b/docs/misc/18_full_text_search/indexer_settings.png
diff --git a/sidebars.js b/sidebars.js
@@ -879,7 +879,8 @@ const sidebars = {
 				'misc/windows_workers/index',
 				'core_concepts/private_hub/index',
 				'misc/white_labelling/index',
-				'misc/partners/index'
+				'misc/partners/index',
+				'misc/full_text_search/index'
 			]
 		}
 	]