From 493ca987ca930ce4f92eb459895d90ee7f7ee67c Mon Sep 17 00:00:00 2001
From: Neil Ramaswamy
Date: Fri, 30 Aug 2024 17:52:15 +0800
Subject: [PATCH] [SPARK-49378][DOCS][SS] Break apart the Structured Streaming Programming Guide
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### What changes were proposed in this pull request?

These changes break the Structured Streaming Programming Guide into smaller sub-pages **without changing any content**. You can see a preview of it [here](https://nr-spark-site.vercel.app/).

I broke up the pages by `h1` tag; within pages, the sub-sections on the left menu are broken up by `h2`. The SS programming guide will now resemble the SQL programming guide and the MLlib programming guide.

Additionally, to avoid cluttering the top-level namespace (there are dozens of `sql-*` files for the SQL reference), we nest all streaming docs one directory deeper, under `/streaming/`. This has the side effect of breaking links from our `_layouts`, since the layouts assume a flat top-level namespace. To fix this issue, URLs in global layout files now all use absolute paths.

This move to `/streaming/` means that bookmarks of `https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html` will no longer point to the actual programming guide content. In anticipation of this, I have kept pages at all existing URLs, with links to the pages in their new locations. This includes the new state data source and the Kafka integration guide.

In the future, we'll be able to break the programming guide apart further quite easily (and in parallel). This PR does all of the plumbing to make it work.

![image](https://github.com/user-attachments/assets/3eca87d4-9fb7-453c-a74a-20bd5c504d87)

It is future work to fix the oddly-sized left-navigation bar for our menus.

### Why are the changes needed?

One of the major hurdles that users have with Structured Streaming is that our guide is exceptionally long; it feels insurmountable, especially compared to other engines like Flink, which has many sub-pages.

Google also has a very tricky time indexing the single large page; if you Google "[structured streaming output mode](https://www.google.com/search?q=structured+streaming+output+mode)" and click on the link to our programming guide... nothing happens. You aren't taken to the actual content, since Google has trouble indexing to specific heading tags.

### Does this PR introduce _any_ user-facing change?

The structure of the website, with respect to Structured Streaming-related pages, is now changed. See the earlier parts of the PR description for the specific changes.

However, **no** content is changed. This should make reviewing the changes much easier.

### How was this patch tested?

I have used automated tools (e.g. [Lychee](https://github.com/lycheeverse/lychee)) and manual verification (i.e. clicking on every link) to make sure that I didn't break any links. It isn't foolproof, though.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47864 from neilramaswamy/nr/streaming-guide-breakapart.

Lead-authored-by: Neil Ramaswamy
Co-authored-by: Kent Yao
Signed-off-by: Kent Yao
---
 docs/_data/menu-streaming.yaml                |   57 +
 .../_includes/nav-left-wrapper-streaming.html |   22 +
 docs/_includes/nav-left.html                  |    2 +-
 docs/_layouts/global.html                     |   93 +-
 docs/index.md                                 |    2 +-
 docs/migration-guide.md                       |    2 +-
 docs/sparkr.md                                |    2 +-
 docs/ss-migration-guide.md                    |   40 +-
 docs/streaming-programming-guide.md           |    2 +-
 docs/streaming/additional-information.md      |   58 +
 .../apis-on-dataframes-and-datasets.md        | 3592 ++++++++++++++
 docs/streaming/getting-started.md             |  508 ++
 docs/streaming/index.md                       |   28 +
 docs/streaming/performance-tips.md            |  174 +
 docs/streaming/ss-migration-guide.md          |   56 +
 .../structured-streaming-kafka-integration.md | 1173 +++++
 .../structured-streaming-state-data-source.md |    0
 .../structured-streaming-kafka-integration.md | 1155 +----
 .../structured-streaming-programming-guide.md | 4268 +----------------
 19 files changed, 5728 insertions(+), 5506 deletions(-)
 create mode 100644 docs/_data/menu-streaming.yaml
 create mode 100644 docs/_includes/nav-left-wrapper-streaming.html
 create mode 100644 docs/streaming/additional-information.md
 create mode 100644 docs/streaming/apis-on-dataframes-and-datasets.md
 create mode 100644 docs/streaming/getting-started.md
 create mode 100644 docs/streaming/index.md
 create mode 100644 docs/streaming/performance-tips.md
 create mode 100644 docs/streaming/ss-migration-guide.md
 create mode 100644 docs/streaming/structured-streaming-kafka-integration.md
 rename docs/{ => streaming}/structured-streaming-state-data-source.md (100%)

diff --git a/docs/_data/menu-streaming.yaml b/docs/_data/menu-streaming.yaml
new file mode 100644
index 0000000000000..b1dd024451125
--- /dev/null
+++ b/docs/_data/menu-streaming.yaml
@@ -0,0 +1,57 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+- text: Overview
+  url: streaming/index.html
+- text: Getting Started
+  url: streaming/getting-started.html
+  subitems:
+    - text: Quick Example
+      url: streaming/getting-started.html#quick-example
+    - text: Programming Model
+      url: streaming/getting-started.html#programming-model
+- text: APIs on DataFrames and Datasets
+  url: streaming/apis-on-dataframes-and-datasets.html
+  subitems:
+    - text: Creating Streaming DataFrames and Streaming Datasets
+      url: streaming/apis-on-dataframes-and-datasets.html#creating-streaming-dataframes-and-streaming-datasets
+    - text: Operations on Streaming DataFrames/Datasets
+      url: streaming/apis-on-dataframes-and-datasets.html#operations-on-streaming-dataframesdatasets
+    - text: Starting Streaming Queries
+      url: streaming/apis-on-dataframes-and-datasets.html#starting-streaming-queries
+    - text: Managing Streaming Queries
+      url: streaming/apis-on-dataframes-and-datasets.html#managing-streaming-queries
+    - text: Monitoring Streaming Queries
+      url: streaming/apis-on-dataframes-and-datasets.html#monitoring-streaming-queries
+    - text: Recovering from Failures with Checkpointing
+      url: streaming/apis-on-dataframes-and-datasets.html#recovering-from-failures-with-checkpointing
+    - text: Recovery Semantics after Changes in a Streaming Query
+      url: streaming/apis-on-dataframes-and-datasets.html#recovery-semantics-after-changes-in-a-streaming-query
+- text: Performance Tips
+  url: streaming/performance-tips.html
+  subitems:
+    - text: Asynchronous Progress Tracking
+      url: streaming/performance-tips.html#asynchronous-progress-tracking
+    - text: Continuous Processing
+      url: streaming/performance-tips.html#continuous-processing
+- text: Additional Information
+  url: streaming/additional-information.html
+  subitems:
+    - text: Miscellaneous Notes
+      url: streaming/additional-information.html#miscellaneous-notes
+    - text: Related Resources
+      url: streaming/additional-information.html#related-resources
+    - text: Migration Guide
+      url: streaming/additional-information.html#migration-guide
diff --git a/docs/_includes/nav-left-wrapper-streaming.html b/docs/_includes/nav-left-wrapper-streaming.html
new file mode 100644
index 0000000000000..82849f8140f5d
--- /dev/null
+++ b/docs/_includes/nav-left-wrapper-streaming.html
@@ -0,0 +1,22 @@
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+
+
+

Structured Streaming Programming Guide

+ {% include nav-left.html nav=include.nav-streaming %} +
+
diff --git a/docs/_includes/nav-left.html b/docs/_includes/nav-left.html
index 19d68fd191635..935ed0c732ee6 100644
--- a/docs/_includes/nav-left.html
+++ b/docs/_includes/nav-left.html
@@ -2,7 +2,7 @@
    {% for item in include.nav %}
  • - + {% if navurl contains item.url %} {{ item.text }} {% else %}
diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index c61c9349a6d7e..502113d11b77e 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -1,3 +1,9 @@
+{% assign current_page_segments = page.dir | split: "/" | where_exp: "element","element != ''" %}
+{% assign rel_path_to_root = "" %}
+{% for i in (1..current_page_segments.size) %}
+  {% assign rel_path_to_root = rel_path_to_root | append: "../" %}
+{% endfor %}
+
@@ -21,12 +27,12 @@
- - + + - + - + {% production %}
@@ -51,8 +57,8 @@