Commit

migrate docs to Openlineage/Openlineage repo
Signed-off-by: Pawel Leszczynski <[email protected]>
pawel-big-lebowski committed Jul 23, 2024
1 parent 11b8e3b commit e216a26
Showing 770 changed files with 189,120 additions and 0 deletions.
35 changes: 35 additions & 0 deletions .github/workflows/docs-deploy.yml
@@ -0,0 +1,35 @@
name: Build & Deploy docs to Netlify GitHub Pages

on:
  push:
    tags:
      - '[0-9]+.[0-9]+.[0-9]+'

jobs:
  generate_java_doc:
    # Generates java doc for Java client, add it to the repo under website/static/javadoc
    # TODO: generate javadoc for java client to be generated
    # https://github.com/marketplace/actions/update-files-on-github

  generate_openapi:
    # TODO: https://github.com/OpenLineage/OpenLineage/blob/main/spec/release.sh
    # TODO: need to run: https://github.com/OpenLineage/docs/blob/main/scripts/build-docs.sh
    # why is this .last_spec_commit_id mechanism necessary?
    # https://github.com/marketplace/actions/update-files-on-github


  netlify_deploy: # https://github.com/marketplace/actions/netlify-deploy
    # TODO: netlify token will be necessary
    deploy:
      name: 'Deploy to Netlify'
      steps:
        - uses: jsmrcaga/[email protected]
          with:
            NETLIFY_AUTH_TOKEN: ${{ secrets.MY_TOKEN_SECRET }}
            NETLIFY_DEPLOY_TO_PROD: true # can be false for now


# TODO: move OpenLineage/docs repo content to /website within OpenLineage/OpenLineage
# remove build-docs -> https://github.com/OpenLineage/docs/blob/main/scripts/build-docs.sh
# remove this -> https://github.com/OpenLineage/docs/blob/main/.github/workflows/deploy.yml
# remove spec release -> https://github.com/OpenLineage/OpenLineage/blob/main/spec/release.sh
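The `tags` filter in this workflow limits deployment to tags made of three dot-separated numbers. As a rough local check, the sketch below approximates that filter with an extended regular expression. This is an assumed equivalent, not how GitHub evaluates the pattern, and `is_release_tag` is a hypothetical helper that does not exist in the repository.

```shell
# Hypothetical helper: approximates the workflow's tag filter with an ERE.
# Assumption: the filter must match the whole tag name.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Sample tag names, chosen for illustration only.
for tag in 1.18.0 v1.18.0 1.18 1.18.0-rc1; do
  if is_release_tag "$tag"; then
    echo "$tag: would trigger the deploy"
  else
    echo "$tag: ignored"
  fi
done
```

Under this approximation only plain `MAJOR.MINOR.PATCH` tags pass; a `v` prefix or a pre-release suffix would not.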
32 changes: 32 additions & 0 deletions .github/workflows/release_spec.sh
@@ -0,0 +1,32 @@
#!/usr/bin/env bash
#
# Copyright 2018-2024 contributors to the OpenLineage project
# SPDX-License-Identifier: Apache-2.0

set -e

# check if there are any changes in spec in the latest commit
if git diff --name-only --exit-code HEAD~1 'spec/*.json' 'spec/OpenLineage.yml' > /dev/null; then
  echo "no changes in spec detected, skipping publishing spec"
  exit 0
fi

# Copy changed spec JSON files to target location
git diff --name-only HEAD~1 'spec/*.json' | while read -r LINE; do

  # ignore registry files
  if [[ $LINE =~ "registry.json" ]]; then
    continue
  fi

  # extract target file name from $id field in spec files
  URL=$(jq -r '.["$id"]' "$LINE")

  # extract target location in website repo
  LOC="website/static/${URL#*//*/}"
  LOC_DIR="${LOC%/*}"

  # create dir if necessary, and copy files
  mkdir -p "$LOC_DIR"
  cp "$LINE" "$LOC"
done
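The two parameter expansions in the script above decide where each spec file is copied. A minimal sketch of what they do, assuming an illustrative `$id` value (the URL below is a stand-in, not read from an actual spec file):

```shell
# Illustrative only: URL is an assumed $id value, not taken from a spec file.
URL="https://openlineage.io/spec/1-0-0/OpenLineage.json"

# Strip the shortest prefix matching "*//*/", i.e. the scheme and host.
LOC="website/static/${URL#*//*/}"

# Strip the shortest suffix matching "/*", i.e. the file name.
LOC_DIR="${LOC%/*}"

echo "$LOC"      # website/static/spec/1-0-0/OpenLineage.json
echo "$LOC_DIR"  # website/static/spec/1-0-0
```

Because the target path is derived from the `$id` URL rather than from the file's location in the repository, the published layout follows the spec's own identifiers.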
25 changes: 25 additions & 0 deletions .github/workflows/spec-deploy.yml
@@ -0,0 +1,25 @@
name: Build & Deploy docs to Netlify GitHub Pages

on:
  pull_request:
    branches:
      - main

jobs:
  generate_spec:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - run: ./release_spec.sh

  generate_openapi:

  netlify_deploy: # https://github.com/marketplace/actions/netlify-deploy
    if: github.event.pull_request.merged == true
    deploy:
      name: 'Deploy to Netlify'
      steps:
        - uses: jsmrcaga/[email protected]
          with:
            NETLIFY_AUTH_TOKEN: ${{ secrets.MY_TOKEN_SECRET }}
            NETLIFY_DEPLOY_TO_PROD: true # can be false for now
26 changes: 26 additions & 0 deletions website/.gitignore
@@ -0,0 +1,26 @@
# Dependencies
node_modules

# Production
/build

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*

# intellij
.idea

argos/screenshots
argos/test-results
1 change: 1 addition & 0 deletions website/CNAME
@@ -0,0 +1 @@
openlineage.io
94 changes: 94 additions & 0 deletions website/README.md
@@ -0,0 +1,94 @@
# OpenLineage Docs

[![Covered by Argos Visual Testing](https://argos-ci.com/badge.svg)](https://app.argos-ci.com/pawel-big-lebowski/docs/reference?utm_source=OpenLineage&utm_campaign=oss)

This is a Docusaurus site, and all content can be found in `docs/`. Contributions are welcome in the form of issues or pull requests. Pages that require attention have been marked with Docusaurus Admonitions.

### New posts

We love new blog posts, and welcome content about OpenLineage! Topics include:
* experiences from users of all kinds
* supporting products and technologies
* proposals for discussion

If you are familiar with the GitHub pull request process, it is easy to propose a new blog post:

1. Fork this project.
2. Make a new directory in `/blog`. The name of the directory will become part of the post's URL, so choose something descriptive and unique.
3. Create an `index.mdx` file in the new directory containing your blog content. Use one of the other posts as a template. The `title`, `date`, `authors`, and `description` front matter fields are all required.
4. Add your author information -- name, title, url (optional), and image_url (optional) -- to `blog/authors.yml`.
5. Build the site locally if you want to see it in a browser and build confidence in your formatting choices.
6. Commit your changes and submit a pull request.
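
Steps 2 and 3 can be sketched as a short shell session. This is a minimal sketch, not a project script: the directory name, the front matter values, the author key, and the `SITE_ROOT` variable are all placeholders, and it writes to a scratch directory unless `SITE_ROOT` points at your checkout of the site.

```shell
set -eu

# Hypothetical values: adjust the root, directory name, and front matter to your post.
# SITE_ROOT defaults to a scratch directory so the sketch is safe to run as-is.
SITE_ROOT="${SITE_ROOT:-$(mktemp -d)}"
POST_DIR="$SITE_ROOT/blog/my-first-lineage-post"

# Step 2: the directory name becomes part of the post's URL.
mkdir -p "$POST_DIR"

# Step 3: index.mdx with the four required front matter fields.
cat > "$POST_DIR/index.mdx" <<'EOF'
---
title: My first lineage post
date: 2024-07-23
authors: [your-author-key]
description: One-sentence summary of the post.
---
Opening paragraph of the post.

<!--truncate-->

The rest of the post.
EOF

echo "Created $POST_DIR/index.mdx"
```

The `authors` value here is assumed to refer to an entry you add to `blog/authors.yml` in step 4; copy the exact shape from an existing post.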

### New ecosystem partners for the Ecosystem page

- Add a rectangular logo in SVG format, twice as wide as it is tall, to `static/img`.
- Add a record to the appropriate file and array in `static/ecosystem`, using only the filename of the logo as the image value.

### Changes to basepages

If you want to make a change to a basepage - e.g. to add a new member to the Ecosystem page - the best way is to submit a pull request.

These basepages can be found in `src/pages`, and are formatted in markdown.

### Building OpenAPI docs

To build the OpenAPI docs using `redoc-cli`, run:

```
% yarn run build:docs
```

## Local development

First, clone the repo.

Install the [node version manager](https://github.com/nvm-sh/nvm) and use it to create a Node 16 environment:

```
$ nvm install 16
$ nvm use 16
```

Run Yarn to install all of the Node dependencies for the project:

```
$ yarn
```

## Local site build

First build the documentation contents. This is required before starting the Docusaurus server.

```
$ yarn build
```

This command generates static content into the `build` directory. If you want to look at it, try `cd build && python3 -m http.server`.

## Local server start

Tell Yarn to start a development server:

```
$ yarn start
```

This command provides a URL where the doc site can be viewed. Most changes are reflected live without having to restart the server.

By default, the server port will be set to 3000. In case the port is already being used, you can specify the port number when starting the server:

```
$ yarn start --port 3001
```

## Deployment

Once the site has been launched, pull requests to `main` will cause a new doc site to be shipped via GitHub Pages.

The site is deployed using the [Gatsby Publish GitHub action](https://github.com/OpenLineage/docs/blob/main/.github/workflows/deploy.yml) whenever a change is merged into `main`.

This GitHub Action will:
* Execute `scripts/build-docs.sh`, which performs a build of the OpenAPI docs based on the latest version of the spec that has been published into `static/spec` by the [OpenLineage release script](https://github.com/OpenLineage/OpenLineage/blob/main/spec/release.sh). The resulting docs are placed into `static/apidocs/openapi`.
* Execute `yarn run build`, which performs a build of the Gatsby landing pages and places them into `public/`. The `static/` directory, containing the OpenAPI and Java client documentation, is copied into `public/` during this step.
* Replace the contents of the `gh-pages` branch of the [org domain repo](https://github.com/OpenLineage/OpenLineage.github.io) with the contents of `public/`. This will cause that repo's GitHub Action to deploy the new content.
17 changes: 17 additions & 0 deletions website/argos/package.json
@@ -0,0 +1,17 @@
{
  "name": "argos",
  "version": "0.0.0",
  "description": "Workspace for visual difference detection",
  "license": "MIT",
  "private": true,
  "scripts": {
    "screenshot": "playwright test",
    "upload": "npx @argos-ci/cli upload ./screenshots"
  },
  "devDependencies": {
    "@argos-ci/cli": "^0.6.0",
    "@argos-ci/playwright": "^0.0.7",
    "@playwright/test": "^1.38.1",
    "cheerio": "^1.0.0-rc.12"
  }
}
20 changes: 20 additions & 0 deletions website/argos/playwright.config.ts
@@ -0,0 +1,20 @@
import {devices} from '@playwright/test';
import type {PlaywrightTestConfig} from '@playwright/test';

const config: PlaywrightTestConfig = {
  webServer: {
    cwd: "..",
    port: 3000,
    command: 'yarn serve',
  },
  projects: [
    {
      name: 'chromium',
      use: {
        ...devices['Desktop Chrome'],
      },
    },
  ],
};

export default config;
19 changes: 19 additions & 0 deletions website/argos/screenshot.css
@@ -0,0 +1,19 @@
/* Iframes can load lazily */
iframe,
/* Avatars can be flaky due to using external sources: GitHub/Unavatar */
.avatar__photo,
/* Gifs load lazily and are animated */
img[src$='.gif'],
/* Algolia keyboard shortcuts appear with a little delay */
.DocSearch-Button-Keys > kbd,
/* The live playground preview can often display dates/counters */
[class*='playgroundPreview'] {
  visibility: hidden;
}

/* Different docs last-update dates can alter layout */
.theme-last-updated,
/* Mermaid diagrams are rendered client-side and produce layout shifts */
.docusaurus-mermaid-container {
  display: none;
}
36 changes: 36 additions & 0 deletions website/argos/screenshot.spec.ts
@@ -0,0 +1,36 @@
import * as fs from "fs";
import {test} from "@playwright/test";
import {argosScreenshot} from "@argos-ci/playwright";
import {extractSitemapPathnames, pathnameToArgosName} from "argos/utils";

// Constants:
const siteUrl = "http://localhost:3000";
const sitemapPath = "../build/sitemap.xml";
const stylesheetPath = "./screenshot.css";
const stylesheet = fs.readFileSync(stylesheetPath).toString();

// Wait for hydration, requires Docusaurus v2.4.3+
// See https://github.com/facebook/docusaurus/pull/9256
// Docusaurus adds a <html data-has-hydrated="true"> once hydrated
function waitForDocusaurusHydration() {
  // uncomment the line when Docusaurus is upgraded to v2.4.3
  // return document.documentElement.dataset.hasHydrated === "true";
  return true;
}

function screenshotPathname(pathname: string, index: number, numberOfPaths: number) {
  test(`pathname ${pathname}`, async ({page}) => {
    const url = siteUrl + pathname;
    console.log(`${index + 1}/${numberOfPaths} Screenshotting`, url);
    await page.goto(url);
    await page.waitForFunction(waitForDocusaurusHydration);
    await page.addStyleTag({content: stylesheet});
    await argosScreenshot(page, pathnameToArgosName(pathname));
  });
}

test.describe("Docusaurus site screenshots", () => {
  const pathnames = extractSitemapPathnames(sitemapPath);
  console.log("Pathnames to screenshot:", pathnames);
  pathnames.forEach((path, index) => screenshotPathname(path, index, pathnames.length));
});
17 changes: 17 additions & 0 deletions website/argos/utils.ts
@@ -0,0 +1,17 @@
import * as cheerio from "cheerio";
import * as fs from "fs";

export function extractSitemapPathnames(sitemapPath: string): string[] {
  const sitemap = fs.readFileSync(sitemapPath).toString();
  const $ = cheerio.load(sitemap, { xmlMode: true });
  const urls: string[] = [];
  $("loc").each(function handleLoc() {
    urls.push($(this).text());
  });
  return urls.map((url) => new URL(url).pathname);
}

// Converts a pathname to a decent screenshot name
export function pathnameToArgosName(pathname: string): string {
  return pathname.replace(/^\/|\/$/g, "") || "index";
}
3 changes: 3 additions & 0 deletions website/babel.config.js
@@ -0,0 +1,3 @@
module.exports = {
  presets: [require.resolve('@docusaurus/core/lib/babel/preset')],
};
33 changes: 33 additions & 0 deletions website/blog/0.1-release/index.mdx
@@ -0,0 +1,33 @@
---
title: Introducing OpenLineage 0.1.0
date: 2021-09-03
authors: [Le Dem]
description: We are pleased to announce the initial release of OpenLineage. This release includes the core specification, data model, clients, and integrations with common data tools.
---
We are pleased to announce the initial release of OpenLineage. This release includes the core specification, data model, clients, and integrations with common data tools.

<!--truncate-->

We are pleased to announce the initial release of OpenLineage. This is the culmination of a broad community effort, and establishes a common framework for data lineage collection and analysis.

We want to thank [all the contributors](https://github.com/OpenLineage/OpenLineage/graphs/contributors) as well as all the projects and companies involved in the design (in alphabetical order): [Airflow](https://airflow.apache.org), [Astronomer](https://www.astronomer.io), [Datakin](https://datakin.com), [Data Mesh](https://datameshlearning.com), [dbt](https://www.getdbt.com), [Egeria](https://egeria.odpi.org), [GetInData](https://getindata.com), [Great Expectations](https://greatexpectations.io), [Iceberg](https://iceberg.apache.org) (and others that I am probably forgetting).

This release includes:
* The initial 1-0-0 release of the [OpenLineage specification](https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md)
* A core lineage model of Jobs, Runs and Datasets
* Core facets
* Data Quality Metrics and statistics
* Dataset schema
* Source code location
* SQL
* Clients that send OpenLineage events to an HTTP backend
* Java
* Python
* [Integrations](https://github.com/OpenLineage/OpenLineage/tree/main/integration) that collect lineage metadata as OpenLineage events
* Apache Airflow with support for BigQuery, Great Expectations, Postgres, Redshift, Snowflake
* Apache Spark
* dbt

This is only the beginning. We invite everyone interested to [consult and contribute to the roadmap](https://github.com/OpenLineage/OpenLineage/projects). The roadmap currently contains, among other things: adding support for [Kafka](https://github.com/OpenLineage/OpenLineage/issues/152), [BI dashboards](https://github.com/OpenLineage/OpenLineage/issues/207), and [column level lineage](https://github.com/OpenLineage/OpenLineage/issues/148)...but you can influence it by participating!

Follow the [repo](https://github.com/OpenLineage/OpenLineage) to stay updated. And, as always, you can [join the conversation](http://bit.ly/OpenLineageSlack) on Slack.