[uss_qualifier] Load configuration elements from private GitHub repos (#738)

* Load configuration elements from private GitHub repos

* Add api.github.com as recognized host for private GitHub repos

* Add warning for github.com content reference
BenjaminPelletier authored Aug 2, 2024
1 parent 6a823a2 commit 33bdaba
Showing 3 changed files with 102 additions and 13 deletions.
56 changes: 47 additions & 9 deletions monitoring/uss_qualifier/configurations/README.md
@@ -4,6 +4,18 @@

To execute a test run with uss_qualifier, a uss_qualifier configuration must be provided. This configuration consists of the test suite to run, along with definitions for all resources needed by that test suite, plus information about artifacts that should be generated. See [`USSQualifierConfiguration`](configuration.py) for the exact schema and [the dev configurations](./dev) for examples.

### Terminology

![Terminology flow chart](assets/terminology.png)

* **Test configuration**: A configuration following the [`USSQualifierConfiguration`](configuration.py) schema which fully defines the actions uss_qualifier should perform when run. This is the primary input to uss_qualifier and is fully defined by the combination of the test baseline configuration and the test environment configuration. See ["Specifying"](#specifying) and ["Building"](#building) for more information.
* **Test baseline configuration**: A configuration defining the behavior of the test, but generally omitting which systems are to be tested and where those systems are located. A test baseline configuration is defined as everything in a test configuration except those elements of the configuration explicitly identified as [`non_baseline_inputs`](configuration.py).
* **Test environment configuration**: The portions of a test configuration explicitly identified as [`non_baseline_inputs`](configuration.py) and generally corresponding with which systems are to be tested and where those systems are located.
* **Test baseline identifier**: An identifier that corresponds to the test baseline configuration + InterUSS `monitoring` codebase version used to run the configuration. This identifier has the characteristics of a hash: whenever any element of the test baseline configuration changes, the test baseline identifier should change as well. Given just the test baseline identifier, there is not enough information to construct the corresponding test baseline configuration. The long-form test baseline identifier is a long hexadecimal hash and can be found in the [`baseline_signature` field of a TestRunReport](../reports/report.py). This long-form identifier is shortened to a short-form identifier by combining a `TB-` prefix with the first 7 characters of the long-form identifier in certain human-facing artifacts.
* **Test environment identifier**: An identifier that corresponds to the test environment configuration. This identifier is identical to the test baseline identifier except that it hashes the test environment configuration rather than the test baseline configuration + InterUSS `monitoring` codebase version, its long-form identifier can be found in the [`environment_signature` field of a TestRunReport](../reports/report.py), and its short-form identifier is prefixed with `TE-`.
* **Test run report**: The full set of information captured for a test run is recorded in a [`TestRunReport` object](../reports/report.py), and often written to report.json. This information is the test run report, and it is the basis for creating all other test artifacts.
* **Test run identifier**: An identifier that corresponds to a particular test run. This identifier is identical to the test baseline identifier except that it hashes the test run report rather than the test baseline configuration + InterUSS `monitoring` codebase version, and its short-form identifier is prefixed with `TR-`.
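
As a minimal sketch of the short-form convention described above (not the actual uss_qualifier implementation), the following shows how a long-form hexadecimal identifier could be shortened. The function name `short_form_identifier` and the sample signature are hypothetical; only the `TB`/`TE`/`TR` prefixes and the 7-character truncation come from the definitions above.

```python
def short_form_identifier(prefix: str, long_form: str) -> str:
    """Combine a prefix such as "TB" with the first 7 characters of a long-form identifier."""
    return f"{prefix}-{long_form[:7]}"


# Hypothetical long-form signature, for illustration only
example_signature = "0123456789abcdef0123456789abcdef"
print(short_form_identifier("TB", example_signature))
```

Because only 7 characters are kept, the short form is convenient for human-facing artifacts but carries less information than the long-form identifier.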

### Specifying

When referring to a configuration, three methods may be used; see [`FileReference` documentation](../fileio.py) for more details.
@@ -14,17 +26,43 @@ Regardless of method used to refer to a configuration, the content of that confi
* **Local file**: when a configuration reference is prefixed with `file://`, it refers to a local file using the path syntax of the host operating system.
* **Web file**: when a configuration reference is prefixed with `http://` or `https://`, it refers to a file accessible at the specified URL.
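
The prefix rules for the two reference types listed above can be sketched roughly as follows. This is an illustration only, not the logic in `fileio.py`: the function name `classify_reference` and the constant `FILE_PREFIX` are assumptions, and any reference without one of these prefixes is simply reported as "other" because the remaining reference method is not shown in this excerpt.

```python
HTTP_PREFIX = "http://"
HTTPS_PREFIX = "https://"
FILE_PREFIX = "file://"  # assumed constant name, for illustration


def classify_reference(reference: str) -> str:
    """Classify a configuration reference by its prefix."""
    if reference.startswith(FILE_PREFIX):
        return "local file"
    if reference.startswith(HTTP_PREFIX) or reference.startswith(HTTPS_PREFIX):
        return "web file"
    # Any other reference method is outside the scope of this sketch
    return "other"


print(classify_reference("https://example.com/configuration/test.yaml"))
```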

#### Accessing private GitHub repos

If some or all of a test configuration is located in a private GitHub repo, uss_qualifier can be configured to retrieve that private configuration content in the same way it retrieves publicly-available configuration content. To enable this:

* Enable personal access tokens in the organization (if the repo is owned by an organization)
    * Go to Settings from the organization page
    * On the left under "Third-party Access", expand "Personal access tokens" and click on "Settings"
    * Allow access to fine-grained personal access tokens
    * For increased security, the recommended settings are to require administrator approval and to restrict access to classic personal access tokens, but these settings are at the organization administrator's discretion
* Create a personal access token capable of viewing the private repo
    * As the GitHub user who will be executing (or managing the execution of) uss_qualifier, navigate to user "Settings"
    * On the left at the very bottom, navigate to "Developer settings"
    * On the left, expand "Personal access tokens" and navigate to "Fine-grained tokens"
    * Click "Generate new token"
    * Name the token something descriptive; e.g., "Read-only access to private repos"
    * Under "Resource owner", select the appropriate owner (the organization, if the repo is owned by an organization)
    * Under "Repository access", select "Only select repositories" and select the private repos to be accessed
    * Under "Permissions", expand "Repository permissions" and change "Contents" to "Access: read-only"
    * Create the token and copy the value to a secure location
* Identify the private repos and provide the personal access token to uss_qualifier
    * Before running uss_qualifier, populate the environment variable `GITHUB_PRIVATE_REPOS`
    * The value of this environment variable should be a series of private repository declarations delimited with semicolons
    * Each private repository declaration should follow the format `ORG_NAME/REPO_NAMES:PAT` where
        * `ORG_NAME` is the name of the GitHub organization or user who owns the repository
        * `REPO_NAMES` is a comma-separated list of private repos
        * `PAT` is the personal access token
    * Example: `interuss/secret_repo1,secret_repo2:github_pat_abcdefg01234_foobar;interuss_collaborator/other_secret_repo:github_pat_zyxw987_baz`
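
To make the `GITHUB_PRIVATE_REPOS` format concrete, here is a rough sketch of how such a value could be parsed into a lookup table. The function name `parse_private_repos` is hypothetical and this is not the code uss_qualifier itself uses; the sketch only assumes the `ORG_NAME/REPO_NAMES:PAT` format and the semicolon and comma delimiters described above.

```python
import re
from typing import Dict, Tuple


def parse_private_repos(value: str) -> Dict[Tuple[str, str], str]:
    """Map (owner, repo) pairs to the personal access token declared for them."""
    tokens: Dict[Tuple[str, str], str] = {}
    for declaration in value.split(";"):
        match = re.match(
            r"^(?P<org>[^/]*)/(?P<repos>[^:]*):(?P<token>.*)$", declaration
        )
        if not match:
            raise ValueError(
                f"`{declaration}` does not follow the pattern ORG_NAME/REPO_NAMES:PAT"
            )
        for repo in match.group("repos").split(","):
            # Keep the first token declared for a given repo
            tokens.setdefault((match.group("org"), repo), match.group("token"))
    return tokens


example = (
    "interuss/secret_repo1,secret_repo2:github_pat_abcdefg01234_foobar;"
    "interuss_collaborator/other_secret_repo:github_pat_zyxw987_baz"
)
for org, repo in parse_private_repos(example):
    print(f"{org}/{repo}")
```

With the example value above, the two `interuss` repos share one token and the `interuss_collaborator` repo has its own.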

Now, references to content in these private repos can be used in configurations. For instance:

```yaml
$ref: https://raw.githubusercontent.com/interuss/secret_repo1/main/configuration/test_baseline.yaml
```
```jsonnet
local test_environment = import 'https://raw.githubusercontent.com/interuss_collaborator/other_secret_repo/1234abcdef/configuration/test_environment.libsonnet';
```
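
For a token to be applied, the owner and repo named in the URL need to match a declaration in `GITHUB_PRIVATE_REPOS`. The sketch below illustrates how an owner and repo can be pulled out of such a URL, using the same URL pattern that appears in `fileio.py` in this commit; the helper name `github_owner_and_repo` is hypothetical and the snippet is a simplified illustration rather than the full lookup.

```python
import re
from typing import Optional, Tuple

GITHUB_URL_PATTERN = re.compile(
    r"^https://(?P<hostname>github\.com|raw\.githubusercontent\.com|api\.github\.com)"
    r"/(?P<org>[^/]*)/(?P<repo>[^/?#]*)(?P<predicate>.*)$"
)


def github_owner_and_repo(url: str) -> Optional[Tuple[str, str]]:
    """Return (owner, repo) for a recognized GitHub URL, or None otherwise."""
    match = GITHUB_URL_PATTERN.match(url)
    if not match:
        return None
    return match.group("org"), match.group("repo")


print(
    github_owner_and_repo(
        "https://raw.githubusercontent.com/interuss/secret_repo1/main/configuration/test_baseline.yaml"
    )
)
```

URLs on other hosts do not match the pattern, so no token would be looked up for them.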

### Building

58 changes: 54 additions & 4 deletions monitoring/uss_qualifier/fileio.py
@@ -1,9 +1,12 @@
import base64
import json
import os
import re
from typing import Tuple, Optional, Dict, List, Union

import bc_jsonpath_ng
import _jsonnet
from loguru import logger
import requests
import yaml

@@ -72,12 +75,56 @@ def get_package_name(local_file_path: str) -> FileReference:
    return ".".join(os.path.normpath(rel_path).split(os.path.sep))


def _get_web_content(url: str) -> str:
    headers = {}

    # Check if this is a request to a private GitHub repo
    github_private_repos_key = "GITHUB_PRIVATE_REPOS"
    if github_private_repos_key in os.environ:
        github_match = re.match(
            r"^https://(?P<hostname>github\.com|raw\.githubusercontent\.com|api\.github\.com)/(?P<org>[^/]*)/(?P<repo>[^/?#]*)(?P<predicate>.*)$",
            url,
        )
        if github_match:
            if github_match.group("hostname") == "github.com":
                logger.warning(
                    f"{url} references the main GitHub UI; did you mean to specify a reference to the corresponding content on raw.githubusercontent.com?"
                )
            org = github_match.group("org")
            repo = github_match.group("repo")

            # Extract personal access token(s) and applicability from environment variable
            token = None
            pat_defs = os.environ.get(github_private_repos_key).split(";")
            for pat_def in pat_defs:
                patdef_match = re.match(
                    f"^(?P<org>[^/]*)/(?P<repos>[^:]*):(?P<token>.*)$", pat_def
                )
                if not patdef_match:
                    raise ValueError(
                        f"Error in {github_private_repos_key} environment variable: element `{pat_def}` does not follow the pattern ORG/REPOS:TOKEN"
                    )
                token_org = patdef_match.group("org")
                token_repos = patdef_match.group("repos").split(",")
                if org == token_org and repo in token_repos:
                    token = patdef_match.group("token")
                    break

            if token is not None:
                # This request is for a resource in a private GitHub repo that we have a personal access token for.
                headers[
                    "Authorization"
                ] = f"Basic {base64.b64encode(token.encode()).decode()}"

    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    return resp.content.decode("utf-8")


def _load_content_from_file_name(file_name: str) -> str:
    if file_name.startswith(HTTP_PREFIX) or file_name.startswith(HTTPS_PREFIX):
        # http(s):// web file reference
        file_content = _get_web_content(file_name)
    else:
        with open(file_name, "r") as f:
            file_content = f.read()
@@ -170,7 +217,10 @@ def _load_dict_with_references_from_file_name(
        # This is a package-based file path
        base_file_name = resolve_filename(base_file_name)

    if not base_file_name.startswith(HTTP_PREFIX) and not base_file_name.startswith(
        HTTPS_PREFIX
    ):
        base_file_name = os.path.abspath(base_file_name)

    if base_file_name in cache:
        dict_content = cache[base_file_name]
1 change: 1 addition & 0 deletions monitoring/uss_qualifier/run_locally.sh
@@ -68,6 +68,7 @@ docker run ${docker_args} --name uss_qualifier \
-e PYTHONBUFFERED=1 \
-e AUTH_SPEC=${AUTH_SPEC} \
-e AUTH_SPEC_2=${AUTH_SPEC_2} \
-e GITHUB_PRIVATE_REPOS=${GITHUB_PRIVATE_REPOS:-} \
-e MONITORING_GITHUB_ROOT=${MONITORING_GITHUB_ROOT:-} \
-v "$(pwd)/$OUTPUT_DIR:/app/$OUTPUT_DIR" \
-v "$(pwd)/$CACHE_DIR:/app/$CACHE_DIR" \
