[uss_qualifier] Load configuration elements from private GitHub repos (#738)

* Load configuration elements from private GitHub repos
* Add api.github.com as recognized host for private GitHub repos
* Add warning for github.com content reference
interuss · Aug 2, 2024 · 33bdaba
1 parent 6a823a2

Showing 3 changed files with 102 additions and 13 deletions.
diff --git a/monitoring/uss_qualifier/configurations/README.md b/monitoring/uss_qualifier/configurations/README.md
@@ -4,6 +4,18 @@
 
 To execute a test run with uss_qualifier, a uss_qualifier configuration must be provided.  This configuration consists of the test suite to run, along with definitions for all resources needed by that test suite, plus information about artifacts that should be generated.  See [`USSQualifierConfiguration`](configuration.py) for the exact schema and [the dev configurations](./dev) for examples.
 
+### Terminology
+
+![Terminology flow chart](assets/terminology.png)
+
+* **Test configuration**: A configuration following the [`USSQualifierConfiguration`](configuration.py) schema which fully defines the actions uss_qualifier should perform when run.  This is the primary input to uss_qualifier and is fully defined by the combination of the test baseline configuration and the test environment configuration.  See ["Specifying"](#specifying) and ["Building"](#building) for more information.
+* **Test baseline configuration**: A configuration defining the behavior of the test, but generally omitting which systems are to be tested and where those systems are located.  A test baseline configuration is defined as everything in a test configuration except those elements of the configuration explicitly identified as [`non_baseline_inputs`](configuration.py).
+* **Test environment configuration**: The portions of a test configuration explicitly identified as [`non_baseline_inputs`](configuration.py) and generally corresponding with which systems are to be tested and where those systems are located.
+* **Test baseline identifier**: An identifier that corresponds to the test baseline configuration + InterUSS `monitoring` codebase version used to run the configuration.  This identifier has the characteristics of a hash: whenever any element of the test baseline configuration changes, the test baseline identifier should change as well.  Given just the test baseline identifier, there is not enough information to construct the corresponding test baseline configuration.  The long-form test baseline identifier is a long hexadecimal hash and can be found in the [`baseline_signature` field of a TestRunReport](../reports/report.py).  This long-form identifier is shortened to a short-form identifier by combining a `TB-` prefix with the first 7 characters of the long-form identifier in certain human-facing artifacts.
+* **Test environment identifier**: An identifier that corresponds to the test environment configuration.  This identifier is identical to the test baseline identifier except that it hashes the test environment configuration rather than the test baseline configuration + InterUSS `monitoring` codebase version, its long-form identifier can be found in the [`environment_signature` field of a TestRunReport](../reports/report.py), and its short-form identifier is prefixed with `TE-`.
+* **Test run report**: The full set of information captured for a test run is recorded in a [`TestRunReport` object](../reports/report.py), and often written to report.json.  This information is the test run report, and it is the basis for creating all other test artifacts.
+* **Test run identifier**: An identifier that corresponds to a particular test run.  This identifier is identical to the test baseline identifier except that it hashes the test run report rather than the test baseline configuration + InterUSS `monitoring` codebase version, and its short-form identifier is prefixed with `TR-`.
+
 ### Specifying
 
 When referring to a configuration, three methods may be used; see [`FileReference` documentation](../fileio.py) for more details.
@@ -14,17 +26,43 @@ Regardless of method used to refer to a configuration, the content of that confi
 * **Local file**: when a configuration reference is prefixed with `file://`, it refers to a local file using the path syntax of the host operating system.
 * **Web file**: when a configuration reference is prefixed with `http://` or `https://`, it refers to a file accessible at the specified URL.
 
-### Terminology
+#### Accessing private GitHub repos
+
+If some or all of a test configuration is located in a private GitHub repo, uss_qualifier can be configured to retrieve that private configuration content in the same way it retrieves publicly-available configuration content.  To enable this:
+
+* Enable personal access tokens in the organization (if the repo is owned by an organization)
+    * Go to Settings from the organization page
+    * On the left under "Third-party Access", expand "Personal access tokens" and click on "Settings"
+    * Allow access to fine-grained personal access tokens
+        * For increased security, recommended settings are to require administrator approval and to restrict access to classic personal access tokens, but these settings are up to the organization administrator's discretion
+* Create a personal access token capable of viewing the private repo
+    * As the GitHub user who will be executing (or managing the execution of) uss_qualifier, navigate to user "Settings"
+    * On the left at the very bottom, navigate to "Developer settings"
+    * On the left, expand "Personal access tokens" and navigate to "Fine-grained tokens"
+    * Click "Generate new token"
+    * Name the token something descriptive; e.g., "Read-only access to private repos"
+    * Under "Resource owner", select the appropriate owner (the organization, if the repo is owned by an organization)
+    * Under "Repository access", select "Only select repositories" and select the private repos to be accessed
+    * Under "Permissions", expand "Repository permissions" and change "Contents" to "Access: read-only"
+    * Create the token and copy the value to a secure location
+* Identify the private repos and provide the personal access token to uss_qualifier
+    * Before running uss_qualifier, populate the environment variable `GITHUB_PRIVATE_REPOS`
+        * The value of this environment variable should be a series of private repository declarations delimited with semicolons
+        * Each private repository declaration should follow the format `ORG_NAME/REPO_NAMES:PAT` where
+            * `ORG_NAME` is the name of the GitHub organization or user who owns the repository
+            * `REPO_NAMES` is a comma-separated list of private repos
+            * `PAT` is the personal access token
+        * Example: `interuss/secret_repo1,secret_repo2:github_pat_abcdefg01234_foobar;interuss_collaborator/other_secret_repo:github_pat_zyxw987_baz`
+
+Now, references to content in these private repos can be used in configurations.  For instance:
 
-![Terminology flow chart](assets/terminology.png)
+```yaml
+$ref: https://raw.githubusercontent.com/interuss/secret_repo1/main/configuration/test_baseline.yaml
+```
 
-* **Test configuration**: A configuration following the [`USSQualifierConfiguration`](configuration.py) schema which fully defines the actions uss_qualifier should perform when run.  This is the primary input to uss_qualifier and is fully defined by the combination of the test baseline configuration and the test environment configuration.  See ["Specifying"](#specifying) and ["Building"](#building) for more information.
-* **Test baseline configuration**: A configuration defining the behavior of the test, but generally omitting which systems are to be tested and where those systems are located.  A test baseline configuration is defined as everything in a test configuration except those elements of the configuration explicitly identified as [`non_baseline_inputs`](configuration.py).
-* **Test environment configuration**: The portions of a test configuration explicitly identified as [`non_baseline_inputs`](configuration.py) and generally corresponding with which systems are to be tested and where those systems are located.
-* **Test baseline identifier**: An identifier that corresponds to the test baseline configuration + InterUSS `monitoring` codebase version used to run the configuration.  This identifier has the characteristics of a hash: whenever any element of the test baseline configuration changes, the test baseline identifier should change as well.  Given just the test baseline identifier, there is not enough information to construct the corresponding test baseline configuration.  The long-form test baseline identifier is a long hexadecimal hash and can be found in the [`baseline_signature` field of a TestRunReport](../reports/report.py).  This long-form identifier is shortened to a short-form identifier by combining a `TB-` prefix with the first 7 characters of the long-form identifier in certain human-facing artifacts.
-* **Test environment identifier**: An identifier that corresponds to the test environment configuration.  This identifier is identical to the test baseline identifier except that it hashes the test environment configuration rather than the test baseline configuration + InterUSS `monitoring` codebase version, its long-form identifier can be found in the [`environment_signature` field of a TestRunReport](../reports/report.py), and its short-form identifier is prefixed with `TE-`.
-* **Test run report**: The full set of information captured for a test run is recorded in a [`TestRunReport` object](../reports/report.py), and often written to report.json.  This information is the test run report, and it is the basis for creating all other test artifacts.
-* **Test run identifier**: An identifier that corresponds to a particular test run.  This identifier is identical to the test baseline identifier except that it hashes the test run report rather than the test baseline configuration + InterUSS `monitoring` codebase version, and its short-form identifier is prefixed with `TR-`.
+```jsonnet
+local test_environment = import 'https://raw.githubusercontent.com/interuss_collaborator/other_secret_repo/1234abcdef/configuration/test_environment.libsonnet';
+```
 
 ### Building
 

diff --git a/monitoring/uss_qualifier/fileio.py b/monitoring/uss_qualifier/fileio.py
@@ -1,9 +1,12 @@
+import base64
 import json
 import os
+import re
 from typing import Tuple, Optional, Dict, List, Union
 
 import bc_jsonpath_ng
 import _jsonnet
+from loguru import logger
 import requests
 import yaml
 
@@ -72,12 +75,56 @@ def get_package_name(local_file_path: str) -> FileReference:
     return ".".join(os.path.normpath(rel_path).split(os.path.sep))
 
 
+def _get_web_content(url: str) -> str:
+    headers = {}
+
+    # Check if this is a request to a private GitHub repo
+    github_private_repos_key = "GITHUB_PRIVATE_REPOS"
+    if os.environ.get(github_private_repos_key):  # unset or empty: skip token lookup
+        github_match = re.match(
+            r"^https://(?P<hostname>github\.com|raw\.githubusercontent\.com|api\.github\.com)/(?P<org>[^/]*)/(?P<repo>[^/?#]*)(?P<predicate>.*)$",
+            url,
+        )
+        if github_match:
+            if github_match.group("hostname") == "github.com":
+                logger.warning(
+                    f"{url} references the main GitHub UI; did you mean to specify a reference to the corresponding content on raw.githubusercontent.com?"
+                )
+            org = github_match.group("org")
+            repo = github_match.group("repo")
+
+            # Extract personal access token(s) and applicability from environment variable
+            token = None
+            pat_defs = os.environ.get(github_private_repos_key).split(";")
+            for pat_def in pat_defs:
+                patdef_match = re.match(
+                    r"^(?P<org>[^/]*)/(?P<repos>[^:]*):(?P<token>.*)$", pat_def
+                )
+                if not patdef_match:
+                    raise ValueError(
+                        f"Error in {github_private_repos_key} environment variable: element `{pat_def}` does not follow the pattern ORG/REPOS:TOKEN"
+                    )
+                token_org = patdef_match.group("org")
+                token_repos = patdef_match.group("repos").split(",")
+                if org == token_org and repo in token_repos:
+                    token = patdef_match.group("token")
+                    break
+
+            if token is not None:
+                # This request is for a resource in a private GitHub repo that we have a personal access token for.
+                headers[
+                    "Authorization"
+                ] = f"Basic {base64.b64encode(token.encode()).decode()}"
+
+    resp = requests.get(url, headers=headers)
+    resp.raise_for_status()
+    return resp.content.decode("utf-8")
+
+
 def _load_content_from_file_name(file_name: str) -> str:
     if file_name.startswith(HTTP_PREFIX) or file_name.startswith(HTTPS_PREFIX):
         # http(s):// web file reference
-        resp = requests.get(file_name)
-        resp.raise_for_status()
-        file_content = resp.content.decode("utf-8")
+        file_content = _get_web_content(file_name)
     else:
         with open(file_name, "r") as f:
             file_content = f.read()
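
To make the token lookup in `_get_web_content` easier to follow, the sketch below restates it as a pure function that takes the environment variable value as an argument and returns the headers that would be sent. It reuses the two regular expressions from the diff; the function name `auth_headers_for` is hypothetical, and treating an unset or empty value as "no private repos" is a choice of this sketch rather than something the commit states.

```python
import base64
import re
from typing import Dict, Optional

# Patterns copied from the diff above.
GITHUB_URL = re.compile(
    r"^https://(?P<hostname>github\.com|raw\.githubusercontent\.com|api\.github\.com)"
    r"/(?P<org>[^/]*)/(?P<repo>[^/?#]*)(?P<predicate>.*)$"
)
PAT_DEF = re.compile(r"^(?P<org>[^/]*)/(?P<repos>[^:]*):(?P<token>.*)$")


def auth_headers_for(url: str, private_repos: Optional[str]) -> Dict[str, str]:
    """Return the Authorization header for url, or {} when no token applies.

    private_repos plays the role of the GITHUB_PRIVATE_REPOS value.
    (Hypothetical helper; unset/empty handling is an assumption of this sketch.)
    """
    if not private_repos:
        return {}
    url_match = GITHUB_URL.match(url)
    if not url_match:
        return {}
    for pat_def in private_repos.split(";"):
        def_match = PAT_DEF.match(pat_def)
        if not def_match:
            raise ValueError(
                f"element `{pat_def}` does not follow the pattern ORG/REPOS:TOKEN"
            )
        same_org = url_match.group("org") == def_match.group("org")
        repo_listed = url_match.group("repo") in def_match.group("repos").split(",")
        if same_org and repo_listed:
            # Same encoding as the diff: base64 of the bare token.
            encoded = base64.b64encode(def_match.group("token").encode()).decode()
            return {"Authorization": f"Basic {encoded}"}
    return {}


env_value = "interuss/secret_repo1,secret_repo2:tokA;interuss_collaborator/other_secret_repo:tokB"
private_url = "https://raw.githubusercontent.com/interuss/secret_repo1/main/configuration/test_baseline.yaml"
public_url = "https://raw.githubusercontent.com/interuss/public_repo/main/README.md"
print(auth_headers_for(private_url, env_value))
print(auth_headers_for(public_url, env_value))
```

Only URLs whose organization and repository both appear in a declaration receive a header; every other request goes out unauthenticated, as before this change.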
@@ -170,7 +217,10 @@ def _load_dict_with_references_from_file_name(
             # This is a package-based file path
             base_file_name = resolve_filename(base_file_name)
 
-    base_file_name = os.path.abspath(base_file_name)
+    if not base_file_name.startswith(HTTP_PREFIX) and not base_file_name.startswith(
+        HTTPS_PREFIX
+    ):
+        base_file_name = os.path.abspath(base_file_name)
 
     if base_file_name in cache:
         dict_content = cache[base_file_name]
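
The guard around `os.path.abspath` in the hunk above matters because `abspath` treats a URL as a relative file path. A minimal sketch of the effect follows; the function name `cache_key` is hypothetical, and the prefix values are assumed from the README's description of web file references.

```python
import os

# Assumed values; the diff references these constants without showing them.
HTTP_PREFIX = "http://"
HTTPS_PREFIX = "https://"


def cache_key(base_file_name: str) -> str:
    """Leave web references untouched; make local paths absolute.

    (Hypothetical helper mirroring the guarded abspath call in the diff.)
    """
    if base_file_name.startswith((HTTP_PREFIX, HTTPS_PREFIX)):
        return base_file_name
    return os.path.abspath(base_file_name)


url = "https://raw.githubusercontent.com/interuss/secret_repo1/main/configuration/test_baseline.yaml"

# Without the guard, the URL is joined onto the working directory and
# normalized as a path, so it no longer identifies the web resource.
mangled = os.path.abspath(url)
print(mangled)

# With the guard, the URL survives unchanged.
print(cache_key(url))
```

On POSIX systems the unguarded result is the working directory followed by the URL with its `//` collapsed to `/`, which is not a usable web reference.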

diff --git a/monitoring/uss_qualifier/run_locally.sh b/monitoring/uss_qualifier/run_locally.sh
@@ -68,6 +68,7 @@ docker run ${docker_args} --name uss_qualifier \
   -e PYTHONBUFFERED=1 \
   -e AUTH_SPEC=${AUTH_SPEC} \
   -e AUTH_SPEC_2=${AUTH_SPEC_2} \
+  -e GITHUB_PRIVATE_REPOS=${GITHUB_PRIVATE_REPOS:-} \
   -e MONITORING_GITHUB_ROOT=${MONITORING_GITHUB_ROOT:-} \
   -v "$(pwd)/$OUTPUT_DIR:/app/$OUTPUT_DIR" \
   -v "$(pwd)/$CACHE_DIR:/app/$CACHE_DIR" \