feat: Add initenvs API endpoint to set env vars (#37)
yoomlam authored Jun 8, 2024
1 parent f2094cf commit d7fe94f
Showing 8 changed files with 167 additions and 37 deletions.
38 changes: 20 additions & 18 deletions .github/workflows/lightsail-mgmt.yml
@@ -4,6 +4,21 @@ run-name: "For subdomain ${{inputs.subdomain}}: ${{inputs.command}}"
on:
workflow_dispatch:
inputs:
command:
description: "Command to perform on Lightsail service"
required: true
type: choice
default: 'status'
options:
- 'status'
- 'list_images'
- 'delete_old_images'
- 'enable'
- 'disable'
- 'disable_all'
- 'update_power'
- 'create_new'
- 'delete_service'
subdomain:
description: 'Subdomain of navalabs.co on which to run command'
type: choice
@@ -20,32 +35,18 @@ on:
- 'bdt-chatbot'
- 'chatbot-prototype'
- 'chat.zone'
command:
description: "Command to perform on Lightsail service"
required: true
type: choice
default: 'status'
options:
- 'status'
- 'list_images'
- 'delete_old_images'
- 'enable'
- 'disable'
- 'disable_all'
- 'update_power'
- 'create_new'
- 'delete_service'
power:
description: "Only used for update_power and create_new commands: power of service (useful for deployment failures)"
description: "(Only for update_power and create_new commands) power of service"
type: choice
default: ''
options:
# - ''
- ''
# - nano
- micro
- small
- medium
- large
- xlarge
# - xlarge

permissions:
id-token: write # This is required for requesting the JWT from GitHub's OIDC provider for AWS authentication
@@ -193,6 +194,7 @@ jobs:
aws lightsail delete-container-service --service-name "$SERVICE_NAME"
- name: "Print status"
if: always()
run: |
aws lightsail get-container-services | jq '.containerServices[] | { containerServiceName, createdAt, state, isDisabled, power,
"deployment_state": .currentDeployment.state,
28 changes: 24 additions & 4 deletions .github/workflows/push-image.yml
@@ -38,6 +38,11 @@ on:
required: true
type: boolean
default: true
deploy_retries:
description: "Number of times to retry deployment"
required: true
type: number
default: 3

permissions:
id-token: write # This is required for requesting the JWT from GitHub's OIDC provider for AWS authentication
@@ -88,6 +93,7 @@ jobs:
echo "SECRET_NAME=$SECRET_NAME" >> $GITHUB_ENV
- name: "Populate .env file"
if: inputs.dockerfile_folder != '05-assistive-chatbot'
run: |
# The ENV_FILE_CONTENTS contains API keys, like LITERAL_API_KEY and OPENAI_API_KEY
# As such, make sure the built image is not publicly accessible
@@ -131,8 +137,10 @@ jobs:
LAST_DEPLOYMENT_IMAGE=$(aws lightsail get-container-service-deployments --service-name "$SERVICE_NAME" | jq -r ".deployments[0].containers.chatbot.image")
echo "LS_DOCKER_IMAGE=$LAST_DEPLOYMENT_IMAGE" >> $GITHUB_ENV
- name: "Create new deployment"
- name: "Submit deployment"
if: inputs.deploy_image
env:
DEPLOY_RETRIES: ${{ inputs.deploy_retries }}
run: |
CONFIG_TEMPLATE='{
"serviceName": "$SERVICE_NAME",
@@ -170,18 +178,30 @@ jobs:
echo "$CONFIG_TEMPLATE" | envsubst > config.json
cat config.json
for ((i = 0 ; i < ${DEPLOY_RETRIES:=3} ; i++ )); do
for ((i = 0 ; i < ${DEPLOY_RETRIES} ; i++ )); do
echo "## Deploy attempt $((i+1)) of $DEPLOY_RETRIES"
date
echo "Creating new deployment"
aws lightsail create-container-service-deployment --cli-input-json file://config.json
sleep 10
sleep 30
if ./.github/workflows/waitForLightsail.sh deployment; then
echo "Success"
echo "Success on attempt $((i+1))"
break;
fi
done
date
- name: "Initialize app"
if: inputs.deploy_image && (inputs.dockerfile_folder == '05-assistive-chatbot')
run: |
# The ENV_FILE_CONTENTS contains API keys, like LITERAL_API_KEY and OPENAI_API_KEY
# As such, make sure the built image is not publicly accessible
echo "${{ secrets[env.SECRET_NAME] }}" > .env_vars
SVC_URL=$(aws lightsail get-container-services --service-name "$SERVICE_NAME" | jq -r '.containerServices[0].url')
echo "Setting API keys at $SVC_URL"
curl --fail -X POST "${SVC_URL}initenvs" --data-binary '@.env_vars'
# TODO: warm up vector DB on startup
2 changes: 1 addition & 1 deletion .github/workflows/waitForLightsail.sh
@@ -26,7 +26,7 @@ wait_for_service_state(){
wait_for_next_container(){
echo "Waiting for deployment $TARGET_DEP_VERSION to be Active"
while true; do
sleep 20
sleep 30
local SERVICE_OBJ=$(aws lightsail get-container-services --service-name "$SERVICE_NAME")
local SVC_STATE=$(echo "$SERVICE_OBJ" | jq -r '.containerServices[0].state')
local CURR_DEP_VER=$(echo "$SERVICE_OBJ" | jq -r '.containerServices[0].currentDeployment.version')
1 change: 1 addition & 0 deletions 05-assistive-chatbot/.gitignore
@@ -9,5 +9,6 @@ log/

# .env contains secret API keys
.env
.secrets-*

guru_cards*.json
37 changes: 25 additions & 12 deletions 05-assistive-chatbot/chatbot/__init__.py
@@ -11,6 +11,22 @@
# - add unit tests


## Set default environment variables


# Opt out of telemetry -- https://docs.trychroma.com/telemetry
os.environ.setdefault("ANONYMIZED_TELEMETRY", "False")

# Used by SentenceTransformerEmbeddings and HuggingFaceEmbeddings
os.environ.setdefault("SENTENCE_TRANSFORMERS_HOME", "./.sentence-transformers-cache")

# Disable DSPy cache to get different responses for retry attempts
# Set to true to enable caching for faster responses and optimizing prompts using DSPy
os.environ.setdefault("DSP_CACHEBOOL", "false")

os.environ.setdefault("BUILD_DATE", str(date.today()))


## Initialize logging


@@ -29,22 +45,11 @@ def configure_logging():
dotenv.load_dotenv()
configure_logging()
logger = logging.getLogger(__name__)
logger.info("Build date: %s", os.environ.get("BUILD_DATE"))


## Initialize settings

# Opt out of telemetry -- https://docs.trychroma.com/telemetry
os.environ.setdefault("ANONYMIZED_TELEMETRY", "False")

# Used by SentenceTransformerEmbeddings and HuggingFaceEmbeddings
os.environ.setdefault("SENTENCE_TRANSFORMERS_HOME", "./.sentence-transformers-cache")

# Disable DSPy cache to get different responses for retry attempts
# Set to true to enable caching for faster responses and optimizing prompts using DSPy
os.environ.setdefault("DSP_CACHEBOOL", "false")

os.environ.setdefault("BUILD_DATE", str(date.today()))


@utils.verbose_timer(logger)
def _init_settings():
@@ -70,6 +75,14 @@ def is_true(string):
initial_settings = _init_settings()


def reset():
configure_logging()
engines._engines.clear()
llms._llms.clear()
global initial_settings
initial_settings = _init_settings()


@utils.verbose_timer(logger)
def validate_settings(settings):
chat_engine = settings["chat_engine"]
1 change: 1 addition & 0 deletions 05-assistive-chatbot/chatbot/llms/groq_client.py
@@ -24,6 +24,7 @@ def __init__(self, model_name, settings):
self.model_name = model_name
self.settings = settings
logger.info("Creating LLM client '%s' with %s", model_name, self.settings)
# TODO: remove temperature from settings
self.client = Groq(**self.settings)

def generate_reponse(self, message):
28 changes: 26 additions & 2 deletions 05-assistive-chatbot/chatbot_api.py
@@ -8,10 +8,14 @@

import logging
import os
import platform
import socket
from functools import cached_property
from io import StringIO
from typing import Dict

from fastapi import FastAPI, Request
import dotenv
from fastapi import Body, FastAPI, Request
from fastapi.responses import HTMLResponse

import chatbot
@@ -55,9 +59,29 @@ def healthcheck(request: Request):

git_sha = os.environ.get("GIT_SHA", "")
build_date = os.environ.get("BUILD_DATE", "")
hostname = f"{platform.node()} {socket.gethostname()}"

logger.info("Returning: Healthy %s %s", build_date, git_sha)
return HTMLResponse(f"Healthy {git_sha} built at {build_date}")
return HTMLResponse(f"Healthy {git_sha} built at {build_date}<br/>{hostname}")


ALLOWED_ENV_VARS = ["CHATBOT_LOG_LEVEL"]


@app.post("/initenvs")
def initenvs(env_file_contents: str = Body()):
"Set environment variables for API keys and log level. See usage in push-image.yml"
env_values = dotenv.dotenv_values(stream=StringIO(env_file_contents))
vars_updated = []
for name, value in env_values.items():
if name.endswith("_API_KEY") or name.endswith("_API_TOKEN") or name in ALLOWED_ENV_VARS:
logger.info("Setting environment variable %s", name)
os.environ[name] = value or ""
vars_updated.append(name)
else:
logger.warning("Setting environment variable %s is not allowed!", name)
chatbot.reset()
return str(vars_updated)


if __name__ == "__main__":
69 changes: 69 additions & 0 deletions README.md
@@ -1,2 +1,71 @@
# labs-gen-ai-experiments
Generative A.I. experiments for the Nava Labs Gates project

## Lightsail Deployments

Deployments of prototypes use AWS Lightsail for easier setup and maintenance, plus drastically lower costs.

To share DevOps responsibilities and cover my absences in June and beyond, all engineers have been given access to the [AWS Lightsail web console](https://lightsail.aws.amazon.com/ls/webapp/home/containers), but all the actions that need to be performed are implemented as GitHub Actions:
- [Lightsail service management](https://github.com/navapbc/labs-gen-ai-experiments/actions/workflows/lightsail-mgmt.yml)
- [Build and deploy Docker image](https://github.com/navapbc/labs-gen-ai-experiments/actions/workflows/push-image.yml)

Terms:
- `deployment` = defines Docker image(s), environment variables, port, and health check configurations
- `service` = a [Lightsail Container Service](https://docs.aws.amazon.com/lightsail/latest/userguide/amazon-lightsail-container-services.html), not to be confused with [Lightsail Instances](https://docs.aws.amazon.com/lightsail/latest/userguide/understanding-instances-virtual-private-servers-in-amazon-lightsail.html).
- A service can be in the following states: `DISABLED`, `READY` (no deployment), `UPDATING`, `DEPLOYING`, or `RUNNING` (a deployment). These states are checked in `lightsail-mgmt.yml`.
- A service defines the capacity (`power` x `scale`) that runs the specified deployment. The chosen capacity determines the cost. The capacity can be changed when the service is in the `READY` or `RUNNING` states.
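
The state rule above can be sketched in Python. This is a minimal illustration, not code from this repository: the function name `can_change_capacity` is hypothetical, and the state names are the ones listed above.

```python
# Hypothetical helper (not part of this repository): encodes the rule that
# capacity can be changed only while the service is READY or RUNNING.
SERVICE_STATES = {"DISABLED", "READY", "UPDATING", "DEPLOYING", "RUNNING"}
CAPACITY_CHANGE_STATES = {"READY", "RUNNING"}


def can_change_capacity(state: str) -> bool:
    """Return True if a service in `state` accepts a power or scale change."""
    if state not in SERVICE_STATES:
        raise ValueError(f"Unknown service state: {state}")
    return state in CAPACITY_CHANGE_STATES
```

A check like this could guard an `update_power` request so that it is not sent while the service is still `DEPLOYING` or `UPDATING`.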

### Updating deployments

When code is updated, a new image is created, along with a new `deployment`, which is then submitted to the service. Use the [Build and deploy Docker image](https://github.com/navapbc/labs-gen-ai-experiments/actions/workflows/push-image.yml) GitHub Action to build the image and optionally deploy it to the chosen subdomain. The action performs the following:
1. Populates a `.env` file based on a [GitHub Action secret](https://github.com/navapbc/labs-gen-ai-experiments/settings/secrets/actions), such as `DOT_ENV_FILE_CONTENTS` or `DOT_ENV_FILE_CONTENTS_04`. This secret contains API keys, like `LITERAL_API_KEY` and `OPENAI_API_KEY`.
1. Builds the image using the `.env` file.
1. Pushes the image to the Lightsail service corresponding to the chosen `subdomain`.
1. Creates and submits a new deployment using the image name assigned by Lightsail. The deployment is retried several times in case of failure.

This process takes about 15 minutes to complete.
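
For the `05-assistive-chatbot` application, the workflow skips the `.env` step and, after deploying, posts the secret's contents to the application's `initenvs` endpoint. That endpoint sets only variables whose names end in `_API_KEY` or `_API_TOKEN`, plus an explicit allow-list. The following is a minimal sketch of that filtering rule, assuming the posted contents have already been parsed into a dictionary; the names `is_allowed` and `split_allowed` are hypothetical.

```python
# Sketch of the allow-list rule applied by the `initenvs` endpoint.
# `is_allowed` and `split_allowed` are hypothetical names for illustration.
ALLOWED_ENV_VARS = ["CHATBOT_LOG_LEVEL"]


def is_allowed(name: str) -> bool:
    """API keys, API tokens, and explicitly allowed names may be set."""
    return name.endswith(("_API_KEY", "_API_TOKEN")) or name in ALLOWED_ENV_VARS


def split_allowed(env_values: dict) -> tuple:
    """Return (accepted, rejected) variable names, preserving input order."""
    accepted = [name for name in env_values if is_allowed(name)]
    rejected = [name for name in env_values if not is_allowed(name)]
    return accepted, rejected


accepted, rejected = split_allowed(
    {"OPENAI_API_KEY": "example-value", "CHATBOT_LOG_LEVEL": "INFO", "PATH": "/tmp"}
)
print(accepted)  # ['OPENAI_API_KEY', 'CHATBOT_LOG_LEVEL']
print(rejected)  # ['PATH']
```

Rejecting every other name presumably keeps a request from overwriting unrelated variables such as `PATH`.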

### Addressing failed deployments

Sometimes a deployment fails due to failed healthchecks, which is indicated in the logs as `Took too long`, even after a `Reached a steady state` message.
* Assuming the Docker image functions properly, retrying the deployment will eventually succeed. In the [Build and deploy Docker image](https://github.com/navapbc/labs-gen-ai-experiments/actions/workflows/push-image.yml) GitHub Action, retry the last deployment quickly by deselecting `Build and push image` and selecting `Deploy built image or last deployment`.
* Increasing the service's `power` decreases the risk of healthcheck failures. Adjust the power by using the [Lightsail service management](https://github.com/navapbc/labs-gen-ai-experiments/actions/workflows/lightsail-mgmt.yml) GitHub Action. Note that `medium` and higher power options [start at $40/month](https://aws.amazon.com/lightsail/pricing/), though the charge is based on time used.
- Once deployed, test the application. Then visit the `Metrics` tab of the service to assess the CPU and memory needs of the application.
- Often, the `power` can be decreased after a successful deployment. For the `05-assistive-chatbot` application, `micro` is sufficient but higher power may be desirable for user testing.
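
The workflow automates this retrying with a loop bounded by its `deploy_retries` input. A minimal Python sketch of that loop follows. `submit` and `wait_until_active` are hypothetical callables standing in for the AWS CLI call and `waitForLightsail.sh`, and the default values are assumptions chosen for illustration.

```python
import time
from typing import Callable


def deploy_with_retries(
    submit: Callable[[], None],
    wait_until_active: Callable[[], bool],
    retries: int = 3,
    pause_seconds: float = 30.0,
) -> int:
    """Submit a deployment up to `retries` times.

    Returns the 1-based attempt number that succeeded, or 0 if all failed.
    """
    for attempt in range(1, retries + 1):
        print(f"## Deploy attempt {attempt} of {retries}")
        submit()                   # hypothetical: create the deployment
        time.sleep(pause_seconds)  # give the service time to start updating
        if wait_until_active():    # hypothetical: poll until active or failed
            print(f"Success on attempt {attempt}")
            return attempt
    return 0
```

Returning the attempt number instead of a boolean makes it easy to log how many tries a deployment needed, which can help when deciding whether to raise the `power`.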

### Creating a new service for deployments

Use the [Lightsail service management](https://github.com/navapbc/labs-gen-ai-experiments/actions/workflows/lightsail-mgmt.yml) GitHub Action to perform the following procedure. All commands require a `subdomain` except the `status` and `disable_all` commands. Note the `status` command runs at the end of every run to show the state of Lightsail services.

1. Check the list of existing Lightsail (container) services by running `status` using the GitHub Action or checking the [Lightsail web console](https://lightsail.aws.amazon.com/ls/webapp/home/containers).
- Each service maps to a subdomain of `navalabs.co`. The name of the service uses the syntax: `<subdomain>-svc`.
1. Existing services can be deleted if desired -- run `delete_service` using the GitHub Action.
1. To create a new service or recreate a deleted service, run `create_new` using the GitHub Action with the desired `subdomain` and `power`.
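
The `<subdomain>-svc` naming convention from step 1 can be expressed as a pair of small helpers. This is an illustrative sketch; the function names are hypothetical and not part of the workflow.

```python
# Hypothetical helpers illustrating the `<subdomain>-svc` naming convention.
DOMAIN = "navalabs.co"
SERVICE_SUFFIX = "-svc"


def service_name(subdomain: str) -> str:
    """Return the Lightsail service name for a subdomain."""
    return f"{subdomain}{SERVICE_SUFFIX}"


def subdomain_of(name: str) -> str:
    """Return the subdomain for a service name that follows the convention."""
    if not name.endswith(SERVICE_SUFFIX):
        raise ValueError(f"Not a '<subdomain>-svc' name: {name}")
    return name[: -len(SERVICE_SUFFIX)]


print(service_name("chatbot"))                    # chatbot-svc
print(f"{subdomain_of('chatbot-svc')}.{DOMAIN}")  # chatbot.navalabs.co
```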

### Maintenance

Use the [Lightsail service management](https://github.com/navapbc/labs-gen-ai-experiments/actions/workflows/lightsail-mgmt.yml) GitHub Action to perform any of the following.

- Delete old images (`delete_old_images`) or old services (`delete_service`)
- Disable services if not expected to be used for a long duration (`disable` or `disable_all`)

### Custom domains setup

Setting up custom domains only needs to be done once for a set of subdomains. Ten subdomains of `navalabs.co` were created, and services are automatically configured to use them. To create more subdomains or choose a different set of subdomains, go through this process again.

When a service is created, the service is available via an Amazon-based domain `https://...amazonlightsail.com`. The SSL certificate is associated with this specific Amazon domain. To associate the service with a custom domain like `chatbot.navalabs.co`, the following was done to use a separate SSL certificate:
- Created a `navalabs.co` DNS Zone at [Domains & DNS](https://lightsail.aws.amazon.com/ls/webapp/home/domains) and copied the 4 provided name servers.
- Since `navalabs.co` is registered at NameCheap, replaced the name servers at NameCheap's website with the Amazon-provided name servers.
- Picked a Lightsail service and, in the `Custom domains` tab, selected `Create certificate`, named the SSL certificate `navalabs-cert`, and provided an arbitrary set of subdomains. This certificate is reused by other services.

When the `create_new` command is run using the [Lightsail service management](https://github.com/navapbc/labs-gen-ai-experiments/actions/workflows/lightsail-mgmt.yml) GitHub Action, it:
- creates a new service for the specified `subdomain` using the `navalabs-cert`
- deletes any existing DNS Zone Assignment with the same `subdomain`
- creates a new DNS Zone Assignment for the `subdomain` so that it targets the new service

The effect can be seen in [Lightsail's DNS Zone Assignments](https://lightsail.aws.amazon.com/ls/webapp/domains/navalabs-co/assignments) and the service's `Custom domains` tab.

See [Amazon docs](https://docs.aws.amazon.com/lightsail/latest/userguide/amazon-lightsail-enabling-container-services-custom-domains.html) for details.

#### Targeting EC2 instances

New entries in [Lightsail DNS Zone's DNS records tab](https://lightsail.aws.amazon.com/ls/webapp/domains/navalabs-co/advanced) can be manually created to target applications at other URLs, such as those associated with EC2 instances. This avoids having to use (and pay for) Amazon Certificate Manager or Route 53.
