
GitHub Actions workflow for running Automated Tests #62

Closed · wants to merge 19 commits

Conversation

@MukuFlash03 commented Jun 23, 2024

The eventual goal is to have minimal manual intervention when testing latest images.
Essentially, we want to run the automated tests on every rebuild of the image so we know if the images are good enough to merge.

There are two scripts involved in running the demo tests:

Another requirement is that we need to run tests on different operating systems, hence the GitHub Actions workflow must run on at least Ubuntu (for now) and macOS later on.
macOS with M1 chips has caused issues in the past and could be tricky to configure.
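For context, a minimal sketch of what such an OS matrix could look like; the runner labels and the fail-fast setting are assumptions based on the discussion later in this thread:

```yaml
# Hypothetical matrix sketch; actual runner labels may differ.
jobs:
  run-automated-tests:
    strategy:
      fail-fast: false          # let the Ubuntu job finish even if the macOS job fails
      matrix:
        os: [ubuntu-latest, macos-13]
    runs-on: ${{ matrix.os }}
```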


Initial PR by shankari.
Manual testing steps highlighted in this PR.

@MukuFlash03 (Author) commented Jun 23, 2024

Related Commits / PRs

Listing some possibly related commits from the codebase history that could be helpful.

Note:
These changes were not introduced by me.

Demo Automated Tests: PR link


“pip install pytest”: Apr 15, 2024 (1-3)

  1. Added this line
  2. Removed this line
  3. Added this line

tests/startup_tests.py: Dec 6, 2023 (1-2) ; Apr 15, 2024 (3)

  1. Added: 381a3ce
  2. Modified: 954b5d2
  3. Removed: 8b63180

Scripts updated to use file path “ext/source/”: commit link


Added automated tests: related PR link

Automated test docker compose file, bash script, run-test.sh: Dec 6, 2023 (1)

  • Added first here
  • run-test.sh did not contain “pip install pytest” when it was first added.

Note:
These changes were not introduced by me.

Some more commits made prior to this PR:

  • Removed macOS since docker not pre-installed: commit link
  • Switched to macOS completely; removed ubuntu: commit link
  • Host folder /var/folders added: commit link
  • Timeout temporary, can be removed once automated tests added: commit link
  • Timeout bumped to 30 mins; enough time for 2nd .sh file to run: commit link

@MukuFlash03 (Author) commented Jun 23, 2024

Starting off with getting demo-automated-testing.sh to run


Docker setup process on Ubuntu vs. on macOS

  • Ubuntu: No need to install Docker; it appears to come pre-installed on the runner.
  • macOS: Using the custom GitHub action douglascamata/setup-docker-macos-action (“A GitHub Action to setup Docker on macOS runners”).

Specific details available here in the official list of runner images.
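A sketch of what the conditional setup step could look like; the action ref and the matrix key are assumptions, not the exact values used in this PR:

```yaml
# Hypothetical step; pin the action to whatever ref its README recommends.
- name: Set up Docker (macOS runners only)
  if: startsWith(matrix.os, 'macos')
  uses: douglascamata/setup-docker-macos-action@main
```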


Where does startup_tests.py come from? It isn't in this repo.

  • Comes from everest-core as seen in this comment.
  • For the commit where startup_tests.py was first added, I saw this comment:

```dockerfile
# TODO: This should be removed once added to everest-core
COPY ./tests/startup_tests.py /ext/source/tests/core_tests/startup_tests.py
```
  • So it looks like the absence of the file in this repo should not cause any errors, since it should be present in everest-core.
  • Yes, startup_tests.py is present here in everest-core.
  • Also, looking at the Dockerfile in everest-demo/manager/, I see that it clones the everest-core git repo into /ext/source (the Dockerfile renames everest-core/ to /ext/source) and copies run-test.sh there.

@MukuFlash03 (Author) commented Jun 23, 2024

Timeout causing jobs to get cancelled


Found a related commit here that highlights this issue: the main reason for using a timeout is that the docker compose scripts run without the -d flag, so the process stays active in the terminal.

So for now, ignoring this. Will have to come back to it later.

@MukuFlash03 (Author) commented

demo-automated-testing.sh passed for Ubuntu


Here is the successful workflow run.

This is the output seen in the workflow logs:

```
manager-1      | ============================== 2 passed in 18.61s ==============================
```

I hope this is the right indicator that the current automated tests defined by the test script have passed.

@MukuFlash03 (Author) commented Jun 23, 2024

Added macOS runner to run automated tests


The macOS Docker setup finally passed, using the same custom GitHub action used in this PR.
I did have to change the macOS version from macos-latest to macos-13, as the GitHub action to set up Docker wasn't compatible with the latest macos-14.


Timeout increased to 60 minutes

  • Increasing the timeout since the Downloading and Extracting steps alone have taken about 20 minutes in the macOS runner.
  • Right now, even for demo-automated-testing.sh on macOS, it's taking close to 40-45 minutes just for the docker image download + extraction to complete.
  • This is just for the macOS runner; the Ubuntu job completes pretty quickly.

As seen in this commit as well, when Shankari changed the timeout to 30 minutes, it was because the demo-iso15118-2-ac-plus-ocpp.sh script itself was taking close to 30 minutes.

@MukuFlash03 (Author) commented Jun 23, 2024

Observations for Workflow Runs with macOS


Test 1:

This macOS run passed; the pytest tests passed successfully.
However, the download and extraction of the docker image itself takes close to 30 minutes, so I had to increase the timeout to 60 minutes.


Test 2:

For details on triggering events on PR vs non-PR (or on push) events, see this comment.

  1. Currently a PR is open into my local branch automate-tests-merge; hence pushes trigger the workflow for this as well.
    cicd.yaml run.

  2. But since this is a PR, e2etest.yaml correctly skipped the jobs when triggered by a PR:
    e2etest.yaml run.

```yaml
jobs:
  pull-and-run-tests:
    if: github.event.inputs.parent_workflow == 'cicd.yaml' && github.event.inputs.event_name != 'pull_request'
    runs-on: ${{ matrix.os }}
```

  3. Then added a workflow dispatch so that e2etest.yaml is triggered whenever the cicd.yaml workflow completes.
    Ensured that the dispatch job runs only when the docker job completes successfully in the cicd workflow, by using the “needs” attribute.
    cicd.yaml run.

  4. Successfully triggered via a workflow run that was itself triggered by a non-PR event (a push commit).
    e2etest.yaml run.
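For reference, a sketch of how e2etest.yaml could declare the dispatch inputs that the condition above reads; only the input names (parent_workflow, event_name) come from the snippet above, the rest is an assumption:

```yaml
# Hypothetical trigger block for e2etest.yaml.
on:
  workflow_dispatch:
    inputs:
      parent_workflow:
        description: "Workflow that dispatched this run (e.g. cicd.yaml)"
        required: true
      event_name:
        description: "Event that triggered the parent workflow"
        required: true
```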

However, in this macOS run, 1 test failed in the pytest startup_tests.
Usually there are two tests and both pass.

1 test failed with error message:

```
manager-1      |         if status == None or len(status) == 0:
manager-1      | >           raise TimeoutError("Timeout while waiting for EVerest to start")
manager-1      | E           TimeoutError: Timeout while waiting for EVerest to start
manager-1      |
manager-1      | /usr/lib/python3.10/site-packages/everest/testing/core_utils/everest_core.py:198: TimeoutError
```

This failure error has been observed before, as seen here.


Test 3:

Testing again by pushing changes.
A different error was observed in the docker setup process for macOS in this run.


Test 4:

Testing again in this run, specifically to observe the macOS runner.
Docker setup was successful.

However, yet again 1 test failed and 1 test passed.
But the error message was different from the error seen in the previous run:

```
manager-1      | >       assert probe.test(20)
manager-1      | E       assert False
manager-1      | E        +  where False = <bound method ProbeModule.test of <startup_tests.ProbeModule object at 0x7ebd56d50a90>>(20)
manager-1      | E        +    where <bound method ProbeModule.test of <startup_tests.ProbeModule object at 0x7ebd56d50a90>> = <startup_tests.ProbeModule object at 0x7ebd56d50a90>.test
manager-1      |
manager-1      | core_tests/startup_tests.py:101: AssertionError
```

@MukuFlash03 (Author) commented Jun 23, 2024

Queries before moving onto demo-iso15118-2-ac-plus-ocpp.sh


1. Which version of the bash script parameters to use?

Shankari’s command for the AC script was valid as per this Readme.md.

```
🚨 Basic and ISO 15118-2 AC Charging with OCPP 201 CSMS (MaEVe) ⚡:
$ curl https://raw.githubusercontent.com/everest/everest-demo/main/demo-iso15118-2-ac-plus-ocpp201.sh | bash
```

But the latest code in the EVerest/everest-demo parent repo has the new single demo script, added in this commit.

The bash command takes different arguments to choose which demo to run.

```
🚨 Basic and ISO 15118-2 AC Charging with OCPP 2.0.1 CSMS (MaEVe Security Profile 1) ⚡:
$ curl https://raw.githubusercontent.com/everest/everest-demo/main/demo-iso15118-2-ac-plus-ocpp.sh | bash -s -- -1
```

2. Should all the command versions run in the workflow?

But these scripts run docker containers without the -d flag, which causes them to stay up and running.
Even if the -d flag were supplied, they might use the same ports, so the containers would need to be stopped before executing other scripts.

@MukuFlash03 (Author) commented Jun 23, 2024

Concerns with current process for image build push


My understanding of the requirements of this PR w.r.t. automated testing

  • I get that the requirement is to run the automated tests on every rebuild of the images so that we know if the changes are good to run.
  • However, in the cicd.yaml file, I see that the latest images are only pushed to the GHCR registry on a non-pull-request event.

My understanding of the process for a non-PR or normal push event

  • From what I see, the flow probably is: first, a non-PR event (a normal push or commit) is made to the main branch.
  • Then the image is built and pushed to the GHCR registry, and the tags can then be updated.
  • Now, for future changes / PRs, the code will use this latest image with the latest code.

What's my concern?

  • Now this is a concern, since the single-line demo scripts make use of the docker-compose YAML files to run containers.
  • These pull images from the GHCR registry using the tag name available in the .env file.
  • This tag name will still refer to the image without the latest changes from the PR.
  • This is because, as specified above, the image is only pushed if the trigger event is not a PR.

Summarized concern

  • Hence, the demo script tests will actually run on the currently tagged latest image, but not on the latest code: the PR code has no relevant tag yet, since it isn't available in any image that has been built and pushed to GHCR.

What should happen for latest image with latest code from PR to be tested?

  • So in the current scenario, for the tests to run on the latest code, the image must be pushed once and tags updated before running e2etest.yaml workflow.

Observations from actions history

  1. I’ve looked at the GitHub Actions workflow run history and can verify that for PRs, the push value is set to false in the docker-build-push action:

```
Run docker/build-push-action@v5
  with:
    context: ./manager
    push: false
    tags: ghcr.io/everest/everest-demo/manager:0.0.16
          ghcr.io/everest/everest-demo/manager:latest
```

  2. This results in images being built but not pushed to GHCR, as can be observed in the Build and Push step of the cicd.yaml workflow:

```
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
```

This is observed even for PRs that triggered the workflow multiple times because multiple pushes were made.


@shankari (Collaborator) commented

@MukuFlash03

First, I don't want a giant portmanteau PR that is hard to review and merge. This PR should do one thing and do it well, which is enabling automated tests. The demo-iso15118-2-ac-plus-ocpp.sh script should be enabled, but in a separate PR.

High level comments:

Essentially, we want to run the automated tests on every rebuild of the image so we know if the images are good enough to merge.

This is only partially correct. We do want to test out new images, but not all changes are to the images. Some changes are to the demo scripts as well. Note that the PR that required multiple rounds of testing from @louisg1337 did not make any changes to the images.

  1. Which version of the bash script parameters to use?

Obviously the most recent one. Shankari's PR was started before the script was refactored and hasn't been changed since then. We should use the currently valid version of the script. However, we are going to focus on only the automated testing script for now, so this is irrelevant for this PR.

Found a related commit here that highlights this issue: the main reason for using a timeout is that the docker compose scripts run without the -d flag, so the process stays active in the terminal.

The challenge with using -d is that then we won't see the logs - the container runs completely in the background. So if the test fails, we won't be able to see why. It also means that the action/job will not fail if the test fails. The only way to check whether the test actually passed or failed is to look at the logs, which is not how tests are supposed to work.
I would suggest using https://stackoverflow.com/questions/33799885/how-to-stop-all-containers-when-one-container-stops-with-docker-compose to ensure that the action sees the exit code from the test. Please also indicate the testing done, including a success and failure test case (you can change the entrypoint temporarily for this testing).

  1. Should all the command versions run in the workflow?
    But these scripts run docker containers without the -d flag, which causes them to stay up and running.
    Even if the -d flag were supplied, they might use the same ports, so the containers would need to be stopped before executing other scripts.

I am not sure what you mean by this. Each separate run in the matrix runs in a separate VM, so I am not sure what you mean by the “same port”.

I get that the requirement is to run the automated tests on every rebuild of the images so that we know if the changes are good to run.

As I said above, this is necessary but not sufficient.

However, in the cicd.yaml file, I see that the latest images are only pushed to the GHCR registry on a non-pull-request event.

This is expected, since we only want to push valid images to the repo.

So in the current scenario, for the tests to run on the latest code, the image must be pushed once and tags updated before running e2etest.yaml workflow.

That is not the only option. A better option is to test the locally built image:

  1. We don't need to rebuild the image if the only change is to the demo files. In this case, it is fine to run the demo script directly.
  2. If there are changes to the non-demo files, we do need to rebuild the images. And we do; we just don't push them. We should be able to test against the locally built images from the previous jobs. Please read up on docker image pull and caching: similar to git repos, if you don't explicitly specify pull, docker will only pull the remote image if it does not exist locally. We have just built these images as part of the GitHub action, so if you run the tests in the same environment, they will use the newly built versions.

@shankari (Collaborator) left a review comment

I am not quite sure what these changes are doing or why the image build and push has been commented out.

Comment on lines 38 to 43
```yaml
# - name: Checkout
#   uses: actions/checkout@v4
#   with:
#     fetch-depth: 0

# - name: Ensure Docker image version is not referencing an existing release
```

why is all this commented out?

@shankari (Collaborator) commented

Finally,

I am not sure why this change was introduced:

Removed macOS since docker not pre-installed: commit link
Switched to macOS completely; removed ubuntu: commit link

when the goal is

Another requirement is that we need to run tests on different operating systems, hence the GitHub Actions workflow must run on at least Ubuntu (for now) and macOS later on.

Note also that, given the time and resources required to run macOS, I don't think we should run it on every pull request. The idea behind containerization/software-based testing is that if the code works in one container, it should work in all containers. This is clearly not true 100% of the time, but if it works in one setup, we know that the application logic is right, and we just have to figure out the system-level library incompatibilities.

@MukuFlash03 (Author) commented Jun 27, 2024

Finally,

I am not sure why this change was introduced:

Removed macOS since docker not pre-installed: commit link
Switched to macOS completely; removed ubuntu: commit link

when the goal is

Another requirement is that we need to run tests on different operating systems, hence the GitHub Actions workflow must run on at least Ubuntu (for now) and macOS later on.

Just to clarify, these are not changes that I introduced. These were commits that I found in this PR by Shankari; I had just listed the relevant commits to help better understand the task at hand.

Note also that, given the time and resources required to run macOS, I don't think we should run it on every pull request. The idea behind containerization/software-based testing is that if the code works in one container, it should work in all containers. This is clearly not true 100% of the time, but if it works in one setup, we know that the application logic is right, and we just have to figure out the system-level library incompatibilities.

Alright, noted.

@MukuFlash03 (Author) commented Jul 3, 2024

Based on the previous conversations, I tried a few approaches; listing them below along with the reasons I didn't use each one, and the one I finally chose.

Notes:

  • This blog post helped structure my thoughts to build a decent workflow.
  • Removed the e2etest.yaml file as it didn’t seem relevant to run a separate workflow after cicd.yaml.
  • One of the primary tasks was to build the image locally and then test it.

Finally decided to go ahead with Approach 3.


Approach 1:

  • Added a 2nd job to run automated tests after 1st job that builds and pushes docker images in cicd.yaml.

  • Now, while this seemed good initially, I found some concerns:

    • In the case of a PR event, it’s okay to have the build + push first, then test.
    • This is because the push isn’t executed for PRs, so testing runs on the locally built image.
    • But in the case of a non-PR (push) event, when the PR is merged, the push condition evaluates to true; if the test runs after pushing the image, the test may fail.
    • And the erroneous image would already have been pushed to the registry as part of the prior build + push step.

Hence not moving ahead with this approach.


Approach 2:

  • Revised workflow jobs to follow this: Build -> Test -> Build + Push (existing)
  • The first build job would build the image with the push attribute set to false, since even for a non-PR event we don’t want to push before testing the image.
  • The image is then tested in the 2nd job, which runs the automated tests.
  • Then proceed to Build + Push, which uses the cache.

The problem with this was that:

  • The docker compose command was unable to retrieve the cached image layers built in the first Build job.
  • This was because the docker build-push actions were caching to type=gha, the GitHub Actions cache.
  • Docker compose, on the other hand, would just pull the image from the registry and wasn't able to fetch the cached layers.
  • Also, we cannot share images between jobs unless we use something like artifacts, which would be inefficient since some images (like manager) are large.
  • Additionally, this approach resulted in a lot of duplicated code between the 1st Build job and the 3rd Build + Push job, the only difference being that in the first job I had explicitly set push to false.

So adding another build job with code duplicated from the 3rd Build + Push job wasn’t that useful.


Approach 3:

  • Revised workflow: Build + Test -> Build + Push
  • Now, the 1st job is the run-automated-tests job, which uses the bash script to build the image locally via the docker compose commands.
  • Had to add the build and pull_policy attributes to the docker compose services.
  • Also added the --abort-on-container-exit and --exit-code-from flags, which kill the running docker compose up command once any container exits.
  • Additionally, if a non-zero exit code is returned, the entire workflow fails before the 2nd job that builds and pushes the image.
  • Hence we can be assured that the latest image is tested before being pushed; see the sketch after this list.
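A minimal sketch of this Approach 3 job layout, assuming the job, file, and service names used elsewhere in this thread (manager is the service whose exit code carries the test result):

```yaml
# Hypothetical sketch of Approach 3; step names are illustrative.
jobs:
  run-automated-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build images locally and run the automated tests
        run: |
          docker compose --project-name everest-ac-demo \
            --file docker-compose.automated-tests.yml up \
            --build --abort-on-container-exit --exit-code-from manager

  docker-build-and-push-images:
    needs: run-automated-tests   # runs only if the test job succeeded
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # existing build + push steps go here
```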

Notes on how I am building the image locally now

  • In the demo-automated-testing.sh script (commit), I had to change the docker compose command to use docker-compose.automated-tests.yml directly via $DEMO_COMPOSE_FILE_NAME instead of $DEMO_DIR/$DEMO_COMPOSE_FILE_NAME.
  • The reason was that the docker compose file in the temporary directory was unable to find the build-context Dockerfile, since the temporary directory contained only the docker compose file. Thus it was always pulling the image instead of building locally, even after specifying the build attribute in the services in the docker compose file.
  • Hence, by removing the directory prefix from the file path, the docker compose file in the repository's current path is used, where it can access the build context locations for the services (manager/, /mosquitto).


@MukuFlash03 force-pushed the automate-tests-actions branch 2 times, most recently from c3bba26 to 36d7c1a on July 3, 2024 17:08
@MukuFlash03 (Author) commented

Testing Done in my Forked Repo

Done by Triggering Workflow Runs for a Success and Failure Test Case

I would suggest using https://stackoverflow.com/questions/33799885/how-to-stop-all-containers-when-one-container-stops-with-docker-compose to ensure that the action sees the exit code from the test. Please also indicate the testing done, including a success and failure test case (you can change the entrypoint temporarily for this testing).


Temporarily changed entrypoint in docker-compose.automated-tests.yml for the purposes of this testing

```diff
- entrypoint: "sh ./run-test.sh"
+ entrypoint: "sh ./test-exit-code.sh success"
```

Similarly triggered a separate run for failure:

```diff
+ entrypoint: "sh ./test-exit-code.sh failure"
```

Test script used for testing

```bash
#!/bin/bash
# Simulates a passing or failing test run so that the workflow's
# handling of both exit codes can be verified.
TEST_TYPE=$1
exit_code=0
if [ "$TEST_TYPE" == "success" ]; then
    exit_code=0
elif [ "$TEST_TYPE" == "failure" ]; then
    exit_code=1
fi

echo "Exit Code from test-exit-code.sh: $exit_code"
exit $exit_code
```

Success Test Case:

  • commit: passing success as a parameter to the entrypoint script.
  • workflow run
  • The workflow continues successfully after the run-automated-tests job and moves on to complete the following jobs.

Failure Test Case:

  • commit: passing failure as a parameter to the entrypoint script.
  • workflow run
  • The workflow fails immediately as soon as a non-zero exit code is received from the manager service, and skips all subsequent jobs.

@shankari (Collaborator) commented

In the demo-automated-testing.sh script (commit), I had to change the docker compose command to use docker-compose.automated-tests.yml directly via $DEMO_COMPOSE_FILE_NAME instead of $DEMO_DIR/$DEMO_COMPOSE_FILE_NAME.

@MukuFlash03 This is not correct. I am not sure why you are running the demo-XXX script in the first place. The demo scripts are designed to be "single line demos" that people can run without having to check out any code. They essentially check out the repo in a separate temporary directory and then run docker. But in the case of GitHub actions, we have already checked out the repo. It is not clear why you would run the demo script (which is a lightweight wrapper) and then edit it to remove its original function instead of just running docker compose.

MukuFlash03 pushed a commit to MukuFlash03/everest-demo that referenced this pull request Jul 11, 2024
Was incorrectly using the demo-automated-testing.sh. No need to modify the docker compose directories, file names in there.
As noted in this comment by Shankari:
link: EVerest#62 (comment)

> The demo scripts are designed to be "single line demos" that people can run without having to check out any code

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
```diff
@@ -2,16 +2,47 @@ name: cicd

 on:
   pull_request:
-    branches:
-      - main
+    branches: [ main ]
```
extraneous change, can be reverted

```diff
   push:
-    branches:
-      - main
+    branches: [ main ]
```
ditto

why do we need this file?

or this one?

or this one. All of these seem to be from a separate change by @louisg1337

@MukuFlash03 (Author):
I'm honestly not sure how this happened; my PR was created after @louisg1337's PR was merged, so the changes should already have been in my branch and not appear as new commits.

Comparing the timestamps, this workflow run was triggered at the same timestamp that is now applied to all my commits.

The only possibility I can think of is I remember missing to sign-off some commits, and then ran the rebase command as per the documentation. Perhaps that somehow messed it up.

I'll exclude the changes from this PR.


Commit History

What's weird is that all my commits now have the same timestamp of Jul 3rd, 10:08 AM.
I definitely did not make all the code changes and commits in a single day.

[Screenshot: commit history showing all commits with the same Jul 3 timestamp]

Workflow Runs

Looking at the workflow runs, you can see they're spread out over the past 2-3 weeks.

[Screenshot: workflow runs spread out over the past 2-3 weeks]

echo "Running docker compose up..."
docker compose --project-name everest-ac-demo \
--file "docker-compose.automated-tests.yml" up \
--build \
@shankari (Collaborator):

Why are we building this separately? We build the image later (https://github.com/EVerest/everest-demo/pull/62/files#diff-2c9df400afaffa9270cab617792f08cc671e5f95ae192f1cf111ef4e990a4addL97) and the build takes a long time. We should not rebuild unless we have to.

I see that you wrote a fairly long description of the options here
#62 (comment)

but I wonder if you checked the documentation for the build-push action, and the "test before push" in particular
https://github.com/docker/build-push-action?tab=readme-ov-file

@MukuFlash03 (Author) commented Jul 13, 2024
Addressing this in the comments below.

Comment on lines 34 to 39
```sh
exit_code=$?
echo "Docker-compose up exit code from manager service: $exit_code"

echo "Running docker compose down..."
docker compose --project-name everest-ac-demo \
  --file "docker-compose.automated-tests.yml" down
```
@shankari (Collaborator):
I don't see how this action will fail if the test fails. we will just continue and build + push anyway. Have you actually tested that the workflow (not just the docker compose) will fail if the tests fail?

@shankari (Collaborator) commented Jul 11, 2024
I see that you linked to a workflow where the workflow failed, but I don't understand how it works. The exit code of a step is the exit code of the last statement by default. If the manager service ends and then you run another docker compose after it, the exit code should be the exit code of the docker compose down. I would appreciate an explanation of why/how this works. And I don't see why down should be needed anyway in this context given that the entire github action environment is throwaway.

@MukuFlash03 (Author) commented Jul 12, 2024

Yes, you're correct about not needing docker compose down.
I might have added it when testing the updated docker compose up with the different abort and exit code flags and forgot to remove it.

Also, yes, the exit code returned is that of the last-run command.


With regards to the working of the exit code mechanism:

  • Initially, when I had incorrectly modified the demo-automated-testing.sh script, I had included the docker compose commands inside it. I needed to pass the exit code from the docker compose up command to GitHub actions and hence used a variable exit_code=$? to do that.

    • Read up about $? in this article and passing exit code via scripts here.
  • Now, without the demo-automated-testing.sh, when I'm directly executing the docker compose up inside the workflow file as a job step, it's still the same behavior.

    • In the workflow runner instance, the exit_code variable still stores the exit code from the last executed command above it which is the docker compose up command.

Now, bringing GitHub Actions into the picture and how it determines the exit_code to look at.

Setting the exit code flag for the shell: set -e or +e ([documentation](https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html))

  • The -e flag instructs the shell to exit immediately if a non-zero exit code is encountered.
  • The +e flag on the other hand, allows execution to continue despite receiving non-zero exit codes.
  • In our case, we want the -e flag to be used.

GitHub Actions runners use the -e flag by default (mentioned in the accepted answer in this discussion).


To summarize:

So considering the two scenarios:

  • Success (exit code 0): Workflow run continues executing onto the steps after docker compose up.
  • Failure (non-zero exit code): Workflow run fails immediately and successive steps, jobs not executed at all.
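A small illustration of the mechanism (hypothetical step; with the runner's default bash -e, the final echo is never reached on failure):

```yaml
# Hypothetical step illustrating exit-code propagation on a GitHub runner.
- name: Run automated tests
  run: |
    # GitHub-hosted runners invoke bash with `set -e` by default, so the
    # step (and hence the job) fails as soon as this command returns non-zero.
    docker compose --file docker-compose.automated-tests.yml up \
      --build --abort-on-container-exit --exit-code-from manager
    echo "Tests passed"   # reached only when the exit code was 0
```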

@shankari (Collaborator):
You should not need any changes to this file.

@MukuFlash03 (Author):
Addressing this in the comments below.

@MukuFlash03 (Author) commented Jul 13, 2024

Addressing both these comments as they are related to the image build process:

  1. Why make changes to docker-compose.automated-tests.yml (link)
  2. Why rebuilding? (link)

  1. Changes to docker-compose.automated-tests.yml

The main reason I need changes to this file is that the GitHub Actions flow I'm proposing involves two jobs:

  • Run automated tests: Uses docker compose up to Build and Run Tests.
  • Build Push docker image.

So, I needed to build the image locally first in the docker compose up.
This is enforced by using the build flag (--build).

I tried two scenarios where I used only the build flag (in the docker compose up command) or only the build attribute (in the docker compose file).

In both cases, I observed that the image is always pulled first and not built locally.

@MukuFlash03 (Author) commented Jul 13, 2024

What I understood from this is:

Both the build flag (in the docker compose up command) and the build attribute (in the docker compose file) are required to build the image locally.

The reason for this is that:

  • Even after using --build, the build context for the Dockerfile path must be provided.
  • Also, there is a hierarchy that is followed when deciding whether to pull or build images locally, specifically when both the image and build attributes are specified in a docker compose file.
    This is mentioned in the documentation:
When Compose is confronted with both a build subsection for a service and an image attribute, it follows the rules defined by the pull_policy attribute.
If pull_policy is missing from the service definition, Compose attempts to pull the image first and then builds from source if the image isn't found in the registry or platform cache.

The priority of the image source, as per the info above, is:

  1. A local image, if present, is used directly.
  2. If there is no local image, Compose attempts to pull the image first, and then (re)builds from source if the image isn't found in the registry or platform cache.

But on observing the workflow runs, I saw that:

  • This hierarchy was not being followed.
  • The image was always being built in our case with my changes.
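For illustration, a sketch of the compose service configuration this implies; the image/tag variables and context path are assumptions based on the repo layout described above, and pull_policy: build is the Compose-spec value that forces a local build:

```yaml
# Hypothetical excerpt from docker-compose.automated-tests.yml.
services:
  manager:
    image: ghcr.io/everest/everest-demo/manager:${TAG}
    build:
      context: ./manager     # directory containing the Dockerfile
    pull_policy: build       # never pull; always (re)build from source
```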

@MukuFlash03 (Author) commented Jul 13, 2024

  1. Why rebuilding? (addressing this comment)

Transitioning into the second comment...


The reason for the image always being built is the --build flag in the docker compose up command.
This forces the image to be rebuilt every time.
I did test removing it, and then it pulled the image instead.
But this is what we want, as it is the first time in the workflow that the images are being built.

So we do need the --build flag, and also the build attribute to specify the context containing the Dockerfile.
This is because we want to test the latest code changes, which are present in the locally built image.


@MukuFlash03 (Author) commented

So yes, while the workflow does work (tests run, containers exit), there is the main problem of long build times.

Currently, both jobs build images.
Where I am stuck is in finding a way to be able to do this:

Support Build and caching from docker-compose files

The advantage of this would be that the image layers built in the 1st job will be cached and these should be available to the 2nd job when it tries to build the image, thereby making the build process faster by not building in both jobs.

Found some documentation and articles on how this might be possible.

  • Caching from docker-compose files (link)
  • Build Push action + Docker compose (link)
  • Artifacts (link): not good for large images, already discussed above.
  • Official Docker Bake Action for GitHub (link)

Taking a look at these next.

@shankari (Collaborator) commented

@MukuFlash03 I am not sure why you need all these links. We use docker build and push, and docker build and push has support for testing in the middle. I linked to the documentation on testing. I would suggest you just use it and move on.

louisg1337 and others added 4 commits July 16, 2024 09:54
Signed-off-by: louisg1337 <[email protected]>
Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
Used the existing file template from this PR:
EVerest#23

--------

Currently testing only with ubuntu-latest OS.
The demo scripts used are:
- demo-automated-testing.sh

I have commented out the existing /demo-iso15118-2-ac-plus-ocpp201.sh since, as per the Readme.md, it requires arguments to be passed, and I am unsure which set of arguments is needed for the purpose of the workflow.
https://github.com/EVerest/everest-demo#step-1-run-the-demo

-----

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
Not running on main since I will be making code pushes to my feature-branch: automate-tests-actions.
Also, created a new branch automate-tests-merge that will be used to test workflow runs whenever a PR is created.
This will simulate creating a PR for merging code changes into main in the parent repo.

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
Mahadik, Mukul Chandrakant added 15 commits July 16, 2024 09:54
Trying out workflow execution for both Ubuntu and macOS.

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
The macOS job was failing as the docker setup action is not suited for macOS ARM64 based images.
https://github.com/douglascamata/setup-docker-macos-action?tab=readme-ov-file#arm64-processors-m1-m2-m3-series-used-on-macos-14-images-are-unsupported

Hence, now using "macos-latest-large" (macOS 14 as of this commit) image as per the official runner images here:
https://github.com/actions/runner-images/tree/main

-----------

Also, this failure led to that specific job failing and all other jobs being cancelled, including the Ubuntu job, due to the "fail-fast" property defaulting to true.
Hence to allow other jobs to go through, setting the fail-fast property to false:
- https://github.com/orgs/community/discussions/27192#discussioncomment-3254964
- https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstrategyfail-fast

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
The run failed again, not due to the ARM64 architecture but probably because macos-14 is not supported by the GitHub action.
The GitHub action for docker setup on MacOS mentions that only macos-12 and macos-13 versions are supported:
https://github.com/douglascamata/setup-docker-macos-action#currently-supported-public-runner-images

Hence, trying to change macOS to version 13 now.

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
Previous run failed yet again at the docker-verify step with this error:

```
  shell: /bin/bash --noprofile --norc -e -o pipefail {0}
/Users/runner/work/_temp/f2669228-43bc-4f11-9c38-82bd32c1e3b4.sh: line 1: docker: command not found
Error: Process completed with exit code 127.
```

Hence printing outputs as mentioned in the action Readme.md.

-----

Ah! I see the step to install Docker from the action was skipped, as I had missed changing the macOS version to v13 in the if condition.
Let's see if it works now.

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
macOS docker setup passed finally.

Increasing the timeout since the Downloading and Extracting steps themselves have taken about 20 minutes in the macOS runner.
But right now, even for demo-automated-testing.sh on macOS, it's taking close to 30 minutes just for the download + extraction to complete.

As seen in the commit history, when Shankari changed the timeout to 30 minutes, it was because the demo-iso15118-2-ac script itself was taking close to 30 minutes.
EVerest@05528d4

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
In cicd.yaml:
- Added my feature branches as well to test triggering of workflows during development. Added a TODO to remove these later.
- Commented out steps of the docker-build-and-push-images job since it would fail if the version TAG is unchanged from the existing pushed image.
- Added a POST request to trigger workflow_dispatch to e2etest.yml via GitHub REST API.

In e2etest.yml:
- Set the trigger to only workflow_dispatch and removed push, pull_request triggers since we want the tests to run on the latest pushed images only which will happen when first workflow completes successfully.
- But also to note that the first workflow pushes image only on certain conditions.
- Need to handle this as well.
- For now, I've done this by checking if event type is not PR (same check done in cicd.yaml).

Detailed discussion on this in PR for this development.

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
Previous run was able to dispatch the e2etest workflow, but not all tests passed on the macOS runner.

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
MacOS runner abruptly failed in the docker setup step just after 7 minutes.

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
Failure message:
```
Using variable interpolation `${{...}}` with `github` context data in a `run:` step could allow an attacker to inject their own code into the runner.
```

Found a fix which mentions using environment variables instead of accessing GitHub context variables directly in executable statements like the `run` statement:
cisagov/client-cert-update#53 (comment)

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
------

Created e2etest.yaml

Used the existing file template from this PR:
EVerest#23

--------

Currently testing only with ubuntu-latest OS.
The demo scripts used are:
- demo-automated-testing.sh

I have commented out the existing /demo-iso15118-2-ac-plus-ocpp201.sh since, as per the Readme.md, it requires arguments to be passed, and I am unsure which set of arguments is needed for the purpose of the workflow.
https://github.com/EVerest/everest-demo#step-1-run-the-demo

-----

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
…e + Added exit_code

Docker compose should now use the locally built image.
The problem was with ${DEMO_DIR}; directly using the docker compose file name works: ${DEMO_COMPOSE_FILE_NAME}.

Also added the --exit-code-from and --abort-on-container-exit flags so that the docker compose up command exits whenever any one container exits. In our testing case, that is when the manager container exits.

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
Was incorrectly using the demo-automated-testing.sh. No need to modify the docker compose directories, file names in there.
As noted in this comment by Shankari:
link: EVerest#62 (comment)

> The demo scripts are designed to be "single line demos" that people can run without having to check out any code

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
Using “test before push” workflow mentioned in docker build push action official documentation.
Link: https://docs.docker.com/build/ci/github-actions/test-before-push/

---

While this would work, a problem is that I need to put the docker compose step in the docker build push job.
This means, it would run for all matrix jobs.

In our case currently manager service image takes time to build, and manager depends on mqtt service image which is built pretty quickly. So we are good here.

But consider this scenario.
What if a matrix job B depends on matrix job A, but matrix job A hasn't finished building its image yet? Then matrix job B might use the older image which it would have pulled, and not the locally built / updated image.
In this case, the tests would run on an older image and could falsely pass.

---

For instance, in this test workflow run, I wanted to test the success / failure test case and used a different test script. This is copied over in the manager/Dockerfile.
But since the automated tests step is part of the same matrix job, it would run for all services and not just manager which has the entry point (run-test.sh) script for automated tests.

So when the mqtt service's matrix job also executed the docker compose command, it wasn't able to find the test script in the manager image.
The reason is that while I did include the test script in the manager/Dockerfile, that image was still being built, so the latest changes were not yet available.
So the matrix job for the mqtt service used the pulled image layers from the cache for the manager image (it pulled them in the first place since the manager image was used as part of the docker compose), and this pulled image did not have the test script file.

Hence the workflow failed saying no such file found:
https://github.com/MukuFlash03/everest-demo/actions/runs/9960438276/job/27519676597

----

One workaround I see then is to run the automated tests conditionally only for the manager service matrix job.

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
@MukuFlash03 (Author) commented Jul 16, 2024

@MukuFlash03 I am not sure why you need all these links. We use docker build and push, and docker build and push has support for testing in the middle. I linked to the documentation on testing. I would suggest you just use it and move on.

I've gone ahead and made the change to implement the test before push workflow inside the docker build push job itself.

I did take a look initially, and one reason I was trying an alternative method was that we are dealing with matrix jobs.

More details in this comment.

@MukuFlash03 (Author) commented

Concerns with Including Testing inside Matrix job

I've used the “test before push” workflow mentioned in the docker build-push action documentation.

While this would work, a problem is that I need to put the docker compose step in the docker build-push job.
This means it would run for all matrix jobs; a sketch of the pattern is below.
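For reference, the documented “test before push” pattern looks roughly like the sketch below; the compose-based test step is my adaptation, and the tag names are illustrative:

```yaml
# Hypothetical adaptation of docker/build-push-action's test-before-push example.
steps:
  - name: Build and load image for testing
    uses: docker/build-push-action@v5
    with:
      context: ./manager
      load: true                 # load into the local docker daemon for testing
      tags: ghcr.io/everest/everest-demo/manager:test
  - name: Run automated tests against the local image
    run: |
      docker compose --file docker-compose.automated-tests.yml up \
        --abort-on-container-exit --exit-code-from manager
  - name: Build and push
    uses: docker/build-push-action@v5
    with:
      context: ./manager
      push: ${{ github.event_name != 'pull_request' }}
      tags: ghcr.io/everest/everest-demo/manager:latest
```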


In our case currently manager service image takes time to build, and manager depends on mqtt service image which is built pretty quickly. So we are good here.


But consider this scenario.
What if a matrix job B depends on matrix job A, but matrix job A hasn't finished building its image yet? Then matrix job B might use the older image which it would have pulled, and not the locally built, updated image.
In this case, the tests would run on an older image and could falsely pass.


For instance, in this test workflow run, I wanted to test the success / failure test case and used a different test script. This is copied over in the manager/Dockerfile.
But since the automated tests step is part of the same matrix job, it would run for all services and not just manager which has the entry point (run-test.sh) script for automated tests.

So when the mqtt service's matrix job also executed the docker compose command, it wasn't able to find the test script in the manager image.


The reason is that while I did include the test script in the manager/Dockerfile, that image was still being built, so the latest changes were not yet available.
So the matrix job for the mqtt service used the pulled image layers from the cache for the manager image (it pulled them in the first place since the manager image was used as part of the docker compose), and this pulled cached image did not have the test script file.

Hence the workflow failed saying no such file found.


One workaround I see, then, is to run the automated tests conditionally, only for the manager service matrix job:

if: ${{ matrix.image_name == 'manager' }}
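In workflow terms, that workaround could look like this sketch (the step name and compose invocation are assumptions consistent with the rest of this thread):

```yaml
# Hypothetical conditional test step inside the matrix build job.
- name: Run automated tests (manager image only)
  if: ${{ matrix.image_name == 'manager' }}
  run: |
    docker compose --file docker-compose.automated-tests.yml up \
      --abort-on-container-exit --exit-code-from manager
```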

MukuFlash03 pushed a commit to MukuFlash03/everest-demo that referenced this pull request Jul 16, 2024
The previous PR commit history got messed up due to rebasing to handle the commits that had missing signoffs.
Created a new PR to bring over the final changes from there.

PR link: EVerest#62

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
@MukuFlash03 (Author) commented

The changes from this PR have now been moved over to a new PR: #67.
Closing this PR.

This was done to handle the commit history being messed up by the rebase commands to add signoffs to commits.
The issue is described in this comment.
