Add [pinned, unpinned] matrix to ci_eval.yaml. #767

Draft: ScottTodd wants to merge 2 commits into main from users/scotttodd/requirements-matrix

Conversation

ScottTodd
Member

Progress on #760. Depends on #765.

This changes ci_eval.yaml (sharktank perplexity tests for llama) from testing:

  1. iree perplexity (unpinned iree)
  2. torch perplexity (unpinned iree-turbine)

to testing:

  1. iree perplexity (unpinned iree)
  2. iree perplexity (pinned iree)
  3. torch perplexity (unpinned iree-turbine)

The pinned versions are shared across the project. We could individually update pins per test workflow or package, but I'd much rather keep the pins shared by default and leave fragmenting as an escape hatch as needed. I can update the other scheduled workflows to match this style if the changes here look good.

Triggered a test run here: https://github.com/nod-ai/shark-ai/actions/runs/12642858744.
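For illustration, a minimal sketch of what the matrix change might look like. The job name, runner label, and install commands below are assumptions for readability, not the actual workflow contents; `requirements-iree-pinned.txt` is the pin file discussed in review below.

```yaml
# Hypothetical sketch only; not the actual contents of ci_eval.yaml.
jobs:
  test_perplexity_iree:
    runs-on: ubuntu-latest  # illustrative; the real job likely targets a GPU runner
    strategy:
      fail-fast: false
      matrix:
        version: [pinned, unpinned]
    steps:
      - name: Install pinned IREE packages
        if: matrix.version == 'pinned'
        run: pip install -r requirements-iree-pinned.txt
      - name: Install unpinned (nightly) IREE packages
        if: matrix.version == 'unpinned'
        run: |
          pip install --upgrade --pre \
            -f https://iree.dev/pip-release-links.html \
            iree-base-compiler iree-base-runtime
```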

@archana-ramalingam
Collaborator

archana-ramalingam commented Jan 7, 2025


The ci_eval.yaml perplexity test is intended to run on a larger number of prompts with all attention variants of all Llama models, resulting in up to 6 tests in the future. Depending on the model size, each may take anywhere from 1 to 12 hours to complete. Given that this is scheduled after the IREE nightly release at 3 AM PST, it might block dev resources in the morning. We might need to run these on dedicated CI machines if we want to choose the matrix route.
We could do unpinned for schedule and pinned for pull_request/push.
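A hedged sketch of that split, using GitHub's documented per-key `fromJSON` matrix pattern (the job layout and triggers here are illustrative, not the workflow's actual configuration):

```yaml
# Hypothetical sketch of selecting matrix entries by trigger event.
jobs:
  test_perplexity_iree:
    strategy:
      matrix:
        version: ${{ github.event_name == 'schedule' && fromJSON('["unpinned"]') || fromJSON('["pinned"]') }}
```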

@ScottTodd force-pushed the users/scotttodd/requirements-matrix branch from a0f261c to fa7ab97 on January 7, 2025 at 17:14
@ScottTodd
Member Author

If we need more runners, I think we have plenty of extra machines that we can plug in. At this point, with multiple workflows unstable across unpinned versions, my main priority is getting back to a state where we know what works on which versions.

When we have more tests that take longer, we can

  • add more hardware runners to take jobs
  • shard tests across runners (see the sketch after this comment)
  • make the tests faster

This matrix approach is a bit inflexible (all variants run whenever the workflow is triggered, individual jobs can't be run in isolation, canceling the job cancels all variants, etc.), but it is simple to understand and easy to add to more workflows.
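On the sharding point, one possible approach is a shard matrix plus the pytest-shard plugin. This is a sketch under assumptions, not what the project necessarily uses; the test path and shard count are illustrative.

```yaml
# Hypothetical sharding sketch; spreads the test suite across 4 runners.
strategy:
  matrix:
    shard: [0, 1, 2, 3]
steps:
  - name: Run sharded perplexity tests
    run: |
      pytest sharktank/tests/evaluate \
        --num-shards=4 --shard-id=${{ matrix.shard }}
```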

@marbre (Collaborator) left a comment:


Not sure about where to run those / the runner resources, but overall this looks good to me.

Review thread on requirements-iree-pinned.txt (outdated, resolved).
ScottTodd added a commit that referenced this pull request Jan 13, 2025
Progress on #760. We could make
the scheduled jobs test both pinned and unpinned versions like on
#767.

Cleanup included here:

* Dropped the "Installing the PyTorch CPU wheels saves multiple minutes
and a lot of bandwidth on runner setup." comments since they are
repetitive. Could add them back if people find them useful.
* Stopped installing from the root `requirements.txt` in some workflows,
  instead opting to just install from the more specific
  `sharktank/requirements-tests.txt`.

I did not test the changes to scheduled workflows. Could do that on
request, or just revert if we see issues.
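For illustration, a hedged before/after of the install-step change this commit describes. The "before" step is an assumption about the prior workflow contents, not a quote from it.

```yaml
# Before (assumed): pulled in the full root requirements.
- name: Install requirements
  run: pip install -r requirements.txt -r sharktank/requirements-tests.txt
# After: only the test-specific requirements.
- name: Install requirements
  run: pip install -r sharktank/requirements-tests.txt
```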