Simulated Devices and Topology #788

Closed
lockshaw opened this issue Jun 24, 2023 · 0 comments

@lockshaw
Collaborator

Moved from #675

  1. @shrek
    Hi, is it possible to run FlexFlow on a real model but with simulated devices and topology? The result would be the "best strategy + device mapping". Some older code suggests that this was possible in the past, but the current code and tests seem to require real devices.

    I am thinking of a refactor where the Unity algorithms work on a device/topology abstraction (with operation costs, operation memory usage, and communication costs abstracted). This would decouple the algorithms from the devices and the distributed runtime.

  2. @lockshaw
    Yes, this is currently possible (see --search-num-nodes and --search-num-workers), though support for it will be improved in Repo Refactor #622 and Search Refactor #680.

  3. @shrek
    Thanks, let me try these out. I haven't looked at the refactor branches yet, but I think an ideal refactor would separate the algorithms, the devices and topology, and the distributed runtime into orthogonal, pluggable components.

    I would like to contribute towards this effort. I found a bunch of memory leaks (using valgrind) - perhaps fixing those can be my first contribution.

  4. @lockshaw
    @shrek Contributions are always welcome 🙂. If the memory leaks are in the simulator, they will likely still be around after Repo Refactor #622, so a fix would be much appreciated. If they are in other parts of the code, you may want to check out Repo Refactor #622 to see if that change fixes them.
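
To make the proposed device/topology abstraction concrete, here is a minimal Python sketch of what a search over simulated hardware could look like. Everything in it is hypothetical: `Op`, `Topology`, `SimulatedTopology`, `mapping_cost`, and `best_mapping` are illustrative names, not FlexFlow's API, and the brute-force search stands in for the real Unity algorithms. The `num_nodes` and `workers_per_node` fields only loosely echo the flags mentioned in the thread. The point is that the search sees nothing but abstract operation costs, memory usage, and communication costs, so it can run without any real devices.

```python
from dataclasses import dataclass
from itertools import product
from typing import Protocol, Sequence, Tuple


@dataclass(frozen=True)
class Op:
    """One operator of the model, described only by abstract costs."""
    name: str
    flops: float         # compute work
    memory_bytes: float  # memory the operator occupies on its device
    output_bytes: float  # data sent to the next operator in the chain


class Topology(Protocol):
    """Hypothetical interface: all the search needs to know about hardware."""
    def num_devices(self) -> int: ...
    def device_memory(self, device: int) -> float: ...
    def compute_time(self, op: Op, device: int) -> float: ...
    def transfer_time(self, nbytes: float, src: int, dst: int) -> float: ...


@dataclass(frozen=True)
class SimulatedTopology:
    """A made-up cluster of identical devices; no real hardware is touched."""
    num_nodes: int
    workers_per_node: int
    flops_per_sec: float
    memory_per_device: float
    intra_node_bw: float  # bytes/sec between devices on one node
    inter_node_bw: float  # bytes/sec between devices on different nodes

    def num_devices(self) -> int:
        return self.num_nodes * self.workers_per_node

    def device_memory(self, device: int) -> float:
        return self.memory_per_device

    def compute_time(self, op: Op, device: int) -> float:
        return op.flops / self.flops_per_sec

    def transfer_time(self, nbytes: float, src: int, dst: int) -> float:
        if src == dst:
            return 0.0
        same_node = src // self.workers_per_node == dst // self.workers_per_node
        return nbytes / (self.intra_node_bw if same_node else self.inter_node_bw)


def mapping_cost(ops: Sequence[Op], mapping: Sequence[int], topo: Topology) -> float:
    """Cost of running a chain of ops in order; inf if a device runs out of memory."""
    used = [0.0] * topo.num_devices()
    for op, dev in zip(ops, mapping):
        used[dev] += op.memory_bytes
    if any(u > topo.device_memory(d) for d, u in enumerate(used)):
        return float("inf")
    total = sum(topo.compute_time(op, dev) for op, dev in zip(ops, mapping))
    for i in range(len(ops) - 1):
        total += topo.transfer_time(ops[i].output_bytes, mapping[i], mapping[i + 1])
    return total


def best_mapping(ops: Sequence[Op], topo: Topology) -> Tuple[int, ...]:
    """Exhaustive search over device assignments (fine for toy sizes only)."""
    candidates = product(range(topo.num_devices()), repeat=len(ops))
    best = min(candidates, key=lambda m: mapping_cost(ops, m, topo))
    if mapping_cost(ops, best, topo) == float("inf"):
        raise ValueError("no assignment fits in device memory")
    return best
```

`best_mapping(ops, topo)` returns one device index per operator. Because the search depends only on the `Topology` protocol, a class backed by measurements from real hardware could be swapped in for `SimulatedTopology` without touching the search, which is the kind of orthogonality the thread describes.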

@lockshaw lockshaw added the question Further information is requested label Jun 24, 2023
@lockshaw lockshaw self-assigned this Jun 24, 2023
@lockshaw lockshaw closed this as completed Jul 5, 2023