DSE: Estimate RAM usage and adjust the batch size/other parameters accordingly #230

Open
yannickl96 opened this issue Jul 12, 2020 · 8 comments

@yannickl96
Contributor

yannickl96 commented Jul 12, 2020

Using the information from https://www.xilinx.com/products/design-tools/vivado/memory.html, it would be possible to implement a pessimistic estimate of how many parallel jobs a machine can handle for a DSE. Because Tapasco by default starts as many Vivado processes as there are CPU cores available on the machine, it is quite easy to fill up the RAM when doing a DSE, especially on larger chips.
This would also reduce the probability that people (who have never used the DSE from Tapasco before) crash servers, etc.

Implementation would be something like:
Read the available memory and look up the worst-case memory per Vivado instance, then compute #jobs = available memory / worst-case memory per job.
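
A minimal sketch of that calculation in Python, assuming Linux (`MemAvailable` from `/proc/meminfo`). The function names are made up for illustration and are not Tapasco code, and the worst-case figure per job has to be supplied by the caller (for example from the Xilinx page); none is hard-coded here.

```python
import re


def parse_mem_available_kib(meminfo_text: str) -> int:
    """Return the MemAvailable value (in KiB) from /proc/meminfo-style text."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found in meminfo text")
    return int(match.group(1))


def read_available_gib(path: str = "/proc/meminfo") -> float:
    """Read the currently available memory in GiB (Linux only)."""
    with open(path) as f:
        return parse_mem_available_kib(f.read()) / (1024 ** 2)


def max_parallel_jobs(available_gib: float,
                      worst_case_gib_per_job: float,
                      cpu_count: int) -> int:
    """#jobs = available memory / worst-case memory per job, capped by CPU count.

    Returns at least 1 so that a single run is still attempted even when
    the estimate says the machine is too small (a design choice of this sketch).
    """
    if worst_case_gib_per_job <= 0:
        raise ValueError("worst_case_gib_per_job must be positive")
    by_memory = int(available_gib // worst_case_gib_per_job)
    return max(1, min(cpu_count, by_memory))
```

On a real machine this would be called as `max_parallel_jobs(read_available_gib(), worst_case, os.cpu_count() or 1)`, with `worst_case` chosen per target device.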

@jahofmann
Contributor

Yes, that would be awesome. However, we actually had exactly this implemented before and it turned out way too pessimistic. For instance, we couldn't build US+ bitstreams on our 32GB development machines.

If you can improve on that it would certainly be very useful. You might be able to find the old commits in the log.

@yannickl96
Contributor Author

Did you implement something like the naive calculation I proposed in my original post, or a smarter solution? Another option would be to first do a run with batch size 1, look at the LUT/CLB utilization (since that is the main culprit for memory consumption), and then increase the batch size according to the utilization and the estimated memory needed.
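
Roughly what I have in mind for the second variant, as a Python sketch. It assumes the peak memory of the batch-size-1 run has already been measured somehow; `batch_size_from_calibration` and the default `safety_factor` of 1.5 are placeholders for illustration, not tuned values.

```python
import math


def batch_size_from_calibration(available_gib: float,
                                peak_gib_single_run: float,
                                cpu_count: int,
                                safety_factor: float = 1.5) -> int:
    """Derive a batch size from the measured peak memory of one calibration run.

    The measured peak is inflated by safety_factor because later runs in the
    DSE may need more memory than the calibration run did.
    """
    if peak_gib_single_run <= 0:
        raise ValueError("peak_gib_single_run must be positive")
    if safety_factor < 1:
        raise ValueError("safety_factor must be >= 1")
    per_job_gib = peak_gib_single_run * safety_factor
    by_memory = math.floor(available_gib / per_job_gib)
    # Never exceed the core count, and always allow at least one job.
    return max(1, min(cpu_count, by_memory))
```

How the safety factor should depend on LUT/CLB utilization is left open here; a fixed factor is only the simplest possible choice.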

@jahofmann
Contributor

To be honest, I don't remember. The implementation back then was from Jens and I was just the one that made him delete it again...

I just checked for the commits and I fear they were lost in the big "squashening". It seems like Jens used actual numbers from the Lichtenberg:

commit 5029ff2f7851727f93a95f31e316c9fee5411193
Author: Jens Korinth <[email protected]>
Date:   Thu Jul 6 18:18:30 2017 +0200

    Remove resource check for memory

    * memory usage estimates are based on numbers reported by the
      Lichtenberg cluster, but are too conservative for normal users
    * removed check for memory entirely

I can give you access to the old TPC repository if you're interested.

@yannickl96
Contributor Author

I don't have time right now, but maybe @tkay94 can take a look when he's finished his current task.

@wirthjohannes
Collaborator

I doubt it's possible to get an estimate that is precise enough to be useful. From my observations (mainly US+ devices), the memory consumption does not only depend on the number of LUTs/CLBs but also changes drastically with a lot of other factors.
Just to give an example: Increasing the "freedom" of Vivado for placing and routing (e.g. by adding Registers to some connections) seems to have a big impact on the memory usage.
I think there are way too many factors like this with a big impact to reliably estimate the memory usage. And you'd probably also need to tune this for every supported platform...

@yannickl96
Contributor Author

@JoJoWi But wouldn't you be able to get this information by monitoring the memory usage of one run with Vivado configured with the degrees of freedom that you mentioned? I think the main difficulty here would be aggregating the memory consumption of Vivado and all its child processes.
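
For the aggregation part, a rough Python sketch, assuming Linux and `/proc/<pid>/status` (`PPid` and `VmRSS` fields). The function names are hypothetical. Note that summing RSS counts pages shared between processes more than once, so the result is an upper bound rather than an exact figure.

```python
import os
from collections import defaultdict
from typing import Dict, Tuple


def snapshot_linux() -> Dict[int, Tuple[int, int]]:
    """Map pid -> (parent pid, RSS in KiB) for all processes visible in /proc."""
    procs = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # the process exited while we were scanning
        ppid = int(fields.get("PPid", "0").strip())
        # Kernel threads have no VmRSS line; treat them as 0 KiB.
        rss_kib = int(fields.get("VmRSS", "0 kB").split()[0])
        procs[int(entry)] = (ppid, rss_kib)
    return procs


def tree_rss_kib(root_pid: int, procs: Dict[int, Tuple[int, int]]) -> int:
    """Sum the RSS of root_pid and all of its descendants in the snapshot."""
    children = defaultdict(list)
    for pid, (ppid, _rss) in procs.items():
        children[ppid].append(pid)
    total = 0
    seen = set()
    stack = [root_pid]
    while stack:
        pid = stack.pop()
        if pid in seen or pid not in procs:
            continue
        seen.add(pid)
        total += procs[pid][1]
        stack.extend(children[pid])
    return total
```

Polling `tree_rss_kib(vivado_pid, snapshot_linux())` during a run and keeping the maximum would give the peak for that run.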

@m-ober
Contributor

m-ober commented Jul 16, 2020

> As Tapasco by default starts as many Vivado processes as there are CPU cores available

Another problem is that Tapasco does not consider hyper-threading. I opened an issue (#52), which was closed. I still think a sane default would be to only use physical cores.
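
For illustration, counting physical cores could look like the following Python sketch. It assumes a Linux x86 `/proc/cpuinfo` with `physical id` and `core id` fields; `count_physical_cores` is a made-up name, not something Tapasco provides.

```python
def count_physical_cores(cpuinfo_text: str) -> int:
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo-style text.

    Returns 0 if the fields are missing (e.g. on some VMs or non-x86
    machines), in which case the caller should fall back to the logical count.
    """
    cores = set()
    # Each logical CPU is described by one block; blocks are separated
    # by a blank line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        if "physical id" in fields and "core id" in fields:
            cores.add((fields["physical id"], fields["core id"]))
    return len(cores)
```

Two hyper-threads on the same core share both IDs, so they collapse into a single entry in the set.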

> This would also reduce the probability that people (who have never used the DSE from Tapasco before) crash servers, etc.

Well, another approach could be to monitor the free memory on the system and just let Tapasco kill Vivado (and cancel the current job) if it reaches a lower limit. At least that's what I did (using a Bash script) in order to avoid crashing machines :)
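
My script was Bash, but the same idea as a Python sketch looks roughly like this. It assumes a POSIX system (process groups, `SIGKILL`) and, for the default reader, Linux's `/proc/meminfo`; `run_with_memory_guard` and its parameters are hypothetical names, and nothing here is wired into Tapasco's job handling.

```python
import os
import re
import signal
import subprocess
import time
from typing import Callable, List


def read_mem_available_kib(path: str = "/proc/meminfo") -> int:
    """Return MemAvailable in KiB (Linux only)."""
    with open(path) as f:
        match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", f.read(), re.MULTILINE)
    if match is None:
        raise RuntimeError("MemAvailable not found")
    return int(match.group(1))


def run_with_memory_guard(cmd: List[str],
                          min_available_kib: int,
                          read_available_kib: Callable[[], int] = read_mem_available_kib,
                          poll_seconds: float = 5.0) -> int:
    """Run cmd and kill its whole process group if available memory gets too low.

    Returns the exit code of cmd; a negative value means it was killed by
    that signal number.
    """
    # start_new_session puts the child into its own process group, so that
    # killpg also reaches the processes it spawns.
    proc = subprocess.Popen(cmd, start_new_session=True)
    while proc.poll() is None:
        if read_available_kib() < min_available_kib:
            try:
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group vanished between poll() and killpg()
            proc.wait()
            break
        time.sleep(poll_seconds)
    return proc.returncode
```

The memory reader is passed in as a parameter so that the limit logic can be tried out without actually exhausting memory.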

Guessing the memory usage of a Vivado run looks impossible to me, especially once OOC synthesis is used (Vivado then also exceeds the limits listed on the Xilinx page).

@wirthjohannes
Collaborator

You could probably get a worst case estimate like that (at least if you know all factors with an impact on memory usage).
However, the number you get this way will be much bigger than what you typically need.
But an estimate which is reasonably precise is (in my opinion) not possible/feasible.
