Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for Madmax plotter. #797

Merged
merged 21 commits into from
Jun 23, 2021

Conversation

guydavis
Copy link
Contributor

Hi! Great job on Plotman. I've been using Plotman in Machinaris (a WebUI) for a month now. Really solid stuff. However, the new hotness now seems to be this Madmax plotter, as requested in #762. I've been getting a lot of asks for it as an alternative to the official Chia plotter.

This basic PR adds optional support for the chia_plot executable when a plotting :=> type of madmax is added to the plotman.yaml. Tested within the Machinaris container (Ubuntu Hirsute) for plot, analyze, interactive, and status along with suspend, resume, and kill actions.

Please let me know where you think the PR could be improved. Thanks again!

@guydavis guydavis changed the base branch from main to development June 18, 2021 05:05
@guydavis
Copy link
Contributor Author

Some screenshots to illustrate:

image

image

image

image

@guydavis guydavis mentioned this pull request Jun 18, 2021
@altendky
Copy link
Collaborator

altendky commented Jun 19, 2021

Thanks a bunch for doing this work. I can't say I have gotten around to doing anything with madmax but... we all know it's the talk of the town right now. :]

A few thoughts:

I generally expect to want the code long term to be separated for each plotter. I know of one more being worked on already. Maybe I'll come up with a typing.Protocol to describe what is needed. Any bits that can be reused can be put in some common file each references. But, I don't think it's the best choice to hold madmax support up waiting for me to get this structuring done. So, let's keep this as is, but here's a heads up that I hope to reorganize the mixed code 'soon'.

For the configuration, I think it would be good to have a separate group for each. This will make it much more explicit for users and easier for us to implement the schema to check and complain about options that are not common. Though, it does seem that they stayed at least mostly compatible.

Unless madmax uses the chia cli parser, I think it would be good to copy theirs in just as we've done with each version of chia's plotter.

I think it would be good to indicate what sort of plot each one is. At present plotman will only be able to launch one or the other within a given session, but users could launch the other kind manually or in the future that could vary between multiple workflows (someday... someday). So... A new column with C or M to distinguish? I'm really on the fence on this since I started a branch to replace curses a month or two ago so putting time into curses seems half silly, but other stuff has gained priority and so for now we will continue with curses for a bit longer. There are a couple other PRs adding more information to the output as well which will make the used screen space non-trivially bigger. I expect to have to make the display configurable in response to this.

So, some of that is just context and some is requested changes. Here are the requested changes.

  1. Create separate chia and madmax plotting option configuration sections.
  2. Copy in the madmax cli parsing code and use it instead of the chia code. This will also involve adding a LICENSE-madmax or such along with the code.
  3. Add a column to the plot table indicating which plotter is in use.
  4. Add a changelog entry.
  5. Fix the type hints.

For all the other changes I mentioned, maybe you want to change my mind but for now I plan to keep them separate from this. If any of the three above seem like they would take enough calendar time that we don't want to wait for them, I'm game for discussing that.

Thanks again for your work on this contribution. I know lots of people appreciate it, including me.

@guydavis
Copy link
Contributor Author

Hi, thanks for the feedback. My initial PR above was definitely an attempt to make the smallest possible change while still maintaining compatibility. I agree that an eventual refactoring is a good idea though, to support additional plotters in the future.

So, I've just pushed an update to this PR which:

  1. Includes a Changelog entry for the PR.
  2. Adds a 'plotter' column to plotman status and plotman interactive. Values of either 'chia' or 'madmax'. Feel free to suggest alternate column title or values.

Examples of new column:
image
image

A couple of questions about how to improve on the other items:

  1. The configuration file changes are compatible with existing configs as the new type param defaults to chia. Some of the parameters (n_threads and n_buckets) are used by both plotters. By separating chia and madmax into different sections, wouldn't that break compatibility? As background, my day job is often spent listening to users complain about upgrade paths so I'm particularly leery of introducing such changes. :) That said, what is your preferred config layout for these two?
  2. Madmax is written in C++ and GPLv3 licensed. So, I'm not sure about about copying the madmax cli option parsing code in Plotman. What do you think would be the best approach?
  3. Finally, I did try to follow type hinting as best I could. It's a bit new to me though, so I'm happy to improve it. Any line numbers in particular to fix?

@altendky
Copy link
Collaborator

Yeah, I appreciate starting with a minimally intrusive PR. We'll get the rest in time.

  1. For a bit I tried to keep the configuration compatible but at this point I'm focusing on just trying to make things better. There will be some pain in the process but it's not like the existing configuration generally made things obvious for users. I've run tech support in the past and am getting to do so for plotman as well. Again, I appreciate your default of starting compatible. Definitely a preferred starting point. Anyways, we'll document it in the the wiki and move forward. People that haven't updated get an error with a link to the wiki. I was surprised how little response there was to the v0.4 configuration changes. But... maybe that's just people not upgrading. :]
  2. Yeah, so as soon as I walked away from submitting this request I realized I was making a silly presumption about it being Python. Anyways, doesn't look like they use getopt so no use trying to leverage any consistency there. I'm used to click so I would just write up a matching interface with that and start there. Leave the comments with links to their source for reference. No license needed I would think since the code won't bear any relationship beyond the arguments and the GPL doesn't leak through a CLI. Don't worry about the versioning like I have for the Chia Network plotter. I'll shift stuff around later and add that in. Not that we're leveraging it for anything yet.
  3. I'll go take a look at the hints now and leave inline comments etc.

Sorry you don't get build feedback straight away. A few weeks back GitHub decided to require approval for every CI run for PRs from new contributors. Apparently too many people were mining crypto in PRs... :]

src/plotman/job.py Outdated Show resolved Hide resolved
src/plotman/manager.py Outdated Show resolved Hide resolved
@guydavis
Copy link
Contributor Author

Sounds good. I've added a mapping of the MadMax plotter options to a separate file, using the Click pattern. Aligns with current version options:

image

Thanks for the typing fixes. I've applied them. Please let me know if there is anything further I can improve. Thanks!

@altendky
Copy link
Collaborator

I'm still interested in the separate configuration sections for the Chia Network plotter vs. madmax. Do you want to go back and forth over that or should I just make a few changes here and merge? Or, probably let you test afterwards to make sure... then merge.

@guydavis
Copy link
Contributor Author

guydavis commented Jun 22, 2021

I'm still interested in the separate configuration sections for the Chia Network plotter vs. madmax. Do you want to go back and forth over that or should I just make a few changes here and merge? Or, probably let you test afterwards to make sure... then merge.

Hi, I'm 100% good with you making the changes that best suit the overall configuration approach you want to use. I'm happy to test. Just let me know if I need to grant any permissions on Github or anything. Thanks!

@altendky
Copy link
Collaborator

Welp, it is passing type hints anyways. The new madmax CLI parsing wasn't in use yet, so I (hopefully) hooked that up. I need to write up a description of the change in the configuration wiki page but with a little luck you can figure it out until then. It is passing type hints so there's some hope, but I'll admit I haven't run it yet.

@guydavis
Copy link
Contributor Author

Welp, it is passing type hints anyways. The new madmax CLI parsing wasn't in use yet, so I (hopefully) hooked that up. I need to write up a description of the change in the configuration wiki page but with a little luck you can figure it out until then. It is passing type hints so there's some hope, but I'll admit I haven't run it yet.

Hi! Thanks for this. On initial test, this section of job.py is now looking for chia command parameters, from the madmax command and not finding them. I've added a print of the self.args, just before the index error:

image

Not sure, what the best way to split this behavior for the different chia and madmax args is in this section. Thoughts?

@guydavis
Copy link
Contributor Author

Okay, I've made a small fix to handle the two sets of job args. Tested with both madmax:

image

and stock chia:

image

I think we're good to go now. Thanks again!

Copy link
Collaborator

@altendky altendky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll commit these...

src/plotman/job.py Outdated Show resolved Hide resolved
src/plotman/job.py Outdated Show resolved Hide resolved
@altendky
Copy link
Collaborator

Alrighty, I'm working on the configuration wiki page... then maybe... this'll be ready. :]

@altendky
Copy link
Collaborator

Thanks again! Great to have this. Looks like they added pool support and maybe a couple other things, but I'll follow up on those separately. Cheers.

@altendky altendky merged commit b6b44a7 into ericaltendorf:development Jun 23, 2021
@wasdokij
Copy link

wasdokij commented Jun 24, 2021

I have a problem

5950x + 3* 2T nvme

Looking at my setup, the results are not what I expected

version: [2]

directories:
        tmp:
                - /ssd1/Plotter
                - /ssd2/Plotter
                - /ssd3/Plotter

        tmp_overrides:
                "/ssd1/Plotter":
                        tmpdir_max_jobs: 2

                "/ssd2/Plotter":
                        tmpdir_max_jobs: 2

                "/ssd3/Plotter":
                        tmpdir_max_jobs: 2

        dst:
                - /d/Plots
                - /e/Plots
                - /f/Plots
                - /g/Plots
                - /h/Plots

scheduling:

        tmpdir_stagger_phase_major: 2
        tmpdir_stagger_phase_minor: 1
        tmpdir_stagger_phase_limit: 1
        tmpdir_max_jobs: 2
        global_max_jobs: 6
        global_stagger_m: 5
        polling_time_s: 20

        type: madmax
  
        madmax:
                n_threads: 8         
                n_buckets: 256       

1624514385031

@Pencerval
Copy link

Pencerval commented Jun 24, 2021

Same here. I'm triggering the same issue. 2 different temp dirs only 1 used.
Also, looks like tmpdir_stagger_phase_major is not being used.

directories:
        # One or more directories to use as tmp dirs for plotting.  The
        # scheduler will use all of them and distribute jobs among them.
        # It assumes that IO is independent for each one (i.e., that each
        # one is on a different physical device).
        #
        # If multiple directories share a common prefix, reports will
        # abbreviate and show just the uniquely identifying suffix.
        tmp:
                - /plot_temp/1
                - /plot_temp/2

        # Optional: Allows overriding some characteristics of certain tmp
        # directories. This contains a map of tmp directory names to
        # attributes. If a tmp directory and attribute is not listed here,
        # it uses the default attribute setting from the main configuration.
        #
        # Currently support override parameters:
        #     - tmpdir_max_jobs
        #tmp_overrides:
                # In this example, /mnt/tmp/00 is larger than the other tmp
                # dirs and it can hold more plots than the default.
                #"/mnt/tmp/00":
                #        tmpdir_max_jobs: 5
`

`
scheduling:
        # Run a job on a particular temp dir only if the number of existing jobs
        # before [tmpdir_stagger_phase_major : tmpdir_stagger_phase_minor]
        # is less than tmpdir_stagger_phase_limit.
        # Phase major corresponds to the plot phase, phase minor corresponds to
        # the table or table pair in sequence, phase limit corresponds to
        # the number of plots allowed before [phase major : phase minor].
        # e.g, with default settings, a new plot will start only when your plot
        # reaches phase [2 : 1] on your temp drive. This setting takes precidence
        # over global_stagger_m
        tmpdir_stagger_phase_major: 4
        tmpdir_stagger_phase_minor: 1
        # Optional: default is 1
        tmpdir_stagger_phase_limit: 1

        # Dont run more than this many jobs at a time on a single temp dir.
        tmpdir_max_jobs: 2

        # Don't run more than this many jobs at a time in total.
        global_max_jobs: 4

        # Dont run any jobs (across all temp dirs) more often than this, in minutes.
        global_stagger_m: 1

        # How often the daemon wakes to consider starting a new plot job, in seconds.
        polling_time_s: 20
plot id   plotter    k           tmp           dst   wall   phase    tmp       pid   stat     mem   user    sys   io
9e735167    madmax   32   /plot_temp/   /mnt/hubusb   0:49     4:0   109G   2546991    SLP   4.2Gi   8:07   0:39   0s
                                   1/   1/disk1/plo
                                                ts/
6ece507b    madmax   32   /plot_temp/   /farm/1/dis   8:43     4:0   109G   2259637    SLP   5.4Gi   8:44   0:50   0s
                                   1/     k2/plots/

Total jobs: 2
Jobs in /plot_temp/1/: 2

Updated at: Thu Jun 24 10:03:22 2021

@altendky
Copy link
Collaborator

Thanks for sharing the problem. Issue reports moved to #807.

@Pencerval
Copy link

Pencerval commented Jun 24, 2021

We are talking about 2 different issues. 1 of them is reported on #807 -> Not being able to use more than 1 temp folder.
The second one is the scheduling not being properly handled. Looks like tmpdir_stagger_phase_major is not taken into account

@altendky
Copy link
Collaborator

Then file another issue I guess. (the message started with "Same here.")

@guydavis guydavis deleted the guydavis_madmax branch December 31, 2022 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants