V2 Updates #130

Open
rcoreilly opened this issue Oct 24, 2024 · 6 comments

Comments

@rcoreilly
Member

rcoreilly commented Oct 24, 2024

I'm using this ticket to take notes on further updates for v2 emergent.

  • While the etime.Train, Test Trial Epoch etc enums work well for many cases, sometimes they just don't fit, and it just seems very bad to have to shoe-horn into these predefined values. And all the redundant functions taking Scope inputs etc is bad. A better solution would be to just have the sim define its own enums (there are just a few -- would be copied boilerplate but still explicit), and all the standard code uses enums.Enum for everything. The ccnsims stroop model is a good illustration of this issue: had to use Validate for the SOA test case. In general, the no. 1 lesson is to have good simple interfaces that all the general library stuff consumes, instead of requiring specific types.

  • The looper GUI controls become increasingly unwieldy as you have more "Modes" beyond just Train and Test. Also the Init function needs to be somehow built into these controls -- otherwise it ends up being tacked on at the start or end. Soln: have a mode enum at the start, that should be rendered with the nice segmented button thing, that you click to set the mode, and have an integrated Init button that is also mode sensitive and takes a func pointer to actually perform the init functionality beyond the looper. Again stroop broke this paradigm.

  • In general, it would be good to revisit the looper api and make things simpler and more transparent, and provide good cheat sheet API docs -- I am always forgetting how to do stuff. Also, having things implicitly shared across loops in some cases is very bad -- it should never share across Stacks. And rename Stacks -> Modes so it is more clear what a stack is.

  • elog will be significantly updated based on the datafs framework, and estats will be replaced entirely by datafs and the updated tensor framework. Using goal, it should be possible for everything to be much simpler and more directly implemented so you can easily see how to compute your own stats at every step. The model of a single log item that generates everything across modes and levels is good for consolidating all the logic in one place, but is too magical -- needs to be more explicit. e.g., use a vararg list of modes instead of AllModes magic, which then entails all the NoPlot stuff. With datafs replacing the magical ctx context, hopefully everything is clearer -- need good goal code to make all that super clear, concise and efficient. Also absolutely need the plotting Option stuff specified cleanly with struct values: using closures like styler? If need to refer to a given stat in multiple places (e.g., for control) then use enums as a convention so that you're not typing string key names everywhere. But ideally most stats are specified exactly once in a monolithic closure that computes the value directly from the network and does all the aggregation, all in one switch statement organized by time scale. Run it once to create the values during Init? use Recycle logic? not clear about some of these things..

  • params don't use a string value. Does it need to be any or perhaps it should be a more constrained interface type, that can either be bool or float32 or enums.Enum? It would be good to constrain -- no strings actually used in real params! everything must be GPU compatible, but params remains CPU-side only.

@rcoreilly
Member Author

@kkoreilly reminded me that params are basically the same as styles in cogent core, and the same conclusion applies: the only way to cleanly get a direct tab-completable path into a nested structured object and set a value directly as a value is using closures (Stylers in core). So this is what we have to do in params. Getting away from the whole struct literal syntax will be nice too.
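
As a rough illustration of the closure idea (not the actual params API), the sketch below pairs a selector string with a function that sets fields directly on a nested struct. `LayerParams`, `Sel`, `Apply`, the field names, and the selector convention are all hypothetical names chosen for this example.

```go
package main

import "fmt"

// Hypothetical nested parameter structs, standing in for real layer params.
type ActParams struct{ Gain, Thr float32 }

type InhibParams struct {
	On bool
	Gi float32
}

type LayerParams struct {
	Act   ActParams
	Inhib InhibParams
}

// Sel pairs a selector with a closure that sets values directly on the
// struct, so every field path is type-checked and tab-completable.
type Sel struct {
	Sel string
	Set func(ly *LayerParams)
}

// exampleSheet is an assumed parameter sheet: "Layer" matches every layer,
// "#Name" matches one layer by name (an illustrative convention).
var exampleSheet = []Sel{
	{Sel: "Layer", Set: func(ly *LayerParams) {
		ly.Inhib.On = true
		ly.Inhib.Gi = 1.8
	}},
	{Sel: "#Hidden", Set: func(ly *LayerParams) {
		ly.Act.Gain = 100
	}},
}

// Apply runs every matching closure in order and returns how many matched.
func Apply(name string, ly *LayerParams, sheet []Sel) int {
	n := 0
	for _, s := range sheet {
		if s.Sel == "Layer" || s.Sel == "#"+name {
			s.Set(ly)
			n++
		}
	}
	return n
}

func main() {
	hid := &LayerParams{}
	n := Apply("Hidden", hid, exampleSheet)
	fmt.Println(n, hid.Inhib.On, hid.Inhib.Gi, hid.Act.Gain)
}
```

Because the values are set by ordinary assignment, no string or `any` value type is needed, and a param search could presumably generate such closures in a loop.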

Furthermore, there is a notional plan for a CoCo configuration system that also integrates yaegi interpretable code and goal transpiling to produce the best ever config file syntax. This would be for Config etc. So probably we want to go down that path first. It remains to be seen how this applies to the plot config stuff, but it probably does.

@rcoreilly
Member Author

The CCN sims have further pushed the issues with the log / stats side of things (see dyslexia.go). The following are some key points:

  • Everything is still way too distributed -- there are InitStats, TrialStats, StatCounters, ConfigLogs, Log -- crazy. Plot config is mostly separated from the core log item (except for min / max range). Need to have one function that does everything for a given log item, from computing to aggregating to configuring the plot.
  • There are often multiple stats that depend on a shared computational path, so the notion of a separate closure for each stat is not going to work for that case. Probably we just have a big list of generic functions that get called, and each function can do whatever it needs to do, whether that is multiple vs. single stats etc. Simple cases have wrapper functions and more complex cases are fully written. Each function is just a big switch statement, and it uses the context to get the mode and time to determine what to compute.
  • The question of initialization vs. computation is challenging: the GUI generally needs to have everything configured in advance so all the plots etc can be in place before you run the sim. When you have derived stats based on aggregating lower level things, it is not clear how that could be initialized properly? The plot details don't really need to exist in advance, so maybe we need to just list the plots we want explicitly, instead of having that be automatic? That would be more compact than current exclusions!
  • Also need a good compositional non-string(?) way of representing the combination of Mode and Time, so e.g., when specifying plots, you have vararg list of Train|Trial, Train|Epoch, Test|Trial etc. Don't want full bits level. Presumably just define modes as 100, 200, 300 etc and times are 1,2,3 etc and have supporting functions for pulling these out?
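
The "big list of generic functions" idea from the points above could look roughly like the following sketch. Everything here (`Stats`, `Key`, `StatFunc`, `errStats`, the 0.5 error threshold) is a made-up stand-in, not emergent code: one function computes two related stats from a shared value and switches on the level to decide between recording and aggregating.

```go
package main

import "fmt"

type Modes int

const (
	Train Modes = iota
	Test
)

type Levels int

const (
	Trial Levels = iota
	Epoch
)

// Key identifies one stat column without building string keys by hand.
type Key struct {
	Mode  Modes
	Level Levels
	Name  string
}

// Stats is a hypothetical store: one slice of values per key.
type Stats map[Key][]float64

func (st Stats) add(m Modes, l Levels, name string, v float64) {
	k := Key{m, l, name}
	st[k] = append(st[k], v)
}

func mean(vs []float64) float64 {
	if len(vs) == 0 {
		return 0
	}
	sum := 0.0
	for _, v := range vs {
		sum += v
	}
	return sum / float64(len(vs))
}

// StatFunc does whatever it needs for the given mode and level.
type StatFunc func(st Stats, mode Modes, level Levels, sse float64)

// errStats records SSE and a thresholded Err at the Trial level, and
// aggregates both into Epoch-level means, resetting the trial data.
func errStats(st Stats, mode Modes, level Levels, sse float64) {
	switch level {
	case Trial:
		isErr := 0.0
		if sse > 0.5 { // threshold is an arbitrary choice for this sketch
			isErr = 1
		}
		st.add(mode, Trial, "SSE", sse)
		st.add(mode, Trial, "Err", isErr)
	case Epoch:
		for _, name := range []string{"SSE", "Err"} {
			tk := Key{mode, Trial, name}
			st.add(mode, Epoch, name, mean(st[tk]))
			st[tk] = st[tk][:0]
		}
	}
}

// RunStats calls every registered function for the current mode and level.
func RunStats(funcs []StatFunc, st Stats, mode Modes, level Levels, sse float64) {
	for _, f := range funcs {
		f(st, mode, level, sse)
	}
}

func main() {
	st := Stats{}
	funcs := []StatFunc{errStats}
	for _, sse := range []float64{0.25, 1.0, 0.0, 0.75} {
		RunStats(funcs, st, Train, Trial, sse)
	}
	RunStats(funcs, st, Train, Epoch, 0)
	fmt.Println(st[Key{Train, Epoch, "SSE"}], st[Key{Train, Epoch, "Err"}])
}
```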

@kkoreilly
Member

An arbitrary base-10 compositional combination seems like a very bad idea that will not scale; why not just bit flags?

@rcoreilly
Member Author

It is not valid to specify multiple times or multiple modes -- only 1 mode and 1 time -- so bits would not work.
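
For concreteness, here is a minimal sketch of the base-10 scheme discussed above: modes are multiples of 100, times are small integers, and a combined value holds exactly one of each. The names `ModeTime`, `Combine`, `Mode`, and `Time` are hypothetical, and the scheme as written assumes fewer than 100 time values.

```go
package main

import "fmt"

// modeBase separates the mode "digit" from the time "digits".
const modeBase = 100

type Modes int

const (
	Train Modes = (iota + 1) * modeBase // 100
	Test                                // 200
	Validate                            // 300
)

type Times int

const (
	Trial Times = iota + 1 // 1
	Epoch                  // 2
	Run                    // 3
)

// ModeTime holds exactly one mode plus one time, e.g. Test|Epoch = 202.
type ModeTime int

// Combine builds the combined value; times must stay below modeBase.
func Combine(m Modes, t Times) ModeTime {
	return ModeTime(int(m) + int(t))
}

// Mode pulls the mode back out by dropping the time part.
func (mt ModeTime) Mode() Modes { return Modes(int(mt) / modeBase * modeBase) }

// Time pulls the time back out as the remainder.
func (mt ModeTime) Time() Times { return Times(int(mt) % modeBase) }

func main() {
	mt := Combine(Test, Epoch)
	fmt.Println(int(mt), mt.Mode() == Test, mt.Time() == Epoch)
}
```

Unlike a bit-flag set, a value of this kind cannot represent two modes or two times at once, which matches the constraint stated above.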

@rcoreilly
Member Author

V2 updates are all in place:

While the etime.Train, Test Trial Epoch etc enums work well for many cases, sometimes they just don't fit, and it just seems very bad to have to shoe-horn into these predefined values. And all the redundant functions taking Scope inputs etc is bad. A better solution would be to just have the sim define its own enums (there are just a few -- would be copied boilerplate but still explicit), and all the standard code uses enums.Enum for everything. The ccnsims stroop model is a good illustration of this issue: had to use Validate for the SOA test case. In general, the no. 1 lesson is to have good simple interfaces that all the general library stuff consumes, instead of requiring specific types.

Done -- works well for emergent infra to deal exclusively with enums.Enum values. looper is really now the only package of relevance.
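
A minimal sketch of the pattern, assuming a small local `Enum` interface as a stand-in (it is not the real enums.Enum definition, and `ScopeKey` is a hypothetical library function): the sim declares its own mode and level enums, and general code only ever sees the interface.

```go
package main

import "fmt"

// Enum is a hypothetical stand-in for the small interface that library code
// would consume; it is not the real enums.Enum definition.
type Enum interface {
	String() string
	Int64() int64
}

// Modes is declared by the sim, so it can include a case such as SOATest
// instead of borrowing an ill-fitting predefined value.
type Modes int32

const (
	Train Modes = iota
	Test
	SOATest
)

func (m Modes) String() string {
	return lookup("Modes", []string{"Train", "Test", "SOATest"}, int64(m))
}
func (m Modes) Int64() int64 { return int64(m) }

// Levels is likewise sim-defined boilerplate.
type Levels int32

const (
	Trial Levels = iota
	Epoch
	Run
)

func (l Levels) String() string {
	return lookup("Levels", []string{"Trial", "Epoch", "Run"}, int64(l))
}
func (l Levels) Int64() int64 { return int64(l) }

// lookup returns the name for index i, or a fallback for unknown values.
func lookup(typ string, names []string, i int64) string {
	if i < 0 || i >= int64(len(names)) {
		return fmt.Sprintf("%s(%d)", typ, i)
	}
	return names[i]
}

// ScopeKey plays the role of general library code: it only sees the interface.
func ScopeKey(mode, level Enum) string {
	return mode.String() + "/" + level.String()
}

func main() {
	fmt.Println(ScopeKey(SOATest, Trial))
}
```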

The looper GUI controls become increasingly unwieldy as you have more "Modes" beyond just Train and Test. Also the Init function needs to be somehow built into these controls -- otherwise it ends up being tacked on at the start or end. Soln: have a mode enum at the start, that should be rendered with the nice segmented button thing, that you click to set the mode, and have an integrated Init button that is also mode sensitive and takes a func pointer to actually perform the init functionality beyond the looper. Again stroop broke this paradigm.

Done -- updated ccnsims with this fix too.

In general, it would be good to revisit the looper api and make things simpler and more transparent, and provide good cheat sheet API docs -- I am always forgetting how to do stuff. Also, having things implicitly shared across loops in some cases is very bad -- it should never share across Stacks. And rename Stacks -> Modes so it is more clear what a stack is.

Done -- much improved and cleaner, with several renames, including using level instead of time which is much clearer and doesn't conflict with time package.

elog will be significantly updated based on the datafs framework, and estats will be replaced entirely by datafs and the updated tensor framework. Using goal, it should be possible for everything to be much simpler and more directly implemented so you can easily see how to compute your own stats at every step. The model of a single log item that generates everything across modes and levels is good for consolidating all the logic in one place, but is too magical -- needs to be more explicit. e.g., use a vararg list of modes instead of AllModes magic, which then entails all the NoPlot stuff. With datafs replacing the magical ctx context, hopefully everything is clearer -- need good goal code to make all that super clear, concise and efficient.

Done -- elog package is now not used, and having everything in one place is so much clearer and cleaner.

Also absolutely need the plotting Option stuff specified cleanly with struct values: using closures like styler? If need to refer to a given stat in multiple places (e.g., for control) then use enums as a convention so that you're not typing string key names everywhere. But ideally most stats are specified exactly once in a monolithic closure that computes the value directly from the network and does all the aggregation, all in one switch statement organized by time scale. Run it once to create the values during Init? use Recycle logic? not clear about some of these things..

Done -- used a Start vs Step enum to switch between init and aggregation, and added tensor.Append* methods as the main API to record new values. Start resets the number of rows to 0 and sets the plot metadata styler. One styler does everything at the tensor and overall plot level -- very clean.
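
The Start vs Step split might be sketched as follows. This uses a plain local `Column` type rather than the real tensor API, and `Phases`, `PlotStyle`, `Column.Append`, and `sseStat` are all hypothetical names for illustration.

```go
package main

import "fmt"

// Phases switches a stat function between initialization and recording.
type Phases int

const (
	Start Phases = iota
	Step
)

// PlotStyle is a hypothetical stand-in for plot styling options.
type PlotStyle struct {
	On       bool
	Min, Max float64
}

// Column is a hypothetical stand-in for a 1D tensor with plot metadata.
type Column struct {
	Name    string
	Values  []float64
	Stylers []func(s *PlotStyle)
}

// Append records one new value as a new row.
func (c *Column) Append(v float64) { c.Values = append(c.Values, v) }

// sseStat keeps init, recording, and plot config for one stat together.
func sseStat(ph Phases, col *Column, sse float64) {
	switch ph {
	case Start:
		col.Values = col.Values[:0] // reset number of rows to 0
		col.Stylers = []func(s *PlotStyle){func(s *PlotStyle) {
			s.On = true
			s.Min, s.Max = 0, 1
		}}
	case Step:
		col.Append(sse)
	}
}

func main() {
	col := &Column{Name: "SSE"}
	sseStat(Start, col, 0)
	for _, v := range []float64{0.9, 0.4, 0.1} {
		sseStat(Step, col, v)
	}
	ps := &PlotStyle{}
	for _, f := range col.Stylers {
		f(ps)
	}
	fmt.Println(len(col.Values), ps.On, ps.Max)
}
```

Running the Start phase once during Init creates the column and its styling up front, so the GUI can set up plots before any data exists.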

params don't use a string value... @kkoreilly reminded me that params are basically the same as styles in cogent core, and the same conclusion applies: the only way to cleanly get a direct tab-completable path into a nested structured object and set a value directly as a value is using closures (Stylers in core). So this is what we have to do in params.

Done -- closure works great and is a major improvement. Still need some example param search / tweak logic, but in general a closure will work great for this in a much more direct and flexible way.

One more TODO:

  • switch over to cli instead of econfig.

@rcoreilly
Member Author

cli done. List of packages to remove once everything is finalized:

  • econfig, efun (move to goal / tensor somewhere), elog, estats.
  • can keep etime but put a big disclaimer on it.

2 participants