Stochastic simulations (number of runs) - SimpleRepeatedTask #22
Comments
Mentioning @cjmyers, so that he is informed about updates on this issue. |
I am not super excited about having two ways to do exactly the same thing. I think this leads to incompatibilities between software tools, and confusion for our users about how to accomplish things. I also don't really buy the 'but repeated tasks are too complicated!' argument. If you're only going to support a small subset of repeated tasks for stochastic simulations, then just support that subset of RepeatedTask abilities, and move on. Just don't implement the parts you don't care about. My feeling is that once we solve the other end of things (namely, how best to treat the results of a repeated stochastic task in the output, and how to get means and stddevs from them), this end of things won't matter so much. |
Repeated tasks are NOT stochastic tasks. Repeated tasks are for sweeping parameters, etc. Their semantics are not consistent with stochastic tasks which do not change anything, but rerun with new random values. Repeated tasks semantics are to restart with new values. This would mean every task would be identical, since the SEED would reset each time. I agree with Lucian that we should not have two ways to do things. Repeated Tasks should be forbidden as a means for stochastic simulation for the reasons I just gave. If we are introducing many new types of Tasks as the new UMLs indicate, then I don't see any problem with making one of the new types of tasks a stochastic task. We have found in our experimentations that trying to shoe-horn repeated task to do stochastic simulation just does not work. |
I disagree with your semantic assessment of what a RepeatedTask is. In my view, a RepeatedTask is just a task that is repeated. The class is agnostic as to the reason (in my head, at least). My hypothesis as to why repeated tasks for stochastic simulation don't work for you is because of a lack of support on the output side. Let's get that end working (which will have to work with all repeated tasks of every stripe, including stochastic repeats), and then revisit this issue at that point, perhaps? |
That is incorrect. There are two reasons. One which I've already mentioned is that it is too heavy a hammer. It is an extremely complicated way to express repeat for N runs. However, the main reason is the fact that you must either set resetModel to true or false. If you set it to true, then you reset everything each time around, which would mean resetting the SEED too, so you get identical simulations each time. If you set it to false, then the initial values do not get set back to their initial value as they should. The way I see repeatedTask is that it enables you to potentially call a simulator like in a script to run a series of tasks. Each time you call the simulator you send it the Model (perhaps with changed parameter values) and the simulation options (including the SEED). Then you simulate. The simulator does not need to maintain any state information. StochasticSimulation is NOT like this. It is a single indivisible Task. Namely, you should send it to the simulator as one single task to execute. StochasticTasks are therefore Tasks, NOT RepeatedTasks. |
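The resetModel argument above can be illustrated with a toy sketch. It assumes, as the comment does, that a full reset also re-seeds the random number generator; `stochastic_run`, `repeat_with_seed_reset`, and `repeat_with_shared_rng` are hypothetical names, and the random walk is only a stand-in for a real stochastic simulator.

```python
import random


def stochastic_run(rng, n_steps=5):
    """Toy stand-in for one stochastic simulation: a random walk."""
    x, trace = 0.0, []
    for _ in range(n_steps):
        x += rng.random() - 0.5
        trace.append(x)
    return trace


def repeat_with_seed_reset(seed, n_repeats):
    # Assumption: a reset re-creates the generator from the same SEED
    # before every repeat, so every repeat sees the same random stream.
    return [stochastic_run(random.Random(seed)) for _ in range(n_repeats)]


def repeat_with_shared_rng(seed, n_repeats):
    # Seed once; the generator state carries over from one run to the next.
    rng = random.Random(seed)
    return [stochastic_run(rng) for _ in range(n_repeats)]


reset_runs = repeat_with_seed_reset(seed=42, n_repeats=3)
shared_runs = repeat_with_shared_rng(seed=42, n_repeats=3)
print(reset_runs[0] == reset_runs[1])    # identical trajectories
print(shared_runs[0] == shared_runs[1])  # distinct trajectories
```

Under this assumption the first variant yields the same trajectory every time, while the second yields a proper ensemble. Whether a SED-ML reset actually touches the seed is exactly what the thread is debating.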
How is the seed part of the model? This is a genuine question. You can't actually store it there, can you? What would be the purpose of ever storing the seed with a model? |
It is not in the model, but there is no other switch in repeatedTask that says anything about resetting a parameter or not. Anyway, this is not the point. The more important point is the semantics, which as I explained above, a StochasticTask is a Task, i.e., an indivisible action that must be considered to be run as a unit (this is the only way you can get proper stochastic behavior). A repeated task on the other hand is a set of separable tasks that can be run independently. Stochastic tasks are not truly independent. If you call a simulator with the same algorithm parameters multiple times, then you would be sending the same SEED over and over again and getting the same result. There is state that is preserved from one task to the next to ensure that the random number generator is not reset. There is currently no way that I'm aware of in RepeatedTasks to change algorithm parameters, and even if there were, this would cause one to attempt to encode changes to the SEED for each run for something as simple as running a set of stochastic simulation runs. Furthermore, it would not emulate what is actually happening in the simulation that treats this as a single complete simulation task. For all these reasons, we are simply using a single Task for stochastic simulation with an algorithm parameter setting the number of runs. The reason why we want a StochasticTask is to enable us to say that a StochasticTask actually is different from a Task in that it returns a 2-dimensional array of results over time and runs. Namely, StochasticTask gives us the ability to do better validation once we are able to access the results as arrays. |
So, what I hear you saying is that we need to say something about the random number seed in our explanation of the 'reset' parameter. (We clearly need to do this whether or not we introduce a separate StochasticTask task.) I would say that the most obvious thing to do would be to say "The random number seed never resets in the completion of a SED-ML experiment." Maybe we could introduce a special sedml-defined 'seed' csymbol (like we do for 'time') if people really wanted to reset the seed for some reason. (Or maybe there's a KiSAO term? Hmm.) |
It is not just about the SEED. The current approach that I've taken is fine, and as far as I can tell perfectly legal in SED-ML now. Namely, I have a Task and an algorithm parameter for the number of runs that task completes. I'm not going to change this to RepeatedTask because of the semantic problems I've just described. I want the simulator to be called exactly once. This is a Task, not a RepeatedTask. No solution for SEED is going to change the fact that I want to consider a stochastic run as a single indivisible task. However, once we consider the fact that Tasks can return arrays, we need a clean way to indicate how many dimensions to expect. A stochastic simulation will always return an array of values with two dimensions, so a StochasticTask that inherits from Task would allow us to specify that fact. Note that I'm not against RepeatedTasks. We use them for parameter sweeps. However, the Task in the parameter sweep may be a StochasticTask. In this case, we would not necessarily care if we did reset the SEED, since each StochasticTask run in this parameter sweep can be thought of as a single simulation, and starting with the same SEED might not be a bad thing in this case. |
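The validation point can be sketched as a dimension check on task results. This is only an illustration under assumptions: results are represented as nested Python lists, and `dimensions` and `validate_result` are hypothetical helpers, not part of any SED-ML library.

```python
def dimensions(data):
    """Return the nesting depth of a rectangular nested list (0 for a scalar)."""
    depth = 0
    while isinstance(data, list):
        depth += 1
        if not data:
            break
        data = data[0]
    return depth


def validate_result(result, expected_dims):
    """Raise ValueError if the result does not have the expected dimensions."""
    actual = dimensions(result)
    if actual != expected_dims:
        raise ValueError(f"expected {expected_dims} dimension(s), got {actual}")
    return True


# A plain task: one value per time point.
plain_task_result = [0.0, 0.5, 1.0]
# A stochastic task with two runs: indexed as [run][time point].
stochastic_result = [[0.0, 0.4, 1.1], [0.0, 0.6, 0.9]]

validate_result(plain_task_result, expected_dims=1)
validate_result(stochastic_result, expected_dims=2)
```

A tool that knows a task is stochastic could call the check with `expected_dims=2` before any plotting or post-processing.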
Wait, wait wait. You already have a solution? Why don't we just use that? It sounds like you don't need a StochasticTask; you need a way to indicate the dimensions of the returned array for an arbitrary Task. If you've already discovered one way to change the expected return dimensions of a Task, surely people will find other ways as we move forward. |
Ok, almost. I still need a proper KISAO term for number of runs. I'm currently using: KISAO:0000326 which is technically Number of Samples. It is the closest I could find. I need a term for Number of Runs. I've just submitted a ticket for a new term. Not sure if anyone is watching this tracker though as there is one open issue submitted in 2011. Even if we develop a different technique to indicate the number of dimensions for a task, I'm still concerned about people being confused and using RepeatedTask for stochastic simulation. Having a StochasticTask would make this clear. We will end up having this same discussion over and over again. Creating a StochasticTask would make it clear that this is what should be used for doing stochastic simulation. I could relent on this though, if the specification expressly points out that one should NOT use repeated tasks for stochastic simulation. |
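A minimal sketch of how a tool might read the number of runs from the algorithm parameters, assuming they are available as dictionaries with `kisaoID` and `value` keys (mirroring the attribute names of a SED-ML algorithmParameter). The helper `number_of_runs` is hypothetical, and KISAO:0000326 is used only because it is the stand-in term named in the comment above.

```python
# Stand-in term from the comment above ("Number of Samples"); a dedicated
# "Number of Runs" term would replace it once it exists.
NUMBER_OF_RUNS_TERM = "KISAO:0000326"


def number_of_runs(algorithm_parameters, default=1):
    """Return the requested number of runs, or `default` if none is given."""
    for parameter in algorithm_parameters:
        if parameter["kisaoID"] == NUMBER_OF_RUNS_TERM:
            return int(parameter["value"])
    return default


parameters = [{"kisaoID": "KISAO:0000326", "value": "100"}]
print(number_of_runs(parameters))  # 100
print(number_of_runs([]))          # 1
```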
The repeatedTask is exactly the same and also returns a 2-dimensional array of results. Personally I don't see any difference between a repeatedTask and a stochasticTask with multiple runs. Like Lucian said, it all comes down to dealing with the multi-dimensional data that repeatedTasks return.
Personally I also see the problem that there are suddenly multiple ways to do things, because most implementations of SED-ML already use repeatedTasks for stochastic simulations. There will also be the issue of distrib models run with deterministic algorithms (is this a stochastic task?) and of sampling from distributions to initialize models (is this a stochastic task?). The reasons are not really convincing:
This is wrong: the examples in the L1V2 specification clearly show that the SEED is not reset and that repeatedTasks are used for stochastic simulations.
It's a format read by computers. IMHO it does not make a big difference whether you read an annotation or iterate over a repeated task (you don't even have to support any of the functionalRanges, but just get the number of repeats out). Also, implementing the repeatedTasks would allow you to perform parameter scans, which you could not do right now. But I clearly see that a repeated task is something of an overkill.
which is a subclass of task. It is equivalent to a repeatedTask with no listOfChanges and a simple range 0, ..., numRepeats-1, and it allows only a single task to be performed. Personally I like this SimpleRepeatedTask, and people could easily map it to the repeated tasks in their implementations. M |
Again, my point is a Task is an indivisible computation. A stochastic simulation is an indivisible computation. It is not the complexity of RepeatedTask that is the main issue for me. It is the semantics problem of RepeatedTask referring to Tasks. Namely, a RepeatedTask is stating loop through these Tasks. I don't like SimpleRepeatedTask either, since it would presumably still refer to a Task that is being repeated. I would prefer to add "numRepeats" to the Task class. That is much simpler, and, more importantly, semantically correct. |
But adding the numRepeats is exactly what SimpleRepeatedTask does (or whatever is a good name for it). It is a subclass of task and adds the attributes numRepeats and reset (in case people don't want to reset the initial concentrations).
You need a new subclass where one can define the behavior (otherwise, what happens with the second run? Is everything reset or not?). The name SimpleRepeatedTask is probably confusing, because it is not a subclass of RepeatedTask; it is a Task which runs the simulation multiple times (as I understand it, this is exactly what you want). |
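The mapping described in this proposal (a task with numRepeats and reset, equivalent to a repeatedTask with no changes and a range 0, ..., numRepeats-1) can be sketched as follows. Both dataclasses are simplified, hypothetical stand-ins and not the actual SED-ML classes or any library's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepeatedTask:
    """Simplified stand-in for a SED-ML repeatedTask (assumption)."""
    sub_tasks: List[str]
    range_values: List[int]
    changes: List[str] = field(default_factory=list)
    reset_model: bool = True


@dataclass
class SimpleRepeatedTask:
    """Hypothetical class from the proposal: one task, numRepeats, reset."""
    task: str
    num_repeats: int
    reset: bool = True

    def to_repeated_task(self) -> RepeatedTask:
        # No listOfChanges, a plain range 0 .. numRepeats-1, one sub-task.
        return RepeatedTask(
            sub_tasks=[self.task],
            range_values=list(range(self.num_repeats)),
            changes=[],
            reset_model=self.reset,
        )


repeated = SimpleRepeatedTask(task="task1", num_repeats=3).to_repeated_task()
print(repeated.range_values)  # [0, 1, 2]
```

An implementation that already supports repeated tasks could, under these assumptions, accept the simpler construct by converting it on load.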
I'm okay with StochasticTask still for the name, since even if you are doing multiple runs of an ODE simulation with random distributions for the initialAssignments, this is still a stochastic task. |
If the task is not stochastic, there would be little point to repeat. |
How would you easily check that it is not stochastic:
- it could be an SBML model with a distrib construct (for parameter) run
with a deterministic algorithm (I would call this a deterministic
simulation with stochastic initialization)
- there could also be some AlgorithmParameter which makes the simulator
return different results every time (not necessarily stochastic). Something
like cycleInitialConditions
I agree most of the use cases are stochastic, but I could imagine that some
use cases come up where `StochasticTask` would not be such a good name.
How about `IteratedTask`, perhaps `SampledTask`?
If we can agree that the construct is what you want, i.e. a subclass of
task with numRepeats (and reset), then we can argue about the name :)
--
Dr. Matthias König
Junior Group Leader LiSyM - Systems Medicine of the Liver
Humboldt-University Berlin, Institute of Biology, Institute for Theoretical
Biology
https://www.livermetabolism.com
[email protected]
Tel: +49 30 20938450
Tel: +49 176 81168480
|
There are two different types of 'stochastic', in this case. A simulation with a 'stochastic' KiSAO term means 'treat the reactions in a stochastic manner'. In this way, each repeat of the simulation is different, if there are any active reactions in the model. But a second kind of stochastic model uses 'distrib' or the like to set values during the simulation. In that case, any repeated simulation of the model, regardless of the KiSAO term used, would produce a different result. And in fact, you might want to run either a stochastic-reaction run of a model with 'distrib'-set parameters, or a deterministic-reaction run of a model with 'distrib'-set parameters. Another option that might work is if we just adjusted 'repeatedtask' slightly to allow just a 'numRepeats' attribute instead of a 'range' reference: https://docs.google.com/drawings/d/1CbShcFxJYWyrAOmp6YnUpxeE8Pr3ij3Z1AQeVNY3C_4/edit Alternatively, we could create a simpler 'UniformRange' object (https://docs.google.com/drawings/d/195Wmeo8WtE6daf80rLELxOP_jdclZbiRtadzq_vew5Q/edit) with just one attribute: 'numberOfPoints'. |
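The two kinds of 'stochastic' can be contrasted with a toy decay model. This is a sketch under assumptions: `deterministic_decay` and `stochastic_decay` are hypothetical functions, the uniform draw merely imitates a 'distrib'-set parameter, and the per-molecule coin flip is a crude stand-in for a real stochastic algorithm.

```python
import math
import random


def deterministic_decay(k, x0=100.0, times=(0.0, 1.0, 2.0)):
    """Deterministic solution of dx/dt = -k*x at the given time points."""
    return [x0 * math.exp(-k * t) for t in times]


def stochastic_decay(k, rng, x0=100, n_steps=2, dt=1.0):
    """Crude stochastic treatment: each molecule decays independently."""
    p = 1.0 - math.exp(-k * dt)  # decay probability per molecule per step
    x, trace = x0, [x0]
    for _ in range(n_steps):
        x -= sum(1 for _ in range(x) if rng.random() < p)
        trace.append(x)
    return trace


rng = random.Random(7)
# Deterministic algorithm, but the parameter is sampled before every run.
sampled_runs = [deterministic_decay(k=rng.uniform(0.1, 1.0)) for _ in range(3)]
# Fixed parameter, but the reaction itself is treated stochastically.
reaction_runs = [stochastic_decay(k=0.5, rng=rng) for _ in range(3)]
```

In both cases repeating the run is meaningful, although only the second one would carry a 'stochastic' KiSAO term.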
Lucian: I think I'm sounding like a broken record, but RepeatedTask for this is a non-starter. It is not the complexity of RepeatedTask, but its semantics. What I want is a Task that has a number of runs, period. This is the only thing that makes sense semantically. Having a RepeatedTask that is simpler does not change the fact that it is acting on Tasks and repeating them. Matthias: I'm okay with "SampledTask" deriving from Task with NumberOfRuns as an added variable. |
@luciansmith I prefer the solution of: give the users what they want, i.e. a I had some informal agreement with Chris: If we make this happen in a timely manner, they will implement data reading in iBioSim (L1V3 data). |
Well, yes: we're circling back to the original disagreement where you think RepeatedTask means one thing, and I think it means something else. However, more importantly, handling the post-task data is obviously a problem for everyone, and needs to be addressed. Similarly, we need to handle 'seed' semantics, so everyone can be on the same page with that, too. If we must define repeated tasks three different ways, I would vote that the third way simply be the addition of an optional attribute 'numRepeats' on Task. No need to sub-class anything. (There is zero way that I can think of to implement Chris's suggestion that we somehow forbid people from using the RepeatedTask construct to repeat tasks.) |
I prefer "NumberOfRuns" rather than "NumRepeats", since the latter makes it sound like it is a "RepeatedTask", where we already have some confusion about its meaning. I was not saying to "forbid" people. I was saying to make it clear in the specification that RepeatedTasks are not appropriate for stochastic simulation. Essentially, it should be made clear that in a RepeatedTask, for example, all the algorithm parameters including the SEED are re-assigned at the beginning of each Task. |
As I understood you the major reason of your dislike for a new
Most of the current SED-ML implementations only implement the In my opinion the introduction of |
I read this issue to try to better understand the intended meaning of Regarding whether multiple stochastic runs can be described with L1V3 RepeatedTask, assuming the resetModel issue is addressed, I think multiple stochastic runs can be adequately described with RepeatedTask. While RepeatedTask doesn't provide the simplest possible syntax for describing multiple stochastic runs, to me, its semantics seem consistent with multiple independent stochastic runs. It seems to me that the central point of discussion in this issue is about the semantic interpretation of the SED classes.
It seems to me that part of the reason for diverging opinions arises from SED currently being a middle ground between the two extremes outlined above. Most of the SED classes are focused on capturing computational operations. This is exemplified by AlgorithmParameter, which uses KiSAO terms to define its semantic meaning. But the simulation classes (SteadyState, OneStep, and UniformTimeCourse) convey specific semantic meaning about the computation. Because SED takes an intermediate approach with some degree of semantic meaning, I don't think it's 100% clear which classes/attributes have specific semantic meanings and which do not. One place where these competing visions are particularly relevant is the discussion about how to apply SED to other modeling frameworks such as logical modeling. Similar to what Chris advocates here, #8 advocates for several additional classes for logical simulation, each of which would function substantially similarly to an existing class. Similar to what Chris advocates, this would create simpler syntax for specific types of simulations. However, this would come at the cost of increasing the complexity of SED. In turn, this would likely result in further fracturing of software support for SED, making it less likely that simulation experiments (SED files) can be ported from one tool to another. To keep SED as simple as possible, to make SED as easy as possible for software developers to implement, and to maximize the portability of SED documents between tools, I would vote for keeping SED as free of semantic meaning as possible and using ontology terms (e.g., KiSAO) to describe the semantic meaning of instances of SED classes. This requires more KiSAO terms, but new terms are easy to add. For example, we've recently added many new terms.
To avoid confusion about the semantic meaning of SED classes and attributes, ideally I would also vote to remove the existing semantic meaning from SED (e.g., replace UniformTimeCourse with a Simulation class and encode initialTime, outputStartTime, etc. into AlgorithmParameters with appropriate KiSAO ids). Rather than using these classes (OneStep, SteadyState, UniformTimeCourse) to describe specific combinations of parameters that simulation tools should recognize, BioSimulators provides a place for simulation tools to advertise the KiSAO ids they support for each algorithm. This is much more flexible than what's possible with the three existing semantically-motivated simulation classes (OneStep, SteadyState, UniformTimeCourse). I think removing semantic meaning from SED would also make the interpretation of the remaining classes clearer, addressing the central issue discussed here. |
I support @jonrkarr's proposal. I think keeping the classes simple, with semantics covered in parameters, will increase the use and applicability of SED-ML. This is in line with my advocacy for using an algorithm parameter rather than a repeated task for stochastic runs. It will also help library development, which is stalled in some instances, such as Java. |
We have already proceeded in this direction -- combining the existing SED classes with KiSAO terms to create semantic meaning for simulations and providing a more flexible place to advertise how investigators can use the particular combinations supported by each tool. This has enabled us to use SED with a broader range of simulations:
|
This issue summarizes the information about the |
Just as a note--at this point, there are no interpreters that support the SimpleRepeatedTask (and I believe Chris is supporting his mode through the use of KiSAO terms, so even he has no real reason to support them, either). So as it stands right now, it will probably be dropped from the specification (due to MSB by the end of July). |
Done (but reversible) with 99cdcf8 |
Issue
A highly requested feature is better/simpler support for stochastic simulations, which are currently quite complicated to express with repeatedTasks. It should be simple to just define the number of runs for a stochastic simulation (instead of the overhead of a repeated task).
In addition, it must be possible to easily calculate summary functions over the runs of the stochastic simulation, such as the mean, variance, and standard deviation. The individual runs must also be easily indexable.
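A minimal sketch of the summary functions requested here, assuming the results of a multi-run task are available as a nested list indexed as `runs[run][time_point]`. The data and the helper `summarize` are made up for illustration, and population (rather than sample) variance is an arbitrary choice.

```python
from statistics import mean, pstdev, pvariance

# Made-up results of three runs, each with three time points.
runs = [
    [0.0, 1.0, 2.0],
    [0.0, 3.0, 4.0],
    [0.0, 2.0, 6.0],
]


def summarize(runs):
    """Compute per-time-point mean, variance and std across all runs."""
    columns = list(zip(*runs))  # one tuple of values per time point
    return {
        "mean": [mean(column) for column in columns],
        "variance": [pvariance(column) for column in columns],
        "std": [pstdev(column) for column in columns],
    }


summary = summarize(runs)
print(summary["mean"])  # [0.0, 2.0, 4.0]
print(runs[1])          # an individual run stays directly indexable
```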
Examples
Proposals
adjust 'repeatedtask' slightly to allow just a 'numRepeats' attribute instead of a 'range' reference:
https://docs.google.com/drawings/d/1CbShcFxJYWyrAOmp6YnUpxeE8Pr3ij3Z1AQeVNY3C_4/edit
create a simpler 'UniformRange' object (https://docs.google.com/drawings/d/195Wmeo8WtE6daf80rLELxOP_jdclZbiRtadzq_vew5Q/edit) with just one attribute: 'numberOfPoints'.
create a simple repeated task:
Related Issue
edit: I opened a separate issue for the math related part ( #53 )