Add Gaussian noise to the dynamics for improved parameter identification #250

Draft · wants to merge 3 commits into master

Conversation

@moorepants (Member)

Huawei Wang has this paper:

"Identification of the human postural control system through stochastic trajectory optimization" https://doi.org/10.1016/j.jneumeth.2020.108580

It seems that the technique is very similar to our noncontiguous example, but he adds a small amount of fixed Gaussian noise to the equations of motion (different noise for each measurement trial). opty supports such an addition of noise via the known_trajectory_map, so I have tested that here. The result converges to the correct globally optimal solution even from an initial guess of zeros for the trajectories and with the lower bound of the parameters as their initial guess, but it also works without the noise. It isn't yet clear to me that this reaches the global solution any more reliably than what we have. It is worth investigating more, as the paper claims adding the noise improves things. I also need to confirm that what I'm doing is the same as what Huawei did.

cc @chrismo-schmidt, @Peter230655
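
For concreteness, here is a minimal sketch of the kind of setup described above, assuming a toy mass-spring-damper; all symbol names, the noise scale, and the node count are illustrative, not taken from this PR's diff:

```python
import numpy as np
import sympy as sm
import sympy.physics.mechanics as me

t = me.dynamicsymbols._t
x, v, F, e = me.dynamicsymbols('x, v, F, e')  # state, measured input, noise
m, c, k = sm.symbols('m, c, k')  # unknown parameters to identify

# mass-spring-damper dynamics with an additive process noise term e(t)
eom = sm.Matrix([
    x.diff(t) - v,
    m*v.diff(t) + c*v + k*x - F - e,
])

num_nodes = 500
rng = np.random.default_rng(89)

# one fixed Gaussian draw per collocation node; opty treats e(t) like any
# other known (specified) trajectory, so it goes in known_trajectory_map
# alongside the measured input when the Problem is constructed
known_trajectory_map = {
    F: np.zeros(num_nodes),  # stand-in for the measured input trajectory
    e: 2.0*rng.normal(size=num_nodes),  # noise_scale = 2.0, illustrative
}
```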

@Peter230655 (Contributor)

Could you send me the complete paper? I am not allowed to download it, as I do not belong to a university.
Thanks!

@moorepants (Member, Author)

@HuaweiWang, if you have a moment, could you confirm that I'm doing the same thing you do in your paper? I've made a small modification to this example: https://opty.readthedocs.io/latest/examples/plot_non_contiguous_parameter_identification.html by adding process noise to the dynamics.

@Peter230655 (Contributor)

Did you check whether Mr. Wang's noise will also give the correct parameters if you add bias to your measurements?
I think I found a way to handle bias and made a PR, but Mr. Wang's idea may be simpler. Of course I will test it.

@moorepants (Member, Author)

I saw your bias PR, but I'm not sure having biased measurements is so common (at least not large bias relative to the measurements).

@Peter230655 (Contributor)

I saw your bias PR, but I'm not sure having biased measurements is so common (at least not large bias relative to the measurements).

I agree, biases as large as I tested would make a measurement instrument useless. I just thought 'my' way would make the simulation insensitive to some bias in the measurements. I will test whether Mr. Wang's method gives similar results.

@moorepants (Member, Author)

Do you think your bias addition reduces the chance the optimizer finds a local minimum?

@Peter230655 (Contributor)

Do you think your bias addition reduces the chance the optimizer finds a local minimum?

I did not find this in my tests, and I ran quite a few. They always found the correct parameters, regardless of how much bias I added. The accuracy seemed in line with the simulation without bias, with errors maybe 10% of the correct value.
Of course I have no theoretical idea whether this should be so or not.
Right now I am setting up a simulation to see how Wang's method works.

@Peter230655 (Contributor)

My VERY initial findings with Wang's method are:

  • it does not work with (unrealistically) large bias; I am pretty sure of this.
  • it does not seem to help without bias. With a large noise_scale (I use the term you used) it seems to hurt.

I use my model with a spring coefficient, damping, and friction. The spring coefficient always seems to be estimated best.
Do you think this simple spring_damper_friction mass system is too simple a system to test these methods? If so, should I try a more complicated system, say with some rigid bodies involved?

@moorepants (Member, Author)

Do you think this simple spring_damper_friction mass system is too simple a system to test these methods?

If we can't get these things to work well with a simple system, I don't see how jumping to a complex system will help us see the forest for the trees.

@Peter230655 (Contributor)

Do you think this simple spring_damper_friction mass system is too simple a system to test these methods?

If we can't get these things to work well with a simple system, I don't see how jumping to a complex system will help us see the forest for the trees.

I just thought that maybe this simple system only has one minimum for (c, k, friction), so Wang's method cannot help jump out of a local minimum into a lower one.

@Peter230655 (Contributor) commented Sep 24, 2024

A test showed these results; the % numbers are the errors:

| noise scale | 0.0   | 1.0   | 2.0   |
|-------------|-------|-------|-------|
| damping     | 6.9%  | 9.7%  | 6.6%  |
| spring      | 0.01% | 0.31% | 0.64% |
| friction    | 10.0% | 20.7% | 24.8% |

So, at least in this simulation, barring programming errors, the addition of noise does not seem to help.

@moorepants (Member, Author)

If you read the paper, do you think I'm implementing it correctly? It seems too simple to be true.

@moorepants (Member, Author)

I just thought that maybe this simple system only has one minimum for (c, k, friction), so Wang's method cannot help jump out of a local minimum into a lower one.

Without the bounds and good guesses, I was regularly landing in local minima when we originally were playing around with the problem.

@Peter230655 (Contributor) commented Sep 24, 2024

If you read the paper, do you think I'm implementing it correctly? It seems too simple to be true.

I will. I implemented it the same way you did, in my slightly more complicated model with friction, and my 15-20 test runs showed no advantage; it actually got worse with a larger noise_scale.
Eq. (5) on page 11 looks exactly like what you did, except I have to find out what is meant by episode s. Is this the s-th measurement of the same thing, as I guess you assumed? Also, it says that $\theta$ is part of $x(t)$, and he only optimizes that part, like you optimize $x(t)$ only, not $\dot x(t)$.
Anyway, let me read it; hopefully I will understand better.

@moorepants (Member, Author)

Here is Huawei's code: https://github.com/Huawei-ClevelandStateUniv/Stochastic_Paper/blob/master/controller_identification/StandingModel_PD_Eps_Noise_SameInit.py#L306. It seems like the noise is added just like we are doing.

@Peter230655 (Contributor) commented Sep 24, 2024

Here is Huawei's code: https://github.com/Huawei-ClevelandStateUniv/Stochastic_Paper/blob/master/controller_identification/StandingModel_PD_Eps_Noise_SameInit.py#L306. It seems like the noise is added just like we are doing.

From figure 1 on page 7 it is clear what $\theta_a, \theta_h$ are.
The program also has omega_a and omega_h; I am not sure what they are.

We are adding the noise as if it were a random_force(t) acting on the system.
Is he doing the same? I am not very clear about his eoms.

@moorepants (Member, Author)

I would guess that omega_a = theta_a.diff(t).

@Peter230655 (Contributor) commented Sep 24, 2024

If I understood correctly, it is not exactly the same as what you are doing:

  • You want to find the values of parameters, just like he wants to find the $K_{i,j}$ parameters of eq. (1) or eq. (2). However, he has only one set of measurements ($\theta_m$), while we assume we have different sets of measurements.
  • He creates M sets of eoms by adding noise to each one. If his $f_i(x(t), \dot x(t), K_{i,j}, a) = fr + fr^{\star}$, then it seems the same as what you did. (It is unclear what $a$ is; page 10, where it may be explained, is unreadable.)

If I understood correctly, it is a different setup.
I have forgotten too much control theory to remember how to find out whether a controller like eq. (1) or eq. (2) is stable.

@moorepants (Member, Author)

Are you saying he fits multiple models to one set of measurements, with each model having different noise? I think I was interpreting it as multiple measurements (his episodes being multiple measurements).

@Peter230655 (Contributor) commented Sep 24, 2024

Are you saying he fits multiple models to one set of measurements, with each model having different noise? I think I was interpreting it as multiple measurements (his episodes being multiple measurements).

This is how I would read the very top of page 11. I guess by process noise signal he means the $n_s(0, \sigma)$.
Also, $\theta_m$ has no index to indicate different measurements.
In his objective function he sums over s, and only in $n_s(0, \sigma)$ does s occur. It is hard to believe that $\theta_m$ has an index s hidden somewhere.

@Peter230655 (Contributor)

I would guess that omega_a = theta_a.diff(t).

I wonder why he did not use sympy mechanics to set up the eoms.

@moorepants (Member, Author) commented Sep 24, 2024

I think he likely did.

@Peter230655 (Contributor)

I think he likely did.

Makes sense. From the function you sent, I could not tell.

@moorepants (Member, Author)

In equation 5:
[screenshot of equation (5) from the paper]

I would think that the sum over M in the objective would somehow indicate that this is a sum over the same measurements, i.e. the argument of the sum would not vary with s.

@moorepants (Member, Author)

Ah, I see what you are saying. The $\theta_m(t)$ in the objective does not have an $s$ subscript. So it is the same for all M episodes.

@Peter230655 (Contributor)

Ah, I see what you are saying. The $\theta_m(t)$ in the objective does not have an $s$ subscript. So it is the same for all M episodes.

Yes, this is what I think.

@Peter230655 (Contributor) commented Sep 24, 2024

If his $f_i(\ldots) = fr + fr^{\star}$, then he subjects the controller to random forces in addition to the measured one.
Intuitively this should help produce a controller that can handle such disturbances better, i.e. one that is more stable.

@tvdbogert (Member)

Yes, each episode s uses the same measurements in the objective, but each episode has its own trajectory $\theta(t)$, which is indexed with s. All M trajectories are optimized simultaneously.
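
In symbols, a plausible reading of eq. (5) based on this thread (reconstructed here, not copied from the paper; the horizon $T$ and the exact constraint form are assumptions):

$$\min_{\theta_s(t),\,K} \; \sum_{s=1}^{M} \int_0^T \big(\theta_s(t) - \theta_m(t)\big)^2 \, dt \quad \text{subject to} \quad f\big(\theta_s(t), \dot\theta_s(t), K\big) = n_s(0, \sigma), \quad s = 1, \dots, M$$

That is, one shared measurement $\theta_m(t)$, M episode trajectories $\theta_s(t)$ each satisfying dynamics perturbed by its own noise draw $n_s(0, \sigma)$, and one parameter set $K$ shared across all episodes.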

@moorepants (Member, Author)

I just pushed an update to this PR that keeps the measurements the same per episode and it seems to solve from an initial guess of all zeros with large bounds on the unknown parameters.

@moorepants (Member, Author)

I just pushed another change that creates two independent measurements and three episodes for each of the two measurements. This also works.

@Peter230655 (Contributor)

If I understand correctly, these $n_s(0, \sigma)$ are the next best thing to having different measurements.
That would make sense intuitively.
Still one question: I tried to estimate 3 parameters with 6 measurements. When I added $n_s(0, \sigma)$, the results got worse.

@moorepants (Member, Author) commented Sep 24, 2024

On this problem with a different random seed on each execution of the script, I'm getting about 1 in 6 do not converge to the global optimum.

@moorepants (Member, Author)

But that is with an initial guess of all zeros.

@moorepants (Member, Author)

The nice thing is that opty supports this as is! The only negative is the repetition in the equations of motion, which may cause some slowdown for systems with very large equations of motion.

@Peter230655 (Contributor)

The nice thing is that opty supports this as is! The only negative is the repetition in the equations of motion, which may cause some slowdown for systems with very large equations of motion.

This repetition of eoms is the same with Wang's method, is it not?
Broadly speaking: is this adding of noise the next best thing to having various measurements?

@moorepants (Member, Author)

@tvdbogert thanks for the tip! Nice idea you all came up with. We are doing some bicycle control experiments right now and amassing a bunch of experimental data to identify the rider's heading feedback control for use in multi-actor traffic simulations. We are hitting the same issues you and Huawei came upon (unstable solutions, not always global optimum). This may help us home in on consistent controller parameter identification.

@moorepants (Member, Author)

@Peter230655 you've got sharper eyes than I do! I blazed over that missing subscript s!

@moorepants (Member, Author)

Broadly speaking: is this adding of noise the next best thing to having various measurements?

I think there are two separable concepts. Firstly, at least for the experiments we do in my lab, we collect data from many trials of the same person controlling a bicycle. So I have lots of noncontiguous measurements where the rider is roughly using the same internal control algorithm. I want to know the best fitting control parameters for all of the noncontiguous measurements we take. So, your example solved that.

Secondly, we need to get reliable global optima from real measurements where we don't know what the optimal parameters are. Wang's paper helps with that.

But, if you are stuck with only a single (hopefully long) measurement, then Wang's method also helps you get to the global optimum.

So I don't think the addition of process noise is exactly equivalent, but it does seem to have some similar effects. I suppose having many noncontiguous measurements could be sufficient for getting to a global optimum and the process noise may not be needed. But using both might give the best of both solutions.

@Peter230655 (Contributor)

@Peter230655 you've got sharper eyes than I do! I blazed over that missing subscript s!

They are old, but they have time! :-)

@Peter230655 (Contributor) commented Sep 24, 2024

Makes sense.
This morning I made the mistake that, due to np.random.seed(...), I added the same $n_s(0, \sigma)$ to all equations.
I have now corrected this, but in my example (as many measurements as $n_s(0, \sigma)$) it still makes things worse. My objective function was likely also not correct.
I will change the code tomorrow so that these numbers are picked independently and see.
This was a REALLY interesting day for me! :-)

Dumb question: say I have x measurements, and I want to add y noises.
So I set up y eoms, and each of the y eoms is minimized against all x measurements?
Wang set up all y noisy equations against his one measurement, so this would just be an extension to x measurements(?)

@moorepants (Member, Author)

This was a REALLY interesting day for me! :-)

Yes, fun!

Dumb question: say I have x measurements, and I want to add y noises.
So I set up y eoms, and each of the y eoms is minimized against all x measurements?
Wang set up all y noisy equations against his one measurement, so this would just be an extension to x measurements(?)

The way I now have it is that there is a dynamical differential equation for each measurement and for each "episode" of that measurement. So if I have 10 noncontiguous measurements (where a measurement = a subset of all, or all, state trajectories) and I want to do 5 episodes, then I need 50 copies of the equations of motion.
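
A hypothetical sketch of that bookkeeping (the naming scheme and the mass-spring-damper dynamics are illustrative, not from this PR):

```python
import sympy as sm
import sympy.physics.mechanics as me

t = me.dynamicsymbols._t
m, c, k = sm.symbols('m, c, k')  # parameters shared across all copies

def eom_copy(x, v, F, e):
    # one mass-spring-damper copy with its own input F(t) and noise e(t)
    return sm.Matrix([
        x.diff(t) - v,
        m*v.diff(t) + c*v + k*x - F - e,
    ])

num_measurements = 10
num_episodes = 5

copies = []
for i in range(num_measurements):
    for j in range(num_episodes):
        # each (measurement i, episode j) pair gets its own state, input,
        # and noise trajectories, e.g. x_0_0(t); m, c, k stay shared
        x, v, F, e = me.dynamicsymbols(
            f'x_{i}_{j}, v_{i}_{j}, F_{i}_{j}, e_{i}_{j}')
        copies.append(eom_copy(x, v, F, e))

# 10 measurements x 5 episodes -> 50 copies of the dynamics stacked into
# one system that opty solves simultaneously
eom = sm.Matrix.vstack(*copies)
```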

@Peter230655 (Contributor)

I will change my simulation accordingly tomorrow.
I was afraid that this would be the answer...! :-)

@Peter230655 (Contributor) commented Sep 25, 2024

I am trying my simulation with a variable number of measurements and a variable number of repeats per measurement.
Hopefully I will have some results by this evening.

@Peter230655 (Contributor)

I made PR 253, but in examples-gallery, as I do not know where else to do it.

@Peter230655 (Contributor)

Yes, each episode s uses the same measurements in the objective, but each episode has its own trajectory $\theta(t)$, which is indexed with s. All M trajectories are optimized simultaneously.

Question:
Are these measurements publicly available?
If yes, is there a way I could get them? Just something to play around with in my retirement.
Thanks!

@moorepants (Member, Author)

Huawei released the data publicly for his paper. Is that the data you are asking about?

@Peter230655 (Contributor)

Huawei released the data publicly for his paper. Is that the data you are asking about?

Yes, this is what I meant. I just want to play around, using opty to find the $K_{i,j}$. As always with me, nothing serious, important, or urgent. How would I get these data?
Thanks!

@moorepants (Member, Author)

You'll have to search for the data. I think I saw that it was available (maybe in his dissertation). Also, the Park2004 model in this repository already implements his entire model except for the process noise part. That's why I opened #252. It uses generated data.

@moorepants (Member, Author)

I also don't intend to merge this example or mine, because it is just a duplicate of what is there. We should have a separate different example to show how to do Wang's method.

@Peter230655 (Contributor)

I also don't intend to merge this example or mine, because it is just a duplicate of what is there. We should have a separate different example to show how to do Wang's method.

I fully agree! I enjoy doing these simulations, but there is no point in adding something which is already there. I am not the least bit offended!!
Should I close the PR to avoid too much 'dead wood' in examples-gallery?

@moorepants (Member, Author)

You can leave it open while you work on it. Eventually I'll close both.

@Peter230655 (Contributor) commented Sep 26, 2024

You can leave it open while you work on it. Eventually I'll close both.

I am finished with both in their present stage. I am thinking about a simple model of that person standing on a treadmill, but only at an initial stage. I think 249 serves no practical purpose: it converges even with very large (unrealistic) biases, but only very slowly.
I'd like to use 253 to play around with a bit longer.

@moorepants (Member, Author)

I am thinking about a simple model of that person standing on a treadmill

This is already implemented in examples/Park2004.

@Peter230655 (Contributor)

I am thinking about a simple model of that person standing on a treadmill

This is already implemented in examples/Park2004.

I saw it, but it did not look 'simple' to me; surely it is realistic.
