Hyperparameters for biped walk #24

Open · apirrone opened this issue Aug 6, 2024 · 42 comments

@apirrone

apirrone commented Aug 6, 2024

Hello!

I'm trying to use AMP_for_hardware to learn a nice walk on a small bipedal robot. I have produced motion example data in the same format as you used here.

I tried running training with pretty much the same hyperparameters as in your example a1_amp_config.py. I had to tweak a few things to match the size and weight of my own robot, but not much.

To validate that there is no bug in my adaptations, I first train with only a forward-walking example motion, only positive x linear velocity commands, and no noise/randomization. The robot progressively learns to move forward a little bit, but by shaking its feet and making small jumps, not by imitating the walking example motion. Here is an example (after 1000 steps with 8000 envs):

amp_for_hardware-2024-08-06_15.53.49.mp4

And here is what the reference motion looks like (using your replay_amp_data.py script)

amp_for_hardware-2024-08-06_15.55.33.mp4

The training curves look like this:

image

I am able to run your a1_amp environment with the provided checkpoint and it runs great, it's very fun to control it with a joystick :)

My questions are:

  • Did you use the parameters in a1_amp_config.py to train the provided policy? Meaning only the velocity tracking rewards and amp_reward_coef = 2.0?
  • Do you think the behavior I get is a symptom of a bug or of bad parameters?
  • Do I just need to train for much longer?
    • I tried letting the training run overnight, and I did not get much better results.

Thank you very much!

@WoohyunCha

I have been training a bipedal robot as well.
Although my case is a bit different: setting the velocity tracking reward scales to (1.0, 0.5) instead of the 1.0/(0.005*6) or so leads to the robot following the reference motion too much. No matter what command I give, it moves at the same speed.
Increasing the velocity tracking reward scale or the task_lerp does help the robot track commands better, but it leads to instability such as shaking or simply tipping over.
Someone else in my lab has succeeded in training a policy for the same robot using AMP without much tuning effort, but their code is based on rl_games instead of legged_gym.
I hope my case helps you in solving your problem.

@apirrone
Author

apirrone commented Aug 7, 2024

That's very interesting, thank you @WoohyunCha!

I think I may still have some bugs: even when I keep only the AMP reward (no velocity tracking) and only the forward walk motion example, the robot has trouble learning something clean. There is shaking and it's not really a regular walking gait.
Before trying AMP_for_hardware, I was using IsaacGymEnvs' implementation of AMP, and in their documentation they say that the disc_grad_penalty parameter is critical:
image

If I am not mistaken, this parameter is not exposed in AMP_for_hardware, so I dug into the code, found what I think is the relevant term (in amp_ppo.py), and exposed it as a parameter so I could play with it, but I am not sure it's the right one.
Capture d’écran du 2024-08-07 10-27-26

@WoohyunCha Did you tune this parameter as well, or did you keep it at 10? In this repo https://github.com/inspirai/MetalHead they seem to use a very low disc_grad_penalty of 0.01.
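For reference, a minimal sketch of what such a discriminator gradient penalty usually looks like in PyTorch (identifier names here are illustrative, not the exact ones in amp_ppo.py):

import torch
from torch import autograd

def compute_grad_pen(disc, expert_state, lambda_gp=10.0):
    # Penalize the discriminator's gradient at expert samples (AMP-style).
    expert_state = expert_state.detach().clone().requires_grad_(True)
    disc_out = disc(expert_state)
    grad = autograd.grad(
        outputs=disc_out.sum(),
        inputs=expert_state,
        create_graph=True,
    )[0]
    # Squared gradient norm weighted by the hard-coded "10" discussed above;
    # whether the 1/2 factor from the paper is folded into lambda_gp is exactly
    # the question raised later in this thread.
    return lambda_gp * grad.pow(2).sum(dim=-1).mean()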

Also, I record my motion examples at 60 fps. My understanding is that AMPLoader interpolates the example motion to match the dt set in the config. Is that right?

@WoohyunCha

WoohyunCha commented Aug 7, 2024

Hi.
I figured lambda = 10 is reasonable because all the papers I have read use 10. However, the effective weight should be 5, because the papers formulate the penalty with a factor of 1/2:
image
So I just divided it by 2, which still did not make things any better.
image

About the motion loader, my understanding is also the same, so the fps shouldn't be much of a problem. My reference motions are at 2000 fps and my policy runs at 250 Hz, but the imitation worked just fine.

My colleague who has succeeded in training a stable policy also used IsaacGymEnvs. I tried to match the parameters in this framework to those of IsaacGymEnvs, but the training did not improve at all. One key difference was that IsaacGymEnvs has a parameter named "disc_coef", which scales up the amp_loss and grad_pen_loss.
image
It is usually set to 5 in IsaacGymEnvs, which means if you set disc_grad_penalty to 5, the w_gp from the paper would be 50, which is much higher than what the papers suggest. Maybe this is why the lower disc_grad_penalty works.
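For reference, a hedged reconstruction (in LaTeX) of the AMP discriminator objective the screenshots above most likely show, with w_gp being the gradient-penalty weight under discussion:

\arg\min_{D}\;
  \mathbb{E}_{d^{M}(s,s')}\!\left[(D(s,s') - 1)^{2}\right]
+ \mathbb{E}_{d^{\pi}(s,s')}\!\left[(D(s,s') + 1)^{2}\right]
+ \frac{w_{\mathrm{gp}}}{2}\,
  \mathbb{E}_{d^{M}(s,s')}\!\left[\left\lVert \nabla_{\phi} D(\phi)\big|_{\phi=(s,s')} \right\rVert^{2}\right]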

@apirrone
Author

apirrone commented Aug 7, 2024

Ok, I'll try introducing the disc_coef parameter and keep playing with disc_grad_penalty.

Thank you for your valuable help @WoohyunCha!

@WoohyunCha

Also, my colleague says including the b_loss from IsaacGymEnvs is important, so maybe take a look into it as well.
Thanks for the discussion! It would be really helpful if you could tell me what worked once you succeed :)

@apirrone
Author

apirrone commented Aug 7, 2024

I'll look into the b_loss!

Yeah, I'll report back here, it could be useful for other people too :)

@apirrone
Author

apirrone commented Aug 7, 2024

So I have been playing with disc_coef, disc_grad_penalty, amp_task_reward_lerp, different example motion speeds, the scale of the tracking reward, the stiffness and damping parameters, and action_scale, and I can't get anything very good. I have implemented the b_loss but have not tried it on a long run yet.

This makes me think there is a bug somewhere, or that some of the parameters are much too far from what they should be. @WoohyunCha @Alescontrela any ideas?

@apirrone
Author

apirrone commented Aug 7, 2024

I added clipping for the observations (5) and actions (1) like in IsaacGymEnvs, and it seems to have a drastic effect on learning behavior. I'm starting to see things I like :)

amp_for_hardware-2024-08-07_22.57.30.mp4

This is with only motion imitation (amp_task_reward_lerp = 0) and not a lot of steps. I'm going to run a training with the velocity tracking overnight and hope for the best 🤞
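For reference, the clipping in question is roughly the following (a minimal sketch mirroring the IsaacGymEnvs defaults mentioned above; the names are illustrative, not the exact ones in this repo):

import torch

CLIP_ACTIONS = 1.0        # IsaacGymEnvs default
CLIP_OBSERVATIONS = 5.0   # IsaacGymEnvs default

def clip_actions_and_obs(actions: torch.Tensor, obs: torch.Tensor):
    # Clip actions before they are turned into torques, and observations
    # before they are fed to the policy.
    return (
        torch.clip(actions, -CLIP_ACTIONS, CLIP_ACTIONS),
        torch.clip(obs, -CLIP_OBSERVATIONS, CLIP_OBSERVATIONS),
    )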

@WoohyunCha

Congrats @apirrone !!
I have been running things overnight too and finally made the policy imitate the reference motion and act differently according to the velocity commands, although it does not track them perfectly. At least it tries to walk faster at higher commands and stands still at zero commands.
It seems like either one of bound_loss or disc_grad_penalty was indeed the most crucial parameter.
image
image

Also, about clipping the observations and actions,
could you tell me more about how you implemented the clipping and scaling of observations?
By "like in IsaacGymEnvs", do you mean you keep running estimates of the mean and variance of the observations and then normalize them, instead of using fixed observation scales?

@apirrone
Author

apirrone commented Aug 8, 2024

Well this night's run did not yield the results I hoped for. Still no clean walk :'(

I'll keep playing with the parameters, and try b_loss out.

Concerning the obs and action clipping, I just set action clipping to 1 and observation clipping to 5, like the defaults in the IsaacGymEnvs env I based mine on.
image

Btw, how did you find the values for min_normalized_std? I just set it to None.

@apirrone
Author

apirrone commented Aug 8, 2024

So I've been messing around all day, no luck.

Now I'm trying to match all the parameters I had in IsaacGymEnvs, but I don't know if it's such a good idea, since the a1_amp example of this repo works just fine...

If anyone has any ideas of things I could try, that would be great. I'm a little out of ideas 😅

@WoohyunCha

WoohyunCha commented Aug 9, 2024

Hi.
I successfully trained my policy to both imitate the reference motion and track velocity commands in a pretty stable manner. The training nearly converges at 3000 iterations, so the training stability itself is pretty good too.

tocaboi_walk_05m.mp4

I actually did not change anything in min_normalized_std (it's the default value from this repository)
image
Don't mind the wgan part as I'm still working on it.

These are the parameters I used to train the stable version. After implementing the bound loss and disc_coef along with the smaller disc_grad_pen, changes in other parameters did not affect the training that much. So maybe try my parameters? I have updated my version on my GitHub repo, so you might as well check that.

If using the same parameters as me does not work, maybe we should also take a look into the reference motions and the commands you are giving. Could you describe what kind of reference motions you are using, how you set the policy observations and the discriminator states, and the ranges of commands (velocities) you are giving?

And also, I haven't run the a1_amp example. Does it work without all the additional stuff we have discussed so far (like disc_coef or bound loss)? Matching everything with IsaacGymEnvs (especially the loss functions) worked for me, so I think it's worth a try.

@apirrone
Author

apirrone commented Aug 9, 2024

That's a great result @WoohyunCha, congrats!

I have tried with pretty much the exact same parameters as in your screenshot, but it wouldn't imitate the motion properly.

I generate the reference motion using a procedural walk engine; here is an example:

amp_for_hardware-2024-08-09_10.02.39.mp4

I know these motions are somewhat realistic, as I can make the robot walk in MuJoCo using them:

amp_for_hardware-2024-08-09_10.04.27.mp4

I also know I formatted them correctly for AMP_for_hardware's motion_loader.py, because I can use the provided replay script:

amp_for_hardware-2024-08-06_15.55.33.mp4

For now, I'm only trying to make it walk forward properly, so my velocity command is always 0.1 m/s (I looked at the average linear velocity in MuJoCo when walking forward).

I matched the actual torque of the motors I'm using in the real robot (given by the specs); they are not very powerful. I was thinking maybe the robot is simply underactuated and it's hard to follow this walk properly, but I was able to learn a very clean walk with these specs in IsaacGymEnvs, so I don't think it's that:

isaac_walk-2024-07-16_17.52.32.mp4

I'll keep trying here, but at some point I will probably go back to IsaacGymEnvs.

@BruceXing24

@WoohyunCha can you share your reward scale?

@WoohyunCha

I used 50 for linear velocity tracking and 25 for angular velocity tracking.

@kaydx1

kaydx1 commented Aug 12, 2024

@apirrone in the resample_commands function, velocities smaller than 0.2 are set to zero, so I assume you are always feeding a zero command to the policy.
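For reference, the thresholding in legged_gym's command resampling looks roughly like this (paraphrased from memory, so treat it as a sketch):

# Commands whose horizontal norm is below 0.2 m/s are zeroed out, so a constant
# 0.1 m/s forward command would never actually reach the policy.
self.commands[env_ids, :2] *= (torch.norm(self.commands[env_ids, :2], dim=1) > 0.2).unsqueeze(1)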

@apirrone
Author

Hi @kaydx1, yes, I noticed that recently and changed it to 0.01, but still no luck. Thank you for your help, don't hesitate if you have another idea :)

@kaydx1

kaydx1 commented Aug 12, 2024

@apirrone Check the _get_noise_scale_vec function as well, to see if it corresponds to your action space (if you are using noise).

@apirrone
Author

I removed all noise and randomization for now, trying to debug the root of the issue. But at first I had issues with base mass randomization, which would add or remove up to 1 kg, and my robot weighs 1 kg :)

@apirrone
Author

@kaydx1 @WoohyunCha Do you think I could have simulation stability issues because of the low weight of my robot?

I set the max torque to the value given by the specs of the real motors I use (0.52 Nm), but I have trouble finding relevant stiffness and damping parameters.

Also, I tried implementing the same custom PD control as in AMP_for_hardware in IsaacGymEnvs, and I have similar issues. With Isaac Gym's PD controller (using gym.set_dof_position_target_tensor()) I can learn a pretty clean forward walk, but if I switch to gym.set_dof_actuation_force_tensor() with the torques computed as below, it never learns to walk properly.
Capture d’écran du 2024-08-13 11-57-58
Capture d’écran du 2024-08-13 11-58-19

So I guess my issues could come from simulation stability (dt? decimation?) or control parameters?
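For reference, the explicit PD torque computation in those screenshots is presumably the legged_gym-style one, roughly like this (a hedged sketch; names follow legged_gym conventions rather than the exact code here):

import torch

def compute_torques(actions, default_dof_pos, dof_pos, dof_vel,
                    p_gains, d_gains, action_scale, torque_limits):
    # Scale the policy action, add it to the default joint angles, and apply a PD law.
    actions_scaled = actions * action_scale
    torques = p_gains * (actions_scaled + default_dof_pos - dof_pos) - d_gains * dof_vel
    # Saturate at the motor torque limits (0.52 Nm for these motors).
    return torch.clip(torques, -torque_limits, torque_limits)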

@kaydx1

kaydx1 commented Aug 13, 2024

@apirrone You could try manually investigating the outputs of both the position- and torque-controlled policies, as well as the torques you are sending in the compute_torques function. Maybe it will give some insights (for example, whether your torque always hits the limit or is too small). You can also try a bigger action scale (0.5 or 1), and maybe increase the clipping parameters for actions and observations. Regarding stability, if your sim.dt is small enough (0.005 for example) that shouldn't be the issue.

@apirrone
Author

Good idea, I'll try that

@apirrone
Author

Do you know how to get the computed torques out of the default PD controller? I did this:

self.gym.refresh_dof_force_tensor(self.sim)
torques = self.gym.acquire_dof_force_tensor(self.sim)
torques = gymtorch.wrap_tensor(torques).view(self.num_envs, self.num_dof)

But I think it returns the net torques applied on each joint, and in the case of a "working" walk the motor torques mostly compensate gravity, which gives this:

Capture d’écran du 2024-08-13 13-54-38

Here, "custom" is with the custom PD controller that uses gym.set_dof_actuation_force_tensor() (the robot does nonsense).

@kaydx1

kaydx1 commented Aug 13, 2024

@apirrone No, I don't have experience with gym.set_dof_position_target_tensor(). So that means you learn the right motor position commands, but when you compute the torques from these positions it doesn't work? Right now I have no ideas.

@apirrone
Author

My understanding is that when the robot is standing up, the policy has learned to output actions such that the torques sent to the motors compensate the torques created by gravity (with the legs flexed), so that the total sum of torques is close to zero. I think that is what torques = self.gym.acquire_dof_force_tensor(self.sim) shows. But when I run the same policy with the custom PD control, the torques don't just compensate gravity (which is expected, as the policy was not trained with this control type).

I don't know how I can get the torques that are applied to the motors by the policy when using Isaac's PD control.

@kaydx1

kaydx1 commented Aug 16, 2024

@apirrone I also forgot about default_dof_drive_mode in the asset config. Did you check it?

@apirrone
Author

I set it to None; by default it was effort. Does this make a difference?

@kaydx1

kaydx1 commented Aug 16, 2024

@apirrone https://forums.developer.nvidia.com/t/difference-between-set-dof-actuation-force-tensor-and-set-actor-dof-position-targets/271432/4
I saw this discussion, and maybe if your drive mode is set to position control but you are using, for example, force control (set_dof_actuation_force_tensor), then there is a conflict.

@Alescontrela
Owner

@apirrone Very cool robot! Can't wait to see it walking properly. Are you entirely sure that the reference motion is correct? Not just the joint angles, but also the velocities / joint velocities. Also, the mapping between reference motion joints and robot joints is crucial. Here is how I would recommend debugging that:

  1. Play the reference motion open loop in legged_gym (like you did in MuJoCo) and record all the relevant information
  2. Plot the open loop data and the reference motion data and compare the two

If they don't line up then there's a good chance something is wrong
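A minimal sketch of that comparison, assuming both trajectories were logged to hypothetical .npy files (the file names and the 0.005 s time step are just placeholders):

import numpy as np
import matplotlib.pyplot as plt

logged_dof_pos = np.load("openloop_dof_pos.npy")   # recorded while replaying open loop in legged_gym
ref_dof_pos = np.load("reference_dof_pos.npy")     # the corresponding frames from the motion file

t = np.arange(logged_dof_pos.shape[0]) * 0.005      # placeholder time step
for j in range(logged_dof_pos.shape[1]):
    plt.figure()
    plt.plot(t, logged_dof_pos[:, j], label="open loop (sim)")
    plt.plot(t, ref_dof_pos[:, j], "--", label="reference motion")
    plt.xlabel("time [s]")
    plt.ylabel(f"dof {j} position [rad]")
    plt.legend()
plt.show()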

@WoohyunCha That motion looks awesome!

@apirrone
Author

Hi @Alescontrela !

I'm pretty sure the reference motion is correct. I did not plot it, but I printed the difference between the data and the actual motion in Isaac in replay_amp_data.py, like this for instance:

Capture d’écran du 2024-08-17 10-34-02

And the difference is 0

Capture d’écran du 2024-08-17 10-33-38

In my understanding, this means that the mapping is correct too, right?

A little update: this is where I'm at now:

bdx_amp_for_hardware-2024-08-17_10.44.04.mp4

I tuned the control parameters better, so there is way less shaking, which is good :) But as you can see, the robots are still not really walking ^^

Thank you for your help!

@apirrone
Author

I double checked everything:

  • I record the joints in the correct order for Isaac Gym
  • I record in the same format as set there
    Capture d’écran du 2024-08-17 16-39-43
  • legged_robot.py:get_amp_observations() returns the values in the same order as motion_loader.py:feed_forward_generator() (which is used in amp_ppo.py)

Do I need to check something else?

I noticed that when replaying using replay_amp_data.py, the motion is quite a bit slower than what I recorded. Could that be a clue to what's going on? For reference:

Using replay_amp_data.py:

bdx_amp_for_hardware-2024-08-17_16.43.58.mp4

Using a script of mine (which replays in real time):

bdx_amp_for_hardware-2024-08-17_16.45.00.mp4

If I adjust env.dt or decimation, replay_amp_data.py plays faster or slower, but shouldn't it be agnostic to dt? I mean, the motion loader should compute the motion such that it is replayed in real time regardless of dt, right?
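For reference, a dt-agnostic replay would look frames up by elapsed time and interpolate, rather than stepping one frame per policy step. A minimal sketch (the function and variable names are illustrative, not the loader's actual API):

import numpy as np

def get_frame_at_time(frames, frame_dt, t):
    # frames: (N, dim) array recorded every frame_dt seconds; t: query time in seconds.
    idx = t / frame_dt
    i0 = int(np.floor(idx))
    i1 = min(i0 + 1, len(frames) - 1)
    blend = idx - i0
    return (1.0 - blend) * frames[i0] + blend * frames[i1]

If the replay speed changes with env.dt, one possible explanation (just a guess) is that the replay advances a fixed number of frames per policy step instead of going by elapsed time, or that the frame_dt being used does not match the 60 fps of the recording.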

@apirrone
Author

apirrone commented Aug 19, 2024

So I'm still investigating.
I took your advice @Alescontrela: I replayed the motion open loop with the floating base fixed in place. It looks like this:

bdx_amp_for_hardware-2024-08-19_17.41.50.mp4

Looking at the dof position targets (motion reference) and actual dof positions:
Capture d’écran du 2024-08-19 17-42-32

Looks pretty good.

Now, the dof velocities (in rad/s):
Capture d’écran du 2024-08-19 17-42-42

This looks very wrong, right?

For reference, this is how I get these values:

target_dof_vel = self.amp_loader.get_joint_vel_batch(
    self.amp_loader.get_full_frame_at_time_batch(
        np.zeros(self.num_envs, dtype=int),
        self.envs_times.cpu().numpy().flatten(),
    )
)

target_dof_vel_np = torch.round(target_dof_vel, decimals=4).cpu().numpy()
actual_dof_vel_np = torch.round(self.dof_vel, decimals=4).cpu().numpy()

Also, I noticed a while back that the velocities shown in the graph when using play.py seemed weird
Capture d’écran du 2024-08-19 17-46-00

For a run that looks like this

bdx_amp_for_hardware-2024-08-19_17.45.47.mp4

Does anyone know what could be going on?

@WoohyunCha

Is the last video (bdx_amp_for_hardware-2024-08-19_17.45.47.mp4) from using the custom PD controller instead of the one in Isaac Gym? If so, could it not simply be a matter of gain tuning?

@apirrone
Author

@WoohyunCha Yeah, this is a random run after a few training steps; it was just to demonstrate the velocity noise. It is using the custom PD controller.

Maybe it's just a matter of gain tuning, but I spent some time tuning those gains. In the first video, the motion is replayed through the actions, meaning through the PD controller. The motion is correctly reproduced, and as you can see in the position graphs, the dofs follow the commands pretty well. So I don't think the gains are too far off, right?

@WoohyunCha

WoohyunCha commented Aug 20, 2024

So both videos use the custom PD controller, and the only difference is whether the base is fixed or not? If so, I think it is because the gains work differently when the robot is in contact with the ground. Have you tuned the gains while the robot base is fixed?

@apirrone
Author

Yes, I have tuned the motor gains with the base fixed, to see if it could reproduce the reference motion. I could try tuning the gains with the open loop motion and the base not fixed, but that's hard because the robot immediately falls down with the open loop motion. I'll see if I can make it work.

@apirrone
Author

I have spent some time tuning the kP and kD parameters, and I had to reduce the dt to 0.002 (instead of 0.005) to get something stable. With zero actions, the robot stands on its own without shaking, and when replaying the motion it seems to follow the commands pretty well (it falls because it's open loop, but it looks ok).

With disc_grad_penalty = 0.1, I'm starting to see real attempts at steps (after only 1200 training steps), finally :)

bdx_amp_for_hardware-2024-08-21_09.38.41.mp4

I think because of the low weight and inertia of my robot, the physics solver introduced a lot of noise with the higher dt, and because the kP and kD parameters were not optimal, the policy could not learn to follow the movements properly. Does this make sense?

@WoohyunCha

Great to hear that!
What's the current control frequency? (I remember the parameter that decides this is decimation.)
I was working on making my training more stable, and it turned out that too high a control frequency makes the training very noisy and often leads to failure. I guess this is due to the high correlation between samples used to train the networks with SGD.
I have been using dt = 0.002 with decimation 2 (which means the control frequency is 250 Hz), and increasing decimation to 4 has greatly improved everything.
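For reference, the control frequencies implied by those numbers (a trivial sketch):

sim_dt = 0.002                      # physics step in seconds
freq_dec2 = 1.0 / (sim_dt * 2)      # decimation 2 -> 250 Hz control
freq_dec4 = 1.0 / (sim_dt * 4)      # decimation 4 -> 125 Hz control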

@apirrone
Author

apirrone commented Aug 25, 2024

Hi @WoohyunCha!

This is the best I got as of today

bdx_amp_for_hardware-2024-08-21_17.06.01.mp4

That's a big step forward, but still not perfect: too much shaking, the walk does not look very robust, etc. So I'm still playing with parameters :)

I have been using dt = 0.002 and decimation 6. I'll try increasing it to 8 to see if it helps things.
Also, I have set disc_grad_penalty to 0.01 here.

@WoohyunCha


Hi, I've been working on the parameters a lot and found out that setting disc_coef = 1 and disc_grad_penalty to 5, which is closer to the parameters in the original framework, works best. I guess trying to match the parameters with IsaacGymEnvs was a bad idea.

  1. The major point I was missing is that the reward is scaled down by the control frequency when computing the lerp between task and style rewards. So, keeping almost everything close to the original parameters and setting the linear/angular tracking reward scales to 1.5 / (sim.dt * control.decimation) and 0.5 / (sim.dt * control.decimation) gave the best results (see the sketch below). Have you tried matching the reward scale to your control frequency?
  2. I still needed to add the bound loss, but this is probably because I'm training a torque-based policy, where the actions should be between [-1, 1].
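A minimal sketch of the reward-scale computation described in point 1, assuming legged_gym-style code that pre-multiplies each reward scale by the policy step dt (the numeric values are just the ones from this thread):

sim_dt = 0.002
decimation = 4
step_dt = sim_dt * decimation            # policy step length in seconds

tracking_lin_vel = 1.5 / step_dt         # 187.5 with these values
tracking_ang_vel = 0.5 / step_dt         # 62.5 with these values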

@apirrone
Author

Thank you so much @WoohyunCha, I'll try these parameters and see if they improve things!

Were you able to introduce velocity following for turning? I tried adding it along with the relevant reference motion, but it did not really work. I only tried once, though.

@WoohyunCha


Haven't tried it yet, but my colleague did it in IsaacGymEnvs, so I guess it should work in legged_gym as well.
