Issues with HoverAviary: Navigation and Plotting #227

Open
abdul-mannan-khan opened this issue Jul 8, 2024 · 19 comments
Labels
question Further information is requested

Comments

@abdul-mannan-khan

abdul-mannan-khan commented Jul 8, 2024

Hi everyone,

I'm working with the HoverAviary in gym_pybullet_drones and facing two main issues:

Navigation to Target: The drone only moves vertically and doesn't navigate to the set (x, y, z) coordinates. Does anyone have tips on enabling horizontal movement toward a target? For reference, I have attached a video of the response here.

crazyflie_video22.mp4

Plotting Problems: I'm also struggling with plotting the simulation results. If anyone has suggestions on how to effectively set up plots in this environment, particularly for the main branch, it would be greatly appreciated.

Thanks for any help you can provide!

@JacopoPan JacopoPan added the question Further information is requested label Jul 31, 2024
@JacopoPan
Member

As per #225, action spaces that start with ONE_D_ force the 4 motor thrusts to be identical and thus "reduce" the problem to a one-dimensional one along z. You should use PID or RPM for 3D motion.
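
For reference, a minimal sketch of requesting a 3D action space at environment creation (assuming the obs/act keyword arguments used by learn.py on the main branch):

    from gym_pybullet_drones.envs.HoverAviary import HoverAviary
    from gym_pybullet_drones.utils.enums import ObservationType, ActionType

    # 'rpm' exposes 4 independent motor commands; 'pid' exposes a 3D position target
    env = HoverAviary(obs=ObservationType.KIN, act=ActionType.RPM)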

@abdul-mannan-khan
Author

Dear @JacopoPan, I have already tried it for 10,000,000 steps and the drone does not get there. Basically, I shifted the desired location from (0,0,1) to some other point like (2,2,2). It never reached it even once, even in 10,000,000 steps. Any suggestion?

@JacopoPan
Member

Have you tried learning to stabilize at (0,0,1) with a 3D action space first? I would make sure that at least that is working for you.

@abdul-mannan-khan
Author

Yes, it works for (0,0,1); however, it does not work for an arbitrary point like (2,2,1).

@JacopoPan
Member

Can you please attach the terminal outputs (training curves) of those 2 trainings and the source code changes you made to go to (2,2,1)?

@abdul-mannan-khan
Author

abdul-mannan-khan commented Jul 31, 2024

HoverAviary.zip

learn_basics_20240727.zip

I do not have those training curves; I cleared my work history. I am running the simulation again and I shall send you the results soon. For the time being, the code is attached herewith. Could you please take a look at what could possibly be going wrong? Thank you so much for your help. Truly grateful to you.

@JacopoPan
Member

Can you share the code changes as a PR, please?
Otherwise, can you share the output of git diff in the main folder?

@abdul-mannan-khan
Author

abdul-mannan-khan commented Jul 31, 2024

My Linux installation was broken and I just reinstalled it, so I do not have the git diff. Also, opening a pull request was causing problems for me on your repo, so I created a separate repo citing your main repo, just to keep my own version. If you think that is inappropriate, please let me know and I shall take it down.

I did not make big changes in HoverAviary.py, just changes to the reward function. All the other changes are simply my efforts to make the RL algorithm converge. Do you have any example in which the rpm action would take the drone from a starting point to an arbitrary goal point such as (2,2,2)? That would be extremely helpful. Thank you so much for your time.

I am not sure, but I suspect there is a logical problem in BaseRLAviary at line 190:

            target = action[k, :]
            if self.ACT_TYPE == ActionType.RPM:
                rpm[k,:] = np.array(self.HOVER_RPM * (1+0.05*target))
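
For reference, a rough illustration of what that line computes, with a hypothetical hover RPM value (the real one comes from the drone's constants):

    import numpy as np

    HOVER_RPM = 16000                          # placeholder value, illustration only
    target = np.array([1.0, -1.0, 0.0, 0.5])   # normalized action in [-1, 1] per motor
    rpm = HOVER_RPM * (1 + 0.05 * target)      # -> [16800., 15200., 16000., 16400.]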

@JacopoPan
Member

No worries with having your own version, but it makes it a bit difficult for me to tell what might be fixed/changed from the original one.

Can you paste the modified reward here?

@abdul-mannan-khan
Author

Thank you so much @JacopoPan for your understanding and help. Here is the reward function:

def _computeReward(self):
        """Computes the current reward value.

        Returns
        -------
        float
            The reward.

        """
        state = self._getDroneStateVector(0)

        # Time penalty, proportional to the normalized episode time
        norm_ep_time = (self.step_counter / self.PYB_FREQ) / self.EPISODE_LEN_SEC
        time_penalty = -100*norm_ep_time

        # Penalty growing with the distance to the target
        goal_reward = -10*np.linalg.norm(state[0:3] - self.TARGET_POS)**4

        stability_reward = -20 * (np.abs(state[3]) + np.abs(state[4]) + np.abs(state[5]))  # penalize large angular velocities

        # Penalty for leaving the working boundary (flight arena)
        if (abs(state[0]) > self.ARENA_SIZE[0] or abs(state[1]) > self.ARENA_SIZE[1] or abs(state[2]) > self.ARENA_SIZE[2]):
            working_boundary_penalty = -500
        else:
            working_boundary_penalty = 0

        # Penalty if the drone has tilted too much
        if (abs(state[7]) > .6 or abs(state[8]) > .6):
            tilting_penalty = -500
        else:
            tilting_penalty = 0

        # Penalty if the episode has timed out
        if self.step_counter / self.PYB_FREQ > self.EPISODE_LEN_SEC:
            time_up_penalty = -300
        else:
            time_up_penalty = 0

        # Bonus for getting within 0.5 m of the target
        if np.linalg.norm(self.TARGET_POS - state[0:3]) < 0.5:
            goal_reaching_reward = 5000
        else:
            goal_reaching_reward = 0

        # Total reward
        ret = goal_reward + stability_reward + working_boundary_penalty + tilting_penalty + time_up_penalty + goal_reaching_reward + time_penalty
        return ret

Basically, I am trying to penalise the agent if it is moving away from the goal, moving too fast, going out of the working boundary (flight arena), tilting too much, or taking too long to reach the goal. Pretty standard things.

I think the main hurdle is the rpm action. It is not working as it should.

@JacopoPan
Member

Hello @abdul-mannan-khan

I would definitely try starting from the example in learn.py with the 3D rpm action (NOT one_d_rpm; this is what you need to change in learn.py), on a simple task (0,0,1) with a simple reward (the default one). Make sure that works, to confirm there haven't been other changes with side effects. Then progressively increase the complexity of the reward (first) and of the task (going to a different point, second).
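
A minimal sketch of that change, assuming the DEFAULT_ACT constant defined at the top of learn.py:

    from gym_pybullet_drones.utils.enums import ActionType

    DEFAULT_ACT = ActionType('rpm')   # instead of ActionType('one_d_rpm')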

@abdul-mannan-khan
Author

Hello @JacopoPan ,

Thank you so much for your reply. I have tested one_d_rpm and it works flawlessly with both the default and the modified reward function. Thank you for your effort. However, if I change the goal from (0,0,1) to even (1,0,1), it stops working with the rpm action. I am looking forward to seeing your results. Extremely grateful for your kind efforts.

@JacopoPan
Member

A few points:

  • Some of the components in your reward are not doing anything for the one_d_rpm example (the drone cannot tilt or escape in x-y with that action space).
  • The existing example at (0,0,1) already works with the rpm action; I created it for a CDC workshop we gave last year (that's why I'm suggesting you start from there with your reward, not from the change of action space).
  • More importantly, when using rpm you need to understand that you are asking an RL agent to learn, end-to-end, in one value-policy network pair, something that is conventionally done with 2 layers of finely tuned control (attitude and position): even the very famous RL drone results we have seen in recent years typically rely on a given collective-thrust-and-body-rates lower-level controller (e.g. Betaflight, see that in examples/), NOT direct motor commands.
  • One of the main messages of this work was to show that, using vanilla DRL, this can be done (but that shouldn't let you underestimate how much more complex it is).

@abdul-mannan-khan
Author

Thank you so much for highlighting this. I am particularly interested in finding out more about the following:

More importantly, when using rpm you need to understand that you are asking an RL agent to learn, end-to-end, in one value-policy network pair, something that is conventionally done with 2 layers of finely tuned control (attitude and position): even the very famous RL drone results we have seen in recent years typically rely on a given collective-thrust-and-body-rates lower-level controller (e.g. Betaflight, see that in examples/), NOT direct motor commands.
One of the main messages of this work was to show that, using vanilla DRL, this can be done (but that shouldn't let you underestimate how much more complex it is).

Can you please shed some more light on these points? I am sure that the solution lies in two layers and not in one. Should I write two RL algorithms, one for low-level control and one for high-level control? For example, one controller (MPC/RL) only able to track a given trajectory, and a second controller (RL-based) able to search for the trajectory to reach the goal? Like one RL for trajectory planning and a second one for tracking it? Thank you so much for your response.

@JacopoPan
Member

In essence, if you "only" specify a reward based on a far-away target position, you are asking the RL agent (e.g. PPO) to figure out what, in a conventional pipeline, would be a stabilizing controller (to hover in place), a position controller (to perturb the equilibrium towards a desired reference point), and a trajectory generator (to generate a feasible sequence of poses towards your goal), all of which are affected by the dynamical qualities of the quadcopter itself. learn.py shows that this is possible, but also very hard. All the other approaches you mentioned (two-layered, sometimes with one and sometimes with both layers learning-based; learning-based trajectory optimization with model-based control) are, in fact, the ones commonly used in practice.

@abdul-mannan-khan
Author

abdul-mannan-khan commented Aug 6, 2024

Dear @JacopoPan,

Thank you so much for your kind response. Following your comment, if I would like to change my action type to xyz, in which the action would be to move left/right, forward/backward, and up/down, where can I define this kind of action?

I have already added its definition in gym_pybullet_drones/utils/enums.py under class ActionType(Enum) as follows:

class ActionType(Enum):
    """Action type enumeration class."""
    RPM = "rpm"                 # RPMS
    PID = "pid"                 # PID control
    VEL = "vel"                 # Velocity input (using PID control)
    ONE_D_RPM = "one_d_rpm"     # 1D (identical input to all motors) with RPMs
    ONE_D_PID = "one_d_pid"     # 1D (identical input to all motors) with PID control
    XYZ = "xyz"                 # 3D directional motion (new)

assuming that the action creating this motion will be tracked by a local MPC. Where can I define this action? I have already defined it in gym_pybullet_drones/envs/BaseRLAviary.py, line 145, as

def _actionSpace(self):
        """Returns the action space of the environment.

        Returns
        -------
        spaces.Box
            A Box of size NUM_DRONES x 4, 3, or 1, depending on the action type.

        """
        if self.ACT_TYPE in [ActionType.RPM, ActionType.VEL]:
            size = 4
        elif self.ACT_TYPE in [ActionType.PID, ActionType.XYZ]:
            size = 3
        elif self.ACT_TYPE in [ActionType.ONE_D_RPM, ActionType.ONE_D_PID]:
            size = 1
        else:
            print("[ERROR] in BaseRLAviary._actionSpace()")
            exit()
        act_lower_bound = np.array([-1*np.ones(size) for i in range(self.NUM_DRONES)])
        act_upper_bound = np.array([+1*np.ones(size) for i in range(self.NUM_DRONES)])
        #
        for i in range(self.ACTION_BUFFER_SIZE):
            self.action_buffer.append(np.zeros((self.NUM_DRONES,size)))
        #
        return spaces.Box(low=act_lower_bound, high=act_upper_bound, dtype=np.float32)

I am confused about how to define it in gym_pybullet_drones/envs/BaseRLAviary.py --> def _preprocessAction(self, action), line 243, and in gym_pybullet_drones/envs/BaseRLAviary.py --> def _observationSpace(self), line 272.

I have made the following change in gym_pybullet_drones/envs/BaseRLAviary.py --> def _observationSpace(self), line 272:

elif self.ACT_TYPE in [ActionType.PID,ActionType.XYZ]:

Where else do I need to define it? I also tried working on gym_pybullet_drones/envs/BaseAviary.py, line 259, def step(self, action).

Thank you so much for your kind response.

@JacopoPan
Member

ActionType.PID already does that for a generic xyz target; you essentially want to restrict it to [1,0,0], [-1,0,0], [0,1,0], etc.
But, to be fair, if you are not at all interested in the dynamical properties of a multicopter, I might concede that this simulation is not the best choice. The focus here is on physics-based simulation.

@abdul-mannan-khan
Author

abdul-mannan-khan commented Aug 6, 2024

Thank you so much for your response @JacopoPan. I believe that we do not need an RL algorithm for dynamic control. MPC can do a pretty good job and there is no added value in removing it, but this is just my view; you can obviously disagree with it.

I am trying to understand what you mean by

ActionType.PID already does that for a generic xyz target; you essentially want to restrict it to [1,0,0], [-1,0,0], [0,1,0], etc.

Do you mean that I need to restrict it in gym_pybullet_drones/envs/BaseRLAviary.py, lines 193 to 206, as follows:

elif self.ACT_TYPE == ActionType.PID:
    state = self._getDroneStateVector(k)
    next_pos = self._calculateNextStep(
        current_position=state[0:3],
        destination=target,
        step_size=1,
        )
    rpm_k, _, _ = self.ctrl[k].computeControl(control_timestep=self.CTRL_TIMESTEP,
                                              cur_pos=state[0:3],
                                              cur_quat=state[3:7],
                                              cur_vel=state[10:13],
                                              cur_ang_vel=state[13:16],
                                              target_pos=next_pos
                                              )
    rpm[k,:] = rpm_k

If you could point me to some directory or example that would help me solve the problem I am working on, it would be a great favour. Thank you so much for your time and efforts.

@JacopoPan
Member

Yes, all the changes should be local to gym_pybullet_drones/envs/BaseRLAviary.py; you can re-implement the elif self.ACT_TYPE == ActionType.PID branches or duplicate them in a new ActionType.

In practice, because you have 6 actions instead of a continuous 3D target in xyz, you'll probably have a switch case going from a 6-way enumeration to the directional versors.
The thing you need to keep in mind is that, if you are changing the action space from continuous to discrete, you also need to change its definition for the gym/gymnasium/stable-baselines3 interfaces to work as you'd expect.
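
A minimal sketch of that mapping (the names, the step size, and the use of Discrete(6) are illustrative choices, not the library's API):

    import numpy as np
    from gymnasium import spaces

    # Hypothetical 6-way discrete action: +x, -x, +y, -y, +z, -z
    DIRECTIONS = np.array([[ 1,  0,  0], [-1,  0,  0],
                           [ 0,  1,  0], [ 0, -1,  0],
                           [ 0,  0,  1], [ 0,  0, -1]], dtype=float)
    STEP_SIZE = 0.1   # metres per action, an arbitrary illustrative choice

    # In _actionSpace() the new ActionType would return a discrete space,
    # e.g. spaces.Discrete(6) for one drone (stable-baselines3 PPO supports it).
    action_space = spaces.Discrete(6)

    def target_from_discrete(action_index, current_position):
        """Map a discrete action index to a nearby xyz waypoint that can be
        fed to the existing PID branch in _preprocessAction()."""
        return current_position + STEP_SIZE * DIRECTIONS[int(action_index)]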
