Question regarding Model State #4

sahpat229 opened this issue Mar 14, 2018 · 5 comments

@sahpat229

I read through your PortfolioEnv code and your DDPG code. It seems to me that the state of your model only incorporates the window of price returns for each asset and does not incorporate the previous portfolio allocation. This is reflected in how the commission is computed in PortfolioEnv, where you compute dw1 = (y1 * w1) / (np.dot(y1, w1) + eps). Compare this to the PortfolioEnv implemented by wassname, which uses dw1 = (y1 * w0) / (np.dot(y1, w0) + eps). Can you explain why you calculate the commission in this manner? Much appreciated, thanks.
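
For reference, a minimal sketch of the two formulas being compared (assuming y1 is the vector of price relatives for the step, w0 the previous weights, w1 the newly chosen weights, and eps a small constant, as in both environments):

```python
import numpy as np

EPS = 1e-8

def dw1_this_repo(y1, w1):
    # Post-price-movement weights computed from the *new* allocation w1,
    # as in this repository's PortfolioEnv.
    return (y1 * w1) / (np.dot(y1, w1) + EPS)

def dw1_wassname(y1, w0):
    # Post-price-movement weights computed from the *previous* allocation w0,
    # as in wassname's PortfolioEnv.
    return (y1 * w0) / (np.dot(y1, w0) + EPS)
```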

@vermouth1992
Owner

The trading model is this: at timestamp t, allocate the portfolio (denoted w1), buy using the open price, sell the entire portfolio using the close price at timestamp t, and so on. So at the beginning of each timestamp, we assume you hold only cash and no assets. It's actually a process of distribution, gathering, and redistribution.
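
Restated as code, a rough sketch of one step under this interpretation (names are illustrative; only the dw1 and cost terms are quoted from the thread, and the value update p1 = p0 * (1 - mu1) * np.dot(y1, w1) is an assumption about how the environment compounds value):

```python
import numpy as np

EPS = 1e-8

def step_portfolio_value(p0, w1, y1, cost):
    """One step of the described cycle: start the step holding only cash,
    buy at the open according to weights w1, and liquidate at the close.

    p0   : portfolio value at the start of the step
    w1   : weights chosen at the open (cash included, sums to 1)
    y1   : close/open price relatives for the step (cash entry = 1)
    cost : commission rate
    """
    dw1 = (y1 * w1) / (np.dot(y1, w1) + EPS)  # weights after price movement
    mu1 = cost * np.abs(dw1 - w1).sum()       # commission term, as quoted above
    p1 = p0 * (1 - mu1) * np.dot(y1, w1)      # assumed portfolio value update
    return p1
```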

@sahpat229
Author

I see. So in this scenario, the commission cost is the percentage of the profit that is treated as commission, is that correct?

@vermouth1992
Owner

Yes.

@sahpat229
Author

A couple of other questions I have:

  1. If that is the profit scheme, why is cost calculated as self.cost * (np.abs(dw1 - w1)).sum() instead of something like w2 = np.zeros(len(self.asset_names)+1), w2[0] = 1, cost = self.cost * (np.abs(dw1 - w2)).sum()? It seems to me that the first form, which is what you have, computes the commission as if reallocating back to w1 rather than selling all assets and holding only cash.

  2. In the DDPG training, you compute action = predicted_action + self.actor_noise(), where I have replaced the agent's prediction with predicted_action. This can violate the 0-to-1 bounds on each element of action as well as the constraint action.sum() == 1. In PortfolioEnv you correct for this by clipping the action to [0, 1] and normalizing by its sum, but when adding the action to the ReplayBuffer, you add the original action, not the one that was normalized and actually taken by the environment (see the sketch after this list for the correction I mean). Is there a specific reason you do this?

  3. In the way the current environment is set up, the action w1 only affects the immediate reward the agent observes and has no effect on the future states and rewards it observes. Is there a reason you chose gamma to be 0.99 in this case? Please correct me if my understanding of the action's impact on the environment is wrong.
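
A minimal sketch of the correction mentioned in point 2 (predicted_action and actor_noise are stand-ins for the agent's output and the exploration noise; the names are illustrative, not taken from the code):

```python
import numpy as np

def to_valid_action(raw_action, eps=1e-8):
    """Project a noisy action back onto the simplex the environment expects:
    clip each weight to [0, 1], then renormalize so the weights sum to 1."""
    clipped = np.clip(raw_action, 0.0, 1.0)
    return clipped / (clipped.sum() + eps)

# Exploration step (illustrative):
#   raw_action   = predicted_action + actor_noise()
#   valid_action = to_valid_action(raw_action)   # what the environment actually executes
# The question is whether raw_action or valid_action should be stored in the replay buffer.
```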

Thank you for taking the time to answer these questions.

@vermouth1992
Owner

  1. There may be something wrong with the commission calculation. We were in a hurry and didn't focus too much on the trading model, but it doesn't affect the algorithm.
  2. I think in the original paper, the action added to the replay buffer is just a_t. I have seen classic-control RL with DDPG do the same. But I guess you can actually try adding the valid action, since it is equivalent to adding another source of random noise to the action.
  3. Yes, you are correct. This is actually not the same as the traditional robotics case. I set gamma to 0.99 only because I wrote this as a generic DDPG algorithm at the very beginning, not for this specific task. I agree that gamma will not affect the outcome.
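
A minimal sketch of the experiment suggested in point 2, using a hypothetical replay buffer just to make the comparison concrete (neither the buffer API nor the function names come from this repository):

```python
import numpy as np
from collections import deque

replay_buffer = deque(maxlen=100000)  # hypothetical stand-in for the real buffer

def to_valid_action(raw_action, eps=1e-8):
    # Clip to [0, 1] and renormalize, as PortfolioEnv does internally.
    clipped = np.clip(raw_action, 0.0, 1.0)
    return clipped / (clipped.sum() + eps)

def store_transition(state, raw_action, reward, done, next_state, use_valid_action):
    # use_valid_action=False: store a_t as in the original DDPG paper (current code).
    # use_valid_action=True : store the clipped and renormalized action the
    #                         environment actually executed (the suggested experiment).
    action = to_valid_action(raw_action) if use_valid_action else raw_action
    replay_buffer.append((state, action, reward, done, next_state))
```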
