Stock_NeurIPS2018_2_Train.ipynb: clarification on state space, action space #92
Comments
I'm not sure I understand your question correctly; just let me know.
Thanks for the response! I understand state_space now.
Here is the description of the action space given in the notebook:

Action: The action space describes the allowed actions that the agent interacts with the environment. Normally, a ∈ A includes three actions: a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying one stock. Also, an action can be carried upon multiple shares. We use an action space {−k, ..., −1, 0, 1, ..., k}, where k denotes the number of shares. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or −10, respectively.

Going by the above description, if there are 30 stocks, the possible actions should be 30 sells, 30 buys, and a hold. However, in the notebook the action space is set equal to the stock dimension. So, for 30 stocks, can you please clarify why "action_space = stock_dimension" is used?
Because for each stock, the action is a scalar within a continuous space rather than a discrete space like {-1, 0, 1}. In other words, we need just one action per stock, and that action is drawn from a continuous space. For 30 stocks, the action is therefore always a vector of dimension 30. The sign of each element directly represents buy (+), hold (0), or sell (-) for that stock, and its magnitude represents how much to trade.
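To make the reply above concrete, here is a minimal sketch, not the notebook's actual code. It assumes the policy emits one value in [-1, 1] per stock and that a hypothetical cap `hmax` converts each value into a share count; `hmax`, the random stand-in for the policy output, and the `describe` helper are illustrative names only.

```python
import numpy as np

stock_dimension = 30  # one action entry per stock
hmax = 100            # hypothetical cap on shares traded per stock per step

rng = np.random.default_rng(0)
# Stand-in for a policy output: one continuous value in [-1, 1] per stock.
raw_action = rng.uniform(-1.0, 1.0, size=stock_dimension)

# Scale to share counts; astype(int) truncates toward zero.
shares = (raw_action * hmax).astype(int)


def describe(n: int) -> str:
    """Map a signed share count to a trade label."""
    if n > 0:
        return "buy"
    if n < 0:
        return "sell"
    return "hold"


labels = [describe(int(n)) for n in shares]
for i in range(3):
    print(f"stock {i}: action={raw_action[i]:+.3f} -> {labels[i]} {abs(shares[i])}")
```

Under this assumed scaling, the action vector has length 30 rather than 61, and any entry whose magnitude is below 1/hmax truncates to 0, which corresponds to a hold.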
Excellent! Thanks for clarifying!
Thanks for opening this issue. I will use this thread rather than create a new one. I am studying the notebook Stock_NeurIPS2018.ipynb and experimenting with it. I have noticed several issues, which I list here in no particular order.

1.) I am training on a single stock (say, AAPL). The DDPG agent does not seem to learn at all. The action passed to the step function quickly converges to -1 (all sell) or 1 (all buy), and the calculated reward is 0, which probably explains the convergence to one specific action. Has anyone observed this?

2.) The agent only buys and sells and never holds during learning. Negative action values are sells and positive values are buys. Shouldn't the action space be divided equally between sell, hold, and buy?

3.) If the initial action is sell, the number of shares held at the start is naturally 0. In this case the agent is barred from selling, and sell_num_shares is 0 until some buy actions are generated. Later sell actions go through because there are shares to sell. I feel this is too restrictive during the initial learning process. The agent should be allowed to sell or buy provided we have the funds.

Any comments or suggestions will be appreciated.
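The restriction described in point 3 can be illustrated with a short sketch. It assumes a long-only environment with no short selling, which is what the reported sell_num_shares of 0 suggests; `clip_trade` is a hypothetical helper written for this example, not a function from the notebook, and it ignores transaction costs.

```python
def clip_trade(requested_shares: int, holdings: int, cash: float, price: float) -> int:
    """Return the signed number of shares actually traded (hypothetical helper).

    Negative values are sells, positive values are buys.
    """
    if requested_shares < 0:
        # Assumed rule: cannot sell more than is currently held.
        return -min(-requested_shares, holdings)
    if requested_shares > 0:
        # Assumed rule: cannot buy more than the available cash affords.
        affordable = int(cash // price)
        return min(requested_shares, affordable)
    return 0


# With no holdings, a sell request is clipped to zero shares.
print(clip_trade(-10, holdings=0, cash=1000.0, price=50.0))
# Once shares are held, the sell goes through, up to the amount held.
print(clip_trade(-10, holdings=4, cash=1000.0, price=50.0))
```

Under these assumptions, an agent that starts with only cash and keeps emitting sell actions trades nothing, so its portfolio value stays flat; this would be consistent with the zero reward reported in point 1, though the thread does not confirm that as the cause.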
I have a couple of questions. They are probably trivial, but I can't work them out.
As mentioned in the description, for a single share the action space should just be [buy, sell, hold], or {-1, 0, 1}. For multiple shares, say 10, action space = {-10, ..., -1, 0, 1, ..., 10},
which should be equal to
action_space = 2 * stock_dimension + 1
However, in env_kwargs in the notebook, "action_space": stock_dimension is used.
Can you please clarify?
Also, can you explain how you arrived at state_space?
state_space = 1 + 2*stock_dimension + len(INDICATORS)*stock_dimension
I can understand the state variables corresponding to len(INDICATORS)*stock_dimension.
Why is (1 + 2*stock_dimension) added?
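One plausible reading of the state_space formula, assumed here rather than confirmed in this thread, is that the 1 is the cash balance and the 2*stock_dimension term covers one price and one share count per stock. The sketch below simply counts the slots under that assumption; the indicator names are a hypothetical stand-in for the notebook's INDICATORS list.

```python
stock_dimension = 30
# Hypothetical stand-in for the notebook's INDICATORS list.
INDICATORS = ["macd", "rsi_30", "cci_30", "dx_30"]

cash_slots = 1                                        # assumed: account cash balance
price_slots = stock_dimension                         # assumed: one price per stock
holding_slots = stock_dimension                       # assumed: shares held per stock
indicator_slots = len(INDICATORS) * stock_dimension   # one value per indicator per stock

state_space = cash_slots + price_slots + holding_slots + indicator_slots

# The slot count matches the formula quoted in the question.
assert state_space == 1 + 2 * stock_dimension + len(INDICATORS) * stock_dimension
print(state_space)  # 181 for 30 stocks and 4 indicators
```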