Hi, I think there is a small bug in the base code. MDPs/Problems/GamblerProblem has the following method:
    def __get_tail_outcome(self, state: int, action: int) -> (int, float):
        next_state = max(state - action, self.__min_state)
        return 1.0 - self.__prob_head, next_state, 0.0
But according to the problem statement, reaching the minimum state gives a reward of -1. I think it should look like this:
    def __get_tail_outcome(self, state: int, action: int) -> (int, float):
        next_state = max(state - action, self.__min_state)
        reward = -1.0 if next_state == self.__min_state else 0.0
        return 1.0 - self.__prob_head, next_state, reward
Sorry, I made a mistake in the problem statement. The reward is always zero except when the gambler wins. In that case the reward is 1.
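For anyone landing here later, this is a minimal sketch of the corrected reward rule, not the repository's actual code. It assumes that "winning" means reaching a maximum capital; the standalone function `gambler_outcomes`, the `max_state` parameter, and the default bounds are hypothetical names chosen for illustration. Each outcome follows the same (probability, next_state, reward) order that the method above returns.

```python
from typing import List, Tuple

# (probability, next_state, reward), same order as the method quoted above.
Outcome = Tuple[float, int, float]


def gambler_outcomes(
    state: int,
    action: int,
    prob_head: float,
    min_state: int = 0,      # assumed lower bound (ruin)
    max_state: int = 100,    # assumed goal; hypothetical parameter
) -> List[Outcome]:
    """Return both coin-flip outcomes for staking `action` from `state`."""
    # Heads: capital grows by the stake, capped at the goal.
    head_state = min(state + action, max_state)
    # The only non-zero reward: +1 when the goal is reached.
    head_reward = 1.0 if head_state == max_state else 0.0

    # Tails: capital shrinks by the stake. Reaching the minimum
    # still yields a reward of 0, as in the original method.
    tail_state = max(state - action, min_state)

    return [
        (prob_head, head_state, head_reward),
        (1.0 - prob_head, tail_state, 0.0),
    ]


if __name__ == "__main__":
    # Staking everything at 50 either wins (+1) or ruins the gambler (0).
    for outcome in gambler_outcomes(50, 50, prob_head=0.25):
        print(outcome)
```

Under these assumptions the tail outcome never carries a reward, which matches the original `__get_tail_outcome` and means the change proposed above is not needed.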