Question about the base code #1

Open

tvergara opened this issue Sep 6, 2024 · 1 comment

tvergara commented Sep 6, 2024

Hi, I think there is a small bug in the base code. In MDPs/Problems/GamblerProblem there is this method:

    def __get_tail_outcome(self, state: int, action: int) -> (int, float):
        next_state = max(state - action, self.__min_state)
        return 1.0 - self.__prob_head, next_state, 0.0

But according to the assignment statement, reaching the minimum state gives a reward of -1. I think it should be:

    def __get_tail_outcome(self, state: int, action: int) -> (int, float):
        next_state = max(state - action, self.__min_state)
        reward = -1.0 if next_state == self.__min_state else 0.0
        return 1.0 - self.__prob_head, next_state, reward

Owner

RodrigoToroIcarte commented Sep 6, 2024

Sorry, I made a mistake in the assignment statement. The reward is always zero except when you win; in that case the reward is 1.
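To make the clarified reward scheme concrete, here is a minimal standalone sketch of both outcomes of a bet, assuming the same `(probability, next_state, reward)` tuple order that `__get_tail_outcome` returns. The function `gambler_outcomes` and its `goal` and `min_state` defaults are hypothetical illustrations, not the repository's API. The tail outcome keeps a reward of 0.0, as in the original code, and only a winning heads outcome that reaches the goal pays 1.0.

```python
from typing import List, Tuple

# (probability, next_state, reward), same order as __get_tail_outcome returns
Outcome = Tuple[float, int, float]


def gambler_outcomes(state: int, action: int, prob_head: float,
                     goal: int = 100, min_state: int = 0) -> List[Outcome]:
    """Hypothetical helper: both outcomes of staking `action` in `state`.

    Assumed reward scheme (per the clarification above): 1.0 only when
    the bet wins and reaches the goal, 0.0 for every other transition,
    including going broke.
    """
    # Heads: win the stake, capped at the goal.
    head_state = min(state + action, goal)
    head_reward = 1.0 if head_state == goal else 0.0

    # Tails: lose the stake, floored at the minimum state; no -1 penalty.
    tail_state = max(state - action, min_state)
    tail_reward = 0.0

    return [
        (prob_head, head_state, head_reward),
        (1.0 - prob_head, tail_state, tail_reward),
    ]


if __name__ == "__main__":
    print(gambler_outcomes(50, 50, 0.4))  # all-in bet from 50
    print(gambler_outcomes(10, 5, 0.4))   # ordinary bet, no reward either way
```

With rewards given only on reaching the goal, the undiscounted value of a state can be read as the probability of eventually winning from that state.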