Question about the base code #1

Open
tvergara opened this issue Sep 6, 2024 · 1 comment

tvergara commented Sep 6, 2024

Hi, I think there is a small bug in the base code. MDPs/Problems/GamblerProblem contains this method:

    def __get_tail_outcome(self, state: int, action: int) -> (int, float):
        next_state = max(state - action, self.__min_state)
        return 1.0 - self.__prob_head, next_state, 0.0

But according to the assignment statement, reaching the minimum state gives a reward of -1. I think it should look like this:

    def __get_tail_outcome(self, state: int, action: int) -> (int, float):
        next_state = max(state - action, self.__min_state)
        reward = -1.0 if next_state == self.__min_state else 0.0
        return 1.0 - self.__prob_head, next_state, reward

RodrigoToroIcarte (Owner) commented

Sorry, I made a mistake in the assignment statement. The reward is always zero except when the gambler wins; in that case the reward is 1.
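
For readers following along, here is a minimal standalone sketch of the reward rule described in the reply above. It is not the repository's code: the function name `gambler_outcomes`, the default `min_state=0` and `max_state=100`, and the way the heads outcome is computed are all assumptions made for illustration. Only the `(probability, next_state, reward)` tuple order and the tails branch mirror the snippet quoted in this issue.

```python
from typing import List, Tuple

Outcome = Tuple[float, int, float]  # (probability, next_state, reward)


def gambler_outcomes(
    state: int,
    action: int,
    prob_head: float,
    min_state: int = 0,    # assumed: the gambler is broke at 0
    max_state: int = 100,  # assumed: the gambler wins at 100
) -> List[Outcome]:
    """Hypothetical helper: both outcomes of betting `action` from `state`."""
    # Heads: the stake is won. Reward is 1 only if the goal is reached.
    head_state = min(state + action, max_state)
    head_reward = 1.0 if head_state == max_state else 0.0

    # Tails: the stake is lost. Reward stays 0, even at the minimum state.
    tail_state = max(state - action, min_state)
    tail_reward = 0.0

    return [
        (prob_head, head_state, head_reward),
        (1.0 - prob_head, tail_state, tail_reward),
    ]


if __name__ == "__main__":
    # Betting everything from state 50 with a 25% chance of heads.
    print(gambler_outcomes(50, 50, 0.25))
    # [(0.25, 100, 1.0), (0.75, 0, 0.0)]
```

Under this rule, losing everything yields a reward of 0 rather than -1, so the tails branch in the base code already matches the corrected statement.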
