This is the backend for ReinforceViz, a web tool for visualizing Q-learning and value iteration in reinforcement learning. It provides a RESTful API for algorithm computations, supporting the frontend visualization tool.
- RESTful API for Q-learning and value iteration algorithms
- Customizable grid-world environments
- Real-time computation of Q-values and state values
- Step-by-step execution support for algorithm understanding
- Python 3.7+
- Flask web framework
- Custom implementations of Q-learning and Value Iteration algorithms
- Python (v3.7 or later)
- pip
- Clone the repository:
git clone https://github.com/yourusername/ReinforceViz.git
cd ReinforceViz
- Set up a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
- Install the Python dependencies:
pip install -r requirements.txt
- Create a .env file in the root directory and add the necessary environment variables. You can decide what variables to add based on your specific requirements, for example:
DEBUG=True
SECRET_KEY=your_secret_key_here
PORT=5000
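How these variables are consumed is up to your application code; as a minimal sketch, assuming the python-dotenv package is installed, they could be loaded like this (the load_config helper below is purely illustrative and not part of the project):

import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed


def load_config():
    """Illustrative helper: read the example .env variables with defaults."""
    load_dotenv()  # copies key=value pairs from .env into os.environ
    return {
        "DEBUG": os.getenv("DEBUG", "False").lower() == "true",
        "SECRET_KEY": os.getenv("SECRET_KEY", ""),
        "PORT": int(os.getenv("PORT", "5000")),
    }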
Start the backend server:
python run.py
The API will be available at http://localhost:5000 by default.
The project follows a modular structure for better organization and maintainability:
- run.py: Application entry point
- app/: Main application directory
  - __init__.py: Flask app initialization
  - core/: Core implementations of algorithms and grid environment
    - enums/: Enumeration classes (AgentType, QueryType)
    - grid/: Grid-related classes (Grid, GridState, GridCellProperties)
    - agent/: Agent-related classes (ValueIterationAgent, QLearningAgent, QueryAnsweringAgent)
  - controllers/: Request handlers for Q-learning and Value Iteration
    - q_learning_controller.py: Q-learning algorithm controller
    - value_iteration_controller.py: Value Iteration algorithm controller
  - routes/: API route definitions
    - main_routes.py: Main API routes
    - q_learning_routes.py: Q-learning specific routes
    - value_iteration_routes.py: Value Iteration specific routes
- requirements.txt: Python dependencies
- .env: Environment variables (not in version control)
- .gitignore: Git ignore file
- LICENSE: Project license file
- README.md: Project documentation
- URL: /api/q-learning/run-agent
- Method: POST
- Data Params:
{
"x": 4,
"y": 3,
"Terminal": [
[3, 2, 1],
[3, 1, -1]
],
"Boulder": [[1, 1]],
"RobotStartState": [0, 0],
"Discount": 0.9,
"Noise": 0.2,
"TransitionCost": 0.0,
"Alpha": 0.1,
"Episodes": 1000
}
- Success Response:
- Code: 200
- Content:
{
"message": "Q-Learning completed",
"iterations": {
"q_values": {...},
"state_sequences": {...}
}
}
The iterations object contains:
- q_values: A nested object representing the Q-values for each state-action pair at different episodes.
  - The outermost key is the episode number.
  - The next-level key is the state in "x,y" format.
  - The innermost object contains the Q-values for each action (N, S, E, W) in that state.
Example:
{
"q_values": {
"0": {
"0,0": {
"E": 1.0,
"N": 0.0,
"S": 0.0,
"W": 0.0
},
"0,1": {
"E": 0.0,
"N": 0.0,
"S": 0.0,
"W": 0.0
}
}
}
}
In this example, episode 0 shows the initial Q-values for a two-cell grid:
--------------- ---------------
| N | N |
| 0.00 | 0.00 |
| W E | W E |
| 0.00 1.00 | 0.00 0.00 |
| S | S |
| 0.00 | 0.00 |
--------------- ---------------
- state_sequences: An object where keys are episode numbers and values are arrays representing the sequence of states visited in that episode.
"state_sequences": {
"1": ["0,0", "0,1", "0,2", "1,2", "2,2", "3,2"],
"2": ["0,0", "1,0", "2,0", "3,0", "3,1"],
...
}
This state_sequences object shows the path taken by the agent in each episode. For example, in episode 1, the agent started at state "0,0", then moved to "0,1", "0,2", and so on, until it reached the terminal state "3,2". In episode 2, the agent took a different path, moving from "0,0" to "1,0", "2,0", "3,0", and finally to "3,1". These sequences help visualize how the agent's behavior changes as it learns the optimal policy over multiple episodes.
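As a quick end-to-end check, the following sketch POSTs the request body shown above and inspects the response. It assumes the server is running locally on port 5000 and that the requests package is installed; the variable names are illustrative and not part of the API:

import requests

payload = {
    "x": 4, "y": 3,
    "Terminal": [[3, 2, 1], [3, 1, -1]],
    "Boulder": [[1, 1]],
    "RobotStartState": [0, 0],
    "Discount": 0.9, "Noise": 0.2, "TransitionCost": 0.0,
    "Alpha": 0.1, "Episodes": 1000,
}

resp = requests.post("http://localhost:5000/api/q-learning/run-agent", json=payload)
resp.raise_for_status()
iterations = resp.json()["iterations"]

# Greedy action per state in the last recorded episode (ties go to the first action listed).
last_episode = max(iterations["q_values"], key=int)
greedy = {state: max(actions, key=actions.get)
          for state, actions in iterations["q_values"][last_episode].items()}
print("Greedy policy after episode", last_episode, ":", greedy)

# Path taken by the agent in the first recorded episode.
first_episode = min(iterations["state_sequences"], key=int)
print("Episode", first_episode, "path:", " -> ".join(iterations["state_sequences"][first_episode]))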
- URL: /api/value-iteration/run-agent
- Method: POST
- Data Params:
{
"x": 4,
"y": 3,
"Terminal": [
[3, 2, 1],
[3, 1, -1]
],
"Boulder": [[1, 1]],
"RobotStartState": [0, 0],
"K": 25,
"Discount": 0.9,
"Noise": 0.2,
"TransitionCost": 0.0
}
- Success Response:
- Code: 200
- Content: { "message": "Value Iteration completed", "iterations": {...} }
The iterations object contains:
- Keys representing each iteration number.
- Values are objects mapping each state (in "x,y" format) to:
  - value: The state value for that grid cell.
  - best_action: The best action for that grid cell.
Example:
{
"iterations": {
"0": {
"0,0": {
"best_action": "N",
"value": 0.0
},
"0,1": {
"best_action": "N",
"value": 0.0
}
},
"1": {
"0,0": {
"best_action": "E",
"value": 0.1
},
"0,1": {
"best_action": "N",
"value": 0.2
}
}
}
}
In this example, we see the first two iterations of Value Iteration for a two-cell grid:
--------- ---------
| ^ | ^ |
| 0.00 | 0.00 |
| | |
--------- ---------
--------- ---------
|        |   ^    |
| 0.10 > |  0.20  |
|        |        |
--------- ---------
At iteration 0, all state values are initialized to 0.0, and the best actions are arbitrarily set to "N" (North). As the algorithm progresses to iteration 1, we see that the state values have been updated based on the rewards and transition probabilities. The best actions have also been updated to reflect the current estimate of the optimal policy.
Note: The state values represent the expected cumulative reward for starting in that state and adhering to the optimal policy. The best actions indicate the direction the agent should take to maximize its expected cumulative reward.
In both cases, the state is represented as "x,y" coordinates, and actions are abbreviated as N (North), S (South), E (East), W (West), or Terminate for terminal states.
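A similar sketch, under the same assumptions (server running locally on port 5000, requests installed), calls the Value Iteration endpoint and prints the value and best action for each state in the final iteration; the variable names are illustrative only:

import requests

payload = {
    "x": 4, "y": 3,
    "Terminal": [[3, 2, 1], [3, 1, -1]],
    "Boulder": [[1, 1]],
    "RobotStartState": [0, 0],
    "K": 25,
    "Discount": 0.9, "Noise": 0.2, "TransitionCost": 0.0,
}

resp = requests.post("http://localhost:5000/api/value-iteration/run-agent", json=payload)
resp.raise_for_status()
iterations = resp.json()["iterations"]

# Inspect the last iteration's state values and greedy policy.
final = iterations[max(iterations, key=int)]
for state, info in sorted(final.items()):
    print(f"state {state}: value={info['value']:.2f}, best_action={info['best_action']}")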
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.