# Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
## Our Standards

Examples of behavior that contributes to a positive environment for our community include:

- Demonstrating empathy and kindness toward other people
- Being respectful of differing opinions, viewpoints, and experiences
- Giving and gracefully accepting constructive feedback
- Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
- Focusing on what is best not just for us as individuals, but for the overall community

Examples of unacceptable behavior include:

- The use of sexualized language or imagery, and sexual attention or advances of any kind
- Trolling, insulting or derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or email address, without their explicit permission
- Other conduct which could reasonably be considered inappropriate in a professional setting
## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.

Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
## Scope

This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at ew2664@columbia.edu. All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the reporter of any incident.
## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction

**Community Impact:** Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.

**Consequence:** A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
### 2. Warning

**Community Impact:** A violation through a single incident or series of actions.

**Consequence:** A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
### 3. Temporary Ban

**Community Impact:** A serious violation of community standards, including sustained inappropriate behavior.

**Consequence:** A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
### 4. Permanent Ban

**Community Impact:** Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.

**Consequence:** A permanent ban from any sort of public interaction within the community.
## Attribution

This Code of Conduct is adapted from the Contributor Covenant, version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.

Community Impact Guidelines were inspired by Mozilla's code of conduct enforcement ladder.

For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.
# Contributing

We welcome and value all types of contributions, from bug reports to feature additions. Please make sure to read the relevant section(s) below before making your contribution.
And if you like the project but just don't have time to contribute, there are other easy ways to support it and show your appreciation, such as starring the repository or sharing the project with others.

Once again, thank you for supporting the project and taking the time to contribute!

## Code of Conduct

Please read and follow our Code of Conduct.
## Reporting Issues

When submitting a new bug report, please first search for an existing or similar report. If you believe you've come across a unique problem, then use one of our existing issue templates. Duplicate issues, or issues that don't use one of our templates, may get closed without a response.
## Development

Before contributing, make sure you have Python 3.9+ and poetry installed.

1. Fork the repository on GitHub.
2. Clone the repository from your fork.
3. Set up the development environment (`make install`).
4. Set up the pre-commit hooks (`poetry run pre-commit install`).
5. Check out a new branch and make your modifications.
6. Add test cases for all your changes.
7. Run `make lint` and `make test` and ensure they pass.

    > **Tip:** Run `make format` to fix the linting errors that are auto-fixable.

    > **Tip:** Run `make coverage` to run unit tests only and generate an HTML coverage report.

8. Commit your changes following our commit conventions.
9. Push your changes to your fork of the repository.
10. Open a pull request!
## Commit Conventions

We follow conventional commits. When opening a pull request, please make sure that both the pull request title and each commit in the pull request have one of the following prefixes:

| Prefix | Description | SemVer |
|---|---|---|
| `feat:` | a new feature | `MINOR` |
| `fix:` | a bug fix | `PATCH` |
| `refactor:` | a code change that neither fixes a bug nor adds a new feature | `PATCH` |
| `docs:` | a documentation-only change | `PATCH` |
| `chore:` | any other change that does not affect the published module (e.g. testing) | none |
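As an illustration (these are hypothetical messages, not commits from the project's history), pull request titles and commit messages following these conventions look like:

```text
feat: add epsilon-decreasing strategy
fix: correct regret computation for tied arm means
docs: clarify simulation setup example
```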
# Usage Example: Bernoulli Bandits

We will walk through an example using mabby to run a classic "Bernoulli bandits" simulation.

```python
from mabby import BernoulliArm, Bandit, Metric, Simulation
from mabby.strategies import BetaTSStrategy, EpsilonGreedyStrategy, UCB1Strategy
```

First, to set up our simulation, let us start by configuring our multi-armed bandit. We want to simulate a 3-armed bandit where the rewards of each arm follow Bernoulli distributions with `p` of 0.5, 0.6, and 0.7 respectively.

```python
ps = [0.5, 0.6, 0.7]
```

We create a `BernoulliArm` for each arm, then create a `Bandit` using the list of arms.

```python
arms = [BernoulliArm(p) for p in ps]
bandit = Bandit(arms=arms)
```

Because all our arms are of the same type (i.e., their rewards follow the same type of distribution), we can also use the equivalent shorthand below to create the bandit.

```python
bandit = BernoulliArm.bandit(p=ps)
```

Next, we need to configure the strategies we want to simulate on the bandit we just created. We will compare three strategies:

- an epsilon-greedy strategy (`EpsilonGreedyStrategy`)
- the UCB1 strategy (`UCB1Strategy`)
- Thompson sampling with Beta priors (`BetaTSStrategy`)

We create each of the strategies with the appropriate hyperparameters.

```python
strategy_1 = EpsilonGreedyStrategy(eps=0.2)
strategy_2 = UCB1Strategy(alpha=0.5)
strategy_3 = BetaTSStrategy(general=True)

strategies = [strategy_1, strategy_2, strategy_3]
```

Now, we can set up a simulation and run it. We first create a `Simulation` with our bandit and strategies.

```python
simulation = Simulation(
    bandit=bandit, strategies=strategies, names=["eps-greedy", "ucb1", "thompson"]
)
```

Then, we run our simulation for 100 trials of 300 steps each. We also specify that we want to collect statistics on the optimality (`Metric.OPTIMALITY`) and cumulative regret (`Metric.CUM_REGRET`) of each strategy. Running the simulation returns a `SimulationStats` object holding the statistics we requested.

```python
metrics = [Metric.OPTIMALITY, Metric.CUM_REGRET]
stats = simulation.run(trials=100, steps=300, metrics=metrics)
```

After running our simulation, we can visualize the statistics we collected by calling various plotting methods.

```python
stats.plot_optimality()
```

```python
stats.plot_regret(cumulative=True)
```
# mabby

mabby is a library for simulating multi-armed bandits (MABs), a resource-allocation problem and framework in reinforcement learning. It allows users to quickly yet flexibly define and run bandit simulations, with the ability to:

- configure bandit arms with different reward distributions,
- choose from a collection of preset bandit strategies or implement custom ones, and
- collect and visualize a range of simulation metrics.

## Installation

Prerequisites: Python 3.9+ and `pip`.

Install mabby with `pip`:

```bash
pip install mabby
```

## Basic Usage

The code example below demonstrates the basic steps of running a simulation with mabby. For more in-depth examples, please see the Usage Examples section of the mabby documentation.

```python
import mabby as mb

# configure bandit arms
bandit = mb.BernoulliArm.bandit(p=[0.3, 0.6])

# configure bandit strategy
strategy = mb.strategies.EpsilonGreedyStrategy(eps=0.2)

# set up simulation
simulation = mb.Simulation(bandit=bandit, strategies=[strategy])

# run simulation
stats = simulation.run(trials=100, steps=300)

# plot regret statistics
stats.plot_regret()
```

## Contributing

Please see CONTRIBUTING for more information.

## License

This software is licensed under the Apache 2.0 license. Please see LICENSE for more information.
# License

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

**1. Definitions.**

"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.

**2. Grant of Copyright License.** Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

**3. Grant of Patent License.** Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

**4. Redistribution.** You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:

(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

**5. Submission of Contributions.** Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

**6. Trademarks.** This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.

**7. Disclaimer of Warranty.** Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

**8. Limitation of Liability.** In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

**9. Accepting Warranty or Additional Liability.** While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
# API Reference

## mabby.agent

Provides the `Agent` class for bandit simulations.
### Agent

```python
Agent(strategy, name=None)
```

Agent in a multi-armed bandit simulation.

An agent represents an autonomous entity in a bandit simulation. It wraps around a specified strategy and provides an interface for each part of the decision-making process, including making a choice and then updating internal parameter estimates based on the observed rewards from that choice.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `strategy` | `Strategy` | The bandit strategy to use. | *required* |
| `name` | `str \| None` | An optional name for the agent. | `None` |

*Source: `mabby/agent.py`, lines 28-38.*

#### `Ns` *(property)*

```python
Ns: NDArray[np.uint32]
```

The number of times the agent has played each arm.

The play counts are only available after the agent has been primed.

**Returns:**

| Type | Description |
|---|---|
| `NDArray[np.uint32]` | An array of the play counts of each arm. |

**Raises:**

| Type | Description |
|---|---|
| `AgentUsageError` | If the agent has not been primed. |

#### `Qs` *(property)*

```python
Qs: NDArray[np.float64]
```

The agent's current estimated action values (Q-values).

The action values are only available after the agent has been primed.

**Returns:**

| Type | Description |
|---|---|
| `NDArray[np.float64]` | An array of the action values of each arm. |

**Raises:**

| Type | Description |
|---|---|
| `AgentUsageError` | If the agent has not been primed. |

#### `__repr__`

```python
__repr__()
```

Returns the agent's string representation.

Uses the agent's name if set. Otherwise, the string representation of the agent's strategy is used by default.

*Source: `mabby/agent.py`, lines 40-48.*

#### `choose`

```python
choose()
```

Returns the agent's next choice based on its strategy.

This method can only be called on a primed agent.

**Returns:**

| Type | Description |
|---|---|
| `int` | The index of the arm chosen by the agent. |

**Raises:**

| Type | Description |
|---|---|
| `AgentUsageError` | If the agent has not been primed. |

*Source: `mabby/agent.py`, lines 63-77.*

#### `prime`

```python
prime(k, steps, rng)
```

Primes the agent before running a trial.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `k` | `int` | The number of bandit arms for the agent to choose from. | *required* |
| `steps` | `int` | The number of steps the simulation will be run for. | *required* |
| `rng` | `Generator` | A random number generator. | *required* |

*Source: `mabby/agent.py`, lines 50-61.*

#### `update`

```python
update(reward)
```

Updates the agent's internal parameter estimates.

This method can only be called if the agent has previously made a choice, and an update based on that choice has not already been made.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `reward` | `float` | The observed reward from the agent's most recent choice. | *required* |

**Raises:**

| Type | Description |
|---|---|
| `AgentUsageError` | If the agent has not previously made a choice. |

*Source: `mabby/agent.py`, lines 79-94.*
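As a sketch of how these pieces fit together, the loop below drives a single agent by hand using only the methods documented above. The agent name and bandit parameters are illustrative, and the agent is built with the documented `Strategy.agent` helper:

```python
import numpy as np

from mabby import BernoulliArm
from mabby.strategies import EpsilonGreedyStrategy

bandit = BernoulliArm.bandit(p=[0.5, 0.6, 0.7], seed=0)
rng = np.random.default_rng(0)

# wrap a strategy in an agent and prime it for a 100-step trial over 3 arms
agent = EpsilonGreedyStrategy(eps=0.1).agent(name="eps-greedy")
agent.prime(k=len(bandit), steps=100, rng=rng)

for _ in range(100):
    choice = agent.choose()       # pick an arm according to the strategy
    reward = bandit.play(choice)  # sample a reward from that arm
    agent.update(reward)          # update the agent's internal estimates

print(agent.Ns)  # play counts per arm
print(agent.Qs)  # estimated action values per arm
```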
## mabby.arms

Provides the `Arm` base class with some common reward distributions.
### Arm

```python
Arm(**kwargs)
```

Bases: `ABC`, `EnforceOverrides`

Base class for a bandit arm implementing a reward distribution.

An arm represents one of the decision choices available to the agent in a bandit problem. It has a hidden reward distribution and can be played by the agent to generate observable rewards.

*Source: `mabby/arms.py`, lines 21-23.*

#### `mean` *(abstractmethod property)*

```python
mean: float
```

The mean reward of the arm.

**Returns:**

| Type | Description |
|---|---|
| `float` | The computed mean of the arm's reward distribution. |

#### `__repr__` *(abstractmethod)*

```python
__repr__()
```

Returns the string representation of the arm.

*Source: `mabby/arms.py`, lines 45-47.*

#### `bandit` *(classmethod)*

```python
bandit(rng=None, seed=None, **kwargs)
```

Creates a bandit with arms of the same reward distribution type.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `rng` | `Generator \| None` | A random number generator. | `None` |
| `seed` | `int \| None` | A seed for random number generation if `rng` is not provided. | `None` |
| `**kwargs` | `list[float]` | A dictionary where keys are arm parameter names and values are lists of parameter values for each arm. | `{}` |

**Returns:**

| Type | Description |
|---|---|
| `Bandit` | A bandit with the specified arms. |

*Source: `mabby/arms.py`, lines 49-70.*

#### `play` *(abstractmethod)*

```python
play(rng)
```

Plays the arm and samples a reward.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `rng` | `Generator` | A random number generator. | *required* |

**Returns:**

| Type | Description |
|---|---|
| `float` | The sampled reward from the arm's reward distribution. |

*Source: `mabby/arms.py`, lines 25-34.*
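Because `Arm` is an abstract base class, custom reward distributions can be added by implementing `play`, `mean`, and `__repr__`. The sketch below is a hypothetical exponentially distributed arm, not part of the library; it also assumes that, since `Arm` mixes in `EnforceOverrides`, overriding members should be marked with the `@override` decorator from the `overrides` package:

```python
from numpy.random import Generator
from overrides import override

from mabby import Arm


class ExponentialArm(Arm):
    """Hypothetical arm whose rewards follow an exponential distribution."""

    def __init__(self, lam: float):
        super().__init__()
        self.lam = lam  # rate parameter of the distribution

    @override
    def play(self, rng: Generator) -> float:
        # sample one reward from the hidden reward distribution
        return rng.exponential(scale=1 / self.lam)

    @property
    @override
    def mean(self) -> float:
        # the mean of an Exponential(lam) distribution is 1/lam
        return 1 / self.lam

    @override
    def __repr__(self) -> str:
        return f"ExponentialArm(lam={self.lam})"
```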
### BernoulliArm

```python
BernoulliArm(p)
```

Bases: `Arm`

Bandit arm with a Bernoulli reward distribution.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `p` | `float` | Parameter of the Bernoulli distribution. | *required* |

*Source: `mabby/arms.py`, lines 76-87.*

### GaussianArm

```python
GaussianArm(loc, scale)
```

Bases: `Arm`

Bandit arm with a Gaussian reward distribution.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `loc` | `float` | Mean ("center") of the Gaussian distribution. | *required* |
| `scale` | `float` | Standard deviation of the Gaussian distribution. | *required* |

*Source: `mabby/arms.py`, lines 106-119.*
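Since the `bandit` classmethod pairs up parameter lists elementwise, a multi-armed Gaussian bandit can be built in one call. A short sketch mirroring the documented signature (the parameter values are illustrative):

```python
from mabby import GaussianArm

# one arm per (loc, scale) pair: N(0.0, 1.0), N(0.5, 1.0), N(1.0, 2.0)
bandit = GaussianArm.bandit(
    loc=[0.0, 0.5, 1.0],
    scale=[1.0, 1.0, 2.0],
    seed=42,  # seeds the bandit's internal random number generator
)
```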
## mabby.bandit

Provides the `Bandit` class for bandit simulations.
### Bandit

```python
Bandit(arms, rng=None, seed=None)
```

Multi-armed bandit with one or more arms.

This class wraps around a list of arms, each of which has a reward distribution. It provides an interface for interacting with the arms, such as playing a specific arm, querying for the optimal arm, and computing regret from a given choice.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `arms` | `list[Arm]` | A list of arms for the bandit. | *required* |
| `rng` | `Generator \| None` | A random number generator. | `None` |
| `seed` | `int \| None` | A seed for random number generation if `rng` is not provided. | `None` |

*Source: `mabby/bandit.py`, lines 24-35.*

#### `means` *(property)*

```python
means: list[float]
```

#### `__getitem__`

```python
__getitem__(i)
```

Returns an arm by index.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `i` | `int` | The index of the arm to get. | *required* |

**Returns:**

| Type | Description |
|---|---|
| `Arm` | The arm at the given index. |

*Source: `mabby/bandit.py`, lines 45-54.*

#### `__iter__`

```python
__iter__()
```

Returns an iterator over the bandit's arms.

*Source: `mabby/bandit.py`, lines 56-58.*

#### `__len__`

```python
__len__()
```

Returns the number of arms.

*Source: `mabby/bandit.py`, lines 37-39.*

#### `__repr__`

```python
__repr__()
```

Returns a string representation of the bandit.

*Source: `mabby/bandit.py`, lines 41-43.*

#### `best_arm`

```python
best_arm()
```

Returns the index of the optimal arm.

The optimal arm is the arm with the greatest expected reward. If there are multiple arms with equal expected rewards, a random one is chosen.

**Returns:**

| Type | Description |
|---|---|
| `int` | The index of the optimal arm. |

*Source: `mabby/bandit.py`, lines 80-89.*

#### `is_opt`

```python
is_opt(choice)
```

Returns the optimality of a given choice.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `choice` | `int` | The index of the chosen arm. | *required* |

**Returns:**

| Type | Description |
|---|---|
| `bool` | Whether the chosen arm is optimal. |

*Source: `mabby/bandit.py`, lines 91-100.*

#### `play`

```python
play(i)
```

Plays an arm by index.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `i` | `int` | The index of the arm to play. | *required* |

**Returns:**

| Type | Description |
|---|---|
| `float` | The reward from playing the arm. |

*Source: `mabby/bandit.py`, lines 60-69.*

#### `regret`

```python
regret(choice)
```

Returns the regret from a given choice.

The regret is computed as the difference between the expected reward from the optimal arm and the expected reward from the chosen arm.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `choice` | `int` | The index of the chosen arm. | *required* |

**Returns:**

| Type | Description |
|---|---|
| `float` | The computed regret value. |

*Source: `mabby/bandit.py`, lines 102-114.*
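A short sketch of the interface described above (the arm parameters are illustrative):

```python
from mabby import BernoulliArm

bandit = BernoulliArm.bandit(p=[0.5, 0.6, 0.7], seed=7)

print(len(bandit))        # 3 arms
print(bandit[2])          # access the third arm by index
print(bandit.best_arm())  # 2, the arm with the highest expected reward

reward = bandit.play(0)   # sample a reward from arm 0
print(bandit.regret(0))   # expected regret of arm 0: 0.7 - 0.5 = 0.2
print(bandit.is_opt(0))   # False: arm 0 is not the optimal arm
```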
## mabby.exceptions

Provides exceptions for mabby usage.

### AgentUsageError

Bases: `Exception`

Raised when agent methods are used incorrectly.

### SimulationUsageError

Bases: `Exception`

Raised when simulation methods are used incorrectly.

### StatsUsageError

Bases: `Exception`

Raised when stats methods are used incorrectly.

### StrategyUsageError

Bases: `Exception`

Raised when strategy methods are used incorrectly.
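These exceptions signal misuse of the API rather than runtime failures. For instance, calling `choose` on an agent that has not been primed raises `AgentUsageError`; the snippet below is a sketch based on that documented behavior:

```python
from mabby.exceptions import AgentUsageError
from mabby.strategies import EpsilonGreedyStrategy

agent = EpsilonGreedyStrategy(eps=0.1).agent()

try:
    agent.choose()  # invalid: the agent has not been primed yet
except AgentUsageError as e:
    print(f"usage error: {e}")
```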
## mabby

A multi-armed bandit (MAB) simulation library.

mabby is a library for simulating multi-armed bandits (MABs), a resource-allocation problem and framework in reinforcement learning. It allows users to quickly yet flexibly define and run bandit simulations, with the ability to:

- configure bandit arms with different reward distributions,
- choose from a collection of preset bandit strategies or implement custom ones, and
- collect and visualize a range of simulation metrics.

The package's core classes (`Agent`, `Arm`, `Bandit`, `BernoulliArm`, `GaussianArm`, `Metric`, `Simulation`, `SimulationStats`, `Strategy`) are available at the package level and are documented under their respective modules below.
## mabby.simulation

Provides the `Simulation` class for bandit simulations.
### Simulation

```python
Simulation(bandit, agents=None, strategies=None, names=None, rng=None, seed=None)
```

Simulation of a multi-armed bandit problem.

A simulation consists of multiple trials of one or more bandit strategies run on a configured multi-armed bandit.

**Note:** One of `agents` or `strategies` must be supplied. If `agents` is supplied, `strategies` and `names` are ignored. Otherwise, an agent is created for each strategy and given a name from `names` if available.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `bandit` | `Bandit` | A configured multi-armed bandit to simulate on. | *required* |
| `agents` | `Iterable[Agent] \| None` | A list of agents to simulate. | `None` |
| `strategies` | `Iterable[Strategy] \| None` | A list of strategies to simulate. | `None` |
| `names` | `Iterable[str] \| None` | A list of names for agents. | `None` |
| `rng` | `Generator \| None` | A random number generator. | `None` |
| `seed` | `int \| None` | A seed for random number generation if `rng` is not provided. | `None` |

**Raises:**

| Type | Description |
|---|---|
| `SimulationUsageError` | If neither `agents` nor `strategies` is supplied. |

*Source: `mabby/simulation.py`, lines 28-60.*

#### `run`

```python
run(trials, steps, metrics=None)
```

Runs a simulation.

In a simulation run, each agent or strategy is run for the specified number of trials, and each trial is run for the given number of steps.

If `metrics` is not specified, all available metrics are tracked by default.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `trials` | `int` | The number of trials in the simulation. | *required* |
| `steps` | `int` | The number of steps in a trial. | *required* |
| `metrics` | `Iterable[Metric] \| None` | A list of metrics to collect. | `None` |

**Returns:**

| Type | Description |
|---|---|
| `SimulationStats` | A `SimulationStats` object holding the collected statistics. |

*Source: `mabby/simulation.py`, lines 80-102.*
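Because `agents` takes precedence over `strategies`, pre-built agents can be passed in directly when custom construction is needed. A sketch (the agent names and bandit parameters are illustrative):

```python
from mabby import BernoulliArm, Simulation
from mabby.strategies import BetaTSStrategy, EpsilonGreedyStrategy

bandit = BernoulliArm.bandit(p=[0.3, 0.6])

# build agents up front instead of passing bare strategies
agents = [
    EpsilonGreedyStrategy(eps=0.1).agent(name="greedy-0.1"),
    BetaTSStrategy().agent(name="thompson"),
]

simulation = Simulation(bandit=bandit, agents=agents)
stats = simulation.run(trials=50, steps=200)
```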
## mabby.stats

Provides metric tracking for multi-armed bandit simulations.

### AgentStats

```python
AgentStats(agent, bandit, steps, metrics=None)
```

Statistics for an agent in a multi-armed bandit simulation.

**Note:** All available metrics are tracked by default. Alternatively, a specific list can be specified through the `metrics` argument.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `agent` | `Agent` | The agent that statistics are tracked for. | *required* |
| `bandit` | `Bandit` | The bandit of the simulation being run. | *required* |
| `steps` | `int` | The number of steps per trial in the simulation. | *required* |
| `metrics` | `Iterable[Metric] \| None` | A collection of metrics to track. | `None` |

*Source: `mabby/stats.py`, lines 205-229.*

#### `__getitem__`

```python
__getitem__(metric)
```

Gets values for a metric.

If the metric is not a base metric, the values are automatically transformed.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `metric` | `Metric` | The metric to get the values for. | *required* |

**Returns:**

| Type | Description |
|---|---|
| `NDArray[np.float64]` | An array of values for the metric. |

*Source: `mabby/stats.py`, lines 235-248.*

#### `__len__`

```python
__len__()
```

Returns the number of steps each trial is tracked for.

*Source: `mabby/stats.py`, lines 231-233.*

#### `update`

```python
update(step, choice, reward)
```

Updates metric values for the latest simulation step.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `step` | `int` | The number of the step. | *required* |
| `choice` | `int` | The choice made by the agent. | *required* |
| `reward` | `float` | The reward observed by the agent. | *required* |

*Source: `mabby/stats.py`, lines 250-265.*
### Metric

```python
Metric(label, base=None, transform=None)
```

Bases: `Enum`

Enum for metrics that simulations can track.

Metrics can be derived from other metrics through specifying a `base` metric and a `transform` function. This is useful for things like defining cumulative versions of an existing metric, where the transformed values can be computed "lazily" instead of being redundantly stored.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `label` | `str` | Verbose name of the metric (title case). | *required* |
| `base` | `str \| None` | Name of the base metric. | `None` |
| `transform` | `Callable[[NDArray[np.float64]], NDArray[np.float64]] \| None` | Transformation function from the base metric. | `None` |

*Source: `mabby/stats.py`, lines 44-68.*

#### `base` *(property)*

```python
base: Metric
```

The base metric that the metric is transformed from.

If the metric is already a base metric, the metric itself is returned.

#### `__repr__`

```python
__repr__()
```

Returns the verbose name of the metric.

*Source: `mabby/stats.py`, lines 70-72.*

#### `is_base`

```python
is_base()
```

Returns whether the metric is a base metric.

**Returns:**

| Type | Description |
|---|---|
| `bool` | Whether the metric is a base metric. |

*Source: `mabby/stats.py`, lines 74-80.*

#### `map_to_base` *(classmethod)*

```python
map_to_base(metrics)
```

Traces all metrics back to their base metrics.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `metrics` | `Iterable[Metric]` | A collection of metrics. | *required* |

**Returns:**

| Type | Description |
|---|---|
| `Iterable[Metric]` | A set containing the base metrics of all the input metrics. |

*Source: `mabby/stats.py`, lines 92-102.*

#### `transform`

```python
transform(values)
```

Transforms values from the base metric.

If the metric is already a base metric, the input values are returned.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `values` | `NDArray[np.float64]` | An array of input values for the base metric. | *required* |

**Returns:**

| Type | Description |
|---|---|
| `NDArray[np.float64]` | An array of transformed values for the metric. |

*Source: `mabby/stats.py`, lines 104-117.*
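For instance, `Metric.CUM_REGRET` (seen in the usage examples) is naturally a derived metric. The sketch below shows the base/transform machinery in use; the exact base relationship and the cumulative-sum behavior are assumptions inferred from the metric names and the description above:

```python
import numpy as np

from mabby import Metric

print(Metric.CUM_REGRET.is_base())  # assumed False: derived from a base metric
print(Metric.CUM_REGRET.base)       # the base metric it is transformed from

# collapse a mix of metrics down to the base metrics that must be stored
bases = Metric.map_to_base([Metric.OPTIMALITY, Metric.CUM_REGRET])
print(bases)

# apply the derived metric's transform to raw base-metric values
values = np.array([0.2, 0.0, 0.2, 0.2])
print(Metric.CUM_REGRET.transform(values))  # assumed cumulative sums of the input
```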
### MetricMapping *(dataclass)*

Transformation from a base metric.

See `Metric` for examples of metric mappings.
### SimulationStats

```python
SimulationStats(simulation)
```

Statistics for a multi-armed bandit simulation.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `simulation` | `Simulation` | The simulation to track. | *required* |

*Source: `mabby/stats.py`, lines 123-130.*

#### `__contains__`

```python
__contains__(agent)
```

Returns whether an agent's statistics are present.

**Returns:**

| Type | Description |
|---|---|
| `bool` | Whether statistics for the agent are present. |

*Source: `mabby/stats.py`, lines 162-168.*

#### `__getitem__`

```python
__getitem__(agent)
```

Gets statistics for an agent.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `agent` | `Agent` | The agent to get the statistics of. | *required* |

**Returns:**

| Type | Description |
|---|---|
| `AgentStats` | The statistics of the agent. |

*Source: `mabby/stats.py`, lines 140-149.*

#### `__setitem__`

```python
__setitem__(agent, agent_stats)
```

Sets the statistics for an agent.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `agent` | `Agent` | The agent to set the statistics of. | *required* |
| `agent_stats` | `AgentStats` | The agent statistics to set. | *required* |

*Source: `mabby/stats.py`, lines 151-160.*

#### `add`

```python
add(agent_stats)
```

Adds statistics for an agent.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `agent_stats` | `AgentStats` | The agent statistics to add. | *required* |

*Source: `mabby/stats.py`, lines 132-138.*

#### `plot`

```python
plot(metric)
```

Generates a plot for a simulation metric.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `metric` | `Metric` | The metric to plot. | *required* |

*Source: `mabby/stats.py`, lines 170-179.*

#### `plot_optimality`

```python
plot_optimality()
```

Generates a plot for the optimality metric.

*Source: `mabby/stats.py`, lines 189-191.*

#### `plot_regret`

```python
plot_regret(cumulative=True)
```

Generates a plot for the regret or cumulative regret metrics.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `cumulative` | `bool` | Whether to use the cumulative regret. | `True` |

*Source: `mabby/stats.py`, lines 181-187.*

#### `plot_rewards`

```python
plot_rewards(cumulative=True)
```

Generates a plot for the rewards or cumulative rewards metrics.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `cumulative` | `bool` | Whether to use the cumulative rewards. | `True` |

*Source: `mabby/stats.py`, lines 193-199.*
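Beyond the plotting helpers, per-agent values can be pulled out directly: indexing `SimulationStats` by an agent yields its `AgentStats`, which can in turn be indexed by a `Metric`. A sketch continuing the quick-start example:

```python
from mabby import BernoulliArm, Metric, Simulation
from mabby.strategies import EpsilonGreedyStrategy

bandit = BernoulliArm.bandit(p=[0.3, 0.6])
agent = EpsilonGreedyStrategy(eps=0.2).agent()

simulation = Simulation(bandit=bandit, agents=[agent])
stats = simulation.run(trials=100, steps=300)  # all metrics tracked by default

assert agent in stats                           # __contains__
agent_stats = stats[agent]                      # __getitem__ -> AgentStats
cum_regret = agent_stats[Metric.CUM_REGRET]     # transformed lazily from its base
print(len(agent_stats), cum_regret.shape)
```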
## mabby.strategies

Multi-armed bandit strategies.

mabby provides a collection of preset bandit strategies that can be plugged into simulations. The `Strategy` abstract base class can also be sub-classed to implement custom bandit strategies.
+
+
+ Bases: Strategy
Thompson sampling strategy with Beta priors.
+ + +If general
is False
, rewards used for updates must be either 0 or 1.
+Otherwise, rewards must be with support [0, 1].
Parameters:
+Name | +Type | +Description | +Default | +
---|---|---|---|
general |
+
+ bool
+ |
+ Whether to use a generalized version of the strategy. |
+
+ False
+ |
+
mabby/strategies/thompson.py
21 +22 +23 +24 +25 +26 +27 +28 +29 +30 |
|
EpsilonFirstStrategy(eps)

Bases: SemiUniformStrategy

Epsilon-first bandit strategy.

The epsilon-first strategy has a pure exploration phase followed by a pure exploitation phase.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `eps` | `float` | The ratio of exploration steps (must be between 0 and 1). | required |

Source code in mabby/strategies/semi_uniform.py
EpsilonGreedyStrategy(eps)

Bases: SemiUniformStrategy

Epsilon-greedy bandit strategy.

The epsilon-greedy strategy has a fixed chance of exploration every time step.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `eps` | `float` | The chance of exploration (must be between 0 and 1). | required |

Source code in mabby/strategies/semi_uniform.py
RandomStrategy()

Bases: SemiUniformStrategy

Random bandit strategy.

The random strategy chooses arms at random, i.e., it explores with probability 1 at every step.

Source code in mabby/strategies/semi_uniform.py
SemiUniformStrategy()

Bases: Strategy, ABC, EnforceOverrides

Base class for semi-uniform bandit strategies.

Every semi-uniform strategy must implement effective_eps to compute the chance of exploration at each time step.

Source code in mabby/strategies/semi_uniform.py
effective_eps()

abstractmethod

Returns the effective epsilon value.

The effective epsilon value is the probability at the current time step that the bandit will explore rather than exploit. Depending on the strategy, the effective epsilon value may differ from the nominal epsilon value that was set.

Source code in mabby/strategies/semi_uniform.py
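As a rough sketch of how a custom semi-uniform strategy might look, the hypothetical epsilon-decreasing variant below supplies only effective_eps. It assumes the base class tracks play counts via the Ns property and that EnforceOverrides expects overrides.override decorators; the class and its decay schedule are illustrative, not part of mabby.

```python
from overrides import override

from mabby.strategies import SemiUniformStrategy


class EpsilonDecreasingStrategy(SemiUniformStrategy):
    """Hypothetical semi-uniform strategy with a decaying exploration chance."""

    def __init__(self, eps: float):
        super().__init__()
        self.eps = eps

    @override
    def effective_eps(self) -> float:
        # Decay the nominal epsilon as the total play count grows
        # (assumes the base class maintains the Ns play counts).
        return self.eps / (self.Ns.sum() + 1)

    @override
    def __repr__(self) -> str:
        return f"eps-decreasing (eps={self.eps})"
```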
Strategy()

Bases: ABC, EnforceOverrides

Base class for a bandit strategy.

A strategy provides the computational logic for choosing which bandit arms to play and updating parameter estimates.

Source code in mabby/strategies/strategy.py
Ns: NDArray[np.uint32]

abstractmethod property

The number of times each arm has been played.

Qs: NDArray[np.float64]

abstractmethod property

The current estimated action values for each arm.
__repr__()

abstractmethod

Returns a string representation of the strategy.

Source code in mabby/strategies/strategy.py
agent(**kwargs)

Creates an agent following the strategy.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `**kwargs` | `str` | Parameters for initializing the agent (see `Agent`). | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `Agent` | The created agent with the strategy. |

Source code in mabby/strategies/strategy.py
choose(rng)

abstractmethod

Returns the next arm to play.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `rng` | `Generator` | A random number generator. | required |

Returns:

| Type | Description |
| --- | --- |
| `int` | The index of the arm to play. |

Source code in mabby/strategies/strategy.py
prime(k, steps)

abstractmethod

Primes the strategy before running a trial.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `k` | `int` | The number of bandit arms to choose from. | required |
| `steps` | `int` | The number of steps the simulation will be run for. | required |

Source code in mabby/strategies/strategy.py
update(choice, reward, rng=None)

abstractmethod

Updates internal parameter estimates based on reward observation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `choice` | `int` | The most recent choice made. | required |
| `reward` | `float` | The observed reward from the agent's most recent choice. | required |
| `rng` | `Generator \| None` | A random number generator. | `None` |

Source code in mabby/strategies/strategy.py
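Putting the interface together, a custom strategy implements prime, choose, update, the Ns and Qs properties, and __repr__. The greedy strategy below is a minimal sketch, not part of mabby; the sample-average update rule and the use of overrides.override decorators (which EnforceOverrides typically expects) are assumptions.

```python
from __future__ import annotations

import numpy as np
from numpy.random import Generator
from numpy.typing import NDArray
from overrides import override

from mabby.strategies import Strategy


class GreedyStrategy(Strategy):
    """Hypothetical strategy that always exploits the current estimates."""

    def __init__(self) -> None:
        super().__init__()

    @override
    def __repr__(self) -> str:
        return "greedy"

    @property
    @override
    def Ns(self) -> NDArray[np.uint32]:
        return self._Ns

    @property
    @override
    def Qs(self) -> NDArray[np.float64]:
        return self._Qs

    @override
    def prime(self, k: int, steps: int) -> None:
        # Reset play counts and action value estimates before each trial.
        self._Ns = np.zeros(k, dtype=np.uint32)
        self._Qs = np.zeros(k, dtype=np.float64)

    @override
    def choose(self, rng: Generator) -> int:
        # Always play the arm with the greatest estimated action value.
        return int(np.argmax(self._Qs))

    @override
    def update(self, choice: int, reward: float, rng: Generator | None = None) -> None:
        # Incremental sample-average update of the chosen arm's estimate.
        self._Ns[choice] += 1
        self._Qs[choice] += (reward - self._Qs[choice]) / self._Ns[choice]
```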
UCB1Strategy(alpha)

Bases: Strategy

Strategy using the UCB1 bandit algorithm.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `alpha` | `float` | The exploration parameter. | required |

Source code in mabby/strategies/ucb.py
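The docs above do not spell out the index formula, so as a point of reference only: the classic UCB1 rule plays the arm maximizing the estimated value plus an exploration bonus. The sketch below assumes alpha scales that bonus, which is a guess based on the parameter name, not mabby's confirmed implementation.

```python
import numpy as np


def ucb1_index(Qs: np.ndarray, Ns: np.ndarray, alpha: float) -> np.ndarray:
    """Classic UCB1 indices: value estimate plus a confidence bonus.

    Assumes every arm has been played at least once (Ns > 0); how
    alpha enters the bonus is an assumption about mabby's variant.
    """
    t = Ns.sum()  # total number of plays so far
    return Qs + alpha * np.sqrt(np.log(t) / Ns)
```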
Provides implementations of semi-uniform bandit strategies.

Semi-uniform strategies choose between exploring and exploiting at each time step. When exploring, a random arm is played; when exploiting, the arm with the greatest estimated action value is played. epsilon, the chance of exploration, is computed differently by each semi-uniform strategy.
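In other words, each strategy only has to define its exploration probability; the shared choose logic might look like the following sketch (an illustrative pseudo-implementation, not mabby's actual code in mabby/strategies/semi_uniform.py):

```python
import numpy as np
from numpy.random import Generator

from mabby.utils import random_argmax


def semi_uniform_choose(Qs: np.ndarray, effective_eps: float, rng: Generator) -> int:
    """Explore with probability effective_eps; otherwise exploit."""
    if rng.random() < effective_eps:
        # Explore: play a random arm.
        return int(rng.integers(len(Qs)))
    # Exploit: play the arm with the greatest estimated action value,
    # breaking ties at random.
    return random_argmax(Qs, rng=rng)
```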
EpsilonFirstStrategy(eps)

Bases: SemiUniformStrategy

Epsilon-first bandit strategy.

The epsilon-first strategy has a pure exploration phase followed by a pure exploitation phase.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `eps` | `float` | The ratio of exploration steps (must be between 0 and 1). | required |

Source code in mabby/strategies/semi_uniform.py
EpsilonGreedyStrategy(eps)

Bases: SemiUniformStrategy

Epsilon-greedy bandit strategy.

The epsilon-greedy strategy has a fixed chance of exploration every time step.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `eps` | `float` | The chance of exploration (must be between 0 and 1). | required |

Source code in mabby/strategies/semi_uniform.py
RandomStrategy()

Bases: SemiUniformStrategy

Random bandit strategy.

The random strategy chooses arms at random, i.e., it explores with probability 1 at every step.

Source code in mabby/strategies/semi_uniform.py
SemiUniformStrategy()

Bases: Strategy, ABC, EnforceOverrides

Base class for semi-uniform bandit strategies.

Every semi-uniform strategy must implement effective_eps to compute the chance of exploration at each time step.

Source code in mabby/strategies/semi_uniform.py
effective_eps()

abstractmethod

Returns the effective epsilon value.

The effective epsilon value is the probability at the current time step that the bandit will explore rather than exploit. Depending on the strategy, the effective epsilon value may differ from the nominal epsilon value that was set.

Source code in mabby/strategies/semi_uniform.py
Provides the Strategy class.
Strategy()

Bases: ABC, EnforceOverrides

Base class for a bandit strategy.

A strategy provides the computational logic for choosing which bandit arms to play and updating parameter estimates.

Source code in mabby/strategies/strategy.py
Ns: NDArray[np.uint32]

abstractmethod property

The number of times each arm has been played.

Qs: NDArray[np.float64]

abstractmethod property

The current estimated action values for each arm.
__repr__()

abstractmethod

Returns a string representation of the strategy.

Source code in mabby/strategies/strategy.py
agent(**kwargs)

Creates an agent following the strategy.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `**kwargs` | `str` | Parameters for initializing the agent (see `Agent`). | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `Agent` | The created agent with the strategy. |

Source code in mabby/strategies/strategy.py
choose(rng)

abstractmethod

Returns the next arm to play.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `rng` | `Generator` | A random number generator. | required |

Returns:

| Type | Description |
| --- | --- |
| `int` | The index of the arm to play. |

Source code in mabby/strategies/strategy.py
prime(k, steps)

abstractmethod

Primes the strategy before running a trial.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `k` | `int` | The number of bandit arms to choose from. | required |
| `steps` | `int` | The number of steps the simulation will be run for. | required |

Source code in mabby/strategies/strategy.py
update(choice, reward, rng=None)

abstractmethod

Updates internal parameter estimates based on reward observation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `choice` | `int` | The most recent choice made. | required |
| `reward` | `float` | The observed reward from the agent's most recent choice. | required |
| `rng` | `Generator \| None` | A random number generator. | `None` |

Source code in mabby/strategies/strategy.py
Provides implementations of Thompson sampling strategies.

BetaTSStrategy(general=False)

Bases: Strategy

Thompson sampling strategy with Beta priors.

If general is False, rewards used for updates must be either 0 or 1. Otherwise, rewards must have support in [0, 1].

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `general` | `bool` | Whether to use a generalized version of the strategy. | `False` |

Source code in mabby/strategies/thompson.py
Provides implementations of upper confidence bound (UCB) strategies.

UCB1Strategy(alpha)

Bases: Strategy

Strategy using the UCB1 bandit algorithm.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `alpha` | `float` | The exploration parameter. | required |

Source code in mabby/strategies/ucb.py
Provides commonly used utility functions.

random_argmax(values, rng)

Computes the random argmax of an array.

If there are multiple maxima, the index of one of them is chosen at random.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `values` | `ArrayLike` | An input array. | required |
| `rng` | `Generator` | A random number generator. | required |

Returns:

| Type | Description |
| --- | --- |
| `int` | The random argmax of the input array. |

Source code in mabby/utils.py
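A sketch consistent with the documented behavior (not necessarily mabby's exact code):

```python
import numpy as np
from numpy.random import Generator
from numpy.typing import ArrayLike


def random_argmax(values: ArrayLike, rng: Generator) -> int:
    """Returns the index of a maximal element, chosen at random among ties."""
    arr = np.asarray(values)
    candidates = np.flatnonzero(arr == arr.max())
    return int(rng.choice(candidates))
```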
mabby is a library for simulating multi-armed bandits (MABs), a resource-allocation problem and framework in reinforcement learning. It allows users to quickly yet flexibly define and run bandit simulations, with the ability to:
Prerequisites: Python 3.9+ and pip
Install mabby with pip
:
pip install mabby\n
"},{"location":"#basic-usage","title":"Basic Usage","text":"The code example below demonstrates the basic steps of running a simulation with mabby. For more in-depth examples, please see the Usage Examples section of the mabby documentation.
import mabby as mb\n\n# configure bandit arms\nbandit = mb.BernoulliArm.bandit(p=[0.3, 0.6])\n\n# configure bandit strategy\nstrategy = mb.strategies.EpsilonGreedyStrategy(eps=0.2)\n\n# setup simulation\nsimulation = mb.Simulation(bandit=bandit, strategies=[strategy])\n\n# run simulation\nstats = simulation.run(trials=100, steps=300)\n\n# plot regret statistics\nstats.plot_regret()\n
"},{"location":"#contributing","title":"Contributing","text":"Please see CONTRIBUTING for more information.
"},{"location":"#license","title":"License","text":"This software is licensed under the Apache 2.0 license. Please see LICENSE for more information.
"},{"location":"code_of_conduct/","title":"Contributor Covenant Code of Conduct","text":""},{"location":"code_of_conduct/#our-pledge","title":"Our Pledge","text":"We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
"},{"location":"code_of_conduct/#our-standards","title":"Our Standards","text":"Examples of behavior that contributes to a positive environment for our community include:
Examples of unacceptable behavior include:
Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
"},{"location":"code_of_conduct/#scope","title":"Scope","text":"This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
"},{"location":"code_of_conduct/#enforcement","title":"Enforcement","text":"Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at ew2664@columbia.edu. All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the reporter of any incident.
"},{"location":"code_of_conduct/#enforcement-guidelines","title":"Enforcement Guidelines","text":"Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
"},{"location":"code_of_conduct/#1-correction","title":"1. Correction","text":"Community Impact: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
Consequence: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
"},{"location":"code_of_conduct/#2-warning","title":"2. Warning","text":"Community Impact: A violation through a single incident or series of actions.
Consequence: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
"},{"location":"code_of_conduct/#3-temporary-ban","title":"3. Temporary Ban","text":"Community Impact: A serious violation of community standards, including sustained inappropriate behavior.
Consequence: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
"},{"location":"code_of_conduct/#4-permanent-ban","title":"4. Permanent Ban","text":"Community Impact: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
Consequence: A permanent ban from any sort of public interaction within the community.
"},{"location":"code_of_conduct/#attribution","title":"Attribution","text":"This Code of Conduct is adapted from the Contributor Covenant, version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
Community Impact Guidelines were inspired by Mozilla's code of conduct enforcement ladder.
For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.
"},{"location":"contributing/","title":"Contributing","text":"We welcome and value all types of contributions, from bug reports to feature additions. Please make sure to read the relevant section(s) below before making your contribution.
And if you like the project but just don't have time to contribute, there are other easy ways to support the project and show your appreciation. We'd love it if you could:
Once again, thank you for supporting the project and taking the time to contribute!
"},{"location":"contributing/#code-of-conduct","title":"Code of Conduct","text":"Please read and follow our Code of Conduct.
"},{"location":"contributing/#reporting-bugs","title":"Reporting Bugs","text":"When submitting a new bug report, please first search for an existing or similar report. If you believe you've come across a unique problem, then use one of our existing issue templates. Duplicate issues or issues that don't use one of our templates may get closed without a response.
"},{"location":"contributing/#development","title":"Development","text":"Before contributing, make sure you have Python 3.9+ and poetry installed.
Fork the repository on GitHub.
Clone the repository from your GitHub.
Set up the development environment (make install).
Set up pre-commit hooks (poetry run pre-commit install).
Check out a new branch and make your modifications.
Add test cases for all your changes.
Run make lint
and make test
and ensure they pass.
Tip
Run make format
to fix the linting errors that are auto-fixable.
Tip
Run make coverage
to run unit tests only and generate an HTML coverage report.
Commit your changes following our commit conventions.
Push your changes to your fork of the repository.
Open a pull request!
We follow conventional commits. When opening a pull request, please make sure that the pull request title, as well as each commit in the pull request, has one of the following prefixes:
Prefix Description SemVerfeat:
a new feature MINOR
fix:
a bug fix PATCH
refactor:
a code change that neither fixes a bug nor adds a new feature PATCH
docs:
a documentation-only change PATCH
chore:
any other change that does not affect the published module (e.g. testing) none"},{"location":"license/","title":"License","text":"Apache License Version 2.0, January 2004 http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
Definitions.
\"License\" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
\"Licensor\" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
\"Legal Entity\" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, \"control\" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
\"You\" (or \"Your\") shall mean an individual or Legal Entity exercising permissions granted by this License.
\"Source\" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
\"Object\" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
\"Work\" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
\"Derivative Works\" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
\"Contribution\" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, \"submitted\" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as \"Not a Contribution.\"
\"Contributor\" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
(d) If the Work includes a \"NOTICE\" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following\n boilerplate notice, with the fields enclosed by brackets \"[]\"\n replaced with your own identifying information. (Don't include\n the brackets!) The text should be enclosed in the appropriate\n comment syntax for the file format. We also recommend that a\n file or class name and description of purpose be included on the\n same \"printed page\" as the copyright notice for easier\n identification within third-party archives.\n
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0\n
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
"},{"location":"examples/","title":"Usage Examples","text":"We will walk through an example using mabby to run a classic \"Bernoulli bandits\" simulation.
from mabby import BernoulliArm, Bandit, Metric, Simulation\nfrom mabby.strategies import BetaTSStrategy, EpsilonGreedyStrategy, UCB1Strategy\n
"},{"location":"examples/bernoulli_bandits/#configuring-bandit-arms","title":"Configuring bandit arms","text":"First, to set up our simulation, let us start by configuring our multi-armed bandit. We want to simulate a 3-armed bandit where the rewards of each arm follow Bernoulli distributions with p
of 0.5, 0.6, and 0.7 respectively.
ps = [0.5, 0.6, 0.7]\n
We create a BernoulliArm
for each arm, then create a Bandit
using the list of arms.
arms = [BernoulliArm(p) for p in ps]\nbandit = Bandit(arms=arms)\n
Because all our arms are of the same type (i.e., their rewards follow the same type of distribution), we can also use the equivalent shorthand below to create the bandit.
bandit = BernoulliArm.bandit(p=ps)\n
"},{"location":"examples/bernoulli_bandits/#configuring-bandit-strategies","title":"Configuring bandit strategies","text":"Next, we need to configure the strategies we want to simulate on the bandit we just created. We will compare between three strategies:
EpsilonGreedyStrategy
)UCB1Strategy
)BetaTSStrategy
)We create each of the strategies with the appropriate hyperparameters.
strategy_1 = EpsilonGreedyStrategy(eps=0.2)\nstrategy_2 = UCB1Strategy(alpha=0.5)\nstrategy_3 = BetaTSStrategy(general=True)\n\nstrategies = [strategy_1, strategy_2, strategy_3]\n
"},{"location":"examples/bernoulli_bandits/#running-a-simulation","title":"Running a simulation","text":"Now, we can set up a simulation and run it. We first create a Simulation
with our bandit and strategies.
simulation = Simulation(\n bandit=bandit, strategies=strategies, names=[\"eps-greedy\", \"ucb1\", \"thompson\"]\n)\n
Then, we run our simulation for 100 trials of 300 steps each. We also specify that we want to collect statistics on the optimality (Metric.OPTIMALITY
) and cumulative regret (Metric.CUM_REGRET
) for each of the strategies. Running the simulation outputs a SimulationStats
object holding the statistics we requested.
metrics = [Metric.OPTIMALITY, Metric.CUM_REGRET]\nstats = simulation.run(trials=100, steps=300, metrics=metrics)\n
"},{"location":"examples/bernoulli_bandits/#visualizing-simulation-statistics","title":"Visualizing simulation statistics","text":"After running our simulation, we can visualize the statistics we collected by calling various plotting methods.
stats.plot_optimality()\n
stats.plot_regret(cumulative=True)\n
"},{"location":"reference/","title":"mabby","text":"A multi-armed bandit (MAB) simulation library.
mabby is a library for simulating multi-armed bandits (MABs), a resource-allocation problem and framework in reinforcement learning. It allows users to quickly yet flexibly define and run bandit simulations, with the ability to:
Agent(strategy, name=None)
","text":"Agent in a multi-armed bandit simulation.
An agent represents an autonomous entity in a bandit simulation. It wraps around a specified strategy and provides an interface for each part of the decision-making process, including making a choice then updating internal parameter estimates based on the observed rewards from that choice.
Parameters:
Name Type Description Defaultstrategy
Strategy
The bandit strategy to use.
requiredname
str | None
An optional name for the agent.
None
Source code in mabby/agent.py
def __init__(self, strategy: Strategy, name: str | None = None):\n\"\"\"Initializes an agent with a given strategy.\n\n Args:\n strategy: The bandit strategy to use.\n name: An optional name for the agent.\n \"\"\"\n self.strategy: Strategy = strategy #: The bandit strategy to use\n self._name = name\n self._primed = False\n self._choice: int | None = None\n
"},{"location":"reference/#mabby.agent.Agent.Ns","title":"Ns: NDArray[np.uint32]
property
","text":"The number of times the agent has played each arm.
The play counts are only available after the agent has been primed.
Returns:
Type DescriptionNDArray[np.uint32]
An array of the play counts of each arm.
Raises:
Type DescriptionAgentUsageError
If the agent has not been primed.
"},{"location":"reference/#mabby.agent.Agent.Qs","title":"Qs: NDArray[np.float64]
property
","text":"The agent's current estimated action values (Q-values).
The action values are only available after the agent has been primed.
Returns:
Type DescriptionNDArray[np.float64]
An array of the action values of each arm.
Raises:
Type DescriptionAgentUsageError
If the agent has not been primed.
"},{"location":"reference/#mabby.agent.Agent.__repr__","title":"__repr__()
","text":"Returns the agent's string representation.
Uses the agent's name if set. Otherwise, the string representation of the agent's strategy is used by default.
Source code inmabby/agent.py
def __repr__(self) -> str:\n\"\"\"Returns the agent's string representation.\n\n Uses the agent's name if set. Otherwise, the string representation of the\n agent's strategy is used by default.\n \"\"\"\n if self._name is None:\n return str(self.strategy)\n return self._name\n
"},{"location":"reference/#mabby.agent.Agent.choose","title":"choose()
","text":"Returns the agent's next choice based on its strategy.
This method can only be called on a primed agent.
Returns:
Type Descriptionint
The index of the arm chosen by the agent.
Raises:
Type DescriptionAgentUsageError
If the agent has not been primed.
Source code inmabby/agent.py
def choose(self) -> int:\n\"\"\"Returns the agent's next choice based on its strategy.\n\n This method can only be called on a primed agent.\n\n Returns:\n The index of the arm chosen by the agent.\n\n Raises:\n AgentUsageError: If the agent has not been primed.\n \"\"\"\n if not self._primed:\n raise AgentUsageError(\"choose() can only be called on a primed agent\")\n self._choice = self.strategy.choose(self._rng)\n return self._choice\n
"},{"location":"reference/#mabby.agent.Agent.prime","title":"prime(k, steps, rng)
","text":"Primes the agent before running a trial.
Parameters:
Name Type Description Defaultk
int
The number of bandit arms for the agent to choose from.
requiredsteps
int
The number of steps to the simulation will be run.
requiredrng
Generator
A random number generator.
required Source code inmabby/agent.py
def prime(self, k: int, steps: int, rng: Generator) -> None:\n\"\"\"Primes the agent before running a trial.\n\n Args:\n k: The number of bandit arms for the agent to choose from.\n steps: The number of steps the simulation will be run for.\n rng: A random number generator.\n \"\"\"\n self._primed = True\n self._choice = None\n self._rng = rng\n self.strategy.prime(k, steps)\n
"},{"location":"reference/#mabby.agent.Agent.update","title":"update(reward)
","text":"Updates the agent's internal parameter estimates.
This method can only be called if the agent has previously made a choice, and an update based on that choice has not already been made.
Parameters:
Name Type Description Defaultreward
float
The observed reward from the agent's most recent choice.
requiredRaises:
Type DescriptionAgentUsageError
If the agent has not previously made a choice.
Source code inmabby/agent.py
def update(self, reward: float) -> None:\n\"\"\"Updates the agent's internal parameter estimates.\n\n This method can only be called if the agent has previously made a choice, and\n an update based on that choice has not already been made.\n\n Args:\n reward: The observed reward from the agent's most recent choice.\n\n Raises:\n AgentUsageError: If the agent has not previously made a choice.\n \"\"\"\n if self._choice is None:\n raise AgentUsageError(\"update() can only be called after choose()\")\n self.strategy.update(self._choice, reward, self._rng)\n self._choice = None\n
"},{"location":"reference/#mabby.Arm","title":"Arm(**kwargs)
","text":" Bases: ABC
, EnforceOverrides
Base class for a bandit arm implementing a reward distribution.
An arm represents one of the decision choices available to the agent in a bandit problem. It has a hidden reward distribution and can be played by the agent to generate observable rewards.
Source code inmabby/arms.py
@abstractmethod\ndef __init__(self, **kwargs: float):\n\"\"\"Initializes an arm.\"\"\"\n
"},{"location":"reference/#mabby.arms.Arm.mean","title":"mean: float
abstractmethod
property
","text":"The mean reward of the arm.
Returns:
Type Descriptionfloat
The computed mean of the arm's reward distribution.
"},{"location":"reference/#mabby.arms.Arm.__repr__","title":"__repr__()
abstractmethod
","text":"Returns the string representation of the arm.
Source code inmabby/arms.py
@abstractmethod\ndef __repr__(self) -> str:\n\"\"\"Returns the string representation of the arm.\"\"\"\n
"},{"location":"reference/#mabby.arms.Arm.bandit","title":"bandit(rng=None, seed=None, **kwargs)
classmethod
","text":"Creates a bandit with arms of the same reward distribution type.
Parameters:
Name Type Description Defaultrng
Generator | None
A random number generator.
None
seed
int | None
A seed for random number generation if rng
is not provided.
None
**kwargs
list[float]
A dictionary where keys are arm parameter names and values are lists of parameter values for each arm.
{}
Returns:
Type DescriptionBandit
A bandit with the specified arms.
Source code inmabby/arms.py
@classmethod\ndef bandit(\n cls,\n rng: Generator | None = None,\n seed: int | None = None,\n **kwargs: list[float],\n) -> Bandit:\n\"\"\"Creates a bandit with arms of the same reward distribution type.\n\n Args:\n rng: A random number generator.\n seed: A seed for random number generation if ``rng`` is not provided.\n **kwargs: A dictionary where keys are arm parameter names and values are\n lists of parameter values for each arm.\n\n Returns:\n A bandit with the specified arms.\n \"\"\"\n params_dicts = [dict(zip(kwargs, t)) for t in zip(*kwargs.values())]\n if len(params_dicts) == 0:\n raise ValueError(\"insufficient parameters to create an arm\")\n return Bandit([cls(**params) for params in params_dicts], rng, seed)\n
"},{"location":"reference/#mabby.arms.Arm.play","title":"play(rng)
abstractmethod
","text":"Plays the arm and samples a reward.
Parameters:
Name Type Description Defaultrng
Generator
A random number generator.
requiredReturns:
Type Descriptionfloat
The sampled reward from the arm's reward distribution.
Source code inmabby/arms.py
@abstractmethod\ndef play(self, rng: Generator) -> float:\n\"\"\"Plays the arm and samples a reward.\n\n Args:\n rng: A random number generator.\n\n Returns:\n The sampled reward from the arm's reward distribution.\n \"\"\"\n
"},{"location":"reference/#mabby.Bandit","title":"Bandit(arms, rng=None, seed=None)
","text":"Multi-armed bandit with one or more arms.
This class wraps around a list of arms, each of which has a reward distribution. It provides an interface for interacting with the arms, such as playing a specific arm, querying for the optimal arm, and computing regret from a given choice.
Parameters:
Name Type Description Defaultarms
list[Arm]
A list of arms for the bandit.
requiredrng
Generator | None
A random number generator.
None
seed
int | None
A seed for random number generation if rng
is not provided.
None
Source code in mabby/bandit.py
def __init__(\n self, arms: list[Arm], rng: Generator | None = None, seed: int | None = None\n):\n\"\"\"Initializes a bandit with a given set of arms.\n\n Args:\n arms: A list of arms for the bandit.\n rng: A random number generator.\n seed: A seed for random number generation if ``rng`` is not provided.\n \"\"\"\n self._arms = arms\n self._rng = rng if rng else np.random.default_rng(seed)\n
"},{"location":"reference/#mabby.bandit.Bandit.means","title":"means: list[float]
property
","text":"The means of the arms.
Returns:
Type Descriptionlist[float]
An array of the means of each arm.
"},{"location":"reference/#mabby.bandit.Bandit.__getitem__","title":"__getitem__(i)
","text":"Returns an arm by index.
Parameters:
Name Type Description Defaulti
int
The index of the arm to get.
requiredReturns:
Type DescriptionArm
The arm at the given index.
Source code inmabby/bandit.py
def __getitem__(self, i: int) -> Arm:\n\"\"\"Returns an arm by index.\n\n Args:\n i: The index of the arm to get.\n\n Returns:\n The arm at the given index.\n \"\"\"\n return self._arms[i]\n
"},{"location":"reference/#mabby.bandit.Bandit.__iter__","title":"__iter__()
","text":"Returns an iterator over the bandit's arms.
Source code inmabby/bandit.py
def __iter__(self) -> Iterable[Arm]:\n\"\"\"Returns an iterator over the bandit's arms.\"\"\"\n return iter(self._arms)\n
"},{"location":"reference/#mabby.bandit.Bandit.__len__","title":"__len__()
","text":"Returns the number of arms.
Source code inmabby/bandit.py
def __len__(self) -> int:\n\"\"\"Returns the number of arms.\"\"\"\n return len(self._arms)\n
"},{"location":"reference/#mabby.bandit.Bandit.__repr__","title":"__repr__()
","text":"Returns a string representation of the bandit.
Source code inmabby/bandit.py
def __repr__(self) -> str:\n\"\"\"Returns a string representation of the bandit.\"\"\"\n return repr(self._arms)\n
"},{"location":"reference/#mabby.bandit.Bandit.best_arm","title":"best_arm()
","text":"Returns the index of the optimal arm.
The optimal arm is the arm with the greatest expected reward. If there are multiple arms with equal expected rewards, a random one is chosen.
Returns:
Type Descriptionint
The index of the optimal arm.
Source code inmabby/bandit.py
def best_arm(self) -> int:\n\"\"\"Returns the index of the optimal arm.\n\n The optimal arm is the arm with the greatest expected reward. If there are\n multiple arms with equal expected rewards, a random one is chosen.\n\n Returns:\n The index of the optimal arm.\n \"\"\"\n return random_argmax(self.means, rng=self._rng)\n
"},{"location":"reference/#mabby.bandit.Bandit.is_opt","title":"is_opt(choice)
","text":"Returns the optimality of a given choice.
Parameters:
Name Type Description Defaultchoice
int
The index of the chosen arm.
requiredReturns:
Type Descriptionbool
True
if the arm has the greatest expected reward, False
otherwise.
mabby/bandit.py
def is_opt(self, choice: int) -> bool:\n\"\"\"Returns the optimality of a given choice.\n\n Args:\n choice: The index of the chosen arm.\n\n Returns:\n ``True`` if the arm has the greatest expected reward, ``False`` otherwise.\n \"\"\"\n return np.max(self.means) == self._arms[choice].mean\n
"},{"location":"reference/#mabby.bandit.Bandit.play","title":"play(i)
","text":"Plays an arm by index.
Parameters:
Name Type Description Defaulti
int
The index of the arm to play.
requiredReturns:
Type Descriptionfloat
The reward from playing the arm.
Source code inmabby/bandit.py
def play(self, i: int) -> float:\n\"\"\"Plays an arm by index.\n\n Args:\n i: The index of the arm to play.\n\n Returns:\n The reward from playing the arm.\n \"\"\"\n return self[i].play(self._rng)\n
"},{"location":"reference/#mabby.bandit.Bandit.regret","title":"regret(choice)
","text":"Returns the regret from a given choice.
The regret is computed as the difference between the expected reward from the optimal arm and the expected reward from the chosen arm.
Parameters:
Name Type Description Defaultchoice
int
The index of the chosen arm.
requiredReturns:
Type Descriptionfloat
The computed regret value.
Source code inmabby/bandit.py
def regret(self, choice: int) -> float:\n\"\"\"Returns the regret from a given choice.\n\n The regret is computed as the difference between the expected reward from the\n optimal arm and the expected reward from the chosen arm.\n\n Args:\n choice: The index of the chosen arm.\n\n Returns:\n The computed regret value.\n \"\"\"\n return np.max(self.means) - self._arms[choice].mean\n
"},{"location":"reference/#mabby.BernoulliArm","title":"BernoulliArm(p)
","text":" Bases: Arm
Bandit arm with a Bernoulli reward distribution.
Parameters:
Name Type Description Defaultp
float
Parameter of the Bernoulli distribution.
required Source code inmabby/arms.py
def __init__(self, p: float):\n\"\"\"Initializes a Bernoulli arm.\n\n Args:\n p: Parameter of the Bernoulli distribution.\n \"\"\"\n if p < 0 or p > 1:\n raise ValueError(\n f\"float {str(p)} is not a valid probability for Bernoulli distribution\"\n )\n\n self.p: float = p #: Parameter of the Bernoulli distribution\n
"},{"location":"reference/#mabby.GaussianArm","title":"GaussianArm(loc, scale)
","text":" Bases: Arm
Bandit arm with a Gaussian reward distribution.
Parameters:
Name Type Description Defaultloc
float
Mean (\"center\") of the Gaussian distribution.
requiredscale
float
Standard deviation of the Gaussian distribution.
required Source code inmabby/arms.py
def __init__(self, loc: float, scale: float):\n\"\"\"Initializes a Gaussian arm.\n\n Args:\n loc: Mean (\"center\") of the Gaussian distribution.\n scale: Standard deviation of the Gaussian distribution.\n \"\"\"\n if scale < 0:\n raise ValueError(\n f\"float {str(scale)} is not a valid scale for Gaussian distribution\"\n )\n\n self.loc: float = loc #: Mean (\"center\") of the Gaussian distribution\n self.scale: float = scale #: Standard deviation of the Gaussian distribution\n
"},{"location":"reference/#mabby.Metric","title":"Metric(label, base=None, transform=None)
","text":" Bases: Enum
Enum for metrics that simulations can track.
Metrics can be derived from other metrics through specifying a base
metric and a transform
function. This is useful for things like defining cumulative versions of an existing metric, where the transformed values can be computed \"lazily\" instead of being redundantly stored.
Parameters:
Name Type Description Defaultlabel
str
Verbose name of the metric (title case)
requiredbase
str | None
Name of the base metric
None
transform
Callable[[NDArray[np.float64]], NDArray[np.float64]] | None
Transformation function from the base metric
None
Source code in mabby/stats.py
def __init__(\n self,\n label: str,\n base: str | None = None,\n transform: Callable[[NDArray[np.float64]], NDArray[np.float64]] | None = None,\n):\n\"\"\"Initializes a metric.\n\n Metrics can be derived from other metrics through specifying a ``base`` metric\n and a ``transform`` function. This is useful for things like defining cumulative\n versions of an existing metric, where the transformed values can be computed\n \"lazily\" instead of being redundantly stored.\n\n Args:\n label: Verbose name of the metric (title case)\n base: Name of the base metric\n transform: Transformation function from the base metric\n \"\"\"\n self.__class__.__MAPPING__[self._name_] = self\n self._label = label\n self._mapping: MetricMapping | None = (\n MetricMapping(base=self.__class__.__MAPPING__[base], transform=transform)\n if base and transform\n else None\n )\n
"},{"location":"reference/#mabby.stats.Metric.base","title":"base: Metric
property
","text":"The base metric that the metric is transformed from.
If the metric is already a base metric, the metric itself is returned.
"},{"location":"reference/#mabby.stats.Metric.__repr__","title":"__repr__()
","text":"Returns the verbose name of the metric.
Source code inmabby/stats.py
def __repr__(self) -> str:\n\"\"\"Returns the verbose name of the metric.\"\"\"\n return self._label\n
"},{"location":"reference/#mabby.stats.Metric.is_base","title":"is_base()
","text":"Returns whether the metric is a base metric.
Returns:
Type Descriptionbool
True
if the metric is a base metric, False
otherwise.
mabby/stats.py
def is_base(self) -> bool:\n\"\"\"Returns whether the metric is a base metric.\n\n Returns:\n ``True`` if the metric is a base metric, ``False`` otherwise.\n \"\"\"\n return self._mapping is None\n
"},{"location":"reference/#mabby.stats.Metric.map_to_base","title":"map_to_base(metrics)
classmethod
","text":"Traces all metrics back to their base metrics.
Parameters:
Name Type Description Defaultmetrics
Iterable[Metric]
A collection of metrics.
requiredReturns:
Type DescriptionIterable[Metric]
A set containing the base metrics of all the input metrics.
Source code inmabby/stats.py
@classmethod\ndef map_to_base(cls, metrics: Iterable[Metric]) -> Iterable[Metric]:\n\"\"\"Traces all metrics back to their base metrics.\n\n Args:\n metrics: A collection of metrics.\n\n Returns:\n A set containing the base metrics of all the input metrics.\n \"\"\"\n return set(m.base for m in metrics)\n
"},{"location":"reference/#mabby.stats.Metric.transform","title":"transform(values)
","text":"Transforms values from the base metric.
If the metric is already a base metric, the input values are returned.
Parameters:
Name Type Description Defaultvalues
NDArray[np.float64]
An array of input values for the base metric.
requiredReturns:
Type DescriptionNDArray[np.float64]
An array of transformed values for the metric.
Source code inmabby/stats.py
def transform(self, values: NDArray[np.float64]) -> NDArray[np.float64]:\n\"\"\"Transforms values from the base metric.\n\n If the metric is already a base metric, the input values are returned.\n\n Args:\n values: An array of input values for the base metric.\n\n Returns:\n An array of transformed values for the metric.\n \"\"\"\n if self._mapping is not None:\n return self._mapping.transform(values)\n return values\n
"},{"location":"reference/#mabby.Simulation","title":"Simulation(bandit, agents=None, strategies=None, names=None, rng=None, seed=None)
","text":"Simulation of a multi-armed bandit problem.
A simulation consists of multiple trials of one or more bandit strategies run on a configured multi-armed bandit.
One of agents
or strategies
must be supplied. If agents
is supplied, strategies
and names
are ignored. Otherwise, an agent
is created for each strategy
and given a name from names
if available.
Parameters:
Name Type Description Defaultbandit
Bandit
A configured multi-armed bandit to simulate on.
requiredagents
Iterable[Agent] | None
A list of agents to simulate.
None
strategies
Iterable[Strategy] | None
A list of strategies to simulate.
None
names
Iterable[str] | None
A list of names for agents.
None
rng
Generator | None
A random number generator.
None
seed
int | None
A seed for random number generation if rng
is not provided.
None
Raises:
Type DescriptionSimulationUsageError
If neither agents
nor strategies
are supplied.
mabby/simulation.py
def __init__(\n self,\n bandit: Bandit,\n agents: Iterable[Agent] | None = None,\n strategies: Iterable[Strategy] | None = None,\n names: Iterable[str] | None = None,\n rng: Generator | None = None,\n seed: int | None = None,\n):\n\"\"\"Initializes a simulation.\n\n One of ``agents`` or ``strategies`` must be supplied. If ``agents`` is supplied,\n ``strategies`` and ``names`` are ignored. Otherwise, an ``agent`` is created for\n each ``strategy`` and given a name from ``names`` if available.\n\n Args:\n bandit: A configured multi-armed bandit to simulate on.\n agents: A list of agents to simulate.\n strategies: A list of strategies to simulate.\n names: A list of names for agents.\n rng: A random number generator.\n seed: A seed for random number generation if ``rng`` is not provided.\n\n Raises:\n SimulationUsageError: If neither ``agents`` nor ``strategies`` are supplied.\n \"\"\"\n self.agents = self._create_agents(agents, strategies, names)\n if len(list(self.agents)) == 0:\n raise ValueError(\"no strategies or agents were supplied\")\n self.bandit = bandit\n if len(self.bandit) == 0:\n raise ValueError(\"bandit cannot be empty\")\n self._rng = rng if rng else np.random.default_rng(seed)\n
"},{"location":"reference/#mabby.simulation.Simulation.run","title":"run(trials, steps, metrics=None)
","text":"Runs a simulation.
In a simulation run, each agent or strategy is run for the specified number of trials, and each trial is run for the given number of steps.
If metrics
is not specified, all available metrics are tracked by default.
Parameters:
Name Type Description Defaulttrials
int
The number of trials in the simulation.
requiredsteps
int
The number of steps in a trial.
requiredmetrics
Iterable[Metric] | None
A list of metrics to collect.
None
Returns:
Type DescriptionSimulationStats
A SimulationStats
object with the results of the simulation.
mabby/simulation.py
def run(\n self, trials: int, steps: int, metrics: Iterable[Metric] | None = None\n) -> SimulationStats:\n\"\"\"Runs a simulation.\n\n In a simulation run, each agent or strategy is run for the specified number of\n trials, and each trial is run for the given number of steps.\n\n If ``metrics`` is not specified, all available metrics are tracked by default.\n\n Args:\n trials: The number of trials in the simulation.\n steps: The number of steps in a trial.\n metrics: A list of metrics to collect.\n\n Returns:\n A ``SimulationStats`` object with the results of the simulation.\n \"\"\"\n sim_stats = SimulationStats(simulation=self)\n for agent in self.agents:\n agent_stats = self._run_trials_for_agent(agent, trials, steps, metrics)\n sim_stats.add(agent_stats)\n return sim_stats\n
"},{"location":"reference/#mabby.SimulationStats","title":"SimulationStats(simulation)
","text":"Statistics for a multi-armed bandit simulation.
Parameters:
Name Type Description Defaultsimulation
Simulation
The simulation to track.
required Source code inmabby/stats.py
def __init__(self, simulation: Simulation):\n\"\"\"Initializes simulation statistics.\n\n Args:\n simulation: The simulation to track.\n \"\"\"\n self._simulation: Simulation = simulation\n self._stats_dict: dict[Agent, AgentStats] = {}\n
"},{"location":"reference/#mabby.stats.SimulationStats.__contains__","title":"__contains__(agent)
","text":"Returns if an agent's statistics are present.
Returns:
Type Descriptionbool
True
if an agent's statistics are present, False
otherwise.
mabby/stats.py
def __contains__(self, agent: Agent) -> bool:\n\"\"\"Returns whether an agent's statistics are present.\n\n Returns:\n ``True`` if an agent's statistics are present, ``False`` otherwise.\n \"\"\"\n return agent in self._stats_dict\n
"},{"location":"reference/#mabby.stats.SimulationStats.__getitem__","title":"__getitem__(agent)
","text":"Gets statistics for an agent.
Parameters:
Name Type Description Defaultagent
Agent
The agent to get the statistics of.
requiredReturns:
Type DescriptionAgentStats
The statistics of the agent.
Source code inmabby/stats.py
def __getitem__(self, agent: Agent) -> AgentStats:\n\"\"\"Gets statistics for an agent.\n\n Args:\n agent: The agent to get the statistics of.\n\n Returns:\n The statistics of the agent.\n \"\"\"\n return self._stats_dict[agent]\n
"},{"location":"reference/#mabby.stats.SimulationStats.__setitem__","title":"__setitem__(agent, agent_stats)
","text":"Sets the statistics for an agent.
Parameters:
Name Type Description Defaultagent
Agent
The agent to set the statistics of.
requiredagent_stats
AgentStats
The agent statistics to set.
required Source code inmabby/stats.py
def __setitem__(self, agent: Agent, agent_stats: AgentStats) -> None:\n\"\"\"Sets the statistics for an agent.\n\n Args:\n agent: The agent to set the statistics of.\n agent_stats: The agent statistics to set.\n \"\"\"\n if agent != agent_stats.agent:\n raise StatsUsageError(\"agents specified in key and value don't match\")\n self._stats_dict[agent] = agent_stats\n
"},{"location":"reference/#mabby.stats.SimulationStats.add","title":"add(agent_stats)
","text":"Adds statistics for an agent.
Parameters:
Name Type Description Defaultagent_stats
AgentStats
The agent statistics to add.
required Source code inmabby/stats.py
def add(self, agent_stats: AgentStats) -> None:\n\"\"\"Adds statistics for an agent.\n\n Args:\n agent_stats: The agent statistics to add.\n \"\"\"\n self._stats_dict[agent_stats.agent] = agent_stats\n
"},{"location":"reference/#mabby.stats.SimulationStats.plot","title":"plot(metric)
","text":"Generates a plot for a simulation metric.
Parameters:
Name Type Description Defaultmetric
Metric
The metric to plot.
required Source code inmabby/stats.py
def plot(self, metric: Metric) -> None:\n\"\"\"Generates a plot for a simulation metric.\n\n Args:\n metric: The metric to plot.\n \"\"\"\n for agent, agent_stats in self._stats_dict.items():\n plt.plot(agent_stats[metric], label=str(agent))\n plt.legend()\n plt.show()\n
"},{"location":"reference/#mabby.stats.SimulationStats.plot_optimality","title":"plot_optimality()
","text":"Generates a plot for the optimality metric.
Source code inmabby/stats.py
def plot_optimality(self) -> None:\n\"\"\"Generates a plot for the optimality metric.\"\"\"\n self.plot(metric=Metric.OPTIMALITY)\n
"},{"location":"reference/#mabby.stats.SimulationStats.plot_regret","title":"plot_regret(cumulative=True)
","text":"Generates a plot for the regret or cumulative regret metrics.
Parameters:
Name Type Description Defaultcumulative
bool
Whether to use the cumulative regret.
True
Source code in mabby/stats.py
def plot_regret(self, cumulative: bool = True) -> None:\n\"\"\"Generates a plot for the regret or cumulative regret metrics.\n\n Args:\n cumulative: Whether to use the cumulative regret.\n \"\"\"\n self.plot(metric=Metric.CUM_REGRET if cumulative else Metric.REGRET)\n
"},{"location":"reference/#mabby.stats.SimulationStats.plot_rewards","title":"plot_rewards(cumulative=True)
","text":"Generates a plot for the rewards or cumulative rewards metrics.
Parameters:
Name Type Description Defaultcumulative
bool
Whether to use the cumulative rewards.
True
Source code in mabby/stats.py
def plot_rewards(self, cumulative: bool = True) -> None:\n\"\"\"Generates a plot for the rewards or cumulative rewards metrics.\n\n Args:\n cumulative: Whether to use the cumulative rewards.\n \"\"\"\n self.plot(metric=Metric.CUM_REWARDS if cumulative else Metric.REWARDS)\n
"},{"location":"reference/#mabby.Strategy","title":"Strategy()
","text":" Bases: ABC
, EnforceOverrides
Base class for a bandit strategy.
A strategy provides the computational logic for choosing which bandit arms to play and updating parameter estimates.
Source code inmabby/strategies/strategy.py
@abstractmethod\ndef __init__(self) -> None:\n\"\"\"Initializes a bandit strategy.\"\"\"\n
"},{"location":"reference/#mabby.strategies.strategy.Strategy.Ns","title":"Ns: NDArray[np.uint32]
abstractmethod
property
","text":"The number of times each arm has been played.
"},{"location":"reference/#mabby.strategies.strategy.Strategy.Qs","title":"Qs: NDArray[np.float64]
abstractmethod
property
","text":"The current estimated action values for each arm.
"},{"location":"reference/#mabby.strategies.strategy.Strategy.__repr__","title":"__repr__()
abstractmethod
","text":"Returns a string representation of the strategy.
Source code inmabby/strategies/strategy.py
@abstractmethod\ndef __repr__(self) -> str:\n\"\"\"Returns a string representation of the strategy.\"\"\"\n
"},{"location":"reference/#mabby.strategies.strategy.Strategy.agent","title":"agent(**kwargs)
","text":"Creates an agent following the strategy.
Parameters:
Name Type Description Default**kwargs
str
Parameters for initializing the agent (see Agent
)
{}
Returns:
Type DescriptionAgent
The created agent with the strategy.
Source code inmabby/strategies/strategy.py
def agent(self, **kwargs: str) -> Agent:\n\"\"\"Creates an agent following the strategy.\n\n Args:\n **kwargs: Parameters for initializing the agent (see\n [`Agent`][mabby.agent.Agent])\n\n Returns:\n The created agent with the strategy.\n \"\"\"\n return Agent(strategy=self, **kwargs)\n
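For instance, a strategy can be wrapped in a named agent in one call (a minimal sketch; the strategy and name are illustrative):
from mabby.strategies import EpsilonGreedyStrategy

# Equivalent to Agent(strategy=EpsilonGreedyStrategy(eps=0.1), name="eps-greedy")
agent = EpsilonGreedyStrategy(eps=0.1).agent(name="eps-greedy")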
"},{"location":"reference/#mabby.strategies.strategy.Strategy.choose","title":"choose(rng)
abstractmethod
","text":"Returns the next arm to play.
Parameters:
Name Type Description Defaultrng
Generator
A random number generator.
requiredReturns:
Type Descriptionint
The index of the arm to play.
Source code inmabby/strategies/strategy.py
@abstractmethod\ndef choose(self, rng: Generator) -> int:\n\"\"\"Returns the next arm to play.\n\n Args:\n rng: A random number generator.\n\n Returns:\n The index of the arm to play.\n \"\"\"\n
"},{"location":"reference/#mabby.strategies.strategy.Strategy.prime","title":"prime(k, steps)
abstractmethod
","text":"Primes the strategy before running a trial.
Parameters:
Name Type Description Defaultk
int
The number of bandit arms to choose from.
requiredsteps
int
The number of steps the simulation will be run for.
required Source code inmabby/strategies/strategy.py
@abstractmethod\ndef prime(self, k: int, steps: int) -> None:\n\"\"\"Primes the strategy before running a trial.\n\n Args:\n k: The number of bandit arms to choose from.\n steps: The number of steps the simulation will be run for.\n \"\"\"\n
"},{"location":"reference/#mabby.strategies.strategy.Strategy.update","title":"update(choice, reward, rng=None)
abstractmethod
","text":"Updates internal parameter estimates based on reward observation.
Parameters:
Name Type Description Defaultchoice
int
The most recent choice made.
requiredreward
float
The observed reward from the agent's most recent choice.
requiredrng
Generator | None
A random number generator.
None
Source code in mabby/strategies/strategy.py
@abstractmethod\ndef update(self, choice: int, reward: float, rng: Generator | None = None) -> None:\n\"\"\"Updates internal parameter estimates based on reward observation.\n\n Args:\n choice: The most recent choice made.\n reward: The observed reward from the agent's most recent choice.\n rng: A random number generator.\n \"\"\"\n
"},{"location":"reference/agent/","title":"agent","text":"Provides Agent
class for bandit simulations.
Agent(strategy, name=None)
","text":"Agent in a multi-armed bandit simulation.
An agent represents an autonomous entity in a bandit simulation. It wraps around a specified strategy and provides an interface for each part of the decision-making process, including making a choice and then updating internal parameter estimates based on the observed reward from that choice.
Parameters:
Name Type Description Defaultstrategy
Strategy
The bandit strategy to use.
requiredname
str | None
An optional name for the agent.
None
Source code in mabby/agent.py
def __init__(self, strategy: Strategy, name: str | None = None):\n\"\"\"Initializes an agent with a given strategy.\n\n Args:\n strategy: The bandit strategy to use.\n name: An optional name for the agent.\n \"\"\"\n self.strategy: Strategy = strategy #: The bandit strategy to use\n self._name = name\n self._primed = False\n self._choice: int | None = None\n
"},{"location":"reference/agent/#mabby.agent.Agent.Ns","title":"Ns: NDArray[np.uint32]
property
","text":"The number of times the agent has played each arm.
The play counts are only available after the agent has been primed.
Returns:
Type DescriptionNDArray[np.uint32]
An array of the play counts of each arm.
Raises:
Type DescriptionAgentUsageError
If the agent has not been primed.
"},{"location":"reference/agent/#mabby.agent.Agent.Qs","title":"Qs: NDArray[np.float64]
property
","text":"The agent's current estimated action values (Q-values).
The action values are only available after the agent has been primed.
Returns:
Type DescriptionNDArray[np.float64]
An array of the action values of each arm.
Raises:
Type DescriptionAgentUsageError
If the agent has not been primed.
"},{"location":"reference/agent/#mabby.agent.Agent.__repr__","title":"__repr__()
","text":"Returns the agent's string representation.
Uses the agent's name if set. Otherwise, the string representation of the agent's strategy is used by default.
Source code inmabby/agent.py
def __repr__(self) -> str:\n\"\"\"Returns the agent's string representation.\n\n Uses the agent's name if set. Otherwise, the string representation of the\n agent's strategy is used by default.\n \"\"\"\n if self._name is None:\n return str(self.strategy)\n return self._name\n
"},{"location":"reference/agent/#mabby.agent.Agent.choose","title":"choose()
","text":"Returns the agent's next choice based on its strategy.
This method can only be called on a primed agent.
Returns:
Type Descriptionint
The index of the arm chosen by the agent.
Raises:
Type DescriptionAgentUsageError
If the agent has not been primed.
Source code inmabby/agent.py
def choose(self) -> int:\n\"\"\"Returns the agent's next choice based on its strategy.\n\n This method can only be called on a primed agent.\n\n Returns:\n The index of the arm chosen by the agent.\n\n Raises:\n AgentUsageError: If the agent has not been primed.\n \"\"\"\n if not self._primed:\n raise AgentUsageError(\"choose() can only be called on a primed agent\")\n self._choice = self.strategy.choose(self._rng)\n return self._choice\n
"},{"location":"reference/agent/#mabby.agent.Agent.prime","title":"prime(k, steps, rng)
","text":"Primes the agent before running a trial.
Parameters:
Name Type Description Defaultk
int
The number of bandit arms for the agent to choose from.
requiredsteps
int
The number of steps the simulation will be run for.
requiredrng
Generator
A random number generator.
required Source code inmabby/agent.py
def prime(self, k: int, steps: int, rng: Generator) -> None:\n\"\"\"Primes the agent before running a trial.\n\n Args:\n k: The number of bandit arms for the agent to choose from.\n steps: The number of steps the simulation will be run for.\n rng: A random number generator.\n \"\"\"\n self._primed = True\n self._choice = None\n self._rng = rng\n self.strategy.prime(k, steps)\n
"},{"location":"reference/agent/#mabby.agent.Agent.update","title":"update(reward)
","text":"Updates the agent's internal parameter estimates.
This method can only be called if the agent has previously made a choice, and an update based on that choice has not already been made.
Parameters:
Name Type Description Defaultreward
float
The observed reward from the agent's most recent choice.
requiredRaises:
Type DescriptionAgentUsageError
If the agent has not previously made a choice.
Source code inmabby/agent.py
def update(self, reward: float) -> None:\n\"\"\"Updates the agent's internal parameter estimates.\n\n This method can only be called if the agent has previously made a choice, and\n an update based on that choice has not already been made.\n\n Args:\n reward: The observed reward from the agent's most recent choice.\n\n Raises:\n AgentUsageError: If the agent has not previously made a choice.\n \"\"\"\n if self._choice is None:\n raise AgentUsageError(\"update() can only be called after choose()\")\n self.strategy.update(self._choice, reward, self._rng)\n self._choice = None\n
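Taken together, prime(), choose(), and update() support a manual interaction loop in the spirit of what a simulation trial runs internally. A minimal sketch, assuming a two-armed Bernoulli bandit (the arm means and step count are illustrative):
import numpy as np

from mabby.arms import BernoulliArm
from mabby.bandit import Bandit
from mabby.strategies import EpsilonGreedyStrategy

bandit = Bandit(arms=[BernoulliArm(p=0.3), BernoulliArm(p=0.7)], seed=0)
agent = EpsilonGreedyStrategy(eps=0.1).agent(name="eps-greedy")

# The agent must be primed before choose() or update() can be called.
agent.prime(k=len(bandit), steps=1000, rng=np.random.default_rng(0))
for _ in range(1000):
    choice = agent.choose()       # pick an arm according to the strategy
    reward = bandit.play(choice)  # sample a reward from that arm
    agent.update(reward)          # fold the observation into the estimates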
"},{"location":"reference/arms/","title":"arms","text":"Provides Arm
base class with some common reward distributions.
Arm(**kwargs)
","text":" Bases: ABC
, EnforceOverrides
Base class for a bandit arm implementing a reward distribution.
An arm represents one of the decision choices available to the agent in a bandit problem. It has a hidden reward distribution and can be played by the agent to generate observable rewards.
Source code inmabby/arms.py
@abstractmethod\ndef __init__(self, **kwargs: float):\n\"\"\"Initializes an arm.\"\"\"\n
"},{"location":"reference/arms/#mabby.arms.Arm.mean","title":"mean: float
abstractmethod
property
","text":"The mean reward of the arm.
Returns:
Type Descriptionfloat
The computed mean of the arm's reward distribution.
"},{"location":"reference/arms/#mabby.arms.Arm.__repr__","title":"__repr__()
abstractmethod
","text":"Returns the string representation of the arm.
Source code inmabby/arms.py
@abstractmethod\ndef __repr__(self) -> str:\n\"\"\"Returns the string representation of the arm.\"\"\"\n
"},{"location":"reference/arms/#mabby.arms.Arm.bandit","title":"bandit(rng=None, seed=None, **kwargs)
classmethod
","text":"Creates a bandit with arms of the same reward distribution type.
Parameters:
Name Type Description Defaultrng
Generator | None
A random number generator.
None
seed
int | None
A seed for random number generation if rng
is not provided.
None
**kwargs
list[float]
A dictionary where keys are arm parameter names and values are lists of parameter values for each arm.
{}
Returns:
Type DescriptionBandit
A bandit with the specified arms.
Source code inmabby/arms.py
@classmethod\ndef bandit(\n cls,\n rng: Generator | None = None,\n seed: int | None = None,\n **kwargs: list[float],\n) -> Bandit:\n\"\"\"Creates a bandit with arms of the same reward distribution type.\n\n Args:\n rng: A random number generator.\n seed: A seed for random number generation if ``rng`` is not provided.\n **kwargs: A dictionary where keys are arm parameter names and values are\n lists of parameter values for each arm.\n\n Returns:\n A bandit with the specified arms.\n \"\"\"\n params_dicts = [dict(zip(kwargs, t)) for t in zip(*kwargs.values())]\n if len(params_dicts) == 0:\n raise ValueError(\"insufficient parameters to create an arm\")\n return Bandit([cls(**params) for params in params_dicts], rng, seed)\n
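The parameter lists are zipped elementwise, so each position defines one arm. A short sketch using the arm classes documented below (the parameter values are illustrative):
from mabby.arms import BernoulliArm, GaussianArm

# A three-armed Bernoulli bandit: one arm per element of p.
bandit = BernoulliArm.bandit(p=[0.2, 0.5, 0.8], seed=0)

# Multi-parameter arms pair up positionally:
# (loc=0.0, scale=1.0) and (loc=1.0, scale=2.0).
gaussian_bandit = GaussianArm.bandit(loc=[0.0, 1.0], scale=[1.0, 2.0], seed=0)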
"},{"location":"reference/arms/#mabby.arms.Arm.play","title":"play(rng)
abstractmethod
","text":"Plays the arm and samples a reward.
Parameters:
Name Type Description Defaultrng
Generator
A random number generator.
requiredReturns:
Type Descriptionfloat
The sampled reward from the arm's reward distribution.
Source code inmabby/arms.py
@abstractmethod\ndef play(self, rng: Generator) -> float:\n\"\"\"Plays the arm and samples a reward.\n\n Args:\n rng: A random number generator.\n\n Returns:\n The sampled reward from the arm's reward distribution.\n \"\"\"\n
"},{"location":"reference/arms/#mabby.arms.BernoulliArm","title":"BernoulliArm(p)
","text":" Bases: Arm
Bandit arm with a Bernoulli reward distribution.
Parameters:
Name Type Description Defaultp
float
Parameter of the Bernoulli distribution.
required Source code inmabby/arms.py
def __init__(self, p: float):\n\"\"\"Initializes a Bernoulli arm.\n\n Args:\n p: Parameter of the Bernoulli distribution.\n \"\"\"\n if p < 0 or p > 1:\n raise ValueError(\n f\"float {str(p)} is not a valid probability for Bernoulli distribution\"\n )\n\n self.p: float = p #: Parameter of the Bernoulli distribution\n
"},{"location":"reference/arms/#mabby.arms.GaussianArm","title":"GaussianArm(loc, scale)
","text":" Bases: Arm
Bandit arm with a Gaussian reward distribution.
Parameters:
Name Type Description Defaultloc
float
Mean (\"center\") of the Gaussian distribution.
requiredscale
float
Standard deviation of the Gaussian distribution.
required Source code inmabby/arms.py
def __init__(self, loc: float, scale: float):\n\"\"\"Initializes a Gaussian arm.\n\n Args:\n loc: Mean (\"center\") of the Gaussian distribution.\n scale: Standard deviation of the Gaussian distribution.\n \"\"\"\n if scale < 0:\n raise ValueError(\n f\"float {str(scale)} is not a valid scale for Gaussian distribution\"\n )\n\n self.loc: float = loc #: Mean (\"center\") of the Gaussian distribution\n self.scale: float = scale #: Standard deviation of the Gaussian distribution\n
"},{"location":"reference/bandit/","title":"bandit","text":"Provides Bandit
class for bandit simulations.
Bandit(arms, rng=None, seed=None)
","text":"Multi-armed bandit with one or more arms.
This class wraps around a list of arms, each of which has a reward distribution. It provides an interface for interacting with the arms, such as playing a specific arm, querying for the optimal arm, and computing regret from a given choice.
Parameters:
Name Type Description Defaultarms
list[Arm]
A list of arms for the bandit.
requiredrng
Generator | None
A random number generator.
None
seed
int | None
A seed for random number generation if rng
is not provided.
None
Source code in mabby/bandit.py
def __init__(\n self, arms: list[Arm], rng: Generator | None = None, seed: int | None = None\n):\n\"\"\"Initializes a bandit with a given set of arms.\n\n Args:\n arms: A list of arms for the bandit.\n rng: A random number generator.\n seed: A seed for random number generation if ``rng`` is not provided.\n \"\"\"\n self._arms = arms\n self._rng = rng if rng else np.random.default_rng(seed)\n
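A minimal sketch of this interface (the arm means are illustrative):
from mabby.arms import BernoulliArm
from mabby.bandit import Bandit

bandit = Bandit(arms=[BernoulliArm(p=0.3), BernoulliArm(p=0.7)], seed=0)
reward = bandit.play(0)    # sample a reward from the first arm
best = bandit.best_arm()   # index of the arm with the greatest mean (1 here)
regret = bandit.regret(0)  # 0.7 - 0.3 = 0.4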
"},{"location":"reference/bandit/#mabby.bandit.Bandit.means","title":"means: list[float]
property
","text":"The means of the arms.
Returns:
Type Descriptionlist[float]
An array of the means of each arm.
"},{"location":"reference/bandit/#mabby.bandit.Bandit.__getitem__","title":"__getitem__(i)
","text":"Returns an arm by index.
Parameters:
Name Type Description Defaulti
int
The index of the arm to get.
requiredReturns:
Type DescriptionArm
The arm at the given index.
Source code inmabby/bandit.py
def __getitem__(self, i: int) -> Arm:\n\"\"\"Returns an arm by index.\n\n Args:\n i: The index of the arm to get.\n\n Returns:\n The arm at the given index.\n \"\"\"\n return self._arms[i]\n
"},{"location":"reference/bandit/#mabby.bandit.Bandit.__iter__","title":"__iter__()
","text":"Returns an iterator over the bandit's arms.
Source code inmabby/bandit.py
def __iter__(self) -> Iterable[Arm]:\n\"\"\"Returns an iterator over the bandit's arms.\"\"\"\n return iter(self._arms)\n
"},{"location":"reference/bandit/#mabby.bandit.Bandit.__len__","title":"__len__()
","text":"Returns the number of arms.
Source code inmabby/bandit.py
def __len__(self) -> int:\n\"\"\"Returns the number of arms.\"\"\"\n return len(self._arms)\n
"},{"location":"reference/bandit/#mabby.bandit.Bandit.__repr__","title":"__repr__()
","text":"Returns a string representation of the bandit.
Source code inmabby/bandit.py
def __repr__(self) -> str:\n\"\"\"Returns a string representation of the bandit.\"\"\"\n return repr(self._arms)\n
"},{"location":"reference/bandit/#mabby.bandit.Bandit.best_arm","title":"best_arm()
","text":"Returns the index of the optimal arm.
The optimal arm is the arm with the greatest expected reward. If there are multiple arms with equal expected rewards, a random one is chosen.
Returns:
Type Descriptionint
The index of the optimal arm.
Source code inmabby/bandit.py
def best_arm(self) -> int:\n\"\"\"Returns the index of the optimal arm.\n\n The optimal arm is the arm with the greatest expected reward. If there are\n multiple arms with equal expected rewards, a random one is chosen.\n\n Returns:\n The index of the optimal arm.\n \"\"\"\n return random_argmax(self.means, rng=self._rng)\n
"},{"location":"reference/bandit/#mabby.bandit.Bandit.is_opt","title":"is_opt(choice)
","text":"Returns the optimality of a given choice.
Parameters:
Name Type Description Defaultchoice
int
The index of the chosen arm.
requiredReturns:
Type Descriptionbool
True
if the arm has the greatest expected reward, False
otherwise.
mabby/bandit.py
def is_opt(self, choice: int) -> bool:\n\"\"\"Returns the optimality of a given choice.\n\n Args:\n choice: The index of the chosen arm.\n\n Returns:\n ``True`` if the arm has the greatest expected reward, ``False`` otherwise.\n \"\"\"\n return np.max(self.means) == self._arms[choice].mean\n
"},{"location":"reference/bandit/#mabby.bandit.Bandit.play","title":"play(i)
","text":"Plays an arm by index.
Parameters:
Name Type Description Defaulti
int
The index of the arm to play.
requiredReturns:
Type Descriptionfloat
The reward from playing the arm.
Source code inmabby/bandit.py
def play(self, i: int) -> float:\n\"\"\"Plays an arm by index.\n\n Args:\n i: The index of the arm to play.\n\n Returns:\n The reward from playing the arm.\n \"\"\"\n return self[i].play(self._rng)\n
"},{"location":"reference/bandit/#mabby.bandit.Bandit.regret","title":"regret(choice)
","text":"Returns the regret from a given choice.
The regret is computed as the difference between the expected reward from the optimal arm and the expected reward from the chosen arm.
Parameters:
Name Type Description Defaultchoice
int
The index of the chosen arm.
requiredReturns:
Type Descriptionfloat
The computed regret value.
Source code inmabby/bandit.py
def regret(self, choice: int) -> float:\n\"\"\"Returns the regret from a given choice.\n\n The regret is computed as the difference between the expected reward from the\n optimal arm and the expected reward from the chosen arm.\n\n Args:\n choice: The index of the chosen arm.\n\n Returns:\n The computed regret value.\n \"\"\"\n return np.max(self.means) - self._arms[choice].mean\n
"},{"location":"reference/exceptions/","title":"exceptions","text":"Provides exceptions for mabby usage.
"},{"location":"reference/exceptions/#mabby.exceptions.AgentUsageError","title":"AgentUsageError
","text":" Bases: Exception
Raised when agent methods are used incorrectly.
"},{"location":"reference/exceptions/#mabby.exceptions.SimulationUsageError","title":"SimulationUsageError
","text":" Bases: Exception
Raised when simulation methods are used incorrectly.
"},{"location":"reference/exceptions/#mabby.exceptions.StatsUsageError","title":"StatsUsageError
","text":" Bases: Exception
Raised when stats methods are used incorrectly.
"},{"location":"reference/exceptions/#mabby.exceptions.StrategyUsageError","title":"StrategyUsageError
","text":" Bases: Exception
Raised when strategy methods are used incorrectly.
"},{"location":"reference/simulation/","title":"simulation","text":"Provides Simulation
class for bandit simulations.
Simulation(bandit, agents=None, strategies=None, names=None, rng=None, seed=None)
","text":"Simulation of a multi-armed bandit problem.
A simulation consists of multiple trials of one or more bandit strategies run on a configured multi-armed bandit.
One of agents
or strategies
must be supplied. If agents
is supplied, strategies
and names
are ignored. Otherwise, an agent
is created for each strategy
and given a name from names
if available.
Parameters:
Name Type Description Defaultbandit
Bandit
A configured multi-armed bandit to simulate on.
requiredagents
Iterable[Agent] | None
A list of agents to simulate.
None
strategies
Iterable[Strategy] | None
A list of strategies to simulate.
None
names
Iterable[str] | None
A list of names for agents.
None
rng
Generator | None
A random number generator.
None
seed
int | None
A seed for random number generation if rng
is not provided.
None
Raises:
Type DescriptionSimulationUsageError
If neither agents
nor strategies
are supplied.
mabby/simulation.py
def __init__(\n self,\n bandit: Bandit,\n agents: Iterable[Agent] | None = None,\n strategies: Iterable[Strategy] | None = None,\n names: Iterable[str] | None = None,\n rng: Generator | None = None,\n seed: int | None = None,\n):\n\"\"\"Initializes a simulation.\n\n One of ``agents`` or ``strategies`` must be supplied. If ``agents`` is supplied,\n ``strategies`` and ``names`` are ignored. Otherwise, an ``agent`` is created for\n each ``strategy`` and given a name from ``names`` if available.\n\n Args:\n bandit: A configured multi-armed bandit to simulate on.\n agents: A list of agents to simulate.\n strategies: A list of strategies to simulate.\n names: A list of names for agents.\n rng: A random number generator.\n seed: A seed for random number generation if ``rng`` is not provided.\n\n Raises:\n SimulationUsageError: If neither ``agents`` nor ``strategies`` are supplied.\n \"\"\"\n self.agents = self._create_agents(agents, strategies, names)\n if len(list(self.agents)) == 0:\n raise ValueError(\"no strategies or agents were supplied\")\n self.bandit = bandit\n if len(self.bandit) == 0:\n raise ValueError(\"bandit cannot be empty\")\n self._rng = rng if rng else np.random.default_rng(seed)\n
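A sketch of configuring a simulation from strategies (names pair up with strategies positionally; the bandit and parameter values are illustrative):
from mabby.arms import BernoulliArm
from mabby.simulation import Simulation
from mabby.strategies import EpsilonGreedyStrategy, UCB1Strategy

bandit = BernoulliArm.bandit(p=[0.2, 0.5, 0.8], seed=17)
sim = Simulation(
    bandit=bandit,
    strategies=[EpsilonGreedyStrategy(eps=0.1), UCB1Strategy(alpha=0.5)],
    names=["eps-greedy", "ucb1"],
    seed=42,
)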
"},{"location":"reference/simulation/#mabby.simulation.Simulation.run","title":"run(trials, steps, metrics=None)
","text":"Runs a simulation.
In a simulation run, each agent or strategy is run for the specified number of trials, and each trial is run for the given number of steps.
If metrics
is not specified, all available metrics are tracked by default.
Parameters:
Name Type Description Defaulttrials
int
The number of trials in the simulation.
requiredsteps
int
The number of steps in a trial.
requiredmetrics
Iterable[Metric] | None
A list of metrics to collect.
None
Returns:
Type DescriptionSimulationStats
A SimulationStats
object with the results of the simulation.
mabby/simulation.py
def run(\n self, trials: int, steps: int, metrics: Iterable[Metric] | None = None\n) -> SimulationStats:\n\"\"\"Runs a simulation.\n\n In a simulation run, each agent or strategy is run for the specified number of\n trials, and each trial is run for the given number of steps.\n\n If ``metrics`` is not specified, all available metrics are tracked by default.\n\n Args:\n trials: The number of trials in the simulation.\n steps: The number of steps in a trial.\n metrics: A list of metrics to collect.\n\n Returns:\n A ``SimulationStats`` object with the results of the simulation.\n \"\"\"\n sim_stats = SimulationStats(simulation=self)\n for agent in self.agents:\n agent_stats = self._run_trials_for_agent(agent, trials, steps, metrics)\n sim_stats.add(agent_stats)\n return sim_stats\n
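Continuing the configuration sketch above, a run returns a SimulationStats object that can be plotted directly (trial and step counts are illustrative):
stats = sim.run(trials=100, steps=300)
stats.plot_regret()      # cumulative regret per step, one line per agent
stats.plot_optimality()  # average optimality of the choices at each step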
"},{"location":"reference/stats/","title":"stats","text":"Provides metric tracking for multi-armed bandit simulations.
"},{"location":"reference/stats/#mabby.stats.AgentStats","title":"AgentStats(agent, bandit, steps, metrics=None)
","text":"Statistics for an agent in a multi-armed bandit simulation.
All available metrics are tracked by default. Alternatively, a specific list can be specified through the metrics
argument.
Parameters:
Name Type Description Defaultagent
Agent
The agent that statistics are tracked for.
requiredbandit
Bandit
The bandit of the simulation being run.
requiredsteps
int
The number of steps per trial in the simulation.
requiredmetrics
Iterable[Metric] | None
A collection of metrics to track.
None
Source code in mabby/stats.py
def __init__(\n self,\n agent: Agent,\n bandit: Bandit,\n steps: int,\n metrics: Iterable[Metric] | None = None,\n):\n\"\"\"Initializes agent statistics.\n\n All available metrics are tracked by default. Alternatively, a specific list can\n be specified through the ``metrics`` argument.\n\n Args:\n agent: The agent that statistics are tracked for.\n bandit: The bandit of the simulation being run.\n steps: The number of steps per trial in the simulation.\n metrics: A collection of metrics to track.\n \"\"\"\n self.agent: Agent = agent #: The agent that statistics are tracked for\n self._bandit = bandit\n self._steps = steps\n self._counts = np.zeros(steps)\n\n base_metrics = Metric.map_to_base(list(Metric) if metrics is None else metrics)\n self._stats = {stat: np.zeros(steps) for stat in base_metrics}\n
"},{"location":"reference/stats/#mabby.stats.AgentStats.__getitem__","title":"__getitem__(metric)
","text":"Gets values for a metric.
If the metric is not a base metric, the values are automatically transformed.
Parameters:
Name Type Description Defaultmetric
Metric
The metric to get the values for.
requiredReturns:
Type DescriptionNDArray[np.float64]
An array of values for the metric.
Source code inmabby/stats.py
def __getitem__(self, metric: Metric) -> NDArray[np.float64]:\n\"\"\"Gets values for a metric.\n\n If the metric is not a base metric, the values are automatically transformed.\n\n Args:\n metric: The metric to get the values for.\n\n Returns:\n An array of values for the metric.\n \"\"\"\n with np.errstate(divide=\"ignore\", invalid=\"ignore\"):\n values = self._stats[metric.base] / self._counts\n return metric.transform(values)\n
"},{"location":"reference/stats/#mabby.stats.AgentStats.__len__","title":"__len__()
","text":"Returns the number of steps each trial is tracked for.
Source code inmabby/stats.py
def __len__(self) -> int:\n\"\"\"Returns the number of steps each trial is tracked for.\"\"\"\n return self._steps\n
"},{"location":"reference/stats/#mabby.stats.AgentStats.update","title":"update(step, choice, reward)
","text":"Updates metric values for the latest simulation step.
Parameters:
Name Type Description Defaultstep
int
The number of the step.
requiredchoice
int
The choice made by the agent.
requiredreward
float
The reward observed by the agent.
required Source code inmabby/stats.py
def update(self, step: int, choice: int, reward: float) -> None:\n\"\"\"Updates metric values for the latest simulation step.\n\n Args:\n step: The number of the step.\n choice: The choice made by the agent.\n reward: The reward observed by the agent.\n \"\"\"\n regret = self._bandit.regret(choice)\n if Metric.REGRET in self._stats:\n self._stats[Metric.REGRET][step] += regret\n if Metric.OPTIMALITY in self._stats:\n self._stats[Metric.OPTIMALITY][step] += int(self._bandit.is_opt(choice))\n if Metric.REWARDS in self._stats:\n self._stats[Metric.REWARDS][step] += reward\n self._counts[step] += 1\n
"},{"location":"reference/stats/#mabby.stats.Metric","title":"Metric(label, base=None, transform=None)
","text":" Bases: Enum
Enum for metrics that simulations can track.
Metrics can be derived from other metrics through specifying a base
metric and a transform
function. This is useful for things like defining cumulative versions of an existing metric, where the transformed values can be computed \"lazily\" instead of being redundantly stored.
Parameters:
Name Type Description Defaultlabel
str
Verbose name of the metric (title case)
requiredbase
str | None
Name of the base metric
None
transform
Callable[[NDArray[np.float64]], NDArray[np.float64]] | None
Transformation function from the base metric
None
Source code in mabby/stats.py
def __init__(\n self,\n label: str,\n base: str | None = None,\n transform: Callable[[NDArray[np.float64]], NDArray[np.float64]] | None = None,\n):\n\"\"\"Initializes a metric.\n\n Metrics can be derived from other metrics through specifying a ``base`` metric\n and a ``transform`` function. This is useful for things like defining cumulative\n versions of an existing metric, where the transformed values can be computed\n \"lazily\" instead of being redundantly stored.\n\n Args:\n label: Verbose name of the metric (title case)\n base: Name of the base metric\n transform: Transformation function from the base metric\n \"\"\"\n self.__class__.__MAPPING__[self._name_] = self\n self._label = label\n self._mapping: MetricMapping | None = (\n MetricMapping(base=self.__class__.__MAPPING__[base], transform=transform)\n if base and transform\n else None\n )\n
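As a sketch of the lazy transformation chain, assuming (as the plotting helpers below suggest) that Metric.CUM_REGRET is derived from Metric.REGRET with a cumulative-sum transform:
import numpy as np

from mabby.stats import Metric

assert not Metric.CUM_REGRET.is_base()
assert Metric.CUM_REGRET.base is Metric.REGRET  # assumed base metric

# Per-step values stored for the base metric...
per_step = np.array([1.0, 0.0, 1.0])
# ...are transformed on demand; with np.cumsum this yields [1., 1., 2.].
cumulative = Metric.CUM_REGRET.transform(per_step)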
"},{"location":"reference/stats/#mabby.stats.Metric.base","title":"base: Metric
property
","text":"The base metric that the metric is transformed from.
If the metric is already a base metric, the metric itself is returned.
"},{"location":"reference/stats/#mabby.stats.Metric.__repr__","title":"__repr__()
","text":"Returns the verbose name of the metric.
Source code inmabby/stats.py
def __repr__(self) -> str:\n\"\"\"Returns the verbose name of the metric.\"\"\"\n return self._label\n
"},{"location":"reference/stats/#mabby.stats.Metric.is_base","title":"is_base()
","text":"Returns whether the metric is a base metric.
Returns:
Type Descriptionbool
True
if the metric is a base metric, False
otherwise.
mabby/stats.py
def is_base(self) -> bool:\n\"\"\"Returns whether the metric is a base metric.\n\n Returns:\n ``True`` if the metric is a base metric, ``False`` otherwise.\n \"\"\"\n return self._mapping is None\n
"},{"location":"reference/stats/#mabby.stats.Metric.map_to_base","title":"map_to_base(metrics)
classmethod
","text":"Traces all metrics back to their base metrics.
Parameters:
Name Type Description Defaultmetrics
Iterable[Metric]
A collection of metrics.
requiredReturns:
Type DescriptionIterable[Metric]
A set containing the base metrics of all the input metrics.
Source code inmabby/stats.py
@classmethod\ndef map_to_base(cls, metrics: Iterable[Metric]) -> Iterable[Metric]:\n\"\"\"Traces all metrics back to their base metrics.\n\n Args:\n metrics: A collection of metrics.\n\n Returns:\n A set containing the base metrics of all the input metrics.\n \"\"\"\n return set(m.base for m in metrics)\n
"},{"location":"reference/stats/#mabby.stats.Metric.transform","title":"transform(values)
","text":"Transforms values from the base metric.
If the metric is already a base metric, the input values are returned.
Parameters:
Name Type Description Defaultvalues
NDArray[np.float64]
An array of input values for the base metric.
requiredReturns:
Type DescriptionNDArray[np.float64]
An array of transformed values for the metric.
Source code inmabby/stats.py
def transform(self, values: NDArray[np.float64]) -> NDArray[np.float64]:\n\"\"\"Transforms values from the base metric.\n\n If the metric is already a base metric, the input values are returned.\n\n Args:\n values: An array of input values for the base metric.\n\n Returns:\n An array of transformed values for the metric.\n \"\"\"\n if self._mapping is not None:\n return self._mapping.transform(values)\n return values\n
"},{"location":"reference/stats/#mabby.stats.MetricMapping","title":"MetricMapping
dataclass
","text":"Transformation from a base metric.
See Metric
for examples of metric mappings.
SimulationStats(simulation)
","text":"Statistics for a multi-armed bandit simulation.
Parameters:
Name Type Description Defaultsimulation
Simulation
The simulation to track.
required Source code inmabby/stats.py
def __init__(self, simulation: Simulation):\n\"\"\"Initializes simulation statistics.\n\n Args:\n simulation: The simulation to track.\n \"\"\"\n self._simulation: Simulation = simulation\n self._stats_dict: dict[Agent, AgentStats] = {}\n
"},{"location":"reference/stats/#mabby.stats.SimulationStats.__contains__","title":"__contains__(agent)
","text":"Returns if an agent's statistics are present.
Returns:
Type Descriptionbool
True
if an agent's statistics are present, False
otherwise.
mabby/stats.py
def __contains__(self, agent: Agent) -> bool:\n\"\"\"Returns whether an agent's statistics are present.\n\n Returns:\n ``True`` if an agent's statistics are present, ``False`` otherwise.\n \"\"\"\n return agent in self._stats_dict\n
"},{"location":"reference/stats/#mabby.stats.SimulationStats.__getitem__","title":"__getitem__(agent)
","text":"Gets statistics for an agent.
Parameters:
Name Type Description Defaultagent
Agent
The agent to get the statistics of.
requiredReturns:
Type DescriptionAgentStats
The statistics of the agent.
Source code inmabby/stats.py
def __getitem__(self, agent: Agent) -> AgentStats:\n\"\"\"Gets statistics for an agent.\n\n Args:\n agent: The agent to get the statistics of.\n\n Returns:\n The statistics of the agent.\n \"\"\"\n return self._stats_dict[agent]\n
"},{"location":"reference/stats/#mabby.stats.SimulationStats.__setitem__","title":"__setitem__(agent, agent_stats)
","text":"Sets the statistics for an agent.
Parameters:
Name Type Description Defaultagent
Agent
The agent to set the statistics of.
requiredagent_stats
AgentStats
The agent statistics to set.
required Source code inmabby/stats.py
def __setitem__(self, agent: Agent, agent_stats: AgentStats) -> None:\n\"\"\"Sets the statistics for an agent.\n\n Args:\n agent: The agent to set the statistics of.\n agent_stats: The agent statistics to set.\n \"\"\"\n if agent != agent_stats.agent:\n raise StatsUsageError(\"agents specified in key and value don't match\")\n self._stats_dict[agent] = agent_stats\n
"},{"location":"reference/stats/#mabby.stats.SimulationStats.add","title":"add(agent_stats)
","text":"Adds statistics for an agent.
Parameters:
Name Type Description Defaultagent_stats
AgentStats
The agent statistics to add.
required Source code inmabby/stats.py
def add(self, agent_stats: AgentStats) -> None:\n\"\"\"Adds statistics for an agent.\n\n Args:\n agent_stats: The agent statistics to add.\n \"\"\"\n self._stats_dict[agent_stats.agent] = agent_stats\n
"},{"location":"reference/stats/#mabby.stats.SimulationStats.plot","title":"plot(metric)
","text":"Generates a plot for a simulation metric.
Parameters:
Name Type Description Defaultmetric
Metric
The metric to plot.
required Source code inmabby/stats.py
def plot(self, metric: Metric) -> None:\n\"\"\"Generates a plot for a simulation metric.\n\n Args:\n metric: The metric to plot.\n \"\"\"\n for agent, agent_stats in self._stats_dict.items():\n plt.plot(agent_stats[metric], label=str(agent))\n plt.legend()\n plt.show()\n
"},{"location":"reference/stats/#mabby.stats.SimulationStats.plot_optimality","title":"plot_optimality()
","text":"Generates a plot for the optimality metric.
Source code inmabby/stats.py
def plot_optimality(self) -> None:\n\"\"\"Generates a plot for the optimality metric.\"\"\"\n self.plot(metric=Metric.OPTIMALITY)\n
"},{"location":"reference/stats/#mabby.stats.SimulationStats.plot_regret","title":"plot_regret(cumulative=True)
","text":"Generates a plot for the regret or cumulative regret metrics.
Parameters:
Name Type Description Defaultcumulative
bool
Whether to use the cumulative regret.
True
Source code in mabby/stats.py
def plot_regret(self, cumulative: bool = True) -> None:\n\"\"\"Generates a plot for the regret or cumulative regret metrics.\n\n Args:\n cumulative: Whether to use the cumulative regret.\n \"\"\"\n self.plot(metric=Metric.CUM_REGRET if cumulative else Metric.REGRET)\n
"},{"location":"reference/stats/#mabby.stats.SimulationStats.plot_rewards","title":"plot_rewards(cumulative=True)
","text":"Generates a plot for the rewards or cumulative rewards metrics.
Parameters:
Name Type Description Defaultcumulative
bool
Whether to use the cumulative rewards.
True
Source code in mabby/stats.py
def plot_rewards(self, cumulative: bool = True) -> None:\n\"\"\"Generates a plot for the rewards or cumulative rewards metrics.\n\n Args:\n cumulative: Whether to use the cumulative rewards.\n \"\"\"\n self.plot(metric=Metric.CUM_REWARDS if cumulative else Metric.REWARDS)\n
"},{"location":"reference/utils/","title":"utils","text":"Provides commonly used utility functions.
"},{"location":"reference/utils/#mabby.utils.random_argmax","title":"random_argmax(values, rng)
","text":"Computes random argmax of an array.
If there are multiple maximums, the index of one is chosen at random.
Parameters:
Name Type Description Defaultvalues
ArrayLike
An input array.
requiredrng
Generator
A random number generator.
requiredReturns:
Type Descriptionint
The random argmax of the input array.
Source code inmabby/utils.py
def random_argmax(values: ArrayLike, rng: Generator) -> int:\n\"\"\"Computes random argmax of an array.\n\n If there are multiple maximums, the index of one is chosen at random.\n\n Args:\n values: An input array.\n rng: A random number generator.\n\n Returns:\n The random argmax of the input array.\n \"\"\"\n candidates = np.where(values == np.max(values))[0]\n return int(rng.choice(candidates))\n
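A quick sketch of the tie-breaking behavior:
import numpy as np

from mabby.utils import random_argmax

rng = np.random.default_rng(0)
# Indices 1 and 2 are tied for the maximum, so either may be returned.
i = random_argmax(np.array([0.3, 0.9, 0.9]), rng=rng)
assert i in (1, 2)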
"},{"location":"reference/strategies/","title":"strategies","text":"Multi-armed bandit strategies.
mabby provides a collection of preset bandit strategies that can be plugged into simulations. The Strategy
abstract base class can also be sub-classed to implement custom bandit strategies.
BetaTSStrategy(general=False)
","text":" Bases: Strategy
Thompson sampling strategy with Beta priors.
If general
is False
, rewards used for updates must be either 0 or 1. Otherwise, rewards must have support on [0, 1].
Parameters:
Name Type Description Defaultgeneral
bool
Whether to use a generalized version of the strategy.
False
Source code in mabby/strategies/thompson.py
def __init__(self, general: bool = False):\n\"\"\"Initializes a Beta Thompson sampling strategy.\n\n If ``general`` is ``False``, rewards used for updates must be either 0 or 1.\n Otherwise, rewards must have support on [0, 1].\n\n Args:\n general: Whether to use a generalized version of the strategy.\n \"\"\"\n self.general = general\n
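A brief sketch of the two modes:
from mabby.strategies import BetaTSStrategy

ts = BetaTSStrategy()                      # rewards must be exactly 0 or 1 (e.g., Bernoulli arms)
general_ts = BetaTSStrategy(general=True)  # rewards may take any value in [0, 1]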
"},{"location":"reference/strategies/#mabby.strategies.EpsilonFirstStrategy","title":"EpsilonFirstStrategy(eps)
","text":" Bases: SemiUniformStrategy
Epsilon-first bandit strategy.
The epsilon-first strategy has a pure exploration phase followed by a pure exploitation phase.
Parameters:
Name Type Description Defaulteps
float
The ratio of exploration steps (must be between 0 and 1).
required Source code inmabby/strategies/semi_uniform.py
def __init__(self, eps: float) -> None:\n\"\"\"Initializes an epsilon-first strategy.\n\n Args:\n eps: The ratio of exploration steps (must be between 0 and 1).\n \"\"\"\n super().__init__()\n if eps < 0 or eps > 1:\n raise ValueError(\"eps must be between 0 and 1\")\n self.eps = eps\n
"},{"location":"reference/strategies/#mabby.strategies.EpsilonGreedyStrategy","title":"EpsilonGreedyStrategy(eps)
","text":" Bases: SemiUniformStrategy
Epsilon-greedy bandit strategy.
The epsilon-greedy strategy has a fixed chance of exploration every time step.
Parameters:
Name Type Description Defaulteps
float
The chance of exploration (must be between 0 and 1).
required Source code inmabby/strategies/semi_uniform.py
def __init__(self, eps: float) -> None:\n\"\"\"Initializes an epsilon-greedy strategy.\n\n Args:\n eps: The chance of exploration (must be between 0 and 1).\n \"\"\"\n super().__init__()\n if eps < 0 or eps > 1:\n raise ValueError(\"eps must be between 0 and 1\")\n self.eps = eps\n
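The two epsilon strategies spend the same exploration budget differently; a minimal sketch of the contrast:
from mabby.strategies import EpsilonFirstStrategy, EpsilonGreedyStrategy

eps_first = EpsilonFirstStrategy(eps=0.1)    # explore uniformly for the first 10% of steps, then exploit
eps_greedy = EpsilonGreedyStrategy(eps=0.1)  # explore with a fixed 10% chance at every step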
"},{"location":"reference/strategies/#mabby.strategies.RandomStrategy","title":"RandomStrategy()
","text":" Bases: SemiUniformStrategy
Random bandit strategy.
The random strategy chooses arms at random, i.e., it explores with 100% chance.
Source code inmabby/strategies/semi_uniform.py
def __init__(self) -> None:\n\"\"\"Initializes a random strategy.\"\"\"\n super().__init__()\n
"},{"location":"reference/strategies/#mabby.strategies.SemiUniformStrategy","title":"SemiUniformStrategy()
","text":" Bases: Strategy
, ABC
, EnforceOverrides
Base class for semi-uniform bandit strategies.
Every semi-uniform strategy must implement effective_eps
to compute the chance of exploration at each time step.
mabby/strategies/semi_uniform.py
def __init__(self) -> None:\n\"\"\"Initializes a semi-uniform strategy.\"\"\"\n
"},{"location":"reference/strategies/#mabby.strategies.semi_uniform.SemiUniformStrategy.effective_eps","title":"effective_eps()
abstractmethod
","text":"Returns the effective epsilon value.
The effective epsilon value is the probability at the current time step that the bandit will explore rather than exploit. Depending on the strategy, the effective epsilon value may be different from the nominal epsilon value set.
Source code inmabby/strategies/semi_uniform.py
@abstractmethod\ndef effective_eps(self) -> float:\n\"\"\"Returns the effective epsilon value.\n\n The effective epsilon value is the probability at the current time step that the\n bandit will explore rather than exploit. Depending on the strategy, the\n effective epsilon value may be different from the nominal epsilon value set.\n \"\"\"\n
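As an illustration, a hypothetical epsilon-decreasing strategy could implement effective_eps() to decay exploration as plays accumulate. This sketch assumes SemiUniformStrategy supplies choose(), update(), and prime(), exposes the inherited Ns play counts, and leaves only effective_eps() and __repr__() to implement:
from mabby.strategies.semi_uniform import SemiUniformStrategy


class EpsilonDecreasingStrategy(SemiUniformStrategy):
    """Hypothetical strategy whose chance of exploration decays over time."""

    def __init__(self, eps0: float) -> None:
        super().__init__()
        if eps0 < 0 or eps0 > 1:
            raise ValueError("eps0 must be between 0 and 1")
        self.eps0 = eps0  # initial chance of exploration

    def __repr__(self) -> str:
        return f"eps-decreasing (eps0={self.eps0})"

    def effective_eps(self) -> float:
        # Decay epsilon towards 0 as total plays grow (assumes the inherited
        # Ns property tracks the per-arm play counts).
        return self.eps0 / (1 + self.Ns.sum())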
"},{"location":"reference/strategies/#mabby.strategies.Strategy","title":"Strategy()
","text":" Bases: ABC
, EnforceOverrides
Base class for a bandit strategy.
A strategy provides the computational logic for choosing which bandit arms to play and updating parameter estimates.
Source code inmabby/strategies/strategy.py
@abstractmethod\ndef __init__(self) -> None:\n\"\"\"Initializes a bandit strategy.\"\"\"\n
"},{"location":"reference/strategies/#mabby.strategies.strategy.Strategy.Ns","title":"Ns: NDArray[np.uint32]
abstractmethod
property
","text":"The number of times each arm has been played.
"},{"location":"reference/strategies/#mabby.strategies.strategy.Strategy.Qs","title":"Qs: NDArray[np.float64]
abstractmethod
property
","text":"The current estimated action values for each arm.
"},{"location":"reference/strategies/#mabby.strategies.strategy.Strategy.__repr__","title":"__repr__()
abstractmethod
","text":"Returns a string representation of the strategy.
Source code inmabby/strategies/strategy.py
@abstractmethod\ndef __repr__(self) -> str:\n\"\"\"Returns a string representation of the strategy.\"\"\"\n
"},{"location":"reference/strategies/#mabby.strategies.strategy.Strategy.agent","title":"agent(**kwargs)
","text":"Creates an agent following the strategy.
Parameters:
Name Type Description Default**kwargs
str
Parameters for initializing the agent (see Agent
)
{}
Returns:
Type DescriptionAgent
The created agent with the strategy.
Source code inmabby/strategies/strategy.py
def agent(self, **kwargs: str) -> Agent:\n\"\"\"Creates an agent following the strategy.\n\n Args:\n **kwargs: Parameters for initializing the agent (see\n [`Agent`][mabby.agent.Agent])\n\n Returns:\n The created agent with the strategy.\n \"\"\"\n return Agent(strategy=self, **kwargs)\n
"},{"location":"reference/strategies/#mabby.strategies.strategy.Strategy.choose","title":"choose(rng)
abstractmethod
","text":"Returns the next arm to play.
Parameters:
Name Type Description Defaultrng
Generator
A random number generator.
requiredReturns:
Type Descriptionint
The index of the arm to play.
Source code inmabby/strategies/strategy.py
@abstractmethod\ndef choose(self, rng: Generator) -> int:\n\"\"\"Returns the next arm to play.\n\n Args:\n rng: A random number generator.\n\n Returns:\n The index of the arm to play.\n \"\"\"\n
"},{"location":"reference/strategies/#mabby.strategies.strategy.Strategy.prime","title":"prime(k, steps)
abstractmethod
","text":"Primes the strategy before running a trial.
Parameters:
Name Type Description Defaultk
int
The number of bandit arms to choose from.
requiredsteps
int
The number of steps the simulation will be run for.
required Source code inmabby/strategies/strategy.py
@abstractmethod\ndef prime(self, k: int, steps: int) -> None:\n\"\"\"Primes the strategy before running a trial.\n\n Args:\n k: The number of bandit arms to choose from.\n steps: The number of steps the simulation will be run for.\n \"\"\"\n
"},{"location":"reference/strategies/#mabby.strategies.strategy.Strategy.update","title":"update(choice, reward, rng=None)
abstractmethod
","text":"Updates internal parameter estimates based on reward observation.
Parameters:
Name Type Description Defaultchoice
int
The most recent choice made.
requiredreward
float
The observed reward from the agent's most recent choice.
requiredrng
Generator | None
A random number generator.
None
Source code in mabby/strategies/strategy.py
@abstractmethod\ndef update(self, choice: int, reward: float, rng: Generator | None = None) -> None:\n\"\"\"Updates internal parameter estimates based on reward observation.\n\n Args:\n choice: The most recent choice made.\n reward: The observed reward from the agent's most recent choice.\n rng: A random number generator.\n \"\"\"\n
"},{"location":"reference/strategies/#mabby.strategies.UCB1Strategy","title":"UCB1Strategy(alpha)
","text":" Bases: Strategy
Strategy using the UCB1 bandit algorithm.
Parameters:
Name Type Description Defaultalpha
float
The exploration parameter.
required Source code inmabby/strategies/ucb.py
def __init__(self, alpha: float) -> None:\n\"\"\"Initializes a UCB1 strategy.\n\n Args:\n alpha: The exploration parameter.\n \"\"\"\n if alpha < 0:\n raise ValueError(\"alpha must be non-negative\")\n self.alpha = alpha\n
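A short usage sketch (reading alpha as a scale on the confidence bonus follows standard UCB1 and is stated here as an assumption about this implementation):
from mabby.strategies import UCB1Strategy

# Larger alpha scales up the exploration bonus, encouraging more exploration.
ucb = UCB1Strategy(alpha=2.0)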
"},{"location":"reference/strategies/semi_uniform/","title":"semi_uniform","text":"Provides implementations of semi-uniform bandit strategies.
Semi-uniform strategies will choose to explore or exploit at each time step. When exploring, a random arm will be played. When exploiting, the arm with the greatest estimated action value will be played. epsilon
, the chance of exploration, is computed differently by each semi-uniform strategy.
EpsilonFirstStrategy(eps)
","text":" Bases: SemiUniformStrategy
Epsilon-first bandit strategy.
The epsilon-first strategy has a pure exploration phase followed by a pure exploitation phase.
Parameters:
Name Type Description Defaulteps
float
The ratio of exploration steps (must be between 0 and 1).
required Source code inmabby/strategies/semi_uniform.py
def __init__(self, eps: float) -> None:\n\"\"\"Initializes an epsilon-first strategy.\n\n Args:\n eps: The ratio of exploration steps (must be between 0 and 1).\n \"\"\"\n super().__init__()\n if eps < 0 or eps > 1:\n raise ValueError(\"eps must be between 0 and 1\")\n self.eps = eps\n
"},{"location":"reference/strategies/semi_uniform/#mabby.strategies.semi_uniform.EpsilonGreedyStrategy","title":"EpsilonGreedyStrategy(eps)
","text":" Bases: SemiUniformStrategy
Epsilon-greedy bandit strategy.
The epsilon-greedy strategy has a fixed chance of exploration every time step.
Parameters:
Name Type Description Defaulteps
float
The chance of exploration (must be between 0 and 1).
required Source code inmabby/strategies/semi_uniform.py
def __init__(self, eps: float) -> None:\n\"\"\"Initializes an epsilon-greedy strategy.\n\n Args:\n eps: The chance of exploration (must be between 0 and 1).\n \"\"\"\n super().__init__()\n if eps < 0 or eps > 1:\n raise ValueError(\"eps must be between 0 and 1\")\n self.eps = eps\n
"},{"location":"reference/strategies/semi_uniform/#mabby.strategies.semi_uniform.RandomStrategy","title":"RandomStrategy()
","text":" Bases: SemiUniformStrategy
Random bandit strategy.
The random strategy chooses arms at random, i.e., it explores with 100% chance.
Source code inmabby/strategies/semi_uniform.py
def __init__(self) -> None:\n\"\"\"Initializes a random strategy.\"\"\"\n super().__init__()\n
"},{"location":"reference/strategies/semi_uniform/#mabby.strategies.semi_uniform.SemiUniformStrategy","title":"SemiUniformStrategy()
","text":" Bases: Strategy
, ABC
, EnforceOverrides
Base class for semi-uniform bandit strategies.
Every semi-uniform strategy must implement effective_eps
to compute the chance of exploration at each time step.
mabby/strategies/semi_uniform.py
def __init__(self) -> None:\n\"\"\"Initializes a semi-uniform strategy.\"\"\"\n
"},{"location":"reference/strategies/semi_uniform/#mabby.strategies.semi_uniform.SemiUniformStrategy.effective_eps","title":"effective_eps()
abstractmethod
","text":"Returns the effective epsilon value.
The effective epsilon value is the probability at the current time step that the bandit will explore rather than exploit. Depending on the strategy, the effective epsilon value may be different from the nominal epsilon value set.
Source code inmabby/strategies/semi_uniform.py
@abstractmethod\ndef effective_eps(self) -> float:\n\"\"\"Returns the effective epsilon value.\n\n The effective epsilon value is the probability at the current time step that the\n bandit will explore rather than exploit. Depending on the strategy, the\n effective epsilon value may be different from the nominal epsilon value set.\n \"\"\"\n
"},{"location":"reference/strategies/strategy/","title":"strategy","text":"Provides Strategy
class.
Strategy()
","text":" Bases: ABC
, EnforceOverrides
Base class for a bandit strategy.
A strategy provides the computational logic for choosing which bandit arms to play and updating parameter estimates.
Source code inmabby/strategies/strategy.py
@abstractmethod\ndef __init__(self) -> None:\n\"\"\"Initializes a bandit strategy.\"\"\"\n
"},{"location":"reference/strategies/strategy/#mabby.strategies.strategy.Strategy.Ns","title":"Ns: NDArray[np.uint32]
abstractmethod
property
","text":"The number of times each arm has been played.
"},{"location":"reference/strategies/strategy/#mabby.strategies.strategy.Strategy.Qs","title":"Qs: NDArray[np.float64]
abstractmethod
property
","text":"The current estimated action values for each arm.
"},{"location":"reference/strategies/strategy/#mabby.strategies.strategy.Strategy.__repr__","title":"__repr__()
abstractmethod
","text":"Returns a string representation of the strategy.
Source code inmabby/strategies/strategy.py
@abstractmethod\ndef __repr__(self) -> str:\n\"\"\"Returns a string representation of the strategy.\"\"\"\n
"},{"location":"reference/strategies/strategy/#mabby.strategies.strategy.Strategy.agent","title":"agent(**kwargs)
","text":"Creates an agent following the strategy.
Parameters:
Name Type Description Default**kwargs
str
Parameters for initializing the agent (see Agent
)
{}
Returns:
Type DescriptionAgent
The created agent with the strategy.
Source code inmabby/strategies/strategy.py
def agent(self, **kwargs: str) -> Agent:\n\"\"\"Creates an agent following the strategy.\n\n Args:\n **kwargs: Parameters for initializing the agent (see\n [`Agent`][mabby.agent.Agent])\n\n Returns:\n The created agent with the strategy.\n \"\"\"\n return Agent(strategy=self, **kwargs)\n
"},{"location":"reference/strategies/strategy/#mabby.strategies.strategy.Strategy.choose","title":"choose(rng)
abstractmethod
","text":"Returns the next arm to play.
Parameters:
Name Type Description Defaultrng
Generator
A random number generator.
requiredReturns:
Type Descriptionint
The index of the arm to play.
Source code inmabby/strategies/strategy.py
@abstractmethod\ndef choose(self, rng: Generator) -> int:\n\"\"\"Returns the next arm to play.\n\n Args:\n rng: A random number generator.\n\n Returns:\n The index of the arm to play.\n \"\"\"\n
"},{"location":"reference/strategies/strategy/#mabby.strategies.strategy.Strategy.prime","title":"prime(k, steps)
abstractmethod
","text":"Primes the strategy before running a trial.
Parameters:
Name Type Description Defaultk
int
The number of bandit arms to choose from.
requiredsteps
int
The number of steps the simulation will be run for.
required Source code inmabby/strategies/strategy.py
@abstractmethod\ndef prime(self, k: int, steps: int) -> None:\n\"\"\"Primes the strategy before running a trial.\n\n Args:\n k: The number of bandit arms to choose from.\n steps: The number of steps the simulation will be run for.\n \"\"\"\n
"},{"location":"reference/strategies/strategy/#mabby.strategies.strategy.Strategy.update","title":"update(choice, reward, rng=None)
abstractmethod
","text":"Updates internal parameter estimates based on reward observation.
Parameters:
Name Type Description Defaultchoice
int
The most recent choice made.
requiredreward
float
The observed reward from the agent's most recent choice.
requiredrng
Generator | None
A random number generator.
None
Source code in mabby/strategies/strategy.py
@abstractmethod\ndef update(self, choice: int, reward: float, rng: Generator | None = None) -> None:\n\"\"\"Updates internal parameter estimates based on reward observation.\n\n Args:\n choice: The most recent choice made.\n reward: The observed reward from the agent's most recent choice.\n rng: A random number generator.\n \"\"\"\n
"},{"location":"reference/strategies/thompson/","title":"thompson","text":"Provides implementations of Thompson sampling strategies.
"},{"location":"reference/strategies/thompson/#mabby.strategies.thompson.BetaTSStrategy","title":"BetaTSStrategy(general=False)
","text":" Bases: Strategy
Thompson sampling strategy with Beta priors.
If general
is False
, rewards used for updates must be either 0 or 1. Otherwise, rewards must have support on [0, 1].
Parameters:
Name Type Description Defaultgeneral
bool
Whether to use a generalized version of the strategy.
False
Source code in mabby/strategies/thompson.py
def __init__(self, general: bool = False):\n\"\"\"Initializes a Beta Thompson sampling strategy.\n\n If ``general`` is ``False``, rewards used for updates must be either 0 or 1.\n Otherwise, rewards must have support on [0, 1].\n\n Args:\n general: Whether to use a generalized version of the strategy.\n \"\"\"\n self.general = general\n
"},{"location":"reference/strategies/ucb/","title":"ucb","text":"Provides implementations of upper confidence bound (UCB) strategies.
"},{"location":"reference/strategies/ucb/#mabby.strategies.ucb.UCB1Strategy","title":"UCB1Strategy(alpha)
","text":" Bases: Strategy
Strategy using the UCB1 bandit algorithm.
Parameters:
Name Type Description Defaultalpha
float
The exploration parameter.
required Source code inmabby/strategies/ucb.py
def __init__(self, alpha: float) -> None:\n\"\"\"Initializes a UCB1 strategy.\n\n Args:\n alpha: The exploration parameter.\n \"\"\"\n if alpha < 0:\n raise ValueError(\"alpha must be non-negative\")\n self.alpha = alpha\n
"}]}
\ No newline at end of file
diff --git a/sitemap.xml b/sitemap.xml
new file mode 100644
index 0000000..b73be61
--- /dev/null
+++ b/sitemap.xml
@@ -0,0 +1,98 @@
+
+