After the deadline in a few hours, please feel free to USE THIS THREAD TO SHARE YOUR STRATEGIES, since the fear of them being stolen will be gone #77
Comments
I will upload it in my fork |
First, I would like to mention that the competition isn't over yet. Second, I recommend leaving a pull request here instead. We have an improved tournament runner, a leaderboard, and a discord as well. |
my strategy has 5 states of memory that describe the opponent and get the best out of them |
carykh's video description says "But, I'll give you all a 24-hour grace-period, which moves the deadline to May 27th, 2021 at 10 PM UTC, to allow for software bugs / resubmissions etc." So be careful not to reveal too much! |
@Xapakas cheers, guess I'll leave it another day! |
Wait, there was a grace period????? |
Yes, it was put in the YouTube video's description. I honestly feel like the grace period is too long; it should be for allowing final fixes for people who tried to submit at the end of the time and weren't able to, but instead it's a significant portion of the competition time, and I feel like you have to use it or you're at a significant disadvantage. It also wasn't very clearly announced; I believe the video description was the only place. Oh well, too late to change it at this point. |
My brain is too fried to make any more changes anyway. I tried to refactor my code, but the cleaner code performed way worse for some reason, so I just submitted my ugly hacky one. |
STRATEGY NAME: revJossFGT6 - reversed Joss + forgiving grimTrigger

My strategy works like this: in the first rounds (I figured out, with a lot of testing in different environments, that the best number of rounds for this was 6), it behaves like a "reversed Joss" (a titForTat that, when it is continuously being attacked — and, therefore, attacking — tries to cooperate). If the opponent didn't make any unprovoked attack, it would continue like this forever. Otherwise, it would become what I like to call a forgiving grimTrigger.

So, as you can see, this is an adaptive approach. If the opponent cooperates for the initial rounds, then I would cooperate back with a very forgiving, but responsive, strat. What I mean by that is that it works like a titForTat, so it responds to the enemy's attack quickly (as opposed to a forgiving titForTat, for example), but it is still very forgiving, because it likes to cooperate even if the opponent is attacking a lot (because SS (+3 +3) is better than TT (+1 +1)). And one thing is sure: since the opponent always cooperated in the first rounds, it is very unlikely that it would attack a lot in this stage.

But there's a key element required for this strategy to work at its best that I didn't mention yet: the amount of randomness in the system. If you make a lot of copies of the random.py file in the exampleStrats folder and run the simulation, you will see that the best strategy will probably be the grimTrigger (as opposed to the standard one, titForTat). You can try it yourself, but the idea behind it is that by getting defensive (attacking a lot), the grimTrigger will always get at least 1 point each round and, on average, it will score 3 (~50% are TTs (+1 +1) and ~50% are TSs (+5 +0) -> avg = (+3 +0.5)). And since my strategy is essentially a grimTrigger against 50/50 randoms (63/64 of the time — the only way for it to stay a revJoss would be if the random strategy's output for the first rounds was SSSSSS, and this only happens once in 2^6 (64) times), it works really well against randoms.

(I don't really mind if people take advantage of knowing this strategy, since there were always forks from people like @l4vr0v with a lot of available strategies, which could be straight-up copied, for example. Also, the thing that matters the most isn't knowing about a single strategy, but the strategy trends, meaning the most common ones — those you can take advantage of — and the least common ones — those you don't have to care about that much.)

Wow, that was a lot! It's so cool to see how convoluted a game with only two possible choices for each player can get haha |
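As a concrete illustration of the description above, here is a minimal sketch of the revJossFGT6 idea (a reconstruction for illustration, not the submitted code). It assumes the tournament's usual interface (history is a 2 x n numpy array, row 0 = us, row 1 = opponent, 1 = cooperate, 0 = defect), and the occasional-cooperation cadence in the grim phase is a made-up placeholder:
PROBE_ROUNDS = 6  # length of the initial "reversed Joss" phase (the value the author says tested best)

def strategy(history, memory):
    n = history.shape[1]
    triggered = memory if memory is not None else False  # have we seen an unprovoked attack?
    if n == 0:
        return 1, triggered
    if not triggered and n <= PROBE_ROUNDS:
        # An attack is "unprovoked" if the opponent defected even though we had just cooperated.
        we_were_nice = history[0, -2] == 1 if n >= 2 else True
        if history[1, -1] == 0 and we_were_nice:
            triggered = True
    if not triggered:
        # "Reversed Joss": tit-for-tat, except it occasionally cooperates its way
        # out of a mutual-defection rut instead of occasionally defecting.
        if history[0, -1] == 0 and history[1, -1] == 0:
            return 1, triggered
        return history[1, -1], triggered
    # "Forgiving grim trigger": mostly defect, but still offer cooperation now and then
    # (the every-10-rounds cadence here is a placeholder, not the author's tuned value).
    if n % 10 == 0:
        return 1, triggered
    return 0, triggered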
hey guys, yeah this thread is cool! To answer @duckboycool's concerns about the grace period being too long: |
I guess because of the final submission deadline in #81 and the video FAQ, if you're really worried about others knowing your strategies, you should wait until May 28th 10 PM UTC. (Only people who have submitted a resubmission form already will be able to submit in 2 hours until then though.) |
Yeah, the new deadline (ONLY for those who managed to submit re-submission requests late) is roughly 26 hours from now, May 28th 10 PM UTC. There's only about 40 people in this category. Once that date hits, absolutely no more resubmission requests can be accepted. If you haven't submitted a resubmission request as of now, you can't anymore because we're past the original submission deadline! |
I originally did ML stuff, but it was a bit slow and the result wasn't that good. So I made this. It's only 11 lines of code.
def strategy(history, memory):
    choice = 1
    if len(history[0]) > 1:
        choice = history[1,-1]
    courage = 0
    for x in range(len(history[0])):
        courage += 2 * history[1][x] - 1
    courage /= len(history[0])
    if courage < 0.01 and len(history[0]) > 3:
        choice = 0
    return choice, None
It's Tit for Tat, but it checks the ratio between the opponent's defects and cooperations; if the opponent defects too much it turns grim, but if the opponent starts cooperating enough to offset the ratio, it returns to its normal behavior. This did unnaturally well for its simplicity, and took 8th place in the "prisoners dilemma enjoyers" fork of this repo. |
can't you just do |
I'll try that. Thanks. (EDIT: It worked.) |
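(The exact snippet suggested above isn't quoted in the thread. Assuming history is a numpy array as in the tournament runner, a likely one-line replacement for the courage loop would be something like:)
courage = (2 * history[1] - 1).mean()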
This is my strategy. It is not the best, but I tried.
import random
#THIS IS OMEGA JOSS!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#Joss
#+grace period where it is just tit-for-tat
#+Aggression detection
#+varying the chance of random defection based on aggression
#+goes grim after "grimThreshold" defections
#+avoid double defections
#+programmable next choice (currently not used)
def strategy(history, memory):
    #vars
    TotalAgression = 0 #How many times it defected
    TotalNiceness = 0 #How many times it cooped
    choice = 1 #The choice of Omega Joss. duh.
    nextChoice = None #This overwrites the next move of Omega Joss
    #parameters (test for best value)
    chance = 0.8 #initial chance of defecting. TEST DONE
    gracePeriod = 10 #the period before any random defection. TEST DONE
    grimThreshold = 40
    if history.shape[1] < 1: #If it's the first round
        return choice, [TotalAgression, nextChoice] #Then cooperate
    TotalAgression = memory[0] #Reading memory
    nextChoice = memory[1] #Same
    if nextChoice is not None: #Overwriting any decision if nextChoice is not None
        choice = nextChoice # ^
        nextChoice = None # ^
    elif TotalAgression >= grimThreshold:
        choice = 0
    elif TotalAgression == history.shape[1]: #If it always defected then we defect
        choice = 0
    elif random.random() <= (chance - TotalAgression/(history.shape[1]) + TotalNiceness/(history.shape[1])) and history[0, -1] != 0 and history.shape[1] >= gracePeriod:
        #Randomization, and making sure it is not the grace period and that we avoid defecting twice in a row
        choice = 0
    elif history[1,-1] == 0:
        choice = 0
        TotalAgression += 1
    TotalNiceness = (history.shape[1]-gracePeriod)-TotalAgression
    if history.shape[1] == gracePeriod:
        TotalAgression = 0
        TotalNiceness = 0
    return choice, [TotalAgression, nextChoice] #Return the choice and the memory. duh |
The stark contrast between yours and mine.... |
This is another thing I made. It uses ML to predict the opponent's choice and brute-forces the best option. But it's trash, despite being super long. Also, the code is a piece of spaghetti garbage.
thinklength = 5
howahead = 3
LearnRate= 0.05
LearnIterations = 50
LearnThreshold = 1e-3
table = [[1,5],[0,6]]
import random as r
import time as t
alert = 20
alertatall = False
def activation(x):
if(x > 0):
return x
else:
return x / 1024
def clone(thing):
if(type(thing) == list):
a = []
for x in range(len(thing)):
a.append(clone(thing[x]))
return a
else:
return thing
def derivative(x):
if x > 0:
return 1
else:
return 1 / 1024
class network:
def __init__(self,nodes):
self.nodes = []
self.raw = []
self.weights = []
self.biases = []
a = []
self.CostValue = 1000
for x in range(nodes[0]):
a.append(0)
self.nodes.append(a)
self.raw.append(a)
for x in range(len(nodes) - 1):
self.nodes.append([])
self.raw.append([])
self.biases.append([])
self.weights.append([])
for y in range(nodes[x + 1]):
self.raw[x + 1].append(0)
self.nodes[x + 1].append(0)
self.biases[x].append(r.random())
self.weights[x].append([])
for z in range(nodes[x]):
self.weights[x][y].append(r.random())
def predict(self,input_list):
self.nodes[0] = input_list
self.raw[0] = input_list
for x in range(len(self.biases)):
a = []
c = []
for y in range(len(self.biases[x])):
b = self.biases[x][y]
for z in range(len(self.weights[x][y])):
b += self.weights[x][y][z] * self.nodes[x][z]
a.append(activation(b))
c.append(b)
self.nodes[x + 1] = a
self.raw[x + 1] = c
def output(self):
return self.nodes[len(self.nodes) - 1]
def cost(self,input_list,output_list):
self.predict(input_list)
a = self.output()
b = 0
for x in range(len(a)):
try:
b += ((a[x] - output_list[x]) ** 2)
except OverflowError:
b += 16e+256
self.CostValue = b
return b
def backprop(self, input_list, output_list):
self.predict(input_list)
w = clone(self.weights)
b = clone(self.biases)
expectedoutput = output_list
for p in range(len(self.nodes) - 1):
x = len(self.nodes) - p - 1
differences = []
for y in range(len(self.nodes[x])):
differences.append(self.nodes[x][y] - expectedoutput[y])
for y in range(len(self.nodes[x])):
b[x - 1][y] = 2 * differences[y] * derivative(self.raw[x][y])
for z in range(len(self.nodes[x - 1])):
w[x - 1][y][z] = self.nodes[x - 1][z] * 2 * differences[y] * derivative(self.raw[x][y])
expectedoutput = []
for y in range(len(self.nodes[x - 1])):
a = 0
for z in range(len(self.nodes[x])):
a += self.weights[x - 1][z][y] * 2 * differences[z] * derivative(self.raw[x][z])
expectedoutput.append(((a / len(self.nodes[x])) / (-2)) + self.nodes[x - 1][y])
return [w,b]
def train(self,inputs,outputs,LearnRate,iterations):
for q in range(iterations):
total = 0
c = self.backprop(inputs,outputs)
avgCost = self.cost(inputs,outputs)
for x in range(len(self.weights)):
for y in range(len(self.weights[x])):
total += c[1][x][y]
for z in range(len(self.weights[x][y])):
total += c[0][x][y][z]
if(total < 0):
total = -total
if(total == 0):
total = 1e-256
for x in range(len(self.weights)):
for y in range(len(self.weights[x])):
self.biases[x][y] -= c[1][x][y] * LearnRate * (avgCost ** 0.5) / total
for z in range(len(self.weights[x][y])):
self.weights[x][y][z] -= c[0][x][y][z] * LearnRate * (avgCost ** 0.5) / total
self.CostValue = avgCost
if self.CostValue < 1e-10:
break
def biggest(lis):
big = [0]
for x in range(len(lis) - 1):
y = x + 1
if(lis[y] > lis[big[0]]):
big = [y]
if(lis[y] == lis[big[0]] and big[0] != y):
big.append(y)
return big[r.randint(0,len(big) - 1)]
def shift(thing, newnum):
a = thing
for x in range(len(a) - 1):
a[x] = a[x + 1]
a[len(a) - 1] = newnum
return a
def testseq(feed, sequence, nn):
james = nn
score = 0
future = feed
for x in range(len(sequence)):
james.predict(future)
adversary = 1 * (james.output()[0] > 0.5)
future = shift(shift(future, sequence[x]), adversary)
score += table[sequence[x]][adversary]
return score
def tobinary(num, digits):
number = num
bits = []
for q in range(digits):
x = digits - q - 1
if 2 ** x <= number:
bits.append(1)
number -= 2 ** x
else:
bits.append(0)
return bits
inittime = t.time()
def strategy(history, memory):
choice = 1
tim = t.time()
if type(memory) == list:
player = memory[0]
previousfeed = memory[1]
wronged = memory[2]
else:
player = network([2 * thinklength, 10, 6, 1])
wronged = False
feed = []
for x in range(2 * (thinklength - min(10, len(history[0])))):
feed.append(0)
for x in range(min(thinklength, len(history[0]))):
feed.append(2 * (history[0, x + max(0, len(history[0]) - thinklength)] - 0.5))
feed.append(2 * (history[1, x + max(0, len(history[0]) - thinklength)] - 0.5))
if 0 in history[1]:
wronged = True
if type(memory) == list and wronged:
player.predict(previousfeed)
#if 1 * (player.output()[0] > 0.5) != history[1][len(history[0]) - 1]:
#print(len(history[0]), 1 * (player.output()[0] > 0.5), history[1][len(history[0]) - 1], "incorrect!")
player.train(previousfeed, [history[1][len(history[0]) - 1]], LearnRate, LearnIterations)
#player.train(previousfeed, [history[1][len(history[0]) - 1]], LearnRate, LearnThreshold)
options = []
scores = []
for x in range(2 ** howahead):
trythis = tobinary(x, howahead)
options.append(trythis)
scores.append(testseq(feed, trythis, player))
choice = options[biggest(scores)][0]
if len(history[0]) % alert == 1 and alertatall:
print(player.CostValue)
if not wronged:
choice = 1
#if player.CostValue > 1:
# print(t.time() - inittime, "Crap")
#print(t.time() - tim)
return choice, [player, feed, wronged] |
Wow that is impressive. |
Thank you. |
You only save TotalAgression to the memory. TotalNiceness will always be zero (it's set near the end of the function, but then never used). Not sure if that was what you intended to do.
is exactly the same as
Anyway, here's my code:
class Memory:
def __init__(self):
self.cooperateCount = 0
self.defectCount = 0
self.defectCountInARow = 0
self.cooperateCountInARow = 0
self.turnNumber = 0
self.timesCheated = 0
def analyseMove(self, history):
self.turnNumber = history.shape[1]
myLastMove = history[0, -1]
enemyLastMove = history[1, -1]
if enemyLastMove == 0:
self.defectCount += 1
self.defectCountInARow += 1
self.cooperateCountInARow = 0
if myLastMove == 1:
self.timesCheated += 1
else:
self.cooperateCount += 1
self.cooperateCountInARow += 1
self.defectCountInARow = 0
def decideMove(self, history):
# this prevents wasting resources on always defecting enemies
if self.defectCount / (float)(self.turnNumber) >= 0.44:
return 0, self
        # enemy defected more than two times in a row; he is not nice, so I'm going to be rude back!
if self.defectCountInARow > 2:
if self.defectCountInARow == 3 and self.cooperateCount != 0: # maybe we can make up after all! Let's stop fighting!(but only if you were nice to me before)
return 1, self
if self.defectCountInARow == 4 and self.cooperateCount != 0:
return 1, self
return 0, self
return self.defaultUpgradedTitForTat(history), self
def defaultUpgradedTitForTat(self, history):
myLastMove = history[0, -1]
enemyLastMove = history[1, -1]
if enemyLastMove == 0: # enemy was rude
if myLastMove == 1: # i've been nice and enemy has been rude! I will be rude now as well!
if self.turnNumber >= 2 and history[1, -2] and self.timesCheated < 3: # he was nice before? let's be nice! (but only 2 chances!)
return 1
return 0
else: # i've been rude and enemy has been rude! I guess we should try to make up
return 1
else:
return 1
def strategy(history, memory):
if memory is None: # cooperate on the first turn
return 1, Memory()
else:
memory.analyseMove(history)
        return memory.decideMove(history)
It's basically a quite forgiving titForTat that gives the enemy multiple chances to make up, but can also work as grim if the enemy defects too many times. |
It seems like titForTat and grimTrigger variants are very common (which makes sense). Anyway, since you are posting the source code of your strategies, I figured I might as well post mine (for testing purposes or whatever). |
A lot of people are posting their strategies! I really hope there will be another one of these soon. |
That's a great idea! Here's my strategy in just 9 LOC, called "weightedMonotonic" (also here):
# Weights for opponents' past moves, starting with the most recent one
weights = [1, -0.8, 0.8, -0.8, 0.8, -0.8, 0.8, -0.8, 0.8]
def strategy(history, memory):
round_index = history.shape[1]
opponent_history = history[1]
acc = 0
for (i, w) in enumerate(weights):
if i < round_index:
# Weigh opponents' i-to-last move (-1 if they defected, +1 if they cooperated)
acc += w * (opponent_history[-i - 1] * 2 - 1)
# Cooperate if weighted sum is at least 0. It's important to cooperate in the first round (acc=0) as to not anger
# the Grim Trigger
    return 1 if acc >= 0 else 0, None
I admit that it doesn't really conform to the "pure" prisoner's dilemma tournament challenge, as it doesn't even try to figure out the opponent's strategy and relies neither on smart conditionals nor on memory. But it works exceptionally well in my test runs against a pool of various strategies by l4vr0v, valadaptive and redtachyon2098 (huge respect and thanks to you!). It's consistently in first place :) I'm currently running it against the Prisoners-Dilemma-Enjoyers pool and I'll update you when it's done. It's basically just a weighted sum over the opponent's past 9 moves and then a threshold for deciding whether to cooperate or defect. The idea came from FIR filters, but I didn't apply any signal processing or FIR theory except for the way the calculation is done. I planned to optimize the weights using a genetic algorithm but sadly didn't have the time; the results I got using just my intuition and a bit of experimentation were already way better than I expected. I noticed that it gets a lot worse if there are many random strategies in the pool, so I hope not too many people were funny and entered one... It does way better against titForTat and grimTrigger variants, which I expect there'll be a lot of. |
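The genetic-algorithm idea mentioned above was never actually written, so the following is only a sketch of what such a weight optimizer could look like. It assumes a hypothetical fitness function average_score(weights) that runs weightedMonotonic with the given weights against a fixed pool and returns its mean score, and all population/mutation parameters are made up:
import random

def evolve_weights(average_score, n_weights=9, pop_size=30, generations=50, mutation_scale=0.2):
    # Start from random weight vectors in [-1, 1].
    population = [[random.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank the population by tournament score and keep the top quarter as parents.
        ranked = sorted(population, key=average_score, reverse=True)
        parents = ranked[:max(1, pop_size // 4)]
        population = list(parents)
        # Refill the population with mutated copies of the parents.
        while len(population) < pop_size:
            parent = random.choice(parents)
            population.append([w + random.gauss(0, mutation_scale) for w in parent])
    return max(population, key=average_score)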
Wow! I never thought of that! The optimal weights might change depending on what types of strategies there are, so it won't be entirely consistent, but if you already know which strategies you are playing against, it would perform well. |
Man, this is like ML... but not ML... |
Hey! I've got an idea! You should make an evolution simulator for refining the weights of your algorithm to make the ultimate strategy! |
You're probably right. It ends up in 20th place in the Prisoners-Dilemma-Enjoyers pool, even behind some that were in my original test pool. Oh man, this challenge isn't easy (d'oh!) |
Uh-oh. |
Yay, I can finally share my strategy! This is diplomat.py, and it places 4th among the strategies currently in the "prisoner's dilemma enjoyers" repo. (But I'm sure there are quite a few more that could give it a run for its money!)
Diplomat builds off of an extremely strong strategy called "Omega Tit-for-Tat". OTFT makes a number of improvements to regular tit-for-tat, and with diplomat I've tried to make even more. Here's a list of what diplomat adds to tit-for-tat (henceforth TFT):
If all of the above has been considered, and Diplomat still does not have a move to make, it will act as pure tit-for-tat. I experimented with a couple of other techniques: Josh recommended starting the randomness counter much higher if the opponent starts by Defecting, since most logical opponents will start by Cooperating. I was unable to find success with this, perhaps because it gives strategies that start out mean but are able to play nice later on less of a chance to redeem themselves. Also, I considered disabling the "forgiveness" feature from step 5 during the first X turns so that no detective strategies got the wrong idea. This likely ran into the same problem as before. Additionally, it did not account for "late detectives", which are smart and likely to take up some portion of the final meta. I had so much fun with this! I don't expect this strategy to win, but hopefully it can crack the top 10. We should do something like this again soon. Perhaps the Noisy Iterated Prisoner's Dilemma next?? Edit: github formatting, smh |
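For context, here is a rough sketch of the Omega Tit-for-Tat idea that Diplomat builds on. This is a paraphrase of the commonly described OTFT design, not Diplomat's code; the threshold values and the exact counter updates are illustrative assumptions. It uses the tournament's interface (history is a 2 x n array, row 0 = us, row 1 = opponent, 1 = cooperate, 0 = defect):
DEADLOCK_THRESHOLD = 3    # assumed value: how long a C/D-D/C echo may last before we break it
RANDOMNESS_THRESHOLD = 8  # assumed value: how erratic an opponent may look before we give up on them

def strategy(history, memory):
    if memory is None:
        memory = {"deadlock": 0, "randomness": 0, "grim": False}
    n = history.shape[1]
    if n == 0:
        return 1, memory          # start by cooperating
    if memory["grim"]:
        return 0, memory          # opponent judged random/hostile: defect forever
    opp_last = history[1, -1]
    if n >= 2:
        if opp_last != history[1, -2]:
            # Opponent flipped their move: could be an echo deadlock or randomness.
            memory["deadlock"] += 1
            memory["randomness"] += 1
        else:
            memory["deadlock"] = 0
            if opp_last == 0:
                memory["randomness"] += 1   # sustained defection also counts against them
            else:
                memory["randomness"] = max(0, memory["randomness"] - 1)
    if memory["randomness"] >= RANDOMNESS_THRESHOLD:
        memory["grim"] = True
        return 0, memory
    if memory["deadlock"] >= DEADLOCK_THRESHOLD:
        memory["deadlock"] = 0
        return 1, memory          # cooperate to break out of the echo loop
    return opp_last, memory       # otherwise plain tit-for-tat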
Wow, you must be very passionate about your work! |
@Xapakas FYI, yours got to 2nd place with my strategy in the pool (if that's the latest version in the repo) |
A friend of mine coded this bot (we call him unanim) with me. We sent in the wrong version; the acceptable deviation should have been 0.45, because now LLN won't trigger randoms well.
import numpy as np
''' We start with the state "detective_phase" that represents a TitForTat play to convince the enemy into cooperating. If we are not in such a phase we play as forgiving as humanly possible, i.e. after a cycle of switching moves [DC][CD]. Over the course of the whole game we investigate the "negativeness" of all moves after the first detective phase. This is
#memory = [string: "modus", int: distanceforgiving, int: numberforgiving, int: detectivephase, list: titfortatarray]
def strategy(history, memory):
|
Here is my submission, called PacificAntiRandomTFT. It ranks pretty high against the strategies in the @l4vr0v fork (which, btw, is great!) and other similar pools of agents I tried (varying, for example, the distribution of random and TFT agents), even though it's really simple and heavily improvable. Code:
import numpy as np
# PacificAntiRandomTFT
#
# Just like TFT except:
# - Detects random and goes DDDDDDDDD... against it.
# - Against D/D loops it tries to cooperate twice in a row
# to try to get into a C/C loop (which is better for both).
# If the opponent declines, it doesn't ever try again.
#
# In this implementation, memory stores the state of the
# strategy: "defect" (always defect), "declined" (opponent
# declined to exit the D/D loop), "coop next" (have to try
# to cooperate next to exit the D/D loop), "did they declined?"
# (we were in "coop next" in the previous round, and now we have
# to check if the opponent accepted our D/D-loop-exit or not) or
# None (none of the others).
# Parameters:
RANDOM_CHECK_ROUND = 16 # In which round (starting from 0) do I have to check for randomness?
MAX_UNPROVOKED_DEFECTIONS = 4 # Too many unprovoked defections means the opponent is random
def strategy(history, memory):
choice = 1 # Default: cooperate
new_state = memory # Default: don't change state
if history.shape[1] == RANDOM_CHECK_ROUND:
# Checking for random
unprovoked_defections = 0
previous_move = None
for move in history.T:
if previous_move is None:
unprovoked_defections += 1 if move[1] == 0 else 0
elif move[1] == 0 and previous_move[0] == 1:
unprovoked_defections += 1
previous_move = move
if unprovoked_defections > MAX_UNPROVOKED_DEFECTIONS:
# Playing against random, let's always defect
new_state = "defect"
return 0, new_state
if memory == "defect":
choice = 0
elif memory == "coop next":
# Choice is already 1 by default
new_state = "did they declined?"
elif memory == "did they declined?":
if history[1, -1] == 0:
# They declined
new_state = "declined"
choice = 0
else:
# They accepted!
new_state = None
elif memory == "declined":
# Normal TFT, with a check for changing state (see else statement)
if history.shape[1] >= 1 and history[1, -1] == 0: # Choose to defect if and only if the opponent just defected
choice = 0
else:
# We somehow went out from the D/D loop. Let's forget the opponent declined an offer...
new_state = None
else:
if history.shape[1] >= 2 and np.count_nonzero(history[:, -1]) == 0 and np.count_nonzero(history[:, -2]) == 0:
# Last two turns were D/D. Let's try to end with this loop
return 1, "coop next"
# Not in a loop: normal TFT
if history.shape[1] >= 1 and history[1, -1] == 0: # Choose to defect if and only if the opponent just defected
choice = 0
    return choice, new_state
Also (in case you are wondering), I tried to make an UltraPacific version (meaning that it also tries to convert the C/D D/C loops into C/Cs) and it didn't seem to work as well as the final version I've shown... |
# Cooperates as long as the average payoff per round so far stays at or above 2.25,
# otherwise defects. memory just counts the number of rounds played.
pointsArray = [[1,5],[0,3]]  # our payoff, indexed as [our move][their move]
def strategy(history, memory):
    if memory is None:
        return 1, 1  # first round: cooperate and start the round counter
    # average of our payoff over the rounds played so far
    score = sum([pointsArray[history[0][i]][history[1][i]] for i in range(memory)]) / memory
    return score >= 2.25, memory + 1 |
My solution was created in 10 minutes. Here it is:
import random
def strategy(history, memory):
"""
Author: zelosos
Let's write down some assumptions:
- In a group, most of the players will not be exploited.
- Players will test if they can exploit others.
- Both cooperating for the whole time is best for both.
With this said:
- The target should be: both cooperate for the whole time.
Improvements:
- Maybe this could be improved by the weights of the score matrix, but I have to go to bed now.
"""
# start with cooperate
if len(history[1]) == 0:
return 1, None
played_rounds = len(history[1])
got_wasted = len(history[1])-sum(history[1]) # how often we got cheated
    if random.uniform(0, 1) < got_wasted/played_rounds: # how often we got cheated determines how likely it is to cheat back
return 0, None
return 1, None |
I first implemented the Derived Belief Strategy which was originally designed by Tsz-Chiu Au and referenced in this paper, however I found that it only barely scored above Tit for Tat in the typical Iterated Prisoner's Dilemma. This is by design, however, because it's basically Tit for Tat with a layer that evaluates with a level of confidence whether an action is an "accident" (in the context of the Noisy Iterated Prisoner's Dilemma) or whether it conforms to their previous strategy. I wasn't satisfied with my ability to place well in the tournament with a strategy that only barely placed above an example strategy. So, my next strategy that I implemented was a Moran-process-trained looker-up model, which scored sufficiently above Tit for Tat that I was satisfied submitting it over the Derived Belief Strategy. Not only did this perform better, but DBS' computations were pretty heavy and the looker-up model is much faster. |
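Since the submitted looker-up table itself isn't posted here, the following is just a toy illustration of the general looker-up idea: the entire policy is a lookup table keyed on the last few rounds of joint history, and training (for example via a Moran process or an evolutionary search) amounts to filling in the table entries. The table below is hand-made and purely hypothetical:
N = 2  # how many past rounds the table conditions on

# Hypothetical, hand-filled table mapping ((our last N moves), (their last N moves)) -> action.
# A trained looker-up would instead have a tuned entry for every possible key.
LOOKUP = {
    ((1, 1), (1, 1)): 1,  # mutual cooperation: keep cooperating
    ((1, 1), (1, 0)): 0,  # they just defected on us: retaliate
    ((0, 0), (0, 0)): 1,  # stuck in mutual defection: offer an olive branch
}

def strategy(history, memory):
    n = history.shape[1]
    if n < N:
        return 1, None  # not enough history yet: cooperate
    ours = tuple(int(x) for x in history[0, -N:])
    theirs = tuple(int(x) for x in history[1, -N:])
    return LOOKUP.get((ours, theirs), 1), None  # unknown situations default to cooperating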
I initially started by messing around making a lookup table manually, which performed relatively well in the tests I did. I tried looker-up which performed similarly well, but OmegaTFT combined with some added heuristics and cases seemed to be the most successful so that's where I ultimately focused my efforts near the end, and is the kind of strategy I submitted. I still did use a few manually made lookups, but just for a couple special cases. |
First I made some assumptions, which I explained in more detail here, and created my strategy accordingly.
# The War Criminal
# Tests how much an opponent can be exploited by looking at the profit we get compared to always doing (C,C).
# This works great against overly forgiving strategies such as ftft where this strategy alternates between C and D.
# This strategy also has some damage control mechanisms to prevent getting into a negative profit loop with the
# opponent and it uses statistical analysis to detect if the opponent makes random choices, in which case it is better
# to just defect. peko
def strategy(history, memory):
history_range = 5
turn = history.shape[1]
if memory is None:
memory = {
"try_exploit": True,
"exploit_turns": [],
"exploit_max_turns": None,
"exploit_max_profit": 0,
"damage_control_turns": [40, 100]
}
# check if opponent is random
if turn >= 20 and likely_random(history):
memory["exploit_max_turns"] = None
memory["exploit_max_profit"] = 0
memory["exploit_turns"] = []
return 0, memory
# try to exploit opponent
if memory["try_exploit"]:
return try_exploit(history, memory)
# damage control
if turn >= history_range:
opp_history = history[1, -history_range:]
        # prevent two Tit-for-Tat-like strategies from alternating between (C,D) and (D,C)
if are_histories_alternating(history, 3):
return 1, memory
# if opponent keeps defecting we try getting them to cooperate again
if sum(opp_history[2:]) == 0:
if sum(opp_history) == 0:
if turn in memory["damage_control_turns"]:
return 1, memory
return 0, memory
else:
return 1, memory
return tit_for_tat(history, memory)
# Helper Functions
def are_histories_alternating(history, length):
our_history = history[0, :-1][-length:]
opp_history = history[1, 1:][-length:]
prev_choice = our_history[0]
for i in range(1, len(our_history)):
our_choice = our_history[i]
opp_choice = opp_history[i]
if our_choice != opp_choice or opp_choice == prev_choice:
return False
prev_choice = our_history[i]
return True
def calculate_profit(history):
our_history = [x for (x, y) in history]
opp_history = [y for (x, y) in history]
profit = 0
for i in range(len(our_history)):
if our_history[i] == 1:
profit += 0 if opp_history[i] == 1 else -3
else:
profit += 2 if opp_history[i] == 1 else -2
return profit
def likely_random(history):
# Use statistical analysis to determine if opponent acts randomly
random_limit = 0.28
our_history = history[0, :-1]
opp_history = history[1, 1:]
our_coops = sum(our_history)
our_defects = len(our_history) - our_coops
c, d = (0, 0)
for i in range(len(our_history)):
if our_history[i] == 1:
c += 1 if opp_history[i] == 1 else -1
if our_history[i] == 0:
d += 1 if opp_history[i] == 1 else -1
consistency = (abs(c) + abs(d)) / len(our_history)
if consistency < random_limit:
return True
return False
# Sub Strats
def tit_for_tat(history, memory):
choice = 1 if history.shape[1] < 1 else history[1, -1]
return choice, memory
def try_exploit(history, memory):
exploit_buffer = 1
turn = history.shape[1]
# calculate profit
exploit_history = [history[:, i] for i in memory["exploit_turns"]]
exploit_profit = calculate_profit(exploit_history)
# test how much opponent can be exploited
if memory["exploit_max_turns"] is None:
if exploit_profit >= memory["exploit_max_profit"]:
memory["exploit_max_profit"] = exploit_profit
memory["exploit_turns"].append(turn)
return 0, memory
else:
memory["exploit_max_turns"] = len(exploit_history) - 2
return 1, memory
# check if exploit is still profitable
if memory["exploit_max_turns"] <= 0:
memory["try_exploit"] = False
memory["exploit_max_turns"] = None
memory["exploit_max_profit"] = 0
memory["exploit_turns"] = []
return 1, memory
elif exploit_profit < 0:
memory["exploit_max_turns"] = None
memory["exploit_max_profit"] = 0
memory["exploit_turns"] = []
return 1, memory
# repeat exploit
elif memory["exploit_max_turns"] > len(exploit_history):
memory["exploit_turns"].append(turn)
return 0, memory
# prepare for next exploit when situation has stabilized
elif sum(history[0, -exploit_buffer:]) == sum(history[1, -exploit_buffer:]) == exploit_buffer:
if exploit_profit >= 0:
memory["exploit_turns"] = [turn]
return 0, memory
else:
memory["try_exploit"] = False
return tit_for_tat(history, memory)
return tit_for_tat(history, memory) |
Elaborate. |
code go brrr |
Yours crashed? Unfortunate. |
I'm looking forward to sharing mine. The best strategy doesn't exist until you specify the meta it will be run in, but I'm still pretty proud of how many metas it can beat in a tournament regardless of the real meta.