
Irv replications #1

Open · wants to merge 4 commits into main
Conversation

jgibson517 (Member) commented:
This PR has updated code to run the IRV simulations for the Massachusetts work with VoteKit.

The Python script that handles the actual election runs is simulate.py. I made some tweaks to the .sh files from the Portland work to run things on the cluster, since there are fewer parameters that vary. I think I updated everything correctly, but let me know if not!

@cdonnay left a comment:

Just some minor changes here or there. @peterrrock2 and I think that this is going to be a very common data pipeline, so we would like to refactor the simulate.py file so that the CLI allows for a more diverse range of inputs, but that isn't necessary for this PR to be approved.
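As a rough illustration of the kind of CLI flexibility we have in mind (flag names are taken from the variables already used in the .sh scripts; everything else here is a placeholder sketch, not the planned implementation):

import argparse

# Hypothetical sketch of a more general CLI for simulate.py.
parser = argparse.ArgumentParser(description="Run IRV simulations")
parser.add_argument("--num_districts", type=int, required=True)
parser.add_argument("--num_seats", type=int, required=True)
parser.add_argument("--cand_split", type=str, default="even",
                    help="how candidates are split across slates")
args = parser.parse_args()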

--output="${log_file}" \
--error="${log_file}" \
$running_script_name \
"$num_seats" \

These inputs to $running_script_name do not match the order in which you unpack them in mass_irv.sh.
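For example, a sketch of keeping the two scripts consistent (the variable names and their order are illustrative, not the actual order in either file):

# If the submit script passes arguments positionally as
#   $running_script_name "$num_districts" "$num_seats" "$cand_split"
# then mass_irv.sh should unpack them in the same order:
num_districts=$1
num_seats=$2
cand_split=$3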

#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=8G

Have you run one of these to confirm that the run time and memory usage are accurate?


echo --num_districts "$num_districts"
echo --num_seats "$num_seats"
echo --cand_split "$even_split"

should be $cand_split
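That is, assuming the variable defined earlier in the script is named cand_split, the line would read:

echo --cand_split "$cand_split"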

zone_data[modelname] = []
zone_data[modelname].append(
count_winners(winners["ranking"], "D", args.num_seats)
)

There is a winners method for ElectionState. This will make lines 122 through 132 a lot clearer. This may require editing the count_winners function.
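Something along these lines (a sketch only; the election_state variable name and the exact return type of winners() are assumptions, and count_winners may need a matching change as noted):

# Sketch using the ElectionState winners method instead of slicing the
# ranking dictionary.
zone_data[modelname] = []
zone_data[modelname].append(
    count_winners(election_state.winners(), "D", args.num_seats)
)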

)

# save ranking vector from elections
zone_data["raw_outputs"].append({modelname: winners["ranking"]})

I would just use the rankings method of ElectionState here.
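Roughly (the election_state variable name is an assumption):

# save ranking vector from elections via the ElectionState rankings method
zone_data["raw_outputs"].append({modelname: election_state.rankings()})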

with open('data/good_seed_3.json', 'r') as f:
assignment = json.loads(f.read())

clean_assign = {}

Can you leave a comment as to why this clean_assign is necessary?

@peterrrock2 left a comment:

Thank you for getting this done! It looks like most of this will work with the exception of a couple of syntax errors. I also included a couple of comments on things that can improve the readability of this code. @cdonnay also reviewed most of the python sections of this code with me, but I elected to let him comment on the VoteKit stuff since he knows that codebase better. Let me know if you have any questions!


num_districts_array = ("40" "160" "8" "32")

num_seats_array = ("1" "1", "5", "5")


I don't think that the syntax on this is correct. Bash delimits array elements with spaces, so this should be

("1" "1" "5" "5")

You should also be able to just treat these as ints, making it

(1 1 5 5)

since everywhere else you use these values you are either passing them to another script or interpolating them into a formatted string.
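For reference, a sketch of how the two arrays would then pair up by index (the loop itself is only an illustration, not code from this PR):

num_districts_array=(40 160 8 32)
num_seats_array=(1 1 5 5)
for i in "${!num_districts_array[@]}"; do
    echo "districts=${num_districts_array[$i]} seats=${num_seats_array[$i]}"
done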

output_dir="mass_output"
log_dir="mass_logs"



Can you add a comment here about how these arrays will be related to each other so that when we share the code, people can identify what is going on? (Maybe just a line that says 85 = 40 and 325 = 40 and how these will be used).

@@ -0,0 +1,34 @@
#!/bin/bash

#SBATCH --time=05:00:00


Will this take 5 hours? It might be good to run an example to check the time first.

#SBATCH --ntasks-per-node=1
#SBATCH --mem=8G
#SBATCH --mail-type=NONE
#SBATCH --mail-user=NONE


It's useful to add your email here so that you get pinged when the jobs finish or fail. You don't have to, but it does save a lot of headache if something goes wrong. I also add a filter on my email for stuff from SLURM so that it does not consume my inbox.
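For example (the address is a placeholder; --mail-type and --mail-user are standard SLURM directives):

#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your_name@example.edu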

#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=8G


This looks like a lot of memory to request for this job

for node in ma_graph.nodes:
geo_id = ma_graph.nodes[node]['GEOID20']
for election in elections:
ma_graph.nodes[node][election] = ma[ma['GEOID20'] == geo_id][election].iloc[0]


So it looks like you are taking the election information stored in the shapefile and adding it to the graph here, but this is going to run every time we call this file. That is fine if we only call it once (which I think we do), but it is not optimal. It would be better for reproducibility/auditing to build the dual graph JSON file with the election data in a separate script and then just load that file here.
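A rough sketch of that one-time preprocessing step, assuming the graph is a gerrychain Graph and the election columns live in the GeoDataFrame ma (the paths and election column names are placeholders):

import geopandas as gpd
from gerrychain import Graph

# One-time script: attach election results to the dual graph, then save it
# so simulate.py only has to load the finished JSON.
ma = gpd.read_file("data/ma_precincts.shp")              # placeholder path
ma_graph = Graph.from_json("data/ma_dual_graph.json")    # placeholder path
elections = ["SEN18", "PRES20"]                          # placeholder columns

for node in ma_graph.nodes:
    geo_id = ma_graph.nodes[node]["GEOID20"]
    row = ma[ma["GEOID20"] == geo_id].iloc[0]
    for election in elections:
        ma_graph.nodes[node][election] = row[election]

ma_graph.to_json("data/ma_dual_graph_with_elections.json")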
