
Administrator review of contributed councillor data #1203

Closed
1 of 3 tasks
henare opened this issue Jul 28, 2017 · 5 comments

henare commented Jul 28, 2017

So far, as part of Contributing Councillor Information, we've built a way for people to suggest new councillor data. That data is then just stored in a new table in the PlanningAlerts database. What happens to it next is what this issue is about.

The simplest way to start is to replicate the existing process as much as possible while automating some parts of it. So far we've effectively built a way to enter the data that isn't Google Docs but creates basically the same output, i.e. tabular data. So the next step is converting that into Popolo and opening a pull request to have it merged into the existing repository.

Here are some ideas for how we could do that:

  • First of all we could expose all of the contributions in PlanningAlerts via a CSV API. This pretty closely replicates what Google Docs provides us, albeit only for 1 council instead of a whole Australian state. (Done in Added download link for suggested councillors csv file in admin/councillor_contribution#show #1237)

  • We then need to convert that CSV into Popolo. The existing repository already has a Rake task to do this. It will need modification because it's currently designed to convert a whole Australian state's CSV into Popolo. Maybe here we need to add the contributed data onto the cached CSVs and then generate the Popolo from that?

  • Once we've got the changed data we need to commit that onto a new branch, push to GitHub, and open a pull request. Here we'll need to use some kind of Git library or shell out, and use the GitHub API to open the pull request. That should give administrators a nice diff they can review and easily merge.
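
A minimal sketch of how that last step could work, assuming a local checkout of the data repo and the octokit gem; the repo name, branch naming, and commit message here are illustrative assumptions, not settled decisions:

```ruby
require "octokit"

# Assumed target repository; adjust if the code ends up living elsewhere
repo = "openaustralia/australian_local_councillors_popolo"
branch = "councillor-contribution-#{Time.now.to_i}"

# Commit the regenerated data on a new branch and push it (shelling out to git)
Dir.chdir("australian_local_councillors_popolo") do
  system("git", "checkout", "-b", branch) or raise "couldn't create branch"
  # ... regenerate the CSV and Popolo JSON here ...
  system("git", "add", "-A") or raise "git add failed"
  system("git", "commit", "-m", "Add contributed councillor data") or raise "commit failed"
  system("git", "push", "origin", branch) or raise "push failed"
end

# Open the pull request via the GitHub API so admins get a reviewable diff
client = Octokit::Client.new(access_token: ENV.fetch("GITHUB_TOKEN"))
client.create_pull_request(
  repo,
  "master",
  branch,
  "Contributed councillor data for review",
  "Opened automatically from a reviewed PlanningAlerts contribution."
)
```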

Where should all this code live? @equivalentideas suggested a morph.io scraper, which made me think we could even put this code as a morph scraper into the data repository itself. That's worth considering but if somewhere else makes more sense then let's do that.

Of course that still doesn't get this data back into PlanningAlerts, but those next steps should be extracted into other issues. These are the simple, largely status-quo first steps.

equivalentideas commented

@henare @hisayohorie it would be good to extract some more specific issues from the idea above. We've implemented the CSV API part with #1237; what's next?

equivalentideas commented Sep 18, 2017

Maybe here we need to add the contributed data onto the cached CSVs and then generate the Popolo from that?

Riffing off what @henare has written above, here's an idea for the next (small) step: how we get the CSV data into Popolo format:

  1. An admin reviews the councillor contribution in the PlanningAlerts admin interface.
  2. If they're happy with it, they download it locally.
  3. Locally, they add the contribution to the bottom of the CSV for the state the authority is in. CSVs for each state are stored in the australian_local_councillors_popolo repo.
  4. They run the csv_to_popolo Rake task to generate the Popolo. If there are problems or duplicate councillors, an error gets thrown (check this).
  5. If they're happy with the result, they commit the new CSV and Popolo JSON file.
  6. They create a PR with the changes for review.
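
A rough sketch of steps 3 and 4, assuming open-uri and Ruby's CSV library; the contribution URL and the state file name are hypothetical stand-ins, not the real routes or repo layout:

```ruby
require "csv"
require "open-uri"

# Hypothetical URL for a single reviewed contribution's CSV download
contribution_url = "https://www.planningalerts.org.au/councillor_contributions/42.csv"
# Hypothetical name of the state CSV inside the data repo
state_csv = "nsw_local_councillor_popolo.csv"

# Step 3: append the contribution's rows to the bottom of the state CSV
contribution = CSV.parse(URI.open(contribution_url).read, headers: true)
CSV.open(state_csv, "a") do |csv|
  contribution.each { |row| csv << row }
end

# Step 4: regenerate the Popolo with the existing Rake task, e.g.
#   bundle exec rake csv_to_popolo
# (task name as described above; check the repo's Rakefile)
```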

How is this different from the current situation?

Currently the admin has to find the authority in the Google Sheet and add the new councillors. This is tricky because we generally already have councillors for most authorities that we got in our initial scraping. These existing councillors don't have emails, and that data is out of date after two years of natural turnover and elections. It's not clear what's been merged into WriteIt, etc. Generally, it's tricky!

If the CSVs only contained councillors with emails, this would be a lot more straightforward because you'd just be appending rather than merging.

Having the CSVs in the repo would make things much clearer for admins because you could always work from clean files. With the Google Sheet, other edits can happen between contributions that change the Popolo, which can be really confusing.

If the contributor is making changes to the CSV and the Popolo in one go, then the diffs should be a lot clearer, and the cause-and-effect link between the CSV and the Popolo should be easier to see.

So the CSV file remains the editable interface, and the JSON is generated data.

Steps 2-6 above could also be automated as a next step. This feels like a stepping stone to some better solution, which is good.

I can imagine how editing existing councillors fits into this flow too. Having the CSVs in the repo, rather than editable Google Sheets, would make automating a merge step for existing councillors much easier too.

Adding the councillors to the bottom could be done in a Rake task using the public CSV API for contributions, manually, or on the command line.
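
If we went the Rake task route, the wrapper might look something like this; the task name, argument names, and file layout are all made up for illustration:

```ruby
require "csv"
require "open-uri"

# Hypothetical task: rake append_contribution[<contribution_csv_url>,nsw]
task :append_contribution, [:contribution_url, :state] do |_t, args|
  state_csv = "#{args[:state]}_local_councillor_popolo.csv" # assumed layout
  rows = CSV.parse(URI.open(args[:contribution_url]).read, headers: true)
  CSV.open(state_csv, "a") do |csv|
    rows.each { |row| csv << row }
  end
  puts "Appended #{rows.count} councillors to #{state_csv}"
end
```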

So if we like this idea, here's how we could get there.

  1. Make sure all the current contributions in the Google Sheet are imported to the Popolo and into PlanningAlerts. If we're not working from a clean slate I can imagine this getting really confusing. (Done, see Created csv file of local councillor based on state in data folder australian_local_councillors_popolo#124 (comment))
  2. Add the Google Sheet files into the repo as .csv files. Update the docs to point to editing the CSV files there, rather than the Google Sheet.
  3. Archive copies of existing versions of those sheets, maybe in a folder like "old_data_archive". (I don't think we need this step; we can probably create a way to merge the contribution with the existing data to avoid confusion. Needs more thinking.)
  4. Remove all councillors who don't have emails from production CSVs and regenerate the JSON.

Now there are the production CSVs and the archived ones that keep all the original data. Admins can go through the steps above to add new councillors.

equivalentideas commented Sep 26, 2017

From step 2 above:

Maybe here we need to add the contributed data onto the cached CSVs and then generate the Popolo from that?

I think so :)

So I think the next step here is to add the ability, on that repo, to collect a reviewed contribution from the CSV API on PlanningAlerts and merge it into its local CSV.

We could then provide a way for admins to trigger that process (maybe by running a scraper?) and have a PR opened automagically.

Where in this process can the admin make edits to the contribution if they need to? (Update: they can now edit the contribution in PlanningAlerts before accepting it, #1259.)

equivalentideas commented

We could then provide a way for admins to trigger that process (maybe by running a scraper?) and have a PR opened automagically.

The scraper path seems like a simple first pass at this. Sounds like we'll need a way for an external service to find out about new reviewed contributions. Contributions API?

equivalentideas commented Oct 4, 2017

The data store now has functionality and a Rake task for collecting a contribution CSV from a remote URL.

We have an API of accepted contributions: https://www.planningalerts.org.au/councillor_contributions.json
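
A consumer of that feed only needs a few lines; the field names on each contribution are assumptions, so inspect the live JSON for the real keys:

```ruby
require "json"
require "open-uri"

feed_url = "https://www.planningalerts.org.au/councillor_contributions.json"
contributions = JSON.parse(URI.open(feed_url).read)

contributions.each do |contribution|
  # Key names are assumptions; check the live feed
  puts contribution.inspect
end
```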

Once we've got the changed data we need to commit that onto a new branch, push to GitHub, and open a pull request. Here we'll need to use some kind of Git library or shell out, and use the GitHub API to open the pull request. That should give administrators a nice diff they can review and easily merge.

Where should all this code live? @equivalentideas suggested a morph.io scraper, which made me think we could even put this code as a morph scraper into the data repository itself. That's worth considering but if somewhere else makes more sense then let's do that.

I've opened a [WIP] PR over there with pseudocode for a scraper that could turn the accepted contributions in the JSON feed into pull requests to the councillor data. That scraper could be set to autorun, opening PRs automatically when we accept new contributions, and admins could pop over to morph.io and hit run if they felt like it.
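
Not the actual WIP code, but the scraper's shape could be roughly this, using morph.io's sqlite store to remember which contributions already have PRs (the "id" key is an assumption):

```ruby
require "json"
require "open-uri"
require "scraperwiki"

feed_url = "https://www.planningalerts.org.au/councillor_contributions.json"
contributions = JSON.parse(URI.open(feed_url).read)

contributions.each do |contribution|
  id = contribution["id"] # assumed key
  already_done = begin
    ScraperWiki.select("* from data where id = ?", [id]).any?
  rescue
    false # the data table won't exist on the scraper's first run
  end
  next if already_done

  # ... append to the state CSV, regenerate the Popolo, and open the PR
  #     (roughly the git/Octokit sketch from the first comment) ...

  ScraperWiki.save_sqlite([:id], { "id" => id, "pr_opened_at" => Time.now.to_s })
end
```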

It would be nice for admins to be able to trigger the PR step directly from PlanningAlerts too. Another option would be to run a little app that received a webhook from PlanningAlerts and then did what the scraper would do. Sounds like a second pass? Or someone could add the functionality to PlanningAlerts to be able to trigger runs with a webhook ;-).
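
For completeness, the webhook-receiver version could be as small as a Sinatra app like this; the endpoint and payload shape are assumptions, since PlanningAlerts doesn't send such a webhook today:

```ruby
require "sinatra"
require "json"

post "/councillor_contribution_accepted" do
  contribution = JSON.parse(request.body.read) # payload shape is assumed
  puts "New accepted contribution: #{contribution.inspect}"
  # Here we'd kick off the same merge-and-open-a-PR steps the scraper runs
  status 204
end
```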
