Merge pull request #102 from OPORA/data-loader-rework
Data loader rework
henare committed Nov 18, 2015
2 parents 9cad1c3 + 7fba6d7 commit 926a54d
Showing 8 changed files with 211 additions and 143 deletions.
25 changes: 18 additions & 7 deletions README.md
@@ -27,13 +27,13 @@ and we load this into a Rails application.

#### Ukraine

People data is collected by a [morph.io scraper](https://morph.io/openaustralia/ukraine_verkhovna_rada_deputies) and fed into [EveryPolitician](http://everypolitician.org/ukraine/). This produces [Popolo formatted](http://www.popoloproject.com/) data that is then loaded into TVFY using a Rake task, e.g.:
People data is collected by a [morph.io scraper](https://morph.io/openaustralia/ukraine_verkhovna_rada_deputies) and fed into [EveryPolitician](http://everypolitician.org/ukraine/). This produces [Popolo formatted](http://www.popoloproject.com/) data that is then loaded into TVFY using a Rake task:

bundle exec rake application:load:popolo[https://raw.githubusercontent.com/everypolitician/everypolitician-data/master/data/Ukraine/Verkhovna_Rada/ep-popolo-v1.0.json]
bundle exec rake application:load:ukraine:people

Once the people data has been loaded you can start loading votes. These are scraped by [another morph.io scraper](https://morph.io/openaustralia/ukraine_verkhovna_rada_votes), which saves data in a flat format that can easily be converted to Popolo. The conversion is handled by a [small proxy application](https://github.com/openaustralia/morph_popolo) and the results are imported using another Rake task, e.g.:
Once the people data has been loaded you can start loading votes. These are scraped by [another morph.io scraper](https://morph.io/openaustralia/ukraine_verkhovna_rada_votes), which saves data in a flat format that can easily be converted to Popolo. The conversion is handled by a [small proxy application](https://github.com/openaustralia/morph_popolo) and the results are imported using another Rake task:

bundle exec rake application:load:popolo[https://arcane-mountain-8284.herokuapp.com/vote_events/2015-07-14]
bundle exec rake application:load:ukraine:vote_events[2015-06-17]

As with other countries you then need to update the caches:

@@ -147,7 +147,18 @@ which is run daily at 09:15 by cron.

### Ukraine

The [Popolo](http://www.popoloproject.com/) data for Ukraine is loaded with the `application:load:popolo` Rake task. It will load people or vote data, depending on what it finds in the file.
These are the tasks you need to know about:

* `application:load:ukraine:people` loads people. You always
need this to run the site. Strictly speaking it only needs to run when details
need updating, but it can be run as often as you like as it only updates data.
* `application:load:ukraine:vote_events[from_date,to_date]` loads divisions. `to_date` is
optional; if omitted, only the single date `from_date` is loaded. You can also use "today" as the `to_date`.
* `application:cache` is the namespace containing the cache updating tasks that are
necessary for the site to run. They should be self-explanatory.
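
As a rough illustration of why re-running the people loader is safe, the sketch below uses a plain hash as a hypothetical stand-in for the database. `PEOPLE` and `upsert_person` are made-up names, not part of the application. Each record is looked up by ID and overwritten, so repeated runs update rows rather than duplicate them:

```ruby
# Hypothetical in-memory stand-in for the people table, keyed by ID.
PEOPLE = {}

# Find the record for this ID or start a new one, then overwrite its fields.
# Running this again with the same ID updates the record in place.
def upsert_person(id, image_url)
  person = (PEOPLE[id] ||= { id: id })
  person[:small_image_url] = image_url
  person[:large_image_url] = image_url
  person
end

# Loading the same person twice, then with changed details, leaves one record.
2.times { upsert_person("1234", "http://example.org/1234.jpg") }
upsert_person("1234", "http://example.org/1234-new.jpg")
```

After these three calls `PEOPLE` still holds a single entry for `"1234"`, now carrying the newer image URL.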

Votes can be updated daily by running `application:load:ukraine:vote_events` without arguments.
It will try to fetch all votes since the most recent one in the database until the present day.
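
The date handling described above can be sketched in plain Ruby. `resolve_vote_event_dates` is a hypothetical helper, not a function in this codebase, and the explicit `"today"` branch is an assumption about how that keyword is meant to behave:

```ruby
require "date"

# Hypothetical helper: turn the optional task arguments into a list of dates.
# latest_division_date stands in for the newest division date in the database.
def resolve_vote_event_dates(from_arg, to_arg, latest_division_date, today: Date.today)
  # No from_date: start the day after the most recent division already loaded.
  from_date = from_arg ? Date.parse(from_arg) : latest_division_date + 1

  to_date =
    if to_arg == "today"
      today                 # assumed meaning of the "today" keyword
    elsif to_arg
      Date.parse(to_arg)    # explicit end of the range
    elsif from_arg
      from_date             # only from_date given: load that single date
    else
      today                 # no arguments: load everything up to today
    end

  (from_date..to_date).to_a
end

# Example with fixed dates so the result does not depend on when it is run.
dates = resolve_vote_event_dates(nil, nil, Date.new(2015, 11, 15), today: Date.new(2015, 11, 18))
```

With no arguments and a latest division on 2015-11-15, `dates` covers 2015-11-16 to 2015-11-18 inclusive. Passing only `"2015-06-17"` as `from_arg` yields that one date.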

## Better Search

@@ -190,10 +201,10 @@ bundle exec mina ukraine-dev setup
bundle exec mina ukraine-dev deploy
# Now you can load people data
bundle exec mina ukraine-dev rake[application:load:popolo[https://raw.githubusercontent.com/everypolitician/everypolitician-data/master/data/Ukraine/Verkhovna_Rada/ep-popolo-v1.0.json]]
bundle exec mina ukraine-dev rake[application:load:ukraine:people]
# And some vote data
bundle exec mina ukraine-dev rake[application:load:popolo[https://arcane-mountain-8284.herokuapp.com/vote_events/2015-07-14]]
bundle exec mina ukraine-dev rake[application:load:ukraine:vote_events[2015-07-14]]
# Setup caches
bundle exec mina ukraine-dev rake[application:cache:all_except_member_distances]
2 changes: 1 addition & 1 deletion app/models/whip.rb
@@ -2,7 +2,7 @@ class Whip < ActiveRecord::Base
belongs_to :division

def self.update_all!
all_possible_votes = Division.joins("LEFT JOIN members ON divisions.house = members.house AND members.entered_house <= divisions.date AND divisions.date < members.left_house").group("divisions.id", :party).count
all_possible_votes = Division.joins("LEFT JOIN members ON divisions.house = members.house AND members.entered_house <= divisions.date AND divisions.date <= members.left_house").group("divisions.id", :party).count
all_votes = calc_all_votes_per_party2

all_possible_votes.keys.each do |division_id, party|
120 changes: 0 additions & 120 deletions lib/data_loader/popolo.rb

This file was deleted.

67 changes: 67 additions & 0 deletions lib/data_loader/ukraine/people.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
module DataLoader
module Ukraine
class People
URL = ENV["DEBUG_URL"] || "https://raw.githubusercontent.com/everypolitician/everypolitician-data/master/data/Ukraine/Verkhovna_Rada/ep-popolo-v1.0.json"

attr_accessor :data, :people, :organizations, :areas, :events

def initialize
@data = DataLoader::Ukraine::Popolo.load(URL)

@people = @data["persons"]
@organizations = @data["organizations"]
@areas = @data["areas"]
@events = @data["events"]
end

def load!
Rails.logger.info "Loading #{@people.count} people..."
@people.each do |p|
person = Person.find_or_initialize_by(id: extract_rada_id_from_person(p))
person.small_image_url = p["image"]
person.large_image_url = p["image"]
person.save!
end

members = @data["memberships"]
Rails.logger.info "Loading #{members.count} memberships..."
members.each do |m|
raise "Person not found: #{m["person_id"]}" unless person = @people.find { |p| p["id"] == m["person_id"] }
raise "Party not found: #{m["on_behalf_of_id"]}" unless party = @organizations.find { |o| o["id"] == m["on_behalf_of_id"] }
raise "Area not found: #{m["area_id"]}" unless area = @areas.find { |a| a["id"] == m["area_id"] }
raise "Legislative period not found: #{m["legislative_period_id"]}" unless legislative_period = @events.find { |e| e["id"] == m["legislative_period_id"] }
person["rada_id"] = extract_rada_id_from_person(person)

# Default to the start of the legislative period if there is no specific one set for this membership
start_date = m["start_date"] || legislative_period["start_date"]

member = Member.find_or_initialize_by(person_id: person["rada_id"], entered_house: start_date)
member.gid = m["person_id"]
member.source_gid = person["rada_id"]
member.first_name = person["given_name"]
member.last_name = person["family_name"]
member.title = ""
member.constituency = area["name"]
member.party = party["name"]
# TODO: Remove hardcoded house
member.house = "rada"
member.entered_house = start_date
member.left_house = m["end_date"] if m["end_date"]
member.person_id = person["rada_id"]
member.save!
end
end

def party_name_from_id(id)
raise("Couldn't find party ID: #{id}") unless party = @organizations.find { |o| o["id"] == id.to_s }
party["name"]
end

private

def extract_rada_id_from_person(person)
person["identifiers"].find { |i| i["scheme"] == "rada" }["identifier"]
end
end
end
end
16 changes: 16 additions & 0 deletions lib/data_loader/ukraine/popolo.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
require "open-uri"
require "json"

module DataLoader
module Ukraine
class Popolo
def self.load(url)
Rails.logger.info "Loading Ukraine Popolo data from #{url}..."
data = JSON.parse(open(url).read)
raise "No loadable data found" unless data["persons"] || data["vote_events"]

data
end
end
end
end
83 changes: 83 additions & 0 deletions lib/data_loader/ukraine/vote_events.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
module DataLoader
module Ukraine
class VoteEvents
BASE_URL = ENV["DEBUG_URL"] || "https://arcane-mountain-8284.herokuapp.com/vote_events/"

attr_accessor :data

def initialize(date)
url = BASE_URL + date.to_s
@data = DataLoader::Ukraine::Popolo.load(url)
end

def load!
vote_events = @data["vote_events"]
people = DataLoader::Ukraine::People.new

Rails.logger.info "Loading #{vote_events.count} vote_events..."
vote_events.each do |v_e|
ActiveRecord::Base.transaction do
division = Division.find_or_initialize_by(id: v_e["identifier"])
division.date = DateTime.parse(v_e["start_date"]).strftime("%F")
division.number = v_e["identifier"]
division.house = v_e["organization_id"]
division.name = v_e["title"]
division.source_url = v_e["sources"].find { |s| s["note"] == "Source URL" }["url"]
division.debate_url = v_e["sources"].find { |s| s["note"] == "Debate URL" }["url"]
division.motion = ""
division.clock_time = DateTime.parse(v_e["start_date"]).strftime("%T")
division.source_gid = v_e["identifier"]
division.debate_gid = ""
division.result = v_e["result"]
division.save!

votes = v_e["votes"]
Rails.logger.info "Loading #{votes.count} votes..."
votes.each do |v|
party_name = people.party_name_from_id(v["group_id"])
member = Member.current_on(division.date).find_by(person_id: v["voter_id"], party: party_name) ||
Member.find_by!(person_id: v["voter_id"], party: party_name) # Fallback when current_on isn't quite right

vote = division.votes.find_or_initialize_by(member: member)
if option = popolo_to_publicwhip_vote(v["option"])
vote.vote = option
vote.save!
else
vote.destroy
end
end

bills = v_e["bills"]
Rails.logger.info "Loading #{bills.count} bills..."
bills.each do |b|
# We need to use create here because otherwise the association isn't saved
bill = division.bills.find_or_create_by(official_id: b["official_id"])
bill.url = b["url"]
bill.title = b["title"]
bill.save!
end
end
end
end

private

def popolo_to_publicwhip_vote(string)
case string
when "yes"
"aye"
when "no"
"no"
when "abstain"
"abstention"
when "absent"
nil
when "not voting"
"not voting"
else
raise "Unknown vote option: #{string}"
end
end
end
end
end
15 changes: 0 additions & 15 deletions lib/tasks/application.rake
@@ -64,21 +64,6 @@ namespace :application do
task('application:load:divisions').invoke(yesterday)
task('application:cache:all').invoke
end

desc "Load Popolo data from a URL"
task :popolo, [:url] => [:environment, :set_logger_to_stdout] do |t, args|
DataLoader::Popolo.load!(args[:url])
end

desc "Load Popolo for a date range, appending the date to a base url"
task :popolo_date_range, [:base_url, :from_date, :to_date] => [:environment, :set_logger_to_stdout] do |t, args|
from_date = Date.parse(args[:from_date])
to_date = args[:to_date] ? Date.parse(args[:to_date]) : Date.today

(from_date..to_date).each do |date|
DataLoader::Popolo.load!(args[:base_url] + date.to_s)
end
end
end

namespace :seed do
26 changes: 26 additions & 0 deletions lib/tasks/ukraine.rake
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
namespace :application do
namespace :load do
namespace :ukraine do
desc "Load latest Ukrainian People data from EveryPolitician"
task people: [:environment, :set_logger_to_stdout] do
DataLoader::Ukraine::People.new.load!
end

desc "Load Ukrainian vote_events for a date or range of dates. Omit dates to load all new ones"
task :vote_events, [:from_date, :to_date] => [:environment, :set_logger_to_stdout] do |t, args|
from_date = args[:from_date] ? Date.parse(args[:from_date]) : Division.order(:date).pluck(:date).last + 1
to_date = if args[:to_date]
Date.parse(args[:to_date])
elsif !args[:from_date]
Date.today
else
from_date
end

(from_date..to_date).each do |date|
DataLoader::Ukraine::VoteEvents.new(date).load!
end
end
end
end
end
