Commit

fixup! Switch to different way of parsing, want them outside speeches. Add TWFY bit
ajparsons committed Apr 23, 2024
1 parent 2b9c9f3 commit 01bc06b
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions pyscraper/sp_2024/parse.py
@@ -62,6 +62,7 @@
"tt",
"u",
"ul",
"timestamp"
]


@@ -151,6 +152,10 @@ def process_raw_html(raw_html: Tag, agenda_item_url: str) -> BeautifulSoup:
speaker.append(next_sibling)
next_sibling = speaker.find_next_sibling()

# there are currently timestamps inside speeches - we want to move these after their parent
for timestamp in soup.find_all("timestamp"):
    timestamp.parent.insert_after(timestamp)

# now, in each speech - we want to iterate through and check for a p tag that's just 'For' or 'Against'
# if so, the next sibling will be a list of speakers separated by <br/>
# we want to create a msplist tag, with a direction of 'For' or 'Against'
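
The timestamp move added in this commit can be tried in isolation. What follows is only a minimal sketch, not the parser's real input: the `<speech>` wrapper, its `id`, and the paragraph text are assumptions made up for illustration, and `html.parser` is used simply because it needs no extra install. It shows that `insert_after` detaches the `<timestamp>` from inside its parent and places it directly after that parent.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the <speech> wrapper and its contents are invented
# for illustration and are not the real source markup.
raw = (
    '<speech id="s1"><p>First point.</p>'
    "<timestamp>14:05</timestamp>"
    "<p>Second point.</p></speech>"
)

soup = BeautifulSoup(raw, "html.parser")

# Same loop as in the commit: each timestamp is removed from inside its
# parent and re-inserted as the parent's next sibling.
for timestamp in soup.find_all("timestamp"):
    timestamp.parent.insert_after(timestamp)

result = str(soup)
print(result)
# <speech id="s1"><p>First point.</p><p>Second point.</p></speech><timestamp>14:05</timestamp>
```

One thing to watch: if a single parent holds several timestamps, this loop leaves them in reverse order after the parent, because each one is inserted immediately after it. Iterating over `reversed(soup.find_all("timestamp"))` would preserve the original order, should that matter here.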