fix: remove null URNs from census data #1759
Open
+10
−14
Context
Minor follow-up to #1748 to tidy some inconsistencies seen in the output data.
AB#242989
Change proposed in this pull request
`join` will default to using the index
Guidance to review
`.dropna(subset=["URN"])`
The workforce data contains a `null` row for some of the earlier years. Previously, this was stripped when the `inner` `join` didn't find a match with the pupil data. However, following #1688, the `outer` `join` keeps it in place, resulting in an index that is a mix of integers (i.e. URNs) and floats (the `null` is expressed as a float).
Removal of `on="URN",`
`join()` defaults to using the index. We want to join explicitly on the index and, by this point, can guarantee that both datasets should be joined on their respective indexes.
Checklist (add/remove as appropriate)
You have reviewed with UX/Design
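For reviewers, the behaviour described under "Guidance to review" can be reproduced with a minimal sketch. The frames, column names other than `URN`, and values below are hypothetical stand-ins rather than the real census data, and the `astype` cast is an assumption added for illustration; it is not confirmed to be part of this change.

```python
import pandas as pd

# Hypothetical stand-ins for the workforce and pupil data.
workforce = pd.DataFrame(
    {"URN": [100001, 100002, None], "Teachers": [10.0, 12.0, 3.0]}
)
pupils = pd.DataFrame({"URN": [100001, 100003], "Pupils": [200, 150]})

# Without dropna, the null URN is stored as NaN, which forces the
# URN values (and therefore the index) to a float dtype.
with_null = workforce.set_index("URN")

# Drop the null URN before indexing. The cast back to an integer
# dtype is an assumption made for this sketch.
workforce_clean = (
    workforce.dropna(subset=["URN"])
    .astype({"URN": "int64"})
    .set_index("URN")
)
pupils_indexed = pupils.set_index("URN")

# With no `on=` argument, join() matches the two frames on their indexes.
census = workforce_clean.join(pupils_indexed, how="outer")
```

Because both frames are already indexed by `URN` at this point, omitting `on=` makes the index-to-index join explicit, and the outer join no longer carries a null key into the result.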