Some small text updates #1

Merged
merged 3 commits into from
Mar 2, 2024
24 changes: 14 additions & 10 deletions README.md
@@ -1,9 +1,11 @@
# Postcode Lookup Generator

If you just need a _simple_ lookup from postcode to _most likely_ constituency, you can download a CSV from the folks
-at MySociety:
+at mySociety:

https://pages.mysociety.org/2025-constituencies/datasets/uk_parliament_2025_postcode_lookup/latest
https://www.mysociety.org/2023/09/12/navigating-the-new-constituencies/

Or use https://mapit.mysociety.org/ which includes Northern Ireland postcodes.

If you care about the fact that _some_ postcodes may straddle multiple constituencies (ie. _sometimes_ not all
addresses in a single postcode are in the same constituency), then read on...
@@ -15,9 +17,9 @@ addresses in a single postcode are in the same constituency), then read on...
[/data/2024-01-28/output/postcode-lookup.csv](https://github.com/asibs/postcode-lookup-generator/blob/main/data/2024-01-28/output/postcode-lookup.csv)

Contains a row for each postcode (postcodes are stripped of any whitespace), and which constituencies we think the
-postcode falls within. The constituency columns, `pcon_1`, etc contain the MySociety constituency short code. You can
+postcode falls within. The constituency columns, `pcon_1`, etc contain the mySociety constituency short code. You can
use this code to map to other constituency codes (eg. GSS code, etc) using the
-[MySociety dataset here](https://pages.mysociety.org/2025-constituencies/data/parliament_con_2025/0.1.4/parl_constituencies_2025.csv).
+[mySociety dataset here](https://pages.mysociety.org/2025-constituencies/data/parliament_con_2025/0.1.4/parl_constituencies_2025.csv).

If a postcode is in more than one constituency, the `pcon_1` column will contain the constituency code we are _most
confident_ of / the constituency we believe _most_ addresses in the postcode are in.
@@ -109,8 +111,10 @@ seeing which constituency it overlaps with.
If every UPRN in a single postcode is in the same constituency, we can assume that the whole postcode is within that
constituency.

-If different UPRNs within a single postcode have different constituencies, we know we have a postcode where the exact
-address is needed to determine the constituency.
+If different UPRNs within a single postcode have different constituencies, we know we might have a postcode where the exact
+address is needed to determine the constituency. As the open UPRN data includes non-address UPRNs such as Street Records,
+with no classification, it is possible for every address in the postcode to be in one constituency, but for all the UPRNs
+to cover more than one constituency.
Comment on lines +114 to +117
Owner

Thanks, this is a very useful clarification! 👍


At the time of writing, there's no open data source which maps UPRNs to a human-readable address. This means if a
user's postcode straddles multiple constituencies, we can now detect it and tell the user (possibly asking them to
@@ -177,14 +181,14 @@ then performs various geo-spatial queries on _every single address_.
We can do various data validation on the installed data:

```sql
--- Look for postcodes which are in the UPRN Lookup dataset, but which aren't in the MySociety dataset
+-- Look for postcodes which are in the UPRN Lookup dataset, but which aren't in the mySociety dataset
SELECT DISTINCT postcode
FROM uprn_postcode_to_constituency uprn
WHERE NOT EXISTS (
SELECT 1 FROM mysociety_postcode_to_constituency mysoc WHERE mysoc.postcode = uprn.postcode
);

--- Look for postcodes which are in the MySociety dataset, but which aren't in the UPRN Lookup dataset
+-- Look for postcodes which are in the mySociety dataset, but which aren't in the UPRN Lookup dataset
SELECT DISTINCT postcode
FROM mysociety_postcode_to_constituency mysoc
WHERE NOT EXISTS (
@@ -205,8 +209,8 @@ WHERE NOT EXISTS (
SELECT 1 FROM uprn_postcode_to_constituency uprn WHERE uprn.postcode = onspd.postcode
);

--- Look for postcodes which are in the MySociety dataset AND in the UPRN Lookup dataset, where the constituency
--- identified by MySociety for that postcode has not been identified by our UPRN methodology
+-- Look for postcodes which are in the mySociety dataset AND in the UPRN Lookup dataset, where the constituency
+-- identified by mySociety for that postcode has not been identified by our UPRN methodology
SELECT *
FROM mysociety_postcode_to_constituency mysoc
JOIN uprn_postcode_to_constituency uprn
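The UPRN methodology described in the README changes above can be illustrated with a small Python sketch. It assumes the UPRN-to-constituency assignments are already available as `(postcode, uprn, constituency_code)` tuples; the sample rows, constituency codes and helper names are hypothetical, and ranking by raw UPRN count is a simplification of the tool's confidence-based ordering rather than the project's actual code.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows of (postcode, uprn, constituency_code). In the real
# project these come from the open UPRN data joined to constituency boundaries;
# the values here are invented for illustration only.
ROWS = [
    ("AB12CD", 1001, "AAA"),
    ("AB12CD", 1002, "AAA"),
    ("AB12CD", 1003, "BBB"),
    ("EF34GH", 2001, "CCC"),
    ("EF34GH", 2002, "CCC"),
]


def count_uprns_by_constituency(rows):
    """Return {postcode: Counter({constituency_code: uprn_count})}."""
    counts = defaultdict(Counter)
    for postcode, _uprn, pcon in rows:
        counts[postcode][pcon] += 1
    return counts


def ranked_constituencies(counter):
    """Order constituencies by UPRN count, largest first (a stand-in for pcon_1, pcon_2, ...)."""
    return [pcon for pcon, _count in counter.most_common()]


counts = count_uprns_by_constituency(ROWS)
for postcode in sorted(counts):
    pcons = ranked_constituencies(counts[postcode])
    # More than one constituency only means the postcode *might* straddle a
    # boundary: non-address UPRNs such as street records can also cause this.
    label = "may straddle" if len(pcons) > 1 else "single constituency"
    print(postcode, pcons, label)
```

Because the open data has no UPRN classification, a sketch like this cannot filter out street records, so a "may straddle" result is a prompt to ask for the full address rather than proof that the postcode is split.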
4 changes: 2 additions & 2 deletions app/domain/postcode_lookup_writer.py
@@ -74,8 +74,8 @@ def _calculate_confidences(self, parsed_row: dict[str, Any]) -> dict[str, float]
onspd_match = next((item for item in parsed_row['onspd_pcons'] if item['pcon'] == pcon), None)
mysoc_match = next((item for item in parsed_row['mysociety_pcons'] if item['pcon'] == pcon), None)

-# We give 50% of the confidence to UPRN, then we give 25% each to ONSPD & MySoc, so a postcode -> constituency
-# will only have 100% if ALL properties in the UPRN give the same constituency, and if both ONSPD & MySociety
+# We give 50% of the confidence to UPRN, then we give 25% each to ONSPD & mySoc, so a postcode -> constituency
+# will only have 100% if ALL properties in the UPRN give the same constituency, and if both ONSPD & mySociety
# agree with this. Note, the total confidences for a postcode _may not_ add up to 100% if a single property in
# ONSPD overlaps with multiple constituency boundaries (in practice, there are only a couple of records where
# this is a problem).
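The weighting described in the `_calculate_confidences` comment can be written as a simple formula. The sketch below is an illustration under stated assumptions, not the project's implementation: it treats the UPRN contribution as the fraction of the postcode's UPRNs that fall in a constituency, and gives ONSPD and mySociety a flat 25% each when they list that constituency. `combined_confidence` is a hypothetical helper.

```python
# Weights as described in the comment in _calculate_confidences.
UPRN_WEIGHT = 0.5
ONSPD_WEIGHT = 0.25
MYSOC_WEIGHT = 0.25


def combined_confidence(uprn_share: float, in_onspd: bool, in_mysoc: bool) -> float:
    """Blend the three sources into one confidence for a postcode -> constituency pair.

    uprn_share: assumed fraction (0..1) of the postcode's UPRNs in the constituency.
    in_onspd / in_mysoc: whether that source lists the constituency for the postcode.
    """
    score = UPRN_WEIGHT * uprn_share
    if in_onspd:
        score += ONSPD_WEIGHT
    if in_mysoc:
        score += MYSOC_WEIGHT
    return score


# Hypothetical postcode: 3 UPRNs, 2 in "AAA" and 1 in "BBB";
# ONSPD and mySociety both list only "AAA".
aaa = combined_confidence(2 / 3, in_onspd=True, in_mysoc=True)
bbb = combined_confidence(1 / 3, in_onspd=False, in_mysoc=False)
print(round(aaa, 3), round(bbb, 3))
```

Under this model a constituency reaches 100% only when every UPRN agrees and both ONSPD and mySociety list it, as the comment states. If a source listed two constituencies for one postcode, each would receive that source's 25%, so the postcode's total would no longer be exactly 100%.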
14 changes: 7 additions & 7 deletions app/scripts/load_postcodes.py
@@ -223,13 +223,13 @@ def create_onspd_postcode_constituency_map(connection) -> None:
)
connection.commit()

-##### MySociety helper methods #####
+##### mySociety helper methods #####

def load_mysociety_constituencies(connection) -> None:
invalid_postcodes = []

with connection.cursor() as cursor:
-print(f"{time.ctime()} - Loading MySociety postcode to constituencies mappings")
+print(f"{time.ctime()} - Loading mySociety postcode to constituencies mappings")
cursor.execute(
"""
CREATE TABLE mysociety_postcode_to_constituency (
@@ -285,7 +285,7 @@ def create_combo_constituency_map(connection) -> None:
SELECT
postcode,
COALESCE(constituency_code, 'UNKNOWN') AS constituency_code,
-'MySociety' AS source,
+'mySociety' AS source,
(CASE WHEN constituency_code IS NULL THEN NULL ELSE 1.0 END) AS confidence,
'' AS notes
FROM mysociety_postcode_to_constituency
@@ -385,7 +385,7 @@ def main() -> None:
# set_onspd_postcode_coords(conn)
# create_onspd_postcode_constituency_map(conn)

-# # MySociety processing
+# # mySociety processing
# load_mysociety_constituencies(conn)

# Combine the data
@@ -397,7 +397,7 @@ def main() -> None:
main()


-# Once everything is loaded, you can connect to the postgis DB and check for any mismatch with the MySociety data with:
+# Once everything is loaded, you can connect to the postgis DB and check for any mismatch with the mySociety data with:
"""
SELECT
map.postcode,
@@ -412,6 +412,6 @@ def main() -> None:
AND map.constituency_code <> mysoc.constituency_code
ORDER BY 1;
"""
-# TODO: Generate the final postcode -> constituencies map by combining _all_ constituencies from our map AND any from MySoc.
+# TODO: Generate the final postcode -> constituencies map by combining _all_ constituencies from our map AND any from mySoc.
# This should give us every postcode, and the list of possible constituencies - including any postcodes which are in multiple constituencies.
-# For those constituencies, we can fallback to the DemoClub postcode lookup (once the election is announced - their API only returns data for boundaries with elections...)
+# For those constituencies, we can fallback to the DemoClub postcode lookup (once the election is announced - their API only returns data for boundaries with elections...)
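The TODO above describes combining every constituency from the UPRN-based map with any constituency mySociety gives for the same postcode. A minimal Python sketch of that union follows, assuming both maps have already been loaded into dictionaries; the dictionary shapes, sample codes and the `merge_constituencies` name are hypothetical, not taken from the project.

```python
def merge_constituencies(ours, mysoc):
    """Union our postcode -> [constituencies] map with mySociety's postcode -> constituency map.

    ours:  {postcode: [constituency_code, ...]}, most likely first (hypothetical shape)
    mysoc: {postcode: constituency_code}, one most-likely constituency per postcode
    """
    # Copy each list so the caller's map is not mutated.
    merged = {postcode: list(pcons) for postcode, pcons in ours.items()}
    for postcode, pcon in mysoc.items():
        pcons = merged.setdefault(postcode, [])
        if pcon not in pcons:
            # Keep our ordering; append anything only mySociety knows about.
            pcons.append(pcon)
    return merged


ours = {"AB12CD": ["AAA", "BBB"], "EF34GH": ["CCC"]}
mysoc = {"AB12CD": "AAA", "EF34GH": "DDD", "IJ56KL": "EEE"}
merged = merge_constituencies(ours, mysoc)

# Postcodes with more than one candidate are the ones that need a full address
# (or a fallback lookup, as the TODO suggests).
ambiguous = sorted(pc for pc, pcons in merged.items() if len(pcons) > 1)
print(merged)
print(ambiguous)
```

Keeping the UPRN-based ordering first and appending mySociety's answer only when it differs means a disagreement between the sources surfaces as an ambiguous postcode rather than being silently resolved.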
6 changes: 3 additions & 3 deletions data/2024-01-28/README.md
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@ To get all the data:

If you don't have the `wget` command line utility (eg. Windows) you can manually download the files and rename them appropriately.

-## MySociety Constituency Data, including boundaries (V0.1.4)
+## mySociety Constituency Data, including boundaries (V0.1.4)

### Files

@@ -31,7 +31,7 @@ Creative Commons Attribution 4.0 International License

https://pages.mysociety.org/2025-constituencies/datasets/parliament_con_2025/0_1_4

-## MySociety Postcode Data (V0.1.2)
+## mySociety Postcode Data (V0.1.2)

### Files

@@ -103,4 +103,4 @@ work out which new constituency the postcode centroid falls within.
- Contains GeoPlace data © Local Government Information House Limited copyright and database right 2024
- Source: Office for National Statistics licensed under the Open Government Licence v.3.0

-https://www.ons.gov.uk/methodology/geography/licences
+https://www.ons.gov.uk/methodology/geography/licences