Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Create script to restructure shape files #10

Open
wants to merge 5 commits into
base: staging
Choose a base branch
from

Conversation

DavidTheProgrammer
Copy link

Description

Created a management command in the boundaries application to open and split shape files based on specific arguments passed in.

Why?

The wazimap backend stores geographies in a hierarchy and this data should be representative of the loaded shape files in the application. For example, the Africa root geography should contain countries and each country should contain it's representative sub region, i.e, province, county, district etc. There are some geographies that contain multiple sub regions at the same level, e.g City, Municipality, Province can all be children of Country and this should be accurately represented in the data. This script splits the shape file into multiple files based on these children so that they can be loaded using the loadshp command with the correct argument, e.g ...loadshp ./cities/shape.shp Botswana City ...

Related Issue

#7

How to test it locally

Download the shapefiles from Google Drive and store them in some directory. Run the management command passing in the right options and verify that the files have been split according to your keys. e.g

python manage.py rebuildshpfiles ./datasets/shapefiles/country-regions ENGTYPE_1,Region

Checklist

  • 🚀 is the code ready to be merged and go live?
  • 🛠 does it work (build) locally

Pull Request

  • 📰 good title
  • 📝good description
  • 🔖 issue linked
  • 📖 changelog filled out

Commits

  • commits are clean
  • commit messages are clean

Code Quality

  • 🚧 no commented out code
  • 🖨 no unnecessary logging
  • 🎱 no magic numbers
  • black was run locally (as part of the pre-commit hook)

Testing

  • ✅ added (appropriate) unit tests
  • 💢 edge cases in tests were considered
  • ✅ ran tests locally & are passing

Next step is to write a script to run the `loadshp` command to load the shapefiles into the dir.
@DavidTheProgrammer DavidTheProgrammer added the enhancement New feature or request label Nov 19, 2024
@DavidTheProgrammer DavidTheProgrammer requested a review from a team November 19, 2024 07:27
@DavidTheProgrammer DavidTheProgrammer self-assigned this Nov 19, 2024
@kilemensi
Copy link
Member

May be I missed it but why is this still on Draft @DavidTheProgrammer ?

@DavidTheProgrammer DavidTheProgrammer marked this pull request as ready for review November 19, 2024 12:31
@DavidTheProgrammer
Copy link
Author

May be I missed it but why is this still on Draft @DavidTheProgrammer ?

I think my network died when I clicked the Button. It was supposed to be marked ready for review.

Copy link
Member

@kilemensi kilemensi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


  1. Don't know much about shapefiles to know know how useful this would be
  2. Shouldn't (level) keys be part of the app itself? (profile, settings, etc.) i.e. basically run this command to ensure the given shapefiles will work for your app.

requirements.txt Outdated Show resolved Hide resolved
Comment on lines 19 to 20
- ./Country/Province/province.{shp,shx,dbf}
- ./Country/Municipality/municipality.{shp,shx,dbf}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And wazimap needs and can handle this nested directory structure?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @kilemensi, no it doesn't need the nested directory structure for the output, but it's super helpful if you have your files grouped into directories for input. I've updated it in the other PR to specify an output directory. So what's happening is actually:

Input:

Country/
└── country.shp

Output:

Country/
└── <output-dir>/
    ├── City.shp
    ├── Town.shp
    └── District.shp

For simplicity I'm only showing the .shp file. But each shape file comes with .dbf and shx files to complete the set. They are always in threes and so the chances of you having them in a directory in the first place is very high.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool @DavidTheProgrammer ... how does this affect loading of the shapefiles i.e. does loadshp command also supports nested structure?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kilemensi the loadshp command loads file by file. The other PR I've worked on can now take advantage of this split directory and load all the files in one go by calling the loadshp command with appropriate arguments for each shape file. It takes a glob pattern that matches all the .shp files you intend to load.

Copy link

@thepsalmist thepsalmist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial thoughts, few typos in the Command docstring

Code LGTM 👍
Still trying to figure out the relationship between this management command and the earlier one to load shapefiles available [here],(https://github.com/CodeForAfrica/wazimap-ng/blob/staging/wazimap_ng/boundaries/management/commands/loadshp.py)

Copy link

@thepsalmist thepsalmist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So on running this, I see the output as below, few questions

  1. Given a directory ../tmp/shapefile/countries-regions that contains all shapefiles for African countries, the rebuild files are each written to the specific country sub-directory e.g
    /tmp/shapefiles/countries-regions/Angola/rebuilt that has files
    Province.dbf Province.shp Province.shx. Although these files already exist in the pre-built country sub-directory, one level back as
    Angola.cpg Angola.dbf Angola.prj Angola.qgz Angola.qmd Angola.shp Angola.shx

Is my understanding that we are looking at rebuilding indices and group this in folders based on Region such as Province, County etc, and wouldn't that necessitate a restructuring of how we re-organise our rebuilt indices?

  1. I still believe we intend to use the python manage.py loadshp to load the shapefiles.
Screenshot 2024-11-20 at 17 54 23

Screenshot 2024-11-20 at 18 12 50

DavidTheProgrammer and others added 2 commits November 25, 2024 12:43
…pfiles command.

- Added a command to loadshpfiles from a directory, specifying multiple options for possible fields of each necessary value needed for the loadshp command.

- Modified the rebuildshpfiles command to specify output directory and have that directory be skipped
Added new line at EOF

Co-authored-by: Clemence Kyara <[email protected]>
@DavidTheProgrammer
Copy link
Author

Initial thoughts, few typos in the Command docstring

Code LGTM 👍 Still trying to figure out the relationship between this management command and the earlier one to load shapefiles available [here],(https://github.com/CodeForAfrica/wazimap-ng/blob/staging/wazimap_ng/boundaries/management/commands/loadshp.py)

Hi @thepsalmist,

  1. Yes, this command splits a shapefile into it's constituent parts. This will ensure that we have multiple files with the correct level for each file. You're right, the intention is to use the the loadshp command, this command requires a level as an argument. In a file like Botswana.shp show below you can see that the records have different levels (ENGTYPE_1) in the shapefile:
image

We can't accurately load this file with the correct level into our Geography using the loadshp command as we have an ambiguous levels for different records.

This command will split this file into 3 files as shown below:

image

This split allows us to load these records with 3 calls to the loadshp command each with the appropriate level as an argument: City, District and Town. However, as you've noted, if there's only a single level for all the records in the file, it just produces a single file in the output directory that's identical to the input.

The second script will take use the output of this command and load the shapefiles by calling loadshp internally while extracting the arguments from the shapefiles. It also modifies this script to specify an output directory for the splits and takes in a glob as an argument. More details in that PR

@DavidTheProgrammer
Copy link
Author

LGTM

  1. Don't know much about shapefiles to know know how useful this would be
  2. Shouldn't (level) keys be part of the app itself? (profile, settings, etc.) i.e. basically run this command to ensure the given shapefiles will work for your app.

I can respond to 2; unfortunately, that wouldn't work as this is a pre-requisite to running the app in the first place, it's one of the first things we do when trying to run the app; Loading the Geographies into the wazimap backend.

…on. Fixed Docstring to reflect true actions.
@kilemensi
Copy link
Member

I can respond to 2; unfortunately, that wouldn't work as this is ...

I'm likely missing something @DavidTheProgrammer but:

  1. Management commands should be able to load settings so if levels are configured there, any management command should be able to load them.
  2. I was also able to create a Dummy Profile (with dummy Geography hierarchy and dummy Geography) without adding anything shape-specific. Again, if levels are configured there, any management command should be able to load models from the database without any issues.

Why wouldn't any of this work?

@DavidTheProgrammer
Copy link
Author

I can respond to 2; unfortunately, that wouldn't work as this is ...

I'm likely missing something @DavidTheProgrammer but:

  1. Management commands should be able to load settings so if levels are configured there, any management command should be able to load them.
  2. I was also able to create a Dummy Profile (with dummy Geography hierarchy and dummy Geography) without adding anything shape-specific. Again, if levels are configured there, any management command should be able to load models from the database without any issues.

Why wouldn't any of this work?

  1. Oh I believe I misunderstood, if it's SETTINGS as in settings.py yes, yes it can work. One consideration is that this step, isn't necessary to run the app, it's more of a clean-up before load step. In this case, would it be prudent to have settings in the app that have nothing to do with it's runtime operations? In addition to this, this command can work with any shape files that the user may have that contain the details of the geography they may have, so it's possible the field that stores the level may be different in their files, in this case it may have to be an environment variable, but the original consideration remains.

  2. There's a table/model, that's not in the admin panel, GeographyBoundary that contains the shape information and relates that to the Geography. The command we use to ultimately load the shape files loadshp uses this model to relate the Geography and Boundary information from the shape files,

@kilemensi
Copy link
Member

kilemensi commented Nov 25, 2024

Oh I believe I misunderstood, ....

Cool, cool ... I could be wrong but I don't think Django management commands are the kind of commands that someone will install on their system to do generic things; they're mostly tied to the running/management of app. I'm sure there are apps/tools/scripts/commands out there to process shapefiles. The utility of this command lies in it processing the shapefiles in a way that's useful to this app moving forward.

But it's cool if you / Xavier think the current approach works; I feel like like just saying "I have these shapefiles for this profile" is enough for this kind of a management command; it should be able to pull levels, etc. from the specified profile.

There's a table/model, that's not in the admin panel,

Yes, my point was a Profile can be created before any of shapefiles/boundaries models/configuration are added to the system and that's everything this command needs (or settings if the app will only have one profile)

@DavidTheProgrammer
Copy link
Author

DavidTheProgrammer commented Nov 25, 2024

The utility of this command lies in it processing the shapefiles in a way that's useful to this app moving forward

That's a fair point @kilemensi. I guess the decision then lies on whether the options should be configurations in the app (in settings or somewhere) or passed as arguments to the command. Do I get you right? My design was inspired by the existing loadshp command that has all the configs passed in as args. I'll discuss with @thepsalmist is this is optimal for our use-case or we can benefit from a slight deviation.

Yes, my point was a Profile can be created before any of shapefiles/boundaries models/configuration are added to the system and that's everything this command needs

Yes, that's true. And actually, this command's only job is to split the shape files to prepare them for load and it technically doesn't need anything other than the level keys. The other command in PR #11 is the one that actually does all the loading.

@kilemensi
Copy link
Member

Two things to keep in mind @DavidTheProgrammer / @thepsalmist

  1. For all practical purposes, the design of Wazimap NG is slightly on older side by now (4-5 years old code, targeting Python 3.7 / Django 2, etc.) so we should think a bit more before adding new features/code, and
  2. Related to this, before investing deeply into enhancements/workarounds, it may make sense to first establish lines of communication to understand where OpenUp is on their roadmap and any immediate future plans.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: 🆕 Icebox
Development

Successfully merging this pull request may close these issues.

4 participants