Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix in valid GIS names used to create water network models #452

Merged
merged 8 commits into from
Nov 12, 2024
Merged
42 changes: 21 additions & 21 deletions documentation/gis.rst
Original file line number Diff line number Diff line change
Expand Up @@ -112,13 +112,13 @@ For example, the junctions GeoDataFrame contains the following information:
:skipif: gpd is None

>>> print(wn_gis.junctions.head())
node_type elevation initial_quality geometry
name
10 Junction 216.408 5.000e-04 POINT (20.00000 70.00000)
11 Junction 216.408 5.000e-04 POINT (30.00000 70.00000)
12 Junction 213.360 5.000e-04 POINT (50.00000 70.00000)
13 Junction 211.836 5.000e-04 POINT (70.00000 70.00000)
21 Junction 213.360 5.000e-04 POINT (30.00000 40.00000)
elevation initial_quality geometry
name
10 216.408 5.000e-04 POINT (20.00000 70.00000)
11 216.408 5.000e-04 POINT (30.00000 70.00000)
12 213.360 5.000e-04 POINT (50.00000 70.00000)
13 211.836 5.000e-04 POINT (70.00000 70.00000)
21 213.360 5.000e-04 POINT (30.00000 40.00000)

Each GeoDataFrame contains attributes and geometry:

Expand Down Expand Up @@ -334,23 +334,23 @@ and then translates the GeoDataFrames coordinates to EPSG:3857.

>>> wn_gis = wntr.network.to_gis(wn, crs='EPSG:4326')
>>> print(wn_gis.junctions.head())
node_type elevation initial_quality geometry
name
10 Junction 216.408 5.000e-04 POINT (20.00000 70.00000)
11 Junction 216.408 5.000e-04 POINT (30.00000 70.00000)
12 Junction 213.360 5.000e-04 POINT (50.00000 70.00000)
13 Junction 211.836 5.000e-04 POINT (70.00000 70.00000)
21 Junction 213.360 5.000e-04 POINT (30.00000 40.00000)
elevation initial_quality geometry
name
10 216.408 5.000e-04 POINT (20.00000 70.00000)
11 216.408 5.000e-04 POINT (30.00000 70.00000)
12 213.360 5.000e-04 POINT (50.00000 70.00000)
13 211.836 5.000e-04 POINT (70.00000 70.00000)
21 213.360 5.000e-04 POINT (30.00000 40.00000)

>>> wn_gis.to_crs('EPSG:3857')
>>> print(wn_gis.junctions.head())
node_type elevation initial_quality geometry
name
10 Junction 216.408 5.000e-04 POINT (2226389.816 11068715.659)
11 Junction 216.408 5.000e-04 POINT (3339584.724 11068715.659)
12 Junction 213.360 5.000e-04 POINT (5565974.540 11068715.659)
13 Junction 211.836 5.000e-04 POINT (7792364.356 11068715.659)
21 Junction 213.360 5.000e-04 POINT (3339584.724 4865942.280)
elevation initial_quality geometry
name
10 216.408 5.000e-04 POINT (2226389.816 11068715.659)
11 216.408 5.000e-04 POINT (3339584.724 11068715.659)
12 213.360 5.000e-04 POINT (5565974.540 11068715.659)
13 211.836 5.000e-04 POINT (7792364.356 11068715.659)
21 213.360 5.000e-04 POINT (3339584.724 4865942.280)

Snap point geometries to the nearest point or line
----------------------------------------------------
Expand Down
39 changes: 21 additions & 18 deletions documentation/model_io.rst
Original file line number Diff line number Diff line change
Expand Up @@ -206,27 +206,29 @@ GeoJSON files
GeoJSON files are commonly used to store geographic data structures.
More information on GeoJSON files can be found at https://geojson.org.

To use GeoJSON files in WNTR, a set of valid base column names are required.
Valid base GeoJSON column names can be obtained using the
:class:`~wntr.network.io.valid_gis_names` function.
The following example returns valid base GeoJSON column names for junctions.
When reading GeoJSON files into WNTR, only a set of valid column names can be used.
Valid GeoJSON column names can be obtained using the
:class:`~wntr.network.io.valid_gis_names` function. By default, the function
returns all column names, both required and optional.
The following example returns valid GeoJSON column names for junctions.

.. doctest::
:skipif: gpd is None

>>> geojson_column_names = wntr.network.io.valid_gis_names()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked back at this portion of the documentation and it is a little confusing how we talk about the valid_gis_names function. It is probably easiest for me to explain by making my suggest updates in a PR to your branch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated discussion of the function in the user docs in kaklise#76

>>> print(geojson_column_names['junctions'])
['name', 'elevation', 'coordinates', 'emitter_coefficient', 'initial_quality', 'minimum_pressure', 'required_pressure', 'pressure_exponent', 'tag']
['name', 'elevation', 'geometry', 'emitter_coefficient', 'initial_quality', 'minimum_pressure', 'required_pressure', 'pressure_exponent', 'tag']

A minimal list of valid column names can also be obtained by setting ``complete_list`` to False.
Column names that are optional (i.e., ``initial_quality``) and not included in the GeoJSON file are defined using default values.
A minimal list of required column names can also be obtained by setting ``complete_list`` to False.
Column names that are optional (i.e., ``initial_quality``) and not included in the GeoJSON file are
defined using default values.

.. doctest::
:skipif: gpd is None

>>> geojson_column_names = wntr.network.io.valid_gis_names(complete_list=False)
>>> print(geojson_column_names['junctions'])
['name', 'elevation', 'coordinates']
['name', 'elevation', 'geometry']

Note that GeoJSON files can contain additional custom column names that are assigned to WaterNetworkModel objects.

Expand All @@ -253,7 +255,7 @@ Note that patterns, curves, sources, controls, and options are not stored in the

The :class:`~wntr.network.io.read_geojson` function creates a WaterNetworkModel from a
dictionary of GeoJSON files.
Valid base column names and additional custom attributes are added to the model.
Valid column names and additional custom attributes are added to the model.
The function can also be used to append information from GeoJSON files into an existing WaterNetworkModel.

.. doctest::
Expand Down Expand Up @@ -300,20 +302,21 @@ To use Esri Shapefiles in WNTR, several formatting requirements are enforced:
assumed that the first 10 characters of each attribute are unique.

* To create WaterNetworkModel from Shapefiles, a set of valid field names are required.
Valid base Shapefiles field names can be obtained using the
:class:`~wntr.network.io.valid_gis_names` function.
For Shapefiles, the `truncate` input parameter should be set to 10 (characters).
The following example returns valid base Shapefile field names for junctions.
Note that attributes like ``base_demand`` are truncated to ``base_deman``.
Valid Shapefiles field names can be obtained using the
:class:`~wntr.network.io.valid_gis_names` function. By default, the function
returns all column names, both required and optional.
For Shapefiles, the `truncate_names` input parameter should be set to 10 (characters).
The following example returns valid Shapefile field names for junctions.
Note that attributes like ``minimum_pressure`` are truncated to ``minimum_pr``.

.. doctest::
:skipif: gpd is None

>>> shapefile_field_names = wntr.network.io.valid_gis_names(truncate_names=10)
>>> print(shapefile_field_names['junctions'])
['name', 'elevation', 'coordinate', 'emitter_co', 'initial_qu', 'minimum_pr', 'required_p', 'pressure_e', 'tag']
['name', 'elevation', 'geometry', 'emitter_co', 'initial_qu', 'minimum_pr', 'required_p', 'pressure_e', 'tag']

A minimal list of valid field names can also be obtained by setting ``complete_list`` to False.
A minimal list of required field names can also be obtained by setting ``complete_list`` to False.
Field names that are optional (i.e., ``initial_quality``) and not included in the Shapefile are defined using default values.

.. doctest::
Expand All @@ -322,7 +325,7 @@ To use Esri Shapefiles in WNTR, several formatting requirements are enforced:
>>> shapefile_field_names = wntr.network.io.valid_gis_names(complete_list=False,
... truncate_names=10)
>>> print(shapefile_field_names['junctions'])
['name', 'elevation', 'coordinate']
['name', 'elevation', 'geometry']

* Shapefiles can contain additional custom field names that are assigned to WaterNetworkModel objects.

Expand All @@ -349,7 +352,7 @@ Note that patterns, curves, sources, controls, and options are not stored in the

The :class:`~wntr.network.io.read_shapefile` function creates a WaterNetworkModel from a dictionary of
Shapefile directories.
Valid base field names and additional custom field names are added to the model.
Valid field names and additional custom field names are added to the model.
The function can also be used to append information from Shapefiles into an existing WaterNetworkModel.

.. doctest::
Expand Down
67 changes: 53 additions & 14 deletions wntr/gis/network.py
Original file line number Diff line number Diff line change
Expand Up @@ -99,14 +99,21 @@ def _create_gis(self, wn, crs: str = None, pumps_as_points: bool = False,
Represent valves as points (True) or lines (False), by default False
"""

def _extract_geodataframe(df, crs=None, links_as_points=False):
# Drop any column with all NaN
def _extract_geodataframe(df, crs=None, valid_base_names=None,
links_as_points=False):
if valid_base_names is None:
valid_base_names = []

# Drop any column with all NaN, this removes excess attributes
# Valid base attributes that have all None values are added back
# at the end of this routine
df = df.loc[:, ~df.isna().all()]

# Define geom and drop node_type/link_type
if df.shape[0] > 0:
# Define geom
if 'node_type' in df.columns:
geom = [Point((x,y)) for x,y in df['coordinates']]
del df['node_type']
elif 'link_type' in df.columns:
geom = []
for link_name in df['name']:
Expand All @@ -120,24 +127,33 @@ def _extract_geodataframe(df, crs=None, links_as_points=False):
ls.append(v)
ls.append(link.end_node.coordinates)
geom.append(LineString(ls))
del df['link_type']

# Drop column if not a str, float, int, or bool
# Drop column if not a str, float, int, or bool (or np.bool_)
# This drops columns like coordinates, vertices
# This could be extended to keep additional data type (list,
# tuple, network elements like Patterns, Curves)
drop_cols = []
for col in df.columns:
if not isinstance(df.iloc[0][col], (str, float, int, bool)):
# Added np.bool_ to the following check
# Returned by df.to_dict('records') for some network models
if not isinstance(df.iloc[0][col], (str, float, int, bool, np.bool_)):
drop_cols.append(col)
df = df.drop(columns=drop_cols)

# Add back in valid base attributes that had all None values
cols = list(set(valid_base_names) - set(df.columns))
if len(cols) > 0:
df[cols] = None

# Set index
if len(df) > 0:
df.set_index('name', inplace=True)

df = gpd.GeoDataFrame(df, crs=crs, geometry=geom)
else:
df = gpd.GeoDataFrame()

return df

# Convert the WaterNetworkModel to a dictionary
Expand All @@ -146,29 +162,31 @@ def _extract_geodataframe(df, crs=None, links_as_points=False):
df_nodes = pd.DataFrame(wn_dict['nodes'])
df_links = pd.DataFrame(wn_dict['links'])

valid_base_names = self._valid_names(complete_list=False, truncate_names=None)

# Junctions
df = df_nodes[df_nodes['node_type'] == 'Junction']
self.junctions = _extract_geodataframe(df, crs)
self.junctions = _extract_geodataframe(df, crs, valid_base_names['junctions'])

# Tanks
df = df_nodes[df_nodes['node_type'] == 'Tank']
self.tanks = _extract_geodataframe(df, crs)
self.tanks = _extract_geodataframe(df, crs, valid_base_names['tanks'])

# Reservoirs
df = df_nodes[df_nodes['node_type'] == 'Reservoir']
self.reservoirs = _extract_geodataframe(df, crs)
self.reservoirs = _extract_geodataframe(df, crs, valid_base_names['reservoirs'])

# Pipes
df = df_links[df_links['link_type'] == 'Pipe']
self.pipes = _extract_geodataframe(df, crs, False)
self.pipes = _extract_geodataframe(df, crs, valid_base_names['pipes'], False)

# Pumps
df = df_links[df_links['link_type'] == 'Pump']
self.pumps = _extract_geodataframe(df, crs, pumps_as_points)
self.pumps = _extract_geodataframe(df, crs, valid_base_names['pumps'], pumps_as_points)

# Valves
df = df_links[df_links['link_type'] == 'Valve']
self.valves = _extract_geodataframe(df, crs, valves_as_points)
self.valves = _extract_geodataframe(df, crs, valid_base_names['valves'], valves_as_points)

def _create_wn(self, append=None):
"""
Expand All @@ -187,22 +205,32 @@ def _create_wn(self, append=None):
wn_dict['nodes'] = []
wn_dict['links'] = []

for element in [self.junctions, self.tanks, self.reservoirs]:
# Modifications to create a WaterNetworkModel from a dict
# Reset index
# Create coordinates/vertices from geometry
# Add node_type/link_type
for node_type, element in [('Junction', self.junctions),
('Tank', self.tanks),
('Reservoir', self.reservoirs)]:
if element.shape[0] > 0:
assert (element['geometry'].geom_type).isin(['Point']).all()
df = element.reset_index(names="name")
df.rename(columns={'geometry':'coordinates'}, inplace=True)
df['coordinates'] = [[x,y] for x,y in zip(df['coordinates'].x,
df['coordinates'].y)]
df['node_type'] = node_type
wn_dict['nodes'].extend(df.to_dict('records'))

for element in [self.pipes, self.pumps, self.valves]:
for link_type, element in [('Pipe', self.pipes),
('Pump', self.pumps),
('Valve', self.valves)]:
if element.shape[0] > 0:
assert 'start_node_name' in element.columns
assert 'end_node_name' in element.columns
df = element.reset_index(names="name")
df['vertices'] = df.apply(lambda row: list(row.geometry.coords)[1:-1], axis=1)
df.drop(columns=['geometry'], inplace=True)
df['link_type'] = link_type
wn_dict['links'].extend(df.to_dict('records'))

# Create WaterNetworkModel from dictionary
Expand Down Expand Up @@ -470,6 +498,17 @@ def _valid_names(self, complete_list=True, truncate_names=None):
if truncate_names is not None and truncate_names > 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to change element_objects to have keys that match the syntax of the corresponding elements. Something like

        element_objects = {
            'Junction': wntr.network.elements.Junction,
            'Tank': wntr.network.elements.Tank,
            'Reservoir': wntr.network.elements.Reservoir,
            'Pipe': wntr.network.elements.Pipe,
            'Pump': wntr.network.elements.Pump,
            'Valve': wntr.network.elements.Valve}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into doing this and it is a bit of a lift since it is used all over the code base and hard to track down. Best to keep it as is...

for element, attributes in valid_names.items():
valid_names[element] = [attribute[:truncate_names] for attribute in attributes]

for key, vals in valid_names.items():
# Remove coordinates and vertices (not used to create GeoDataFrame geometry)
if 'coordinates' in valid_names[key]:
valid_names[key].remove('coordinates')
if 'vertices' in valid_names[key]:
valid_names[key].remove('vertices')

# Add geometry
if 'geometry' not in valid_names[key]:
valid_names[key].append('geometry')

return valid_names

Expand Down
4 changes: 2 additions & 2 deletions wntr/network/elements.py
Original file line number Diff line number Diff line change
Expand Up @@ -394,7 +394,7 @@ class Tank(Node):
"min_level",
"max_level",
"diameter",
"min_vol"
"min_vol",
"vol_curve_name",
"overflow",
"coordinates"]
Expand Down Expand Up @@ -1041,7 +1041,7 @@ class Pump(Link):
"end_node_name",
"pump_type",
"pump_curve_name",
"power"
"power",
"base_speed",
"speed_pattern_name",
"initial_status"]
Expand Down
5 changes: 3 additions & 2 deletions wntr/network/io.py
Original file line number Diff line number Diff line change
Expand Up @@ -644,12 +644,13 @@ def valid_gis_names(complete_list=True, truncate_names=None):
Valid column/field names for GeoJSON or Shapefiles

Note that Shapefile field names are truncated to 10 characters
(set truncate=10)
(set truncate_names=10)

Parameters
----------
complete_list : bool
Include a complete list of column/field names (beyond basic attributes)
When true, returns both optional and required column/field names.
When false, only returns required column/field names.
truncate_names : None or int
Truncate column/field names to specified number of characters,
set truncate=10 for Shapefiles. None indicates no truncation.
Expand Down
Loading
Loading