This page was generated from nhd_navigation.ipynb. Interactive online version:
NHD Navigation#
[1]:
from __future__ import annotations
import folium
import networkx as nx
import pynhd
from pynhd import GeoConnex, WaterData
In this example, we explore navigating the National Hydrography Dataset (NHD) using the networkx
package. First, we use pynhd.enhd_flowlines_nx
to obtain the CONUS drainage network based on the ENHD. This function returns a networkx.DiGraph
object, a mapping from the node IDs of the generated network to ComIDs of ENHD, and a list of topologically sorted ComIDs. Note that for performance reasons, the node IDs of
the generated network are different than the ComIDs of ENHD.
[2]:
graph, node2comid, _ = pynhd.enhd_flowlines_nx()
comid2node = {v: k for k, v in node2comid.items()}
We validate our navigation approach with the Mainstems’ dataset that we can retrieve from the GeoConnex web service. Moreover, for visualization purposes, we obtain the flowline geometries from the WaterData web service.
[3]:
gcx = GeoConnex("mainstems")
nhd_flw = WaterData("nhdflowline_network")
We pick Wisconsin River for this tutorial. Let’s start by getting the mainstem of Wisconsin River from GeoConnex.
[4]:
ms = gcx.byitem("323742")
ms.explore()
[4]:
GeoConnex returns a GeoDataFrame containing the requested mainstem(s) and their headwater and outlet NHDPlus v2 ComIDs. We use these two ComIDs to the navigate Wisconsin River in the ENHD network.
[5]:
outlet_comid = int(ms["outlet_nhdpv2_comid"].str.rsplit("/", n=1).iloc[0][-1])
head_comid = int(ms["head_nhdpv2_comid"].str.rsplit("/", n=1).iloc[0][-1])
Navigating from a headwater to an outlet is straightforward. We use networkx.shortest_path
that returns a list of nodes that we can use to get the ComIDs of the flowlines. Finding the shortest path between two nodes in a network is a well-known problem in graph theory. For a weighted graph with positive weight values, the Dijkstra’s algorithm is the most efficient approach. Note that in graph theory, the shortest path is a path that
has the minimum sum of its edge weights, i.e., the path with the least resistance (minimum navigation cost).
So, for navigating a NHDPlus from upstream to downstream, we need to consider using an attribute as edge weights that favors the flowlines without divergence. Thus, using the divergence
attribute is recommended for cases that divergence is a concern. However, ENHD provides a dendritic drainage network for CONUS, i.e., no divergence. So, we can simply navigate between two nodes in the ENHD network without using any weights.
[6]:
main_comids = [
node2comid[n] for n in nx.shortest_path(graph, comid2node[head_comid], comid2node[outlet_comid])
]
flw_main = nhd_flw.byid("comid", main_comids)
For visualization purposes and validating our approach, we overlay the flowlines that we obtained through navigation, on the mainstem of the Wisconsin River.
[7]:
m = ms.explore(style_kwds={"color": "blue", "weight": 4})
folium.GeoJson(flw_main, style_function=lambda f: {"color": "red", "weight": 1}).add_to(m)
m
[7]:
Now, let’s navigate based on distance. For this purpose, we need to use a Breadth-First Search algorithm. networkx
provides networkx.bfs_edges
that returns an iterator over edges in a breadth-first-search starting at source. However, it does not take into account the distance or direction. Thus, we write a custom BFS method that allows us to navigate in upstream (including both the mainstem and its tributaries), upstream only
through mainstam, and downstream. Note that since ENHD is a dendritic network, we don’t have to worry about downstream navigation through mainstem only since it will be the same as downstream navigation through the whole network. For upstream navigation along mainstems, we need to use the stream order attribute. NHDPlus provides Strahler stream order as an attribute called streamorde
. In Strahler’s stream order, the stream order increases as streams merge, i.e., mainstems have the highest
stream order. So, we can use this attribute to navigate upstream along the mainstems by favoring the flowlines with higher stream order. Also, we use lengthkm
attribute from NHDPlus for computing the distance of the navigation path.
In the function below, note the use of Python’s deque
built-in function for performance reasons.
[8]:
from collections import deque
from typing import Literal
def navigate_by_distance(
graph: nx.DiGraph,
source_node: int,
nav_dir: Literal["downstream", "upstream", "upstream_main"],
distance_km: int,
lengthkm_attr: str = "lengthkm",
streamorder_attr: str = "streamorde",
) -> list[int]:
"""Navigate a directed graph by distance, with modifications to handle edge cases.
Parameters
----------
graph: nx.DiGraph
The directed graph to navigate.
source_node: int
The source node in `graph` from which to start navigation.
nav_dir: {"downstream", "upstream", "upstream_main"}
Direction of navigation.
distance_km: int
The maximum distance to navigate in kilometers.
lengthkm_attr: str, optional
The edge attribute name that indicates the length of edges.
Defaults to ``length``.
streamorder_attr: str, optional
The edge attribute name that indicates the stream order of edges.
Only used if ``nav_dir`` is ``upstream_main``.
Defaults to ``streamorde``.
Returns
-------
list
Nodes within the specified distance from the source in the given
navigation direction.
"""
attrs = next(iter(graph.edges.data()))[-1]
if lengthkm_attr not in attrs:
raise ValueError(f"lengthkm_attr must be one of {list(attrs)}")
if nav_dir == "upstream_main" and streamorder_attr not in attrs:
raise ValueError(f"streamorder_attr must be one of {list(attrs)}")
if nav_dir == "downstream":
nb_func = graph.successors
get_weight = lambda node, nb, w: graph.edges[node, nb][w]
elif nav_dir == "upstream":
nb_func = graph.predecessors
get_weight = lambda node, nb, w: graph.edges[nb, node][w]
elif nav_dir == "upstream_main":
get_stream_order = lambda node, pred: graph.edges[pred, node][streamorder_attr]
def max_predecessor(node: int) -> list[int]:
preds = graph.predecessors(node)
try:
return [max(preds, key=lambda pred: get_stream_order(node, pred))]
except ValueError:
return []
nb_func = max_predecessor
get_weight = lambda node, nb, w: graph.edges[nb, node][w]
else:
raise ValueError("nav_dir must be either 'downstream', 'upstream', or 'upstream_main'")
if source_node not in graph:
raise ValueError(f"Source node {source_node} not found in graph")
visited = {source_node}
queue = deque([(source_node, 0)])
nodes_within_distance = []
while queue:
node, distance = queue.popleft()
if distance <= distance_km:
nodes_within_distance.append(node)
for nb in nb_func(node):
total_distance = distance + get_weight(node, nb, lengthkm_attr)
if nb not in visited and total_distance <= distance_km:
visited.add(nb)
queue.append((nb, total_distance))
return nodes_within_distance
First, let’s get all the flowlines that are upstream of the river’s outlet up to 80 km.
[9]:
nodes_up = navigate_by_distance(graph, comid2node[outlet_comid], "upstream", 80)
comids = [node2comid[n] for n in nodes_up]
flw_up = nhd_flw.byid("comid", comids)
[10]:
m = ms.explore(style_kwds={"color": "blue", "weight": 4})
folium.GeoJson(flw_up, style_function=lambda f: {"color": "red", "weight": 1}).add_to(m)
m
[10]:
Next, let’s get all the flowlines that are upstream of the river’s outlet up to 80 km and along its mainstem.
[11]:
nodes_up = navigate_by_distance(graph, comid2node[outlet_comid], "upstream_main", 80)
comids = [node2comid[n] for n in nodes_up]
flw_up = nhd_flw.byid("comid", comids)
[12]:
m = ms.explore(style_kwds={"color": "blue", "weight": 4})
folium.GeoJson(flw_up, style_function=lambda f: {"color": "red", "weight": 1}).add_to(m)
m
[12]:
Next, we get all the flowlines that are downstream of the river’s headwater up to 200 km.
[13]:
nodes_dn = navigate_by_distance(graph, comid2node[head_comid], "downstream", 200)
comids = [node2comid[n] for n in nodes_dn]
flw_dn = nhd_flw.byid("comid", comids)
[14]:
m = ms.explore(style_kwds={"color": "blue", "weight": 4})
folium.GeoJson(flw_dn, style_function=lambda f: {"color": "red", "weight": 1}).add_to(m)
m
[14]: