This page was generated from nhd_navigation.ipynb. Interactive online version: Binder badge

NHD Navigation#

[1]:
from __future__ import annotations

import folium
import networkx as nx

import pynhd
from pynhd import GeoConnex, WaterData

In this example, we explore navigating the National Hydrography Dataset (NHD) using the networkx package. First, we use pynhd.enhd_flowlines_nx to obtain the CONUS drainage network based on the ENHD. This function returns a networkx.DiGraph object, a mapping from the node IDs of the generated network to ComIDs of ENHD, and a list of topologically sorted ComIDs. Note that for performance reasons, the node IDs of the generated network are different than the ComIDs of ENHD.

[2]:
graph, node2comid, _ = pynhd.enhd_flowlines_nx()
comid2node = {v: k for k, v in node2comid.items()}

We validate our navigation approach with the Mainstems’ dataset that we can retrieve from the GeoConnex web service. Moreover, for visualization purposes, we obtain the flowline geometries from the WaterData web service.

[3]:
gcx = GeoConnex("mainstems")
nhd_flw = WaterData("nhdflowline_network")

We pick Wisconsin River for this tutorial. Let’s start by getting the mainstem of Wisconsin River from GeoConnex.

[4]:
ms = gcx.byitem("323742")
ms.explore()
[4]:
Make this Notebook Trusted to load map: File -> Trust Notebook

GeoConnex returns a GeoDataFrame containing the requested mainstem(s) and their headwater and outlet NHDPlus v2 ComIDs. We use these two ComIDs to the navigate Wisconsin River in the ENHD network.

[5]:
outlet_comid = int(ms["outlet_nhdpv2_comid"].str.rsplit("/", n=1).iloc[0][-1])
head_comid = int(ms["head_nhdpv2_comid"].str.rsplit("/", n=1).iloc[0][-1])

Navigating from a headwater to an outlet is straightforward. We use networkx.shortest_path that returns a list of nodes that we can use to get the ComIDs of the flowlines. Finding the shortest path between two nodes in a network is a well-known problem in graph theory. For a weighted graph with positive weight values, the Dijkstra’s algorithm is the most efficient approach. Note that in graph theory, the shortest path is a path that has the minimum sum of its edge weights, i.e., the path with the least resistance (minimum navigation cost).

So, for navigating a NHDPlus from upstream to downstream, we need to consider using an attribute as edge weights that favors the flowlines without divergence. Thus, using the divergence attribute is recommended for cases that divergence is a concern. However, ENHD provides a dendritic drainage network for CONUS, i.e., no divergence. So, we can simply navigate between two nodes in the ENHD network without using any weights.

[6]:
main_comids = [
    node2comid[n] for n in nx.shortest_path(graph, comid2node[head_comid], comid2node[outlet_comid])
]
flw_main = nhd_flw.byid("comid", main_comids)

For visualization purposes and validating our approach, we overlay the flowlines that we obtained through navigation, on the mainstem of the Wisconsin River.

[7]:
m = ms.explore(style_kwds={"color": "blue", "weight": 4})
folium.GeoJson(flw_main, style_function=lambda f: {"color": "red", "weight": 1}).add_to(m)
m
[7]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Now, let’s navigate based on distance. For this purpose, we need to use a Breadth-First Search algorithm. networkx provides networkx.bfs_edges that returns an iterator over edges in a breadth-first-search starting at source. However, it does not take into account the distance or direction. Thus, we write a custom BFS method that allows us to navigate in upstream (including both the mainstem and its tributaries), upstream only through mainstam, and downstream. Note that since ENHD is a dendritic network, we don’t have to worry about downstream navigation through mainstem only since it will be the same as downstream navigation through the whole network. For upstream navigation along mainstems, we need to use the stream order attribute. NHDPlus provides Strahler stream order as an attribute called streamorde. In Strahler’s stream order, the stream order increases as streams merge, i.e., mainstems have the highest stream order. So, we can use this attribute to navigate upstream along the mainstems by favoring the flowlines with higher stream order. Also, we use lengthkm attribute from NHDPlus for computing the distance of the navigation path.

In the function below, note the use of Python’s deque built-in function for performance reasons.

[8]:
from collections import deque
from typing import Literal


def navigate_by_distance(
    graph: nx.DiGraph,
    source_node: int,
    nav_dir: Literal["downstream", "upstream", "upstream_main"],
    distance_km: int,
    lengthkm_attr: str = "lengthkm",
    streamorder_attr: str = "streamorde",
) -> list[int]:
    """Navigate a directed graph by distance, with modifications to handle edge cases.

    Parameters
    ----------
    graph: nx.DiGraph
        The directed graph to navigate.
    source_node: int
        The source node in `graph` from which to start navigation.
    nav_dir: {"downstream", "upstream", "upstream_main"}
        Direction of navigation.
    distance_km: int
        The maximum distance to navigate in kilometers.
    lengthkm_attr: str, optional
        The edge attribute name that indicates the length of edges.
        Defaults to ``length``.
    streamorder_attr: str, optional
        The edge attribute name that indicates the stream order of edges.
        Only used if ``nav_dir`` is ``upstream_main``.
        Defaults to ``streamorde``.

    Returns
    -------
    list
        Nodes within the specified distance from the source in the given
        navigation direction.
    """
    attrs = next(iter(graph.edges.data()))[-1]
    if lengthkm_attr not in attrs:
        raise ValueError(f"lengthkm_attr must be one of {list(attrs)}")

    if nav_dir == "upstream_main" and streamorder_attr not in attrs:
        raise ValueError(f"streamorder_attr must be one of {list(attrs)}")

    if nav_dir == "downstream":
        nb_func = graph.successors
        get_weight = lambda node, nb, w: graph.edges[node, nb][w]
    elif nav_dir == "upstream":
        nb_func = graph.predecessors
        get_weight = lambda node, nb, w: graph.edges[nb, node][w]
    elif nav_dir == "upstream_main":
        get_stream_order = lambda node, pred: graph.edges[pred, node][streamorder_attr]

        def max_predecessor(node: int) -> list[int]:
            preds = graph.predecessors(node)
            try:
                return [max(preds, key=lambda pred: get_stream_order(node, pred))]
            except ValueError:
                return []

        nb_func = max_predecessor
        get_weight = lambda node, nb, w: graph.edges[nb, node][w]
    else:
        raise ValueError("nav_dir must be either 'downstream',  'upstream', or 'upstream_main'")

    if source_node not in graph:
        raise ValueError(f"Source node {source_node} not found in graph")

    visited = {source_node}
    queue = deque([(source_node, 0)])
    nodes_within_distance = []

    while queue:
        node, distance = queue.popleft()
        if distance <= distance_km:
            nodes_within_distance.append(node)
            for nb in nb_func(node):
                total_distance = distance + get_weight(node, nb, lengthkm_attr)

                if nb not in visited and total_distance <= distance_km:
                    visited.add(nb)
                    queue.append((nb, total_distance))

    return nodes_within_distance

First, let’s get all the flowlines that are upstream of the river’s outlet up to 80 km.

[9]:
nodes_up = navigate_by_distance(graph, comid2node[outlet_comid], "upstream", 80)
comids = [node2comid[n] for n in nodes_up]
flw_up = nhd_flw.byid("comid", comids)
[10]:
m = ms.explore(style_kwds={"color": "blue", "weight": 4})
folium.GeoJson(flw_up, style_function=lambda f: {"color": "red", "weight": 1}).add_to(m)
m
[10]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Next, let’s get all the flowlines that are upstream of the river’s outlet up to 80 km and along its mainstem.

[11]:
nodes_up = navigate_by_distance(graph, comid2node[outlet_comid], "upstream_main", 80)
comids = [node2comid[n] for n in nodes_up]
flw_up = nhd_flw.byid("comid", comids)
[12]:
m = ms.explore(style_kwds={"color": "blue", "weight": 4})
folium.GeoJson(flw_up, style_function=lambda f: {"color": "red", "weight": 1}).add_to(m)
m
[12]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Next, we get all the flowlines that are downstream of the river’s headwater up to 200 km.

[13]:
nodes_dn = navigate_by_distance(graph, comid2node[head_comid], "downstream", 200)
comids = [node2comid[n] for n in nodes_dn]
flw_dn = nhd_flw.byid("comid", comids)
[14]:
m = ms.explore(style_kwds={"color": "blue", "weight": 4})
folium.GeoJson(flw_dn, style_function=lambda f: {"color": "red", "weight": 1}).add_to(m)
m
[14]:
Make this Notebook Trusted to load map: File -> Trust Notebook