pynhd.network_tools#
Access NLDI and WaterData databases.
Module Contents#
- class pynhd.network_tools.NHDTools(flowlines)#
Prepare NHDPlus data for downstream analysis.
Notes
Some of these tools are ported from nhdplusTools.
- Parameters:
flowlines (geopandas.GeoDataFrame) – NHDPlus flowlines with at least the following columns: comid, lengthkm, ftype, terminalfl, fromnode, tonode, totdasqkm, startflag, streamorde, streamcalc, terminalpa, pathlength, divergence, hydroseq, and levelpathi.
- add_tocomid()#
Find the downstream comid(s) of each comid in NHDPlus flowline database.
Notes
- This function requires the following columns: comid, terminalpa, fromnode, and tonode.
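The node-matching idea behind this method can be sketched in plain Python. This is only an illustration, assuming that the downstream neighbors of a flowline are the flowlines whose fromnode equals its tonode. The function name downstream_comids and the sample records are hypothetical, and the sketch ignores terminalpa, so it is not pynhd's implementation.

```python
from collections import defaultdict


def downstream_comids(flowlines):
    """Map each comid to the comids whose fromnode equals its tonode."""
    # Index flowlines by the node they start from.
    starts_at = defaultdict(list)
    for row in flowlines:
        starts_at[row["fromnode"]].append(row["comid"])
    # A flowline drains into every flowline that starts at its tonode.
    return {
        row["comid"]: sorted(starts_at.get(row["tonode"], []))
        for row in flowlines
    }


# Hypothetical records: flowlines 1 and 2 join at node 20 and drain into 3.
sample = [
    {"comid": 1, "fromnode": 10, "tonode": 20},
    {"comid": 2, "fromnode": 11, "tonode": 20},
    {"comid": 3, "fromnode": 20, "tonode": 30},
]
tocomid = downstream_comids(sample)
```

A flowline with an empty list, such as 3 in the sample, has no downstream neighbor and acts as a terminal flowline.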
- static check_requirements(reqs, cols)#
Check for all the required data.
- clean_flowlines(use_enhd_attrs, terminal2nan)#
Clean up flowlines.
- remove_isolated()#
Remove isolated flowlines.
- remove_tinynetworks(min_path_size, min_path_length, min_network_size)#
Remove small paths in NHDPlus flowline database.
Notes
This function requires the following columns: levelpathi, hydroseq, totdasqkm, terminalfl, startflag, pathlength, and terminalpa.
- Parameters:
min_network_size (float) – Minimum size of drainage network in sqkm.
min_path_length (float) – Minimum length of terminal level path of a network in km.
min_path_size (float) – Minimum size of outlet level path of a drainage basin in sqkm. Drainage basins with an outlet drainage area smaller than this value will be removed.
- to_linestring()#
Convert flowlines to shapely LineString objects.
- pynhd.network_tools.enhd_flowlines_nx()#
Get a networkx.DiGraph of the entire NHD flowlines.
Changed in version 0.16.2: The function now replaces all 0 values in the tocomid column of ENHD with the negative of their corresponding comid values. This ensures all sinks are unique and treated accordingly for topological sorting and other network analysis. The differences are in the returned label2comid dictionary and onnetwork_sorted, which will contain the negative values for the sinks.
Notes
The graph is directed and has all the attributes of the flowlines in ENHD. Note that COMIDs are based on the 2020 snapshot of the NHDPlusV2.1.
- Returns:
graph (networkx.DiGraph) – The generated directed graph.
label2comid (dict) – A mapping of COMIDs to the node IDs in the graph.
onnetwork_sorted (list) – A topologically sorted list of the COMIDs.
- pynhd.network_tools.flowline_resample(flw, spacing, id_col='comid', smoothing=None)#
Resample a flowline based on a given spacing.
- Parameters:
flw (geopandas.GeoDataFrame) – A dataframe with geometry and id_col columns and CRS attribute. The flowlines should be mergeable into a single LineString. Otherwise, you should use the network_resample() function.
spacing (float) – Spacing between the sample points in meters.
id_col (str, optional) – Name of the flowlines column containing IDs, defaults to comid.
smoothing (float or None, optional) – Smoothing factor used for determining the number of knots. This argument controls the tradeoff between closeness and smoothness of fit. Larger smoothing means more smoothing, while smaller values of smoothing indicate less smoothing. If None (default), smoothing is done with all points.
- Returns:
geopandas.GeoDataFrame – Resampled flowline.
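To make the role of spacing concrete, here is a minimal sketch of fixed-spacing resampling along a polyline using linear interpolation only. It does not use pynhd or any spline smoothing; the function name resample_polyline and the sample coordinates are hypothetical, and coordinates are assumed to be in a projected CRS so that distances are in length units.

```python
import math


def resample_polyline(coords, spacing):
    """Return points every `spacing` units along a polyline, starting at its first vertex."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    points = [coords[0]]
    next_dist = spacing  # distance along the line of the next sample
    walked = 0.0  # total length of the segments already traversed
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        # Emit every sample that falls on the current segment.
        while seg_len > 0 and next_dist <= walked + seg_len:
            t = (next_dist - walked) / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_dist += spacing
        walked += seg_len
    return points


# Hypothetical L-shaped line of total length 15, sampled every 5 units.
line = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0)]
samples = resample_polyline(line, 5.0)
```

The sketch walks the line once and keeps a running distance, so samples stay evenly spaced across vertex boundaries.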
- pynhd.network_tools.flowline_xsection(flw, distance, width, id_col='comid', smoothing=None)#
Get cross-section of a river network at a given spacing.
- Parameters:
flw (geopandas.GeoDataFrame) – A dataframe with geometry, id_col, and levelpathi columns and a projected CRS attribute.
distance (float) – The distance between two consecutive cross-sections.
width (float) – The width of the cross-section.
id_col (str, optional) – Name of the flowlines column containing IDs, defaults to comid.
smoothing (float or None, optional) – Smoothing factor used for determining the number of knots. This argument controls the tradeoff between closeness and smoothness of fit. Larger smoothing means more smoothing, while smaller values of smoothing indicate less smoothing. If None (default), smoothing is done with all points.
- Returns:
geopandas.GeoDataFrame – A dataframe with two columns: geometry and comid. The geometry column contains the cross-section of the river network and the comid column contains the corresponding comid from the input dataframe. Note that each comid can have multiple cross-sections depending on the given spacing distance.
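The geometry of a single cross-section can be sketched as a segment of length width, centered on a point of the flowline and perpendicular to the local flow direction. The helper below is a hypothetical illustration in plain Python, assuming projected coordinates; it is not how pynhd builds its cross-sections.

```python
import math


def cross_section(p0, p1, width):
    """Return the end points of a segment of length `width`, centered on p0
    and perpendicular to the direction from p0 to p1."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("p0 and p1 must differ")
    # Unit normal: the unit tangent rotated by 90 degrees.
    nx, ny = -dy / length, dx / length
    half = width / 2
    start = (p0[0] - half * nx, p0[1] - half * ny)
    end = (p0[0] + half * nx, p0[1] + half * ny)
    return start, end


# Hypothetical point on a flowline heading east; the cross-section runs north-south.
start, end = cross_section((5.0, 0.0), (6.0, 0.0), 4.0)
```

Repeating this at points placed every distance units along the line would give one cross-section per sample point.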
- pynhd.network_tools.mainstem_huc12_nx()#
Get a networkx.DiGraph of the entire mainstem HUC12s.
Notes
The directed graph is generated from the nhdplusv2wbd.csv file with all attributes that can be found in Mainstem. Note that HUC12s are based on the 2020 snapshot of the NHDPlusV2.1.
- Returns:
networkx.DiGraph – The mainstem as a networkx.DiGraph with all the attributes of the mainstems.
dict – A mapping of the HUC12s to the node IDs in the graph.
list – A topologically sorted list of the HUC12s, which are strings of length 12.
- pynhd.network_tools.network_resample(flw, spacing, id_col='comid', smoothing=None)#
Resample a network flowline based on a given spacing.
- Parameters:
flw (geopandas.GeoDataFrame) – A dataframe with geometry, id_col, and levelpathi columns and a projected CRS attribute.
spacing (float) – Target spacing between the sample points in the length unit of the flw’s CRS.
id_col (str, optional) – Name of the flowlines column containing IDs, defaults to comid.
smoothing (float or None, optional) – Smoothing factor used for determining the number of knots. This argument controls the tradeoff between closeness and smoothness of fit. Larger smoothing means more smoothing, while smaller values of smoothing indicate less smoothing. If None (default), smoothing is done with all points.
- Returns:
geopandas.GeoDataFrame – Resampled flowlines.
- pynhd.network_tools.network_xsection(flw, distance, width, id_col='comid', smoothing=None)#
Get cross-section of a river network at a given spacing.
- Parameters:
flw (geopandas.GeoDataFrame) – A dataframe with geometry, id_col, and levelpathi columns and a projected CRS attribute.
distance (float) – The distance between two consecutive cross-sections.
width (float) – The width of the cross-section.
id_col (str, optional) – Name of the flowlines column containing IDs, defaults to comid.
smoothing (float or None, optional) – Smoothing factor used for determining the number of knots. This argument controls the tradeoff between closeness and smoothness of fit. Larger smoothing means more smoothing, while smaller values of smoothing indicate less smoothing. If None (default), smoothing is done with all points.
- Returns:
geopandas.GeoDataFrame – A dataframe with two columns: geometry and comid. The geometry column contains the cross-section of the river network and the comid column contains the corresponding comid from the input dataframe. Note that each comid can have multiple cross-sections depending on the given spacing distance.
- pynhd.network_tools.nhdflw2nx(flowlines, id_col='comid', toid_col='tocomid', edge_attr=None)#
Convert NHDPlus flowline database to networkx graph.
- Parameters:
flowlines (geopandas.GeoDataFrame) – NHDPlus flowlines.
id_col (str, optional) – Name of the column containing the node ID, defaults to “comid”.
toid_col (str, optional) – Name of the column containing the downstream node ID, defaults to “tocomid”.
edge_attr (str, optional) – Name of the column containing the edge attributes, defaults to None. If True, all remaining columns will be used as edge attributes.
- Returns:
nx.DiGraph – Networkx directed graph of the NHDPlus flowlines. Note that all elements of the toid_col are replaced with negative values of their corresponding id_col values if they are NaN or 0. This is to ensure that the generated nodes in the graph are unique.
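The replacement rule for outlets can be illustrated without networkx. The sketch below builds a list of (id, toid) edges from plain records and gives every flowline whose downstream ID is NaN, None, or 0 its own negative sink node. The function name unique_sink_edges and the sample records are hypothetical; only the rule itself comes from the description above.

```python
import math


def unique_sink_edges(records, id_col="comid", toid_col="tocomid"):
    """Build (id, toid) edges, giving every outlet its own negative sink node."""
    edges = []
    for row in records:
        node, to_node = row[id_col], row[toid_col]
        is_missing = to_node is None or (
            isinstance(to_node, float) and math.isnan(to_node)
        )
        if is_missing or to_node == 0:
            to_node = -node  # artificial sink that is unique to this outlet
        edges.append((node, int(to_node)))
    return edges


# Hypothetical records: 3 and 7 are outlets of two separate networks.
sample = [
    {"comid": 1, "tocomid": 3},
    {"comid": 3, "tocomid": 0},
    {"comid": 7, "tocomid": float("nan")},
]
edges = unique_sink_edges(sample)
```

Without the replacement, both outlets would point at the same node 0 and two unrelated networks would appear connected.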
- pynhd.network_tools.nhdplus_l48(layer=None, data_dir='cache', **kwargs)#
Get the entire NHDPlus dataset.
Notes
The entire NHDPlus dataset for CONUS (Lower 48) is downloaded from here. This 7.3 GB file will take a while to download, depending on your internet connection. The first time you run this function, the file will be downloaded and stored in the ./cache directory. Subsequent calls will use the cached file. Moreover, there are two additional dependencies required to read the file: pyogrio and py7zr. These dependencies can be installed using pip install pyogrio py7zr or conda install -c conda-forge pyogrio py7zr.
- Parameters:
layer (str, optional) – The layer name to be returned. Either layer should be provided or sql. Defaults to None. The available layers are:
Gage
BurnAddLine
BurnAddWaterbody
LandSea
Sink
Wall
Catchment
CatchmentSP
NHDArea
NHDWaterbody
HUC12
NHDPlusComponentVersions
PlusARPointEvent
PlusFlowAR
NHDFCode
DivFracMP
BurnLineEvent
NHDFlowline_Network
NHDFlowline_NonNetwork
GeoNetwork_Junctions
PlusFlow
N_1_Desc
N_1_EDesc
N_1_EStatus
N_1_ETopo
N_1_FloDir
N_1_JDesc
N_1_JStatus
N_1_JTopo
N_1_JTopo2
N_1_Props
data_dir (str or pathlib.Path) – Directory to store the downloaded file and use in subsequent calls, defaults to ./cache.
**kwargs – Keyword arguments are passed to pyogrio.read_dataframe. For more information, visit pyogrio.
- Returns:
geopandas.GeoDataFrame – A dataframe with all the NHDPlus data.
- pynhd.network_tools.prepare_nhdplus(flowlines, min_network_size, min_path_length, min_path_size=0, purge_non_dendritic=False, remove_isolated=False, use_enhd_attrs=False, terminal2nan=True)#
Clean up and fix common issues of NHDPlus MR and HR flowlines.
Ported from nhdplusTools.
- Parameters:
flowlines (geopandas.GeoDataFrame) – NHDPlus flowlines with at least the following columns: comid, lengthkm, ftype, terminalfl, fromnode, tonode, totdasqkm, startflag, streamorde, streamcalc, terminalpa, pathlength, divergence, hydroseq, and levelpathi.
min_network_size (float) – Minimum size of drainage network in sqkm.
min_path_length (float) – Minimum length of terminal level path of a network in km.
min_path_size (float, optional) – Minimum size of outlet level path of a drainage basin in sqkm. Drainage basins with an outlet drainage area smaller than this value will be removed. Defaults to 0.
purge_non_dendritic (bool, optional) – Whether to remove non-dendritic paths, defaults to False.
remove_isolated (bool, optional) – Whether to remove isolated flowlines, i.e., keep only the largest connected component of the flowlines. Defaults to False.
use_enhd_attrs (bool, optional) – Whether to replace the attributes with the ENHD attributes, defaults to False. Note that this only works for NHDPlus mid-resolution (MR) and does not work for NHDPlus high-resolution (HR). For more information, see this.
terminal2nan (bool, optional) – Whether to replace the COMID of the terminal flowline of the network with NaN, defaults to True. If False, the terminal COMID will be set from the ENHD attributes, i.e., use_enhd_attrs will be set to True, which is only applicable to NHDPlus mid-resolution (MR).
- Returns:
geopandas.GeoDataFrame – Cleaned up flowlines. Note that all column names are converted to lower case.
- pynhd.network_tools.topoogical_sort(flowlines, edge_attr=None, largest_only=False, id_col='ID', toid_col='toID')#
Topological sorting of a river network.
- Parameters:
flowlines (pandas.DataFrame) – A dataframe with columns ID and toID.
edge_attr (str or list, optional) – Names of the columns in the dataframe to be used as edge attributes, defaults to None.
largest_only (bool, optional) – Whether to return only the largest network, defaults to False.
id_col (str, optional) – Name of the column containing the node ID, defaults to ID.
toid_col (str, optional) – Name of the column containing the downstream node ID, defaults to toID.
- Returns:
(list, dict, networkx.DiGraph) – A list of topologically sorted IDs, a dictionary with keys as IDs and values as a list of its upstream nodes, and the generated networkx.DiGraph object. Note that node IDs are associated with the input flowline IDs, but there might be some negative IDs in the output graph that are not present in the input flowline IDs. These “artificial” nodes are used to represent the graph outlet (the most downstream nodes) in the graph.
- Return type:
tuple[list[numpy.int64 | pandas._libs.missing.NAType], dict[int, list[int]], networkx.DiGraph]
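The first two return values can be illustrated with the standard library alone. The sketch below uses graphlib.TopologicalSorter (Python 3.9+) rather than networkx; the function name sort_network and the sample edges are hypothetical, and the graph object itself is not built here.

```python
from graphlib import TopologicalSorter


def sort_network(edges):
    """Return (sorted_ids, upstream) for (id, toid) edges, ordered upstream to downstream."""
    upstream = {}
    for node, to_node in edges:
        upstream.setdefault(node, [])
        upstream.setdefault(to_node, []).append(node)
    # TopologicalSorter expects a node -> predecessors mapping,
    # which is exactly what `upstream` holds.
    order = list(TopologicalSorter(upstream).static_order())
    return order, upstream


# Hypothetical network: 1 and 2 drain into 3, then 4; -4 is an artificial outlet node.
edges = [(1, 3), (2, 3), (3, 4), (4, -4)]
order, upstream = sort_network(edges)
```

A topological order is generally not unique, so only the relative order of connected nodes is guaranteed: every flowline appears before the flowline it drains into.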
- pynhd.network_tools.vector_accumulation(flowlines, func, attr_col, arg_cols, id_col='comid', toid_col='tocomid')#
Flow accumulation using vector river network data.
- Parameters:
flowlines (pandas.DataFrame) – A dataframe containing comid, tocomid, attr_col and all the columns that are required for passing to func.
func (function) – The function that routes the flow in a single river segment. Positions of the arguments in the function should be as follows: func(qin, *arg_cols). qin is computed in this function and the rest are in the order of the arg_cols. For example, if arg_cols = ["slope", "roughness"] then the function is called this way: func(qin, slope, roughness), where slope and roughness are elemental values read from the flowlines.
attr_col (str) – The column name of the attribute being accumulated in the network. The column should contain the initial condition for the attribute for each river segment. It can be a scalar or an array (e.g., time series).
arg_cols (list of strs) – List of the flowlines columns that contain all the required data for routing a single river segment such as slope, length, lateral flow, etc.
id_col (str, optional) – Name of the flowlines column containing IDs, defaults to comid.
toid_col (str, optional) – Name of the flowlines column containing toIDs, defaults to tocomid.
- Returns:
pandas.Series – Accumulated flow for all the nodes. The series is sorted from upstream to downstream (topological sorting). Depending on the given initial condition in the attr_col, the outflow for each river segment can be a scalar or an array.
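The calling convention func(qin, *arg_cols) can be sketched in plain Python. This simplified illustration assumes that qin is the sum of the outflows of the segments directly upstream (zero for headwaters) and omits attr_col entirely, so it is not pynhd's implementation. The names accumulate and route, and the sample rows, are hypothetical.

```python
from graphlib import TopologicalSorter


def accumulate(rows, func, arg_cols, id_col="comid", toid_col="tocomid"):
    """Route flow downstream, calling func(qin, *args) once per segment."""
    by_id = {row[id_col]: row for row in rows}
    upstream = {node: [] for node in by_id}
    for row in rows:
        if row[toid_col] in by_id:
            upstream[row[toid_col]].append(row[id_col])
    outflow = {}
    # Visit segments from upstream to downstream so inflows are ready when needed.
    for node in TopologicalSorter(upstream).static_order():
        qin = sum(outflow[up] for up in upstream[node])  # assumption: upstream sum
        outflow[node] = func(qin, *(by_id[node][col] for col in arg_cols))
    return outflow


def route(qin, lateral):
    """Simplest routing rule: outflow is inflow plus local lateral flow."""
    return qin + lateral


# Hypothetical network: segments 1 and 2 drain into 3, which is the outlet.
rows = [
    {"comid": 1, "tocomid": 3, "lateral": 2.0},
    {"comid": 2, "tocomid": 3, "lateral": 1.0},
    {"comid": 3, "tocomid": 0, "lateral": 0.5},
]
flow = accumulate(rows, route, ["lateral"])
```

Because segments are visited in topological order, a single pass over the network is enough, whatever routing rule func applies.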