pynhd.nhdplus_derived#

Access NLDI and WaterData databases.

Module Contents#

class pynhd.nhdplus_derived.StreamCat(lakes_only=False)#

Get StreamCat API’s properties.

Parameters:

lakes_only (bool, optional) – If True, only return metrics for lakes and their associated catchments from the LakeCat dataset.

base_url#

The base URL of the API.

Type:

str

valid_names#

The valid names of the metrics.

Type:

list of str

alt_names#

The alternative names of some metrics.

Type:

dict of str

valid_regions#

The valid hydro regions.

Type:

list of str

valid_states#

The valid two-letter state abbreviations.

Type:

pandas.DataFrame

valid_counties#

The valid counties’ FIPS codes.

Type:

pandas.DataFrame

valid_aois#

The valid types of areas of interest.

Type:

list of str

metrics_df#

The metrics’ metadata such as description and units.

Type:

pandas.DataFrame

valid_years#

A dictionary of the valid years for annual metrics.

Type:

dict
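
As a rough illustration of how a dictionary such as valid_years might be used, the sketch below expands a metric name containing a [Year] placeholder into one concrete name per year. The metric name, the years, and the substitution rule (replacing the placeholder with the year) are assumptions made for illustration; they are not values taken from the API.

```python
from __future__ import annotations


def expand_years(template: str, years: list[int]) -> list[str]:
    """Replace the "[Year]" placeholder in a metric name with each year."""
    if "[Year]" not in template:
        # Names without a placeholder are returned unchanged.
        return [template]
    return [template.replace("[Year]", str(year)) for year in years]


# Hypothetical stand-in for a valid_years-style mapping.
valid_years = {"example_metric[Year]": [2001, 2006, 2011]}

for name, years in valid_years.items():
    print(expand_years(name, years))
```

Checking the real valid_years and metrics_df attributes before building names this way is advisable, since the service may expect a different naming scheme.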

pynhd.nhdplus_derived.enhd_attrs(parquet_path=None)#

Get updated NHDPlus attributes from ENHD V2.0.

Notes

This function downloads a 160 MB parquet file from here. Although this dataframe does not include geometry, it can be linked to other geospatial NHDPlus dataframes through ComIDs.

Parameters:

parquet_path (str or pathlib.Path, optional) – Path to a file with .parquet extension for storing the file, defaults to ./cache/enhd_attrs.parquet.

Returns:

pandas.DataFrame – A dataframe that includes ComID-level attributes for 2.7 million NHDPlus flowlines.

Return type:

pandas.DataFrame
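
The note about linking through ComIDs can be sketched with a plain pandas merge. The two small tables below are toy stand-ins, and the column names comid, attr, and name are assumptions for illustration; the actual column names in the downloaded dataframe may differ.

```python
import pandas as pd

# Toy stand-in for a ComID-level attribute table (no geometry).
attrs = pd.DataFrame({"comid": [101, 102, 103], "attr": [1.5, 2.5, 3.5]})

# Toy stand-in for flowlines; real data would usually be a
# geopandas.GeoDataFrame that also carries a geometry column.
flowlines = pd.DataFrame({"comid": [102, 103, 104], "name": ["a", "b", "c"]})

# A left join keeps every flowline and attaches attributes where the ComID matches.
linked = flowlines.merge(attrs, on="comid", how="left")
print(linked)
```

A left join is used here so that flowlines without a matching ComID are kept with missing attribute values rather than dropped.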

pynhd.nhdplus_derived.epa_nhd_catchments(comids, feature)#

Get NHDPlus catchment-scale data from EPA’s HMS REST API.

Notes

For more information about curve number please refer to the project’s webpage on the EPA’s website.

Parameters:
  • comids (int or list of int) – ComID(s) of NHDPlus catchments.

  • feature (str) – The feature of interest. Available options are:

    • curve_number: 16-day average Curve Number.

    • comid_info: ComID information.

Returns:

dict of pandas.DataFrame or geopandas.GeoDataFrame – A dict of the requested dataframes. A comid_info dataframe is always returned.

Return type:

dict[str, pandas.DataFrame]

Examples

>>> import pynhd
>>> data = pynhd.epa_nhd_catchments(9533477, "curve_number")
>>> data["curve_number"].mean(axis=1).item()
75.576

pynhd.nhdplus_derived.nhd_fcode()#

Get all the NHDPlus FCodes.

pynhd.nhdplus_derived.nhdplus_attrs(attr_name=None)#

Stage the NHDPlus Attributes database and save to nhdplus_attrs.parquet.

Notes

More info can be found here.

Parameters:

attr_name (str, optional) – Name of the NHDPlus attribute to return, defaults to None, i.e., only return a metadata dataframe that includes the attribute names and their descriptions and units.

Returns:

pandas.DataFrame – The staged data as a DataFrame.

Return type:

pandas.DataFrame

pynhd.nhdplus_derived.nhdplus_attrs_s3(attr_names=None, pyarrow_filter=None, nodata=False)#

Access NHDPlus V2.1 derived attributes over CONUS.

Notes

More info can be found here.

Parameters:
  • attr_names (str or list of str, optional) – Names of NHDPlus attribute(s) to return, defaults to None, i.e., only return a metadata dataframe that includes the attribute names and their description and units.

  • pyarrow_filter (pyarrow.compute.Expression, optional) – A filter expression to apply to the dataset, defaults to None. Please refer to the PyArrow documentation for more information here.

  • nodata (bool, optional) – Whether to include NODATA percentages, defaults to False.

Returns:

pandas.DataFrame – A dataframe of requested NHDPlus attributes.

Return type:

pandas.DataFrame

pynhd.nhdplus_derived.nhdplus_h12pp(gpkg_path=None)#

Access HUC12 Pour Points for NHDPlus V2.1 L48 (CONUS).

Notes

More info can be found here.

Parameters:

gpkg_path (str or pathlib.Path, optional) – Path to the geopackage file, defaults to None, i.e., download the file to the cache directory as 102020wbd_outlets.gpkg.

Returns:

geopandas.GeoDataFrame – A geodataframe of HUC12 pour points.

Return type:

geopandas.GeoDataFrame

pynhd.nhdplus_derived.nhdplus_vaa(parquet_path=None)#

Get NHDPlus Value Added Attributes including roughness.

Notes

This function downloads a 245 MB parquet file from here. Although this dataframe does not include geometry, it can be linked to other geospatial NHDPlus dataframes through ComIDs.

Parameters:

parquet_path (str or pathlib.Path, optional) – Path to a file with .parquet extension for storing the file, defaults to ./cache/nldplus_vaa.parquet.

Returns:

pandas.DataFrame – A dataframe that includes ComID-level attributes for 2.7 million NHDPlus flowlines.

Return type:

pandas.DataFrame

pynhd.nhdplus_derived.streamcat(metric_names=None, metric_areas=None, comids=None, regions=None, states=None, counties=None, conus=False, percent_full=False, area_sqkm=False, lakes_only=False)#

Get various metrics for NHDPlusV2 catchments from EPA’s StreamCat.

Notes

For more information about the service check its webpage at https://www.epa.gov/national-aquatic-resource-surveys/streamcat-dataset.

Parameters:
  • metric_names (str or list of str, optional) – Metric name(s) to retrieve. There are 567 metrics available. To get a full list, check out StreamCat.valid_names. To get a description of each metric, check out StreamCat.metrics_df. Some metrics require a year and/or slope to be specified; these have [Year] and/or [Slope] in their names. For convenience, all these variables and their years/slopes are collected into dicts that can be accessed via StreamCat.valid_years and StreamCat.valid_slopes. Defaults to None, which returns a dataframe of the metrics’ metadata.

  • metric_areas (str or list of str, optional) – Areas to return the metrics for, defaults to None, i.e., all areas. Valid options are: cat for catchment, catrp100 for 100-m riparian catchment, ws for watershed, wsrp100 for 100-m riparian watershed.

  • comids (int or list of int, optional) – NHDPlus COMID(s), defaults to None. Either comids, regions, states, counties, or conus must be passed. They are mutually exclusive.

  • regions (str or list of str, optional) – Hydro region(s) to retrieve metrics for, defaults to None. For a full list of valid regions, check out StreamCat.valid_regions. Either comids, regions, states, counties, or conus must be passed. They are mutually exclusive.

  • states (str or list of str, optional) – Two-letter state abbreviation(s) to retrieve metrics for, defaults to None. For a full list of valid states, check out StreamCat.valid_states. Either comids, regions, states, counties, or conus must be passed. They are mutually exclusive.

  • counties (str or list of str, optional) – County FIPS code(s) to retrieve metrics for, defaults to None. For a full list of valid county codes, check out StreamCat.valid_counties. Either comids, regions, states, counties, or conus must be passed. They are mutually exclusive.

  • conus (bool, optional) – If True, metric_names of all NHDPlus COMIDs are retrieved, defaults to False. Either comids, regions, states, counties, or conus must be passed. They are mutually exclusive.

  • percent_full (bool, optional) – If True, return the percent of each area of interest covered by the metric.

  • area_sqkm (bool, optional) – If True, return the area in square kilometers.

  • lakes_only (bool, optional) – If True, only return metrics for lakes and their associated catchments from the LakeCat dataset.

Returns:

pandas.DataFrame – A dataframe with the requested metrics.

Return type:

pandas.DataFrame
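
The rule that exactly one of comids, regions, states, counties, or conus must be passed can be sketched as a small check. The helper check_exclusive below is hypothetical and is not part of pynhd; it only illustrates the constraint described in the parameter list, treating None and False as "not provided".

```python
def check_exclusive(**selectors):
    """Return the name of the single selector that was provided.

    A selector counts as provided when it is neither None nor False.
    """
    given = [
        name
        for name, value in selectors.items()
        if value is not None and value is not False
    ]
    if len(given) != 1:
        raise ValueError(
            f"Exactly one of {', '.join(selectors)} must be passed, got: {given}"
        )
    return given[0]


# Only comids is provided, so the check passes and reports it.
print(check_exclusive(comids=[9533477], regions=None, states=None, counties=None, conus=False))
```

Passing two selectors at once, or none at all, raises a ValueError in this sketch; the error raised by the library itself may differ.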