This page was generated from geoconnex.ipynb. Interactive online version:
GeoConnex#
[1]:
from pathlib import Path
from pynhd import GeoConnex
GeoConnex is a web service developed by Internet Of Water that provides access to many hydro-linked datasets. PyNHD provides access to all GeoConnex endpoints.
To explore available datasets in GeoConnex (endpoints), we start by instantiating a pynhd.GeoConnex
class.
[2]:
gcx = GeoConnex(dev=False)
print("\n".join(f"{n}: {e.description}" for n, e in gcx.endpoints.items()))
hu02: Two-digit Hydrologic Regions
hu04: Four-digit Hydrologic Subregion
hu06: Six-digit Hydrologic Basins
hu08: Eight-digit Hydrologic Subbasins
hu10: Ten-digit Watersheds
nat_aq: National Aquifers of the United States from USGS National Water Information System National Aquifer code list.
principal_aq: Principal Aquifers of the United States from 2003 USGS data release
sec_hydrg_reg: Secondary Hydrogeologic Regions of the Conterminous United States from 2018 USGS data release
gages: US Reference Stream Gage Monitoring Locations
mainstems: US Reference Mainstem Rivers
states: U.S. States
counties: U.S. Counties
aiannh: Native American Lands
cbsa: U.S. Metropolitan and Micropolitan Statistical Areas
ua10: Urbanized Areas and Urban Clusters (2010 Census)
places: U.S. legally incororated and Census designated places
pws: U.S. Public Water Systems
dams: US Reference Dams
pynhd.GeoConnex
provides three ways of querying GeoConnex: Spatial query, by ID, and using CQL filters. CQL filter is powerful and can be used for all sort of queries, however, using them require some knowledge of CQL filter which you can learn from here. For example, let’s say that we want to find all urban clusters in the US that have water area of larger than 100 km\(^2\). We can use the following CQL filter to do so:
[3]:
gcx.item = "ua10"
awa = gcx.bycql({"gt": [{"property": "awater10"}, 100e6]})
awa.name10
[3]:
0 Boston, MA--NH--RI
1 Cape Coral, FL
2 Chicago, IL--IN
3 Jacksonville, FL
4 Miami, FL
5 Minneapolis--St. Paul, MN--WI
6 New York--Newark, NY--NJ--CT
7 Orlando, FL
8 Palm Bay--Melbourne, FL
9 Philadelphia, PA--NJ--DE--MD
10 Sarasota--Bradenton, FL
11 Seattle, WA
12 Tampa--St. Petersburg, FL
13 Virginia Beach, VA
Name: name10, dtype: object
For demonstrating spatial query and query by ID, let’s say that we are interested in getting all mainstems within HUC 02 (Mid Atlantic Region). We can start by getting HUC 02’s geometry and then carry out a spatial query to get all mainstems within HUC 02. For this purpose, first we select huc02
endpoint (or item) from GeoConnex and then use query
to get the geometry of HUC 02. We can the queryable fields of this item by just printing the instance of GeoConnext
class that we just
created.
[4]:
gcx.item = "hu02"
gcx
[4]:
Item: 'hu02'
Description: Two-digit Hydrologic Regions
Queryable Fields: fid, uri, name, gnis_url, gnis_id, huc2, loaddate
Extent: (-170, 15, -51, 72)
Therefore, we should query huc2
field and pass 02
as the value.
[5]:
h2 = gcx.byid("huc2", "02")
ax = h2.plot(figsize=(6, 6))
ax.set_axis_off()
With this geometry, we can query the mainstems that are within
HUC 02:
[6]:
gcx.item = "mainstems"
ms = gcx.bygeometry(h2.geometry.iloc[0], predicate="within", crs=h2.crs)
ax = h2.plot(figsize=(6, 6), facecolor="none", edgecolor="k")
ms.plot(ax=ax, color="b", lw=0.3)
ax.set_axis_off()
Next, let’s say that we need all dams that are located on the headwater flowline of the mainstems that we just obtained. We set the endpoint to dams
and find its queryable field of that’s related to NHDPlusc2 ComIDs.
[7]:
gcx.item = "dams"
gcx
[7]:
Item: 'dams'
Description: US Reference Dams
Queryable Fields: fid, id, uri, name, description, subjectof, provider, provider_id, nhdpv2_comid, feature_data_source
Extent: (-170, 15, -51, 72)
[8]:
dam = gcx.byid("nhdpv2_comid", ms.head_nhdpv2_comid)
ax = h2.plot(figsize=(6, 6), facecolor="none", edgecolor="k")
ms.plot(ax=ax, color="b", lw=0.3)
dam.plot(ax=ax, color="r", markersize=10, label="Dams")
ax.legend()
ax.set_axis_off()
Let’s find gage stations that located outlets of the mainstems. We use gages
endpoint and find its queryable field that’s related to NHDPlusc2 ComIDs.
[9]:
gcx.item = "gages"
gcx
[9]:
Item: 'gages'
Description: US Reference Stream Gage Monitoring Locations
Queryable Fields: fid, id, uri, name, description, subjectof, provider, provider_id, nhdpv2_reachcode, nhdpv2_reach_measure, nhdpv2_comid, mainstem_uri
Extent: (-170, 15, -51, 72)
We can check out the data type of the nhdpv2_comid
in this dataset using the endpoints
property of the GeoConnex
class.
[10]:
gcx.endpoints["gages"].dtypes
[10]:
{'fid': 'int64',
'id': 'str',
'uri': 'str',
'name': 'str',
'description': 'str',
'subjectof': 'str',
'provider': 'str',
'provider_id': 'str',
'nhdpv2_reachcode': 'str',
'nhdpv2_reach_measure': 'f8',
'nhdpv2_comid': 'f8',
'mainstem_uri': 'str'}
We notice that nhdpv2_comid
is a float and since in the mainstem dataset outlet_nhdpv2_comid
is a URL, we need to extract the ComIDs values. First, we should find the outlet ComIDs of the mainstems that these dams are located on.
[11]:
outlet_ids = ms.loc[ms["head_nhdpv2_comid"].isin(dam["nhdpv2_comid"]), "outlet_nhdpv2_comid"]
comids = outlet_ids.str.rsplit("/", n=1).str[-1].astype(float)
gages = gcx.byid("nhdpv2_comid", comids)
ax = h2.plot(figsize=(6, 6), facecolor="none", edgecolor="k")
ms.plot(ax=ax, color="b", lw=0.3)
dam.plot(ax=ax, color="r", markersize=10, label="Dams")
gages.plot(ax=ax, color="g", markersize=10, marker="^", label="Gages")
ax.legend()
ax.set_axis_off()
We notice that some dams don’t have any downstream gages, so we need to filter those out:
[12]:
gage_comid = "https://geoconnex.us/nhdplusv2/comid/" + gages.nhdpv2_comid.astype("Int64").astype(
str
)
cond = ms["head_nhdpv2_comid"].isin(dam["nhdpv2_comid"]) & ms["outlet_nhdpv2_comid"].isin(
gage_comid
)
pair_ids = ms.loc[cond, ["head_nhdpv2_comid", "outlet_nhdpv2_comid"]]
dam_ids = dam[dam["nhdpv2_comid"].isin(pair_ids["head_nhdpv2_comid"])]
pair_ids["outlet_nhdpv2_comid"] = (
pair_ids["outlet_nhdpv2_comid"].str.rsplit("/", n=1).str[-1].astype(float)
)
gage_ids = gages[gages["nhdpv2_comid"].isin(pair_ids["outlet_nhdpv2_comid"])]
We need to add a common column for dams and gages, so we can pair them later on, if needed.
[13]:
gage_ids = gage_ids.set_index("nhdpv2_comid")
gage_ids["head_nhdpv2_comid"] = pair_ids.set_index("outlet_nhdpv2_comid")["head_nhdpv2_comid"]
gage_ids = gage_ids.reset_index()
[14]:
ax = h2.plot(figsize=(6, 6), facecolor="none", edgecolor="k")
ms.plot(ax=ax, color="b", lw=0.3)
dam_ids.plot(ax=ax, color="r", markersize=10, label="Dams")
gage_ids.plot(ax=ax, color="g", markersize=10, marker="^", label="Gages")
ax.legend()
ax.set_axis_off()
ax.figure.savefig(Path("_static/geoconnex.png"), dpi=100, bbox_inches="tight")