async_retriever.async_retriever
Core async functions.
Module Contents
- async_retriever.async_retriever.delete_url_cache(url, request_method='GET', cache_name=None, **kwargs)
Delete the cached response associated with url, along with its history (if applicable).
- Parameters
  - url (str) – URL to be deleted from the cache.
  - request_method (str, optional) – HTTP request method to be deleted from the cache, defaults to GET.
  - cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.
  - kwargs (dict, optional) – Keywords to pass to cache.delete_url().
- async_retriever.async_retriever.retrieve(urls, read_method, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5.0, expire_after=-1, ssl=None, disable=False)
Send async requests.
- Parameters
  - read_method (str) – Method for returning the request; binary, json, or text.
  - request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].
  - request_method (str, optional) – Request type; GET (get) or POST (post), defaults to GET.
  - max_workers (int, optional) – Maximum number of async processes, defaults to 8.
  - cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.
  - timeout (float, optional) – Timeout for the request in seconds, defaults to 5.0.
  - expire_after (int, optional) – Expiration time for response caching in seconds, defaults to -1 (never expire).
  - ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certificate verification.
  - disable (bool, optional) – If True, temporarily disable caching requests and get new responses from the server, defaults to False.
- Returns
  - list – List of responses in the order of input URLs.
Examples
>>> import async_retriever as ar
>>> stations = ["01646500", "08072300", "11073495"]
>>> url = "https://waterservices.usgs.gov/nwis/site"
>>> urls, kwds = zip(
...     *[
...         (url, {"params": {"format": "rdb", "sites": s, "siteStatus": "all"}})
...         for s in stations
...     ]
... )
>>> resp = ar.retrieve(urls, "text", request_kwds=kwds)
>>> resp[0].split('\n')[-2].split('\t')[1]
'01646500'
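Building on the example above, the caching parameters documented for this function can be tuned per call; the cache file path and expiration time below are illustrative values, not defaults.
>>> resp_cached = ar.retrieve(
...     urls,
...     "text",
...     request_kwds=kwds,
...     cache_name="./cache/example_cache.sqlite",  # hypothetical cache file, not the default
...     expire_after=3600,  # cached responses expire after one hour
... )
>>> len(resp_cached) == len(urls)
True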
- async_retriever.async_retriever.retrieve_binary(urls, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5.0, expire_after=-1, ssl=None, disable=False)
Send async requests and get the response as bytes.
- Parameters
  - request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].
  - request_method (str, optional) – Request type; GET (get) or POST (post), defaults to GET.
  - max_workers (int, optional) – Maximum number of async processes, defaults to 8.
  - cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.
  - timeout (float, optional) – Timeout for the request in seconds, defaults to 5.0.
  - expire_after (int, optional) – Expiration time for response caching in seconds, defaults to -1 (never expire).
  - ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certificate verification.
  - disable (bool, optional) – If True, temporarily disable caching requests and get new responses from the server, defaults to False.
- Returns
  - bytes – List of responses in the order of input URLs.
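Examples
A minimal sketch that mirrors the retrieve example above but keeps the raw bytes of a single response; the site parameters are reused from that example.
>>> import async_retriever as ar
>>> url = "https://waterservices.usgs.gov/nwis/site"
>>> kwds = [{"params": {"format": "rdb", "sites": "01646500", "siteStatus": "all"}}]
>>> resp = ar.retrieve_binary([url], request_kwds=kwds)
>>> isinstance(resp[0], bytes)
True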
- async_retriever.async_retriever.retrieve_json(urls, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5.0, expire_after=-1, ssl=None, disable=False)
Send async requests and get the response as json.
- Parameters
  - request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].
  - request_method (str, optional) – Request type; GET (get) or POST (post), defaults to GET.
  - max_workers (int, optional) – Maximum number of async processes, defaults to 8.
  - cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.
  - timeout (float, optional) – Timeout for the request in seconds, defaults to 5.0.
  - expire_after (int, optional) – Expiration time for response caching in seconds, defaults to -1 (never expire).
  - ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certificate verification.
  - disable (bool, optional) – If True, temporarily disable caching requests and get new responses from the server, defaults to False.
- Returns
  - dict – List of responses in the order of input URLs.
Examples
>>> import async_retriever as ar
>>> urls = ["https://labs.waterdata.usgs.gov/api/nldi/linked-data/comid/position"]
>>> kwds = [
...     {
...         "params": {
...             "f": "json",
...             "coords": "POINT(-68.325 45.0369)",
...         },
...     },
... ]
>>> r = ar.retrieve_json(urls, kwds)
>>> print(r[0]["features"][0]["properties"]["identifier"])
2675320
- async_retriever.async_retriever.retrieve_text(urls, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5.0, expire_after=-1, ssl=None, disable=False)
Send async requests and get the response as text.
- Parameters
  - request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].
  - request_method (str, optional) – Request type; GET (get) or POST (post), defaults to GET.
  - max_workers (int, optional) – Maximum number of async processes, defaults to 8.
  - cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.
  - timeout (float, optional) – Timeout for the request in seconds, defaults to 5.0.
  - expire_after (int, optional) – Expiration time for response caching in seconds, defaults to -1 (never expire).
  - ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certificate verification.
  - disable (bool, optional) – If True, temporarily disable caching requests and get new responses from the server, defaults to False.
- Returns
  - list – List of responses in the order of input URLs.
Examples
>>> import async_retriever as ar
>>> stations = ["01646500", "08072300", "11073495"]
>>> url = "https://waterservices.usgs.gov/nwis/site"
>>> urls, kwds = zip(
...     *[
...         (url, {"params": {"format": "rdb", "sites": s, "siteStatus": "all"}})
...         for s in stations
...     ]
... )
>>> resp = ar.retrieve_text(urls, kwds)
>>> resp[0].split('\n')[-2].split('\t')[1]
'01646500'
- async_retriever.async_retriever.stream_write(urls, file_paths, request_kwds=None, request_method='GET', max_workers=8, ssl=None, chunk_size=None)
Send async requests and stream the responses to the given file paths.
- Parameters
  - file_paths (list of str or pathlib.Path) – List of file paths to write the responses to.
  - request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].
  - request_method (str, optional) – Request type; GET (get) or POST (post), defaults to GET.
  - max_workers (int, optional) – Maximum number of async processes, defaults to 8.
  - ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certificate verification.
  - chunk_size (int, optional) – The size of the chunks, in bytes, to write to the file, defaults to None, in which case the data chunks are written as they are received from the server.
Examples
>>> import async_retriever as ar
>>> import tempfile
>>> url = "https://freetestdata.com/wp-content/uploads/2021/09/Free_Test_Data_500KB_CSV-1.csv"
>>> with tempfile.NamedTemporaryFile() as temp:
...     ar.stream_write([url], [temp.name])