async_retriever.streaming

Download multiple files concurrently by streaming their content to disk.

Module Contents

async_retriever.streaming.generate_filename(url, params=None, data=None, prefix=None, file_extension='')

Generate a unique filename from a query using its SHA-256 hash.

Parameters:
  • url (str) – The URL for the request.

  • params (dict, multidict.MultiDict, optional) – Query parameters for the request, default is None.

  • data (dict, str, optional) – Data or JSON to include in the hash, default is None.

  • prefix (str, optional) – A custom prefix to attach to the filename, default is None.

  • file_extension (str, optional) – The file extension to append to the filename, default is "".

Returns:

A unique filename containing the SHA-256 hash, the optional prefix, and the file extension.

Return type:

str
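
The idea of deriving a filename from a query can be sketched with the standard library alone. The sketch below is illustrative and is not the library's implementation: sketch_filename is a hypothetical stand-in, and the JSON serialization, the prefix-hash-extension ordering, and the handling of the leading dot are all assumptions. The actual generate_filename may serialize and assemble the name differently.

```python
import hashlib
import json


def sketch_filename(url, params=None, data=None, prefix=None, file_extension=""):
    """Hypothetical stand-in for generate_filename; not the library's code."""
    # Serialize the query deterministically so equal queries hash equally.
    payload = json.dumps(
        {"url": url, "params": params, "data": data}, sort_keys=True, default=str
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Assumption: accept the extension with or without a leading dot.
    ext = f".{file_extension.lstrip('.')}" if file_extension else ""
    # Assumption: the name is assembled as prefix + hash + extension.
    return f"{prefix or ''}{digest}{ext}"


# Two queries that differ only in their parameters get different names.
name_a = sketch_filename(
    "https://example.com/data", params={"id": 1}, prefix="run_", file_extension="csv"
)
name_b = sketch_filename(
    "https://example.com/data", params={"id": 2}, prefix="run_", file_extension="csv"
)
```

Because the hash depends only on the query, repeating a request yields the same filename, which makes it possible to detect files that were already downloaded.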

async_retriever.streaming.stream_write(urls, file_paths, chunk_size=CHUNK_SIZE, limit_per_host=MAX_HOSTS, timeout=600, raise_status=True)

Download multiple files concurrently by streaming their content to disk.

Parameters:
  • urls (list of str) – URLs to download.

  • file_paths (list of pathlib.Path) – Paths to save the downloaded files.

  • chunk_size (int, optional) – Size of the chunks to download, by default 1 MB.

  • limit_per_host (int, optional) – Maximum number of concurrent connections per host, by default 4.

  • timeout (int, optional) – Request timeout in seconds, by default 600 (10 minutes).

  • raise_status (bool, optional) – Whether to raise an exception if a request fails, by default True. If False, the exception is logged and the function continues.
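
A possible usage sketch follows, based only on the signature above. The URLs are placeholders, the output directory and the download_all helper are hypothetical, and the sketch assumes that stream_write is called as a regular (non-async) function and that urls and file_paths are matched by position.

```python
import tempfile
from pathlib import Path

# Placeholder URLs; replace them with real download links.
urls = ["https://example.com/a.bin", "https://example.com/b.bin"]

# One target path per URL, in the same order as the URLs.
out_dir = Path(tempfile.gettempdir()) / "stream_write_demo"
file_paths = [out_dir / url.rsplit("/", 1)[-1] for url in urls]


def download_all():
    """Hypothetical helper that streams every URL to its target path."""
    # Imported here so that preparing the inputs needs no third-party package.
    from async_retriever.streaming import stream_write

    out_dir.mkdir(parents=True, exist_ok=True)
    # raise_status=False logs failed requests instead of raising.
    stream_write(urls, file_paths, limit_per_host=2, timeout=600, raise_status=False)
```

Calling download_all() performs the downloads. Lowering limit_per_host is a reasonable choice for rate-limited services.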