Rainfall Runoff datasets

This section includes datasets that can be used for rainfall-runoff modeling. They all contain observed streamflow and meteorological data as time series, which are referred to as dynamic features. The physical catchment properties are included as static features. Although each data source has a dedicated class, the aqua_fetch.rr.RainfallRunoff class can be used to access all the datasets.

List of datasets

Stations per Source

Source Name     Class                         Stations (daily / hourly)   Reference
--------------  ----------------------------  -------------------------   ----------------------------------
Arcticnet       aqua_fetch.rr.Arcticnet       106                         R-Arcticnet
Bull            aqua_fetch.Bull               484                         Aparicio et al., 2024
CABra           aqua_fetch.rr.CABra           735                         Almagro et al., 2021
CAMELS_AUS      aqua_fetch.rr.CAMELS_AUS      222, 561                    Fowler et al., 2021
CAMELS_GB       aqua_fetch.rr.CAMELS_GB       671                         Coxon et al., 2020
CAMELS_BR       aqua_fetch.rr.CAMELS_BR       897                         Chagas et al., 2020
CAMELS_CH       aqua_fetch.rr.CAMELS_CH       331                         Hoege et al., 2023
CAMELS_CL       aqua_fetch.rr.CAMELS_CL       516                         Alvarez-Garreton et al., 2018
CAMELS_DK       aqua_fetch.rr.CAMELS_DK       304                         Liu et al., 2024
CAMELS_DE       aqua_fetch.rr.CAMELS_DE       1555                        Loritz et al., 2024
CAMELS_FR       aqua_fetch.rr.CAMELS_FR       654                         Delaigue et al., 2024
CAMELS_IND      aqua_fetch.rr.CAMELS_IND      472                         Mangukiya et al., 2024
CAMELS_SE       aqua_fetch.rr.CAMELS_SE       50                          Teutschbein et al., 2024
CAMELS_US       aqua_fetch.rr.CAMELS_US       671                         Newman et al., 2014
Caravan_DK      aqua_fetch.rr.Caravan_DK      304                         Koch 2022
CCAM            aqua_fetch.rr.CCAM            111                         Hao et al., 2021
Finland         aqua_fetch.rr.Finland         669                         ymparisto.fi
GRDCCaravan     aqua_fetch.rr.GRDCCaravan     5357                        Faerber et al., 2023
HYPE            aqua_fetch.rr.HYPE            561                         Arciniega-Esparza and Birkel, 2020
HYSETS          aqua_fetch.rr.HYSETS          14425                       Arsenault et al., 2020
Ireland         aqua_fetch.rr.Ireland         464                         EPA Ireland
Italy           aqua_fetch.rr.Italy           294                         EPA Ireland
Japan           aqua_fetch.rr.Japan           751                         river.go.jp
LamaHCE         aqua_fetch.rr.LamaHCE         859 (daily), 859 (hourly)   Klingler et al., 2021
LamaHIce        aqua_fetch.rr.LamaHIce        111                         Helgason and Nijssen 2024
Poland          aqua_fetch.rr.Poland          1287                        imgw.pl
Portugal        aqua_fetch.rr.Portugal        280                         snirh
RRLuleaSweden   aqua_fetch.RRLuleaSweden      1                           Broekhuizen et al., 2020
Spain           aqua_fetch.rr.Spain           889                         ceh-flumen64
Simbi           aqua_fetch.rr.Simbi           24                          Bathelemy et al., 2024
Thailand        aqua_fetch.rr.Thailand        73                          RID project
USGS            aqua_fetch.rr.USGS            12004                       USGS nwis
WaterBenchIowa  aqua_fetch.rr.WaterBenchIowa  125                         Demir et al., 2022

High Level API

The high-level API is provided by the aqua_fetch.rr.RainfallRunoff class, which offers a unified and easy-to-use interface to all the datasets. The datasets are accessed by their names.

class aqua_fetch.rr.RainfallRunoff(dataset: str, path: str | PathLike = None, overwrite: bool = False, to_netcdf: bool = True, processes: int = None, remove_zip: bool = True, verbosity: int = 1, **kwargs)

Bases: object

This class provides access to all the rainfall-runoff datasets. For simplicity and reusability, use this class instead of the individual dataset classes.

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')  # instead of CAMELS_AUS, you can provide any other dataset name
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.columns = df.columns.get_level_values('dynamic_features')
>>> df.shape
   (21184, 26)
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   222
... # get data of 10 % of stations as dataframe
>>> df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
   (550784, 22)
... # The returned dataframe is a multi-indexed data
>>> df.index.names == ['time', 'dynamic_features']
    True
... # get data by station id
>>> df = dataset.fetch(stations='224214A', as_dataframe=True).unstack()
>>> df.shape
    (21184, 26)
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> data = dataset.fetch(1, as_dataframe=True,
...  dynamic_features=['tmax_AWAP', 'precipitation_AWAP', 'et_morton_actual_SILO', 'streamflow_MLd']).unstack()
>>> data.shape
   (21184, 4)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multiindexed dataframe
   (21184, 260)
... # when we get both static and dynamic data, the returned data is a dictionary
... # with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='224214A', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 166), (550784, 1))
... # the coordinate examples below use CAMELS_IND, which has 472 stations
>>> dataset = RainfallRunoff('CAMELS_IND')
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (472, 2)
>>> dataset.stn_coords('3001')  # returns coordinates of station whose id is 3001
    18.3861 80.3917
>>> dataset.stn_coords(['3001', '17021'])  # returns coordinates of two stations

See the CAMELS Australia example script (camels_australia.py) in the examples gallery for a more comprehensive usage example.
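
The unstacking step used in the example above can be reproduced on a small synthetic frame. This is a minimal sketch that assumes only the layout described there: an index with the levels time and dynamic_features and one column per station. The station name stn_A and the feature names pcp and q are made up for illustration, and nothing is downloaded.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the frame returned by fetch(..., as_dataframe=True)
times = pd.date_range("2000-01-01", periods=3, freq="D", name="time")
features = ["pcp", "q"]  # hypothetical feature names
index = pd.MultiIndex.from_product(
    [times, features], names=["time", "dynamic_features"]
)
df = pd.DataFrame({"stn_A": np.arange(6, dtype=float)}, index=index)

# unstack() moves the innermost index level (dynamic_features) into the columns
wide = df.unstack()
# keep only the feature names as column labels, dropping the station level
wide.columns = wide.columns.get_level_values("dynamic_features")

print(df.shape)    # (6, 1): one row per (time, feature) pair
print(wide.shape)  # (3, 2): one row per time step, one column per feature
```

The long frame has number-of-time-steps times number-of-features rows, which is why a single station with 21184 time steps and 26 features appears as 550784 rows before unstacking.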

__init__(dataset: str, path: str | PathLike = None, overwrite: bool = False, to_netcdf: bool = True, processes: int = None, remove_zip: bool = True, verbosity: int = 1, **kwargs)

Rainfall Runoff datasets

Parameters:
  • dataset (str) –

    dataset name. This must be one of the following:

    • Arcticnet

    • Bull

    • CABra

    • CCAM

    • CAMELS_AUS

    • CAMELS_BR

    • CAMELS_CH

    • CAMELS_CL

    • CAMELS_DE

    • CAMELS_DK0

    • CAMELS_DK

    • CAMELS_FR

    • CAMELS_GB

    • CAMELS_IND

    • CAMELS_SE

    • CAMELS_US

    • EStreams

    • Finland

    • GRDCCaravan

    • GSHA

    • HYSETS

    • HYPE

    • Ireland

    • Italy

    • Japan

    • LamaHCE

    • LamaHIce

    • Poland

    • Portugal

    • RRLuleaSweden

    • Simbi

    • Spain

    • Thailand

    • USGS

    • WaterBenchIowa

  • path (str) – path to directory inside which data is located/downloaded. If provided and the path/dataset exists, then the data will be read from this path. If provided and the path/dataset does not exist, then the data will be downloaded at this path. If not provided, then the data will be downloaded in the default path which is .../water-datasts/data/.

  • overwrite (bool) – If the data is already downloaded then you can set it to True, to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.

  • verbosity (int) – 0: no message will be printed

  • kwargs – additional keyword arguments for the underlying dataset class, for example version for aqua_fetch.rr.CAMELS_AUS, timestep for aqua_fetch.rr.LamaHCE, or met_src for CAMELS_BR.
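
The way a dataset name and extra keyword arguments reach a dataset-specific class can be pictured with a small facade. This is only a sketch under stated assumptions, not the package's actual implementation: DemoLamaHCE, DemoCamelsAus, DEMO_DATASETS and DemoRainfallRunoff are hypothetical names, and the default values are invented.

```python
class DemoLamaHCE:
    """Hypothetical stand-in for a dataset class that accepts ``timestep``."""

    def __init__(self, path=None, timestep="D"):
        self.path = path
        self.timestep = timestep


class DemoCamelsAus:
    """Hypothetical stand-in for a dataset class that accepts ``version``."""

    def __init__(self, path=None, version=1):
        self.path = path
        self.version = version


# name -> class lookup; the real package may organize this differently
DEMO_DATASETS = {"LamaHCE": DemoLamaHCE, "CAMELS_AUS": DemoCamelsAus}


class DemoRainfallRunoff:
    """Hypothetical facade that selects a dataset class by its name."""

    def __init__(self, dataset, path=None, **kwargs):
        if dataset not in DEMO_DATASETS:
            raise ValueError(
                f"{dataset} is not a known dataset; choose from {sorted(DEMO_DATASETS)}"
            )
        # keyword arguments the facade does not consume are forwarded unchanged
        self.dataset = DEMO_DATASETS[dataset](path=path, **kwargs)


hourly = DemoRainfallRunoff("LamaHCE", timestep="H")
print(hourly.dataset.timestep)  # H
```

In this sketch a keyword that the selected class does not understand, such as timestep for the CAMELS_AUS stand-in, fails with a TypeError at construction time.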

area(stations: str | List[str] = 'all') -> Series

Returns area (km²) of all/selected catchments as a pandas Series

Parameters:

stations (str/list (default=``all``)) – name/names of stations. Default is all, which will return area of all stations

Returns:

a pandas series whose indices are catchment ids and values are areas of corresponding catchments.

Return type:

pd.Series

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_CH')
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2004')  # returns area of station whose id is 2004
>>> dataset.area(['2004', '6004'])  # returns area of two stations

property dynamic_features: List[str]

returns names of dynamic features as python list of strings

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.dynamic_features

property end: str

returns end date of data

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.end  # end is a property, so it is accessed without parentheses

fetch(stations: str | List[str] | int | float = 'all', dynamic_features: List[str] | str | None = 'all', static_features: str | List[str] | None = None, st: None | str = None, en: None | str = None, as_dataframe: bool = False, **kwargs) -> dict | DataFrame

Fetches the features of one or more stations.

Parameters:
  • stations

    It can have following values:

    • int : number of (randomly selected) stations to fetch

    • float : fraction of (randomly selected) stations to fetch

    • str : name/id of station to fetch. However, if all is provided, then all stations will be fetched.

    • list : list of names/ids of stations to fetch

  • dynamic_features ((default=``all``)) –

    It can have following values:

    • str : name of dynamic feature to fetch. If all is provided, then all dynamic features will be fetched.

    • list : list of dynamic features to fetch.

    • None : No dynamic feature will be fetched.

  • static_features ((default=None)) –

    It can have following values:

    • str : name of static feature to fetch. If all is provided, then all static features will be fetched.

    • list : list of static features to fetch.

    • None : No static feature will be fetched.

  • st – starting date of data to be returned. If None, the data will be returned from where it is available.

  • en – end date of data to be returned. If None, then the data will be returned till the date data is available.

  • as_dataframe – whether to return dynamic attributes as pandas dataframe or as xarray dataset.

  • kwargs – keyword arguments

Returns:

If both static and dynamic features are requested, a dictionary with static and dynamic keys is returned. Otherwise, only the dynamic features or only the static features are returned.
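
The four accepted forms of the stations argument can be summarized in a few lines of plain Python. This is a sketch of the documented behaviour, not the library's code: resolve_stations is a hypothetical helper, and truncating the fraction is an assumption (it is consistent with 0.1 of 222 stations giving 22).

```python
import random


def resolve_stations(stations, all_stations, seed=None):
    """Hypothetical helper: turn the ``stations`` argument into a list of ids."""
    rng = random.Random(seed)
    if isinstance(stations, str):
        # 'all' selects everything, any other string is a single station id
        return list(all_stations) if stations == "all" else [stations]
    if isinstance(stations, float):
        # fraction of (randomly selected) stations; truncation is an assumption
        return rng.sample(list(all_stations), int(len(all_stations) * stations))
    if isinstance(stations, int):
        # number of (randomly selected) stations
        return rng.sample(list(all_stations), stations)
    return list(stations)  # an explicit list of ids is used as given


ids = [f"stn_{i}" for i in range(222)]  # made-up station ids
print(len(resolve_stations(0.1, ids, seed=0)))   # 22
print(len(resolve_stations(5, ids, seed=0)))     # 5
print(resolve_stations("stn_7", ids))            # ['stn_7']
```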

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> # get data of 10% of stations
>>> df = dataset.fetch(stations=0.1, as_dataframe=True)  # returns a multiindex dataframe
...  # fetch data of 5 (randomly selected) stations
>>> five_random_stn_data = dataset.fetch(stations=5, as_dataframe=True)
... # fetch data of 3 selected stations
>>> three_selec_stn_data = dataset.fetch(stations=['912101A','912105A','915011A'], as_dataframe=True)
... # fetch data of a single stations
>>> single_stn_data = dataset.fetch(stations='318076', as_dataframe=True)
... # get both static and dynamic features as dictionary
>>> data = dataset.fetch(1, static_features="all", as_dataframe=True)  # -> dict
>>> data['dynamic']
... # get only selected dynamic features
>>> sel_dyn_features = dataset.fetch(stations='318076',
...     dynamic_features=['streamflow_MLd', 'solarrad_AWAP'], as_dataframe=True)
... # fetch data between selected periods
>>> data = dataset.fetch(stations='318076', st="20010101", en="20101231", as_dataframe=True)

fetch_dynamic_features(stn_id: str, dynamic_features='all', st=None, en=None, as_dataframe=False)

Fetches all or selected dynamic attributes of one station.

Parameters:
  • stn_id (str) – name/id of station of which to extract the data

  • dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

  • st (Optional (default=None)) – start time from where to fetch the data.

  • en (Optional (default=None)) – end time until where to fetch the data.

  • as_dataframe (bool, optional (default=False)) – if true, the returned data is pandas DataFrame otherwise it is xarray dataset

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> camels = RainfallRunoff('CAMELS_AUS')
>>> camels.fetch_dynamic_features('224214A', as_dataframe=True).unstack()
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('224214A',
... dynamic_features=['tmax_AWAP', 'vprp_AWAP', 'streamflow_mmd'],
... as_dataframe=True).unstack()

fetch_static_features(stations: str | list = 'all', static_features: str | list = 'all') -> DataFrame

Fetches all or selected static attributes of one or more stations.

Parameters:
  • stations (str/list) – name/id of the station or stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> camels = RainfallRunoff('CAMELS_AUS')
>>> camels.fetch_static_features('224214A')
>>> camels.static_features
>>> camels.fetch_static_features('224214A',
... static_features=['elev_mean', 'relief', 'ksat', 'pop_mean'])

fetch_station_features(stn_id: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, as_ts: bool = False, st: str | None = None, en: str | None = None, **kwargs) -> DataFrame

Fetches features for one station.

Parameters:
  • stn_id – station id/gauge id for which the data is to be fetched.

  • dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch

  • static_features – names of static features/attributes to be fetched

  • as_ts (bool) – whether static features are to be converted into a time series or not. If yes, the returned time series will be of the same length as that of the dynamic attributes.

  • st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from where it is available.

  • en (str, optional) – end point of data to be fetched. By default, the data will be fetched up to the last available date.

Returns:

dataframe if as_ts is True else it returns a dictionary of static and dynamic features for a station/gauge_id

Return type:

pd.DataFrame
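
What converting static features into a time series amounts to can be sketched with pandas alone. This is an illustration of the idea behind as_ts, not the library's implementation; the feature names and values are invented.

```python
import pandas as pd

# Made-up dynamic data for one station
time = pd.date_range("2000-01-01", periods=4, freq="D")
dynamic = pd.DataFrame({"q": [1.2, 0.9, 0.7, 1.5]}, index=time)

# Made-up static features of the same station (one scalar per feature)
static = {"area_km2": 120.0, "elev_mean": 540.0}

# Scalars are broadcast along the index, giving constant columns
static_ts = pd.DataFrame(static, index=dynamic.index)

# Static and dynamic features now share one index and fit in a single frame
combined = pd.concat([dynamic, static_ts], axis=1)
print(combined.shape)  # (4, 3)
```

Repeating the static values along the time axis lets both feature types be fed to a model as one table, at the cost of storing redundant values.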

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.fetch_station_features('912101A')

fetch_stations_features(stations: str | List[str], dynamic_features: str | List[str] | None = 'all', static_features: str | List[str] | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs)

Reads attributes of more than one station.

Parameters:
  • stations – list of stations for which data is to be fetched.

  • dynamic_features – list of dynamic features to be fetched. If all, then all dynamic features will be fetched.

  • static_features – list of static features to be fetched. If all, then all static features will be fetched. If None, then no static attribute will be fetched.

  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • as_dataframe – whether to return the data as pandas dataframe. Default is xr.Dataset object.

  • kwargs (dict) – additional keyword arguments

Returns:

Dynamic and static features of one or multiple stations. Dynamic features are by default returned as xr.Dataset unless as_dataframe is True or xarray is not installed, in which case a pandas dataframe with multiindex is returned.

If xr.Dataset, it consists of data_vars equal to the number of stations and, for each station, the DataArray has dimensions (time, dynamic_features), where the length of time is defined by st and en. When the returned object is a pandas DataFrame, the first index is time and the second index is dynamic_features.

Static attributes are always returned as pandas DataFrame and have shape (stations, static_features). If dynamic_features is None, then they are not returned and the returned value only consists of static features. The same holds true for static_features. If both are not None, then the returned type is a dictionary with static and dynamic keys.

Return type:

pd.DataFrame or xr.Dataset or dict

Raises:

ValueError, if both dynamic_features and static_features are None
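
The return rules described above reduce to a three-way decision. The helper below is a hypothetical illustration of that decision only (assemble_result is not part of the library); the strings stand in for the actual pandas or xarray objects.

```python
def assemble_result(dynamic=None, static=None):
    """Hypothetical helper mirroring the documented return rules."""
    if dynamic is None and static is None:
        raise ValueError("dynamic_features and static_features cannot both be None")
    if dynamic is not None and static is not None:
        # both requested: dictionary with 'static' and 'dynamic' keys
        return {"static": static, "dynamic": dynamic}
    # only one requested: return that object directly
    return dynamic if dynamic is not None else static


both = assemble_result(dynamic="dynamic-data", static="static-data")
print(sorted(both))                            # ['dynamic', 'static']
print(assemble_result(dynamic="dynamic-data"))  # dynamic-data
```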

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
... # find out station ids
>>> dataset.stations()
... # get data of selected stations
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...  as_dataframe=True)

get_boundary(stn_id: str, as_type: str = 'numpy')

returns boundary of a catchment in a required format

Parameters:
  • stn_id (str) – name/id of catchment

  • as_type (str) – ‘numpy’ or ‘geopandas’

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_SE')
>>> dataset.get_boundary(dataset.stations()[0])

property name: str

returns name of dataset

num_dynamic() -> int

number of dynamic features associated with the dataset

num_static() -> int

number of static features associated with the dataset

property path: str

returns path where the data is stored. The default path is ~../water_quality/data

plot_catchment(stn_id: str, ax: Axes = None, show: bool = True, **kwargs) -> Axes

plots catchment boundaries

Parameters:
  • stn_id (str) – name/id of catchment

  • ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.

  • show (bool)

  • **kwargs

Return type:

plt.Axes

Examples

>>> import matplotlib.pyplot as plt
>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> stn_id = dataset.stations()[0]  # plot_catchment requires a station id
>>> dataset.plot_catchment(stn_id)
>>> dataset.plot_catchment(stn_id, marker='o', ms=0.3)
>>> ax = dataset.plot_catchment(stn_id, marker='o', ms=0.3, show=False)
>>> ax.set_title("Catchment Boundaries")
>>> plt.show()

plot_stations(stations: List[str] = 'all', marker='.', ax: Axes = None, show: bool = True, **kwargs) -> Axes

plots coordinates of stations

Parameters:
  • stations – name/names of stations. If not given, all stations will be plotted

  • marker – marker to use.

  • ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.

  • show (bool)

  • **kwargs

Return type:

plt.Axes

Examples

>>> import matplotlib.pyplot as plt
>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.plot_stations()
>>> dataset.plot_stations(['1', '2', '3'])
>>> dataset.plot_stations(marker='o', ms=0.3)
>>> ax = dataset.plot_stations(marker='o', ms=0.3, show=False)
>>> ax.set_title("Stations")
>>> plt.show()

q_mmd(stations: str | List[str] = 'all') -> DataFrame

returns streamflow in units of millimeters per day. This is obtained by dividing q by the catchment area.

Parameters:

stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations

Returns:

a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame
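
The unit conversion behind q divided by area can be worked through explicitly. This is a sketch under stated assumptions: it assumes discharge in cubic meters per second or in megaliters per day and area in square kilometers, and the function names are hypothetical. The units each dataset actually stores are not stated here and should be checked before applying it.

```python
def cms_to_mmd(q_cms, area_km2):
    """Discharge in m3/s over a catchment of area_km2 -> depth in mm/day."""
    # m3/s -> m3/day, divide by area in m2 to get m/day, then convert to mm/day
    return q_cms * 86400.0 / (area_km2 * 1e6) * 1000.0


def mld_to_mmd(q_mld, area_km2):
    """Discharge in ML/day over a catchment of area_km2 -> depth in mm/day."""
    # 1 ML = 1000 m3, 1 km2 = 1e6 m2 and 1 m = 1000 mm, so the factors cancel
    return q_mld / area_km2


print(round(cms_to_mmd(1.0, 86.4), 6))  # 1.0
print(mld_to_mmd(50.0, 25.0))           # 2.0
```

Expressing streamflow as a depth per day makes it directly comparable with precipitation and evapotranspiration, which are usually given in the same unit.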

property start: str

returns starting date of data

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.start  # start is a property, so it is accessed without parentheses

property static_features: List[str]

returns names of static features as python list of strings

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.static_features

stations() -> List[str]

returns names of all stations

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.stations()

stn_coords(stations: str | List[str] = 'all') -> DataFrame

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_CH')
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('2004')  # returns coordinates of station whose id is 2004
>>> dataset.stn_coords(['2004', '6004'])  # returns coordinates of two stations
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
>>> dataset.stn_coords(['G0050115', '912101A'])  # returns coordinates of two stations

Low Level API

The low-level API provides access to each individual dataset class. This gives more control over the datasets.

class aqua_fetch.rr.Camels(path: str = None, timestep: str = 'D', id_idx_in_bndry_shape: int = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)

Bases: Datasets

This is the parent class for individual rainfall-runoff datasets such as CAMELS-GB. It is not meant to be used directly. It is inherited by the child classes, each of which is specific to one dataset such as CAMELS-GB or CAMELS-AUS. This class first downloads the CAMELS dataset if it is not already downloaded. The selected features for a selected id are then fetched and provided to the user through the fetch method.

Attributes:

- path (str/path) : directory of the dataset

- dynamic_features (list) : tells which dynamic features are available in this dataset

- static_features (list) : a list of static features.

- static_attribute_categories (list) : tells which kinds of static features are present in this category.

Methods:

- stations : returns name/id of stations for which the data (dynamic features) exists as list of strings.

- fetch : fetches all features (both static and dynamic type) of all station/gauge_ids or a specified station. It can also be used to fetch all features of a number of stations, either by providing their gauge_id or by just saying that we need data of 20 stations, which will then be chosen randomly.

- fetch_dynamic_features : fetches specified dynamic features of one specified station. If the dynamic attribute is not specified, all dynamic features will be fetched for the specified station. If station is not specified, the specified dynamic features will be fetched for all stations.

- fetch_static_features : works the same as fetch_dynamic_features but for static features. Here if the category is not specified then static features of the specified station for all categories are returned.

__init__(path: str = None, timestep: str = 'D', id_idx_in_bndry_shape: int = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)

Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

area(stations: str | List[str] = 'all') -> Series

Returns area (km²) of all/selected catchments as a pandas Series

Parameters:

stations (str/list (default=``all``)) – name/names of stations. Default is all, which will return area of all stations

Returns:

a pandas series whose indices are catchment ids and values are areas of corresponding catchments.

Return type:

pd.Series

Examples

>>> from aqua_fetch.rr import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2004')  # returns area of station whose id is 2004
>>> dataset.area(['2004', '6004'])  # returns area of two stations

property camels_dir

Directory where all camels datasets will be saved. This will be under the datasets directory.

fetch(stations: str | list | int | float = 'all', dynamic_features: List[str] | str | None = 'all', static_features: str | List[str] | None = None, st: None | str = None, en: None | str = None, as_dataframe: bool = False, **kwargs) -> dict | DataFrame

Fetches the features of one or more stations.

Parameters:
  • stations

    It can have following values:
    • int : number of (randomly selected) stations to fetch

    • float : fraction of (randomly selected) stations to fetch

    • str : name/id of station to fetch. However, if all is provided, then all stations will be fetched.

    • list : list of names/ids of stations to fetch

  • dynamic_features – If not None, then it is the features to be fetched. If None, then all available features are fetched

  • static_features – list of static features to be fetched. None means no static attribute will be fetched.

  • st – starting date of data to be returned. If None, the data will be returned from where it is available.

  • en – end date of data to be returned. If None, then the data will be returned till the date data is available.

  • as_dataframe – whether to return dynamic features as pandas dataframe or as xarray dataset.

  • kwargs – keyword arguments to read the files

Returns:

If both static and dynamic features are requested, a dictionary with static and dynamic keys is returned. Otherwise, only the dynamic features or only the static features are returned.

Examples

>>> from aqua_fetch.rr import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # get data of 10% of stations
>>> df = dataset.fetch(stations=0.1, as_dataframe=True)  # returns a multiindex dataframe
...  # fetch data of 5 (randomly selected) stations
>>> five_random_stn_data = dataset.fetch(stations=5, as_dataframe=True)
... # fetch data of 3 selected stations
>>> three_selec_stn_data = dataset.fetch(stations=['912101A','912105A','915011A'], as_dataframe=True)
... # fetch data of a single stations
>>> single_stn_data = dataset.fetch(stations='318076', as_dataframe=True)
... # get both static and dynamic features as dictionary
>>> data = dataset.fetch(1, static_features="all", as_dataframe=True)  # -> dict
>>> data['dynamic']
... # get only selected dynamic features
>>> sel_dyn_features = dataset.fetch(stations='318076',
...     dynamic_features=['streamflow_MLd', 'solarrad_AWAP'], as_dataframe=True)
... # fetch data between selected periods
>>> data = dataset.fetch(stations='318076', st="20010101", en="20101231", as_dataframe=True)

fetch_dynamic_features(stn_id: str, dynamic_features='all', st=None, en=None, as_dataframe=False)

Fetches all or selected dynamic features of one station.

Parameters:
  • stn_id (str) – name/id of station of which to extract the data

  • dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

  • st (Optional (default=None)) – start time from where to fetch the data.

  • en (Optional (default=None)) – end time until where to fetch the data.

  • as_dataframe (bool, optional (default=False)) – if true, the returned data is pandas DataFrame otherwise it is xarray dataset

Examples

>>> from aqua_fetch.rr import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_dynamic_features('224214A', as_dataframe=True).unstack()
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('224214A',
... dynamic_features=['tmax_AWAP', 'vprp_AWAP', 'streamflow_mmd'],
... as_dataframe=True).unstack()

fetch_static_features(stn_id: str | list = None, static_features: str | list = None) -> DataFrame

Fetches all or selected static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch.rr import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_static_features('224214A')
>>> camels.static_features
>>> camels.fetch_static_features('224214A',
... static_features=['elev_mean', 'relief', 'ksat', 'pop_mean'])

fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, as_ts: bool = False, st: str | None = None, en: str | None = None, **kwargs) -> DataFrame

Fetches features for one station.

Parameters:
  • station – station id/gauge id for which the data is to be fetched.

  • dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch

  • static_features – names of static features/attributes to be fetched

  • as_ts (bool) – whether static features are to be converted into a time series or not. If yes, the returned time series will be of the same length as that of the dynamic attributes.

  • st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from where it is available.

  • en (str, optional) – end point of data to be fetched. By default, the data will be fetched up to the last available date.

Returns:

dataframe if as_ts is True else it returns a dictionary of static and dynamic features for a station/gauge_id

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch.rr import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.fetch_station_features('912101A')

fetch_stations_features(stations: str | List[str], dynamic_features: str | List[str] = 'all', static_features: str | List[str] = None, st: str | Timestamp = None, en: str | Timestamp = None, as_dataframe: bool = False, **kwargs)

Reads features of more than one station.

Parameters:
  • stations – list of stations for which data is to be fetched.

  • dynamic_features – list of dynamic features to be fetched. If all, then all dynamic features will be fetched.

  • static_features – list of static features to be fetched. If all, then all static features will be fetched. If None, then no static attribute will be fetched.

  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • as_dataframe – whether to return the dynamic data as pandas dataframe. Default is xr.Dataset object.

  • kwargs (dict) – additional keyword arguments

Returns:

Dynamic and static features of multiple stations. Dynamic features are by default returned as xr.Dataset unless as_dataframe is True, in which case a pandas dataframe with multiindex is returned.

If xr.Dataset, it consists of data_vars equal to the number of stations and, for each station, the DataArray has dimensions (time, dynamic_features), where the length of time is defined by st and en. When the returned object is a pandas DataFrame, the first index is time and the second index is dynamic_features.

Static features are always returned as pandas DataFrame and have shape (stations, static_features). If dynamic_features is None, then they are not returned and the returned value only consists of static features. The same holds true for static_features. If both are not None, then the returned type is a dictionary with static and dynamic keys.

Raises:

ValueError, if both dynamic_features and static_features are None

Examples

>>> from aqua_fetch.rr import CAMELS_AUS
>>> dataset = CAMELS_AUS()
... # find out station ids
>>> dataset.stations()
... # get data of selected stations as xarray Dataset
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'])
... # get data of selected stations as pandas DataFrame
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...  as_dataframe=True)
... # get both dynamic and static features of selected stations
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
... dynamic_features=['streamflow_mmd', 'tmax_AWAP'], static_features=['elev_mean'])

get_boundary(catchment_id: str, as_type: str = 'numpy')

returns boundary of a catchment in a required format

Parameters:
  • catchment_id (str) – name/id of catchment

  • as_type (str) – ‘numpy’ or ‘geopandas’

Examples

>>> from aqua_fetch.rr import CAMELS_SE
>>> dataset = CAMELS_SE()
>>> dataset.get_boundary(dataset.stations()[0])

plot_catchment(catchment_id: str, ax: Axes = None, show: bool = True, **kwargs) -> Axes

plots catchment boundaries

Parameters:
  • catchment_id (str) – name/id of catchment

  • ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.

  • show (bool)

  • **kwargs

Return type:

plt.Axes

Examples

>>> import matplotlib.pyplot as plt
>>> from aqua_fetch.rr import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> catchment_id = dataset.stations()[0]  # plot_catchment requires a catchment id
>>> dataset.plot_catchment(catchment_id)
>>> dataset.plot_catchment(catchment_id, marker='o', ms=0.3)
>>> ax = dataset.plot_catchment(catchment_id, marker='o', ms=0.3, show=False)
>>> ax.set_title("Catchment Boundaries")
>>> plt.show()

plot_stations(stations: List[str] = 'all', marker='.', ax: Axes = None, show: bool = True, **kwargs) -> Axes

plots coordinates of stations

Parameters:
  • stations – name/names of stations. If not given, all stations will be plotted

  • marker – marker to use.

  • ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.

  • show (bool)

  • **kwargs

Return type:

plt.Axes

Examples

>>> import matplotlib.pyplot as plt
>>> from aqua_fetch.rr import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.plot_stations()
>>> dataset.plot_stations(['1', '2', '3'])
>>> dataset.plot_stations(marker='o', ms=0.3)
>>> ax = dataset.plot_stations(marker='o', ms=0.3, show=False)
>>> ax.set_title("Stations")
>>> plt.show()
q_mmd(stations: str | List[str] = 'all') DataFrame[source]

returns streamflow in units of millimeters per day. This is obtained by dividing q by area.

Parameters:

stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations

Returns:

a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame
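As a rough illustration of the conversion, the sketch below turns a discharge into a depth per day. It assumes that q is given in cubic metres per second and area in square kilometres; the function name cms_to_mmd is hypothetical and not part of the package.

```python
SECONDS_PER_DAY = 86_400


def cms_to_mmd(q_cms: float, area_km2: float) -> float:
    """Convert discharge (m3/s) to runoff depth (mm/day) over a catchment."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    volume_m3_per_day = q_cms * SECONDS_PER_DAY   # m3 of water per day
    area_m2 = area_km2 * 1_000_000.0              # km2 -> m2
    depth_m_per_day = volume_m3_per_day / area_m2
    return depth_m_per_day * 1_000.0              # m -> mm


print(cms_to_mmd(10.0, 864.0))
```

Normalising by area in this way makes streamflow comparable across catchments of different sizes and puts it in the same units as precipitation.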

stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch.rr import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('2004')  # returns coordinates of station whose id is 2004
>>> dataset.stn_coords(['2004', '6004'])  # returns coordinates of two stations
>>> from aqua_fetch.rr import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
>>> dataset.stn_coords(['G0050115', '912101A'])  # returns coordinates of two stations
transform_coords(xyz: ndarray) ndarray[source]

transforms coordinates from projected to geographic

must be implemented in subclasses

transform_stn_coords(df: DataFrame) DataFrame[source]

transforms coordinates from geographic to projected

must be implemented in subclasses

class aqua_fetch.rr._gsha._GSHA(gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: Camels

Parent class for those datasets that use static and dynamic features from the GSHA dataset. The following dataset classes are based on this class:

  • Japan

  • Thailand

  • Spain

__init__(gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class
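The three cases described for the path argument can be sketched as follows. This is only an illustration under stated assumptions: resolve_data_dir and the default directory name are hypothetical, and the sketch assumes that an existing directory already holds the data.

```python
from pathlib import Path
from typing import Optional, Tuple, Union


def resolve_data_dir(
    path: Optional[Union[str, Path]],
    default_dir: Union[str, Path],
) -> Tuple[Path, bool]:
    """Return (directory to use, whether a download is needed).

    Hypothetical helper; assumes an existing directory already holds the data.
    """
    # No path given: fall back to the default directory.
    target = Path(default_dir) if path is None else Path(path)
    # An existing directory is read from; a missing one triggers a download.
    needs_download = not target.is_dir()
    return target, needs_download


# No path given, so the (hypothetical) default directory is used.
directory, download = resolve_data_dir(None, Path.home() / "aqua_fetch_data")
print(directory, download)
```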

fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', st=None, en=None, as_ts=False) DataFrame[source]

returns static attributes of one or multiple stations

Parameters:
  • stations (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

  • st

  • en

  • as_ts

Examples

>>> from aqua_fetch.rr import Japan
>>> dataset = Japan()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    12004
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (12004, 27)
get static data of one station only
>>> static_data = dataset.fetch_static_features('01010070')
>>> static_data.shape
   (1, 27)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
   (12004, 2)
fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]

returns features of multiple stations

Examples

>>> from aqua_fetch import Arcticnet
>>> dataset = Arcticnet()
>>> stations = dataset.stations()
>>> features = dataset.fetch_stations_features(stations)
get_boundary(catchment_id: str, as_type: str = 'numpy')[source]

returns boundary of a catchment in a required format

Parameters:
  • catchment_id (str) – name/id of catchment

  • as_type (str) – ‘numpy’ or ‘geopandas’

Examples

>>> from aqua_fetch.rr import Japan
>>> dataset = Japan()
>>> dataset.get_boundary(dataset.stations()[0])
class aqua_fetch.Arcticnet(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _GSHA

Data of 106 catchments of the Arctic region from the r-arcticnet project. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1979-01-01 to 2003-12-31.

__init__(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

class aqua_fetch.Bull(path, overwrite=False, **kwargs)[source]

Bases: Camels

This dataset follows the work of Aparicio et al., 2024. The data is taken from the Zenodo repository. It contains 484 stations with 55 dynamic (time series) features and 214 static features. The dynamic features span from 1951 to 2021.

Examples

>>> from aqua_fetch import Bull
>>> dataset = Bull()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(1426260, 48)  # 48 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['BULL_9007'].unstack().shape  # the name of station could be different
(25932, 55)
If we don't set as_dataframe=True, then the returned data will be a xarray Dataset
>>> data = dataset.fetch(0.1)
>>> type(data)
    xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 25932, 'dynamic_features': 55})
>>> len(data.data_vars)
    48
>>> df = dataset.fetch(stations=1, as_dataframe=True)  # get data of only one random station
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(25932, 55)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
484
# get data by station id
>>> df = dataset.fetch(stations='BULL_9007', as_dataframe=True).unstack()
>>> df.shape
(25932, 55)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['potential_evapotranspiration_AEMET',  'temperature_mean_AEMET',
... 'total_precipitation_ERA5_Land', 'obs_q_cms']).unstack()
>>> df.shape
(25932, 4)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(1426260, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='BULL_9007', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 214), (1426260, 1))
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (484, 2)
>>> dataset.stn_coords('BULL_9007')  # returns coordinates of station whose id is BULL_9007
    41.298  -1.967
>>> dataset.stn_coords(['BULL_9007', 'BULL_8083'])  # returns coordinates of two stations
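The shapes in the example follow from how the (time, dynamic_features) multi-index stacks the data. The sketch below reproduces that arithmetic with the numbers quoted in the example (25932 time steps, 55 dynamic features, 48 stations). The helper names are hypothetical and the code does not touch the dataset itself.

```python
from typing import Tuple


def stacked_shape(n_timesteps: int, n_features: int, n_stations: int) -> Tuple[int, int]:
    """Shape of the multi-indexed DataFrame: one row per (time, feature) pair,
    one column per station."""
    return (n_timesteps * n_features, n_stations)


def unstacked_shape(n_timesteps: int, n_features: int) -> Tuple[int, int]:
    """Shape for a single station after .unstack(): time steps by features."""
    return (n_timesteps, n_features)


print(stacked_shape(25932, 55, 48))
print(unstacked_shape(25932, 55))
```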
__init__(path, overwrite=False, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

caravan_attributes() DataFrame[source]

a dataframe of shape (484, 10)

fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import Bull
>>> dataset = Bull()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    484
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (484, 214)
get static data of one station only
>>> static_data = dataset.fetch_static_features('BULL_9007')
>>> static_data.shape
   (1, 214)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['seasonality', 'moisture_index'])
>>> static_data.shape
   (484, 2)
>>> data = dataset.fetch_static_features('BULL_9007', static_features=['seasonality', 'moisture_index'])
>>> data.shape
   (1, 2)
hydroatlas_attributes() DataFrame[source]

a dataframe of shape (484, 197)

other_attributes() DataFrame[source]

a dataframe of shape (484, 7)

class aqua_fetch.rr.CABra(path=None, overwrite=False, to_netcdf: bool = True, met_src: str = 'ens', **kwargs)[source]

Bases: Camels

Reads and fetches the CABra dataset, a catchment attribute dataset following the work of Almagro et al., 2021. This dataset consists of 97 static and 12 dynamic features of 735 Brazilian catchments. The temporal extent is from 1980 to 2010. The dynamic features consist of daily hydro-meteorological time series.

Examples

>>> from aqua_fetch.rr import CABra
>>> dataset = CABra()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(131472, 73)  # 73 represents number of stations
>>> data.index.names == ['time', 'dynamic_features']
True
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(10956, 12)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
735
# get data by station id
>>> df = dataset.fetch(stations='92', as_dataframe=True).unstack()
>>> df.shape
(10956, 12)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['p_ens', 'tmax_ens', 'pet_pm', 'rh_ens', 'Streamflow']).unstack()
>>> df.shape
(10956, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(131472, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='92', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 97), (131472, 1))
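Passing a fraction such as 0.1 to fetch selects a random subset of stations. The example output (73 of 735 stations) is consistent with truncating the product, so the sketch below assumes truncation. The helper pick_random_stations and the placeholder ids are hypothetical and only illustrate the idea.

```python
import random
from typing import List, Optional


def pick_random_stations(
    stations: List[str],
    fraction: float,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick a random subset holding roughly `fraction` of the stations."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the interval (0, 1]")
    # Truncation (not rounding) is an assumption based on the example output.
    n_selected = int(len(stations) * fraction)
    return random.Random(seed).sample(stations, n_selected)


all_stations = [str(i) for i in range(1, 736)]  # 735 placeholder ids
subset = pick_random_stations(all_stations, 0.1, seed=0)
print(len(subset))
```

Because the selection is random, two calls with the same fraction generally return different stations, which is why the examples note that station names could differ.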
__init__(path=None, overwrite=False, to_netcdf: bool = True, met_src: str = 'ens', **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once, and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • overwrite (bool) – If the data is already downloaded, then you can set it to True to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require the netCDF4 package as well as xarray.

  • met_src (str) – source of meteorological data, must be one of ens, era5 or ref.
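Since met_src accepts only three values, a caller may want to check the value before constructing the class. The following is a minimal sketch of such a check; validate_met_src is a hypothetical helper, not a function of the package.

```python
ALLOWED_MET_SRC = ("ens", "era5", "ref")


def validate_met_src(met_src: str) -> str:
    """Return met_src unchanged if it is one of the documented options."""
    if met_src not in ALLOWED_MET_SRC:
        raise ValueError(
            f"met_src must be one of {ALLOWED_MET_SRC}, got {met_src!r}"
        )
    return met_src


print(validate_met_src("era5"))
```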

add_attrs() DataFrame[source]

Returns additional catchment attributes

climate_attrs() DataFrame[source]

returns climate attributes for all catchments

fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch.rr import CABra
>>> dataset = CABra()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    735
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (735, 97)
get static data of one station only
>>> static_data = dataset.fetch_static_features('92')
>>> static_data.shape
   (1, 97)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['gauge_lat', 'area'])
>>> static_data.shape
   (735, 2)
>>> data = dataset.fetch_static_features('92', static_features=['gauge_lat', 'area'])
>>> data.shape
   (1, 2)
general_attrs() DataFrame[source]

returns general attributes for all catchments

geology_attrs() DataFrame[source]

returns geological attributes for all catchments

gw_attrs() DataFrame[source]

returns groundwater attributes for all catchments

hydro_distrub_attrs() DataFrame[source]

returns hydrologic disturbance attributes for all catchments

lc_attrs() DataFrame[source]

returns land cover attributes for all catchments

q_attrs() DataFrame[source]

returns streamflow attributes for all catchments

q_mmd(stations: str | List[str] = 'all') DataFrame[source]

returns streamflow in units of millimeters per day. It is obtained by dividing the Streamflow time series by area.

Parameters:

stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations

Returns:

a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame

soil_attrs() DataFrame[source]

returns soil attributes for all catchments

property static_features: List[str]

names of static features

stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.

Return type:

pd.DataFrame

Examples

>>> dataset = CABra()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('92')  # returns coordinates of station whose id is 92
>>> dataset.stn_coords(['92', '142'])  # returns coordinates of two stations
topology_attrs() DataFrame[source]

returns topology attributes for all catchments

class aqua_fetch.rr.CAMELS_AUS(path: str = None, version: int = 2, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: Camels

This is a dataset of 561 Australian catchments with 187 static features and 26 dynamic features for each catchment. The dynamic features are time series from 1950-01-01 to 2022-03-31. This class reads the CAMELS-AUS dataset of Fowler et al., 2024.

If version is 1, then this class reads data following Fowler et al., 2021, which is a dataset of 222 Australian catchments with 161 static features and 26 dynamic features for each catchment. The dynamic features are time series from 1957-01-01 to 2018-12-31.
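The differences between the two versions can be summarised in a small lookup table. The sketch below only restates the numbers given in the description. The names CAMELS_AUS_VERSIONS and describe_version are hypothetical and not part of the package.

```python
from typing import Dict, Union

# Figures copied from the class description above.
CAMELS_AUS_VERSIONS: Dict[int, Dict[str, Union[int, str]]] = {
    1: {
        "catchments": 222,
        "static_features": 161,
        "dynamic_features": 26,
        "start": "1957-01-01",
        "end": "2018-12-31",
    },
    2: {
        "catchments": 561,
        "static_features": 187,
        "dynamic_features": 26,
        "start": "1950-01-01",
        "end": "2022-03-31",
    },
}


def describe_version(version: int) -> Dict[str, Union[int, str]]:
    """Return the documented summary for a CAMELS-AUS version (1 or 2)."""
    if version not in CAMELS_AUS_VERSIONS:
        raise ValueError("version must be 1 or 2")
    return CAMELS_AUS_VERSIONS[version]


print(describe_version(2)["catchments"])
```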

Examples

>>> from aqua_fetch.rr import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
   (21184, 26)
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   222
... # get data of 10 % of stations as dataframe
>>> df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
   (550784, 22)
... # The returned dataframe is a multi-indexed data
>>> df.index.names == ['time', 'dynamic_features']
    True
... # get data by station id
>>> df = dataset.fetch(stations='224214A', as_dataframe=True).unstack()
>>> df.shape
    (21184, 26)
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> data = dataset.fetch(1, as_dataframe=True,
...  dynamic_features=['tmax_AWAP', 'precipitation_AWAP', 'et_morton_actual_SILO', 'streamflow_MLd']).unstack()
>>> data.shape
   (21184, 4)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multiindexed dataframe
   (550784, 10)
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='224214A', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 166), (550784, 1))
__init__(path: str = None, version: int = 2, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path – path where the CAMELS_AUS dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.

  • version – version of the dataset to download. Allowed values are 1 and 2.

  • to_netcdf

fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]

Fetches static features of one or more stations as dataframe.

Parameters:
  • stn_id (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from aqua_fetch.rr import CAMELS_AUS
>>> dataset = CAMELS_AUS()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    222
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (222, 161)
get static data of one station only
>>> static_data = dataset.fetch_static_features('305202')
>>> static_data.shape
   (1, 161)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['catchment_di', 'elev_mean'])
>>> static_data.shape
   (222, 2)
q_mmd(stations: str | List[str] = None) DataFrame[source]

returns streamflow in units of millimeters per day. This is obtained by dividing q_cms by area.

Parameters:

stations (str/list) – name/names of stations. Default is None, which will return streamflow of all stations

Returns:

a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame
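The class example lists streamflow_MLd (megalitres per day) among the dynamic features. Whether q_mmd starts from that series or from discharge in cubic metres per second is not stated here. Assuming the area is given in square kilometres, converting megalitres per day to millimetres per day reduces to a plain division, because 1 ML spread over 1 km2 is a depth of 1 mm. The helper name mld_to_mmd is hypothetical.

```python
def mld_to_mmd(q_mld: float, area_km2: float) -> float:
    """Convert streamflow in ML/day to runoff depth in mm/day."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    volume_m3_per_day = q_mld * 1_000.0    # 1 ML = 1000 m3
    area_m2 = area_km2 * 1_000_000.0       # km2 -> m2
    depth_m_per_day = volume_m3_per_day / area_m2
    return depth_m_per_day * 1_000.0       # m -> mm


print(mld_to_mmd(50.0, 25.0))
```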

class aqua_fetch.rr.CAMELS_BR(path=None, verbosity: int = 1, **kwargs)[source]

Bases: Camels

This is a dataset of 897 Brazilian catchments with 67 static features and 10 dynamic features for each catchment. The dynamic features are time series from 1920-01-01 to 2019-02-28. This class downloads and processes the CAMELS dataset of Brazil as provided by Chagas et al., 2020. The simulated streamflow of 593 stations and the raw streamflow of 3679 stations shipped with this data are not included in the dynamic features. Both can be fetched through the fetch_simulated_streamflow and fetch_raw_streamflow methods.

Examples

>>> from aqua_fetch.rr import CAMELS_BR
>>> dataset = CAMELS_BR()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(14245, 12)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
593
# we can get data of 10% catchments as below
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(170940, 59)
# the data is multi-index with ``time`` and ``dynamic_features`` as indices
>>> data.index.names == ['time', 'dynamic_features']
 True
# get data by station id
>>> df = dataset.fetch(stations='46035000', as_dataframe=True).unstack()
>>> df.shape
(14245, 12)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['precipitation_cpc', 'evapotransp_mgb', 'temperature_mean', 'streamflow_m3s']).unstack()
>>> df.shape
(14245, 4)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(170940, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='46035000', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 67), (170940, 1))
__init__(path=None, verbosity: int = 1, **kwargs)[source]
Parameters:

path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once, and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

all_stations(feature: str) List[str][source]

Returns all station ids for which data of a specific attribute is available.

area(stations: str | List[str] = 'all', source: str = 'gsim') Series[source]

Returns area (km2) of all catchments as a pandas series

Parameters:
  • stations (str/list) – name/names of stations. Default is all, which will return area of all stations

  • source (str) – source of area calculation. It should be either gsim or ana

Returns:

a pandas series whose indices are catchment ids and values are areas of corresponding catchments.

Return type:

pd.Series

Examples

>>> from aqua_fetch.rr import CAMELS_BR
>>> dataset = CAMELS_BR()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('65100000')  # returns area of station whose id is 65100000
>>> dataset.area(['65100000', '64075000'])  # returns area of two stations
fetch_raw_streamflow(station_id: str = None) DataFrame[source]

returns raw streamflow data for one or more stations.

Example

>>> dataset = CAMELS_BR()
>>> data = dataset.fetch_raw_streamflow('10500000')
... # fetch all time series data associated with a station.
>>> x = dataset.fetch_raw_streamflow(dataset.all_stations())
fetch_simulated_streamflow(station_id: str = None) DataFrame[source]

returns simulated streamflow data for one or more stations.

Example

>>> dataset = CAMELS_BR()
>>> data = dataset.fetch_simulated_streamflow('10500000')
... # fetch all time series data associated with a station.
>>> x = dataset.fetch_simulated_streamflow(dataset.all_stations())
fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]

fetches static feature/features of one or more stations

Parameters:
  • stn_id (str/list) – station id whose attribute to fetch.

  • static_features (str/list) – name of attribute to fetch. Default is all, which will return all the attributes for a particular station of the specified category.

Example

>>> dataset = CAMELS_BR()
>>> df = dataset.fetch_static_features('11500000', 'climate')
# read all static features of all stations
>>> data = dataset.fetch_static_features(dataset.stations(), dataset.static_features)
>>> data.shape
(597, 67)
q_mmd(stations: str | List[str] = 'all') DataFrame[source]

returns streamflow in units of millimeters per day. The name of the original time series is streamflow_mm.

Parameters:

stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations

Returns:

a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame

stations() List[str][source]

Returns a list of station ids.

Example

>>> dataset = CAMELS_BR()
>>> stations = dataset.stations()
stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.

Return type:

pd.DataFrame

Examples

>>> dataset = CAMELS_BR()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('65100000')  # returns coordinates of station whose id is 65100000
>>> dataset.stn_coords(['65100000', '64075000'])  # returns coordinates of two stations
class aqua_fetch.rr.CAMELS_CH(path=None, overwrite: bool = False, to_netcdf: bool = True, timestep: str = 'D', **kwargs)[source]

Bases: Camels

Data of 331 Swiss catchments from Hoege et al., 2023. The dataset consists of 209 static catchment features and 9 dynamic features. The dynamic features span from 1981-01-01 to 2020-12-31 with daily timestep. For hourly (H) timestep, only streamflow is available, for 170 Swiss catchments. The hourly (H) streamflow data is obtained from Kauzlaric et al., 2023.

Examples

>>> from aqua_fetch.rr import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(128560, 10)
>>> data.index.names == ['time', 'dynamic_features']
True
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(8036, 9)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
331
# get data by station id
>>> df = dataset.fetch(stations='2004', as_dataframe=True).unstack()
>>> df.shape
(8036, 9)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True, dynamic_features=['precipitation(mm/d)', 'temperature_mean(°C)', 'discharge_vol(m3/s)']).unstack()
>>> df.shape
(8036, 3)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(72324, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='2004', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 209), (72324, 1))
__init__(path=None, overwrite: bool = False, to_netcdf: bool = True, timestep: str = 'D', **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once, and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • overwrite (bool) – If the data is already downloaded, then you can set it to True to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require the netCDF4 package as well as xarray.

all_hourly_stations() List[str][source]

Names of all stations which have hourly data

climate_attrs() DataFrame[source]

returns 14 climate attributes of catchments.

fetch_static_features(stn_id: str | list = 'all', static_features: str | list = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch.rr import CAMELS_CH
>>> dataset = CAMELS_CH()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    331
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (331, 209)
get static data of one station only
>>> static_data = dataset.fetch_static_features('2004')
>>> static_data.shape
   (1, 209)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['gauge_lon', 'gauge_lat', 'area'])
>>> static_data.shape
   (331, 3)
>>> data = dataset.fetch_static_features('2004', static_features=['gauge_lon', 'gauge_lat', 'area'])
>>> data.shape
   (1, 3)
foen_stations() List[str][source]

Returns all the stations in the FOEN folder

geol_attrs() DataFrame[source]

15 geological features

glacier_attrs() DataFrame[source]
returns a dataframe with four columns
  • ‘glac_area’

  • ‘glac_vol’

  • ‘glac_mass’

  • ‘glac_area_neighbours’

hourly_stations() List[str][source]

IDs of those stations which have hourly data and which are also part of CAMELS-CH dataset
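The relation between all_hourly_stations, stations and hourly_stations amounts to an intersection of two id lists. The sketch below illustrates this with made-up ids. The function name hourly_camels_stations and the ids are hypothetical, and the real method may order or filter its result differently.

```python
from typing import List


def hourly_camels_stations(all_hourly: List[str], camels: List[str]) -> List[str]:
    """Keep only those hourly stations that are also CAMELS-CH stations,
    preserving the order of `all_hourly`."""
    camels_set = set(camels)  # set lookup keeps the filter linear in size
    return [stn for stn in all_hourly if stn in camels_set]


all_hourly = ["2004", "2007", "9999"]   # placeholder ids with hourly data
camels = ["2004", "2007", "6004"]       # placeholder CAMELS-CH ids
print(hourly_camels_stations(all_hourly, camels))
```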

human_inf_attrs() DataFrame[source]

14 anthropogenic factors

hydrogeol_attrs() DataFrame[source]

10 hydrogeological factors

hydrol_attrs() DataFrame[source]

14 hydrological parameters + 2 useful infos

landcolover_attrs() DataFrame[source]

13 landcover parameters

soil_attrs() DataFrame[source]

80 soil parameters

stations() List[str][source]

Returns station ids for catchments

supp_geol_attrs() DataFrame[source]

supplementary geological features

topo_attrs() DataFrame[source]

topographic parameters

class aqua_fetch.rr.CAMELS_CL(path: str = None, **kwargs)[source]

Bases: Camels

This is a dataset of 516 Chilean catchments with 104 static features and 12 dynamic features for each catchment. The dynamic features are time series from 1913-02-15 to 2018-03-09. This class downloads and processes the CAMELS dataset of Chile following the work of Alvarez-Garreton et al., 2018.

Examples

>>> from aqua_fetch.rr import CAMELS_CL
>>> dataset = CAMELS_CL()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
    (38374, 12)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
516
# we can get data of 10% catchments as below
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(460488, 51)
# the data is multi-index with ``time`` and ``dynamic_features`` as indices
>>> data.index.names == ['time', 'dynamic_features']
 True
# get data by station id
>>> df = dataset.fetch(stations='8350001', as_dataframe=True).unstack()
>>> df.shape
(38374, 12)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['pet_hargreaves', 'precip_tmpa', 'tmean_cr2met', 'streamflow_m3s']).unstack()
>>> df.shape
(38374, 4)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(460488, 10)
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='8350001', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 104), (460488, 1))
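The shapes printed in the examples above follow a simple pattern: the long-format dataframe has one row per (time step, dynamic feature) pair and one column per station. The sketch below reproduces that arithmetic. The helper name `expected_long_shape` is hypothetical and not part of the library, and truncating the fractional station count is an assumption inferred from the printed shapes (0.1 of 516 stations giving 51 columns), not something confirmed from the source code.

```python
def expected_long_shape(n_timesteps, n_dynamic_features, stations, n_total_stations):
    """Return the (rows, columns) shape expected for a long-format fetch result.

    ``stations`` may be an int (number of stations) or a float (fraction of
    ``n_total_stations``). Truncating the fraction is an assumption inferred
    from the shapes printed in the examples, not from the library source.
    """
    if isinstance(stations, float):
        n_stations = int(stations * n_total_stations)
    else:
        n_stations = int(stations)
    # one row per (time step, dynamic feature) pair, one column per station
    return n_timesteps * n_dynamic_features, n_stations


# CAMELS_CL: 38374 daily steps, 12 dynamic features, 516 stations in total
print(expected_long_shape(38374, 12, 0.1, 516))  # (460488, 51)
print(expected_long_shape(38374, 12, 10, 516))   # (460488, 10)
```

The same arithmetic applies to the other datasets in this section once their own step, feature, and station counts are substituted.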
__init__(path: str = None, **kwargs)[source]
Parameters:

path – path where the CAMELS-CL dataset has been downloaded. This path must contain five zip files and one xlsx file.

fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all')[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from aqua_fetch.rr import CAMELS_CL
>>> dataset = CAMELS_CL()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    516
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (516, 104)
get static data of one station only
>>> static_data = dataset.fetch_static_features('11315001')
>>> static_data.shape
   (1, 104)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope_mean', 'area'])
>>> static_data.shape
   (516, 2)
>>> data = dataset.fetch_static_features('2110002', static_features=['slope_mean', 'area'])
>>> data.shape
   (1, 2)
stations() list[source]

Returns the ids of all stations for which data of a specific attribute is available.

stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.

Return type:

coords

Examples

>>> dataset = CAMELS_CL()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('12872001')  # returns coordinates of station whose id is 12872001
>>> dataset.stn_coords(['12872001', '12876004'])  # returns coordinates of two stations
class aqua_fetch.rr.CAMELS_GB(path=None, **kwargs)[source]

Bases: Camels

This is a dataset of 671 catchments with 145 static features and 10 dynamic features for each catchment following the work of Coxon et al., 2020. The dynamic features are time series from 1970-10-01 to 2015-09-30. The data is downloaded from the CEH website.

Examples

>>> from aqua_fetch.rr import CAMELS_GB
>>> dataset = CAMELS_GB()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
 (164360, 67)
>>> data.index.names == ['time', 'dynamic_features']
True
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(16436, 10)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
671
# get data by station id
>>> df = dataset.fetch(stations='97002', as_dataframe=True).unstack()
>>> df.shape
(16436, 10)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['windspeed', 'temperature', 'pet', 'precipitation', 'discharge_vol']).unstack()
>>> df.shape
(16436, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(164360, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='97002', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 290), (164360, 1))
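Several examples above call `.unstack()` on the fetched dataframe. The following sketch uses a small synthetic frame (not real CAMELS_GB data; the station ids and values are made up) to show what that reshaping does in plain pandas: the `dynamic_features` level of the row index becomes the columns, leaving one row per time step.

```python
import pandas as pd

# Synthetic stand-in for a long-format result: rows are (time, dynamic_features)
# pairs and columns are stations. Station ids and values are made up.
times = pd.date_range("2000-01-01", periods=3, freq="D")
features = ["precipitation", "discharge_vol"]
index = pd.MultiIndex.from_product(
    [times, features], names=["time", "dynamic_features"]
)
long_df = pd.DataFrame(
    {
        "stn_a": [1.0, 10.0, 2.0, 20.0, 3.0, 30.0],
        "stn_b": [4.0, 40.0, 5.0, 50.0, 6.0, 60.0],
    },
    index=index,
)

# Selecting one station gives a Series; unstacking moves the innermost index
# level (dynamic_features) into the columns, leaving one row per time step.
wide = long_df["stn_a"].unstack()

print(long_df.shape)  # (6, 2): 3 time steps x 2 features, 2 stations
print(wide.shape)     # (3, 2): 3 time steps, 2 features
```

Working with the wide frame is usually more convenient for modeling, since each dynamic feature can then be addressed as an ordinary column.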
__init__(path=None, **kwargs)[source]
Parameters:

path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]

Fetches static features of one or more stations for one or more categories as a dataframe.

Parameters:
  • stn_id (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from aqua_fetch.rr import CAMELS_GB
>>> dataset = CAMELS_GB(path="path/to/CAMELS_GB")
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    671
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (671, 145)
get static data of one station only
>>> static_data = dataset.fetch_static_features('85004')
>>> static_data.shape
   (1, 145)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['area', 'elev_mean'])
>>> static_data.shape
   (671, 2)
class aqua_fetch.rr.CAMELS_US(data_source: str = 'basin_mean_daymet', path=None, **kwargs)[source]

Bases: Camels

This is a dataset of 671 US catchments with 59 static features and 8 dynamic features for each catchment. The dynamic features are time series from 1980-01-01 to 2014-12-31. This class downloads and processes the CAMELS dataset of 671 catchments from ucar.edu following Newman et al., 2015, Newman et al., 2022 and Addor et al., 2017.

Examples

>>> from aqua_fetch.rr import CAMELS_US
>>> dataset = CAMELS_US()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(12784, 8)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
671
# we can get data of 10% catchments as below
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(102272, 67)
# the data is multi-index with ``time`` and ``dynamic_features`` as indices
>>> data.index.names == ['time', 'dynamic_features']
 True
# get data by station id
>>> df = dataset.fetch(stations='11478500', as_dataframe=True).unstack()
>>> df.shape
(12784, 8)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['prcp(mm/day)', 'srad(W/m2)', 'tmax(C)', 'tmin(C)', 'Flow']).unstack()
>>> df.shape
(12784, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(102272, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='11478500', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 59), (102272, 1))
__init__(data_source: str = 'basin_mean_daymet', path=None, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • data_source (str) –

    allowed values are
    • basin_mean_daymet

    • basin_mean_maurer

    • basin_mean_nldas

    • basin_mean_v1p15_daymet

    • basin_mean_v1p15_nldas

    • elev_bands

    • hru
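Since `data_source` only accepts the values listed above, it can help to check a user-supplied value before constructing the dataset. The following is a minimal sketch; `ALLOWED_DATA_SOURCES` and `check_data_source` are hypothetical names used for illustration and are not part of the library, which may perform its own validation.

```python
# Values copied from the parameter description above.
ALLOWED_DATA_SOURCES = (
    "basin_mean_daymet",
    "basin_mean_maurer",
    "basin_mean_nldas",
    "basin_mean_v1p15_daymet",
    "basin_mean_v1p15_nldas",
    "elev_bands",
    "hru",
)


def check_data_source(data_source: str = "basin_mean_daymet") -> str:
    """Return ``data_source`` unchanged if it is one of the documented values."""
    if data_source not in ALLOWED_DATA_SOURCES:
        allowed = ", ".join(ALLOWED_DATA_SOURCES)
        raise ValueError(
            f"Unknown data_source {data_source!r}; expected one of: {allowed}"
        )
    return data_source


print(check_data_source("basin_mean_nldas"))  # basin_mean_nldas
```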

fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all')[source]

gets one or more static features of one or more stations

Parameters:
  • stn_id (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from aqua_fetch.rr import CAMELS_US
>>> camels = CAMELS_US()
>>> st_data = camels.fetch_static_features('11532500')
>>> st_data.shape
   (1, 59)
get names of available static features
>>> camels.static_features
get specific features of one station
>>> static_data = camels.fetch_static_features('11528700',
... static_features=['area_gages2', 'geol_porostiy', 'soil_conductivity', 'elev_mean'])
>>> static_data.shape
   (1, 4)
get names of all stations
>>> all_stns = camels.stations()
>>> len(all_stns)
   671
>>> all_static_data = camels.fetch_static_features(all_stns)
>>> all_static_data.shape
   (671, 59)
class aqua_fetch.rr.CAMELS_DE(path=None, overwrite: bool = False, to_netcdf: bool = True, verbosity: int = 1, **kwargs)[source]

Bases: Camels

This is the data from 1555 German catchments following the work of Loritz et al., 2024. The data is downloaded from zenodo. This data consists of 111 static and 21 dynamic features. The dynamic features span from 1951-01-01 to 2020-12-31 with a daily timestep.

Examples

>>> from aqua_fetch.rr import CAMELS_DE
>>> dataset = CAMELS_DE()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
   (25568, 21)
get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   1555
get data of 10 % of stations as dataframe
>>> df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
    (536928, 155)
The returned dataframe is a multi-indexed data
>>> df.index.names == ['time', 'dynamic_features']
    True
get data by station id
>>> df = dataset.fetch(stations='DE110260', as_dataframe=True).unstack()
>>> df.shape
    (25568, 21)
get names of available dynamic features
>>> dataset.dynamic_features
get only selected dynamic features
>>> data = dataset.fetch(1, as_dataframe=True,
...  dynamic_features=['temperature_mean', 'humidity_mean', 'precipitation_mean', 'discharge_vol']).unstack()
>>> data.shape
    (25568, 4)
get names of available static features
>>> dataset.static_features
get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multiindexed dataframe
    (536928, 10)
when we get both static and dynamic data, the returned data is a dictionary
with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='DE110260', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
    ((1, 111), (536928, 1))
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (1555, 2)
>>> dataset.stn_coords('DE110250')  # returns coordinates of station whose id is DE110250
    47.925221       8.191595
>>> dataset.stn_coords(['DE110250', 'DE110260'])  # returns coordinates of two stations
__init__(path=None, overwrite: bool = False, to_netcdf: bool = True, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but will require the netCDF4 package as well as xarray.

fetch_static_features(stn_id: str | list = 'all', static_features: str | list = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch.rr import CAMELS_DE
>>> dataset = CAMELS_DE()
get the names of stations
>>> stns = dataset.stations()
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (1555, 111)
get static data of one station only
>>> static_data = dataset.fetch_static_features('DE110010')
>>> static_data.shape
   (1, 111)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['p_mean', 'p_seasonality', 'frac_snow'])
>>> static_data.shape
   (1555, 3)
>>> data = dataset.fetch_static_features('DE110000', static_features=['p_mean', 'p_seasonality', 'frac_snow'])
>>> data.shape
   (1, 3)
class aqua_fetch.rr.CAMELS_DK(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]

Bases: Camels

This is an updated version of the water_datasets.rr.CAMELS_DK0 dataset. This dataset was presented by Liu et al., 2024 and is available at dataverse. It consists of 119 static and 13 dynamic features from 3330 Danish catchments. The dynamic (time series) features span from 1989-01-02 to 2023-12-31 with a daily timestep. However, streamflow observations are available for only 304 catchments.

Examples

>>> from aqua_fetch.rr import CAMELS_DK
>>> dataset = CAMELS_DK()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(166166, 30)  # 30 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['54130033'].unstack().shape
(12782, 13)
If we don't set as_dataframe=True, then the returned data will be a xarray Dataset
>>> data = dataset.fetch(0.1)
>>> type(data)
    xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 12782, 'dynamic_features': 13})
>>> len(data.data_vars)
    30
>>> df = dataset.fetch(stations=1, as_dataframe=True)  # get data of only one random station
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(12782, 13)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
304
# get data by station id
>>> df = dataset.fetch(stations='54130033', as_dataframe=True).unstack()
>>> df.shape
(12782, 13)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['Abstraction', 'pet', 'temperature', 'precipitation', 'Qobs']).unstack()
>>> df.shape
(12782, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(166166, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='54130033', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 119), (166166, 1))
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (304, 2)
>>> dataset.stn_coords('54130033')  # returns coordinates of station whose id is 54130033
    6131379.493     559057.7232
>>> dataset.stn_coords(['54130033', '13210113'])  # returns coordinates of two stations
__init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but will require the netCDF4 package as well as xarray.

property dynamic_features: List[str]

returns names of dynamic features

fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch.rr import CAMELS_DK
>>> dataset = CAMELS_DK()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    304
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (304, 119)
get static data of one station only
>>> static_data = dataset.fetch_static_features('42600042')
>>> static_data.shape
   (1, 119)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope_mean', 'aridity'])
>>> static_data.shape
   (304, 2)
>>> data = dataset.fetch_static_features('42600042', static_features=['slope_mean', 'aridity'])
>>> data.shape
   (1, 2)
static_data() DataFrame[source]

combination of topographic + soil + landuse + geology + climate features

Returns:

a pandas DataFrame of static features of all catchments of shape (3330, 119)

Return type:

pd.DataFrame

property static_features: List[str]

returns static features for Denmark catchments

transform_coords(coords)[source]

Transforms the coordinates to the required format.

transform_stn_coords(df: DataFrame) DataFrame[source]

transforms coordinates from geographic to projected

must be implemented in base classes

class aqua_fetch.rr.Caravan_DK(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]

Bases: Camels

Reads Caravan extension Denmark - Danish dataset for large-sample hydrology, following the works of Koch and Schneider 2022. The dataset is downloaded from zenodo. It consists of static and dynamic features from 308 Danish catchments. There are 38 dynamic (time series) features from 1981-01-02 to 2020-12-31 with a daily timestep and 211 static features for each of the 308 catchments.

Please note that there is an updated version of this dataset following the works of Liu et al., 2024. It is associated with the aqua_fetch.rr.CAMELS_DK class, which can be imported as follows:

>>> from aqua_fetch.rr import CAMELS_DK

Examples

>>> from aqua_fetch.rr import Caravan_DK
>>> dataset = Caravan_DK()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(569751, 30)  # 30 represents number of stations
>>> data.index.names == ['time', 'dynamic_features']
True
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(14609, 39)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
308
# get data by station id
>>> df = dataset.fetch(stations='80001', as_dataframe=True).unstack()
>>> df.shape
(14609, 39)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['snow_depth_water_equivalent_mean', 'temperature_2m_mean',
... 'potential_evaporation_sum', 'total_precipitation_sum', 'streamflow']).unstack()
>>> df.shape
(14609, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(569751, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='80001', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 211), (569751, 1))
__init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but will require the netCDF4 package as well as xarray.

property caravan_attr_fpath

returns path to attributes_caravan_camelsdk.csv file

caravan_static_attributes(stations='all') DataFrame[source]
Return type:

a pandas DataFrame of shape (308, 10)

property dynamic_features: List[str]

returns names of dynamic features

fetch_static_features(stn_id: str | List[str] = 'all', features: str | List[str] = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station/stations of which to extract the data

  • features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch.rr import Caravan_DK
>>> dataset = Caravan_DK()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    308
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (308, 211)
get static data of one station only
>>> static_data = dataset.fetch_static_features('80001')
>>> static_data.shape
   (1, 211)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['gauge_lat', 'area'])
>>> static_data.shape
   (308, 2)
>>> data = dataset.fetch_static_features('80001', features=['gauge_lat', 'area'])
>>> data.shape
   (1, 2)
hyd_atlas_attributes(stations='all') DataFrame[source]
Return type:

a pandas DataFrame of shape (308, 196)

property other_attr_fpath

returns path to attributes_other_camelsdk.csv file

other_static_attributes(stations='all') DataFrame[source]
Return type:

a pandas DataFrame of shape (308, 5)

q_mmd(stations: str | List[str] = 'all') DataFrame[source]

returns streamflow in units of millimeters per day. This is obtained by dividing streamflow by catchment area.

Parameters:

stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations

Returns:

a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame
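The unit conversion behind q_mmd can be written out explicitly. The sketch below assumes streamflow is given in cubic meters per second and catchment area in square kilometers; these units and the helper name `cms_to_mm_per_day` are assumptions made for illustration, not details confirmed by the source.

```python
SECONDS_PER_DAY = 86_400
SQUARE_METERS_PER_KM2 = 1_000_000


def cms_to_mm_per_day(q_cms: float, area_km2: float) -> float:
    """Convert streamflow from m^3/s to an area-normalized depth in mm/day.

    Assumes ``q_cms`` is in cubic meters per second and ``area_km2`` is the
    catchment area in square kilometers (both are assumptions of this sketch).
    """
    if area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    volume_m3_per_day = q_cms * SECONDS_PER_DAY
    depth_m_per_day = volume_m3_per_day / (area_km2 * SQUARE_METERS_PER_KM2)
    return depth_m_per_day * 1000  # meters to millimeters


# 10 m^3/s draining 864 km^2 corresponds to roughly 1 mm/day
print(cms_to_mm_per_day(10.0, 864.0))
```

Expressing streamflow as a depth makes it directly comparable with precipitation and evaporation, which are usually reported in mm/day.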

property static_features: List[str]

returns static features for Denmark catchments

stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.

Return type:

coords

Examples

>>> dataset = Caravan_DK()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('100010')  # returns coordinates of station whose id is 100010
>>> dataset.stn_coords(['100010', '210062'])  # returns coordinates of two stations
class aqua_fetch.rr.CAMELS_FR(path=None, overwrite=False, **kwargs)[source]

Bases: Camels

Dataset of 654 catchments from France following the works of Delaigue et al., 2024. The dataset consists of 344 static catchment features and 22 dynamic features. The dynamic features span from 19700101 to 20211231 with a daily timestep.

__init__(path=None, overwrite=False, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class
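The three cases described for `path` can be sketched as a small decision function. This is an illustration of the documented behavior only; `resolve_data_dir`, its return values, and the default directory argument are hypothetical and do not reflect the library's internal code. Reuse of data already present in the default directory is not modeled here.

```python
from pathlib import Path
from typing import Optional, Tuple, Union

PathLike = Union[str, Path]


def resolve_data_dir(path: Optional[PathLike], default_dir: PathLike) -> Tuple[Path, str]:
    """Return the directory to use and the action implied by the ``path`` rules.

    The action is "read" when ``path`` is given and already exists, and
    "download" otherwise (hypothetical labels used only in this sketch).
    """
    if path is None:
        # not provided: download into the default directory
        return Path(default_dir), "download"
    directory = Path(path)
    if directory.is_dir():
        # provided and exists: read the data from this directory
        return directory, "read"
    # provided but missing: download into this directory
    return directory, "download"


print(resolve_data_dir(None, "default_data_dir"))
```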

property dynamic_features: List[str]

returns names of dynamic features

fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch.rr import CAMELS_FR
>>> dataset = CAMELS_FR()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    654
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (654, 344)
get static data of one station only
>>> static_data = dataset.fetch_static_features(stns[0])
>>> static_data.shape
   (1, 344)
get the names of static features
>>> features = dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, features[:2])
>>> static_data.shape
   (654, 2)
>>> data = dataset.fetch_static_features(stns[0], static_features=features[:2])
>>> data.shape
   (1, 2)
static_attrs() DataFrame[source]

combination of topographic + soil + landuse + geology + climate + hydro + anthropogenic features

Returns:

a pandas DataFrame of static features of all catchments of shape (654, xxxx)

Return type:

pd.DataFrame

static_data() DataFrame[source]

static attributes plus timeseries statistics

Returns:

a pandas DataFrame of static features of all catchments of shape (654, xxxx)

Return type:

pd.DataFrame

property static_features: List[str]

returns static features for French catchments

ts_attrs() DataFrame[source]

daily_timeseries statistics of all catchments

Returns:

a pandas DataFrame of static features of all catchments of shape (654, xxxx)

Return type:

pd.DataFrame

class aqua_fetch.CAMELS_IND(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]

Bases: Camels

Dataset of 472 catchments from Republic of India following the works of Mangukiya et al., 2024. The dataset consists of 210 static catchment features and 20 dynamic features. The dynamic features span from 19800101 to 20201231 with daily timestep.

Examples

>>> from aqua_fetch import CAMELS_IND
>>> dataset = CAMELS_IND()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(299520, 47)  # 47 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['17015'].unstack().shape
(14976, 20)
If we don't set as_dataframe=True, then the returned data will be a xarray Dataset
>>> data = dataset.fetch(0.1)
>>> type(data)
    xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 14976, 'dynamic_features': 20})
>>> len(data.data_vars)
    47
>>> df = dataset.fetch(stations=1, as_dataframe=True)  # get data of only one random station
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(14976, 20)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
472
# get data by station id
>>> df = dataset.fetch(stations='3001', as_dataframe=True).unstack()
>>> df.shape
(14976, 20)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['prcp(mm/day)', 'rel_hum(%)', 'tavg(C)', 'pet(mm/day)', 'streamflow_cms']).unstack()
>>> df.shape
(14976, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(299520, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='3001', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 220), (299520, 1))
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (472, 2)
>>> dataset.stn_coords('3001')  # returns coordinates of station whose id is 3001
    18.3861 80.3917
>>> dataset.stn_coords(['3001', '17021'])  # returns coordinates of two stations
__init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

property dynamic_features: List[str]

returns names of dynamic features

fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import CAMELS_IND
>>> dataset = CAMELS_IND()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    472
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (472, 210)
get static data of one station only
>>> static_data = dataset.fetch_static_features('3001')
>>> static_data.shape
   (1, 210)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope_mean', 'aridity'])
>>> static_data.shape
   (472, 2)
>>> data = dataset.fetch_static_features('42600042', static_features=['slope_mean', 'aridity'])
>>> data.shape
   (1, 2)
static_data() DataFrame[source]

combination of topographic + soil + landuse + geology + climate + hydro + anthropogenic features

Returns:

a pandas DataFrame of static features of all catchments of shape (3330, 119)

Return type:

pd.DataFrame

property static_features: List[str]

returns names of static features of the CAMELS_IND catchments

stations() List[str][source]

returns names of stations as a list

Note: Leading zeros are omitted from the station names, which means 03001 is returned as 3001.
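
The note on leading zeros can be illustrated with a minimal sketch. It assumes station ids arrive as strings; the helper name normalize_station_id is hypothetical and is not part of the library, whose own implementation may differ.

```python
def normalize_station_id(raw_id: str) -> str:
    """Drop leading zeros from a station id, keeping a lone '0' intact."""
    stripped = raw_id.lstrip("0")
    return stripped if stripped else "0"


# '03001' is reduced to the form used in the examples above
print(normalize_station_id("03001"))
# ids without leading zeros are returned unchanged
print(normalize_station_id("17021"))
```

When looking up a station by an id copied from the original data files, applying the same normalization first avoids a failed lookup.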

class aqua_fetch.rr.CAMELS_SE(path: str = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: Camels

Dataset of 50 Swedish catchments following the work of Teutschbein et al., 2024. The dataset consists of 76 static catchment features and 4 dynamic features. The dynamic features span from 1961-01-01 to 2020-12-31 at a daily timestep.

Examples

>>> from water_datasets import CAMELS_SE
>>> dataset = CAMELS_SE()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
   (21915, 4)
get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   50
get data of 10 % of stations as dataframe
>>> df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
   (87660, 5)
The returned dataframe is a multi-indexed data
>>> df.index.names == ['time', 'dynamic_features']
    True
get data by station id
>>> df = dataset.fetch(stations='5', as_dataframe=True).unstack()
>>> df.shape
     (21915, 4)
get names of available dynamic features
>>> dataset.dynamic_features
get only selected dynamic features
>>> data = dataset.fetch(1, as_dataframe=True,
...  dynamic_features=['Qobs_m3s', 'Qobs_mm', 'Pobs_mm', 'Tobs_C']).unstack()
>>> data.shape
    (21915, 4)
get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multiindexed dataframe
    (87660, 10)
when we get both static and dynamic data, the returned data is a dictionary
with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='5', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
    ((1, 76), (87660, 1))
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (50, 2)
>>> dataset.stn_coords('5')  # returns coordinates of station whose id is 5
    68.0356 21.9758
>>> dataset.stn_coords(['5', '200'])  # returns coordinates of two stations
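
The shapes in the examples above follow from the long (time, dynamic_features) layout: the number of rows is the number of time steps times the number of dynamic features, and unstacking moves the features into columns. The following is a minimal plain-Python sketch of that reshaping. The toy values and the helper name unstack are assumptions for illustration; it does not call pandas or the library.

```python
from datetime import date, timedelta

# Hypothetical toy data: 3 days x 2 features for a single station.
features = ["Pobs_mm", "Tobs_C"]
days = [date(1961, 1, 1) + timedelta(days=i) for i in range(3)]

# Long layout: one entry per (time, dynamic_feature) pair.
pairs = [(day, feature) for day in days for feature in features]
long_rows = {pair: float(i) for i, pair in enumerate(pairs)}


def unstack(rows):
    """Pivot {(time, feature): value} into {time: {feature: value}}."""
    wide = {}
    for (time, feature), value in rows.items():
        wide.setdefault(time, {})[feature] = value
    return wide


wide_rows = unstack(long_rows)
print(len(long_rows), len(wide_rows))  # rows in the long and the wide layout

# Same arithmetic for the full daily record with 4 dynamic features.
n_days = (date(2020, 12, 31) - date(1961, 1, 1)).days + 1
print(n_days, n_days * 4)
```

The last line reproduces the row counts of the unstacked and the multi-indexed frames reported in the examples.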
__init__(path: str = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path – path where the CAMELS_SE dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.

  • to_netcdf

fetch_static_features(stn_id: str | list = 'all', static_features: str | list = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from water_datasets import CAMELS_SE
>>> dataset = CAMELS_SE()
get the names of stations
>>> stns = dataset.stations()
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (50, 76)
get static data of one station only
>>> static_data = dataset.fetch_static_features('5')
>>> static_data.shape
   (1, 76)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Area_km2', 'Water_percentage', 'Elevation_mabsl'])
>>> static_data.shape
   (50, 3)
>>> data = dataset.fetch_static_features('5', static_features=['Area_km2', 'Water_percentage', 'Elevation_mabsl'])
>>> data.shape
   (1, 3)
class aqua_fetch.rr.CCAM(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]

Bases: Camels

Dataset for Chinese catchments. The CCAM dataset, published by Hao et al., 2021, has two sets. One set consists of catchment attributes, meteorological data and catchment boundaries of over 4000 catchments. However, this set does not have streamflow data. The second set consists of streamflow, catchment attributes, catchment boundaries and meteorological data for 102 catchments of the Yellow River. Since this second set conforms to the norms of CAMELS, this class uses the second set. Therefore, the fetch, stations and other methods/attributes of this class return data of only the Yellow River catchments and not of the whole of China. However, the first set of data can also be fetched using the fetch_meteo method of this class. The temporal extent of both sets is from 1999 to 2020. However, the streamflow time series in the first set has a very large number of missing values. The data of the Yellow River consists of 16 dynamic features (time series) and 124 static features (catchment attributes).

Examples

>>> from water_datasets import CCAM
>>> dataset = CCAM()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(128560, 10)
>>> data.index.names == ['time', 'dynamic_features']
True
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(8035, 16)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
102
# get data by station id
>>> df = dataset.fetch(stations='0010', as_dataframe=True).unstack()
>>> df.shape
(8035, 16)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True, dynamic_features=['pre', 'tem_mean', 'evp', 'rhu', 'q']).unstack()
>>> df.shape
(8035, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(128560, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='0010', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 124), (128560, 1))
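
In the examples above, the first argument of fetch is either a fraction (0.1 for 10 % of stations) or a count (10 stations), and in both cases the stations are picked at random. A minimal sketch of such a selection rule follows. The helper name select_stations, the seed argument and the truncation of the fractional count are assumptions; truncation is chosen only because it reproduces the 10 columns reported for 102 stations, and the library's own rounding rule is not confirmed here.

```python
import random
from typing import List, Union


def select_stations(all_stations: List[str],
                    how_many: Union[int, float],
                    seed: int = 0) -> List[str]:
    """Pick stations at random, by count (int) or by fraction (float)."""
    if isinstance(how_many, float):
        if not 0.0 < how_many <= 1.0:
            raise ValueError("a fraction must lie in (0, 1]")
        count = int(how_many * len(all_stations))  # assumed: truncate
    else:
        count = how_many
    rng = random.Random(seed)  # seeded so that the sketch is repeatable
    return rng.sample(all_stations, count)


# Hypothetical ids standing in for the 102 Yellow River stations.
stations = [f"{i:04d}" for i in range(102)]
print(len(select_stations(stations, 0.1)))
print(len(select_stations(stations, 10)))
```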
__init__(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • overwrite (bool) – If the data is already downloaded then you can set it to True, to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require the netCDF4 package as well as xarray.

property dynamic_features: List[str]

names of hydro-meteorological time series data for Yellow River catchments

fetch_meteo(stn_id: str | List[str] = 'all', features: str | List[str] = 'all', st='1990-01-01', en='2021-03-31', as_dataframe: bool = True)[source]

fetches meteorological data of 4902 Chinese catchments

>>> from water_datasets import CCAM
>>> dataset = CCAM()
>>> features = ['PRE', 'TEM', 'PRS', 'RHU', 'EVP', 'WIN', 'PET']
>>> st = '1999-01-01'
>>> en = '2020-03-31'
>>> xds = dataset.fetch_meteo(features=features, st=st, en=en)
fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from water_datasets import CCAM
>>> dataset = CCAM()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    102
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (102, 124)
get static data of one station only
>>> static_data = dataset.fetch_static_features('0140')
>>> static_data.shape
   (1, 124)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['lon', 'lat', 'area'])
>>> static_data.shape
   (102, 3)
>>> data = dataset.fetch_static_features('0140', static_features=['lon', 'lat', 'area'])
>>> data.shape
   (1, 3)
property meteo_path

path where daily meteorological data of stations is present

q_mmd(stations: str | List[str] = 'all') DataFrame[source]

returns streamflow in units of millimeters per day. This is obtained by dividing q by area.

Parameters:

stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations

Returns:

a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame
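
The division of q by area involves a unit conversion. A minimal sketch of that conversion follows, assuming streamflow is given in cubic meters per second and catchment area in square kilometers. The function name q_to_mm_per_day is hypothetical, and the library's own implementation may differ in detail.

```python
SECONDS_PER_DAY = 86_400


def q_to_mm_per_day(q_cms: float, area_km2: float) -> float:
    """Convert streamflow (m3/s) over a catchment (km2) to depth (mm/day)."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    volume_m3_per_day = q_cms * SECONDS_PER_DAY
    area_m2 = area_km2 * 1_000_000
    depth_m_per_day = volume_m3_per_day / area_m2
    return depth_m_per_day * 1000.0  # meters to millimeters


# 1 m3/s over 86.4 km2 corresponds to about 1 mm/day
print(q_to_mm_per_day(1.0, 86.4))
```

The whole conversion reduces to multiplying q / area by 86.4, which is a convenient check when comparing runoff depth against precipitation in mm.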

property static_features: List[str]

names of static features for Yellow River catchments

stations()[source]

Returns station ids for catchments on Yellow River

class aqua_fetch.rr.Finland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _EStreams

Data of 669 catchments of Finland. The observed streamflow data is downloaded from https://wwwi3.ymparisto.fi . The meteorological data, static catchment features and catchment boundaries are taken from water_datasets.EStreams following the work of Nascimento et al., 2024. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 2012-01-01 to 2023-06-30.

__init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class
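
The three cases described for the path parameter can be sketched as a small decision helper. This is only an illustration of the documented behaviour. The helper name resolve_data_dir and the default directory are hypothetical, and it is an assumption that an already existing default directory is reused rather than downloaded again.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical default location; the library's real default is not shown here.
DEFAULT_DATA_DIR = Path.home() / "hypothetical_default_data"


def resolve_data_dir(path: Optional[str],
                     default_dir: Path = DEFAULT_DATA_DIR) -> Tuple[Path, bool]:
    """Return (directory, needs_download) for a user supplied path."""
    target = Path(default_dir) if path is None else Path(path)
    # Existing directory: read from it. Missing directory: download into it.
    needs_download = not target.is_dir()
    return target, needs_download


directory, needs_download = resolve_data_dir(None)
print(directory.name, needs_download)
```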

get_q(as_dataframe: bool = True, overwrite: bool = False)[source]

downloads (if not already downloaded) and returns the daily streamflow data of Finland, either as a pandas dataframe or as an xarray dataset.

stations() List[str][source]

returns the basin_id of the stations

class aqua_fetch.rr.GRDCCaravan(path=None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: Camels

This is a dataset of 5357 catchments from around the globe following the work of Faerber et al., 2023. The dataset consists of 39 dynamic (timeseries) features and 211 static features. The dynamic (timeseries) data spans from 1950-01-02 to 2019-05-19.

If the xarray and netCDF4 packages are installed, then netcdf files will be downloaded; otherwise csv files will be downloaded and used.

Examples

>>> from water_datasets import GRDCCaravan
>>> dataset = GRDCCaravan()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
   (26801, 39)
get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   5357
get data of 10 % of stations as dataframe
>>> df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
   (1045239, 535)
The returned dataframe is a multi-indexed data
>>> df.index.names == ['time', 'dynamic_features']
    True
get data by station id
>>> df = dataset.fetch(stations='GRDC_3664802', as_dataframe=True).unstack()
>>> df.shape
     (26800, 39)
get names of available dynamic features
>>> dataset.dynamic_features
get only selected dynamic features
>>> data = dataset.fetch(1, as_dataframe=True,
...  dynamic_features=['total_precipitation_sum', 'potential_evaporation_sum', 'temperature_2m_mean', 'streamflow']).unstack()
>>> data.shape
    (26800, 4)
get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multiindexed dataframe
    (1045239, 10)
when we get both static and dynamic data, the returned data is a dictionary
with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='GRDC_3664802', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
    ((1, 211), (1045200, 1))
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (5357, 2)
>>> dataset.stn_coords('GRDC_3664802')  # returns coordinates of station whose id is GRDC_3664802
    -26.2271        -51.0771
>>> dataset.stn_coords(['GRDC_3664802', 'GRDC_1159337'])  # returns coordinates of two stations
__init__(path=None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

fetch_static_features(stn_id: str | list = 'all', static_features: str | list = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from water_datasets import GRDCCaravan
>>> dataset = GRDCCaravan()
get the names of stations
>>> stns = dataset.stations()
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (5357, 211)
get static data of one station only
>>> static_data = dataset.fetch_static_features('GRDC_3664802')
>>> static_data.shape
   (1, 211)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['p_mean', 'p_seasonality', 'frac_snow'])
>>> static_data.shape
   (5357, 3)
>>> data = dataset.fetch_static_features('GRDC_3664802', static_features=['p_mean', 'p_seasonality', 'frac_snow'])
>>> data.shape
   (1, 3)
fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, as_ts: bool = False, st: str | None = None, en: str | None = None, **kwargs) Dict[str, DataFrame][source]

Fetches features for one station.

Parameters:
  • station – station id/gauge id for which the data is to be fetched.

  • dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch

  • static_features – names of static features/attributes to be fetched

  • as_ts (bool) – whether static features are to be converted into a time series or not. If yes, then the returned time series will be of the same length as that of the dynamic attributes.

  • st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from where it is available.

  • en (str, optional) – end point of data to be fetched. By default, the data will be fetched up to where it is available.

Returns:

dataframe if as_ts is True else it returns a dictionary of static and dynamic features for a station/gauge_id

Return type:

Dict

Examples

>>> from water_datasets import GRDCCaravan
>>> dataset = GRDCCaravan()
>>> dataset.fetch_station_features('GRDC_3664802')
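
The as_ts option described above turns constant catchment attributes into a series of the same length as the dynamic data. A minimal sketch of that idea follows, in which every time step simply carries a copy of the static values. The helper name static_as_time_series and the attribute values are hypothetical, and the library's actual return structure may differ.

```python
from datetime import date, timedelta
from typing import Dict, List


def static_as_time_series(static: Dict[str, float],
                          times: List[date]) -> Dict[date, Dict[str, float]]:
    """Repeat the static attributes once for every time step."""
    return {time: dict(static) for time in times}


# Hypothetical values for two static attributes of one station.
static = {"area": 120.5, "elev_mean": 830.0}
times = [date(2000, 1, 1) + timedelta(days=i) for i in range(4)]

series = static_as_time_series(static, times)
print(len(series))  # one entry per time step
```

Repeating static values this way lets them be concatenated column-wise with the dynamic features, for example as model inputs.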
class aqua_fetch.rr.HYSETS(path: str, swe_source: str = 'SNODAS_SWE', discharge_source: str = 'ERA5', tasmin_source: str = 'ERA5', tasmax_source: str = 'ERA5', pr_source: str = 'ERA5', **kwargs)[source]

Bases: Camels

Database for hydrometeorological modeling of 14,425 North American watersheds from 1950 to 2018 following the work of Arsenault et al., 2020. The user must manually download the files, unpack them and provide the path where these files are saved.

This data comes from multiple sources, each source having one or more dynamic_features. The following data sources are available.

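==============  =============================
sources         dynamic_features
==============  =============================
SNODAS_SWE      discharge, swe
SCDNA           discharge, pr, tasmin, tasmax
nonQC_stations  discharge, pr, tasmin, tasmax
Livneh          discharge, pr, tasmin, tasmax
ERA5            discharge, pr, tasmax, tasmin
ERA5Land_SWE    discharge, swe
ERA5Land        discharge, pr, tasmax, tasmin
==============  =============================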

All sources contain one or more of the following dynamic_features with the following shapes

==========================  ==============
dynamic_features            shape
==========================  ==============
time                        (25202,)
watershedID                 (14425,)
drainage_area               (14425,)
drainage_area_GSIM          (14425,)
flag_GSIM_boundaries        (14425,)
flag_artificial_boundaries  (14425,)
centroid_lat                (14425,)
centroid_lon                (14425,)
elevation                   (14425,)
slope                       (14425,)
discharge                   (14425, 25202)
pr                          (14425, 25202)
tasmax                      (14425, 25202)
tasmin                      (14425, 25202)
==========================  ==============

Examples

>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
... # fetch data of a random station
>>> df = dataset.fetch(1, as_dataframe=True)
>>> df.shape
(25202, 5)
>>> stations = dataset.stations()
>>> len(stations)
14425
>>> df = dataset.fetch('999', as_dataframe=True)
>>> df.unstack().shape
(25202, 5)
__init__(path: str, swe_source: str = 'SNODAS_SWE', discharge_source: str = 'ERA5', tasmin_source: str = 'ERA5', tasmax_source: str = 'ERA5', pr_source: str = 'ERA5', **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • swe_source (str) – source of swe data.

  • discharge_source – source of discharge data

  • tasmin_source – source of tasmin data

  • tasmax_source – source of tasmax data

  • pr_source – source of pr data

  • kwargs – arguments for Camels base class
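
Because each variable can be read from a different source, it can help to check that a chosen source actually provides the requested variable. The following is a minimal sketch of such a check, built from the table of sources and dynamic_features given above. The helper name check_sources is hypothetical and is not part of the library, and the spelling ERA5Land_SWE is an assumption.

```python
from typing import Dict, Set

# Variables offered by each source, copied from the table of sources.
SOURCE_FEATURES: Dict[str, Set[str]] = {
    "SNODAS_SWE": {"discharge", "swe"},
    "SCDNA": {"discharge", "pr", "tasmin", "tasmax"},
    "nonQC_stations": {"discharge", "pr", "tasmin", "tasmax"},
    "Livneh": {"discharge", "pr", "tasmin", "tasmax"},
    "ERA5": {"discharge", "pr", "tasmax", "tasmin"},
    "ERA5Land_SWE": {"discharge", "swe"},  # assumed spelling
    "ERA5Land": {"discharge", "pr", "tasmax", "tasmin"},
}


def check_sources(**sources: str) -> None:
    """Raise ValueError if a '<variable>_source' choice is not available."""
    for argument, source in sources.items():
        if not argument.endswith("_source"):
            raise ValueError(f"unexpected argument {argument!r}")
        variable = argument[: -len("_source")]
        if source not in SOURCE_FEATURES:
            raise ValueError(f"unknown source {source!r}")
        if variable not in SOURCE_FEATURES[source]:
            raise ValueError(f"{source!r} does not provide {variable!r}")


# The default arguments of the class pass the check.
check_sources(swe_source="SNODAS_SWE", discharge_source="ERA5",
              tasmin_source="ERA5", tasmax_source="ERA5", pr_source="ERA5")
```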

property OfficialID_WatershedID_map

A dictionary mapping Official_ID to Watershed_ID. For example ‘01AD002’: ‘1’

property WatershedID_OfficialID_map

A dictionary mapping Watershed_ID to Official_ID. For example ‘1’: ‘01AD002’
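
The two properties are inverses of each other. A minimal sketch of building one mapping from the other follows, using the convention given for read_static_data, in which Watershed_ID is the running number and Official_ID is the agency code. The helper name invert_mapping and the second id pair are hypothetical.

```python
from typing import Dict


def invert_mapping(mapping: Dict[str, str]) -> Dict[str, str]:
    """Swap keys and values, refusing mappings that are not one-to-one."""
    inverted = {value: key for key, value in mapping.items()}
    if len(inverted) != len(mapping):
        raise ValueError("mapping is not one-to-one")
    return inverted


# Watershed_ID -> Official_ID; the second pair is a made-up placeholder.
watershed_to_official = {"1": "01AD002", "2": "HYPOTHETICAL_ID"}
official_to_watershed = invert_mapping(watershed_to_official)
print(official_to_watershed["01AD002"])
```

This is useful when station ids come from an agency listing but the methods of the class expect the running Watershed_ID.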

area(stations: str | List[str] = 'all', source: str = 'other') Series[source]

Returns area_gov (Km2) of all catchments as pandas series

Parameters:
  • stations (str/list) – name/names of stations. Default is all, which will return area of all stations

  • source (str) – source of area calculation. It should be either gsim or other

Returns:

a pandas series whose indices are catchment ids and values are areas of corresponding catchments.

Return type:

pd.Series

Examples

>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> dataset.area()  # returns area of all stations
>>> dataset.area('92')  # returns area of station whose id is 92
>>> dataset.area(['92', '142'])  # returns area of two stations
fetch_dynamic_features(stn_id, features='all', st=None, en=None, as_dataframe=False)[source]

Fetches dynamic features of one station.

Examples

>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> dyn_features = dataset.fetch_dynamic_features('station_name')
fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', st=None, en=None, as_ts=False) DataFrame[source]

returns static attributes of one or multiple stations

Parameters:
  • stations (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

  • st

  • en

  • as_ts

Examples

>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    14425
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (14425, 28)
get static data of one station only
>>> static_data = dataset.fetch_static_features('991')
>>> static_data.shape
   (1, 28)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
   (14425, 2)
fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]

returns features of multiple stations

Examples

>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> stations = dataset.stations()[0:3]
>>> features = dataset.fetch_stations_features(stations)
get_boundary(catchment_id: str, as_type: str = 'numpy')[source]

returns boundary of a catchment in a required format

Parameters:
  • catchment_id (str) – name/id of catchment

  • as_type (str) – ‘numpy’ or ‘geopandas’

Examples

>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> dataset.get_boundary(dataset.stations()[0])
q_mmd(stations: str | List[str] = 'all') DataFrame[source]

returns streamflow in units of millimeters per day. This is obtained by dividing q_cms by area.

Parameters:

stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations

Returns:

a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame

read_static_data(usecols=None, nrows=None)[source]

reads the HYSETS_watershed_properties.txt file while using Watershed_ID as index instead of Official_ID. Watershed_ID starts with 1,2,3 and so on while Official_ID is code from meteo agency such as 01AD002 for station 1.

stations() List[str][source]

returns a list of station names. The Watershed_ID of the station is used as station name instead of Official_ID. This is because in .nc files Watershed_ID is used for stations instead of Official_ID. Watershed_ID starts with 1, 2, 3 and so on while Official_ID is a code from the meteo agency such as 01AD002 for station 1.

Returns:

a list of ids of stations

Return type:

list

Examples

>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
... # get name of all stations as list
>>> dataset.stations()
stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.

Return type:

pd.DataFrame

Examples

>>> dataset = HYSETS(path="path/to/HYSETS")
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('92')  # returns coordinates of station whose id is 92
>>> dataset.stn_coords(['92', '142'])  # returns coordinates of two stations
class aqua_fetch.rr.HYPE(time_step: str = 'daily', path=None, **kwargs)[source]

Bases: Camels

Downloads and preprocesses HYPE [1] dataset from Lindstroem et al., 2010 [2] . This is a rainfall-runoff dataset of Costa Rica of 564 stations from 1985 to 2019 at daily, monthly and yearly time steps.

Examples

>>> from water_datasets import HYPE
>>> dataset = HYPE()
... # get data of 5% of stations
>>> df = dataset.fetch(stations=0.05, as_dataframe=True)  # returns a multiindex dataframe
>>> df.shape
  (115047, 28)
... # fetch data of 5 (randomly selected) stations
>>> df = dataset.fetch(stations=5, as_dataframe=True)
>>> df.shape
   (115047, 5)
fetch data of 3 selected stations
>>> df = dataset.fetch(stations=['564','563','562'], as_dataframe=True)
>>> df.shape
   (115047, 3)
... # fetch data of a single stations
>>> df = dataset.fetch(stations='500', as_dataframe=True)
>>> df.shape
   (115047, 1)
# get only selected dynamic features
>>> df = dataset.fetch(stations='501',
...    dynamic_features=['AET_mm', 'Prec_mm',  'Streamflow_mm'], as_dataframe=True)
# fetch data between selected periods
>>> df = dataset.fetch(stations='225', st="20010101", en="20101231", as_dataframe=True)
>>> df.shape
   (32868, 1)
... # get data at monthly time step
>>> dataset = HYPE(time_step="month")
>>> df = dataset.fetch(stations='500', as_dataframe=True)
>>> df.shape
   (3780, 1)
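
The st and en arguments in the examples above are compact YYYYMMDD strings. A minimal sketch of turning such a window into a list of days follows, which also shows where the row count of the windowed example comes from. The helper names are hypothetical, and the number of dynamic features (9) is an assumption inferred from the reported shapes, not a value stated in the text.

```python
from datetime import date, datetime, timedelta
from typing import List


def parse_day(text: str) -> date:
    """Parse a compact YYYYMMDD string into a date."""
    return datetime.strptime(text, "%Y%m%d").date()


def days_in_window(st: str, en: str) -> List[date]:
    """Return every day from st to en, both ends included."""
    start, end = parse_day(st), parse_day(en)
    if end < start:
        raise ValueError("en must not be earlier than st")
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


window = days_in_window("20010101", "20101231")
n_features = 9  # assumption inferred from the reported shapes
print(len(window), len(window) * n_features)
```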
__init__(time_step: str = 'daily', path=None, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • time_step (str) – one of daily, month or year

  • **kwargs – key word arguments

area(stations: str | List[str] = None) Series[source]

Returns area (Km2) of all catchments as pandas series

Parameters:

stations (str/list) – name/names of stations. Default is None, which will return area of all stations

Returns:

a pandas series whose indices are catchment ids and values are areas of corresponding catchments.

Return type:

pd.Series

Examples

>>> from water_datasets import HYPE
>>> dataset = HYPE()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2')  # returns area of station whose id is 2
>>> dataset.area(['2', '605'])  # returns area of two stations
fetch_static_features(stn_id, static_features=None)[source]

static data for HYPE is not available.

stn_coords(stations: str | List[str] = None) DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Examples

>>> dataset = HYPE()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('2')  # returns coordinates of station whose id is 2
>>> dataset.stn_coords(['2', '605'])  # returns coordinates of two stations
class aqua_fetch.Ireland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _EStreams

Data of 464 catchments of Ireland. Out of these 464 catchments, 280 are from OPW and 184 are from EPA. The observed streamflow data for EPA stations is downloaded from https://epawebapp.epa.ie/Hydronet/#Flow while the observed streamflow for OPW stations is downloaded from https://waterlevel.ie/hydro-data/#/overview/Waterlevel. It should be noted that out of 280 OPW stations, streamflow data is available for only 129 stations. The meteorological data, static catchment features and catchment boundaries are taken from water_datasets.EStreams following the work of Nascimento et al., 2024. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1992-01-01 to 2020-06-30.

__init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

download_epa_data_seq()[source]

Examples

>>> epa_df = download_epa_data()
download_opw_data_seq()[source]

Examples

>>> opw_df = download_opw_data()
class aqua_fetch.rr.Italy(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _EStreams

Data of 294 catchments of Italy. The observed streamflow data is downloaded from http://www.hiscentral.isprambiente.gov.it/hiscentral/hydromap.aspx?map=obsclient . The meteorological data, static catchment features and catchment boundaries are taken from water_datasets.EStreams following the work of Nascimento et al., 2024. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1992-01-01 to 2020-06-30.

__init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

stations() List[str][source]

returns the basin_id of the stations

class aqua_fetch.Japan(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _GSHA

Data of 694 catchments of Japan from the river.go.jp website. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1979-01-01 to 2022-12-31.

__init__(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

get_q(as_dataframe: bool = True) DataFrame[source]

reads daily streamflow for all stations and puts them in a single file named data.csv. If data.csv is already present, then it is read and its contents are returned as dataframe.

class aqua_fetch.rr.LamaHCE(*, timestep: str, data_type: str, path=None, to_netcdf: bool = True, overwrite=False, **kwargs)[source]

Bases: Camels

Large-Sample Data for Hydrology and Environmental Sciences for Central Europe (mainly Austria). The dataset is downloaded from zenodo following the work of Klingler et al., 2021 . For total_upstrm data, there are 859 stations with 61 static features and 17 dynamic features. The temporal extent of data is from 1981-01-01 to 2019-12-31.

__init__(*, timestep: str, data_type: str, path=None, to_netcdf: bool = True, overwrite=False, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • timestep – possible values are D for daily or H for hourly timestep

  • data_type – possible values are total_upstrm, diff_upstrm_all or diff_upstrm_lowimp

Examples

>>> from water_datasets import LamaHCE
>>> dataset = LamaHCE(timestep='D', data_type='total_upstrm')
# The daily dataset has 859 stations with 80 static and 22 dynamic features
>>> len(dataset.stations()), len(dataset.static_features), len(dataset.dynamic_features)
(859, 80, 22)
>>> df = dataset.fetch(3, as_dataframe=True)
>>> df.shape
(313368, 3)
>>> dataset = LamaHCE(timestep='H', data_type='total_upstrm')
>>> len(dataset.stations()), len(dataset.static_features), len(dataset.dynamic_features)
(859, 80, 17)
>>> dataset.fetch_dynamic_features('1', features = ['obs_q_cms'])
fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = None) DataFrame[source]

static features of LamaHCE

Parameters:
  • stn_id (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from water_datasets import LamaHCE
>>> dataset = LamaHCE(timestep='D', data_type='total_upstrm')
>>> df = dataset.fetch_static_features('99')  # (1, 61)
...  # get list of all static features
>>> dataset.static_features
>>> dataset.fetch_static_features('99',
... static_features=['area_calc', 'elev_mean', 'agr_fra', 'sand_fra'])  # (1, 4)
fetch_stations_features(stations: list, dynamic_features='all', static_features=None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]

Reads attributes of more than one station.

This function first checks whether the .nc files exist. If they do, they are not prepared again; otherwise the .nc files are first prepared and saved, and the data is then read from them. Upon subsequent calls, the .nc files are used for reading the data.

Parameters:
  • stations – list of stations for which data is to be fetched.

  • dynamic_features – list of dynamic attributes to be fetched. if ‘all’, then all dynamic attributes will be fetched.

  • static_features – list of static attributes to be fetched. If all, then all static attributes will be fetched. If None, then no static attribute will be fetched.

  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • as_dataframe – whether to return the data as a pandas dataframe. By default, an xr.Dataset object is returned.

  • kwargs (dict) – additional keyword arguments

Returns:

Dynamic and static features of multiple stations. Dynamic features are returned as an xr.Dataset by default; if as_dataframe is True, they are returned as a pandas DataFrame with a MultiIndex. The xr.Dataset has as many data_vars as there are stations, and the DataArray of each station has dimensions (time, dynamic_features), where the length of time is defined by st and en. When the returned object is a pandas DataFrame, the first index level is time and the second is dynamic_features. Static attributes are always returned as a pandas DataFrame of shape (stations, static_features). If dynamic_features is None, the returned value consists of static features only, and if static_features is None, it consists of dynamic features only. If neither is None, the returned value is a dictionary with static and dynamic keys.

Raises:

ValueError, if both dynamic_features and static_features are None
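The rule for what is returned can be summarised with plain Python objects standing in for the xr.Dataset and the DataFrame. combine_features is a hypothetical name used only for this sketch; it mirrors the description above rather than the library's code.

```python
from typing import Any, Dict, Optional, Union


def combine_features(dynamic: Optional[Any], static: Optional[Any]) -> Union[Any, Dict[str, Any]]:
    """Mimic the documented return rule with stand-in objects."""
    if dynamic is None and static is None:
        raise ValueError("dynamic_features and static_features cannot both be None")
    if dynamic is None:
        return static      # only static features were requested
    if static is None:
        return dynamic     # only dynamic features were requested
    # both requested: a dictionary with the two documented keys
    return {"static": static, "dynamic": dynamic}


both = combine_features("dynamic-data", "static-data")
only_dynamic = combine_features("dynamic-data", None)
print(sorted(both), only_dynamic)
```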

Examples

>>> from water_datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
... # find out station ids
>>> dataset.stations()
... # get data of selected stations
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...  as_dataframe=True)
class aqua_fetch.rr.LamaHIce(path=None, overwrite=False, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = True, **kwargs)[source]

Bases: LamaHCE

Daily and hourly hydro-meteorological time series data of 111 river basins of Iceland following Helgason et al., 2024. The total period of the dataset is from 1950 to 2021 for the daily timestep and from 1976 to 2023 for the hourly timestep. The average length of the daily data is 33 years, while that of the hourly data is 11 years. The dataset is available on HydroShare.

__init__(path=None, overwrite=False, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • timestep – possible values are D for daily or H for hourly timestep

  • data_type – possible values are total_upstrm, intermediate_all or intermediate_lowimp

basin_attributes() DataFrame[source]

returns basin attributes which are catchment attributes, water balance all attributes and water balance filtered attributes

Returns:

a dataframe of shape (111, 104) where 104 are the static catchment/basin attributes

Return type:

pd.DataFrame

catchment_attributes() DataFrame[source]

returns catchment attributes as DataFrame with 90 columns

fetch_clim_features(stations: str | List[str] = None)[source]

Returns climate time series data for one or more stations

Return type:

pd.DataFrame

fetch_q(stations: str | List[str] = None, qc_flag: int = None)[source]

returns streamflow for one or more stations

Parameters:
  • stations (str/List[str]) – name or names of stations for which streamflow is to be fetched

  • qc_flag (int) – the following flags are available: 40 (good), 80 (fair), 100 (estimated), 120 (suspect), 200 (unchecked) and 250 (missing)

Returns:

a pandas dataframe whose index is the time and whose columns are the names of stations. For the daily timestep, the dataframe has 32630 rows and 111 columns.

Return type:

pd.DataFrame
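The quality-control codes listed for qc_flag can be kept in a small lookup table. This section does not say whether the library matches the code exactly or treats it as a threshold; the sketch below assumes a threshold (keep records whose code is at most qc_flag), and filter_by_qc is a hypothetical name.

```python
from typing import List, Optional, Tuple

# codes and meanings as listed for the qc_flag parameter
QC_FLAGS = {40: "good", 80: "fair", 100: "estimated",
            120: "suspect", 200: "unchecked", 250: "missing"}

Record = Tuple[str, float, int]  # (date, streamflow, qc code)


def filter_by_qc(records: List[Record], qc_flag: Optional[int] = None) -> List[Record]:
    """Keep records whose code is at most ``qc_flag`` (assumed rule)."""
    if qc_flag is None:
        return list(records)  # no filtering requested
    if qc_flag not in QC_FLAGS:
        raise ValueError(f"unknown qc_flag {qc_flag}, expected one of {sorted(QC_FLAGS)}")
    return [rec for rec in records if rec[2] <= qc_flag]


records = [("2000-01-01", 3.2, 40), ("2000-01-02", 3.0, 100), ("2000-01-03", 2.9, 200)]
kept = filter_by_qc(records, qc_flag=80)
print(kept)
```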

fetch_static_features(stn_id: str | list = 'all', static_features: str | list = None) DataFrame[source]

static features of LamaHCE

Parameters:
  • stn_id (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from water_datasets import LamaHCE
>>> dataset = LamaHCE(timestep='D', data_type='total_upstrm')
>>> df = dataset.fetch_static_features('99')  # (1, 61)
...  # get list of all static features
>>> dataset.static_features
>>> dataset.fetch_static_features('99',
... static_features=['area_calc', 'elev_mean', 'agr_fra', 'sand_fra'])  # (1, 4)
fetch_stn_meteo(stn: str, nrows: int = None) DataFrame[source]

returns climate/meteorological time series data for one station

Returns:

a pandas dataframe with 23 columns

Return type:

pd.DataFrame

fetch_stn_q(stn: str, qc_flag: int = None) Series[source]

returns streamflow for single station

gauge_attributes() DataFrame[source]

returns gauge attributes from following two files

  • Gauge_attributes.csv

  • hydro_indices_1981_2018.csv

Returns:

a dataframe of shape (111, 28)

Return type:

pd.DataFrame

property gauges_path

returns the path where gauge data files are located

q_mmd(stations: str | List[str] = None) DataFrame[source]

returns streamflow in units of millimeters per day. This is obtained by dividing q_cms by the catchment area.

Parameters:

stations (str/list) – name/names of stations. Default is None, which will return streamflow of all stations

Returns:

a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame
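The unit conversion behind q_mmd can be written out explicitly. The sketch assumes that the area is given in km2, which this section does not state; cms_to_mmd is a hypothetical name, not a library function.

```python
def cms_to_mmd(q_cms: float, area_km2: float) -> float:
    """Convert discharge in m3/s to runoff depth in mm/day (area assumed in km2)."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    seconds_per_day = 86_400
    volume_m3 = q_cms * seconds_per_day      # cubic metres per day
    depth_m = volume_m3 / (area_km2 * 1e6)   # divide by the area in square metres
    return depth_m * 1000.0                  # metres to millimetres


# 10 m3/s over a 100 km2 catchment is roughly 8.64 mm/day
depth = cms_to_mmd(10.0, 100.0)
print(round(depth, 2))
```

The three steps collapse into a single factor: q_mmd = q_cms * 86.4 / area_km2.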

property q_path

path where all q files are located

read_ts_of_station(stn_id: str) DataFrame[source]

Reads daily dynamic (meteorological + streamflow) data for one catchment and returns as DataFrame

static_data() DataFrame[source]

returns static data of all stations

stations() List[str][source]

returns names of stations as a list

wat_bal_attrs() DataFrame[source]

water balance attributes

wat_bal_unfiltered() DataFrame[source]

water balance attributes from unfiltered q

class aqua_fetch.rr.Poland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _EStreams

Data of 1287 catchments of Poland. The observed streamflow data is downloaded from https://danepubliczne.imgw.pl. The meteorological data, static catchment features and catchment boundaries are taken from water_datasets.EStreams following the work of Nascimento et al., 2024. Therefore, the number of static features is 35 and the number of dynamic features is 27. The data is available from 1992-01-01 to 2020-06-30.

__init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

property csv_files_dir: str

path where csv (obtained after extracting zip files) files will be stored

stations() List[str][source]

returns the basin_id of the stations

property zip_files_dir: str

path where zip files will be stored

class aqua_fetch.rr.Portugal(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _EStreams

Data of 280 catchments of Portugal. The observed streamflow data is downloaded from https://snirh.apambiente.pt. The meteorological data, static catchment features and catchment boundaries for the 280 catchments are taken from water_datasets.EStreams following the work of Nascimento et al., 2024. Therefore, the number of static features is 35 and the number of dynamic features is 27. The data is available from 1972-01-01 to 2022-12-31.

__init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

download_q_data_parallel(cpus: int = 4)[source]

downloads q data in parallel

download_q_data_seq()[source]

downloads q data sequentially
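The difference between the sequential and the parallel variant can be sketched with concurrent.futures. download_station is a hypothetical stand-in for a single request, and the use of threads is an assumption of this sketch; the library may well use processes instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def download_station(stn: str) -> int:
    # hypothetical stand-in for one request; returns a fake record count
    return len(stn)


def download_seq(stations: List[str]) -> Dict[str, int]:
    # one station after the other
    return {stn: download_station(stn) for stn in stations}


def download_parallel(stations: List[str], cpus: int = 4) -> Dict[str, int]:
    # pool.map preserves the input order, so results line up with stations
    with ThreadPoolExecutor(max_workers=cpus) as pool:
        return dict(zip(stations, pool.map(download_station, stations)))


stations = ["01A", "02BB", "03CCC"]
seq_result = download_seq(stations)
par_result = download_parallel(stations, cpus=2)
print(seq_result == par_result)
```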

get_q(as_dataframe: bool = True)[source]

returns the streamflow data of Portugal as xarray.Dataset or pandas.DataFrame

Returns:

xarray.Dataset or pandas.DataFrame. If as_dataframe is True, returns a pandas.DataFrame with station codes as columns and time as index. If as_dataframe is False, returns an xarray.Dataset with station codes as variables and time as dimension.

class aqua_fetch.RRLuleaSweden(path=None, **kwargs)[source]

Bases: Datasets

Rainfall runoff data for an urban catchment from 2016 to 2019 following the work of Broekhuizen et al., 2020.

__init__(path=None, **kwargs)[source]
Parameters:
  • name – str (default=None) name of dataset

  • units – str, (default=None) the unit system being used

  • path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded

  • processes – int number of processes to use for parallel processing

  • verbosity – int determines the amount of information to be printed

  • remove_zip – bool whether to remove the zip files after unz

fetch(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None)[source]

fetches rainfall runoff data

Parameters:
  • st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00

  • en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:41

fetch_flow(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) DataFrame[source]

fetches flow data

Parameters:
  • st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00

  • en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:35:00

Returns:

a dataframe of shape (37_618, 3) where the columns are velocity, level and flow rate

Return type:

pd.DataFrame

Examples

>>> from water_datasets import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> flow = dataset.fetch_flow()
>>> flow.shape
(37618, 3)
fetch_pcp(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) DataFrame[source]

fetches precipitation data

Parameters:
  • st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 19:48:00

  • en (optional) – end of data to be fetched. By default the end is 2019-10-26 23:59:00

Returns:

a dataframe of shape (967_080, 1)

Return type:

pd.DataFrame

Examples

>>> from water_datasets import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> pcp = dataset.fetch_pcp()
>>> pcp.shape
(967080, 1)
class aqua_fetch.rr.Simbi(path: str = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: Camels

Monthly rainfall from 1905 to 2005, daily rainfall from 1920 to 1940, 70 daily streamflow series and 23 monthly temperature series for 24 catchments of Haiti.

Bathelemy et al., 2023; Bathelemy et al., 2024

Examples

>>> from water_datasets import Simbi
>>> simbi = Simbi()
__init__(path: str = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path – path where the Simbi dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.

  • to_netcdf

all_stations() List[str][source]

Not all stations have all data.

aquifer_class() DataFrame[source]

Read the aquifer class values.

boundary_stations() List[str][source]

Returns names/IDs of 24 stations with boundary data.

carb_sed_magma() DataFrame[source]

Read the carbonated sedimentary and magmatic values.

clim_sigs() DataFrame[source]

Read the climate signatures.

daily_bsi() DataFrame[source]

Read the daily BSI values.

daily_clim_sigs() DataFrame[source]

Read the daily climate signatures.

daily_high_q_dur() DataFrame[source]

Read the daily high flow duration values.

daily_high_q_freq() DataFrame[source]

Read the daily high flow frequency values.

daily_low_q_dur() DataFrame[source]

Read the daily low flow duration values.

daily_low_q_freq() DataFrame[source]

Read the daily low flow frequency values.

daily_q_mean() DataFrame[source]

Read the daily mean flow values.

daily_quantile_5() DataFrame[source]

Read the daily 5th quantile flow values.

daily_quantile_95() DataFrame[source]

Read the daily 95th quantile flow values.

fetch_static_features(stn_id: str | list = 'all', static_features: str | list = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from water_datasets import Simbi
>>> dataset = Simbi()
get all static data of all stations
>>> stns = dataset.static_data_stations()
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (24, 232)
get static data of one station only
>>> static_data = dataset.fetch_static_features('001')
>>> static_data.shape
   (1, 232)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['stream_density', 'pcp', 'Forest_lc_98'])
>>> static_data.shape
   (24, 3)
>>> data = dataset.fetch_static_features('001', static_features=['stream_density', 'pcp', 'Forest_lc_98'])
>>> data.shape
   (1, 3)
hypsometric_curve() DataFrame[source]

Read the hypsometric curve values.

monthly_QMNA5() DataFrame[source]

Read the monthly QMNA5 flow values.

monthly_QMXA10() DataFrame[source]

Read the monthly QMXA10 flow values.

monthly_aridity_runoff() DataFrame[source]

Read the monthly aridity runoff values.

monthly_average() DataFrame[source]

Read the monthly average flow values.

monthly_clim_sigs() DataFrame[source]

Read the monthly climate signatures.

monthly_quantile_5() DataFrame[source]

Read the monthly 5th quantile flow values.

monthly_quantile_95() DataFrame[source]

Read the monthly 95th quantile flow values.

other_attributes() DataFrame[source]

Read the other attributes.

pcp_stations() List[str][source]

Returns IDs of 74 stations with daily rainfall data.

percent_geology() DataFrame[source]

Read the geology percentage values.

percent_lc_95() DataFrame[source]

Read the 95th land cover percentage values.

percent_lc_98() DataFrame[source]

Read the land cover percentage values.

q_stations() List[str][source]

Returns names/IDs of 70 stations with daily streamflow data.

read_stn_pcp(stn: str) DataFrame[source]

Read the daily rainfall data for a station.

read_stn_q(stn: str) DataFrame[source]

Read the daily streamflow data for a station.

read_stn_temp(stn: str) DataFrame[source]

Read the daily temperature data for a station.

static_data() DataFrame[source]

Read the static data.

static_data_stations() List[str][source]

Returns names/IDs of 24 stations with static data.

stations() List[str][source]

Returns names/IDs of 24 stations which have all (boundary, streamflow, static features) data. Although there are 70 stations which have daily streamflow data, only 24 of them have static + boundary data.
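The relation between the different station lists can be illustrated with sets. The IDs below are made up, and computing the complete stations as a set intersection is an assumption of this sketch, not a statement about how the class does it.

```python
from typing import Iterable, List


def stations_with_all_data(boundary: Iterable[str],
                           streamflow: Iterable[str],
                           static: Iterable[str]) -> List[str]:
    """IDs present in all three groups, sorted for a stable order."""
    return sorted(set(boundary) & set(streamflow) & set(static))


# toy IDs only; the real lists come from boundary_stations(),
# q_stations() and static_data_stations()
boundary_ids = ["001", "002", "003"]
q_ids = ["001", "002", "003", "004", "005"]
static_ids = ["001", "003", "006"]

common = stations_with_all_data(boundary_ids, q_ids, static_ids)
print(common)
```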

stream_density() DataFrame[source]

Read the stream density values.

temp_stations() List[str][source]

Returns names/IDs of 21 stations with daily temperature data.

topography() DataFrame[source]

Read the topography values.

class aqua_fetch.rr.Spain(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: _GSHA

Data of 889 catchments of Spain from the ceh-es website. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, the number of static features is 35 and the number of dynamic features is 27. The data is available from 1979-01-01 to 2020-09-30.

__init__(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

daily_q_all_areas() DataFrame[source]

Daily data of gauging stations in river from all areas

Returns:

a pandas dataframe with 16_806_305 rows and 3 columns

daily_q_area(area: str) DataFrame[source]

Reads daily data of river gauging stations from the afliq.csv file of the given area.

get_q(as_dataframe: bool = True)[source]

returns daily q of all stations

Returns:

a pandas dataframe of shape (39721, 1447)

Return type:

pd.DataFrame

class aqua_fetch.Thailand(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: _GSHA

Data of 73 catchments of Thailand from the RID project. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, the number of static features is 35 and the number of dynamic features is 27. The data is available from 1980-01-01 to 1999-12-31.

__init__(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

get_q(as_dataframe: bool = True)[source]

reads q

class aqua_fetch.USGS(path: str | PathLike = None, hysets_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: Camels

This class handles the hydrometeorological data for the USA. The daily and hourly discharge data is downloaded from the usgs/nwis website. The data is optionally stored in a netCDF file if xarray is available. Currently, the data is downloaded only for those sites/catchments that are in the HYSETS database, because the catchment boundaries are taken from the HYSETS database using water_datasets.HYSETS.

For the hourly timestep, the “iv” service is used to download the instantaneous data, which is then resampled to hourly data. Only data with the flags “A, [92]”, “A, [91]”, “A, [93]”, “A, e” and “A” is used. For daily streamflow, the “dv” service is used to download the data. In this case, only data with the flags “A” and “A, e” is used.
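The hourly processing described above (keep only accepted flags, then resample to hourly) can be sketched with the standard library. The text only says that the data is resampled; averaging within each clock hour is an assumption of this sketch, and to_hourly is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, List, Tuple

# flags accepted for the hourly ("iv") data according to the description
HOURLY_FLAGS = {"A", "A, e", "A, [91]", "A, [92]", "A, [93]"}


def to_hourly(records: List[Tuple[str, float, str]]) -> Dict[datetime, float]:
    """Drop records with other flags, then average the rest within each clock hour."""
    buckets = defaultdict(list)
    for stamp, value, flag in records:
        if flag not in HOURLY_FLAGS:
            continue  # flag outside the accepted set
        hour = datetime.fromisoformat(stamp).replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


records = [
    ("2020-05-01T00:00", 2.0, "A"),
    ("2020-05-01T00:15", 4.0, "A, e"),
    ("2020-05-01T00:30", 9.0, "P"),  # not in the accepted set, dropped
    ("2020-05-01T01:00", 5.0, "A"),
]
hourly = to_hourly(records)
print(hourly)
```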

__init__(path: str | PathLike = None, hysets_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:

path (str) – Path to store the data

area(stations: str | List[str] = 'all') Series[source]

Returns area_gov (Km2) of all catchments as pandas series

Parameters:

stations (str/list) – name/names of stations. Default is 'all', which will return area of all stations

Returns:

a pandas series whose indices are catchment ids and values are areas of corresponding catchments.

Return type:

pd.Series

Examples

>>> from water_datasets import USGS
>>> dataset = USGS()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('01010070')  # returns area of station whose id is 01010070
>>> dataset.area(['01010070', '12388200'])  # returns area of two stations
fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', st=None, en=None, as_ts=False) DataFrame[source]

returns static attributes of one or multiple stations

Parameters:
  • stations (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

  • st

  • en

  • as_ts

Examples

>>> from water_datasets import USGS
>>> dataset = USGS()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    12004
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (12004, 27)
get static data of one station only
>>> static_data = dataset.fetch_static_features('01010070')
>>> static_data.shape
   (1, 27)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
   (12004, 2)
fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]

returns features of multiple stations

Examples

>>> from water_datasets import USGS
>>> dataset = USGS()
>>> stations = dataset.stations()[0:3]
>>> features = dataset.fetch_stations_features(stations)
get_boundary(catchment_id: str, as_type: str = 'numpy')[source]

returns boundary of a catchment in a required format

Parameters:
  • catchment_id (str) – name/id of catchment

  • as_type (str) – ‘numpy’ or ‘geopandas’

Examples

>>> from water_datasets import USGS
>>> dataset = USGS()
>>> dataset.get_boundary(dataset.stations()[0])
stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.

Return type:

pd.DataFrame

Examples

>>> dataset = USGS()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('01010000')  # returns coordinates of station whose id is 01010000
>>> dataset.stn_coords(['01010000', '01010070'])  # returns coordinates of two stations
class aqua_fetch.rr.WaterBenchIowa(path=None, **kwargs)[source]

Bases: Camels

Rainfall run-off dataset for Iowa (US) following the work of Demir et al., 2022. This is an hourly dataset of 125 catchments with 7 static features and 3 dynamic features (pcp, et, discharge) for each catchment. The dynamic features are time series from 2011-10-01 12:00 to 2018-09-30 11:00.
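The row counts that appear in the examples of this class follow directly from the stated period and the three dynamic features. The short check below uses only the standard library and counts both end points of the period.

```python
from datetime import datetime, timedelta

start = datetime(2011, 10, 1, 12)
end = datetime(2018, 9, 30, 11)

# number of hourly timestamps from start to end, both ends included
n_steps = int((end - start) / timedelta(hours=1)) + 1
n_dynamic = 3  # pcp, et, discharge

print(n_steps)              # 61344
print(n_steps * n_dynamic)  # 184032
```

The first number is the length of the unstacked time series of a single station, and the second is the number of rows when the three features are stacked along the index.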

Examples

>>> from water_datasets import WaterBenchIowa
>>> ds = WaterBenchIowa()
... # fetch dynamic features of 5 stations
>>> data = ds.fetch(5, as_dataframe=True)
>>> data.shape  # it is a multi-indexed DataFrame
(184032, 5)
... # fetch both static and dynamic features of 5 stations
>>> data = ds.fetch(5, static_features="all", as_dataframe=True)
>>> data.keys()
dict_keys(['dynamic', 'static'])
>>> data['static'].shape
(5, 7)
>>> data['dynamic']  # returns a xarray DataSet
... # using another method
>>> data = ds.fetch_dynamic_features('644', as_dataframe=True)
>>> data.unstack().shape
(61344, 3)
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = ds.fetch(stations='644', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 7), (184032, 1))
__init__(path=None, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

fetch_static_features(stn_id: str | List[str], static_features: str | List[str] = 'all') DataFrame[source]
Parameters:
  • stn_id (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from water_datasets import WaterBenchIowa
>>> dataset = WaterBenchIowa()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    125
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (125, 7)
get static data of one station only
>>> static_data = dataset.fetch_static_features('592')
>>> static_data.shape
   (1, 7)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope', 'area'])
>>> static_data.shape
   (125, 2)
>>> data = dataset.fetch_static_features('592', static_features=['slope', 'area'])
>>> data.shape
   (1, 2)
fetch_station_attributes(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, as_ts: bool = False, st: str | None = None, en: str | None = None, **kwargs) DataFrame[source]

Examples

>>> from water_datasets import WaterBenchIowa
>>> dataset = WaterBenchIowa()
>>> data = dataset.fetch_station_attributes('666')

The following datasets are very similar to the RainfallRunoff datasets, but they do not contain observed streamflow data. They are used to provide static and dynamic features to other datasets.

class aqua_fetch.GSHA(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]

Bases: Camels

Global streamflow characteristics, hydrometeorology and catchment attributes following Peirong et al., 2023. The data is downloaded from its zenodo repository. It should be noted that this dataset does not contain observed streamflow data. It has 21568 stations, 26 dynamic (meteorological + storage) features with daily timestep, 21 dynamic features (landcover + streamflow indices + reservoir) with yearly timestep and 35 static features.

Examples

>>> from water_datasets import GSHA
>>> dataset = GSHA()
>>> len(dataset.stations())
21568
>>> dataset.agencies
['arcticnet', 'AFD', 'GRDC', 'IWRIS', 'MLIT', 'HYDAT', 'ANA', 'BOM', 'CCRR', 'China', 'CHP', 'RID', 'USGS']
>>> dataset.start
Timestamp('1979-01-01 00:00:00')
>>> dataset.end
Timestamp('2022-12-31 00:00:00')
>>> dataset.static_features
['ele_mt_uav', 'slp_dg_uav', 'lat', 'long', 'area', 'agency', ...]
>>> len(dataset.dynamic_features)
26
>>> len(dataset.daily_dynamic_features)
26
>>> len(dataset.yearly_dynamic_features)
21
>>> dataset.fetch_static_features('1001_arcticnet')
fetch static features for all stations of arcticnet agency
>>> dataset.fetch_static_features(agency='arcticnet')
fetch dynamic features for all stations of arcticnet agency
>>> dataset.fetch_dynamic_features(agency='arcticnet')
__init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Parameters:

to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.

property agencies: List[str]

returns the names of agencies as list

  • arcticnet : Arctic

  • AFD : Spain

  • GRDC : Global

  • IWRIS : India

  • MLIT : Japan

  • HYDAT : Canada

  • ANA: Brazil

  • BOM : Australia

  • CCRR : Chile

  • China

  • CHP : China

  • RID : Thailand

  • USGS

agency_of_stn(stn: str) str[source]

find the agency to which a station belongs

agency_stations(agency: str) List[str][source]

returns the station ids from a particular agency
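Station IDs such as '1001_arcticnet' suggest that the agency is encoded after the last underscore. Under that assumption (it is inferred from the example IDs only, not from the library code), looking up the agency and grouping stations by agency could be sketched as follows; agency_of and group_by_agency are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List


def agency_of(stn: str) -> str:
    """Return the part after the last underscore (assumed ID layout)."""
    number, sep, agency = stn.rpartition("_")
    if not sep or not number or not agency:
        raise ValueError(f"unexpected station id: {stn!r}")
    return agency


def group_by_agency(stations: List[str]) -> Dict[str, List[str]]:
    groups = defaultdict(list)
    for stn in stations:
        groups[agency_of(stn)].append(stn)
    return dict(groups)


ids = ["1001_arcticnet", "1002_arcticnet", "5_USGS"]  # the last two IDs are made up
groups = group_by_agency(ids)
print(groups)
```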

area(stations: List[str] = 'all', agency: List[str] = 'all') Series[source]

area of catchments

atlas(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]

The link table between GSHA watershed IDs and RiverATLAS river reach IDs, as well as the selected static attributes

Returns:

a pandas DataFrame of shape (n, 24) where n is the number of stations

Return type:

pd.DataFrame

fetch_dynamic_features(stations: List[str] | str = 'all', dynamic_features='all', st=None, en=None, as_dataframe=False, agency: List[str] = 'all')[source]

Fetches all or selected dynamic features of one or more stations.

Parameters:
  • stations (str) – name/id of station/stations of which to extract the data

  • dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

  • st (Optional (default=None)) – start time from where to fetch the data.

  • en (Optional (default=None)) – end time until where to fetch the data

  • as_dataframe (bool, optional (default=False)) – if true, the returned data is pandas DataFrame otherwise it is xarray dataset

Examples

>>> from water_datasets import GSHA
>>> camels = GSHA()
>>> camels.fetch_dynamic_features('1001_arcticnet', as_dataframe=True).unstack()
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('1001_arcticnet',
... dynamic_features=['tmax_AWAP', 'vprp_AWAP', 'streamflow_mmd'],
... as_dataframe=True).unstack()
fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', agency: List[str] = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stations (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
21568
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(21568, 35)
get static data of one station only
>>> static_data = dataset.fetch_static_features('1001_arcticnet')
>>> static_data.shape
(1, 35)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['ele_mt_uav', 'slp_dg_uav'])
>>> static_data.shape
(21568, 2)
get selected features of one station
>>> data = dataset.fetch_static_features('1001_arcticnet', static_features=['ele_mt_uav', 'slp_dg_uav'])
>>> data.shape
(1, 2)
get static data of all stations of one agency
>>> out = dataset.fetch_static_features(agency='arcticnet')
>>> out.shape
(106, 35)

fetch_stn_dynamic_features(stn_id: str, dynamic_features='all') DataFrame[source]

Fetches all or selected dynamic features of one station.

Parameters:
  • stn_id (str) – name/id of station of which to extract the data

  • dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

Returns:

a pandas dataframe of shape (n, features) where n is the number of days

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> dataset.fetch_stn_dynamic_features('1001_arcticnet').unstack()
>>> dataset.dynamic_features
>>> dataset.fetch_stn_dynamic_features('1001_arcticnet',
... dynamic_features=['tmax_AWAP', 'vprp_AWAP']).unstack()

lai(stations: List[str] = 'all', agency: List[str] = 'all')[source]

Leaf Area Index timeseries for one or more than one station either as xr.Dataset or pandas DataFrame. The data has daily timestep.

lai_stn(stn: str) Series[source]

Daily leaf area index. According to the dataset documentation, some watersheds may have substantial gaps in this series because of satellite data quality. The data covers 1981-01-01 to 2020-12-31.

Returns:

a pandas Series of shape (14571,) where 14571 is the number of days

Return type:

pd.Series

lc_variables(stations: List[str] = 'all', agency: List[str] = 'all')[source]

Landcover variables for one or more than one station either as xr.Dataset or dictionary. The data has yearly timestep.

lc_variables_stn(stn: str) DataFrame[source]

Landcover variables for a given station which have yearly timestep. Following three landcover variables are provided:

  • urban_fraction(%): Ratio of urban extent to the entire watershed area (percentage).

  • forest_fraction(%): Ratio of forest extent to the entire watershed area (percentage).

  • cropland_fraction(%): Ratio of cropland extent to the entire watershed area (percentage).

Returns:

a pandas DataFrame of shape (n, 3) where n is the number of years

Return type:

pd.DataFrame

meteo_vars(stations: List[str] = 'all', agency: List[str] = 'all')[source]

Meteorological variables from 1979-01-01 to 2022-12-31 for one or more than one station either as xr.Dataset or dictionary. The data has daily timestep.

meteo_vars_all_stns()[source]

Meteorological variables from 1979-01-01 to 2022-12-31 for all stations either as xr.Dataset or dictionary. The data has daily timestep.

meteo_vars_stn(stn: str) DataFrame[source]

Daily meteorological variables from 1979-01-01 to 2022-12-31 for a given station.

Returns:

a pandas DataFrame of shape (16071, 19) where 16071 is the number of days

Return type:

pd.DataFrame
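
The row count in the shape above can be checked from the stated date range alone. The sketch below uses only pandas, not the GSHA class, and assumes the frame holds exactly one row per calendar day with no gaps.

```python
import pandas as pd

# Daily index over the period quoted above (both end dates inclusive).
idx = pd.date_range("1979-01-01", "2022-12-31", freq="D")

# 44 years x 365 days, plus 11 leap days (1980, 1984, ..., 2020).
n_days = len(idx)
print(n_days)  # 16071
```

The same check applied to 1979-01-01 through 2021-12-31 gives 15706 rows.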

reservoir_variables(stations: List[str] = 'all', agency: List[str] = 'all')[source]

Reservoir variables for one or more than one station either as xr.Dataset or dictionary. The data has yearly timestep.

reservoir_variables_stn(stn: str) DataFrame[source]

Reservoir variables for a given station from 1979 to 2020 with yearly timestep. Following two reservoir variables are provided:

  • capacity: Reservoir capacity of the year in the watershed (m3). To avoid including too many missing values, we use the ICOLD capacity in the linked table of the GeoDAR dataset.

  • dor: Degree of regulation of the watershed (yearly reservoir capacity/yearly mean flow). If yearly mean flow is missing, the value is substituted with the average of all mean flow values.

Returns:

a pandas DataFrame of shape (42, 2) where 42 is the number of years

Return type:

pd.DataFrame
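
The description of dor above can be illustrated with a small pandas sketch. It is not the library's implementation: the numbers are invented, the mean flow is assumed to be expressed as a yearly volume so that the ratio is dimensionless, and only the column names capacity and dor come from the text.

```python
import numpy as np
import pandas as pd

years = pd.Index(range(1979, 1984), name="year")

# Hypothetical yearly reservoir capacity (m3) and yearly mean flow (m3 per year).
capacity = pd.Series([2.0e6, 2.0e6, 3.0e6, 3.0e6, 3.0e6], index=years)
mean_flow = pd.Series([1.0e6, np.nan, 1.5e6, 2.0e6, np.nan], index=years)

# A missing yearly mean flow is replaced by the average of the available values.
filled_flow = mean_flow.fillna(mean_flow.mean())

# Degree of regulation: yearly reservoir capacity / yearly mean flow.
dor = capacity / filled_flow

reservoir = pd.DataFrame({"capacity": capacity, "dor": dor})
print(reservoir)
```

With these inputs the two years without a flow value (1980 and 1983) are divided by the 1.5e6 average rather than dropped, so the frame keeps one row per year.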

stations(agency: str = 'all') List[str][source]

returns names of stations as list

stn_coords(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]

returns the latitude and longitude of stations

Returns:

a pandas DataFrame of shape (n, 2) where n is the number of stations

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> dataset.stn_coords('1001_arcticnet')
>>> dataset.stn_coords(['1001_arcticnet', '1002_arcticnet'])
get coordinates for all stations of the arcticnet agency
>>> dataset.stn_coords(agency='arcticnet')

storage_vars(stations: List[str] = 'all', agency: List[str] = 'all')[source]

Water storage term variables from 1979-01-01 to 2021-12-31 for one or more than one station either as xr.Dataset or dictionary. The data has daily timestep.

storage_vars_all_stns()[source]

Water storage term variables from 1979-01-01 to 2021-12-31 for all stations either as xr.Dataset or dictionary. The data has daily timestep.

storage_vars_stn(stn: str) DataFrame[source]

Daily Water storage term variables from 1979-01-01 to 2021-12-31 for a given station.

  • SM_layer1: 0-7 cm soil moisture from ERA5 land soil water layer 1 (m3/m3) for 1979-2021.

  • SM_layer2: 7-28 cm soil moisture from ERA5 land soil water layer 2 (m3/m3) for 1979-2021.

  • SM_layer3: 28-100 cm soil moisture from ERA5 land soil water layer 3 (m3/m3) for 1979-2021.

  • SM_layer4: 100-289 cm soil moisture from ERA5 land soil water layer 4 (m3/m3) for 1979-2021.

  • SWDE: Snow water equivalent from ERA5 snow depth water equivalent (m of water equivalent) for 1979-2021.

  • groundwater(%): Groundwater percentage from GRACE-FO data assimilation (%) for 2003-2021 (weekly).

Returns:

a pandas DataFrame of shape (15706, 6) where 15706 is the number of days

Return type:

pd.DataFrame
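
Because groundwater(%) is weekly while the other columns are daily, it is worth checking how the weekly values sit on the daily index before modelling. The text does not say how the library aligns them; the sketch below assumes the weekly values appear on their observation dates with NaN in between, and uses invented values.

```python
import pandas as pd

# Daily index for one month and a hypothetical weekly groundwater series.
daily_idx = pd.date_range("2003-01-01", "2003-01-31", freq="D")
weekly_idx = pd.date_range("2003-01-06", "2003-01-27", freq="7D")
groundwater_weekly = pd.Series(
    [41.0, 43.5, 40.2, 39.8], index=weekly_idx, name="groundwater(%)"
)

# Placing the weekly series on the daily index leaves NaN on the other days.
on_daily_index = groundwater_weekly.reindex(daily_idx)

# One possible treatment: carry the last weekly value forward.
filled = on_daily_index.ffill()

print(on_daily_index.notna().sum(), filled.isna().sum())
```

Forward filling is only one option; interpolation or resampling the daily columns to weekly may suit a given model better.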

streamflow_indices(stations: List[str] = 'all', agency: List[str] = 'all')[source]

Streamflow indices for one or more than one station either as xr.Dataset or dictionary. The data has yearly timestep.

streamflow_indices_stn(stn: str) DataFrame[source]

Streamflow indices for a given station which have yearly timestep.

Returns:

a pandas DataFrame of shape (n, 16) where n is the number of years

Return type:

pd.DataFrame

uncertainty(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]

Uncertainty estimates of all meteorological variables over all watersheds

  • P_uncertainty (%) Precipitation uncertainty estimates (in percentage). Uncertainties are calculated from EM-Earth deterministic and MSWEP datasets.

  • T_uncertainty (%) Temperature uncertainty estimates (in percentage). Uncertainties are calculated from EUSTACE, MERRA-2, and ERA5 datasets.

  • EVP_uncertainty (%) Actual evapotranspiration uncertainty estimates (in percentage). Uncertainties are calculated from GLEAM and REA datasets.

  • LRAD_uncertainty (%) Downward longwave radiation uncertainty estimates (in percentage). Uncertainties are calculated from MERRA-2 and ERA5-land datasets.

  • SRAD_uncertainty (%) Downward shortwave radiation uncertainty estimates (in percentage). Uncertainties are calculated from MERRA-2 and ERA5-land datasets.

  • wind_uncertainty (%) Wind speed uncertainty estimates (in percentage). The u- and v- components are aggregated on each grid to obtain wind speed. Uncertainties are calculated from MERRA-2 and ERA5-land datasets.

  • pet_uncertainty (%) Potential evapotranspiration uncertainty estimates (in percentage). Uncertainties are calculated from GLEAM and REA datasets.

Returns:

a pandas DataFrame of shape (n, 7) where n is the number of stations

Return type:

pd.DataFrame

class aqua_fetch.EStreams(path=None, **kwargs)[source]

Bases: Camels

Handles EStreams data following the work of Nascimento et al., 2024. The data is available at its Zenodo repository. Note that this dataset does not contain observed streamflow data. It has 15047 stations, 9 dynamic (meteorological) features with daily timestep, 27 dynamic features with yearly timestep and 208 static features. The dynamic features are from 1950-01-01 to 2023-06-30.

__init__(path=None, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str)

  • verbosity (int) – 0: no message will be printed

  • kwargs (dict) – Any other keyword arguments for the Datasets class

area(stations: List[str] = 'all', countries: List[str] = 'all') Series[source]

area of catchments in km2

property countries: List[str]

returns the names of 39 countries covered by EStreams as list

country_of_stn(stn: str) str[source]

finds the country to which a station belongs

country_stations(country: str) List[str][source]

returns the station ids from a particular country
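
The station ID used in the examples of this class, 'IEEP0281', is selected by countries='IE', which suggests that IDs begin with a two-letter country code. The sketch below groups IDs on that assumption. The helper country_code_from_id is hypothetical and is not part of aqua_fetch, which may instead look the country up in the gauging-stations table.

```python
from collections import defaultdict


def country_code_from_id(stn_id: str) -> str:
    """Hypothetical helper: return the leading two-letter code of a station ID."""
    prefix = stn_id[:2]
    if len(prefix) != 2 or not prefix.isalpha():
        raise ValueError(f"cannot read a country code from {stn_id!r}")
    return prefix.upper()


# Station IDs taken from the examples in this section.
station_ids = ["IEEP0281", "IEEP0282"]

# Group the IDs by their assumed country prefix.
stations_by_country = defaultdict(list)
for stn in station_ids:
    stations_by_country[country_code_from_id(stn)].append(stn)

print(dict(stations_by_country))
```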

fetch_dynamic_features(stations: List[str] | str = 'all', dynamic_features='all', st=None, en=None, as_dataframe=False, countries: str | List[str] = 'all')[source]

Fetches all or selected dynamic features of one or more stations.

Parameters:
  • stations (str/list, optional (default="all")) – name/id of the station or stations for which to extract the data

  • dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

  • st (Optional (default=None)) – start time from which to fetch the data.

  • en (Optional (default=None)) – end time until which to fetch the data.

  • as_dataframe (bool, optional (default=False)) – if True, the returned data is a pandas DataFrame; otherwise it is an xarray Dataset.

Examples

>>> from aqua_fetch import EStreams
>>> dataset = EStreams()
>>> dataset.fetch_dynamic_features('IEEP0281', as_dataframe=True).unstack()
>>> dataset.dynamic_features
>>> dataset.fetch_dynamic_features('IEEP0281',
... dynamic_features=['p_mean', 't_mean', 'pet_mean'],
... as_dataframe=True).unstack()

fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', countries: List[str] = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stations (str/list, optional (default="all")) – name/id of the station or stations for which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas dataframe of shape (stations, static_features)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import EStreams
>>> dataset = EStreams()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
15047
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(15047, 153)
get static data of one station only
>>> static_data = dataset.fetch_static_features('IEEP0281')
>>> static_data.shape
(1, 153)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slp_dg_mean', 'ele_mt_mean'])
>>> static_data.shape
(15047, 2)
get selected features of one station
>>> data = dataset.fetch_static_features('IEEP0281', static_features=['slp_dg_mean', 'ele_mt_mean'])
>>> data.shape
(1, 2)
get static data of all stations of one country
>>> out = dataset.fetch_static_features(countries='IE')
>>> out.shape
(464, 153)

fetch_stn_dynamic_features(stn_id: str, dynamic_features='all') DataFrame[source]

Fetches all or selected dynamic features of one station.

Parameters:
  • stn_id (str) – name/id of station of which to extract the data

  • dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

Returns:

a pandas dataframe of shape (n, features) where n is the number of days

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import EStreams
>>> dataset = EStreams()
>>> dataset.fetch_stn_dynamic_features('IEEP0281').unstack()
>>> dataset.dynamic_features
>>> dataset.fetch_stn_dynamic_features('IEEP0281',
... dynamic_features=['p_mean', 't_mean', 'pet_mean']).unstack()

gauge_stations() DataFrame[source]

reads the file estreams_gauging_stations.csv as dataframe

hydro_clim_sigs(stations: List[str] = 'all', countries: List[str] = 'all') DataFrame[source]

Returns the hydro-climatic signatures of one or more stations

Returns:

a pandas dataframe of hydro-climatic signatures of shape (stations, 31)

Return type:

pd.DataFrame

meteo_data(stations: str | List[str] = 'all', countries: List[str] | str = 'all')[source]

Returns the meteorological data of one or more stations either as dictionary of dataframes or xarray Dataset

meteo_data_station(stn_id: str) DataFrame[source]

Returns the meteorological data of a station

Returns:

a pandas dataframe of meteorological data of shape (time, 9)

Return type:

pd.DataFrame

static_data() DataFrame[source]

Returns a dataframe with static attributes of catchments

stations() List[str][source]

Returns a list of all station names

stn_coords(stations: List[str] = 'all', countries: List[str] = 'all') DataFrame[source]

Returns the coordinates of one or more stations

Returns:

a pandas dataframe of shape (stations, 2)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import EStreams
>>> dataset = EStreams()
>>> dataset.stn_coords('IEEP0281')
>>> dataset.stn_coords(['IEEP0281', 'IEEP0282'])
get coordinates for all stations of one country
>>> dataset.stn_coords(countries='IE')