Rainfall Runoff datasets
This section includes datasets that can be used for rainfall-runoff modeling.
They all contain observed streamflow and meteorological data as time series.
These are referred to as dynamic features. The physical catchment properties
are included as static features. Although each data source has a dedicated
class, the aqua_fetch.rr.RainfallRunoff class can be used to access all the datasets.
List of datasets
| Source Name | Class | Number of Daily Stations | Number of Hourly Stations | Reference |
|---|---|---|---|---|
| | | 106 | | |
| | | 484 | | |
| | | 735 | | |
| | | 222, 561 | | |
| | | 671 | | |
| | | 897 | | |
| | | 331 | | |
| | | 516 | | |
| | | 304 | | |
| | | 1555 | | |
| | | 654 | | |
| | | 472 | | |
| | | 50 | | |
| | | 671 | | |
| | | 304 | | |
| | | 111 | | |
| | | 669 | | |
| | | 5357 | | |
| | | 561 | | |
| | | 14425 | | |
| | | 464 | | |
| | | 294 | | |
| | | 751 | | |
| | | 859 | 859 | |
| | | 111 | | |
| | | 1287 | | |
| | | 280 | | |
| | | 1 | | |
| | | 889 | | |
| | | 24 | | |
| | | 73 | | |
| | | 12004 | | |
| | | 125 | | |
High Level API
The high-level API is provided by the aqua_fetch.rr.RainfallRunoff
class, which offers a unified and easy-to-use interface to all the datasets.
The datasets are accessed by their names.
- class aqua_fetch.rr.RainfallRunoff(dataset: str, path: str | PathLike = None, overwrite: bool = False, to_netcdf: bool = True, processes: int = None, remove_zip: bool = True, verbosity: int = 1, **kwargs)[source]
Bases: object
This class provides access to all the rainfall-runoff datasets. For simplicity and reusability, use this class instead of the individual dataset classes.
Examples
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')  # instead of CAMELS_AUS, you can provide any other dataset name
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.columns = df.columns.get_level_values('dynamic_features')
>>> df.shape
(21184, 26)
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
222
... # get data of 10 % of stations as dataframe
>>> df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
(550784, 22)
... # The returned dataframe is a multi-indexed dataframe
>>> df.index.names == ['time', 'dynamic_features']
True
... # get data by station id
>>> df = dataset.fetch(stations='224214A', as_dataframe=True).unstack()
>>> df.shape
(21184, 26)
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> data = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['tmax_AWAP', 'precipitation_AWAP', 'et_morton_actual_SILO', 'streamflow_MLd']).unstack()
>>> data.shape
(21184, 4)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multiindexed dataframe
(21184, 260)
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='224214A', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 166), (550784, 1))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(472, 2)
>>> dataset.stn_coords('3001')  # returns coordinates of station whose id is 3001
18.3861 80.3917
>>> dataset.stn_coords(['3001', '17021'])  # returns coordinates of two stations
See sphx_glr_auto_examples_camels_australia.py for a more comprehensive usage example.
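The examples above call unstack() because a frame returned with as_dataframe=True is indexed by (time, dynamic_features). As a rough analogy only, the long-to-wide reshape can be pictured in plain Python. The record values and the helper name to_wide below are made up for illustration and are not part of the library.

```python
from collections import defaultdict

# Toy "long" records keyed by (time, dynamic_feature). The numbers are
# invented for illustration and do not come from any real dataset.
long_records = {
    ("2000-01-01", "precipitation"): 1.2,
    ("2000-01-01", "streamflow"): 0.4,
    ("2000-01-02", "precipitation"): 0.0,
    ("2000-01-02", "streamflow"): 0.3,
}


def to_wide(records):
    """Pivot {(time, feature): value} into {time: {feature: value}}."""
    wide = defaultdict(dict)
    for (time, feature), value in records.items():
        wide[time][feature] = value
    return dict(wide)


wide = to_wide(long_records)
# One entry per time step, one column-like key per dynamic feature.
print(sorted(wide["2000-01-02"]))
```

After the pivot each time step holds one value per feature, which is the layout the unstacked dataframe exposes as columns.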
- __init__(dataset: str, path: str | PathLike = None, overwrite: bool = False, to_netcdf: bool = True, processes: int = None, remove_zip: bool = True, verbosity: int = 1, **kwargs)[source]
Rainfall Runoff datasets
- Parameters:
dataset (str) –
dataset name. This must be one of the following:
Arcticnet, Bull, CABra, CCAM, CAMELS_AUS, CAMELS_BR, CAMELS_CH, CAMELS_CL, CAMELS_DE, CAMELS_DK0, CAMELS_DK, CAMELS_FR, CAMELS_GB, CAMELS_IND, CAMELS_SE, CAMELS_US, EStreams, Finland, GRDCCaravan, GSHA, HYSETS, HYPE, Ireland, Italy, Japan, LamaHCE, LamaHIce, Poland, Portugal, RRLuleaSweden, Simbi, Spain, Thailand, USGS, WaterBenchIowa
path (str) – path to the directory inside which data is located/downloaded. If provided and the path/dataset exists, then the data will be read from this path. If provided and the path/dataset does not exist, then the data will be downloaded to this path. If not provided, then the data will be downloaded to the default path, which is .../water-datasts/data/.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This speeds up repeated calls to fetch etc. but requires the netcdf5 package as well as xarray.
verbosity (int) – 0: no message will be printed
kwargs – additional keyword arguments for the underlying dataset class, for example version for water_quality.rr.CAMELS_AUS, timestep for the water_quality.rr.LamaHCE dataset, or met_src for CAMELS_BR
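The path and overwrite rules described above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the library's implementation: resolve_data_dir and default_root are hypothetical names, and a temporary directory stands in for the real download location.

```python
import tempfile
from pathlib import Path


def resolve_data_dir(path, dataset, default_root, overwrite=False):
    """Return (directory, needs_download) for a dataset.

    Hypothetical helper: use `path` when given, otherwise `default_root`;
    download when the dataset directory is missing or `overwrite` is True.
    """
    root = Path(path) if path is not None else Path(default_root)
    ds_dir = root / dataset
    return ds_dir, overwrite or not ds_dir.exists()


with tempfile.TemporaryDirectory() as tmp:
    ds_dir, first_call = resolve_data_dir(tmp, "CAMELS_AUS", default_root=tmp)
    ds_dir.mkdir(parents=True)  # stand-in for a finished download
    _, second_call = resolve_data_dir(tmp, "CAMELS_AUS", default_root=tmp)
    _, forced = resolve_data_dir(tmp, "CAMELS_AUS", default_root=tmp, overwrite=True)

print(first_call, second_call, forced)
```

The first call reports that a download is needed, the second reuses the existing directory, and overwrite=True forces a fresh download even though the data is present.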
- area(stations: str | List[str] = 'all') Series[source]
Returns area (km²) of all/selected catchments as a pandas Series
- Parameters:
stations (str/list (default=``all``)) – name/names of stations. Default is all, which will return area of all stations.
- Returns:
a pandas series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_CH')
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2004')  # returns area of station whose id is 2004
>>> dataset.area(['2004', '6004'])  # returns area of two stations
- property dynamic_features: List[str]
returns names of dynamic features as python list of strings
Examples
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.dynamic_features
- property end: str
returns end date of data
Examples
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.end  # a property, so it is accessed without parentheses
- fetch(stations: str | List[str] | int | float = 'all', dynamic_features: List[str] | str | None = 'all', static_features: str | List[str] | None = None, st: None | str = None, en: None | str = None, as_dataframe: bool = False, **kwargs) dict | DataFrame[source]
Fetches the features of one or more stations.
- Parameters:
stations –
It can have the following values:
int : number of (randomly selected) stations to fetch
float : fraction of (randomly selected) stations to fetch
str : name/id of station to fetch. However, if all is provided, then all stations will be fetched.
list : list of names/ids of stations to fetch
dynamic_features ((default=``all``)) –
It can have the following values:
str : name of dynamic feature to fetch. If all is provided, then all dynamic features will be fetched.
list : list of dynamic features to fetch.
None : No dynamic feature will be fetched.
static_features ((default=None)) –
It can have the following values:
str : name of static feature to fetch. If all is provided, then all static features will be fetched.
list : list of static features to fetch.
None : No static feature will be fetched.
st – starting date of data to be returned. If None, the data will be returned from the first date for which it is available.
en – end date of data to be returned. If None, the data will be returned up to the last date for which it is available.
as_dataframe – whether to return dynamic features as a pandas dataframe or as an xarray dataset.
kwargs – keyword arguments
- Returns:
If both static and dynamic features are requested, a dictionary with
static and dynamic keys is returned.
Otherwise only the dynamic or only the static features are returned.
Examples
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> # get data of 10% of stations
>>> df = dataset.fetch(stations=0.1, as_dataframe=True)  # returns a multiindex dataframe
... # fetch data of 5 (randomly selected) stations
>>> five_random_stn_data = dataset.fetch(stations=5, as_dataframe=True)
... # fetch data of 3 selected stations
>>> three_selec_stn_data = dataset.fetch(stations=['912101A','912105A','915011A'], as_dataframe=True)
... # fetch data of a single station
>>> single_stn_data = dataset.fetch(stations='318076', as_dataframe=True)
... # get both static and dynamic features as dictionary
>>> data = dataset.fetch(1, static_features="all", as_dataframe=True)  # -> dict
>>> data['dynamic']
... # get only selected dynamic features
>>> sel_dyn_features = dataset.fetch(stations='318076',
... dynamic_features=['streamflow_MLd', 'solarrad_AWAP'], as_dataframe=True)
... # fetch data between selected periods
>>> data = dataset.fetch(stations='318076', st="20010101", en="20101231", as_dataframe=True)
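The four accepted forms of the stations argument (int, float, str, list) can be illustrated with a small resolver. This is only a sketch of the documented behaviour: select_stations is a hypothetical helper, the seed parameter exists only to make the toy reproducible, and truncating the fraction with int() is an assumption about rounding.

```python
import random


def select_stations(all_stations, stations="all", seed=None):
    """Resolve a `stations` argument to a concrete list of ids (sketch)."""
    rng = random.Random(seed)
    if isinstance(stations, bool):
        raise TypeError("stations must be int, float, str or list")
    if isinstance(stations, int):
        # int: that many randomly selected stations
        return rng.sample(all_stations, stations)
    if isinstance(stations, float):
        # float: that fraction of stations, chosen at random
        # (assumption: the count is truncated towards zero)
        return rng.sample(all_stations, int(len(all_stations) * stations))
    if isinstance(stations, str):
        # str: a single id, or every station when the value is "all"
        return list(all_stations) if stations == "all" else [stations]
    return list(stations)  # list: explicit ids, returned as given


ids = [f"S{i:02d}" for i in range(20)]  # made-up station ids
five = select_stations(ids, 5, seed=0)
half = select_stations(ids, 0.5, seed=0)
single = select_stations(ids, "S03")
everything = select_stations(ids, "all")
```

Keeping the type dispatch in one place is what lets a single fetch signature serve random sampling, single-station lookups, and explicit lists.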
- fetch_dynamic_features(stn_id: str, dynamic_features='all', st=None, en=None, as_dataframe=False)[source]
Fetches all or selected dynamic attributes of one station.
- Parameters:
stn_id (str) – name/id of station of which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
st (Optional (default=None)) – start time from where to fetch the data.
en (Optional (default=None)) – end time until which to fetch the data
as_dataframe (bool, optional (default=False)) – if true, the returned data is a pandas DataFrame, otherwise it is an xarray dataset
Examples
>>> from water_datasets import RainfallRunoff
>>> camels = RainfallRunoff('CAMELS_AUS')
>>> camels.fetch_dynamic_features('224214A', as_dataframe=True).unstack()
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('224214A',
... dynamic_features=['tmax_AWAP', 'vprp_AWAP', 'streamflow_mmd'],
... as_dataframe=True).unstack()
- fetch_static_features(stations: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Fetches all or selected static attributes of one or more stations.
- Parameters:
stations (str/list) – name/id of the station(s) for which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import RainfallRunoff
>>> camels = RainfallRunoff('CAMELS_AUS')
>>> camels.fetch_static_features('224214A')
>>> camels.static_features
>>> camels.fetch_static_features('224214A',
... static_features=['elev_mean', 'relief', 'ksat', 'pop_mean'])
- fetch_station_features(stn_id: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, as_ts: bool = False, st: str | None = None, en: str | None = None, **kwargs) DataFrame[source]
Fetches features for one station.
- Parameters:
stn_id – station id/gauge id for which the data is to be fetched.
dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch
static_features – names of static features/attributes to be fetched
as_ts (bool) – whether static features are to be converted into a time series or not. If yes, the returned time series will have the same length as the dynamic attributes.
st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from the first date for which it is available.
en (str, optional) – end point of data to be fetched. By default, the data will be fetched up to the last date for which it is available.
- Returns:
dataframe if as_ts is True else it returns a dictionary of static and dynamic features for a station/gauge_id
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.fetch_station_features('912101A')
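The as_ts option described above turns static features into series of the same length as the dynamic data. A minimal sketch of that idea, assuming it amounts to repeating each static value once per time step; static_as_time_series and the numbers are hypothetical.

```python
def static_as_time_series(static, times):
    """Repeat each static value once per time step (illustrative only)."""
    return {name: [value] * len(times) for name, value in static.items()}


times = ["2000-01-01", "2000-01-02", "2000-01-03"]
static = {"area_km2": 152.0, "elev_mean": 431.0}  # made-up values

static_ts = static_as_time_series(static, times)
# Every static feature now has one value per time step, so it can sit
# next to the dynamic features in a single table.
print(len(static_ts["area_km2"]))
```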
- fetch_stations_features(stations: str | List[str], dynamic_features: str | List[str] | None = 'all', static_features: str | List[str] | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]
Reads attributes of more than one station.
- Parameters:
stations – list of stations for which data is to be fetched.
dynamic_features – list of dynamic features to be fetched. If all, then all dynamic features will be fetched.
static_features – list of static features to be fetched. If all, then all static features will be fetched. If None, then no static attribute will be fetched.
st – start of data to be fetched.
en – end of data to be fetched.
as_dataframe – whether to return the data as a pandas dataframe. The default is an xr.Dataset object.
kwargs (dict) – additional keyword arguments
- Returns:
Dynamic and static features of one or multiple stations. Dynamic features are returned as xr.Dataset by default; if as_dataframe is True or xarray is not installed, they are returned as a pandas dataframe with a multiindex. An xr.Dataset has as many data_vars as there are stations, and for each station the DataArray has dimensions (time, dynamic_features), where the length of time is defined by st and en. When the returned object is a pandas DataFrame, the first index is time and the second index is dynamic_features. Static attributes are always returned as a pandas DataFrame of shape (stations, static_features). If dynamic_features is None, dynamic features are not returned and the result consists of static features only. The same holds for static_features. If neither is None, the returned type is a dictionary with static and dynamic keys.
- Return type:
pd.DataFrame or xr.Dataset or dict
- Raises:
ValueError – if both dynamic_features and static_features are None
Examples
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
... # find out station ids
>>> dataset.stations()
... # get data of selected stations
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
... as_dataframe=True)
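The return-type rules above (dynamic only, static only, a dictionary for both, ValueError for neither) can be summarised in a few lines. This sketch uses a hypothetical assemble_result helper and placeholder strings instead of real dataframes; it illustrates the documented contract, not the library's code.

```python
def assemble_result(dynamic, static):
    """Combine already-fetched pieces following the documented contract."""
    if dynamic is None and static is None:
        raise ValueError("dynamic_features and static_features cannot both be None")
    if dynamic is not None and static is not None:
        # both requested: dictionary with 'static' and 'dynamic' keys
        return {"static": static, "dynamic": dynamic}
    # only one kind requested: return it directly
    return dynamic if dynamic is not None else static


both = assemble_result("dynamic-data", "static-data")
only_dynamic = assemble_result("dynamic-data", None)
only_static = assemble_result(None, "static-data")
```

Callers that request both kinds of features should therefore index the result with the 'static' and 'dynamic' keys, while single-kind requests can use the returned object directly.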
- get_boundary(stn_id: str, as_type: str = 'numpy')[source]
returns boundary of a catchment in a required format
Examples
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_SE')
>>> dataset.get_boundary(dataset.stations()[0])
- property path: str
returns path where the data is stored. The default path is ~../water_quality/data
- plot_catchment(stn_id: str, ax: Axes = None, show: bool = True, **kwargs) Axes[source]
plots catchment boundaries
- Parameters:
ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.
show (bool)
**kwargs
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.plot_catchment('912101A')  # stn_id is a required argument
>>> dataset.plot_catchment('912101A', marker='o', ms=0.3)
>>> ax = dataset.plot_catchment('912101A', marker='o', ms=0.3, show=False)
>>> ax.set_title("Catchment Boundaries")
>>> plt.show()
- plot_stations(stations: List[str] = 'all', marker='.', ax: Axes = None, show: bool = True, **kwargs) Axes[source]
plots coordinates of stations
- Parameters:
stations – name/names of stations. If not given, all stations will be plotted
marker – marker to use.
ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.
show (bool)
**kwargs
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.plot_stations()
>>> dataset.plot_stations(['1', '2', '3'])
>>> dataset.plot_stations(marker='o', ms=0.3)
>>> ax = dataset.plot_stations(marker='o', ms=0.3, show=False)
>>> ax.set_title("Stations")
>>> plt.show()
- q_mmd(stations: str | List[str] = 'all') DataFrame[source]
returns streamflow in units of millimeters per day. This is obtained by dividing q by area.
- Parameters:
stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations.
- Returns:
a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
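Dividing discharge by catchment area also involves a unit conversion. The sketch below assumes discharge in cubic meters per second and area in square kilometers; the source does not state the input units for every dataset, so treat cms_to_mmd as a hypothetical helper illustrating the arithmetic rather than what q_mmd does internally.

```python
def cms_to_mmd(q_cms, area_km2):
    """Convert discharge (m^3/s) to runoff depth (mm/day) over a catchment."""
    seconds_per_day = 86_400
    volume_m3_per_day = q_cms * seconds_per_day
    # 1 km^2 = 1e6 m^2, so this is a water depth in meters per day
    depth_m_per_day = volume_m3_per_day / (area_km2 * 1_000_000)
    return depth_m_per_day * 1_000  # meters -> millimeters


# 10 m^3/s over 864 km^2 is 864,000 m^3/day spread over 864e6 m^2,
# i.e. a depth of 0.001 m/day.
q_mmd = cms_to_mmd(10.0, 864.0)
print(round(q_mmd, 6))
```

Expressing streamflow as a depth makes it directly comparable with precipitation, which is also reported in millimeters.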
- property start: str
returns starting date of data
Examples
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.start  # a property, so it is accessed without parentheses
- property static_features: List[str]
returns names of static features as python list of strings
Examples
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.static_features
- stations() List[str][source]
returns names of all stations
Examples
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.stations()
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
coords – pandas DataFrame with long and lat columns. The length of the dataframe equals the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_CH')
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('2004')  # returns coordinates of station whose id is 2004
>>> dataset.stn_coords(['2004', '6004'])  # returns coordinates of two stations
>>> from water_datasets import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
>>> dataset.stn_coords(['G0050115', '912101A'])  # returns coordinates of two stations
Low Level API
The low-level API provides access to each individual dataset class. This gives more control over the datasets.
- class aqua_fetch.rr.Camels(path: str = None, timestep: str = 'D', id_idx_in_bndry_shape: int = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: Datasets
This is the parent class for individual rainfall-runoff datasets such as CAMELS-GB. It is not meant to be used directly; it is inherited by the child classes that are specific to a dataset, such as CAMELS-GB or CAMELS-AUS. This class first downloads the CAMELS dataset if it is not already downloaded. Then the selected features for a selected id are fetched and provided to the user through the fetch method.
- path (str/path) – directory of the dataset
- dynamic_features (list) – tells which dynamic features are available in this dataset
- static_features (list) – a list of static features.
- static_attribute_categories (list) – tells which kinds of static features are present in this category.
- stations : returns name/id of stations for which the data (dynamic features) exists, as a list of strings.
- fetch : fetches all features (both static and dynamic) of all stations/gauge_ids or of a specified station. It can also fetch all features of several stations, either by providing their gauge_ids or by simply requesting data for, say, 20 stations, which are then chosen randomly.
- fetch_dynamic_features : fetches specified dynamic features of one specified station. If the dynamic attribute is not specified, all dynamic features will be fetched for the specified station. If the station is not specified, the specified dynamic features will be fetched for all stations.
- fetch_static_features : works the same as fetch_dynamic_features but for static features. If the category is not specified, static features of the specified station are returned for all categories.
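The parent/child arrangement described above can be pictured with a toy base class whose fetch is written once and reused by dataset-specific subclasses. BaseDataset, ToyDataset and _read_dynamic are hypothetical names invented for this sketch; they are not the actual Camels implementation.

```python
class BaseDataset:
    """Hypothetical stand-in for a parent class such as Camels."""

    def stations(self):
        raise NotImplementedError

    def _read_dynamic(self, station):
        raise NotImplementedError

    def fetch(self, stations="all"):
        # Shared logic lives in the parent; children only say how to
        # list stations and how to read one station's data.
        if isinstance(stations, str):
            ids = self.stations() if stations == "all" else [stations]
        else:
            ids = list(stations)
        return {stn: self._read_dynamic(stn) for stn in ids}


class ToyDataset(BaseDataset):
    """Dataset-specific child with made-up in-memory data."""

    _data = {"A1": [0.1, 0.2], "B2": [1.5, 1.1]}

    def stations(self):
        return list(self._data)

    def _read_dynamic(self, station):
        return self._data[station]


toy = ToyDataset()
all_data = toy.fetch()
one_station = toy.fetch("A1")
```

With this split, adding a new dataset means implementing only the dataset-specific readers while the fetch interface stays identical across datasets.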
- __init__(path: str = None, timestep: str = 'D', id_idx_in_bndry_shape: int = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- area(stations: str | List[str] = 'all') Series[source]
Returns area (km²) of all/selected catchments as a pandas Series
- Parameters:
stations (str/list (default=``all``)) – name/names of stations. Default is all, which will return area of all stations.
- Returns:
a pandas series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from water_datasets import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2004')  # returns area of station whose id is 2004
>>> dataset.area(['2004', '6004'])  # returns area of two stations
- property camels_dir
Directory where all camels datasets will be saved. This will be under the datasets directory.
- fetch(stations: str | list | int | float = 'all', dynamic_features: List[str] | str | None = 'all', static_features: str | List[str] | None = None, st: None | str = None, en: None | str = None, as_dataframe: bool = False, **kwargs) dict | DataFrame[source]
Fetches the features of one or more stations.
- Parameters:
stations –
It can have the following values:
int : number of (randomly selected) stations to fetch
float : fraction of (randomly selected) stations to fetch
str : name/id of station to fetch. However, if all is provided, then all stations will be fetched.
list : list of names/ids of stations to fetch
dynamic_features – If not None, then it is the features to be fetched. If None, then all available features are fetched
static_features – list of static features to be fetched. None means no static attribute will be fetched.
st – starting date of data to be returned. If None, the data will be returned from the first date for which it is available.
en – end date of data to be returned. If None, the data will be returned up to the last date for which it is available.
as_dataframe – whether to return dynamic features as a pandas dataframe or as an xarray dataset.
kwargs – keyword arguments to read the files
- Returns:
If both static and dynamic features are requested, a dictionary with static and dynamic keys is returned. Otherwise only the dynamic or only the static features are returned.
Examples
>>> from water_datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # get data of 10% of stations
>>> df = dataset.fetch(stations=0.1, as_dataframe=True)  # returns a multiindex dataframe
... # fetch data of 5 (randomly selected) stations
>>> five_random_stn_data = dataset.fetch(stations=5, as_dataframe=True)
... # fetch data of 3 selected stations
>>> three_selec_stn_data = dataset.fetch(stations=['912101A','912105A','915011A'], as_dataframe=True)
... # fetch data of a single station
>>> single_stn_data = dataset.fetch(stations='318076', as_dataframe=True)
... # get both static and dynamic features as dictionary
>>> data = dataset.fetch(1, static_features="all", as_dataframe=True)  # -> dict
>>> data['dynamic']
... # get only selected dynamic features
>>> sel_dyn_features = dataset.fetch(stations='318076',
... dynamic_features=['streamflow_MLd', 'solarrad_AWAP'], as_dataframe=True)
... # fetch data between selected periods
>>> data = dataset.fetch(stations='318076', st="20010101", en="20101231", as_dataframe=True)
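The st and en arguments in the last example restrict the returned period. A minimal sketch of such a window filter, assuming compact YYYYMMDD strings as in the example and assuming both bounds are inclusive; in_window is a hypothetical helper, not a library function.

```python
from datetime import datetime


def in_window(day, st=None, en=None, fmt="%Y%m%d"):
    """True when `day` lies in the window [st, en]; None means unbounded.

    Assumption: both ends are inclusive.
    """
    d = datetime.strptime(day, fmt)
    if st is not None and d < datetime.strptime(st, fmt):
        return False
    if en is not None and d > datetime.strptime(en, fmt):
        return False
    return True


days = ["20001231", "20010101", "20050615", "20101231", "20110101"]
selected = [d for d in days if in_window(d, st="20010101", en="20101231")]
print(selected)
```

Leaving st or en as None keeps that side of the window open, matching the documented default of returning data for the whole available period.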
- fetch_dynamic_features(stn_id: str, dynamic_features='all', st=None, en=None, as_dataframe=False)[source]
Fetches all or selected dynamic features of one station.
- Parameters:
stn_id (str) – name/id of station of which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
st (Optional (default=None)) – start time from where to fetch the data.
en (Optional (default=None)) – end time until which to fetch the data
as_dataframe (bool, optional (default=False)) – if true, the returned data is a pandas DataFrame, otherwise it is an xarray dataset
Examples
>>> from water_datasets import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_dynamic_features('224214A', as_dataframe=True).unstack()
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('224214A',
... dynamic_features=['tmax_AWAP', 'vprp_AWAP', 'streamflow_mmd'],
... as_dataframe=True).unstack()
- fetch_static_features(stn_id: str | list = None, static_features: str | list = None) DataFrame[source]
Fetches all or selected static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_static_features('224214A')
>>> camels.static_features
>>> camels.fetch_static_features('224214A',
... static_features=['elev_mean', 'relief', 'ksat', 'pop_mean'])
- fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, as_ts: bool = False, st: str | None = None, en: str | None = None, **kwargs) DataFrame[source]
Fetches features for one station.
- Parameters:
station – station id/gauge id for which the data is to be fetched.
dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch
static_features – names of static features/attributes to be fetched
as_ts (bool) – whether static features are to be converted into a time series or not. If yes, the returned time series will have the same length as the dynamic attributes.
st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from the first date for which it is available.
en (str, optional) – end point of data to be fetched. By default, the data will be fetched up to the last date for which it is available.
- Returns:
dataframe if as_ts is True else it returns a dictionary of static and dynamic features for a station/gauge_id
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.fetch_station_features('912101A')
- fetch_stations_features(stations: str | List[str], dynamic_features: str | List[str] = 'all', static_features: str | List[str] = None, st: str | Timestamp = None, en: str | Timestamp = None, as_dataframe: bool = False, **kwargs)[source]
Reads features of more than one station.
- Parameters:
stations – list of stations for which data is to be fetched.
dynamic_features – list of dynamic features to be fetched. If all, then all dynamic features will be fetched.
static_features – list of static features to be fetched. If all, then all static features will be fetched. If None, then no static attribute will be fetched.
st – start of data to be fetched.
en – end of data to be fetched.
as_dataframe – whether to return the dynamic data as a pandas dataframe. The default is an xr.Dataset object.
kwargs (dict) – additional keyword arguments
- Returns:
Dynamic and static features of multiple stations. Dynamic features
are returned as xr.Dataset by default; if as_dataframe is True,
they are returned as a pandas dataframe with a multiindex. An xr.Dataset
has as many data_vars as there are stations, and for each
station the DataArray has dimensions (time, dynamic_features),
where the length of time is defined by st and en.
When the returned object is a pandas DataFrame, the first index
is time and the second index is dynamic_features. Static features
are always returned as a pandas DataFrame of shape
(stations, static_features). If dynamic_features is None,
dynamic features are not returned and the result consists of
static features only. The same holds for static_features.
If neither is None, the returned type is a dictionary with
static and dynamic keys.
- Raises:
ValueError – if both dynamic_features and static_features are None
Examples
>>> from water_datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
... # find out station ids
>>> dataset.stations()
... # get data of selected stations as xarray Dataset
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'])
... # get data of selected stations as pandas DataFrame
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
... as_dataframe=True)
... # get both dynamic and static features of selected stations
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
... dynamic_features=['streamflow_mmd', 'tmax_AWAP'], static_features=['elev_mean'])
- get_boundary(catchment_id: str, as_type: str = 'numpy')[source]
returns boundary of a catchment in a required format
Examples
>>> from water_datasets import CAMELS_SE
>>> dataset = CAMELS_SE()
>>> dataset.get_boundary(dataset.stations()[0])
- plot_catchment(catchment_id: str, ax: Axes = None, show: bool = True, **kwargs) Axes[source]
plots catchment boundaries
- Parameters:
ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.
show (bool)
**kwargs
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from water_datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.plot_catchment('912101A')  # catchment_id is a required argument
>>> dataset.plot_catchment('912101A', marker='o', ms=0.3)
>>> ax = dataset.plot_catchment('912101A', marker='o', ms=0.3, show=False)
>>> ax.set_title("Catchment Boundaries")
>>> plt.show()
- plot_stations(stations: List[str] = 'all', marker='.', ax: Axes = None, show: bool = True, **kwargs) Axes[source]
plots coordinates of stations
- Parameters:
stations – name/names of stations. If not given, all stations will be plotted
marker – marker to use.
ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.
show (bool)
**kwargs
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from water_datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.plot_stations()
>>> dataset.plot_stations(['1', '2', '3'])
>>> dataset.plot_stations(marker='o', ms=0.3)
>>> ax = dataset.plot_stations(marker='o', ms=0.3, show=False)
>>> ax.set_title("Stations")
>>> plt.show()
- q_mmd(stations: str | List[str] = 'all') DataFrame[source]
returns streamflow in units of millimeters per day. This is obtained by dividing q by area.
- Parameters:
stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations.
- Returns:
a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
coords – pandas DataFrame with long and lat columns. The length of the dataframe equals the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('2004')  # returns coordinates of station whose id is 2004
>>> dataset.stn_coords(['2004', '6004'])  # returns coordinates of two stations
>>> from water_datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
>>> dataset.stn_coords(['G0050115', '912101A'])  # returns coordinates of two stations
- class aqua_fetch.rr._gsha._GSHA(gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: Camels
Parent class for those datasets which use static and dynamic features from the GSHA dataset. The following dataset classes are based on this class:
- water_datasets.Japan
- water_datasets.Thailand
- water_datasets.Spain
- __init__(gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
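The three cases described for path can be sketched as follows. The helper `resolve_data_dir` and the default directory name are hypothetical; the sketch only mirrors the documented behaviour and performs no download.

```python
from pathlib import Path
import tempfile


def resolve_data_dir(path=None, default_dir="aqua_fetch_data"):
    """Return (directory, needs_download) for the three documented cases.

    `default_dir` is a placeholder name, not the library's real default.
    """
    if path is None:
        # no path given: data goes to the default directory
        return Path(default_dir), True
    path = Path(path)
    if path.is_dir():
        # existing directory: the data is read from it
        return path, False
    # path given but missing: the data is downloaded into this directory
    return path, True


with tempfile.TemporaryDirectory() as tmp:
    existing = resolve_data_dir(tmp)
    missing = resolve_data_dir(Path(tmp) / "not_there")
default = resolve_data_dir()
print(existing[1], missing[1], default[1])  # False True True
```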
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', st=None, en=None, as_ts=False) DataFrame[source]
Returns static attributes of one or multiple stations.
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
st
en
as_ts
Examples
>>> from water_datasets import Japan
>>> dataset = Japan()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
12004
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(12004, 27)
get static data of one station only
>>> static_data = dataset.fetch_static_features('01010070')
>>> static_data.shape
(1, 27)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
(12004, 2)
- fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]
returns features of multiple stations
Examples
>>> from water_datasets import Arcticnet
>>> dataset = Arcticnet()
>>> stations = dataset.stations()
>>> features = dataset.fetch_stations_features(stations)
- class aqua_fetch.Arcticnet(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _GSHA
Data of 106 catchments of the Arctic region from the r-arcticnet project. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1979-01-01 to 2003-12-31.
- __init__(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- class aqua_fetch.Bull(path, overwrite=False, **kwargs)[source]
Bases: Camels
This dataset follows the work of Aparicio et al., 2024. The data is taken from the Zenodo repository. It contains 484 stations with 55 dynamic (time series) features and 214 static features. The dynamic features span from 1951 to 2021.
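The examples for this class return a DataFrame indexed by time and dynamic_features, with one column per station. The sketch below builds such a frame from random numbers to show why a single station's column has to be unstacked before use. The station ids and feature names are placeholders, and this is not the library's code.

```python
import numpy as np
import pandas as pd

times = pd.date_range("2000-01-01", periods=4, freq="D")
features = ["pcp_mm", "q_cms", "tmean_C"]   # placeholder feature names
stations = ["stn_a", "stn_b"]               # placeholder station ids

# long layout: one row per (time, feature) pair, one column per station
index = pd.MultiIndex.from_product(
    [times, features], names=["time", "dynamic_features"]
)
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.random((len(index), len(stations))), index=index, columns=stations
)
print(data.shape)         # (12, 2): 4 time steps x 3 features, 2 stations

# unstacking one station's column moves the features into columns
one_station = data["stn_a"].unstack()
print(one_station.shape)  # (4, 3): time steps x features
```

The same arithmetic explains the documented shapes: 25932 time steps times 55 features gives the 1426260 rows reported for this dataset.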
Examples
>>> from water_datasets import Bull
>>> dataset = Bull()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(1426260, 48)  # 48 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['BULL_9007'].unstack().shape  # the name of station could be different
(25932, 13)
If we don't set as_dataframe=True, then the returned data will be an xarray Dataset
>>> data = dataset.fetch(0.1)
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 25932, 'dynamic_features': 55})
>>> len(data.data_vars)
48
>>> df = dataset.fetch(stations=1, as_dataframe=True)  # get data of only one random station
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(25932, 55)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
484
# get data by station id
>>> df = dataset.fetch(stations='BULL_9007', as_dataframe=True).unstack()
>>> df.shape
(25932, 55)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['potential_evapotranspiration_AEMET', 'temperature_mean_AEMET',
... 'total_precipitation_ERA5_Land', 'obs_q_cms']).unstack()
>>> df.shape
(25932, 4)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(166166, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='BULL_9007', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 214), (1426260, 1))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(484, 2)
>>> dataset.stn_coords('BULL_9007')  # returns coordinates of station whose id is BULL_9007
41.298 -1.967
>>> dataset.stn_coords(['BULL_9007', 'BULL_8083'])  # returns coordinates of two stations
- __init__(path, overwrite=False, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import Bull
>>> dataset = Bull()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
484
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(484, 214)
get static data of one station only
>>> static_data = dataset.fetch_static_features('42600042')
>>> static_data.shape
(1, 214)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['seasonality', 'moisture_index'])
>>> static_data.shape
(484, 2)
>>> data = dataset.fetch_static_features('42600042', static_features=['seasonality', 'moisture_index'])
>>> data.shape
(1, 2)
- class aqua_fetch.rr.CABra(path=None, overwrite=False, to_netcdf: bool = True, met_src: str = 'ens', **kwargs)[source]
Bases: Camels
Reads and fetches the CABra dataset, a catchment attribute dataset following the work of Almagro et al., 2021. This dataset consists of 97 static and 12 dynamic features of 735 Brazilian catchments. The temporal extent is from 1980 to 2020. The dynamic features consist of daily hydro-meteorological time series.
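In the examples for these classes, the first argument of fetch may be a station id, an integer number of random stations, or a fraction of all stations. The sketch below shows one way the numeric cases could be resolved. `pick_stations` is a hypothetical helper, and truncating the fraction is an assumption that happens to match the documented 73 columns for 10 % of 735 stations.

```python
import random


def pick_stations(all_stations, n, seed=None):
    """Pick random stations from a float fraction or an integer count."""
    rng = random.Random(seed)
    if isinstance(n, float):
        n = int(n * len(all_stations))  # assumption: the fraction is truncated
    return rng.sample(all_stations, n)


all_ids = [str(i) for i in range(1, 736)]  # 735 placeholder ids
print(len(pick_stations(all_ids, 0.1)))  # 73
print(len(pick_stations(all_ids, 10)))   # 10
```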
Examples
>>> from water_datasets import CABra
>>> dataset = CABra()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(131472, 73)  # 73 represents number of stations
>>> data.index.names == ['time', 'dynamic_features']
True
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(10956, 12)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
735
# get data by station id
>>> df = dataset.fetch(stations='92', as_dataframe=True).unstack()
>>> df.shape
(10956, 12)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['p_ens', 'tmax_ens', 'pet_pm', 'rh_ens', 'Streamflow']).unstack()
>>> df.shape
(10956, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(131472, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='92', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 97), (131472, 1))
- __init__(path=None, overwrite=False, to_netcdf: bool = True, met_src: str = 'ens', **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded then you can set it to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require the netcdf5 package as well as xarray.
met_src (str) – source of meteorological data, must be one of ens, era5 or ref.
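The constraint on met_src can be pictured with a small validation sketch. `check_met_src` is a hypothetical helper, and raising ValueError is an assumption; the text only states which three values are allowed.

```python
ALLOWED_MET_SRC = ("ens", "era5", "ref")  # values listed for met_src


def check_met_src(met_src: str = "ens") -> str:
    """Return met_src unchanged if it is one of the documented options."""
    if met_src not in ALLOWED_MET_SRC:
        # assumed behaviour: reject anything outside the documented set
        raise ValueError(
            f"met_src must be one of {ALLOWED_MET_SRC}, got {met_src!r}"
        )
    return met_src


print(check_met_src())        # ens
print(check_met_src("era5"))  # era5
```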
- fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import CABra
>>> dataset = CABra()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
735
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(735, 97)
get static data of one station only
>>> static_data = dataset.fetch_static_features('92')
>>> static_data.shape
(1, 97)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['gauge_lat', 'area'])
>>> static_data.shape
(735, 2)
>>> data = dataset.fetch_static_features('92', static_features=['gauge_lat', 'area'])
>>> data.shape
(1, 2)
- q_mmd(stations: str | List[str] = 'all') DataFrame[source]
Returns streamflow in units of millimeters per day. It is obtained by dividing the Streamflow time series by area.
- Parameters:
stations (str/list) – name/names of stations. Default is all, which returns the streamflow of all stations.
- Returns:
a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
Returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas DataFrame with long and lat columns. The length of the DataFrame equals the number of stations whose coordinates are fetched.
- Return type:
coords
Examples
>>> dataset = CABra()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('92')  # returns coordinates of station whose id is 92
>>> dataset.stn_coords(['92', '142'])  # returns coordinates of two stations
- class aqua_fetch.rr.CAMELS_AUS(path: str = None, version: int = 2, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: Camels
This is a dataset of 561 Australian catchments with 187 static features and 26 dynamic features for each catchment. The dynamic features are time series from 1950-01-01 to 2022-03-31. This class reads the CAMELS-AUS dataset of Fowler et al., 2024.
If version is 1 then this class reads data following Fowler et al., 2021, which is a dataset of 222 Australian catchments with 161 static features and 26 dynamic features for each catchment. The dynamic features are time series from 1957-01-01 to 2018-12-31.
Examples
>>> from water_datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(21184, 26)
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
222
... # get data of 10 % of stations as dataframe
>>> df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
(550784, 22)
... # The returned dataframe is a multi-indexed data
>>> df.index.names == ['time', 'dynamic_features']
True
... # get data by station id
>>> df = dataset.fetch(stations='224214A', as_dataframe=True).unstack()
>>> df.shape
(21184, 26)
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> data = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['tmax_AWAP', 'precipitation_AWAP', 'et_morton_actual_SILO', 'streamflow_MLd']).unstack()
>>> data.shape
(21184, 4)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multiindexed dataframe
(21184, 260)
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='224214A', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 166), (550784, 1))
- __init__(path: str = None, version: int = 2, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path – path where the CAMELS_AUS dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.
version – version of the dataset to download. Allowed values are 1 and 2.
to_netcdf
- fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]
Fetches static features of one or more stations as dataframe.
- Parameters:
stn_id (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from water_datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
222
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(222, 161)
get static data of one station only
>>> static_data = dataset.fetch_static_features('305202')
>>> static_data.shape
(1, 161)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['catchment_di', 'elev_mean'])
>>> static_data.shape
(222, 2)
- q_mmd(stations: str | List[str] = None) DataFrame[source]
Returns streamflow in units of millimeters per day. This is obtained by dividing q_cms by area.
- Parameters:
stations (str/list) – name/names of stations. Default is None, which returns the streamflow of all stations.
- Returns:
a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
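As a worked unit check for this conversion, assume the catchment area A is in km². If discharge is given in m³/s (q_cms), one day contains 86400 s; if it is instead given in megalitres per day (as the feature name streamflow_MLd suggests), one ML is 10³ m³. These units are assumptions, since they are not stated here.

```latex
\[
q_{\mathrm{mm/d}}
  = \frac{Q_{\mathrm{m^3/s}} \times 86400\ \mathrm{s/d}}{A_{\mathrm{km^2}} \times 10^{6}\ \mathrm{m^2/km^2}} \times 10^{3}\ \mathrm{mm/m}
  = 86.4\,\frac{Q_{\mathrm{m^3/s}}}{A_{\mathrm{km^2}}}
\]
\[
q_{\mathrm{mm/d}}
  = \frac{Q_{\mathrm{ML/d}} \times 10^{3}\ \mathrm{m^3/ML}}{A_{\mathrm{km^2}} \times 10^{6}\ \mathrm{m^2/km^2}} \times 10^{3}\ \mathrm{mm/m}
  = \frac{Q_{\mathrm{ML/d}}}{A_{\mathrm{km^2}}}
\]
```

In the second case the factors cancel exactly, so a flow in ML/d divided by an area in km² is already a depth in mm/d.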
- class aqua_fetch.rr.CAMELS_BR(path=None, verbosity: int = 1, **kwargs)[source]
Bases: Camels
This is a dataset of 897 Brazilian catchments with 67 static features and 10 dynamic features for each catchment. The dynamic features are time series from 1920-01-01 to 2019-02-28. This class downloads and processes the CAMELS dataset of Brazil as provided by Chagas et al., 2020. The simulated streamflow of 593 stations and the raw streamflow of 3679 stations shipped with this data are not included in the dynamic features. Both can be fetched through the fetch_simulated_streamflow and fetch_raw_streamflow methods.
Examples
>>> from water_datasets import CAMELS_BR
>>> dataset = CAMELS_BR()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(14245, 12)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
593
# we can get data of 10% catchments as below
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(170940, 59)
# the data is multi-index with ``time`` and ``dynamic_features`` as indices
>>> data.index.names == ['time', 'dynamic_features']
True
# get data by station id
>>> df = dataset.fetch(stations='46035000', as_dataframe=True).unstack()
>>> df.shape
(14245, 12)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['precipitation_cpc', 'evapotransp_mgb', 'temperature_mean', 'streamflow_m3s']).unstack()
>>> df.shape
(14245, 4)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(170940, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='46035000', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 67), (170940, 1))
- __init__(path=None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
- all_stations(feature: str) List[str][source]
Returns the ids of all stations for which data of a specific feature is available.
- area(stations: str | List[str] = 'all', source: str = 'gsim') Series[source]
Returns area (km²) of catchments as a pandas Series.
- Parameters:
stations (str/list) – name/names of stations. Default is all, which returns the area of all stations.
source (str) – source of area calculation. It should be either gsim or ana.
- Returns:
a pandas series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from water_datasets import CAMELS_BR
>>> dataset = CAMELS_BR()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('65100000')  # returns area of station whose id is 65100000
>>> dataset.area(['65100000', '64075000'])  # returns area of two stations
- fetch_raw_streamflow(station_id: str = None) DataFrame[source]
returns raw streamflow data for one or more stations.
Example
>>> dataset = CAMELS_BR()
>>> data = dataset.fetch_raw_streamflow('10500000')
... # fetch all time series data associated with a station.
>>> x = dataset.fetch_raw_streamflow(dataset.all_stations())
- fetch_simulated_streamflow(station_id: str = None) DataFrame[source]
Returns simulated streamflow data for one or more stations.
Example
>>> dataset = CAMELS_BR()
>>> data = dataset.fetch_simulated_streamflow('10500000')
... # fetch all time series data associated with a station.
>>> x = dataset.fetch_simulated_streamflow(dataset.all_stations())
- fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]
Fetches static feature/features of one or more stations.
- Parameters:
stn_id (str/list) – id/ids of the station(s) whose features are fetched.
static_features (str/list) – name/names of features to fetch. Default is all, which returns all the static features of the station(s).
Example
>>> dataset = CAMELS_BR()
>>> df = dataset.fetch_static_features('11500000', 'climate')
# read all static features of all stations
>>> data = dataset.fetch_static_features(dataset.stations(), dataset.static_features)
>>> data.shape
(597, 67)
- q_mmd(stations: str | List[str] = 'all') DataFrame[source]
Returns streamflow in units of millimeters per day. The name of the original time series is streamflow_mm.
- Parameters:
stations (str/list) – name/names of stations. Default is all, which returns the streamflow of all stations.
- Returns:
a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
- stations() List[str][source]
Returns a list of station ids.
Example
>>> dataset = CAMELS_BR()
>>> stations = dataset.stations()
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
Returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas DataFrame with long and lat columns. The length of the DataFrame equals the number of stations whose coordinates are fetched.
- Return type:
coords
Examples
>>> dataset = CAMELS_BR()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('65100000')  # returns coordinates of station whose id is 65100000
>>> dataset.stn_coords(['65100000', '64075000'])  # returns coordinates of two stations
- class aqua_fetch.rr.CAMELS_CH(path=None, overwrite: bool = False, to_netcdf: bool = True, timestep: str = 'D', **kwargs)[source]
Bases: Camels
Data of 331 Swiss catchments from Hoege et al., 2023. The dataset consists of 209 static catchment features and 9 dynamic features. The dynamic features span from 1981-01-01 to 2020-12-31 with daily timestep. For the hourly (H) timestep, only streamflow is available, for 170 Swiss catchments. The hourly streamflow data is obtained from Kauzlaric et al., 2023.
Examples
>>> from water_datasets import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(128560, 10)
>>> data.index.names == ['time', 'dynamic_features']
True
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(8036, 9)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
331
# get data by station id
>>> df = dataset.fetch(stations='2004', as_dataframe=True).unstack()
>>> df.shape
(8036, 9)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True, dynamic_features=['precipitation(mm/d)', 'temperature_mean(°C)', 'discharge_vol(m3/s)']).unstack()
>>> df.shape
(8036, 3)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(72324, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='2004', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 209), (72324, 1))
- __init__(path=None, overwrite: bool = False, to_netcdf: bool = True, timestep: str = 'D', **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded then you can set it to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require the netcdf5 package as well as xarray.
- fetch_static_features(stn_id: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import CAMELS_CH
>>> dataset = CAMELS_CH()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
331
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(331, 209)
get static data of one station only
>>> static_data = dataset.fetch_static_features('2004')
>>> static_data.shape
(1, 209)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['gauge_lon', 'gauge_lat', 'area'])
>>> static_data.shape
(331, 3)
>>> data = dataset.fetch_static_features('2004', static_features=['gauge_lon', 'gauge_lat', 'area'])
>>> data.shape
(1, 3)
- glacier_attrs() DataFrame[source]
- returns a dataframe with four columns
‘glac_area’
‘glac_vol’
‘glac_mass’
‘glac_area_neighbours’
- class aqua_fetch.rr.CAMELS_CL(path: str = None, **kwargs)[source]
Bases: Camels
This is a dataset of 516 Chilean catchments with 104 static features and 12 dynamic features for each catchment. The dynamic features are time series from 1913-02-15 to 2018-03-09. This class downloads and processes the CAMELS dataset of Chile following the work of Alvarez-Garreton et al., 2018.
Examples
>>> from water_datasets import CAMELS_CL
>>> dataset = CAMELS_CL()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(38374, 12)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
516
# we can get data of 10% catchments as below
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(460488, 51)
# the data is multi-index with ``time`` and ``dynamic_features`` as indices
>>> data.index.names == ['time', 'dynamic_features']
True
# get data by station id
>>> df = dataset.fetch(stations='8350001', as_dataframe=True).unstack()
>>> df.shape
(38374, 12)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['pet_hargreaves', 'precip_tmpa', 'tmean_cr2met', 'streamflow_m3s']).unstack()
>>> df.shape
(38374, 4)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(460488, 10)
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='8350001', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 104), (460488, 1))
- __init__(path: str = None, **kwargs)[source]
- Parameters:
path – path where the CAMELS-CL dataset has been downloaded. This path must contain five zip files and one xlsx file.
- fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all')[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from water_datasets import CAMELS_CL
>>> dataset = CAMELS_CL()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
516
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(516, 104)
get static data of one station only
>>> static_data = dataset.fetch_static_features('11315001')
>>> static_data.shape
(1, 104)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope_mean', 'area'])
>>> static_data.shape
(516, 2)
>>> data = dataset.fetch_static_features('2110002', static_features=['slope_mean', 'area'])
>>> data.shape
(1, 2)
- stations() list[source]
Returns the ids of all stations for which data is available.
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
Returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas DataFrame with long and lat columns. The length of the DataFrame equals the number of stations whose coordinates are fetched.
- Return type:
coords
Examples
>>> dataset = CAMELS_CL()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('12872001')  # returns coordinates of station whose id is 12872001
>>> dataset.stn_coords(['12872001', '12876004'])  # returns coordinates of two stations
- class aqua_fetch.rr.CAMELS_GB(path=None, **kwargs)[source]
Bases: Camels
This is a dataset of 671 catchments with 145 static features and 10 dynamic features for each catchment, following the work of Coxon et al., 2020. The dynamic features are time series from 1970-10-01 to 2015-09-30. The data is downloaded from the CEH website.
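The shapes reported in the examples for this class follow directly from the time span and the number of dynamic features given above. The short sketch below reproduces that arithmetic with the standard library only; `n_days` is a hypothetical helper, and the sketch does not touch the dataset itself.

```python
from datetime import date


def n_days(start: date, end: date) -> int:
    """Number of daily time steps between two dates, both ends included."""
    return (end - start).days + 1


days = n_days(date(1970, 10, 1), date(2015, 9, 30))
n_dynamic_features = 10

# one row per day after unstacking; one row per (day, feature) pair before
print(days)                       # 16436
print(days * n_dynamic_features)  # 164360
```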
Examples
>>> from water_datasets import CAMELS_GB
>>> dataset = CAMELS_GB()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(164360, 67)
>>> data.index.names == ['time', 'dynamic_features']
True
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(16436, 10)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
671
# get data by station id
>>> df = dataset.fetch(stations='97002', as_dataframe=True).unstack()
>>> df.shape
(16436, 10)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['windspeed', 'temperature', 'pet', 'precipitation', 'discharge_vol']).unstack()
>>> df.shape
(16436, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(164360, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='97002', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 290), (164360, 1))
- __init__(path=None, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
- fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]
Fetches static features of one or more stations, for one or more categories, as a DataFrame.
- Parameters:
stn_id (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from water_datasets import CAMELS_GB
>>> dataset = CAMELS_GB(path="path/to/CAMELS_GB")
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
671
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(671, 145)
get static data of one station only
>>> static_data = dataset.fetch_static_features('85004')
>>> static_data.shape
(1, 145)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['area', 'elev_mean'])
>>> static_data.shape
(671, 2)
- class aqua_fetch.rr.CAMELS_US(data_source: str = 'basin_mean_daymet', path=None, **kwargs)[source]
Bases: Camels
This is a dataset of 671 US catchments with 59 static features and 8 dynamic features for each catchment. The dynamic features are time series from 1980-01-01 to 2014-12-31. This class downloads and processes the CAMELS dataset of 671 catchments from ucar.edu following Newman et al., 2015, Newman et al., 2022 and Addor et al., 2017.
Examples
>>> from water_datasets import CAMELS_US
>>> dataset = CAMELS_US()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(12784, 8)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
671
# we can get data of 10% catchments as below
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(460488, 51)
# the data is multi-index with ``time`` and ``dynamic_features`` as indices
>>> data.index.names == ['time', 'dynamic_features']
True
# get data by station id
>>> df = dataset.fetch(stations='11478500', as_dataframe=True).unstack()
>>> df.shape
(12784, 8)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
...    dynamic_features=['prcp(mm/day)', 'srad(W/m2)', 'tmax(C)', 'tmin(C)', 'Flow']).unstack()
>>> df.shape
(12784, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(102272, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='11478500', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 59), (102272, 1))
- __init__(data_source: str = 'basin_mean_daymet', path=None, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
data_source (str) – allowed values are
basin_mean_daymet
basin_mean_maurer
basin_mean_nldas
basin_mean_v1p15_daymet
basin_mean_v1p15_nldas
elev_bands
hru
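A caller may want to check a data_source value before constructing the class. The following is a minimal sketch built from the list above; check_data_source is a hypothetical helper, and how (or whether) the library itself validates this argument is not stated here.

```python
# Allowed values copied from the parameter description above.
ALLOWED_DATA_SOURCES = (
    "basin_mean_daymet",
    "basin_mean_maurer",
    "basin_mean_nldas",
    "basin_mean_v1p15_daymet",
    "basin_mean_v1p15_nldas",
    "elev_bands",
    "hru",
)


def check_data_source(data_source: str = "basin_mean_daymet") -> str:
    """Return data_source unchanged if it is allowed, else raise ValueError.

    Hypothetical helper for illustration; not part of the library.
    """
    if data_source not in ALLOWED_DATA_SOURCES:
        raise ValueError(
            f"data_source must be one of {ALLOWED_DATA_SOURCES}, got {data_source!r}"
        )
    return data_source


print(check_data_source("basin_mean_nldas"))  # basin_mean_nldas
```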
- fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all')[source]
gets one or more static features of one or more stations
- Parameters:
stn_id (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from water_datasets import CAMELS_US
>>> camels = CAMELS_US()
>>> st_data = camels.fetch_static_features('11532500')
>>> st_data.shape
(1, 59)
get names of available static features
>>> camels.static_features
get specific features of one station
>>> static_data = camels.fetch_static_features('11528700',
...    static_features=['area_gages2', 'geol_porostiy', 'soil_conductivity', 'elev_mean'])
>>> static_data.shape
(1, 4)
get names of all stations
>>> all_stns = camels.stations()
>>> len(all_stns)
671
>>> all_static_data = camels.fetch_static_features(all_stns)
>>> all_static_data.shape
(671, 59)
- class aqua_fetch.rr.CAMELS_DE(path=None, overwrite: bool = False, to_netcdf: bool = True, verbosity: int = 1, **kwargs)[source]
Bases: Camels
This is the data from 1555 German catchments following the work of Loritz et al., 2024. The data is downloaded from zenodo. This data consists of 155 static and 21 dynamic features. The dynamic features span from 1951-01-01 to 2020-12-31 with daily timestep.
Examples
>>> from water_datasets import CAMELS_DE
>>> dataset = CAMELS_DE()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(25568, 21)
get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
1555
get data of 10 % of stations as dataframe
>>> df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
(536928, 155)
The returned dataframe is a multi-indexed dataframe
>>> df.index.names == ['time', 'dynamic_features']
True
get data by station id
>>> df = dataset.fetch(stations='DE110260', as_dataframe=True).unstack()
>>> df.shape
(25568, 21)
get names of available dynamic features
>>> dataset.dynamic_features
get only selected dynamic features
>>> data = dataset.fetch(1, as_dataframe=True,
...    dynamic_features=['temperature_mean', 'humidity_mean', 'precipitation_mean', 'discharge_vol']).unstack()
>>> data.shape
(25568, 4)
get names of available static features
>>> dataset.static_features
get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multi-indexed dataframe
(536928, 10)
when we get both static and dynamic data, the returned data is a dictionary with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='DE110260', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 111), (536928, 1))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(1555, 2)
>>> dataset.stn_coords('DE110250')  # returns coordinates of station whose id is DE110250
47.925221 8.191595
>>> dataset.stn_coords(['DE110250', 'DE110260'])  # returns coordinates of two stations
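The row counts in the example above follow from the stated time span and feature count. As a small worked check, using only the standard library and the figures quoted in this section (1951-01-01 to 2020-12-31, daily, 21 dynamic features):

```python
from datetime import date


def n_daily_steps(start: date, end: date) -> int:
    """Number of daily time steps between two dates, both ends included."""
    return (end - start).days + 1


steps = n_daily_steps(date(1951, 1, 1), date(2020, 12, 31))
n_features = 21  # dynamic features stated for CAMELS_DE

# An unstacked frame has one row per day; a stacked frame has one row
# per (day, feature) pair.
print(steps)               # 25568
print(steps * n_features)  # 536928
```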
- __init__(path=None, overwrite: bool = False, to_netcdf: bool = True, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This speeds up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.
- fetch_static_features(stn_id: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import CAMELS_DE
>>> dataset = CAMELS_DE()
get the names of stations
>>> stns = dataset.stations()
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(1555, 111)
get static data of one station only
>>> static_data = dataset.fetch_static_features('DE110010')
>>> static_data.shape
(1, 111)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['p_mean', 'p_seasonality', 'frac_snow'])
>>> static_data.shape
(1555, 3)
>>> data = dataset.fetch_static_features('DE110000', static_features=['p_mean', 'p_seasonality', 'frac_snow'])
>>> data.shape
(1, 3)
- class aqua_fetch.rr.CAMELS_DK(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: Camels
This is an updated version of the water_datasets.rr.CAMELS_DK0 dataset. This dataset was presented by Liu et al., 2024 and is available at dataverse. It consists of 119 static and 13 dynamic features from 3330 Danish catchments. The dynamic (time series) features span from 1989-01-02 to 2023-12-31 with daily timestep. However, the streamflow observations are available for only 304 catchments.
Examples
>>> from water_datasets import CAMELS_DK
>>> dataset = CAMELS_DK()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(166166, 30)  # 30 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['54130033'].unstack().shape
(12782, 13)
If we don't set as_dataframe=True, then the returned data will be an xarray Dataset
>>> data = dataset.fetch(0.1)
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 12782, 'dynamic_features': 13})
>>> len(data.data_vars)
30
>>> df = dataset.fetch(stations=1, as_dataframe=True)  # get data of only one random station
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(12782, 13)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
304
# get data by station id
>>> df = dataset.fetch(stations='54130033', as_dataframe=True).unstack()
>>> df.shape
(12782, 13)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
...    dynamic_features=['Abstraction', 'pet', 'temperature', 'precipitation', 'Qobs']).unstack()
>>> df.shape
(12782, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(166166, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='54130033', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 119), (166166, 1))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(304, 2)
>>> dataset.stn_coords('54130033')  # returns coordinates of station whose id is 54130033
6131379.493 559057.7232
>>> dataset.stn_coords(['54130033', '13210113'])  # returns coordinates of two stations
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This speeds up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.
- fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import CAMELS_DK
>>> dataset = CAMELS_DK()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
304
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(304, 119)
get static data of one station only
>>> static_data = dataset.fetch_static_features('42600042')
>>> static_data.shape
(1, 119)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope_mean', 'aridity'])
>>> static_data.shape
(304, 2)
>>> data = dataset.fetch_static_features('42600042', static_features=['slope_mean', 'aridity'])
>>> data.shape
(1, 2)
- class aqua_fetch.rr.Caravan_DK(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: Camels
Reads Caravan extension Denmark - Danish dataset for large-sample hydrology following the works of Koch and Schneider 2022. The dataset is downloaded from zenodo. This dataset consists of static and dynamic features from 308 Danish catchments. There are 38 dynamic (time series) features from 1981-01-02 to 2020-12-31 with daily timestep and 211 static features for each of the 308 catchments.
Please note that there is an updated version of this dataset following the works of Liu et al., 2024. That dataset is associated with the water_datasets.CAMELS_DK class, which can be imported as follows:
>>> from water_datasets import CAMELS_DK
Examples
>>> from water_datasets import Caravan_DK
>>> dataset = Caravan_DK()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(569751, 30)  # 30 represents number of stations
>>> data.index.names == ['time', 'dynamic_features']
True
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(14609, 39)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
308
# get data by station id
>>> df = dataset.fetch(stations='80001', as_dataframe=True).unstack()
>>> df.shape
(14609, 39)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
...    dynamic_features=['snow_depth_water_equivalent_mean', 'temperature_2m_mean',
...    'potential_evaporation_sum', 'total_precipitation_sum', 'streamflow']).unstack()
>>> df.shape
(14609, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(569751, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='80001', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 211), (569751, 1))
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This speeds up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.
- property caravan_attr_fpath
returns path to attributes_caravan_camelsdk.csv file
- caravan_static_attributes(stations='all') DataFrame[source]
- Return type:
a pandas DataFrame of shape (308, 10)
- fetch_static_features(stn_id: str | List[str] = 'all', features: str | List[str] = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station/stations of which to extract the data
features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import Caravan_DK
>>> dataset = Caravan_DK()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
308
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(308, 211)
get static data of one station only
>>> static_data = dataset.fetch_static_features('80001')
>>> static_data.shape
(1, 211)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['gauge_lat', 'area'])
>>> static_data.shape
(308, 2)
>>> data = dataset.fetch_static_features('80001', features=['gauge_lat', 'area'])
>>> data.shape
(1, 2)
- hyd_atlas_attributes(stations='all') DataFrame[source]
- Return type:
a pandas DataFrame of shape (308, 196)
- property other_attr_fpath
returns path to attributes_other_camelsdk.csv file
- other_static_attributes(stations='all') DataFrame[source]
- Return type:
a pandas DataFrame of shape (308, 5)
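The shapes quoted for caravan_static_attributes (308, 10), hyd_atlas_attributes (308, 196) and other_static_attributes (308, 5) add up to the 211 static features reported by fetch_static_features. The sketch below illustrates that arithmetic with placeholder tables, assuming the three tables are joined column-wise on the station index; that join is an assumption about the internals, and the station ids and column names are made up.

```python
import numpy as np
import pandas as pd

stations = [f"stn_{i}" for i in range(308)]  # placeholder station ids


def dummy_table(n_cols: int, prefix: str) -> pd.DataFrame:
    """Build a zero-filled attribute table with one row per station."""
    return pd.DataFrame(
        np.zeros((len(stations), n_cols)),
        index=stations,
        columns=[f"{prefix}_{j}" for j in range(n_cols)],
    )


caravan = dummy_table(10, "caravan")      # shape (308, 10)
hyd_atlas = dummy_table(196, "hydatlas")  # shape (308, 196)
other = dummy_table(5, "other")           # shape (308, 5)

# Column-wise concatenation aligned on the shared station index.
static = pd.concat([caravan, hyd_atlas, other], axis=1)
print(static.shape)  # (308, 211)
```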
- q_mmd(stations: str | List[str] = 'all') DataFrame[source]
returns streamflow in units of millimeters per day. This is obtained by dividing streamflow by area (streamflow/area).
- Parameters:
stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations.
- Returns:
a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
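The division of streamflow by area only yields millimeters per day once units are reconciled. The following is a minimal sketch of that conversion, assuming streamflow in cubic meters per second and catchment area in square kilometers; the units actually used inside q_mmd are not stated in this section, and cms_to_mm_per_day is a hypothetical helper.

```python
def cms_to_mm_per_day(q_cms: float, area_km2: float) -> float:
    """Convert discharge (m3/s) over a catchment (km2) to runoff depth (mm/day).

    Hypothetical helper for illustration; the unit choices are assumptions.
    """
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    seconds_per_day = 86_400
    volume_m3 = q_cms * seconds_per_day  # water volume passing in one day
    area_m2 = area_km2 * 1e6             # km2 -> m2
    depth_m = volume_m3 / area_m2        # volume spread over the catchment
    return depth_m * 1000.0              # m -> mm


# 1 m3/s sustained for a day over 1 km2 corresponds to about 86.4 mm/day.
print(round(cms_to_mm_per_day(1.0, 1.0), 1))
```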
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import Caravan_DK
>>> dataset = Caravan_DK()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('100010')  # returns coordinates of station whose id is 100010
>>> dataset.stn_coords(['100010', '210062'])  # returns coordinates of two stations
- class aqua_fetch.rr.CAMELS_FR(path=None, overwrite=False, **kwargs)[source]
Bases: Camels
Dataset of 654 catchments from France following the works of Delaigue et al., 2024. The dataset consists of 344 static catchment features and 22 dynamic features. The dynamic features span from 1970-01-01 to 2021-12-31 with daily timestep.
- __init__(path=None, overwrite=False, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import CAMELS_FR
>>> dataset = CAMELS_FR()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
654
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(654, 344)
get static data of one station only
>>> static_data = dataset.fetch_static_features('42600042')
>>> static_data.shape
(1, 344)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope_mean', 'aridity'])
>>> static_data.shape
(654, 2)
>>> data = dataset.fetch_static_features('42600042', static_features=['slope_mean', 'aridity'])
>>> data.shape
(1, 2)
- static_attrs() DataFrame[source]
combination of topographic + soil + landuse + geology + climate + hydro + climate + anthropogenic features
- Returns:
a pandas DataFrame of static features of all catchments of shape (654, xxxx)
- Return type:
pd.DataFrame
- class aqua_fetch.CAMELS_IND(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: Camels
Dataset of 472 catchments from the Republic of India following the works of Mangukiya et al., 2024. The dataset consists of 210 static catchment features and 20 dynamic features. The dynamic features span from 1980-01-01 to 2020-12-31 with daily timestep.
Examples
>>> from water_datasets import CAMELS_IND
>>> dataset = CAMELS_IND()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(299520, 47)  # 47 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['17015'].unstack().shape
(14976, 20)
If we don't set as_dataframe=True, then the returned data will be an xarray Dataset
>>> data = dataset.fetch(0.1)
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 14976, 'dynamic_features': 20})
>>> len(data.data_vars)
47
>>> df = dataset.fetch(stations=1, as_dataframe=True)  # get data of only one random station
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(14976, 20)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
472
# get data by station id
>>> df = dataset.fetch(stations='3001', as_dataframe=True).unstack()
>>> df.shape
(14976, 20)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
...    dynamic_features=['prcp(mm/day)', 'rel_hum(%)', 'tavg(C)', 'pet(mm/day)', 'streamflow_cms']).unstack()
>>> df.shape
(14976, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(299520, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='3001', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 220), (299520, 1))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(472, 2)
>>> dataset.stn_coords('3001')  # returns coordinates of station whose id is 3001
18.3861 80.3917
>>> dataset.stn_coords(['3001', '17021'])  # returns coordinates of two stations
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import CAMELS_IND
>>> dataset = CAMELS_IND()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
472
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(472, 210)
get static data of one station only
>>> static_data = dataset.fetch_static_features('42600042')
>>> static_data.shape
(1, 210)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope_mean', 'aridity'])
>>> static_data.shape
(472, 2)
>>> data = dataset.fetch_static_features('42600042', static_features=['slope_mean', 'aridity'])
>>> data.shape
(1, 2)
- class aqua_fetch.rr.CAMELS_SE(path: str = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: Camels
Dataset of 50 Swedish catchments following the works of Teutschbein et al., 2024. The dataset consists of 76 static catchment features and 4 dynamic features. The dynamic features span from 1961-01-01 to 2020-12-31 with daily timestep.
Examples
>>> from water_datasets import CAMELS_SE
>>> dataset = CAMELS_SE()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(21915, 4)
get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
50
get data of 10 % of stations as dataframe
>>> df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
(87660, 5)
The returned dataframe is a multi-indexed dataframe
>>> df.index.names == ['time', 'dynamic_features']
True
get data by station id
>>> df = dataset.fetch(stations='5', as_dataframe=True).unstack()
>>> df.shape
(21915, 4)
get names of available dynamic features
>>> dataset.dynamic_features
get only selected dynamic features
>>> data = dataset.fetch(1, as_dataframe=True,
...    dynamic_features=['Qobs_m3s', 'Qobs_mm', 'Pobs_mm', 'Tobs_C']).unstack()
>>> data.shape
(21915, 4)
get names of available static features
>>> dataset.static_features
get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multi-indexed dataframe
(87660, 10)
when we get both static and dynamic data, the returned data is a dictionary with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='5', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 76), (87660, 1))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(50, 2)
>>> dataset.stn_coords('5')  # returns coordinates of station whose id is 5
68.0356 21.9758
>>> dataset.stn_coords(['5', '200'])  # returns coordinates of two stations
- __init__(path: str = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path – path where the CAMELS_SE dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.
to_netcdf
- fetch_static_features(stn_id: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import CAMELS_SE
>>> dataset = CAMELS_SE()
get the names of stations
>>> stns = dataset.stations()
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(50, 76)
get static data of one station only
>>> static_data = dataset.fetch_static_features('5')
>>> static_data.shape
(1, 76)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Area_km2', 'Water_percentage', 'Elevation_mabsl'])
>>> static_data.shape
(50, 3)
>>> data = dataset.fetch_static_features('5', static_features=['Area_km2', 'Water_percentage', 'Elevation_mabsl'])
>>> data.shape
(1, 3)
- class aqua_fetch.rr.CCAM(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
Bases: Camels
Dataset for Chinese catchments. The CCAM dataset, published by Hao et al., 2021, has two sets. One set consists of catchment attributes, meteorological data and catchment boundaries of over 4000 catchments. However, this set does not have streamflow data. The second set consists of streamflow, catchment attributes, catchment boundaries and meteorological data for 102 catchments of the Yellow River. Since this second set conforms to the norms of CAMELS, this class uses the second set. Therefore, the fetch, stations and other methods/attributes of this class return data of only the Yellow River catchments and not for the whole of China. However, the first set of data can also be fetched using the fetch_meteo method of this class. The temporal extent of both sets is from 1999 to 2020. However, the streamflow time series in the first set has a very large number of missing values. The data of the Yellow River consists of 16 dynamic features (time series) and 124 static features (catchment attributes).
Examples
>>> from water_datasets import CCAM
>>> dataset = CCAM()
>>> data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(128560, 10)
>>> data.index.names == ['time', 'dynamic_features']
True
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(8035, 16)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
102
# get data by station id
>>> df = dataset.fetch(stations='0010', as_dataframe=True).unstack()
>>> df.shape
(8035, 16)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True, dynamic_features=['pre', 'tem_mean', 'evp', 'rhu', 'q']).unstack()
>>> df.shape
(8035, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(128560, 10)  # remember this is multi-indexed DataFrame
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='0010', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 124), (128560, 1))
- __init__(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This speeds up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.
- property dynamic_features: List[str]
names of hydro-meteorological time series data for Yellow River catchments
- fetch_meteo(stn_id: str | List[str] = 'all', features: str | List[str] = 'all', st='1990-01-01', en='2021-03-31', as_dataframe: bool = True)[source]
fetches meteorological data of 4902 Chinese catchments
>>> from water_datasets import CCAM
>>> dataset = CCAM()
>>> features = ['PRE', 'TEM', 'PRS', 'RHU', 'EVP', 'WIN', 'PET']
>>> st = '1999-01-01'
>>> en = '2020-03-31'
>>> xds = dataset.fetch_meteo(features=features, st=st, en=en)
- fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import CCAM
>>> dataset = CCAM()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
102
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(102, 124)
get static data of one station only
>>> static_data = dataset.fetch_static_features('0140')
>>> static_data.shape
(1, 124)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['lon', 'lat', 'area'])
>>> static_data.shape
(102, 3)
>>> data = dataset.fetch_static_features('0140', static_features=['lon', 'lat', 'area'])
>>> data.shape
(1, 3)
- property meteo_path
path where daily meteorological data of stations is present
- q_mmd(stations: str | List[str] = 'all') DataFrame[source]
Returns streamflow in units of millimeters per day, obtained by dividing q/area.
- Parameters:
stations (str/list) – name/names of stations. Default is all, which returns streamflow of all stations.
- Returns:
a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
- class aqua_fetch.rr.Finland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _EStreams
Data of 669 catchments of Finland. The observed streamflow data is downloaded from https://wwwi3.ymparisto.fi . The meteorological data, static catchment features and catchment boundaries are taken from water_datasets.EStreams following the work of Nascimento et al., 2024. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 2012-01-01 to 2023-06-30.
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- class aqua_fetch.rr.GRDCCaravan(path=None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: Camels
This is a dataset of 5357 catchments from around the globe following the work of Faerber et al., 2023. The dataset consists of 39 dynamic (time series) features and 211 static features. The dynamic (time series) data spans from 1950-01-02 to 2019-05-19.
If the xarray and netCDF4 packages are installed, netCDF files are downloaded; otherwise CSV files are downloaded and used.
Examples
>>> from water_datasets import GRDCCaravan
>>> dataset = GRDCCaravan()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(26801, 39)
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
5357
>>> # get data of 10 % of stations as dataframe
>>> df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
(1045239, 535)
>>> # the returned dataframe is multi-indexed
>>> df.index.names == ['time', 'dynamic_features']
True
>>> # get data by station id
>>> df = dataset.fetch(stations='GRDC_3664802', as_dataframe=True).unstack()
>>> df.shape
(26800, 39)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> data = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['total_precipitation_sum', 'potential_evaporation_sum', 'temperature_2m_mean', 'streamflow']).unstack()
>>> data.shape
(26800, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape # remember this is a multiindexed dataframe
(1045239, 10)
>>> # when we get both static and dynamic data, the returned data is a dictionary
>>> # with ``static`` and ``dynamic`` keys.
>>> data = dataset.fetch(stations='GRDC_3664802', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 211), (1045200, 1))
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
(5357, 2)
>>> dataset.stn_coords('GRDC_3664802') # returns coordinates of station whose id is GRDC_3664802
-26.2271 -51.0771
>>> dataset.stn_coords(['GRDC_3664802', 'GRDC_1159337']) # returns coordinates of two stations
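The examples rely on the frame being indexed by (time, dynamic_features) and call unstack to get one column per feature. The sketch below reproduces that layout on a tiny hand-made frame, so the reshaping can be tried without downloading anything. The station name, feature names and values are made up for illustration.

```python
import pandas as pd

times = pd.date_range("2000-01-01", periods=3, freq="D")
features = ["precipitation", "streamflow"]  # hypothetical feature names

# Long layout: one row per (time, feature) pair, one column per station.
index = pd.MultiIndex.from_product([times, features], names=["time", "dynamic_features"])
stacked = pd.DataFrame({"station_A": [1.0, 10.0, 0.0, 9.5, 2.5, 11.0]}, index=index)

# unstack() moves the innermost index level (dynamic_features) into the columns.
wide = stacked.unstack()

# Drop the station level so the columns are just the feature names.
wide.columns = wide.columns.get_level_values("dynamic_features")
```

With a single station, the stacked frame has (time steps × features) rows and the unstacked frame has one row per time step.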
- __init__(path=None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- fetch_static_features(stn_id: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from water_datasets import GRDCCaravan
>>> dataset = GRDCCaravan()
>>> stns = dataset.stations()
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(5357, 211)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('GRDC_3664802')
>>> static_data.shape
(1, 211)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['p_mean', 'p_seasonality', 'frac_snow'])
>>> static_data.shape
(5357, 3)
>>> data = dataset.fetch_static_features('GRDC_3664802', static_features=['p_mean', 'p_seasonality', 'frac_snow'])
>>> data.shape
(1, 3)
- fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, as_ts: bool = False, st: str | None = None, en: str | None = None, **kwargs) Dict[str, DataFrame][source]
Fetches features for one station.
- Parameters:
station – station id/gauge id for which the data is to be fetched.
dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch
static_features – names of static features/attributes to be fetched
as_ts (bool) – whether static features are to be converted into a time series or not. If yes, the returned time series will be of the same length as that of the dynamic attributes.
st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from where it is available.
en (str, optional) – end point of the data to be fetched. By default, the data will be fetched up to where it is available.
- Returns:
a dataframe if as_ts is True; otherwise a dictionary of static and dynamic features for a station/gauge_id
- Return type:
Dict
Examples
>>> from water_datasets import GRDCCaravan
>>> dataset = GRDCCaravan()
>>> dataset.fetch_station_features('GRDC_3664802')
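The as_ts option turns static features into a time series of the same length as the dynamic data. A minimal sketch of that idea in plain Python follows: each static value is repeated at every time step. `broadcast_static`, the feature names and the values are hypothetical and only illustrate the concept, not the library's implementation.

```python
from datetime import date, timedelta


def broadcast_static(dynamic_rows, static_features):
    """Attach the same static values to every time step of one station."""
    return [{**row, **static_features} for row in dynamic_rows]


start = date(2000, 1, 1)
dynamic_rows = [
    {"time": start + timedelta(days=i), "streamflow": q}
    for i, q in enumerate([1.2, 0.9, 1.5])
]
static_features = {"area_km2": 250.0, "elev_mean": 410.0}

# One row per time step, each carrying the dynamic value plus the static ones.
as_time_series = broadcast_static(dynamic_rows, static_features)
```

This layout is convenient when a model expects static and dynamic inputs in a single table.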
- class aqua_fetch.rr.HYSETS(path: str, swe_source: str = 'SNODAS_SWE', discharge_source: str = 'ERA5', tasmin_source: str = 'ERA5', tasmax_source: str = 'ERA5', pr_source: str = 'ERA5', **kwargs)[source]
Bases: Camels
Database for hydrometeorological modeling of 14,425 North American watersheds from 1950-2018 following the work of Arsenault et al., 2020. The user must manually download the files, unpack them and provide the path where these files are saved.
This data comes from multiple sources, each providing one or more dynamic_features. The following data sources are available.
| sources | dynamic_features |
|---|---|
| SNODAS_SWE | discharge, swe |
| SCDNA | discharge, pr, tasmin, tasmax |
| nonQC_stations | discharge, pr, tasmin, tasmax |
| Livneh | discharge, pr, tasmin, tasmax |
| ERA5 | discharge, pr, tasmax, tasmin |
| ERA5Land_SWE | discharge, swe |
| ERA5Land | discharge, pr, tasmax, tasmin |
All sources contain one or more of the following dynamic_features with the following shapes.
| dynamic_features | shape |
|---|---|
| time | (25202,) |
| watershedID | (14425,) |
| drainage_area | (14425,) |
| drainage_area_GSIM | (14425,) |
| flag_GSIM_boundaries | (14425,) |
| flag_artificial_boundaries | (14425,) |
| centroid_lat | (14425,) |
| centroid_lon | (14425,) |
| elevation | (14425,) |
| slope | (14425,) |
| discharge | (14425, 25202) |
| pr | (14425, 25202) |
| tasmax | (14425, 25202) |
| tasmin | (14425, 25202) |
Examples
>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> # fetch data of a random station
>>> df = dataset.fetch(1, as_dataframe=True)
>>> df.shape
(25202, 5)
>>> stations = dataset.stations()
>>> len(stations)
14425
>>> df = dataset.fetch('999', as_dataframe=True)
>>> df.unstack().shape
(25202, 5)
- __init__(path: str, swe_source: str = 'SNODAS_SWE', discharge_source: str = 'ERA5', tasmin_source: str = 'ERA5', tasmax_source: str = 'ERA5', pr_source: str = 'ERA5', **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded once, so subsequent calls to this class will not download the data unless overwrite is set to True.
swe_source (str) – source of swe data.
discharge_source – source of discharge data
tasmin_source – source of tasmin data
tasmax_source – source of tasmax data
pr_source – source of pr data
kwargs – arguments for the Camels base class
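Because each variable can be read from a different source, it helps to check that the chosen source actually provides the variable. The sketch below encodes the source table from the class description and validates a combination against it. `check_sources` is a hypothetical helper, and whether the library performs a similar check is an assumption.

```python
# Variables provided by each source, as listed in the class description.
SOURCE_VARIABLES = {
    "SNODAS_SWE": {"discharge", "swe"},
    "SCDNA": {"discharge", "pr", "tasmin", "tasmax"},
    "nonQC_stations": {"discharge", "pr", "tasmin", "tasmax"},
    "Livneh": {"discharge", "pr", "tasmin", "tasmax"},
    "ERA5": {"discharge", "pr", "tasmax", "tasmin"},
    "ERA5Land_SWE": {"discharge", "swe"},
    "ERA5Land": {"discharge", "pr", "tasmax", "tasmin"},
}


def check_sources(**variable_sources):
    """Raise ValueError if a chosen source is unknown or lacks the variable."""
    for variable, source in variable_sources.items():
        if source not in SOURCE_VARIABLES:
            raise ValueError(f"unknown source: {source}")
        if variable not in SOURCE_VARIABLES[source]:
            raise ValueError(f"{source} does not provide {variable}")
    return variable_sources


# The defaults from the constructor signature form a valid combination.
defaults = check_sources(
    swe="SNODAS_SWE", discharge="ERA5", tasmin="ERA5", tasmax="ERA5", pr="ERA5"
)
```

For example, asking for pr from SNODAS_SWE would be rejected, since that source only lists discharge and swe.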
- property OfficialID_WatershedID_map
A dictionary mapping Official_ID to Watershed_ID. For example ‘1’: ‘01AD002’
- property WatershedID_OfficialID_map
A dictionary mapping Watershed_ID to Official_ID. For example ‘01AD002’: ‘1’
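The two properties are inverses of each other, so one can be derived from the other. A minimal sketch with plain dictionaries, using the '1' and '01AD002' pair from the text plus one made-up entry ('2': '01AD003' is hypothetical):

```python
# Running number -> agency code. The second entry is invented for illustration.
number_to_code = {"1": "01AD002", "2": "01AD003"}

# Inverting the mapping works because both sides are unique identifiers.
code_to_number = {code: number for number, code in number_to_code.items()}


def translate(station_ids, mapping):
    """Translate a list of station ids from one id system to the other."""
    return [mapping[station_id] for station_id in station_ids]


codes = translate(["1", "2"], number_to_code)
round_trip = translate(codes, code_to_number)
```

Translating ids this way is useful when station lists from the agency need to be matched with the ids used for fetching.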
- area(stations: str | List[str] = 'all', source: str = 'other') Series[source]
Returns area_gov (Km2) of all catchments as pandas series
- Parameters:
stations (str/list) – name/names of stations. Default is all, which returns the area of all stations.
source (str) – source of area calculation. It should be either gsim or other.
- Returns:
a pandas series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> dataset.area() # returns area of all stations
>>> dataset.area('92') # returns area of station whose id is 92
>>> dataset.area(['92', '142']) # returns area of two stations
- fetch_dynamic_features(stn_id, features='all', st=None, en=None, as_dataframe=False)[source]
Fetches dynamic features of one station.
Examples
>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> dyn_features = dataset.fetch_dynamic_features('station_name')
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', st=None, en=None, as_ts=False) DataFrame[source]
Returns static attributes of one or multiple stations.
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
st
en
as_ts
Examples
>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
14425
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(14425, 28)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('991')
>>> static_data.shape
(1, 28)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
(14425, 2)
- fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]
Returns features of multiple stations.
Examples
>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> stations = dataset.stations()[0:3]
>>> features = dataset.fetch_stations_features(stations)
- get_boundary(catchment_id: str, as_type: str = 'numpy')[source]
returns boundary of a catchment in a required format
Examples
>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> dataset.get_boundary(dataset.stations()[0])
- q_mmd(stations: str | List[str] = 'all') DataFrame[source]
Returns streamflow in units of millimeters per day, obtained by dividing q_cms/area.
- Parameters:
stations (str/list) – name/names of stations. Default is all, which returns streamflow of all stations.
- Returns:
a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
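Dividing discharge by catchment area gives a depth only after the units are aligned. The sketch below spells out the conversion, assuming discharge in cubic meters per second and area in square kilometers (the unit reported by area). `cms_to_mm_per_day` is a hypothetical helper, not the library's implementation.

```python
SECONDS_PER_DAY = 86_400


def cms_to_mm_per_day(q_cms: float, area_km2: float) -> float:
    """Convert discharge in m^3/s to runoff depth in mm/day over a catchment."""
    volume_m3_per_day = q_cms * SECONDS_PER_DAY
    area_m2 = area_km2 * 1_000_000
    depth_m_per_day = volume_m3_per_day / area_m2
    return depth_m_per_day * 1_000  # meters to millimeters


# 10 m^3/s from a 432 km^2 catchment is about 2 mm/day.
runoff = cms_to_mm_per_day(q_cms=10.0, area_km2=432.0)
```

The whole conversion collapses to q_cms × 86.4 / area_km2, which is handy for a quick sanity check of returned values.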
- read_static_data(usecols=None, nrows=None)[source]
Reads the HYSETS_watershed_properties.txt file while using Watershed_ID as index instead of Official_ID. Watershed_ID starts with 1, 2, 3 and so on, while Official_ID is a code from the meteorological agency, such as 01AD002 for station 1.
- stations() List[str][source]
Returns a list of station names. The Watershed_ID of the station is used as the station name instead of Official_ID, because the .nc files use Watershed_ID to identify stations. Watershed_ID starts with 1, 2, 3 and so on, while Official_ID is a code from the meteorological agency, such as 01AD002 for station 1.
- Returns:
a list of ids of stations
- Return type:
List[str]
Examples
>>> from water_datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> # get name of all stations as list
>>> dataset.stations()
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
Returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas DataFrame with long and lat columns. The length of the dataframe equals the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('92') # returns coordinates of station whose id is 92
>>> dataset.stn_coords(['92', '142']) # returns coordinates of two stations
- class aqua_fetch.rr.HYPE(time_step: str = 'daily', path=None, **kwargs)[source]
Bases: Camels
Downloads and preprocesses the HYPE [1] dataset from Lindstroem et al., 2010 [2]. This is a rainfall-runoff dataset of Costa Rica with 564 stations from 1985 to 2019 at daily, monthly and yearly time steps.
Examples
>>> from water_datasets import HYPE
>>> dataset = HYPE()
>>> # get data of 5% of stations
>>> df = dataset.fetch(stations=0.05, as_dataframe=True) # returns a multiindex dataframe
>>> df.shape
(115047, 28)
>>> # fetch data of 5 (randomly selected) stations
>>> df = dataset.fetch(stations=5, as_dataframe=True)
>>> df.shape
(115047, 5)
>>> # fetch data of 3 selected stations
>>> df = dataset.fetch(stations=['564','563','562'], as_dataframe=True)
>>> df.shape
(115047, 3)
>>> # fetch data of a single station
>>> df = dataset.fetch(stations='500', as_dataframe=True)
>>> df.shape
(115047, 1)
>>> # get only selected dynamic features
>>> df = dataset.fetch(stations='501',
... dynamic_features=['AET_mm', 'Prec_mm', 'Streamflow_mm'], as_dataframe=True)
>>> # fetch data between selected periods
>>> df = dataset.fetch(stations='225', st="20010101", en="20101231", as_dataframe=True)
>>> df.shape
(32868, 1)
>>> # get data at monthly time step
>>> dataset = HYPE(time_step="month")
>>> df = dataset.fetch(stations='500', as_dataframe=True)
>>> df.shape
(3780, 1)
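The row counts in these shapes follow from the stacked (time, dynamic_features) index: rows = time steps × dynamic features. The sketch below checks the arithmetic for the daily examples. It assumes nine dynamic features and a record running from 1985-01-01 to 2019-12-31; both are inferred from the shapes and the stated period rather than stated explicitly. `stacked_rows` is a hypothetical helper.

```python
from datetime import date


def stacked_rows(start: date, end: date, n_features: int) -> int:
    """Rows of a (time, dynamic_features) stacked frame at daily resolution."""
    n_days = (end - start).days + 1  # both end points are included
    return n_days * n_features


N_FEATURES = 9  # assumption inferred from the shapes in the examples

full_period = stacked_rows(date(1985, 1, 1), date(2019, 12, 31), N_FEATURES)
sub_period = stacked_rows(date(2001, 1, 1), date(2010, 12, 31), N_FEATURES)
```

Under these assumptions, full_period reproduces the 115047 rows of the unrestricted fetch and sub_period reproduces the 32868 rows of the 2001 to 2010 fetch.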
- __init__(time_step: str = 'daily', path=None, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded once, so subsequent calls to this class will not download the data unless overwrite is set to True.
time_step (str) – one of daily, month or year
**kwargs – keyword arguments
- area(stations: str | List[str] = None) Series[source]
Returns area (Km2) of all catchments as pandas series
- Parameters:
stations (str/list) – name/names of stations. Default is None, which will return area of all stations
- Returns:
a pandas series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from water_datasets import HYPE
>>> dataset = HYPE()
>>> dataset.area() # returns area of all stations
>>> dataset.area('2') # returns area of station whose id is 2
>>> dataset.area(['2', '605']) # returns area of two stations
- stn_coords(stations: str | List[str] = None) DataFrame[source]
Returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
Examples
>>> dataset = HYPE()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('2') # returns coordinates of station whose id is 2
>>> dataset.stn_coords(['2', '605']) # returns coordinates of two stations
- class aqua_fetch.Ireland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _EStreams
Data of 464 catchments of Ireland. Out of these 464 catchments, 280 are from OPW and 184 are from EPA. The observed streamflow data for EPA stations is downloaded from https://epawebapp.epa.ie/Hydronet/#Flow while the observed streamflow for OPW stations is downloaded from https://waterlevel.ie/hydro-data/#/overview/Waterlevel. It should be noted that out of 280 OPW stations, streamflow data is available for only 129 stations. The meteorological data, static catchment features and catchment boundaries are taken from water_datasets.EStreams following the work of Nascimento et al., 2024. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1992-01-01 to 2020-06-31.
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- class aqua_fetch.rr.Italy(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _EStreams
Data of 294 catchments of Italy. The observed streamflow data is downloaded from http://www.hiscentral.isprambiente.gov.it/hiscentral/hydromap.aspx?map=obsclient . The meteorological data, static catchment features and catchment boundaries are taken from water_datasets.EStreams following the work of Nascimento et al., 2024. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1992-01-01 to 2020-06-31.
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- class aqua_fetch.Japan(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _GSHA
Data of 694 catchments of Japan from the river.go.jp website. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1979-01-01 to 2022-12-31.
- __init__(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- class aqua_fetch.rr.LamaHCE(*, timestep: str, data_type: str, path=None, to_netcdf: bool = True, overwrite=False, **kwargs)[source]
Bases: Camels
Large-Sample Data for Hydrology and Environmental Sciences for Central Europe (mainly Austria). The dataset is downloaded from zenodo following the work of Klingler et al., 2021. For total_upstrm data, there are 859 stations with 61 static features and 17 dynamic features. The temporal extent of the data is from 1981-01-01 to 2019-12-31.
- __init__(*, timestep: str, data_type: str, path=None, to_netcdf: bool = True, overwrite=False, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded once, so subsequent calls to this class will not download the data unless overwrite is set to True.
timestep – possible values are D for daily or H for hourly timestep
data_type – possible values are total_upstrm, diff_upstrm_all or diff_upstrm_lowimp
Examples
>>> from water_datasets import LamaHCE
>>> dataset = LamaHCE(timestep='D', data_type='total_upstrm')
>>> # the daily dataset has 859 stations with 80 static and 22 dynamic features
>>> len(dataset.stations()), len(dataset.static_features), len(dataset.dynamic_features)
(859, 80, 22)
>>> df = dataset.fetch(3, as_dataframe=True)
>>> df.shape
(313368, 3)
>>> dataset = LamaHCE(timestep='H', data_type='total_upstrm')
>>> len(dataset.stations()), len(dataset.static_features), len(dataset.dynamic_features)
(859, 80, 17)
>>> dataset.fetch_dynamic_features('1', features = ['obs_q_cms'])
- fetch_static_features(stn_id: str | List[str] = 'all', static_features: str | List[str] = None) DataFrame[source]
static features of LamaHCE
- Parameters:
stn_id (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from water_datasets import LamaHCE
>>> dataset = LamaHCE(timestep='D', data_type='total_upstrm')
>>> df = dataset.fetch_static_features('99') # (1, 61)
>>> # get list of all static features
>>> dataset.static_features
>>> dataset.fetch_static_features('99',
... static_features=['area_calc', 'elev_mean', 'agr_fra', 'sand_fra']) # (1, 4)
- fetch_stations_features(stations: list, dynamic_features='all', static_features=None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]
Reads attributes of more than one stations.
This function checks whether the .nc files exist. If they do, the data is read from them; otherwise, the .nc files are first prepared and saved, and the data is then read from them. Upon subsequent calls, the .nc files are used for reading the data.
- Parameters:
stations – list of stations for which data is to be fetched.
dynamic_features – list of dynamic attributes to be fetched. if ‘all’, then all dynamic attributes will be fetched.
static_features – list of static attributes to be fetched. If all, then all static attributes will be fetched. If None, then no static attribute will be fetched.
st – start of data to be fetched.
en – end of data to be fetched.
as_dataframe – whether to return the data as pandas dataframe. default is xr.dataset object
kwargs (dict) – additional keyword arguments
- Returns:
Dynamic and static features of multiple stations. Dynamic features are by default returned as xr.Dataset unless as_dataframe is True, in which case they are returned as a pandas DataFrame with a multi-index. If xr.Dataset, it consists of data_vars equal to the number of stations, and for each station the DataArray has dimensions (time, dynamic_features), where the length of time is defined by st and en. When the returned object is a pandas DataFrame, the first index is time and the second index is dynamic_features. Static attributes are always returned as a pandas DataFrame with the shape (stations, static_features). If dynamic_features is None, they are not returned and the returned value consists only of static features. The same holds true for static_features. If both are not None, the returned type is a dictionary with static and dynamic keys.
- Raises:
ValueError – if both dynamic_features and static_features are None
Examples
>>> from water_datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # find out station ids
>>> dataset.stations()
>>> # get data of selected stations
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
... as_dataframe=True)
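The return convention of this method has four cases: dynamic only, static only, both, or an error. The sketch below mirrors that convention with plain Python objects in place of DataFrames. `assemble_result` is a hypothetical function written for illustration and is not the library's code.

```python
def assemble_result(dynamic=None, static=None):
    """Mimic the documented return convention with plain Python objects."""
    if dynamic is None and static is None:
        raise ValueError("dynamic_features and static_features cannot both be None")
    if static is None:
        return dynamic   # only dynamic features were requested
    if dynamic is None:
        return static    # only static features were requested
    return {"static": static, "dynamic": dynamic}


both = assemble_result(dynamic=[1.0, 2.0], static={"area": 12.5})
only_static = assemble_result(static={"area": 12.5})
only_dynamic = assemble_result(dynamic=[1.0, 2.0])
```

Callers that request both kinds of features should therefore index the result with the static and dynamic keys, while callers that request one kind get that object directly.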
- class aqua_fetch.rr.LamaHIce(path=None, overwrite=False, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = True, **kwargs)[source]
Bases: LamaHCE
Daily and hourly hydro-meteorological time series data of 111 river basins of Iceland following Helgason et al., 2024. The total period of the dataset is from 1950 to 2021 for the daily and 1976-2023 for the hourly timestep. The average length of the daily data is 33 years, while that of the hourly data is 11 years. The dataset is available on hydroshare.
- __init__(path=None, overwrite=False, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded once, so subsequent calls to this class will not download the data unless overwrite is set to True.
timestep – possible values are D for daily or H for hourly timestep
data_type – possible values are total_upstrm, intermediate_all or intermediate_lowimp
- basin_attributes() DataFrame[source]
returns basin attributes which are catchment attributes, water balance all attributes and water balance filtered attributes
- Returns:
a dataframe of shape (111, 104) where 104 are the static catchment/basin attributes
- Return type:
pd.DataFrame
- fetch_clim_features(stations: str | List[str] = None)[source]
Returns climate time series data for one or more stations
- Return type:
pd.DataFrame
- fetch_q(stations: str | List[str] = None, qc_flag: int = None)[source]
returns streamflow for one or more stations
- Parameters:
- Returns:
a pandas dataframe whose index is the time and whose columns are the names of stations. For the daily timestep, the dataframe has 32630 rows and 111 columns.
- Return type:
pd.DataFrame
- fetch_static_features(stn_id: str | list = 'all', static_features: str | list = None) DataFrame[source]
static features of LamaHCE
- Parameters:
stn_id (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from water_datasets import LamaHCE
>>> dataset = LamaHCE(timestep='D', data_type='total_upstrm')
>>> df = dataset.fetch_static_features('99') # (1, 61)
>>> # get list of all static features
>>> dataset.static_features
>>> dataset.fetch_static_features('99',
... static_features=['area_calc', 'elev_mean', 'agr_fra', 'sand_fra']) # (1, 4)
- fetch_stn_meteo(stn: str, nrows: int = None) DataFrame[source]
returns climate/meteorological time series data for one station
- Returns:
a pandas dataframe with 23 columns
- Return type:
pd.DataFrame
- gauge_attributes() DataFrame[source]
returns gauge attributes from following two files
Gauge_attributes.csv
hydro_indices_1981_2018.csv
- Returns:
a dataframe of shape (111, 28)
- Return type:
pd.DataFrame
- property gauges_path
returns the path where gauge data files are located
- q_mmd(stations: str | List[str] = None) DataFrame[source]
Returns streamflow in units of millimeters per day, obtained by dividing q_cms/area.
- Parameters:
stations (str/list) – name/names of stations. Default is None, which returns streamflow of all stations.
- Returns:
a pandas DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
- property q_path
path where all q files are located
- class aqua_fetch.rr.Poland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _EStreams
Data of 1287 catchments of Poland. The observed streamflow data is downloaded from https://danepubliczne.imgw.pl . The meteorological data, static catchment features and catchment boundaries are taken from water_datasets.EStreams following the work of Nascimento et al., 2024. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1992-01-01 to 2020-06-31.
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- class aqua_fetch.rr.Portugal(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _EStreams
Data of 280 catchments of Portugal. The observed streamflow data is downloaded from https://snirh.apambiente.pt . The meteorological data, static catchment features and catchment boundaries for the 280 catchments are taken from water_datasets.EStreams following the work of Nascimento et al., 2024. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1972-01-01 to 2022-12-31.
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- get_q(as_dataframe: bool = True)[source]
returns the streamflow data of Portugal as xarray.Dataset or pandas.DataFrame
- Returns:
xarray.Dataset or pandas.DataFrame. If as_dataframe is True, returns pandas.DataFrame
with columns as station codes and index as time. If as_dataframe is False, returns
xarray.Dataset with station codes as variables and time as dimension.
- class aqua_fetch.RRLuleaSweden(path=None, **kwargs)[source]
Bases: Datasets
Rainfall runoff data for an urban catchment from 2016-2019 following the work of Broekhuizen et al., 2020.
- __init__(path=None, **kwargs)[source]
- Parameters:
name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping
- fetch(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None)[source]
fetches rainfall runoff data
- Parameters:
st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00
en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:41
- fetch_flow(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) DataFrame[source]
fetches flow data
- Parameters:
st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00
en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:35:00
- Returns:
a dataframe of shape (37_618, 3) where the columns are velocity, level and flow rate
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> flow = dataset.fetch_flow()
>>> flow.shape
(37618, 3)
- fetch_pcp(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) DataFrame[source]
fetches precipitation data
- Parameters:
st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 19:48:00
en (optional) – end of data to be fetched. By default the end is 2019-10-26 23:59:00
- Returns:
a dataframe of shape (967_080, 1)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> pcp = dataset.fetch_pcp()
>>> pcp.shape
(967080, 1)
- class aqua_fetch.rr.Simbi(path: str = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: Camels
Monthly rainfall from 1905 to 2005, daily rainfall from 1920 to 1940, 70 daily streamflow series, and 23 monthly temperature series for 24 catchments of Haiti, following Bathelemy et al., 2023 and Bathelemy et al., 2024.
Examples
>>> from aqua_fetch.rr import Simbi
>>> simbi = Simbi()
- __init__(path: str = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path – path where the Simbi dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.
to_netcdf
- fetch_static_features(stn_id: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stn_id (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch.rr import Simbi
>>> dataset = Simbi()
get all static data of all stations
>>> stns = dataset.static_data_stations()
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(24, 232)
get static data of one station only
>>> static_data = dataset.fetch_static_features('001')
>>> static_data.shape
(1, 232)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['stream_density', 'pcp', 'Forest_lc_98'])
>>> static_data.shape
(24, 3)
>>> data = dataset.fetch_static_features('001', static_features=['stream_density', 'pcp', 'Forest_lc_98'])
>>> data.shape
(1, 3)
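The (stations, features) selection semantics used by fetch_static_features can be illustrated without downloading any data. The sketch below is an assumption about the behaviour, not the library's implementation: select_static is a hypothetical helper, the table is a plain nested dictionary, and all numeric values are made up.

```python
from typing import Dict, List, Tuple, Union

Table = Dict[str, Dict[str, float]]


def select_static(table: Table,
                  stn_id: Union[str, List[str]] = "all",
                  static_features: Union[str, List[str]] = "all") -> Table:
    """Pick rows (stations) and columns (features); 'all' keeps everything."""
    if stn_id == "all":
        stns = list(table)
    else:
        stns = [stn_id] if isinstance(stn_id, str) else list(stn_id)
    if static_features == "all":
        feats = list(table[stns[0]])
    else:
        feats = [static_features] if isinstance(static_features, str) else list(static_features)
    return {s: {f: table[s][f] for f in feats} for s in stns}


def shape(selection: Table) -> Tuple[int, int]:
    """Mimic DataFrame.shape as (stations, features)."""
    return (len(selection), len(next(iter(selection.values()))) if selection else 0)


# hypothetical values; only the feature names come from the example above
table = {
    "001": {"stream_density": 0.8, "pcp": 1400.0, "Forest_lc_98": 12.0},
    "002": {"stream_density": 1.1, "pcp": 1650.0, "Forest_lc_98": 30.0},
}

all_data = select_static(table)
one_value = select_static(table, "001", ["pcp"])
print(shape(all_data), shape(one_value))
```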
- class aqua_fetch.rr.Spain(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _GSHA
Data of 889 catchments of Spain from the ceh-es website. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1979-01-01 to 2020-09-30.
- __init__(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- daily_q_all_areas() DataFrame[source]
Daily data of river gauging stations from all areas
- Returns:
a DataFrame with 16_806_305 rows and 3 columns
- class aqua_fetch.Thailand(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _GSHA
Data of 73 catchments of Thailand from the RID project. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1980-01-01 to 1999-12-31.
- __init__(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- class aqua_fetch.USGS(path: str | PathLike = None, hysets_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: Camels
This class handles the hydrometeorological data for the USA. The daily and hourly discharge data is downloaded from the usgs/nwis website. The data is optionally stored in a netCDF file if xarray is available. Currently the data is downloaded only for those sites/catchments that are in the HYSETS database, because the catchment boundaries are taken from the HYSETS database using water_datasets.HYSETS.
For the hourly timestep, the "iv" service is used to download the instantaneous data, which is then resampled to hourly data. Only data with the flags "A, [92]", "A, [91]", "A, [93]", "A, e" and "A" is used. For daily streamflow, the "dv" service is used to download the data. In this case, only data with the flags "A" and "A, e" is used.
- __init__(path: str | PathLike = None, hysets_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – Path to store the data
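The flag filtering and hourly resampling described for the USGS class can be sketched as follows. This is not the library's code: the records are hypothetical, to_hourly is an illustrative helper, and taking the mean of the retained values within each hour is an assumption, since the text only says the data is "resampled".

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# qualifier flags retained for instantaneous ("iv") data, as listed in the class description
IV_FLAGS = {"A", "A, e", "A, [91]", "A, [92]", "A, [93]"}

# hypothetical instantaneous records: (timestamp, discharge, qualifier flag)
records = [
    (datetime(2020, 1, 1, 0, 0), 10.0, "A"),
    (datetime(2020, 1, 1, 0, 15), 12.0, "A, e"),
    (datetime(2020, 1, 1, 0, 30), 99.0, "P"),  # flag not in the retained set: dropped
    (datetime(2020, 1, 1, 1, 0), 20.0, "A, [92]"),
    (datetime(2020, 1, 1, 1, 45), 22.0, "A"),
]


def to_hourly(rows, allowed=IV_FLAGS):
    """Drop rows with other flags, then average the remaining values per hour."""
    buckets = defaultdict(list)
    for ts, q, flag in rows:
        if flag in allowed:
            hour = ts.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append(q)
    # assumption: the hourly value is the mean of the retained values in that hour
    return {hour: mean(vals) for hour, vals in sorted(buckets.items())}


hourly = to_hourly(records)
for hour, q in hourly.items():
    print(hour, q)
```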
- area(stations: str | List[str] = 'all') Series[source]
Returns the area (area_gov, in km2) of the catchments as a pandas Series
- Parameters:
stations (str/list) – name/names of stations. Default is 'all', which returns the area of all stations
- Returns:
a pandas series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('912101A')  # returns area of station whose id is 912101A
>>> dataset.area(['912101A', '12388200'])  # returns area of two stations
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', st=None, en=None, as_ts=False) DataFrame[source]
returns static attributes of one or multiple stations
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
st
en
as_ts
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
12004
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(12004, 27)
get static data of one station only
>>> static_data = dataset.fetch_static_features('01010070')
>>> static_data.shape
(1, 27)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
(12004, 2)
- fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]
returns features of multiple stations
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> stations = dataset.stations()[0:3]
>>> features = dataset.fetch_stations_features(stations)
- get_boundary(catchment_id: str, as_type: str = 'numpy')[source]
returns boundary of a catchment in a required format
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> dataset.get_boundary(dataset.stations()[0])
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.
- Return type:
coords
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('01010000')  # returns coordinates of station whose id is 01010000
>>> dataset.stn_coords(['01010000', '01010070'])  # returns coordinates of two stations
- class aqua_fetch.rr.WaterBenchIowa(path=None, **kwargs)[source]
Bases: Camels
Rainfall-runoff dataset for Iowa (US), following the work of Demir et al., 2022. This is an hourly dataset of 125 catchments with 7 static features and 3 dynamic features (pcp, et, discharge) for each catchment. The dynamic features are time series from 2011-10-01 12:00 to 2018-09-30 11:00.
Examples
>>> from aqua_fetch.rr import WaterBenchIowa
>>> ds = WaterBenchIowa()
... # fetch dynamic features of 5 stations
>>> data = ds.fetch(5, as_dataframe=True)
>>> data.shape  # it is a multi-indexed DataFrame
(184032, 5)
... # fetch both static and dynamic features of 5 stations
>>> data = ds.fetch(5, static_features="all", as_dataframe=True)
>>> data.keys()
dict_keys(['dynamic', 'static'])
>>> data['static'].shape
(5, 7)
>>> data['dynamic']  # returns a xarray DataSet
... # using another method
>>> data = ds.fetch_dynamic_features('644', as_dataframe=True)
>>> data.unstack().shape
(61344, 3)
# when we get both static and dynamic data, the returned data is a dictionary
# with ``static`` and ``dynamic`` keys.
>>> data = ds.fetch(stations='644', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 7), (184032, 1))
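The shapes quoted in the examples above follow directly from the stated period and the number of dynamic features. The short check below uses only the standard library; it assumes one record per hour with both end points included.

```python
from datetime import datetime, timedelta

start = datetime(2011, 10, 1, 12)  # first hourly record
end = datetime(2018, 9, 30, 11)    # last hourly record

# number of hourly steps, counting both end points
n_hours = (end - start) // timedelta(hours=1) + 1

n_dynamic = 3  # pcp, et, discharge

# the multi-indexed (time, dynamic_features) layout has one row per hour and feature
rows_stacked = n_hours * n_dynamic

print(n_hours, rows_stacked)
```

The first number matches the row count after unstack(), and the second matches the row count of the stacked DataFrame returned by fetch.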
- __init__(path=None, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- fetch_static_features(stn_id: str | List[str], static_features: str | List[str] = 'all') DataFrame[source]
- Parameters:
stn_id (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from aqua_fetch.rr import WaterBenchIowa
>>> dataset = WaterBenchIowa()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
125
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(125, 7)
get static data of one station only
>>> static_data = dataset.fetch_static_features('592')
>>> static_data.shape
(1, 7)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope', 'area'])
>>> static_data.shape
(125, 2)
>>> data = dataset.fetch_static_features('592', static_features=['slope', 'area'])
>>> data.shape
(1, 2)
- fetch_station_attributes(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, as_ts: bool = False, st: str | None = None, en: str | None = None, **kwargs) DataFrame[source]
Examples
>>> from aqua_fetch.rr import WaterBenchIowa
>>> dataset = WaterBenchIowa()
>>> data = dataset.fetch_station_attributes('666')
The following datasets are very similar to the RainfallRunoff datasets, but they do not contain observed streamflow data. They are used to provide static and dynamic features to other datasets.
- class aqua_fetch.GSHA(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: Camels
Global streamflow characteristics, hydrometeorology and catchment attributes following Peirong et al., 2023. The data is downloaded from its zenodo repository. It should be noted that this dataset does not contain observed streamflow data. It has 21568 stations, 26 dynamic (meteorological + storage) features with daily timestep, 21 dynamic features (landcover + streamflow indices + reservoir) with yearly timestep and 35 static features.
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> len(dataset.stations())
21568
>>> dataset.agencies
['arcticnet', 'AFD', 'GRDC', 'IWRIS', 'MLIT', 'HYDAT', 'ANA', 'BOM', 'CCRR', 'China', 'CHP', 'RID', 'USGS']
>>> dataset.start
Timestamp('1979-01-01 00:00:00')
>>> dataset.end
Timestamp('2022-12-31 00:00:00')
>>> dataset.static_features
['ele_mt_uav', 'slp_dg_uav', 'lat', 'long', 'area', 'agency', ...]
>>> len(dataset.dynamic_features)
26
>>> len(dataset.daily_dynamic_features)
26
>>> len(dataset.yearly_dynamic_features)
21
>>> dataset.fetch_static_features('1001_arcticnet')
fetch static features for all stations of arcticnet agency
>>> dataset.fetch_static_features(agency='arcticnet')
fetch dynamic features for all stations of arcticnet agency
>>> dataset.fetch_dynamic_features(agency='arcticnet')
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.
- property agencies: List[str]
returns the names of agencies as list
arcticnet: Antarctica
AFD: Spain
GRDC: Global
IWRIS: India
MLIT: Japan
HYDAT: Canada
ANA: Brazil
BOM: Australia
CCRR: Chile
China
CHP: China
RID: Thailand
USGS
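Station ids in the examples, such as '1001_arcticnet', appear to carry the agency name after an underscore. Assuming that every id follows this "<number>_<agency>" pattern (an assumption drawn from the examples, not a documented guarantee), stations can be grouped by agency with a few lines of standard-library code; agency_of is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def agency_of(stn_id: str) -> str:
    """Return the part after the first underscore (assumed to be the agency name)."""
    return stn_id.split("_", 1)[1]


# the first two ids appear in the examples; the third is hypothetical
stations = ["1001_arcticnet", "1002_arcticnet", "12_GRDC"]

by_agency: Dict[str, List[str]] = defaultdict(list)
for stn in stations:
    by_agency[agency_of(stn)].append(stn)

print(dict(by_agency))
```

In practice, the agency argument accepted by most GSHA methods makes this manual grouping unnecessary.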
- atlas(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]
The link table between GSHA watershed IDs and RiverATLAS river reach IDs, as well as the selected static attributes
- Returns:
a pandas DataFrame of shape (n, 24) where n is the number of stations
- Return type:
pd.DataFrame
- fetch_dynamic_features(stations: List[str] | str = 'all', dynamic_features='all', st=None, en=None, as_dataframe=False, agency: List[str] = 'all')[source]
Fetches all or selected dynamic features of one or more stations.
- Parameters:
stations (str/list) – name/id of the station(s) for which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
st (Optional (default=None)) – start time from where to fetch the data.
en (Optional (default=None)) – end time until where to fetch the data
as_dataframe (bool, optional (default=False)) – if true, the returned data is pandas DataFrame otherwise it is xarray dataset
Examples
>>> from aqua_fetch import GSHA
>>> camels = GSHA()
>>> camels.fetch_dynamic_features('1001_arcticnet', as_dataframe=True).unstack()
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('1001_arcticnet',
... dynamic_features=['tmax_AWAP', 'vprp_AWAP', 'streamflow_mmd'],
... as_dataframe=True).unstack()
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', agency: List[str] = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stations (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
21568
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(21568, 35)
get static data of one station only
>>> static_data = dataset.fetch_static_features('1001_arcticnet')
>>> static_data.shape
(1, 35)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['ele_mt_uav', 'slp_dg_uav'])
>>> static_data.shape
(21568, 2)
>>> data = dataset.fetch_static_features('1001_arcticnet', static_features=['ele_mt_uav', 'slp_dg_uav'])
>>> data.shape
(1, 2)
>>> out = dataset.fetch_static_features(agency='arcticnet')
>>> out.shape
(106, 35)
- fetch_stn_dynamic_features(stn_id: str, dynamic_features='all') DataFrame[source]
Fetches all or selected dynamic features of one station.
- Parameters:
stn_id (str) – name/id of station of which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
- Returns:
a pandas dataframe of shape (n, features) where n is the number of days
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import GSHA
>>> camels = GSHA()
>>> camels.fetch_stn_dynamic_features('1001_arcticnet').unstack()
>>> camels.dynamic_features
>>> camels.fetch_stn_dynamic_features('1001_arcticnet',
... dynamic_features=['tmax_AWAP', 'vprp_AWAP']).unstack()
- lai(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Leaf Area Index timeseries for one or more than one station either as xr.Dataset or pandas DataFrame. The data has daily timestep.
- lai_stn(stn: str) Series[source]
Daily leaf area index. As per the documentation, some watersheds may have substantial amounts of missing data because of satellite data quality. The data is from 1981-01-01 to 2020-12-31.
- Returns:
a pandas Series of shape (14571,) where 14571 is the number of days
- Return type:
pd.Series
- lc_variables(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Landcover variables for one or more than one station either as xr.Dataset or dictionary. The data has yearly timestep.
- lc_variables_stn(stn: str) DataFrame[source]
Landcover variables for a given station which have yearly timestep. Following three landcover variables are provided:
urban_fraction(%): Ratio of urban extent to the entire watershed area (percentage).
forest_fraction(%): Ratio of forest extent to the entire watershed area (percentage).
cropland_fraction(%): Ratio of cropland extent to the entire watershed area (percentage).
- Returns:
a pandas DataFrame of shape (n, 3) where n is the number of years
- Return type:
pd.DataFrame
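Each landcover variable above is a ratio of an extent to the watershed area, expressed as a percentage. A minimal sketch of that arithmetic, with hypothetical areas in km2 and a hypothetical helper name, is:

```python
def fraction_percent(extent_km2: float, watershed_km2: float) -> float:
    """Ratio of a landcover extent to the watershed area, as a percentage."""
    if watershed_km2 <= 0:
        raise ValueError("watershed area must be positive")
    return 100.0 * extent_km2 / watershed_km2


# hypothetical extents for one watershed and one year
extents = {"urban": 12.5, "forest": 150.0, "cropland": 62.5}
watershed_area = 250.0

fractions = {
    f"{name}_fraction(%)": fraction_percent(extent, watershed_area)
    for name, extent in extents.items()
}
print(fractions)
```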
- meteo_vars(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Meteorological variables from 1979-01-01 to 2022-12-31 for one or more than one station either as xr.Dataset or dictionary. The data has daily timestep.
- meteo_vars_all_stns()[source]
Meteorological variables from 1979-01-01 to 2022-12-31 for all stations either as xr.Dataset or dictionary. The data has daily timestep.
- meteo_vars_stn(stn: str) DataFrame[source]
Daily meteorological variables from 1979-01-01 to 2022-12-31 for a given station.
- Returns:
a pandas DataFrame of shape (16071, 19) where 16071 is the number of days
- Return type:
pd.DataFrame
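The number of rows quoted for the daily data follows from the date range, counting both end points. The check below uses only the standard library; the second range is the one stated for the water storage variables.

```python
from datetime import date


def n_days(start: date, end: date) -> int:
    """Number of daily steps between two dates, both end points included."""
    return (end - start).days + 1


meteo_days = n_days(date(1979, 1, 1), date(2022, 12, 31))    # meteorological variables
storage_days = n_days(date(1979, 1, 1), date(2021, 12, 31))  # water storage variables

print(meteo_days, storage_days)
```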
- reservoir_variables(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Reservoir variables for one or more than one station either as xr.Dataset or dictionary. The data has yearly timestep.
- reservoir_variables_stn(stn: str) DataFrame[source]
Reservoir variables for a given station from 1979 to 2020 with yearly timestep. Following two reservoir variables are provided:
capacity: Reservoir capacity of the year in the watershed (m3). To avoid including too many missing values, we use the ICOLD capacity in the linked table of the GeoDAR dataset.
dor: Degree of regulation of the watershed (yearly reservoir capacity/yearly mean flow). If yearly mean flow is missing, the value is substituted with the average of all mean flow values.
- Returns:
a pandas DataFrame of shape (42, 2) where 42 is the number of years
- Return type:
pd.DataFrame
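The degree of regulation described above, including the fallback for missing yearly mean flow, can be sketched as below. The function name and all numbers are hypothetical, and capacity and flow are assumed to be expressed in compatible volume units so that the ratio is meaningful.

```python
from statistics import mean
from typing import Dict, Optional


def degree_of_regulation(capacity: Dict[int, float],
                         mean_flow: Dict[int, Optional[float]]) -> Dict[int, float]:
    """dor = yearly reservoir capacity / yearly mean flow.

    A missing yearly mean flow (None) is replaced by the average of all
    available mean flow values, as stated in the description.
    """
    available = [q for q in mean_flow.values() if q is not None]
    fallback = mean(available)
    dor = {}
    for year, cap in capacity.items():
        q = mean_flow.get(year)
        dor[year] = cap / (q if q is not None else fallback)
    return dor


# hypothetical yearly values for one watershed
capacity = {1979: 100.0, 1980: 100.0, 1981: 150.0}
mean_flow = {1979: 50.0, 1980: None, 1981: 30.0}

dor = degree_of_regulation(capacity, mean_flow)
print(dor)
```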
- stn_coords(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]
returns the latitude and longitude of stations
- Returns:
a pandas DataFrame of shape (n, 2) where n is the number of stations
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> dataset.stn_coords('1001_arcticnet')
>>> dataset.stn_coords(['1001_arcticnet', '1002_arcticnet'])
get coordinates for all stations of arcticnet agency
>>> dataset.stn_coords(agency='arcticnet')
- storage_vars(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Water storage term variables from 1979-01-01 to 2021-12-31 for one or more than one station either as xr.Dataset or dictionary. The data has daily timestep.
- storage_vars_all_stns()[source]
Water storage term variables from 1979-01-01 to 2021-12-31 for all stations either as xr.Dataset or dictionary. The data has daily timestep.
- storage_vars_stn(stn: str) DataFrame[source]
Daily Water storage term variables from 1979-01-01 to 2021-12-31 for a given station.
SM_layer1: 0-7 cm soil moisture from ERA5 land soil water layer 1 (m3/m3) for 1979-2021.
SM_layer2: 7-28 cm soil moisture from ERA5 land soil water layer 2 (m3/m3) for 1979-2021.
SM_layer3: 28-100 cm soil moisture from ERA5 land soil water layer 3 (m3/m3) for 1979-2021.
SM_layer4: 100-289 cm soil moisture from ERA5 land soil water layer 4 (m3/m3) for 1979-2021.
SWDE: Snow water equivalent from ERA5 snow depth water equivalent (m of water equivalent) for 1979-2021.
groundwater(%): Groundwater percentage from GRACE-FO data assimilation (%) for 2003-2021 (weekly).
- Returns:
a pandas DataFrame of shape (15706, 6) where 15706 is the number of days
- Return type:
pd.DataFrame
- streamflow_indices(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Streamflow indices for one or more than one station either as xr.Dataset or dictionary. The data has yearly timestep.
- streamflow_indices_stn(stn: str) DataFrame[source]
Streamflow indices for a given station which have yearly timestep.
- Returns:
a pandas DataFrame of shape (n, 16) where n is the number of years
- Return type:
pd.DataFrame
- uncertainty(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]
Uncertainty estimates of all meteorological variables over all watersheds
P_uncertainty (%) Precipitation uncertainty estimates (in percentage). Uncertainties are calculated from EM-Earth deterministic and MSWEP datasets.
T_uncertainty (%) Temperature uncertainty estimates (in percentage). Uncertainties are calculated from EUSTACE, MERRA-2, and ERA5 datasets.
EVP_uncertainty (%) Actual evapotranspiration uncertainty estimates (in percentage). Uncertainties are calculated from GLEAM and REA datasets.
LRAD_uncertainty (%) Downward longwave radiation uncertainty estimates (in percentage). Uncertainties are calculated from MERRA-2 and ERA5-land datasets.
SRAD_uncertainty (%) Downward shortwave radiation uncertainty estimates (in percentage). Uncertainties are calculated from MERRA-2 and ERA5-land datasets.
wind_uncertainty (%) Wind speed uncertainty estimates (in percentage). The u- and v- components are aggregated on each grid to obtain wind speed. Uncertainties are calculated from MERRA-2 and ERA5-land datasets.
pet_uncertainty (%) Potential evapotranspiration uncertainty estimates (in percentage). Uncertainties are calculated from GLEAM and REA datasets.
- Returns:
a pandas DataFrame of shape (n, 7) where n is the number of stations
- Return type:
pd.DataFrame
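The description of wind_uncertainty mentions that the u- and v-components are aggregated on each grid to obtain wind speed. Assuming the usual vector magnitude is meant, a minimal sketch with hypothetical component values is:

```python
import math
from typing import List, Tuple


def wind_speed(u: float, v: float) -> float:
    """Magnitude of the horizontal wind vector from its u and v components."""
    return math.hypot(u, v)


# hypothetical (u, v) components in m/s for three grid cells
components: List[Tuple[float, float]] = [(3.0, 4.0), (0.0, 2.0), (-6.0, 8.0)]
speeds = [wind_speed(u, v) for u, v in components]
print(speeds)
```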
- class aqua_fetch.EStreams(path=None, **kwargs)[source]
Bases: Camels
Handles EStreams data following the work of Nascimento et al., 2024. The data is available at its zenodo repository. It should be noted that this dataset does not contain observed streamflow data. It has 15047 stations, 9 dynamic (meteorological) features with daily timestep, 27 dynamic features with yearly timestep and 208 static features. The dynamic features are from 1950-01-01 to 2023-06-30.
- __init__(path=None, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- area(stations: List[str] = 'all', countries: List[str] = 'all') Series[source]
area of catchments in km2
- fetch_dynamic_features(stations: List[str] | str = 'all', dynamic_features='all', st=None, en=None, as_dataframe=False, countries: str | List[str] = 'all')[source]
Fetches all or selected dynamic features of one or more stations.
- Parameters:
stations (str/list) – name/id of the station(s) for which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
st (Optional (default=None)) – start time from where to fetch the data.
en (Optional (default=None)) – end time until where to fetch the data
as_dataframe (bool, optional (default=False)) – if true, the returned data is pandas DataFrame otherwise it is xarray dataset
Examples
>>> from aqua_fetch import EStreams
>>> camels = EStreams()
>>> camels.fetch_dynamic_features('IEEP0281', as_dataframe=True).unstack()
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('IEEP0281',
... dynamic_features=['p_mean', 't_mean', 'pet_mean'],
... as_dataframe=True).unstack()
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', countries: List[str] = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stations (str/list) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas dataframe of shape (stations, static_features)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import EStreams
>>> dataset = EStreams()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
15047
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(15047, 153)
get static data of one station only
>>> static_data = dataset.fetch_static_features('IEEP0281')
>>> static_data.shape
(1, 153)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slp_dg_mean', 'ele_mt_mean'])
>>> static_data.shape
(15047, 2)
>>> data = dataset.fetch_static_features('IEEP0281', static_features=['slp_dg_mean', 'ele_mt_mean'])
>>> data.shape
(1, 2)
>>> out = dataset.fetch_static_features(countries='IE')
>>> out.shape
(464, 153)
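The countries argument selects stations by country. Judging from ids such as 'IEEP0281' being returned for countries='IE', the first two characters of an id look like a country code; the sketch below relies on that assumption, which is not confirmed by the documentation. The helper names and the third id are hypothetical.

```python
from typing import List, Union


def country_of(stn_id: str) -> str:
    """First two characters of the id, assumed to be a country code."""
    return stn_id[:2]


def filter_by_country(stations: List[str], countries: Union[str, List[str]] = "all") -> List[str]:
    """Keep stations whose assumed country code is in ``countries``; 'all' keeps everything."""
    if countries == "all":
        return list(stations)
    wanted = {countries} if isinstance(countries, str) else set(countries)
    return [s for s in stations if country_of(s) in wanted]


# the first two ids appear in the examples; the third is hypothetical
stations = ["IEEP0281", "IEEP0282", "FR000001"]

irish = filter_by_country(stations, "IE")
print(irish)
```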
- fetch_stn_dynamic_features(stn_id: str, dynamic_features='all') DataFrame[source]
Fetches all or selected dynamic features of one station.
- Parameters:
stn_id (str) – name/id of station of which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
- Returns:
a pandas dataframe of shape (n, features) where n is the number of days
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import EStreams
>>> camels = EStreams()
>>> camels.fetch_stn_dynamic_features('IEEP0281').unstack()
>>> camels.dynamic_features
>>> camels.fetch_stn_dynamic_features('IEEP0281',
... dynamic_features=['p_mean', 't_mean', 'pet_mean']).unstack()
- hydro_clim_sigs(stations: List[str] = 'all', countries: List[str] = 'all') DataFrame[source]
Returns the hydro-climatic signatures of one or more stations
- Returns:
a pandas dataframe of hydro-climatic signatures of shape (stations, 31)
- Return type:
pd.DataFrame
- meteo_data(stations: str | List[str] = 'all', countries: List[str] | str = 'all')[source]
Returns the meteorological data of one or more stations either as dictionary of dataframes or xarray Dataset
- meteo_data_station(stn_id: str) DataFrame[source]
Returns the meteorological data of a station
- Returns:
a pandas dataframe of meteorological data of shape (time, 9)
- Return type:
pd.DataFrame
- stn_coords(stations: List[str] = 'all', countries: List[str] = 'all') DataFrame[source]
Returns the coordinates of one or more stations
- Returns:
a pandas dataframe of shape (stations, 2)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import EStreams
>>> dataset = EStreams()
>>> dataset.stn_coords('IEEP0281')
>>> dataset.stn_coords(['IEEP0281', 'IEEP0282'])
>>> dataset.stn_coords(countries='IE')