Miscellaneous

This section contains the documentation for the miscellaneous datasets available in the package.

Groundwater of Punjab region

aqua_fetch.gw_punjab(data_type: str = 'full', country: str = None) → DataFrame

Groundwater level (meters below ground level) dataset for the Punjab region (Pakistan and north-west India), following the study of MacAllister et al., 2022.

Parameters:

data_type (str (default="full")) – either full or LTS. full returns the complete dataset: 68783 rows of observed groundwater level data from 4028 individual sites. LTS returns 7547 rows of groundwater level observations from 130 individual sites, each of which has water level data spanning more than 40 years with at least two thirds of the annual observations available.
country (str (default=None)) – the country for which to retrieve data. Either PAK or IND.

Returns:

a pandas DataFrame with datetime index

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import gw_punjab
>>> full_data = gw_punjab()
>>> full_data.shape
(68782, 4)
... # find out the earliest observation
>>> print(full_data.sort_index().head(1))
... # get the LTS subset
>>> lts_data = gw_punjab(data_type="LTS")
>>> df_pak = gw_punjab(country="PAK")
>>> df_pak.sort_index().dropna().head(1)
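
The selection rule described for the LTS option (a record of more than 40 years with at least two thirds of the annual observations present) can be illustrated with a small standard-library sketch. The helper qualifies_as_lts and the way record length and coverage are computed here are assumptions made for illustration; they are not taken from aqua_fetch or from the original study.

```python
def qualifies_as_lts(obs_years, min_span_years=40, min_coverage=2 / 3):
    """Hypothetical check: does a site meet the long-record criteria?

    obs_years: iterable of calendar years in which the site has an observation.
    """
    years = set(obs_years)
    if not years:
        return False
    # Assumption: record length is counted inclusively from first to last year.
    span = max(years) - min(years) + 1
    # Assumption: coverage is the share of years in the span with an observation.
    coverage = len(years) / span
    return span > min_span_years and coverage >= min_coverage


complete_site = list(range(1970, 2016))    # 46 years, every year observed
sparse_site = list(range(1970, 2016, 3))   # 46-year span, one year in three
short_site = list(range(2000, 2016))       # complete, but only 16 years

for name, years in [("complete", complete_site),
                    ("sparse", sparse_site),
                    ("short", short_site)]:
    print(name, qualifies_as_lts(years))
```

Only the first site passes both conditions; the second fails on coverage and the third on record length.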

Weisssee

class aqua_fetch.Weisssee(path=None, overwrite=False, **kwargs)

Bases: Datasets

__init__(path=None, overwrite=False, **kwargs)

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

fetch(**kwargs)

Examples

>>> from aqua_fetch import Weisssee
>>> dataset = Weisssee()
>>> data = dataset.fetch()

WeatherJena

class aqua_fetch.WeatherJena(path=None, obs_loc='roof')

Bases: Datasets

10-minute weather dataset of Jena, Germany, hosted at https://www.bgc-jena.mpg.de/wetter/index.html, from 2002 onwards.

>>> from aqua_fetch import WeatherJena
>>> dataset = WeatherJena()
>>> data = dataset.fetch()
>>> data.sum()

__init__(path=None, obs_loc='roof')

The ETP data is collected at three different locations, i.e. roof, soil and saale (hall).

Parameters:

obs_loc (str, optional (default=roof)) –

location of observation. It can be one of following

roof
soil
saale

property dynamic_features: list: returns names of features available

fetch(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) → DataFrame

Fetches the time series data for the given period as a pandas dataframe.

Parameters:

st (Optional) – start of data to be fetched. If None, data from the start (2003-01-01) will be returned.
en (Optional) – end of data to be fetched. If None, data until the end (2021-12-31) will be returned.

Returns:

a pandas dataframe of shape (972111, 21)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import WeatherJena
>>> dataset = WeatherJena()
>>> data = dataset.fetch()
>>> data.shape
(972111, 21)
... # get data between specific period
>>> data = dataset.fetch("20110101", "20201231")
>>> data.shape
(525622, 21)
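
As a rough sanity check on the shapes shown above, the number of rows a gap-free 10-minute series would contain can be computed with the standard library. This is only a sketch: expected_rows is a hypothetical helper, not part of aqua_fetch, and it assumes the first sample falls at 00:00 on the start day and the last at 23:50 on the end day.

```python
from datetime import datetime, timedelta


def expected_rows(start, end, step_minutes=10):
    """Count timestamps from start to end (both inclusive) at a fixed step."""
    step = timedelta(minutes=step_minutes)
    # Dividing one timedelta by another with // yields a whole number of steps.
    return (end - start) // step + 1


# Period used in the example above (assumed endpoints, see lead-in).
n_expected = expected_rows(datetime(2011, 1, 1, 0, 0),
                           datetime(2020, 12, 31, 23, 50))
n_reported = 525622  # row count shown in the example above

print(n_expected)               # rows in a gap-free series
print(n_expected - n_reported)  # rows that appear to be absent
```

Under these assumptions a complete series for 2011-01-01 to 2020-12-31 would hold 526032 rows, 410 more than the 525622 reported. The difference may reflect missing records or a different endpoint convention; it has not been checked against the data itself.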

SWECanada

class aqua_fetch.SWECanada(path=None, **kwargs)

Bases: Datasets

Daily Canadian historical Snow Water Equivalent dataset from 1928 to 2020, from Brown et al., 2019.

Examples

>>> from aqua_fetch import SWECanada
>>> swe = SWECanada()
... # get names of all available stations
>>> stns = swe.stations()
>>> len(stns)
2607
... # get data of one station
>>> df1 = swe.fetch('SCD-NS010')
>>> df1['SCD-NS010'].shape
(33816, 3)
... # get data of 5 stations
>>> df5 = swe.fetch(5, st='20110101')
>>> df5.keys()
['YT-10AA-SC01', 'ALE-05CA805', 'SCD-NF078', 'SCD-NF086', 'INA-07RA01B']
>>> [v.shape for v in df5.values()]
[(3500, 3), (3500, 3), (3500, 3), (3500, 3), (3500, 3)]
... # get data of 0.1% of stations
>>> df2 = swe.fetch(0.001, st='20110101')
... # get data of one station starting from 2011
>>> df3 = swe.fetch('ALE-05AE810', st='20110101')
>>> df3.keys()
['ALE-05AE810']
... # get data of the first 10 stations starting from 2011
>>> df4 = swe.fetch(stns[0:10], st='20110101')

__init__(path=None, **kwargs)

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

fetch(station_id: None | str | float | int | list = None, features: None | str | list = None, q_flags: None | str | list = None, st=None, en=None) → dict

Fetches time series data from selected stations.

Parameters:

station_id – station/stations to be retrieved. If None, then data from all stations will be returned.
features –
Names of features to be retrieved. Following features are allowed:
- snw snow water equivalent kg/m2
- snd snow depth m
- den snowpack bulk density kg/m3
If None, then all three features will be retrieved.
q_flags –
If None, then no q_flags will be returned. The following q_flag values are available:
- data_flag_snw
- data_flag_snd
- qc_flag_snw
- qc_flag_snd
st – start of data to be retrieved
en – end of data to be retrieved.

Returns:

a dictionary of dataframes, one per station considered, each of shape (st:en, features + q_flags).

Return type:

dict
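
The examples for this class pass station_id as a station name, an integer count, a fraction and a list of names. How one argument of several types might be turned into a list of station names can be sketched as follows. resolve_stations is a hypothetical helper written for illustration, not the library's implementation; in particular, taking the first n stations and truncating the fraction with int() are assumptions, and the library may select stations differently.

```python
def resolve_stations(station_id, all_stations):
    """Hypothetical helper: map a station_id argument to a list of names."""
    if station_id is None:
        # None means every available station.
        return list(all_stations)
    if isinstance(station_id, str):
        # A single station name.
        return [station_id]
    if isinstance(station_id, int):
        # Assumption: an integer asks for that many stations, taken in order.
        return list(all_stations[:station_id])
    if isinstance(station_id, float):
        # Assumption: a float is a fraction of all stations, truncated,
        # with at least one station returned.
        if not 0.0 < station_id <= 1.0:
            raise ValueError("fraction must be in the interval (0, 1]")
        n = max(1, int(len(all_stations) * station_id))
        return list(all_stations[:n])
    # Anything else is treated as an iterable of station names.
    return list(station_id)


# Made-up station names; only the count (2607) comes from the example above.
stations = [f"STN-{i:04d}" for i in range(2607)]

print(len(resolve_stations(None, stations)))
print(len(resolve_stations(5, stations)))
print(len(resolve_stations(0.001, stations)))
print(resolve_stations("STN-0042", stations))
```

With 2607 stations, a fraction of 0.001 resolves to 2 stations under the truncation assumed here.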

fetch_station_attributes(stn, features_to_fetch, st=None, en=None) → DataFrame

Fetches attributes of one station.
