Miscellaneous

This section contains the documentation for the miscellaneous datasets available in the package.

Groundwater of Punjab region

aqua_fetch.gw_punjab(data_type: str = 'full', country: str = None) → DataFrame

Groundwater level (meters below ground level) dataset from the Punjab region (Pakistan and north-west India), following the study of MacAllister et al., 2022.

Parameters:
  • data_type (str (default="full")) – either "full" or "LTS". "full" returns the complete dataset: 68783 rows of observed groundwater levels from 4028 individual sites. "LTS" returns 7547 rows of groundwater level observations from 130 individual sites that have water level data spanning more than 40 years, with at least two thirds of the annual observations available.

  • country (str (default=None)) – the country for which to retrieve data, either "PAK" or "IND". If None, data for both countries is returned.

Returns:

a pandas DataFrame with a datetime index

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import gw_punjab
>>> full_data = gw_punjab()
... # find the earliest observation
>>> print(full_data.sort_index().head(1))
>>> full_data.shape
(68782, 4)
>>> lts_data = gw_punjab(data_type="LTS")
>>> lts_data.shape
... # data for Pakistan only
>>> df_pak = gw_punjab(country="PAK")
>>> df_pak.sort_index().dropna().head(1)
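The LTS selection rule described for data_type (a record longer than 40 years, with at least two thirds of the annual observations available) can be illustrated with a minimal pandas sketch. It is not the procedure used by MacAllister et al. (2022) or by aqua_fetch: the function `long_term_sites`, the column names `site` and `gwl`, and the reading of "annual observations" as "years with at least one measurement" are all assumptions made for illustration.

```python
import pandas as pd


def long_term_sites(df: pd.DataFrame, site_col: str = "site",
                    min_years: int = 40, min_fraction: float = 2 / 3) -> list:
    """Return IDs of sites meeting a hypothetical LTS-style criterion.

    ``df`` must have a DatetimeIndex and a ``site_col`` column (assumed name).
    """
    selected = []
    for site, group in df.groupby(site_col):
        years = group.index.year
        span = years.max() - years.min() + 1   # length of record in years (inclusive)
        observed = years.nunique()             # years with at least one observation
        if span > min_years and observed >= min_fraction * span:
            selected.append(site)
    return selected


# Synthetic data: "A" has 45 consecutive years, "B" only 10 years,
# "C" spans 49 years but has a measurement only every other year.
rows = []
site_years = {"A": range(1970, 2015), "B": range(2000, 2010), "C": range(1960, 2010, 2)}
for site, years in site_years.items():
    for y in years:
        rows.append((pd.Timestamp(f"{y}-06-01"), site, 10.0))
df = pd.DataFrame(rows, columns=["date", "site", "gwl"]).set_index("date")

print(long_term_sites(df))  # ['A']
```

Site "C" fails only the completeness condition (25 observed years out of a 49-year span), which shows why both conditions are checked.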

Weisssee

class aqua_fetch.Weisssee(path=None, overwrite=False, **kwargs)

Bases: Datasets

__init__(path=None, overwrite=False, **kwargs)
Parameters:
  • name (str (default=None)) – name of the dataset

  • units (str (default=None)) – the unit system being used

  • path (str (default=None)) – path where the data is available (if downloaded manually). If None, the data will be downloaded

  • processes (int) – number of processes to use for parallel processing

  • verbosity (int) – determines the amount of information to be printed

  • remove_zip (bool) – whether to remove the zip files after unzipping

fetch(**kwargs)

Examples

>>> from aqua_fetch import Weisssee
>>> dataset = Weisssee()
>>> data = dataset.fetch()

WeatherJena

class aqua_fetch.WeatherJena(path=None, obs_loc='roof')

Bases: Datasets

Weather dataset of Jena, Germany, at 10-minute resolution from 2002 onwards, hosted at https://www.bgc-jena.mpg.de/wetter/index.html.

>>> from aqua_fetch import WeatherJena
>>> dataset = WeatherJena()
>>> data = dataset.fetch()
>>> data.sum()

__init__(path=None, obs_loc='roof')

The ETP data is collected at three different locations, i.e. roof, soil and saale (hall).

Parameters:

obs_loc (str, optional (default="roof")) – location of observation. It can be one of the following:
  • roof

  • soil

  • saale

property dynamic_features: list

Returns the names of the available features.

fetch(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) → DataFrame

Fetches the time series data for the given period as a pandas DataFrame.

Parameters:
  • st (Optional) – start of the data to be fetched. If None, data from the beginning (2003-01-01) will be returned.

  • en (Optional) – end of the data to be fetched. If None, data up to the end (2021-12-31) will be returned.
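Selecting a period from a 10-minute series can be sketched with pandas partial-string indexing on a sorted DatetimeIndex, where a date-only end bound includes every timestamp on that day. This is an assumption about how st and en behave, not something the documentation above confirms; the synthetic frame and the helper `slice_period` are hypothetical, and ISO dates are used instead of the compact "20110101" form seen in the examples.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute series covering three full days (144 steps per day).
index = pd.date_range("2011-01-01", "2011-01-03 23:50", freq="10min")
weather = pd.DataFrame({"temp": np.arange(len(index), dtype=float)}, index=index)


def slice_period(df: pd.DataFrame, st=None, en=None) -> pd.DataFrame:
    """Return rows between st and en (inclusive); None leaves that side open."""
    # On a sorted DatetimeIndex, a date-only string used as the end bound
    # covers the whole day, so "2011-01-02" includes 23:50 on that day.
    return df.loc[st:en]


day_two = slice_period(weather, "2011-01-02", "2011-01-02")
print(len(day_two))  # 144
```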

Returns:

a pandas DataFrame; with the default st and en, its shape is (972111, 21)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import WeatherJena
>>> dataset = WeatherJena()
>>> data = dataset.fetch()
>>> data.shape
(972111, 21)
... # get data for a specific period
>>> data = dataset.fetch("20110101", "20201231")
>>> data.shape
(525622, 21)
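At 10-minute resolution a complete record has 144 rows per day, so comparing the rows returned with the number of 10-minute steps between the first and last timestamp is a quick completeness check. The helper `count_missing_steps` below is a hypothetical sketch, not part of aqua_fetch, and assumes the data has a DatetimeIndex.

```python
import pandas as pd


def count_missing_steps(df: pd.DataFrame, freq: str = "10min") -> int:
    """Count expected time steps that are absent from df's DatetimeIndex."""
    # Build the complete regular index between the first and last timestamp.
    expected = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    return len(expected.difference(df.index))


# Synthetic day of 10-minute data with one hour (6 steps) removed.
index = pd.date_range("2011-01-01", periods=144, freq="10min")
df = pd.DataFrame({"value": range(144)}, index=index)
gappy = df.drop(index[60:66])

print(count_missing_steps(gappy))  # 6
```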

SWECanada

class aqua_fetch.SWECanada(path=None, **kwargs)

Bases: Datasets

Daily historical Canadian Snow Water Equivalent dataset from 1928 to 2020 (Brown et al., 2019).

Examples

>>> from aqua_fetch import SWECanada
>>> swe = SWECanada()
... # get names of all available stations
>>> stns = swe.stations()
>>> len(stns)
2607
... # get data of one station
>>> df1 = swe.fetch('SCD-NS010')
>>> df1['SCD-NS010'].shape
(33816, 3)
... # get data of 5 stations starting from 2011
>>> df5 = swe.fetch(5, st='20110101')
>>> df5.keys()
['YT-10AA-SC01', 'ALE-05CA805', 'SCD-NF078', 'SCD-NF086', 'INA-07RA01B']
>>> [v.shape for v in df5.values()]
[(3500, 3), (3500, 3), (3500, 3), (3500, 3), (3500, 3)]
... # get data of 0.1% of stations
>>> df2 = swe.fetch(0.001, st='20110101')
... # get data of one station starting from 2011
>>> df3 = swe.fetch('ALE-05AE810', st='20110101')
>>> df3.keys()
['ALE-05AE810']
... # get data of the first 10 stations starting from 2011
>>> df4 = swe.fetch(stns[0:10], st='20110101')
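The examples pass station_id as an int (a number of stations), a float (a fraction of all stations), a str (one station) and a list (several stations). A minimal sketch of how such an argument could be resolved to a list of station IDs follows. The function `resolve_stations` is hypothetical, and taking the first n stations (rather than sampling them) and rounding a fraction down with a minimum of one station are assumptions, not documented behaviour of SWECanada.fetch.

```python
from typing import List, Optional, Sequence, Union

StationArg = Optional[Union[str, int, float, Sequence[str]]]


def resolve_stations(all_stations: Sequence[str], station_id: StationArg = None) -> List[str]:
    """Map a station_id argument to a list of station IDs (hypothetical)."""
    if station_id is None:
        return list(all_stations)                # None -> every station
    if isinstance(station_id, str):
        return [station_id]                      # a single station ID
    if isinstance(station_id, int):
        return list(all_stations[:station_id])   # assumption: the first n stations
    if isinstance(station_id, float):
        # assumption: fraction of all stations, rounded down, at least one
        n = max(1, int(len(all_stations) * station_id))
        return list(all_stations[:n])
    return list(station_id)                      # an explicit list of IDs


# Hypothetical station IDs; 2607 matches the count shown in the example above.
stations = [f"STN-{i:04d}" for i in range(2607)]
print(len(resolve_stations(stations, 5)))      # 5
print(len(resolve_stations(stations, 0.001)))  # 2
```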

__init__(path=None, **kwargs)
Parameters:
  • name (str (default=None)) – name of the dataset

  • units (str (default=None)) – the unit system being used

  • path (str (default=None)) – path where the data is available (if downloaded manually). If None, the data will be downloaded

  • processes (int) – number of processes to use for parallel processing

  • verbosity (int) – determines the amount of information to be printed

  • remove_zip (bool) – whether to remove the zip files after unzipping

fetch(station_id: None | str | float | int | list = None, features: None | str | list = None, q_flags: None | str | list = None, st=None, en=None) → dict

Fetches time series data from selected stations.

Parameters:
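  • station_id – station(s) to be retrieved. If None, data from all stations will be returned. As in the examples, an int selects that many stations, a float selects that fraction of all stations, and a str or list selects stations by ID.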

  • features

    Names of features to be retrieved. The following features are allowed:

    • snw – snow water equivalent (kg/m2)

    • snd – snow depth (m)

    • den – snowpack bulk density (kg/m3)

    If None, all three features will be retrieved.

  • q_flags

    Quality flags to be retrieved. If None, no quality flags will be returned. The following q_flag values are available:

    • data_flag_snw

    • data_flag_snd

    • qc_flag_snw

    • qc_flag_snd

  • st – start of data to be retrieved

  • en – end of data to be retrieved.
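The three features are physically linked: snow water equivalent is a mass of water per unit area, equal to snow depth multiplied by bulk density, and 1 kg/m² of water corresponds to a water layer about 1 mm deep. Whether the snw, snd and den columns satisfy this relation exactly for every record is not stated here, since they may be measured or quality-controlled separately.

```latex
\mathrm{snw}\ [\mathrm{kg\,m^{-2}}] = \mathrm{snd}\ [\mathrm{m}] \times \mathrm{den}\ [\mathrm{kg\,m^{-3}}]

\text{e.g. } 0.5\ \mathrm{m} \times 300\ \mathrm{kg\,m^{-3}} = 150\ \mathrm{kg\,m^{-2}} \approx 150\ \mathrm{mm\ of\ water}
```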

Returns:

a dictionary with one entry per selected station, keyed by station ID; each value is a DataFrame with one row per time step between st and en and one column per requested feature and q_flag.

Return type:

dict
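Because fetch returns one DataFrame per station, it can be convenient to stack the dictionary into a single table. The sketch below uses a small synthetic dictionary in place of real fetch output (the station IDs come from the examples above; the values and the index name `time` are made up) and relies only on standard pandas calls: pd.concat turns the dictionary keys into the outer level of a MultiIndex, and unstack pivots that level into columns.

```python
import pandas as pd

# Stand-in for the dict returned by SWECanada.fetch (synthetic values).
index = pd.date_range("2011-01-01", periods=3, freq="D", name="time")
per_station = {
    "SCD-NS010": pd.DataFrame({"snw": [10.0, 12.0, 15.0]}, index=index),
    "ALE-05AE810": pd.DataFrame({"snw": [5.0, 5.5, 6.0]}, index=index),
}

# Long layout: the dict keys become the outer "station" level of a MultiIndex.
combined = pd.concat(per_station, names=["station", "time"])
print(combined.shape)  # (6, 1)

# Wide layout: one column per station, one row per date.
wide = combined["snw"].unstack("station")
print(wide.shape)  # (3, 2)
```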

fetch_station_attributes(stn, features_to_fetch, st=None, en=None) → DataFrame

Fetches the attributes of one station.

MtropicsLaos

Datasets