Miscellaneous
This section contains the documentation for the miscellaneous datasets available in the package.
Groundwater of Punjab region
- aqua_fetch.gw_punjab(data_type: str = 'full', country: str = None) DataFrame[source]
Groundwater level (meters below ground level) dataset from the Punjab region (Pakistan and north-west India), following the study of MacAllister et al., 2022.
- Parameters:
data_type (str (default="full")) – either full or LTS. The full option contains the full dataset: 68783 rows of observed groundwater level data from 4028 individual sites. LTS contains 7547 rows of groundwater level observations from 130 individual sites that have water level data available for a period of more than 40 years and for which at least two thirds of the annual observations are available.
country (str (default=None)) – the country for which to retrieve data. Either PAK or IND.
- Returns:
a pandas DataFrame with datetime index
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import gw_punjab
>>> full_data = gw_punjab()
... # find out the earliest observation
>>> print(full_data.sort_index().head(1))
>>> lts_data = gw_punjab()
>>> lts_data.shape
(68782, 4)
>>> df_pak = gw_punjab(country="PAK")
>>> df_pak.sort_index().dropna().head(1)
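The LTS selection rule described above (a record of more than 40 years with at least two thirds of the annual observations present) can be sketched in plain Python. This is an illustration only: `is_long_time_series` and its input format (a list of years with at least one observation) are hypothetical and not part of aqua_fetch, and the way the record span is counted here is an assumption rather than the study's exact procedure.

```python
def is_long_time_series(years, min_span=40, min_fraction=2 / 3):
    """Return True if a site qualifies under the assumed LTS rule.

    years: iterable of calendar years with at least one observation.
    """
    observed = set(years)
    if not observed:
        return False
    # Record length in years, counting both the first and last year (assumption).
    span = max(observed) - min(observed) + 1
    if span <= min_span:
        return False
    return len(observed) / span >= min_fraction


# Hypothetical sites, not real station records.
sites = {
    "site_a": [y for y in range(1970, 2015) if y % 5 != 0],  # 1971-2014, 36 of 44 years
    "site_b": list(range(1990, 2015)),                        # only 25 years long
    "site_c": list(range(1960, 2020, 3)),                     # 58-year span, 20 years observed
}
lts_sites = [name for name, yrs in sites.items() if is_long_time_series(yrs)]
print(lts_sites)
```

Only the first hypothetical site passes both conditions; the second is too short and the third is too sparse.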
Weisssee
- class aqua_fetch.Weisssee(path=None, overwrite=False, **kwargs)[source]
Bases:
Datasets
- __init__(path=None, overwrite=False, **kwargs)[source]
- Parameters:
name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping
WeatherJena
- class aqua_fetch.WeatherJena(path=None, obs_loc='roof')[source]
Bases:
Datasets
10-minute weather dataset of Jena, Germany, hosted at https://www.bgc-jena.mpg.de/wetter/index.html, from 2002 onwards.
>>> from aqua_fetch import WeatherJena
>>> dataset = WeatherJena()
>>> data = dataset.fetch()
>>> data.sum()
- __init__(path=None, obs_loc='roof')[source]
The ETP data is collected at three different locations, i.e. roof, soil and saale (hall).
- Parameters:
obs_loc (str, optional (default=roof)) – location of observation. It can be one of the following:
roof
soil
saale
- fetch(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) DataFrame[source]
Fetches the time series data between given period as pandas dataframe.
- Parameters:
st (Optional) – start of the data to be fetched. If None, data from the start of the record (2003-01-01) will be returned.
en (Optional) – end of the data to be fetched. If None, data up to the end of the record (2021-12-31) will be returned.
- Returns:
a pandas dataframe of shape (972111, 21)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import WeatherJena
>>> dataset = WeatherJena()
>>> data = dataset.fetch()
>>> data.shape
(972111, 21)
... # get data between specific period
>>> data = dataset.fetch("20110101", "20201231")
>>> data.shape
(525622, 21)
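As a rough completeness check on the shapes quoted above, the number of 10-minute timestamps that a gap-free record would contain between two dates can be computed with the standard library. This is a sketch only: `expected_steps` is a hypothetical helper, not part of aqua_fetch, and treating both `st` and `en` as inclusive midnight timestamps is an assumption.

```python
from datetime import datetime, timedelta


def expected_steps(start, end, step=timedelta(minutes=10)):
    """Count timestamps from start to end (both inclusive) at a fixed step."""
    if end < start:
        raise ValueError("end must not precede start")
    return (end - start) // step + 1


n_expected = expected_steps(datetime(2011, 1, 1), datetime(2020, 12, 31))
n_reported = 525622  # row count quoted in the example above
print(n_expected, n_expected - n_reported)
```

A small difference between the two numbers would suggest a few missing timestamps in the record rather than a slicing problem, although the exact boundary handling of `fetch` is not confirmed here.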
SWECanada
- class aqua_fetch.SWECanada(path=None, **kwargs)[source]
Bases:
Datasets
Daily Canadian historical Snow Water Equivalent dataset from 1928 to 2020, from Brown et al., 2019.
Examples
>>> from aqua_fetch import SWECanada
>>> swe = SWECanada()
... # get names of all available stations
>>> stns = swe.stations()
>>> len(stns)
2607
... # get data of one station
>>> df1 = swe.fetch('SCD-NS010')
>>> df1['SCD-NS010'].shape
(33816, 3)
... # get data of 5 stations
>>> df5 = swe.fetch(5, st='20110101')
>>> df5.keys()
['YT-10AA-SC01', 'ALE-05CA805', 'SCD-NF078', 'SCD-NF086', 'INA-07RA01B']
>>> [v.shape for v in df5.values()]
[(3500, 3), (3500, 3), (3500, 3), (3500, 3), (3500, 3)]
... # get data of 0.1% of stations
>>> df2 = swe.fetch(0.001, st='20110101')
... # get data of one station starting from 2011
>>> df3 = swe.fetch('ALE-05AE810', st='20110101')
>>> df3.keys()
['ALE-05AE810']
... # get data of the first 10 stations
>>> df4 = swe.fetch(stns[0:10], st='20110101')
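The examples above pass a station name, an integer, a fraction, or a list as the first argument of `fetch`. The sketch below shows one way such an argument could be resolved into a list of station names. It is not the library's implementation: `resolve_stations` is a hypothetical helper, and the random sampling, the truncation of fractional counts, and the minimum of one station are all assumptions.

```python
import random


def resolve_stations(station_id, all_stations, seed=None):
    """Turn a flexible station selector into a list of station names."""
    all_stations = list(all_stations)
    if station_id is None:
        return all_stations                      # every station
    if isinstance(station_id, str):
        return [station_id]                      # one named station
    if isinstance(station_id, int):
        n = station_id                           # a number of stations
    elif isinstance(station_id, float):
        if not 0 < station_id <= 1:
            raise ValueError("a fraction must lie in (0, 1]")
        # Assumed rounding: truncate, but always keep at least one station.
        n = max(1, int(station_id * len(all_stations)))
    else:
        return list(station_id)                  # an explicit list of names
    if not 0 < n <= len(all_stations):
        raise ValueError("requested number of stations is out of range")
    # Assumed selection strategy: a random sample without replacement.
    return random.Random(seed).sample(all_stations, n)


stations = [f"STN-{i:04d}" for i in range(2607)]  # placeholder names, not real IDs
print(len(resolve_stations(5, stations, seed=0)))
print(len(resolve_stations(0.001, stations, seed=0)))
print(resolve_stations("SCD-NS010", stations))
```

Dispatching on the argument type keeps a single entry point for all four selection styles, at the cost of making `5` and `5.0` mean different things.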
- __init__(path=None, **kwargs)[source]
- Parameters:
name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping
- fetch(station_id: None | str | float | int | list = None, features: None | str | list = None, q_flags: None | str | list = None, st=None, en=None) dict[source]
Fetches time series data from selected stations.
- Parameters:
station_id – station/stations to be retrieved. If None, data from all stations will be returned.
features –
Names of features to be retrieved. The following features are allowed:
snw – snow water equivalent kg/m3
snd – snow depth m
den – snowpack bulk density kg/m3
If None, then all three features will be retrieved.
q_flags –
If None, no q_flags will be returned. The following q_flag values are available:
data_flag_snw
data_flag_snd
qc_flag_snw
qc_flag_snd
st – start of data to be retrieved
en – end of data to be retrieved.
- Returns:
a dictionary of dataframes, each of shape (st:en, features + q_flags), whose length equals the number of stations being considered.
- Return type: