Waste Water Treatment
The wwt submodule contains data from approximately 20,000 experiments on the removal of various contaminants from wastewater using treatment strategies such as adsorption, photocatalysis, membrane filtration, and sonolysis. This data is scattered across the literature; the submodule provides a unified interface to access all of it in a standardized format through a few Python functions. Note that we do not introduce new data: these datasets have already been used and analyzed in peer-reviewed scientific publications. We only offer a simple interface to access them. The availability of such a large corpus of experimental data can significantly aid data-driven modeling and material discovery. A summary of these datasets is provided in the following table.
List of datasets
| Treatment Process | Function Name | Parameters | Target Pollutant | Data Points | Reference |
|---|---|---|---|---|---|
| | | 26 | Emerg. Contaminants | 3757 | |
| | | 15 | Cr | 219 | |
| | | 30 | Heavy Metals | 1518 | |
| | | 30 | PO4 | 5014 | |
| | | 12 | Industrial Dye | 1514 | |
| | | 17 | Heavy Metals | 689 | |
| | | 8 | P | 504 | |
| | | 8 | N | 211 | |
| | | 13 | As | 1605 | |
| | | 11 | Malachite Green | 1200 | |
| | | 23 | Dyes | 1527 | |
| | | 15 | 2,4-Dichlorophenoxyacetic acid | 1044 | |
| | | | | 2078 | |
| | | 8 | Tetracycline | 374 | |
| | | 7 | TiO2 | 446 | |
| | | 8 | Multiple | 457 | |
| | | 18 | Micropollutants | 1906 | |
| | | 6 | Cyanobacteria | 314 | |
Adsorption
- aqua_fetch.ec_removal_biochar(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Data on the removal of emerging contaminants/pollutants from wastewater using biochar. The data consists of three types of features: 1) adsorption experimental conditions, 2) elemental composition of the adsorbent (biochar) and 3) parameters representing physical and synthesis conditions of the biochar. For a fuller description of this data see Jaffari et al., 2023.
- Parameters:
parameters –
By default the following features are used as input:
adsorbent, pyrolysis_temperature, pyrolysis_time, C, H, O, N, (O+N)/C, ash, H/C, O/C, N/C, surface_area, pore_volume, average_pore_size, pollutant, adsorption_time, concentration, Solution_ph, rpm, volume, adsorbent_dosage, adsorption_temperature, ion_concentration, humid_acid, wastewater_type, adsorption_type, final_concentration, capacity
encoding (str, default=None) – the type of encoding to use for categorical features. If not None, it should be either ohe or le.
- Returns:
A tuple of length two. The first element is a DataFrame while the second element is a dictionary consisting of encoders with adsorbent, pollutant, wastewater_type and adsorption_type as keys.
- Return type:
Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Examples
>>> from aqua_fetch import ec_removal_biochar
>>> data, _ = ec_removal_biochar()
>>> data.shape
(3757, 29)
>>> data, encoders = ec_removal_biochar(encoding="le")
>>> data.shape
(3757, 29)
>>> len(set(encoders['adsorbent'].inverse_transform(data.loc[:, "adsorbent"])))
15
>>> len(set(encoders['pollutant'].inverse_transform(data.loc[:, "pollutant"])))
14
>>> set(encoders['wastewater_type'].inverse_transform(data.loc[:, "wastewater_type"]))
{'Ground water', 'Lake water', 'Secondary effluent', 'Synthetic'}
>>> set(encoders['adsorption_type'].inverse_transform(data.loc[:, "adsorption_type"]))
{'Competative', 'Single'}
We can also use one-hot encoding to convert categorical features into numerical features. This increases the number of columns in the DataFrame.
>>> data, encoders = ec_removal_biochar(encoding="ohe")
>>> data.shape
(3757, 60)
>>> len(set(encoders['adsorbent'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('adsorbent')]].values)))
15
>>> len(set(encoders['pollutant'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('pollutant')]].values)))
14
>>> set(encoders['wastewater_type'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('wastewater_type')]].values))
{'Ground water', 'Lake water', 'Secondary effluent', 'Synthetic'}
>>> set(encoders['adsorption_type'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('adsorption_type')]].values))
{'Competative', 'Single'}
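The jump from 29 to 60 columns can be checked by hand. The sketch below is plain arithmetic, not part of the library: it assumes each categorical column is replaced by one column per distinct category with none dropped. The helper name `one_hot_width` is hypothetical, and the category counts are those reported in the label-encoding example.

```python
def one_hot_width(n_columns, category_counts):
    """Expected number of columns after one-hot encoding.

    n_columns: number of columns before encoding.
    category_counts: mapping of categorical column name to the number
    of distinct categories in that column.
    Assumes one output column per category (no category is dropped).
    """
    n_categorical = len(category_counts)
    return n_columns - n_categorical + sum(category_counts.values())


# Category counts as reported for ec_removal_biochar with encoding="le".
ec_counts = {
    "adsorbent": 15,
    "pollutant": 14,
    "wastewater_type": 4,
    "adsorption_type": 2,
}

# 29 original columns, minus 4 categorical ones, plus 15 + 14 + 4 + 2.
print(one_hot_width(29, ec_counts))  # 60
```

The same arithmetic applies to any dataset with categorical columns, provided the encoder keeps one column per category.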
- aqua_fetch.cr_removal(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Data from experiments conducted for Cr removal from wastewater using adsorption. For more details on the data see Ishtiaq et al., 2024.
- Parameters:
parameters –
By default the following parameters are used:
adsorbent, NaOH_conc_M, surface_area, pore_volume, C_%, Al_%, Nb_%, O_%, Na_%, pore_size, adsorption_time, initial_conc, loading_g/L, volume_l, loading_g, solution_ph, cycle_number, final_conc, adsorption_capacity, removal_efficiency
encoding (str, default=None) – the type of encoding to use for categorical parameters. If not None, it should be either ohe or le.
- Returns:
A tuple of length two. The first element is a DataFrame while the second element is a dictionary consisting of an encoder with adsorbent as key.
- Return type:
Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Examples
>>> from aqua_fetch import cr_removal
>>> data, _ = cr_removal()
>>> data.shape
(219, 20)
>>> data, encoders = cr_removal(encoding="le")
>>> data.shape
(219, 20)
>>> len(set(encoders['adsorbent'].inverse_transform(data.loc[:, "adsorbent"])))
5
>>> set(encoders['adsorbent'].inverse_transform(data.loc[:, "adsorbent"]))
{'5M Nb2CTx', '20M Nb2CTx', '15M Nb2CTx', 'Nb2AlC', '10M Nb2CTx'}
We can also use one-hot encoding to convert categorical features into numerical features. This increases the number of columns in the DataFrame.
>>> data, encoders = cr_removal(encoding="ohe")
>>> data.shape
(219, 24)
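The cr_removal parameters include both raw measurements (initial_conc, final_conc, volume_l, loading_g) and derived quantities (adsorption_capacity, removal_efficiency). As a rough illustration of how such quantities are conventionally related in batch adsorption studies, the sketch below uses the standard mass-balance formulas. Whether the dataset's columns were computed exactly this way, and in which units, is an assumption here; the function names and the sample numbers are hypothetical.

```python
def removal_efficiency(initial_conc, final_conc):
    """Percentage of the pollutant removed from solution."""
    return (initial_conc - final_conc) / initial_conc * 100.0


def adsorption_capacity(initial_conc, final_conc, volume_l, loading_g):
    """Pollutant mass adsorbed per gram of adsorbent.

    With concentrations in mg/L, volume in L and adsorbent mass in g,
    the result is in mg/g.
    """
    return (initial_conc - final_conc) * volume_l / loading_g


# Hypothetical experiment: 50 mg/L reduced to 10 mg/L
# in 0.1 L of solution with 0.02 g of adsorbent.
print(removal_efficiency(50.0, 10.0))
print(adsorption_capacity(50.0, 10.0, 0.1, 0.02))
```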
- aqua_fetch.po4_removal_biochar(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Data from adsorption experiments conducted for phosphate (PO4) removal from wastewater using biochar. For details on the data see Iftikhar et al., 2023.
- Parameters:
parameters –
The parameters of the adsorption. It must be a subset of the following:
adsorbent, feedstock, activation, pyrolysis_temp, heating_rate, pyrolysis_time, C_%, H_%, O_%, N_%, S_%, Ca_%, ash, H/C, O/C, N/C, (O+N/C), surface_area, pore_volume, avg_pore_size, adsorption_time_min, Ci_ppm, solution_pH, rpm, volume_l, loading_g, loading_g/L, adsorption_temp, ion_concentration_mM, ion_type, final_conf, qe, efficiency
encoding (str, default=None) – the type of encoding to use for categorical parameters. If not None, it should be either ohe or le.
- aqua_fetch.heavy_metal_removal(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Data from experiments conducted for heavy metal removal from wastewater using adsorption. For more details on the data see Jaffari et al., 2024.
- Parameters:
parameters –
By default the following parameters are used:
adsorbent, NaOH_conc_M, surface_area, pore_volume, C_%, Al_%, Nb_%, O_%, Na_%, pore_size, adsorption_time, initial_conc, loading_g/L, volume_l, loading_g, solution_ph, cycle_number, final_conc
encoding (str, default=None) – the type of encoding to use for categorical parameters. If not None, it should be either ohe or le.
- Returns:
A tuple of length two. The first element is a DataFrame while the second element is a dictionary consisting of an encoder with adsorbent as key.
- Return type:
Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Examples
>>> from aqua_fetch import heavy_metal_removal
>>> data, _ = heavy_metal_removal()
>>> data.shape
(219, 18)
>>> data, encoders = heavy_metal_removal(encoding="le")
>>> data.shape
(219, 18)
>>> len(set(encoders['adsorbent'].inverse_transform(data.loc[:, "adsorbent"])))
5
>>> set(encoders['adsorbent'].inverse_transform(data.loc[:, "adsorbent"]))
{'5M Nb2CTx', '20M Nb2CTx', '15M Nb2CTx', 'Nb2AlC', '10M Nb2CTx'}
>>> data, encoders = heavy_metal_removal(encoding="ohe")
>>> data.shape
(219, 22)
>>> len(set(encoders['adsorbent'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('adsorbent')]].values)))
5
- aqua_fetch.industrial_dye_removal(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Data from experiments conducted for industrial dye removal from wastewater using adsorption. For more details on the data see Iftikhar et al., 2023.
- Parameters:
parameters –
By default the following parameters are used:
adsorbent, calcination_temperature, calcination_time_min, C_%, H_%, O_%, N_%, ash, H/C, O/C, N/C, surface_area, pore_volume, average_pore_size, dye, adsorption_time_min, initial_concentration, solution_ph, rpm, volume_l, loading_g/l, adsorption_temperature, ion_concentration_M, humic_acid, wastewater_type, adsorption_type, final_concentration, qe, adsorbent_loading
encoding (str, default=None) – the type of encoding to use for categorical parameters. If not None, it should be either ohe or le.
- Returns:
A tuple of length two. The first element is a DataFrame while the second element is a dictionary consisting of encoders with adsorbent and dye as keys.
- Return type:
Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Examples
>>> from aqua_fetch import industrial_dye_removal
>>> data, _ = industrial_dye_removal()
>>> data.shape
(680, 29)
>>> data, encoders = industrial_dye_removal(encoding="le")
>>> data.shape
(680, 29)
>>> len(set(encoders['adsorbent'].inverse_transform(data.loc[:, "adsorbent"])))
7
>>> len(set(encoders['dye'].inverse_transform(data.loc[:, "dye"])))
4
>>> data, encoders = industrial_dye_removal(encoding="ohe")
>>> data.shape
(680, 38)
>>> len(set(encoders['adsorbent'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('adsorbent')]].values)))
7
>>> len(set(encoders['dye'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('dye')]].values)))
4
- aqua_fetch.heavy_metal_removal_Shen(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Data from experiments conducted for heavy metal removal from wastewater using adsorption. For more details on the data see Shen et al., 2024.
- Parameters:
parameters –
By default the following parameters are used:
heavy_metal, hm_label, ph_bichar, C_%, (O+N)/C, O/C, H/C, ash, PS, SA, CEC, temperature, solution_ph, C0, χ, r, Ncharge, n
encoding (str, default=None) – the type of encoding to use for categorical parameters. If not None, it should be either ohe or le.
- Returns:
A tuple of length two. The first element is a DataFrame while the second element is a dictionary consisting of encoders with heavy_metal and hm_label as keys.
- Return type:
Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Examples
>>> from aqua_fetch import heavy_metal_removal_Shen
>>> data, _ = heavy_metal_removal_Shen()
>>> data.shape
(353, 18)
>>> data, encoders = heavy_metal_removal_Shen(encoding="le")
>>> data.shape
(353, 18)
>>> len(set(encoders['heavy_metal'].inverse_transform(data.loc[:, "heavy_metal"])))
10
>>> len(set(encoders['hm_label'].inverse_transform(data.loc[:, "hm_label"])))
42
>>> data, encoders = heavy_metal_removal_Shen(encoding="ohe")
>>> data.shape
(353, 68)
>>> len(set(encoders['heavy_metal'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('heavy_metal')]].values)))
10
>>> len(set(encoders['hm_label'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('hm_label')]].values)))
42
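To make the round trip in the le examples concrete, here is a minimal pure-Python sketch of what label encoding and its inverse do. It mimics the behavior of scikit-learn's LabelEncoder (sorted unique labels mapped to integer codes) but is not the library's implementation; the function names and the sample column are hypothetical.

```python
def fit_label_encoder(values):
    """Return (classes, codes).

    classes: sorted unique labels; the position of a label is its code.
    codes: the integer code of every value, in the original order.
    """
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return classes, [index[v] for v in values]


def inverse_transform(classes, codes):
    """Map integer codes back to the original labels."""
    return [classes[c] for c in codes]


# Hypothetical categorical column, for illustration only.
column = ["Pb", "Cd", "Pb", "Zn", "Cd"]
classes, codes = fit_label_encoder(column)
print(classes)  # ['Cd', 'Pb', 'Zn']
print(codes)    # [1, 0, 1, 2, 0]

# Decoding recovers the original labels, which is why the examples can
# count distinct categories after calling inverse_transform.
restored = inverse_transform(classes, codes)
print(restored == column)  # True
```

Label encoding keeps the column count unchanged, which is consistent with the le examples reporting the same shape as the unencoded data.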
- aqua_fetch.P_recovery(parameters: str | List[str] = 'all', encoding: str = None)
Data from experiments conducted for P recovery from wastewater using adsorption. For more details on the data see Leng et al., 2024.
- Parameters:
parameters –
Parameters to use as input. By default the following parameters are used:
stir(rpm), t(min), T(℃), pH, N:P, Mg:P, P_initial(mg/L), P_recovery(%)
encoding (str, default=None) – the type of encoding to use for categorical parameters. If not None, it should be either ohe or le.
- Returns:
A tuple of length two. The first element is a DataFrame while the second element is an empty dictionary.
- Return type:
Examples
>>> from aqua_fetch import P_recovery
>>> data, _ = P_recovery()
>>> data.shape
(504, 8)
- aqua_fetch.N_recovery(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, dict]
Data from experiments conducted for N recovery from wastewater using adsorption. For more details on the data see Leng et al., 2024.
- Parameters:
parameters –
Parameters to use as input. By default the following parameters are used:
stir(rpm), t(min), T(℃), pH, N:P, Mg: N, P_initial(mg/L), N_recovery(%)
encoding (str, default=None) – the type of encoding to use for categorical parameters. If not None, it should be either ohe or le.
- Returns:
A tuple of length two. The first element is a DataFrame while the second element is an empty dictionary.
- Return type:
Tuple[DataFrame, dict]
Examples
>>> from aqua_fetch import N_recovery
>>> data, _ = N_recovery()
>>> data.shape
(210, 8)
- aqua_fetch.As_recovery(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, Dict[str, Any]]
Data from experiments conducted for As recovery from wastewater using adsorption. For more details on the data see Huang et al., 2023.
- Parameters:
parameters –
Parameters to use as input. By default the following parameters are used:
material, biochar_modification, biochar_type, BET_surface_area, pore_volume, solution_pH, reactor_temperature, initial_As_concentration_mg_L, adsorbent_dosage, equilibrium_reaction_time_h, pyrolysis_temperature, As_mg_g, As_type
encoding (str, default=None) – the type of encoding to use for categorical parameters. If not None, it should be either ohe or le.
- Returns:
A tuple of length two. The first element is a DataFrame while the second element is a dictionary consisting of encoders with material, biochar_modification, biochar_type and As_type as keys.
- Return type:
Tuple[DataFrame, Dict[str, Any]]
Examples
>>> from aqua_fetch import As_recovery
... # Using default parameters
>>> data, _ = As_recovery()
>>> data.shape
(1605, 13)
... # Using label encoding
>>> data, encoders = As_recovery(encoding="le")
>>> data.shape
(1605, 13)
>>> len(set(encoders['material'].inverse_transform(data.loc[:, "material"])))
72
>>> len(set(encoders['biochar_modification'].inverse_transform(data.loc[:, "biochar_modification"])))
2
>>> len(set(encoders['biochar_type'].inverse_transform(data.loc[:, "biochar_type"])))
159
>>> len(set(encoders['As_type'].inverse_transform(data.loc[:, "As_type"])))
2
... # Using one hot encoding
>>> data, encoders = As_recovery(encoding="ohe")
>>> data.shape
(1605, 244)
>>> len(set(encoders['material'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('material')]].values)))
72
>>> len(set(encoders['biochar_modification'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('biochar_modification')]].values)))
2
>>> len(set(encoders['biochar_type'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('biochar_type')]].values)))
159
>>> len(set(encoders['As_type'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('As_type')]].values)))
2
Photocatalysis
- aqua_fetch.mg_degradation(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Data on the photocatalytic degradation of malachite green dye using noble metal doped BiFeO3. For further description of this data see Jafari et al., 2023. This data has also been used for removal efficiency prediction. The dataset consists of 1200 points collected during ~135 experiments.
- Parameters:
parameters (list, optional) –
Features to use as input. By default the following features are used as input:
Catalyst_type, Surface area, Pore Volume, Catalyst_loading (g/L), Light_intensity (W), time (min), solution_pH, HA (mg/L), Anions, Ci (mg/L), Cf (mg/L), Efficiency (%), k_first, k_2nd
encoding (str, default=None) – type of encoding to use for the two categorical features, i.e. catalyst_type and anions, to convert them into numerical ones. Available options are ohe, le and None. If ohe is selected, the original input columns are replaced with one-hot encoded columns. This will result in 6 columns for Anions and 15 columns for catalyst_type.
- Returns:
A tuple of length two. The first element is a DataFrame of shape (1200, len(parameters)) while the second element is a dictionary consisting of encoders with catalyst_type and anions as keys.
- Return type:
Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Examples
>>> from aqua_fetch import mg_degradation
>>> mg_data, encoders = mg_degradation()
>>> mg_data.shape
(1200, 14)
... # the default encoding is None, but if we want to use one hot encoder
>>> mg_data_ohe, encoders = mg_degradation(encoding="ohe")
>>> mg_data_ohe.shape
(1200, 33)
>>> encoders['catalyst_type'].inverse_transform(mg_data_ohe.loc[:, [col for col in mg_data_ohe.columns if col.startswith('catalyst_type')]].values)
>>> encoders['anions'].inverse_transform(mg_data_ohe.loc[:, [col for col in mg_data_ohe.columns if col.startswith('anions')]].values)
... # if we want to use label encoder
>>> mg_data_le, encoders = mg_degradation(encoding="le")
>>> mg_data_le.shape
(1200, 14)
>>> encoders['catalyst_type'].inverse_transform(mg_data_le.loc[:, 'catalyst_type'].values.astype(int))
>>> encoders['anions'].inverse_transform(mg_data_le.loc[:, 'anions'].values.astype(int))
... # Efficiency (%), k_first and k_2nd are all returned as columns,
... # so the first order k can be used as the target instead of efficiency
>>> mg_data_k, _ = mg_degradation()
... # or the 2nd order k
>>> mg_data_k2, _ = mg_degradation()
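The default columns include the initial and final concentrations (Ci, Cf), the time, and three candidate targets: Efficiency (%), k_first and k_2nd. The sketch below shows the textbook relations that usually connect such quantities in photocatalysis studies: removal efficiency plus the pseudo-first-order and pseudo-second-order rate laws. That the dataset's rate constants were derived with exactly these formulas is an assumption; the function names and sample values are hypothetical.

```python
import math


def efficiency_percent(ci, cf):
    """Removal efficiency in percent: (Ci - Cf) / Ci * 100."""
    return (ci - cf) / ci * 100.0


def first_order_k(ci, cf, time_min):
    """Pseudo-first-order rate constant in 1/min, from ln(Ci / Cf) = k * t."""
    return math.log(ci / cf) / time_min


def second_order_k(ci, cf, time_min):
    """Pseudo-second-order rate constant in L/(mg*min), from 1/Cf - 1/Ci = k * t."""
    return (1.0 / cf - 1.0 / ci) / time_min


# Hypothetical run: 20 mg/L of dye degraded to 5 mg/L after 60 min.
ci, cf, t = 20.0, 5.0, 60.0
print(efficiency_percent(ci, cf))
print(first_order_k(ci, cf, t))
print(second_order_k(ci, cf, t))
```

Efficiency describes a single end point, whereas a rate constant summarizes the whole time course, which is one reason a modeler might prefer k_first or k_2nd as the target.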
- aqua_fetch.dye_removal(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Data from experiments conducted to measure the dye removal rate in wastewater treatment using photocatalysis. For more information on the data see Kim et al., 2024.
- Parameters:
parameters (list, optional) –
Features to use as input. It must be a subset of the following features:
catalyst, hydrothermal_synthesis_time_min), energy_Band_gap_Eg) eV, C_%, O_%, Fe_%, Al_%, Ni_%, Mo_%, S_%, Bi, Ag, Pd, Pt, surface_area_m2/g, pore_volume_cm3/g, pore_size_nm, volume_L, loading_g, catalyst_loading_mg, light_intensity_watt, light_source_distance_cm, time_m, dye, log_Kw, hydrogen_bonding_acceptor_count, hydrogen_bonding_donor_count, solubility_g/L, molecular_wt_g/mol, pka1, pka2, dye_concentration_mg/L, solution_pH, HA_mg/L, anions
encoding (str, default=None) – type of encoding to use for the three categorical features, i.e. catalyst, dye and anions, to convert them into numerical ones. Available options are ohe, le and None.
- Returns:
A tuple of length two. The first element is a DataFrame of shape (1527, len(parameters)) while the second element is a dictionary consisting of encoders with catalyst, dye and anions as keys.
- Return type:
Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Examples
>>> from aqua_fetch import dye_removal
>>> data, encoders = dye_removal()
>>> assert data.shape == (1527, 36)
... # using label encoding to encode the categorical variables
>>> data, encoders = dye_removal(encoding='le')
>>> assert data.shape == (1527, 36), data.shape
>>> catalysts = encoders['catalyst'].inverse_transform(data.loc[:, 'catalyst'].values)
>>> len(set(catalysts.tolist()))
18
>>> dye = encoders['dye'].inverse_transform(data.loc[:, "dye"].values)
>>> set(dye.tolist())
{'Melachite Green', 'Indigo'}
>>> anions = encoders['anions'].inverse_transform(data.loc[:, 'anions'].values)
>>> set(anions.tolist())
{'NaCO3', 'N/A', 'Na2SO4', 'Na2HPO4', 'NaHCO3', 'NaCl'}
... # using one hot encoding for categorical parameters
>>> data, encoders = dye_removal(encoding='ohe')
>>> assert data.shape == (1527, 59), data.shape
>>> catalysts = encoders['catalyst'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('catalyst')]].values)
>>> len(set(catalysts.tolist()))
18
>>> dye = encoders['dye'].inverse_transform(data.loc[:, ["dye_0", "dye_1"]].values)
>>> set(dye.tolist())
{'Melachite Green', 'Indigo'}
>>> anions = encoders['anions'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('anions')]].values)
>>> set(anions.tolist())
{'NaCO3', 'N/A', 'Na2SO4', 'Na2HPO4', 'NaHCO3', 'NaCl'}
- aqua_fetch.dichlorophenoxyacetic_acid_removal(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Data for photodegradation of 2,4-dichlorophenoxyacetic acid using gold-doped bismuth ferrite.
- Parameters:
parameters (list, optional) –
Features to use as input. It must be a subset of the following features:
catalyst, surface_area, pore_volume, energy_band_gap_eV, Au_%, Bi_%, Fe_%, O_%, catalyst_loading_g/l, light_intensity_watt, time_min, solution_ph, anions, ini_conc_mg/l, final_conc_mg/l, efficiency_%
encoding (str, default=None) – type of encoding to use for the two categorical features, i.e. catalyst and anions, to convert them into numerical ones. Available options are ohe, le and None.
- Returns:
A tuple of length two. The first element is a DataFrame of shape (1044, len(parameters)) while the second element is a dictionary consisting of encoders with catalyst and anions as keys.
- Return type:
Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Examples
>>> from aqua_fetch import dichlorophenoxyacetic_acid_removal
... # by default all parameters are returned
>>> data, encoders = dichlorophenoxyacetic_acid_removal()
>>> assert data.shape == (1044, 16), data.shape
... # using label encoding for categorical parameters
>>> data, encoders = dichlorophenoxyacetic_acid_removal(encoding='le')
>>> assert data.shape == (1044, 16), data.shape
>>> catalysts = encoders['catalyst'].inverse_transform(data.loc[:, 'catalyst'].values)
>>> assert len(set(catalysts.tolist())) == 7
>>> anions = encoders['anions'].inverse_transform(data.loc[:, 'anions'].values)
>>> set(anions.tolist())
{'Na2SO4', 'Without Anions', 'Na2HPO4', 'NaHCO3', 'NaCO3', 'NaCl'}
... # using one hot encoding for categorical parameters
>>> data, encoders = dichlorophenoxyacetic_acid_removal(encoding='ohe')
>>> assert data.shape == (1044, 27), data.shape
>>> catalysts = encoders['catalyst'].inverse_transform(data.loc[:, ['catalyst_0', 'catalyst_1', 'catalyst_2', 'catalyst_3', 'catalyst_4', 'catalyst_5', 'catalyst_6']].values)
>>> assert len(set(catalysts.tolist())) == 7
>>> anions = encoders['anions'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('anions')]].values)
>>> set(anions.tolist())
{'Na2SO4', 'Without Anions', 'Na2HPO4', 'NaHCO3', 'NaCO3', 'NaCl'}
- aqua_fetch.pms_removal(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Data for photodegradation of phenol using peroxymonosulfate.
- Parameters:
parameters (list, optional) –
Names of the parameters to use. By default the following parameters are used:
time_min, catalyst_type, magnetization_Ms_emu/g, energy_band_gap_eV, calcination_temp_C, min_calcination_time, surface_area, pore_size, pollutant, poll_mol_formula, pms_concentration_g/l, light_intensity_watt, light_type, catalyst_dosage_g/l, ini_conc_ppm, solution_ph, H2O2_Conc_ppm, volume_ml, stirring_speed_rpm, radical_scavenger, inorganic anions, water_type, cycle_num, final_conc_ppm, removal_efficiency_%
encoding (str, default=None) – type of encoding to use for the categorical features, i.e. catalyst_type, pollutant, poll_mol_formula and water_type, to convert them into numerical ones. Available options are ohe, le and None.
- Returns:
A tuple of length two. The first element is a DataFrame of shape (2078, len(parameters)) while the second element is a dictionary consisting of encoders with catalyst_type, pollutant, poll_mol_formula and water_type as keys.
- Return type:
Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Examples
>>> from aqua_fetch import pms_removal
>>> data, encoders = pms_removal()
>>> data.shape
(2078, 25)
... # the default encoding is None, but if we want to use one hot encoder
>>> data_ohe, encoders = pms_removal(encoding="ohe")
>>> data_ohe.shape
(2078, 100)
>>> catalysts = encoders['catalyst_type'].inverse_transform(data_ohe.loc[:, [col for col in data_ohe.columns if col.startswith('catalyst_type')]].values)
>>> len(set(catalysts))
42
>>> pollutants = encoders['pollutant'].inverse_transform(data_ohe.loc[:, [col for col in data_ohe.columns if col.startswith('pollutant')]].values)
>>> len(set(pollutants))
14
>>> poll_mol_formula = encoders['poll_mol_formula'].inverse_transform(data_ohe.loc[:, [col for col in data_ohe.columns if col.startswith('poll_mol_formula')]].values)
>>> len(set(poll_mol_formula))
14
>>> water_type = encoders['water_type'].inverse_transform(data_ohe.loc[:, [col for col in data_ohe.columns if col.startswith('water_type')]].values)
>>> len(set(water_type))
9
... # if we want to use label encoder
>>> data_le, encoders = pms_removal(encoding="le")
>>> data_le.shape
(2078, 25)
>>> catalysts = encoders['catalyst_type'].inverse_transform(data_le.loc[:, 'catalyst_type'].values)
>>> len(set(catalysts))
42
>>> pollutants = encoders['pollutant'].inverse_transform(data_le.loc[:, 'pollutant'].values)
>>> len(set(pollutants))
14
>>> poll_mol_formula = encoders['poll_mol_formula'].inverse_transform(data_le.loc[:, 'poll_mol_formula'].values)
>>> len(set(poll_mol_formula))
14
>>> water_type = encoders['water_type'].inverse_transform(data_le.loc[:, 'water_type'].values)
>>> len(set(water_type))
9
- aqua_fetch.tetracycline_degradation(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, dict]
Data for photodegradation of tetracycline. For details on the data see Abdi et al., 2022.
- Parameters:
parameters (list, optional) –
Names of the parameters to use. By default, the following parameters are used:
surf_area_m2g, pore_vol_cm3g, catalyst_dosage_gL, antibiotic_dosage_mgL, illumination_time_min, pH, metallic_org_framework, efficiency_%
encoding (str, default=None) – type of encoding to use for the categorical features. It can be either ‘ohe’, ‘le’ or None. If ‘ohe’ is selected, the original categorical column (metallic_org_framework) is replaced with one-hot encoded columns. If ‘le’ is selected, the original column is replaced with a label encoded column. If None is selected, the original column is not replaced.
- Returns:
A tuple of length two. The first element is a DataFrame of shape (374, len(parameters)) while the second element is a dictionary consisting of an encoder with metallic_org_framework as key.
- Return type:
Tuple[DataFrame, dict]
Examples
>>> from aqua_fetch import tetracycline_degradation
>>> data, encoders = tetracycline_degradation()
>>> data.shape
(374, 8)
>>> data, encoders = tetracycline_degradation(encoding='le')
>>> data.shape
(374, 8)
>>> mofs = encoders['metallic_org_framework'].inverse_transform(data.loc[:, 'metallic_org_framework'].values)
>>> len(set(mofs))
10
>>> data, encoders = tetracycline_degradation(encoding='ohe')
>>> data.shape
(374, 17)
>>> mofs = encoders['metallic_org_framework'].inverse_transform(data.loc[:, [col for col in data.columns if col.startswith('metallic_org_framework')]].values)
>>> len(set(mofs))
10
- aqua_fetch.tio2_degradation(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, dict]
Data for TiO2 photodegradation experiments. For details on the data see Jiang et al., 2020.
- Parameters:
- Returns:
A tuple of length two. The first element is a DataFrame of shape (446, len(parameters)) while the second element is an empty dictionary.
- Return type:
Tuple[DataFrame, dict]
Examples
>>> from aqua_fetch import tio2_degradation
>>> data, encoders = tio2_degradation()
>>> data.shape
(446, 7)
- aqua_fetch.photodegradation_Jiang(parameters: str | List[str] = 'all', encoding: str = None) → Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Data for photodegradation of multiple pollutants using various photocatalysts. For details on the data see Jiang et al., 2021.
- Parameters:
parameters (list, optional) –
Names of the parameters to use. By default the following parameters are used:
photocatalyst, contaminants, photocat_dosage_gl, photocat_size_nm, initial_conc_mgl, pH, light_type, k_min-1
encoding (str, default=None) – type of encoding to use for the categorical features. It can be either ohe, le or None. If ohe is selected, the original categorical column is replaced with one-hot encoded columns. If le is selected, the original column is replaced with a label encoded column. If None is selected, the original column is not replaced.
- Returns:
A tuple of length two. The first element is a DataFrame of shape (449, len(parameters)) while the second element is a dictionary consisting of encoders with photocatalyst and contaminants as keys.
- Return type:
Tuple[DataFrame, Dict[str, OneHotEncoder | LabelEncoder | Any]]
Examples
>>> from aqua_fetch import photodegradation_Jiang
>>> data, encoders = photodegradation_Jiang()
>>> data.shape
(449, 8)
... # the default encoding is None, but if we want to use one hot encoder
>>> data_ohe, encoders = photodegradation_Jiang(encoding="ohe")
>>> data_ohe.shape
(449, 16)
>>> photocatalysts = encoders['photocatalyst'].inverse_transform(data_ohe.loc[:, [col for col in data_ohe.columns if col.startswith('photocatalyst')]].values)
>>> len(set(photocatalysts))
100
>>> contaminants = encoders['contaminants'].inverse_transform(data_ohe.loc[:, [col for col in data_ohe.columns if col.startswith('contaminants')]].values)
>>> len(set(contaminants))
47
... # if we want to use label encoder
>>> data_le, encoders = photodegradation_Jiang(encoding="le")
>>> data_le.shape
(449, 8)
>>> photocatalysts = encoders['photocatalyst'].inverse_transform(data_le.loc[:, 'photocatalyst'].values)
>>> len(set(photocatalysts))
100
>>> contaminants = encoders['contaminants'].inverse_transform(data_le.loc[:, 'contaminants'].values)
>>> len(set(contaminants))
47