identify.calculate
calc_nmae | Calculate the Normalized Mean Average Error along the rows.
calc_nmbe | Calculate the Normalized Mean Biased Error along the rows.
calc_nrmse | Calculate the Normalized Root Mean Square Error along the rows.
calc_avgs | Calculate average results on time-varying dataframes.
calc_evs | Calculate error values between time-varying dataframes.
calc_corrs | Calculate the Pearson Correlation Coefficient between time-varying dataframes.
Tessif module providing calculation tools for identifying result differences.
- tessif.identify.calculate.calc_nmae(dataframes_dict, reference_df, method='mean')
Calculate the Normalized Mean Average Error (NMAE) along the rows.
- Parameters:
dataframes_dict (dict) – Dictionary of pandas.DataFrame objects. The NMAE is calculated for each column of each dataframe relative to the identically indexed column of reference_df (see the example below). Designed for comparing time-varying (load) results of the same component(s) between different software tools.
reference_df (pandas.DataFrame) – DataFrame indexed like those in dataframes_dict.
method ({"mean", "spread", "std"}, default = "mean") – Method of normalization:
  - "mean": MAE is divided by mean(reference)
  - "spread": MAE is divided by abs(max(reference) - min(reference))
  - "std": MAE is divided by std(reference)
- Returns:
DataFrame holding the calculated NMAE. Columns and index are swapped in comparison to the dataframes passed as arguments.
- Return type:
pandas.DataFrame
Examples
Picking up on the Identificier Data Input Example:

>>> import pandas as pd
>>> software1 = pd.DataFrame(
...     data=[[10, 8, 2], [0, 0, 0], [20, 2, 18]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )
>>> reference_df = pd.DataFrame(
...     data=[[13, 7, 1990], [42, 0, 42], [90, 0, 0]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )

Original Data Frames:

>>> print(software1)
                      A  B
                      B  C   D
2019-01-01 00:00:00  10  8   2
2019-01-01 01:00:00   0  0   0
2019-01-01 02:00:00  20  2  18

>>> print(reference_df)
                      A  B
                      B  C     D
2019-01-01 00:00:00  13  7  1990
2019-01-01 01:00:00  42  0    42
2019-01-01 02:00:00  90  0     0

Normalized Mean Average Error:

>>> from tessif.identify.calculate import calc_nmae
>>> nmae = calc_nmae(
...     dataframes_dict={"software1": software1},
...     reference_df=reference_df,
... )
>>> print(nmae)
     software1
A B   0.793103
B C   0.428571
  D   1.007874

Using "spread" instead of the default "mean" normalization:

>>> nmae = calc_nmae(
...     dataframes_dict={"software1": software1},
...     reference_df=reference_df,
...     method="spread",
... )
>>> print(nmae)
     software1
A B   0.497835
B C   0.142857
  D   0.343049
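The normalization options listed under Parameters can be sketched in plain pandas. This is a minimal illustration and not tessif's implementation: `nmae_sketch` is a hypothetical helper, it returns a Series instead of the DataFrame that calc_nmae returns, and using pandas' default sample standard deviation for "std" is an assumption.

```python
import pandas as pd


def nmae_sketch(result, reference, method="mean"):
    """Illustrative per-column NMAE (hypothetical helper, not tessif code)."""
    # Mean of the absolute deviations from the reference, column by column.
    mae = (result - reference).abs().mean()
    if method == "mean":
        norm = reference.mean()
    elif method == "spread":
        norm = (reference.max() - reference.min()).abs()
    elif method == "std":
        # Assumption: pandas' default sample standard deviation (ddof=1).
        norm = reference.std()
    else:
        raise ValueError(f"unknown normalization method: {method!r}")
    return mae / norm


columns = pd.MultiIndex.from_tuples([("A", "B"), ("B", "C"), ("B", "D")])
index = pd.date_range("2019-01-01", periods=3, freq="h")
software1 = pd.DataFrame(
    [[10, 8, 2], [0, 0, 0], [20, 2, 18]], columns=columns, index=index)
reference_df = pd.DataFrame(
    [[13, 7, 1990], [42, 0, 42], [90, 0, 0]], columns=columns, index=index)

nmae_mean = nmae_sketch(software1, reference_df)
nmae_spread = nmae_sketch(software1, reference_df, method="spread")
```

For the example data, `nmae_mean` and `nmae_spread` reproduce the values printed in the examples above.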
- tessif.identify.calculate.calc_nmbe(dataframes_dict, reference_df, method='mean')
Calculate the Normalized Mean Biased Error (NMBE) along the rows.
- Parameters:
dataframes_dict (dict) – Dictionary of pandas.DataFrame objects. The NMBE is calculated for each column of each dataframe relative to the identically indexed column of reference_df (see the example below). Designed for comparing time-varying (load) results of the same component(s) between different software tools.
reference_df (pandas.DataFrame) – DataFrame indexed like those in dataframes_dict.
method ({"mean", "spread", "std"}, default = "mean") – Method of normalization:
  - "mean": MBE is divided by mean(reference)
  - "spread": MBE is divided by abs(max(reference) - min(reference))
  - "std": MBE is divided by std(reference)
- Returns:
DataFrame holding the calculated NMBE. Columns and index are swapped in comparison to the dataframes passed as arguments.
- Return type:
pandas.DataFrame
Examples
Picking up on the Identificier Data Input Example:

>>> import pandas as pd
>>> software1 = pd.DataFrame(
...     data=[[10, 8, 2], [0, 0, 0], [20, 2, 18]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )
>>> reference_df = pd.DataFrame(
...     data=[[13, 7, 1990], [42, 0, 42], [90, 0, 0]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )

Original Data Frames:

>>> print(software1)
                      A  B
                      B  C   D
2019-01-01 00:00:00  10  8   2
2019-01-01 01:00:00   0  0   0
2019-01-01 02:00:00  20  2  18

>>> print(reference_df)
                      A  B
                      B  C     D
2019-01-01 00:00:00  13  7  1990
2019-01-01 01:00:00  42  0    42
2019-01-01 02:00:00  90  0     0

Normalized Mean Biased Error:

>>> from tessif.identify.calculate import calc_nmbe
>>> nmbe = calc_nmbe(
...     dataframes_dict={"software1": software1},
...     reference_df=reference_df,
... )
>>> print(nmbe)
     software1
A B  -0.793103
B C   0.428571
  D  -0.990157

Using "spread" instead of the default "mean" normalization:

>>> nmbe = calc_nmbe(
...     dataframes_dict={"software1": software1},
...     reference_df=reference_df,
...     method="spread",
... )
>>> print(nmbe)
     software1
A B  -0.497835
B C   0.142857
  D  -0.337018
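Unlike the NMAE, the biased error keeps the sign of each deviation, so over- and underestimates can cancel. The following sketch illustrates this in plain pandas. `nmbe_sketch` is a hypothetical helper and not tessif's implementation; the sign convention (result minus reference) is inferred from the example output above, and the sample standard deviation used for "std" is an assumption.

```python
import pandas as pd


def nmbe_sketch(result, reference, method="mean"):
    """Illustrative per-column NMBE (hypothetical helper, not tessif code)."""
    # Signed mean deviation: positive and negative errors can cancel.
    mbe = (result - reference).mean()
    normalizers = {
        "mean": reference.mean(),
        "spread": (reference.max() - reference.min()).abs(),
        # Assumption: pandas' default sample standard deviation (ddof=1).
        "std": reference.std(),
    }
    return mbe / normalizers[method]


columns = pd.MultiIndex.from_tuples([("A", "B"), ("B", "C"), ("B", "D")])
index = pd.date_range("2019-01-01", periods=3, freq="h")
software1 = pd.DataFrame(
    [[10, 8, 2], [0, 0, 0], [20, 2, 18]], columns=columns, index=index)
reference_df = pd.DataFrame(
    [[13, 7, 1990], [42, 0, 42], [90, 0, 0]], columns=columns, index=index)

nmbe_mean = nmbe_sketch(software1, reference_df)
nmbe_spread = nmbe_sketch(software1, reference_df, method="spread")
```

In column ("B", "D") the last time step overestimates the reference while the first two underestimate it, which is why the magnitude of the NMBE there is slightly smaller than the corresponding NMAE.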
- tessif.identify.calculate.calc_nrmse(dataframes_dict, reference_df, method='mean')
Calculate the Normalized Root Mean Square Error (NRMSE) along the rows.
- Parameters:
dataframes_dict (dict) – Dictionary of pandas.DataFrame objects. The NRMSE is calculated for each column of each dataframe relative to the identically indexed column of reference_df (see the example below). Designed for comparing time-varying (load) results of the same component(s) between different software tools.
reference_df (pandas.DataFrame) – DataFrame indexed like those in dataframes_dict.
method ({"mean", "spread", "std"}, default = "mean") – Method of normalization:
  - "mean": RMSE is divided by mean(reference)
  - "spread": RMSE is divided by abs(max(reference) - min(reference))
  - "std": RMSE is divided by std(reference)
- Returns:
DataFrame holding the calculated NRMSE. Columns and index are swapped in comparison to the dataframes passed as arguments.
- Return type:
pandas.DataFrame
Examples
Picking up on the Identificier Data Input Example:

>>> import pandas as pd
>>> software1 = pd.DataFrame(
...     data=[[10, 8, 2], [0, 0, 0], [20, 2, 18]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )
>>> reference_df = pd.DataFrame(
...     data=[[13, 7, 1990], [42, 0, 42], [90, 0, 0]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )

Original Data Frames:

>>> print(software1)
                      A  B
                      B  C   D
2019-01-01 00:00:00  10  8   2
2019-01-01 01:00:00   0  0   0
2019-01-01 02:00:00  20  2  18

>>> print(reference_df)
                      A  B
                      B  C     D
2019-01-01 00:00:00  13  7  1990
2019-01-01 01:00:00  42  0    42
2019-01-01 02:00:00  90  0     0

Normalized Root Mean Square Error:

>>> from tessif.identify.calculate import calc_nrmse
>>> nrmse = calc_nrmse(
...     dataframes_dict={"software1": software1},
...     reference_df=reference_df,
... )
>>> print(nrmse)
     software1
A B   0.975783
B C   0.553283
  D   1.694993

Using "spread" instead of the default "mean" normalization:

>>> nrmse = calc_nrmse(
...     dataframes_dict={"software1": software1},
...     reference_df=reference_df,
...     method="spread",
... )
>>> print(nrmse)
     software1
A B   0.612504
B C   0.184428
  D   0.576922
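The RMSE squares each deviation before averaging, so single large deviations weigh more heavily than in the NMAE. A minimal pandas sketch of the calculation follows. `nrmse_sketch` is a hypothetical helper and not tessif's implementation, and the sample standard deviation used for "std" is an assumption.

```python
import pandas as pd


def nrmse_sketch(result, reference, method="mean"):
    """Illustrative per-column NRMSE (hypothetical helper, not tessif code)."""
    # Root of the mean squared deviation, column by column.
    rmse = ((result - reference) ** 2).mean() ** 0.5
    if method == "mean":
        norm = reference.mean()
    elif method == "spread":
        norm = (reference.max() - reference.min()).abs()
    elif method == "std":
        # Assumption: pandas' default sample standard deviation (ddof=1).
        norm = reference.std()
    else:
        raise ValueError(f"unknown normalization method: {method!r}")
    return rmse / norm


columns = pd.MultiIndex.from_tuples([("A", "B"), ("B", "C"), ("B", "D")])
index = pd.date_range("2019-01-01", periods=3, freq="h")
software1 = pd.DataFrame(
    [[10, 8, 2], [0, 0, 0], [20, 2, 18]], columns=columns, index=index)
reference_df = pd.DataFrame(
    [[13, 7, 1990], [42, 0, 42], [90, 0, 0]], columns=columns, index=index)

nrmse_mean = nrmse_sketch(software1, reference_df)
nrmse_spread = nrmse_sketch(software1, reference_df, method="spread")
```

For the example data, both results match the values printed in the examples above.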
- tessif.identify.calculate.calc_avgs(dataframes)
Calculate average results on time-varying dataframes.
Takes any number of pandas.DataFrame objects and calculates the average over identically indexed entries, i.e. the same row and column in every dataframe (see the example below). Designed to average the time-varying (load) results of the same component(s) between different software tools.
- Parameters:
dataframes (Container) – Container of pandas.DataFrame objects whose rows are averaged.
- Returns:
Averaged results.
- Return type:
pandas.DataFrame
Examples
Picking up on the Identificier Data Input Example:

>>> import pandas as pd
>>> software1 = pd.DataFrame(
...     data=[[10, 8, 2], [0, 0, 0], [20, 2, 18]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )
>>> software2 = pd.DataFrame(
...     data=[[13, 7, 1990], [42, 0, 42], [90, 0, 0]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )

Original Data Frames:

>>> print(software1)
                      A  B
                      B  C   D
2019-01-01 00:00:00  10  8   2
2019-01-01 01:00:00   0  0   0
2019-01-01 02:00:00  20  2  18

>>> print(software2)
                      A  B
                      B  C     D
2019-01-01 00:00:00  13  7  1990
2019-01-01 01:00:00  42  0    42
2019-01-01 02:00:00  90  0     0

Average Results:

>>> from tessif.identify.calculate import calc_avgs
>>> averaged_results = calc_avgs(
...     [software1, software2])
>>> print(averaged_results)
                        A    B
                        B    C      D
2019-01-01 00:00:00  11.5  7.5  996.0
2019-01-01 01:00:00  21.0  0.0   21.0
2019-01-01 02:00:00  55.0  1.0    9.0
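The averaging shown above is element-wise across the dataframes. A minimal pandas sketch of that idea, assuming all frames share the same index and columns, could look as follows. `average_frames_sketch` is a hypothetical helper and not tessif's implementation.

```python
import pandas as pd


def average_frames_sketch(dataframes):
    """Element-wise mean over identically indexed frames (illustrative)."""
    frames = list(dataframes)
    # Adding DataFrames aligns them on index and columns, so each cell
    # holds the sum over all frames before dividing by their number.
    return sum(frames) / len(frames)


columns = pd.MultiIndex.from_tuples([("A", "B"), ("B", "C"), ("B", "D")])
index = pd.date_range("2019-01-01", periods=3, freq="h")
software1 = pd.DataFrame(
    [[10, 8, 2], [0, 0, 0], [20, 2, 18]], columns=columns, index=index)
software2 = pd.DataFrame(
    [[13, 7, 1990], [42, 0, 42], [90, 0, 0]], columns=columns, index=index)

averaged = average_frames_sketch([software1, software2])
```

The result keeps the index and the column MultiIndex of the inputs, as in the calc_avgs output above.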
- tessif.identify.calculate.calc_evs(dataframes, labels=None, reference=None, error='NMAE', normalization='mean')
Calculate error values between time-varying dataframes.
Takes any number of pandas.DataFrame objects and calculates the chosen error value between rows of identically indexed columns (see the example below). Designed for comparing time-varying (load) results of the same component(s) between different software tools.
- Parameters:
dataframes (Container) – Container of pandas.DataFrame objects to compare.
labels (Container, None, default=None) – Container of strings specifying the respective dataframe labels. These equal the software names in the design case.
reference (int, str, None, default=None) – Defines the reference results used for calculating the statistical error values. An integer denotes the 0-indexed position in dataframes; a string denotes the respective label. A string only works if labels is given as a container of strings. If None is used (default), the average of the dataframes is used as returned by average_timevarying_dataframe_results().
error (str, default = "NMAE") – String abbreviating the error value to calculate. Currently supported are:
  - nmae for Normalized Mean Average Error (default)
  - nmbe for Normalized Mean Biased Error
  - nrmse for Normalized Root Mean Square Error
normalization ({"mean", "spread"}, default = "mean") – Method of error value normalization:
  - "mean": the error value is divided by mean(reference)
  - "spread": the error value is divided by abs(max(reference) - min(reference))
- Returns:
DataFrame holding the calculated error values. Columns and index are swapped in comparison to the dataframes passed as arguments.
- Return type:
pandas.DataFrame
Examples
Picking up on the Identificier Data Input Example:

>>> import pandas as pd
>>> software1 = pd.DataFrame(
...     data=[[10, 8, 2], [0, 0, 0], [20, 2, 18]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )
>>> software2 = pd.DataFrame(
...     data=[[13, 7, 1990], [42, 0, 42], [90, 0, 0]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )
>>> software3 = pd.DataFrame(
...     data=[[13, 7, 1990], [42, 0, 42], [90, 0, 0]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )

Original Data Frames:

>>> print(software1)
                      A  B
                      B  C   D
2019-01-01 00:00:00  10  8   2
2019-01-01 01:00:00   0  0   0
2019-01-01 02:00:00  20  2  18

>>> print(software2)
                      A  B
                      B  C     D
2019-01-01 00:00:00  13  7  1990
2019-01-01 01:00:00  42  0    42
2019-01-01 02:00:00  90  0     0

>>> print(software3)
                      A  B
                      B  C     D
2019-01-01 00:00:00  13  7  1990
2019-01-01 01:00:00  42  0    42
2019-01-01 02:00:00  90  0     0

Normalized Mean Average Error:

>>> from tessif.identify.calculate import calc_evs
>>> nmae = calc_evs(
...     dataframes=[software1, software2, software3],
...     labels=["software1", "software2", "software3"],
...     reference="software2",
...     error="nmae",
... )
>>> print(nmae)
     software1  software2  software3
A B   0.793103        0.0        0.0
B C   0.428571        0.0        0.0
  D   1.007874        0.0        0.0

Using "spread" normalization:

>>> nmae = calc_evs(
...     dataframes=[software1, software2, software3],
...     labels=["software1", "software2", "software3"],
...     reference="software2",
...     error="nmae",
...     normalization="spread",
... )
>>> print(nmae)
     software1  software2  software3
A B   0.497835        0.0        0.0
B C   0.142857        0.0        0.0
  D   0.343049        0.0        0.0

Normalized Mean Biased Error:

>>> nmbe = calc_evs(
...     dataframes=[software1, software2, software3],
...     labels=["software1", "software2", "software3"],
...     reference="software2",
...     error="nmbe",
... )
>>> print(nmbe)
     software1  software2  software3
A B  -0.793103        0.0        0.0
B C   0.428571        0.0        0.0
  D  -0.990157        0.0        0.0

Using "spread" normalization:

>>> nmbe = calc_evs(
...     dataframes=[software1, software2, software3],
...     labels=["software1", "software2", "software3"],
...     reference="software2",
...     error="nmbe",
...     normalization="spread",
... )
>>> print(nmbe)
     software1  software2  software3
A B  -0.497835        0.0        0.0
B C   0.142857        0.0        0.0
  D  -0.337018        0.0        0.0

Normalized Root Mean Square Error:

>>> nrmse = calc_evs(
...     dataframes=[software1, software2, software3],
...     labels=["software1", "software2", "software3"],
...     reference="software2",
...     error="nrmse",
... )
>>> print(nrmse)
     software1  software2  software3
A B   0.975783        0.0        0.0
B C   0.553283        0.0        0.0
  D   1.694993        0.0        0.0

Using "spread" normalization:

>>> nrmse = calc_evs(
...     dataframes=[software1, software2, software3],
...     labels=["software1", "software2", "software3"],
...     reference="software2",
...     error="nrmse",
...     normalization="spread",
... )
>>> print(nrmse)
     software1  software2  software3
A B   0.612504        0.0        0.0
B C   0.184428        0.0        0.0
  D   0.576922        0.0        0.0
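The three ways of specifying reference (position, label, or None for the average) can be sketched as below. Both `resolve_reference` and `nmae_table` are hypothetical helpers written for illustration, not tessif functions, and the sketch only covers the NMAE with "mean" normalization.

```python
import pandas as pd


def resolve_reference(dataframes, labels=None, reference=None):
    """Pick the reference frame by position, label, or average (sketch)."""
    frames = list(dataframes)
    if reference is None:
        # Default: element-wise average over all frames.
        return sum(frames) / len(frames)
    if isinstance(reference, int):
        return frames[reference]
    if labels is None:
        raise ValueError("a string reference requires labels")
    return frames[list(labels).index(reference)]


def nmae_table(dataframes, labels, reference=None):
    """One NMAE column per label, normalized by mean(reference) (sketch)."""
    ref = resolve_reference(dataframes, labels, reference)
    return pd.DataFrame({
        label: (frame - ref).abs().mean() / ref.mean()
        for label, frame in zip(labels, dataframes)
    })


columns = pd.MultiIndex.from_tuples([("A", "B"), ("B", "C"), ("B", "D")])
index = pd.date_range("2019-01-01", periods=3, freq="h")
software1 = pd.DataFrame(
    [[10, 8, 2], [0, 0, 0], [20, 2, 18]], columns=columns, index=index)
software2 = pd.DataFrame(
    [[13, 7, 1990], [42, 0, 42], [90, 0, 0]], columns=columns, index=index)
software3 = software2.copy()

frames = [software1, software2, software3]
names = ["software1", "software2", "software3"]

by_label = nmae_table(frames, names, reference="software2")
by_position = nmae_table(frames, names, reference=1)
by_average = nmae_table(frames, names)
```

Because each per-frame result is a Series indexed by the original columns, collecting them in a dict yields the swapped layout described under Returns: the former columns become the index and the labels become the columns.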
- tessif.identify.calculate.calc_corrs(dataframes, method='pearson', labels=None, reference=None, fillna=None)
Calculate correlation coefficients (Pearson by default) between time-varying dataframes.
Takes any number of pandas.DataFrame objects and calculates the correlation coefficients between rows of identically indexed columns (see the example below). Designed for comparing time-varying (load) results of the same component(s) between different software tools. Uses pandas.DataFrame.corrwith under the hood.
- Parameters:
dataframes (Container) – Container of pandas.DataFrame objects to correlate.
method ({'pearson', 'kendall', 'spearman'} or callable) – Method of correlation:
  - pearson: standard correlation coefficient
  - kendall: Kendall Tau correlation coefficient
  - spearman: Spearman rank correlation
  - callable: callable taking two 1d ndarrays as input and returning a float
labels (Container, None, default=None) – Container of strings specifying the respective dataframe labels. These equal the software names in the design case.
reference (int, str, None, default=None) – Defines the reference results used for calculating the correlation coefficients. An integer denotes the 0-indexed position in dataframes; a string denotes the respective label. A string only works if labels is given as a container of strings. If None is used (default), the average of the dataframes is used as returned by average_timevarying_dataframe_results().
fillna (str, Number, None, default=None) – String, number or None specifying what to do when the correlation results in NaN. In the design case, this usually happens when one of the correlated time series is all zeros. If None, the pandas.DataFrame.corrwith output is kept.
- Returns:
DataFrame holding the calculated PCC values. Columns and index are swapped in comparison to the dataframes passed as arguments.
- Return type:
pandas.DataFrame
Examples
Picking up on the Identificier Data Input Example:

>>> import pandas as pd
>>> software1 = pd.DataFrame(
...     data=[[10, 8, 2], [0, 0, 0], [20, 2, 18]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )
>>> software2 = pd.DataFrame(
...     data=[[13, 7, 1990], [42, 0, 42], [90, 0, 0]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )
>>> software3 = pd.DataFrame(
...     data=[[13, 7, 1990], [42, 0, 42], [90, 0, 0]],
...     columns=pd.MultiIndex.from_tuples(
...         [("A", "B"), ("B", "C"), ("B", "D")]),
...     index=pd.date_range('2019-01-01', periods=3, freq='H'),
... )

Original Data Frames:

>>> print(software1)
                      A  B
                      B  C   D
2019-01-01 00:00:00  10  8   2
2019-01-01 01:00:00   0  0   0
2019-01-01 02:00:00  20  2  18

>>> print(software2)
                      A  B
                      B  C     D
2019-01-01 00:00:00  13  7  1990
2019-01-01 01:00:00  42  0    42
2019-01-01 02:00:00  90  0     0

>>> print(software3)
                      A  B
                      B  C     D
2019-01-01 00:00:00  13  7  1990
2019-01-01 01:00:00  42  0    42
2019-01-01 02:00:00  90  0     0

Pearson Correlation Coefficients:

>>> from tessif.identify.calculate import calc_corrs
>>> pcc = calc_corrs(
...     dataframes=[software1, software2, software3],
...     method="pearson",
...     labels=["software1", "software2", "software3"],
...     reference="software2",
... )
>>> print(pcc)
     software1  software2  software3
A B   0.617145        1.0        1.0
B C   0.970725        1.0        1.0
  D  -0.426423        1.0        1.0

Spearman Correlation Coefficients:

>>> spear = calc_corrs(
...     dataframes=[software1, software2, software3],
...     method="spearman",
...     labels=["software1", "software2", "software3"],
...     reference="software2",
... )
>>> print(spear)
     software1  software2  software3
A B   0.500000        1.0        1.0
B C   0.866025        1.0        1.0
  D  -0.500000        1.0        1.0

Kendall Correlation Coefficients:

>>> kend = calc_corrs(
...     dataframes=[software1, software2, software3],
...     method="kendall",
...     labels=["software1", "software2", "software3"],
...     reference="software2",
... )
>>> print(kend)
     software1  software2  software3
A B   0.333333        1.0        1.0
B C   0.816497        1.0        1.0
  D  -0.333333        1.0        1.0
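Since calc_corrs is documented to use pandas.DataFrame.corrwith, the NaN case that fillna addresses can be reproduced directly with pandas. The sketch below is an illustration only: `corr_sketch` is a hypothetical helper, and replacing NaN through Series.fillna is an assumption about what a numeric fillna value does, since this section does not state how tessif applies it.

```python
import pandas as pd


def corr_sketch(result, reference, method="pearson", fillna=None):
    """Column-wise correlation against a reference frame (illustrative)."""
    corr = result.corrwith(reference, method=method)
    if fillna is not None:
        # Assumption: NaN entries are simply replaced by the given value.
        corr = corr.fillna(fillna)
    return corr


columns = pd.MultiIndex.from_tuples([("A", "B"), ("B", "C"), ("B", "D")])
index = pd.date_range("2019-01-01", periods=3, freq="h")
software1 = pd.DataFrame(
    [[10, 8, 2], [0, 0, 0], [20, 2, 18]], columns=columns, index=index)
software2 = pd.DataFrame(
    [[13, 7, 1990], [42, 0, 42], [90, 0, 0]], columns=columns, index=index)
# Same as software1, but column ("B", "C") is all zeros.
all_zero_bc = pd.DataFrame(
    [[10, 0, 2], [0, 0, 0], [20, 0, 18]], columns=columns, index=index)

pcc = corr_sketch(software1, software2)
raw = corr_sketch(all_zero_bc, software2)
filled = corr_sketch(all_zero_bc, software2, fillna=0.0)
```

A constant column has zero variance, so its Pearson coefficient is undefined and corrwith yields NaN for ("B", "C") in `raw`; `filled` holds 0.0 there instead, while the other columns are unaffected.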