identify.timeframes
Identify significant differences between timeseries results.
Tessif module providing tools for identifying differing timeframes.
- tessif.identify.timeframes.significant_differences(data, method='ardiffs', threshold=0.1, reference=None, neighs=True)
Identify significant differences between timeseries results.
Designed to detect significant deviations between software-specific flow results for the same components.
Each continuous sequence of detected differences is stored as a separate DataFrame.
- Parameters:
data (pandas.DataFrame, Container) – DataFrame in which each column is assumed to contain one flow result, or a container of flow results.
method ({"ardiffs"}, method, default="ardiffs") – String specifying which precoded function to use for calculating differences, or a function that takes data as pandas.DataFrame or pandas.Series and reference (as reference) to return a dataframe indexed like data.
threshold (Number, default=0.1) – Number specifying the threshold at which relative differences are considered “significant”. Comparisons are made based on reference.
reference (str, None, default=None) – Specifies which columns of data are to be used as reference results to calculate actual differences. For None (default), the dataframe's average is used, as returned by numpy.mean(data, axis="columns").
- Returns:
List of DataFrames where each DataFrame represents one continuous sequence of detected differences.
- Return type:
Examples
>>> import pandas as pd
>>> from tessif.identify.timeframes import significant_differences
>>> data = [
...     [10, 10, 10],
...     [10, 12, 10],
...     [10, 10, 10],
...     [10, 10, 10],
...     [10, 10, 12],
... ]
Simple use case of integer indexed data frames:
>>> dtf = pd.DataFrame(
...     data,
...     columns=["software1", "software2", "software3"],
... )

>>> identified_differences = significant_differences(dtf, neighs=False)
>>> for dtf in identified_differences:
...     print(dtf)
...     print(59*'-')
   software1  software2  software3
1  10.666667       12.0  10.666667
-----------------------------------------------------------
   software1  software2  software3
4  10.666667  10.666667       12.0
-----------------------------------------------------------
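
The rows flagged above can be checked by hand. The following is a minimal sketch in plain pandas, not tessif's implementation: it assumes that the relative difference is the absolute deviation from the row average divided by that average, which is consistent with the output shown but is not confirmed by this page. The names row_mean, rel_dev and flagged_rows are illustrative only.

```python
import pandas as pd

dtf = pd.DataFrame(
    [[10, 10, 10], [10, 12, 10], [10, 10, 10], [10, 10, 10], [10, 10, 12]],
    columns=["software1", "software2", "software3"],
)

# Row-wise average, i.e. the reference used when reference=None.
row_mean = dtf.mean(axis="columns")

# Assumed formula: |value - average| / average, computed per cell.
rel_dev = dtf.sub(row_mean, axis="index").abs().div(row_mean, axis="index")

# A row counts as differing if any column exceeds the 10% threshold.
flagged = rel_dev.gt(0.1).any(axis="columns")
flagged_rows = list(dtf.index[flagged])
```

Under this assumption, row 1 has an average of about 10.667, so the value 12 deviates by 12.5% and is flagged, while the two values of 10 deviate by only 6.25%.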
Intended use case of datetime-indexed dataframes, including neighbouring averages, for creating informative step plots:
>>> dtf2 = pd.DataFrame(
...     data,
...     columns=["software1", "software2", "software3"],
...     index=pd.date_range("1990-07-13", periods=5, freq="H"),
... )
>>> identified_differences = significant_differences(dtf2, neighs=True)
>>> print(identified_differences[0])
                     software1  software2  software3
1990-07-13 00:00:00  10.000000       10.0  10.000000
1990-07-13 01:00:00  10.666667       12.0  10.666667
1990-07-13 02:00:00  10.000000       10.0  10.000000

>>> from tessif.visualize import component_loads
>>> axes = component_loads.step(identified_differences[0])
>>> # axes.figure.show()
Note how the second dataframe of identified differences does not include an average at the last index, despite neighs=True above. This is because a significant difference is detected at the last entry, where no neighbour is added.
>>> print(identified_differences[1])
                     software1  software2  software3
1990-07-13 03:00:00  10.000000  10.000000       10.0
1990-07-13 04:00:00  10.666667  10.666667       12.0

>>> from tessif.visualize import component_loads
>>> axes = component_loads.step(identified_differences[1])
>>> # axes.figure.show()
Using “software2” as reference and setting threshold to 30% results in no significant differences being detected:
>>> dtf2 = pd.DataFrame(
...     data,
...     columns=["software1", "software2", "software3"],
...     index=pd.date_range("1990-07-13", periods=5, freq="H"),
... )

>>> identified_differences = significant_differences(
...     dtf2, reference="software2", threshold=0.3)

>>> print(identified_differences)
[]
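
Why a 30% threshold yields an empty list while 10% does not can be reproduced with a small sketch. It assumes that deviations are measured relative to the reference column (|value - reference| / reference). That formula and the helper name rows_exceeding are assumptions for illustration, not part of tessif.

```python
import pandas as pd

dtf = pd.DataFrame(
    [[10, 10, 10], [10, 12, 10], [10, 10, 10], [10, 10, 10], [10, 10, 12]],
    columns=["software1", "software2", "software3"],
)


def rows_exceeding(frame, reference, threshold):
    """Return index labels of rows deviating from a reference column.

    Hypothetical helper; assumes |value - reference| / reference.
    """
    ref = frame[reference]
    rel_dev = frame.sub(ref, axis="index").abs().div(ref, axis="index")
    mask = rel_dev.gt(threshold).any(axis="columns")
    return list(frame.index[mask])


rows_30 = rows_exceeding(dtf, "software2", 0.3)  # largest deviation is 20%
rows_10 = rows_exceeding(dtf, "software2", 0.1)  # rows 1 and 4 exceed 10%
```

With “software2” as reference, row 1 deviates by 2/12 (about 16.7%) and row 4 by 2/10 (20%). Both stay below 30% but exceed 10%.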
Using “software2” as reference and resetting threshold to 10%:
>>> identified_differences = significant_differences(
...     dtf2, reference="software2", neighs=False)

>>> print(identified_differences[0])
   software1  software2  software3
1         10         12         10

>>> print(identified_differences[1])
   software1  software2  software3
4         10         10         12
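
Each returned DataFrame covers one continuous sequence of detected differences. One way to split flagged rows into such sequences is sketched below in plain pandas. The function split_runs and its boolean mask argument are hypothetical and only illustrate the grouping idea. They do not describe how tessif does it internally.

```python
import pandas as pd


def split_runs(frame, mask):
    """Split rows of frame where mask is True into one DataFrame per unbroken run."""
    # The run counter increases whenever the mask value changes between rows.
    run_id = mask.ne(mask.shift()).cumsum()
    # Keep only flagged rows and group them by the run they belong to.
    return [chunk for _, chunk in frame[mask].groupby(run_id[mask])]


frame = pd.DataFrame({"a": range(6)})
mask = pd.Series([False, True, True, False, False, True])
runs = split_runs(frame, mask)
```

Here rows 1 and 2 form the first run and row 5 forms the second, so runs holds two DataFrames.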