identify.timeframes

significant_differences

Identify significant differences between timeseries results.

Tessif module providing tools for identifying differing timeframes.

tessif.identify.timeframes.significant_differences(data, method='ardiffs', threshold=0.1, reference=None, neighs=True)

Identify significant differences between timeseries results.

Designed to detect significant deviations between software-specific flow results for the same components.

Each continuous sequence of detected differences is stored as a separate DataFrame.

Parameters:

data (pandas.DataFrame, Container) – DataFrame in which each column is assumed to contain one flow result, or a container of flow results.
method ({"ardiffs"}, callable, default="ardiffs") – Either a string specifying which precoded function to use for calculating differences, or a function that takes data (as pandas.DataFrame or pandas.Series) and reference and returns a DataFrame indexed like data.
threshold (Number, default=0.1) – Relative difference above which a deviation is considered “significant”. Comparisons are made relative to reference.
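reference (str, None, default=None) –
Specifies which column of data is used as the reference result against which differences are calculated.

For None (default), the row-wise average of data is used, as returned by numpy.mean(data, axis="columns").
neighs (bool, default=True) – Judging by the examples below, if True each detected sequence is extended by its directly neighbouring entries, where such neighbours exist.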
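
How threshold and reference interact can be illustrated with a minimal sketch. The function relative_diffs below is a hypothetical stand-in written for this illustration; it is not tessif's "ardiffs" implementation, and it assumes that reference is either None or the name of a single column.

```python
import pandas as pd


def relative_diffs(data, reference=None):
    """Hypothetical stand-in for a `method` callable (not tessif's code)."""
    if reference is None:
        # Row-wise average across all columns, as described for reference=None.
        ref = data.mean(axis="columns")
    else:
        # Assumption: reference names a single column of data.
        ref = data[reference]
    # Absolute deviation from the reference, relative to the reference.
    # Rows where the reference is 0 yield inf/NaN and need separate handling.
    return data.sub(ref, axis="index").abs().div(ref, axis="index")


dtf = pd.DataFrame(
    [[10, 10, 10], [10, 12, 10], [10, 10, 10], [10, 10, 10], [10, 10, 12]],
    columns=["software1", "software2", "software3"],
)
diffs = relative_diffs(dtf)
significant = diffs > 0.1  # corresponds to threshold=0.1
flagged_rows = significant.any(axis="columns")
print(flagged_rows.tolist())  # [False, True, False, False, True]
```

With the average as reference, only rows 1 and 4 contain a value deviating by more than 10%, which agrees with the rows reported in the examples further down.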

Returns:

List of DataFrames where each DataFrame represents one continuous sequence of detected differences.

Return type:

list
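
One way to obtain such a list is to split the flagged rows into runs of consecutive entries. The sketch below assumes a boolean Series marking the rows with significant differences; split_consecutive is a hypothetical helper for illustration and may differ from what tessif does internally.

```python
import pandas as pd


def split_consecutive(data, flagged):
    """Hypothetical helper: one DataFrame per run of consecutive flagged rows."""
    # A run starts wherever a flagged row follows an unflagged one
    # (or sits at the very beginning of the data).
    run_starts = flagged & ~flagged.shift(fill_value=False)
    # Cumulative sum of run starts gives each run its own integer id.
    run_ids = run_starts.cumsum()
    return [block for _, block in data[flagged].groupby(run_ids[flagged])]


data = pd.DataFrame({"flow": [10, 12, 13, 10, 12]})
flagged = pd.Series([False, True, True, False, True])
blocks = split_consecutive(data, flagged)
for block in blocks:
    print(block.index.tolist())
# [1, 2]
# [4]
```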

Examples

>>> import pandas as pd
>>> data=[
...     [10, 10, 10],
...     [10, 12, 10],
...     [10, 10, 10],
...     [10, 10, 10],
...     [10, 10, 12],
... ]

Simple use case with an integer-indexed DataFrame:

>>> dtf = pd.DataFrame(
...     data,
...     columns=["software1", "software2", "software3"],
... )

>>> identified_differences = significant_differences(dtf, neighs=False)
>>> for diff in identified_differences:  # avoid shadowing dtf
...     print(diff)
...     print(59*'-')
   software1  software2  software3
1  10.666667       12.0  10.666667
-----------------------------------------------------------
   software1  software2  software3
4  10.666667  10.666667       12.0
-----------------------------------------------------------

Intended use case with time-indexed DataFrames, including the neighbouring averages, for creating meaningful step plots:

>>> dtf2 = pd.DataFrame(
...     data,
...     columns=["software1", "software2", "software3"],
...     index=pd.date_range("1990-07-13", periods=5, freq="H"),
... )

>>> identified_differences = significant_differences(dtf2, neighs=True)

>>> print(identified_differences[0])
                     software1  software2  software3
1990-07-13 00:00:00  10.000000       10.0  10.000000
1990-07-13 01:00:00  10.666667       12.0  10.666667
1990-07-13 02:00:00  10.000000       10.0  10.000000

>>> from tessif.visualize import component_loads
>>> axes = component_loads.step(identified_differences[0])
>>> # axes.figure.show()

Figure: step plot of the first identified timeframe.

Note how the second DataFrame of identified differences does not include an average after the last index, despite neighs=True being set above. This is because the significant difference was detected at the last entry, where no following neighbour exists to be added.

>>> print(identified_differences[1])
                     software1  software2  software3
1990-07-13 03:00:00  10.000000  10.000000       10.0
1990-07-13 04:00:00  10.666667  10.666667       12.0

>>> from tessif.visualize import component_loads
>>> axes = component_loads.step(identified_differences[1])
>>> # axes.figure.show()

Figure: step plot of the second identified timeframe.
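
The edge behaviour noted above can be sketched as a positional slice that is widened by one row on each side and clipped at the bounds of the data. The helper with_neighbours is hypothetical and only illustrates which rows are selected; it does not reproduce how tessif fills the neighbouring rows with reference values.

```python
import pandas as pd


def with_neighbours(data, start, stop):
    """Hypothetical helper: rows start..stop (positional, inclusive),
    extended by one neighbour on each side where one exists."""
    lo = max(start - 1, 0)             # no neighbour before the first row
    hi = min(stop + 1, len(data) - 1)  # no neighbour after the last row
    return data.iloc[lo:hi + 1]


frame = pd.DataFrame(
    [[10, 10, 10], [10, 12, 10], [10, 10, 10], [10, 10, 10], [10, 10, 12]],
    columns=["software1", "software2", "software3"],
)
first = with_neighbours(frame, 1, 1)   # difference at position 1
second = with_neighbours(frame, 4, 4)  # difference at the last position
print(first.index.tolist())   # [0, 1, 2]
print(second.index.tolist())  # [3, 4]
```

The second selection has only two rows because position 4 is the last entry, mirroring the two-row result printed above.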

Using “software2” as reference and setting the threshold to 30% results in no significant differences being detected:

>>> dtf2 = pd.DataFrame(
...     data,
...     columns=["software1", "software2", "software3"],
...     index=pd.date_range("1990-07-13", periods=5, freq="H"),
... )

>>> identified_differences = significant_differences(
...     dtf2, reference="software2", threshold=0.3)

>>> print(identified_differences)
[]
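
The empty result can be checked by hand. The following sketch recomputes the relative deviations from the “software2” column with plain pandas, assuming that deviations are measured as absolute differences divided by the reference value; this is an interpretation of the documented behaviour, not tessif's own code.

```python
import pandas as pd

frame = pd.DataFrame(
    [[10, 10, 10], [10, 12, 10], [10, 10, 10], [10, 10, 10], [10, 10, 12]],
    columns=["software1", "software2", "software3"],
)
ref = frame["software2"]
# Absolute deviation of every column from software2, relative to software2.
rel = frame.sub(ref, axis="index").abs().div(ref, axis="index")

largest = rel.max().max()
print(largest)  # 0.2, reached in row 4 where software3 is 12 and software2 is 10
above_30 = (rel > 0.3).any(axis="columns").tolist()
above_10 = (rel > 0.1).any(axis="columns").tolist()
print(above_30)  # [False, False, False, False, False]
print(above_10)  # [False, True, False, False, True]
```

The largest deviation is 20%, so a 30% threshold flags nothing, while a 10% threshold flags rows 1 and 4.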

Using “software2” as reference and resetting the threshold to its default of 10%, on an integer-indexed copy of the data:

>>> identified_differences = significant_differences(
...     dtf2.reset_index(drop=True), reference="software2", neighs=False)

>>> print(identified_differences[0])
   software1  software2  software3
1         10         12         10

>>> print(identified_differences[1])
   software1  software2  software3
4         10         10         12