identify.core

API

cluster

Cluster value(s) on condition(s).

Identificier

`Identificier`	Identification base class.
`Identificier.of_high_interest`	Node uid representations identified as of `high interest`.
`Identificier.high`	Alias for `of_high_interest`.
`Identificier.of_medium_interest`	Node uid representations identified as of `medium interest`.
`Identificier.medium`	Alias for `of_medium_interest`.
`Identificier.of_low_interest`	Node uid representations identified as of `low interest`.
`Identificier.low`	Alias for `of_low_interest`.
`Identificier.clustered_interest`	Inter-component results clustered by interest.
`Identificier.cluster_interest`	Cluster inter-component results by interest.
`Identificier.cluster_conditions`	Dictionary of clustering conditions used.

Tessif module providing the core identification utilities.

class tessif.identify.core.Identificier(data, conditions_dict, reference=None)

Bases: ABC

Identification base class.

The identification algorithm lives in Identificier.cluster_interest, which must be overridden by child-specific implementations.

Parameters:

data – Result data to be analyzed for significant differences.
conditions_dict (dict, None, default=None) –
Dictionary keying container(s) of dicts by the respective cluster labels “high”, “medium” and “low”. Each dictionary inside the containers needs the following keys:
- thres specifying the threshold used
- oprt specifying the operator used.
Used to cluster data by category/cluster label.
reference (str, None, default=None) –
Defines the reference results used for calculating the statistical error values and Pearson correlation coefficients.

If None is used (default), the dataframe's average is used, as returned by average_timevarying_dataframe_results().

property of_high_interest: Node uid representations identified as of high interest.

property high: Alias for of_high_interest.

property high_interest_results: Inter-component results identified as of high interest.

property of_medium_interest: Node uid representations identified as of medium interest.

property medium: Alias for of_medium_interest.

property medium_interest_results: Inter-component results identified as of medium interest.

property of_low_interest: Node uid representations identified as of low interest.

property low: Alias for of_low_interest.

property low_interest_results: Inter-component results identified as of low interest.

property cluster_conditions: Dictionary of clustering conditions used.

property clustered_interest: Inter-component results clustered by interest.

property reference: Reference model used for identifications.

abstract cluster_interest(): Cluster inter-component results by interest.

abstract map_interest_results(data): Map data to identified interest categories.
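The override pattern described for cluster_interest can be illustrated with a minimal, self-contained sketch. It does not import tessif: SketchIdentifier is a hypothetical stand-in that only mirrors a few of the documented names (cluster_conditions, clustered_interest, of_high_interest, high), and SingleValueIdentifier is an assumed child class. The data layout (a dict mapping node uid to a single number) and the idea that clustering runs once in the constructor are assumptions made for illustration, not statements about the library's internals.

```python
from abc import ABC, abstractmethod
import operator


class SketchIdentifier(ABC):
    """Hypothetical stand-in for the documented base class (not tessif code)."""

    def __init__(self, data, conditions_dict, reference=None):
        self._data = data
        self._cluster_conditions = conditions_dict
        self._reference = reference
        # Assumption: the child-specific algorithm runs once and is cached.
        self._clustered_interest = self.cluster_interest()

    @property
    def cluster_conditions(self):
        return self._cluster_conditions

    @property
    def clustered_interest(self):
        return self._clustered_interest

    @property
    def of_high_interest(self):
        return self._clustered_interest.get("high", [])

    # Alias, mirroring the documented `high` property.
    high = of_high_interest

    @abstractmethod
    def cluster_interest(self):
        """Return a mapping of cluster label to node uids."""


class SingleValueIdentifier(SketchIdentifier):
    """Assumed child class: data maps node uid to one number."""

    def cluster_interest(self):
        clustered = {label: [] for label in self._cluster_conditions}
        for uid, value in self._data.items():
            for label, conds in self._cluster_conditions.items():
                # Resolve names such as "ge" or "lt" via the operator module.
                if all(getattr(operator, c["oprt"])(value, c["thres"]) for c in conds):
                    clustered[label].append(uid)
                    break  # first matching label wins
        return clustered


conditions = {
    "high": ({"oprt": "ge", "thres": 0.5},),
    "low": ({"oprt": "lt", "thres": 0.5},),
}
ident = SingleValueIdentifier({"Generator": 0.9, "Storage": 0.1}, conditions)
print(ident.high)                # ['Generator']
print(ident.clustered_interest)  # {'high': ['Generator'], 'low': ['Storage']}
```

Because cluster_interest is declared abstract, instantiating the stand-in base class directly raises a TypeError; only child classes that supply the algorithm can be created.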

tessif.identify.core.cluster(values, conditions_dict)

Cluster value(s) on condition(s).

Uses a dictionary of conditions built on Python's operators.

Parameters:

values (Container) – Container of number(s) against which the cluster conditions are checked.
conditions_dict (dict) –
Dictionary keying container(s) of dicts by the respective cluster labels. Each dictionary inside the containers needs the following keys:
- thres specifying the threshold used
- oprt specifying the operator used.

Returns:

Dictionary key specifying the cluster. Usually a string or a number.

Return type:

Hashable

Examples

Using a single-value condition check with 2 categories/clusters. Note that for single-value conditions, both the value itself and the inner conditions dict must be wrapped in Containers, hence the trailing comma that turns each into a tuple.

>>> values = [(9000,), (9001,), (42,)]
>>> conditions = {
...     "Its over 9000!": ({"oprt": "gt", "thres": 9000},),
...     "Nope": ({"oprt": "le", "thres": 9000},),
... }

>>> for value in values:
...     print(cluster(value, conditions))
Nope
Its over 9000!
Nope

Multiple values and conditions can be used, provided the number of values matches the length of each inner tuple of condition dicts:

>>> values = [
...     ([0, 1], "high"),
...     ([1, 1],  "medium1"),
...     ([0, 0],  "medium2"),
...     ([1, 0],  "low"),
... ]

>>> # first condition = pcc, second condition = nmae
>>> conditions = {
...     "high": ({"oprt": "lt", "thres": 0.7}, {"oprt": "ge", "thres": 0.1}),
...     "medium1": ({"oprt": "ge", "thres": 0.7}, {"oprt": "ge", "thres": 0.1}),
...     "medium2": ({"oprt": "lt", "thres": 0.7}, {"oprt": "lt", "thres": 0.1}),
...     "low": ({"oprt": "ge", "thres": 0.7}, {"oprt": "lt", "thres": 0.1}),
... }

>>> for value_pairing in values:
...     print(cluster(value_pairing[0], conditions))
high
medium1
medium2
low
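To make the behaviour shown in these examples easier to follow, here is a minimal sketch of how such a function could be written with the standard library's operator module. cluster_sketch is a hypothetical name and this is not tessif's implementation. Three points are assumptions: the first label whose conditions all hold is returned, a length mismatch raises ValueError, and None is returned when nothing matches.

```python
import operator


def cluster_sketch(values, conditions_dict):
    """Return the first label whose conditions all hold for ``values``."""
    for label, conds in conditions_dict.items():
        if len(conds) != len(values):
            # Assumption: a mismatch is treated as an error.
            raise ValueError("values and conditions must have the same length")
        # Pair each value with its condition; names such as "gt" or "le"
        # are looked up as functions in the operator module.
        if all(
            getattr(operator, cond["oprt"])(value, cond["thres"])
            for value, cond in zip(values, conds)
        ):
            return label
    return None  # assumption: no label matched


conditions = {
    "Its over 9000!": ({"oprt": "gt", "thres": 9000},),
    "Nope": ({"oprt": "le", "thres": 9000},),
}
for value in [(9000,), (9001,), (42,)]:
    print(cluster_sketch(value, conditions))
# Nope
# Its over 9000!
# Nope
```

Under these assumptions, the sketch produces the same labels as the two documented examples when given the same inputs.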