mesh2vec.mesh2vec_base.Mesh2VecBase

class mesh2vec.mesh2vec_base.Mesh2VecBase(distance, hyper_edges, vtx_ids=None, calc_strategy='dfs')

Class to derive hypergraph neighborhoods from ordinary, undirected graphs, map numerical features to vertices, and provide aggregation methods that result in fixed-size vectors suitable for machine learning methods.

Methods

__init__(distance, hyper_edges[, vtx_ids, ...])

Create neighborhood sets on a hypergraph.

add_features_from_csv(csv_file[, ...])

Map the content of a CSV file to the vertices of the hypergraph.

add_features_from_dataframe(df)

Map the content of a Pandas dataframe to the vertices of the hypergraph.

aggregate(feature, dist, aggr[, aggr_name, ...])

Aggregate features from neighborhoods for each distance in dist.

aggregate_categorical(feature, dist[, ...])

For categorical features, aggregate the numbers of occurrences of each categorical value.

available_aggregated_features()

Returns a list of the names of all aggregated features.

available_features()

Returns a list of the names of all features.

features()

Returns a Pandas dataframe with all feature columns.

from_file(hg_file, distance[, calc_strategy])

Read a hypergraph (hg) from a text file.

get_max_distance()

Returns the distance value used to generate the hypergraph neighborhood.

get_nbh(vtx, dist)

Get a list of the neighbors at exactly distance dist from a given vertex vtx.

load(path)

Load the Mesh2Vec object from a file with joblib

save(path)

Save the Mesh2Vec object to a file with joblib

to_array([vertices])

Returns a numpy array with all previously aggregated feature columns.

to_dataframe([vertices])

Returns a Pandas dataframe with all previously aggregated feature columns.

vtx_ids()

Returns a list of the ids of all hyper vertices.

Parameters:
  • distance (int) –

  • hyper_edges (Dict[str, List[str]]) –

  • vtx_ids (Optional[List[str]]) –

  • calc_strategy (str) –

__init__(distance, hyper_edges, vtx_ids=None, calc_strategy='dfs')

Create neighborhood sets on a hypergraph. For each vertex in the hypergraph (hg), create sets of neighbors at distance d for each d up to a given maximum distance. The neighborhood set \(N^d_v\) of vertex \(v\) at distance \(d\) contains all vertices \(w\) with \(dist(v,w) \leq d\) and \(w \notin N^\delta_v\) for all \(\delta < d\); that is, the vertices at exactly distance \(d\) from \(v\).

Parameters:
  • distance (int) – the maximum distance for neighborhood generation and feature aggregation

  • hyper_edges (Dict[str, List[str]]) – edge->connected vertices

  • vtx_ids (Optional[List[str]]) – provide a list of all vertices to control the internal order of vertices (features, aggregated features)

  • calc_strategy (str) –

    choose the algorithm to calculate adjacencies

    • "dfs": depth first search (default, fast)

    • "bfs": breadth first search (low memory consumption)

    • "matmul": matrix multiplication (deprecated, for compatibility only)

Example

>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> edges = {"first": ["a", "b", "c"], "second": ["x", "y"]}
>>> hg = Mesh2VecBase(3, edges)
>>> hg._hyper_edges
OrderedDict([('first', ['a', 'b', 'c']), ('second', ['x', 'y'])])
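
The neighborhood definition given for __init__ can be illustrated without the library. The following is a minimal sketch, assuming that two vertices are adjacent when they share a hyperedge and that distance is the shortest path length. The helper neighborhood_rings is hypothetical and not part of mesh2vec, and it does not reproduce the library's "dfs", "bfs", or "matmul" strategies.

```python
from collections import deque
from typing import Dict, List, Set


def neighborhood_rings(
    hyper_edges: Dict[str, List[str]], start: str, max_dist: int
) -> Dict[int, List[str]]:
    """Return {d: sorted vertices at exactly distance d from start}, d <= max_dist."""
    # Assumption of this sketch: vertices sharing a hyperedge are adjacent.
    adjacent: Dict[str, Set[str]] = {}
    for vertices in hyper_edges.values():
        for v in vertices:
            adjacent.setdefault(v, set()).update(w for w in vertices if w != v)

    # Breadth-first search assigns each reachable vertex its shortest distance.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == max_dist:
            continue  # do not expand beyond the maximum distance
        for w in adjacent.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)

    # Group vertices by exact distance; ring 0 holds only the start vertex.
    rings: Dict[int, List[str]] = {d: [] for d in range(max_dist + 1)}
    for v, d in dist.items():
        rings[d].append(v)
    return {d: sorted(vs) for d, vs in rings.items()}


edges = {"first": ["a", "b", "c"], "second": ["x", "y"], "third": ["x", "a"]}
rings = neighborhood_rings(edges, "a", 3)
print(rings)  # {0: ['a'], 1: ['b', 'c', 'x'], 2: ['y'], 3: []}
```

Each vertex appears in exactly one ring, which mirrors the condition that a vertex in \(N^d_v\) is not contained in any \(N^\delta_v\) with \(\delta < d\).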

save(path)

Save the Mesh2Vec object to a file with joblib

Parameters:

path (Path) – path to the file

Example

>>> from pathlib import Path
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> hg = Mesh2VecBase(3, {"first": ["a", "b", "c"], "second": ["x", "y"]})
>>> hg.save(Path("data/temp_hg.joblib"))

static load(path)

Load the Mesh2Vec object from a file with joblib

Parameters:

path (Path) – path to the file

Example

>>> from pathlib import Path
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> hg = Mesh2VecBase(3, {"first": ["a", "b", "c"], "second": ["x", "y"]})
>>> hg.save(Path("data/temp_hg.joblib"))
>>> hg = Mesh2VecBase.load(Path("data/temp_hg.joblib"))

static from_file(hg_file, distance, calc_strategy='dfs')

Read a hypergraph (hg) from a text file.

Parameters:
  • hg_file (Path) –

    either

    • a CSV file of pairs of alphanumerical vertex identifiers defining an undirected graph. Multiple edges are ignored. The initial hypergraph is given by the cliques of the graph. Since the CLIQUE problem is NP-complete, use this for small graphs only.

    • a hypergraph file (text). Each line of the file contains an alphanumerical edge identifier, followed by a list of the vertex identifiers the edge contains, in the form ‘EDGEID: VTXID1,VTXID2,…’

  • distance (int) – the maximum distance for neighborhood generation and feature aggregation

  • calc_strategy (str) –

    choose the algorithm to calculate adjacencies

    • "dfs": depth first search (default, fast)

    • "bfs": breadth first search (low memory consumption)

    • "matmul": matrix multiplication (deprecated, for compatibility only)

Return type:

Mesh2VecBase

Example

>>> from pathlib import Path
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> hg = Mesh2VecBase.from_file(Path("data/hyper_02.txt"), 4)
>>> hg._hyper_edges["edge1"]
['vtx00', 'vtx01', 'vtx07', 'vtx11']
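
The hypergraph text format described for hg_file (an edge identifier, a colon, then a comma-separated vertex list) can be read with a few lines of plain Python. This is a minimal sketch of that format only: parse_hypergraph_lines is a hypothetical helper, not the parser used by from_file, and the second sample line is invented for illustration.

```python
from typing import Dict, List


def parse_hypergraph_lines(lines: List[str]) -> Dict[str, List[str]]:
    """Parse 'edge identifier: vertex,vertex' lines into edge -> vertices."""
    hyper_edges: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption of this sketch: blank lines are skipped
        # Split at the first colon: left is the edge id, right the vertex list.
        edge_id, _, vertex_part = line.partition(":")
        hyper_edges[edge_id.strip()] = [
            v.strip() for v in vertex_part.split(",") if v.strip()
        ]
    return hyper_edges


# The first line matches the example above; the second is made up.
sample = ["edge1: vtx00,vtx01,vtx07,vtx11", "edge2: vtx01,vtx02"]
parsed = parse_hypergraph_lines(sample)
print(parsed["edge1"])  # ['vtx00', 'vtx01', 'vtx07', 'vtx11']
```

The resulting dictionary has the same edge->connected vertices shape that the hyper_edges parameter of __init__ expects.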

get_nbh(vtx, dist)

Get a list of the neighbors at exactly distance dist from a given vertex vtx.

Example

>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> edges = {"first": ["a", "b", "c"], "second": ["x", "y"]}
>>> hg = Mesh2VecBase(3, edges)
>>> hg.get_nbh("a", 1)
['b', 'c']

Parameters:
  • vtx (str) –

  • dist (int) –

Return type:

List[str]

aggregate_categorical(feature, dist, categories=None, default_value=None)

For categorical features, aggregate the numbers of occurrences of each categorical value. This results in a new aggregated feature for each categorical value. If feature is color and dist is 2, and there are 3 categories, e.g. RED, YELLOW, GREEN, the resulting feature columns are named color-cat-RED-2, color-cat-YELLOW-2, color-cat-GREEN-2. If categories is None, all unique values of feature are taken as categories; otherwise the categories are taken from categories, which must be list-like. Values not present in categories are counted in an additional category NONE.

Returns:

aggregated feature name(s)

Parameters:
  • feature (str) –

  • dist (Union[List[int], int]) –

  • categories (Optional[Union[List[str], List[int]]]) –

  • default_value (Optional[Union[int, str]]) –

Return type:

Union[str, List[str]]

Example

>>> import pandas as pd
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> edges = {"first": ["a", "b", "c"], "second": ["x", "y"], "third": ["x", "a"]}
>>> hg = Mesh2VecBase(3, edges)
>>> df1 = pd.DataFrame({"vtx_id": ["a", "b", "c", "x", "y"],
...     "f1": ["RED", "GREEN", "RED", "RED", "GREEN"]})
>>> hg.add_features_from_dataframe(df1)
>>> names = hg.aggregate_categorical("f1", 1)
>>> names
['f1-cat-GREEN-1', 'f1-cat-RED-1']
>>> hg._aggregated_features["f1-cat-RED-1"].to_list()
[2, 2, 1, 1, 1]
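
The counting and naming scheme of aggregate_categorical can be retraced by hand. The sketch below is a plain-Python illustration, not the library's implementation: count_categories is a hypothetical helper, the distance-1 neighbor lists are written out manually from the edges of the example above, and always emitting the NONE column is an assumption of this sketch.

```python
from typing import Dict, List


def count_categories(
    feature: str, dist: int, neighbor_values: List[str], categories: List[str]
) -> Dict[str, int]:
    """Count how often each category occurs among one vertex's neighbor values."""
    counts = {f"{feature}-cat-{c}-{dist}": 0 for c in categories}
    none_name = f"{feature}-cat-NONE-{dist}"
    counts[none_name] = 0  # assumption: the NONE column is always present here
    for value in neighbor_values:
        if value in categories:
            counts[f"{feature}-cat-{value}-{dist}"] += 1
        else:
            counts[none_name] += 1  # value not listed in categories
    return counts


f1 = {"a": "RED", "b": "GREEN", "c": "RED", "x": "RED", "y": "GREEN"}
# Neighbors at exactly distance 1, derived by hand from the example edges.
nbh_1 = {"a": ["b", "c", "x"], "b": ["a", "c"], "c": ["a", "b"],
         "x": ["a", "y"], "y": ["x"]}

categories = sorted(set(f1.values()))  # all unique values: ['GREEN', 'RED']
red_counts = [
    count_categories("f1", 1, [f1[w] for w in nbh_1[v]], categories)["f1-cat-RED-1"]
    for v in nbh_1
]
print(red_counts)  # [2, 2, 1, 1, 1]

# With an explicit category list, unlisted values are pooled under NONE.
restricted = count_categories("f1", 1, ["GREEN", "RED", "RED"], ["RED"])
print(restricted)  # {'f1-cat-RED-1': 2, 'f1-cat-NONE-1': 1}
```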

aggregate(feature, dist, aggr, aggr_name=None, agg_add_ref=False, default_value=0.0)

Aggregate features from neighborhoods for each distance in dist.

Parameters:
  • feature (str) – name of the feature to aggregate

  • dist (Union[List[int], int]) –

    either

    • distance of the maximum neighborhood of vertex for aggregation, 0 <= dist <= self.distance, or

    • a list of distances, e.g. [0, 2, 5].

  • aggr (Callable) – aggregation function, e.g. np.min, np.mean, np.median

  • agg_add_ref (bool) – if True, the aggregation callable receives the feature value of the center element as a reference in its 2nd argument.

  • default_value (float) – value to use in aggregation when a feature is missing for a neighbor or no neighbor with the given dist exists.

  • aggr_name (Optional[str]) –

Returns:

aggregated feature name(s)

Return type:

Union[str, List[str]]

The resulting aggregated features are named feature-aggr-dist for each distance in dist, e.g. if feature is aspect, aggr is np.mean, and dist is [0,2,3], the new aggregated features are aspect-mean-0, aspect-mean-2, aspect-mean-3, containing the mean aspect values in the hg neighborhoods at distances 0, 2, and 3 for each vertex.

Example

>>> import pandas as pd
>>> import numpy as np
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> edges = {"first": ["a", "b", "c"], "second": ["x", "y"]}
>>> hg = Mesh2VecBase(3, edges)
>>> df1 = pd.DataFrame({"vtx_id": ["a", "b", "c", "x", "y"], "f1": [2, 4, 8, 16, 32]})
>>> hg.add_features_from_dataframe(df1)
>>> name = hg.aggregate("f1", 1, np.mean)
>>> name
'f1-mean-1'
>>> hg._aggregated_features[name].to_list()
[6.0, 5.0, 3.0, 32.0, 16.0]
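
The agg_add_ref option of aggregate implies an aggregation callable with two arguments. The following sketch shows what such a callable could look like, assuming it receives the neighbor values as a sequence and the center value as the second argument; the function name and the "absdiff" label in the commented call are hypothetical.

```python
from statistics import mean
from typing import Sequence


def mean_abs_diff_to_center(
    neighbor_values: Sequence[float], center_value: float
) -> float:
    """Mean absolute difference between neighbor values and the center value."""
    if len(neighbor_values) == 0:
        return 0.0  # nothing to compare against
    return float(mean(abs(v - center_value) for v in neighbor_values))


# Vertex "a" of the example above has f1 = 2 and distance-1 neighbors
# "b" and "c" with f1 = 4 and 8.
result = mean_abs_diff_to_center([4, 8], 2)
print(result)  # 4.0

# Hypothetical use with the object from the example above (not executed here):
# hg.aggregate("f1", 1, mean_abs_diff_to_center,
#              aggr_name="absdiff", agg_add_ref=True)
```

A reference-aware callable like this measures how much a neighborhood deviates from its center, which a plain np.mean over the neighbors cannot express.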

add_features_from_csv(csv_file, with_header=False, columns=None)

Map the content of a CSV file to the vertices of the hypergraph.

The column ‘vtx_id’ must contain the vertex IDs. If with_header is True, use the first line as column headers. If columns is list-like, its values are taken as column names, overriding possible headers from the file if with_header is True. Otherwise, all other columns starting with the 2nd are added by the name os.path.basename(csv_file).rsplit('.',1)[0]-N where N is the column number.

Example

>>> from pathlib import Path
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> hg_01 = Mesh2VecBase.from_file(Path("data/hyper_02.txt"), 3)
>>> hg_01.add_features_from_csv(Path("data/hyper_02_features.csv"), with_header=True)
>>> hg_01.features()["pow2"][:4].to_list()
[0, 1, 4, 9]

Parameters:
  • csv_file (Path) –

  • with_header (bool) –

  • columns (Optional[List[str]]) –

Return type:

None
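
The default naming rule described for add_features_from_csv can be evaluated directly. This sketch only applies the expression from the description; default_column_name is a hypothetical helper, and the column number passed in is an arbitrary illustration because the description does not state where the numbering of N starts.

```python
import os


def default_column_name(csv_file: str, n: int) -> str:
    """Build '<file name without extension>-<n>' as in the naming rule."""
    base = os.path.basename(csv_file).rsplit(".", 1)[0]
    return f"{base}-{n}"


name = default_column_name("data/hyper_02_features.csv", 2)
print(name)  # hyper_02_features-2
```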

add_features_from_dataframe(df)

Map the content of a Pandas dataframe to the vertices of the hypergraph. The column ‘vtx_id’ of the dataframe is expected to contain vertex IDs as strings.

Example

>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> import pandas as pd
>>> edges = {"first": ["a", "b", "c"], "second": ["x", "y"], "third": ["x", "a"]}
>>> hg = Mesh2VecBase(3, edges)
>>> df1 = pd.DataFrame({"vtx_id": ["a", "b", "c", "y"], "f1": [0, 1, 2.1, 4]})
>>> hg.add_features_from_dataframe(df1)
>>> hg._features["f1"].tolist()
[0.0, 1.0, 2.1, nan, 4.0]

Parameters:

df (DataFrame) –

Return type:

None

to_dataframe(vertices=None)

Returns a Pandas dataframe with all previously aggregated feature columns. If vertices is not None and iterable, the dataframe is only generated for the vertices in vertices.

Parameters:

vertices (Optional[Iterable[str]]) –

Return type:

DataFrame

to_array(vertices=None)

Returns a numpy array with all previously aggregated feature columns. If vertices is not None and iterable, the array is only generated for the vertices in vertices.

Parameters:

vertices (Optional[Iterable[str]]) –

Return type:

ndarray

get_max_distance()

Returns the distance value used to generate the hypergraph neighborhood.

Return type:

int

available_features()

Returns a list of the names of all features.

Return type:

List[str]

available_aggregated_features()

Returns a list of the names of all aggregated features.

Return type:

List[str]

vtx_ids()

Returns a list of the ids of all hyper vertices.

Return type:

List[str]

features()

Returns a Pandas dataframe with all feature columns.

Return type:

DataFrame