mesh2vec.mesh2vec_base.Mesh2VecBase
- class mesh2vec.mesh2vec_base.Mesh2VecBase(distance, hyper_edges, vtx_ids=None, calc_strategy='dfs')
Class to derive hypergraph neighborhoods from ordinary, undirected graphs, map numerical features to vertices, and provide aggregation methods that result in fixed-size vectors suitable for machine learning methods.
Methods
__init__(distance, hyper_edges[, vtx_ids, ...]) – Create neighborhood sets on a hypergraph.
add_features_from_csv(csv_file[, ...]) – Map the content of a CSV file to the vertices of the hypergraph.
add_features_from_dataframe(df) – Map the content of a Pandas dataframe to the vertices of the hypergraph.
aggregate(feature, dist, aggr[, aggr_name, ...]) – Aggregate features from neighborhoods for each distance in dist.
aggregate_categorical(feature, dist[, ...]) – For categorical features, aggregate the numbers of occurrences of each categorical value.
available_aggregated_features() – Returns a list of the names of all aggregated features.
available_features() – Returns a list of the names of all features.
features() – Returns a Pandas dataframe with all feature columns.
from_file(hg_file, distance[, calc_strategy]) – Read a hypergraph (hg) from a text file.
get_max_distance() – Returns the maximum distance used to generate the hypergraph neighborhoods.
get_nbh(vtx, dist) – Get a list of the neighbors of vertex vtx at exactly distance dist.
load(path) – Load the Mesh2Vec object from a file with joblib.
save(path) – Save the Mesh2Vec object to a file with joblib.
to_array([vertices]) – Returns a numpy array with all previously aggregated feature columns.
to_dataframe([vertices]) – Returns a Pandas dataframe with all previously aggregated feature columns.
vtx_ids() – Returns a list of the IDs of all hypergraph vertices.
- Parameters:
distance (int) –
hyper_edges (Dict[str, List[str]]) –
vtx_ids (Optional[List[str]]) –
calc_strategy (str) –
- __init__(distance, hyper_edges, vtx_ids=None, calc_strategy='dfs')
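Create neighborhood sets on a hypergraph. For each vertex in the hypergraph, create the set of neighbors at distance d for each d up to a given maximum distance. A neighborhood set \(N^d_v\) of vertex \(v\) at distance \(d\) contains all vertices \(w\) where \(dist(v,w) \leq d\) and \(w \notin N^\delta_v\) for all \(\delta < d\), i.e. \(N^d_v = \{w : dist(v,w) \leq d\} \setminus \bigcup_{\delta < d} N^\delta_v\). Hence \(N^d_v\) contains exactly the vertices at distance \(d\) from \(v\), and \(N^0_v = \{v\}\).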
- Parameters:
distance (int) – the maximum distance for neighborhood generation and feature aggregation
hyper_edges (Dict[str, List[str]]) – mapping from each hyperedge ID to the list of vertex IDs it connects
vtx_ids (Optional[List[str]]) – list of all vertices, used to control the internal order of vertices (features, aggregated features)
calc_strategy (str) –
choose the algorithm to calculate adjacencies
"dfs": depth-first search (default; fast)
"bfs": breadth-first search (low memory consumption)
"matmul": matrix multiplication (deprecated, for compatibility only)
Example
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> edges = {"first": ["a", "b", "c"], "second": ["x", "y"]}
>>> hg = Mesh2VecBase(3, edges)
>>> hg._hyper_edges
OrderedDict([('first', ['a', 'b', 'c']), ('second', ['x', 'y'])])
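The neighborhood definition above can be sketched in plain Python as a breadth-first search over the vertex adjacency implied by the hyperedges. This sketch assumes that two vertices are adjacent when they share at least one hyperedge. The helper neighborhoods() is hypothetical and is not the library's "dfs", "bfs", or "matmul" implementation.

```python
from collections import deque
from typing import Dict, List, Set


def neighborhoods(hyper_edges: Dict[str, List[str]], max_dist: int) -> Dict[str, List[Set[str]]]:
    """Return, for each vertex v, the sets N^0_v, ..., N^max_dist_v (hypothetical helper)."""
    # Assumption: two vertices are adjacent if they share at least one hyperedge.
    adjacency: Dict[str, Set[str]] = {}
    for members in hyper_edges.values():
        for v in members:
            adjacency.setdefault(v, set()).update(w for w in members if w != v)

    result: Dict[str, List[Set[str]]] = {}
    for start in adjacency:
        # Breadth-first search records the shortest distance of every reachable vertex.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            if dist[v] == max_dist:
                continue  # do not expand beyond the maximum distance
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        # N^d_v holds exactly the vertices whose shortest distance from start is d.
        rings: List[Set[str]] = [set() for _ in range(max_dist + 1)]
        for w, d in dist.items():
            rings[d].add(w)
        result[start] = rings
    return result


edges = {"first": ["a", "b", "c"], "second": ["x", "y"]}
nbh = neighborhoods(edges, 3)
print(sorted(nbh["a"][1]))  # ['b', 'c']
print(sorted(nbh["a"][2]))  # []
```

BFS assigns each vertex to the first ring at which it is reached, so the result matches the set-difference definition. For these edges, ring 1 of "a" is {b, c}, in line with the get_nbh example for the same edges.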
- save(path)
Save the Mesh2Vec object to a file with joblib
- Parameters:
path (Path) – path to the file
Example
>>> from pathlib import Path
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> hg = Mesh2VecBase(3, {"first": ["a", "b", "c"], "second": ["x", "y"]})
>>> hg.save(Path("data/temp_hg.joblib"))
- static load(path)
Load the Mesh2Vec object from a file with joblib
- Parameters:
path (Path) – path to the file
Example
>>> from pathlib import Path
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> hg = Mesh2VecBase(3, {"first": ["a", "b", "c"], "second": ["x", "y"]})
>>> hg.save(Path("data/temp_hg.joblib"))
>>> hg = Mesh2VecBase.load(Path("data/temp_hg.joblib"))
- static from_file(hg_file, distance, calc_strategy='dfs')
Read a hypergraph (hg) from a text file.
- Parameters:
hg_file (Path) –
either
a CSV file of pairs of alphanumeric vertex identifiers defining an undirected graph. Multiple edges are ignored. The initial hypergraph is given by the cliques of the graph. Since the CLIQUE problem is NP-complete, use this for small graphs only.
a hypergraph file (text). Each line of the file contains an alphanumeric edge identifier, followed by a list of the vertex identifiers the edge contains, in the form ‘EDGEID: VTXID1,VTXID2,…’
distance (int) – the maximum distance for neighborhood generation and feature aggregation
calc_strategy (str) –
choose the algorithm to calculate adjacencies
"dfs": depth-first search (default; fast)
"bfs": breadth-first search (low memory consumption)
"matmul": matrix multiplication (deprecated, for compatibility only)
- Return type:
Mesh2VecBase
Example
>>> from pathlib import Path
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> hg = Mesh2VecBase.from_file(Path("data/hyper_02.txt"), 4)
>>> hg._hyper_edges["edge1"]
['vtx00', 'vtx01', 'vtx07', 'vtx11']
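The text hypergraph format described above ('EDGEID: VTXID1,VTXID2,…') can be parsed with a few lines of standard-library Python. This sketch assumes that surrounding whitespace is ignored and blank lines are skipped. parse_hg_lines is a hypothetical helper, not part of mesh2vec, and it does not cover the CSV/clique variant. The input text is illustrative only.

```python
import io
from collections import OrderedDict
from typing import Dict, Iterable, List


def parse_hg_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse lines of the form 'EDGEID: VTXID1,VTXID2,...' (hypothetical helper)."""
    hyper_edges: Dict[str, List[str]] = OrderedDict()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no edge
        edge_id, sep, vertex_list = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in line {raw!r}")
        # Split the vertex list on commas and drop surrounding whitespace.
        hyper_edges[edge_id.strip()] = [v.strip() for v in vertex_list.split(",") if v.strip()]
    return hyper_edges


text = "edge1: vtx00,vtx01,vtx07,vtx11\nedge2: vtx01,vtx02\n\n"
edges = parse_hg_lines(io.StringIO(text))
print(edges["edge1"])  # ['vtx00', 'vtx01', 'vtx07', 'vtx11']
```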
- get_nbh(vtx, dist)
Get a list of the neighbors of a given vertex vtx at exactly distance dist.
Example
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> edges = {"first": ["a", "b", "c"], "second": ["x", "y"]}
>>> hg = Mesh2VecBase(3, edges)
>>> hg.get_nbh("a", 1)
['b', 'c']
- Parameters:
vtx (str) –
dist (int) –
- Return type:
List[str]
- aggregate_categorical(feature, dist, categories=None, default_value=None)
For categorical features, aggregate the numbers of occurrences of each categorical value. This results in a new aggregated feature for each categorical value. If feature is color, dist is 2, and there are 3 categories, e.g. RED, YELLOW, GREEN, the resulting feature columns are named color-cat-RED-2, color-cat-YELLOW-2, color-cat-GREEN-2. If categories is None, all unique values from feature are taken as categories; otherwise the categories are taken from categories, which must be list-like. Values not present in categories are counted in an additional category NONE.
- Returns:
aggregated feature name(s)
- Parameters:
feature (str) –
dist (Union[List[int], int]) –
categories (Optional[Union[List[str], List[int]]]) –
default_value (Optional[Union[int, str]]) –
- Return type:
Union[str, List[str]]
Example
>>> import pandas as pd
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> edges = {"first": ["a", "b", "c"], "second": ["x", "y"], "third": ["x", "a"]}
>>> hg = Mesh2VecBase(3, edges)
>>> df1 = pd.DataFrame({"vtx_id": ["a", "b", "c", "x", "y"],
...                     "f1": ["RED", "GREEN", "RED", "RED", "GREEN"]})
>>> hg.add_features_from_dataframe(df1)
>>> names = hg.aggregate_categorical("f1", 1)
>>> names
['f1-cat-GREEN-1', 'f1-cat-RED-1']
>>> hg._aggregated_features["f1-cat-RED-1"].to_list()
[2, 2, 1, 1, 1]
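The counts in the example can be reproduced with a small standard-library sketch. For each vertex, it counts category values among the vertices of its ring \(N^d_v\). Several parts are assumptions rather than confirmed library behavior, and are flagged as such in the code:

- count_categories is a hypothetical helper.
- The distance-1 rings are written out by hand for the example's edges.
- Neighbors without a value are skipped.
- The NONE column is added only when categories is given.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set


def count_categories(
    feature: str,
    values: Dict[str, str],
    rings: Dict[str, Set[str]],
    dist: int,
    categories: Optional[Sequence[str]] = None,
) -> Dict[str, List[int]]:
    """Count category occurrences in each vertex's ring N^dist_v (hypothetical helper)."""
    if categories is None:
        # Use every observed value as a category, in sorted order.
        cats = sorted(set(values.values()))
        extra: List[str] = []
    else:
        cats = list(categories)
        extra = ["NONE"]  # assumption: collects values outside `categories`
    columns: Dict[str, List[int]] = {f"{feature}-cat-{c}-{dist}": [] for c in cats + extra}
    for vtx, ring in rings.items():
        # Neighbors without a value are skipped (assumption).
        counts = Counter(values[w] for w in ring if w in values)
        for c in cats:
            columns[f"{feature}-cat-{c}-{dist}"].append(counts[c])
        if extra:
            none_count = sum(n for c, n in counts.items() if c not in cats)
            columns[f"{feature}-cat-NONE-{dist}"].append(none_count)
    return columns


# N^1_v for the edges {"first": [a, b, c], "second": [x, y], "third": [x, a]}
ring1 = {"a": {"b", "c", "x"}, "b": {"a", "c"}, "c": {"a", "b"}, "x": {"a", "y"}, "y": {"x"}}
f1 = {"a": "RED", "b": "GREEN", "c": "RED", "x": "RED", "y": "GREEN"}
cols = count_categories("f1", f1, ring1, 1)
print(list(cols))            # ['f1-cat-GREEN-1', 'f1-cat-RED-1']
print(cols["f1-cat-RED-1"])  # [2, 2, 1, 1, 1]
```

Since the center vertex is not part of \(N^1_v\), vertex "a" counts only the RED values of c and x, which gives 2.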
- aggregate(feature, dist, aggr, aggr_name=None, agg_add_ref=False, default_value=0.0)
Aggregate features from neighborhoods for each distance in dist.
- Parameters:
feature (str) – name of the feature to aggregate
dist (Union[List[int], int]) –
either
the neighborhood distance used for aggregation, with 0 <= dist <= self.distance, or
a list of distances, e.g. [0, 2, 5].
aggr (Callable) – aggregation function, e.g. np.min, np.mean, np.median
agg_add_ref (bool) – if True, the feature value of the center vertex is passed to the aggregation callable as a reference value in its 2nd argument.
default_value (float) – value to use in aggregation when a feature is missing for a neighbor or no neighbor at the given distance exists.
aggr_name (Optional[str]) –
- Returns:
aggregated feature name(s)
- Return type:
Union[str, List[str]]
The resulting aggregated features are named feature-aggr-dist for each distance in dist. For example, if feature is aspect, aggr is np.mean, and dist is [0,2,3], the new aggregated features are aspect-mean-0, aspect-mean-2, aspect-mean-3. They contain the mean aspect values in the hg neighborhoods at distances 0, 2, and 3 for each vertex.
Example
>>> import pandas as pd
>>> import numpy as np
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> edges = {"first": ["a", "b", "c"], "second": ["x", "y"]}
>>> hg = Mesh2VecBase(3, edges)
>>> df1 = pd.DataFrame({"vtx_id": ["a", "b", "c", "x", "y"], "f1": [2, 4, 8, 16, 32]})
>>> hg.add_features_from_dataframe(df1)
>>> name = hg.aggregate("f1", 1, np.mean)
>>> name
'f1-mean-1'
>>> hg._aggregated_features[name].to_list()
[6.0, 5.0, 3.0, 32.0, 16.0]
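A minimal stdlib sketch of the aggregation step, run over the rings \(N^d_v\). The following points are assumptions:

- aggregate_ring is a hypothetical helper.
- The name part comes from aggr.__name__ unless aggr_name is given.
- default_value replaces missing neighbor values and stands in for an empty ring.
- statistics.mean replaces np.mean so the block needs no third-party packages.

It reproduces the mean values of the example. It also shows how a reference-aware callable (agg_add_ref=True) receives the center vertex's value.

```python
import statistics
from typing import Callable, Dict, List, Optional, Set


def aggregate_ring(
    feature: str,
    values: Dict[str, float],
    rings: Dict[str, Set[str]],
    dist: int,
    aggr: Callable,
    aggr_name: Optional[str] = None,
    agg_add_ref: bool = False,
    default_value: float = 0.0,
) -> Dict[str, List[float]]:
    """Aggregate a feature over each vertex's ring N^dist_v (hypothetical helper)."""
    name = f"{feature}-{aggr_name or aggr.__name__}-{dist}"
    column: List[float] = []
    for vtx, ring in rings.items():
        # Missing neighbor values, and empty rings, fall back to default_value (assumption).
        neighbor_values = [values.get(w, default_value) for w in sorted(ring)] or [default_value]
        if agg_add_ref:
            # The center vertex's own value is passed as the 2nd argument.
            column.append(aggr(neighbor_values, values.get(vtx, default_value)))
        else:
            column.append(aggr(neighbor_values))
    return {name: column}


# N^1_v for the edges {"first": [a, b, c], "second": [x, y]}
ring1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "x": {"y"}, "y": {"x"}}
f1 = {"a": 2.0, "b": 4.0, "c": 8.0, "x": 16.0, "y": 32.0}
print(aggregate_ring("f1", f1, ring1, 1, statistics.mean))
# {'f1-mean-1': [6.0, 5.0, 3.0, 32.0, 16.0]}

# A reference-aware aggregation: largest deviation of a neighbor from the center value.
max_dev = aggregate_ring(
    "f1", f1, ring1, 1,
    lambda vals, ref: max(abs(v - ref) for v in vals),
    aggr_name="maxdev", agg_add_ref=True,
)
print(max_dev)  # {'f1-maxdev-1': [6.0, 4.0, 6.0, 16.0, 16.0]}
```

A lambda has no meaningful __name__, so the sketch passes aggr_name explicitly in that case.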
- add_features_from_csv(csv_file, with_header=False, columns=None)
Map the content of a CSV file to the vertices of the hypergraph.
The column ‘vtx_id’ must contain the vertex IDs. If with_header is True, use the first line as column headers. If columns is list-like, its values are taken as column names, overriding possible headers from the file if with_header is True. Otherwise, all other columns, starting with the 2nd, are added under the name os.path.basename(csv_file).rsplit('.',1)[0]-N, where N is the column number.
Example
>>> from pathlib import Path
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> hg_01 = Mesh2VecBase.from_file(Path("data/hyper_02.txt"), 3)
>>> hg_01.add_features_from_csv(Path("data/hyper_02_features.csv"), with_header=True)
>>> hg_01.features()["pow2"][:4].to_list()
[0, 1, 4, 9]
- Parameters:
csv_file (Path) –
with_header (bool) –
columns (Optional[List[str]]) –
- Return type:
None
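The fallback naming rule above can be illustrated with the standard csv module. default_column_name and read_headerless are hypothetical helpers. The text does not say how vertex IDs are located without a header or where N starts counting, so both choices in this sketch are assumptions: the 1st column holds the vertex ID, and columns are numbered from 1.

```python
import csv
import io
import os
from typing import Dict, TextIO


def default_column_name(csv_file: str, n: int) -> str:
    """Build the fallback name <file stem>-N (hypothetical helper)."""
    # rsplit with maxsplit=1 removes only the last extension: "x.tar.gz" -> "x.tar".
    return f"{os.path.basename(csv_file).rsplit('.', 1)[0]}-{n}"


def read_headerless(csv_file: str, handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Map vertex ID -> {column name: raw value} for a CSV without header (hypothetical helper)."""
    # Assumptions: the 1st column holds the vertex ID and columns are numbered from 1,
    # so the first feature column (the 2nd column) gets the suffix "-2".
    features: Dict[str, Dict[str, str]] = {}
    for vtx_id, *rest in csv.reader(handle):
        features[vtx_id] = {default_column_name(csv_file, i): v for i, v in enumerate(rest, start=2)}
    return features


table = read_headerless("data/my_features.csv", io.StringIO("vtx00,0,1\nvtx01,1,2\n"))
print(table["vtx01"])  # {'my_features-2': '1', 'my_features-3': '2'}
```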
- add_features_from_dataframe(df)
Map the content of a Pandas dataframe to the vertices of the hypergraph. The column ‘vtx_id’ of the dataframe is expected to contain vertex IDs as strings.
Example
>>> from mesh2vec.mesh2vec_base import Mesh2VecBase
>>> import pandas as pd
>>> edges = {"first": ["a", "b", "c"], "second": ["x", "y"], "third": ["x", "a"]}
>>> hg = Mesh2VecBase(3, edges)
>>> df1 = pd.DataFrame({"vtx_id": ["a", "b", "c", "y"], "f1": [0, 1, 2.1, 4]})
>>> hg.add_features_from_dataframe(df1)
>>> hg._features["f1"].tolist()
[0.0, 1.0, 2.1, nan, 4.0]
- Parameters:
df (DataFrame) –
- Return type:
None
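The nan in the example appears because vertex "x" has no row in the dataframe while the feature column follows the hypergraph's vertex order (a, b, c, x, y). A minimal stdlib sketch of that alignment, assuming rows are matched by vertex ID and values are cast to float. align_feature is a hypothetical helper, not the library's code.

```python
import math
from typing import Dict, List


def align_feature(vtx_ids: List[str], rows: Dict[str, float]) -> List[float]:
    """Order feature values by vertex, using NaN for vertices without a row (hypothetical helper)."""
    # Values are cast to float, mirroring the float column shown in the example.
    return [float(rows[v]) if v in rows else math.nan for v in vtx_ids]


f1 = align_feature(["a", "b", "c", "x", "y"], {"a": 0, "b": 1, "c": 2.1, "y": 4})
print(f1)  # [0.0, 1.0, 2.1, nan, 4.0]
```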
- to_dataframe(vertices=None)
Returns a Pandas dataframe with all previously aggregated feature columns. If vertices is not None and iterable, the dataframe is only generated for the vertices in vertices.
- Parameters:
vertices (Optional[Iterable[str]]) –
- Return type:
DataFrame
- to_array(vertices=None)
Returns a numpy array with all previously aggregated feature columns. If vertices is not None and iterable, the array is only generated for the vertices in vertices.
- Parameters:
vertices (Optional[Iterable[str]]) –
- Return type:
ndarray
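The vertices filter of to_dataframe and to_array can be pictured as a row selection over the aggregated columns. This stdlib sketch uses a hypothetical helper, to_rows. Two choices in it are assumptions: rows follow the order of vertices when it is given, and columns keep their insertion order. The f1-mean-0 values equal each vertex's own value because \(N^0_v = \{v\}\).

```python
from typing import Dict, Iterable, List, Optional


def to_rows(
    vtx_ids: List[str],
    aggregated: Dict[str, List[float]],
    vertices: Optional[Iterable[str]] = None,
) -> List[List[float]]:
    """Return one row per vertex with one entry per aggregated column (hypothetical helper)."""
    position = {v: i for i, v in enumerate(vtx_ids)}
    # Assumption: rows follow the order of `vertices` when it is given.
    selected = vtx_ids if vertices is None else list(vertices)
    columns = list(aggregated)
    return [[aggregated[c][position[v]] for c in columns] for v in selected]


vtx = ["a", "b", "c", "x", "y"]
agg = {"f1-mean-0": [2.0, 4.0, 8.0, 16.0, 32.0], "f1-mean-1": [6.0, 5.0, 3.0, 32.0, 16.0]}
print(to_rows(vtx, agg, vertices=["x", "a"]))  # [[16.0, 32.0], [2.0, 6.0]]
```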
- get_max_distance()
Returns the maximum distance used to generate the hypergraph neighborhoods.
- Return type:
int
- available_features()
Returns a list of the names of all features.
- Return type:
List[str]
- available_aggregated_features()
Returns a list of the names of all aggregated features.
- Return type:
List[str]
- vtx_ids()
Returns a list of the IDs of all hypergraph vertices.
- Return type:
List[str]
- features()
Returns a Pandas dataframe with all feature columns.
- Return type:
DataFrame