gsnn.optim.MagnitudeEdgeKGE
Post-hoc Tier-0 edge inference via node2vec on the augmented function graph.
After training a GSNN, MagnitudeEdgeInferer accumulates activation/gradient
magnitude correlations. High-scoring non-edges are mined as inferred positives
and pooled with the kept-graph edges into a single augmented graph. A single
shared embedding table is learned by skip-gram with negative sampling on
random walks, so all nodes live in the same space - kept and inferred edges
contribute equally to neighborhood structure.
Held-out edges are scored by <emb[i], emb[j]>.
See tutorial 17 and docs/notes/edge_inference_notes.md section 4.
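The objective and scoring rule described above can be sketched in a few lines. This is a minimal illustration in pure Python, not the module's implementation: the helper names (`log_sigmoid`, `sgns_loss`, `edge_score`) and the toy embedding table are assumptions made for the example. It shows the skip-gram negative-sampling loss for one (center, context) pair when center and context vectors come from the same table, and the dot-product score used for held-out edges.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sgns_loss(emb, center, context, negatives):
    """Negative-sampling loss for one positive pair, using one shared table."""
    pos = log_sigmoid(dot(emb[center], emb[context]))
    neg = sum(log_sigmoid(-dot(emb[center], emb[k])) for k in negatives)
    return -(pos + neg)


def edge_score(emb, i, j):
    """Score a candidate edge as the inner product <emb[i], emb[j]>."""
    return dot(emb[i], emb[j])


# Toy table (hypothetical sizes): 5 nodes, 4 dimensions.
rng = random.Random(0)
emb = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(5)]
loss = sgns_loss(emb, center=0, context=1, negatives=[3, 4])
score = edge_score(emb, 0, 1)
```

Because a single table is shared, the inner-product score in this sketch is symmetric in `i` and `j`.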
Classes
MagnitudeEdgeInferer | Post-hoc inferrer for function -> function edges via activation/gradient magnitude correlation across adjacent layers. |
MagnitudeEdgeKGE | Post-hoc node2vec edge inferrer for function -> function edges. |
- class gsnn.optim.MagnitudeEdgeKGE.MagnitudeEdgeKGE(*args: Any, **kwargs: Any)[source]
Bases: Module
Post-hoc node2vec edge inferrer for function -> function edges.
Consumes a fitted MagnitudeEdgeInferer, mines inferred positive edges from its correlation scores, builds an augmented directed graph from (kept + inferred) edges, and trains a single node embedding table by skip-gram with negative sampling on random walks. Parameter count is O(N * d).
- Parameters:
mei (MagnitudeEdgeInferer) – Fitted inferrer with accumulated statistics (mei.n >= 3).
embedding_dim (int) – Embedding dimension.
score ({'corr', 'partial'}) – MEI score column used to mine inferred positives.
layer_agg ({'mean', 'max'}) – MEI layer aggregation for the score matrix.
mining_strategy ({'fdr', 'topk_per_target'}) – How to select inferred positives from the MEI score table.
fdr_alpha (float) – BH-FDR threshold when mining_strategy='fdr'.
top_k_per_target (int) – Top sources per target when mining_strategy='topk_per_target'.
exclude_edges (iterable of (src, dst)) – Held-out val/test edges to remove from inferred positives (anti-leakage).
walks_per_node (int) – Number of random walks starting from each node per epoch.
walk_length (int) – Length of each random walk (number of nodes).
window_size (int) – Skip-gram context window (sliding distance within a walk).
n_negatives (int) – Negative samples per positive (center, context) pair.
walk_undirected (bool) – If True, treat the augmented graph as undirected for walk traversal, so that walks rarely terminate at sink nodes and coverage improves. Skip-gram positives are still emitted symmetrically. Default True.
walk_corr_weighted (bool) – If True, transition probability P(j | i) along the walk is proportional to max(corr[i, j], 0) ** walk_alpha rather than uniform over neighbors. This brings back MEI's continuous signal that the binary mining step otherwise discards. Default True.
walk_alpha (float) – Power applied to max(corr, 0) before normalizing into transition probabilities. walk_alpha=1.0 is linear; larger values concentrate walks on high-correlation edges; smaller values flatten toward uniform. Default 1.0.
kept_edge_weight (float or None) – Walk weight assigned to kept (true) function-function edges. If None, defaults to the maximum inferred-edge weight, so kept edges are at least as likely to be traversed as the strongest inferred edge.
lr (float) – Optimizer settings.
weight_decay (float) – Optimizer settings.
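The walk-related parameters above can be illustrated with a small sketch. It is not the class's code: the function names, the dict-of-dicts adjacency format, and the uniform fallback when every clipped weight is zero are all assumptions made for the example, and walk_alpha is assumed to be positive. It shows transitions proportional to max(corr[i, j], 0) ** walk_alpha, a walk of walk_length nodes, and (center, context) pairs taken within window_size steps.

```python
import random


def transition_probs(weights, walk_alpha=1.0):
    """Map {neighbor: corr} to {neighbor: P(neighbor | current node)}."""
    powered = {j: max(w, 0.0) ** walk_alpha for j, w in weights.items()}
    total = sum(powered.values())
    if total == 0.0:
        # Assumption for this sketch: fall back to uniform over neighbors.
        return {j: 1.0 / len(powered) for j in powered}
    return {j: p / total for j, p in powered.items()}


def random_walk(adj, start, walk_length, walk_alpha=1.0, rng=None):
    """Sample a walk of up to walk_length nodes; stops early at a sink."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adj.get(walk[-1], {})
        if not neighbors:
            break
        probs = transition_probs(neighbors, walk_alpha)
        nodes = list(probs)
        step = rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]
        walk.append(step)
    return walk


def skipgram_pairs(walk, window_size):
    """Emit (center, context) pairs within window_size positions of each other."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window_size)
        hi = min(len(walk), i + window_size + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Toy undirected graph with hypothetical correlation weights.
adj = {"a": {"b": 0.8, "c": 0.2}, "b": {"a": 0.8}, "c": {"a": 0.2}}
walk = random_walk(adj, "a", walk_length=6, walk_alpha=1.0, rng=random.Random(1))
pairs = skipgram_pairs(walk, window_size=2)
```

In this sketch, raising walk_alpha from 1.0 to 2.0 moves the probability of stepping from "a" to "b" from 0.8 to about 0.94, which is the concentration effect the walk_alpha entry describes.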
- evaluate(*, exclude_self: bool = True) pandas.DataFrame[source]
Build edge score DataFrame from node embeddings.
Columns:
src_func, dst_func, src_idx, dst_idx, score, has_edge.
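As a rough picture of what such a table could contain, the sketch below builds one row per ordered node pair with the listed columns. It is an illustration under assumptions, not the method's code: the helper name `edge_score_rows`, the all-pairs enumeration, and the reading of has_edge as membership in the kept graph are guesses made for the example. The rows are plain dicts, which could be passed to pandas.DataFrame if desired.

```python
def edge_score_rows(names, emb, kept_edges, exclude_self=True):
    """Return one dict per ordered (src, dst) pair, scored by inner product."""
    rows = []
    for i, src in enumerate(names):
        for j, dst in enumerate(names):
            if exclude_self and i == j:
                continue  # skip self-pairs when exclude_self is set
            rows.append({
                "src_func": src,
                "dst_func": dst,
                "src_idx": i,
                "dst_idx": j,
                "score": sum(a * b for a, b in zip(emb[i], emb[j])),
                "has_edge": (src, dst) in kept_edges,  # assumed meaning
            })
    return rows


# Toy inputs (hypothetical function names and embeddings).
names = ["f1", "f2", "f3"]
emb = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
rows = edge_score_rows(names, emb, kept_edges={("f1", "f2")})
```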
- evaluate_against(positive_edges: set[tuple[str, str]] | list[tuple[str, str]], *, top_k: tuple[int, ...] = (1, 3, 5)) dict[str, float][source]
ROC-AUC and within-target ranking metrics on held-out edges.
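The exact metrics returned are not spelled out here, so the following is only one plausible reading: a pairwise ROC-AUC over all scored pairs, and a hits@k rate computed separately for each target. The function names, the row format, and the choice to skip targets without any held-out positive are assumptions made for this sketch.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hits_at_k(rows, positive_edges, k):
    """Share of targets whose k top-scored sources include a held-out positive."""
    by_target = {}
    for r in rows:
        by_target.setdefault(r["dst_func"], []).append(r)
    hits, n_targets = 0, 0
    for dst, cands in by_target.items():
        if not any((r["src_func"], dst) in positive_edges for r in cands):
            continue  # assumption: targets without positives are skipped
        n_targets += 1
        top = sorted(cands, key=lambda r: r["score"], reverse=True)[:k]
        hits += any((r["src_func"], dst) in positive_edges for r in top)
    return hits / n_targets if n_targets else float("nan")


# Toy scored pairs and held-out positives (hypothetical values).
rows = [
    {"src_func": "f1", "dst_func": "f3", "score": 0.9},
    {"src_func": "f2", "dst_func": "f3", "score": 0.1},
    {"src_func": "f1", "dst_func": "f2", "score": 0.2},
    {"src_func": "f3", "dst_func": "f2", "score": 0.7},
]
positives = {("f1", "f3"), ("f1", "f2")}
labels = [(r["src_func"], r["dst_func"]) in positives for r in rows]
auc = roc_auc([r["score"] for r in rows], labels)
hits1 = hits_at_k(rows, positives, k=1)
```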
- static evaluate_target_ranking(res: pandas.DataFrame, positive_edges: set[tuple[str, str]] | list[tuple[str, str]], score_col: str = 'score', top_k: tuple[int, ...] = (1, 3, 5)) tuple[pandas.DataFrame, dict[str, float]][source]
Delegate to MagnitudeEdgeInferer.evaluate_target_ranking.
- fit(n_epochs: int = 100, batch_size: int = 2048, validation_edges: collections.abc.Iterable[tuple[str, str]] | None = None, verbose: bool = True, seed: int = 0) dict[str, list[float]][source]
Train embeddings via skip-gram with negative sampling.
Walks are regenerated each epoch.
- Returns:
history – train_loss per epoch, optional val_auc per epoch.
- Return type: