gsnn.optim.MagnitudeEdgeKGE

Post-hoc Tier-0 edge inference via node2vec on the augmented function graph.

After training a GSNN, MagnitudeEdgeInferer accumulates activation/gradient magnitude correlations. High-scoring non-edges are mined as inferred positives and pooled with the kept-graph edges into a single augmented graph. A single shared embedding table is learned by skip-gram with negative sampling on random walks, so all nodes live in the same space, and kept and inferred edges contribute equally to neighborhood structure.
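As a rough illustration of how random walks become skip-gram training pairs, the sketch below slides a window over one walk and emits (center, context) pairs in both directions. The helper name `skipgram_pairs` is hypothetical and not part of gsnn; the symmetric window is an assumption about how such pairs are typically formed.

```python
def skipgram_pairs(walk, window_size):
    """Emit (center, context) pairs for every node within window_size steps."""
    pairs = []
    for pos, center in enumerate(walk):
        lo = max(0, pos - window_size)
        hi = min(len(walk), pos + window_size + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos:  # a node is not its own context
                pairs.append((center, walk[ctx_pos]))
    return pairs


# One walk over node indices, with a context window of one step on each side.
walk = [0, 2, 5, 1]
pairs = skipgram_pairs(walk, window_size=1)
```

Each interior node contributes two pairs and each endpoint one, so longer walks and wider windows increase the number of positives per epoch.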

Held-out edges are scored by the inner product <emb[i], emb[j]>.
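A minimal sketch of this scoring rule, assuming the embedding table is available as an (N, d) NumPy array. The array `emb` and the helper `edge_score` are illustrative names, not gsnn API.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))  # hypothetical table: 5 nodes, 8 dimensions


def edge_score(emb, i, j):
    """Score the pair (i, j) by the inner product of the two embeddings."""
    return float(emb[i] @ emb[j])


# All pairs at once: entry (i, j) holds <emb[i], emb[j]>.
scores = emb @ emb.T
```

Because both endpoints are read from the same table, the inner product is symmetric in this sketch: the score of (i, j) equals the score of (j, i).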

See tutorial 17 and docs/notes/edge_inference_notes.md section 4.

Classes

`MagnitudeEdgeInferer`(model, data[, ...])	Post-hoc inferrer for function -> function edges via activation/gradient magnitude correlation across adjacent layers.
`MagnitudeEdgeKGE`(*args, **kwargs)	Post-hoc node2vec edge inferrer for function -> function edges.

class gsnn.optim.MagnitudeEdgeKGE.MagnitudeEdgeKGE(*args: Any, **kwargs: Any)[source]

Bases: Module

Post-hoc node2vec edge inferrer for function -> function edges.

Consumes a fitted MagnitudeEdgeInferer, mines inferred positive edges from its correlation scores, builds an augmented directed graph from (kept + inferred) edges, and trains a single node embedding table by skip-gram with negative sampling on random walks. Parameter count is O(N * d).
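The 'fdr' mining strategy listed under Parameters refers to a BH-FDR threshold. The sketch below shows the standard Benjamini-Hochberg step-up rule on a vector of p-values. How gsnn derives p-values from the correlation scores is not documented here, so the input `pvals` and the helper `bh_select` are assumptions made for illustration only.

```python
import numpy as np


def bh_select(pvals, alpha=0.05):
    """Boolean mask of hypotheses kept by the Benjamini-Hochberg procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value is compared against alpha * k / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        # Step-up: keep everything up to the largest rank that passes.
        cutoff = np.nonzero(passed)[0].max()
        keep[order[: cutoff + 1]] = True
    return keep


pvals = [0.001, 0.03, 0.035, 0.2]
keep = bh_select(pvals, alpha=0.05)
```

In this example the second p-value fails its own threshold but is still kept, because a larger p-value further down the ranking passes; this is the step-up behavior of the procedure.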

Parameters:

mei (MagnitudeEdgeInferer) – Fitted inferrer with accumulated statistics (mei.n >= 3).
embedding_dim (int) – Embedding dimension.
score ({'corr', 'partial'}) – MEI score column used to mine inferred positives.
layer_agg ({'mean', 'max'}) – MEI layer aggregation for the score matrix.
mining_strategy ({'fdr', 'topk_per_target'}) – How to select inferred positives from the MEI score table.
fdr_alpha (float) – BH-FDR threshold when mining_strategy='fdr'.
top_k_per_target (int) – Top sources per target when mining_strategy='topk_per_target'.
exclude_edges (iterable of (src, dst)) – Held-out val/test edges to remove from inferred positives (anti-leakage).
walks_per_node (int) – Number of random walks starting from each node per epoch.
walk_length (int) – Length of each random walk (number of nodes).
window_size (int) – Skip-gram context window (sliding distance within a walk).
n_negatives (int) – Negative samples per positive (center, context) pair.
walk_undirected (bool) – If True, treat the augmented graph as undirected for walk traversal. Walks rarely die at sinks, so coverage is better. Skip-gram positives are still emitted symmetrically. Default True.
walk_corr_weighted (bool) – If True, transition probability P(j | i) along the walk is proportional to max(corr[i, j], 0) ** walk_alpha rather than uniform over neighbors. This restores MEI’s continuous signal, which the binary mining step otherwise discards. Default True.
walk_alpha (float) – Power applied to max(corr, 0) before normalizing into transition probabilities. alpha=1.0 is linear; larger values concentrate walks on high-correlation edges; smaller values flatten toward uniform. Default 1.0.
kept_edge_weight (float or None) – Walk weight assigned to kept (true) function-function edges. If None, defaults to the maximum inferred-edge weight, so kept edges are at least as likely to be traversed as the strongest inferred edge.
lr (float) – Optimizer learning rate.
weight_decay (float) – Optimizer weight decay.
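To make the effect of walk_corr_weighted and walk_alpha concrete, the sketch below computes transition probabilities for one node from a row of correlation scores. The helper `transition_probs` is hypothetical, and the uniform fallback when no neighbor has positive weight is an assumption rather than documented behavior.

```python
import numpy as np


def transition_probs(corr_row, neighbors, walk_alpha=1.0):
    """P(j | i) proportional to max(corr[i, j], 0) ** walk_alpha over neighbors."""
    weights = np.maximum(corr_row[neighbors], 0.0) ** walk_alpha
    total = weights.sum()
    if total <= 0:
        # Assumption: fall back to uniform if every neighbor has zero weight.
        return np.full(len(neighbors), 1.0 / len(neighbors))
    return weights / total


corr_row = np.array([0.0, 0.8, 0.2, -0.5])  # correlations from node 0
neighbors = np.array([1, 2, 3])

p_linear = transition_probs(corr_row, neighbors, walk_alpha=1.0)
p_sharp = transition_probs(corr_row, neighbors, walk_alpha=2.0)
p_flat = transition_probs(corr_row, neighbors, walk_alpha=0.5)
```

Negative correlations are clipped to zero, so node 3 is never visited from node 0 here. Raising walk_alpha shifts mass toward the strongest neighbor, and lowering it moves the positive-weight neighbors closer to uniform.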

evaluate(*, exclude_self: bool = True) → pandas.DataFrame[source]

Build edge score DataFrame from node embeddings.

Columns: src_func, dst_func, src_idx, dst_idx, score, has_edge.
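A table with these columns could be assembled from an embedding array roughly as follows. This is a sketch, not the library's implementation: `edge_table`, `names`, and `edges` are hypothetical, and treating has_edge as membership in a given edge list is an assumption.

```python
import numpy as np
import pandas as pd


def edge_table(emb, names, edges, exclude_self=True):
    """One row per ordered (src, dst) pair, scored by inner product."""
    n = len(names)
    scores = emb @ emb.T
    src_idx, dst_idx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    src_idx, dst_idx = src_idx.ravel(), dst_idx.ravel()
    name_arr = np.asarray(names)
    df = pd.DataFrame({
        "src_func": name_arr[src_idx],
        "dst_func": name_arr[dst_idx],
        "src_idx": src_idx,
        "dst_idx": dst_idx,
        "score": scores.ravel(),  # row-major order matches the index grids
    })
    edge_set = set(edges)
    df["has_edge"] = [(s, d) in edge_set
                      for s, d in zip(src_idx.tolist(), dst_idx.tolist())]
    if exclude_self:
        df = df[df["src_idx"] != df["dst_idx"]].reset_index(drop=True)
    return df


emb = np.eye(3)
df = edge_table(emb, ["f0", "f1", "f2"], edges=[(0, 1)])
```

With exclude_self=True the N self-pairs are dropped, leaving N * (N - 1) rows.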

evaluate_against(positive_edges: set[tuple[str, str]] | list[tuple[str, str]], *, top_k: tuple[int, ...] = (1, 3, 5)) → dict[str, float][source]

ROC-AUC and within-target ranking metrics on held-out edges.
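For orientation, the sketch below computes a pairwise ROC-AUC and a simple hits@k per target from a score matrix. The exact metrics and dictionary keys returned by evaluate_against are not specified on this page, so `roc_auc`, `hits_at_k`, and the index-based edge format are illustrative assumptions.

```python
import numpy as np


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())  # ties count half


def hits_at_k(score_matrix, positives, k):
    """Fraction of targets with a true source among their top-k sources."""
    targets = sorted({dst for _, dst in positives})
    hits = 0
    for dst in targets:
        col = score_matrix[:, dst].astype(float)
        col[dst] = -np.inf  # ignore the self-pair
        top = set(np.argsort(-col)[:k].tolist())
        hits += any(src in top for src, d in positives if d == dst)
    return hits / len(targets)


S = np.array([[0.0, 0.9, 0.1],
              [0.2, 0.0, 0.8],
              [0.7, 0.3, 0.0]])
positives = [(0, 1), (0, 2)]  # held-out (src, dst) index pairs

labels = np.zeros_like(S, dtype=bool)
for s, d in positives:
    labels[s, d] = True
off_diag = ~np.eye(3, dtype=bool)

auc = roc_auc(S[off_diag], labels[off_diag])
h1 = hits_at_k(S, positives, k=1)
h2 = hits_at_k(S, positives, k=2)
```

ROC-AUC summarizes the global ordering of all candidate pairs, while the per-target metric asks whether the true source ranks near the top for each target separately.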

static evaluate_target_ranking(res: pandas.DataFrame, positive_edges: set[tuple[str, str]] | list[tuple[str, str]], score_col: str = 'score', top_k: tuple[int, ...] = (1, 3, 5)) → tuple[pandas.DataFrame, dict[str, float]][source]

Delegate to MagnitudeEdgeInferer.evaluate_target_ranking.

fit(n_epochs: int = 100, batch_size: int = 2048, validation_edges: collections.abc.Iterable[tuple[str, str]] | None = None, verbose: bool = True, seed: int = 0) → dict[str, list[float]][source]

Train embeddings via skip-gram with negative sampling.

Walks are regenerated each epoch.

Returns:

history – train_loss per epoch, optional val_auc per epoch.

Return type:

dict
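The objective being minimized can be sketched in NumPy as the usual skip-gram negative-sampling loss over a single shared table. This is an illustration under the assumption that the standard formulation is used; `sgns_loss` and its argument layout are hypothetical, and the actual training loop runs on the module's own parameters.

```python
import numpy as np


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)


def sgns_loss(emb, centers, contexts, negatives):
    """Mean loss over a batch of (center, context) pairs with K negatives each."""
    centers = np.asarray(centers)
    contexts = np.asarray(contexts)
    negatives = np.asarray(negatives)                    # shape (B, K)
    c = emb[centers]                                     # (B, d)
    pos = np.sum(c * emb[contexts], axis=1)              # (B,)
    neg = np.einsum("bd,bkd->bk", c, emb[negatives])     # (B, K)
    # Pull true contexts together, push sampled negatives apart.
    per_pair = -log_sigmoid(pos) - log_sigmoid(-neg).sum(axis=1)
    return float(per_pair.mean())


# With an all-zero table every dot product is 0, so each of the
# 1 + K terms contributes log(2).
loss_zero = sgns_loss(np.zeros((4, 3)), [0, 1], [1, 2], [[2, 3], [0, 3]])
```

The per-epoch train_loss in the returned history would be a value of this kind, averaged over the batches of that epoch.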

load_best() → None[source]

Restore weights from the best validation checkpoint.

maybe_save_best(metric: float, mode: str = 'max') → bool[source]

Save state_dict if metric improves over the previous best.
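The checkpointing pattern described here can be sketched without any framework. The class `BestTracker` below is hypothetical and takes the state explicitly, unlike the documented method, which reads the module's own state_dict; the 'min' mode is assumed to be the counterpart of the documented default 'max'.

```python
import copy


class BestTracker:
    """Keep a copy of the state that achieved the best metric so far."""

    def __init__(self):
        self.best_metric = None
        self.best_state = None

    def maybe_save_best(self, state, metric, mode="max"):
        if mode not in ("max", "min"):
            raise ValueError(f"unknown mode: {mode!r}")
        improved = (
            self.best_metric is None
            or (mode == "max" and metric > self.best_metric)
            or (mode == "min" and metric < self.best_metric)
        )
        if improved:
            self.best_metric = metric
            # Deep copy so later updates to `state` do not alter the snapshot.
            self.best_state = copy.deepcopy(state)
        return improved


tracker = BestTracker()
results = [
    tracker.maybe_save_best({"w": 1}, 0.70),
    tracker.maybe_save_best({"w": 2}, 0.65),
    tracker.maybe_save_best({"w": 3}, 0.80),
]
```

The returned flag makes it easy to log when a new best is reached, and restoring `best_state` afterwards plays the role of load_best.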

score_matrix() → numpy.ndarray[source]

Return (N, N) edge score matrix from node embeddings.