gsnn.optim.MagnitudeEdgeKGE

Post-hoc Tier-0 edge inference via node2vec on the augmented function graph.

After training a GSNN, MagnitudeEdgeInferer accumulates activation/gradient magnitude correlations. High-scoring non-edges are mined as inferred positives and pooled with the kept-graph edges into a single augmented graph. A single shared embedding table is then learned by skip-gram with negative sampling on random walks over that graph, so all nodes live in the same space, and kept and inferred edges contribute equally to neighborhood structure.
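As an illustration of the skip-gram step, the sketch below extracts positive (center, context) pairs from one random walk. The helper name skipgram_pairs and the integer node ids are hypothetical and are not part of gsnn; negative sampling and the loss are omitted.

```python
def skipgram_pairs(walk, window_size):
    """Return (center, context) pairs at most `window_size` steps apart in `walk`."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window_size)
        hi = min(len(walk), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # a node is not its own context
                pairs.append((center, walk[j]))
    return pairs


# One hypothetical walk over integer node ids.
walk = [0, 3, 1, 4]
pairs = skipgram_pairs(walk, window_size=1)
```

Because the window extends in both directions, every pair appears in both orders, for example (0, 3) and (3, 0).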

Held-out edges are scored by the inner product <emb[i], emb[j]>.
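A minimal NumPy sketch of this scoring rule, assuming the embeddings are available as an (N, d) array. The array emb here is random placeholder data, not output of the library.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))  # placeholder (N, d) embedding table

# scores[i, j] = <emb[i], emb[j]> for every ordered pair of nodes.
scores = emb @ emb.T

# Score of a single candidate edge (i=1, j=2).
edge_score = float(np.dot(emb[1], emb[2]))
```

Taken literally, an inner product over one shared table is symmetric, so scores[i, j] equals scores[j, i].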

See tutorial 17 and docs/notes/edge_inference_notes.md section 4.

Classes

MagnitudeEdgeInferer(model, data[, ...])

Post-hoc inferrer for function -> function edges via activation/gradient magnitude correlation across adjacent layers.

MagnitudeEdgeKGE(*args, **kwargs)

Post-hoc node2vec edge inferrer for function -> function edges.

class gsnn.optim.MagnitudeEdgeKGE.MagnitudeEdgeKGE(*args: Any, **kwargs: Any)[source]

Bases: Module

Post-hoc node2vec edge inferrer for function -> function edges.

Consumes a fitted MagnitudeEdgeInferer, mines inferred positive edges from its correlation scores, builds an augmented directed graph from (kept + inferred) edges, and trains a single node embedding table by skip-gram with negative sampling on random walks. Parameter count is O(N * d).
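The mining and pooling steps might look roughly like the sketch below, which uses a per-target top-k rule. Everything in it is an assumption made for illustration: the helper name mine_topk_per_target, the score[src, dst] index convention, and the use of integer node ids instead of function names.

```python
import numpy as np


def mine_topk_per_target(score, kept_edges, exclude_edges, top_k):
    """Pick the `top_k` best-scoring candidate sources for each target node.

    Kept edges (already in the graph) and excluded held-out edges are never
    returned, and self-loops are skipped.
    """
    n = score.shape[0]
    blocked = set(kept_edges) | set(exclude_edges)
    inferred = []
    for dst in range(n):
        candidates = [
            (score[src, dst], src)
            for src in range(n)
            if src != dst and (src, dst) not in blocked
        ]
        candidates.sort(reverse=True)  # highest score first
        inferred.extend((src, dst) for _, src in candidates[:top_k])
    return inferred


# Toy score matrix, assumed to be indexed as score[src, dst].
score = np.array([
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.8],
    [0.7, 0.3, 0.0],
])
kept = [(0, 1)]     # edges already present in the kept graph
held_out = [(1, 2)]  # validation/test edges that must not leak

inferred = mine_topk_per_target(score, kept, held_out, top_k=1)
augmented = kept + inferred  # edge list of the augmented graph
```

In this toy case the held-out edge (1, 2) has the highest score for target 2 but is skipped, so the weaker candidate (0, 2) is mined instead.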

Parameters:
  • mei (MagnitudeEdgeInferer) – Fitted inferrer with accumulated statistics (mei.n >= 3).

  • embedding_dim (int) – Embedding dimension.

  • score ({'corr', 'partial'}) – MEI score column used to mine inferred positives.

  • layer_agg ({'mean', 'max'}) – MEI layer aggregation for the score matrix.

  • mining_strategy ({'fdr', 'topk_per_target'}) – How to select inferred positives from the MEI score table.

  • fdr_alpha (float) – BH-FDR threshold when mining_strategy='fdr'.

  • top_k_per_target (int) – Top sources per target when mining_strategy='topk_per_target'.

  • exclude_edges (iterable of (src, dst)) – Held-out val/test edges to remove from inferred positives (anti-leakage).

  • walks_per_node (int) – Number of random walks starting from each node per epoch.

  • walk_length (int) – Length of each random walk (number of nodes).

  • window_size (int) – Skip-gram context window (sliding distance within a walk).

  • n_negatives (int) – Negative samples per positive (center, context) pair.

  • walk_undirected (bool) – If True, treat the augmented graph as undirected for walk traversal, so that walks rarely die at sinks and node coverage improves. Skip-gram positives are still emitted symmetrically. Default True.

  • walk_corr_weighted (bool) – If True, transition probability P(j | i) along the walk is proportional to max(corr[i, j], 0) ** walk_alpha rather than uniform over neighbors. This restores MEI’s continuous signal, which the binary mining step otherwise discards. Default True.

  • walk_alpha (float) – Power applied to max(corr, 0) before normalizing into transition probabilities. alpha=1.0 is linear; larger values concentrate walks on high-correlation edges; smaller values flatten toward uniform. Default 1.0.

  • kept_edge_weight (float or None) – Walk weight assigned to kept (true) function-function edges. If None, defaults to the maximum inferred-edge weight, so kept edges are at least as likely to be traversed as the strongest inferred edge.

  • lr (float) – Optimizer learning rate.

  • weight_decay (float) – Optimizer weight decay.
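To make the interaction of walk_corr_weighted, walk_alpha and kept_edge_weight concrete, the sketch below computes P(j | i) for one node. The helpers edge_weights and transition_probs are hypothetical. The sketch assumes that kept edges receive kept_edge_weight directly (without the power) and that the None default is the maximum inferred weight after the power is applied; the actual implementation may differ.

```python
import numpy as np


def edge_weights(corr, inferred_edges, kept_edges, walk_alpha=1.0, kept_edge_weight=None):
    """Map each augmented-graph edge (i, j) to an unnormalized walk weight."""
    weights = {
        (i, j): max(corr[i, j], 0.0) ** walk_alpha  # clamp negatives to zero
        for i, j in inferred_edges
    }
    if kept_edge_weight is None:
        # Assumed default: the strongest inferred weight (1.0 if none exist).
        kept_edge_weight = max(weights.values(), default=1.0)
    for edge in kept_edges:
        weights[edge] = kept_edge_weight
    return weights


def transition_probs(weights, node):
    """Normalize the outgoing weights of `node` into P(j | node)."""
    out = {j: w for (i, j), w in weights.items() if i == node}
    total = sum(out.values())
    if total == 0:
        return {}  # no usable outgoing edge: the walk would stop here
    return {j: w / total for j, w in out.items()}


corr = np.zeros((4, 4))
corr[0, 1], corr[0, 2] = 0.2, 0.6
inferred = [(0, 1), (0, 2)]
kept = [(0, 3)]

linear = transition_probs(edge_weights(corr, inferred, kept, walk_alpha=1.0), node=0)
sharp = transition_probs(edge_weights(corr, inferred, kept, walk_alpha=2.0), node=0)
```

With walk_alpha=2.0 the weak edge (0, 1) gets a smaller share of the probability mass than with walk_alpha=1.0, while the kept edge (0, 3) stays tied with the strongest inferred edge (0, 2).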

evaluate(*, exclude_self: bool = True) pandas.DataFrame[source]

Build edge score DataFrame from node embeddings.

Columns: src_func, dst_func, src_idx, dst_idx, score, has_edge.

evaluate_against(positive_edges: set[tuple[str, str]] | list[tuple[str, str]], *, top_k: tuple[int, ...] = (1, 3, 5)) dict[str, float][source]

ROC-AUC and within-target ranking metrics on held-out edges.
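The exact within-target metrics are not spelled out here. One plausible reading, offered only as an assumption, is a hits@k measure: for each target that has a held-out edge, rank its candidate sources by score and check whether a held-out source appears in the top k. The helper hits_at_k and the metric key names below are hypothetical.

```python
from collections import defaultdict


def hits_at_k(scored_edges, positive_edges, top_k=(1, 3, 5)):
    """Fraction of targets with at least one held-out source ranked in the top k."""
    positives = set(positive_edges)
    by_target = defaultdict(list)
    for src, dst, score in scored_edges:
        by_target[dst].append((score, src))
    # Only targets that have a held-out positive and at least one scored candidate.
    targets = {dst for _, dst in positives if dst in by_target}
    metrics = {}
    for k in top_k:
        hits = 0
        for dst in targets:
            ranked = sorted(by_target[dst], reverse=True)[:k]
            hits += any((src, dst) in positives for _, src in ranked)
        metrics[f"hits@{k}"] = hits / len(targets) if targets else float("nan")
    return metrics


# Toy (src, dst, score) rows and held-out positives.
scored = [
    ("a", "c", 0.9), ("b", "c", 0.4),
    ("a", "d", 0.1), ("b", "d", 0.7), ("c", "d", 0.3),
]
held_out = {("b", "c"), ("b", "d")}
metrics = hits_at_k(scored, held_out, top_k=(1, 3))
```

In this toy case the held-out source of target "d" is ranked first, while that of target "c" is ranked second, so hits@1 is 0.5 and hits@3 is 1.0.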

static evaluate_target_ranking(res: pandas.DataFrame, positive_edges: set[tuple[str, str]] | list[tuple[str, str]], score_col: str = 'score', top_k: tuple[int, ...] = (1, 3, 5)) tuple[pandas.DataFrame, dict[str, float]][source]

Delegate to MagnitudeEdgeInferer.evaluate_target_ranking.

fit(n_epochs: int = 100, batch_size: int = 2048, validation_edges: collections.abc.Iterable[tuple[str, str]] | None = None, verbose: bool = True, seed: int = 0) dict[str, list[float]][source]

Train embeddings via skip-gram with negative sampling.

Walks are regenerated each epoch.

Returns:

history – train_loss per epoch, optional val_auc per epoch.

Return type:

dict

load_best() None[source]

Restore weights from the best validation checkpoint.

maybe_save_best(metric: float, mode: str = 'max') bool[source]

Save state_dict if metric improves over the previous best.
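The checkpointing behaviour described above can be sketched with a small standalone tracker. The class BestCheckpoint is hypothetical and only illustrates the "save if improved" rule under mode='max' or mode='min'; it takes the state as an explicit argument, whereas the real method presumably reads the module's own state_dict.

```python
import copy


class BestCheckpoint:
    """Keep a copy of the state that achieved the best metric so far."""

    def __init__(self):
        self.best_metric = None
        self.best_state = None

    def maybe_save_best(self, state, metric, mode="max"):
        if mode not in ("max", "min"):
            raise ValueError(f"mode must be 'max' or 'min', got {mode!r}")
        improved = (
            self.best_metric is None
            or (mode == "max" and metric > self.best_metric)
            or (mode == "min" and metric < self.best_metric)
        )
        if improved:
            self.best_metric = metric
            # Deep copy so later training steps cannot mutate the snapshot.
            self.best_state = copy.deepcopy(state)
        return improved


ckpt = BestCheckpoint()
results = [
    ckpt.maybe_save_best({"epoch": 1}, 0.70),
    ckpt.maybe_save_best({"epoch": 2}, 0.65),
    ckpt.maybe_save_best({"epoch": 3}, 0.80),
]
```

The deep copy matters when the state holds mutable objects that training keeps updating in place.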

score_matrix() numpy.ndarray[source]

Return (N, N) edge score matrix from node embeddings.