gsnn.gsnn.models.NodeAttention

class gsnn.gsnn.models.NodeAttention(*args: Any, **kwargs: Any)

Bases: Module

Node-wise channel attention.

The layer learns a single scalar attention coefficient (alpha_{b,n}) per node n for every sample b in the batch. The coefficient is obtained by first aggregating the (optionally weighted) hidden channels that belong to the node and then passing each aggregated score through a sigmoid gate. Gating is done independently per node, with no cross-node normalization, so the coefficients of one sample do not have to sum to 1. The resulting attention weights can be:

Interpreted - (alpha_{b,n}) tells how important node n was for the current forward pass.
Applied - the coefficients are broadcast back to the individual channels that originated from the node and multiplied with the original activations, producing an attention-modulated output.
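The aggregate, gate, and broadcast steps described above can be sketched in plain Python. This is an illustration only, not the layer's implementation: the helper name node_attention_sketch is hypothetical, the aggregation is assumed to be an unweighted mean (the real layer may use learned weights), the temperature is assumed to divide the score before the sigmoid, and dropout is omitted.

```python
import math


def node_attention_sketch(x, channel_groups, temperature=1.0):
    """Hypothetical, framework-free sketch of node-wise channel attention.

    x: list of rows, one per sample; each row has len(channel_groups) floats.
    channel_groups: channel_groups[c] is the node that channel c belongs to.
    Assumes every node index in range(n_nodes) owns at least one channel.
    """
    n_nodes = max(channel_groups) + 1
    counts = [channel_groups.count(n) for n in range(n_nodes)]
    out, alpha = [], []
    for row in x:
        # 1) Aggregate the channels of each node (assumption: unweighted mean).
        sums = [0.0] * n_nodes
        for c, value in enumerate(row):
            sums[channel_groups[c]] += value
        scores = [s / k for s, k in zip(sums, counts)]
        # 2) Independent sigmoid gate per node; no normalization across nodes.
        a = [1.0 / (1.0 + math.exp(-s / temperature)) for s in scores]
        # 3) Broadcast each node's coefficient back to its own channels.
        out.append([value * a[channel_groups[c]] for c, value in enumerate(row)])
        alpha.append(a)
    return out, alpha


x = [[1.0, 2.0, 3.0, -1.0, -2.0, -3.0]]  # batch of 1, 6 channels
out, alpha = node_attention_sketch(x, [0, 0, 0, 1, 1, 1])
```

Here out has the same shape as x and alpha holds one value in (0, 1) per node and sample, matching the shapes described for the layer.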

Parameters:

channel_groups (Sequence[int] or Tensor) – A 1-D list/array mapping global channel index → node index. Length equals the total number of hidden channels across all nodes.
dropout (float, optional (default=0.0)) – Dropout probability applied to the node-level attention weights.
temperature (float, optional (default=1.0)) – Temperature that scales the node scores before gating. Lower values produce sharper attention weights.

Examples

>>> # Suppose we have 2 nodes with 3 channels each (total 6 channels)
>>> ch_groups = [0, 0, 0, 1, 1, 1]
>>> attn = NodeAttention(ch_groups, dropout=0.1)
>>> x = torch.randn(8, 6)  # (batch=8, channels=6)
>>> out, alpha = attn(x, return_alpha=True)
>>> out.shape          # same shape as input
torch.Size([8, 6])
>>> alpha.shape        # one scalar per node
torch.Size([8, 2])

__init__(channel_groups, dropout: float = 0.0, temperature: float = 1.0, channels=16, edge_index=None, edge_weight=None)

Methods

`__init__`(channel_groups[, dropout, ...])
`forward`(x, *[, return_alpha])	Apply node attention.

forward(x: torch.Tensor, *, return_alpha: bool = False)

Apply node attention.

Parameters:

x (Tensor of shape (B, C)) – Input activations, where C equals the length of channel_groups and channel c belongs to node channel_groups[c].
return_alpha (bool, optional (default=False)) – If True, the method returns a tuple (out, alpha) where alpha is the attention matrix of shape (B, n_nodes).

Returns:

The attention-modulated activations (and, optionally, the node coefficients).

Return type:

Tensor or Tuple[Tensor, Tensor]
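One way to use the returned coefficients for interpretation is to average them over the batch and rank the nodes. The sketch below is a hypothetical helper (rank_nodes is not part of the library) that works on a nested list; with a tensor returned by forward, alpha.tolist() would presumably give a compatible input.

```python
def rank_nodes(alpha):
    """Return node indices sorted by batch-mean attention, highest first.

    alpha: nested list of shape (B, n_nodes), one row per sample.
    """
    n_nodes = len(alpha[0])
    # Average each node's coefficient over all samples in the batch.
    mean_alpha = [sum(row[n] for row in alpha) / len(alpha) for n in range(n_nodes)]
    # Sort node indices by their mean coefficient, largest first.
    order = sorted(range(n_nodes), key=lambda n: mean_alpha[n], reverse=True)
    return order, mean_alpha


# Illustrative values only, not output of the real layer.
alpha = [[0.9, 0.2, 0.5],
         [0.7, 0.4, 0.5]]
order, mean_alpha = rank_nodes(alpha)
```

Because the gates are not normalized across nodes, the mean coefficients are comparable as absolute gate strengths rather than as shares of a fixed budget.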