gsnn.proc.coarsen

Implements simple coarsening of a network.

Should have various algorithms implemented and organized.

Importantly, the coarsening methods must not change the connectivity of the network. For instance, the descendants and ancestors of a grouped node should be the same as those of any of the individual nodes that were grouped.

Valid node aggregations would be:

original:  A (input) -> B -> C -> D (output)
coarsened: A (input) -> BC (function) -> D (output)

Now I’m wondering whether this example is really valid. Arguably, if there are no other constraints, then we can’t tell B and C apart by structure, since they have identical descendants and ancestors.

original:
                /-> B --\
    (input) A             D (output)
                \-> C --/

coarsened: A (input) -> BC (function) -> D (output)
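As a rough illustration of the diamond example above, the following sketch merges B and C with networkx and checks that the grouped node keeps the same ancestors and descendants. Using `nx.contracted_nodes` and the label `"BC"` is an assumption made for this example, not necessarily how this module performs the merge.

```python
import networkx as nx

# Diamond from the example: A -> B -> D and A -> C -> D.
G = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])

# Merge C into B (the first node's label is kept), then rename it to "BC".
H = nx.contracted_nodes(G, "B", "C", self_loops=False)
H = nx.relabel_nodes(H, {"B": "BC"})

# The grouped node has the same ancestors/descendants as B and C had.
same_ancestors = nx.ancestors(H, "BC") == nx.ancestors(G, "B") == nx.ancestors(G, "C")
same_descendants = (
    nx.descendants(H, "BC") == nx.descendants(G, "B") == nx.descendants(G, "C")
)
coarse_edges = sorted(H.edges())
print(same_ancestors, same_descendants, coarse_edges)
```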

It’s also possible that there are “offshoots” that are valid paths but could be largely simplified.

A
|
V
B <-> C <-> D <-> E
|
V
C

In this case, C, D, and E lie on valid paths, but they form cycles that could be aggregated into B.
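One possible way to find such offshoot cycles is to look at strongly connected components, since nodes joined by bidirectional edges can all reach each other. The sketch below uses `nx.strongly_connected_components` and `nx.condensation`. This is a swapped-in technique for illustration, not something the module is stated to use, and the downstream node name `"OUT"` is hypothetical.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("A", "B"), ("B", "OUT")])  # "OUT" is a hypothetical downstream node

# Bidirectional offshoot chain: B <-> C <-> D <-> E.
for u, v in [("B", "C"), ("C", "D"), ("D", "E")]:
    G.add_edge(u, v)
    G.add_edge(v, u)

# Components with more than one node are candidate aggregations.
sccs = [set(c) for c in nx.strongly_connected_components(G) if len(c) > 1]

# The condensation collapses every component into a single node of a DAG.
dag = nx.condensation(G)
groups = {i: sorted(data["members"]) for i, data in dag.nodes(data=True)}
print(sccs, groups)
```

Here B, C, D, and E end up in one component, so the condensed graph has three nodes: A, the merged group, and the downstream node.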

To generalize these cases, let’s try to define what a valid aggregation is.

Let I be all input nodes in the network and O be all output nodes. A set of nodes N is a valid aggregation if, for every node i in N:

ancestors(i) ∩ I = ancestors(N) ∩ I
descendants(i) ∩ O = descendants(N) ∩ O

Call this IO equivalence:

- Input ancestor equivalence
- Output descendant equivalence
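The condition can be sketched directly with networkx. The helper name `is_io_equivalent_group` is hypothetical, and the sketch assumes that ancestors(N) and descendants(N) mean the union over the members of N, which the notes above do not spell out.

```python
import networkx as nx

def is_io_equivalent_group(G, group, input_nodes, output_nodes):
    """Hypothetical helper: True if every node in `group` is IO equivalent."""
    group, I, O = set(group), set(input_nodes), set(output_nodes)
    # Assumption: ancestors(N) / descendants(N) are unions over members of N.
    anc_N = set().union(*(nx.ancestors(G, n) for n in group)) & I
    desc_N = set().union(*(nx.descendants(G, n) for n in group)) & O
    return all(
        (nx.ancestors(G, n) & I) == anc_N and (nx.descendants(G, n) & O) == desc_N
        for n in group
    )

# Diamond: B and C share the same input ancestor (A) and output descendant (D).
diamond = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])
ok = is_io_equivalent_group(diamond, ["B", "C"], ["A"], ["D"])

# Two separate inputs: B is fed by A1 only and C by A2 only, so they differ.
split = nx.DiGraph([("A1", "B"), ("A2", "C"), ("B", "D"), ("C", "D")])
not_ok = is_io_equivalent_group(split, ["B", "C"], ["A1", "A2"], ["D"])
print(ok, not_ok)
```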

Arguably, if any two nodes are IO equivalent, then we can’t distinguish them by structure alone. However, there might be an argument that path lengths should affect IO equivalence: e.g., A and B are IO equivalent, but A has a path of length 1 to I0 while B has a path of length 20 to I0.
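A small sketch of the path-length concern: the two nodes below are IO equivalent under the set-based definition, yet sit at different distances from the input. The intermediate node names (`x1`, `x2`, `x3`) and the output name `O0` are made up for the example, and a path of length 4 stands in for the longer path mentioned above.

```python
import networkx as nx

# I0 feeds A directly and reaches B only through a longer chain; both feed O0.
G = nx.DiGraph()
G.add_edge("I0", "A")
nx.add_path(G, ["I0", "x1", "x2", "x3", "B"])
G.add_edges_from([("A", "O0"), ("B", "O0")])

I, O = {"I0"}, {"O0"}
same_inputs = (nx.ancestors(G, "A") & I) == (nx.ancestors(G, "B") & I)
same_outputs = (nx.descendants(G, "A") & O) == (nx.descendants(G, "B") & O)

# Shortest-path distance from the input separates the two nodes.
dist = {n: nx.shortest_path_length(G, "I0", n) for n in ("A", "B")}
print(same_inputs, same_outputs, dist)
```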

We could use a diffusion process to capture path lengths and structure.

This would need to be done with one diffusion channel per input/output node.

Functions

`diff_equivalence`(G, sources, nodes[, iters])	Compute diffusion-based relational equivalence scores.
`diff_io_equivalence`(G, input_nodes, ...[, iters])	Placeholder for future implementation of combined diffusion-based input/output equivalence.
`io_equivalence`(G, input_nodes, ...)

gsnn.proc.coarsen.diff_equivalence(G, sources, nodes, iters: int = 10)[source]

Compute diffusion-based relational equivalence scores.

A channel is defined for every node contained in sources. Diffusion is performed independently inside each channel, meaning that information from one source never influences another source. Concretely, we initialise a one-hot vector for every source, giving a feature matrix of shape \([N, |sources|]\), where N is the number of nodes in G. At every iteration the feature matrix is propagated along outgoing edges using a row-normalised transition matrix. After iters iterations we return the diffusion values for the queried nodes.

Parameters:

G (networkx.DiGraph) – The graph on which diffusion is performed.
sources (list) – A list of node identifiers that act as sources/channels.
nodes (list) – Nodes (subset of G) for which diffusion scores are returned.
iters (int, optional (default=10)) – Number of synchronous diffusion steps.

Returns:

Array of shape (len(nodes), len(sources)) containing diffusion scores. scores[i, j] corresponds to the amount of signal that reached nodes[i] from sources[j] after iters steps.

Return type:

numpy.ndarray
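The description above can be approximated with the minimal sketch below. It is not the module's source: the function name `diffusion_sketch` is hypothetical, and the handling of nodes without outgoing edges (their signal is simply dropped) is an assumption, since the documentation does not say how sinks are treated.

```python
import networkx as nx
import numpy as np

def diffusion_sketch(G, sources, nodes, iters=10):
    """Hypothetical re-creation of per-source diffusion on a DiGraph."""
    order = list(G.nodes())
    idx = {n: i for i, n in enumerate(order)}

    # A[u, v] is the weight of edge u -> v; normalise each row by out-degree.
    A = nx.to_numpy_array(G, nodelist=order)
    out_deg = A.sum(axis=1, keepdims=True)
    # Assumption: rows of nodes with no outgoing edges stay all-zero.
    P = np.divide(A, out_deg, out=np.zeros_like(A), where=out_deg > 0)

    # One one-hot channel per source: shape [N, |sources|].
    X = np.zeros((len(order), len(sources)))
    for j, s in enumerate(sources):
        X[idx[s], j] = 1.0

    # Each step pushes every channel's signal along outgoing edges.
    for _ in range(iters):
        X = P.T @ X

    return X[[idx[n] for n in nodes]]

G = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])
one_step = diffusion_sketch(G, ["A"], ["B", "C", "D"], iters=1)
two_steps = diffusion_sketch(G, ["A"], ["B", "C", "D"], iters=2)
print(one_step.ravel(), two_steps.ravel())
```

On this diamond graph, one step splits the signal from A evenly between B and C, and a second step moves all of it to D. Because the columns never mix, each column can be read as an independent channel.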

gsnn.proc.coarsen.diff_io_equivalence(G, input_nodes, function_nodes, output_nodes, iters: int = 10)[source]

Placeholder for future implementation of combined diffusion-based input/output equivalence. Currently unimplemented.

This function remains as a stub so that coarsen.py can be imported without raising syntax errors. It will be completed in a future commit.

gsnn.proc.coarsen.io_equivalence(G, input_nodes, function_nodes, output_nodes)[source]