gsnn.gsnn.proc.subset
Functions
bfs_distance | Perform breadth-first search from a start node and return shortest path distances.
build_nx | Build a NetworkX directed graph from function interactions, drug targets, and outputs.
get_all_possible_paths_set | Compute shortest path lengths from root to leaf that pass through each node.
subset_graph | Subset a graph to include only nodes that lie on paths from roots to leaves.
Classes
deque | deque([iterable[, maxlen]]) --> deque object
- gsnn.gsnn.proc.subset.bfs_distance(G, start_node, depth, node_names)
Perform breadth-first search from a start node and return shortest path distances.
This function computes the shortest path distances from a starting node to all reachable nodes within a specified maximum depth. The distances are returned as a numpy array where unreachable nodes are marked with infinity.
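- Parameters:
G (networkx.DiGraph) – The directed graph to search.
start_node – The node from which the search starts.
depth (int) – The maximum depth to explore.
node_names (list) – List of node names; its order defines the order of the returned array.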
- Returns:
An array of shape [len(node_names)] where each element is the shortest path distance from the start node to the corresponding node. Unreachable nodes are marked with np.inf.
- Return type:
numpy.ndarray
Example
>>> import networkx as nx
>>> G = nx.DiGraph()
>>> G.add_edges_from([(0, 1), (1, 2), (0, 3), (3, 2)])
>>> node_names = [0, 1, 2, 3]
>>> distances = bfs_distance(G, 0, 2, node_names)
>>> print(distances)  # [0, 1, 2, 1]
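The depth-limited search described above can be sketched with the standard library alone. This is an illustrative stand-in, not the gsnn implementation: `bfs_distance_sketch` is a hypothetical name, the graph is a plain adjacency dict rather than a `networkx.DiGraph`, the result is a list rather than a `numpy.ndarray`, and treating `depth` as inclusive (nodes at exactly `depth` hops are reported) is an assumption.

```python
from collections import deque
import math


def bfs_distance_sketch(adj, start_node, depth, node_names):
    """Depth-limited BFS over an adjacency dict (hypothetical helper)."""
    dist = {start_node: 0}
    queue = deque([start_node])
    while queue:
        node = queue.popleft()
        if dist[node] >= depth:
            continue  # nodes at the depth limit are kept but not expanded
        for nbr in adj.get(node, ()):
            if nbr not in dist:  # first visit in BFS is the shortest distance
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    # Nodes never visited are reported as unreachable (infinity).
    return [dist.get(name, math.inf) for name in node_names]


# Same edges as the example above: 0->1, 1->2, 0->3, 3->2
adj = {0: [1, 3], 1: [2], 3: [2]}
distances = bfs_distance_sketch(adj, 0, 2, [0, 1, 2, 3])
shallow = bfs_distance_sketch(adj, 0, 1, [0, 1, 2, 3])
print(distances)  # [0, 1, 2, 1]
print(shallow)    # [0, 1, inf, 1]
```

With `depth=1` node 2 lies beyond the limit, so it is reported as unreachable even though a longer path to it exists.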
- gsnn.gsnn.proc.subset.build_nx(func_df, targets, outputs)
Build a NetworkX directed graph from function interactions, drug targets, and outputs.
This function constructs a heterogeneous directed graph with three types of nodes:
- Drug nodes (prefixed with ‘DRUG__’)
- Function/Protein nodes (prefixed with ‘PROTEIN__’)
- RNA/Output nodes (prefixed with ‘RNA__’ and ‘LINCS__’)
The graph represents a biological signaling network where drugs target proteins, proteins interact with each other, and proteins regulate RNA outputs.
- Parameters:
func_df (pandas.DataFrame) – DataFrame containing protein-protein interactions. Must have columns ‘source’ and ‘target’ representing interacting proteins.
targets (pandas.DataFrame) – DataFrame containing drug-target interactions. Must have columns ‘pert_id’ (drug identifier) and ‘target’ (protein target).
outputs (list) – List of RNA/gene identifiers that represent the outputs.
- Returns:
A directed graph representing the biological network with drug-protein, protein-protein, and protein-RNA interactions.
- Return type:
networkx.DiGraph
Example
>>> import pandas as pd
>>> func_df = pd.DataFrame({
...     'source': ['PROTEIN__A', 'PROTEIN__B'],
...     'target': ['PROTEIN__B', 'PROTEIN__C']
... })
>>> targets = pd.DataFrame({
...     'pert_id': ['drug1', 'drug2'],
...     'target': ['A', 'B']
... })
>>> outputs = ['gene1', 'gene2']
>>> G = build_nx(func_df, targets, outputs)
>>> print(list(G.nodes()))  # ['DRUG__drug1', 'DRUG__drug2', 'PROTEIN__A', ...]
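Because node types are encoded in name prefixes, downstream code can recover a node's type from its name. The sketch below shows one way to do that. The prefixes come from the description above; the helper names (`node_type`, `group_by_type`), the type labels, and the sample node names are hypothetical and not part of gsnn.

```python
# Prefixes are taken from the build_nx description; the labels are illustrative.
PREFIX_TO_TYPE = {
    "DRUG__": "drug",
    "PROTEIN__": "function",
    "RNA__": "output",
    "LINCS__": "output",
}


def node_type(name):
    """Return the node type implied by the name prefix (hypothetical helper)."""
    for prefix, kind in PREFIX_TO_TYPE.items():
        if name.startswith(prefix):
            return kind
    raise ValueError(f"unrecognized node name: {name!r}")


def group_by_type(node_names):
    """Group node names by type, preserving input order within each group."""
    groups = {"drug": [], "function": [], "output": []}
    for name in node_names:
        groups[node_type(name)].append(name)
    return groups


# Sample names are made up for illustration.
groups = group_by_type(
    ["DRUG__drug1", "PROTEIN__A", "PROTEIN__B", "RNA__gene1", "LINCS__gene2"]
)
print(groups["output"])  # ['RNA__gene1', 'LINCS__gene2']
```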
- gsnn.gsnn.proc.subset.get_all_possible_paths_set(G, rG, root, leaf, depth, root_distance_dict, leaf_distance_dict, node_names)
Compute shortest path lengths from root to leaf that pass through each node.
This function calculates the length of the shortest path from a root node to a leaf node that passes through each node in the graph. For each node n, it computes:
min_path_length(root → n → leaf) = distance(root → n) + distance(n → leaf)
The algorithm uses forward BFS from the root and reverse BFS from the leaf, then combines the distances to find the shortest path through each node. Caching dictionaries are used to avoid recomputing distances for the same nodes.
- Parameters:
G (networkx.DiGraph) – The original directed graph.
rG (networkx.DiGraph) – The reverse of the original graph (all edges flipped).
root – The starting node (root node).
leaf – The target node (leaf node).
depth (int) – The maximum depth to explore in BFS calculations.
root_distance_dict (dict) – Cache dictionary for root node distances.
leaf_distance_dict (dict) – Cache dictionary for leaf node distances.
node_names (list) – List of all node names in the graph.
- Returns:
A tuple containing:
- spl (numpy.ndarray): Shortest path lengths from root to leaf through each node. Shape [len(node_names)], where each element is the minimum path length from root to leaf that passes through the corresponding node. Nodes that cannot be reached from root or cannot reach leaf are marked with np.inf.
- root_distance_dict (dict): Updated cache of root distances.
- leaf_distance_dict (dict): Updated cache of leaf distances.
- Return type:
tuple
Example
>>> import networkx as nx
>>> G = nx.DiGraph()
>>> G.add_edges_from([(0, 1), (1, 2), (0, 3), (3, 2)])
>>> rG = G.reverse()
>>> node_names = [0, 1, 2, 3]
>>> root_dist_dict = {}
>>> leaf_dist_dict = {}
>>> spl, root_dict, leaf_dict = get_all_possible_paths_set(
...     G, rG, 0, 2, 3, root_dist_dict, leaf_dist_dict, node_names
... )
>>> print(spl)  # [2, 2, 2, 2] - shortest path lengths through each node
>>> # Node 0: 2 (the root itself: 0 + distance(0 → 2) = 2)
>>> # Node 1: 2 (path 0→1→2)
>>> # Node 2: 2 (the leaf itself: distance(0 → 2) + 0 = 2)
>>> # Node 3: 2 (path 0→3→2)
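The forward/reverse combination and the caching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the gsnn code: graphs are plain adjacency dicts instead of `networkx.DiGraph` objects, results are lists instead of `numpy.ndarray`, and `bfs`, `reverse`, and `through_node_lengths` are hypothetical names. The graph adds a node `e` that the root cannot reach, to show how infinity propagates.

```python
from collections import deque
import math


def bfs(adj, start, depth):
    """Depth-limited BFS returning {node: distance} for reachable nodes."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] >= depth:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def reverse(adj):
    """Flip every edge of an adjacency dict."""
    radj = {}
    for src, nbrs in adj.items():
        for dst in nbrs:
            radj.setdefault(dst, []).append(src)
    return radj


def through_node_lengths(adj, radj, root, leaf, depth,
                         root_cache, leaf_cache, node_names):
    """distance(root -> n) + distance(n -> leaf) for every n in node_names."""
    if root not in root_cache:  # forward BFS is computed once per root
        root_cache[root] = bfs(adj, root, depth)
    if leaf not in leaf_cache:  # reverse BFS is computed once per leaf
        leaf_cache[leaf] = bfs(radj, leaf, depth)
    d_root, d_leaf = root_cache[root], leaf_cache[leaf]
    return [d_root.get(n, math.inf) + d_leaf.get(n, math.inf)
            for n in node_names]


adj = {"a": ["b", "d"], "b": ["c"], "d": ["c"], "e": ["c"]}
radj = reverse(adj)
root_cache, leaf_cache = {}, {}
spl = through_node_lengths(adj, radj, "a", "c", 3,
                           root_cache, leaf_cache, ["a", "b", "c", "d", "e"])
print(spl)  # [2, 2, 2, 2, inf]
```

Node `e` can reach the leaf but is not reachable from the root, so its sum is infinite. A second call with the same root or leaf reuses the cached distance dictionaries instead of repeating the search.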
- gsnn.gsnn.proc.subset.subset_graph(G, depth, roots, leafs, verbose=True, distance_dicts=None, return_dicts=False)
Subset a graph to include only nodes that lie on paths from roots to leaves.
This function creates a subgraph from the nodes that lie on at least one path from any root node to any leaf node within the specified depth. The algorithm computes shortest path distances from all roots to all leaves and keeps the nodes that lie on paths of length less than or equal to the specified depth.
- Parameters:
G (networkx.DiGraph) – The original directed graph to subset.
depth (int) – The maximum path length to consider when determining node inclusion.
roots (list) – List of root node identifiers.
leafs (list) – List of leaf node identifiers.
verbose (bool, optional) – If True, print progress information. (default: True)
distance_dicts (tuple, optional) – Pre-computed distance dictionaries for caching. Should be a tuple of (root_distance_dict, leaf_distance_dict). (default: None)
return_dicts (bool, optional) – If True, return the distance dictionaries along with the subgraph for potential reuse. (default: False)
- Returns:
If return_dicts=False, returns the subgraph. If return_dicts=True, returns a tuple containing:
- subgraph (networkx.DiGraph): The subsetted graph.
- distance_dicts (tuple): Cached distance dictionaries for reuse.
- Return type:
networkx.DiGraph or tuple
Example
>>> import networkx as nx
>>> G = nx.DiGraph()
>>> G.add_edges_from([(0, 1), (1, 2), (0, 3), (3, 2), (2, 4), (4, 5)])
>>> roots = [0]
>>> leafs = [2, 5]
>>> subgraph = subset_graph(G, depth=3, roots=roots, leafs=leafs)
>>> print(list(subgraph.nodes()))  # [0, 1, 2, 3] - nodes on paths to targets
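The selection rule described above (keep a node if some root-to-leaf path through it has length at most `depth`) can be sketched with the standard library. This illustrates the rule only, under assumptions: the graph is a plain adjacency dict, `subset_sketch` and its helpers are hypothetical names, and returning the node-induced subgraph is an assumption about what "subset" means here.

```python
from collections import deque
import math


def bfs(adj, start, depth):
    """Depth-limited BFS returning {node: distance} for reachable nodes."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] >= depth:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def reverse(adj):
    """Flip every edge of an adjacency dict."""
    radj = {}
    for src, nbrs in adj.items():
        for dst in nbrs:
            radj.setdefault(dst, []).append(src)
    return radj


def subset_sketch(adj, depth, roots, leafs):
    """Keep nodes lying on some root-to-leaf path of length <= depth."""
    radj = reverse(adj)
    nodes = set(adj) | set(radj)
    root_dist = {r: bfs(adj, r, depth) for r in roots}   # forward distances
    leaf_dist = {l: bfs(radj, l, depth) for l in leafs}  # reverse distances
    keep = set()
    for r in roots:
        for l in leafs:
            for n in nodes:
                total = (root_dist[r].get(n, math.inf)
                         + leaf_dist[l].get(n, math.inf))
                if total <= depth:
                    keep.add(n)
    # Node-induced subgraph: keep only edges between retained nodes.
    return {n: [m for m in adj.get(n, ()) if m in keep] for n in keep}


# Same edges as the example above.
adj = {0: [1, 3], 1: [2], 3: [2], 2: [4], 4: [5]}
sub = subset_sketch(adj, depth=3, roots=[0], leafs=[2, 5])
print(sorted(sub))  # [0, 1, 2, 3]
```

Leaf 5 is four hops from root 0, which exceeds `depth=3`, so nodes 4 and 5 are dropped, along with the edge from 2 to 4.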