create nested groups of nodes in networkx - python

I am trying to use networkx to create nested groups of nodes (groups within groups, and so on).
For instance, I have nodes [1,2,3,4,5,6] that currently have no subgroups. I want to end up with this:
[1,2,[[3,4],[5,6]]]
Currently I just loop over some data and add the nodes to the graph like this:
self.G = nx.Graph()
for s in self.tabledata:
    # store the label as a regular node attribute instead of writing into G.__dict__
    self.G.add_node(s[0], label='{0}'.format(s[0]))
nx.write_graphml(self.G, '/filename')
where self.tabledata contains the following values: [1234,2345,3456,4567,5678,6789]
I want to move 3 and 4 into a group 'A', 5 and 6 into a group 'B', and groups A and B into a group 'C' along with nodes 1 and 2.
So as far as the groups are concerned, you have this: [C[A,B]]
Any ideas how this can be accomplished?
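One possible way to sketch this, assuming (as far as I know) that a plain networkx graph has no built-in notion of nested groups: leave the nodes in the graph as they are, describe the hierarchy in a separate directed tree, and tag each node with its immediate group. The hierarchy dict, the 'group' attribute name and the nested helper are all made up for illustration; they are not part of networkx.

```python
import networkx as nx

G = nx.Graph()
G.add_nodes_from([1, 2, 3, 4, 5, 6])

# Hypothetical hierarchy: each group maps to its direct members,
# which are either plain nodes or the names of other groups.
hierarchy = {"C": [1, 2, "A", "B"], "A": [3, 4], "B": [5, 6]}

# A directed tree records which group contains what (edge: group -> member).
tree = nx.DiGraph()
for group, members in hierarchy.items():
    for member in members:
        tree.add_edge(group, member)

# Tag every graph node with its immediate parent group.
for node in G.nodes:
    G.nodes[node]["group"] = next(tree.predecessors(node))


def nested(name):
    """Return the members of a group as a nested list."""
    return [nested(m) if m in hierarchy else m for m in tree.successors(name)]


print(nested("C"))           # [1, 2, [3, 4], [5, 6]]
print(G.nodes[3]["group"])   # A
```

Keeping the hierarchy in its own tree means the original graph only carries a simple string attribute per node, which is the kind of value that usually survives being written to a file.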

Related

How to group objects in a dictionary based on their properties and a threshold in Python

I have a list of objects:
objects_list= [o1, o2, ...]
I would like to group them based on their location property. I know that this code will group them based on which ones have exactly the same location:
from collections import defaultdict
groups = defaultdict(list)
for o in objects_list:
    groups[o.location].append(o)
grouped_objects = groups.values()
This gives the expected result, but I am now wondering how to tweak this code to add a "threshold", so that instead of being grouped by exactly matching locations, the objects are grouped by how close their locations are, according to a given threshold.
Is there any way I could do that?
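A minimal sketch of one way to do this, assuming location is a single number (the question does not say what type it is). The objects are sorted by location and a new group starts whenever the gap to the previous object exceeds the threshold. The Obj class, the sample values and group_by_threshold are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    location: float  # assumption: a 1-D numeric location


def group_by_threshold(objects, threshold):
    """Chain neighbouring objects together while the gap stays within threshold."""
    groups = []
    for o in sorted(objects, key=lambda o: o.location):
        if groups and o.location - groups[-1][-1].location <= threshold:
            groups[-1].append(o)   # close enough to the previous object
        else:
            groups.append([o])     # gap too large: start a new group
    return groups


objects_list = [Obj("o1", 1.0), Obj("o2", 1.4), Obj("o3", 5.0),
                Obj("o4", 5.3), Obj("o5", 9.0)]
grouped_objects = group_by_threshold(objects_list, threshold=0.5)
print([[o.name for o in g] for g in grouped_objects])
# [['o1', 'o2'], ['o3', 'o4'], ['o5']]
```

Note that this chains neighbours, so the first and last members of a group can be further apart than the threshold. If every pair in a group has to be within the threshold, or if locations are 2-D points, a different rule (or a clustering routine) would be needed.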

How to loop through multiple variables and filter data to produce multiple dataframes?

I have a dataframe df with about 1000 rows with the following columns:
df$ID<- c("ab11", "ab12" ...) #about 1000 rows
df$ID1<- #numbers ranging from 1 to 20k; for all intents and purposes this can be treated as class 'factor'
df$Acol<- #numbers ranging from 1 to 1000
df$Bcol<- #numbers ranging from 0 to 1
The following lists give me 12 values each:
A<- seq(50,600,by=50)
B<- seq(0.2,1,by=0.75)
I am trying to do 2 things:
1. I would like to create dataframes by filtering the original dataset with every combination of the values in A and B. So 144 dataframes.
2. Once I have those dataframes, I would like to combine 3 dataframes at a time and see if the frequency of the IDs matches a master dataframe x and, if it does, get the combination information for the matching dataframe.
So for 1, this is my approach:
df_50_0.2<-subset(df, df$Acol>=50 & df$Bcol>=0.2)
I can't write that out 144 times, so I need a loop. I tried a nested loop, but that doesn't give me every combination of A and B, so I tried a while loop.
Here is my code:
i<-50
while (i<550) {
  for (j in B) {
    assign(paste("df","_",as.character(i),"_",as.character(j)), df %>%
      filter (Acol>=i) %>%
      filter(Bcol>=j),envir=.GlobalEnv)
    i<-i+50
  }}
That gives me the desired result, except it doesn't split the dataframe according to B. So the output is similar to what I would get if I had filtered the data with the values of A only.
For the second part I need to loop through all possible combinations of three data frames at a time. Here is my code:
df.final<-rbind (df_50_0.2,df_100_0.25,df_150_0.5)
tmp<-subset(table(df.final$ID),!(table(df.final$ID) %in% table(x$ID)))
I would like the above to run in a loop. If tmp has any values, I don't want it to be output. If it is empty, that is, the combination is a perfect match to the frequency of IDs in the master dataframe x, I want it to be written out. So something like the following in a loop? I want all possible combinations checked iteratively to find the combinations that match the ID frequencies of the master dataframe x perfectly:
if tmp = NULL
tmp
else rm(tmp)
Any help is much appreciated. A Python solution is also welcome!
A solution like the one in the following link, modified for two columns, could be helpful: Filter loop to create multiple data frames
This is not a complete answer, but it represents a different way to think about your problem. First we need a reproducible sample. I'm leaving out your first two columns, but it would not be difficult to modify the example to include them.
set.seed(42)
df <- data.frame(A=runif(200) * 1000 + 1, B=runif(200))
Aint <- seq(50, 600, by=50)
Bint <- seq(0.2, 1, by=0.05)
comb <- expand.grid(Aint=Aint, Bint=Bint)
This gives you a data frame of 200 observations with columns A and B. Then we create the intervals (called Aint and Bint so we don't confuse them with df$A and df$B). The next line creates all possible combinations of those intervals: 204 (not 144, since Bint has 17 values).
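Since a Python solution was also welcome, here is a rough standard-library counterpart of the expand.grid step. It only builds the threshold grid and the labels; a_int, b_int, grid and labels are names chosen for this sketch, and the label format simply mimics the df_50_0.2 style used in the question.

```python
from itertools import product

# 50, 100, ..., 600 -> 12 thresholds for column A
a_int = list(range(50, 601, 50))
# 0.2, 0.25, ..., 1.0 -> 17 thresholds for column B (rounded to avoid float noise)
b_int = [round(0.2 + 0.05 * k, 2) for k in range(17)]

# Every (A threshold, B threshold) pair, the analogue of expand.grid
grid = list(product(a_int, b_int))
labels = [f"df_{a}_{b}" for a, b in grid]

print(len(a_int), len(b_int), len(grid))  # 12 17 204
print(labels[0])                          # df_50_0.2
```

The pairs come out in a different order than expand.grid produces (here the B threshold varies fastest), which does not matter as long as each label is kept next to its pair.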
Now we need to define your groups and generate the label for each one:
groups <- apply(comb, 1, function(x) df$A > x[1] & df$B > x[2])
labels <- paste("df", comb[, 1], comb[, 2], sep="_")
groups is a 200 x 204 logical matrix, so each column defines one of your comparisons. The first group is defined as df[groups[, 1], ] and the second as df[groups[, 2], ]. The labels for those comparisons are labels[1] and labels[2].
Since your groups overlap, the next step will create multiple copies of your data in each group of three. The number of combinations of 204 groups taken 3 at a time is 1,394,204. The following code builds the combined data frame for the first five of those combinations:
allcombos <- combn(204, 3, simplify=TRUE)
for (i in 1:5) {
  dfno <- allcombos[, i]
  df.final <- rbind(df[groups[, dfno[1]], ], df[groups[, dfno[2]], ], df[groups[, dfno[3]], ])
  lbl <- labels[dfno]
}
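For comparison, the combination count and the first few index triples can be checked in Python with the standard library. This is only a sketch of the enumeration step; n_groups, total and first_five are illustrative names, and the indices are 0-based rather than 1-based as in R.

```python
from itertools import combinations, islice
from math import comb

n_groups = 204

# Number of ways to pick 3 of the 204 groups
total = comb(n_groups, 3)
print(total)  # 1394204

# combinations() is lazy, so only the first five triples are materialised
first_five = list(islice(combinations(range(n_groups), 3), 5))
print(first_five)
# [(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 1, 5), (0, 1, 6)]
```

Because the iterator is lazy, all 1,394,204 triples can be visited one at a time without holding them in memory.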
The next step involves subset(table(df.final$ID),!(table(df.final$ID) %in% table(x$ID))) but that line does not do what you suggest. It creates a frequency table of the number of times each ID appears in df.final and another for the master data frame x. The %in% test then takes each frequency in df.final and checks whether it matches the frequency of ANY ID in the master table, not the frequency of the same ID. So I have not included that logical statement, or code that would write the current df.final to a list.
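To make the difference concrete, here is a small Python sketch of a per-ID frequency comparison, which is what the question seems to be after. It assumes a "perfect match" means every ID occurs exactly as often as in the master data; the ID lists and the matches_master helper are invented for the example.

```python
from collections import Counter

master_ids = ["ab11", "ab11", "ab12"]     # ab11 twice, ab12 once
candidate_ids = ["ab12", "ab11", "ab11"]  # same frequencies per ID
other_ids = ["ab11", "ab12", "ab12"]      # frequencies 1 and 2, but swapped


def matches_master(ids, master):
    """True only if each ID has the same count in both collections."""
    return Counter(ids) == Counter(master)


print(matches_master(candidate_ids, master_ids))  # True
print(matches_master(other_ids, master_ids))      # False

# An "any frequency" check in the style of %in% would accept other_ids,
# because the counts 1 and 2 both occur somewhere in the master table.
master_counts = set(Counter(master_ids).values())
print(all(c in master_counts for c in Counter(other_ids).values()))  # True
```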

Is a DataFrame OK for representing a graph?

I want to represent relationships between nodes in Python using pandas.DataFrame.
Each relationship has a weight, so I used a dataframe like this:
       nodeA  nodeB  nodeC
nodeA      0      5      1
nodeB      5      0      4
nodeC      1      4      0
But I think this is an improper way to express relationships, because the dataframe is symmetric and contains duplicated data.
Is there a more proper way than a dataframe to represent a graph in Python?
(Sorry for my bad English)
This seems like an acceptable way to represent a graph, and is in fact compatible with, say, networkx. For example, you can recover a networkx graph object as follows:
import networkx as nx
g = nx.from_pandas_adjacency(df)
print(list(g.edges))
# [('nodeA', 'nodeB'), ('nodeA', 'nodeC'), ('nodeB', 'nodeC')]
print(g.get_edge_data('nodeA', 'nodeB'))
# {'weight': 5}
If your graph is sparse, you may want to store it as an edge list instead, e.g. as discussed here.
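As a sketch of what the edge-list form looks like for the example above: networkx can convert the graph to a DataFrame with one row per edge and back again. The frame is rebuilt here so the snippet runs on its own; edges and g2 are names chosen for the example, while 'source', 'target' and 'weight' are the column names networkx uses by default.

```python
import networkx as nx
import pandas as pd

nodes = ["nodeA", "nodeB", "nodeC"]
df = pd.DataFrame([[0, 5, 1], [5, 0, 4], [1, 4, 0]], index=nodes, columns=nodes)

# Adjacency matrix -> graph (zero entries are treated as "no edge")
g = nx.from_pandas_adjacency(df)

# Graph -> edge list: one row per undirected edge, so no symmetric duplicates
edges = nx.to_pandas_edgelist(g)
print(edges)

# Edge list -> graph again, keeping the weight column as an edge attribute
g2 = nx.from_pandas_edgelist(edges, source="source", target="target",
                             edge_attr="weight")
print(g2.get_edge_data("nodeA", "nodeB"))
```

The edge list stores each relationship once, so it addresses the duplication concern from the question and stays small when most node pairs are not connected.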

Networkx Array of Business Connections

I am trying to create a networkx graph mapping the business connections in our database. In other words, I would like every id (i.e. each individual business) to be a node, and I would like there to be a line connecting the nodes that are 'connected'. A business is considered connected with another if the lead_id and connection_id are associated with each other, as in the data structure below.
lead_id connection_id
56340 1
56340 2
58684 3
58696 4
58947 5
Every example I find in the networkx documentation uses the following:
G=nx.random_geometric_graph(200,0.125)
pos=nx.get_node_attributes(G,'pos')
I am trying to determine how to incorporate my values into this.
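Here is a way to create a graph from the data presented (assuming data is a pandas DataFrame with those two columns):
import networkx as nx
G = nx.Graph()
# each (lead_id, connection_id) row becomes an edge; the nodes are created automatically
for lead, connection in zip(data.lead_id, data.connection_id):
    G.add_edge(lead, connection)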
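As a self-contained sketch of the same idea, assuming the table is loaded into a pandas DataFrame called data and that lead_id and connection_id refer to the same kind of business id, networkx can also build the graph straight from the two columns. The layout call at the end is one way to get the pos dictionary that the documentation examples take from random_geometric_graph; the seed value is arbitrary.

```python
import networkx as nx
import pandas as pd

# Sample rows copied from the question
data = pd.DataFrame({
    "lead_id": [56340, 56340, 58684, 58696, 58947],
    "connection_id": [1, 2, 3, 4, 5],
})

# One edge per row; nodes are created from both columns
G = nx.from_pandas_edgelist(data, source="lead_id", target="connection_id")
print(G.number_of_nodes(), G.number_of_edges())  # 9 5

# Compute positions for drawing instead of reading a 'pos' node attribute
pos = nx.spring_layout(G, seed=42)
print(sorted(G.neighbors(56340)))  # [1, 2]
```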

How to output attributes of nodes of a Graph (NetworkX) into a DataFrame (Pandas)

I am working with networks that graph the interactions between characters in Spanish theatre plays.
I passed several attributes of the nodes (characters) to the network as a dataframe, so that I can use these values (for example, the color of a node is set by the gender of the character). I want to calculate different values for each node with NetworkX (degree, centrality, betweenness...); then I would like to output both my node attributes and the values calculated with NetworkX as a DataFrame. I know I can ask for specific attributes of the nodes, like:
nx.get_node_attributes(graph,'degree')
And I could build a DataFrame using that, but I wonder whether there is a more elegant solution. I have also tried:
nx.to_dict_of_dicts(graph)
But this outputs only the edges and not the information about the nodes.
So, any help, please? Thanks!
If I understand your question correctly, you want a DataFrame that has the nodes and some of the attributes of each node.
import networkx as nx
import pandas as pd
G = nx.Graph()
# networkx 2.0 and later take node attributes as keyword arguments, not as a positional dict
G.add_node(1, x=11, y=111)
G.add_node(2, x=22, y=222)
You can use a list comprehension as follows to get specific node attributes:
pd.DataFrame([[i[0], i[1]['x'], i[1]['y']] for i in G.nodes(data=True)],
             columns=['node_name', 'x', 'y']).set_index('node_name')
#             x    y
# node_name
# 1          11  111
# 2          22  222
or if there are many attributes and you need all of them, this could be a better solution:
pd.DataFrame([i[1] for i in G.nodes(data=True)], index=[i[0] for i in G.nodes(data=True)])
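To also include values computed by NetworkX, one option is to build the attribute frame first and then add each metric as a column. This is a sketch under the assumption that every metric is a dict keyed by node, which is what functions such as betweenness_centrality return; the character names and the gender attribute are invented for the example.

```python
import networkx as nx
import pandas as pd

G = nx.Graph()
G.add_node("Ana", gender="f")
G.add_node("Luis", gender="m")
G.add_node("Ines", gender="f")
G.add_edges_from([("Ana", "Luis"), ("Luis", "Ines")])

# One row per node, one column per stored attribute
df = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient="index")

# Each computed metric is a node -> value mapping, so it aligns on the index
df["degree"] = pd.Series(dict(G.degree()))
df["betweenness"] = pd.Series(nx.betweenness_centrality(G))

print(df)
```

Because the columns are aligned on the node names, the order in which the metric dicts list the nodes does not matter.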
