Wrong betweenness in both Networkx and Networkit? - python

Apparently, both the node betweenness and the edge betweenness calculated with networkx and networkit differ from the values they are supposed to have.
Let's consider the following undirected graph (page 20/85 of these lecture notes), written as an edge list and saved in mygraph.txt:
1 2
1 5
2 3
2 5
3 4
4 5
4 6
The node betweenness should be (page 20/85 of these lecture notes):
Node | Betweenness
1 | 0
2 | 1.5
3 | 1
4 | 4.5
5 | 3
6 | 0
However, when I run the following code (I named the graph G1 in networkx and G2 in networkit, but both are exactly the same graph, read from the same file mygraph.txt):
import networkx as nx
import networkit as nk
G1 = nx.read_edgelist("mygraph.txt",create_using=nx.Graph(), nodetype = int)
G1.number_of_nodes()
node_btw = nx.betweenness_centrality(G1, normalized=False)
edge_btw = nx.edge_betweenness_centrality(G1, k=None, normalized=False, weight=None, seed=None)
print('NETWORK-X')
print(node_btw.values())
print(edge_btw.values())
edgeListReader = nk.graphio.EdgeListReader(' ', 1)
G2 = nk.readGraph("/home/JohnRambo/Documents/myFolder/mygraph.txt", nk.Format.EdgeListTabOne)
print(G2.numberOfNodes(), G2.numberOfEdges())
G2.indexEdges()
btwn = nk.centrality.Betweenness(G2, normalized=False, computeEdgeCentrality=True)
btwn.run()
print('NETWORK-IT')
print(btwn.scores()[:10])
print(btwn.edgeScores()[:10])
I got these results (P.S.: I added the labels "node betweenness" and "edge betweenness" by hand):
NETWORK-X
node betweenness: [0.0, 1.5, 3.0, 1.0, 4.5, 0.0]
edge betweenness: [2.0, 3.0, 3.5, 2.5, 5.5, 3.5, 5.0]
NETWORK-IT
node betweenness: [0.0, 3.0, 2.0, 9.0, 6.0, 0.0]
edge betweenness: [4.0, 7.0, 7.0, 6.0, 5.0, 11.0, 10.0]
My own calculation gives different results (the node betweenness scores agree with those shown on page 20/85 of the lecture notes):
node betweenness: [0.0, 1.5, 1.0, 4.5, 3.0, 0.0]
edge betweenness: [2.0, 3.0, 3.5, 2.5, 3.5, 5.5, 5.0]
Could you please clarify this and suggest a way to fix it?
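To double-check the hand calculation independently of either library, here is a minimal pure-Python sketch of Brandes-style accumulation for an unweighted, undirected graph. The helper name betweenness and its return format are hypothetical, not part of networkx or networkit. It runs one breadth-first search per source node, so every unordered pair is counted twice and the totals are halved at the end. The NetworKit node scores in the question are exactly twice these values (and its edge scores are the doubled values in a different order), which suggests, as an assumption worth checking in the NetworKit documentation, that NetworKit counts each pair in both directions for undirected graphs.

```python
from collections import deque


def betweenness(edges):
    """Unnormalized node and edge betweenness of an unweighted, undirected graph.

    Brandes-style: one BFS per source node, then dependencies are accumulated
    back from the farthest nodes. Every unordered pair is seen twice (once from
    each end), so the totals are halved before returning.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    node_bc = dict.fromkeys(adj, 0.0)
    edge_bc = {frozenset(e): 0.0 for e in edges}

    for s in adj:
        # BFS from s: distances, number of shortest paths (sigma), predecessors.
        dist = dict.fromkeys(adj, -1)
        sigma = dict.fromkeys(adj, 0)
        preds = {v: [] for v in adj}
        dist[s], sigma[s] = 0, 1
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge_bc[frozenset((v, w))] += share
                delta[v] += share
            if w != s:
                node_bc[w] += delta[w]

    # Each unordered pair was counted once from each endpoint.
    node_bc = {v: b / 2 for v, b in node_bc.items()}
    edge_bc = {e: b / 2 for e, b in edge_bc.items()}
    return node_bc, edge_bc


edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
node_bc, edge_bc = betweenness(edges)
print([node_bc[v] for v in sorted(node_bc)])   # [0.0, 1.5, 1.0, 4.5, 3.0, 0.0]
print([edge_bc[frozenset(e)] for e in edges])  # [2.0, 3.0, 3.5, 2.5, 3.5, 5.5, 5.0]
```

Both lists match the hand calculation above, in node order and in file edge order respectively.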

As Kyle mentions in his comment, the nodes are added to the network in the order in which they first appear in the edge list, so the dictionaries returned by networkx follow that order rather than numerical order. A simple sort fixes this. The second line of output shows the actual sequence of nodes.
With respect to the edges, something unexpected happens: the edge (4, 5) from the file is reported as (5, 4) (see the third line of output). This is because G.edges() walks the nodes in insertion order and lists each edge from the first endpoint it reaches, and node 5 is reached before node 4. As a result, the sorted order of the edges differs from the order in the file: the sixth and seventh edges are swapped.
The code below sorts the node betweenness dictionary by key (the node number) and collects the betweenness values in a tuple, printed on the fourth line of output; the fifth line does the same for the edge betweenness.
The last lines show each edge with its betweenness value.
import networkx as nx
G1 = nx.read_edgelist("mygraph.txt",create_using=nx.Graph(), nodetype = int)
G1.number_of_nodes()
node_btw = nx.betweenness_centrality(G1, normalized=False)
edge_btw = nx.edge_betweenness_centrality(G1, k=None, normalized=False, weight=None, seed=None)
print('NETWORK-X')
print(tuple(node_btw.keys()))
print(tuple(edge_btw.keys()))
nn_btw = tuple(v for _,v in sorted(node_btw.items(), key=lambda x: x[0]))
print(nn_btw)
en_btw = tuple(v for _,v in sorted(edge_btw.items(), key=lambda x: x[0]))
print(en_btw)
for k,v in sorted(edge_btw.items(),key=lambda x: x[0]):
    print(k, v)
Output:
# NETWORK-X
# (1, 2, 5, 3, 4, 6)
# ((1, 2), (1, 5), (2, 3), (2, 5), (5, 4), (3, 4), (4, 6))
# (0.0, 1.5, 1.0, 4.5, 3.0, 0.0)
# (2.0, 3.0, 3.5, 2.5, 3.5, 5.0, 5.5)
# (1, 2) 2.0
# (1, 5) 3.0
# (2, 3) 3.5
# (2, 5) 2.5
# (3, 4) 3.5
# (4, 6) 5.0
# (5, 4) 5.5
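A variation on the sort above, as a sketch: normalizing each edge key to (smaller, larger) before sorting turns (5, 4) back into (4, 5), so the values line up with the hand calculation (for this file the sorted order is also the file order). The dictionary literal simply copies the edge_btw values from the output above, so the snippet runs without networkx.

```python
# edge_btw as printed above by nx.edge_betweenness_centrality (values copied from the output).
edge_btw = {(1, 2): 2.0, (1, 5): 3.0, (2, 3): 3.5, (2, 5): 2.5,
            (5, 4): 5.5, (3, 4): 3.5, (4, 6): 5.0}

# Put every key in (smaller, larger) order so (5, 4) becomes (4, 5), then sort.
canonical = {tuple(sorted(e)): b for e, b in edge_btw.items()}
en_btw = tuple(canonical[e] for e in sorted(canonical))
print(en_btw)  # (2.0, 3.0, 3.5, 2.5, 3.5, 5.5, 5.0)
```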

Related

How to handle 'interval' type values returned by pd.cut directly?

I am using 'pd.cut' to separate the array elements into different bins and 'value_counts' to count the frequency of each bin. My code and the result I get look like this:
s = pd.Series([5,9,2,4,5,6,7,9,5,3,8,7,4,6,8])
pd.cut(s,5).value_counts()
>>> pd.cut(s,5).value_counts()
(4.8, 6.2] 5
(7.6, 9.0] 4
(1.993, 3.4] 2
(3.4, 4.8] 2
(6.2, 7.6] 2
I want to get the bounds of the first three intervals in the index of the result, that is:
[4.8, 6.2]
[7.6, 9.0]
[1.993, 3.4]
or, even better:
[4.8, 6.2, 7.6, 9.0, 1.993, 3.4]
However, after searching I found that pandas does not seem to have a method for handling this interval data directly, so I had to use the following clumsy approach and then combine the values into a list or array:
v1 = pd.cut(s,5).value_counts().index[0].left
v2 = pd.cut(s,5).value_counts().index[0].right
v3 = pd.cut(s,5).value_counts().index[1].left
...
v6 = pd.cut(s,5).value_counts().index[2].right
So is there an easier way to achieve what I need?
Convert the CategoricalIndex to an IntervalIndex, so that you can use IntervalIndex.left and IntervalIndex.right:
s = pd.cut(s,5).value_counts()
i = pd.IntervalIndex(s.index)
L1 = list(zip(i.left, i.right))[:3]
print (L1)
[(4.8, 6.2), (7.6, 9.0), (1.993, 3.4)]
L2 = [y for x in L1 for y in x]
print (L2)
[4.8, 6.2, 7.6, 9.0, 1.993, 3.4]
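As an alternative sketch, you can skip the conversion: each label in the value_counts() index is a pandas Interval, which has .left and .right attributes, so the first three labels can be read directly. The order of bins with equal counts may depend on the pandas version, so only the first two positions are really fixed in this example.

```python
import pandas as pd

s = pd.Series([5, 9, 2, 4, 5, 6, 7, 9, 5, 3, 8, 7, 4, 6, 8])
counts = pd.cut(s, 5).value_counts()

# Each label in the index is a pd.Interval with .left and .right bounds.
pairs = [(iv.left, iv.right) for iv in counts.index[:3]]
flat = [bound for pair in pairs for bound in pair]
print(pairs)
print(flat)
```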

Sum values from different dictionaries based on "temporal instant"

I have a dictionary such as:
{'A1': {'T1': [1, 3.0, 3, 4.0], 'T2': [2, 2.0]}, 'A2': {'T1': [1, 0.0, 3, 5.0], 'T2': [2, 3.0]}}
For each person (A1 and A2), I have a list of the amounts spent in each supermarket (T1 and T2), stored as alternating timestep and amount values. For example, person A1 spent 3.0 at timestep 1 in supermarket T1.
What I want to do is sum across the sub-dictionaries to obtain the total spent at each timestep:
timestep 1: A1/T1 + A2/T1 = 3.0 + 0.0 = 3.0
timestep 2: A1/T2 + A2/T2 = 2.0 + 3.0 = 5.0
timestep 3: A1/T1 + A2/T1 = 4.0 + 5.0 = 9.0
How can I do this? I've tried a for loop, but it turned into a big mess.
Desired output:
[3.0, 5.0, 9.0]
Here is a quick solution to your problem, although I would suggest storing the timestamped values in a dict, so that timestamps and values are clearly distinguished.
Instead of this:
{"A1": {"T1": [1, 3.0, 3, 4.0]}}
Do this:
{"A1": {"T1": {1: 3.0, 3: 4.0}}}
import json
# data maps person -> supermarket -> [timestep, amount, timestep, amount, ...]
# (named data rather than dict, to avoid shadowing the built-in)
data = json.loads('{"A1": {"T1": [1, 3.0, 3, 4.0], "T2": [2, 2.0]}, "A2": {"T1": [1, 0.0, 3, 5.0], "T2": [2, 3.0]}}')
result = {}
for person, supermarket in data.items():
    for _, timestamped_values in supermarket.items():
        for i in range(len(timestamped_values)):
            if i % 2 == 0 and i < len(timestamped_values)-1:
                result.setdefault(timestamped_values[i], []).append(timestamped_values[i+1])
print(result)
Result is:
{1: [3.0, 0.0], 3: [4.0, 5.0], 2: [2.0, 3.0]}
Then just sum each list to get your result. I kept it this way in case you need to do other operations on the timestamped values.
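To finish the job in one pass, here is a sketch that pairs each timestep with its amount using two slices of the flat list and accumulates totals in a collections.defaultdict; sorting the timesteps gives the requested list. It assumes every list has an even length, as in the example.

```python
from collections import defaultdict

data = {'A1': {'T1': [1, 3.0, 3, 4.0], 'T2': [2, 2.0]},
        'A2': {'T1': [1, 0.0, 3, 5.0], 'T2': [2, 3.0]}}

totals = defaultdict(float)
for supermarkets in data.values():
    for flat in supermarkets.values():
        # The list alternates timestep, amount: pair them with two slices.
        for timestep, amount in zip(flat[::2], flat[1::2]):
            totals[timestep] += amount

result = [totals[t] for t in sorted(totals)]
print(result)  # [3.0, 5.0, 9.0]
```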

Summing over repeated indices in a dictionary and returning the resulting values [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I have a dictionary:
d = {
'inds': [0, 3, 7, 3, 3, 5, 1],
'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
}
I want to sum the vals for each index in inds, adding together the vals of repeated indices, and output the following:
ind:  0    1    2    3*    4    5    6    7
x == [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
I've tried various loops but can't figure it out, and I have no idea where else to begin.
>>> from collections import defaultdict
>>> indices = [0,3,7,3,3,5,1]
>>> vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
>>> d = defaultdict(float)
>>> for i, idx in enumerate(indices):
...     d[idx] += vals[i]
...
>>> print(d)
defaultdict(<class 'float'>, {0: 1.0, 3: 11.0, 7: 3.0, 5: 6.0, 1: 7.0})
>>> x = []
>>> for i in range(max(indices)+1):
...     x.append(d[i])
...
>>> x
[1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
Using itertools.groupby:
>>> import itertools
>>> z = sorted(zip(indices, vals), key=lambda x:x[0])
>>> z
[(0, 1.0), (1, 7.0), (3, 2.0), (3, 4.0), (3, 5.0), (5, 6.0), (7, 3.0)]
>>> for k, g in itertools.groupby(z, key=lambda x:x[0]):
...     print(k, sum([t[1] for t in g]))
...
0 1.0
1 7.0
3 11.0
5 6.0
7 3.0
You need x to be a list with one sum for every value i in the range of d['inds'] (from min to max): the sum of the entries of d['vals'] whose corresponding entry in d['inds'] equals i.
d = {
'inds': [0, 3, 7, 3, 3, 5, 1],
'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
}
result = [sum([val for ind, val in zip(d['inds'], d['vals']) if ind == i])
for i in range(min(d['inds']), max(d['inds']) + 1)]
print(result)
The output:
[1.0, 7.0, 0, 11.0, 0, 6.0, 0, 3.0]
No libraries are required. The list comprehension isn't exactly easy to read, and it rescans both lists once for every i, so it suits small inputs best; it does match the description, though.
A breakdown of the list comprehension into its parts:
for i in range(min(d['inds']), max(d['inds']) + 1) makes i range from the smallest value found in d['inds'] to the largest; the + 1 accounts for range going up to (but not including) its second argument.
zip(d['inds'], d['vals']) pairs up elements from d['inds'] and d['vals'], and the surrounding for ind, val in .. makes these pairs available as ind, val.
[val for ind, val in .. if ind == i] generates a list of the val values whose ind matches the current i.
So, all put together, it creates a list containing, for each i from the minimum to the maximum of d['inds'], the sum of the values whose index matches i.
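If NumPy is available, numpy.bincount with the weights argument performs this summation directly; this swaps in NumPy rather than following any of the answers above. It assumes the indices are non-negative integers, and positions that never appear get 0.0.

```python
import numpy as np

d = {'inds': [0, 3, 7, 3, 3, 5, 1],
     'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}

# bincount adds weights[j] into bin inds[j]; bins that never occur stay 0.0.
x = np.bincount(d['inds'], weights=d['vals'])
print(x.tolist())  # [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
```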

For-Loop Problems

I have the following function, which takes a list of 2D lists of size NxN, for example:
print(matrix)
[
[ [1.0, 2.0, 3.0, 4.0],
[5.0, 6.0, 7.0, 8.0],
[1.0, 2.0, 3.0, 4.0],
[5.0, 6.0, 7.0, 8.0] ],
[ [2.0, 3.0, 4.0, 5.0],
[7.0, 8.0, 9.0, 1.0],
[8.0, 0.0, 2.0, 4.0],
[1.0, 9.0, 5.0, 8.0] ]
]
Each entry of 'matrix' is a 2D list with dimension 4, so 'matrix' is a 3D list with two 2D entries. The function below takes the dimension of the 2D lists, a number of time periods (say 3), a number of age classes (one per 2D list, so 2 here), and 'values', which is the 3D list above.
def initial_values_ext(dimension,periods,age_classes,values):
    dicts = {}
    dict_keys = range(dimension)
    time_keys = range(periods)
    age_keys = range(age_classes)
    for i in dict_keys:
        for j in dict_keys:
            for t in time_keys:
                for k in age_keys:
                    if t == 0:
                        dicts[i+1,j+1,t+1,k+1] = values[k][i][j]
                    else:
                        dicts[i+1,j+1,t+1,k+1] = 1
    return dicts
The function 'initial_values_ext' then reads those 2D lists and generates a dictionary. Each 2D list corresponds to an age class: the first 2D list is age class 1, the second is age class 2, a third one would be age class 3, and so on. So if we call the function, a few of the entries look like this:
initial_values_ext(dimension=4, periods=3, age_classes=2,values=matrix)
(1,1,1,1):1.0
(1,1,1,2):2.0
(1,1,2,2):1
(3,4,1,1):4.0
(3,4,1,2):4.0
(3,4,2,1):1
So the final output is a full dictionary that starts at (1,1,1,age_class=1):1.0, and the last entry taken from the matrices is (4,4,1,age_class=2):8.0. Importantly, the resulting dictionary pulls from the first 2D list of 'matrix' when age_class=1 and from the second 2D list when age_class=2.
Edit: below is the code I wrote for the case where the input matrix is a single list of lists and the dictionary keys have no fourth entry.
matrix = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
def initial_values(dimension,periods,values):
    dicts = {}
    dict_keys = range(dimension)
    time_keys = range(periods)
    for i in dict_keys:
        for j in dict_keys:
            for t in time_keys:
                if t == 0:
                    dicts[i+1,j+1,t+1] = values[i][j]
                else:
                    dicts[i+1,j+1,t+1] = 1
    return dicts
Output:
initial_values(4,2,matrix)
{(1, 1, 1): 1.0,
(1, 1, 2): 1,
(1, 2, 1): 2.0,
(1, 2, 2): 1,
(1, 3, 1): 3.0,
(1, 3, 2): 1,
(1, 4, 1): 4.0,
(1, 4, 2): 1,
(2, 1, 1): 5.0,
(2, 1, 2): 1,
(2, 2, 1): 6.0,
(2, 2, 2): 1,
(2, 3, 1): 7.0,
(2, 3, 2): 1,
(2, 4, 1): 8.0,
(2, 4, 2): 1,
(3, 1, 1): 1.0,
(3, 1, 2): 1,
(3, 2, 1): 2.0,
(3, 2, 2): 1,
(3, 3, 1): 3.0,
(3, 3, 2): 1,
(3, 4, 1): 4.0,
(3, 4, 2): 1,
(4, 1, 1): 5.0,
(4, 1, 2): 1,
(4, 2, 1): 6.0,
(4, 2, 2): 1,
(4, 3, 1): 7.0,
(4, 3, 2): 1,
(4, 4, 1): 8.0,
(4, 4, 2): 1}
I made some modifications to make your approach more Pythonic:
def initial_values_ext(dimension, periods, age_classes, values):
    x = list(map(range, [dimension, periods, age_classes]))
    dicts = {(i+1, j+1, t+1, k+1): values[k][i][j] if t == 0 else 1
             for i in x[0] for j in x[0] for t in x[1] for k in x[2]}
    return dicts
The function was missing the additional loop index k when indexing 'values'; it should be values[k][i][j]:
def initial_values_ext(dimension,periods,age_classes,values):
    dicts = {}
    dict_keys = range(dimension)
    time_keys = range(periods)
    age_keys = range(age_classes)
    for i in dict_keys:
        for j in dict_keys:
            for t in time_keys:
                for k in age_keys:
                    if t == 0:
                        dicts[i+1,j+1,t+1,k+1] = values[k][i][j]
                    else:
                        dicts[i+1,j+1,t+1,k+1] = 1
    return dicts
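Another sketch of the same function, using itertools.product in place of the four nested loops; it produces the same keys in the same order as the nested version, because product varies its last argument fastest. The sample calls use the two-matrix example from the question.

```python
from itertools import product


def initial_values_ext(dimension, periods, age_classes, values):
    # product() yields (i, j, t, k) in the same order as the four nested loops;
    # keys are shifted to start at 1, and only period 1 reads from 'values'.
    return {
        (i + 1, j + 1, t + 1, k + 1): values[k][i][j] if t == 0 else 1
        for i, j, t, k in product(range(dimension), range(dimension),
                                  range(periods), range(age_classes))
    }


matrix = [
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0],
     [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    [[2.0, 3.0, 4.0, 5.0], [7.0, 8.0, 9.0, 1.0],
     [8.0, 0.0, 2.0, 4.0], [1.0, 9.0, 5.0, 8.0]],
]
out = initial_values_ext(4, 3, 2, matrix)
print(len(out))         # 96
print(out[3, 4, 1, 1])  # 4.0
print(out[3, 4, 1, 2])  # 4.0
```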

Python: how to make two lists from a dictionary

I have a dictionary.
{1 : [1.2, 2.3, 4.9, 2.0], 2 : [4.1, 5.1, 6.3], 3 : [4.9, 6.8, 9.5, 1.1, 7.1]}
I want to pass each key:value pair to an instance of matplotlib.pyplot as two lists: x values and y values.
Each key is an x value associated with each item in its value.
So I want two lists for each key:
[1,1,1,1] [1.2,2.3,4.9,2.0]
[2,2,2] [4.1,5.1,6.3]
[3,3,3,3,3] [4.9,6.8,9.5,1.1,7.1]
Is there an elegant way to do this?
Or perhaps there is a way to pass a dict to matplotlib.pyplot?
from matplotlib import pyplot

for k, v in dictionary.items():
    x = [k] * len(v)
    y = v
    pyplot.plot(x, y)
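If you would rather make a single plotting call, a sketch: flatten the dictionary into one x list and one y list covering every key:value pair; the result could then be passed to something like pyplot.scatter(xs, ys). Only the list-building is shown here, so the snippet runs without matplotlib.

```python
d = {1: [1.2, 2.3, 4.9, 2.0], 2: [4.1, 5.1, 6.3], 3: [4.9, 6.8, 9.5, 1.1, 7.1]}

# Repeat each key once per value, then concatenate the value lists in the same order.
xs = [k for k, v in d.items() for _ in v]
ys = [y for v in d.values() for y in v]

print(xs)  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3]
print(ys)  # [1.2, 2.3, 4.9, 2.0, 4.1, 5.1, 6.3, 4.9, 6.8, 9.5, 1.1, 7.1]
```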
d = {1 : [1.2, 2.3, 4.9, 2.0], 2 : [4.1, 5.1, 6.3], 3 : [4.9, 6.8, 9.5, 1.1, 7.1]}
res = [([x]*len(y), y) for x, y in d.items()]
res will be a list of tuples, where the first element of each tuple is your list of x-values and the second element is your list of y-values.
Maybe something like:
d = {1 : [1.2, 2.3, 4.9, 2.0], 2 : [4.1, 5.1, 6.3], 3 : [4.9, 6.8, 9.5, 1.1, 7.1]}
result = []
for key, values in d.items():
    result.append(([key]*len(values), values))
Use this list comprehension:
[([k]*len(v), v) for k, v in D.items()]
Here's an example of it being used:
>>> from pprint import pprint
>>> D = {1: [1.2, 2.3, 4.9, 2.0], 2: [4.1, 5.1, 6.3], 3: [4.9, 6.8, 9.5, 1.1, 7.1]}
>>> LL = [([k]*len(v), v) for k, v in D.items()]
>>> pprint(LL)
[([1, 1, 1, 1], [1.2, 2.2999999999999998, 4.9000000000000004, 2.0]),
([2, 2, 2], [4.0999999999999996, 5.0999999999999996, 6.2999999999999998]),
([3, 3, 3, 3, 3],
[4.9000000000000004,
6.7999999999999998,
9.5,
1.1000000000000001,
7.0999999999999996])]
As a list comprehension:
r = [([k]*len(v), v) for k,v in d.items()]
If your dictionary is very large, you may prefer a generator expression:
from itertools import repeat
r = ((repeat(k, len(v)), v) for k,v in d.items())
...though note that with repeat, the first item in each tuple the generator returns is itself an iterator rather than a list. That's unnecessary if the dictionary's values don't have many items.
>>> d = {1 : [1.2, 2.3, 4.9, 2.0], 2 : [4.1, 5.1, 6.3], 3 : [4.9, 6.8, 9.5, 1.1, 7.1]}
>>> result = [ ([k] * len(d[k]), d[k]) for k in d.keys() ]
>>> print(result)
[([1, 1, 1, 1], [1.2, 2.2999999999999998, 4.9000000000000004, 2.0]), ([2, 2, 2],
[4.0999999999999996, 5.0999999999999996, 6.2999999999999998]), ([3, 3, 3, 3, 3],
[4.9000000000000004, 6.7999999999999998, 9.5, 1.1000000000000001, 7.0999999999999996])]
I guess a wizard will come up with something nicer, but I would do something like:
map(lambda x: ([x]*len(a[x]),a[x]),a)
for tuples, or
map(lambda x: [[x]*len(a[x]),a[x]],a)
for lists.
Here, a is the dictionary.
I assume that you are using Python 2.x; in Python 3, map returns an iterator, so wrap the call in list(...) to get a list.
In Python 2, both zip() and map(None, ...) pair up lists like this:
x = [1,2,4]
y = [1,24,2]
c = zip(x,y)
print c
d = map(None,x,y)
print d
Both print:
[(1, 1), (2, 24), (4, 2)]
If one of the lists is shorter than the others, zip() truncates the result, while map(None, ...) pads the shorter list with None:
x = [1,2,4]
a = [1,2,3,4,5]
c = zip(x,a)
print c
d = map(None,x,a)
print d
[(1, 1), (2, 2), (4, 3)]
[(1, 1), (2, 2), (4, 3), (None, 4), (None, 5)]
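In Python 3, map(None, ...) no longer works and zip() returns an iterator; a sketch of the equivalent calls uses itertools.zip_longest for the None padding:

```python
from itertools import zip_longest

x = [1, 2, 4]
a = [1, 2, 3, 4, 5]

print(list(zip(x, a)))          # [(1, 1), (2, 2), (4, 3)]
print(list(zip_longest(x, a)))  # [(1, 1), (2, 2), (4, 3), (None, 4), (None, 5)]
```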
