How to cluster a graph using python igraph

I've been using python-igraph to make generating and analyzing graphs easier. My code below generates a random graph of 50 nodes and clusters it:
from igraph import *
import random as rn

g = Graph()
size = 50
g.add_vertices(size)
for i in range(size):
    for j in range(size):
        test = rn.randint(0, 5)
        if j >= i or test != 0:  # keep only j < i, with probability 1/6
            continue
        g.add_edges([(i, j)])
#layout = g.layout("kk")
#plot(g, layout = layout)
#dend = VertexDendrogram(graph=g, optimal_count=10)
clust = VertexClustering(g, membership=range(size))
#clust = dend.as_clustering()
c = clust.cluster_graph()
plot(clust, mark_groups=True)
layout = c.layout("kk")
#plot(c, layout = layout)
However, I'm not sure if this is correct, since the result has every single node as its own, isolated cluster. I am assuming this is somehow related to membership, the required parameter of VertexClustering, because I really don't understand what it means or why it is required.
Here is what the terse documentation says:
the membership list. The length of the list must be equal to the number of vertices in the graph. If None, every vertex is assumed to belong to the same cluster.
It doesn't explain what I am supposed to set membership to in order to get a normal, regular clustering.
Alternatively, I tried using the VertexDendrogram, as seen in the two commented lines of code. However, running this raises the following error:
Traceback (most recent call last):
File "sma_hw6_goedeke.py", line 22, in <module>
dend = VertexDendrogram(graph=g, optimal_count=10)
TypeError: __init__() takes at least 3 arguments (3 given)
This is because VertexDendrogram requires the parameter merges. However, once again I have no clue what merges is supposed to be set to or why it is required. The documentation again says almost nothing:
merges - the merges performed given in matrix form.
And I have no clue what that is. So what should I do to this code to get a normal, regular clustering of my random graph that I can experiment with?

I stand by my original assessment that the documentation for igraph is maddeningly terse.
The function I actually needed for what I am doing is igraph.Graph.community_fastgreedy(). In general, all of the functions that run a clustering algorithm start with "community" in their name.
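For reference, membership is simply a list in which entry i is the index of the cluster that vertex i belongs to, which is why membership=range(size) above produced 50 singleton clusters; the community_* methods compute that list for you. Here is a minimal sketch of that route, reusing the random graph from the question (community_fastgreedy returns a dendrogram that you cut with as_clustering):
from igraph import Graph, plot
import random as rn

size = 50
g = Graph()
g.add_vertices(size)
for i in range(size):
    for j in range(i):  # j < i avoids self-loops and duplicate edges
        if rn.randint(0, 5) == 0:
            g.add_edges([(i, j)])

dend = g.community_fastgreedy()  # returns a VertexDendrogram
clust = dend.as_clustering()     # cut the dendrogram into a VertexClustering
plot(clust, mark_groups=True)    # communities, not 50 singletons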

Related

networkx merging graph RAM issue

I'm quite a newbie with networkx and it seems that I'm having RAM issues when running a function that merges two different graphs. This function adds up the weights of edges that are common to both graphs.
I have a list of 8-9 graphs, each containing about 1000-2000 nodes, which I merge in this loop:
FinalGraph = nx.Graph()
while len(graphs_list) != 0:
    FinalGraph = merge_graphs(FinalGraph, graphs_list.pop())
using this function
def merge_graphs(graph1, graph2):
    edges1 = graph1.edges
    edges2 = graph2.edges
    diff1 = edges1 - edges2
    if diff1 == edges1:
        return nx.compose(graph1, graph2)
    else:
        common_edges = list(edges1 - diff1)
        for edges in common_edges:
            graph1[edges[0]][edges[1]]['weight'] += graph2[edges[0]][edges[1]]['weight']
        return nx.compose(graph2, graph1)
When running my script, my computer always freezes when it reaches this loop. Am I creating some kind of bad reference cycle or something? Is there something more efficient in the networkx docs that could help me avoid this function for my purpose?
Thanks for reading me, I hope I'm making sense.
There seems to be a lot of extra work going on here, caused by checking whether the conditions allow you to use compose. That may be contributing to the trouble. I think it might work better to just iterate through the edges and nodes of each graph. The following looks like a more direct way to do it (and doesn't create as many intermediate variables, which might be contributing to the memory issues):
final_graph = nx.Graph()
for graph in graphs_list:
    final_graph.add_nodes_from(graph.nodes())
    # data is the edge-attribute dict, so pull the 'weight' entry out of it
    for u, v, data in graph.edges(data=True):
        if final_graph.has_edge(u, v):
            final_graph[u][v]['weight'] += data['weight']
        else:
            final_graph.add_edge(u, v, weight=data['weight'])
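As a side note on the design, this also sidesteps nx.compose() entirely: compose builds and returns a brand-new graph on every call, so composing in a loop allocates a fresh intermediate graph per iteration, while adding nodes and edges into a single final_graph allocates only once.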

Multiprocessing and rpy2 (with ape)

I ran into this today and can't figure out why. I have several functions chained together that perform some time-consuming operations as part of a larger pipeline. I've included these here, pared down to a test example as best I could. The issue is that when I call a function directly, I get the expected output (e.g., 5 different trees). However, when I call the same function in a multiprocessing pool with apply_async (or apply, it doesn't matter), I get 5 trees, but they are all the same.
I've documented this in an IPython notebook, which can be viewed here: http://nbviewer.ipython.org/gist/cfriedline/0e275d528ff1a8d674c6
In cell 91, I create 5 trees (each with 10 tips) and return two lists: the first contains the non-multiprocessing trees, and the second the trees from apply_async.
In cell 92, you can see the results of creating trees without multiprocessing, and in 93, with multiprocessing.
What I expect is that there would be a total of 10 different trees between the two tests, but instead all of the multiprocessing trees are identical. Makes little sense to me.
Relevant versions of things:
Linux 2.6.18-238.12.1.el5 x86_64 GNU/Linux
Python 2.7.6 :: Anaconda 1.9.2 (64-bit)
IPython 2.0.0
Rpy2 2.3.9
Thanks!
Chris
I solved this one, with a point in the right direction from @mgilson. In fact, it was a random number problem, just not in Python - in R (sigh). The state of R is copied when the Pool is created, meaning so is its random seed. The fix is a little rpy2, as below, calling R's set.seed function (with some process-specific stuff for good measure):
def create_tree(num_tips, type):
    """
    creates the taxa tree in R
    @param num_tips: number of taxa to create
    @param type: type for naming (e.g., 'taxa')
    @return: a dendropy Tree
    @rtype: dendropy.Tree
    """
    r = rpy2.robjects.r
    set_seed = r('set.seed')
    set_seed(int((time.time() + os.getpid() * 1000)))
    rpy2.robjects.globalenv['numtips'] = num_tips
    rpy2.robjects.globalenv['treetype'] = type
    name = _get_random_string(20)
    if type == "T":
        r("%s = rtree(numtips, rooted=T, tip.label=paste(treetype, seq(1:(numtips)), sep=''))" % name)
    else:
        r("%s = rtree(numtips, rooted=F, tip.label=paste(treetype, seq(1:(numtips)), sep=''))" % name)
    tree = r[name]
    return ape_to_dendropy(tree)
I'm not 100% familiar with these libraries; however, on Linux (IIRC) multiprocessing uses os.fork. This means that the state of the random module (which you're using) will also be forked, and each of your processes will generate the same sequence of random numbers, resulting in a not-so-random _get_random_string function.
If I'm right, and you make the pool smaller than the number of trees that you want, you should see that you get groups of N identical trees (where N is the size of the pool).
I think that probably the ideal solution is to re-seed the random number generator inside of each of the processes. It's unlikely that they'll run at exactly the same time, so you should get differing results.
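A minimal sketch of that re-seeding fix (the initializer argument to Pool runs once inside each worker process; the helper names here are illustrative, and in the rpy2 case the same idea applies to R's set.seed, as in the accepted answer):
import os
import time
import random
from multiprocessing import Pool

def reseed():
    # mix the PID into the seed so forked workers diverge
    random.seed(os.getpid() * 1000 + int(time.time()))

def make_value(_):
    return random.random()

if __name__ == "__main__":
    pool = Pool(processes=4, initializer=reseed)
    print(pool.map(make_value, range(5)))  # no longer identical across workers
    pool.close()
    pool.join()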

Python something resets my random seed

My question is the exact opposite of this one.
This is an excerpt from my test file
f1 = open('seed1234','r')
f2 = open('seed7883','r')
s1 = eval(f1.read())
s2 = eval(f2.read())
f1.close()
f2.close()
####
test_sampler1.random_inst.setstate(s1)
out1 = test_sampler1.run()
self.assertEqual(out1,self.out1_regress) # this is fine and passes
test_sampler2.random_inst.setstate(s2)
out2 = test_sampler2.run()
self.assertEqual(out2,self.out2_regress) # this FAILS
Some info -
test_sampler1 and test_sampler2 are two objects of a class that performs some stochastic sampling. The class has an attribute random_inst, which is an object of type random.Random(). The file seed1234 contains a TestSampler's random_inst's state as returned by random.getstate() when it was given a seed of 1234, and you can guess what seed7883 is. What I did was create a TestSampler in the terminal, give it a random seed of 1234, acquire the state with rand_inst.getstate(), and save it to a file. I then recreate the regression test and I always get the same output.
HOWEVER
The same procedure as above doesn't work for test_sampler2 - whatever I do, I do not get the same random sequence of numbers. I am using Python's random module and I am not importing it anywhere else, but I do use numpy in some places (but not numpy.random).
The only difference between test_sampler1 and test_sampler2 is that they are created from two different files. I know this is a big deal and it is totally dependent on the code I wrote, but I also can't simply paste ~800 lines of code here; I am merely looking for some general idea of what I might be messing up...
What might be scrambling the state of test_sampler2's random number generator?
Solution
There were 2 separate issues with my code:
1
My script is a command-line script, and after I refactored it to use Python's optparse library I found out that I was setting the seed for my sampler with something like seed = sys.argv[1], which meant I was setting the seed to a str, not an int - seed can take any hashable object, and I found that out the hard way (a tiny sketch follows). This explains why I would get two different sequences from the same seed: one when I ran the script from the command line with something like python sample 1234 #seed is 1234, and another from my unit_tests.py file when I created an instance like test_sampler1 = TestSampler(seed=1234).
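A minimal sketch of that pitfall: the str "1234" and the int 1234 are both hashable, so both are silently accepted as seeds, but they initialize different streams.
import random

r_str = random.Random("1234")  # what sys.argv[1] hands you: a str
r_int = random.Random(1234)    # what the unit test passed: an int
print(r_str.random() == r_int.random())  # False: two different sequences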
2
I have a function for discrete distribution sampling which I borrowed from here (look at the accepted answer). The code there was missing something fundamental: it was still non-deterministic in the sense that if you gave it the same values and probabilities arrays, but transformed by a permutation (say values ['a','b'] with probs [0.1,0.9], versus values ['b','a'] with probs [0.9,0.1]), then with the seed set the PRNG would hand you the same random draw, say 0.3, but since the probability intervals are laid out differently, in one case you'd get a b and in the other an a. To fix it, I just zipped the values and probabilities together, sorted by probability, and tada - I now always get the same probability intervals.
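A minimal sketch of that fix (the function name is hypothetical): sorting the (value, probability) pairs into a canonical order before building the cumulative intervals makes the sampler insensitive to how the input happens to be ordered.
import random

def sample_discrete(values, probs, rng):
    # canonical order: the same distribution always yields the same intervals
    pairs = sorted(zip(values, probs), key=lambda vp: (vp[1], str(vp[0])))
    x = rng.random()
    cumulative = 0.0
    for value, p in pairs:
        cumulative += p
        if x < cumulative:
            return value
    return pairs[-1][0]  # guard against floating-point round-off

print(sample_discrete(['a', 'b'], [0.1, 0.9], random.Random(1234)))
print(sample_discrete(['b', 'a'], [0.9, 0.1], random.Random(1234)))  # same result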
After fixing both issues the code worked as expected i.e. out2 started behaving deterministically.
The only thing (apart from an internal Python bug) that can change the state of a random.Random instance is calling methods on that instance. So the problem lies in something you haven't shown us. Here's a little test program:
from random import Random

r1 = Random()
r2 = Random()
for _ in range(100):
    r1.random()
for _ in range(200):
    r2.random()

r1state = r1.getstate()
r2state = r2.getstate()
with open("r1state", "w") as f:
    print >> f, r1state
with open("r2state", "w") as f:
    print >> f, r2state

for _ in range(100):
    with open("r1state") as f:
        r1.setstate(eval(f.read()))
    with open("r2state") as f:
        r2.setstate(eval(f.read()))
    assert r1state == r1.getstate()
    assert r2state == r2.getstate()
I haven't run that all day, but I bet I could and never see a failing assert ;-)
BTW, it's certainly more common to use pickle for this kind of thing, but it's not going to solve your real problem. The problem is not in getting or setting the state. The problem is that something you haven't yet found is calling methods on your random.Random instance(s).
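For completeness, a minimal sketch of that pickle approach (round-tripping getstate() through a file instead of print/eval):
import pickle
import random

r = random.Random(1234)
with open("rstate.pkl", "wb") as f:
    pickle.dump(r.getstate(), f)

r2 = random.Random()
with open("rstate.pkl", "rb") as f:
    r2.setstate(pickle.load(f))

assert r.getstate() == r2.getstate()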
While it's a major pain in the butt to do so, you could try adding print statements to random.py to find out what's doing it. There are cleverer ways to do that, but better to keep it dirt simple so that you don't end up actually debugging the debugging code.

more efficient Python scripting in Blender3D

I am basically building a 3D scatter plot using primitive UV spheres and am running into memory issues when attempting to create more than a couple hundred points at one time. I am limited by my laptop's 2.1 GHz processor, but wanted to know if there is a better way to write this:
import bpy
import random

count = 0
while count < 5:
    bpy.ops.mesh.primitive_uv_sphere_add(
        size=.3,
        location=(random.randint(-9, 9), random.randint(-9, 9),
                  random.randint(-9, 9)),
        rotation=(0, 0, 0))
    count += 1
I realize that with such a simple script any performance increase is likely negligible but wanted to give it a shot anyway.
Some possible suggestions:
I would pre-calculate the x, y, z values, store them in mathutils vectors, and add them to a dict to be iterated over.
Duplication should provide a smaller memory footprint than instantiating new objects, e.g.:
bpy.ops.object.duplicate_move(OBJECT_OT_duplicate={"linked": False},
                              TRANSFORM_OT_translate={"value": (0, 0, 0)})
Edit:
Doing further research, it appears that each time a bpy.ops.* operator is called, the scene is redrawn. One user documented an exponential increase in the time taken to generate UV spheres.
CoDEmanX provided the following code snippet to another user.
import bpy

bpy.ops.object.select_all(action='DESELECT')
bpy.ops.mesh.primitive_uv_sphere_add()
sphere = bpy.context.object
for i in range(-1000, 1000, 2):
    ob = sphere.copy()
    ob.location.y = i
    #ob.data = sphere.data.copy() # uncomment this, if you want full copies and no linked duplicates
    bpy.context.scene.objects.link(ob)
bpy.context.scene.update()
Then it is just a case of adapting the code to set each object's location:
obj.location = location_dict[i]
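Putting the pieces together, a minimal sketch of the whole approach (assuming the same pre-2.8 Blender API as the snippet above; the count and coordinate ranges are illustrative):
import bpy
import random
from mathutils import Vector

count = 500
# pre-compute the random x, y, z positions, as suggested above
locations = {i: Vector((random.randint(-9, 9),
                        random.randint(-9, 9),
                        random.randint(-9, 9)))
             for i in range(count)}

# create one template sphere with the operator, then link cheap
# linked duplicates of it instead of calling the operator per point
bpy.ops.mesh.primitive_uv_sphere_add(size=0.3)
template = bpy.context.object

for i in range(count):
    ob = template.copy()        # linked duplicate: shares mesh data
    ob.location = locations[i]
    bpy.context.scene.objects.link(ob)

bpy.context.scene.update()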

PyMel Rotate Raises Error

I want to randomly rotate a list of objects on a given axis with a random amount retrieved from a specified range.
This is what I came up with:
import pymel.core as pm
import random as rndm

def rndmRotateX(targets, axisType, range=[0, 180]):
    for obj in targets:
        rValue = rndm.randint(range[0], range[1])
        xDeg = '%sDeg' % (rValue)
        #if axisType=='world':
        #    pm.rotate(rValue, 0, 0, obj, ws=1)
        #if axisType=='object':
        #    pm.rotate(rValue, 0, 0, obj, os=1)
        pm.rotate(xDeg, 0, 0, r=True)

targetList = pm.ls(sl=1)
randRange = [0, 75]
rotAxis = 'world'
rndmRotateX(targetList, rotAxis, randRange)
I'm using pm.rotate() because it allows me to specify whether I want the rotations done in world or object space (unlike setAttr, as far as I can tell).
The problem is, it raises this error when I try to run this:
# Error: MayaNodeError: file C:\Program Files\Autodesk\Maya2012\Python\lib\site-packages\pymel\internal\pmcmds.py line 140: #
It must be something with the way I enter the arguments for pm.rotate() (I'm assuming this because of the line the error points at in PyMEL, which has to do with its argument-conversion function), but I can't figure out for the life of me what I did wrong. :/
I think the problem is in this line
pm.rotate(rValue,0,0, obj, os=1)
obj should be the first argument, so it should be
pm.rotate(obj, (rValue,0,0), os=1)
but to make it even prettier you could use
obj.setRotation((rValue,0,0), os=1)
Also, use pm.selected() instead of pm.ls(sl=1); it looks better.
Another way to go about doing this..
from pymel.core import *
import random as rand

def rotateObjectsRandomly(axis, rotateRange):
    rotateValue = rand.random() * rotateRange
    for obj in selected():
        # set the rotation attribute on each selected object in turn
        PyNode(str(obj) + ".r" + axis).set(rotateValue)
    objectRotation = [[obj, obj.r.get()] for obj in selected()]
    print "\nObjects have been rotated in the {0} axis {1} degrees.\n".format(axis, rotateValue)
    return objectRotation

rotateObjectsRandomly("z", 360)
Since rand.random() returns a random value between 0 and 1, I just multiplied that by the rotateRange specified by the user - or, my preference, I would do away with that altogether and just multiply it by 360.
You also don't need all the feedback; I just think it looks nice when run.
Objects have been rotated in the z axis 154.145898182 degrees.
# Result: [[nt.Transform(u'myCube'), dt.Vector([42.6541437517, 0.0, 154.145898182])]] #
Just as a straight debug of what you've got...
Issue 01: it's case sensitive
pm.rotate("20deg",0,0) will work fine, but pm.rotate("20Deg",0,0) will fail and throw a MayaNodeError because it thinks that you're looking for a node called '20Deg'. Basically, you want to build your string as per: xDeg='%sdeg' % (rValue)
Issue 02: you're relying on pm.rotate()'s implicit "will apply to selected objects" behaviour
You won't see this until you apply the above fix, but if you have two selected objects and run the (patched) rndmRotateX function on them, you'll get both objects rotating by the exact same amount, because pm.rotate() operates on the selection (both objects) rather than rotating per object.
If you want a quick fix, you need to insert a pm.select(obj) before the rotate, and you possibly want to save the selection list and restore it afterwards (a sketch follows). However, IMHO it's a Really Bad Idea to rely on selections like this, so I'd push you towards Kim's answer.
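A minimal sketch of that quick fix, keeping the shape of the original function (the parameter names are illustrative):
import pymel.core as pm
import random as rndm

def rndmRotateX(targets, low=0, high=180):
    saved = pm.selected()  # remember the user's selection
    for obj in targets:
        pm.select(obj)     # pm.rotate() operates on the selection
        pm.rotate('%sdeg' % rndm.randint(low, high), 0, 0, r=True)
    pm.select(saved)       # restore the original selection

rndmRotateX(pm.selected())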
