Depth-first algorithm in Python does not work

I have a project which I decided to do in Python. In brief: I have a list of lists, each of which also contains lists, sometimes with one element, sometimes with more. It looks like this:
rules = [
    [[1], [2], [3, 4, 5], [4], [5], [7]],
    [[1], [8], [3, 7, 8], [3], [45], [12]],
    [[31], [12], [43, 24, 57], [47], [2], [43]],
]
The point is to compare values from a NumPy array against the values in these rules (the elements of the rules table). We compare some [x][y] point with the first element (e.g. 1 in the first rule), then, if that matches, the value at [x-1][y] in the array with the second element, and so on. The first five comparisons must all be true to change the value of the [x][y] point. I wrote something like this (the main function is SimulateLoop; the functions appear in this order because simulate2 was written after SimulateLoop):
def simulate2(self, i, j, w, rule):
    data = Data(rule)
    if w.world[i][j] in data.c:
        if w.world[i-1][j] in data.n:
            if w.world[i][j+1] in data.e:
                if w.world[i+1][j] in data.s:
                    if w.world[i][j-1] in data.w:
                        w.world[i][j] = data.cc[0]
                    else: return
                else: return
            else: return
        else: return
    else: return

def SimulateLoop(self, w):
    for z in range(w.steps):
        for i in range(2, w.x-1):
            for j in range(2, w.y-1):
                for rule in w.rules:
                    self.simulate2(i, j, w, rule)
Data class:
class Data:
    def __init__(self, rule):
        self.c = rule[0]
        self.n = rule[1]
        self.e = rule[2]
        self.s = rule[3]
        self.w = rule[4]
        self.cc = rule[5]
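To make the rule layout concrete, here is a small sketch (my own illustration, not part of the original code) of how one rule from the rules list above is unpacked by Data, based on how simulate2 uses the fields:

rule = [[1], [2], [3, 4, 5], [4], [5], [7]]   # first rule from the rules list above
data = Data(rule)
print(data.c)    # [1]        -> values accepted for the centre cell w.world[i][j]
print(data.n)    # [2]        -> values accepted for w.world[i-1][j]
print(data.e)    # [3, 4, 5]  -> values accepted for w.world[i][j+1]
print(data.cc)   # [7]        -> new value written to the centre cell when all checks pass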
The NumPy array is an attribute of a World object. rules is the list described above, parsed by a function obtained from another (GPL-licensed) program.
To be honest it looks like it should work, but it does not. I have tried other approaches without luck. The code runs and the interpreter doesn't raise any errors, but somehow the values in the array change incorrectly. The rules themselves are correct, since they come from the same program I obtained the parser from.
Maybe this is helpful: it is Perrier's loop, a modified Langton's loop (artificial life).
I will be very thankful for any help!

I am not familiar with Perrier's loop, but if you are coding something like the famous Game of Life, you may have made a simple mistake: storing the next generation in the same array, thus corrupting it.
Normally you store the next generation in a temporary array and copy (or swap) it back after the sweep, like in this sketch:
def do_step_in_game_life(world):
    next_gen = zeros(world.shape)   # <<< Tmp array here
    Nx, Ny = world.shape
    for i in range(1, Nx-1):
        for j in range(1, Ny-1):
            neighbours = sum(world[i-1:i+2, j-1:j+2]) - world[i, j]
            if neighbours < 3:
                next_gen[i, j] = 0
            elif ...                # ... and so on for the remaining rules
    world[:, :] = next_gen[:, :]    # <<< Saving computed next generation
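Applied to the question's code, the same idea would look roughly like this. This is only a sketch, not the original author's fix; it assumes w.world is a 2D NumPy array and that World exposes x, y, steps and rules as in the question, and it collapses the nested ifs into one condition purely for readability:

def SimulateLoop(self, w):
    for z in range(w.steps):
        next_world = w.world.copy()          # work on a copy of the current state
        for i in range(2, w.x - 1):
            for j in range(2, w.y - 1):
                for rule in w.rules:
                    data = Data(rule)
                    if (w.world[i][j] in data.c and
                            w.world[i-1][j] in data.n and
                            w.world[i][j+1] in data.e and
                            w.world[i+1][j] in data.s and
                            w.world[i][j-1] in data.w):
                        next_world[i][j] = data.cc[0]   # write into the copy only
        w.world[:, :] = next_world           # commit the whole step at once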

Related

Nested for loop producing more values than expected - Python

Background: I have two catalogues consisting of positions of spatial objects. My aim is to find matching objects in both catalogues, with the angular distance between them no larger than a certain value. One of the catalogues is called bss and the other super.
Here is the full code I wrote:
import numpy as np

def crossmatch(bss_cat, super_cat, max_dist):
    matches = []
    no_matches = []

    def find_closest(bss_cat, super_cat):
        dist_list = []

        def angular_dist(ra1, dec1, ra2, dec2):
            r1 = np.radians(ra1)
            d1 = np.radians(dec1)
            r2 = np.radians(ra2)
            d2 = np.radians(dec2)
            a = np.sin(np.abs(d1-d2)/2)**2
            b = np.cos(d1)*np.cos(d2)*np.sin(np.abs(r1 - r2)/2)**2
            rad = 2*np.arcsin(np.sqrt(a + b))
            d = np.degrees(rad)
            return d

        for i in range(len(bss_cat)):  # The problem arises here
            for j in range(len(super_cat)):
                distance = angular_dist(bss_cat[i][1], bss_cat[i][2], super_cat[j][1], super_cat[j][2])  # While this is supposed to produce single floating point values, it produces numpy.ndarray consisting of three entries
                dist_list.append(distance)  # This list now contains numpy.ndarrays instead of numpy.float values
                for k in range(len(dist_list)):
                    if dist_list[k] < max_dist:
                        element = (bss_cat[i], super_cat[j], dist_list[k])
                        matches.append(element)
                    else:
                        element = bss_cat[i]
                        no_matches.append(element)

    return (matches, no_matches)
When called separately, the function angular_dist(ra1, dec1, ra2, dec2) produces a single numpy.float value as expected. But when used inside the for loop in this crossmatch(bss_cat, super_cat, max_dist) function, it produces numpy.ndarrays instead of numpy.float values. I have noted this inside the code as well. I don't know where the code goes wrong. Please help.
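One thing worth checking (a general NumPy fact, not something stated in the original post): angular_dist returns whatever shape its inputs broadcast to, so it only returns a scalar when it receives scalars. If bss_cat[i][1] happens to be a small array rather than a single number (for example, a row holding several coordinate components), the result will be an array of the same length. A minimal sketch of that behaviour:

import numpy as np

def angular_dist(ra1, dec1, ra2, dec2):
    # Same haversine formula as in the question.
    r1, d1, r2, d2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = np.sin(np.abs(d1 - d2) / 2)**2
    b = np.cos(d1) * np.cos(d2) * np.sin(np.abs(r1 - r2) / 2)**2
    return np.degrees(2 * np.arcsin(np.sqrt(a + b)))

print(angular_dist(10.0, -30.0, 11.0, -29.0))             # scalars in -> one scalar out
print(angular_dist(np.array([10.0, 12.0, 14.0]), -30.0,
                   11.0, -29.0))                          # array in -> array of three values out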

How to check Python code by reduction?

import numpy

def rtpairs(R, T):
    for i in range(numpy.size(R)):
        o = 0.0
        for j in range(T[i]):
            o += 2*(numpy.pi)/T[i]
            yield R[i], o

R = [0.0, 0.1, 0.2]
T = [1, 10, 20]

for r, t in genpolar.rtpairs(R, T):
    plot(r*cos(t), r*sin(t), 'bo')
This program is supposed to be a generator, but I would like to check whether I'm doing the right thing by first asking it to return some values for pheta (see below):
import numpy as np

def rtpairs(R=None, T=None):
    R = np.array(R)
    T = np.array(T)
    for i in range(np.size(R)):
        pheta = 0.0
        for j in range(T[i]):
            pheta += (2*np.pi)/T[i]
            return pheta
Then
I typed import omg as o at the prompt, and then:
x = [o.rtpairs(R=[0.0,0.1,0.2],T=[1,10,20])]
# I tried to collect all values generated by the loops
It turns out to give me only one value, which is 2*pi. I have a habit of checking my code halfway through. Is there any way for me to get a list of angles using the code above? I don't understand why I must use a generator structure (the first version) and can't use a normal loop to check.
A normal loop, e.g.:
x=[i for i in range(10)]
x=[0,1,2,3,4,5,6,7,8,9]
Here I can see a list of values I should get.
return pheta
You switched to return instead of yield. It isn't a generator any more; it's stopping at the first return. Change it back.
x = [o.rtpairs(R=[0.0,0.1,0.2],T=[1,10,20])]
This wraps the rtpairs return value in a 1-element list. That's not what you want. If you want to extract all elements from a generator and store them in a list, call list on the generator:
x = list(o.rtpairs(R=[0.0,0.1,0.2],T=[1,10,20]))
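For completeness, a quick sketch of what checking the generator then looks like with yield restored (my own illustration, not part of the original answer):

import numpy as np

def rtpairs(R, T):
    for i in range(np.size(R)):
        pheta = 0.0
        for j in range(T[i]):
            pheta += 2*np.pi/T[i]
            yield R[i], pheta        # yield, not return, so iteration continues

pairs = list(rtpairs([0.0, 0.1, 0.2], [1, 10, 20]))
print(len(pairs))        # 31 pairs: 1 + 10 + 20
print(pairs[0])          # (0.0, 6.283185307179586) -> the single 2*pi angle for T[0] = 1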

Size-Incremental Numpy Array in Python

I just came across the need for an incremental NumPy array in Python, and since I haven't found anything that does this, I implemented it myself. I'm just wondering whether my way is the best way, or whether you can come up with other ideas.
The problem is that I have a 2D array (the program handles nD arrays) whose size is not known in advance, and a variable amount of data needs to be concatenated to the array in one direction (say that I have to call np.vstack a lot of times). Every time I concatenate data, I need to take the array, sort it along axis 0 and do other things, so I cannot build a long list of arrays and np.vstack the list all at once.
Since memory allocation is expensive, I turned to incremental arrays, where I grow the array by a quantity bigger than the size I actually need (I use 50% increments), so that I minimize the number of allocations.
I coded this up; you can see it in the following code:
import numpy as np

class ExpandingArray:

    __DEFAULT_ALLOC_INIT_DIM = 10   # default initial dimension for all the axes if nothing is given by the user
    __DEFAULT_MAX_INCREMENT = 10    # default value in order to limit the increment of memory allocation

    __MAX_INCREMENT = []    # Max increment
    __ALLOC_DIMS = []       # Dimensions of the allocated np.array
    __DIMS = []             # Dimensions of the view with data on the allocated np.array (__DIMS <= __ALLOC_DIMS)
    __ARRAY = []            # Allocated array

    def __init__(self, initData, allocInitDim=None, dtype=np.float64, maxIncrement=None):
        self.__DIMS = np.array(initData.shape)

        self.__MAX_INCREMENT = maxIncrement
        if self.__MAX_INCREMENT == None:
            self.__MAX_INCREMENT = self.__DEFAULT_MAX_INCREMENT

        # Compute the allocation dimensions based on user's input
        if allocInitDim == None:
            allocInitDim = self.__DIMS.copy()

        while np.any(allocInitDim < self.__DIMS) or np.any(allocInitDim == 0):
            for i in range(len(self.__DIMS)):
                if allocInitDim[i] == 0:
                    allocInitDim[i] = self.__DEFAULT_ALLOC_INIT_DIM
                if allocInitDim[i] < self.__DIMS[i]:
                    allocInitDim[i] += min(allocInitDim[i]/2, self.__MAX_INCREMENT)

        # Allocate memory
        self.__ALLOC_DIMS = allocInitDim
        self.__ARRAY = np.zeros(self.__ALLOC_DIMS, dtype=dtype)

        # Set initData
        sliceIdxs = [slice(self.__DIMS[i]) for i in range(len(self.__DIMS))]
        self.__ARRAY[sliceIdxs] = initData

    def shape(self):
        return tuple(self.__DIMS)

    def getAllocArray(self):
        return self.__ARRAY

    def getDataArray(self):
        """
        Get the view of the array with data
        """
        sliceIdxs = [slice(self.__DIMS[i]) for i in range(len(self.__DIMS))]
        return self.__ARRAY[sliceIdxs]

    def concatenate(self, X, axis=0):
        if axis > len(self.__DIMS):
            print "Error: axis number exceeds the number of dimensions"
            return

        # Check dimensions for the remaining axes
        for i in range(len(self.__DIMS)):
            if i != axis:
                if X.shape[i] != self.shape()[i]:
                    print "Error: Dimensions of the input array are not consistent in axis %d" % i
                    return

        # Check whether the allocated memory is enough
        needAlloc = False
        while self.__ALLOC_DIMS[axis] < self.__DIMS[axis] + X.shape[axis]:
            needAlloc = True
            # Increase the __ALLOC_DIMS
            self.__ALLOC_DIMS[axis] += min(self.__ALLOC_DIMS[axis]/2, self.__MAX_INCREMENT)

        # Reallocate memory and copy the old data
        if needAlloc:
            # Allocate
            newArray = np.zeros(self.__ALLOC_DIMS)
            # Copy
            sliceIdxs = [slice(self.__DIMS[i]) for i in range(len(self.__DIMS))]
            newArray[sliceIdxs] = self.__ARRAY[sliceIdxs]
            self.__ARRAY = newArray

        # Concatenate the new data
        sliceIdxs = []
        for i in range(len(self.__DIMS)):
            if i != axis:
                sliceIdxs.append(slice(self.__DIMS[i]))
            else:
                sliceIdxs.append(slice(self.__DIMS[i], self.__DIMS[i] + X.shape[i]))

        self.__ARRAY[sliceIdxs] = X
        self.__DIMS[axis] += X.shape[axis]
The code shows considerably better performance than vstack/hstack over several randomly sized concatenations.
What I'm wondering is: is this the best way? Is there anything in NumPy that already does this?
Furthermore, it would be nice to be able to overload the slice assignment operator of np.array, so that as soon as the user assigns anything outside the actual dimensions, an ExpandingArray.concatenate() is performed. How would I do such overloading?
Testing code: I also post here some code I used to compare vstack and my method. I add random chunks of data with a maximum length of 100.
import time

N = 10000

def performEA(N):
    EA = ExpandingArray(np.zeros((0,2)), maxIncrement=1000)
    for i in range(N):
        nNew = np.random.random_integers(low=1, high=100, size=1)
        X = np.random.rand(nNew, 2)
        EA.concatenate(X, axis=0)
        # Perform operations on EA.getDataArray()
    return EA

def performVStack(N):
    A = np.zeros((0,2))
    for i in range(N):
        nNew = np.random.random_integers(low=1, high=100, size=1)
        X = np.random.rand(nNew, 2)
        A = np.vstack((A, X))
        # Perform operations on A
    return A

start_EA = time.clock()
EA = performEA(N)
stop_EA = time.clock()

start_VS = time.clock()
VS = performVStack(N)
stop_VS = time.clock()

print "Elapsed Time EA: %.2f" % (stop_EA - start_EA)
print "Elapsed Time VS: %.2f" % (stop_VS - start_VS)
I think the most common design pattern for this is to just collect the small arrays in a list. Sure, you could do things like dynamic resizing (if you want to do crazy things, you can try the resize array method too). A typical approach is to always double the size when you really don't know how large things will be. Of course, if you know how large the array will grow, just allocating the whole thing up front is simplest:
def performVStack_fromlist(N):
    l = []
    for i in range(N):
        nNew = np.random.random_integers(low=1, high=100, size=1)
        X = np.random.rand(nNew, 2)
        l.append(X)
    return np.vstack(l)
I am sure there are some use cases where an expanding array could be useful (for example when the appended arrays are all very small), but this loop seems better handled with the above pattern. The optimization is mostly about how often you need to copy everything around, and with a list like this you copy everything (other than the list itself) exactly once. So it is normally much faster.
When I faced a similar problem, I used ndarray.resize() (http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.resize.html#numpy.ndarray.resize). Most of the time, it will avoid reallocation+copying altogether. I can't guarantee it would prove to be faster (it probably would), but it's much simpler.
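A minimal sketch of that approach (my illustration, not the answerer's code): keep a count of the rows actually used, call ndarray.resize() to grow the buffer in place when it runs out, and slice off the valid part when you need it. refcheck=False is needed when other references to the array may exist.

import numpy as np

class ResizingBuffer(object):
    """Grow a 2-column array in place with ndarray.resize()."""
    def __init__(self, ncols=2, capacity=1024):
        self._data = np.zeros((capacity, ncols))
        self._used = 0

    def append(self, X):
        need = self._used + X.shape[0]
        if need > self._data.shape[0]:
            # Grow by 50% (or at least enough to fit X); resize() may be able
            # to extend the existing allocation without copying.
            new_rows = max(need, int(1.5 * self._data.shape[0]))
            self._data.resize((new_rows, self._data.shape[1]), refcheck=False)
        self._data[self._used:need] = X
        self._used = need

    def data(self):
        return self._data[:self._used]

buf = ResizingBuffer()
buf.append(np.random.rand(5, 2))
print(buf.data().shape)   # (5, 2)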
As for your second question, I think overriding slice assignment for extending purposes is not a good idea. That operator is meant for assigning to existing items/slices. If you want to change that, it's not immediately clear how you'd want it to behave in some cases, e.g.:
a = MyExtendableArray(np.arange(100))
a[200] = 6 # resize to 200? pad [100:200] with what?
a[90:110] = 7 # assign to existing items AND automagically-allocated items?
a[::-1][200] = 6 # ...
My suggestion is that slice-assignment and data appending should remain separate.

Python function not recursing properly (adding nodes to graph)

I'm having a rare honest-to-goodness computer science problem (as opposed to the usual how-do-I-make-this-language-I-don't-write-often-enough-do-what-I-want problem), and really feeling my lack of a CS degree for a change.
This is a bit messy, because I'm using several dicts of lists, but the basic concept is this: a Twitter-scraping function that adds retweets of a given tweet to a graph, node-by-node, building outwards from the original author (with follower relationships as edges).
for t in RTs_list:
    g = nx.DiGraph()
    followers_list = collections.defaultdict(list)
    level = collections.defaultdict(list)
    hoppers = collections.defaultdict(list)
    retweets = []
    retweeters = []
    try:
        u = api.get_status(t)
        original_tweet = u.retweeted_status.id_str
        print original_tweet
        ot = api.get_status(original_tweet)
        node_adder(ot.user.id, 1)
        # Can't paginate -- can only get about ~20 RTs max. Need to work on small data here.
        retweets = api.retweets(original_tweet)
        for r in retweets:
            retweeters.append(r.user.id)
        followers_list["0"] = api.followers_ids(ot.user.id)[0]
        print len(retweets), "total retweets"
        level["1"] = ot.user.id
        g.node[ot.user.id]['crossover'] = 1
        if g.node[ot.user.id]["followers_count"] < 4000:
            bum_node_adder(followers_list["0"], level["1"], 2)
        for r in retweets:
            rt_iterator(r, retweets, 0, followers_list, hoppers, level)
    except:
        print ""
def rt_iterator(r, retweets, q, followers_list, hoppers, level):
    q = q + 1
    if r.user.id in followers_list[str(q-1)]:
        hoppers[str(q)].append(r.user.id)
        node_adder(r.user.id, q+1)
        g.add_edge(level[str(q)], r.user.id)
        try:
            followers_list[str(q)] = api.followers_ids(r.user.id)[0]
            level[str(q+1)] = r.user.id
            if g.node[r.user.id]["followers_count"] < 4000:
                bum_node_adder(followers_list[str(q)], level[str(q+1)], q+2)
            crossover = pull_crossover(followers_list[str(q)], followers_list[str(q-1)])
            if q < 10:
                for r in retweets:
                    rt_iterator(r, retweets, q, followers_list, hoppers, level)
        except:
            print ""
There are some other function calls in there, but they're not related to the problem. The main issue is how q is counted when going from (e.g.) a 2-hop node to a 3-hop node. I need it to build out to the maximum depth (10) for every branch from the center, whereas right now I believe it's only building out to the maximum depth for the first branch it tries. I hope that makes sense. If not, typing it up here has helped me; I think I'm just missing a loop somewhere, but it's hard for me to see.
Also, ignore that the various dicts refer to q+1 or q-1; that's an artifact of how I implemented this before I refactored it to recurse.
Thanks!
I'm not totally sure what you mean by "the center" but I think you want something like this:
def rt_iterator(depth, other_args):
    # store whatever info you need from this point in the tree
    if depth >= MAX_DEPTH:
        return
    # look at the nodes you want to expand from here
    for each node, in the order you want them expanded:
        rt_iterator(depth + 1, other_args)
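As a concrete, purely illustrative version of that sketch, here is a self-contained depth-limited traversal over a toy adjacency dict; the graph, visit() and expand() names are my own stand-ins for whatever node and edge bookkeeping the real code does:

MAX_DEPTH = 10

# Toy follower graph: node -> list of neighbouring nodes (illustration only).
graph = {
    "author": ["a", "b"],
    "a": ["c", "d"],
    "b": ["e"],
    "c": [], "d": [], "e": [],
}

def visit(node, depth):
    print("%svisited %s at depth %d" % ("  " * depth, node, depth))

def expand(node, depth=0):
    visit(node, depth)                 # record this node
    if depth >= MAX_DEPTH:
        return                         # stop only this branch; the others continue
    for child in graph[node]:          # every branch is expanded...
        expand(child, depth + 1)       # ...each with its own depth counter

expand("author")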
I think I've fixed it... this way q isn't incremented when it shouldn't be.
def rt_iterator(r, retweets, q, depth, followers_list, hoppers, level):
    def node_iterator(r, retweets, q, depth, followers_list, hoppers, level):
        for r in retweets:
            if r.user.id in followers_list[str(q-1)]:
                hoppers[str(q)].append(r.user.id)
                node_adder(r.user.id, q+1)
                g.add_edge(level[str(q)], r.user.id)
                try:
                    level[str(q+1)] = r.user.id
                    if g.node[r.user.id]["followers_count"] < 4000:
                        followers_list[str(q)] = api.followers_ids(r.user.id)[0]
                        bum_node_adder(followers_list[str(q)], level[str(q+1)], q+2)
                    crossover = pull_crossover(followers_list[str(q)], followers_list[str(q-1)])
                    if q < 10:
                        node_iterator(r, retweets, q+1, depth, followers_list, hoppers, level)
                except:
                    print ""
    depth = depth + 1
    q = depth
    if q < 10:
        rt_iterator(r, retweets, q, depth, followers_list, hoppers, level)

Can functions be composed into a calculation network?

Inside a network, information (a packet) can be passed to different nodes (hosts); by modifying its content, each host can change its meaning. The final packet depends on the hosts' input along the route it takes through the network.
Now I want to implement a calculating network model that can do small jobs by being given different calculation paths.
Prototype:
def a(p): return p + 1
def b(p): return p + 2
def c(p): return p + 3
def d(p): return p + 4
def e(p): return p + 5

def link(p, r):
    p1 = p
    for x in r:
        p1 = x(p1)
    return p1

p = 100
route = [a, c, d]
result = link(p, route)
#========
target_result = 108
if result == target_result:
    # route is OK
    pass
I think finally I need something like this:
p with [init_payload, expected_target, passed_path, actual_calculated_result]
|
\/
[CHAOS of possible of functions networks]
|
\/
px [a,a,b,c,e] # ok this path is ok and match the target
Here are my questions; I hope you can help:
Can p carry (determine) the route(s) by inspecting the functions and the estimated result?
(1.1) For example, if on the route there's a node x():
def x(p): return p / 0  # I suppose this can pass compilation
Can p somehow know that this path is not good and avoid selecting it?
(1.2) Another confusion: if p is a self-defined class type whose payload is essentially a string, and it carries a path [a,c,d], can p know that a() requires an int and avoid selecting that node?
Similarly to 1.2, when generating the path, can I avoid an oops like this:
def a(p): return p + 1
def b(p): return p + 2
def x(p): return p.append(1)
def y(p): return p.append(2)

full_node_list = [a, b, x, y]
path = random(2, full_node_list)  # oops: x, y will be trouble for an int-type p, and a, b will be trouble for a list-type p
Please also consider the case where the path is a list of lambda functions.
PS: since the whole model is not very clear in my mind, any leading and direction will be appreciated.
THANKS!
You could test each function first with a set of sample data; any function which returns consistently unusable values might then be discarded.
def isGoodFn(f):
    testData = [1, 2, 3, 8, 38, 73, 159]   # random test input
    goodEnough = 0.8 * len(testData)       # need 80% pass rate
    try:
        good = 0
        for i in testData:
            if type(f(i)) is int:
                good += 1
        return good >= goodEnough
    except:
        return False
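For instance, reusing the question's a, b, x, y and full_node_list (my own usage sketch, not part of the original answer), the unusable functions could be filtered out before building any path:

# Keep only functions that pass the sample-data test above.
usable_nodes = [f for f in full_node_list if isGoodFn(f)]
print([f.__name__ for f in usable_nodes])   # ['a', 'b'] -> x and y fail on int input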
If you know nothing about what the functions do, you will have to essentially do a full breadth-first tree search with error-checking at each node to discard bad results. If you have more than a few functions this will get very large very quickly. If you can guarantee some of the functions' behavior, you might be able to greatly reduce the search space - but this would be domain-specific, requiring more exact knowledge of the problem.
If you had a heuristic measure for how far each result is from your desired result, you could do a directed search to find good answers much more quickly - but such a heuristic would depend on knowing the overall form of the functions (a distance heuristic for multiplicative functions would be very different than one for additive functions, etc).
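As a rough illustration of that brute-force search (a sketch under the assumption that the node functions and a target value are given, as in the question's prototype), one could enumerate paths breadth-first, discard any branch that raises an exception, and stop when the target is reached:

from collections import deque

def a(p): return p + 1
def b(p): return p + 2
def c(p): return p + 3
def d(p): return p + 4
def e(p): return p + 5

def find_route(start, target, functions, max_len=5):
    """Breadth-first search over function sequences; returns a working path or None."""
    queue = deque([(start, [])])          # (current value, path of functions so far)
    while queue:
        value, path = queue.popleft()
        if value == target:
            return path
        if len(path) >= max_len:
            continue
        for f in functions:
            try:
                queue.append((f(value), path + [f]))
            except Exception:             # bad node for this value: discard this branch
                pass
    return None

route = find_route(100, 108, [a, b, c, d, e])
print([f.__name__ for f in route])        # ['c', 'e'] -> increments 3 + 5 = 8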
Your functions can raise TypeError if they are not satisfied with the data types they receive. You can then catch this exception and see whether you are passing an appropriate type. You can also catch any other exception type. But trying to call the functions and catching the exceptions can be quite slow.
You could also organize your functions into different sets depending on the argument type.
functions = { list : [some functions taking a list], int : [some functions taking an int]}
...
x = choose_function(functions[type(p)])
p = x(p)
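A concrete version of that idea might look like this (my sketch; choose_function is not defined in the answer, so here it is simply a random pick among the type-appropriate functions, using the question's example nodes):

import random

def a(p): return p + 1
def b(p): return p + 2
def x(p): return p.append(1)
def y(p): return p.append(2)

# Functions grouped by the argument type they accept.
functions = {
    int:  [a, b],    # int-friendly nodes
    list: [x, y],    # list-friendly nodes
}

def choose_function(candidates):
    return random.choice(candidates)     # placeholder selection strategy

p = 100                                  # an int payload
for _ in range(3):
    f = choose_function(functions[type(p)])
    p = f(p)
print(p)                                 # somewhere in 103..106, depending on the random picks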
I'm somewhat confused as to what you're trying to do, but: p cannot "know about" the functions until it is run through them. By design, Python functions don't specify what type of data they operate on: e.g. a*5 is valid whether a is a string, a list, an integer or a float.
If there are some functions that might not be able to operate on p, then you could catch exceptions, for example in your link function:
def link(p, r):
    try:
        for x in r:
            p = x(p)
    except (ZeroDivisionError, AttributeError):  # list whatever errors you want to catch
        return None
    return p
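Usage would then look something like this (my sketch, reusing the question's additive nodes; the failing node bad is hypothetical, and a route that raises simply yields None):

def a(p): return p + 1
def c(p): return p + 3
def bad(p): return p / 0          # hypothetical failing node

print(link(100, [a, c]))          # 104
print(link(100, [a, bad, c]))     # None: the ZeroDivisionError was caught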
