Python numpy array split index out of range - python

I am trying to execute the following code:
def calculate_squared_dist_sliced_data(self, data, output, proc_numb):
    for k in range(1, self.calc_border):
        print("Calculating",k, "of", self.calc_border, "\n", (self.calc_border - k), "to go!")
        kmeans = KMeansClusterer.KMeansClusterer(k, data)
        print("inertia in round", k, ": ", kmeans.calc_custom_params(data, k).inertia_)
        output.put( proc_numb, (kmeans.calc_custom_params(self.data, k).inertia_))

def calculate_squared_dist_mp(self):
    length = np.shape(self.data)[0]
    df_array = []
    df_array[0] = self.data[int(length/4), :]
    df_array[1] = self.data[int((length/4)+1):int(length/2), :]
    df_array[2] = self.data[int((length/2)+1):int(3*length/4), :]
    df_array[3] = self.data[int((3*length/4)+1):int(length/4), :]
    output = mp.Queue()
    processes = [mp.Process(target=self.calculate_squared_dist_sliced_data, args=(df_array[x], output, x)) for x in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    results = [output.get() for p in processes]
When executing df_array[0] = self.data[int(length/4), :], I get the following error:
IndexError: list assignment index out of range
The variable length has the value 20195 (which is correct). I want to run the method calculate_squared_dist_sliced_data with multiprocessing, so I need to split the array data that is passed to this class.
Here is an example of how this numpy array looks:
[[ 0. 0. 0.02072968 ..., -0.07872599 -0.10147049 -0.44589 ]
[ 0. -0.11091352 0.11208243 ..., 0.08164318 -0.02754813
-0.44921876]
[ 0. -0.10642599 0.0028097 ..., 0.1185457 -0.22482443
-0.25121125]
...,
[ 0. 0. 0. ..., -0.03617197 0.00921685 0. ]
[ 0. 0. 0. ..., -0.08241634 -0.05494423
-0.10988845]
[ 0. 0. 0. ..., -0.03010139 -0.0925091
-0.02145017]]
Now I want to split this whole array into four equal pieces and give each one to a process. However, when selecting the rows I get the exception mentioned above. Can someone help me?
Here is a more theoretical illustration of what I want to do:
A B C D
1 2 3 4
5 6 7 8
9 5 4 3
1 8 4 3
As a result I want to have for example two arrays, each containing two rows:
A B C D
1 2 3 4
5 6 7 8
and
A B C D
9 5 4 3
1 8 4 3
Can someone help me?

The assignment on the left-hand side is not allowed because your list has length 0, so index 0 does not exist yet.
Either initialize the list first:
df_array = [None, None, None, None]
or use
df_array.append(self.data[int(length/4), :])
...
instead.
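If the goal is simply four row blocks of roughly equal size, NumPy can do the slicing for you. The following is a minimal sketch, assuming the data is a 2-D array split along its rows; the small `data` array is a made-up stand-in for `self.data`.

```python
import numpy as np

# Hypothetical stand-in for self.data: 10 rows, 3 columns.
data = np.arange(30, dtype=float).reshape(10, 3)

# Split along axis 0 into 4 nearly equal row blocks. Unlike np.split,
# np.array_split does not require the row count to be divisible by 4.
chunks = np.array_split(data, 4, axis=0)

for i, chunk in enumerate(chunks):
    print(i, chunk.shape)
```

Each element of `chunks` could then be passed to one process in place of `df_array[x]`.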

I just noticed that I tried to use a list like an array...

Related

How to get all possible array attributions of numpy arrays?

Python: get all possible array attributions of nd arrays. Use itertools.product?
If so, how?
In Python, I have two n-dimensional numpy arrays A and B (B is a zero array),
such that A.shape[i] <= B.shape[i] for every axis i.
I want to write a for loop in which each iteration assigns A to a different sub-block of B, so that every possible position has been occupied by the end of the loop.
For instance, with A = np.array([[1,1,1],[1,1,1]]) and B = np.zeros((3,4)), I would get these (one per iteration):
1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0
1 1 1 0 0 1 1 1 1 1 1 0 0 1 1 1
0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1
For a fixed n dimension it is trivial, just use nested for loops for each dimension.
However, I want it for a generic n dimensions.
My approach was to use the itertools.product to get all combinations of indexes.
In the above example, product([0,1],[0,1]), would iterate over (0,0),(0,1),(1,0),(1,1), and I would have my indexes.
However, I don't know how to pass the values of the parameters to product function for a generic n.
Any ideas? Are there better ways of doing so?
itertools product should work.
import numpy as np
from itertools import product
A = np.ones((2,3))
B = np.zeros((3,4))
r_rng = range(B.shape[0]-A.shape[0]+1)
c_rng = range(B.shape[1]-A.shape[1]+1)
for i,j in product(r_rng, c_rng):
    C = B.copy()
    C[i:i+A.shape[0],j:j+A.shape[1]]=A
    print(C,'\n')
Output:
[[1. 1. 1. 0.]
[1. 1. 1. 0.]
[0. 0. 0. 0.]]
[[0. 1. 1. 1.]
[0. 1. 1. 1.]
[0. 0. 0. 0.]]
[[0. 0. 0. 0.]
[1. 1. 1. 0.]
[1. 1. 1. 0.]]
[[0. 0. 0. 0.]
[0. 1. 1. 1.]
[0. 1. 1. 1.]]
Here is an example. You can use the * operator to unpack a variable number of argument from a list and give it to itertools.product():
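import itertools
import numpy as np
size1 = (3,5,6)
size2 = (2,2,2)
N = len(size1)
coords = []
for i in range(N):
    # candidate start offsets along axis i
    # (use delta + 1 here to also include the last valid offset)
    delta = size1[i]-size2[i]
    coords.append(list(range(delta)))
print(coords)
it = itertools.product(*coords)
arr = np.array(list(it))
print(arr)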
Output:
[[0 0 0]
[0 0 1]
[0 0 2]
[0 0 3]
[0 1 0]
[0 1 1]
[0 1 2]
[0 1 3]
[0 2 0]
[0 2 1]
[0 2 2]
[0 2 3]]
I'm going to post the solution I arrived at:
import numpy as np
from itertools import product
A=np.ones((2,3,2))
B=np.zeros((3,4,4))
coords=[]
for i in range(len(B.shape)):
    delta = B.shape[i]-A.shape[i]+1
    coords.append(list(range(delta)))
print(coords)
for start_idx in product(*coords):
    idx=tuple(slice(start_idx[i], start_idx[i]+A.shape[i]) for i in range(len(A.shape)))
    m=np.zeros(B.shape)
    m[idx] = A
    print(m)
ps: Indexing the nd arrays was very tricky
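The same idea can be wrapped in a generator so the caller decides what to do with each placement. This is only a sketch: `placements` is a hypothetical helper name, and it assumes A and B have the same number of dimensions with A no larger than B along any axis.

```python
import numpy as np
from itertools import product

def placements(A, B):
    """Yield a copy of B with A written at every valid offset (hypothetical helper)."""
    # One range of start offsets per axis; +1 so the last valid offset is included.
    ranges = [range(b - a + 1) for a, b in zip(A.shape, B.shape)]
    for start in product(*ranges):
        # A tuple of slices indexes an n-d block, one slice per axis.
        idx = tuple(slice(s, s + n) for s, n in zip(start, A.shape))
        out = B.copy()
        out[idx] = A
        yield out

A = np.ones((2, 3))
B = np.zeros((3, 4))
results = list(placements(A, B))
print(len(results))
```

Indexing with a tuple of slice objects is what makes this work for any number of dimensions, since `out[i:j, k:l]` is just shorthand for `out[(slice(i, j), slice(k, l))]`.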

Tensorflow Non-Maximum Suppression

NOTE: tf.image.non_max_suppression does NOT do what I'm looking for!
I'm trying to perform non-maximum suppression (NMS) similar to the Canny edge detector. Specifically, NMS on a 2D array will keep a value if it is the maximum within a window, otherwise suppress it (set to 0).
For example, consider the matrix
[[3 2 1 4 2 3]
[1 4 2 1 5 2]
[2 2 3 2 1 3]]
If we consider a window size of 3 x 3, then the result should be
[[0 0 0 0 0 0]
[0 4 0 0 5 0]
[0 0 0 0 0 0]]
I've searched around and couldn't find anything that performs this operation in tf.image and tf.nn. Is there code somewhere that performs NMS? If not, how can I efficiently implement NMS in Tensorflow (Python)?
Thanks!
EDIT: I came up with one way to solve this but I'm not sure if there are better ways: take a max pool with 1 stride (i.e. no downsampling) and the window size, then use tf.where to check if the value is equal to the max pooled value and set to 0 if not. Is there a better way?
Answering my own question (though open to better solutions):
import tensorflow as tf
import numpy as np
def non_max_suppression(input, window_size):
    # input: B x W x H x C
    pooled = tf.nn.max_pool(input, ksize=[1, window_size, window_size, 1], strides=[1,1,1,1], padding='SAME')
    output = tf.where(tf.equal(input, pooled), input, tf.zeros_like(input))
    # NOTE: if input has negative values, the suppressed values can be higher than original
    return output # output: B X W X H x C
sess = tf.InteractiveSession()
x = np.array([[3,2,1,4,2,3],[1,4,2,1,5,2],[2,2,3,2,1,3]], dtype=np.float32).reshape([1,3,6,1])
inp = tf.Variable(x)
out = non_max_suppression(inp, 3)
sess.run(tf.global_variables_initializer())
print(out.eval().reshape([3,6]))
'''
[[ 0. 0. 0. 0. 0. 0.]
[ 0. 4. 0. 0. 5. 0.]
[ 0. 0. 0. 0. 0. 0.]]
'''
sess.close()
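For checking results outside a TensorFlow session, the same max-pool-and-compare idea can be sketched in plain NumPy. This is an illustration, not part of the answer above: `nms_2d` is a hypothetical name, it assumes a single 2-D float array and an odd window size, and it pads with -inf so that border windows ignore out-of-range cells.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def nms_2d(x, window):
    """Keep values equal to the max of their window x window neighborhood."""
    pad = window // 2
    # Pad with -inf so padding never wins the max (x must be a float array).
    padded = np.pad(x, pad, mode="constant", constant_values=-np.inf)
    # View of every window, shape (rows, cols, window, window), then reduce.
    local_max = sliding_window_view(padded, (window, window)).max(axis=(2, 3))
    return np.where(x == local_max, x, 0)

x = np.array([[3, 2, 1, 4, 2, 3],
              [1, 4, 2, 1, 5, 2],
              [2, 2, 3, 2, 1, 3]], dtype=np.float32)
print(nms_2d(x, 3))
```

On the example matrix this keeps only the 4 and the 5 in the middle row, matching the expected result in the question.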

Create all x,y pairs from two coordinate arrays

I have 4 lists that I need to iterate over so that I get the following:
x y a b
Lists a and b are of equal length and I iterate over both using the zip function, the code:
for a,b in zip(aL,bL):
print(a,"\t",b)
List x contains 1000 items and list y contains 750 items, so after the loop is finished I am supposed to have 750,000 lines.
What I want to achieve is the following:
1 1 a b
1 2 a b
1 3 a b
1 4 a b
.....
1000 745 a b
1000 746 a b
1000 747 a b
1000 748 a b
1000 749 a b
1000 750 a b
How can I achieve this? I have tried enumerate and izip but both results are not what I am seeking.
Thanks.
EDIT:
I have followed your code and used it, since it is way faster. My output now looks like this:
[[[ 0.00000000e+00 0.00000000e+00 4.00000000e+01 2.30000000e+01]
[ 1.00000000e+00 0.00000000e+00 8.50000000e+01 1.40000000e+01]
[ 2.00000000e+00 0.00000000e+00 7.20000000e+01 2.00000000e+00]
...,
[ 1.44600000e+03 0.00000000e+00 9.20000000e+01 4.60000000e+01]
[ 1.44700000e+03 0.00000000e+00 5.00000000e+01 6.10000000e+01]
[ 1.44800000e+03 0.00000000e+00 8.40000000e+01 9.40000000e+01]]]
I have now 750 lists and each of those have another 1000. I have tried to flatten those to get 4 values (x,y,a,b) per line. This just takes forever. Is there another way to flatten those?
EDIT2
I have tried
np.fromiter(itertools.chain.from_iterable(arr), dtype='int')
but it gave an error: setting an array element with a sequence, so I tried
np.fromiter(itertools.chain.from_iterable(arr[0]), dtype='int')
but this just gave one list back with what I suspect is the whole first list in the array.
EDIT v2
Now using np.stack instead of np.dstack, and handling file output.
This is considerably simpler than the solutions proposed below.
import numpy as np
import numpy.random as nprnd
aL = nprnd.randint(0,100,size=10) # 10 random ints
bL = nprnd.randint(0,100,size=10) # 10 random ints
xL = np.linspace(0,100,num=5) # 5 evenly spaced values
yL = np.linspace(0,100,num=2) # 2 evenly spaced values
xv,yv = np.meshgrid(xL,yL)
arr = np.stack((np.ravel(xv), np.ravel(yv), aL, bL), axis=-1)
np.savetxt('out.out', arr, delimiter=' ')
Using np.meshgrid gives us the following two arrays:
xv = [[ 0. 25. 50. 75. 100.]
[ 0. 25. 50. 75. 100.]]
yv = [[ 0. 0. 0. 0. 0.]
[ 100. 100. 100. 100. 100.]]
which, when we ravel, become:
np.ravel(xv) = [ 0. 25. 50. 75. 100. 0. 25. 50. 75. 100.]
np.ravel(yv) = [ 0. 0. 0. 0. 0. 100. 100. 100. 100. 100.]
These arrays have the same shape as aL and bL,
aL = [74 79 92 63 47 49 18 81 74 32]
bL = [15 9 81 44 90 93 24 90 51 68]
so all that's left is to stack all four arrays along axis=-1:
arr = np.stack((np.ravel(xv), np.ravel(yv), aL, bL), axis=-1)
arr = [[ 0. 0. 62. 41.]
[ 25. 0. 4. 42.]
[ 50. 0. 94. 71.]
[ 75. 0. 24. 91.]
[ 100. 0. 10. 55.]
[ 0. 100. 41. 81.]
[ 25. 100. 67. 11.]
[ 50. 100. 21. 80.]
[ 75. 100. 63. 37.]
[ 100. 100. 27. 2.]]
From here, saving is trivial:
np.savetxt('out.out', arr, delimiter=' ')
ORIGINAL ANSWER
idx = 0
out = []
for x in xL:
    for y in yL:
        v1 = aL[idx]
        v2 = bL[idx]
        out.append((x, y, v1, v2))
        # print(x,y, v1, v2)
        idx += 1
This nested loop works, but it's slow, and it only gets slower with more coordinates. I'd consider using the numpy package instead. Here's an example with a 2 x 5 dataset.
aL = nprnd.randint(0,100,size=10) # 10 random ints
bL = nprnd.randint(0,100,size=10) # 10 random ints
xL = np.linspace(0,100,num=5) # 5 evenly spaced values
yL = np.linspace(0,100,num=2) # 2 evenly spaced values
lenx = len(xL) # 5
leny = len(yL) # 2
arr = np.ndarray(shape=(leny,lenx,4)) # create a 3-d array
This creates a 3-dimensional array with 2 rows and 5 columns. Along the third axis (length 4) we populate the array with the data you want.
for x in range(leny):
    arr[x,:,0] = xL
This syntax is a little confusing at first. In short, it iterates over the rows and sets a particular slice of the array to xL. In this case, the slice we have selected is the zeroth index in all columns of row x (the : means "select all indices on this axis"). For our small example, this would yield:
[[[ 0 0 0 0]
[ 25 0 0 0]
[ 50 0 0 0]
[ 75 0 0 0]
[100 0 0 0]]
[[ 0 0 0 0]
[ 25 0 0 0]
[ 50 0 0 0]
[ 75 0 0 0]
[100 0 0 0]]]
Now we do the same for each column:
for y in range(lenx):
    arr[:,y,1] = yL
-----
[[[ 0 0 0 0]
[ 25 0 0 0]
[ 50 0 0 0]
[ 75 0 0 0]
[100 0 0 0]]
[[ 0 100 0 0]
[ 25 100 0 0]
[ 50 100 0 0]
[ 75 100 0 0]
[100 100 0 0]]]
Now we need to address the arrays aL and bL. These arrays are flat, so we must first reshape them to conform to the shape of arr. In our simple example, this takes an array of length 10 and reshapes it into a 2 x 5 two-dimensional array.
a_reshaped = aL.reshape(leny,lenx)
b_reshaped = bL.reshape(leny,lenx)
To insert the reshaped arrays into arr, we select indices 2 and 3 on the last axis for all rows and all columns (note the two :'s this time):
arr[:,:,2] = a_reshaped
arr[:,:,3] = b_reshaped
----
[[[ 0 0 3 38]
[ 25 0 63 89]
[ 50 0 4 25]
[ 75 0 72 1]
[100 0 24 83]]
[[ 0 100 55 85]
[ 25 100 39 9]
[ 50 100 43 85]
[ 75 100 63 57]
[100 100 6 63]]]
this runs considerably faster than the nested loop solution. hope it helps!
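If you then need one (x, y, a, b) record per line, as asked in the question's edit, the 3-D array does not have to be flattened element by element. A minimal sketch, assuming the array has shape (leny, lenx, 4) as built above; the values here are placeholders rather than the real data:

```python
import numpy as np

leny, lenx = 2, 5
# Hypothetical stand-in for the (leny, lenx, 4) array built above.
arr = np.arange(leny * lenx * 4, dtype=float).reshape(leny, lenx, 4)

# Collapse the first two axes; each row is then one (x, y, a, b) record.
flat = arr.reshape(-1, 4)
print(flat.shape)
```

`reshape` only reinterprets the existing buffer where it can, so this should stay fast even for the 750 x 1000 case.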
Sounds like you need a nested loop for x and y:
for x in xL:
    for y in yL:
        for a, b in zip(aL, bL):
            print("%d\t%d\t%s\t%s" % (x, y, a, b))
Try this,
for i,j in zip(zip(a,b),zip(c,d)):
    print("%d\t%d\t%s\t%s" % (i[0], i[1], j[0], j[1]))
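A pure-Python variant that produces exactly one line per (x, y) pair is to zip the Cartesian product of x and y with the paired a and b values. This is a sketch under one assumption that the question does not state outright: that aL and bL each hold len(xL) * len(yL) entries, ordered with y varying fastest. The short lists are made-up sample data.

```python
from itertools import product

# Made-up sample data; aL and bL are assumed to have len(xL) * len(yL) items.
xL = [1, 2, 3]
yL = [1, 2]
aL = list("abcdef")
bL = list("uvwxyz")

# product() yields (x, y) with y varying fastest; zip pairs each with (a, b).
rows = [(x, y, a, b) for (x, y), (a, b) in zip(product(xL, yL), zip(aL, bL))]

for row in rows:
    print("%d\t%d\t%s\t%s" % row)
```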

Array building in Python

I have an array like this in a data file:
0 822.6 1391.3 1
0 822.6 1391.3 2
0 708.3 1501.2 3
0 708.3 1501.2 4
0 632.5 1585.8 5
0 632.5 1585.8 6
0 552.4 1652.6 7
0 552.4 1652.6 8
250 850.8 1358.6 1
250 803.3 1406.2 2
250 732.0 1481.9 3
250 694.9 1519 4
250 642.9 1566.5 5
250 613.2 1594.7 6
250 570.2 1637.8 7
250 537.5 1663 8
I want to create separate data sets depending on the last column.
In other words I want something like this:
while data[:,3] != 9:
    if data[:,3] == 1:
        x1 = data[:,0]
        y1= (data[:,1]-data[:,2])**2
    if data[:,3] == 2:
        x2 = data[:,0]
        y2= (data[:,1]-data[:,2])**2
And so on...
I only put does not equal 9 because I only have from 1-8 in the last column always.
I know this is completely wrong, but I need help.
Consider the following:
a = array([[ 0.64910219, 0.06868991, -0.34844128, 0. ],
[-1.34767042, -1.77338287, 0.693539 , 1. ],
[ 1.31245883, -2.08879047, -0.83514187, 3. ],
[ 0.43156959, 0.31388795, 0.2856625 , 1. ],
[-0.60531108, -0.63226693, 0.32063803, 2. ],
[-0.47538621, -0.64196643, -0.82296546, 3. ],
[ 0.3491207 , -1.25406403, 1.21754411, 0. ],
[-1.1573242 , 1.1636706 , 0.63733285, 2. ]])
d0 = a[a[:,3]==0,:]
d1 = a[a[:,3]==1,:]
d2 = a[a[:,3]==2,:]
d3 = a[a[:,3]==3,:]
The variables d0, d1, d2, d3 contain the rows with the appropriate values in the right-most column.
>>> d0
array([[ 0.64910219, 0.06868991, -0.34844128, 0. ],
[ 0.3491207 , -1.25406403, 1.21754411, 0. ]])
>>> d1
array([[-1.34767042, -1.77338287, 0.693539 , 1. ],
[ 0.43156959, 0.31388795, 0.2856625 , 1. ]])
>>> d2
array([[-0.60531108, -0.63226693, 0.32063803, 2. ],
[-1.1573242 , 1.1636706 , 0.63733285, 2. ]])
>>> d3
array([[ 1.31245883, -2.08879047, -0.83514187, 3. ],
[-0.47538621, -0.64196643, -0.82296546, 3. ]])
To create the x1, y1, etc.. that you mention in the post, you just need to manipulate those arrays.
x1 = d1[:,0]
y1 = (d1[:,1] - d1[:,2])**2
And so on for the other values of the fourth column. For a small number of possible values (1 - 8), hardcoding the different variables isn't too bad, but this method generalizes easily: loop over an arbitrary list of labels and collect the computed outputs in a list.
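One way that generalization might look is a dictionary keyed by the label, so no variable names need to be hardcoded. This is a sketch, assuming the label sits in the last column as in the question; `groups` is a hypothetical name and the sample rows are a subset of the question's data.

```python
import numpy as np

# A few rows in the question's layout: x, value1, value2, label.
data = np.array([[0, 822.6, 1391.3, 1],
                 [0, 708.3, 1501.2, 3],
                 [250, 850.8, 1358.6, 1],
                 [250, 732.0, 1481.9, 3]])

groups = {}
for label in np.unique(data[:, 3]):
    rows = data[data[:, 3] == label]          # boolean mask selects matching rows
    x = rows[:, 0]
    y = (rows[:, 1] - rows[:, 2]) ** 2
    groups[int(label)] = (x, y)

x1, y1 = groups[1]
print(x1)
```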

Adjacent cells of multiple cell patches in a numpy array

this is a followup question arising from this solution.
The solution to count adjacent cells works pretty well unless you have multiple patches in the array.
So this time the array for instance looks like this.
import numpy
from scipy import ndimage
s = ndimage.generate_binary_structure(2,2)
a = numpy.zeros((6,6), dtype=int) # example array
a[1:3, 1:3] = 1;a[2:4,4:5] = 1
print(a)
[0 0 0 0 0 0]
[0 1 1 0 0 0]
[0 1 1 0 1 0]
[0 0 0 0 1 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
# Number of nonoverlapping cells
c = ndimage.binary_dilation(a,s).astype(a.dtype)
b = c - a
numpy.sum(b) # returns 19
# However the correct number of non overlapping cells should be 22 (12+10)
Is there any smart solution to solve this dilemma without using any loops or iterating through the array? The reason is that the array could be quite big.
idea 1:
Just thought over it and a way to do it might be to check for more than one patch in the iterating structure. For the total count number to be correct those cells below have to be equal 2 (or more) in the dilation. Anyone got any idea how to turn this thought into code?
[1 1 1 1 0 0]
[1 0 0 2 1 1]
[1 0 0 2 0 1]
[1 1 1 2 0 1]
[0 0 0 1 1 1]
[0 0 0 0 0 0]
You can use label from ndimage to segment each patch of ones.
Then you just ask where the returned array equals 1, 2, 3, etc. and perform your algorithm on it (or you just use ndimage.distance_transform_cdt, but with the foreground and background inverted for each labeled segment).
Edit 1:
This code will take your array a and do what you ask:
b, c = ndimage.label(a)
e = numpy.zeros(a.shape)
for i in range(c):
    e += ndimage.distance_transform_cdt((b == i + 1) == 0) == 1
print(e)
I realize it is a bit ugly with all the equals there but it outputs:
In [41]: print e
[[ 1. 1. 1. 1. 0. 0.]
[ 1. 0. 0. 2. 1. 1.]
[ 1. 0. 0. 2. 0. 1.]
[ 1. 1. 1. 2. 0. 1.]
[ 0. 0. 0. 1. 1. 1.]
[ 0. 0. 0. 0. 0. 0.]]
Edit 2 (Alternative solution):
This code should do the same thing and hopefully faster (however, it will not find the cases where
two patches only touch at their corners).
b = ndimage.binary_closing(a) - a
b = ndimage.binary_dilation(b.astype(bool))
c = ndimage.distance_transform_cdt(a == 0) == 1
e = c.astype(int) * b + c
print(e)
