Is there any better way to do this? Like replacing that list comprehension with numpy functions? I'd assume that for a small number of elements, the difference is insignificant, but for larger chunks of data it takes too much time.
>>> rows = 3
>>> cols = 3
>>> target = [0, 4, 7, 8] # each value is a flat (1-d) index into the 2-d array
>>> x = [1 if i in target else 0 for i in range(rows * cols)]
>>> arr = np.reshape(x, (rows, cols))
>>> arr
[[1 0 0]
[0 1 0]
[0 1 1]]
Another way:
shape = (rows, cols)
arr = np.zeros(shape)
arr[np.unravel_index(target, shape)] = 1
Since x comes from a range, you can index an array of zeros to set the ones:
x = np.zeros(rows * cols, dtype=bool)
x[target] = True
x = x.reshape(rows, cols)
Alternatively, you can create the proper shape up front and assign to the raveled array:
x = np.zeros((rows, cols), dtype=bool)
x.ravel()[target] = True
If you want actual zeros and ones, use a dtype like np.uint8 or whatever else suits your needs other than bool.
The approach shown here would apply even to your list example to make it more efficient. Even if you turned target into a set, you are performing O(N) lookups, with N = rows * cols. Instead, you only need M assignments with no lookups, with M = len(target):
x = [0] * (rows * cols)
for i in target:
    x[i] = 1
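As a quick sanity check, the sketch below builds the same 3x3 array three ways (the original comprehension, assignment through `ravel()`, and `np.unravel_index`) and confirms they agree. The names `by_lookup`, `by_index` and `by_unravel` are only illustrative, and `np.uint8` is just one possible dtype.

```python
import numpy as np

rows, cols = 3, 3
target = [0, 4, 7, 8]  # flat (row-major) indices of the cells to set

# Original approach: one membership test per cell.
by_lookup = np.reshape(
    [1 if i in target else 0 for i in range(rows * cols)], (rows, cols)
)

# Direct assignment through a flat view of a preallocated array.
by_index = np.zeros((rows, cols), dtype=np.uint8)
by_index.ravel()[target] = 1

# Same result via explicit (row, col) coordinates.
by_unravel = np.zeros((rows, cols), dtype=np.uint8)
by_unravel[np.unravel_index(target, (rows, cols))] = 1

print(by_index)
```

Note that `ravel()` only returns a view for contiguous arrays such as a fresh `np.zeros` result; on a non-contiguous array the assignment would go to a copy.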
I have a list of NumPy arrays (vectors); each vector has shape (1, 300).
My goal is to find the duplicate vectors in the list, sum them, and divide the sum by the size of the list. The resulting vector then replaces each of the duplicates.
For example, if a = [[2,3,1],[5,65,-1],[2,3,1]], then the first and last elements are duplicates.
Their sum is [4,6,2],
which is divided by the number of vectors in the list, size = 3.
Output: a = [[4/3,6/3,2/3],[5,65,-1],[4/3,6/3,2/3]]
I have tried to use a Counter but it doesn't work for ndarrays.
What is the Numpy way?
Thanks.
If you have numpy 1.13 or higher, this is pretty simple:
def f(a):
    u, inv, c = np.unique(a, return_counts=True, return_inverse=True, axis=0)
    p = np.where(c > 1, c / a.shape[0], 1)[:, None]
    return (u * p)[inv]
If you don't have 1.13, you'll need some trick to convert a into a 1-d array first. I recommend @Jaime's excellent answer using np.void here
How it works:
u is the unique rows of a (sorted, so usually not in their original order)
c is the number of times each row of u is repeated in a
inv is the indices that map u back to a, i.e. u[inv] == a
p is the multiplier for each row of u based on your requirements: 1 if c == 1, and c / n (where n is the number of rows in a) if c > 1. [:, None] turns it into a column vector so that it broadcasts well with u
The function returns u * p, indexed back to the original row positions by [inv]
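To make the steps concrete, here is a small worked run on the three-row example from the question. The extra `np.ravel` on `inv` is a defensive assumption of mine, in case the inverse does not come back as a flat array on a given NumPy release.

```python
import numpy as np

a = np.array([[2, 3, 1], [5, 65, -1], [2, 3, 1]])

u, inv, c = np.unique(a, return_inverse=True, return_counts=True, axis=0)
inv = np.ravel(inv)  # defensive: make sure the inverse is 1-d

# Multiplier per unique row: count / n for duplicated rows, 1 otherwise.
p = np.where(c > 1, c / a.shape[0], 1)[:, None]

# Scale the unique rows, then map them back to the original positions.
result = (u * p)[inv]
print(result)
```

Here `u` holds the two unique rows, `c` is `[2, 1]`, and the duplicated row `[2, 3, 1]` is scaled by 2/3, which is the same as summing its two copies and dividing by 3.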
You can use np.unique with return_counts:
elements, count = np.unique(a, axis=0, return_counts=True)
return_counts makes np.unique also return the number of occurrences of each unique row in the array.
The output looks like this:
(array([[ 2, 3, 1],
[ 5, 65, -1]]), array([2, 1]))
Then you can multiply them like this:
(count * elements.T).T
Output:
array([[ 4, 6, 2],
[ 5, 65, -1]])
I have this list:
a = [ np.array([ 1, 2]), np.array([0])]
I want to iterate:
x = np.array([t[i] for i, t in enumerate(a)])
but since np.array([0]) has only one element, t[1] raises an IndexError.
So I thought of padding np.array([0]) with one more zero, and then:
a = [ np.array([ 1, 2]), np.array([0,0])]
x = np.array([t[i] for i, t in enumerate(a)])
print(x)
[1 0]
So, I am finding the largest length in the list:
temp = []
for i in a:
    temp.append(len(i))
themax = max(temp)
which is 2 (the length of np.array([1, 2])).
Now, I must somehow fill the other subelements with zeros.
Note that I will always have the zero array np.array([0]), which causes the problem.
The easiest way would be to change your list comprehension to give a zero instead of an array element in the case of an array being too small:
x = np.asarray([(t[i] if i < t.shape[0] else 0.) for i, t in enumerate(a)])
This is more efficient as you don't have to expand all arrays with zeros.
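If you do need the zero-padded arrays themselves, as the question originally set out to build, one possible sketch is to copy each array into a preallocated block of zeros. The names `width` and `padded` are illustrative, and the final diagonal pick assumes the list is no longer than its longest array.

```python
import numpy as np

a = [np.array([1, 2]), np.array([0])]

# Length of the longest array in the list.
width = max(len(t) for t in a)

# Preallocate zeros and copy each array into the start of its row.
padded = np.zeros((len(a), width), dtype=a[0].dtype)
for row, t in zip(padded, a):
    row[:len(t)] = t

# Element i of row i, matching t[i] in the original comprehension.
idx = np.arange(len(a))
x = padded[idx, idx]
print(x)
```

This costs an extra `len(a) * width` block of memory, so the conditional comprehension above remains the cheaper option when only `x` is needed.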
Let's say I have two arrays: a = array([1,2,3,0,4,5,0]) and b = array([1,2,3,4,0,5,6]). I am interested in removing the instances where a and b are 0, but I also want to remove the corresponding instances from both arrays. What I want to end up with is therefore a = array([1,2,3,5]) and b = array([1,2,3,5]). This is because a[3] == 0 and a[6] == 0, so b[3] and b[6] are also deleted. Likewise, since b[4] == 0, a[4] is also deleted. It's simple to do this for, say, two arrays:
import numpy as np
a = np.array([1,2,3,0,4,5,0])
b = np.array([1,2,3,4,0,5,6])
ix = np.where(b == 0)
b = np.delete(b, ix)
a = np.delete(a, ix)
ix = np.where(a == 0)
b = np.delete(b, ix)
a = np.delete(a, ix)
However, this solution doesn't scale up if I have many arrays (which I do). What would be a more elegant way to do this?
If I try the following:
import numpy as np
a = np.array([1,2,3,0,4,5,0])
b = np.array([1,2,3,4,0,5,6])
arrays = [a,b]
for array in arrays:
    ix = np.where(array == 0)
    b = np.delete(b, ix)
    a = np.delete(a, ix)
I get a = array([1, 2, 3, 4]) and b = array([1, 2, 3, 0]), not the answers I need. Any idea where this is wrong?
Assuming both/all arrays always have the same length, you can use masks:
ma = a != 0 # mask elements which are not equal to zero in a
mb = b != 0 # mask elements which are not equal to zero in b
m = ma * mb # assign the intersection of ma and mb to m
print(a[m], b[m]) # [1 2 3 5] [1 2 3 5]
You can of course also do it in one line
m = (a != 0) * (b != 0)
Or use the inverse
ma = a == 0
mb = b == 0
m = ~(ma + mb) # not the union of ma and mb
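For more than two arrays, the same mask idea can be written once over a list. This is only a sketch: it assumes all arrays have the same length, and `keep` and `filtered` are names introduced here for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 0, 4, 5, 0])
b = np.array([1, 2, 3, 4, 0, 5, 6])
arrays = [a, b]  # could be any number of equal-length arrays

# A position is kept only if it is nonzero in every array.
keep = np.logical_and.reduce([arr != 0 for arr in arrays])

# Apply the one shared mask to each array.
filtered = [arr[keep] for arr in arrays]
print(filtered)
```

Because the mask is computed from the untouched inputs before anything is removed, the indices never go stale.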
This happens because np.delete returns a new array, which the loop binds to the names a and b. The list arrays, however, still holds references to the original, unmodified arrays; rebinding a and b does not change what the list contains. So the first iteration finds the correct indices of 0 in a, but the second iteration searches the original b and returns ix as 4, which no longer lines up with the already shortened a and b. If you print the arrays variable in each iteration, you will see that it never changes.
You need to reassign arrays once you are done processing one array so that the next iteration sees the updated data. Here's how you'd do it:
a = np.array([1, 2, 3, 0, 4, 5, 0])
b = np.array([1, 2, 3, 4, 0, 5, 6])
arrays = [a,b]
for i in range(len(arrays)):
    ix = np.where(arrays[i] == 0)
    b = np.delete(b, ix)
    a = np.delete(a, ix)
    arrays = [a, b]
Of course you can automate what happens inside the loop. I just wanted to give an explanation of what was happening.
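The effect of rebinding can be seen in a small experiment. This is a sketch using one of the arrays from the question; `same_before` and `same_after` are names made up for the demonstration.

```python
import numpy as np

a = np.array([1, 2, 3, 0, 4, 5, 0])
arrays = [a]

# Before deleting, the list entry and the name `a` are the same object.
same_before = arrays[0] is a

# np.delete builds a new array; this only rebinds the name `a`.
a = np.delete(a, [3, 6])

# The list entry still points at the original, unshortened array.
same_after = arrays[0] is a
print(same_before, same_after, arrays[0], a)
```

The list entry keeps its seven elements while `a` now has five, which is why indices computed from the list stop matching the shortened arrays.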
A slow method involves operating over the whole list twice, first to build an intermediate list of indices to delete, and then second to delete all of the values at those indices:
import numpy as np
a = np.array([1,2,3,0,4,5,0])
b = np.array([1,2,3,4,0,5,6])
arrays = [a, b]
vals = []
for array in arrays:
    ix = np.where(array == 0)
    vals.extend([y for x in ix for y in x.tolist()])
vals = list(set(vals))
new_array = []
for array in arrays:
    new_array.append(np.delete(array, vals))
Building up on top of Christoph Terasa's answer, you can use array operations instead of for loops:
arrays = np.vstack([a,b]) # ...long list of arrays of equal length
zeroind = (arrays==0).max(0)
pos_arrays = arrays[:,~zeroind] # a 2d array containing only those columns where none of the rows has a zero
If I have a matrix:
A = [[0,1,2], [0,2,3], [1,5,6]]
I want to find the rows whose first two elements are 0 and 1, which should give
0
and the rows whose first element is 0, which should give
[0,1]
How should I do this? What is the fastest way?
I'd do it like this:
>>> A = [[0,1,2], [0,2,3], [1,5,6]]
>>> [i for i, row in enumerate(A) if row[:2] == [0, 1]]
[0]
>>> [i for i, row in enumerate(A) if row[0] == 0]
[0, 1]
If you want to create both results at the same time, use a regular loop as the above would iterate your matrix twice:
res_0, res_01 = list(), list()
for i, row in enumerate(A):
    if row[0] == 0:
        res_0.append(i)
        # only rows that already start with 0 can start with [0, 1]
        if row[1] == 1:
            res_01.append(i)
Since your question is tagged as numpy, I'm going to assume that A is a numpy array rather than a set of nested lists as you've shown it.
In that case you can use a combination of slice indexing, vectorized logical comparisons, np.all, and np.where:
import numpy as np
A = np.array([[0,1,2], [0,2,3], [1,5,6]])
print(np.where(np.all(A[:, :2] == np.array([0, 1]), axis=1))[0])
# [0]
To break that down a bit:
# index all of the rows and the first two columns of A
print(A[:, :2])
# [[0 1]
# [0 2]
# [1 5]]
# for each row, is the first column equal to 0, and is the second equal to 1?
print(A[:, :2] == np.array([0, 1]))
# [[ True True]
# [ True False]
# [False False]]
# do the elements in *both* columns match [0, 1]?
print(np.all(A[:, :2] == np.array([0, 1]), axis=1))
# [ True False False]
# get the indices of the rows for which the above statement is true
print(np.where(np.all(A[:, :2] == np.array([0, 1]), axis=1))[0])
# [0]
The answer by schore seems good, but you asked for a fast way, and for loops get quite slow for large arrays. Since your question is tagged numpy, I assumed you are using it and had a try at fancy indexing.
import numpy as np
A = np.array([[0,1,2], [0,2,3], [1,5,6]])
indices = np.arange(len(A))
check01 = A[:, :2] == [0, 1]
result01 = check01[:, 0] & check01[:, 1]
print(indices[result01])
result0 = (A == 0)[:, 0]
print(indices[result0])
This gives me
[0]
[0 1]
I am by no means an expert in writing fast code and there may be even nicer ways. Try to print out the things I define on the way to get a better understanding.
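Both lookups can also be folded into one small helper. This is a sketch rather than a tuned implementation; `rows_starting_with` is a hypothetical name, not a NumPy function.

```python
import numpy as np

def rows_starting_with(A, prefix):
    """Return the indices of rows of A whose leading elements equal prefix."""
    A = np.asarray(A)
    prefix = np.asarray(prefix)
    # Compare only the leading columns, then require every one to match.
    matches = (A[:, :prefix.size] == prefix).all(axis=1)
    return np.flatnonzero(matches)

A = [[0, 1, 2], [0, 2, 3], [1, 5, 6]]
first_two = rows_starting_with(A, [0, 1])
first_one = rows_starting_with(A, [0])
print(first_two, first_one)
```

`np.flatnonzero` plays the same role as `np.where(...)[0]` in the answer above.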
I have two numpy arrays, both M by N. X contains random values. Y contains true/false. Array A contains indices of the rows in X whose values need to be replaced with -1. I want to replace values only where Y is true.
Here is some code to do that:
M=30
N=40
X = np.zeros((M,N)) # random values, but 0s work too
Y = np.where(np.random.rand(M,N) > .5, True, False)
A=np.array([ 7, 8, 10, 13]), # in my setting, it's (1,4), not (4,)
for i in A[0]:
    X[i][Y[i]] = -1  # index Y by the row number i; Y[A][i] would be out of bounds
However, what I actually want is only replace some of the entries. List B contains how many need to be replaced for each index in A. It's already ordered so A[0][0] corresponds to B[0], etc. Also, it's true that if A[i] = k, then the corresponding row in Y has at least k trues.
B = [1,2,1,1]
Then for each index i (in loop),
X[i][Y[A][i]==True][0:B[i]] = -1
This doesn't work. Any ideas on a fix?
Unfortunately, I don't have an elegant answer; however, this works:
M=30
N=40
X = np.zeros((M,N)) # random values, but 0s work too
Y = np.where(np.random.rand(M,N) > .5, True, False)
A=np.array([ 7, 8, 10, 13]), # in my setting, it's (1,4), not (4,)
B = [1,2,1,1]
# position in row where X should equal - 1, i.e. X[7,a0], X[8,a1], etc
a0=np.where(Y[7]==True)[0][0]
a1=np.where(Y[8]==True)[0][0]
a2=np.where(Y[8]==True)[0][1]
a3=np.where(Y[10]==True)[0][0]
a4=np.where(Y[13]==True)[0][0]
# For each row (i) indexed by A, take only the first B[i] entries where Y[i]==True and set those entries of X to -1
for i in range(len(A[0])):
    X[A[0][i]][(Y[A][i]==True).nonzero()[0][0:B[i]]]=-1
np.sum(X) # should be -5
X[7,a0]+X[8,a1]+X[8,a2]+X[10,a3]+X[13,a4] # should be -5
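The chained form from the question, `X[i][mask][0:k] = -1`, has no effect because boolean indexing produces a copy, so the assignment lands in a temporary array. A more compact version of the loop might look like the sketch below. It assumes `A` is a flat `(4,)` array rather than the `(1, 4)` tuple used above, and it uses a seeded generator only so the run is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility (an assumption)
M, N = 30, 40
X = np.zeros((M, N))
Y = rng.random((M, N)) > 0.5
A = np.array([7, 8, 10, 13])  # flat row indices, for simplicity
B = [1, 2, 1, 1]              # how many True positions to replace per row

for row, k in zip(A, B):
    # Column positions of the first k True entries in this row of Y.
    cols = np.flatnonzero(Y[row])[:k]
    # A single indexing step on X, so the assignment writes into X itself.
    X[row, cols] = -1

print(X.sum())
```

If a row of `Y` has fewer than `k` true entries, the slice simply returns what is available, so fewer cells are replaced for that row.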
It is not clear what you want to do; here is my understanding:
import numpy as np
m,n = 30,40
x = np.zeros((m,n))
y = np.random.rand(m,n) > 0.5 #no need for where here
a = np.array([7,8,10,13])
x[a] = np.where(y[a],-1,x[a]) #need where here