Replacing values in numpy array - python

I have two numpy arrays, both M by N. X contains random values and Y contains booleans. Array A holds indices of rows in X whose entries should be replaced with -1, but only where Y is True.
Here is some code to do that:
M = 30
N = 40
X = np.zeros((M, N))  # random values, but 0s work too
Y = np.where(np.random.rand(M, N) > .5, True, False)
A = np.array([7, 8, 10, 13]),  # in my setting, it's (1,4), not (4,)
for i in A[0]:
    X[i][Y[i] == True] = -1
However, what I actually want is to replace only some of the entries. List B contains how many need to be replaced for each index in A. It's already ordered so that A[0][0] corresponds to B[0], etc. It's also guaranteed that if B[i] = k, then the corresponding row in Y has at least k Trues.
B = [1,2,1,1]
Then, for each index i (in a loop):
X[i][Y[A][i]==True][0:B[i]] = -1
This doesn't work. Any ideas on a fix?

Unfortunately, I don't have an elegant answer; however, this works:
M = 30
N = 40
X = np.zeros((M, N))  # random values, but 0s work too
Y = np.where(np.random.rand(M, N) > .5, True, False)
A = np.array([7, 8, 10, 13]),  # in my setting, it's (1,4), not (4,)
B = [1, 2, 1, 1]
# positions in each row where X should equal -1, i.e. X[7,a0], X[8,a1], etc.
a0 = np.where(Y[7] == True)[0][0]
a1 = np.where(Y[8] == True)[0][0]
a2 = np.where(Y[8] == True)[0][1]
a3 = np.where(Y[10] == True)[0][0]
a4 = np.where(Y[13] == True)[0][0]
# For each row indexed by A, take only the first B[i] entries where Y is True
# and set those positions in X to -1
for i in range(len(A[0])):
    X[A[0][i]][(Y[A][i] == True).nonzero()[0][0:B[i]]] = -1
np.sum(X)  # should be -5
X[7, a0] + X[8, a1] + X[8, a2] + X[10, a3] + X[13, a4]  # should be -5
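For completeness, here is a vectorized sketch of the same idea (my addition, not from the original answer): a running count of Trues along each selected row picks out the first B[i] True positions in one shot. It assumes A is the one-element tuple from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 30, 40
X = np.zeros((M, N))
Y = rng.random((M, N)) > 0.5
A = np.array([7, 8, 10, 13]),   # one-element tuple, as in the question
B = np.array([1, 2, 1, 1])

rows = A[0]
Yr = Y[rows]
# True only at the first B[i] True positions of each selected row
first = Yr & (np.cumsum(Yr, axis=1) <= B[:, None])
Xr = X[rows]        # fancy indexing returns a copy...
Xr[first] = -1
X[rows] = Xr        # ...so assign the modified rows back
print(X.sum())      # -sum(B), i.e. -5, provided each row has at least B[i] Trues
```

If a row happens to have fewer than B[i] Trues, this just replaces as many as exist rather than raising an error.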

It is not clear what you want to do; here is my understanding:
import numpy as np
m,n = 30,40
x = np.zeros((m,n))
y = np.random.rand(m,n) > 0.5 #no need for where here
a = np.array([7,8,10,13])
x[a] = np.where(y[a],-1,x[a]) #need where here
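A quick sanity check of this version (my addition, with a seed only so the run is repeatable): after the assignment, the rows indexed by a should be -1 exactly where y is True, and every other row should be untouched.

```python
import numpy as np

np.random.seed(0)  # seed only so the run is repeatable
m, n = 30, 40
x = np.zeros((m, n))
y = np.random.rand(m, n) > 0.5
a = np.array([7, 8, 10, 13])
x[a] = np.where(y[a], -1, x[a])

# the rows indexed by a are -1 exactly where y is True; all other rows untouched
print(np.array_equal(x[a] == -1, y[a]))      # True
print(np.all(np.delete(x, a, axis=0) == 0))  # True
```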


Reshaping array with specified indices

Is there any better way to do this? Like replacing that list comprehension with numpy functions? I'd assume that for a small number of elements, the difference is insignificant, but for larger chunks of data it takes too much time.
>>> rows = 3
>>> cols = 3
>>> target = [0, 4, 7, 8] # each value represents a target index of the 2-d array flattened to 1-d
>>> x = [1 if i in target else 0 for i in range(rows * cols)]
>>> arr = np.reshape(x, (rows, cols))
>>> arr
array([[1, 0, 0],
       [0, 1, 0],
       [0, 1, 1]])
Another way:
shape = (rows, cols)
arr = np.zeros(shape)
arr[np.unravel_index(target, shape)] = 1
Since x comes from a range, you can index an array of zeros to set the ones:
x = np.zeros(rows * cols, dtype=bool)
x[target] = True
x = x.reshape(rows, cols)
Alternatively, you can create the proper shape up front and assign to the raveled array:
x = np.zeros((rows, cols), dtype=bool)
x.ravel()[target] = True
If you want actual zeros and ones rather than booleans, use a numeric dtype such as np.uint8 instead of bool.
The approach shown here would apply even to your list example to make it more efficient. Even if you turned target into a set, you are performing O(N) lookups, with N = rows * cols. Instead, you only need M assignments with no lookups, with M = len(target):
x = [0] * (rows * cols)
for i in target:
    x[i] = 1

Counting duplicate ND arrays in a list in order to calculate an average for each duplicate array value - python

I have a list of ND arrays (vectors); each vector has shape (1, 300).
My goal is to find the duplicate vectors inside the list, sum them, and divide them by the size of the list; the resulting vector will replace the duplicate vectors.
For example, a is a list of ND arrays, a = [[2,3,1],[5,65,-1],[2,3,1]], then the first and the last element are duplicates.
their sum would be :[4,6,2],
which will be divided by the size of a list of vectors, size = 3.
Output: a = [[4/3,6/3,2/3],[5,65,-1],[4/3,6/3,2/3]]
I have tried to use a Counter, but it doesn't work for ndarrays (they aren't hashable).
What is the Numpy way?
Thanks.
If you have numpy 1.13 or higher, this is pretty simple:
def f(a):
    u, inv, c = np.unique(a, return_counts=True, return_inverse=True, axis=0)
    p = np.where(c > 1, c / a.shape[0], 1)[:, None]
    return (u * p)[inv]
If you don't have 1.13, you'll need some trick to convert a into a 1-d array first. I recommend Jaime's excellent answer using np.void here.
How it works:
u is the unique rows of a (usually not in their original order)
c is the number of times each row of u is repeated in a
inv is the indices to get from u back to a, i.e. u[inv] = a
p is the multiplier for each row of u based on your requirements: 1 if c == 1, and c / n (where n is the number of rows in a) if c > 1. [:, None] turns it into a column vector so that it broadcasts well with u
u * p is then indexed back to the original locations by [inv]
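Applied to the example from the question (my addition; note that a must be a 2-d array rather than a list of arrays for axis=0 to work):

```python
import numpy as np

def f(a):
    u, inv, c = np.unique(a, return_counts=True, return_inverse=True, axis=0)
    p = np.where(c > 1, c / a.shape[0], 1)[:, None]
    return (u * p)[inv]

a = np.array([[2, 3, 1], [5, 65, -1], [2, 3, 1]])
out = f(a)
# rows 0 and 2 become [4/3, 2, 2/3]; row 1 stays [5, 65, -1]
print(out)
```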
You can use numpy.unique with return_counts=True:
elements, count = np.unique(a, axis=0, return_counts=True)
return_counts makes it also return the number of occurrences of each unique row in the array.
The output looks like this:
(array([[ 2, 3, 1],
[ 5, 65, -1]]), array([2, 1]))
Then you can multiply them like this:
(count * elements.T).T
Output :
array([[ 4, 6, 2],
[ 5, 65, -1]])
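To finish the job as asked, i.e. divide the duplicated rows by the list size and scatter them back to their original positions, one could also request the inverse index (my addition, same idea as the first answer):

```python
import numpy as np

a = np.array([[2, 3, 1], [5, 65, -1], [2, 3, 1]])
elements, inverse, count = np.unique(a, axis=0, return_inverse=True, return_counts=True)
# scale only the duplicated rows by count / len(a), then scatter back via the inverse index
scaled = np.where((count > 1)[:, None], elements * count[:, None] / len(a), elements)
result = scaled[inverse]
# rows 0 and 2 become [4/3, 2, 2/3]; row 1 stays [5, 65, -1]
```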

Create column based on row value of another column.

I'm sure this has been asked before, but I couldn't find exactly what I was looking for.
I have a np.array and I would like to create an additional column (C2) which has values dependent on another column (C1).
In pseudocode, I would like to make a column where (j = 2:n):
R1C2 = R1C1
IF |Rj-1C2 - RjC1| < 20 THEN RjC2 = Rj-1C2
ELSE RjC2 = RjC1
I'm quite new to python, but I'm sure this is pretty straightforward. I basically just need to know how I can implement this formula in python for an np.array.
Thank you
This is pretty specific. Not sure there is a simple formula for this, because you are recursively generating the column rather than using existing data. You could do the following, where a is the index of your old column and b is the index of the column you want to fill in:
arr[0, b] = arr[0, a]
for j in range(1, n):
    arr[j, b] = arr[j - 1, b] if abs(arr[j - 1, b] - arr[j, a]) < 20 else arr[j, a]
I'm going to use zero index (i.e. row 0 being the first row, row 1 being the second row, col 0 being first column, col 1 being second column, etc.) for ease of explaining and code implementation.
The logic
Say we have a numpy array like this (call it array a) - as per your specification, both columns in the first row are the same.
a = np.array(
    [
        [10, 10],
        [15, None],
        [50, None],
    ]
)
You want to set n as 3 (number of rows).
The looping variable j takes the range of index 1 (inclusive) to n (exclusive). For our dummy example, j would be 1, 2 (i.e. 2 loops).
Note that Numpy indexing looks like this:
a[0][1] means first row (row 0), second column (col 1).
a[1][1] means second row (row 1), second column (col 1).
The condition being:
if abs(a[j-1][1] - a[j][0]) < 20 ... then a[j][1] = a[j-1][1]
otherwise, a[j][1] = a[j][0]
i.e. Expected output:
[
[10, 10],
[15, 10],
[50, 50]
]
The code
This is a straightforward Numpy implementation:
import numpy as np
# Create a sample numpy array as per the specification
a = np.array(
    [
        [10, 10],
        [15, None],
        [50, None],
    ]
)
# get the number of rows as the loop's upper bound
# for our dummy example, n = 3
n = a.shape[0]
# do the loop
for j in range(1, n):
    if abs(a[j-1][1] - a[j][0]) < 20:
        a[j][1] = a[j-1][1]
    else:
        a[j][1] = a[j][0]
# the array `a` is now updated to...
# array([[10, 10],
#        [15, 10],
#        [50, 50]], dtype=object)
Also, I would suggest you to rename your question from the original:
Create column based on row value of another column.
to the new:
Update column based on row value of another column.
... since you always only have two columns (but can be many rows)

fill array in list with zeros

I have this list:
a = [ np.array([ 1, 2]), np.array([0])]
I want to iterate:
x = np.array([t[i] for i, t in enumerate(a)])
but since np.array([0]) has only one element, it will throw an error.
So I thought to pad np.array([0]) with one more zero, and then:
a = [ np.array([ 1, 2]), np.array([0,0])]
x = np.array([t[i] for i, t in enumerate(a)])
print(x)
[1 0]
So, I am finding the biggest length in the list:
temp = []
for i in a:
    temp.append(len(i))
themax = max(temp)
which is 2 (from np.array([1, 2])).
Now I must somehow pad the other elements with zeros...
Note, that I will always have the zero np.array([0]) which causes the problem.
The easiest way would be to change your list comprehension to give a zero instead of an array element in the case of an array being too small:
x = np.asarray([(t[i] if i < t.shape[0] else 0.) for i, t in enumerate(a)])
This is more efficient as you don't have to expand all arrays with zeros.
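If you do want to materialize the padded list, as originally planned, here is a sketch using np.pad (my addition), padding every array on the right with zeros up to the longest length:

```python
import numpy as np

a = [np.array([1, 2]), np.array([0])]

themax = max(len(t) for t in a)
padded = [np.pad(t, (0, themax - len(t))) for t in a]  # zero-pad on the right
x = np.array([t[i] for i, t in enumerate(padded)])
print(x)  # [1 0]
```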

Deleting values from multiple arrays that have a particular value

Let's say I have two arrays: a = array([1,2,3,0,4,5,0]) and b = array([1,2,3,4,0,5,6]). I am interested in removing the instances where a and b are 0, and I also want to remove the corresponding instances from both arrays. Therefore what I want to end up with is a = array([1,2,3,5]) and b = array([1,2,3,5]). This is because a[3] == 0 and a[6] == 0, so b[3] and b[6] are also deleted; likewise, since b[4] == 0, a[4] is also deleted. It's simple to do this for, say, two arrays:
import numpy as np
a = np.array([1,2,3,0,4,5,0])
b = np.array([1,2,3,4,0,5,6])
ix = np.where(b == 0)
b = np.delete(b, ix)
a = np.delete(a, ix)
ix = np.where(a == 0)
b = np.delete(b, ix)
a = np.delete(a, ix)
However, this solution doesn't scale up if I have many arrays (which I do). What would be a more elegant way to do this?
If I try the following:
import numpy as np
a = np.array([1,2,3,0,4,5,0])
b = np.array([1,2,3,4,0,5,6])
arrays = [a,b]
for array in arrays:
    ix = np.where(array == 0)
    b = np.delete(b, ix)
    a = np.delete(a, ix)
I get a = array([1, 2, 3, 4]) and b = array([1, 2, 3, 0]), not the answers I need. Any idea where this is wrong?
Assuming both/all arrays always have the same length, you can use masks:
ma = a != 0 # mask elements which are not equal to zero in a
mb = b != 0 # mask elements which are not equal to zero in b
m = ma * mb # assign the intersection of ma and mb to m
print(a[m], b[m])  # [1 2 3 5] [1 2 3 5]
You can of course also do it in one line
m = (a != 0) * (b != 0)
Or use the inverse
ma = a == 0
mb = b == 0
m = ~(ma + mb) # not the union of ma and mb
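Since the question mentions many arrays, the same mask idea scales by combining all the masks at once (my addition, assuming all arrays have equal length):

```python
import numpy as np

a = np.array([1, 2, 3, 0, 4, 5, 0])
b = np.array([1, 2, 3, 4, 0, 5, 6])
arrays = [a, b]   # works for any number of equal-length arrays

m = np.all([arr != 0 for arr in arrays], axis=0)  # True where no array has a zero
filtered = [arr[m] for arr in arrays]
print(filtered[0], filtered[1])  # [1 2 3 5] [1 2 3 5]
```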
This is happening because np.delete returns a new array, which gets bound to the names b and a inside the loop. The list stored in the arrays variable, however, still holds references to the original arrays, so each iteration computes its indices from an original, undeleted array. The first iteration finds the correct indices of 0 in a, but the second iteration returns ix as 4, which comes from the original b. If you display the arrays variable in each iteration, it remains unchanged.
You need to reassign arrays once you are done processing one array so that it's taken into consideration the next iteration. Here's how you'd do it -
a = np.array([1, 2, 3, 0, 4, 5, 0])
b = np.array([1, 2, 3, 4, 0, 5, 6])
arrays = [a,b]
for i in range(0, len(arrays)):
    ix = np.where(arrays[i] == 0)
    b = np.delete(b, ix)
    a = np.delete(a, ix)
    arrays = [a, b]
Of course you can automate what happens inside the loop. I just wanted to give an explanation of what was happening.
A slow method involves operating over the whole list twice, first to build an intermediate list of indices to delete, and then second to delete all of the values at those indices:
import numpy as np
a = np.array([1,2,3,0,4,5,0])
b = np.array([1,2,3,4,0,5,6])
arrays = [a, b]
vals = []
for array in arrays:
    ix = np.where(array == 0)
    vals.extend([y for x in ix for y in x.tolist()])
vals = list(set(vals))
new_array = []
for array in arrays:
    new_array.append(np.delete(array, vals))
Building on Christoph Terasa's answer, you can use array operations instead of for loops:
arrays = np.vstack([a,b]) # ...long list of arrays of equal length
zeroind = (arrays==0).max(0)
pos_arrays = arrays[:,~zeroind] # a 2d array only containing those columns where none of the lines contained zeros
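With the a and b from the question, a quick check (my addition) confirms the result:

```python
import numpy as np

a = np.array([1, 2, 3, 0, 4, 5, 0])
b = np.array([1, 2, 3, 4, 0, 5, 6])

arrays = np.vstack([a, b])
zeroind = (arrays == 0).max(0)   # columns where at least one row is zero
pos_arrays = arrays[:, ~zeroind]
print(pos_arrays)
# [[1 2 3 5]
#  [1 2 3 5]]
```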
