I'm reading the book Python for Data Analysis. In the section on NumPy boolean indexing it says that selecting data from an array by boolean indexing always creates a copy of the data. So why can I change the original array using boolean indexing? Could anyone help me? Thanks a lot.
Here is the example:
In [86]: data
Out[86]:
array([[-0.048 , 0.5433, -0.2349, 1.2792],
[-0.268 , 0.5465, 0.0939, -2.0445],
[-0.047 , -2.026 , 0.7719, 0.3103],
[ 2.1452, 0.8799, -0.0523, 0.0672],
[-1.0023, -0.1698, 1.1503, 1.7289],
[ 0.5994, 0.8174, -0.9297, -1.2564]])
In [96]: data[data < 0] = 0
In [97]: data
Out[97]:
array([[ 0. , 0.5433, 0. , 1.2792],
[ 0. , 0.5465, 0.0939, 0. ],
[ 0. , 0. , 0.7719, 0.3103],
[ 2.1452, 0.8799, 0. , 0.0672],
[ 0. , 0. , 1.1503, 1.7289],
[ 0.5994, 0.8174, 0. , 0. ]])
Boolean indexing returns a copy of the data, not a view of the original data like the one you get from a slice.
>>> b=data[data<0]; b # this is a copy of data
array([-0.048 , -0.2349, -0.268 , -2.0445, -0.047 , -2.026 , -0.0523,
-1.0023, -0.1698, -0.9297, -1.2564])
I can manipulate b and data is preserved.
>>> b[:] = 0; b
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
>>> data
array([[-0.048 , 0.5433, -0.2349, 1.2792],
[-0.268 , 0.5465, 0.0939, -2.0445],
[-0.047 , -2.026 , 0.7719, 0.3103],
[ 2.1452, 0.8799, -0.0523, 0.0672],
[-1.0023, -0.1698, 1.1503, 1.7289],
[ 0.5994, 0.8174, -0.9297, -1.2564]])
Now, for a slice:
>>> a = data[0,:]; a # a is not a copy of data
array([-0.048 , 0.5433, -0.2349, 1.2792])
>>> a[:] = 0; a
array([ 0., 0., 0., 0.])
>>> data
array([[ 0. , 0. , 0. , 0. ],
[-0.268 , 0.5465, 0.0939, -2.0445],
[-0.047 , -2.026 , 0.7719, 0.3103],
[ 2.1452, 0.8799, -0.0523, 0.0672],
[-1.0023, -0.1698, 1.1503, 1.7289],
[ 0.5994, 0.8174, -0.9297, -1.2564]])
However, as you've identified, assignments made via indexed arrays are always made to the original data.
>>> data[data<0] = 1; data
array([[ 1. , 0.5433, 1. , 1.2792],
[ 1. , 0.5465, 0.0939, 1. ],
[ 1. , 1. , 0.7719, 0.3103],
[ 2.1452, 0.8799, 1. , 0.0672],
[ 1. , 1. , 1.1503, 1.7289],
[ 0.5994, 0.8174, 1. , 1. ]])
In a fetch or __getitem__ the boolean indexing does return a copy. But if used immediately before an assignment, it's a __setitem__ case, and the selected values will be changed:
In [196]: data = np.arange(10)
In [197]: d1 = data[data<5]
In [198]: d1 # a copy
Out[198]: array([0, 1, 2, 3, 4])
In [199]: d1[:] = 0
In [200]: data # no change to the original
Out[200]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Masked assignment:
In [201]: data[data<5] = 0
In [202]: data
Out[202]: array([0, 0, 0, 0, 0, 5, 6, 7, 8, 9]) # changed data
Indirect assignment does nothing:
In [204]: data[data<5][:] = 1
In [205]: data
Out[205]: array([0, 0, 0, 0, 0, 5, 6, 7, 8, 9])
Think of it as data.__getitem__(mask).__setitem__(slice(None), 1). The __getitem__ call returns a copy, which the __setitem__ call then changes, leaving the original untouched.
So if you need advanced indexing on the LHS, make sure it is the last indexing step before the assignment. You can't chain two advanced indexing steps on the LHS.
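The difference between the cases can be checked directly. The following is a minimal sketch (the names mask, picked, sliced and unchanged are only illustrative) that uses np.shares_memory to show that a boolean-indexed fetch yields a copy, a slice yields a view, and a masked assignment is a single __setitem__ call on the original array:

```python
import numpy as np

data = np.arange(10)
mask = data < 5

picked = data[mask]      # __getitem__ with a boolean mask: a copy
sliced = data[:5]        # __getitem__ with a slice: a view

print(np.shares_memory(picked, data))   # False
print(np.shares_memory(sliced, data))   # True

picked[:] = -1           # only the copy changes
unchanged = data.copy()  # snapshot: data still holds 0..9 at this point

# data[mask] = 0 is one __setitem__ call on data itself, no copy involved
data.__setitem__(mask, 0)
print(data)              # [0 0 0 0 0 5 6 7 8 9]
```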
view v copy
With basic indexing it is possible to use the original databuffer, and just change attributes like shape and strides. For example:
In [85]: x = np.arange(10)
In [86]: x.shape
Out[86]: (10,)
In [87]: x.strides
Out[87]: (4,)
In [88]: y = x[::2]
In [89]: y.shape
Out[89]: (5,)
In [90]: y.strides
Out[90]: (8,)
y shares its data buffer with x (compare the 'data' entries of x.__array_interface__ and y.__array_interface__). x uses all ten 4-byte elements; y uses every other one (its stride is 8 bytes instead of 4).
But with advanced indexing you can't express the element selection in terms of shape and strides.
In [98]: z = x[[1,2,6,7,0]]
In [99]: z.shape
Out[99]: (5,)
In [100]: z.strides
Out[100]: (4,)
Items in the original array can be selected in any order and with repetitions. There's no regular pattern. So a copy is required.
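As a rough check of the view/copy distinction, the sketch below compares a strided slice with an advanced-indexing result. It expresses the stride as a multiple of x.itemsize, because the element size (4 bytes in the session above) depends on the platform's default integer type:

```python
import numpy as np

x = np.arange(10)
y = x[::2]               # basic indexing: a view on x's buffer
z = x[[1, 2, 6, 7, 0]]   # advanced indexing: a fresh copy

print(y.base is x)                  # True
print(y.strides[0] // x.itemsize)   # 2 (every other element)
print(np.shares_memory(z, x))       # False

y[0] = 99    # writes through to x
z[0] = -1    # does not touch x
print(x[0], x[1])   # 99 1
```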
I have two arrays, and I want all the elements of one to be divided by the second. For example,
In [24]: a = np.array([1,2,3])
In [25]: b = np.array([1,2,3])
In [26]: a/b
Out[26]: array([1., 1., 1.])
In [27]: 1/b
Out[27]: array([1. , 0.5 , 0.33333333])
This is not the answer I want. The output I want looks like this (every element of a is divided by the whole of b):
In [28]: c = []
In [29]: for i in a:
...:     c.append(i/b)
...:
In [30]: c
Out[30]:
[array([1. , 0.5 , 0.33333333]),
array([2. , 1. , 0.66666667]),
array([3. , 1.5, 1. ])]
In [34]: np.array(c)
Out[34]:
array([[1. , 0.5 , 0.33333333],
[2. , 1. , 0.66666667],
[3. , 1.5 , 1. ]])
But I don't like the for loop; it's too slow for big data. Is there a function in the numpy package, or any other faster way, to solve this problem?
It is simple to do in pure numpy: you can use broadcasting to calculate the outer product (or any other outer operation) of two vectors:
import numpy as np
a = np.arange(1, 4)
b = np.arange(1, 4)
c = a[:,np.newaxis] / b
# array([[1. , 0.5 , 0.33333333],
# [2. , 1. , 0.66666667],
# [3. , 1.5 , 1. ]])
This works, since a[:,np.newaxis] increases the dimension of the (3,) shaped array a into a (3, 1) shaped array, which can be used for the desired broadcasting operation.
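To make the "any other outer operation" remark concrete, here is a small sketch comparing three ways of building the same table: broadcasting, the ufunc method np.divide.outer, and the loop from the question. The variable names are illustrative only:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

by_broadcast = a[:, np.newaxis] / b      # (3, 1) / (3,) broadcasts to (3, 3)
by_ufunc = np.divide.outer(a, b)         # same table via the ufunc's outer method
by_loop = np.array([i / b for i in a])   # the loop from the question

print(by_broadcast.shape)   # (3, 3)
print(np.allclose(by_broadcast, by_loop) and np.allclose(by_ufunc, by_loop))  # True
```

Both vectorized forms avoid the Python-level loop; which one reads better is mostly a matter of taste.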
First you need to reshape a into a 2D column, then repeat it along the dimension you want to loop over so that it has the same shape as the output. Then vectorized division will work.
>>> a.reshape(-1,1)
array([[1],
[2],
[3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1)
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1) / b
array([[1. , 0.5 , 0.33333333],
[2. , 1. , 0.66666667],
[3. , 1.5 , 1. ]])
# Transpose will let you do it the other way around, but then you just get 1 for everything
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1).T
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1).T / b
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
This should do the job:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
print(a.reshape(-1, 1) / b)
Output:
[[ 1. 0.5 0.33333333]
[ 2. 1. 0.66666667]
[ 3. 1.5 1. ]]
I have a function such as:
def f(x):
    A = np.array([[0, 1], [0, -1/x]])
    return A
If I use a scalar I will obtain:
>>x=1
>>f(x)
array([[ 0., 1.],
[ 0., -1.]])
and if I use an array as an input, I will obtain:
>>x=np.linspace(1,3,3)
>>f(x)
array([[0, 1],
[0, array([-1. , -0.5 , -0.33333333])]], dtype=object)
Actually I would like to obtain a list of arrays, namely:
A = [A_1,A_2, ..., A_n]
Right now I do not much care whether it is an array of arrays or a list that contains several arrays.
I know I can do that with a for loop over x, but I think there is probably another, maybe more efficient, way to do it.
So the output that I would like would be something like:
>>x=np.linspace(1,3,3)
>>r=f(x)
array([[[0, 1],[0,-1]],
[[0, 1],[0,-0.5]],
[[0, 1],[0,-0.33333]]])
>>r[0]
array([[0, 1],[0,-1]])
or something like
>>x=np.linspace(1,3,3)
>>r=f(x)
[array([[0, 1],[0,-1]]),
array([[0, 1],[0,-0.5]]),
array([[0, 1],[0,-0.33333]])]
>>r[0]
array([[0, 1],[0,-1]])
Thanks
In your function we can check the type of the given parameter. Here, if x is an np.ndarray we build the nested array we want; otherwise we return the same output as before.
import numpy as np
def f(x):
    if isinstance(x, np.ndarray):
        v = -1/x
        A = np.array([[[0, 1], [0, i]] for i in v])
    else:
        A = np.array([[0, 1], [0, -1/x]])
    return A
x = np.linspace(1,3,3)
print(f(x))
Output:
[[[ 0. 1. ]
[ 0. -1. ]]
[[ 0. 1. ]
[ 0. -0.5 ]]
[[ 0. 1. ]
[ 0. -0.33333333]]]
You can do something like:
import numpy as np
def f(x):
    x = np.array([x]) if isinstance(x, (int, float)) else x
    A = np.stack([np.array([[0, 1], [0, -1/i]]) for i in x])
    return A
The first line deals with the case where x is an int or a float, since a scalar is not iterable. Then:
f(1)
array([[[ 0., 1.],
[ 0., -1.]]])
f(np.linspace(1,3,3))
array([[[ 0. , 1. ],
[ 0. , -1. ]],
[[ 0. , 1. ],
[ 0. , -0.5 ]],
[[ 0. , 1. ],
[ 0. , -0.33333333]]])
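Both answers above still loop over x in Python. If that matters for large inputs, one possible loop-free variant is to allocate the (n, 2, 2) result once and fill the two non-zero entries by slicing. This is only a sketch: f_vectorized is a hypothetical name, and it assumes x is a scalar or a 1-D sequence of non-zero numbers.

```python
import numpy as np

def f_vectorized(x):
    # Treat scalars and sequences alike: always work on a 1-D float array.
    x = np.atleast_1d(np.asarray(x, dtype=float))
    A = np.zeros((x.size, 2, 2))
    A[:, 0, 1] = 1.0        # top-right entry of every 2x2 matrix
    A[:, 1, 1] = -1.0 / x   # bottom-right entry, one value per element of x
    return A

r = f_vectorized(np.linspace(1, 3, 3))
print(r.shape)   # (3, 2, 2)
print(r[1])      # the matrix for x = 2
```

As with the np.stack answer, a scalar input gives a stack of length one, so f_vectorized(1)[0] is the single 2x2 matrix.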
I have a TensorFlow tensor named tf_array and a NumPy array named np_array. I want to find specific rows in tf_array based on np_array.
tf_array = tf.constant(
[[9.968594, 8.655439, 0., 0. ],
[0., 8.3356, 0., 8.8974 ],
[0., 0., 6.103182, 7.330564 ],
[6.609862, 0., 3.0614321, 0. ],
[9.497023, 0., 3.8914037, 0. ],
[0., 8.457685, 8.602337, 0. ],
[0., 0., 5.826657, 8.283971 ]])
I also have np_array:
np_array = np.matrix(
[[2, 5, 1],
[1, 6, 4],
[0, 0, 0],
[2, 3, 6],
[4, 2, 4]])
Now I want to keep the elements of tf_array for which some combination of n of their row indices (here n is 2) appears together in a row of np_array. What does that mean?
For example, in the first column of tf_array, the rows that hold a non-zero value are (0, 3, 4). Is there any row in np_array that contains one of the pairs (0,3), (0,4) or (3,4)? There is no such row, so all the elements in that column become zero.
For the second column of tf_array the non-zero rows are (0, 1, 5), which gives the pairs (0,1), (0,5) and (1,5). The pair (1,5) appears in the first row of np_array, which is why we keep those two elements in tf_array.
So the final result should be like this:
[[0. 0. 0. 0. ]
[0. 8.3356 0. 8.8974 ]
[0. 0. 6.103182 7.330564 ]
[0. 0. 3.0614321 0. ]
[0. 0. 3.8914037 0. ]
[0. 8.457685 8.602337 0. ]
[0. 0. 5.826657 8.283971 ]]
I am looking for a very efficient approach, as I have a large amount of data.
Update1
I could get the following mask with the code below, which gives True where there is a non-zero value and False where the value is zero:
[[ True True False False]
[False True False True]
[False False True True]
[ True False True False]
[ True False True False]
[False True True False]
[False False True True]]
with tf.Session() as sess:
    where = tf.not_equal(tf_array, 0.0)
    print(sess.run(where))
But how can I compare this matrix with np_array?
Thank you in advance!
Here is the solution from https://stackoverflow.com/a/56510832/7207392 with the necessary modifications. For the sake of simplicity I use np.array for all data. I'm no TensorFlow expert, so if translating it is not entirely straightforward, you'll have to ask somebody else how to do it.
import numpy as np
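def f(a1, a2, n):
    N, M = a1.shape
    # Append an all-zero sentinel row (index N) to a1.
    a1p = np.concatenate([a1, np.zeros((1, a1.shape[1]), a1.dtype)], axis=0)
    # Sort each row of a2 and redirect repeated indices to the sentinel row.
    a2 = np.sort(a2, axis=1)
    a2[:, 1:][a2[:, 1:] == a2[:, :-1]] = N
    # (y, x): a2 rows and a1 columns with at least n non-zero selected entries.
    y, x = np.where(np.count_nonzero(a1p[a2], axis=1) >= n)
    out = np.zeros_like(a1p)
    out[a2[y], x[:, None]] = a1p[a2[y], x[:, None]]
    return out[:-1]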
a1 = np.array(
[[9.968594, 8.655439, 0., 0. ],
[0., 8.3356, 0., 8.8974 ],
[0., 0., 6.103182, 7.330564 ],
[6.609862, 0., 3.0614321, 0. ],
[9.497023, 0., 3.8914037, 0. ],
[0., 8.457685, 8.602337, 0. ],
[0., 0., 5.826657, 8.283971 ]])
a2 = np.array(
[[2, 5, 1],
[1, 6, 4],
[0, 0, 0],
[2, 3, 6],
[4, 2, 4]])
print(f(a1,a2,2))
Output:
[[0. 0. 0. 0. ]
[0. 8.3356 0. 8.8974 ]
[0. 0. 6.103182 7.330564 ]
[0. 0. 3.0614321 0. ]
[0. 0. 3.8914037 0. ]
[0. 8.457685 8.602337 0. ]
[0. 0. 5.826657 8.283971 ]]
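The least obvious step in f is the handling of repeated indices. The sketch below isolates it, assuming N = 7 as in the example data. Each row of a2 is sorted, and any index equal to its left neighbour is redirected to row N, the all-zero row appended to a1, so that a repeated index is not counted twice. The zero-filled a1p here is placeholder data used only to show the shape produced by a1p[a2]:

```python
import numpy as np

N = 7  # number of rows in a1; index N addresses the appended zero row
a2 = np.array([[2, 5, 1],
               [1, 6, 4],
               [0, 0, 0],
               [2, 3, 6],
               [4, 2, 4]])

a2 = np.sort(a2, axis=1)
dup = a2[:, 1:] == a2[:, :-1]   # True where an index repeats its left neighbour
a2[:, 1:][dup] = N              # a2[:, 1:] is a view, so this writes into a2
print(a2)

a1p = np.zeros((N + 1, 4))      # placeholder with the same shape as the padded a1
print(a1p[a2].shape)            # (5, 3, 4): for each a2 row, three rows of a1p
```

Note that the chained assignment works here only because the first index is a basic slice (a view); with a boolean or integer-array first index it would modify a temporary copy instead.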
One efficient approach is to build a bit flag for each row that records which values are present; for (0,3,4) that is 1<<0 | 1<<3 | 1<<4. You end up with an array of flag values. The << and | operators work elementwise on NumPy integer arrays.
Do the same for the other array (I guess the TensorFlow tensors can be converted to NumPy arrays).
Once you have the two arrays of flags, take the bitwise AND of them. Where your condition holds for a row, the result has at least two non-zero bits. Counting the set bits can also be done efficiently; search for that.
However, this won't work with floats; you'll need to convert those to fairly small ints.
import numpy as np
arr_one = np.array(
[[2, 5, 1],
[1, 6, 4],
[0, 0, 0],
[2, 3, 6],
[4, 2, 4]])
arr_two = np.array(
[[2, 0, 7],
[1, 3, 4],
[5, 5, 6],
[1, 3, 6],
[4, 2, 4]])
print('1 << arr_one.T[0] ' , 1 << arr_one.T[0] )
arr_one_flags = 1 << arr_one.T[0] | 1 << arr_one.T[1] | 1 << arr_one.T[2]
print('arr_one_flags ', arr_one_flags)
arr_two_flags = 1 << arr_two.T[0] | 1 << arr_two.T[1] | 1 << arr_two.T[2]
arr_and = arr_one_flags & arr_two_flags
print('arr_and ', arr_and)
def get_bit_count(value):
    n = 0
    while value:
        n += 1
        value &= value-1
    return n
arr_matches = np.array([get_bit_count(x) for x in arr_and])
print('arr_matches ', arr_matches )
arr_two_filtered = arr_two[arr_matches > 1]
print('arr_two_filtered ', arr_two_filtered )
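A somewhat more compact variant of the same idea, offered as a sketch rather than a tuned implementation: np.bitwise_or.reduce combines the per-column flags without spelling out each column, and the set bits are counted with Python's bin(). It assumes all values are small non-negative integers that fit in the bit width of the array's integer type.

```python
import numpy as np

arr_one = np.array([[2, 5, 1], [1, 6, 4], [0, 0, 0], [2, 3, 6], [4, 2, 4]])
arr_two = np.array([[2, 0, 7], [1, 3, 4], [5, 5, 6], [1, 3, 6], [4, 2, 4]])

# One integer per row, with bit k set when the value k occurs in that row.
flags_one = np.bitwise_or.reduce(1 << arr_one, axis=1)
flags_two = np.bitwise_or.reduce(1 << arr_two, axis=1)

common = flags_one & flags_two   # bits set in both rows
matches = np.array([bin(int(v)).count("1") for v in common])
print(matches)                   # [1 2 0 2 2]
print(arr_two[matches > 1])      # rows sharing at least two distinct values
```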
Summary of the question: is this kind of slicing followed by assignment supported in TensorFlow?
out[tf_a2[y],x[:,None]] = tf_a1[tf_a2[y],x[:,None]]
final = out[:-1]
Let me give an example. I have a tensor like this:
tf_a1 = tf.Variable([ [9.968594, 8.655439, 0., 0. ],
[0., 8.3356, 0., 8.8974 ],
[0., 0., 6.103182, 7.330564 ],
[6.609862, 0., 3.0614321, 0. ],
[9.497023, 0., 3.8914037, 0. ],
[0., 8.457685, 8.602337, 0. ],
[0., 0., 5.826657, 8.283971 ],
[0., 0., 0., 0. ]])
and I have this one:
tf_a2 = tf.constant([[1, 2, 5],
[1, 4, 6],
[0, 7, 7],
[2, 3, 6],
[2, 4, 7]])
Now I want to keep the elements of tf_a1 for which some combination of n of their row indices (here n is 2) appears together in a row of tf_a2. What does that mean?
For example, in the first column of tf_a1, the rows that hold a non-zero value are (0, 3, 4). Is there any row in tf_a2 that contains one of the pairs (0,3), (0,4) or (3,4)? There is no such row, so all the elements in that column become zero.
For the second column of tf_a1 the non-zero rows are (0, 1, 5), which gives the pairs (0,1), (0,5) and (1,5). The pair (1,5) appears in the first row of tf_a2, which is why we keep those two elements in tf_a1.
This is the correct numpy code:
y,x = np.where(np.count_nonzero(a1p[a2], axis=1) >= n)
out = np.zeros_like(tf_a1)
out[tf_a2[y],x[:,None]] = tf_a1[tf_a2[y],x[:,None]]
final = out[:-1]
This is the expected output of this numpy code (but I need this in tensorflow):
[[0. 0. 0. 0. ]
[0. 8.3356 0. 8.8974 ]
[0. 0. 6.103182 7.330564 ]
[0. 0. 3.0614321 0. ]
[0. 0. 3.8914037 0. ]
[0. 8.457685 8.602337 0. ]
[0. 0. 5.826657 8.283971 ]]
The tensorflow code should be something like this:
y, x = tf.where(tf.count_nonzero(tf.gather(tf_a1, tf_a2, axis=0), axis=1) >= n)
out = tf.zeros_like(tf_a1)
out[tf_a2[y],x[:,None]] = tf_a1[tf_a2[y],x[:,None]]
final = out[:-1]
The expression tf.gather(tf_a1, tf_a2, axis=0) performs the NumPy-like indexing tf_a1[tf_a2].
Update 1
The only lines that do not work are:
out[tf_a2[y],x[:,None]] = tf_a1[tf_a2[y],x[:,None]]
final = out[:-1]
Any idea how I can accomplish this in TensorFlow? Is this kind of slicing supported on tensor objects at all?
Any help is appreciated:)
My goal is to do this:
weights[1][0][0] = some_object(1)
But it throws this error:
TypeError: float() argument must be a string or a number, not 'some_object'
Because of this I want to change the dtype to 'object'.
In my code I have weights. They look like this:
>>> print(weights)
[ array([[-2.66665269, 0. ],
[-0.36358187, 0. ],
[ 1.55058871, 0. ],
[ 3.91364328, 0. ]])
array([[ 0.],
[ 0.]])]
I want to change weights[1][0][0] to an object. I am referring to the 0 in:
[ array([[-2.66665269, 0. ],
[-0.36358187, 0. ],
[ 1.55058871, 0. ],
[ 3.91364328, 0. ]])
array([[ 0.], #this zero right here
[ 0.]])]
I want to convert that 0 to some_object(1). So I change the dtype to 'object', but it remains float!
>>> weights=np.array(weights,dtype='object')
>>> weights.dtype
object
>>> weights[1].dtype
float64
>>> weights[1][0].dtype
float64
>>> weights[1][0][0].dtype
float64
So now I try:
>>> weights[1].dtype='object'
TypeError: Cannot change data-type for object array.
and this
>>> weights[1][0].dtype='object'
TypeError: Cannot change data-type for object array.
so now I cannot do this:
weights[1][0][0] = some_object(1)
TypeError: float() argument must be a string or a number, not 'object'
because the dtype is incorrect. How do I change the dtype?
Edit: I found an answer.
weights[1] = weights[1].tolist()
weights[1][0][0] = some_object(1)
....
weights=np.array(weights)
Let's make sure we understand what you are starting with:
In [7]: weights
Out[7]:
[array([[-2.66665269, 0. ],
[-0.36358187, 0. ],
[ 1.55058871, 0. ],
[ 3.91364328, 0. ]]), array([[ 0.],
[ 0.]])]
In [8]: len(weights)
Out[8]: 2
In [9]: weights[0]
Out[9]:
array([[-2.66665269, 0. ],
[-0.36358187, 0. ],
[ 1.55058871, 0. ],
[ 3.91364328, 0. ]])
In [10]: weights[0].dtype
Out[10]: dtype('float64')
In [11]: weights[0].shape
Out[11]: (4, 2)
In [13]: weights[1]
Out[13]:
array([[ 0.],
[ 0.]])
In [14]: weights[1].dtype
Out[14]: dtype('float64')
In [15]: weights[1].shape
Out[15]: (2, 1)
This is a 2 item list, containing two arrays. Both are 2d float.
First you wrap the whole list in an object array:
In [16]: duh =np.array(weights,dtype='object')
In [17]: duh
Out[17]:
array([ array([[-2.66665269, 0. ],
[-0.36358187, 0. ],
[ 1.55058871, 0. ],
[ 3.91364328, 0. ]]),
array([[ 0.],
[ 0.]])], dtype=object)
This is a 2 element array, shape (2,). But it doesn't change the nature of the elements. And there's a potential gotcha - if the element arrays had the same shape, it would have created a 3d array of objects.
This is not the right way to change the dtype of an array. Assigning to the dtype attribute does not convert the stored values, and as the error shows it is not allowed when object dtype is involved.
weights[1].dtype='object'
We can use astype instead:
In [19]: weights[1].astype(object)
Out[19]:
array([[0.0],
[0.0]], dtype=object)
In [20]: weights[1]=weights[1].astype(object)
In [21]: weights
Out[21]:
[array([[-2.66665269, 0. ],
[-0.36358187, 0. ],
[ 1.55058871, 0. ],
[ 3.91364328, 0. ]]), array([[0.0],
[0.0]], dtype=object)]
It makes a new array, which we have to write back into the original list.
Now I can change an element of that second array:
In [22]: weights[1][0,0]=None
In [23]: weights
Out[23]:
[array([[-2.66665269, 0. ],
[-0.36358187, 0. ],
[ 1.55058871, 0. ],
[ 3.91364328, 0. ]]), array([[None],
[0.0]], dtype=object)]
When playing games like this you have to pay attention to where you have arrays and where they are lists. And pay attention to the shape and dtype of the arrays. Don't blindly index and hope for the best. Display/print these attributes, or the whole array if it isn't too large.
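Putting the pieces together, here is a minimal sketch of the whole fix. SomeObject is a hypothetical stand-in for the asker's some_object class, and the zero-filled arrays only mimic the shapes of the original weights:

```python
import numpy as np

class SomeObject:
    """Hypothetical placeholder for the asker's some_object."""
    def __init__(self, value):
        self.value = value

# A plain Python list holding two float arrays, shaped like the question's data.
weights = [np.zeros((4, 2)), np.zeros((2, 1))]

# astype returns a new array, so it has to be stored back into the list.
weights[1] = weights[1].astype(object)
weights[1][0, 0] = SomeObject(1)

print(weights[0].dtype)            # float64 (untouched)
print(weights[1].dtype)            # object
print(type(weights[1][0, 0]))      # the SomeObject class
```

Only the second array is converted; the first keeps its float64 dtype, which avoids wrapping the whole list in an object array.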