I'm trying to do an "&" operation across all the values in a simple bool array. The array I have is as follows:
array([False, False, True], dtype=bool)
The only thing I've come up with is to slice out the values in the array and use "&" to give a "False" result. I feel like there must be a better way but I don't know enough about numpy to use it properly.
Use arr.all(), which is the same as np.all(arr):
import numpy as np
arr = np.array([False, False, True], dtype=bool)
arr.all()
=> False
np.all(arr)
=> False
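For what it's worth, the "&" across all values that the question describes is a reduction, and the sketch below spells it out with np.logical_and.reduce next to arr.all(). The 2-D array named grid is a made-up example, added only to show the axis argument.

```python
import numpy as np

arr = np.array([False, False, True], dtype=bool)

# AND across every element: True only if all entries are True.
all_true = np.logical_and.reduce(arr)

# The same reduction spelled as a method call.
same_result = arr.all()

# On a 2-D array, `axis` selects the direction of the reduction.
# `grid` is a hypothetical example, not from the question.
grid = np.array([[True, True], [False, True]])
per_column = grid.all(axis=0)  # AND down each column
per_row = grid.all(axis=1)     # AND across each row

print(all_true, same_result, per_column, per_row)
```

For an OR across the values, arr.any() is the matching reduction.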
Let's say I have two numpy arrays:
>>> v1
array([ True, False, False, False, True])
>>> v2
array([False, False, True, True, True])
I'm trying to retrieve an array that has the same length (5) and contains True in each position where v1==True AND v2==False. That would be:
array([True, False, False, False, False])
Is there a quick way in numpy, something like logical_not() but considering v1 as the reference and v2 as the query?
You just need to use the right bitwise operators:
v1 & ~v2
# array([ True, False, False, False, False])
For boolean values, logical and bitwise operations are the same. It is therefore quite idiomatic to write
v1 & ~v2
However, this is a bitwise operation, and it produces a potentially unnecessary temporary array. You cannot write v1 and not v2, much as you might like to, because Python's and and not try to convert each operand to a single boolean value, which is ambiguous for an array. Instead, you have to call the logical_and and logical_not ufuncs:
np.logical_and(v1, np.logical_not(v2))
The nice thing is that you can avoid the temp array, or even write directly to a buffer of your choice:
result = np.empty_like(v1)
np.logical_not(v2, out=result)
np.logical_and(v1, result, out=result)
You can even do the whole thing in-place (in v2):
np.logical_and(v1, np.logical_not(v2, out=v2), out=v2)
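A small sketch of the caveat behind "for boolean values, logical and bitwise operations are the same": the two spellings agree on bool arrays but not on integer arrays. The integer arrays a and b are made-up inputs for illustration.

```python
import numpy as np

v1 = np.array([True, False, False, False, True])
v2 = np.array([False, False, True, True, True])

# With boolean dtype the operator and ufunc spellings agree.
bitwise = v1 & ~v2
logical = np.logical_and(v1, np.logical_not(v2))

# With integer dtype they differ: & works bit by bit,
# logical_and treats any nonzero value as True.
a = np.array([1, 2])  # hypothetical integer inputs
b = np.array([2, 1])
bitwise_ints = a & b                 # 1 and 2 share no set bits
logical_ints = np.logical_and(a, b)  # both values are nonzero

print(bitwise, logical, bitwise_ints, logical_ints)
```

So v1 & ~v2 is safe as long as both arrays really have dtype bool; if they might hold integers, the logical_* ufuncs are the less surprising choice.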
You can make use of bitwise operators here:
>>> v1 & ~v2
array([ True, False, False, False, False])
This is definitely more of a notional question, but I wanted to get others' expert input on this topic at SO. Most of my programming lately involves NumPy arrays. I've been matching items in two or more arrays of different sizes. Most of the time I reach for a for-loop or, even worse, a nested for-loop. I'm ultimately trying to avoid for-loops as I gain more experience in data science, because for-loops perform slower.
I am well aware of NumPy and the predefined functions I can research, but for those of you who are experienced, do you have a general school of thought when you iterate through something?
Something similar to the following:
small_array = np.array(["a", "b"])
big_array = np.array(["a", "b", "c", "d"])
for i in range(len(small_array)):
    for p in range(len(big_array)):
        if small_array[i] == big_array[p]:
            print("This item is matched: ", small_array[i])
I'm well aware there is more than one way to skin a cat with this, but I am interested in others' approaches and ways of thinking.
Since I've been working with array languages for decades (APL, MATLAB, numpy) I can't help with the starting steps. But I suspect I work mostly from patterns, things I've seen and used in the past. And I do a lot of experimentation in an interactive session.
To take your example:
In [273]: small_array = np.array(["a", "b"])
     ...: big_array = np.array(["a", "b", "c", "d"])
     ...:
     ...: for i in range(len(small_array)):
     ...:     for p in range(len(big_array)):
     ...:         if small_array[i] == big_array[p]:
     ...:             print("This item is matched: ", small_array[i])
     ...:
This item is matched: a
This item is matched: b
Often I run the iterative case just to get a clear(er) idea of what is desired.
In [274]: small_array
Out[274]:
array(['a', 'b'],
      dtype='<U1')
In [275]: big_array
Out[275]:
array(['a', 'b', 'c', 'd'],
      dtype='<U1')
I've seen this before: iterating over two arrays and doing something with the paired values. This is a kind of outer operation. There are various tools, but the one I like best makes use of numpy broadcasting. I turn one array into an (n,1) array and use it with the other (m,) array:
In [276]: small_array[:,None]
Out[276]:
array([['a'],
       ['b']],
      dtype='<U1')
The result of (n,1) operating with (1,m) is a (n,m) array:
In [277]: small_array[:,None]==big_array
Out[277]:
array([[ True, False, False, False],
[False, True, False, False]], dtype=bool)
Now I can take an any or all reduction on either axis:
In [278]: _.all(axis=0)
Out[278]: array([False, False, False, False], dtype=bool)
In [280]: __.all(axis=1)
Out[280]: array([False, False], dtype=bool)
I could also use np.where to reduce that boolean to indices.
Oops, I should have used any
In [284]: (small_array[:,None]==big_array).any(0)
Out[284]: array([ True, True, False, False], dtype=bool)
In [285]: (small_array[:,None]==big_array).any(1)
Out[285]: array([ True, True], dtype=bool)
Having played with this, I remember that there's an in1d function that does something similar:
In [286]: np.in1d(big_array, small_array)
Out[286]: array([ True, True, False, False], dtype=bool)
But when I look at the code for in1d (see the [source] link in the docs), I see that in some cases it actually iterates over the small array:
In [288]: for x in small_array:
     ...:     print(x==big_array)
     ...:
[ True False False False]
[False True False False]
Compare that to Out[277]. x==big_array compares a scalar with an array. In numpy, doing something like ==, +, * etc with an array and scalar is easy, and should become second nature. Doing the same thing with 2 arrays of matching shapes is the next step. And from there do it with broadcastable shapes.
In other cases it uses np.unique and np.argsort.
This pattern of creating a higher dimension array by broadcasting the inputs against each other, and then combining values with some sort of reduction (any, all, sum, mean, etc) is very common.
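The broadcast-then-reduce pattern can be collected into one short script. This is a sketch that reuses the arrays from the question. np.isin is used here as the membership helper on the assumption that your NumPy is recent enough to have it (it is the newer spelling of in1d).

```python
import numpy as np

small_array = np.array(["a", "b"])
big_array = np.array(["a", "b", "c", "d"])

# (2, 1) compared with (4,) broadcasts to a (2, 4) table of matches.
matches = small_array[:, None] == big_array

# Reduce along axis 0: which entries of big_array occur in small_array?
big_in_small = matches.any(axis=0)

# Reduce along axis 1: which entries of small_array occur in big_array?
small_in_big = matches.any(axis=1)

# np.isin gives the same membership mask without building the table.
mask = np.isin(big_array, small_array)

# Boolean indexing recovers the matched items themselves.
matched_items = big_array[mask]
print(matched_items)
```

The broadcast table costs n*m comparisons and memory, so for large inputs the membership helper is usually the safer default.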
I will interpret your question in a more specific way:
How do I quit using index variables?
How do I start writing list comprehensions instead of normal loops?
To quit using index variables, the key is to understand that "for" in Python is not the "for" of other languages. It should be called "for each".
for x in small_array:
    for y in big_array:
        if x == y:
            print("This item is matched: ", x)
That's much better.
I also find myself in situations where I would write code with normal loops (or actually do it) and then start wondering whether it would be clearer and more elegant with a list comprehension.
List comprehensions are really a domain-specific language to create lists, so the first step would be to learn its basics. A typical statement would be:
l = [f(x) for x in list_expression if g(x)]
Meaning "give me a list of f(x), for all x out of list_expression that meet condition g"
So you could write it in this way:
matched = [x for x in small_array if x in big_array]
Et voilà, you are on the road to Pythonic style!
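One possible refinement of the comprehension, sketched with plain Python lists (small_list, big_list and big_set are hypothetical names): building a set first makes each "in" test a hash lookup instead of a scan over the big collection.

```python
small_list = ["a", "b"]
big_list = ["a", "b", "c", "d"]

# Build the set once so each membership test is a hash lookup
# instead of a scan over big_list.
big_set = set(big_list)
matched = [x for x in small_list if x in big_set]

for item in matched:
    print("This item is matched:", item)
```

This keeps the order of small_list and only pays off when the big collection is large; for a handful of items the plain comprehension is just as good.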
As you said, you are better off using vectorized operations for speed. Learning this is a long path. You have to get used to matrix multiplication if you aren't already. Once you are, try to translate your data into matrices and see which multiplications you can do. Usually you can't do everything you want this way and you end up with super-matrices (more than two dimensions). That's where numpy gets useful.
Numpy provides functions like np.where; learn how to use them. Learn shortcuts like small_array[small_array == 'a'] = 'z'. Try combining numpy functions with native Python ones (map, filter...).
To handle multi-dimensional matrices, there's no secret: practice, and use paper to understand what you're doing. But beyond 4 dimensions it starts getting very tricky.
For loops are not necessarily slow. That's Matlab nonsense that has spread over time because of Matlab's own fault. Vectorization is "for" looping, but at a lower level. You need to get a handle on what kind of data and architecture you are working with and what kind of function you are executing over your data.
I am currently working with NumPy version 1.12.1, and every call to numpy.where() returns an empty list with the following warning:
FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
I am comparing a string, date_now and a list, dates_list:
np.where(date_now==dates_list)
This causes errors, as the program subsequently calls functions that expect the numpy.where() output to be non-empty. Does anyone have a solution for this?
Thanks in advance.
In your current comparison, you are comparing the entire list object, dates_list, to a string, date_now. This causes element-wise comparison to fail and return a scalar, as if you were just comparing two scalar values:
date_now = '2017-07-10'
dates_list = ['2017-07-10', '2017-07-09', '2017-07-08']
np.where(dates_list==date_now, True, False)
Out[3]: array(0)
What you want is to declare dates_list as a NumPy array to facilitate element-wise comparison.
np.where(np.array(dates_list)==date_now, True, False)
Out[8]: array([ True, False, False], dtype=bool)
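Since the question calls np.where with a single argument and expects positions back, here is a sketch of that form once the list has been converted. The names dates, mask and indices are illustrative.

```python
import numpy as np

date_now = "2017-07-10"
dates_list = ["2017-07-10", "2017-07-09", "2017-07-08"]

# Convert once; the comparison is then elementwise.
dates = np.array(dates_list)
mask = dates == date_now

# With a single argument, np.where returns a tuple of index arrays,
# one per dimension.
(indices,) = np.where(mask)

# np.flatnonzero is an equivalent spelling for 1-D input.
same_indices = np.flatnonzero(mask)

print(indices, same_indices)
```

If no date matches, indices is simply an empty array, so downstream code can check indices.size before using it.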
I understand that tf.where will return the locations of True values, so that I could use the result's shape[0] to get the number of Trues.
However, when I try and use this, the dimension is unknown (which makes sense as it needs to be computed at runtime). So my question is, how can I access a dimension and use it in an operation like a sum?
For example:
myOtherTensor = tf.constant([[True, True], [False, True]])
myTensor = tf.where(myOtherTensor)
myTensor.get_shape() #=> [None, 2]
sum = 0
sum += myTensor.get_shape().as_list()[0] # Well defined at runtime but considered None until then.
You can cast the values to floats and compute the sum on them:
tf.reduce_sum(tf.cast(myOtherTensor, tf.float32))
Depending on your actual use case you can also compute sums per row/column if you specify the reduce dimensions of the call.
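To see what the per-row and per-column sums look like, here is the same cast-and-sum idea sketched in NumPy rather than TensorFlow. It is only a stand-in for checking the arithmetic; flags is a made-up name holding the values of myOtherTensor.

```python
import numpy as np

# NumPy stand-in for the boolean tensor in the question.
flags = np.array([[True, True], [False, True]])

# Casting booleans to integers turns True into 1 and False into 0,
# so a sum counts the True entries.
total = flags.astype(np.int32).sum()
per_column = flags.astype(np.int32).sum(axis=0)
per_row = flags.astype(np.int32).sum(axis=1)

# np.count_nonzero gives the same count without the explicit cast.
count = np.count_nonzero(flags)

print(total, per_column, per_row, count)
```

In TensorFlow the reduction is tf.reduce_sum, as in the answer above, with its axis argument choosing the dimension.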
I think this is the easiest way to do it:
In [38]: myOtherTensor = tf.constant([[True, True], [False, True]])
In [39]: if_true = tf.count_nonzero(myOtherTensor)
In [40]: sess.run(if_true)
Out[40]: 3
Rafal's answer is almost certainly the simplest way to count the number of true elements in your tensor, but the other part of your question asked:
[H]ow can I access a dimension and use it in an operation like a sum?
To do this, you can use TensorFlow's shape-related operations, which act on the runtime value of the tensor. For example, tf.size(t) produces a scalar Tensor containing the number of elements in t, and tf.shape(t) produces a 1D Tensor containing the size of t in each dimension.
Using these operators, your program could also be written as:
myOtherTensor = tf.constant([[True, True], [False, True]])
myTensor = tf.where(myOtherTensor)
countTrue = tf.shape(myTensor)[0] # Size of `myTensor` in the 0th dimension.
sess = tf.Session()
sum = sess.run(countTrue)
There is a TensorFlow function for counting non-zero values: tf.count_nonzero. The function also accepts axis and keep_dims arguments.
Here is a simple example:
import numpy as np
import tensorflow as tf
a = tf.constant(np.random.random(100))
with tf.Session() as sess:
print(sess.run(tf.count_nonzero(tf.greater(a, 0.5))))
I have a numpy array containing integers and slice objects, e.g.:
x = np.array([0,slice(None)])
How do I retrieve the (logical) indices of the integers or slice objects? I tried np.isfinite(x) (producing an error), np.isreal(x) (all True), np.isscalar(x) (not element-wise), all in vain.
What seems to work though is
ind = x<np.Inf # Out[1]: array([True, False], dtype=bool)
but I'm reluctant to use a numerical comparison on an object whose numerical value is completely arbitrary (and might change in the future?). Is there a better solution to achieve this?
You can do this:
import numpy as np
checker = np.vectorize( lambda x: isinstance(x,slice) )
x = np.array([0,slice(None),slice(None),0,0,slice(None)])
checker(x)
#array([False, True, True, False, False, True], dtype=bool)
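An alternative sketch that skips np.vectorize and uses a plain list comprehension to build the mask, then turns it into index positions. The dtype=object argument is added here to make the array's element type explicit, and the variable names are illustrative.

```python
import numpy as np

x = np.array([0, slice(None), slice(None), 0, 0, slice(None)], dtype=object)

# Test each element's type directly; no numeric comparison involved.
is_slice = np.array([isinstance(item, slice) for item in x], dtype=bool)

# Logical indices of the slices and of everything else.
slice_positions = np.flatnonzero(is_slice)
other_positions = np.flatnonzero(~is_slice)

print(is_slice, slice_positions, other_positions)
```

Either way the check runs as a Python-level loop over the elements, so the choice is mostly about readability.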