Numpy masks behaving differently when explicitly written and when referenced? - python

I was trying to understand numpy masks better and decided to try a simple fizzbuzz exercise (since np arrays are homogenous, 9993 is "fizz", 9995 = "buzz", 9998 = "fizzbuzz"). However, I noticed behavior I cannot understand and was hoping that someone could explain.
In the first case, I created my masks like that:
In:
a = np.arange(32)
a[(a % 3 == 0) & (a % 5 == 0)] = 9998
a
Out:
array([9998, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 9998, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 9998, 31])
In:
a[a % 3 == 0] = 9993
a[a % 5 == 0] = 9995
a
Out:
array([9998, 1, 2, 9993, 4, 9995, 9993, 7, 8, 9993, 9995,
11, 9993, 13, 14, 9998, 16, 17, 9993, 19, 9995, 9993,
22, 23, 9993, 9995, 26, 9993, 28, 29, 9998, 31])
Notice that 9998 has not been overwritten by the subsequent steps, as expected (9998 is divisible by neither 3 nor 5). So far so good. However, then I tried to be clever and name my masks:
In:
a = np.arange(32)
fizz = (a % 3 == 0)
buzz = (a % 5 == 0)
fizzbuzz = fizz & buzz
a[fizzbuzz] = 9998
a
Out:
array([9998, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 9998, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 9998, 31])
In:
a[fizz] = 9993
a[buzz] = 9995
a
Out:
array([9995, 1, 2, 9993, 4, 9995, 9993, 7, 8, 9993, 9995,
11, 9993, 13, 14, 9995, 16, 17, 9993, 19, 9995, 9993,
22, 23, 9993, 9995, 26, 9993, 28, 29, 9995, 31])
From what I could grasp, it would appear that at the "fizzbuzz = fizz & buzz" step, I create a mask such that it provides me with a copy of the array when applied over it. This is in contrast to just writing the mask out, which appears to work as intended and modify the array directly (15 & 30 remain 9998 even after the % 3 and % 5 masks are applied).
My question is why does this happen? From my perspective the logic is absolutely the same in both cases. Writing it as "a[fizz & buzz]" instead of "a[fizzbuzz]" did not help.

I think the problem is this: in the first step you generate a = [0, 1, 2, 3, ..., 31], and in the first snippet each comparison is made against the current values in the array, not against the indices. After the first replacement you have a = [9998, 1, 2, ..., 14, 9998, 16, ...], so the next masks are computed from those new values: at index 15, 9998 % 3 == 0 is False and 9998 % 5 == 0 is also False, so the 9998 survives.
In the second case you index a with boolean arrays that were computed before any replacement, so they effectively select by position. The value currently stored at each position doesn't matter.
If you want the same behavior in both cases, create fizz and buzz after the fizzbuzz replacement:
a = np.arange(32)
fizzbuzz = (a % 3 == 0) & (a % 5 == 0)
a[fizzbuzz] = 9998
print(a)
fizz = (a % 3 == 0)
buzz = (a % 5 == 0)
a[fizz] = 9993
a[buzz] = 9995
print(a)
So the difference is that fizz and buzz are computed from different array contents in the two cases.
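The difference between a stored mask and a recomputed one can be checked directly. The following is a minimal sketch (the names recomputed and b are illustrative, not from the question); it also shows one possible way to get the intended result with named masks, by applying the most specific mask last:

```python
import numpy as np

a = np.arange(32)
fizz = (a % 3 == 0)   # boolean snapshot taken from the original values 0..31
buzz = (a % 5 == 0)

a[fizz & buzz] = 9998

# The stored mask still marks position 15, because it was computed earlier.
print(fizz[15])            # True
# A mask recomputed now looks at the current value 9998 instead.
recomputed = (a % 3 == 0)
print(recomputed[15])      # False

# One way to keep named masks: apply the most specific mask last.
b = np.arange(32)
b[fizz] = 9993
b[buzz] = 9995
b[fizz & buzz] = 9998
print(b[[0, 3, 5, 15, 30]].tolist())   # [9998, 9993, 9995, 9998, 9998]
```

Because fizz and buzz never change after they are created, the order of the three assignments alone decides which value wins at positions 0, 15 and 30.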

Related

Numpy: How to best align two sorted arrays?

In order to combine time series data, I am left with the following essential step:
>>> xs1
array([ 0, 10, 12, 16, 25, 29])
>>> xs2
array([ 0, 5, 10, 15, 20, 25, 30])
How to best get the following solutions:
>>> xs1_ = np.array([0,0,10,12,12,16,16,25,29,29])
>>> xs2_ = np.array([0,5,10,10,15,15,20,25,25,30])
This is to align the measurements taken at times xs1 and xs2.
Imagine that the measurement from series xs1 at time 0 is valid until the next measurement in this series has been made, which is at time 10. We could interpolate both series to their greatest common divisor, but that is most likely 1 and creates a huge bloat. Therefore it would be better to interpolate only at the union of xs1 and xs2. The x-values to compare are then aligned by list index in xs1_ and xs2_. I.e. we compare time 5 in series xs2_ with time 0 in series xs1_, as the next measurement in series xs1_ comes only later, at time 10. From a visual point of view, imagine a step plot for both measurements (the y-values are not shown here) where we always compare the lines lying above each other.
Although I am struggling how to name this task, I believe it is a problem of general interest and therefore think it is appropriate to ask here for its best solution.
Here is my proposition:
a=np.array([0,10,12,16,25,29])
b=np.array([0,5,10,15,20,25,30])
c=sorted(set(a).union(b))  # sort explicitly: set iteration order is not guaranteed
#c = [0, 5, 10, 12, 15, 16, 20, 25, 29, 30]
xs1_= [max([i for i in a if i<=j]) for j in c]
# [0, 0, 10, 12, 12, 16, 16, 25, 29, 29]
xs2_ = [max([i for i in b if i<=j]) for j in c]
# [0, 5, 10, 10, 15, 15, 20, 25, 25, 30]
1) a and b are your two initial arrays.
2) c is the union of your two arrays, so it contains every value present in either array.
3) Then, for each element of c, I select the largest value in a (respectively b) that is smaller than or equal to that element.
Here's a vectorised approach:
xs1 = np.array([ 0, 10, 12, 16, 25, 29])
xs2 = np.array([ 0, 5, 10, 15, 20, 25, 30])
# union of both sets
xs = np.array(sorted(set(xs1) | set(xs2)))
# array([ 0, 5, 10, 12, 15, 16, 20, 25, 29, 30])
# np.isin replaces the deprecated np.in1d
xs1_ = np.maximum.accumulate(np.isin(xs, xs1) * xs)
print(xs1_)
array([ 0, 0, 10, 12, 12, 16, 16, 25, 29, 29])
xs2_ = np.maximum.accumulate(np.isin(xs, xs2) * xs)
print(xs2_)
array([ 0, 5, 10, 10, 15, 15, 20, 25, 25, 30])
Where, for both cases:
np.isin(xs, xs1) * xs
# array([ 0, 0, 10, 12, 0, 16, 0, 25, 29, 0])
gives an array holding the values of xs that are contained in xs1 and 0 for those that aren't. We just need to forward fill using np.maximum.accumulate.
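The multiply-by-mask trick relies on the values being non-negative, since 0 is used as the filler. As an alternative sketch (not taken from the answers above), np.searchsorted can look up, for each time in the union, the last measurement time that is not later than it. The helper name align_to is hypothetical, and the sketch assumes each series is sorted and starts no later than the first time in the union:

```python
import numpy as np

def align_to(xs, series):
    """For each x in xs, return the largest element of series that is <= x.

    Hypothetical helper: assumes series is sorted and series[0] <= xs.min().
    """
    # side="right" gives the insertion point after any equal element,
    # so subtracting 1 lands on the last element that is <= x.
    idx = np.searchsorted(series, xs, side="right") - 1
    return series[idx]

xs1 = np.array([0, 10, 12, 16, 25, 29])
xs2 = np.array([0, 5, 10, 15, 20, 25, 30])
xs = np.union1d(xs1, xs2)      # sorted union of both time axes

xs1_ = align_to(xs, xs1)
xs2_ = align_to(xs, xs2)
print(xs1_.tolist())   # [0, 0, 10, 12, 12, 16, 16, 25, 29, 29]
print(xs2_.tolist())   # [0, 5, 10, 10, 15, 15, 20, 25, 25, 30]
```

The index array idx can also be reused to pull the matching y-values of each series, which is usually the actual goal when aligning measurements.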

How to randomly select a specific sequence from a list?

I have a list of hours starting from 0 (0 is midnight).
hour = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
I want to generate a sequence of 3 consecutive hours randomly. Example:
[3,6]
or
[15, 18]
or
[23,2]
and so on. random.sample does not achieve what I want!
import random
hourSequence = sorted(random.sample(range(1,24), 2))
Any suggestions?
I'm not exactly sure what you want, but probably:
import random
s = random.randint(0, 23)
r = [s, (s+3)%24]
r
Out[14]: [16, 19]
Note: None of the other answers take into consideration the possible sequence [23,0,1].
Consider the following, using itertools from the Python standard library:
from itertools import islice, cycle
from random import choice
hours = list(range(24)) # List w/ 24h
hours_cycle = cycle(hours) # Transform the list into a cycle
select_init = islice(hours_cycle, choice(hours), None) # Select an iterator starting at a random position
# Get the next 3 values from the iterator
select_range = []
for i in range(3):
    select_range.append(next(select_init))
print(select_range)
This prints a sequence of three values from your hours list in a circular way, so the results can also include, for example, [23,0,1].
You can try this:
import random
hour = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
index = random.randint(0,len(hour)-1)
l = [hour[index],hour[(index+3) % len(hour)]]  # wrap around so index+3 never runs past the end
print(l)
You can pick a random position in the hour list you already created and take the element that is 3 places after it, wrapping around at the end:
import random
def random_sequence_endpoints(l, span):
    i = random.choice(range(len(l)))
    return [l[i], l[(i+span) % len(l)]]
hour = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
result = random_sequence_endpoints(hour, 3)
This will work not only for the hours list above but for any other list, whatever elements it contains.
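If the goal is the full run of consecutive hours rather than just two of them, modular arithmetic handles the wrap past midnight without itertools. A minimal sketch; the function name random_hour_run and its parameters are made up for illustration:

```python
import random

def random_hour_run(length=3, hours_in_day=24, rng=random):
    """Return `length` consecutive hours starting at a random hour, wrapping past 23."""
    start = rng.randrange(hours_in_day)
    return [(start + k) % hours_in_day for k in range(length)]

run = random_hour_run()
print(run)                  # e.g. [23, 0, 1]
print([run[0], run[-1]])    # first and last hour of the run
```

Passing a seeded random.Random instance as rng makes the result reproducible, which can help when testing.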

extracting numbers from list

I've created a list (which is sorted):
indexlist = [0, 7, 8, 12, 19, 25, 26, 27, 29, 30, 31, 33]
I want to extract the numbers from this list that are at least five away from each other and input them into another list. This is kind of confusing. This is an example of how I want the output:
outlist = [0, 7, 19, 25, 31]
As you can see, none of the numbers are within 5 of each other.
I've tried this method:
for index2 in range(0, len(indexlist) - 1):
    if indexlist[index2 + 1] > indexlist[index2] + 5:
        outlist.append(indexlist[index2])
However, this gives me this output:
outlist = [0, 12, 19]
Sure, the numbers are at least 5 away, however, I'm missing some needed values.
Any ideas about how I can accomplish this task?
You need to keep track of the last item you added to the list, not just compare to the following value:
In [1]: indexlist = [0, 7, 8, 12, 19, 25, 26, 27, 29, 30, 31, 33]
In [2]: last = -1000 # starting value hopefully low enough :)
In [3]: resultlist = []
In [4]: for item in indexlist:
   ...:     if item > last+5:
   ...:         resultlist.append(item)
   ...:         last = item
   ...:
In [5]: resultlist
Out[5]: [0, 7, 19, 25, 31]
This should do the trick. As I said in the comments, outlist is initialised with the first value of indexlist, and each subsequent element of indexlist is compared to the last value added to outlist. It is a rough solution, but it works.
indexlist = [0, 7, 8, 12, 19, 25, 26, 27, 29, 30, 31, 33]
outlist = [indexlist[0]]
for index2 in range(1, len(indexlist)):  # include the last element as well
    if indexlist[index2] > (outlist[-1] + 5):
        outlist.append(indexlist[index2])
Output:
>>> outlist
[0, 7, 19, 25, 31]
Tim Pietzcker's answer is right but this can also be done without storing the last added item in a separate variable. Instead you can read the last value in outlist:
>>> indexlist = [0, 7, 8, 12, 19, 25, 26, 27, 29, 30, 31, 33]
>>> outlist = []
>>> for n in indexlist:
...     if not outlist or n > outlist[-1] + 5:
...         outlist.append(n)
...
>>> outlist
[0, 7, 19, 25, 31]
I assume your index_list is sorted. Then this keeps only the indexes that are more than MIN_INDEX_OFFSET away from the previously accepted one.
MIN_INDEX_OFFSET = 5
index_list = [0, 7, 8, 12, 19, 25, 26, 27, 29, 30, 31, 33]
last_accepted = index_list[0]
out_list = [last_accepted]
for index in index_list:
    if index - last_accepted > MIN_INDEX_OFFSET:
        out_list.append(index)
        last_accepted = index
print(out_list)
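All of these answers share the same greedy idea: walk the sorted list and keep a value only if it is far enough from the last value kept. A reusable sketch follows; the function name spaced_out and the min_gap parameter are illustrative, and the strict comparison matches the expected output in the question (12 is exactly 5 away from 7 and is dropped):

```python
def spaced_out(sorted_values, min_gap=5):
    """Greedily keep values that are more than `min_gap` above the last kept value."""
    kept = []
    for value in sorted_values:
        # The first value is always kept; later ones must clear the gap.
        if not kept or value - kept[-1] > min_gap:
            kept.append(value)
    return kept

indexlist = [0, 7, 8, 12, 19, 25, 26, 27, 29, 30, 31, 33]
print(spaced_out(indexlist))   # [0, 7, 19, 25, 31]
```

Switching the comparison to >= would instead keep values that are exactly min_gap apart, if that is what "at least five away" is meant to include.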

how to get the value of multiple maximas in an array in python

I have an array
a =[0, 0, 15, 17, 16, 17, 16, 12, 18, 18]
I am trying to find the element value that has the maximum count, and if there is a tie, I would like all of the elements that share that maximum count.
As you can see, there are two 0s, two 16s, two 17s, two 18s, one 15 and one 12,
so I want something that would return
[0, 16, 17, 18] (order is not important, but I do not want the 15 or the 12).
I was using np.argmax(np.bincount(a)), but argmax only returns one element (per its documentation), so I only get the first one, which is 0.
I tried
np.argpartition(values, -4)[-4:], which works, but in practice I would not know that there are 4 elements sharing the same count! (Maybe I am close here; the light bulb just went on!)
You can use np.unique to get the counts and an array of the unique elements then pull the elements whose count is equal to the max:
import numpy as np
a = np.array([0, 0, 15, 17, 16, 17, 16, 12, 18, 18])
un, cnt = np.unique(a, return_counts=True)
print(un[cnt == cnt.max()])
[ 0 16 17 18]
un are the unique elements, cnt is the frequency/count of each:
In [11]: a = np.array([0, 0, 15, 17, 16, 17, 16, 12, 18, 18])
In [12]: un, cnt = np.unique(a, return_counts=True)
In [13]: un, cnt
Out[13]: (array([ 0, 12, 15, 16, 17, 18]), array([2, 1, 1, 2, 2, 2]))
cnt == cnt.max() will give us the mask to pull the elements that are equal to the max:
In [14]: cnt == cnt.max()
Out[14]: array([ True, False, False, True, True, True], dtype=bool)
It is a bit fiddly but you can achieve this using Counter and itemgetter:
from collections import Counter
from operator import itemgetter
a =[0, 0, 15, 17, 16, 17, 16, 12, 18, 18]
counter_list = Counter(a).most_common()
max_occurrences = max(counter_list, key=itemgetter(1))[1]
answer = [item[0] for item in counter_list if item[1] == max_occurrences]
print(answer)
Output (order may differ; on Python 3.7+ tied values appear in order of first occurrence)
[0, 17, 16, 18]
Here is a neat solution:
from collections import Counter
import numpy as np
a = np.array([0, 0, 15, 17, 16, 17, 16, 12, 18, 18])
freq_count = Counter(a.tolist())  # count plain Python ints rather than NumPy scalars
high = max(freq_count.values())
res = [key for key in freq_count.keys() if freq_count[key]==high]
print(res)
Output: [0, 17, 16, 18]
Note: the values appear in order of first occurrence in a, not sorted.
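Since the question started from np.bincount, the same mask idea also works there. This is a sketch that assumes a holds only small non-negative integers (np.bincount requires non-negative integers and creates one slot for every integer from 0 to a.max()):

```python
import numpy as np

a = np.array([0, 0, 15, 17, 16, 17, 16, 12, 18, 18])
counts = np.bincount(a)                         # counts[v] = number of times v occurs
modes = np.flatnonzero(counts == counts.max())  # every value whose count ties the maximum
print(modes.tolist())   # [0, 16, 17, 18]
```

For data with negative or very large values, the np.unique approach shown earlier avoids both restrictions.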

Numpy: find index of the elements within range

I have a numpy array of numbers, for example,
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
I would like to find all the indexes of the elements within a specific range. For instance, if the range is (6, 10), the answer should be (3, 4, 5). Is there a built-in function to do this?
You can use np.where to get indices and np.logical_and to set two conditions:
import numpy as np
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.where(np.logical_and(a>=6, a<=10))
# returns (array([3, 4, 5]),)
As in @deinonychusaur's reply, but even more compact:
In [7]: np.where((a >= 6) & (a <=10))
Out[7]: (array([3, 4, 5]),)
Summary of the answers
To work out which answer is best, we can time the different solutions.
Unfortunately, the question was not well-posed, so the answers address slightly different questions; here I try to compare answers to the same question. Given the array:
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
The answer should be the indexes of the elements within a certain range, which we assume to be inclusive, in this case 6 to 10.
answer = (3, 4, 5)
This corresponds to the values 6, 9 and 10.
To find the best answer we can use this code.
import timeit
setup = """
import numpy as np
import numexpr as ne
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
# or test it with an array of the similar size
# a = np.random.rand(100)*23 # change the number to the an estimate of your array size.
# we define the left and right limit
ll = 6
rl = 10
def sorted_slice(a,l,r):
    start = np.searchsorted(a, l, 'left')
    end = np.searchsorted(a, r, 'right')
    return np.arange(start,end)
"""
functions = ['sorted_slice(a,ll,rl)', # works only for sorted values
'np.where(np.logical_and(a>=ll, a<=rl))[0]',
'np.where((a >= ll) & (a <=rl))[0]',
'np.where((a>=ll)*(a<=rl))[0]',
'np.where(np.vectorize(lambda x: ll <= x <= rl)(a))[0]',
'np.argwhere((a>=ll) & (a<=rl)).T[0]', # we transpose to get a single row
'np.where(ne.evaluate("(ll <= a) & (a <= rl)"))[0]',]
functions2 = [
'a[np.logical_and(a>=ll, a<=rl)]',
'a[(a>=ll) & (a<=rl)]',
'a[(a>=ll)*(a<=rl)]',
'a[np.vectorize(lambda x: ll <= x <= rl)(a)]',
'a[ne.evaluate("(ll <= a) & (a <= rl)")]',
]
rdict = {}
for i in functions:
    rdict[i] = timeit.timeit(i,setup=setup,number=1000)
    print("%s -> %s s" %(i,rdict[i]))
print("Sorted:")
for w in sorted(rdict, key=rdict.get):
    print(w, rdict[w])
Results
The results were reported in a plot for a small array (with the fastest solution on top); as noted by @EZLearner, they may vary depending on the size of the array. sorted_slice could be faster for larger arrays, but it requires your array to be sorted; for arrays with over 10 M entries, ne.evaluate could be an option. It is hence always better to run this test with an array of the same size as yours.
If you want to extract the values instead of the indexes, you can run the tests using functions2, but the results are almost the same.
I thought I would add this because the a in the example you gave is sorted:
import numpy as np
a = [1, 3, 5, 6, 9, 10, 14, 15, 56]
start = np.searchsorted(a, 6, 'left')
end = np.searchsorted(a, 10, 'right')
rng = np.arange(start, end)
rng
# array([3, 4, 5])
a = np.array([1,2,3,4,5,6,7,8,9])
b = a[(a>2) & (a<8)]
Another way is with:
np.vectorize(lambda x: 6 <= x <= 10)(a)
which returns:
array([False, False, False, True, True, True, False, False, False])
It is sometimes useful for masking time series, vectors, etc.
This code snippet returns all the numbers in a numpy array that lie strictly between two values:
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
a[(a>6)*(a<10)]
It works as follows:
(a>6) returns a numpy array of True (1) and False (0) values, and so does (a<10). Multiplying the two together gives an array that is True where both statements are True (because 1x1 = 1) and False otherwise (because 0x0 = 0 and 1x0 = 0).
The part a[...] returns all values of a at the positions where the array between the brackets is True.
Of course you can make this more complicated, for instance with
...*~(a<10)
which acts like an "and not" statement.
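A quick check of the claim above, as a small sketch (the variable names product and conjunction are illustrative): for boolean arrays, * and & give the same mask, and ~ negates a mask.

```python
import numpy as np

a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])

product = (a > 6) * (a < 10)       # element-wise product of two boolean arrays
conjunction = (a > 6) & (a < 10)   # element-wise logical AND
print(product.dtype)                          # bool
print(np.array_equal(product, conjunction))   # True
print(a[conjunction].tolist())                # [9]

# "and not": greater than 6 but NOT less than 10
print(a[(a > 6) & ~(a < 10)].tolist())        # [10, 14, 15, 56]
```

The parentheses around each comparison matter, because & and * bind more tightly than > and <.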
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.argwhere((a>=6) & (a<=10))
Wanted to add numexpr into the mix:
import numpy as np
import numexpr as ne
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.where(ne.evaluate("(6 <= a) & (a <= 10)"))[0]
# array([3, 4, 5], dtype=int64)
This only makes sense for larger arrays with millions of elements, or if you are hitting memory limits.
This may not be the prettiest, but it works for any number of columns:
import numpy as np
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
ranges = (0,4), (0,4)
def conditionRange(X : np.ndarray, ranges : list) -> np.ndarray:
    idx = None
    for column, r in enumerate(ranges):
        tmp = np.where(np.logical_and(X[:, column] >= r[0], X[:, column] <= r[1]))[0]
        # intersect with the rows that passed all previous columns
        idx = set(tmp) if idx is None else idx & set(tmp)
    # sort for a stable row order; dtype=int keeps an empty result usable as an index
    idx = np.array(sorted(idx), dtype=int)
    return X[idx, :]
b = conditionRange(a, ranges)
print(b)
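The same row filter can also be written without Python sets. This is a sketch of an alternative, not the method used above, and the names low, high and mask are illustrative; it relies on broadcasting the per-column bounds across all rows:

```python
import numpy as np

a = np.array([[-1, 2], [1, 5], [6, 7], [5, 2], [3, 4], [0, 0], [-1, -1]])
ranges = np.array([(0, 4), (0, 4)])   # one (low, high) pair per column
low, high = ranges[:, 0], ranges[:, 1]

# Broadcasting compares every row against the per-column bounds;
# a row is kept only if all of its columns are in range.
mask = np.all((a >= low) & (a <= high), axis=1)
print(a[mask].tolist())   # [[3, 4], [0, 0]]
```

Boolean indexing keeps the rows in their original order and returns an empty array cleanly when nothing matches.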
s=[52, 33, 70, 39, 57, 59, 7, 2, 46, 69, 11, 74, 58, 60, 63, 43, 75, 92, 65, 19, 1, 79, 22, 38, 26, 3, 66, 88, 9, 15, 28, 44, 67, 87, 21, 49, 85, 32, 89, 77, 47, 93, 35, 12, 73, 76, 50, 45, 5, 29, 97, 94, 95, 56, 48, 71, 54, 55, 51, 23, 84, 80, 62, 30, 13, 34]
dic={}
for i in range(0,len(s),10):
    dic[i,i+10]=list(filter(lambda x:((x>=i)&(x<i+10)),s))
print(dic)
for keys,values in dic.items():
    print(keys)
    print(values)
Output:
(0, 10)
[7, 2, 1, 3, 9, 5]
(20, 30)
[22, 26, 28, 21, 29, 23]
(30, 40)
[33, 39, 38, 32, 35, 30, 34]
(10, 20)
[11, 19, 15, 12, 13]
(40, 50)
[46, 43, 44, 49, 47, 45, 48]
(60, 70)
[69, 60, 63, 65, 66, 67, 62]
(50, 60)
[52, 57, 59, 58, 50, 56, 54, 55, 51]
You can use np.clip() to limit the values to the range:
a = [1, 3, 5, 6, 9, 10, 14, 15, 56]
np.clip(a,6,10)
However, it does not drop the out-of-range values: values less than 6 are set to 6 and values greater than 10 are set to 10.
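To make the difference concrete, here is a small side-by-side sketch (the names clipped and selected are illustrative) of clamping with np.clip versus selecting with a boolean mask:

```python
import numpy as np

a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])

clipped = np.clip(a, 6, 10)          # same length, values clamped to [6, 10]
selected = a[(a >= 6) & (a <= 10)]   # only the elements already in range
print(clipped.tolist())    # [6, 6, 6, 6, 9, 10, 10, 10, 10]
print(selected.tolist())   # [6, 9, 10]
```

Note that neither result contains indexes; for the indexes asked about in the question, np.where on the mask is still needed.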
