I need to convert an array like this:
[[1527 1369 86 86]
[ 573 590 709 709]
[1417 1000 68 68]
[1361 1194 86 86]]
to something like this:
[(726, 1219, 1281, 664),
(1208, 1440, 1283, 1365),
(1006, 1483, 1069, 1421),
(999, 1414, 1062, 1351),]
I tried converting directly to a tuple but got this:
( array([1527, 1369, 86, 86], dtype=int32),
array([573, 590, 709, 709], dtype=int32),
array([1417, 1000, 68, 68], dtype=int32),
array([1361, 1194, 86, 86], dtype=int32))
(array([701, 899, 671, 671], dtype=int32),)
The array method tolist is an easy and fast way of converting an array to a list. It handles multiple dimensions correctly:
In [92]: arr = np.arange(12).reshape(3,4)
In [93]: arr
Out[93]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [94]: arr.tolist()
Out[94]: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
For most purposes such a list of lists is just as good as a list of tuples, or a tuple of tuples. They differ only in mutability.
But if you must have tuples, a list comprehension does the conversion nicely.
In [95]: [tuple(x) for x in arr.tolist()]
Out[95]: [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11)]
An alternative [tuple(x) for x in arr] is a bit slower, because it is iterating on the array rather than on a list. It also produces a different result - though you have to examine the type of the tuple elements to see that.
I strongly recommend starting with the tolist method, and doing any list to tuple conversions after.
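To make the "different result" remark concrete, here is a minimal sketch comparing the element types produced by the two comprehensions. The variable names from_array and from_list are only illustrative, and the exact NumPy integer type depends on the platform.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Iterating over the array itself yields NumPy scalars inside each tuple.
from_array = [tuple(row) for row in arr]
# Going through tolist() first yields plain Python ints.
from_list = [tuple(row) for row in arr.tolist()]

print(type(from_array[0][0]))   # a NumPy integer type (platform dependent)
print(type(from_list[0][0]))    # <class 'int'>
print(from_array == from_list)  # True: the values compare equal, only the types differ
```

The difference tends to matter when the tuples are later serialized (for example to JSON) or passed to code that checks for built-in int.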
What about using tuple and the map function like this:
import numpy
numpy_arr = numpy.array(((1527, 1369, 86, 86), (573, 590, 709, 709)))
converted_tuple = tuple(map(tuple, numpy_arr))  # tuple of tuples
converted_list = list(map(tuple, numpy_arr))  # list of tuples (in Python 3, map alone returns an iterator)
print(converted_list)
Here is a function, assuming you do not want the final object to be a numpy object.
def fun(var):
    a = []
    for i in var:
        a.append(tuple(i))
    return a
If you want it in one line:
def fun(var):
    return [tuple(i) for i in var]
If you prefer list comprehensions to map():
a = numpy.random.uniform(0,1,size=(4,4))
a_tuple_list = [tuple(row) for row in a]
I have an array of shape (9,1,3).
array([[[ 6, 12, 108]],
[[122, 112, 38]],
[[ 57, 101, 62]],
[[119, 76, 177]],
[[ 46, 62, 2]],
[[127, 61, 155]],
[[ 5, 6, 151]],
[[ 5, 8, 185]],
[[109, 167, 33]]])
I want to find the argmax index of the third dimension, in this case it would be 185, so index 7.
I guess the solution is linked to reshaping but I can't wrap my head around it. Thanks for any help!
I'm not sure what's tricky about it. But, one way to get the index of the greatest element along the last axis would be by using np.max and np.argmax like:
# find `max` element along last axis
# and get the index using `argmax` where `arr` is your array
In [53]: np.argmax(np.max(arr, axis=2))
Out[53]: 7
Alternatively, as @PaulPanzer suggested in the comments, you could use:
In [63]: np.unravel_index(np.argmax(arr), arr.shape)
Out[63]: (7, 0, 2)
In [64]: arr[(7, 0, 2)]
Out[64]: 185
You may have to do it like this:
data = np.array([[[ 6, 12, 108]],
[[122, 112, 38]],
[[ 57, 101, 62]],
[[119, 76, 177]],
[[ 46, 62, 2]],
[[127, 61, 155]],
[[ 5, 6, 151]],
[[ 5, 8, 185]],
[[109, 167, 33]]])
np.argmax(data[:,0][:,2])
7
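The answers above read the question in two slightly different ways (largest value in the third column versus largest value overall). The following sketch, using the same data, shows both readings plus the reshape route the asker guessed at. The names idx_col, idx_reshape and idx_global are only illustrative.

```python
import numpy as np

data = np.array([[[6, 12, 108]], [[122, 112, 38]], [[57, 101, 62]],
                 [[119, 76, 177]], [[46, 62, 2]], [[127, 61, 155]],
                 [[5, 6, 151]], [[5, 8, 185]], [[109, 167, 33]]])

# Drop the length-1 middle axis and take the third column in one indexing step.
third_col = data[:, 0, 2]                      # shape (9,)
idx_col = int(np.argmax(third_col))

# Same idea via reshape: collapse to a (9, 3) table first.
idx_reshape = int(np.argmax(data.reshape(-1, 3)[:, 2]))

# Row holding the overall maximum, whichever column it is in.
idx_global = int(np.argmax(data.max(axis=2)))

print(idx_col, idx_reshape, idx_global)        # 7 7 7
```

The three results agree here only because the overall maximum (185) happens to sit in the third column; with other data the last one could differ.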
Is there a way to do multiple indexing in a numpy array as described below?
arr=np.array([55, 2, 3, 4, 5, 6, 7, 8, 9])
arr[np.arange(0,2):np.arange(5,7)]
output:
IndexError: too many indices for array
Desired output:
array([55,2,3,4,5],[2,3,4,5,6])
This problem might be similar to calculating a moving average over an array (but I want to do it without any function that is provided).
Here's an approach using strides -
start_index = np.arange(0,2)
L = 5 # Interval length
n = arr.strides[0]
strided = np.lib.stride_tricks.as_strided
out = strided(arr[start_index[0]:],shape=(len(start_index),L),strides=(n,n))
Sample run -
In [976]: arr
Out[976]: array([55, 52, 13, 64, 25, 76, 47, 18, 69, 88])
In [977]: start_index
Out[977]: array([2, 3, 4])
In [978]: L = 5
In [979]: out
Out[979]:
array([[13, 64, 25, 76, 47],
[64, 25, 76, 47, 18],
[25, 76, 47, 18, 69]])
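As a possible alternative to as_strided, the same windows can be gathered with plain broadcasting and integer-array indexing. This is a sketch rather than a drop-in replacement: it copies the data instead of creating a view, but the start positions do not have to be consecutive. The name idx is illustrative.

```python
import numpy as np

arr = np.array([55, 52, 13, 64, 25, 76, 47, 18, 69, 88])
start_index = np.array([2, 3, 4])  # where each window begins
L = 5                              # window length

# Broadcasting a (3, 1) column of starts against a (5,) row of offsets
# gives a (3, 5) grid of positions; indexing with it copies the windows.
idx = start_index[:, None] + np.arange(L)
out = arr[idx]
print(out.tolist())
# [[13, 64, 25, 76, 47], [64, 25, 76, 47, 18], [25, 76, 47, 18, 69]]
```

Every position in idx must be a valid index into arr, otherwise NumPy raises an IndexError, which is safer than as_strided reading past the end of the buffer.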
I create two matrices
import numpy as np
arrA = np.zeros((9000,3))
arrB = np.zeros((9000,6))
I want to concatenate pieces of those matrices.
But when I try to do:
arrC = np.hstack((arrA, arrB[:,1]))
I get an error:
ValueError: all the input arrays must have same number of dimensions
I guess it's because np.shape(arrB[:,1]) is equal to (9000,) instead of (9000,1), but I cannot figure out how to resolve it.
Could you please comment on this issue?
You could preserve dimensions by passing a list of indices, not an index:
>>> arrB[:,1].shape
(9000,)
>>> arrB[:,[1]].shape
(9000, 1)
>>> out = np.hstack([arrA, arrB[:,[1]]])
>>> out.shape
(9000, 4)
This is easier to see visually.
Assume:
>>> arrA=np.arange(9000*3).reshape(9000,3)
>>> arrA
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
...,
[26991, 26992, 26993],
[26994, 26995, 26996],
[26997, 26998, 26999]])
>>> arrB=np.arange(9000*6).reshape(9000,6)
>>> arrB
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[ 12, 13, 14, 15, 16, 17],
...,
[53982, 53983, 53984, 53985, 53986, 53987],
[53988, 53989, 53990, 53991, 53992, 53993],
[53994, 53995, 53996, 53997, 53998, 53999]])
If you select a single column of arrB with an integer index, you get a 1D array that looks more like a row:
>>> arrB[:,1]
array([ 1, 7, 13, ..., 53983, 53989, 53995])
What you need is a 2D column with the same number of rows as arrA, so that it can be added to arrA:
>>> arrB[:,[1]]
array([[ 1],
[ 7],
[ 13],
...,
[53983],
[53989],
[53995]])
Then hstack works as expected:
>>> arrC=np.hstack((arrA, arrB[:,[1]]))
>>> arrC
array([[ 0, 1, 2, 1],
[ 3, 4, 5, 7],
[ 6, 7, 8, 13],
...,
[26991, 26992, 26993, 53983],
[26994, 26995, 26996, 53989],
[26997, 26998, 26999, 53995]])
An alternate form is to specify -1 in one dimension and the number of rows or cols desired as the other in .reshape():
>>> arrB[:,1].reshape(-1,1) # one col
array([[ 1],
[ 7],
[ 13],
...,
[53983],
[53989],
[53995]])
>>> arrB[:,1].reshape(-1,6) # 6 cols
array([[ 1, 7, 13, 19, 25, 31],
[ 37, 43, 49, 55, 61, 67],
[ 73, 79, 85, 91, 97, 103],
...,
[53893, 53899, 53905, 53911, 53917, 53923],
[53929, 53935, 53941, 53947, 53953, 53959],
[53965, 53971, 53977, 53983, 53989, 53995]])
>>> arrB[:,1].reshape(2,-1) # 2 rows
array([[ 1, 7, 13, ..., 26983, 26989, 26995],
[27001, 27007, 27013, ..., 53983, 53989, 53995]])
There is more on array shaping and stacking here
I would try something like this:
np.vstack((arrA.transpose(), arrB[:,1])).transpose()
There are several ways of making your selection from arrB a (9000,1) array:
np.hstack((arrA,arrB[:,[1]]))
np.hstack((arrA,arrB[:,1][:,None]))
np.hstack((arrA,arrB[:,1].reshape(9000,1)))
np.hstack((arrA,arrB[:,1].reshape(-1,1)))
The first uses the concept of indexing with an array or list, the second adds a new axis (None is the same as np.newaxis), and the last two use reshape. These are all basic numpy array manipulation tasks.
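A quick way to convince yourself that the four forms are interchangeable is to stack each one and compare the resulting shapes. The sketch below also shows np.column_stack, which is not mentioned in the answers above but treats 1D inputs as columns, so the selection needs no reshaping. The names variants, shapes and stacked are illustrative.

```python
import numpy as np

arrA = np.zeros((9000, 3))
arrB = np.zeros((9000, 6))

variants = [
    arrB[:, [1]],                 # index with a list
    arrB[:, 1][:, None],          # add a new axis (None is np.newaxis)
    arrB[:, 1].reshape(9000, 1),  # explicit reshape
    arrB[:, 1].reshape(-1, 1),    # reshape, row count inferred
]
shapes = [np.hstack((arrA, col)).shape for col in variants]
print(shapes)  # [(9000, 4), (9000, 4), (9000, 4), (9000, 4)]

# np.column_stack turns 1D inputs into columns before stacking.
stacked = np.column_stack((arrA, arrB[:, 1]))
print(stacked.shape)  # (9000, 4)
```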
I have a numpy array of numbers, for example,
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
I would like to find all the indexes of the elements within a specific range. For instance, if the range is (6, 10), the answer should be (3, 4, 5). Is there a built-in function to do this?
You can use np.where to get indices and np.logical_and to set two conditions:
import numpy as np
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.where(np.logical_and(a>=6, a<=10))
# returns (array([3, 4, 5]),)
As in @deinonychusaur's reply, but even more compact:
In [7]: np.where((a >= 6) & (a <=10))
Out[7]: (array([3, 4, 5]),)
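A short sketch of why the compact form is written with parentheses and &, and why np.where is indexed with [0] when a plain index array is wanted. np.flatnonzero is shown as an equivalent shortcut for 1D masks; the names mask and idx are illustrative.

```python
import numpy as np

a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])

# & binds tighter than >= and <=, so each comparison needs its own parentheses.
mask = (a >= 6) & (a <= 10)
idx = np.where(mask)[0]        # np.where returns a tuple; [0] is the index array
print(idx)                     # [3 4 5]

# np.flatnonzero gives the same indices directly for a 1D mask.
print(np.flatnonzero(mask))    # [3 4 5]

# Python's `and` cannot be used here: it asks for the truth value of a whole array.
try:
    (a >= 6) and (a <= 10)
except ValueError as exc:
    print("ValueError:", exc)
```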
Summary of the answers
To find out which answer is best, we can time the different solutions.
Unfortunately, the question was not well-posed, so there are answers to different questions; here I try to compare answers to the same question. Given the array:
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
The answer should be the indexes of the elements within a certain range, which we assume to be inclusive, in this case between 6 and 10.
answer = (3, 4, 5)
Corresponding to the values 6,9,10.
To find the fastest answer we can use this code.
import timeit
setup = """
import numpy as np
import numexpr as ne
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
# or test it with an array of a similar size
# a = np.random.rand(100)*23 # change the number to an estimate of your array size.
# we define the left and right limit
ll = 6
rl = 10
def sorted_slice(a,l,r):
    start = np.searchsorted(a, l, 'left')
    end = np.searchsorted(a, r, 'right')
    return np.arange(start,end)
"""
functions = ['sorted_slice(a,ll,rl)', # works only for sorted values
             'np.where(np.logical_and(a>=ll, a<=rl))[0]',
             'np.where((a >= ll) & (a <=rl))[0]',
             'np.where((a>=ll)*(a<=rl))[0]',
             'np.where(np.vectorize(lambda x: ll <= x <= rl)(a))[0]',
             'np.argwhere((a>=ll) & (a<=rl)).T[0]', # we transpose to get a single row
             'np.where(ne.evaluate("(ll <= a) & (a <= rl)"))[0]',]
# same conditions, but extracting the values instead of the indexes
functions2 = [
    'a[np.logical_and(a>=ll, a<=rl)]',
    'a[(a>=ll) & (a<=rl)]',
    'a[(a>=ll)*(a<=rl)]',
    'a[np.vectorize(lambda x: ll <= x <= rl)(a)]',
    'a[ne.evaluate("(ll <= a) & (a <= rl)")]',
]
rdict = {}
for i in functions:
    rdict[i] = timeit.timeit(i,setup=setup,number=1000)
    print("%s -> %s s" %(i,rdict[i]))
print("Sorted:")
for w in sorted(rdict, key=rdict.get):
    print(w, rdict[w])
Results
The results are reported in the following plot for a small array (the fastest solution is at the top). As noted by @EZLearner, they may vary depending on the size of the array. sorted_slice could be faster for larger arrays, but it requires your array to be sorted; for arrays with over 10 M entries, ne.evaluate could be an option. It is therefore always better to perform this test with an array of the same size as yours.
If, instead of the indexes, you want to extract the values, you can perform the tests using functions2, but the results are almost the same.
I thought I would add this because the a in the example you gave is sorted:
import numpy as np
a = [1, 3, 5, 6, 9, 10, 14, 15, 56]
start = np.searchsorted(a, 6, 'left')
end = np.searchsorted(a, 10, 'right')
rng = np.arange(start, end)
rng
# array([3, 4, 5])
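If the array is not sorted, np.searchsorted can still be used through its sorter argument together with np.argsort, at the cost of one sort. The sketch below uses a shuffled, purely hypothetical version of the example data; order and idx are illustrative names.

```python
import numpy as np

a = np.array([14, 3, 56, 6, 1, 10, 9, 15, 5])  # unsorted, hypothetical data

order = np.argsort(a)  # positions that would sort a
start = np.searchsorted(a, 6, 'left', sorter=order)
end = np.searchsorted(a, 10, 'right', sorter=order)

# order[start:end] holds the original positions of the values in [6, 10].
idx = np.sort(order[start:end])
print(idx)  # [3 5 6]
```

For a single query this is unlikely to beat a boolean mask, since the sort dominates; it may pay off when many ranges are queried against the same array.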
a = np.array([1,2,3,4,5,6,7,8,9])
b = a[(a>2) & (a<8)]
Another way is with:
np.vectorize(lambda x: 6 <= x <= 10)(a)
which returns:
array([False, False, False, True, True, True, False, False, False])
It is sometimes useful for masking time series, vectors, etc.
This code snippet returns all the numbers in a numpy array between two values:
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56] )
a[(a>6)*(a<10)]
It works as follows:
(a>6) returns a numpy array of True (1) and False (0) values, and so does (a<10). Multiplying the two gives an array that is True where both conditions are True (because 1x1 = 1) and False otherwise (because 0x0 = 0 and 1x0 = 0).
The part a[...] returns all values of array a at positions where the array between the brackets is True.
Of course you can make this more complicated by writing, for instance,
...*~(a<10)
which acts like an "and not" condition. (Writing 1-a<10 does not do this, because it parses as (1-a)<10.)
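A small sketch of the multiplication trick, including a negated condition written with ~ and the operator-precedence pitfall of writing 1-a<10 without parentheses. The names both and and_not are illustrative.

```python
import numpy as np

a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])

both = (a > 6) * (a < 10)  # bool * bool stays boolean in NumPy
print(both.dtype)          # bool
print(a[both])             # [9]

# "greater than 6 and not less than 10"
and_not = (a > 6) * ~(a < 10)
print(a[and_not])          # [10 14 15 56]

# Precedence pitfall: 1 - a < 10 means (1 - a) < 10, which is True for every element here.
print(np.array_equal(1 - a < 10, (1 - a) < 10))  # True
```

Keeping the mask boolean matters: an integer array of 0s and 1s used as an index would select elements 0 and 1 repeatedly instead of filtering.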
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.argwhere((a>=6) & (a<=10))
Wanted to add numexpr into the mix:
import numpy as np
import numexpr as ne
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.where(ne.evaluate("(6 <= a) & (a <= 10)"))[0]
# array([3, 4, 5], dtype=int64)
This would only make sense for larger arrays with millions of elements, or if you are hitting memory limits.
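This may not be the prettiest, but it works for any number of columns:
import numpy as np
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
ranges = (0,4), (0,4)
def conditionRange(X : np.ndarray, ranges : list) -> np.ndarray:
    idx = None  # None means no column has been processed yet
    for column, r in enumerate(ranges):
        # row indexes whose value in this column lies within [r[0], r[1]]
        tmp = set(np.where(np.logical_and(X[:, column] >= r[0], X[:, column] <= r[1]))[0])
        # keep only the rows that also passed every previous column
        idx = tmp if idx is None else idx & tmp
    idx = np.array(sorted(idx), dtype=int)  # sorted so rows keep their original order
    return X[idx, :]
b = conditionRange(a, ranges)
print(b)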
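The same row filter can probably be written without sets, by building one boolean mask over all checked columns. This is a sketch under the assumption that ranges[k] applies to column k and that the bounds are inclusive; the function name rows_in_ranges is hypothetical.

```python
import numpy as np

def rows_in_ranges(X, ranges):
    """Return the rows of X whose column k lies within ranges[k] (inclusive)."""
    lows = np.array([lo for lo, _ in ranges])
    highs = np.array([hi for _, hi in ranges])
    cols = X[:, :len(ranges)]
    # Compare every checked column at once, then require all of them to pass.
    mask = np.all((cols >= lows) & (cols <= highs), axis=1)
    return X[mask]

a = np.array([[-1, 2], [1, 5], [6, 7], [5, 2], [3, 4], [0, 0], [-1, -1]])
b = rows_in_ranges(a, [(0, 4), (0, 4)])
print(b.tolist())  # [[3, 4], [0, 0]]
```

A mask keeps the rows in their original order and returns an empty (0, n) array when nothing matches.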
s=[52, 33, 70, 39, 57, 59, 7, 2, 46, 69, 11, 74, 58, 60, 63, 43, 75, 92, 65, 19, 1, 79, 22, 38, 26, 3, 66, 88, 9, 15, 28, 44, 67, 87, 21, 49, 85, 32, 89, 77, 47, 93, 35, 12, 73, 76, 50, 45, 5, 29, 97, 94, 95, 56, 48, 71, 54, 55, 51, 23, 84, 80, 62, 30, 13, 34]
dic={}
for i in range(0,len(s),10):
    dic[i,i+10]=list(filter(lambda x:((x>=i)&(x<i+10)),s))
print(dic)
for keys,values in dic.items():
    print(keys)
    print(values)
Output:
(0, 10)
[7, 2, 1, 3, 9, 5]
(20, 30)
[22, 26, 28, 21, 29, 23]
(30, 40)
[33, 39, 38, 32, 35, 30, 34]
(10, 20)
[11, 19, 15, 12, 13]
(40, 50)
[46, 43, 44, 49, 47, 45, 48]
(60, 70)
[69, 60, 63, 65, 66, 67, 62]
(50, 60)
[52, 57, 59, 58, 50, 56, 54, 55, 51]
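For comparison, here is a single-pass sketch of the same binning idea using collections.defaultdict. It derives each bin from the value itself (x // 10 * 10), so it assumes fixed-width bins of 10 and covers every value regardless of the list length. The list is a shortened sample of the one above, and bins is an illustrative name.

```python
from collections import defaultdict

s = [52, 33, 70, 39, 57, 59, 7, 2, 46, 69, 11]  # shortened sample of the list above

bins = defaultdict(list)
for x in s:
    lo = x // 10 * 10  # lower edge of the width-10 bin containing x
    bins[(lo, lo + 10)].append(x)

for key in sorted(bins):
    print(key, bins[key])
```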
You can use np.clip() to achieve something similar:
a = [1, 3, 5, 6, 9, 10, 14, 15, 56]
np.clip(a,6,10)
However, it does not drop the out-of-range elements: values less than 6 are replaced by 6 and values greater than 10 are replaced by 10.
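A brief sketch of what np.clip returns for the example array, to make the difference from filtering visible. The name clipped is illustrative.

```python
import numpy as np

a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
clipped = np.clip(a, 6, 10)

print(clipped.tolist())         # [6, 6, 6, 6, 9, 10, 10, 10, 10]
print(len(clipped) == len(a))   # True: nothing is filtered out

# A boolean mask, by contrast, keeps only the in-range values.
print(a[(a >= 6) & (a <= 10)].tolist())  # [6, 9, 10]
```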