Slicing Numpy Array by Array of Indices - python

I am attempting to slice a Numpy array by an array of indices. For example,
array = [10,15,20,25,32,66]
indices = [1,4,5]
The optimal output would be
[[10], [15,20,25,32], [66]]
I have tried using
array[indices]
but this just produces the single values of each individual index rather than all those in between.

Consider using np.split, like so
array = np.asarray([10, 15, 20, 25, 32, 66])
indices = [1, 5]
print(np.split(array, indices))
Produces
[array([10]), array([15, 20, 25, 32]), array([66])]
This works because split takes breakpoints only: each index marks a position at which the array is broken into blocks. Hence there is no need to specify the range 1 to 4; the middle block is implicitly defined by the breakpoints 1 and 5.
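As a small sanity check of the breakpoint idea (a sketch added for illustration, not part of the original answer), each block returned by np.split should match a plain slice between consecutive breakpoints. The names `breakpoints`, `edges` and `manual` are made up for this example.

```python
import numpy as np

array = np.asarray([10, 15, 20, 25, 32, 66])
breakpoints = [1, 5]

blocks = np.split(array, breakpoints)

# Equivalent manual slices: array[0:1], array[1:5], array[5:6]
edges = [0, *breakpoints, len(array)]
manual = [array[start:stop] for start, stop in zip(edges[:-1], edges[1:])]

for block, expected in zip(blocks, manual):
    print(block.tolist(), expected.tolist())
```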

According to your comment this generator produces the desired result:
def slice_multi(array, indices):
    current = 0
    for index in indices:
        # Yield the block between the previous breakpoint and this one.
        yield array[current:index]
        current = index
array = [10,15,20,25,32,66]
indices = [1,4,5]
list(slice_multi(array, indices)) # [[10], [15, 20, 25], [32]]

Related

How to create multiple random arrays from a multi-values dict/ multi-d array in Python?

Suppose I have the following data
dictData = {} or []
dictData[0] = [1,2,3]
dictData[1] = [10,11,12,13]
dictData[2] = [21,22]
With these data, I want to generate unique 1d arrays that contain randomly selected elements from each of the different arrays. The number of arrays to be generated is the number of elements in the largest array. All of the elements in each array must appear at least once. Elements can be repeated once all the elements in an array have been used. The positions of the source arrays are preserved (e.g. a value taken from array 2 is placed at index 2).
A possible outcome is as shown below
possibleOutput = [1,10,21],[1,11,22],[3,12,21],[2,13,21]
I had previously implemented a naive method using a "for" loop, starting with the biggest array and just picking one number from each array until exhausted. Is there a more efficient (maybe numpy) way to achieve the same results?
You can try:
import numpy as np
def nth_product(num: int) -> list:
    '''
    Calculate the n-th element of itertools.product(iterables).
    Inspired by, but slightly adapted for this case from:
    https://stackoverflow.com/a/53626712/5431791
    '''
    res = []
    for lst, len_lst in zip(iterables, lens):
        res.append(lst[num % len_lst])
        num //= len_lst
    return res
iterables = dictData.values()
lens = list(map(len, iterables))
indices = np.random.choice(np.prod(lens), size=4, replace=False)
new_arr = list(map(nth_product, indices))
print(new_arr)
Output:
[[1, 12, 21], [3, 13, 21], [2, 13, 22], [2, 10, 21]]
Should be performant.
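One property worth checking is that distinct indices always give distinct combinations, which is what lets sampling with replace=False return unique arrays. The sketch below is a self-contained variant that takes the lists as an argument (the name `nth_combination` is hypothetical, not from the answer) and compares the result with itertools.product. The ordering differs from itertools.product, since the first list varies fastest here, but that should not matter for random sampling.

```python
from itertools import product
from math import prod

def nth_combination(iterables, num):
    """Map an integer in [0, prod(lens)) to one combination (mixed-radix decoding)."""
    res = []
    for lst in iterables:
        res.append(lst[num % len(lst)])
        num //= len(lst)
    return res

iterables = [[1, 2, 3], [10, 11, 12, 13], [21, 22]]
total = prod(len(lst) for lst in iterables)  # 3 * 4 * 2 = 24

combos = [tuple(nth_combination(iterables, n)) for n in range(total)]

# Every index yields a different combination, and together they cover the full product.
print(len(set(combos)) == total)
print(set(combos) == set(product(*iterables)))
```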
To make sure all values from the longest list appear:
import numpy as np
def nth_product(idx: int, num: int) -> list:
    '''
    Calculate the n-th element of itertools.product(iterables).
    Inspired by, but slightly adapted for this case from:
    https://stackoverflow.com/a/53626712/5431791
    '''
    res = []
    for lst, len_lst in zip(iterables, lens):
        # Walk the longest list in order so that each of its values appears.
        res.append(lst[idx] if len_lst == max_len else lst[num % len_lst])
        num //= len_lst
    return res
iterables = dictData.values()
lens = list(map(len, iterables))
max_len = max(lens)
indices = enumerate(np.random.choice(np.prod(lens), size=max_len, replace=False))
new_arr = list(map(nth_product, *zip(*indices)))
print(new_arr)
Output:
[[2, 10, 21], [3, 11, 22], [1, 12, 22], [3, 13, 22]]

Numpy array: How to extract whole rows based on values in a column

I am looking for the equivalent of an SQL 'where' query over a table. I have done a lot of searching and I'm either using the wrong search terms or not understanding the answers. Probably both.
So a table is a 2 dimensional numpy array.
my_array = np.array([[32, 55, 2],
                     [15, 2, 60],
                     [76, 90, 2],
                     [ 6, 65, 2]])
I wish to 'end up' with a numpy array of the same shape where eg the second column values are >= 55 AND <= 65.
So my desired numpy array would be...
desired_array([[32, 55, 2],
               [ 6, 65, 2]])
Also, does 'desired_array' order match 'my_array' order?
Just make a mask and use it.
mask = np.logical_and(my_array[:, 1] >= 55, my_array[:, 1] <= 65)
desired_array = my_array[mask]
desired_array
The general Numpy approach to filtering an array is to create a "mask" that matches the desired part of the array, and then use it to index in.
>>> my_array[((55 <= my_array) & (my_array <= 65))[:, 1]]
array([[32, 55, 2],
[ 6, 65, 2]])
Breaking it down:
# Comparing an array to a scalar gives you an array of all the results of
# individual element comparisons (this is called "broadcasting").
# So we take two such boolean arrays, resulting from comparing values to the
# two thresholds, and combine them together.
mask = (55 <= my_array) & (my_array <= 65)
# We only want to care about the [1] element in the second array dimension,
# so we take a 1-dimensional slice of that mask.
desired_rows = mask[:, 1]
# Finally we use those values to select the desired rows.
desired_array = my_array[desired_rows]
(The first two steps could instead be swapped, i.e. select the column first and then compare. I imagine that is more efficient, since only one column gets compared, but it wouldn't matter for something this small. This is simply the order that occurred to me first.)
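A minimal sketch of the swapped order mentioned above, where the column is selected first and only then compared (the names `column` and `row_mask` are made up for this example). Boolean indexing keeps the selected rows in their original order, so the result follows the order of my_array.

```python
import numpy as np

my_array = np.array([[32, 55, 2],
                     [15, 2, 60],
                     [76, 90, 2],
                     [6, 65, 2]])

column = my_array[:, 1]                      # 1-D view of the second column
row_mask = (column >= 55) & (column <= 65)   # one boolean per row
desired_array = my_array[row_mask]

print(row_mask)
print(desired_array)
```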
You don't mean the same shape; you probably mean the same number of columns. The shape of my_array is (4, 3) and the shape of your desired array is (2, 3). I would recommend masking, too.
You can use the built-in filter with a lambda that checks each row for the desired condition:
my_array = np.array([[32, 55, 2],
                     [15, 2, 60],
                     [76, 90, 2],
                     [ 6, 65, 2]])
desired_array = np.array(list(filter(lambda x: 55 <= x[1] <= 65, my_array)))
Upon running this, we get:
>>> desired_array
array([[32, 55, 2],
       [ 6, 65, 2]])

Select N evenly spaced out elements in array, including first and last

I have an array of arbitrary length, and I want to select N elements of it, evenly spaced out (approximately, as N may be even, array length may be prime, etc) that includes the very first arr[0] element and the very last arr[len-1] element.
Example:
>>> arr = np.arange(17)
>>> arr
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
Then I want to make a function like the following to grab numElems evenly spaced out within the array, which must include the first and last element:
GetSpacedElements(numElems = 4)
>>> returns 0, 5, 11, 16
Does this make sense?
I've tried arr[0:len:numElems] (i.e. using the array start:stop:skip notation) and some slight variations, but I'm not getting what I'm looking for here:
>>> arr[0:len:numElems]
array([ 0, 4, 8, 12, 16])
or
>>> arr[0:len:numElems+1]
array([ 0, 5, 10, 15])
I don't care exactly what the middle elements are, as long as they're spaced evenly apart, off by an index of 1 let's say. But getting the right number of elements, including the index zero and last index, are critical.
To get a list of evenly spaced indices, use np.linspace:
idx = np.round(np.linspace(0, len(arr) - 1, numElems)).astype(int)
Next, index back into arr to get the corresponding values:
arr[idx]
Always round before casting to integers. Internally, linspace calls astype when the dtype argument is provided, which truncates instead of rounding. Therefore, the method above is NOT equivalent to either of the following:
# these simply truncate the non-integer part
idx = np.linspace(0, len(arr) - 1, numElems).astype(int)
idx = np.linspace(0, len(arr) - 1, numElems, dtype='int')
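To make the difference concrete, here is a small sketch using the 17-element array from the question (the names `positions`, `rounded` and `truncated` are only for this illustration). Only the rounded version reproduces the indices the asker wanted.

```python
import numpy as np

arr = np.arange(17)
numElems = 4

positions = np.linspace(0, len(arr) - 1, numElems)  # 0, 5.33..., 10.66..., 16
rounded = np.round(positions).astype(int)           # nearest integer index
truncated = positions.astype(int)                   # fractional part dropped

print(rounded)
print(truncated)
```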
Your GetSpacedElements() function should also take in the array to avoid unfortunate side effects elsewhere in code. That said, the function would need to look like this:
import numpy as np
def GetSpacedElements(array, numElems=4):
    out = array[np.round(np.linspace(0, len(array) - 1, numElems)).astype(int)]
    return out
arr = np.arange(17)
print(arr)
spacedArray = GetSpacedElements(arr, 4)
print(spacedArray)
If you want to know more about finding indices that match values you seek, also have a look at numpy.argmin and numpy.where. Implementing the former:
import numpy as np
test = np.arange(17)
def nearest_index(array, value):
    return (np.abs(np.asarray(array) - value)).argmin()
def evenly_spaced_indices(array, steps):
    return [nearest_index(array, value) for value in np.linspace(np.min(array), np.max(array), steps)]
print(evenly_spaced_indices(test,4))
Keep in mind that this is an unnecessary number of function calls for the question you originally asked, as swiftly demonstrated by coldspeed. np.round rounds each position to the nearest integer, which then serves as the index; it performs a similar process but in optimised compiled code. If you are interested in the indices too, you could have your function simply return both:
import numpy as np
def GetSpacedElements(array, numElems=4, returnIndices=False):
    # Use the argument `array` here, not the global `arr`.
    indices = np.round(np.linspace(0, len(array) - 1, numElems)).astype(int)
    values = array[indices]
    return (values, indices) if returnIndices else values
arr = np.arange(17) + 42
print(arr)
print(GetSpacedElements(arr, 4)) # values only
print(GetSpacedElements(arr, 4, returnIndices=True)[0]) # values only
print(GetSpacedElements(arr, 4, returnIndices=True)[1]) # indices only
To get N evenly spaced elements from list 'x':
x[::int(np.ceil( len(x) / N ))]
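A quick check of this stride-based approach on the 17-element example from the question (a sketch; the comparison with the linspace approach is added here for illustration only). With these particular inputs the stride does yield N elements, but it stops short of the last one, so it may not satisfy the "include first and last" requirement.

```python
import numpy as np

x = np.arange(17)
N = 4

step = int(np.ceil(len(x) / N))   # ceil(17 / 4) = 5
strided = x[::step]
spaced = x[np.round(np.linspace(0, len(x) - 1, N)).astype(int)]

print(strided)
print(spaced)
```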

The sum of indexes within nested lists. Adding elements from separate lists

Just doing an exercise question as follows: write a function that takes a 'number square' as a parameter and returns a list of the column sums.
e.g
square = [
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]
]
Result should be =
[28, 32, 36, 40]
What I have so far:
def column_sums(square):
    col_list = []
    for i in range(len(square)):
        col_sum = 0
        for j in range(len(square[i])):
            col_sum += square[j]
        col_list.append(col_sum)
    return col_list
So I think I have all the parameters laid out, with all the indexing correct, but I am stuck because I get an error
builtins.TypeError: unsupported operand type(s) for +=: 'int' and 'list'
Which shouldn't happen if I am referencing the element within the list I thought.
Also it's probably easier to use the SUM command here, but couldn't figure out a way to use it cleanly.
Adding elements from separate lists via indexing.
A simpler solution would be to transpose the list of lists with zip and take the sum across each new sublist:
def column_sums(square):
    return [sum(i) for i in zip(*square)]
zip(*square) unpacks the list of lists and returns the items of each column in succession (zipped as tuples).
>>> column_sums(square)
[28, 32, 36, 40]
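To see what the transposition step produces, the sketch below materialises the zipped columns before summing them (the names `columns` and `column_totals` are only for this illustration):

```python
square = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# zip(*square) is zip(square[0], square[1], ...), so each tuple is one column.
columns = list(zip(*square))
print(columns[0])

column_totals = [sum(col) for col in columns]
print(column_totals)
```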
You could also use numpy.sum to do this by setting the axis parameter to 0, meaning sum along rows (i.e. sum on each column):
>>> import numpy as np
>>> square = np.array(square)
>>> square.sum(axis=0)
array([28, 32, 36, 40])
It should be col_sum += square[j][i] since you want to access the element at position i of the list at position j.
You should change the line of summing to:
col_sum += square[j][i]
because square[j] is the j'th row (list), but you need the current column, which is the i'th element in that row.
But here is my solution using sum and list comprehensions:
def column_sums2(square):
    def col(i):
        return [row[i] for row in square]
    return [sum(col(i)) for i in range(len(square))]
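For completeness, the loop from the question with the indexing fix applied might look like the sketch below. Iterating over len(square[0]) for the columns and len(square) for the rows is an added assumption, so that non-square inputs also work.

```python
def column_sums(square):
    col_list = []
    for i in range(len(square[0])):      # i walks the columns
        col_sum = 0
        for j in range(len(square)):     # j walks the rows
            col_sum += square[j][i]      # row j, column i
        col_list.append(col_sum)
    return col_list

square = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(column_sums(square))
```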

python: extracting one slice of a multidimensional array given the dimension index

I know how to take x[:,:,:,:,j,:] (which takes the jth slice of dimension 4).
Is there a way to do the same thing if the dimension is known at runtime, and is not a known constant?
One option is to construct the slicing programmatically:
slicing = (slice(None),) * 4 + (j,) + (slice(None),)
An alternative is to use numpy.take() or ndarray.take():
>>> a = numpy.array([[1, 2], [3, 4]])
>>> a.take((1,), axis=0)
array([[3, 4]])
>>> a.take((1,), axis=1)
array([[2],
[4]])
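Both options can be wrapped so that the axis is an ordinary runtime argument. The following is a minimal sketch; the helper name `slice_along` is hypothetical. Note that passing a scalar index to take drops the axis, whereas the (1,) tuple used above keeps it with length 1.

```python
import numpy as np

def slice_along(x, axis, j):
    """Return x with index j taken along `axis` (hypothetical helper name)."""
    # Full slices for the leading axes; trailing axes are implicitly full.
    index = (slice(None),) * axis + (j,)
    return x[index]

x = np.arange(2 * 3 * 4).reshape(2, 3, 4)

by_index = slice_along(x, axis=1, j=2)
by_take = np.take(x, 2, axis=1)

print(by_index.shape)
print(np.array_equal(by_index, by_take))
```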
You can use the slice function and call it with the appropriate variable list during runtime as follows:
# Store the variables that represent the slice in a list/tuple
# Make a slice with the unzipped tuple using the slice() command
# Use the slice on your array
Example:
>>> from numpy import *
>>> a = (1, 2, 3)
>>> b = arange(27).reshape(3, 3, 3)
>>> s = slice(*a)
>>> b[s]
array([[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]])
This is the same as:
>>> b[1:2:3]
array([[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]])
Finally, the equivalent of not specifying anything between 2 : in the usual notation is to put None in those places in the tuple you create.
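As a short illustration of the None convention (a sketch reusing the same 3x3x3 array; the variable names are made up), a slice object with None in a position behaves like leaving that position empty in the colon notation:

```python
import numpy as np

b = np.arange(27).reshape(3, 3, 3)

every_other = slice(None, None, 2)   # same as b[::2]
first_two = slice(None, 2)           # same as b[:2]

print(np.array_equal(b[every_other], b[::2]))
print(np.array_equal(b[first_two], b[:2]))
```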
If everything is decided at runtime, you could do:
import numpy as np
# Define the data (this could be measured at runtime)
data_shape = (3, 5, 7, 11, 13)
print('data_shape = {}'.format(data_shape))
# Pick which index to slice from which dimension (could also be decided at runtime)
# Integer division keeps these usable as indices in Python 3.
slice_dim = len(data_shape) // 2
slice_index = data_shape[slice_dim] // 2
print('slice_dim = {} (data_shape[{}] = {}), slice_index = {}'.format(slice_dim, slice_dim, data_shape[slice_dim], slice_index))
# Make a data set for testing
data = np.arange(np.prod(data_shape)).reshape(*data_shape)
# Slice the data (the index must be a tuple, not a list)
s = tuple(slice_index if a == slice_dim else slice(None) for a in range(len(data_shape)))
d = data[s]
print('shape(data[s]) = {}, s = {}'.format(np.shape(d), s))
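Although this is longer than ndarray.take(), it will also work if slice_index = None, as in the case where the array happens to have so few dimensions that you don't actually want to slice it (but you don't know that ahead of time). Note that NumPy treats None in an index as np.newaxis, so in that case the result keeps all the data and gains a length-1 axis at that position.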
You can also use ellipsis to replace the repeating colons.
See an answer to How do you use the ellipsis slicing syntax in Python? for an example.
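A minimal sketch of the ellipsis form for the six-dimensional case from the question (the array contents and shape are arbitrary test data). This is most convenient when the axis is counted from the end; for an arbitrary axis chosen at runtime, the tuple-building or take approaches above are probably still needed.

```python
import numpy as np

x = np.arange(2 * 2 * 2 * 2 * 3 * 2).reshape(2, 2, 2, 2, 3, 2)
j = 1

explicit = x[:, :, :, :, j, :]
with_ellipsis = x[..., j, :]   # Ellipsis expands to as many ':' as needed

print(explicit.shape)
print(np.array_equal(explicit, with_ellipsis))
```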
