In normal situations a list with integers can be used as indices for an array. Let's say
arr = np.arange(10)*2
l = [1,2,5]
arr[l] # this gives np.array([2,4,10])
Instead of one list of indices, I have several, with different lengths, and I want to get arr[l] for each sublist in my list of indices. How can I achieve this without a sequential approach (a for loop), or better, in less time than a for loop takes, using numpy?
For example:
lists = [[1,2,5], [5,6], [2,8,4]]
arr = np.arange(10)*2
result = np.array([[2,4,10], [10, 12], [4,16,8]]) #this is after the procedure I want to get
Whether this pays off depends on the size of your lists. One option is to concatenate them all, index the array once, and then redistribute the result into lists.
lists = [[1,2,5], [5,6], [2,8,4]]
arr = np.arange(10)*2
extracted = arr[np.concatenate(lists)]
indices = [0] + list(np.cumsum([len(sub) for sub in lists]))  # map() is lazy in Python 3, so build a list first
result = [extracted[indices[i]:indices[i + 1]] for i in range(len(lists))]
Or, taking into account @unutbu's comment:
result = np.split(extracted, indices[1:-1])
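Put together, the concatenate-then-split idea can be sketched as a self-contained script. The name `cuts` is only illustrative; it holds the cumulative sublist lengths without the final total, which is the form np.split expects.

```python
import numpy as np

lists = [[1, 2, 5], [5, 6], [2, 8, 4]]
arr = np.arange(10) * 2

# One fancy-indexing call over all indices at once.
extracted = arr[np.concatenate(lists)]

# Cut points: cumulative lengths, dropping the last one (the total length).
cuts = np.cumsum([len(sub) for sub in lists])[:-1]

# np.split returns a list of arrays, one per original sublist.
result = np.split(extracted, cuts)
print(result)
```

Since the sublists have different lengths, the result stays a list of 1D arrays rather than a single rectangular array.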
I'm trying to split a sorted integer list into two lists. The first list would have all ints under n and the second all ints over n. Note that n does not have to be in the original list.
I can easily do this with:
under = []
over = []
for x in sorted_list:
    if x < n:
        under.append(x)
    else:
        over.append(x)
But it just seems like it should be possible to do this in a more elegant way knowing that the list is sorted. takewhile and dropwhile from itertools sound like the solution but then I would be iterating over the list twice.
Functionally, the best I can do is this:
i = 0
while i < len(sorted_list) and sorted_list[i] < n:  # bounds check avoids IndexError when every element is below n
    i += 1
under = sorted_list[:i]
over = sorted_list[i:]
But I'm not even sure if it is actually better than just iterating over the list twice and it is definitely not more elegant.
I guess I'm looking for a way to get the list returned by takewhile and the remaining list, perhaps, in a pair.
The correct solution here is the bisect module. Use bisect.bisect to find the index to the right of n (or the index where it would be inserted if it's missing), then slice around that point:
import bisect # At top of file
split_idx = bisect.bisect(sorted_list, n)
under = sorted_list[:split_idx]
over = sorted_list[split_idx:]
While any solution is going to be O(n) (you do have to copy the elements after all), the comparisons are typically more expensive than simple pointer copies (and associated reference count updates), and bisect reduces the comparison work on a sorted list to O(log n), so this will typically (on larger inputs) beat simply iterating and copying element by element until you find the split point.
Use bisect.bisect_left (which finds the leftmost index of n) instead of bisect.bisect (equivalent to bisect.bisect_right) if you want n to end up in over instead of under.
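As a rough sketch of the difference between the two functions, the helper below returns the pair (under, over) that the question asks for. The function name `split_sorted` and the flag `n_goes_over` are made up for this illustration and are not part of the bisect module.

```python
import bisect

def split_sorted(sorted_list, n, n_goes_over=True):
    """Return (under, over) for an already sorted list.

    With n_goes_over=True, elements equal to n land in `over`
    (bisect_left); otherwise they land in `under` (bisect_right).
    """
    find = bisect.bisect_left if n_goes_over else bisect.bisect_right
    idx = find(sorted_list, n)
    return sorted_list[:idx], sorted_list[idx:]

data = [1, 2, 4, 5, 6, 6, 7, 8]
print(split_sorted(data, 6))         # ([1, 2, 4, 5], [6, 6, 7, 8])
print(split_sorted(data, 6, False))  # ([1, 2, 4, 5, 6, 6], [7, 8])
print(split_sorted(data, 3))         # ([1, 2], [4, 5, 6, 6, 7, 8])
```

When n is not in the list (the last call), both functions return the same insertion point, so the flag makes no difference.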
I would use the following approach, where I find the index and use slicing to create under and over:
sorted_list = [1,2,4,5,6,7,8]
n=6
idx = sorted_list.index(n)
under = sorted_list[:idx]
over = sorted_list[idx:]
print(under)
print(over)
Output (same as with your code):
[1, 2, 4, 5]
[6, 7, 8]
Edit: I misunderstood the question, so here is an adapted solution that finds the insertion index even when n is not in the list:
import numpy as np
sorted_list = [1,2,4,5,6,7,8]
n=3
idx = np.searchsorted(sorted_list, n)
under = sorted_list[:idx]
over = sorted_list[idx:]
print(under)
print(over)
Output:
[1, 2]
[4, 5, 6, 7, 8]
Example of what I want to do:
import numpy as np
values = np.array([7, 7, 5, 2, 3, 9])
indices = np.array([
    np.array([3,5]),
    np.array([4]),
    np.array([1,2,3])
], dtype=object)  # ragged sub-arrays need dtype=object in recent NumPy versions
>>> values[indices]
array([
    array([2,9]),
    array([3]),
    array([7,5,2]),
])
Is it possible to achieve this using vectorization?
Right now I'm doing it with a for loop, but it can get slow.
Thanks!
We could concatenate the indices, index into values with those and finally split back -
idx = np.concatenate(indices)
all_out = values[idx]
lens = list(map(len,indices))
ssidx = np.r_[0,lens].cumsum()
out = [all_out[i:j] for (i,j) in zip(ssidx[:-1],ssidx[1:])]
For completeness, here's the straightforward indexing-based version -
[values[i] for i in indices]
So, the proposed method indexes into values only once and then relies on slicing, which reduces the per-iteration workload. Because it also has to concatenate all the index arrays to build idx, it makes sense mainly when indices holds many small indexing arrays.
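To check that both versions agree before timing them on real data, a minimal sketch could look like the following. The helper name `take_ragged` is hypothetical, and the index arrays are kept in a plain Python list here to avoid ragged-array construction issues.

```python
import numpy as np

def take_ragged(values, indices):
    """Concatenate the index arrays, index once, then split back.

    Hypothetical helper; assumes `indices` is a non-empty sequence
    of 1D integer arrays.
    """
    lens = [len(i) for i in indices]
    flat = values[np.concatenate(indices)]
    # Split at the cumulative lengths, excluding the final total.
    return np.split(flat, np.cumsum(lens)[:-1])

values = np.array([7, 7, 5, 2, 3, 9])
indices = [np.array([3, 5]), np.array([4]), np.array([1, 2, 3])]

vectorized = take_ragged(values, indices)
looped = [values[i] for i in indices]

# Both approaches should produce the same pieces.
print(all(np.array_equal(v, l) for v, l in zip(vectorized, looped)))  # True
```

Which version is faster depends on the number and size of the index arrays, so it is worth measuring both with timeit on representative inputs.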
I want to create a 2D array containing tuples or lists which requires a particular order.
Using itertools.product I am capable of creating the required permutations:
import itertools
import numpy as np
elements = 2
n = 3
temp = []
for tuples in itertools.product(np.arange(elements,-1,-1), repeat=n):
    if sum(tuples) == elements:
        temp.append(tuples)
print(temp)
This will print:
Out[1277]:
array([[2,0,0],
       [1,1,0],
       [1,0,1],
       [0,2,0],
       [0,1,1],
       [0,0,2]])
The array should then be created to yield:
array = [[(2,0,0),(1,1,0),(0,2,0)],
         [(1,0,1),(0,1,1),(0,0,0)],
         [(0,0,2),(0,0,0),(0,0,0)]]
and is subsequently used to compute a dot product:
array2 = [1,5,10]
np.dot(array, array2)
Out[1278]:
array([2,6,10,11,15,0,20,0,0])
However, itertools does not yield the order I am looking for.
Therefore, I am using argsort and basically 1D arrays in the end:
array = np.array(temp)  # the filtered tuples from above as a 2D array; an itertools.product iterator cannot be indexed
sortedArray = array[array[:,1].argsort()]
print(sortedArray)
Out[1279]:
array([[2,0,0],
       [1,0,1],
       [0,0,2],
       [1,1,0],
       [0,1,1],
       [0,2,0]])
result = np.dot(sortedArray, array2)
This works fine in combination with np.pad to restore the original size (3x3 = 9):
np.pad(result, (0, array.size - result.size), "constant")
Out[1280]:
array([2,6,10,11,15,20,0,0,0])
However, the order is not retained.
The reason for doing this, is a second reference array that uses the same structure as the array above, which can be raveled:
reference = [[foo,bar,baz],
             [bar,bar,0],
             [foo, 0, 0]]
np.ravel(reference)
Out[1281]:
array([foo,bar,baz,bar,bar,0,foo,0,0])
I am looking for a solution that does not need the work-around.
My array looks like this:
a = ([1,2],[2,3],[4,5],[3,8])
I did the following to delete the odd indices:
a = [v for i, v in enumerate(a) if i % 2 == 0]
but it now gives me two separate arrays instead of one two-dimensional array:
a= [array([1, 2]), array([4, 5])]
How can I keep the same format as the beginning? thank you!
That is as simple as
a[::2]
which yields the lines with even index.
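A small sketch of the slice in action, assuming a is a 2D numpy array (the question shows it as a tuple of lists, so the np.array call here is an assumption):

```python
import numpy as np

# Assumed: the data from the question stored as a 2D numpy array.
a = np.array([[1, 2], [2, 3], [4, 5], [3, 8]])

# Step of 2 along the first axis keeps rows 0, 2, ...
kept = a[::2]
print(kept)
print(kept.shape)  # (2, 2)
```

Basic slicing returns a view of the original array, so call kept.copy() if an independent array is needed.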
Use numpy array indexing, not comprehensions:
c = a[list(range(0,len(a),2)),:]
If you define c as the output of a list comprehension, you get a list of one-dimensional numpy arrays. Using proper indexing instead keeps the result a numpy array.
Note that instead of "deleting" the odd indices, what we do is specify what to keep: take all rows with an even index (the list(range(0,len(a),2)) part) and for each row take all elements (the : part).
Say that I have 4 numpy arrays
[1,2,3]
[2,3,1]
[3,2,1]
[1,3,2]
In this case, I've determined [1,2,3] is the "minimum array" for my purposes, as it is one of two arrays with the lowest value at index 0, and of those two arrays it has the lowest value at index 1. If there were more arrays with similar values, I would need to compare the next index values, and so on.
How can I extract the array [1,2,3] in that same order from the pile?
How can I extend that to x arrays of size n?
Thanks
Using the python non-numpy .sort() or sorted() on a list of lists (not numpy arrays) automatically does this e.g.
a = [[1,2,3],[2,3,1],[3,2,1],[1,3,2]]
a.sort()
gives
[[1,2,3],[1,3,2],[2,3,1],[3,2,1]]
The numpy sort seems to only sort the subarrays recursively so it seems the best way would be to convert it to a python list first. Assuming you have an array of arrays you want to pick the minimum of you could get the minimum as
sorted(a.tolist())[0]
As someone pointed out you could also do min(a.tolist()) which uses the same type of comparisons as sort, and would be faster for large arrays (linear vs n log n asymptotic run time).
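A minimal sketch of the min variant, assuming the arrays are stacked as the rows of one 2D numpy array (the name `smallest` is illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 1], [3, 2, 1], [1, 3, 2]])

# Python compares lists lexicographically: index 0 first, then index 1, ...
smallest = min(a.tolist())
print(smallest)  # [1, 2, 3]

# Convert back if a numpy array is needed downstream.
smallest_arr = np.array(smallest)
```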
Here's an idea using numpy:
import numpy
a = numpy.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
col = 0
while a.shape[0] > 1:
    b = numpy.argmin(a[:,col:], axis=1)
    a = a[b == numpy.min(b)]
    col += 1
print(a)
This checks column by column until only one row is left.
numpy's lexsort is close to what you want. It sorts on the last key first, but that's easy to get around:
>>> a = np.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
>>> order = np.lexsort(a[:, ::-1].T)
>>> order
array([0, 3, 1, 2])
>>> a[order]
array([[1, 2, 3],
       [1, 3, 2],
       [2, 3, 1],
       [3, 2, 1]])
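To pull out only the minimum row rather than the full ordering, the first entry of the lexsort result is enough. The sketch below wraps that in a function whose name, `lexicographic_min`, is made up for this example; it should work for any number of rows and any row length, as long as a is a 2D array.

```python
import numpy as np

def lexicographic_min(a):
    """Return the row of 2D array `a` that is smallest when rows are
    compared column by column, left to right (illustrative helper)."""
    # lexsort treats the LAST key as the primary one, so reverse the
    # columns before transposing to make column 0 the primary key.
    order = np.lexsort(a[:, ::-1].T)
    return a[order[0]]

a = np.array([[1, 2, 3], [2, 3, 1], [3, 2, 1], [1, 3, 2]])
print(lexicographic_min(a))  # [1 2 3]
```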