How would you reshuffle this array efficiently? - python

I have an array arr_val which stores the values of a certain function at a large number of locations (for illustration let's take a small case with 4 locations). I also have another array loc_array which stores the locations of the function, again 4 of them. However, loc_array is a multidimensional array such that each location index has the same 4 sub-location indices, and each sub-location index is a pair of coordinates. To illustrate:
arr_val = np.array([1, 2, 3, 4])
loc_array = np.array([[[1,1],[2,3],[3,1],[3,2]],[[1,2],[2,4],[3,4],[4,1]],
[[2,1],[1,4],[1,3],[3,3]],[[4,2],[4,3],[2,2],[4,4]]])
The meaning of the two arrays above is that the value of some parameter of interest at, for example, locations [1,1],[2,3],[3,1],[3,2] is 1, and so on. However, I would like to re-express the same thing in a different form: instead of having scattered points, I would like the coordinates in the following tractable form
coord = [[[1,1],[1,2],[1,3],[1,4]],[[2,1],[2,2],[2,3],[2,4]],[[3,1],[3,2],
[3,3],[3,4]],[[4,1],[4,2],[4,3],[4,4]]]
and the values at respective coordinates given as
val = [[1, 2, 3, 3],[3, 4, 1, 2],[1, 1, 3, 2], [2, 4, 4, 4]]
What would be a very efficient way to achieve the above for large numpy arrays?

You can use lexsort to order the flattened locations lexicographically (by row, then by column), like so:
>>> order = np.lexsort(loc_array.reshape(-1, 2).T[::-1])
>>> arr_val.repeat(4)[order].reshape(4, 4)
array([[1, 2, 3, 3],
[3, 4, 1, 2],
[1, 1, 3, 2],
[2, 4, 4, 4]])
If you know for sure that loc_array is a permutation of all possible locations then you can avoid the sort:
>>> out = np.empty((4, 4), arr_val.dtype)
>>> out.ravel()[np.ravel_multi_index((loc_array-1).reshape(-1, 2).T, (4, 4))] = arr_val.repeat(4)
>>> out
array([[1, 2, 3, 3],
[3, 4, 1, 2],
[1, 1, 3, 2],
[2, 4, 4, 4]])
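For reference, here is the same index-assignment idea wrapped into a small function (my own sketch, with the made-up name regrid; it assumes an n x n grid of 1-based coordinates, with n sub-locations per value as in the example):
import numpy as np

def regrid(arr_val, loc_array):
    # loc_array is assumed to have shape (n, n, 2) holding 1-based (row, col) pairs
    n = loc_array.shape[0]
    out = np.empty((n, n), dtype=arr_val.dtype)
    # flat index of every (row, col) pair in the n x n grid
    flat_idx = np.ravel_multi_index((loc_array - 1).reshape(-1, 2).T, (n, n))
    # each value in arr_val covers n sub-locations
    out.ravel()[flat_idx] = arr_val.repeat(n)
    return out

regrid(arr_val, loc_array)
# array([[1, 2, 3, 3],
#        [3, 4, 1, 2],
#        [1, 1, 3, 2],
#        [2, 4, 4, 4]])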

This may not be the answer you want, but it works anyway.
val = [[1, 2, 3, 3], [3, 4, 1, 2], [1, 1, 3, 2], [2, 4, 4, 4]]
temp = ""
int_list = []
for element in val:
    temp_int = temp.join(map(str, element))
    int_list.append(int(temp_int))
int_list.sort()
print(int_list)
## result ##
[1132, 1233, 2444, 3412]
Change each element array into an int and construct int_list
Sort int_list
Construct a 2D np.array from int_list
I skipped the last part; you may find the way on the web.
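For completeness, a possible way to do that last step (my addition; it assumes every value is a single digit, as in the example):
import numpy as np

# split each sorted integer back into its digits and stack them into a 2D array
sorted_rows = [[int(d) for d in str(number)] for number in int_list]
val_sorted = np.array(sorted_rows)
# array([[1, 1, 3, 2],
#        [1, 2, 3, 3],
#        [2, 4, 4, 4],
#        [3, 4, 1, 2]])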


NumPy - Expand and Repeat

Is there a way to "expand" an array and repeat the last element to fill the expansion?
Another post talks about expansion and padding with 0 but I wish to repeat the last value as the pad.
Say I have an array:
[[1, 2],
[3, 4],
[0, 0]]
And I need to insert [5, 6, 6] to replace the [0, 0]; obviously NumPy wouldn't allow this. But can I reshape/expand to:
[[1, 2, 2],
[3, 4, 4],
[5, 6, 6]]
I'm reading through a file where the number of values may vary in length, but I need the array to be of the same shape. One way to do this is to read through the file first and find the maximum length, then read it again and populate, but the file is 10GB+ so I would prefer to do it in a single pass by "expanding" and backfilling with repeats.
Looks like what you require is numpy.pad using the edge mode. From the doc:
‘edge’
Pads with the edge values of array.
Example code:
>>> ar = np.array([[1,2], [4,5]])
>>> ar
array([[1, 2],
[4, 5]])
>>> np.pad(ar, [(0, 0), (0, 4)], mode="edge")
array([[1, 2, 2, 2, 2, 2],
[4, 5, 5, 5, 5, 5]])
The first (0, 0) tuple specifies no padding on the first axis, while the second basically says "add 0 padding on the left and 4 on the right".
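As a rough illustration of applying this to the rows from the question (my own sketch; pad_row is a made-up helper name), each variable-length row can be padded to a common width by repeating its last value:
import numpy as np

def pad_row(row, width):
    # repeat the last value of the row until it reaches the target width
    row = np.asarray(row)
    return np.pad(row, (0, width - row.shape[0]), mode="edge")

rows = [[1, 2], [3, 4], [5, 6, 6]]
width = max(len(r) for r in rows)
np.vstack([pad_row(r, width) for r in rows])
# array([[1, 2, 2],
#        [3, 4, 4],
#        [5, 6, 6]])
Note this still needs the maximum width up front, so a true single pass would have to grow or re-pad the array as longer rows arrive.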

How to roll two arrays of different dimensions into one one-dimensional array in Python

I have two arrays (a, b) of different m×n dimensions.
I need to know how I can roll these two arrays into a single one-dimensional array.
I used flatten() on both a and b and then combined them into a single array, but what I get is an array containing two one-dimensional arrays (a, b):
a = np.array([[1,2,3,4],[3,4,5,6],[4,5,6,7]]) #3x4 array
b = np.array([ [1,2],[2,3],[3,4],[4,5],[5,6]]) #5x2 array
result = [a.flatten(),b.flatten()]
print(result)
[array([1, 2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7]), array([1, 2, 2, 3, ... 5, 6])]
In MATLAB, I would do it like this:
res = [a(:);b(:)]
Also, how can I retrieve a and b back from the result?
Use ravel + concatenate:
>>> np.concatenate((a.ravel(), b.ravel()))
array([1, 2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6])
ravel returns a 1D view of the arrays, and is a cheap operation. concatenate joins the views together, returning a new array.
As an aside, if you want to be able to retrieve these arrays back, you'll need to store their shapes in some variable.
i = a.shape
j = b.shape
res = np.concatenate((a.ravel(), b.ravel()))
Later, to retrieve a and b from res,
a = res[:np.prod(i)].reshape(i)
b = res[np.prod(i):].reshape(j)
a
array([[1, 2, 3, 4],
[3, 4, 5, 6],
[4, 5, 6, 7]])
b
array([[1, 2],
[2, 3],
[3, 4],
[4, 5],
[5, 6]])
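If you do this packing and unpacking often, the same idea can be wrapped into a pair of helpers for any number of arrays (my own sketch; pack and unpack are made-up names):
import numpy as np

def pack(*arrays):
    # flatten all arrays into one 1D array and remember their shapes
    shapes = [a.shape for a in arrays]
    flat = np.concatenate([a.ravel() for a in arrays])
    return flat, shapes

def unpack(flat, shapes):
    # cut the 1D array back into pieces and restore the original shapes
    out, start = [], 0
    for shp in shapes:
        size = int(np.prod(shp))
        out.append(flat[start:start + size].reshape(shp))
        start += size
    return out

res, shapes = pack(a, b)
a2, b2 = unpack(res, shapes)  # a2 equals a and b2 equals b element-wise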
How about changing the middle line to:
result = np.hstack([a.flatten(), b.flatten()])
Or, if a plain Python list is acceptable as the result:
result = a.flatten().tolist() + b.flatten().tolist()

Python: How to create a list of integers depending on a specific distribution

Is there a way in Python/NumPy/SciPy to dynamically create a list of integers in a specific range, which can vary, and in which the numbers are ordered depending on a distribution, like normal (Gaussian), exponential, or linear? I imagine something like, for range 3:
[1,2,3]
[2,1,2]
[1,2,1]
[3,2,1]
for range 4:
[1,2,3,4]
[2,1,1,2]
[1,2,2,1]
[4,3,2,1]
for range 5:
[1,2,3,4,5]
[2,1,0,1,2]
[1,2,3,2,1]
[5,4,3,2,1]
We could use a bit of trickery with np.minimum to generate the symmetrical third row. The second row is just that third row subtracted from 3. The first and last rows are just the range from 1 to n and its flipped version, respectively.
Thus, one approach is to row-stack those rows into a 2D array, like so -
def ranged_arr(n):
    r = np.arange(n)+1
    row3 = np.minimum(r,r[::-1])
    return np.c_[r, 3-row3, row3, r[::-1]].T
We could also use np.row_stack to do the stacking -
np.row_stack((r, 3-row3, row3, r[::-1]))
Sample runs -
In [106]: ranged_arr(n=3)
Out[106]:
array([[1, 2, 3],
[2, 1, 2],
[1, 2, 1],
[3, 2, 1]])
In [107]: ranged_arr(n=4)
Out[107]:
array([[1, 2, 3, 4],
[2, 1, 1, 2],
[1, 2, 2, 1],
[4, 3, 2, 1]])
In [108]: ranged_arr(n=5)
Out[108]:
array([[1, 2, 3, 4, 5],
[2, 1, 0, 1, 2],
[1, 2, 3, 2, 1],
[5, 4, 3, 2, 1]])
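If you actually want the shape to come from a real distribution, as the question hints, here is a rough sketch of one interpretation (my addition, not part of the answer above; shaped_ints is a made-up name): evaluate a pdf on a grid and rescale it to small integers.
import numpy as np
from scipy.stats import norm

def shaped_ints(n, dist=norm(loc=0, scale=1), low=1, high=None):
    # evaluate the pdf symmetrically around its centre and map it onto [low, high]
    high = n if high is None else high
    x = np.linspace(-2, 2, n)
    y = dist.pdf(x)
    y = (y - y.min()) / (y.max() - y.min())
    return np.round(low + y * (high - low)).astype(int)

shaped_ints(5)  # array([1, 3, 5, 3, 1]) for the default normal shape
Swapping in another frozen scipy.stats distribution changes the shape of the resulting integer list.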

Split a list into increasing sequences using itertools

I have a list with mixed sequences like
[1,2,3,4,5,2,3,4,1,2]
I want to know how I can use itertools to split the list into increasing sequences, cutting the list at the points where it decreases. For instance, the above would output
[[1, 2, 3, 4, 5], [2, 3, 4], [1, 2]]
This has been obtained by noting that the sequence decreases at the 2, so we cut the first piece there; another decrease is at the 1, so we cut again there.
Another example is with the sequence
[3,2,1]
the output should be
[[3], [2], [1]]
In the event that the given sequence is increasing we return the same sequence. For example
[1,2,3]
returns the same result, i.e.
[[1, 2, 3]]
For a repeating list like
[ 1, 2,2,2, 1, 2, 3, 3, 1,1,1, 2, 3, 4, 1, 2, 3, 4, 5, 6]
the output should be
[[1, 2, 2, 2], [1, 2, 3, 3], [1, 1, 1, 2, 3, 4], [1, 2, 3, 4, 5, 6]]
What I did to achieve this is define the following function
def splitter(L):
    result = []
    tmp = 0
    initialPoint = 0
    for i in range(len(L)):
        if L[i] < tmp:
            tmpp = L[initialPoint:i]
            result.append(tmpp)
            initialPoint = i
        tmp = L[i]
    result.append(L[initialPoint:])
    return result
The function works 100%, but what I need is to do the same with itertools so that I can improve the efficiency of my code. Is there a way to do this with the itertools package and avoid the explicit looping?
With numpy, you can use numpy.split; it requires the indices of the split positions. Since you want to split where the value decreases, you can use numpy.diff to calculate the differences, check where the difference is smaller than zero, and use numpy.where to retrieve the corresponding indices. An example with the last case in the question:
import numpy as np
lst = [ 1, 2,2,2, 1, 2, 3, 3, 1,1,1, 2, 3, 4, 1, 2, 3, 4, 5, 6]
np.split(lst, np.where(np.diff(lst) < 0)[0] + 1)
# [array([1, 2, 2, 2]),
# array([1, 2, 3, 3]),
# array([1, 1, 1, 2, 3, 4]),
# array([1, 2, 3, 4, 5, 6])]
Psidom already has you covered with a good answer, but another NumPy solution would be to use scipy.signal.argrelmax to acquire the local maxima, then np.split.
from scipy.signal import argrelmax
arr = np.random.randint(1000, size=10**6)
splits = np.split(arr, argrelmax(arr)[0]+1)
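A quick check on the question's first example (my addition). Note that argrelmax requires a strict local maximum, so this variant behaves differently from the diff-based split when there are plateaus of repeated values:
import numpy as np
from scipy.signal import argrelmax

a = np.array([1, 2, 3, 4, 5, 2, 3, 4, 1, 2])
np.split(a, argrelmax(a)[0] + 1)
# [array([1, 2, 3, 4, 5]), array([2, 3, 4]), array([1, 2])]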
Assume your original input array:
a = [1, 2, 3, 4, 5, 2, 3, 4, 1, 2]
First find the places where the splits shall occur:
p = [ i+1 for i, (x, y) in enumerate(zip(a, a[1:])) if x > y ]
Then create slices for each such split:
print [ a[m:n] for m, n in zip([ 0 ] + p, p + [ None ]) ]
This will print this:
[[1, 2, 3, 4, 5], [2, 3, 4], [1, 2]]
I suggest using more descriptive names than p, n, m, etc. ;-)
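Since the question explicitly asks for itertools, here is a minimal sketch (my addition; split_increasing is a made-up name) that pairs itertools.accumulate with itertools.groupby:
from itertools import accumulate, groupby

def split_increasing(seq):
    seq = list(seq)
    # 1 wherever the value drops relative to its predecessor, else 0
    breaks = [0] + [int(b < a) for a, b in zip(seq, seq[1:])]
    # the running sum of the breaks gives each element a group id
    ids = accumulate(breaks)
    return [[v for _, v in grp] for _, grp in groupby(zip(ids, seq), key=lambda t: t[0])]

split_increasing([1, 2, 3, 4, 5, 2, 3, 4, 1, 2])
# [[1, 2, 3, 4, 5], [2, 3, 4], [1, 2]]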

Is there any function in python which can perform the inverse of numpy.repeat function?

For example
x = np.repeat(np.array([[1,2],[3,4]]), 2, axis=1)
gives you
x = array([[1, 1, 2, 2],
[3, 3, 4, 4]])
but is there something which can perform
x = np.*inverse_repeat*(np.array([[1, 1, 2, 2],[3, 3, 4, 4]]), axis=1)
and gives you
x = array([[1,2],[3,4]])
Regular slicing should work. For the axis you want to inverse repeat, use ::number_of_repetitions
x = np.repeat(np.array([[1,2],[3,4]]), 4, axis=0)
x[::4, :] # axis=0
Out:
array([[1, 2],
[3, 4]])
x = np.repeat(np.array([[1,2],[3,4]]), 3, axis=1)
x[:,::3] # axis=1
Out:
array([[1, 2],
[3, 4]])
x = np.repeat(np.array([[[1],[2]],[[3],[4]]]), 5, axis=2)
x[:,:,::5] # axis=2
Out:
array([[[1],
[2]],
[[3],
[4]]])
This should work, and has the exact same signature as np.repeat:
def inverse_repeat(a, repeats, axis):
    if isinstance(repeats, int):
        indices = np.arange(a.shape[axis] // repeats, dtype=int) * repeats
    else:  # assume array_like of int
        indices = np.cumsum(repeats) - 1
    return a.take(indices, axis)
Edit: added support for per-item repeats as well, analogous to np.repeat
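A quick sanity check of that function (my addition), using the array from the question:
a = np.array([[1, 2], [3, 4]])

# scalar repeats along axis 0
inverse_repeat(np.repeat(a, 2, axis=0), 2, axis=0)
# array([[1, 2],
#        [3, 4]])

# per-item repeats along axis 1
inverse_repeat(np.repeat(a, [2, 3], axis=1), [2, 3], axis=1)
# array([[1, 2],
#        [3, 4]])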
For the case where we know the axis and the repeat - and the repeat is a scalar (same value for all elements) we can construct a slicing index like this:
In [1117]: a=np.array([[1, 1, 2, 2],[3, 3, 4, 4]])
In [1118]: axis=1; repeats=2
In [1119]: ind=[slice(None)]*a.ndim
In [1120]: ind[axis]=slice(None,None,a.shape[axis]//repeats)
In [1121]: ind
Out[1121]: [slice(None, None, None), slice(None, None, 2)]
In [1122]: a[ind]
Out[1122]:
array([[1, 2],
[3, 4]])
@Eelco's use of take makes it easier to focus on one axis, but requires a list of indices, not a slice.
But repeat does allow for differing repeat counts (here a1 is the original np.array([[1, 2], [3, 4]])).
In [1127]: np.repeat(a1,[2,3],axis=1)
Out[1127]:
array([[1, 1, 2, 2, 2],
[3, 3, 4, 4, 4]])
Knowing axis=1 and repeats=[2,3] we should be able construct the right take indexing (probably with cumsum). Slicing won't work.
But if we only know the axis and the repeats are unknown, then we probably need some sort of unique or set operation, as in @redratear's answer.
In [1128]: a2=np.repeat(a1,[2,3],axis=1)
In [1129]: y=[list(set(c)) for c in a2]
In [1130]: y
Out[1130]: [[1, 2], [3, 4]]
A take solution with list repeats. This should select the last of each repeated block:
In [1132]: np.take(a2,np.cumsum([2,3])-1,axis=1)
Out[1132]:
array([[1, 2],
[3, 4]])
A deleted answer uses unique; here's my row by row use of unique
In [1136]: np.array([np.unique(row) for row in a2])
Out[1136]:
array([[1, 2],
[3, 4]])
unique is better than set for this use since its output order is deterministic (np.unique returns sorted values). There's another problem with unique (or set): what if the original had repeated values, e.g. [[1,2,1,3],[3,3,4,1]]?
Here is a case where it would be difficult to deduce the repeat pattern from the result. I'd have to look at all the rows first.
In [1169]: a=np.array([[2,1,1,3],[3,3,2,1]])
In [1170]: a1=np.repeat(a,[2,1,3,4], axis=1)
In [1171]: a1
Out[1171]:
array([[2, 2, 1, 1, 1, 1, 3, 3, 3, 3],
[3, 3, 3, 2, 2, 2, 1, 1, 1, 1]])
But cumsum on a known repeat solves it nicely:
In [1172]: ind=np.cumsum([2,1,3,4])-1
In [1173]: ind
Out[1173]: array([1, 2, 5, 9], dtype=int32)
In [1174]: np.take(a1,ind,axis=1)
Out[1174]:
array([[2, 1, 1, 3],
[3, 3, 2, 1]])
>>> import numpy as np
>>> x = np.repeat(np.array([[1,2],[3,4]]), 2, axis=1)
>>> y = [list(set(c)) for c in x]  # remove duplicates within each row
>>> print(y)
[[1, 2], [3, 4]]
Because duplicates are removed within each row, this will not work when the original rows themselves contain repeated values, e.g. for x = np.repeat(np.array([[1,1],[3,3]]), 2, axis=1) == [[1, 1, 1, 1], [3, 3, 3, 3]] the result would be [[1], [3]].
You don't need to know the axis or the repeat count...
