I have a 2d array of size 3 by 7 in numpy:
[[1 2 3 4 5 6 7]
[4 5 6 7 8 9 0]
[2 3 4 5 6 7 8]]
I also have a list that contains indexes of splitting points:
[1, 3]
Now, I want to split the array using the indexes in the list such that I get:
[[1 2]
[4 5]
[2 3]]
[[ 2 3 4]
[5 6 7]
[3 4 5]]
[[ 4 5 6 7]
[7 8 9 0]
[5 6 7 8]]
How can I do this in python?
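You can use a list comprehension with slicing, using zip to extract the indices pairwise. The end+1 in the slice keeps each split column in both neighbouring pieces, as in your expected output.
import numpy as np

A = np.array([[1, 2, 3, 4, 5, 6, 7],
              [4, 5, 6, 7, 8, 9, 0],
              [2, 3, 4, 5, 6, 7, 8]])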
idx = [1, 3]
idx = [0] + idx + [A.shape[1]]
res = [A[:, start: end+1] for start, end in zip(idx, idx[1:])]
print(*res, sep='\n'*2)
[[1 2]
[4 5]
[2 3]]
[[2 3 4]
[5 6 7]
[3 4 5]]
[[4 5 6 7]
[7 8 9 0]
[5 6 7 8]]
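To see why the pieces overlap at the split columns, it can help to look at the (start, end) pairs that zip produces. This is a minimal sketch of the same approach wrapped in a function; the name split_inclusive is made up for illustration and is not a NumPy function.

```python
import numpy as np

def split_inclusive(arr, points):
    """Split arr column-wise; each split column appears in both neighbours."""
    bounds = [0] + list(points) + [arr.shape[1]]
    pairs = list(zip(bounds, bounds[1:]))
    # end + 1 makes the right edge inclusive; slicing past the last
    # column is harmless because NumPy clips the slice.
    return pairs, [arr[:, start:end + 1] for start, end in pairs]

A = np.array([[1, 2, 3, 4, 5, 6, 7],
              [4, 5, 6, 7, 8, 9, 0],
              [2, 3, 4, 5, 6, 7, 8]])

pairs, parts = split_inclusive(A, [1, 3])
print(pairs)                     # [(0, 1), (1, 3), (3, 7)]
print([p.shape for p in parts])  # [(3, 2), (3, 3), (3, 4)]
```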
I have a 2D numpy array like this:
[[4 5 2]
[5 5 1]
[5 4 5]
[5 3 4]
[5 4 4]
[4 3 2]]
I would like to sort/cluster the rows of this array by the ordering pattern of their elements, e.g. row[0]>=row[1]>=row[2], row[0]>=row[2]>row[1], ..., so that rows with the same pattern end up next to each other.
I tried the code lexdf = df[np.lexsort((df[:,2], df[:,1],df[:,0]))][::-1], but that is not what I want.
The output of lexsort:
[[5 5 1]
[5 4 5]
[5 4 4]
[5 3 4]
[4 5 2]
[4 3 2]]
The output I would like to have:
[[5 5 1]
[5 4 4]
[4 3 2]
[5 4 5]
[5 3 4]
[4 5 2]]
or cluster it into three parts:
[[5 5 1]
[5 4 4]
[4 3 2]]
[[5 4 5]
[5 3 4]]
[[4 5 2]]
And I would like to apply this to an array with more columns, so it would be better to do it without iteration. Any ideas to generate this kind of output?
I don't know how to do it in numpy, except maybe with some weird hacks around numpy.split.
Here is a way to get your groups with python lists:
from itertools import groupby, pairwise
def f(sublist):
    return [x <= y for x,y in pairwise(sublist)]
# NOTE: itertools.pairwise requires python>=3.10
# For python<=3.9, use one of those alternatives:
# * more_itertools.pairwise(sublist)
# * zip(sublist, sublist[1:])
a = [[4, 5, 2],
[5, 5, 1],
[5, 4, 5],
[5, 3, 4],
[5, 4, 4],
[4, 3, 2]]
b = [list(g) for _,g in groupby(sorted(a, key=f), key=f)]
print(b)
# [[[4, 3, 2]],
# [[5, 4, 5], [5, 3, 4], [5, 4, 4]],
# [[4, 5, 2], [5, 5, 1]]]
Note: The combination groupby+sorted is slightly inefficient, because sorted takes n log(n) time. A linear alternative is to group using a dictionary of lists. See for instance the function toolz.itertoolz.groupby from the toolz package.
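The dictionary-of-lists idea can be sketched with only the standard library. This is an illustration rather than the toolz implementation; the helper name pattern is hypothetical, and it returns a tuple instead of a list so that it can be used as a dictionary key.

```python
from collections import defaultdict

def pattern(row):
    # Tuple of comparisons between neighbouring elements, e.g. (False, True)
    return tuple(x <= y for x, y in zip(row, row[1:]))

a = [[4, 5, 2],
     [5, 5, 1],
     [5, 4, 5],
     [5, 3, 4],
     [5, 4, 4],
     [4, 3, 2]]

groups = defaultdict(list)
for row in a:                      # single pass over the rows
    groups[pattern(row)].append(row)

# Only the (few) distinct patterns are sorted, not the rows themselves.
b = [groups[key] for key in sorted(groups)]
print(b)
```

Rows keep their original relative order inside each group, and the grouping pass itself is linear in the number of rows.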
Could anyone explain why indexing an array with a list and indexing it with a slice such as [x:y] lead to very different results when modifying numpy arrays?
Example:
a = np.array([[1,2,3,4],[3,4,5,5],[4,5,6,3], [1,2,5,5], [1, 2, 3, 4]])
print(a, '\n')
print(a[[3, 4]][:1][:, 1])
a[[3, 4]][:1][:, 1] = 99
print(a, '\n')
print(a[3:4][:1][:, 1])
a[3:4][:1][:, 1] = 99
print(a, '\n')
Output:
[[1 2 3 4]
[3 4 5 5]
[4 5 6 3]
[1 2 5 5]
[1 2 3 4]]
[2]
[[1 2 3 4]
[3 4 5 5]
[4 5 6 3]
[1 2 5 5]
[1 2 3 4]]
[2]
[[ 1 2 3 4]
[ 3 4 5 5]
[ 4 5 6 3]
[ 1 99 5 5]
[ 1 2 3 4]]
Is there a way to modify the array when indexing with a list?
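Indexing with a list (advanced indexing) returns a copy, so the chained assignment writes into a temporary array and a is left unchanged; slicing returns a view, which is why the second version modifies a. Create a single index that selects the desired elements without chaining: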
In [114]: a[[3,4],1]=90
In [115]: a
Out[115]:
array([[ 1, 2, 3, 4],
[ 3, 4, 5, 5],
[ 4, 5, 6, 3],
[ 1, 90, 5, 5],
[ 1, 90, 3, 4]])
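The copy-versus-view difference behind this can be checked directly with np.shares_memory. A minimal sketch using the array from the question; the variable names fancy and basic are only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [3, 4, 5, 5], [4, 5, 6, 3], [1, 2, 5, 5], [1, 2, 3, 4]])

fancy = a[[3, 4]]   # list index -> advanced indexing -> new array (a copy)
basic = a[3:4]      # slice -> basic indexing -> view onto a's memory

print(np.shares_memory(a, fancy))  # False
print(np.shares_memory(a, basic))  # True

# A single indexing expression on the left-hand side writes into a itself.
a[[3, 4], 1] = 99
print(a[[3, 4], 1])  # [99 99]
```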
Suppose we want to generate the same random number between 1 and 10 each time. Then when I run the following I get the same random number each time:
import os
import numpy as np
import random
random.seed(30)
random.randint(1, 10)
9
random.seed(30)
random.randint(1, 10)
9
But if I want to generate the same random 4x4 matrix with numbers between 1 and 10 each time, I get different results:
random.seed(30)
np.random.randint(10, size=(4,4))
array([[8, 2, 6, 4],
[3, 3, 3, 5],
[6, 2, 6, 6],
[8, 7, 1, 1]])
random.seed(30)
np.random.randint(10, size=(4,4))
array([[9, 2, 1, 6],
[4, 3, 3, 8],
[1, 1, 6, 6],
[0, 2, 3, 5]])
Question. How do I get the same array each time using random.seed() ?
Added. I added the import statements.
You need to use numpy.random.seed, not random.seed.
At the moment you are mixing two different modules, numpy and random, and each one keeps its own random state.
import numpy as np
for i in range(5):
    np.random.seed(30)
    print(np.random.randint(10, size=(4,4)))
[[5 5 4 7]
[2 5 1 3]
[9 7 7 1]
[1 3 2 2]]
[[5 5 4 7]
[2 5 1 3]
[9 7 7 1]
[1 3 2 2]]
[[5 5 4 7]
[2 5 1 3]
[9 7 7 1]
[1 3 2 2]]
[[5 5 4 7]
[2 5 1 3]
[9 7 7 1]
[1 3 2 2]]
[[5 5 4 7]
[2 5 1 3]
[9 7 7 1]
[1 3 2 2]]
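As an alternative to seeding the global state, newer NumPy versions (1.17 and later) provide np.random.default_rng, which returns an independent generator object. The sketch below assumes such a version is installed; the values it produces differ from the np.random.seed output above because a different algorithm is used.

```python
import numpy as np

# Two generators created with the same seed yield the same stream.
first = np.random.default_rng(30).integers(0, 10, size=(4, 4))
second = np.random.default_rng(30).integers(0, 10, size=(4, 4))

print(np.array_equal(first, second))  # True
```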
I want to replace the last element of every row in an ndarray with a constant. Currently I solve this with loops, but I'm looking for a more elegant solution, preferably using numpy functions.
For example, I have an ndarray:
[1 3 4 5]
[4 2 4 1]
[3 2 7 3]
[7 9 4 3]
[6 9 7 2]
Here is the result I want, with the last element of every row replaced by 10:
[1 3 4 10]
[4 2 4 10]
[3 2 7 10]
[7 9 4 10]
[6 9 7 10]
Use NumPy indexing to assign to the last column:
import numpy as np
arr = np.array([[1,3,4,5],
[4,2,4,1],
[3,2,7,3],
[7,9,4,3],
[6,9,7,2]])
arr[:,-1]=10
arr
array([[ 1, 3, 4, 10],
[ 4, 2, 4, 10],
[ 3, 2, 7, 10],
[ 7, 9, 4, 10],
[ 6, 9, 7, 10]])
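The same column assignment also accepts one value per row, and it modifies the array in place, so take a copy first if the original is still needed. A small sketch with a shortened version of the array; the names out_const and out_per_row are illustrative.

```python
import numpy as np

arr = np.array([[1, 3, 4, 5],
                [4, 2, 4, 1],
                [3, 2, 7, 3]])

out_const = arr.copy()
out_const[:, -1] = 10              # scalar is broadcast to every row

out_per_row = arr.copy()
out_per_row[:, -1] = [10, 20, 30]  # one replacement value per row

print(out_const[:, -1])    # [10 10 10]
print(out_per_row[:, -1])  # [10 20 30]
print(arr[:, -1])          # [5 1 3], the original is untouched
```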
I am trying to use the function as_strided from numpy.lib.stride_tricks to extract sub-series from a larger 2D array, but I am struggling to work out what to pass for the strides argument.
Let's say I have a matrix m which contains 5 1D arrays of length (a=)10. I want to extract the sub 1D arrays of length (b=)4 from each 1D array in m.
import numpy
from numpy.lib.stride_tricks import as_strided
a, b = 10, 4
m = numpy.array([range(i,i+a) for i in range(5)])
# first try
sub_m = as_strided(m, shape=(m.shape[0], m.shape[1]-b+1, b))
print(sub_m.shape) # (5,7,4) which is what i expected
print(sub_m[-1,-1,-1]) # Some unexpected strange number: 8227625857902995061
# second try with strides argument
sub_m = as_strided(m, shape=(m.shape[0], m.shape[1]-b+1, b), strides=(m.itemize,m.itemize,m.itemize))
# gives error, see below
AttributeError: 'numpy.ndarray' object has no attribute 'itemize'
As you can see, I managed to get the right shape for sub_m in my first try. However, I can't work out what to write in strides=().
For information:
m = [[ 0 1 2 3 4 5 6 7 8 9]
[ 1 2 3 4 5 6 7 8 9 10]
[ 2 3 4 5 6 7 8 9 10 11]
[ 3 4 5 6 7 8 9 10 11 12]
[ 4 5 6 7 8 9 10 11 12 13]]
Expected output:
sub_m = [
[[0 1 2 3] [1 2 3 4] ... [5 6 7 8] [6 7 8 9]]
[[1 2 3 4] [2 3 4 5] ... [6 7 8 9] [7 8 9 10]]
[[2 3 4 5] [3 4 5 6] ... [7 8 9 10] [8 9 10 11]]
[[3 4 5 6] [4 5 6 7] ... [8 9 10 11] [9 10 11 12]]
[[4 5 6 7] [5 6 7 8] ... [9 10 11 12] [10 11 12 13]]
]
edit: I have much more data, that's the reason why I want to use as_strided (efficiency)
Here's one approach with np.lib.stride_tricks.as_strided -
import numpy as np

def strided_lastaxis(a, L):
    s0,s1 = a.strides   # byte steps along rows and columns of the input
    m,n = a.shape
    return np.lib.stride_tricks.as_strided(a, shape=(m,n-L+1,L), strides=(s0,s1,s1))
A bit of explanation on the strides for as_strided:
The output is 3D, so it needs three strides. Moving along the last (third) axis advances by one element, so its stride is s1. Moving along the second axis shifts the window start by one element as well, so that stride is also s1. Moving along the first axis jumps to the next row of the input, so that stride is s0, the row stride of the original array.
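To make the stride values concrete, here is a small sketch that prints them for an array shaped like the one in the question. The dtype is fixed to int64 here as an assumption, so that the byte counts are the same on every platform. Note that the attribute is spelled itemsize, not itemize.

```python
import numpy as np

# 5 rows of 10 elements, 8 bytes per element (int64 chosen explicitly)
m = np.arange(50, dtype=np.int64).reshape(5, 10)

s0, s1 = m.strides
print(m.itemsize)  # 8  bytes per element
print(s0, s1)      # 80 8  -> next row is 80 bytes away, next column 8 bytes
```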
Sample run -
In [46]: a
Out[46]:
array([[0, 5, 6, 2, 3, 6, 7, 1, 4, 8],
[2, 1, 3, 7, 0, 3, 5, 4, 0, 1]])
In [47]: strided_lastaxis(a, L=4)
Out[47]:
array([[[0, 5, 6, 2],
[5, 6, 2, 3],
[6, 2, 3, 6],
[2, 3, 6, 7],
[3, 6, 7, 1],
[6, 7, 1, 4],
[7, 1, 4, 8]],
[[2, 1, 3, 7],
[1, 3, 7, 0],
[3, 7, 0, 3],
[7, 0, 3, 5],
[0, 3, 5, 4],
[3, 5, 4, 0],
[5, 4, 0, 1]]])
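If NumPy 1.20 or later is available, numpy.lib.stride_tricks.sliding_window_view builds the same kind of windowed view without computing strides by hand. This is a different function from the as_strided approach above, shown here on the matrix m from the question as a sketch; it returns a read-only view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a, b = 10, 4
m = np.array([range(i, i + a) for i in range(5)])

# Windows of length b along axis 1; the window axis is appended last.
sub_m = sliding_window_view(m, window_shape=b, axis=1)

print(sub_m.shape)    # (5, 7, 4)
print(sub_m[0, 0])    # [0 1 2 3]
print(sub_m[-1, -1])  # [10 11 12 13]
```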