Given an array (or list), is there a way to split it into pieces of different lengths?
Here is the input and the desired output:
import numpy as np
# Input array
data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17])
# Desired output: split arrays
[array([0, 1, 2, 3, 4, 5, 6, 7]), array([8, 9, 10, 11, 12, 13, 14, 15, 16, 17])]
I tried to get this output but could not make it work, hence this question.
Hard to know what you want exactly, but assuming you want 8 items for the first list, use numpy.array_split:
data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17])
out = np.array_split(data, [8])
output:
[array([0, 1, 2, 3, 4, 5, 6, 7]),
array([ 8, 9, 10, 11, 12, 13, 14, 15, 16, 17])]
To be more generic, you can use a list of sizes and process it with numpy.cumsum:
sizes = np.array([4,5,1,3])
out = np.array_split(data, np.cumsum(sizes))
output:
[array([0, 1, 2, 3]), # 4 items
array([4, 5, 6, 7, 8]), # 5 items
array([9]), # 1 item
array([10, 11, 12]), # 3 items
array([13, 14, 15, 16, 17])] # remaining items
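As a minimal sketch of the cumsum approach above, the two steps can be wrapped in a small helper. The name split_by_sizes and the length check are my own additions for illustration, not part of NumPy:

```python
import numpy as np

def split_by_sizes(data, sizes):
    """Split `data` into consecutive chunks with the given lengths.

    Items left over after the last size end up in one extra trailing chunk.
    (Hypothetical helper, not a NumPy function.)
    """
    data = np.asarray(data)
    # Cumulative sizes are the indices at which array_split should cut.
    boundaries = np.cumsum(sizes)
    if boundaries.size and boundaries[-1] > data.shape[0]:
        raise ValueError("sizes add up to more than len(data)")
    return np.array_split(data, boundaries)

chunks = split_by_sizes(np.arange(18), [4, 5, 1, 3])
chunk_lengths = [len(c) for c in chunks]
```

Note that when the sizes add up to exactly len(data), array_split still returns one extra, empty trailing array, which you may want to drop.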
I've got a set of indices that defines the starts:
Int64Index([0, 3, 5, 6, 7, 8, 10, 15, 20, 22], dtype='int64')
and ends:
Int64Index([2, 5, 7, 8, 9, 10, 12, 17, 22, 24], dtype='int64')
of the ranges that should be used as the desired index. In other words, I'd like to obtain an index that includes all integers from 0 to 2 (inclusive), then from 3 to 5 (inclusive), ..., from 10 to 12 (inclusive), from 15 to 17 (inclusive) and so on. The resulting index would be:
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 20, 21, 22, 23, 24], dtype='int64')
(please note the break before 15 and 20). So the pairs of subsequent values would define the ranges and then those ranges would be joined together.
How can I obtain that?
My attempt is:
np.unique(np.concatenate([np.arange(start, end + 1) for start, end in zip(indices_starts, indices_ends)]))
But it feels like there must be a more straightforward and potentially faster solution.
start = [0, 3, 5, 6, 7, 8, 10, 15, 20, 22]
end = [2, 5, 7, 8, 9, 10, 12, 17, 22, 24]
# Create an empty list for your indexes
new_idx = []
# Add the new indexes
for s, e in zip(start, end):
    new_idx.extend(range(s, e + 1))
# Drop duplicated values (sorted, because a set does not guarantee order)
sorted(set(new_idx))
Hope it helps!
Your result index list does not match your description. If the index ranges are INCLUSIVE, the resulting index list (before removing duplicates) would be:
starts = [0, 3, 5, 6, 7, 8, 10, 15, 20, 22]
ends = [2, 5, 7, 8, 9, 10, 12, 17, 22, 24]
indexes = []
for i in range(len(starts)):
    indexes.extend(list(range(starts[i], ends[i] + 1)))
print(indexes)
# [0, 1, 2, 3, 4, 5, 5, 6, 7, 6, 7, 8, 7, 8, ... 20, 21, 22, 22, 23, 24]
With the condition of the index ranges being EXCLUSIVE:
...
for i in range(len(starts)):
    indexes.extend(list(range(starts[i], ends[i])))
print(indexes)
# [0, 1, 3, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10, 11, 15, 16, 20, 21, 22, 23]
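If the Python-level loop turns out to be a bottleneck, one possible loop-free variant builds all the inclusive ranges at once with np.repeat and np.cumsum, then deduplicates with np.unique. This is only a sketch: the function name ranges_to_index is made up for illustration, and it assumes every end is greater than or equal to its start.

```python
import numpy as np

def ranges_to_index(starts, ends):
    """Union of the inclusive integer ranges [starts[i], ends[i]], sorted.

    Hypothetical helper; assumes ends[i] >= starts[i] for every i.
    """
    starts = np.asarray(starts)
    ends = np.asarray(ends)
    lengths = ends - starts + 1
    # Position in the flat output where each range begins.
    first_pos = np.cumsum(lengths) - lengths
    # Offset of every output element within its own range: 0, 1, 2, ...
    offsets = np.arange(lengths.sum()) - np.repeat(first_pos, lengths)
    values = np.repeat(starts, lengths) + offsets
    # np.unique sorts and removes the overlaps between ranges.
    return np.unique(values)

starts = [0, 3, 5, 6, 7, 8, 10, 15, 20, 22]
ends = [2, 5, 7, 8, 9, 10, 12, 17, 22, 24]
result = ranges_to_index(starts, ends)
```

The result can be passed to pd.Index(...) if a pandas index is needed.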
I would like to select every other group of n columns in a numpy array. That is, I want to keep the first n columns, skip the next n columns, keep the next n columns, skip the next n columns, and so on.
For example, with the following array and n=2:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
I would like to get:
[[1, 2, 5, 6, 9, 10],
[11, 12, 15, 16, 19, 20]]
And with n=3:
[[1, 2, 3, 7, 8, 9],
[11, 12, 13, 17, 18, 19]]
With n=1 we can simply use the syntax arr[:,::2], but is there something similar for n>1?
You can use the modulus to create ramps that count from 0 to 2n-1 and then restart, and select the first n positions of each ramp. For each ramp, the first n entries are True and the rest are False, which gives a boolean array covering all the columns. Then we simply use boolean indexing along the columns to select the valid columns for the final output. The implementation would look something like this:
arr[:,np.mod(np.arange(arr.shape[-1]),2*n)<n]
Step by step code runs to give a better idea -
In [43]: arr
Out[43]:
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
In [44]: n = 3
In [45]: np.mod(np.arange(arr.shape[-1]),2*n)
Out[45]: array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3])
In [46]: np.mod(np.arange(arr.shape[-1]),2*n)<n
Out[46]: array([ True,True,True,False,False,False,True,True,True,False])
In [47]: arr[:,np.mod(np.arange(arr.shape[-1]),2*n)<n]
Out[47]:
array([[ 1, 2, 3, 7, 8, 9],
[11, 12, 13, 17, 18, 19]])
Sample runs across various n -
In [29]: arr
Out[29]:
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
In [30]: n = 1
In [31]: arr[:,np.mod(np.arange(arr.shape[-1]),2*n)<n]
Out[31]:
array([[ 1, 3, 5, 7, 9],
[11, 13, 15, 17, 19]])
In [32]: n = 2
In [33]: arr[:,np.mod(np.arange(arr.shape[-1]),2*n)<n]
Out[33]:
array([[ 1, 2, 5, 6, 9, 10],
[11, 12, 15, 16, 19, 20]])
In [34]: n = 3
In [35]: arr[:,np.mod(np.arange(arr.shape[-1]),2*n)<n]
Out[35]:
array([[ 1, 2, 3, 7, 8, 9],
[11, 12, 13, 17, 18, 19]])
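For reuse, the modulus mask shown above could be wrapped in a small function. This is a sketch; the name keep_alternate_column_groups and the check on n are my own additions:

```python
import numpy as np

def keep_alternate_column_groups(arr, n):
    """Keep the first n columns, skip the next n, keep the next n, and so on.

    Hypothetical helper built on the modulus mask from the answer above.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    cols = np.arange(arr.shape[-1])
    # Column position within each block of 2*n; the first n of each block are kept.
    mask = (cols % (2 * n)) < n
    return arr[..., mask]

arr = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
picked_2 = keep_alternate_column_groups(arr, 2)
picked_3 = keep_alternate_column_groups(arr, 3)
```

With n=1 this gives the same columns as arr[:, ::2].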
Data = [day(1) day(2)...day(N)...day(2N)..day(K-N)...day(K)]
I am looking to create a numpy array with two arrays, N and K with shapes (120,) and (300,). The array needs to be of the form:
x1 = [day(1) day(2) day (3)...day(N)]
x2 = [day(2) day(3)...day(N) day(N+1)]
xN = [day(N) day(N+1) day(N+2)...day(2N)]
xK-N = [day(K-N) day(K-N+1)...day(K)]
X is basically of shape (K-N)xN, with the above x1, x2, ..., xK-N as rows. I have tried using iloc to get the two arrays N and K, and that works. But when I try to merge the arrays using X = np.array([np.concatenate((N[i:], K[:i])) for i in range(len(N))]), I get an NxN array of overlapping pieces only, not the desired format.
Is this what you are trying to produce (with simpler data)?
In [253]: N,K=10,15
In [254]: data = np.arange(K)+10
In [255]: data
Out[255]: array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24])
In [256]: np.array([data[np.arange(N)+i] for i in range(K-N+1)])
Out[256]:
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
[15, 16, 17, 18, 19, 20, 21, 22, 23, 24]])
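There's another way of generating this, using advanced ideas about strides. Strides are given in bytes, so take them from the array itself rather than hard-coding 4, which is only right for 4-byte integers:
np.lib.stride_tricks.as_strided(data, shape=(K-N+1,N), strides=(data.strides[0], data.strides[0]))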
In the first case, all values in the new array are copies of the original. The strided case is actually a view. So any changes to data appear in the 2d array. And without data copying, the 2nd is also faster. I can try to explain it if you are interested.
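On newer NumPy versions (1.20 and later, as far as I know) the same kind of strided view is available without computing strides by hand, through numpy.lib.stride_tricks.sliding_window_view. A minimal sketch with the same small data as above:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

N, K = 10, 15
data = np.arange(K) + 10

# One row per window of N consecutive values; this is a read-only view, not a copy.
windows = sliding_window_view(data, N)

# The copying list-comprehension version, for comparison.
copied = np.array([data[i:i + N] for i in range(K - N + 1)])
```

Because the view is read-only by default, it avoids accidentally writing into overlapping memory, which is the main risk of using as_strided directly.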
Warren suggests using hankel. That's a short function, which in our case does essentially:
a, b = np.ogrid[0:K-N+1, 0:N]
data[a+b]
a+b is an array like:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
[ 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]])
In this example case it is just a bit faster than the list comprehension solution, but I expect it will be a lot faster for much larger cases.
It is probably not worth adding a dependence on scipy for the following, but if you are already using scipy in your code, you could use the function scipy.linalg.hankel:
In [75]: from scipy.linalg import hankel
In [76]: K = 16
In [77]: x = np.arange(K)
In [78]: x
Out[78]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
In [79]: N = 8
In [80]: hankel(x[:K-N+1], x[K-N:])
Out[80]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 1, 2, 3, 4, 5, 6, 7, 8],
[ 2, 3, 4, 5, 6, 7, 8, 9],
[ 3, 4, 5, 6, 7, 8, 9, 10],
[ 4, 5, 6, 7, 8, 9, 10, 11],
[ 5, 6, 7, 8, 9, 10, 11, 12],
[ 6, 7, 8, 9, 10, 11, 12, 13],
[ 7, 8, 9, 10, 11, 12, 13, 14],
[ 8, 9, 10, 11, 12, 13, 14, 15]])
import numpy as np
import re
def validate(seq):
    # Join the differences into one string, then look for runs of "1"
    stl = "".join(np.diff(seq).astype(str))
    for x in re.findall("[1]+", stl):
        if len(x) > 3:
            return False
    return True
print(validate([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 20]))
print(validate([1, 2, 3, 6, 7, 8, 9, 11, 12, 16, 17, 18, 19, 22, 23]))
print(validate([2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 20, 22, 23, 24]))
output
False
True
True
This code checks how many numbers in the list are chained together. It returns False if more than 4 are consecutive, as in the first print (1, 2, 3, 4, 5, 6, 7, ...), and True if there are at most 4 in sequence, as in the second print (6, 7, 8, 9) (16, 17, 18, 19).
So how can I amend the code to return True only when the list has exactly one group of 4 numbers in sequence, and False otherwise, including for lists with more than four numbers in sequence?
print(validate([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 20]))
print(validate([1, 2, 3, 6, 7, 8, 9, 11, 12, 16, 17, 18, 19, 22, 23]))
print(validate([2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 20, 22, 23, 24]))
desired output
False
False
True
The way you defined the validate function is somewhat hard to follow. I rewrote it as follows:
def validate(seq, counts=1, N=4):
    if len(seq) < N:
        return False
    diff_seq = np.diff(seq)
    sum_seq = np.array([(np.sum(diff_seq[i:i+N-1])==N-1) for i in range(len(diff_seq)-N+2)])
    return np.count_nonzero(sum_seq) == counts
where N is the length of consecutive numbers in a group and counts is the number of such groups that you want to have in seq.
EDIT
You can use convolve to compute sum_seq in function validate as follows
sum_seq = np.convolve(diff_seq, np.ones((N-1,)), mode='valid') == N-1
This should be much faster and it looks more pythonic.
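Putting the convolve line into the function gives something like the sketch below. Two details are my own additions rather than part of the answer above: the window test is done on diff_seq == 1 (so unsorted or repeated values cannot add up to N-1 by accident), and N is required to be at least 2 because np.convolve does not accept an empty kernel.

```python
import numpy as np

def validate(seq, counts=1, N=4):
    """True if seq contains exactly `counts` windows of N consecutive integers."""
    if N < 2:
        raise ValueError("N must be at least 2")
    if len(seq) < N:
        return False
    # 1 where neighbouring values differ by exactly one, else 0.
    steps = (np.diff(seq) == 1).astype(int)
    # A window of N-1 unit steps means N consecutive numbers.
    window_hits = np.convolve(steps, np.ones(N - 1, dtype=int), mode="valid") == N - 1
    return int(np.count_nonzero(window_hits)) == counts

r1 = validate([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 20])
r2 = validate([1, 2, 3, 6, 7, 8, 9, 11, 12, 16, 17, 18, 19, 22, 23])
r3 = validate([2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 20, 22, 23, 24])
```

A run of five consecutive numbers contains two overlapping windows of four, so it is counted twice and rejected with the default counts=1.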