Repeat calculations over repeated blocks of 5 rows within numpy - python

I have an array, of which this is a small sample. It repeats measurements 5 times, and I want to collate these blocks of five into a new array, where each block of five rows is now one row giving mean, median and standard deviation of the five initial rows
data =
[[1, 9, 66, 74, -0.274035]
[1, 9, 66, 74, -0.269245]
[1, 9, 66, 74, -0.271161]
[1, 9, 66, 74, -0.269245]
[1, 9, 66, 74, -0.266370]
[2, 10, 65, 73, 0.085277]
[2, 10, 65, 73, 0.086235]
[2, 10, 65, 73, 0.090068]
[2, 10, 65, 73, 0.087193]
[2, 10, 65, 73, 0.085277]
What I would like to do is keep the values of the first 4 columns for each block, then find the mean, median and standard deviation of the fifth column, working iteratively over blocks of five rows.
data2 =
[[1, 9, 66, 74, mean[0:5,4], median[0:5,4], std[0:5,4]]
[2, 10, 65, 73, mean[5:10,4], median[5:10,4], std[5:10,4]]]
or in numerical terms:
[[1, 9, 66, 74, -0.270011, -0.269245, 0.002528]
[2, 10, 65, 73, 0.08681, 0.086235, 0.001777]]
I've tried this, but I just get zeroes as output:
index.shape
Out[119]: (10,)
repeat = 5
a = 0
b = repeat
length = int((len(index) - repeat) / repeat)
meanVre = np.zeros(length)
for _ in range(length):
    np.append(meanVre, np.mean(data[a:b,5]))
    a = a+5
    b = b+5
(repeat is used as a variable rather than 5, as the amount of rows in the block is liable to change at a later date).
Any help you can give would be really appreciated.

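import numpy as np

def block_stats(data, blocksize = 5):
    # first four columns, taken from the first row of each block
    inputs = data[::blocksize, :4]
    # one row per block, holding that block's values from column 4
    data_stat = data[:, 4].reshape(-1, blocksize)
    means = np.mean(data_stat, axis = 1, keepdims = True)
    medians = np.median(data_stat, axis = 1, keepdims = True)
    stds = np.std(data_stat, axis = 1, keepdims = True)
    # hstack, not vstack: the statistics are appended as extra columns
    return np.hstack([inputs, means, medians, stds])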
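As a quick check, the reshape idea can be run against the ten sample rows from the question. This is only a minimal sketch: the name `summarise_blocks` is made up for illustration, it assumes the number of rows is an exact multiple of the block size, and it relies on NumPy's default population standard deviation (`ddof=0`), which is what appears to reproduce the numbers the question expects.

```python
import numpy as np

# the ten sample rows from the question
data = np.array([
    [1, 9, 66, 74, -0.274035],
    [1, 9, 66, 74, -0.269245],
    [1, 9, 66, 74, -0.271161],
    [1, 9, 66, 74, -0.269245],
    [1, 9, 66, 74, -0.266370],
    [2, 10, 65, 73, 0.085277],
    [2, 10, 65, 73, 0.086235],
    [2, 10, 65, 73, 0.090068],
    [2, 10, 65, 73, 0.087193],
    [2, 10, 65, 73, 0.085277],
])

def summarise_blocks(data, blocksize=5):  # hypothetical helper name
    # identifying columns: first row of every block
    keys = data[::blocksize, :4]
    # measurement column, one row per block
    values = data[:, 4].reshape(-1, blocksize)
    stats = np.column_stack([values.mean(axis=1),
                             np.median(values, axis=1),
                             values.std(axis=1)])  # ddof=0 by default
    return np.hstack([keys, stats])

data2 = summarise_blocks(data)
print(np.round(data2, 6))
```

Because the block size is a parameter, changing the number of repeats later only means passing a different `blocksize`.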

Related

How to get matrix position of the max element in sliding window view of an array?

I have successfully found the maximum of the array in each sliding window view using amax and sliding_window_view functions from NumPy as follows:
import numpy as np
a = np.random.randint(0, 100, (5, 6)) # 2D array
array([[51, 92, 14, 71, 60, 20],
[82, 86, 74, 74, 87, 66],
[23, 2, 21, 52, 1, 87],
[29, 37, 1, 63, 59, 20],
[32, 75, 57, 21, 83, 48]])
windows = np.lib.stride_tricks.sliding_window_view(a, (3, 3))
np.amax(windows, axis=(2, 3))
array([[92, 92, 87, 87],
[86, 86, 87, 87],
[75, 75, 83, 87]])
Now, I'm trying to find the position of the max values in the original array considering the windows.
Expected Output
The first element i.e. `92` should give position `(1, 0)`.
The second element i.e. `92` should give position `(1, 0)`.
The third element i.e. `87` should give position `(4, 1)`.
.
.
The seventh element i.e. `87` should give position `(4, 1)`.
The twelfth element i.e. `87` should give position `(5, 2)`.
.
so on
NOTE: Only one position per value is needed. Hence, if there are multiple positions inside a window, return only the first.
This solution gives indices per-window but does not give unique indices if a max-value appears twice in some window:
maxvals = np.amax(windows, axis=(2, 3))
# array([[92, 92, 87, 87],
# [86, 86, 87, 87],
# [75, 75, 83, 87]])
indx = np.array((windows == np.expand_dims(maxvals, axis = (2, 3))).nonzero())
which gives you back one array for each of the four axes in the windows array. Now we use some math with the relative index positions in each window to get back the indices at which max values occur in the original array:
np.sum(indx.reshape(2, 2, -1), axis = 0)
# array([[0, 0, 1, 1, 2, 1, 1, 1, 1, 2, 4, 4, 4, 2],
# [1, 1, 4, 4, 5, 1, 1, 4, 4, 5, 1, 1, 4, 5]])
The reshaping is done to facilitate adding the indices. The first two arrays give the window position. The second two arrays are positions relative to the window. So we just add them up.
You can check that each pair of values along the second axis is the pair of indices you require.
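If exactly one position per window is wanted (the first one when the maximum occurs more than once), one possible alternative is to take `argmax` over each flattened window and add the window offset. This is a sketch rather than the method from the answer above; it uses the fixed array from the question, and the names `wh`, `ww`, `local` and `positions` are introduced here for illustration. Positions are reported as (row, column), i.e. the reverse of the (column, row) order used in the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([[51, 92, 14, 71, 60, 20],
              [82, 86, 74, 74, 87, 66],
              [23,  2, 21, 52,  1, 87],
              [29, 37,  1, 63, 59, 20],
              [32, 75, 57, 21, 83, 48]])

wh, ww = 3, 3                                    # window height and width
windows = sliding_window_view(a, (wh, ww))       # shape (3, 4, 3, 3)

# flatten each window so argmax returns a single index per window;
# argmax picks the first occurrence when the maximum is repeated
flat = windows.reshape(windows.shape[0], windows.shape[1], -1)
local = flat.argmax(axis=2)                      # shape (3, 4)

# convert the flat in-window index to (row, col) offsets inside the window
dr, dc = np.unravel_index(local, (wh, ww))

# add the top-left corner of each window to get positions in `a`
rows = np.arange(windows.shape[0])[:, None] + dr
cols = np.arange(windows.shape[1])[None, :] + dc
positions = np.stack([rows, cols], axis=-1)      # shape (3, 4, 2), (row, col)

print(positions.reshape(-1, 2))
```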
Without loops, it could be achieved with use of another SO post:
unique_values, index = np.unique(a, return_index=True)
result = index[np.searchsorted(unique_values, np.amax(windows, axis=(2, 3)))]
ind = np.dstack((result % a.shape[1], result // a.shape[1]))
ind.reshape(12, 2)
# [[1 0]
# [1 0]
# [4 1]
# [5 1]
# [1 1]
# [1 1]
# [4 1]
# [5 1]
# [1 4]
# [1 4]
# [4 4]
# [4 4]]

Generate 3 random lists and create another one with the sum of their elements

I want to create an NxN matrix (represented as a list of lists), where the first N-1 columns hold random numbers in the range 1 to 10 and the last column contains the sum of the numbers in the previous columns.
import random
randomlist1 = []
for i in range(1,10):
    n = random.randint(1,100)
    randomlist1.append(n)
print(randomlist1)
randomlist2 = []
for i in range(1,10):
    n = random.randint(1,100)
    randomlist2.append(n)
print(randomlist2)
randomlist3 = []
for i in range(1,10):
    n = random.randint(1,100)
    randomlist3.append(n)
print(randomlist3)
# I have problems here
lists_of_lists = [sum(x) for x in (randomlist1, randomlist2,randomlist3)]
[sum(x) for x in zip(*lists_of_lists)]
print(lists_of_lists)
Your question calls for a few comments:
the title does not correspond to the question, and the code matches the title, not the question;
the rows randomlist1, randomlist2, randomlist3 are not in a matrix;
the final value is not a square matrix;
You write "the columns have random numbers in the range of 1 to 10" but your code randint(1,100) creates numbers in the range [1..100].
Solution to the question
import random
N = 5
# create a N by N-1 matrix of random integers
matrix = [[random.randint(1, 10) for j in range(N-1)] for i in range(N)]
print(f"{N} by {N-1} matrix:\n{matrix}")
# add a column as sum of the previous ones
for line in matrix:
    line.append(sum(line))
print(f"{N} by {N} matrix with the last column as sum of the previous ones:\n{matrix}")
Output:
5 by 4 matrix:
[[7, 10, 5, 6], [4, 10, 9, 3], [5, 5, 4, 9], [10, 7, 2, 4], [8, 8, 5, 3]]
5 by 5 matrix with the last column as sum of the previous ones:
[[7, 10, 5, 6, 28], [4, 10, 9, 3, 26], [5, 5, 4, 9, 23], [10, 7, 2, 4, 23], [8, 8, 5, 3, 24]]
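For comparison, the same N by N construction can be sketched with NumPy instead of nested lists. This is an illustrative variant, not part of the answer above: it assumes NumPy is available and uses `np.random.default_rng` with an arbitrary seed; the names `body` and `matrix_np` are chosen here.

```python
import numpy as np

N = 5
rng = np.random.default_rng(0)        # seed chosen arbitrarily for repeatability

# N by N-1 block of random integers in 1..10 (the upper bound 11 is exclusive)
body = rng.integers(1, 11, size=(N, N - 1))

# append the row sums as the last column to get an N by N matrix
matrix_np = np.hstack([body, body.sum(axis=1, keepdims=True)])

print(matrix_np)
print(matrix_np.tolist())             # back to a list of lists if needed
```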
IIUC try with numpy
import numpy as np
np.random.seed(1) # just for demo purposes
# lists comprehensions to create your 3 lists inside a list
lsts = [np.random.randint(1,100, 10).tolist() for i in range(3)]
np.sum(lsts, axis=0)
# array([145, 100, 131, 105, 215, 115, 194, 247, 116, 45])
lsts
[[38, 13, 73, 10, 76, 6, 80, 65, 17, 2],
[77, 72, 7, 26, 51, 21, 19, 85, 12, 29],
[30, 15, 51, 69, 88, 88, 95, 97, 87, 14]]
Based on @It_is_Chris's answer, I propose this as a numpy-only implementation, without using lists:
np.random.seed(1)
final_shape = (3, 10)
lsts = np.random.randint(1, 100, np.prod(final_shape)).reshape(final_shape)
lstsum = np.sum(lsts, axis=0)

How to generate random numbers with if-statement in Python?

I would like to generate random numbers with a specific restriction using python. The code should do the following:
If an entered number is:
0, then generate 0 random non-recurrent numbers
<1, then generate 1 random non-recurrent numbers
<9, then generate 2 random non-recurrent numbers
<15, then generate 3 random non-recurrent numbers
<26, then generate 5 random non-recurrent numbers
<51, then generate 8 random non-recurrent numbers
<91, then generate 13 random non-recurrent numbers
<151, then generate 20 random non-recurrent numbers
<281, then generate 32 random non-recurrent numbers
The value of the random numbers should be limited by the entered number. So if 75 is entered, the code should generate 13 random numbers with 75 as the highest possible value. 75 doesn't have to actually appear among them; it is just the upper bound.
My guess was to use numpy. Here is what I have so far (with a user's help).
num_files=[0,1,9,...]
num_nums=[0,1,2,3,5,...]
for zipp in zip(num_files,num_nums):
    if len(docx_files)<zipp[0]:
        list_of_rands=np.random.choice(len(docx_files)+1,
                                       zipp[1],replace=False)
Any ideas or more starting points?
Here's one way of doing it. Just zip the lists of numbers and the cutoffs, and check if the number input (the variable number in the code below) is above the cutoff. Note that this doesn't handle the case of numbers larger than 281, since I'm not sure what's supposed to happen there based on your description.
import numpy as np
number = 134
parameters = zip([9, 15, 26, 51, 91, 151], [3, 5, 8, 13, 20, 32])
nums = 2
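for item in parameters:
    # >= so that a cutoff value itself (e.g. 9) falls into the next bracket
    if number >= item[0]:
        nums = item[1]
# replace=False so that no number is drawn twice
np.random.choice(number, nums, replace=False)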
You could define a function using a dictionary with ranges as keys and number of random numbers as values:
import random
def rand_nums(input_num):
    d = {26: 5, 51: 8, 91: 13}
    for k, v in d.items():
        if input_num in range(k):
            nums = random.sample(range(k+1), v)
            return nums
print(rand_nums(20))
print(rand_nums(50))
print(rand_nums(88))
[14, 23, 11, 9, 5]
[9, 49, 23, 16, 8, 50, 47, 33]
[20, 16, 28, 77, 21, 87, 85, 82, 10, 47, 43, 90, 57]
You can avoid a many-branched if-elif-else using np.searchsorted:
import numpy as np
def generate(x):
    boundaries = np.array([1, 2, 9, 15, 26, 51, 91, 151, 281])
    numbers = np.array([0, 1, 2, 3, 5, 8, 13, 20, 32])
    return [np.random.choice(j, n, False)+1 if j else np.array([], np.int64)
            for j, n in np.broadcast(x, numbers[boundaries.searchsorted(x, 'right')])]
# demo
from pprint import pprint
# single value
pprint(generate(17))
# multiple values in one go
pprint(generate([19, 75, 3, 1, 2, 0, 8, 9]))
# interactive
i = int(input('Enter number: '))
pprint(generate(i))
Sample output:
[array([ 9, 1, 14, 4, 12])]
[array([ 8, 12, 6, 17, 4]),
array([17, 29, 2, 20, 16, 37, 36, 13, 34, 58, 49, 72, 41]),
array([1, 3]),
array([1]),
array([2, 1]),
array([], dtype=int64),
array([1, 8]),
array([3, 2, 6])]
Enter number: 280
[array([184, 73, 80, 280, 254, 164, 192, 145, 176, 29, 58, 251, 37,
107, 5, 51, 7, 128, 142, 125, 135, 87, 259, 83, 260, 10,
108, 210, 8, 36, 181, 64])]
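To see how the `searchsorted` lookup replaces the if-elif chain, it may help to look at the lookup on its own, without the random sampling. The sketch below reuses the `boundaries` and `numbers` arrays from the answer; the `inputs`, `slots` and `counts` names are introduced here for illustration.

```python
import numpy as np

boundaries = np.array([1, 2, 9, 15, 26, 51, 91, 151, 281])
numbers = np.array([0, 1, 2, 3, 5, 8, 13, 20, 32])

inputs = np.array([0, 1, 8, 9, 75, 280])

# side='right' counts how many boundaries are <= each input,
# which is exactly the index of the bracket the input falls into
slots = boundaries.searchsorted(inputs, 'right')
counts = numbers[slots]

print(slots)    # [0 1 2 3 6 8]
print(counts)   # [ 0  1  2  3 13 32]
```

Note that an input of 281 or more maps to slot 9, which is past the end of `numbers`, so that case would need separate handling.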
How about:
import numpy as np

def gen_rand_array(n):
    mapping = np.array([[1,1],
                        [26,5],
                        [51,8],
                        [91,13]])
    k = mapping[np.max(np.where(n > mapping[:,0])),1]
    return np.random.choice(n+1,k)
Example:
>>> gen_rand_array(27)
array([ 0, 21, 26, 25, 23])
>>> gen_rand_array(27)
array([21, 5, 10, 3, 13])
>>> gen_rand_array(57)
array([30, 26, 50, 31, 44, 51, 39, 13])
>>> gen_rand_array(57)
array([21, 18, 35, 8, 13, 13, 20, 3])
Explanation:
The line k = mapping[np.max(np.where(n > mapping[:,0])),1] just looks up the number of random values needed in the array mapping. n > mapping[:,0] returns a boolean array that is True wherever the first-column value is smaller than n, and False otherwise. np.where(...) returns the indexes of the elements that are True. Since the values in the first column of mapping (i.e. mapping[:,0]) are ascending, we can find the index of the largest one that is less than n by calling np.max(...). Finally, we want the corresponding value from the second column, which is why we pass that result as an index to mapping again, i.e. mapping[...,1], where the 1 selects the second column.
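The lookup described above can be traced one step at a time. This is a small sketch using the same `mapping` array with `n = 27`; the intermediate names `mask`, `hits` and `row` are introduced here, and `np.where(...)[0]` is used to unpack the one-element tuple that `np.where` returns.

```python
import numpy as np

mapping = np.array([[1, 1],
                    [26, 5],
                    [51, 8],
                    [91, 13]])
n = 27

mask = n > mapping[:, 0]     # array([ True,  True, False, False])
hits = np.where(mask)[0]     # array([0, 1]): rows whose threshold is below n
row = hits.max()             # 1: the largest threshold still below n
k = mapping[row, 1]          # 5: number of random values to draw

print(mask, hits, row, k)
```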
I don't know how to implement it in your code, but this is how you can get the random numbers:
import random
x = 51
# startNumOfRandom and stopNumOfRandom are the bounds you choose for the random values
if x < 26:
    ar_Random = [None]*5
    for i in range(0, 5):
        ar_Random[i] = random.randint(startNumOfRandom, stopNumOfRandom)
elif x < 51:
    ar_Random = [None]*8
    for i in range(0, 8):
        ar_Random[i] = random.randint(startNumOfRandom, stopNumOfRandom)
...
I'm not sure how you're mapping the length to the input but this is how you generate N random numbers with a maximum using Numpy.
import numpy as np
# set entered_num and desired_length to whatever you want
random_nums = np.random.randint(entered_num, size = desired_length)
import random
Starting_Number = int(input())
if Starting_Number < 26:
    print(random.sample(range(1, 26), 5))
elif Starting_Number < 51:
    print(random.sample(range(1, 51), 8))
elif Starting_Number < 91:
    print(random.sample(range(1, 91), 13))
Here you go!!!
random.sample is the function you are looking for.
Have a good one!

Multiple indices for numpy array: IndexError: failed to coerce slice entry of type numpy.ndarray to integer

Is there a way to do multiple indexing in a numpy array as described below?
arr=np.array([55, 2, 3, 4, 5, 6, 7, 8, 9])
arr[np.arange(0,2):np.arange(5,7)]
output:
IndexError: too many indices for array
Desired output:
array([[55,2,3,4,5],[2,3,4,5,6]])
This problem might be similar to calculating a moving average over an array (but I want to do it without any function that is provided).
Here's an approach using strides -
start_index = np.arange(0,2)
L = 5 # Interval length
n = arr.strides[0]
strided = np.lib.stride_tricks.as_strided
out = strided(arr[start_index[0]:],shape=(len(start_index),L),strides=(n,n))
Sample run -
In [976]: arr
Out[976]: array([55, 52, 13, 64, 25, 76, 47, 18, 69, 88])
In [977]: start_index
Out[977]: array([2, 3, 4])
In [978]: L = 5
In [979]: out
Out[979]:
array([[13, 64, 25, 76, 47],
[64, 25, 76, 47, 18],
[25, 76, 47, 18, 69]])
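A possible alternative to the strides approach is plain integer (fancy) indexing: build a 2D array of indices by broadcasting the start positions against an offset range. This sketch uses the array from the question; the name `idx` is introduced here. Unlike the strided view, it returns a copy and does not require the start indices to be consecutive.

```python
import numpy as np

arr = np.array([55, 2, 3, 4, 5, 6, 7, 8, 9])
start_index = np.arange(0, 2)   # window start positions
L = 5                           # interval length

# each row of idx is start, start+1, ..., start+L-1
idx = start_index[:, None] + np.arange(L)
out = arr[idx]

print(out)
# [[55  2  3  4  5]
#  [ 2  3  4  5  6]]
```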

Numpy: find index of the elements within range

I have a numpy array of numbers, for example,
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
I would like to find all the indexes of the elements within a specific range. For instance, if the range is (6, 10), the answer should be (3, 4, 5). Is there a built-in function to do this?
You can use np.where to get indices and np.logical_and to set two conditions:
import numpy as np
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.where(np.logical_and(a>=6, a<=10))
# returns (array([3, 4, 5]),)
As in @deinonychusaur's reply, but even more compact:
In [7]: np.where((a >= 6) & (a <=10))
Out[7]: (array([3, 4, 5]),)
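Note that `np.where` with a single argument returns a tuple with one array per dimension, which is why the output above is wrapped in parentheses. If a plain index array is preferred, either take element `[0]` or use `np.flatnonzero`; a short sketch on the same array (the names `mask`, `idx_where` and `idx_flat` are introduced here):

```python
import numpy as np

a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
mask = (a >= 6) & (a <= 10)

idx_where = np.where(mask)[0]     # unpack the one-element tuple
idx_flat = np.flatnonzero(mask)   # same indices, returned as an array directly

print(idx_where)   # [3 4 5]
print(idx_flat)    # [3 4 5]
```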
Summary of the answers
To find out which answer is best, we can time the different solutions.
Unfortunately, the question was not well-posed, so there are answers to different questions; here I try to compare the answers to the same question. Given the array:
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
The answer should be the indexes of the elements within a given range (assumed inclusive), in this case 6 to 10.
answer = (3, 4, 5)
Corresponding to the values 6,9,10.
To test the best answer we can use this code.
import timeit
setup = """
import numpy as np
import numexpr as ne
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
# or test it with an array of a similar size
# a = np.random.rand(100)*23 # change the number to an estimate of your array size.
# we define the left and right limit
ll = 6
rl = 10
def sorted_slice(a,l,r):
    start = np.searchsorted(a, l, 'left')
    end = np.searchsorted(a, r, 'right')
    return np.arange(start,end)
"""
functions = ['sorted_slice(a,ll,rl)', # works only for sorted values
'np.where(np.logical_and(a>=ll, a<=rl))[0]',
'np.where((a >= ll) & (a <=rl))[0]',
'np.where((a>=ll)*(a<=rl))[0]',
'np.where(np.vectorize(lambda x: ll <= x <= rl)(a))[0]',
'np.argwhere((a>=ll) & (a<=rl)).T[0]', # we transpose to get a single row
'np.where(ne.evaluate("(ll <= a) & (a <= rl)"))[0]',]
functions2 = [
'a[np.logical_and(a>=ll, a<=rl)]',
'a[(a>=ll) & (a<=rl)]',
'a[(a>=ll)*(a<=rl)]',
'a[np.vectorize(lambda x: ll <= x <= rl)(a)]',
'a[ne.evaluate("(ll <= a) & (a <= rl)")]',
]
rdict = {}
for i in functions:
    rdict[i] = timeit.timeit(i,setup=setup,number=1000)
    print("%s -> %s s" %(i,rdict[i]))
print("Sorted:")
for w in sorted(rdict, key=rdict.get):
    print(w, rdict[w])
Results
The results are reported in the following plot for a small array (fastest solution at the top). As noted by @EZLearner, they may vary depending on the size of the array. sorted_slice could be faster for larger arrays, but it requires your array to be sorted; for arrays with more than 10 M entries, ne.evaluate could be an option. It is therefore always better to run this test with an array of the same size as yours:
If you want to extract the values instead of the indexes, you can run the tests using functions2, but the results are almost the same.
I thought I would add this because the a in the example you gave is sorted:
import numpy as np
a = [1, 3, 5, 6, 9, 10, 14, 15, 56]
start = np.searchsorted(a, 6, 'left')
end = np.searchsorted(a, 10, 'right')
rng = np.arange(start, end)
rng
# array([3, 4, 5])
a = np.array([1,2,3,4,5,6,7,8,9])
b = a[(a>2) & (a<8)]
Another way is:
np.vectorize(lambda x: 6 <= x <= 10)(a)
which returns:
array([False, False, False, True, True, True, False, False, False])
It is sometimes useful for masking time series, vectors, etc.
This code snippet returns all the numbers in a numpy array between two values:
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56] )
a[(a>6)*(a<10)]
It works as follows:
(a>6) returns a numpy array with True (1) and False (0), so does (a<10). By multiplying these two together you get an array with either a True, if both statements are True (because 1x1 = 1) or False (because 0x0 = 0 and 1x0 = 0).
The part a[...] returns all values of array a where the array between brackets returns a True statement.
Of course you can make this more complicated by saying for instance
...*~(a<10)
which is similar to an "and Not" statement.
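The masking logic described above can be checked on the example array. The sketch below shows that multiplying two boolean arrays behaves like a logical AND, and writes the "and not" case with `&` and `~`, which keeps the result a boolean mask; the variable names are chosen here for illustration.

```python
import numpy as np

a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])

both = (a > 6) * (a < 10)        # product of two boolean arrays
same = (a > 6) & (a < 10)        # the equivalent logical AND
and_not = (a > 6) & ~(a < 10)    # greater than 6 AND NOT less than 10

print(a[both])      # [9]
print(a[and_not])   # [10 14 15 56]
```

The parentheses around each comparison matter, because `*`, `&` and `~` bind more tightly than `<` and `>`.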
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.argwhere((a>=6) & (a<=10))
Wanted to add numexpr into the mix:
import numpy as np
import numexpr as ne
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.where(ne.evaluate("(6 <= a) & (a <= 10)"))[0]
# array([3, 4, 5], dtype=int64)
This only makes sense for larger arrays with millions of elements, or if you are hitting memory limits.
This may not be the prettiest, but works for any dimension
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
ranges = (0,4), (0,4)
def conditionRange(X : np.ndarray, ranges : list) -> np.ndarray:
    # start with every row, then keep only the rows that pass each column's range
    idx = set(range(X.shape[0]))
    for column, r in enumerate(ranges):
        tmp = np.where(np.logical_and(X[:, column] >= r[0], X[:, column] <= r[1]))[0]
        idx &= set(tmp.tolist())
    # sort so the rows come back in their original order
    idx = np.array(sorted(idx), dtype=int)
    return X[idx, :]
b = conditionRange(a, ranges)
print(b)
s=[52, 33, 70, 39, 57, 59, 7, 2, 46, 69, 11, 74, 58, 60, 63, 43, 75, 92, 65, 19, 1, 79, 22, 38, 26, 3, 66, 88, 9, 15, 28, 44, 67, 87, 21, 49, 85, 32, 89, 77, 47, 93, 35, 12, 73, 76, 50, 45, 5, 29, 97, 94, 95, 56, 48, 71, 54, 55, 51, 23, 84, 80, 62, 30, 13, 34]
dic={}
for i in range(0,len(s),10):
    dic[i,i+10]=list(filter(lambda x:((x>=i)&(x<i+10)),s))
print(dic)
for keys,values in dic.items():
    print(keys)
    print(values)
Output:
(0, 10)
[7, 2, 1, 3, 9, 5]
(20, 30)
[22, 26, 28, 21, 29, 23]
(30, 40)
[33, 39, 38, 32, 35, 30, 34]
(10, 20)
[11, 19, 15, 12, 13]
(40, 50)
[46, 43, 44, 49, 47, 45, 48]
(60, 70)
[69, 60, 63, 65, 66, 67, 62]
(50, 60)
[52, 57, 59, 58, 50, 56, 54, 55, 51]
You can use np.clip() to achieve the same:
a = [1, 3, 5, 6, 9, 10, 14, 15, 56]
np.clip(a,6,10)
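However, it does not drop the out-of-range values: values less than 6 are replaced by 6 and values greater than 10 are replaced by 10, so it returns neither the indexes nor only the in-range elements.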
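The difference between clipping and filtering is easy to see side by side. A short sketch on the same array (the names `clipped` and `filtered` are introduced here):

```python
import numpy as np

a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])

clipped = np.clip(a, 6, 10)            # same length, values squeezed into [6, 10]
filtered = a[(a >= 6) & (a <= 10)]     # only the elements that were in range

print(clipped)    # [ 6  6  6  6  9 10 10 10 10]
print(filtered)   # [ 6  9 10]
```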
