How do I remove every nth element in an array?
import numpy as np
x = np.array([0,10,27,35,44,32,56,35,87,22,47,17])
n = 3 # remove every 3rd element
...something like the opposite of x[0::n]? I've tried this, but of course it doesn't work:
for i in np.arange(0,len(x),n):
    x = np.delete(x,i)
You're close. Pass the entire arange to np.delete as the indices to remove, instead of attempting to delete each element in turn (every single deletion shifts the indices of the elements that follow), e.g.:
import numpy as np
x = np.array([0,10,27,35,44,32,56,35,87,22,47,17])
x = np.delete(x, np.arange(0, x.size, 3))
# [10 27 44 32 35 87 47 17]
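A boolean mask is another way to express "the opposite of x[0::n]". This is only a sketch of the same idea; the name keep is made up for illustration:

```python
import numpy as np

x = np.array([0, 10, 27, 35, 44, 32, 56, 35, 87, 22, 47, 17])
n = 3

# Start with "keep everything", then switch off every n-th position.
keep = np.ones(x.size, dtype=bool)
keep[::n] = False

result = x[keep]
print(result)  # [10 27 44 32 35 87 47 17]
```

The mask version works for any array length, since no reshaping is involved.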
I'll just add another way, using reshaping, which works if the length of your array is a multiple of n:
import numpy as np
x = np.array([0,10,27,35,44,32,56,35,87,22,47,17])
x = x.reshape(-1,3)[:,1:].flatten()
# [10 27 44 32 35 87 47 17]
On my computer it runs almost twice as fast as the solution with np.delete (between 1.8x and 1.9x, to be honest).
You can also easily perform fancier operations, such as deleting m values out of every n.
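For instance, dropping the first m values of every group of n could look like the sketch below. The values n = 4 and m = 2 are picked only for illustration, and the array length still has to be a multiple of n:

```python
import numpy as np

x = np.array([0, 10, 27, 35, 44, 32, 56, 35, 87, 22, 47, 17])
n, m = 4, 2  # hypothetical values: drop the first 2 of every 4 elements

# One row per group of n; slicing off the first m columns drops them.
result = x.reshape(-1, n)[:, m:].ravel()
print(result)  # [27 35 56 35 47 17]
```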
Here's a super fast version for 2D arrays: remove every m-th row and n-th column from a 2D array (assuming the shape of the array is a multiple of (m, n)):
array2d = np.arange(60).reshape(6, 10)
m, n = (3, 5)
remove = lambda x, q: x.reshape(x.shape[0], -1, q)[..., 1:].reshape(x.shape[0], -1).T
remove(remove(array2d, n), m)
returns:
array([[11, 12, 13, 14, 16, 17, 18, 19],
[21, 22, 23, 24, 26, 27, 28, 29],
[41, 42, 43, 44, 46, 47, 48, 49],
[51, 52, 53, 54, 56, 57, 58, 59]])
To generalize for any shape use padding or reduce the input array depending on your situation.
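As a rough sketch of the padding idea, here is the 1D case; the helper name drop_every_nth is hypothetical. The array is padded up to a multiple of n, reshaped, and the padding is cut off again at the end:

```python
import numpy as np

def drop_every_nth(x, n):
    """Drop elements at indices 0, n, 2n, ... of a 1D array of any length."""
    pad = (-x.size) % n  # elements needed to reach a multiple of n
    padded = np.concatenate([x, np.zeros(pad, dtype=x.dtype)])
    kept = padded.reshape(-1, n)[:, 1:].ravel()
    # The padding sits in the last row after its first column,
    # so it always ends up at the tail of the result.
    return kept[:kept.size - pad]

x = np.arange(10)
print(drop_every_nth(x, 3))  # [1 2 4 5 7 8]
```

The same trick should extend to 2D by padding rows and columns separately, but that is not shown here.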
Speed comparison:
from time import time
'remove'
start = time()
for _ in range(100000):
    res = remove(remove(array2d, n), m)
time() - start
'delete'
start = time()
for _ in range(100000):
    tmp = np.delete(array2d, np.arange(0, array2d.shape[0], m), axis=0)
    res = np.delete(tmp, np.arange(0, array2d.shape[1], n), axis=1)
time() - start
"""
'remove'
0.3835930824279785
'delete'
3.173515558242798
"""
So, compared to numpy.delete the above method is significantly faster.
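If you want to reproduce the comparison yourself, the standard-library timeit module is a bit more convenient than a hand-written loop. The sketch below first checks that both approaches return the same array; the wrapper names with_reshape and with_delete are made up for illustration, and the timings will differ from machine to machine:

```python
import timeit
import numpy as np

array2d = np.arange(60).reshape(6, 10)
m, n = 3, 5

def remove(x, q):
    # Drop the first column of every group of q columns, then transpose.
    return x.reshape(x.shape[0], -1, q)[..., 1:].reshape(x.shape[0], -1).T

def with_reshape():
    return remove(remove(array2d, n), m)

def with_delete():
    tmp = np.delete(array2d, np.arange(0, array2d.shape[0], m), axis=0)
    return np.delete(tmp, np.arange(0, array2d.shape[1], n), axis=1)

# Both approaches should produce the same result before we time them.
same = np.array_equal(with_reshape(), with_delete())
print(same)

for fn in (with_reshape, with_delete):
    print(fn.__name__, timeit.timeit(fn, number=10_000))
```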
I need to generate a specialized NumPy array. Assume I have a function:
def gen_array(start, end, n_cols):
It should behave like this, generating three columns where each column goes from start (inclusive) to end (exclusive):
>>> gen_array(20, 25, 3)
array([[20, 20, 20],
[21, 21, 21],
[22, 22, 22],
[23, 23, 23],
[24, 24, 24]])
My rather naïve implementation looks like this:
def gen_array(start, end, n_columns):
    a = np.arange(start, end).reshape(end-start, 1) # create a column vector from start to end
    return np.dot(a, [np.ones(n_columns)]) # replicate across n_columns
(It's okay, though not required, that the np.dot converts values to floats.)
I'm sure there's a better, more efficient and more numpy-ish way to accomplish the same thing. Suggestions?
Update
Building on a suggestion by @msi_gerva to use np.tile, my latest best thought is:
def gen_array(start, end, n_cols):
    return np.tile(np.arange(start, end).reshape(-1, 1), (1, n_cols))
... which seems pretty good to me.
In addition to numpy.arange and numpy.reshape, use numpy.repeat to extend your data.
import numpy as np
def gen_array(start, end, n_cols):
    return np.arange(start, end).repeat(n_cols).reshape(-1, n_cols)
print(gen_array(20, 25, 3))
# [[20 20 20]
# [21 21 21]
# [22 22 22]
# [23 23 23]
# [24 24 24]]
The simplest I found:
The [:,None] adds a dimension to the array.
np.arange(start, end)[:,None]*np.ones(n_cols)
np.arange(start, end)[:, np.newaxis].repeat(n_cols, axis=1)
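If a read-only result is acceptable, np.broadcast_to avoids copying the column at all. This is just a sketch of that option, not something from the answers above; call .copy() on the result if you need to write to it:

```python
import numpy as np

def gen_array(start, end, n_cols):
    column = np.arange(start, end)[:, None]  # shape (end - start, 1)
    # broadcast_to returns a read-only view that repeats the column
    # without allocating n_cols copies of it.
    return np.broadcast_to(column, (column.shape[0], n_cols))

view = gen_array(20, 25, 3)
print(view)
print(view.flags.writeable)  # False
```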
I wish to create a variable array of numbers in numpy while skipping a chunk of numbers. For instance, if I have the variables:
m = 5
k = 3
num = 50
I want to create a linearly spaced numpy array starting at num and ending at num - k, skip k numbers and continue the array generation. Then repeat this process m times. For example, the above would yield:
np.array([50, 49, 48, 47, 44, 43, 42, 41, 38, 37, 36, 35, 32, 31, 30, 29, 26, 25, 24, 23])
How can I accomplish this via Numpy?
You can try:
import numpy as np
m = 5
k = 3
num = 50
np.hstack([np.arange(num - 2*i*k, num - (2*i+1)*k - 1, -1) for i in range(m)])
It gives:
array([50, 49, 48, 47, 44, 43, 42, 41, 38, 37, 36, 35, 32, 31, 30, 29, 26,
25, 24, 23])
Edit:
@JanChristophTerasa posted an answer (now deleted) that avoided Python loops by masking some elements of an array obtained using np.arange(). Here is a solution inspired by that idea. It is much faster than the one above:
import numpy as np
m = 5
k = 3
num = 50
x = np.arange(num, num - 2*k*m , -1).reshape(-1, 2*k)
x[:, :k+1].ravel()
We can use a mask and np.tile:
def mask_and_tile(m=5, k=3, num=50):
    a = np.arange(num, num - 2 * m * k, -1) # create numbers
    mask = np.ones(k * 2, dtype=bool) # create mask
    mask[k+1:] = False # set appropriate elements to False
    mask = np.tile(mask, m) # repeat mask m times
    result = a[mask] # mask our numbers
    return result
Or we can use a mask and just toggle the appropriate element:
def mask(m=5, k=3, num=50):
    a = np.arange(num, num - 2 * m * k, -1) # create numbers
    mask = np.ones_like(a, dtype=bool).reshape(-1, k)
    mask[1::2] = False
    mask[1::2, 0] = True
    result = a[mask.flatten()]
    return result
This will work fine:
import numpy as np
m = 5
k = 3
num = 50
h=0
x = np.array([])
for i in range(m):
    x = np.append(x, range(num-h,num-h-k-1,-1))
    h+=2*k
print(x)
Output
[50. 49. 48. 47. 44. 43. 42. 41. 38. 37. 36. 35. 32. 31. 30. 29. 26. 25.
24. 23.]
One way of doing this is making a 2D grid and calculating each number based on its position in the grid, then flattening it to a 1D array.
import numpy as np
num=50
m=5
k=3
# coordinates in a grid of width k+1 and height m
y, x = np.mgrid[:m, :k+1]
# a=[[50-0, 50-1, 50-2, 50-3], [50-0-2*3*1, 50-1-2*3*1, ...], [50-0-2*3*2...]...]
a = num - x - 2 * k * y
print(a.ravel())
Is there any way to randomly pick indices of a NumPy array within constant intervals?
For example,
I have an array with shape (1, 150), that is, 5*30 elements. I want to randomly pick x indices, where x<=30, from each block of 30 elements. So, in total, I will have x*5 randomly picked indices from the array.
First, I tried np.random.choice, but I can't use it directly because it doesn't take the constant interval into account.
I could loop over the blocks, but I feel that's not an efficient way.
Is there a way to do this in NumPy?
I tried this, and it gives the required result, but I want to improve the code:
frames_picker = np.zeros(30*5)
samples=[]
for i in range(5):
    sample = (np.random.choice(frames_picker[0: 30].shape[0], 5, replace=False))+(i*30)
    samples.append(sample)
samples=np.array(samples)
frames_picker[np.sort(samples)]=1
>>> np.random.choice(30, size=(5,5), replace=False) + np.arange(0, 150, 30)[:,None]
array([[ 18, 28, 13, 6, 8],
[ 40, 56, 44, 57, 32],
[ 83, 71, 65, 81, 64],
[114, 115, 97, 90, 106],
[137, 121, 129, 142, 149]])
You can then flatten and sort them. This gives you 5 random indices from each interval [0, 29], [30, 59], ..., [120, 149].
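One caveat with the one-liner: with replace=False and size=(5,5) it draws 25 distinct values from 0..29 for the whole matrix, so the rows are not independent of each other, and it raises an error as soon as 5*x exceeds 30. A possible loop-free alternative is to rank a matrix of random numbers row by row, which gives an independent permutation per block. This is only a sketch; the seed and x = 12 are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable
n_blocks, block_len, x = 5, 30, 12

# Ranking random numbers within each row turns every row into an
# independent random permutation of 0..block_len-1.
perms = rng.random((n_blocks, block_len)).argsort(axis=1)

# Keep the first x entries of each permutation and shift them into their block.
picks = np.sort(perms[:, :x], axis=1) + block_len * np.arange(n_blocks)[:, None]

frames_picker = np.zeros(n_blocks * block_len)
frames_picker[picks.ravel()] = 1
```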
You can convert the 1D array to a more suitable 2D matrix and then apply sampling. For example:
data = np.array(...) # 1D array with 5*30 elements
m = np.asmatrix(np.split(data, 5)) # chunk and convert to a matrix
sample = m[np.random.randint(m.shape[0], size=30), :] # get random 30 slices from the matrix
np.ravel(sample) # convert back to 1D array
I'm given a problem that explicitly asks me not to use numpy and pandas.
Problem: Select an element from the list A randomly, with probability proportional to its magnitude. Assume we run the same experiment 100 times with replacement; in each experiment you print a number that is selected randomly from A.
Ex 1: A = [0 5 27 6 13 28 100 45 10 79]
let f(x) denote the number of times x getting selected in 100 experiments.
f(100) > f(79) > f(45) > f(28) > f(27) > f(13) > f(10) > f(6) > f(5) > f(0)
Initially, I took the sum of all the elements of list A
I then divided each element of list A by the sum (in order to normalize) and stored each of these values in another list (d_dash).
I then created another empty list (d_bar) that holds the cumulative sum of all elements of d_dash.
I created a variable r, where r = random.uniform(0.0,1.0), and then, over the length of d_dash, compared r to d_bar[k]; if r<=d_bar[k], return A[k].
However, I'm getting the error "list index out of range" near d_dash[j].append((A[j]/sum)). I'm not sure what the issue is, as I did not exceed the index of either d_dash or A[j].
Also, is my logic correct? Suggestions for a better way to do this would be appreciated.
Thanks in advance.
import random
A = [0,5,27,6,13,28,100,45,10,79]
def propotional_sampling(A):
    sum=0
    for i in range(len(A)):
        sum = sum + A[i]
    d_dash=[]
    for j in range(len(A)):
        d_dash[j].append((A[j]/sum))
    #cumulative sum
    d_bar =[]
    d_bar[0]= 0
    for k in range(len(A)):
        d_bar[k] = d_bar[k] + d_dash[k]
    r = random.uniform(0.0,1.0)
    number=0
    for p in range(len(d_bar)):
        if(r<=d_bar[p]):
            number=d_bar[p]
    return number
def sampling_based_on_magnitued():
    for i in range(1,100):
        number = propotional_sampling(A)
        print(number)
sampling_based_on_magnitued()
Below is code that does this:
import random
A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
#Sum of all the elements in the array
S = sum(A)
#Calculating normalized sum
norm_sum = [ele/S for ele in A]
#Calculating cumulative normalized sum
cum_norm_sum = []
cum_norm_sum.append(norm_sum[0])
for itr in range(1, len(norm_sum), 1) :
    cum_norm_sum.append(cum_norm_sum[-1] + norm_sum[itr])
def prop_sampling(cum_norm_sum) :
    """
    This function returns an element
    with proportional sampling.
    """
    r = random.random()
    for itr in range(len(cum_norm_sum)) :
        if r < cum_norm_sum[itr] :
            return A[itr]
#Sampling 1000 elements from the given list with proportional sampling
sampled_elements = []
for itr in range(1000) :
    sampled_elements.append(prop_sampling(cum_norm_sum))
The image below shows the frequency of each element in the sampled points:
Clearly, the number of times each element appears is proportional to its magnitude.
The cumulative sum can be computed with itertools.accumulate. The loop:
for p in range(len(d_bar)):
    if(r<=d_bar[p]):
        number=d_bar[p]
can be replaced by bisect.bisect():
import random
from itertools import accumulate
from bisect import bisect
A = [0,5,27,6,13,28,100,45,10,79]
def propotional_sampling(A, n=100):
    # calculate cumulative sum from A:
    cum_sum = [*accumulate(A)]
    # cum_sum = [0, 5, 32, 38, 51, 79, 179, 224, 234, 313]
    out = []
    for _ in range(n):
        i = random.random() # i = [0.0, 1.0)
        idx = bisect(cum_sum, i*cum_sum[-1]) # get index to list A
        out.append(A[idx])
    return out
print(propotional_sampling(A))
Prints (for example):
[10, 100, 100, 79, 28, 45, 45, 27, 79, 79, 79, 79, 100, 27, 100, 100, 100, 13, 45, 100, 5, 100, 45, 79, 100, 28, 79, 79, 6, 45, 27, 28, 27, 79, 100, 79, 79, 28, 100, 79, 45, 100, 10, 28, 28, 13, 79, 79, 79, 79, 28, 45, 45, 100, 28, 27, 79, 27, 45, 79, 45, 100, 28, 100, 100, 5, 100, 79, 28, 79, 13, 100, 100, 79, 28, 100, 79, 13, 27, 100, 28, 10, 27, 28, 100, 45, 79, 100, 100, 100, 28, 79, 100, 45, 28, 79, 79, 5, 45, 28]
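If the exercise only forbids numpy and pandas, the standard library's random.choices (Python 3.6+) already does weighted sampling with replacement, so the whole thing can shrink to one call. A minimal sketch, with the seed set only so the run is repeatable:

```python
import random
from collections import Counter

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

random.seed(0)  # arbitrary seed, only for repeatability
# Each element is its own weight, so larger values are drawn more often.
samples = random.choices(A, weights=A, k=100)

counts = Counter(samples)
print(counts.most_common(3))
```

Since 0 has weight 0, it is never selected, which matches the f(0) end of the ordering in the question.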
The reason you got the "list index out of range" message is that you created an empty list, "d_bar =[]", and then started assigning values to it by index, "d_bar[k] = d_bar[k] + d_dash[k]". The same applies to "d_dash[j].append(...)", which indexes into the empty list d_dash. I recommend using the following structure instead:
First, define it this way:
d_bar=[0 for i in range(len(A))]
Also, I believe this code will always return 1, as there is no break in the loop. You can resolve this issue by adding "break". Here is an updated version of your code:
import random
A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
def pick_a_number_from_list(A):
    sum=0
    for i in A:
        sum+=i
    A_norm=[]
    for j in A:
        A_norm.append(j/sum)
    A_cum=[0 for i in range(len(A))]
    A_cum[0]=A_norm[0]
    for k in range(len(A_norm)-1):
        A_cum[k+1]=A_cum[k]+A_norm[k+1]
    r = random.uniform(0.0,1.0)
    number=0
    for p in range(len(A_cum)):
        if(r<=A_cum[p]):
            number=A[p]
            break
    return number
def sampling_based_on_magnitued():
    for i in range(100):  # 100 experiments
        number = pick_a_number_from_list(A)
        print(number)
sampling_based_on_magnitued()
edit: it's an image so the suggested (How can I efficiently process a numpy array in blocks similar to Matlab's blkproc (blockproc) function) isn't really working for me
I have the following matlab code
fun = @(block_struct) ...
    std2(block_struct.data) * ones(size(block_struct.data));
B=blockproc(im2double(Icorrected), [4 4], fun);
I want to rewrite my code, but this time in Python. I have installed Scikit and I'm trying to work around it like this:
b = np.std(a, axis = 2)
The problem, of course, is that I'm not applying the std to a number of blocks, as above.
How can I do something like this? Start a loop and call the function for each X*X block? Then I wouldn't keep the original size.
Is there another, more efficient way?
If there is no overlap in the windows you can reshape the data to suit your needs:
Find the mean of 3x3 windows of a 9x9 array.
>>> import numpy as np
>>> a = np.arange(81).reshape(9, 9)
>>> a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59, 60, 61, 62],
[63, 64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79, 80]])
Find the new shape
>>> window_size = (3,3)
>>> (a.shape[0] // window_size[0], window_size[0], a.shape[1] // window_size[1], window_size[1])
(3, 3, 3, 3)
>>> b = a.reshape(3,3,3,3)
Find the mean along axes 1 and 3 (the axes that index positions within each window).
>>> b.mean(axis = (1,3))
array([[ 10., 13., 16.],
[ 37., 40., 43.],
[ 64., 67., 70.]])
>>>
2x2 windows of a 4x4 array:
>>> a = np.arange(16).reshape((4,4))
>>> window_size = (2,2)
>>> (a.shape[0] // window_size[0], window_size[0], a.shape[1] // window_size[1], window_size[1])
(2, 2, 2, 2)
>>> b = a.reshape(2,2,2,2)
>>> b.mean(axis = (1,3))
array([[ 2.5, 4.5],
[ 10.5, 12.5]])
>>>
It won't work if the window size doesn't divide the array size evenly. In that case you need some overlap in the windows. If you need overlapping windows, numpy.lib.stride_tricks.as_strided is the way to go; a generic N-D function can be found at Efficient Overlapping Windows with Numpy.
Another option for 2D arrays is sklearn.feature_extraction.image.extract_patches_2d, and for N-D arrays, sklearn.feature_extraction.image.extract_patches. Each manipulates the array's strides to produce the patches/windows.
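Applied to the original question (a per-block standard deviation, expanded back to the image size), the reshape idea might look like the sketch below. The function name block_std is hypothetical, and it assumes a 2D (grayscale) array whose sides are multiples of the block size. As far as I know, MATLAB's std2 uses N-1 normalization, in which case ddof=1 would be the closer match; treat that as an assumption to verify:

```python
import numpy as np

def block_std(img, block=(4, 4), ddof=0):
    """Std of each non-overlapping block, repeated to the input's shape."""
    h, w = img.shape
    bh, bw = block
    if h % bh or w % bw:
        raise ValueError("image sides must be multiples of the block size")
    # Axes: (block row, row within block, block column, column within block).
    blocks = img.reshape(h // bh, bh, w // bw, bw)
    stds = blocks.std(axis=(1, 3), ddof=ddof)
    # Expand each block's value back over the pixels of that block.
    return np.repeat(np.repeat(stds, bh, axis=0), bw, axis=1)

img = np.arange(16, dtype=float).reshape(4, 4)
out = block_std(img, block=(2, 2))
print(out.shape)  # (4, 4)
```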
I did the following:
import numpy as np
from skimage import io  # assuming io is skimage.io
io.use_plugin('pil', 'imread')
a = io.imread(r'C:\Users\Dimitrios\Desktop\polimesa\arizona.jpg')
B = np.zeros((len(a)//2 + 1, len(a[0])//2 + 1))
x = []  # values of the current 2x2 block
for i in range(0, len(a), 2):
    for j in range(0, len(a[0]), 2):
        x.append(a[i][j])
        if i+1 < len(a):
            x.append(a[i+1][j])
        if j+1 < len(a[0]):
            x.append(a[i][j+1])
        if i+1 < len(a) and j+1 < len(a[0]):
            x.append(a[i+1][j+1])
        B[i//2][j//2] = np.std(x)
        x[:] = []
and I think it's correct: iterate over the image in steps of 2, take each neighbouring pixel, add them to a list, and calculate the std.
Edit: later changed to 4x4 blocks.
We can implement blockproc() in python the following way:
def blockproc(im, block_sz, func):
    h, w = im.shape
    m, n = block_sz
    for x in range(0, h, m):
        for y in range(0, w, n):
            block = im[x:x+m, y:y+n]
            block[:,:] = func(block)
    return im
Now, let's apply it to implement contrast enhancement with local histogram equalization, with the low-contrast moon image (of size 512x512) as input and choosing 64x64 blocks:
from skimage import data, exposure
img = data.moon()
img = img / img.max()
m, n = 64, 64
img_eq = blockproc(img.copy(), (m, n), exposure.equalize_hist)
Display the input and output images:
Note that the function does in-place modification to the image, hence a copy of the input image is passed instead.
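If you would rather not rely on passing a copy, a variant that writes into a fresh output array is one option. The sketch below (the name blockproc_copy is made up) leaves the input untouched and, as a usage example, fills each block with its own standard deviation, roughly what the MATLAB snippet in the question does. It assumes a 2D array and a func that returns something broadcastable to the block's shape:

```python
import numpy as np

def blockproc_copy(im, block_sz, func):
    """Apply func to each block and write the result into a new float array."""
    h, w = im.shape
    m, n = block_sz
    out = np.empty((h, w), dtype=float)
    for r in range(0, h, m):
        for c in range(0, w, n):
            # Slices are clipped at the border, so edge blocks may be smaller.
            out[r:r+m, c:c+n] = func(im[r:r+m, c:c+n])
    return out

im = np.arange(36, dtype=float).reshape(6, 6)
res = blockproc_copy(im, (4, 4), lambda b: np.std(b) * np.ones_like(b))
print(res.shape)  # (6, 6)
```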