How can I get the second minimum value from each column? I have this array:
A = [[72 76 44 62 81 31]
[54 36 82 71 40 45]
[63 59 84 36 34 51]
[58 53 59 22 77 64]
[35 77 60 76 57 44]]
I wish to have output like:
A = [54 53 59 36 40 44]
Try this, in just one line:
[sorted(i)[1] for i in zip(*A)]
in action:
In [12]: A = [[72, 76, 44, 62, 81, 31],
...: [54 ,36 ,82 ,71 ,40, 45],
...: [63 ,59, 84, 36, 34 ,51],
...: [58, 53, 59, 22, 77 ,64],
...: [35 ,77, 60, 76, 57, 44]]
In [18]: [sorted(i)[1] for i in zip(*A)]
Out[18]: [54, 53, 59, 36, 40, 44]
zip(*A) transposes your list of lists, so the columns become rows.
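As a small illustration of that transposition, here is a sketch on a made-up 2x3 list (the values are not from the question):

```python
# Hypothetical toy input, chosen only to make the transposition easy to see.
A = [[1, 2, 3],
     [4, 5, 6]]

# zip(*A) unpacks the rows as separate arguments and pairs up their
# elements position by position, so each tuple is one column of A.
columns = list(zip(*A))
print(columns)  # [(1, 4), (2, 5), (3, 6)]
```

Each tuple can then be passed to sorted() on its own, which is what the one-liner above does.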
If a column contains duplicate values, for example:
In [19]: A = [[72, 76, 44, 62, 81, 31],
...: [54 ,36 ,82 ,71 ,40, 45],
...: [63 ,59, 84, 36, 34 ,51],
...: [35, 53, 59, 22, 77 ,64], # 35
...: [35 ,77, 50, 76, 57, 44],] # 35
If you need to skip both 35s, you can use set():
In [29]: [sorted(list(set(i)))[1] for i in zip(*A)]
Out[29]: [54, 53, 50, 36, 40, 44]
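One caveat with the set() approach: a column whose values are all equal has only one distinct value, so indexing with [1] would raise IndexError. A minimal sketch of a guard, assuming that returning None is acceptable in that case (the helper name second_min_distinct and the sample data are hypothetical):

```python
def second_min_distinct(column):
    """Return the second-smallest distinct value, or None if there is none."""
    distinct = sorted(set(column))
    return distinct[1] if len(distinct) > 1 else None


# Hypothetical data: the middle column holds a single repeated value.
A = [[35, 7, 4],
     [35, 7, 9],
     [54, 7, 1]]

result = [second_min_distinct(col) for col in zip(*A)]
print(result)  # [54, None, 4]
```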
Operations on numpy arrays should be done with numpy functions, so look at this one:
np.sort(A, axis=0)[1, :]
Out[61]: array([54, 53, 59, 36, 40, 44])
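If a full sort is more work than needed, np.partition might be an alternative; it is not what the answer above uses. This sketch assumes A is the array from the question and that duplicates count as separate values:

```python
import numpy as np

A = np.array([[72, 76, 44, 62, 81, 31],
              [54, 36, 82, 71, 40, 45],
              [63, 59, 84, 36, 34, 51],
              [58, 53, 59, 22, 77, 64],
              [35, 77, 60, 76, 57, 44]])

# With kth=1, the element at index 1 along axis 0 ends up where it would be
# in a fully sorted column; the other elements are only partially ordered.
second_min = np.partition(A, 1, axis=0)[1]
print(second_min)  # [54 53 59 36 40 44]
```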
You can use heapq.nsmallest:
from heapq import nsmallest
[nsmallest(2, e)[-1] for e in zip(*A)]
output:
[54, 53, 50, 36, 40, 44]
I added a simple benchmark to compare the performance of the different solutions already posted:
import heapq
from heapq import nsmallest
from random import randint

import numpy as np
from simple_benchmark import BenchmarkBuilder

b = BenchmarkBuilder()

@b.add_function()
def MehrdadPedramfar(A):
    return [sorted(i)[1] for i in zip(*A)]

@b.add_function()
def NicolasGervais(A):
    return np.sort(A, axis=0)[1, :]

@b.add_function()
def imcrazeegamerr(A):
    rotated = zip(*A[::-1])
    result = []
    for arr in rotated:
        # sort each 1d array from min to max
        arr = sorted(list(arr))
        # add the second minimum value to result array
        result.append(arr[1])
    return result

@b.add_function()
def Daweo(A):
    return np.apply_along_axis(lambda x: heapq.nsmallest(2, x)[-1], 0, A)

@b.add_function()
def kederrac(A):
    return [nsmallest(2, e)[-1] for e in zip(*A)]

@b.add_arguments('Number of row/cols (A is square matrix)')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp
        yield size, [[randint(0, 1000) for _ in range(size)] for _ in range(size)]

r = b.run()
r.plot()
In this benchmark, zip with sorted is the fastest solution for small 2D lists, while zip with heapq.nsmallest performs best on large 2D lists.
I hope I understood your question correctly. Either way, here's my solution. I'm sure there is a more elegant way of doing this, but it works:
A = [[72, 76, 44, 62, 81, 31],
     [54, 36, 82, 71, 40, 45],
     [63, 59, 84, 36, 34, 51],
     [58, 53, 59, 22, 77, 64],
     [35, 77, 50, 76, 57, 44]]
# rotate the array 90deg
rotated = zip(*A[::-1])
result = []
for arr in rotated:
    # sort each 1d array from min to max
    arr = sorted(list(arr))
    # add the second minimum value to result array
    result.append(arr[1])
print(result)
Assuming that A is a numpy.array (if so, please consider adding the numpy tag to your question), you might use apply_along_axis in the following way:
import heapq
import numpy as np
A = np.array([[72, 76, 44, 62, 81, 31],
[54, 36, 82, 71, 40, 45],
[63, 59, 84, 36, 34, 51],
[58, 53, 59, 22, 77, 64],
[35, 77, 60, 76, 57, 44]])
second_mins = np.apply_along_axis(lambda x:heapq.nsmallest(2,x)[-1], 0, A)
print(second_mins) # [54 53 59 36 40 44]
Note that I used heapq.nsmallest as it does as much sorting as required to get 2 smallest elements, unlike sorted which does complete sort.
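To make the comparison concrete, here is a small sketch on a single column (the first column of the question's array) showing that both approaches agree on the result:

```python
from heapq import nsmallest

# First column of the array from the question.
column = [72, 54, 63, 58, 35]

# nsmallest(2, ...) returns the two smallest values in ascending order,
# so the last of them is the second minimum.
two_smallest = nsmallest(2, column)
second_min = two_smallest[-1]

print(two_smallest)  # [35, 54]
print(second_min)    # 54
print(second_min == sorted(column)[1])  # True
```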
>>> A = np.arange(30).reshape(5,6).tolist()
>>> A
[[0, 1, 2, 3, 4, 5],
[6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]]
Updated:
Use set to remove duplicates, and transpose the list using zip(*A):
>>> [sorted(set(items))[1] for items in zip(*A)]
[6, 7, 8, 9, 10, 11]
old: second minimum item in each row
>>> [sorted(set(items))[1] for items in A]
[1, 7, 13, 19, 25]
I have a problem with sorting a matrix.
I need to create an MxM matrix, where M comes from user input, as nested lists filled using randrange.
from random import randrange
matrix_size = int(input("Enter size of the matrix: "))
matrix = [[randrange(1, 51) for column in range(matrix_size)] for row in range(matrix_size)]
As the next step I should find the sum of each column of the matrix, so I do this:
for i in range(matrix_size):
    sum_column = 0
    for j in range(matrix_size):
        sum_column += matrix[j][i]
        print(f'{matrix[i][j]:>5}', end='')
    print(f'{sum_column:>5}')
The problem is that I should add the row of sums at the end of the matrix. But this is what I get:
Enter the size of the matrix: 5
15 23 14 22 20 73
7 26 26 27 27 160
17 36 9 13 42 104
1 32 41 2 29 113
33 43 14 49 12 130
The sums are computed correctly, but how can I append them to the end of the matrix and then sort the matrix in ascending order of the column sums? I hope this makes clear what I need. Thanks.
Do you mean something like this?
import numpy as np
matrix = np.array(matrix)
rowsum = matrix.sum(axis=1) # sum of rows
idx = np.argsort(rowsum) # permutation that makes rowsum sorted
result = np.hstack([matrix, rowsum[:, None]]) # join matrix and rowsum
result = result[idx] # sort rows in ascending order
For the matrix
array([[31, 13, 29, 5, 1],
[21, 9, 34, 31, 22],
[13, 38, 29, 20, 50],
[21, 12, 26, 5, 15],
[19, 24, 38, 44, 41]])
the output would be:
array([[ 31, 13, 29, 5, 1, 79],
[ 21, 12, 26, 5, 15, 79],
[ 21, 9, 34, 31, 22, 117],
[ 13, 38, 29, 20, 50, 150],
[ 19, 24, 38, 44, 41, 166]])
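The code above works on row sums. If column sums were meant, as the question text suggests, the same idea could be applied along the other axis. A minimal sketch, assuming the sums should become an extra last row and the columns should be reordered by their sum (the 3x3 matrix is made up for illustration):

```python
import numpy as np

# Hypothetical example matrix, not taken from the question.
matrix = np.array([[5, 1, 3],
                   [2, 8, 1],
                   [9, 3, 2]])

colsum = matrix.sum(axis=0)              # sum of each column
order = np.argsort(colsum)               # column order that sorts the sums
result = np.vstack([matrix, colsum])     # append the sums as the last row
result = result[:, order]                # reorder columns by ascending sum

print(result)
# [[ 3  1  5]
#  [ 1  8  2]
#  [ 2  3  9]
#  [ 6 12 16]]
```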
I have two numpy arrays of different dimensions:
x.shape = (1,1,M) and Y.shape = (N,N).
How do I perform Z = x - Y efficiently in python, such that Z.shape = (N,N,M), where - is an elementwise subtraction operation.
For example, M=10
x = array([[[1, 2, 3, 4, 5 , 6, 7, 8, 9, 10]]])
and N=8
Y = array([[11, 12, 13, 14, 15, 16, 17, 18],
[21, 22, 23, 24, 25, 26, 27, 28],
[31, 32, 33, 34, 35, 36, 37, 38],
[41, 42, 43, 44, 45, 46, 47, 48],
[51, 52, 53, 54, 55, 56, 57, 58],
[61, 62, 63, 64, 65, 66, 67, 68],
[71, 72, 73, 74, 75, 76, 77, 78],
[81, 82, 83, 84, 85, 86, 87, 88]])
Now the idea is to get a Z such that
Z[:,:,0] = array([[1-11, 1-12, 1-13, 1-14, 1-15, 1-16, 1-17, 1-18],
[1-21, 1-22, 1-23, 1-24, 1-25, 1-26, 1-27, 1-28],
[1-31, 1-32, 1-33, 1-34, 1-35, 1-36, 1-37, 1-38],
[1-41, 1-42, 1-43, 1-44, 1-45, 1-46, 1-47, 1-48],
[1-51, 1-52, 1-53, 1-54, 1-55, 1-56, 1-57, 1-58],
[1-61, 1-62, 1-63, 1-64, 1-65, 1-66, 1-67, 1-68],
[1-71, 1-72, 1-73, 1-74, 1-75, 1-76, 1-77, 1-78],
[1-81, 1-82, 1-83, 1-84, 1-85, 1-86, 1-87, 1-88]])
and
Z[:,:,9] = array([[10-11, 10-12, 10-13, 10-14, 10-15, 10-16, 10-17, 10-18],
[10-21, 10-22, 10-23, 10-24, 10-25, 10-26, 10-27, 10-28],
[10-31, 10-32, 10-33, 10-34, 10-35, 10-36, 10-37, 10-38],
[10-41, 10-42, 10-43, 10-44, 10-45, 10-46, 10-47, 10-48],
[10-51, 10-52, 10-53, 10-54, 10-55, 10-56, 10-57, 10-58],
[10-61, 10-62, 10-63, 10-64, 10-65, 10-66, 10-67, 10-68],
[10-71, 10-72, 10-73, 10-74, 10-75, 10-76, 10-77, 10-78],
[10-81, 10-82, 10-83, 10-84, 10-85, 10-86, 10-87, 10-88]])
and so on.
It is easy to do in MATLAB using just - operation. But Python does not support it.
The answer is: use a different shape for Y:
>>> Y = Y.reshape((8, 8, 1))
>>> (x - Y).shape
(8, 8, 10)
[Figure: visualization of the broadcasting with smaller dimensions]
You can compute your result without explicit creation of a reshaped array,
but using Numpy broadcasting.
The key to success is to add a new dimension to Y, using np.newaxis:
Z = x - Y[:, :, np.newaxis]
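A small sketch of the same broadcasting with reduced sizes (M=3 and N=2 are chosen here only to keep the output short), checking that each slice Z[:, :, k] equals x[0, 0, k] - Y:

```python
import numpy as np

M, N = 3, 2  # smaller than in the question, for readability
x = np.arange(1, M + 1).reshape(1, 1, M)   # shape (1, 1, 3): [[[1, 2, 3]]]
Y = np.array([[11, 12],
              [21, 22]])                   # shape (2, 2)

# Y[:, :, np.newaxis] has shape (2, 2, 1); broadcasting against (1, 1, 3)
# stretches both operands to (2, 2, 3).
Z = x - Y[:, :, np.newaxis]

print(Z.shape)     # (2, 2, 3)
print(Z[:, :, 0])  # 1 - Y
# [[-10 -11]
#  [-20 -21]]
```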
I have a numpy array (of an image), the 3rd dimension is of length 3. An example of my array is below. I am attempting to iterate it so I access/print the last dimension of the array. But each of the techniques below accesses each individual value in the 3d array rather than the whole 3d array.
How can I iterate this numpy array at the 3d array level?
My array:
src = cv2.imread('./myimage.jpg')
# naive/shortened example of src contents (shape=(1, 3, 3))
[[[117 108 99]
[115 105 98]
[ 90 79 75]]]
When iterating my objective is print the following values each iteration:
[117 108 99] # iteration 1
[115 105 98] # iteration 2
[ 90 79 75] # iteration 3
# Attempt 1 to iterate
for index, value in np.ndenumerate(src):
    print(src[index]) # src[index] and value = 117 when I was hoping it equals [117 108 99]
# Attempt 2 to iterate
for index, value in enumerate(src):
    print(src[index]) # value = is the entire row
Solution
You could use any of the following two methods. However, Method-2 is more robust and the justification for that has been shown in the section: Detailed Solution below.
import numpy as np
src = [[117, 108, 99], [115, 105, 98], [ 90, 79, 75]]
src = np.array(src).reshape((1,3,3))
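Method-1
for row in src[0,:]:
    print(row)
Output:
[117 108 99]
[115 105 98]
[90 79 75]
Method-2
Robust method. Note that it iterates over the last axis, so for this (1, 3, 3) array each item is a (1, 3) slice holding one channel across all pixels, not one pixel.
for e in np.transpose(src, [2,0,1]):
    print(e)
Output:
[[117 115 90]]
[[108 105 79]]
[[99 98 75]]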
Detailed Solution
Let us make an array of shape (3,4,5). So, if we iterate over the 3rd dimension, we should find 5 items, each with a shape of (3,4). You could achieve this by using numpy.transpose as shown below:
src = np.arange(3*4*5).reshape((3,4,5))
for e in np.transpose(src, [2,0,1]):
    print(e)
Output:
[[ 0 5 10 15]
[20 25 30 35]
[40 45 50 55]]
[[ 1 6 11 16]
[21 26 31 36]
[41 46 51 56]]
[[ 2 7 12 17]
[22 27 32 37]
[42 47 52 57]]
[[ 3 8 13 18]
[23 28 33 38]
[43 48 53 58]]
[[ 4 9 14 19]
[24 29 34 39]
[44 49 54 59]]
Here the array src is:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
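If the goal is to visit one pixel (one length-3 vector) per iteration for an image of any height and width, flattening the first two axes is another option. This is a sketch that is not part of the answer above; it assumes the last axis holds the channels, as in the question:

```python
import numpy as np

# Shortened example from the question, shape (1, 3, 3).
src = np.array([[[117, 108, 99],
                 [115, 105, 98],
                 [90, 79, 75]]])

# Collapse height and width into one axis; each row is then one pixel.
pixels = src.reshape(-1, src.shape[-1])   # shape (3, 3)

for pixel in pixels:
    print(pixel)
```

For this input the loop visits [117, 108, 99], then [115, 105, 98], then [90, 79, 75].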
General advice: When working with numpy, explicit python loops should be a last resort. Numpy is an extremely powerful tool which covers most use cases. Learn how to use it properly! If it helps, you can think of numpy as almost its own mini-language within a language.
Now, onto the code. I chose here to keep only the subarrays whose values are all below 100, but of course this is completely arbitrary and serves only to demonstrate the code.
import numpy as np
arr = np.array([[[117, 108, 99], [115, 105, 98], [90, 79, 75]], [[20, 3, 99], [101, 250, 30], [75, 89, 83]]])
cond_mask = np.all(a=arr < 100, axis=2)
arr_result = arr[cond_mask]
Let me know if you have any questions about the code :)
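For readers who want to see the intermediate values, here is the same code again with the mask and the result printed:

```python
import numpy as np

arr = np.array([[[117, 108, 99], [115, 105, 98], [90, 79, 75]],
                [[20, 3, 99], [101, 250, 30], [75, 89, 83]]])

# True where every value of the length-3 subarray is below 100.
cond_mask = np.all(arr < 100, axis=2)
print(cond_mask)
# [[False False  True]
#  [ True False  True]]

# Boolean indexing with a 2D mask keeps the matching length-3 subarrays.
arr_result = arr[cond_mask]
print(arr_result.tolist())  # [[90, 79, 75], [20, 3, 99], [75, 89, 83]]
```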
I have a pandas dataframe containing ~200,000 rows and I would like to create 5 random samples of 1000 rows each. However, I do not want any row to appear in more than one sample.
To create a random sample I have been using:
import numpy as np
rows = np.random.choice(df.index.values, 1000)
sampled_df = df.ix[rows]
However just doing this several times would run the risk of having duplicates. Would the best way to handle this be keeping track of which rows are sampled each time?
You can use df.sample.
A dataframe with 100 rows and 5 columns:
df = pd.DataFrame(np.random.randn(100, 5), columns = list("abcde"))
Sample 5 rows:
df.sample(5)
Out[8]:
a b c d e
84 0.012201 -0.053014 -0.952495 0.680935 0.006724
45 -1.347292 1.358781 -0.838931 -0.280550 -0.037584
10 -0.487169 0.999899 0.524546 -1.289632 -0.370625
64 1.542704 -0.971672 -1.150900 0.554445 -1.328722
99 0.012143 -2.450915 -0.718519 -1.192069 -1.268863
This ensures those 5 rows are different. If you want to repeat this process, I'd suggest sampling number_of_rows * number_of_samples rows. For example if each sample is going to contain 5 rows and you need 10 samples, sample 50 rows. The first 5 will be the first sample, the second five will be the second...
all_samples = df.sample(50)
samples = [all_samples.iloc[5*i:5*i+5] for i in range(10)]
You can set replace to False in np.random.choice:
rows = np.random.choice(df.index.values, 1000, replace=False)
Take a look at the numpy.random docs.
For your solution:
import numpy as np
rows = np.random.choice(df.index.values, 1000, replace=False)
sampled_df = df.loc[rows]  # .ix was removed in pandas 1.0; .loc selects by index label
This will make random choices without replacement.
If you want to generate multiple samples that have no elements in common, you need to remove the chosen elements from the pool after each iteration. You can use numpy.setdiff1d for that.
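import numpy as np
allRows = df.index.values
numOfSamples = 5
samples = list()
for i in range(numOfSamples):
    # draw 1000 row labels without replacement from the remaining pool
    choices = np.random.choice(allRows, 1000, replace=False)
    samples.append(choices)
    # remove the chosen labels so later samples cannot reuse them
    allRows = np.setdiff1d(allRows, choices)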
Here is a working example with a range of numbers between 0 and 100:
In [58]: import numpy as np
In [59]: allRows = np.arange(100)
In [60]: numOfSamples = 5
In [61]: samples = list()
In [62]: for i in range(numOfSamples):
   ....:     choices = np.random.choice(allRows, 5, replace=False)
   ....:     samples.append(choices)
   ....:     allRows = np.setdiff1d(allRows, choices)
   ....:
In [63]: samples
Out[63]:
[array([66, 24, 47, 31, 22]),
array([ 8, 28, 15, 62, 52]),
array([18, 65, 71, 54, 48]),
array([59, 88, 43, 7, 85]),
array([97, 36, 55, 56, 14])]
In [64]: allRows
Out[64]:
array([ 0, 1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 16, 17, 19, 20, 21,
23, 25, 26, 27, 29, 30, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42, 44,
45, 46, 49, 50, 51, 53, 57, 58, 60, 61, 63, 64, 67, 68, 69, 70, 72,
73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 86, 87, 89, 90, 91,
92, 93, 94, 95, 96, 98, 99])
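An alternative to the loop, not used in the answer above, is to shuffle the labels once and cut the shuffled array into chunks; the chunks cannot overlap because every label appears exactly once in a permutation. A minimal sketch with the same sizes as the working example (the seed is an arbitrary choice for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
n_rows, n_samples, sample_size = 100, 5, 5

# One random permutation of all row labels; the first 25 are split into
# 5 consecutive chunks of 5 labels each.
shuffled = rng.permutation(n_rows)
samples = shuffled[:n_samples * sample_size].reshape(n_samples, sample_size)

print(samples.shape)  # (5, 5)
```

Each row of samples could then be passed to df.loc (or df.iloc for positional indices) to select the corresponding rows.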
Let's say I have a ndarray like this:
a = [[20 43 61 41][92 23 43 33]]
I want to take the first row of this ndarray, so I tried something like this:
a[0,:]
I expected it to return something like this:
[[20 43 61 41]]
but I got this error:
TypeError: 'numpy.int32' object is not iterable
Can anyone help me solve this problem?
Using slice:
>>> import numpy as np
>>> a = np.array([[20, 43, 61, 41], [92, 23, 43, 33]])
>>> a[:1] # OR a[0:1]
array([[20, 43, 61, 41]])
>>> print(a[:1])
[[20 43 61 41]]
It's strange that you're getting this error. It suggests that a isn't what you think it is (i.e. not a Numpy array).
Anyway, here is how it can be done:
In [10]: import numpy as np
In [11]: a = np.array([[20, 43, 61, 41], [92, 23, 43, 33]])
In [12]: a[0:1]
Out[12]: array([[20, 43, 61, 41]])
Contrast this with
In [14]: a[0]
Out[14]: array([20, 43, 61, 41])
(which may or may not be what you want.)
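The difference between the two forms is easiest to see in the shapes. A short sketch, which also shows np.newaxis as one more way to keep the leading axis:

```python
import numpy as np

a = np.array([[20, 43, 61, 41], [92, 23, 43, 33]])

row_2d = a[0:1]                 # slicing keeps the first axis
row_1d = a[0]                   # integer indexing drops it
row_2d_alt = a[0][np.newaxis]   # re-add a leading axis to the 1D row

print(row_2d.shape)      # (1, 4)
print(row_1d.shape)      # (4,)
print(row_2d_alt.shape)  # (1, 4)
```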