Related
I am trying to simply add a column of ones to a numpy array but cannot find any easy solution to what I feel should be a straightforward answer. The number of rows in my array may change therefore the solution needs to generalise.
import numpy as np
X = np.array([[1,45,23,56,34,23],
[2,46,24,57,35,23]])
My desired output:
array([[ 1, 45, 23, 56, 34, 23, 1],
[ 2, 46, 24, 57, 35, 23, 1]])
I have tried using np.append and np.insert, but they either flatten the array or replace the values.
Thanks.
you can do hstack:
np.hstack((X,np.ones([X.shape[0],1], X.dtype)))
Output:
array([[ 1, 45, 23, 56, 34, 23, 1],
[ 2, 46, 24, 57, 35, 23, 1]])
You can use append, but you have to tell it which axis you want it to work along:
np.append(X, [[1],[1]], axis=1)
You can use numpy.c_
np.c_[X, [1, 1]]
You might use numpy.insert following way:
import numpy as np
X = np.array([[1,45,23,56,34,23], [2,46,24,57,35,23]])
X1 = np.insert(X, X.shape[1], 1, axis=1)
print(X1)
Output:
[[ 1 45 23 56 34 23 1]
[ 2 46 24 57 35 23 1]]
How can I get the second minimum value from each column? I have this array:
A = [[72 76 44 62 81 31]
[54 36 82 71 40 45]
[63 59 84 36 34 51]
[58 53 59 22 77 64]
[35 77 60 76 57 44]]
I wish to have output like:
A = [54 53 59 36 40 44]
Try this, in just one line:
[sorted(i)[1] for i in zip(*A)]
in action:
In [12]: A = [[72, 76, 44, 62, 81, 31],
...: [54 ,36 ,82 ,71 ,40, 45],
...: [63 ,59, 84, 36, 34 ,51],
...: [58, 53, 59, 22, 77 ,64],
...: [35 ,77, 60, 76, 57, 44]]
In [18]: [sorted(i)[1] for i in zip(*A)]
Out[18]: [54, 53, 59, 36, 40, 44]
zip(*A) will transpose your list of list so the columns become rows.
and if you have duplicate value, for example:
In [19]: A = [[72, 76, 44, 62, 81, 31],
...: [54 ,36 ,82 ,71 ,40, 45],
...: [63 ,59, 84, 36, 34 ,51],
...: [35, 53, 59, 22, 77 ,64], # 35
...: [35 ,77, 50, 76, 57, 44],] # 35
If you need to skip both 35s, you can use set():
In [29]: [sorted(list(set(i)))[1] for i in zip(*A)]
Out[29]: [54, 53, 50, 36, 40, 44]
Operations on numpy arrays should be done with numpy functions, so look at this one:
np.sort(A, axis=0)[1, :]
Out[61]: array([54, 53, 59, 36, 40, 44])
you can use heapq.nsmallest
from heapq import nsmallest
[nsmallest(2, e)[-1] for e in zip(*A)]
output:
[54, 53, 50, 36, 40, 44]
I added a simple benchmark to compare the performance of the different solutions already posted:
from simple_benchmark import BenchmarkBuilder
from heapq import nsmallest
b = BenchmarkBuilder()
#b.add_function()
def MehrdadPedramfar(A):
return [sorted(i)[1] for i in zip(*A)]
#b.add_function()
def NicolasGervais(A):
return np.sort(A, axis=0)[1, :]
#b.add_function()
def imcrazeegamerr(A):
rotated = zip(*A[::-1])
result = []
for arr in rotated:
# sort each 1d array from min to max
arr = sorted(list(arr))
# add the second minimum value to result array
result.append(arr[1])
return result
#b.add_function()
def Daweo(A):
return np.apply_along_axis(lambda x:heapq.nsmallest(2,x)[-1], 0, A)
#b.add_function()
def kederrac(A):
return [nsmallest(2, e)[-1] for e in zip(*A)]
#b.add_arguments('Number of row/cols (A is square matrix)')
def argument_provider():
for exp in range(2, 18):
size = 2**exp
yield size, [[randint(0, 1000) for _ in range(size)] for _ in range(size)]
r = b.run()
r.plot()
Using zip with sorted function is the fastest solution for small 2d lists while using zip with heapq.nsmallest shows to be the best on big 2d lists
I hope I understood your question correctly but either way here's my solution, im sure there is a more elegent way of doing this but it works
A = [[72,76,44,62,81,31]
,[54,36,82,71,40,45]
,[63,59,84,36,34,51]
,[58,53,59,22,77,64]
,[35,77,50,76,57,44]]
#rotate the array 90deg
rotated = zip(*A[::-1])
result = []
for arr in rotated:
# sort each 1d array from min to max
arr = sorted(list(arr))
# add the second minimum value to result array
result.append(arr[1])
print(result)
Assuming that A is numpy.array (if this holds true please consider adding numpy tag to your question) then you might use apply_along_axis for that following way:
import heap
import numpy as np
A = np.array([[72, 76, 44, 62, 81, 31],
[54, 36, 82, 71, 40, 45],
[63, 59, 84, 36, 34, 51],
[58, 53, 59, 22, 77, 64],
[35, 77, 60, 76, 57, 44]])
second_mins = np.apply_along_axis(lambda x:heapq.nsmallest(2,x)[-1], 0, A)
print(second_mins) # [54 53 59 36 40 44]
Note that I used heapq.nsmallest as it does as much sorting as required to get 2 smallest elements, unlike sorted which does complete sort.
>>> A = np.arange(30).reshape(5,6).tolist()
>>> A
[[0, 1, 2, 3, 4, 5],
[6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]]
Updated:
Use set to prevent from duplicate and transpose list using zip(*A)
>>> [sorted(set(items))[1] for items in zip(*A)]
[6, 7, 8, 9, 10, 11]
old: second minimum item in each row
>>> [sorted(set(items))[1] for items in A]
[1, 7, 13, 19, 25]
I have a numpy array (of an image), the 3rd dimension is of length 3. An example of my array is below. I am attempting to iterate it so I access/print the last dimension of the array. But each of the techniques below accesses each individual value in the 3d array rather than the whole 3d array.
How can I iterate this numpy array at the 3d array level?
My array:
src = cv2.imread('./myimage.jpg')
# naive/shortened example of src contents (shape=(1, 3, 3))
[[[117 108 99]
[115 105 98]
[ 90 79 75]]]
When iterating my objective is print the following values each iteration:
[117 108 99] # iteration 1
[115 105 98] # iteration 2
[ 90 79 75] # iteration 3
# Attempt 1 to iterate
for index,value in np.ndenumerate(src):
print(src[index]) # src[index] and value = 117 when I was hoping it equals [117 108 99]
# Attempt 2 to iterate
for index,value in enumerate(src):
print(src[index]) # value = is the entire row
Solution
You could use any of the following two methods. However, Method-2 is more robust and the justification for that has been shown in the section: Detailed Solution below.
import numpy as np
src = [[117, 108, 99], [115, 105, 98], [ 90, 79, 75]]
src = np.array(src).reshape((1,3,3))
Method-1
for row in src[0,:]:
print(row)
Method-2
Robust method.
for e in np.transpose(src, [2,0,1]):
print(e)
Output:
[117 108 99]
[115 105 98]
[90 79 75]
Detailed Solution
Let us make an array of shape (3,4,5). So, if we iterate over the 3rd dimension, we should find 5 items, each with a shape of (3,4). You could achieve this by using numpy.transpose as shown below:
src = np.arange(3*4*5).reshape((3,4,5))
for e in np.transpose(src, [2,0,1]):
print(row)
Output:
[[ 0 5 10 15]
[20 25 30 35]
[40 45 50 55]]
[[ 1 6 11 16]
[21 26 31 36]
[41 46 51 56]]
[[ 2 7 12 17]
[22 27 32 37]
[42 47 52 57]]
[[ 3 8 13 18]
[23 28 33 38]
[43 48 53 58]]
[[ 4 9 14 19]
[24 29 34 39]
[44 49 54 59]]
Here the array src is:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
General advice: When working with numpy, explicit python loops should be a last resort. Numpy is an extremely powerful tool which covers most use cases. Learn how to use it properly! If it helps, you can think of numpy as almost its own mini-language within a language.
Now, onto the code. I chose here to keep only the subarrays whose values are all below 100, but of course this is completely arbitrary and serves only to demonstrate the code.
import numpy as np
arr = np.array([[[117, 108, 99], [115, 105, 98], [90, 79, 75]], [[20, 3, 99], [101, 250, 30], [75, 89, 83]]])
cond_mask = np.all(a=arr < 100, axis=2)
arr_result = arr[cond_mask]
Let me know if you have any questions about the code :)
Suppose I have a DataFrame of lists,
my_df = pd.DataFrame({'my_list':[[45,12,23],[20,46,78],[45,30,45]]})
which yields the following:
my_list
0 [45, 12, 23]
1 [20, 46, 78]
2 [45, 30, 45]
How can I add an element, let's say 99, to my_list for each row ?
Expected result :
my_list
0 [45, 12, 23, 99]
1 [20, 46, 78, 99]
2 [45, 30, 45, 99]
In [90]: my_df['my_list'] += [99]
In [91]: my_df
Out[91]:
my_list
0 [45, 12, 23, 99]
1 [20, 46, 78, 99]
2 [45, 30, 45, 99]
Sounds awfully boring but just iterate over the values directly - this way you can call append and avoid whatever rebinding occurs with +=, making things significantly faster.
for val in my_df.my_list:
val.append(99)
Demo
>>> import timeit
>>> setup = '''
import pandas as pd; import numpy as np
df = pd.DataFrame({'my_list': np.random.randint(0, 100, (500, 500)).tolist()})
'''
>>> min(timeit.Timer('for val in df.my_list: val.append(90)',
setup=setup).repeat(10, 1000))
0.05669815401779488
>>> min(timeit.Timer('df.my_list += [90]',
setup=setup).repeat(10, 1000))
2.7741127769695595
Of course, if speed (or even if not speed) is important to you, you should question if you really need to have lists inside a DataFrame. Consider working on a NumPy array until you need Pandas utility and doing something like
np.c_[arr, np.full(arr.shape[0], 90)]
or at least splitting your lists inside the DataFrame to separate columns and assigning a new column .
I'm using a random.seed() to try and keep the random.sample() the same as I sample more values from a list and at some point the numbers change.....where I thought the one purpose of the seed() function was to keep the numbers the same.
Heres a test I did to prove it doesn't keep the same numbers.
import random
a=range(0,100)
random.seed(1)
a = random.sample(a,10)
print a
then change the sample much higher and the sequence will change(at least for me they always do):
a = random.sample(a,40)
print a
I'm sort of a newb so maybe this is an easy fix but I would appreciate any help on this.
Thanks!
If you were to draw independent samples from the generator, what would happen would be exactly what you're expecting:
In [1]: import random
In [2]: random.seed(1)
In [3]: [random.randint(0, 99) for _ in range(10)]
Out[3]: [13, 84, 76, 25, 49, 44, 65, 78, 9, 2]
In [4]: random.seed(1)
In [5]: [random.randint(0, 99) for _ in range(40)]
Out[5]: [13, 84, 76, 25, 49, 44, 65, 78, 9, 2, 83, 43 ...]
As you can see, the first ten numbers are indeed the same.
It is the fact that random.sample() is drawing samples without replacement that's getting in the way. To understand how these algorithms work, see Reservoir Sampling. In essence what happens is that later samples can push earlier samples out of the result set.
One alternative might be to shuffle a list of indices and then take either 10 or 40 first elements:
In [1]: import random
In [2]: a = range(0,100)
In [3]: random.shuffle(a)
In [4]: a[:10]
Out[4]: [48, 27, 28, 4, 67, 76, 98, 68, 35, 80]
In [5]: a[:40]
Out[5]: [48, 27, 28, 4, 67, 76, 98, 68, 35, 80, ...]
It seems that random.sample is deterministic only if the seed and sample size are kept constant. In other words, even if you reset the seed, generating a sample with a different length is not "the same" random operation, and may give a different initial subsequence than generating a smaller sample with the same seed. In other words, the same random numbers are being generated internally, but the way sample uses them to derive the random sequence is different depending on how large a sample you ask for.
You are assuming an implementation of random.sample something like this:
def samples(lst, k):
n = len(lst)
indices = []
while len(indices) < k:
index = random.randrange(n)
if index not in indices:
indices.append(index)
return [lst[i] for i in indices]
Which gives:
>>> random.seed(1)
>>> samples(list(range(20)), 5)
[4, 18, 2, 8, 3]
>>> random.seed(1)
>>> samples(list(range(20)), 10)
[4, 18, 2, 8, 3, 15, 14, 12, 6, 0]
However, that isn't how random.sample is actually implemented; seed does work how you think, it's sample that doesn't!
You simply need to re-seed it:
a = list(range(100))
random.seed(1) # seed first time
random.sample(a, 10)
>> [17, 72, 97, 8, 32, 15, 63, 57, 60, 83]
random.seed(1) # seed second time with same value
random.sample(a, 40)
>> [17, 72, 97, 8, 32, 15, 63, 57, 60, 83, 48, 26, 12, 62, 3, 49, 55, 77, 0, 92, 34, 29, 75, 13, 40, 85, 2, 74, 69, 1, 89, 27, 54, 98, 28, 56, 93, 35, 14, 22]
But in your case you're using a generator, not a list, so after sampling the first time a will shrink (from 100 to 90), and you will lose the elements that you had sampled, so it won't work. So just use a list and seed before every sampling.