Unique entries in columns of a 2D numpy array - python

I have an array of integers:
import numpy as np
demo = np.array([[1, 2, 3],
[1, 5, 3],
[4, 5, 6],
[7, 8, 9],
[4, 2, 3],
[4, 2, 12],
[10, 11, 13]])
And I want an array of unique values in the columns, padded with something if necessary (e.g. nan):
[[1, 4, 7, 10, nan],
[2, 5, 8, 11, nan],
[3, 6, 9, 12, 13]]
It does work when I iterate over the transposed array and use a boolean_indexing solution from a previous question. But I was hoping there would be a built-in method:
solution = []
for row in np.unique(demo.T, axis=1):
    solution.append(np.unique(row))
def boolean_indexing(v, fillval=np.nan):
    lens = np.array([len(item) for item in v])
    mask = lens[:,None] > np.arange(lens.max())
    out = np.full(mask.shape,fillval)
    out[mask] = np.concatenate(v)
    return out
print(boolean_indexing(solution))

AFAIK, there is no built-in solution for that. That being said, your solution seems a bit complex to me. You could create an array pre-filled with the padding value and fill it with a simple loop (since you already use loops anyway).
solution = [np.unique(row) for row in np.unique(demo.T, axis=1)]
result = np.full((len(solution), max(map(len, solution))), np.nan)
for i,arr in enumerate(solution):
    result[i][:len(arr)] = arr

If you want to avoid the loop you could do:
demo = demo.astype(np.float32) # nan only works on floats
sort = np.sort(demo, axis=0)
diff = np.diff(sort, axis=0)
np.place(sort[1:], diff == 0, np.nan)
sort.sort(axis=0)
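# count the non-nan entries per column; np.argmax would drop the last row if a column had no duplicates
edge = (~np.isnan(sort)).sum(axis=0).max()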
result = sort[:edge]
print(result.T)
Output:
array([[ 1., 4., 7., 10., nan],
[ 2., 5., 8., 11., nan],
[ 3., 6., 9., 12., 13.]], dtype=float32)
Not sure if this is any faster than the solution given by Jérôme.
EDIT
A slightly better solution
demo = demo.astype(np.float32)
sort = np.sort(demo, axis=0)
mask = np.full(sort.shape, False, dtype=bool)
np.equal(sort[1:], sort[:-1], out=mask[1:])
np.place(sort, mask, np.nan)
edge = (~mask).sum(0).max()
result = np.sort(sort, axis=0)[:edge]
print(result.T)
Output:
array([[ 1., 4., 7., 10., nan],
[ 2., 5., 8., 11., nan],
[ 3., 6., 9., 12., 13.]], dtype=float32)
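The sort-and-mask idea can be wrapped in a reusable function. This is a sketch under the assumption that the input contains no nan values of its own; the name `unique_per_column` is made up for illustration.

```python
import numpy as np

def unique_per_column(a):
    """Sorted unique values of each column, returned as rows and padded with nan."""
    s = np.sort(np.asarray(a, dtype=float), axis=0)
    # Flag every value that repeats the one directly above it in its sorted column.
    dup = np.zeros(s.shape, dtype=bool)
    dup[1:] = s[1:] == s[:-1]
    s[dup] = np.nan
    # nan sorts last, so the surviving unique values move to the top of each column.
    s.sort(axis=0)
    # The widest column decides how many rows to keep.
    width = (~dup).sum(axis=0).max()
    return s[:width].T

demo = np.array([[1, 2, 3],
                 [1, 5, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [4, 2, 3],
                 [4, 2, 12],
                 [10, 11, 13]])
result = unique_per_column(demo)
print(result)
```

Casting to float is needed because nan cannot be stored in an integer array.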

Related

find infinity values and replace with maximum per vector in a numpy array

Suppose I have the following array with shape (3, 5) :
array = np.array([[1, 2, 3, np.inf, 5],
[10, 9, 8, 7, 6],
[4, np.inf, 2, 6, np.inf]])
Now I want to find the infinity values per vector and replace them with the maximum of that vector, with a lower limit of 1.
So the output for this example should be:
array_solved = np.array([[1, 2, 3, 5, 5],
[10, 9, 8, 7, 6],
[4, 6, 2, 6, 6]])
I could do this by looping over every vector of the array and apply:
idx_inf = np.isinf(array_vector)
max_value = np.max(np.append(array_vector[~idx_inf], 1.0))
array_vector[idx_inf] = max_value
But I guess there is a faster way.
Does anyone have an idea?
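One way is to first convert the infs to NaNs with np.isinf masking, then replace each NaN with the maximum of its own row from np.nanmax. Passing keepdims=True keeps the row maxima as a column, so they broadcast across each row (assigning the 1-D result of np.nanmax directly to the masked positions would pair values with NaNs by position, not by row):
array[np.isinf(array)] = np.nan
array = np.where(np.isnan(array), np.nanmax(array, axis=1, keepdims=True), array)
to get
>>> array
array([[ 1., 2., 3., 5., 5.],
[10., 9., 8., 7., 6.],
[ 4., 6., 2., 6., 6.]])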
import numpy as np
array = np.array([[1, 2, 3, np.inf, 5],
[10, 9, 8, 7, 6],
[4, np.inf, 2, 6, np.inf]])
n, m = array.shape
array[np.isinf(array)] = -np.inf
mx_array = np.repeat(np.max(array, axis=1), m).reshape(n, m)
ind = np.where(np.isinf(array))
array[ind] = mx_array[ind]
Output array:
array([[ 1., 2., 3., 5., 5.],
[10., 9., 8., 7., 6.],
[ 4., 6., 2., 6., 6.]])
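Neither answer above applies the lower limit of 1 that the question asks for. The following is a minimal sketch that adds it, assuming the input is a 2D float array; the function name `replace_inf_with_row_max` and the `floor` parameter are made up for illustration.

```python
import numpy as np

def replace_inf_with_row_max(a, floor=1.0):
    """Replace +/-inf with the largest finite value of the same row, but at least `floor`."""
    a = np.asarray(a, dtype=float)
    is_inf = np.isinf(a)
    # Hide the infinities behind -inf so they cannot win the row maximum.
    finite_only = np.where(is_inf, -np.inf, a)
    # keepdims=True yields shape (n, 1), which broadcasts across each row.
    row_max = np.maximum(finite_only.max(axis=1, keepdims=True), floor)
    return np.where(is_inf, row_max, a)

array = np.array([[1, 2, 3, np.inf, 5],
                  [10, 9, 8, 7, 6],
                  [4, np.inf, 2, 6, np.inf]])
array_solved = replace_inf_with_row_max(array)
print(array_solved)
```

A row that consists only of infinities ends up filled with `floor`, since its masked maximum is -inf.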

How to stack uneven numpy arrays?

How can I stack the elements at the same index from each array in a list of arrays?
arrays = [np.array([1,2,3,4,5]),
np.array([6,7,8,9]),
np.array([11,22,33,44,55]),
np.array([2,4])]
output = [[1,6,11,2],
[2,7,22,4],
[3,8,33],
[4,9,44],
[5,55]]
arrays is a list of arrays of uneven lengths. The output has a first array (don't mind if it's a list too) that contains all possible index 0s from each array. The next array within output contains all possible index 1s and so on...
Closest thing I can find (but requires same shape arrays) is:
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.stack((a, b), axis=-1)
# which gives
array([[1, 2],
[2, 3],
[3, 4]])
Thanks.
This gets you close. You can't really have a ragged 2D array as shown in your example output, so this version pads the shorter arrays with zeros.
import numpy as np
arrays = [np.array([1,2,3,4,5]),
np.array([6,7,8,9]),
np.array([11,22,33,44,55]),
np.array([2,4])]
maxx = max(x.shape[0] for x in arrays)
for x in arrays:
    x.resize(maxx,refcheck=False)
output = np.stack(arrays, axis=1)
print(output)
C:\tmp>python x.py
[[ 1 6 11 2]
[ 2 7 22 4]
[ 3 8 33 0]
[ 4 9 44 0]
[ 5 0 55 0]]
You could just wrap it in a DataFrame first:
import pandas as pd
arr = pd.DataFrame(arrays).values.T
Output:
array([[ 1., 6., 11., 2.],
[ 2., 7., 22., 4.],
[ 3., 8., 33., nan],
[ 4., 9., 44., nan],
[ 5., nan, 55., nan]])
Though if you really want it with different sizes, go with:
arr = [x.dropna().values for _, x in pd.DataFrame(arrays).items()]  # iteritems() was removed in pandas 2.0
Output:
[array([ 1, 6, 11, 2]),
array([ 2, 7, 22, 4]),
array([ 3., 8., 33.]),
array([ 4., 9., 44.]),
array([ 5., 55.])]
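If pandas is not available, the ragged output can probably also be built with the standard library. This is a sketch using `itertools.zip_longest` with a private sentinel object to mark the gaps; the name `_MISSING` is made up for illustration.

```python
from itertools import zip_longest

import numpy as np

arrays = [np.array([1, 2, 3, 4, 5]),
          np.array([6, 7, 8, 9]),
          np.array([11, 22, 33, 44, 55]),
          np.array([2, 4])]

# A unique sentinel cannot collide with real data (unlike 0 or nan).
_MISSING = object()

# zip_longest walks all arrays in parallel and pads the short ones with the sentinel;
# the inner comprehension drops those padded slots again.
output = [np.array([v for v in group if v is not _MISSING])
          for group in zip_longest(*arrays, fillvalue=_MISSING)]

for row in output:
    print(row)
```

Because nothing is ever padded with nan, the integer dtype of the inputs is preserved.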

np.where equivalent for 1-D arrays

I am trying to fill nan values in an array with values from another array. Since the arrays I am working on are 1-D np.where is not working. However, following the tip in the documentation I tried the following:
import numpy as np
sample = [1, 2, np.nan, 4, 5, 6, np.nan]
replace = [3, 7]
new_sample = [new_value if condition else old_value for (new_value, condition, old_value) in zip(replace, np.isnan(sample), sample)]
However, instead of the output I expected, [1, 2, 3, 4, 5, 6, 7], I get:
[Out]: [1, 2]
What am I doing wrong?
np.where works
In [561]: sample = np.array([1, 2, np.nan, 4, 5, 6, np.nan])
Use isnan to identify the nan values (don't use ==)
In [562]: np.isnan(sample)
Out[562]: array([False, False, True, False, False, False, True])
In [564]: np.where(np.isnan(sample))
Out[564]: (array([2, 6], dtype=int32),)
Either one, the boolean mask or the where tuple, can index the nan values:
In [565]: sample[Out[564]]
Out[565]: array([nan, nan])
In [566]: sample[Out[562]]
Out[566]: array([nan, nan])
and be used to replace:
In [567]: sample[Out[562]]=[1,2]
In [568]: sample
Out[568]: array([1., 2., 1., 4., 5., 6., 2.])
The three-parameter form also works, but it returns a copy.
In [571]: np.where(np.isnan(sample),999,sample)
Out[571]: array([ 1., 2., 999., 4., 5., 6., 999.])
You can use numpy.argwhere. But @hpaulj shows that numpy.where works just as well.
import numpy as np
sample = np.array([1, 2, np.nan, 4, 5, 6, np.nan])
replace = np.array([3, 7])
sample[np.argwhere(np.isnan(sample)).ravel()] = replace
# array([ 1., 2., 3., 4., 5., 6., 7.])
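As for why the list comprehension in the question returns only [1, 2]: `zip` stops at its shortest argument, and `replace` has just two elements, so only the first two entries of `sample` are ever visited. A sketch of a pure-Python fix that consumes a replacement only when a nan is found (the helper name `fill_nans` is hypothetical):

```python
import numpy as np

def fill_nans(sample, replace):
    """Return a list where each nan in `sample` is replaced by the next value of `replace`."""
    replacements = iter(replace)
    # next() is only called for nan entries, so the two sequences may differ in length.
    return [next(replacements) if np.isnan(x) else x for x in sample]

sample = [1, 2, np.nan, 4, 5, 6, np.nan]
replace = [3, 7]

# zip truncates to the shortest input, which is why the original attempt yields two items.
truncated = list(zip(replace, np.isnan(sample), sample))
print(len(truncated))

new_sample = fill_nans(sample, replace)
print(new_sample)
```

This raises StopIteration if `sample` contains more nans than `replace` has values.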

Tensorflow equivalent of this numpy axis-wise cartesian product for 2D matrices

I currently have code that allows one to take a combinatorial (cartesian) product across a particular axis. This is in numpy, and originated from a previous question Efficient axis-wise cartesian product of multiple 2D matrices with Numpy or TensorFlow
A = np.array([[1,2],
[3,4]])
B = np.array([[10,20],
[5,6]])
C = np.array([[50, 0],
[60, 8]])
cartesian_product( [A,B,C], axis=1 )
>> np.array([[ 1*10*50, 1*10*0, 1*20*50, 1*20*0, 2*10*50, 2*10*0, 2*20*50, 2*20*0]
[ 3*5*60, 3*5*8, 3*6*60, 3*6*8, 4*5*60, 4*5*8, 4*6*60, 4*6*8]])
and to reiterate the solution:
L = [A,B,C] # list of arrays
n = L[0].shape[0]
out = (L[1][:,None]*L[0][:,:,None]).reshape(n,-1)
for i in L[2:]:
    out = (i[:,None]*out[:,:,None]).reshape(n,-1)
Is there an existing method to perform this with broadcasting in tensorflow - without a for loop?
Ok so I managed to find a pure tf based (partial) answer for two arrays. It's not currently generalizable like the numpy solution for M arrays, but that's for another question (perhaps a tf.while_loop). For those that are curious, the solution adapts from Evaluate all pair combinations of rows of two tensors in tensorflow
a = np.array([[0, 1, 2, 3],
[4, 5, 6, 7],
[4, 5, 6, 7]])
b = np.array([[0, 1],
[2, 3],
[2, 3]])
N = a.shape[0]
A = tf.constant(a, dtype=tf.float64)
B = tf.constant(b, dtype=tf.float64)
A_ = tf.expand_dims(A, axis=1)
B_ = tf.expand_dims(B, axis=2)
tf_result = tf.reshape(tf.multiply(A_, B_), [N, -1])
>> tf_result
Out[1]:
array([[ 0., 0., 0., 0., 0., 1., 2., 3.],
[ 8., 10., 12., 14., 12., 15., 18., 21.],
[ 8., 10., 12., 14., 12., 15., 18., 21.]])
Solutions for the multiple array case are welcome
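For a fixed number of arrays, one loop-free option may be an Einstein summation. The sketch below uses `np.einsum` on the three matrices from the question; `tf.einsum` accepts an equation string of the same form, but that variant is untested here and should be treated as an assumption.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [5, 6]])
C = np.array([[50, 0],
              [60, 8]])

n = A.shape[0]
# 'n' is the shared row axis; i, j, k enumerate the columns of A, B and C.
# The result has shape (n, 2, 2, 2), with A as the slowest-varying and C the fastest.
out = np.einsum('ni,nj,nk->nijk', A, B, C).reshape(n, -1)
print(out)
```

The equation string has to be written out per array, so a truly variable number of inputs would still need the string to be generated programmatically.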

Adding n columns to a numpy array [duplicate]

I'm making a program where I need to make a matrix looking like this:
A = np.array([[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.]])
So I started thinking about using np.arange(1,4).
But how do I append n columns of np.arange(1,4) to A?
As mentioned in docs you can use concatenate
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
[3, 4],
[5, 6]])
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
[3, 4, 6]])
Here's another way, using broadcasting:
In [69]: np.arange(1,4)*np.ones((4,1))
Out[69]:
array([[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.]])
You can get something like what you typed in your question with:
N = 3
A = np.tile(np.arange(1, N+1), (N, 1))
I'm assuming you want a square array?
>>> np.repeat([np.arange(1, 4)], 4, 0)
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
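For a non-square result such as the 4x3 matrix in the question, a sketch along the same lines (the variable names `rows` and `base` are illustrative):

```python
import numpy as np

rows = 4
base = np.arange(1, 4)

# np.tile makes an independent, writable copy with `rows` repetitions of `base`.
A = np.tile(base, (rows, 1)).astype(float)

# np.broadcast_to returns a read-only view without copying any data.
A_view = np.broadcast_to(base, (rows, base.size))

print(A)
print(A_view.flags.writeable)
```

The view is cheaper for large arrays, but it has to be copied before it can be modified.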
