Force numpy to keep a list a list - python

x2_Kaxs is an Nx3 numpy array of lists, and the elements in those lists index into another array. I want to end up with an Nx3 numpy array of lists of those indexed elements.
x2_Kcids = array([ ax2_cid[axs] for axs in x2_Kaxs.flat ], dtype=object)
This outputs a (N*3)x1 array of numpy arrays. Great, that almost works for what I want. All I need to do is reshape it.
x2_Kcids.shape = x2_Kaxs.shape
And this works: x2_Kcids becomes an Nx3 array of numpy arrays. Perfect.
Except when all the lists in x2_Kaxs have only one element in them. Then numpy flattens the result into an Nx3 array of integers, and my code expects a list later in the pipeline.
One solution I came up with was to append a dummy element and then pop it off, but that is very ugly. Is there anything nicer?

Your problem is not really about lists of size 1, it is about lists that are all of the same size. I have created these dummy samples:
import random
import numpy as np

ax2_cid = np.random.rand(10)
shape = (10, 3)
# lists of varying length (1 to 5 elements)
x2_Kaxs = np.empty((10, 3), dtype=object).reshape(-1)
for j in range(x2_Kaxs.size):
    x2_Kaxs[j] = [random.randint(0, 9) for k in range(random.randint(1, 5))]
x2_Kaxs.shape = shape
# every list has exactly one element
x2_Kaxs_1 = np.empty((10, 3), dtype=object).reshape(-1)
for j in range(x2_Kaxs_1.size):
    x2_Kaxs_1[j] = [random.randint(0, 9)]
x2_Kaxs_1.shape = shape
# every list has exactly two elements
x2_Kaxs_2 = np.empty((10, 3), dtype=object).reshape(-1)
for j in range(x2_Kaxs_2.size):
    x2_Kaxs_2[j] = [random.randint(0, 9) for k in range(2)]
x2_Kaxs_2.shape = shape
If we run your code on these three, the return has the following shapes:
>>> np.array([ax2_cid[axs] for axs in x2_Kaxs.flat], dtype=object).shape
(30,)
>>> np.array([ax2_cid[axs] for axs in x2_Kaxs_1.flat], dtype=object).shape
(30, 1)
>>> np.array([ax2_cid[axs] for axs in x2_Kaxs_2.flat], dtype=object).shape
(30, 2)
And the case with all lists of length 2 won't even let you reshape to (n, 3). The problem is that, even with dtype=object, numpy tries to numpify your input as much as possible, which is all the way down to individual elements if all lists are of the same length. I think that your best bet is to preallocate your x2_Kcids array:
x2_Kcids = np.empty_like(x2_Kaxs).reshape(-1)
shape = x2_Kaxs.shape
x2_Kcids[:] = [ax2_cid[axs] for axs in x2_Kaxs.flat]
x2_Kcids.shape = shape
EDIT Since unubtu's answer is no longer visible, I am going to steal from him. The code above can be much more nicely and compactly written as:
x2_Kcids = np.empty_like(x2_Kaxs)
x2_Kcids.ravel()[:] = [ax2_cid[axs] for axs in x2_Kaxs.flat]
With the above example of single item lists:
>>> x2_Kcids_1 = np.empty_like(x2_Kaxs_1).reshape(-1)
>>> x2_Kcids_1[:] = [ax2_cid[axs] for axs in x2_Kaxs_1.flat]
>>> x2_Kcids_1.shape = shape
>>> x2_Kcids_1
array([[[ 0.37685372], [ 0.95328117], [ 0.63840868]],
[[ 0.43009678], [ 0.02069558], [ 0.32455781]],
[[ 0.32455781], [ 0.37685372], [ 0.09777559]],
[[ 0.09777559], [ 0.37685372], [ 0.32455781]],
[[ 0.02069558], [ 0.02069558], [ 0.43009678]],
[[ 0.32455781], [ 0.63840868], [ 0.37685372]],
[[ 0.63840868], [ 0.43009678], [ 0.25532799]],
[[ 0.02069558], [ 0.32455781], [ 0.09777559]],
[[ 0.43009678], [ 0.37685372], [ 0.63840868]],
[[ 0.02069558], [ 0.17876822], [ 0.17876822]]], dtype=object)
>>> x2_Kcids_1[0, 0]
array([ 0.37685372])
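For reference, here is a minimal sketch of the same preallocation idea that fills the result one cell at a time, so it does not depend on how numpy interprets a whole list on the right-hand side of a slice assignment. The data below (ax2_cid as multiples of 10, a 2x3 array of single-element lists) is made up for illustration and is not from the question.

```python
import numpy as np

# Hypothetical stand-ins for the question's data.
ax2_cid = np.arange(10) * 10.0            # values to look up
x2_Kaxs = np.empty((2, 3), dtype=object)  # object array of index lists
for idx in np.ndindex(x2_Kaxs.shape):
    x2_Kaxs[idx] = [idx[0] * 3 + idx[1]]  # every list has exactly one element

# Preallocate the result and fill it cell by cell, so each cell
# receives the indexed array as a single object.
x2_Kcids = np.empty(x2_Kaxs.shape, dtype=object)
for idx in np.ndindex(x2_Kaxs.shape):
    x2_Kcids[idx] = ax2_cid[x2_Kaxs[idx]]

print(x2_Kcids.shape)   # (2, 3)
print(x2_Kcids[1, 2])   # [50.]
```

The explicit loop is slower than a single assignment, but the cells stay arrays even when all the lists have the same length.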

Similar to @Denis:
if x.ndim == 2:
    x.shape += (1,)

Construct a 2D, 3x3 matrix with random numbers from 1 to 8 with no duplicates

import numpy as np
random_matrix = np.random.randint(0,10,size=(3,3))
print(random_matrix)
If you want an answer where we don't have to rely on numpy then you can do this:
import random
# Generates a randomized list between 0-9, where 0 is replaced by "#"
x = ["#" if i == 0 else i for i in random.sample(range(10), k=9)]
print(x)
# Slices the list into a 3x3 format
newx = [x[idx:idx+3] for idx in range(0, len(x), 3)]
print(newx)
Output:
[6, 2, 7, 4, '#', 8, 9, 1, 3]
[[6, 2, 7], [4, '#', 8], [9, 1, 3]]
import numpy
x = numpy.arange(0, 9)
numpy.random.shuffle(x)
x = numpy.reshape(x, (3,3))
print(numpy.where(x==0, '#', x))
Let me know if this works for you. Note that with my solution the integers are replaced by strings; I don't know if that matters to you. If it does, I will find another solution.
You can achieve your goal in a few steps:
1. Generate the sequence of values (in some range) from which you want to randomly fill the matrix.
2. Randomly take the required number of elements from this sequence into a new sequence.
3. Build a matrix of the desired shape from this new sequence.
import numpy as np
from random import sample
#step one
values = range(0,11)
#step two
random_sequence = sample(values, 9)
#step three
random_matrix = np.array(random_sequence).reshape(3,3)
Because you sample without replacement from a sequence of unique values, the new sequence, and therefore the matrix, is guaranteed to contain no duplicates.
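The same three steps can also be done entirely in NumPy. This is only a sketch: it assumes a NumPy version that provides np.random.default_rng, uses the values 0 to 8 rather than 0 to 10, and the seed is there only to make the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded only for reproducibility

values = np.arange(9)                   # step one: unique candidate values 0..8
shuffled = rng.permutation(values)      # step two: random order, no repeats
random_matrix = shuffled.reshape(3, 3)  # step three: give it the wanted shape

print(random_matrix)
```

Because permutation only reorders the input, every value appears exactly once in the matrix.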
You can use np.random.choice with replace=False to generate the (3, 3) array:
np.random.choice(np.arange(9), size=(3, 3), replace=False)
Replacing 0 with np.nan:
>>> np.where(x, x, np.nan)
array([[ 4., 1., 3.],
[ 5., nan, 8.],
[ 2., 6., 7.]])
However, I think Hampus Larsson's answer is better, as this problem is not appropriate for numpy if you intend to replace 0 with the string "#".
You could use numpy, but random is enough:
import random
numbers = list(range(9))
random.shuffle(numbers)
my_list = [[numbers[i*3 + j] for j in range(0,3)] for i in range(0,3)]

Numpyic way to sort a matrix based on another similar matrix

Say I have a matrix Y of random float numbers from 0 to 10 with shape (10, 3):
import numpy as np
np.random.seed(99)
Y = np.random.uniform(0, 10, (10, 3))
print(Y)
Output:
[[6.72278559 4.88078399 8.25495174]
[0.31446388 8.08049963 5.6561742 ]
[2.97622499 0.46695721 9.90627399]
[0.06825733 7.69793028 7.46767101]
[3.77438936 4.94147452 9.28948392]
[3.95454044 9.73956297 5.24414715]
[0.93613093 8.13308413 2.11686786]
[5.54345785 2.92269116 8.1614236 ]
[8.28042566 2.21577372 6.44834702]
[0.95181622 4.11663239 0.96865261]]
I am now given a matrix X with same shape that can be seen as obtained by adding small noises to Y and then shuffling the rows:
X = np.random.normal(Y, scale=0.1)
np.random.shuffle(X)
print(X)
Output:
[[ 4.04067271 9.90959141 5.19126867]
[ 5.59873104 2.84109306 8.11175891]
[ 0.10743952 7.74620162 7.51100441]
[ 3.60396019 4.91708372 9.07551354]
[ 0.9400948 4.15448712 1.04187208]
[ 2.91884302 0.47222752 10.12700505]
[ 0.30995155 8.09263241 5.74876947]
[ 1.11247872 8.02092335 1.99767444]
[ 6.68543696 4.8345869 8.17330513]
[ 8.38904822 2.11830619 6.42013343]]
Now I want to sort the matrix X based on Y by row. I already know that the values in each matching pair of rows differ column-wise by no more than a tolerance of 0.5. I managed to write the following code, and it works fine.
def sort_X_by_Y(X, Y, tol):
    idxs = [next(i for i in range(len(X)) if all(abs(X[i] - row) <= tol)) for row in Y]
    return X[idxs]
print(sort_X_by_Y(X, Y, tol=0.5))
Output:
[[ 6.68543696 4.8345869 8.17330513]
[ 0.30995155 8.09263241 5.74876947]
[ 2.91884302 0.47222752 10.12700505]
[ 0.10743952 7.74620162 7.51100441]
[ 3.60396019 4.91708372 9.07551354]
[ 4.04067271 9.90959141 5.19126867]
[ 1.11247872 8.02092335 1.99767444]
[ 5.59873104 2.84109306 8.11175891]
[ 8.38904822 2.11830619 6.42013343]
[ 0.9400948 4.15448712 1.04187208]]
However, in reality I am sorting (1000, 3) matrices and my code is way too slow. I feel like there should be a more numpyic way to code this. Any suggestions?
This is a vectorized version of your algorithm. It runs ~26.5x faster than your implementation for 1000 samples, at the cost of creating an additional boolean array with shape (1000, 1000, 3). Note that if several rows of X fall within the tolerance of the same row of Y, argmax picks the first one, so a wrong row may be selected.
tol = .5
X[(np.abs(Y[:, np.newaxis] - X) <= tol).all(2).argmax(1)]
Output
array([[ 6.68543696, 4.8345869 , 8.17330513],
[ 0.30995155, 8.09263241, 5.74876947],
[ 2.91884302, 0.47222752, 10.12700505],
[ 0.10743952, 7.74620162, 7.51100441],
[ 3.60396019, 4.91708372, 9.07551354],
[ 4.04067271, 9.90959141, 5.19126867],
[ 1.11247872, 8.02092335, 1.99767444],
[ 5.59873104, 2.84109306, 8.11175891],
[ 8.38904822, 2.11830619, 6.42013343],
[ 0.9400948 , 4.15448712, 1.04187208]])
More robust solutions with L1-norm
X[np.abs(Y[:, np.newaxis] - X).sum(2).argmin(1)]
Or L2-norm
X[((Y[:, np.newaxis] - X)**2).sum(2).argmin(1)]
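To make the broadcasting explicit, here is a small self-contained sketch of the tolerance and L2 variants. The data is constructed for illustration only (well-separated rows and bounded noise, unlike the question's random data) so that the correct matching is known in advance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: rows of Y are far apart, and the noise is bounded by 0.1.
Y = np.arange(30, dtype=float).reshape(10, 3)
perm = rng.permutation(len(Y))
X = (Y + rng.uniform(-0.1, 0.1, size=Y.shape))[perm]   # noisy, shuffled copy

# diff[i, j, :] = Y[i] - X[j], shape (10, 10, 3) via broadcasting
diff = Y[:, np.newaxis, :] - X[np.newaxis, :, :]

# Tolerance variant: first row of X whose columns are all within tol of Y[i]
tol = 0.5
idx_tol = (np.abs(diff) <= tol).all(axis=2).argmax(axis=1)

# L2 variant: row of X with the smallest squared distance to Y[i]
idx_l2 = (diff ** 2).sum(axis=2).argmin(axis=1)

X_sorted = X[idx_l2]
print(np.abs(X_sorted - Y).max() <= 0.1)   # True
```

The intermediate array has shape (len(Y), len(X), 3), so memory grows quadratically with the number of rows.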

How to split a 3D matrix into 3D matrices lined up in a list?

I have a NumPy array with the following shape:
(1532, 2036, 5)
I would like to generate a list of arrays where each one has the following shape:
(1532, 2036)
You can use Ellipsis to signify all dimensions up to the last. For example:
arr = np.random.rand(4, 3, 2)
arr
array([[[ 0.35235813, 0.57984153],
[ 0.53743048, 0.46753367],
[ 0.80048303, 0.07982378]],
[[ 0.1339381 , 0.84586721],
[ 0.81425027, 0.41086151],
[ 0.34039991, 0.19972737]],
[[ 0.2112466 , 0.73086434],
[ 0.03755819, 0.40113463],
[ 0.74622891, 0.74695994]],
[[ 0.99313615, 0.65634951],
[ 0.90787642, 0.37387861],
[ 0.8738962 , 0.41747727]]])
The list of the last dimension arrays can be constructed as @Usernamenotfound mentioned or with Ellipsis like so:
[arr[..., i] for i in range(arr.shape[-1])]
[array([[ 0.35235813, 0.53743048, 0.80048303],
[ 0.1339381 , 0.81425027, 0.34039991],
[ 0.2112466 , 0.03755819, 0.74622891],
[ 0.99313615, 0.90787642, 0.8738962 ]]),
array([[ 0.57984153, 0.46753367, 0.07982378],
[ 0.84586721, 0.41086151, 0.19972737],
[ 0.73086434, 0.40113463, 0.74695994],
[ 0.65634951, 0.37387861, 0.41747727]])]
Each element has the shape (4, 3).
Likewise, you could do the same for the first dimension, making 4 (3, 2) arrays.
[arr[i, ...] for i in range(arr.shape[0])]
[array([[ 0.35235813, 0.57984153],
[ 0.53743048, 0.46753367],
[ 0.80048303, 0.07982378]]), array([[ 0.1339381 , 0.84586721],
[ 0.81425027, 0.41086151],
[ 0.34039991, 0.19972737]]), array([[ 0.2112466 , 0.73086434],
[ 0.03755819, 0.40113463],
[ 0.74622891, 0.74695994]]), array([[ 0.99313615, 0.65634951],
[ 0.90787642, 0.37387861],
[ 0.8738962 , 0.41747727]])]
You can also permute the axes with numpy.transpose and then simply iterate through the array:
import numpy as np
arr = ...  # Define the input array here
out = [a for a in np.transpose(arr, (2, 0, 1))]
You can slice the 3D array using
[x[:,:,i] for i in range(5)]
The above would give you a list of 2D arrays.
The same approach extends to arrays with more dimensions.
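As a compact variant of the answers above, the last axis can be moved to the front with np.moveaxis and the result turned into a list. This is a sketch on a small made-up array of shape (4, 3, 5) rather than the (1532, 2036, 5) array from the question.

```python
import numpy as np

arr = np.arange(4 * 3 * 5).reshape(4, 3, 5)   # small stand-in array

# Move the last axis to the front, then iterate over it.
planes = list(np.moveaxis(arr, -1, 0))

print(len(planes))        # 5
print(planes[0].shape)    # (4, 3)
```

Each element of planes is a view into arr, so no data is copied.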

convolution of .mat file and 1D array

my code is:
import numpy as np
import scipy.io as spio
x=np.zeros((22113,1),float)
x= spio.loadmat('C:\\Users\\dell\\Desktop\\Rabia Ahmad spring 2016\\'
'FYP\\1. Matlab Work\\record work\\kk.mat')
print(x)
x = np.reshape(len(x),1);
h = np.array([0.9,0.3,0.1],float)
print(h)
h = h.reshape(len(h),1);
dd = np.convolve(h,x)
and the error I encounter is "ValueError: object too deep for desired array".
Kindly help me in this regard.
{'__globals__': [], '__version__': '1.0', 'ans': array([[ 0.13580322,
0.13580322], [ 0.13638306, 0.13638306], [ 0.13345337, 0.13345337],
..., [ 0.13638306, 0.13638306], [ 0.13345337, 0.13345337], ..., [
0.13638306, 0.13638306], [ 0.13345337, 0.13345337], ..., [-0.09136963,
-0.09136963], [-0.12442017, -0.12442017], [-0.15542603, -0.15542603]])}
See {}? That means x from the loadmat is a dictionary.
x['ans'] will be an array
array([[ 0.13580322,
0.13580322], [ 0.13638306, 0.13638306], [ 0.13345337, 0.13345337],...]])
which, if I count the [] right, is an (n, 2) array of floats.
The following line does not make sense:
x = np.reshape(len(x),1);
I suspect you mean x = x.reshape(...) as you do with h. But that would give an error with the dictionary x.
When you say the shape of x is (9,) and its dtype is uint16, where in your code are you verifying that?
x = np.reshape(len(x),1); doesn't do anything useful. That completely discards the data in x, and creates an array of shape (1,), with the only element being len(x).
In your code, you reshape h to (3, 1), which is a 2D array, not a 1D array, which is why convolve complains.
Remove both of your reshapes, and instead just pass squeeze_me=True to scipy.io.loadmat. This is needed because MATLAB does not have the concept of 1d arrays, and squeeze_me tells scipy to squeeze out unit dimensions, turning (N, 1) and (1, N) arrays into (N,) arrays.
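Putting the points from both answers together, a minimal sketch could look like this. The dictionary below is a made-up stand-in for what loadmat returns (no file is read), and taking the first column of the (n, 2) array is an assumption based on the two columns looking identical in the printed output.

```python
import numpy as np

# Hypothetical stand-in for the dict returned by scipy.io.loadmat.
mat = {"ans": np.array([[0.1, 0.1],
                        [0.2, 0.2],
                        [0.3, 0.3],
                        [0.4, 0.4]])}

x = mat["ans"][:, 0]             # pick one column -> 1-D array of shape (4,)
h = np.array([0.9, 0.3, 0.1])    # keep h 1-D, no reshape to (3, 1)

dd = np.convolve(h, x)           # both inputs are 1-D, as np.convolve requires
print(dd.shape)                  # (6,)
```

The output length is len(h) + len(x) - 1, which is the default "full" mode of np.convolve.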

Efficient way of merging two numpy masked arrays

I have two numpy masked arrays which I want to merge. I'm using the following code:
import numpy as np
import matplotlib.pyplot as plt
a = np.zeros((10000, 10000), dtype=np.int16)
a[:5000, :5000] = 1
am = np.ma.masked_equal(a, 0)
b = np.zeros((10000, 10000), dtype=np.int16)
b[2500:7500, 2500:7500] = 2
bm = np.ma.masked_equal(b, 0)
arr = np.ma.array(np.dstack((am, bm)), mask=np.dstack((am.mask, bm.mask)))
arr = np.prod(arr, axis=2)
plt.imshow(arr)
The problem is that the np.prod() operation is very slow (4 seconds on my computer). Is there an alternative way of getting a merged array in a more efficient way?
Instead of your last two lines using dstack() and prod(), try this:
arr = np.ma.array(am.filled(1) * bm.filled(1), mask=(am.mask * bm.mask))
Now you don't need prod() at all, and you avoid allocating the 3D array entirely.
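As a sanity check, the two approaches can be compared on a small example. This is a sketch with 4x4 arrays instead of the 10000x10000 ones from the question; it uses np.ma.getmaskarray and & for the masks, which is a stylistic choice here and not part of the original answer.

```python
import numpy as np

a = np.zeros((4, 4), dtype=np.int16)
a[:2, :2] = 1
b = np.zeros((4, 4), dtype=np.int16)
b[1:3, 1:3] = 2
am = np.ma.masked_equal(a, 0)
bm = np.ma.masked_equal(b, 0)

mask_a = np.ma.getmaskarray(am)   # always a full boolean array
mask_b = np.ma.getmaskarray(bm)

# Original approach: stack into a 3D masked array, then take the product.
stacked = np.ma.array(np.dstack((am.data, bm.data)),
                      mask=np.dstack((mask_a, mask_b)))
slow = stacked.prod(axis=2)

# Suggested approach: fill masked cells with 1 and multiply directly.
fast = np.ma.array(am.filled(1) * bm.filled(1), mask=mask_a & mask_b)

print(np.array_equal(slow.filled(-1), fast.filled(-1)))   # True
```

A cell stays masked only where both inputs are masked; where exactly one input is masked, the fill value 1 leaves the other value unchanged.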
I took another approach that may not be particularly efficient, but is reasonably easy to extend and implement.
(I know I'm answering a question that is over 3 years old with functionality that has been around in numpy a long time, but bear with me)
The np.where function in numpy has two main purposes (it is a bit weird). The first is to give you indices for a boolean array:
>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> m = (a % 3 == 0)
>>> m
array([[ True, False, False, True],
[False, False, True, False],
[False, True, False, False]], dtype=bool)
>>> row_ind, col_ind = np.where(m)
>>> row_ind
array([0, 0, 1, 2])
>>> col_ind
array([0, 3, 2, 1])
The other purpose of the np.where function is to pick from two arrays based on whether the given boolean array is True/False:
>>> np.where(m, a, np.zeros(a.shape))
array([[ 0., 0., 0., 3.],
[ 0., 0., 6., 0.],
[ 0., 9., 0., 0.]])
Turns out, there is also a numpy.ma.where which deals with masked arrays...
Given a list of masked arrays of the same shape, my code then looks like:
merged = masked_arrays[0]
for ma in masked_arrays[1:]:
    merged = np.ma.where(ma.mask, merged, ma)
As I say, not particularly efficient, but certainly easy enough to implement.
HTH
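A small runnable sketch of the np.ma.where loop above, using two made-up 1-D masked arrays. It uses np.ma.getmaskarray instead of .mask so the condition is always a full boolean array; that substitution is mine, not part of the answer.

```python
import numpy as np

# Two illustrative masked arrays of the same shape.
first = np.ma.array([1, 0, 0], mask=[False, True, True])
second = np.ma.array([0, 2, 0], mask=[True, False, True])
masked_arrays = [first, second]

merged = masked_arrays[0]
for ma in masked_arrays[1:]:
    # Where ma is masked keep what we have so far, otherwise take ma.
    merged = np.ma.where(np.ma.getmaskarray(ma), merged, ma)

print(merged.filled(0))   # [1 2 0]
```

Later arrays in the list take precedence wherever they are not masked.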
Inspired by the accepted answer, I've found a simple way of merging masked arrays. It works by applying some logical operations to the masks and simply adding the 0-filled arrays.
import numpy as np
import matplotlib.pyplot as plt
a = np.zeros((1000, 1000), dtype=np.int16)
a[:500, :500] = 2
am = np.ma.masked_equal(a, 0)
b = np.zeros((1000, 1000), dtype=np.int16)
b[250:750, 250:750] = 3
bm = np.ma.masked_equal(b, 0)
c = np.zeros((1000, 1000), dtype=np.int16)
c[500:1000, 500:1000] = 5
cm = np.ma.masked_equal(c, 0)
bm.mask = np.logical_or(np.logical_and(am.mask, bm.mask), np.logical_not(am.mask))
am = np.ma.array(am.filled(0) + bm.filled(0), mask=(am.mask * bm.mask))
cm.mask = np.logical_or(np.logical_and(am.mask, cm.mask), np.logical_not(am.mask))
am = np.ma.array(am.filled(0) + cm.filled(0), mask=(am.mask * cm.mask))
plt.imshow(am)
I hope someone finds this helpful sometime. Masked arrays don't seem to be very efficient, though. So, if someone finds an alternative way to merge arrays, I'd be happy to know.
Update: Based on @morningsun's comment, this implementation is 30% faster and much simpler:
import numpy as np
import matplotlib.pyplot as plt
a = np.zeros((1000, 1000), dtype=np.int16)
a[:500, :500] = 2
am = np.ma.masked_equal(a, 0)
b = np.zeros((1000, 1000), dtype=np.int16)
b[250:750, 250:750] = 3
bm = np.ma.masked_equal(b, 0)
c = np.zeros((1000, 1000), dtype=np.int16)
c[500:1000, 500:1000] = 5
cm = np.ma.masked_equal(c, 0)
am[am.mask] = bm[am.mask]
am[am.mask] = cm[am.mask]
plt.imshow(am)
