Related
rows is a 343x30 matrix of real numbers. I'm trying to append row vectors from rows to true_rows and false_rows, but it only adds the first row and does nothing afterwards. I've tried vstack, and I also tried wrapping example as a 2-D array ([example]), but that crashed my PyCharm. What can I do?
true_rows = []
true_labels = []
false_rows = []
false_labels = []
i = 0
for example in rows:
    if question.match(example):
        true_rows = np.append(true_rows, example, axis=0)
        true_labels.append(labels[i])
    else:
        # false_rows = np.vstack(false_rows, example_t)
        false_rows = np.append(false_rows, example, axis=0)
        false_labels.append(labels[i])
    i += 1
You can use a plain Python list to append your rows, and then transform that list into a NumPy array, for example:
exemple1 = np.array([1,2,3,4,5])
exemple2 = np.array([6,7,8,9,10])
exemple3 = np.array([11,12,13,14,15])
true_rows = []
true_rows.append(exemple1)
true_rows.append(exemple2)
true_rows.append(exemple3)
true_rows = np.array(true_rows)
You will get this result:
true_rows = array([[ 1,  2,  3,  4,  5],
                   [ 6,  7,  8,  9, 10],
                   [11, 12, 13, 14, 15]])
You can also use np.concatenate if you want a one-dimensional array, like this:
true_rows = np.concatenate(true_rows, axis=0)
You will get this result:
true_rows = array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
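Applied to your original loop, a minimal sketch of this list-append approach might look like the following (it assumes rows, labels and question.match behave as in your code):

true_rows, true_labels = [], []
false_rows, false_labels = [], []
for example, label in zip(rows, labels):    # rows is your (343, 30) array
    if question.match(example):             # question.match as in your code
        true_rows.append(example)
        true_labels.append(label)
    else:
        false_rows.append(example)
        false_labels.append(label)
true_rows = np.array(true_rows)     # shape (n_true, 30)
false_rows = np.array(false_rows)   # shape (n_false, 30)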
Your use of [] and np.append suggests you are trying to imitate the common list-append model with arrays. You at least read enough of the np.append docs to know you need to use axis, and that it returns a new array (the docs are quite clear that this is a copy).
But did you test this idea with a small example, and actually look at the results (step by step)?
In [326]: rows = []
In [327]: rows = np.append(rows, np.arange(3), axis=0)
In [328]: rows
Out[328]: array([0., 1., 2.])
In [329]: rows.shape
Out[329]: (3,)
The first append doesn't do anything - the result is the same as arange(3).
In [330]: rows = np.append(rows, np.arange(3), axis=0)
In [331]: rows
Out[331]: array([0., 1., 2., 0., 1., 2.])
In [332]: rows.shape
Out[332]: (6,)
Do you understand why? We joined two 1-d arrays on axis 0, producing another 1-d array.
Using [] as a starting point is the same as starting with this array:
In [333]: np.array([])
Out[333]: array([], dtype=float64)
In [334]: np.array([]).shape
Out[334]: (0,)
And with axis, np.append is just a call to concatenate:
In [335]: np.concatenate(( [], np.arange(3)), axis=0)
Out[335]: array([0., 1., 2.])
np.append sort of looks like list append, but it is not a clone. It's really just a poorly named way to call concatenate, and you can't use it properly without actually understanding dimensions. The np.append docs include an example with an error much like the one you got with concatenate.
Repeated use of these array concatenations in a loop is not a good idea. It's hard to get the dimensions right, as you found. And even when it works, it is slow, since each step makes a copy (which grows with each iteration).
That's why the other answer sticks with list append.
vstack is like concatenate with axis 0, but it makes sure all arguments are 2d. If the number of columns differs, it raises an error:
In [336]: np.vstack(( [],np.arange(3)))
Traceback (most recent call last):
  File "<ipython-input-336-22038d6ef0f7>", line 1, in <module>
    np.vstack(( [],np.arange(3)))
  File "<__array_function__ internals>", line 180, in vstack
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/shape_base.py", line 282, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 1 has size 3
In [337]: np.vstack(( [0,0,0],np.arange(3)))
Out[337]:
array([[0, 0, 0],
       [0, 1, 2]])
If all you are joining are rows of a (n,30) array, then you do know the column size of the result.
In [338]: res = np.zeros((0,3))
In [339]: np.vstack(( res, np.arange(3)))
Out[339]: array([[0., 1., 2.]])
If you pay attention to the shape details, it is possible to create an array iteratively.
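For your case, a minimal sketch of that iterative vstack approach (assuming 30 columns, as in your rows array) could be:

true_rows = np.zeros((0, 30))    # start with the right number of columns
false_rows = np.zeros((0, 30))
for example in rows:
    if question.match(example):              # question.match as in your code
        true_rows = np.vstack((true_rows, example))
    else:
        false_rows = np.vstack((false_rows, example))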
But instead of collecting rows one by one, why not create a boolean mask and do the whole collection at once? Roughly:
mask = np.array([question.match(example) for example in rows])
true_rows = rows[mask]
false_rows = rows[~mask]
This still requires a Python-level iteration to build the mask, but overall it should be faster.
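The same mask splits the labels in one step as well; a short sketch, assuming labels has one entry per row of rows:

labels = np.asarray(labels)
true_labels, false_labels = labels[mask], labels[~mask]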
Suppose x = np.array([[30,60,70],[100,20,80]]) and I wish to remove all elements that are < 60. That is, the resulting array should be x = np.array([[60,70],[100,80]]).
I use np.where(x < 60) to find the indices of the elements to remove, and I get indices = (array([0, 1]), array([0, 1])). However, when I try to delete those elements via np.delete(x, indices), I get array([ 70, 100, 20, 80]) rather than what I was hoping for.
What can I do to achieve the desired result?
import numpy as np

x = np.array([[30, 60, 70],
              [100, 20, 80]])
new_x = np.array([(np.delete(i, np.where(i < 60)[0])) for i in x])
print(new_x)
I got it this way, but I don't know if it would be too slow for large arrays:
import numpy as np

d = np.array([
    [30, 60, 70],
    [100, 20, 80]
])
f = lambda x: x >= 60   # keep elements >= 60, i.e. drop everything below 60
a = np.array([a[f(a)] for a in d])
print(a)
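If every row loses the same number of elements (as in this example), a boolean mask plus a single reshape avoids the Python-level loop entirely; a minimal sketch:

import numpy as np

x = np.array([[30, 60, 70],
              [100, 20, 80]])
kept = x[x >= 60]                        # 1-d array of the surviving elements
new_x = kept.reshape(x.shape[0], -1)     # only valid if each row keeps the same count
print(new_x)                             # [[ 60  70]
                                         #  [100  80]]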
To solve a problem that can only be handled element by element, I need to combine NumPy's tuple indexing with an explicit slice.
def f(shape, n):
    """
    :param shape: any shape of an array
    :type shape: tuple
    :type n: int
    """
    x = numpy.zeros( (n,) + shape )
    for i in numpy.ndindex(shape):  # i = (k, l, ...)
        x[:, k, l, ...] = numpy.random.random(n)
x[:, *i] results in a SyntaxError, and x[:, i] is interpreted as numpy.array([ x[:, k] for k in i ]). Unfortunately it's not possible to put the n-dimension last (x = numpy.zeros(shape + (n,)) with x[i] = numpy.random.random(n)) because of the further usage of x.
EDIT: Here is an example, as requested in the comments.
>>> n, shape = 2, (3,4)
>>> x = np.arange(24).reshape((n,)+(3,4))
>>> x
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],
       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> i = (1,2)
>>> x[ ??? ]   # the question: how to express '???' using i of any length
array([ 6, 18])
If I understand the question correctly, you have a multi-dimensional numpy array and want to index it by combining a : slice with some number of other indices from a tuple i.
The index to a numpy array is a tuple, so you can basically just combine those 'partial' indices into one tuple and use that as the index. A naive approach might look like this
x[ (:,) + i ] = numpy.random.random(n) # does not work
but this will give a syntax error. Instead of :, you have to use the slice builtin.
x[ (slice(None),) + i ] = numpy.random.random(n)
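Put back into the function from the question, a minimal sketch might look like this (np.ndindex already yields the index as a tuple, so it can be concatenated with (slice(None),) directly):

import numpy as np

def f(shape, n):
    x = np.zeros((n,) + shape)
    for i in np.ndindex(shape):                   # i is a tuple such as (k, l, ...)
        x[(slice(None),) + i] = np.random.random(n)
    return x

x = f((3, 4), 2)    # every x[:, k, l] is filled with a length-2 random vector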
Suppose array_1 and array_2 are two arrays of matrices of the same sizes. Is there any vectorised way of multiplying them pairwise, element by element (where the multiplication of each pair of elements is well defined)?
The dummy code:
def mat_multiply(array_1, array_2):
    size = np.shape(array_1)[0]
    result = np.array([])
    for i in range(size):
        result = np.append(result, np.dot(array_1[i], array_2[i]), axis=0)
    return np.reshape(result, (size, 2))
example input:
a=[[[1,2],[3,4]],[[1,2],[3,4]]]
b=[[1,3],[4,5]]
output:
[[ 7. 15.]
 [14. 32.]]
Contrary to your first sentence, a and b are not the same size. But let's focus on your example.
So you want this - 2 dot products, one for each row of a and b
np.array([np.dot(x,y) for x,y in zip(a,b)])
or to avoid appending
X = np.zeros((2,2))
for i in range(2):
    X[i,...] = np.dot(a[i], b[i])
the dot product can be expressed with einsum (matrix index notation) as
[np.einsum('ij,j->i',x,y) for x,y in zip(a,b)]
so the next step is to index that first dimension:
np.einsum('kij,kj->ki',a,b)
I'm quite familiar with einsum, but it still took a bit of trial and error to figure out what you want. Now that the problem is clear I can compute it in several other ways
A, B = np.array(a), np.array(b)
np.multiply(A,B[:,np.newaxis,:]).sum(axis=2)
(A*B[:,None,:]).sum(2)
np.dot(A,B.T)[0,...]
np.tensordot(b,a,(-1,-1))[:,0,:]
I find it helpful to work with arrays that have different sizes. For example if A were (2,3,4) and B (2,4), it would be more obvious the dot sum has to be on the last dimension.
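For instance, a quick sketch with those mismatched shapes (the values are arbitrary and only for illustration) makes it explicit that the sum has to run over the last axis of both:

A2 = np.arange(24).reshape(2, 3, 4)     # hypothetical: 2 matrices of shape (3, 4)
B2 = np.arange(8).reshape(2, 4)         # hypothetical: 2 vectors of length 4
np.einsum('kij,kj->ki', A2, B2).shape   # (2, 3)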
Another numpy iteration tool is np.nditer. einsum uses this (in C).
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
it = np.nditer([A, B, None], flags=['external_loop'],
               op_axes=[[0,1,2], [0,-1,1], None])
for x, y, w in it:
    # x, y are shape (2,)
    w[...] = np.dot(x, y)
it.operands[2][..., 0]
Avoiding that [...,0] step, requires a more elaborate setup.
C = np.zeros((2,2))
it = np.nditer([A, B, C], flags=['external_loop', 'reduce_ok'],
               op_axes=[[0,1,2], [0,-1,1], [0,1,-1]],
               op_flags=[['readonly'], ['readonly'], ['readwrite']])
for x, y, w in it:
    w[...] = np.dot(x, y)
    # w[...] += x*y
print(C)
# array([[ 7., 15.],[ 14., 32.]])
There's one more option that @hpaulj left out of his extensive and comprehensive list of options:
>>> a = np.array(a)
>>> b = np.array(b)
>>> from numpy.core.umath_tests import matrix_multiply
>>> matrix_multiply.signature
'(m,n),(n,p)->(m,p)'
>>> matrix_multiply(a, b[..., np.newaxis])
array([[[ 7],
        [15]],
       [[14],
        [32]]])
>>> matrix_multiply(a, b[..., np.newaxis]).shape
(2L, 2L, 1L)
>>> np.squeeze(matrix_multiply(a, b[..., np.newaxis]), axis=-1)
array([[ 7, 15],
       [14, 32]])
The nice thing about matrix_multiply is that, it being a gufunc, it will work not only with 1D arrays of matrices, but also with broadcastable arrays. As an example, if instead of multiplying the first matrix with the first vector, and the second matrix with the second vector, you wanted to compute all possible multiplications, you could simply do:
>>> a = np.arange(8).reshape(2, 2, 2) # to have different matrices
>>> np.squeeze(matrix_multiply(a[...,np.newaxis, :, :],
... b[..., np.newaxis]), axis=-1)
array([[[ 3, 11],
        [ 5, 23]],
       [[19, 27],
        [41, 59]]])
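In more recent NumPy versions the same stacked matrix-vector product is available through np.matmul (the @ operator), which broadcasts over the leading dimensions in the same gufunc style; a brief sketch with the original a and b:

a = np.array([[[1, 2], [3, 4]], [[1, 2], [3, 4]]])
b = np.array([[1, 3], [4, 5]])
np.squeeze(a @ b[..., np.newaxis], axis=-1)
# array([[ 7, 15],
#        [14, 32]])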
I'm trying to slice and iterate over a multidimensional array at the same time. I have a solution that's functional, but it's kind of ugly, and I bet there's a slick way to do the iteration and slicing that I don't know about. Here's the code:
import numpy as np
x = np.arange(64).reshape(4,4,4)
y = [x[i:i+2,j:j+2,k:k+2] for i in range(0,4,2)
for j in range(0,4,2)
for k in range(0,4,2)]
y = np.array(y)
z = np.array([np.min(u) for u in y]).reshape(y.shape[1:])
Your last reshape doesn't work while y is still a plain list, because a list has no shape defined. Without it you get:
>>> x = np.arange(64).reshape(4,4,4)
>>> y = [x[i:i+2,j:j+2,k:k+2] for i in range(0,4,2)
... for j in range(0,4,2)
... for k in range(0,4,2)]
>>> z = np.array([np.min(u) for u in y])
>>> z
array([ 0, 2, 8, 10, 32, 34, 40, 42])
But despite that, what you probably want is reshaping your array to 6 dimensions, which gets you the same result as above:
>>> xx = x.reshape(2, 2, 2, 2, 2, 2)
>>> zz = xx.min(axis=-1).min(axis=-2).min(axis=-3)
>>> zz
array([[[ 0,  2],
        [ 8, 10]],
       [[32, 34],
        [40, 42]]])
>>> zz.ravel()
array([ 0, 2, 8, 10, 32, 34, 40, 42])
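Equivalently, on NumPy versions where min accepts a tuple of axes, the three chained calls collapse into one; a small sketch:

>>> x.reshape(2, 2, 2, 2, 2, 2).min(axis=(1, 3, 5)).ravel()
array([ 0,  2,  8, 10, 32, 34, 40, 42])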
It's hard to tell exactly what you want from that last step, but you can use stride_tricks to get a "slicker" way. It's rather tricky.
import numpy.lib.stride_tricks

# This returns a view with custom strides; x2[i,j,k] matches y[4*i+2*j+k]
x2 = numpy.lib.stride_tricks.as_strided(
    x, shape=(2,2,2,2,2,2),
    strides=tuple(numpy.array([32,8,2,16,4,1]) * x.dtype.itemsize))
z2 = x2.min(axis=-1).min(axis=-2).min(axis=-3)
Still, I can't say this is much more readable. (Or efficient, as each min call will make temporaries.)
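A slightly safer variant of the same idea derives the byte strides from x.strides instead of hand-computing them; a sketch:

from numpy.lib.stride_tricks import as_strided

s0, s1, s2 = x.strides               # per-axis byte strides of the original (4,4,4) array
x2 = as_strided(x, shape=(2, 2, 2, 2, 2, 2),
                strides=(2*s0, 2*s1, 2*s2, s0, s1, s2))
z2 = x2.min(axis=-1).min(axis=-2).min(axis=-3)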
Note, my answer differs from Jaime's because I tried to match your elements of y. You can tell if you replace the min with max.