Numpy concatenate for arrays with same rows but different columns - python

I have a list of arrays that all have the same number of rows but different numbers of columns.
I printed out the shapes of the arrays and checked that the row counts match:
print ("Type x_test : actual",type(x_dump),x_dump.shape, type(actual), actual.shape, pred.shape)
cmp = np.concatenate([x_test,actual,pred],axis = 1)
('Type x_test : actual', <type 'numpy.ndarray'>, (2420L, 4719L), <type 'numpy.ndarray'>, (2420L,), (2420L,))
This gives me an error:
ValueError: all the input arrays must have same number of dimensions
I tried to reproduce this error with the commands below:
x.shape,x1.shape,x2.shape
Out[772]: ((3L, 1L), (3L, 4L), (3L, 1L))
np.concatenate([x,x1,x2],axis=1)
Out[764]:
array([[ 0,  0,  1,  2,  3,  0],
       [ 1,  4,  5,  6,  7,  1],
       [ 2,  8,  9, 10, 11,  2]])
I don't get any error here. Has anyone faced a similar issue?
EDIT 1: Right after writing this question, I figured out that the dimensions are different.
Gareth Rees has explained the difference between numpy arrays of shape (R, 1) and (R,) beautifully here.
Fixed using:
# Reshape and concatenate
actual = actual.reshape(len(actual),1)
pred = pred.reshape(len(pred),1)
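A minimal sketch of the same fix, with stand-in arrays that only share the shapes from the question:

import numpy as np

x_test = np.zeros((2420, 4719))   # (R, C)
actual = np.zeros(2420)           # (R,)
pred = np.zeros(2420)             # (R,)

# Reshape the 1-D arrays into single columns so every input is 2-D,
# then concatenate along the column axis.
cmp = np.concatenate([x_test, actual.reshape(-1, 1), pred.reshape(-1, 1)], axis=1)
print(cmp.shape)                  # (2420, 4721)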
EDIT 2: Marking this question to be closed as a duplicate of Difference between numpy.array shape (R, 1) and (R,).

EDIT
After posting this, the OP figured out the error. This can be ignored unless one needs to see the construct with shapes (R, 1) versus (R,). In any event, it gives the downvoters some practice space.
ORIGINAL
Given the shapes you posted, the result is correct.
a = np.arange(3).reshape(3,1)
b = np.arange(12).reshape(3,4)
c = np.arange(3).reshape(3,1)
np.concatenate([a, b, c], axis=1)
Out[4]:
array([[ 0,  0,  1,  2,  3,  0],
       [ 1,  4,  5,  6,  7,  1],
       [ 2,  8,  9, 10, 11,  2]])
a
Out[5]:
array([[0],
       [1],
       [2]])
b
Out[6]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
c
Out[7]:
array([[0],
       [1],
       [2]])
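For contrast, a short sketch of what actually happened in the question: if one of the inputs is 1-D, the same call raises the dimension error, and reshaping it to a column fixes it (the exact error wording varies by numpy version).

import numpy as np

a = np.arange(3).reshape(3, 1)   # shape (3, 1)
c = np.arange(3)                 # shape (3,) -- 1-D, like actual and pred

try:
    np.concatenate([a, c], axis=1)
except ValueError as e:
    print(e)                     # ... must have same number of dimensions ...

# Promote the 1-D array to a column and the concatenation works
print(np.concatenate([a, c.reshape(-1, 1)], axis=1).shape)  # (3, 2)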

Related

How to sort a numpy matrix using a mask?

I have two matrices A and B, which look like this:
A = array([[2, 2, 1, 0, 8],
           [8, 2, 0, 3, 7],
           [3, 2, 6, 5, 3],
           [1, 4, 2, 5, 8],
           [2, 3, 7, 0, 3]])
B = array([[3, 7, 6, 8, 3],
           [0, 7, 4, 4, 3],
           [1, 2, 0, 0, 4],
           [8, 6, 6, 7, 1],
           [8, 1, 0, 4, 8]])
I am trying to sort A and B, but I need B to be reordered using the sort order (mask) from A.
I tried this:
mask = A.argsort()
A = A[mask]
B = B[mask]
However, the return value is an array of shape (5, 5, 5).
The next snippet works, but it uses two Python-level loops. I need something faster. Does anybody have an idea?
A = [row[order] for row, order in zip(A,mask)]
B = [row[order] for row, order in zip(B,mask)]
You can use fancy indexing. The result will be the same shape as your indices broadcasted together. Your column index is already the right shape. A row index of size (A.shape[0], 1) would broadcast correctly:
r = np.arange(A.shape[0]).reshape(-1, 1)
c = np.argsort(A)
A = A[r, c]
B = B[r, c]
The reason your original index didn't work is that you were indexing with a single index array, which selects entire rows at each location (hence the (5, 5, 5) result). It would even have raised an IndexError if you had more columns than rows, since the sort indices would then exceed the number of rows.
A simpler way would be to follow what the argsort docs suggest:
A = np.take_along_axis(A, mask, axis=-1)
B = np.take_along_axis(B, mask, axis=-1)
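A self-contained sketch that ties the two approaches together, using the A and B from the question, just to confirm they produce the same result (take_along_axis assumes numpy >= 1.15):

import numpy as np

A = np.array([[2, 2, 1, 0, 8],
              [8, 2, 0, 3, 7],
              [3, 2, 6, 5, 3],
              [1, 4, 2, 5, 8],
              [2, 3, 7, 0, 3]])
B = np.array([[3, 7, 6, 8, 3],
              [0, 7, 4, 4, 3],
              [1, 2, 0, 0, 4],
              [8, 6, 6, 7, 1],
              [8, 1, 0, 4, 8]])

mask = A.argsort(axis=1)

# Approach 1: broadcast a column of row indices against the sort indices
r = np.arange(A.shape[0]).reshape(-1, 1)
A1, B1 = A[r, mask], B[r, mask]

# Approach 2: take_along_axis
A2 = np.take_along_axis(A, mask, axis=-1)
B2 = np.take_along_axis(B, mask, axis=-1)

assert (A1 == A2).all() and (B1 == B2).all()
print(A1[0], B1[0])  # [0 1 2 2 8] [8 6 3 7 3]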

Numpy concatenate lists where first column is in range n

I am trying to select all rows of a numpy matrix named matrix with shape (25323, 9) whose first-column value lies between start and end, for each tuple in the list range_tuple. Ultimately, I want to create a new numpy matrix final with the result, where final has a shape of (n, 9). The following code returns this error: TypeError: only integer scalar arrays can be converted to a scalar index. I have also tried initializing final with numpy.zeros((1, 9)) and using np.concatenate, but I get similar results. I do get a combined result when I use final.append(result) instead of np.concatenate, but then the shape of the matrix gets lost. I know there is a proper solution to this problem; any help would be appreciated.
final = []
for i in range_tuples:
    copy = np.copy(matrix)
    start = i[0]
    end = i[1]
    result = copy[(matrix[:,0] < end) & (matrix[:,0] > start)]
    final = np.concatenate(final, result)
final = np.matrix(final)
In [33]: arr
Out[33]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23]])
In [34]: tups = [(0,6),(3,12),(9,10),(15,14)]
In [35]: alist = []
    ...: for start, stop in tups:
    ...:     res = arr[(arr[:,0]<stop)&(arr[:,0]>=start), :]
    ...:     alist.append(res)
    ...:
Check the list; note that the elements differ in shape, and some have 1 or 0 rows. It's a good idea to test these edge cases.
In [37]: alist
Out[37]:
[array([[0, 1, 2],
        [3, 4, 5]]),
 array([[ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]]),
 array([[ 9, 10, 11]]),
 array([], shape=(0, 3), dtype=int64)]
vstack joins them:
In [38]: np.vstack(alist)
Out[38]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [ 9, 10, 11]])
Here concatenate also works, because default axis is 0, and all inputs are already 2d.
Try the following
final = np.empty((0, 9))
for start, end in range_tuples:
    result = matrix[(matrix[:,0] < end) & (matrix[:,0] > start)]
    final = np.concatenate((final, result))
The first change is to initialize final as an empty numpy array. The second is that the arrays to join have to be passed to concatenate as a single sequence (tuple or list), see the docs; in your code, result is interpreted as the value of the axis parameter.
Notes
I used tuple unpacking to make the loop clearer.
The copy is not needed.
Appending the pieces to a list and joining them once at the end is usually faster than repeated concatenation (see the sketch below); if result always has the same length, the final array can also be obtained by reshaping.
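A minimal sketch of that append-then-join pattern, using made-up stand-ins for the question's matrix and range_tuples:

import numpy as np

matrix = np.arange(45.0).reshape(5, 9)       # stand-in for the (25323, 9) matrix
range_tuples = [(0.0, 10.0), (15.0, 40.0)]   # stand-in (start, end) tuples

pieces = []
for start, end in range_tuples:
    pieces.append(matrix[(matrix[:, 0] > start) & (matrix[:, 0] < end)])

final = np.vstack(pieces)                    # single join at the end
print(final.shape)                           # (4, 9) here; (n, 9) in general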
I would simply create a boolean mask to select the rows that satisfy the required conditions.
EDIT: I missed that you are working with np.matrix (as opposed to ndarray). The answer was edited for matrix.
Assume the following input data:
matrix = np.matrix([[1, 2, 3], [5, 6, 7], [2, 1, 7], [3, 4, 5], [8, 9, 0]])
range_tuple = [(0, 2), (1, 4), (1, 9), (5, 9), (0, 100)]
Then, first, I would convert range_tuple to a numpy matrix:
range_mat = np.matrix(range_tuple)
Now, create the mask:
mask = np.ravel((matrix[:, 0] > range_mat[:, 0]) & (matrix[:, 0] < range_mat[:, 1]))
Apply the mask:
final = matrix[mask] # or matrix[mask].copy() if you intend to modify matrix
To check:
print(final)
[[1 2 3]
 [2 1 7]
 [8 9 0]]
If the length of range_tuple can differ from the number of rows in the matrix, then do this:
n = min(range_mat.shape[0], matrix.shape[0])
mask = np.pad(
    np.ravel(
        (matrix[:n, 0] > range_mat[:n, 0]) & (matrix[:n, 0] < range_mat[:n, 1])
    ),
    (0, matrix.shape[0] - n)
)
final = matrix[mask]

Sliced numpy array does not modify original array

I've run into an interaction with arrays that has me a little confused. I can work around it, but for my own understanding, I'd like to know what is going on.
Essentially, I have a data file that I'm trying to tailor so I can use it as input for some code I've already written. This involves some calculations on columns, rows, etc. In particular, I also need to rearrange some elements, but the original array isn't being modified as I expect it to be.
import numpy as np
ex_data = np.arange(12).reshape(4,3)
ex_data[2,0] = 0 #Constructing some fake data
ex_data[ex_data[:,0] == 0][:,1] = 3
print ex_data
Basically, I look in a column of interest, collect all the rows where that column contains a particular value, and just reassign values.
With the snippet of code above, I would expect the column 1 elements of ex_data to be assigned a value of 3 wherever the corresponding column 0 element equals 0. However, what I'm seeing is no effect at all.
>>> ex_data
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 0,  7,  8],
       [ 9, 10, 11]])
In another case, if I don't 'slice' my 'sliced' data, then the reassignment goes on as normal.
ex_data[ex_data[:,0] == 0] = 3
print ex_data
Here I'd expect every row whose column 0 element equals 0 to be populated with 3, and that is indeed what you see.
>>> ex_data
array([[ 3,  3,  3],
       [ 3,  4,  5],
       [ 3,  3,  3],
       [ 9, 10, 11]])
Can anyone explain the interaction?
In [368]: ex_data
Out[368]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 0,  7,  8],
       [ 9, 10, 11]])
The column 0 test:
In [369]: ex_data[:,0]==0
Out[369]: array([ True, False, True, False])
That boolean mask can be applied to the rows as:
In [370]: ex_data[ex_data[:,0]==0,0]
Out[370]: array([0, 0]) # the 0's you expected
In [371]: ex_data[ex_data[:,0]==0,1]
Out[371]: array([1, 7]) # the col 1 values you want to replace
In [372]: ex_data[ex_data[:,0]==0,1] = 3
In [373]: ex_data
Out[373]:
array([[ 0,  3,  2],
       [ 3,  4,  5],
       [ 0,  3,  8],
       [ 9, 10, 11]])
The indexing you tried:
In [374]: ex_data[ex_data[:,0]==0]
Out[374]:
array([[0, 3, 2],
       [0, 3, 8]])
produces a copy. Assigning ...[:,1]=3 just changes that copy, not the original array. Fortunately in this case, it is easy to use
ex_data[ex_data[:,0]==0,1]
instead of
ex_data[ex_data[:,0]==0][:,1]
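Not from the answer above, but an alternative sketch: np.where can express the same conditional update of column 1 without having to think about views versus copies:

import numpy as np

ex_data = np.arange(12).reshape(4, 3)
ex_data[2, 0] = 0

# Combined boolean + integer index assigns into the original array
ex_data[ex_data[:, 0] == 0, 1] = 3

# Equivalent alternative: rebuild column 1 with np.where
ex_data[:, 1] = np.where(ex_data[:, 0] == 0, 3, ex_data[:, 1])

print(ex_data)
# [[ 0  3  2]
#  [ 3  4  5]
#  [ 0  3  8]
#  [ 9 10 11]]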

Broadcasting using numpy's sum function

I was reading about broadcasting and was trying to understand it using numpy's sum function.
I created two matrices:
m1 = np.array([[1,2,3],[4,5,6]]) # 2x3
m2 = np.array([[1],[2]])         # 2x1
When I add the above two as :
m1 + m2
broadcasting happens: the column vector [[1],[2]] replicates itself across the columns of m1. Is it also possible to get this broadcasting behaviour using np.sum(m1,m2)? I assumed there was no difference between m1 + m2 and np.sum(m1,m2), but np.sum(m1,m2) throws an error: TypeError: only integer scalar arrays can be converted to a scalar index.
Can't I have numpy perform broadcasting if I use its sum function?
numpy.sum does not add two arrays: it computes the sum over one axis (or several axes, or, by default, all of them) of a single array. The second argument is the axis to sum over, and a multi-dimensional array is not a valid axis.
These are examples of how numpy.sum works:
m1 = np.arange(12).reshape((3,4))
# sum all entries
np.sum(m1) # 66
# sum along the first axis, getting a result for each column
np.sum(m1, 0) # array([12, 15, 18, 21])
m2 = np.arange(12).reshape((2,3,2))
# sum along two of the three axes
m2.sum((1,2)) # array([15, 51])
What you might be looking for is numpy.add. This adds two arrays together (just like +) but allows some extra control: when you give it an out array, you can also pass a where mask so that certain fields are not filled with the result of the addition. Otherwise it behaves as you would expect from the numpy broadcasting rules:
m1 = np.array([[1,2,3],[4,5,6]]) # 2x3
m2 = np.array([[1],[2]])         # 2x1
m1 + m2
# array([[2, 3, 4],
#        [6, 7, 8]])
np.add(m1, m2)
# array([[2, 3, 4],
#        [6, 7, 8]])
And here is an example of the fancier usage:
m1 = m1.astype(float)
m1[1, 1] = np.inf
m1
# array([[  1.,   2.,   3.],
#        [  4.,  inf,   6.]])
out = np.zeros_like(m1)
where = np.ones_like(m1, dtype=bool)
where[1, 1] = False # don't want that infinity in the sum
np.add(m1, m2, out, where=where)
# array([[ 2.,  3.,  4.],
#        [ 6.,  0.,  8.]])
You actually kind of can make sum broadcast:
>>> import numpy as np
>>>
>>> a, b, c = np.ogrid[:2, :3, :4]
>>> d = b*c
>>> list(map(np.shape, (a, b, c, d)))
[(2, 1, 1), (1, 3, 1), (1, 1, 4), (1, 3, 4)]
>>>
>>> a+b+c+d
array([[[ 0,  1,  2,  3],
        [ 1,  3,  5,  7],
        [ 2,  5,  8, 11]],

       [[ 1,  2,  3,  4],
        [ 2,  4,  6,  8],
        [ 3,  6,  9, 12]]])
>>> np.sum([a, b, c, d])
array([[[ 0,  1,  2,  3],
        [ 1,  3,  5,  7],
        [ 2,  5,  8, 11]],

       [[ 1,  2,  3,  4],
        [ 2,  4,  6,  8],
        [ 3,  6,  9, 12]]])
I suspect this creates a 4-element array of dtype object and then delegates the actual summing to the element arrays.
Unfortunately, the array factory can at times be capricious with this kind of array-of-arrays, and indeed we can use an example known to defeat np.array to trip up np.sum, even though the actual error doesn't appear to happen inside np.array:
>>> np.sum([np.arange(3), 1]) # fine
array([1, 2, 3])
>>> np.sum([1, np.arange(3)]) # ouch!
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/paul/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1882, in sum
out=out, **kwargs)
File "/home/paul/lib/python3.6/site-packages/numpy/core/_methods.py", line 32, in _sum
return umr_sum(a, axis, dtype, out, keepdims)
ValueError: setting an array element with a sequence.
So, on balance, it is probably better to go with the builtin Python sum:
>>> sum([a, b, c, d])
array([[[ 0,  1,  2,  3],
        [ 1,  3,  5,  7],
        [ 2,  5,  8, 11]],

       [[ 1,  2,  3,  4],
        [ 2,  4,  6,  8],
        [ 3,  6,  9, 12]]])
>>> sum([1, np.arange(3)])
array([1, 2, 3])
>>> sum([np.arange(3), 1])
array([1, 2, 3])
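Another option, not covered in the answers above: fold the list with np.add via functools.reduce; each pairwise addition broadcasts, and no object array is ever built (a minimal sketch):

import numpy as np
from functools import reduce

a, b, c = np.ogrid[:2, :3, :4]
d = b * c

# Pairwise np.add broadcasts at every step, so the result equals a + b + c + d
total = reduce(np.add, [a, b, c, d])
print(total.shape)                            # (2, 3, 4)
print(np.array_equal(total, a + b + c + d))   # True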

How to add a dimension to a numpy array in Python

I have an array of size (214, 144). I need it to be (214, 144, 1); is there a way to do this easily in Python? Basically the dimensions are supposed to be (Days, Times, Stations). Since I only have one station's data, that dimension would be 1. However, if the code could also be made flexible enough to work for, say, 2 stations (e.g. changing the shape from (428, 288) to (214, 144, 2)), that would be great!
You could use reshape:
>>> a = numpy.array([[1,2,3,4,5,6],[7,8,9,10,11,12]])
>>> a.shape
(2, 6)
>>> a.reshape((2, 6, 1))
array([[[ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6]],

       [[ 7],
        [ 8],
        [ 9],
        [10],
        [11],
        [12]]])
>>> _.shape
(2, 6, 1)
Besides changing the shape from (x, y) to (x, y, 1), you could use (x, y/n, n) as well, but you may want to specify the column order depending on the input:
>>> a.reshape((2, 3, 2))
array([[[ 1,  2],
        [ 3,  4],
        [ 5,  6]],

       [[ 7,  8],
        [ 9, 10],
        [11, 12]]])
>>> a.reshape((2, 3, 2), order='F')
array([[[ 1,  4],
        [ 2,  5],
        [ 3,  6]],

       [[ 7, 10],
        [ 8, 11],
        [ 9, 12]]])
1) To add a dimension to an array a of arbitrary dimensionality:
b = numpy.reshape(a, list(numpy.shape(a)) + [1])
Explanation:
You get the shape of a, turn it into a list, append 1 to that list, and use that list as the new shape in a reshape operation.
2) To specify subdivisions of the dimensions and have the size of the last dimension calculated automatically, use -1 for the size of the last dimension, e.g.:
b = numpy.reshape(a, [numpy.size(a, 0) // 2, numpy.size(a, 1) // 2, -1])
The shape of b in this case will be (214, 144, 4), assuming a starts out with shape (428, 288).
(Obviously you could combine the two approaches if necessary):
b = numpy.reshape(a, numpy.append(numpy.array(numpy.shape(a)) // 2, -1))
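Not part of the answers above, but a common alternative for just appending a length-1 dimension is np.newaxis (or np.expand_dims), which avoids spelling out the existing shape; a minimal sketch:

import numpy as np

a = np.zeros((214, 144))         # one station's (Days, Times) data

b = a[:, :, np.newaxis]          # same as a[..., None]
print(b.shape)                   # (214, 144, 1)

c = np.expand_dims(a, axis=-1)   # equivalent
print(c.shape)                   # (214, 144, 1)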
