How to add a dimension to a numpy array in Python

I have an array of shape (214, 144) and need it to be (214, 144, 1). Is there an easy way to do this in Python? The dimensions are supposed to be (Days, Times, Stations). Since I only have one station's data, that dimension would be 1. It would be great if the code were also flexible enough to work for, say, 2 stations (e.g. changing the shape from (428, 288) to (214, 144, 2)).

You could use reshape:
>>> a = numpy.array([[1,2,3,4,5,6],[7,8,9,10,11,12]])
>>> a.shape
(2, 6)
>>> a.reshape((2, 6, 1))
array([[[ 1],
[ 2],
[ 3],
[ 4],
[ 5],
[ 6]],
[[ 7],
[ 8],
[ 9],
[10],
[11],
[12]]])
>>> _.shape
(2, 6, 1)
Besides changing the shape from (x, y) to (x, y, 1), you could use (x, y/n, n) as well, provided n divides y evenly. Depending on how the input is laid out, you may also want to specify the order in which the elements are read:
>>> a.reshape((2, 3, 2))
array([[[ 1, 2],
[ 3, 4],
[ 5, 6]],
[[ 7, 8],
[ 9, 10],
[11, 12]]])
>>> a.reshape((2, 3, 2), order='F')
array([[[ 1, 4],
[ 2, 5],
[ 3, 6]],
[[ 7, 10],
[ 8, 11],
[ 9, 12]]])
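For the specific case of appending a length-1 axis, indexing with numpy.newaxis or calling numpy.expand_dims avoids spelling out the full shape. A minimal sketch, using a zero-filled placeholder with the (214, 144) shape from the question; the multi-station part assumes each station's data is available as its own 2-D array, which the question does not state:

```python
import numpy as np

a = np.zeros((214, 144))            # placeholder for one station's (Days, Times) data

b = a[:, :, np.newaxis]             # add a trailing axis by indexing
c = np.expand_dims(a, axis=-1)      # same result via a function call
print(b.shape, c.shape)

# Assumption: each station is a separate (214, 144) array.
station_1 = np.zeros((214, 144))
station_2 = np.ones((214, 144))
stacked = np.stack([station_1, station_2], axis=-1)
print(stacked.shape)
```

With this layout, stacked[..., k] gives back the data for station k.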

1) To add a dimension to an array a of arbitrary dimensionality:
b = numpy.reshape(a, list(numpy.shape(a)) + [1])
Explanation:
You get the shape of a, turn it into a list, append 1 to that list, and use the result as the new shape in a reshape operation.
2) To specify subdivisions of the dimensions and have the size of the last dimension calculated automatically, use -1 for the size of the last dimension, e.g.:
b = numpy.reshape(a, [numpy.size(a, 0) // 2, numpy.size(a, 1) // 2, -1])
For an a of shape (428, 288), the shape of b will be (214, 144, 4). Note the integer division (//): in Python 3, / returns a float, which reshape rejects.
You could combine the two approaches if necessary:
b = numpy.reshape(a, numpy.append(numpy.array(numpy.shape(a)) // 2, -1))
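As a quick check of the two recipes above, here is a sketch on a zero-filled array with the (428, 288) shape mentioned in the question (the data itself is a placeholder):

```python
import numpy as np

a = np.zeros((428, 288))

# 1) append a length-1 axis, whatever the dimensionality of a
b1 = a.reshape(a.shape + (1,))

# 2) halve the first two axes and let -1 infer the last one
b2 = a.reshape(a.shape[0] // 2, a.shape[1] // 2, -1)

print(b1.shape)
print(b2.shape)
```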

Related

Multiple numpy arrays to bytes

Input/Output:
[array([[ 2.120417 , -13.725279 ],
[ 2.066555 , -13.953174 ]], dtype=float32)
array([[ 1.952603, 6.800025],
[ 1.952603, 6.800025]], dtype=float32)
b"\x40\x07\xb4\xea\xc1\x5b\x9a\xbe\x3f\xf9\xee\xe5\x40\xd9\x99\xce\x40\x04\x42\x70\xc1\x5f\x40\x33\x3f\xf9\xee\xe5\x40\xd9\x99\xce"
Each array contains multiple x, y coordinates (floats). I want to take one element (one pair of x, y coordinates) from the first array, then the element at the same index from the next array, and so on; once every array has contributed its element at that index, move on to the next index.
IIUC, you can hstack and ravel:
np.hstack([arr1, arr2, arr3]).ravel()
Output:
array([ 0, 1, 4, 5, 8, 9, 2, 3, 6, 7, 10, 11])
Used input ([arr1, arr2, arr3]):
[array([[0, 1],
[2, 3]]),
array([[4, 5],
[6, 7]]),
array([[ 8, 9],
[10, 11]])]
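The expected output in the question is a bytes object whose leading bytes look like big-endian float32 values. Assuming that is the intended encoding (an inference from the sample, not something the question states), the interleaved array could be serialized like this, using the two arrays from the question:

```python
import numpy as np

arr1 = np.array([[2.120417, -13.725279],
                 [2.066555, -13.953174]], dtype=np.float32)
arr2 = np.array([[1.952603, 6.800025],
                 [1.952603, 6.800025]], dtype=np.float32)

# Row i of every array, then row i + 1 of every array, ...
interleaved = np.hstack([arr1, arr2]).ravel()

# Assumption: big-endian 32-bit floats ('>f4')
payload = interleaved.astype('>f4').tobytes()
print(len(payload))

# Decoding with the same dtype recovers the values
decoded = np.frombuffer(payload, dtype='>f4')
```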

Adding an additional dimension to ndarray

I have an ndarray defined in the following way:
dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
dtype=np.float32)
This array represents a collection of images of size image_size * image_size.
So I can say, dataset[0] and get a 2D table corresponding to an image with index 0.
Now I would like to have one additional field for each image in this array. For instance, for image located at index 0, I would like to store number 123, for an image located at index 321 I would like to store number 50000.
What is the simplest way to add this additional data field to the existing ndarray?
What is the appropriate way to access data in the new array after adding this additional dimension?
If you shuffle an index array instead of the dataset itself, you can keep track of the original 'identifiers'
idx = np.arange(len(image_files))
np.random.shuffle(idx)
shuffle_set = dataset[idx]
illustration:
In [20]: x = np.arange(12).reshape(6,2)
...: idx = np.arange(6)
...: np.random.shuffle(idx)
In [21]: x
Out[21]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
In [22]: x[idx] # shuffled
Out[22]:
array([[ 4, 5],
[ 0, 1],
[ 2, 3],
[ 6, 7],
[10, 11],
[ 8, 9]])
In [23]: idx1=np.argsort(idx)
In [24]: idx
Out[24]: array([2, 0, 1, 3, 5, 4])
In [25]: idx1
Out[25]: array([1, 2, 0, 3, 5, 4])
In [26]: Out[22][idx1] # recover original order
Out[26]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
NumPy arrays are fundamentally tensors: every axis has a single fixed length, so the shape cannot vary from one element to the next. Take for example,
import numpy as np
x = np.array([[[1,2],[3,4]],
[[5,6],[7,8]]
])
print(x.shape) #Here we have two, 2x2s. Shape = (2,2,2)
If I want to associate x[0] to the number 5 and x[1] to the number 7, then that would be something like (if it was possible):
x = np.array([[[1,2],[3,4]],5,
[[5,6],[7,8]],7
])
But such a thing is impossible, since it would "in some sense" have a shape that corresponds to (2,((2,2),1)), or something else that is ambiguous. Such an object is not a numpy array or a tensor, because it doesn't have fixed axis sizes, and all numpy arrays must have fixed axis sizes. Hence, if you wish to store the new information, the straightforward way to do it is to create another array.
x = np.array([[[1,2],[3,4]],
[[5,6],[7,8]],
])
y = np.array([5,7])
Now x[0] corresponds to y[0] and x[1] corresponds to y[1]. x has shape (2,2,2) and y has shape (2,).
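If keeping two parallel arrays in sync is inconvenient, a structured dtype is another option: each record then holds one image plus one number. This is a different technique from the one described above; the field names image and label and the sizes are made up for illustration.

```python
import numpy as np

n_images, image_size = 3, 2     # hypothetical sizes

record = np.dtype([('image', np.float32, (image_size, image_size)),
                   ('label', np.int64)])
dataset = np.zeros(n_images, dtype=record)

dataset['label'][0] = 123       # number stored alongside image 0
dataset['image'][0] = [[1, 2], [3, 4]]

print(dataset['image'].shape)   # all images as one 3-D array
print(dataset[0]['label'])

# Shuffling the records keeps each image with its number
rng = np.random.default_rng(0)
shuffled = dataset[rng.permutation(n_images)]
```

Here dataset['image'] still behaves like the original 3-D array, and dataset['label'] is the per-image number.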

Numpy concatenate lists where first column is in range n

I am trying to select all rows in a numpy matrix named matrix with shape (25323, 9), where the values of the first column are inside the range of start and end for each tuple on the list range_tuple. Ultimately, I want to create a new numpy matrix with the result where final has a shape of (n, 9). The following code returns this error: TypeError: only integer scalar arrays can be converted to a scalar index. I have also tried initializing final with numpy.zeros((1,9)) and used np.concatenate but get similar results. I do get a compiled result when I use final.append(result) instead of using np.concatenate but the shape of the matrix gets lost. I know there is a proper solution to this problem, any help would be appreciated.
final = []
for i in range_tuples:
    copy = np.copy(matrix)
    start = i[0]
    end = i[1]
    result = copy[(matrix[:,0] < end) & (matrix[:,0] > start)]
    final = np.concatenate(final, result)
final = np.matrix(final)
In [33]: arr
Out[33]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23]])
In [34]: tups = [(0,6),(3,12),(9,10),(15,14)]
In [35]: alist=[]
    ...: for start, stop in tups:
    ...:     res = arr[(arr[:,0]<stop)&(arr[:,0]>=start), :]
    ...:     alist.append(res)
    ...:
Check the list; note that the elements differ in shape: some have 1 or 0 rows. It's a good idea to test these edge cases.
In [37]: alist
Out[37]:
[array([[0, 1, 2],
[3, 4, 5]]), array([[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]]), array([[ 9, 10, 11]]), array([], shape=(0, 3), dtype=int64)]
vstack joins them:
In [38]: np.vstack(alist)
Out[38]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[ 9, 10, 11]])
Here concatenate also works, because default axis is 0, and all inputs are already 2d.
Try the following:
final = np.empty((0,9))
for start, stop in range_tuples:
    result = matrix[(matrix[:,0] < stop) & (matrix[:,0] > start)]
    final = np.concatenate((final, result))
Two things changed. First, final is initialized as an empty numpy array with 9 columns. Second, the arrays to join are passed to concatenate as a single sequence (a tuple or list), see the docs. In your code, result was interpreted as the value of the axis parameter, hence the TypeError.
Notes
I used tuple unpacking to make the loop clearer.
The copy is not needed.
Appending to a list can be faster than concatenating in every iteration. The final result can afterwards be obtained with a single join or, if result always has the same length, through reshaping.
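Putting the notes above together, here is a sketch that collects the selections in a list and joins them once at the end. It reuses the small arr and tups from the earlier example as stand-ins for matrix and range_tuples, and the helper name select_ranges is made up:

```python
import numpy as np

def select_ranges(matrix, range_tuples):
    """Stack the rows whose first column lies strictly inside each (start, stop)."""
    first = matrix[:, 0]
    pieces = [matrix[(first > start) & (first < stop)]
              for start, stop in range_tuples]
    if not pieces:                      # no ranges given: return 0 rows
        return np.empty((0, matrix.shape[1]), dtype=matrix.dtype)
    return np.vstack(pieces)

arr = np.arange(24).reshape(8, 3)
tups = [(0, 6), (3, 12), (9, 10), (15, 14)]
final = select_ranges(arr, tups)
print(final.shape)
```

Ranges that match nothing contribute arrays with 0 rows, which vstack accepts, so the result keeps its column count.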
I would simply create a boolean mask to select rows that satisfy required conditions.
EDIT: I missed that you are working with a matrix (as opposed to an ndarray). The answer was edited for matrix.
Assume the following input data:
matrix = np.matrix([[1, 2, 3], [5, 6, 7], [2, 1, 7], [3, 4, 5], [8, 9, 0]])
range_tuple = [(0, 2), (1, 4), (1, 9), (5, 9), (0, 100)]
Then, first, I would convert range_tuple to a numpy matrix:
range_mat = np.matrix(range_tuple)
Now, create the mask:
mask = np.ravel((matrix[:, 0] > range_mat[:, 0]) & (matrix[:, 0] < range_mat[:, 1]))
Apply the mask:
final = matrix[mask] # or matrix[mask].copy() if you intend to modify matrix
To check:
print(final)
[[1 2 3]
[2 1 7]
[8 9 0]]
If the length of range_tuple can differ from the number of rows in the matrix, do this:
n = min(range_mat.shape[0], matrix.shape[0])
mask = np.pad(
    np.ravel(
        (matrix[:n, 0] > range_mat[:n, 0]) & (matrix[:n, 0] < range_mat[:n, 1])
    ),
    (0, matrix.shape[0] - n)
)
final = matrix[mask]

Broadcasting using numpy's sum function

I was reading about broadcasting and was trying to understand it using numpy's sum function.
I created two matrices:
m1 = np.array([[1,2,3],[4,5,6]]) # shape (2, 3)
m2 = np.array([[1],[2]]) # shape (2, 1)
When I add the two as:
m1 + m2
broadcasting replicates the column vector [[1],[2]] across the columns of m1. Is it also possible to see broadcasting using np.sum(m1,m2)? I assumed there is no difference between m1 + m2 and np.sum(m1,m2), but np.sum(m1,m2) throws TypeError: only integer scalar arrays can be converted to a scalar index.
Can't I get numpy to perform broadcasting if I use its sum function?
numpy.sum does not add two arrays, it computes the sum over one (or multiple, or, by default, all) axis of an array. The second argument is which axis to sum over and a multi-dimensional array does not work for that.
These are examples of how numpy.sum works:
m1 = np.arange(12).reshape((3,4))
# sum all entries
np.sum(m1) # 66
# sum along the first axis, getting a result for each column
np.sum(m1, 0) # array([12, 15, 18, 21])
m2 = np.arange(12).reshape((2,3,2))
# sum along two of the three axes
m2.sum((1,2)) # array([15, 51])
What you might be looking for is numpy.add. This adds together two arrays (just like +) but allows adding some constraints (when giving it an out array you can mask certain fields so they will not get filled with the result of the addition). Otherwise it behaves how you would expect it to behave if you know the numpy broadcasting rules:
m1 = np.array([[1,2,3],[4,5,6]]) # shape (2, 3)
m2 = np.array([[1],[2]]) # shape (2, 1)
m1 + m2
# array([[2, 3, 4],
# [6, 7, 8]])
np.add(m1, m2)
# array([[2, 3, 4],
# [6, 7, 8]])
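To see what broadcasting does to m2 before the addition, numpy.broadcast_to can make the stretched operand explicit. A small sketch using the same m1 and m2:

```python
import numpy as np

m1 = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
m2 = np.array([[1], [2]])               # shape (2, 1)

# m2 viewed as if its single column were repeated 3 times
stretched = np.broadcast_to(m2, m1.shape)
print(stretched)

# Adding the stretched view gives the same result as m1 + m2
same = np.array_equal(m1 + stretched, m1 + m2)
print(same)
```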
And here is an example of the fancier usage:
m1 = m1.astype(float)
m1[1, 1] = np.inf
m1
# array([[ 1., 2., 3.],
# [ 4., inf, 6.]])
out = np.zeros_like(m1)
where = np.ones_like(m1, dtype=bool)
where[1, 1] = False # don't want that infinity in the sum
np.add(m1, m2, out, where=where)
# array([[ 2., 3., 4.],
# [ 6., 0., 8.]])
You actually kind of can make sum broadcast:
>>> import numpy as np
>>>
>>> a, b, c = np.ogrid[:2, :3, :4]
>>> d = b*c
>>> list(map(np.shape, (a, b, c, d)))
[(2, 1, 1), (1, 3, 1), (1, 1, 4), (1, 3, 4)]
>>>
>>> a+b+c+d
array([[[ 0, 1, 2, 3],
[ 1, 3, 5, 7],
[ 2, 5, 8, 11]],
[[ 1, 2, 3, 4],
[ 2, 4, 6, 8],
[ 3, 6, 9, 12]]])
>>> np.sum([a, b, c, d])
array([[[ 0, 1, 2, 3],
[ 1, 3, 5, 7],
[ 2, 5, 8, 11]],
[[ 1, 2, 3, 4],
[ 2, 4, 6, 8],
[ 3, 6, 9, 12]]])
I suspect this creates a 4-element array of dtype object and then delegates the actual summing to the element arrays.
Unfortunately, the array factory can at times be capricious with this kind of array-of-arrays.
And, indeed, we can use an example known to defeat np.array to trip up np.sum, even though the actual error doesn't appear to happen in np.array:
>>> np.sum([np.arange(3), 1]) # fine
array([1, 2, 3])
>>> np.sum([1, np.arange(3)]) # ouch!
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/paul/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1882, in sum
out=out, **kwargs)
File "/home/paul/lib/python3.6/site-packages/numpy/core/_methods.py", line 32, in _sum
return umr_sum(a, axis, dtype, out, keepdims)
ValueError: setting an array element with a sequence.
So, on balance, it is probably better to go with the builtin Python sum:
>>> sum([a, b, c, d])
array([[[ 0, 1, 2, 3],
[ 1, 3, 5, 7],
[ 2, 5, 8, 11]],
[[ 1, 2, 3, 4],
[ 2, 4, 6, 8],
[ 3, 6, 9, 12]]])
>>> sum([1, np.arange(3)])
array([1, 2, 3])
>>> sum([np.arange(3), 1])
array([1, 2, 3])
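If you want the pairwise, broadcasting addition to be explicit, functools.reduce with numpy.add behaves like the builtin sum here. A sketch reusing a, b, c and d from above:

```python
from functools import reduce
import numpy as np

a, b, c = np.ogrid[:2, :3, :4]
d = b * c

# Adds the arrays two at a time, broadcasting at every step
total = reduce(np.add, [a, b, c, d])
print(total.shape)
```

Unlike np.sum on a list, this never tries to pack the differently shaped inputs into a single array first.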

How to use numpy as_strided (from np.lib.stride_tricks) correctly?

I'm trying to reshape a numpy array using numpy.lib.stride_tricks. This is the guide I'm following: https://stackoverflow.com/a/2487551/4909087
My use case is very similar, with the difference being that I need strides of 3.
Given this array:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
I'd like to get:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
Here's what I tried:
import numpy as np
as_strided = np.lib.stride_tricks.as_strided
a = np.arange(1, 10)
as_strided(a, (len(a) - 2, 3), (3, 3))
array([[ 1, 2199023255552, 131072],
[ 2199023255552, 131072, 216172782113783808],
[ 131072, 216172782113783808, 12884901888],
[216172782113783808, 12884901888, 768],
[ 12884901888, 768, 1125899906842624],
[ 768, 1125899906842624, 67108864],
[ 1125899906842624, 67108864, 4]])
I was pretty sure I'd followed the example to a T, but evidently not. Where am I going wrong?
The accepted answer (and discussion) is good, but for the benefit of readers who don't want to run their own test case, I'll try to illustrate what's going on:
In [374]: a = np.arange(1,10)
In [375]: as_strided = np.lib.stride_tricks.as_strided
In [376]: a.shape
Out[376]: (9,)
In [377]: a.strides
Out[377]: (4,)
For a contiguous 1d array, the stride equals the size of one element, here 4 bytes because the dtype is int32 (on platforms where the default integer is int64 it would be 8). To go from one element to the next it steps forward 4 bytes.
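The value 4 is specific to int32. The stride of a contiguous 1d array always equals its itemsize, so it is safer to read it from the array than to hard-code it. A short sketch with the dtype fixed explicitly:

```python
import numpy as np

a32 = np.arange(1, 10, dtype=np.int32)
a64 = np.arange(1, 10, dtype=np.int64)

# strides are given in bytes, one entry per axis
print(a32.strides, a32.itemsize)
print(a64.strides, a64.itemsize)
```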
What the OP tried:
In [380]: as_strided(a, shape=(7,3), strides=(3,3))
Out[380]:
array([[ 1, 512, 196608],
[ 512, 196608, 67108864],
[ 196608, 67108864, 4],
[ 67108864, 4, 1280],
[ 4, 1280, 393216],
[ 1280, 393216, 117440512],
[ 393216, 117440512, 7]])
This is stepping by 3 bytes, crossing int32 boundaries, and giving mostly unintelligible numbers. It might make more sense if the dtype had been bytes or uint8.
Using a.strides*2 (tuple replication), i.e. (4,4), instead, we get the desired array:
In [381]: as_strided(a, shape=(7,3), strides=(4,4))
Out[381]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
Columns and rows both step one element, resulting in a 1 step moving window. We could have also set shape=(3,7), 3 windows 7 elements long.
In [382]: _.strides
Out[382]: (4, 4)
Changing strides to (8,4) steps 2 elements for each window.
In [383]: as_strided(a, shape=(7,3), strides=(8,4))
Out[383]:
array([[ 1, 2, 3],
[ 3, 4, 5],
[ 5, 6, 7],
[ 7, 8, 9],
[ 9, 25, -1316948568],
[-1316948568, 184787224, -1420192452],
[-1420192452, 0, 0]])
But now the shape is too large, so the view shows bytes past the end of the original data buffer. That could be dangerous (we don't know if those bytes belong to some other object or array). With this size of array we don't get a full set of 7 two-step windows; only 4 fit.
Now step 3 elements for each row (3*4, 4):
In [384]: as_strided(a, shape=(3,3), strides=(12,4))
Out[384]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [385]: a.reshape(3,3).strides
Out[385]: (12, 4)
This is the same shape and strides as a 3x3 reshape.
We can set negative stride values and 0 values. In fact, negative-step slicing along a dimension with a positive stride will give a negative stride, and broadcasting works by setting 0 strides:
In [399]: np.broadcast_to(a, (2,9))
Out[399]:
array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
[1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [400]: _.strides
Out[400]: (0, 4)
In [401]: a.reshape(3,3)[::-1,:]
Out[401]:
array([[7, 8, 9],
[4, 5, 6],
[1, 2, 3]])
In [402]: _.strides
Out[402]: (-12, 4)
However, negative strides require adjusting which element of the original array is the first element of the view, and as_strided has no parameter for that.
I have no idea why you think you need strides of 3. You need strides equal to the distance in bytes between one element of a and the next, which you can get from a.strides:
as_strided(a, (len(a) - 2, 3), a.strides*2)
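On NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view builds the same moving window without hand-computed strides, and it returns a read-only view by default. A minimal sketch:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(1, 10)

# One row per window of 3 consecutive elements
windows = sliding_window_view(a, 3)
print(windows)
```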
I was trying to do a similar operation and ran into the same problem.
In your case, as stated in this comment, the problems were:
You were not taking into account the size of each element in memory (int32 = 4 bytes, which can be checked using a.dtype.itemsize).
You didn't specify the right stride between rows: since each window advances by only one element, that stride is also 4 bytes.
I wrote myself a function based on this answer that segments a given array using a window of n elements and a specified number of elements to overlap (given by window - number_of_elements_to_skip).
I share it here in case someone else needs it, since it took me a while to figure out how stride_tricks works:
import numpy as np

def window_signal(signal, window, overlap):
    """
    Windowing function for data segmentation.
    Parameters:
    ------------
    signal: ndarray
        The signal to segment.
    window: int
        Window length, in samples.
    overlap: int
        Number of samples to overlap
    Returns:
    --------
    nd-array
        A view of the signal array with shape (rows, window),
        where rows = (N-window)//(window-overlap) + 1
    """
    N = signal.reshape(-1).shape[0]
    if (window == overlap):
        rows = N//window
        overlap = 0
    else:
        rows = (N-window)//(window-overlap) + 1
        miss = (N-window)%(window-overlap)
        if(miss != 0):
            print('Windowing led to the loss of ', miss, ' samples.')
    item_size = signal.dtype.itemsize
    # byte distance between the starts of consecutive windows
    strides = (window - overlap) * item_size
    return np.lib.stride_tricks.as_strided(signal, shape=(rows, window),
                                           strides=(strides, item_size))
The solution for this case is, according to your code:
as_strided(a, (len(a) - 2, 3), (4, 4))
Alternatively, using the function window_signal:
window_signal(a, 3, 2)
Both return as output the following array:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
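Because as_strided does no bounds checking, a variant built on sliding_window_view (NumPy 1.20+) may be safer. This is a sketch of the same idea rather than a drop-in replacement: the name window_signal_safe is made up, and it rejects overlap >= window instead of treating window == overlap as zero overlap.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_signal_safe(signal, window, overlap):
    """Return a read-only view of shape (rows, window) over a 1-D signal."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    flat = np.asarray(signal).reshape(-1)
    # All windows with step 1, then keep every step-th one
    return sliding_window_view(flat, window)[::step]

a = np.arange(1, 10)
print(window_signal_safe(a, 3, 2).shape)
print(window_signal_safe(a, 3, 0))
```

Samples that do not fill a complete final window are dropped silently here, whereas window_signal prints a message.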
