Numpy ravel works well if I need to create a vector by reading by rows or by columns. However, I would like to transform a matrix to a 1d array, by using a method that is often used in image processing. This is an example with initial matrix A and final result B:
A = np.array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
B = np.array([ 0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15])
Is there an existing function already that could help me with that? If not, can you give me some hints on how to solve this problem? PS. the matrix A is NxN.
I've been using numpy for several years, and I've never seen such a function.
Here's one way you could do it (not necessarily the most efficient):
In [47]: a
Out[47]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [48]: np.concatenate([np.diagonal(a[::-1,:], k)[::(2*(k % 2)-1)] for k in range(1-a.shape[0], a.shape[0])])
Out[48]: array([ 0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15])
Breaking down the one-liner into separate steps:
a[::-1, :] reverses the rows:
In [59]: a[::-1, :]
Out[59]:
array([[12, 13, 14, 15],
[ 8, 9, 10, 11],
[ 4, 5, 6, 7],
[ 0, 1, 2, 3]])
(This could also be written a[::-1] or np.flipud(a).)
np.diagonal(a, k) extracts the kth diagonal, where k=0 is the main diagonal. So, for example,
In [65]: np.diagonal(a[::-1, :], -3)
Out[65]: array([0])
In [66]: np.diagonal(a[::-1, :], -2)
Out[66]: array([4, 1])
In [67]: np.diagonal(a[::-1, :], 0)
Out[67]: array([12, 9, 6, 3])
In [68]: np.diagonal(a[::-1, :], 2)
Out[68]: array([14, 11])
In the list comprehension, k gives the diagonal to be extracted. We want to reverse the elements in every other diagonal. The expression 2*(k % 2) - 1 gives the values 1, -1, 1, ... as k varies from -3 to 3. Indexing with [::1] leaves the order of the array being indexed unchanged, and indexing with [::-1] reverses the order of the array. So np.diagonal(a[::-1, :], k)[::(2*(k % 2)-1)] gives the kth diagonal, but with every other diagonal reversed:
In [71]: [np.diagonal(a[::-1,:], k)[::(2*(k % 2)-1)] for k in range(1-a.shape[0], a.shape[0])]
Out[71]:
[array([0]),
array([1, 4]),
array([8, 5, 2]),
array([ 3, 6, 9, 12]),
array([13, 10, 7]),
array([11, 14]),
array([15])]
np.concatenate() puts them all into a single array:
In [72]: np.concatenate([np.diagonal(a[::-1,:], k)[::(2*(k % 2)-1)] for k in range(1-a.shape[0], a.shape[0])])
Out[72]: array([ 0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15])
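With the k % 2 expression above, the direction of the first step depends on the parity of N (for odd N the scan starts downward instead of to the right). If that matters, one option is to choose the direction from the index of the anti-diagonal rather than from k alone. This is only a sketch: the name zigzag is mine, and it assumes a square 2-D input.

```python
import numpy as np

def zigzag(a):
    """Flatten a square 2-D array in zigzag (anti-diagonal) order."""
    a = np.asarray(a)
    n = a.shape[0]
    flipped = a[::-1]                    # reverse the rows, as in the one-liner
    pieces = []
    for k in range(1 - n, n):
        diag = np.diagonal(flipped, k)   # runs from bottom-left to top-right
        s = k + n - 1                    # index of the anti-diagonal, 0 .. 2n-2
        # even anti-diagonals keep that order, odd ones are reversed
        pieces.append(diag if s % 2 == 0 else diag[::-1])
    return np.concatenate(pieces)

print(zigzag(np.arange(16).reshape(4, 4)))
print(zigzag(np.arange(9).reshape(3, 3)))
```

For the 4x4 example this gives the same result as the one-liner; for a 3x3 input it starts 0, 1, 3, 6, ...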
I found discussion of zigzag scan for MATLAB, but not much for numpy. One project appears to use a hardcoded indexing array for 8x8 blocks
https://github.com/lot9s/lfv-compression/blob/master/scripts/our_mpeg/zigzag.py
ZIG = np.array([[0, 1, 5, 6, 14, 15, 27, 28],
[2, 4, 7, 13, 16, 26, 29, 42],
[3, 8, 12, 17, 25, 30, 41, 43],
[9, 11, 18, 24, 31, 40, 44,53],
[10, 19, 23, 32, 39, 45, 52,54],
[20, 22, 33, 38, 46, 51, 55,60],
[21, 34, 37, 47, 50, 56, 59,61],
[35, 36, 48, 49, 57, 58, 62,63]])
Apparently it's used in JPEG and MPEG compression.
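A table like that stores, for each cell, its position in the scan, so it can be turned into a reordering with argsort. Here is a minimal sketch for the 4x4 case from the question; the table ZIG4 is my own, derived from the expected output B, and is not taken from the linked project.

```python
import numpy as np

# ZIG4[i, j] is the position of cell (i, j) in the zigzag scan
# (hypothetical 4x4 table, built by hand from the question's B)
ZIG4 = np.array([[0,  1,  5,  6],
                 [2,  4,  7, 12],
                 [3,  8, 11, 13],
                 [9, 10, 14, 15]])

a = np.arange(16).reshape(4, 4)

# argsort over the flattened table gives the flat cell index for each scan position
order = np.argsort(ZIG4, axis=None)
b = a.ravel()[order]
print(b)
```

The order array only depends on the block size, so it can be computed once and reused for every block.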
I am using this for loop to separate a dataset into groups, but converting the list "y" into an array produces an error.
def to_sequences(dataset, seq_size=1):
    x = []
    y = []
    for i in range(len(dataset)-seq_size):
        window = dataset[i:(i+seq_size), 0]
        x.append(window)
        window2 = dataset[(i+seq_size):i+seq_size+5, 0]
        y.append(window2)
    return np.array(x),np.array(y)
seq_size = 5
trainX, trainY = to_sequences(train, seq_size)
print("Shape of training set: {}".format(trainX.shape))
print("Shape of training set: {}".format(trainY.shape))
And this is the error message I get
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
return np.array(x),np.array(y)
I can't figure out why it works for 'x' but not for 'y'. Any ideas?
In [247]: dataset = np.arange(20)
In [248]: def to_sequences(dataset, seq_size=1):
     ...:     x = []
     ...:     y = []
     ...:     for i in range(len(dataset)-seq_size):
     ...:         window = dataset[i:(i+seq_size), 0]
     ...:         x.append(window)
     ...:         window2 = dataset[(i+seq_size):i+seq_size+5, 0]
     ...:         y.append(window2)
     ...:     return np.array(x),np.array(y)
     ...:
and a sample run:
In [250]: to_sequences(dataset[:,None], 5)
<ipython-input-248-176eb762993c>:9: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
return np.array(x),np.array(y)
Out[250]:
(array([[ 0, 1, 2, 3, 4],
[ 1, 2, 3, 4, 5],
[ 2, 3, 4, 5, 6],
[ 3, 4, 5, 6, 7],
[ 4, 5, 6, 7, 8],
[ 5, 6, 7, 8, 9],
[ 6, 7, 8, 9, 10],
[ 7, 8, 9, 10, 11],
[ 8, 9, 10, 11, 12],
[ 9, 10, 11, 12, 13],
[10, 11, 12, 13, 14],
[11, 12, 13, 14, 15],
[12, 13, 14, 15, 16],
[13, 14, 15, 16, 17],
[14, 15, 16, 17, 18]]),
array([array([5, 6, 7, 8, 9]), array([ 6, 7, 8, 9, 10]),
array([ 7, 8, 9, 10, 11]), array([ 8, 9, 10, 11, 12]),
array([ 9, 10, 11, 12, 13]), array([10, 11, 12, 13, 14]),
array([11, 12, 13, 14, 15]), array([12, 13, 14, 15, 16]),
array([13, 14, 15, 16, 17]), array([14, 15, 16, 17, 18]),
array([15, 16, 17, 18, 19]), array([16, 17, 18, 19]),
array([17, 18, 19]), array([18, 19]), array([19])], dtype=object))
The first array is (n,5) with int dtype. The second is object dtype, containing arrays. Most of those arrays have shape (5,), but the last ones are (4,),(3,),(2,),(1,).
dataset[(i+seq_size):i+seq_size+5, 0] slices past the end of dataset for the last few values of i. Python/numpy allows that, but the result is truncated.
You'll have to rethink that y slicing if you want a (n,5) shaped array.
Slicing off the end of a list:
In [252]: [1,2,3,4,5][1:4]
Out[252]: [2, 3, 4]
In [253]: [1,2,3,4,5][3:6]
Out[253]: [4, 5]
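One possible way to rethink the y slicing is to stop the loop early enough that every target window is complete. This is a sketch, not the only fix: the horizon parameter is a name I introduced for the hard-coded 5 in the question.

```python
import numpy as np

def to_sequences(dataset, seq_size=1, horizon=5):
    x, y = [], []
    # stop early enough that every target window has exactly `horizon` items
    for i in range(len(dataset) - seq_size - horizon + 1):
        x.append(dataset[i:i + seq_size, 0])
        y.append(dataset[i + seq_size:i + seq_size + horizon, 0])
    return np.array(x), np.array(y)

dataset = np.arange(20)[:, None]
trainX, trainY = to_sequences(dataset, seq_size=5)
print(trainX.shape, trainY.shape)  # (11, 5) (11, 5)
```

Both lists now hold equal-length windows, so np.array builds regular 2-D arrays and the ragged-sequence warning does not appear.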
I am basically trying to achieve this, but need the unfolding to be done in a different fashion. I want all samples of the (N-1)th dimension to be concatenated. For example, if my unfolding were applied to an RGB image of shape (100,100,3), the new array would have shape (100,300), with the 3 colour channel images side by side.
All my attempts to use a neat built in numpy function like flatten and concatenate yielded no results. (flatten, because the end goal is to apply this unfolding until it is a 1D array)
Can't even think of a slicing way of doing it in a loop since the starting number of dimensions isn't constant (array = array[:,...,:,0]+...+array[:,...,:,0])
EDIT
I just came up with this way of achieving what I want, but would still welcome better, more pure, numpy solutions.
shape = numpy.random.randint(100, size=numpy.random.randint(100))
array = numpy.random.uniform(size=shape)
array = array.T
for i in range(0, len(shape)-1, -1):
    array = numpy.concatenate(array)
Am I right in deducing that you want to flatten the last 2 dimensions of the array?
In [96]: shape = numpy.random.randint(10, size=numpy.random.randint(10))+1
In [97]: shape
Out[97]: array([2, 7, 2])
In [98]: newshape=tuple(shape[:-2])+(-1,)
In [99]: arr = np.arange(np.prod(shape)).reshape(shape)
In [100]: arr.shape
Out[100]: (2, 7, 2)
In [101]: arr.reshape(newshape).shape
Out[101]: (2, 14)
In [102]: arr.reshape(newshape)
Out[102]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]])
If you don't like the order of terms in the last dimension, you may need to swap the last two axes before reshaping:
In [109]: np.swapaxes(arr, -1,-2).reshape(newshape)
Out[109]:
array([[ 0, 2, 4, 6, 8, 10, 12, 1, 3, 5, 7, 9, 11, 13],
[14, 16, 18, 20, 22, 24, 26, 15, 17, 19, 21, 23, 25, 27]])
I can't test it against your code because range(0,len(shape)-1, -1) is an empty range.
I don't think you want
In [112]: np.concatenate(arr,axis=-1).shape
Out[112]: (7, 4)
In [113]: np.concatenate((arr[0,...],arr[1,...]), axis=-1)
Out[113]:
array([[ 0, 1, 14, 15],
[ 2, 3, 16, 17],
[ 4, 5, 18, 19],
[ 6, 7, 20, 21],
[ 8, 9, 22, 23],
[10, 11, 24, 25],
[12, 13, 26, 27]])
That splits the arr on the 1st axis, and then joins it on the last.
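For the RGB case in the question, the swapaxes-then-reshape variant seems to be the one that puts the channels side by side. A small sketch, using a (4, 5, 3) array as a stand-in for the (100, 100, 3) image:

```python
import numpy as np

img = np.arange(4 * 5 * 3).reshape(4, 5, 3)   # stand-in for a (100, 100, 3) image

# merge the last two axes, with the channel axis moved in front of the column axis
newshape = img.shape[:-2] + (-1,)
side_by_side = np.swapaxes(img, -1, -2).reshape(newshape)

print(side_by_side.shape)                             # (4, 15)
print((side_by_side[:, :5] == img[..., 0]).all())     # True: first block is channel 0
```

The same two lines work for any number of leading dimensions, because newshape is built from img.shape[:-2].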
Using NumPy, I would like to produce a list of all lines and diagonals of an n-dimensional array with lengths of k.
Take the case of the following three-dimensional array with lengths of three.
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
For this case, I would like to obtain all of the following types of sequences. For any given case, I would like to obtain all of the possible sequences of each type. Examples of desired sequences are given in parentheses below, for each case.
1D lines
x axis (0, 1, 2)
y axis (0, 3, 6)
z axis (0, 9, 18)
2D diagonals
x/y axes (0, 4, 8, 2, 4, 6)
x/z axes (0, 10, 20, 2, 10, 18)
y/z axes (0, 12, 24, 6, 12, 18)
3D diagonals
x/y/z axes (0, 13, 26, 2, 13, 24)
The solution should be generalized, so that it will generate all lines and diagonals for an array, regardless of the array's number of dimensions or length (which is constant across all dimensions).
This solution generalizes over n.
Let's rephrase this problem as "find the list of indices".
Let n = arr.ndim. We're looking for all of the 2D index arrays i of shape (n, k) that can be used as
array[i[0], i[1], i[2], ..., i[n-1]]
Each of i[j] can be one of:
The same index repeated k times, ri[j] = [j, ..., j]
The forward sequence, fi = [0, 1, ..., k-1]
The backward sequence, bi = [k-1, ..., 1, 0]
With the requirements that each sequence is of the form ^(ri)*(fi)(fi|bi|ri)*$ (using regex to summarize it). This is because:
there must be at least one fi so the "line" is not a point selected repeatedly
no bis come before fis, to avoid getting reversed lines
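As a sanity check on that rule, the per-axis choices can be enumerated directly and counted. This is only an illustrative sketch: the helper names are mine, and 'r', 'f', 'b' stand for ri, fi and bi.

```python
import itertools
import re

def valid_patterns(n):
    """Yield per-axis choices: 'r' (repeat), 'f' (forward), 'b' (backward)."""
    rule = re.compile(r"^r*f[fbr]*$")
    for combo in itertools.product("rfb", repeat=n):
        pattern = "".join(combo)
        if rule.match(pattern):
            yield pattern

def count_lines(n, k):
    # every 'r' axis can be pinned to any of the k indices
    return sum(k ** pattern.count("r") for pattern in valid_patterns(n))

print(count_lines(2, 3))  # 8, the winning lines of ordinary tic-tac-toe
print(count_lines(3, 3))  # 49
```

The count agrees with the closed form ((k+2)**n - k**n) / 2.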
import numpy as np

def product_slices(n):
    for i in range(n):
        yield (
            np.index_exp[np.newaxis] * i +
            np.index_exp[:] +
            np.index_exp[np.newaxis] * (n - i - 1)
        )

def get_lines(n, k):
    """
    Returns:
        index (tuple): an object suitable for advanced indexing to get all possible lines
        mask (ndarray): a boolean mask to apply to the result of the above
    """
    fi = np.arange(k)
    bi = fi[::-1]
    ri = fi[:,None].repeat(k, axis=1)

    all_i = np.concatenate((fi[None], bi[None], ri), axis=0)

    # indices which look up every possible line, some of which are not valid
    index = tuple(all_i[s] for s in product_slices(n))

    # We incrementally allow lines that start with some number of `ri`s, and an `fi`
    # [0] here means we chose fi for that index
    # [2:] here means we chose an ri for that index
    # (plain `bool` is used because the `np.bool` alias was removed in NumPy 1.24)
    mask = np.zeros((all_i.shape[0],)*n, dtype=bool)
    sl = np.index_exp[0]
    for i in range(n):
        mask[sl] = True
        sl = np.index_exp[2:] + sl
    return index, mask
Applied to your example:
# construct your example array
n = 3
k = 3
data = np.arange(k**n).reshape((k,)*n)
# apply my index_creating function
index, mask = get_lines(n, k)
# apply the index to your array
lines = data[index][mask]
print(lines)
array([[ 0, 13, 26],
[ 2, 13, 24],
[ 0, 12, 24],
[ 1, 13, 25],
[ 2, 14, 26],
[ 6, 13, 20],
[ 8, 13, 18],
[ 6, 12, 18],
[ 7, 13, 19],
[ 8, 14, 20],
[ 0, 10, 20],
[ 2, 10, 18],
[ 0, 9, 18],
[ 1, 10, 19],
[ 2, 11, 20],
[ 3, 13, 23],
[ 5, 13, 21],
[ 3, 12, 21],
[ 4, 13, 22],
[ 5, 14, 23],
[ 6, 16, 26],
[ 8, 16, 24],
[ 6, 15, 24],
[ 7, 16, 25],
[ 8, 17, 26],
[ 0, 4, 8],
[ 2, 4, 6],
[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8],
[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 13, 17],
[11, 13, 15],
[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 22, 26],
[20, 22, 24],
[18, 21, 24],
[19, 22, 25],
[20, 23, 26],
[18, 19, 20],
[21, 22, 23],
[24, 25, 26]])
Another good set of test data is np.moveaxis(np.indices((k,)*n), 0, -1), which gives an array where every value is its own index.
I've solved this problem before to implement a higher dimensional tic-tac-toe
In [1]: x=np.arange(27).reshape(3,3,3)
Selecting individual rows is easy:
In [2]: x[0,0,:]
Out[2]: array([0, 1, 2])
In [3]: x[0,:,0]
Out[3]: array([0, 3, 6])
In [4]: x[:,0,0]
Out[4]: array([ 0, 9, 18])
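You could iterate over dimensions with an index list (converted to a tuple when indexing, since recent NumPy versions reject a list that contains slices):
In [10]: idx=[slice(None),0,0]
In [11]: x[tuple(idx)]
Out[11]: array([ 0, 9, 18])
In [12]: idx[2]+=1
In [13]: x[tuple(idx)]
Out[13]: array([ 1, 10, 19])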
Look at the code for np.apply_along_axis to see how it implements this sort of iteration.
Reshape and split can also produce a list of rows. For some dimensions this might require a transpose:
In [20]: np.split(x.reshape(x.shape[0],-1),9,axis=1)
Out[20]:
[array([[ 0],
[ 9],
[18]]), array([[ 1],
[10],
[19]]), array([[ 2],
[11],
...
np.diag can get diagonals from 2d subarrays
In [21]: np.diag(x[0,:,:])
Out[21]: array([0, 4, 8])
In [22]: np.diag(x[1,:,:])
Out[22]: array([ 9, 13, 17])
In [23]: np.diag?
In [24]: np.diag(x[1,:,:],1)
Out[24]: array([10, 14])
In [25]: np.diag(x[1,:,:],-1)
Out[25]: array([12, 16])
Also explore np.diagonal, which can be applied directly to the 3d array. It's also easy to index the array directly with range or arange, e.g. x[0,range(3),range(3)].
As far as I know there isn't a function to step through all these alternatives. Since dimensions of the returned arrays can differ, there's little point to producing such a function in compiled numpy code. So even if there was a function, it would step through the alternatives as I outlined.
==============
All the 1d lines
x.reshape(-1,3)
x.transpose(0,2,1).reshape(-1,3)
x.transpose(1,2,0).reshape(-1,3)
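The three expressions above can be generalized to any number of dimensions by moving each axis to the end in turn. A sketch, assuming every dimension has the same length k; the name axis_lines is mine.

```python
import numpy as np

def axis_lines(arr):
    """Return every axis-aligned line of a hypercube array as rows of length k."""
    k = arr.shape[0]
    return np.concatenate(
        [np.moveaxis(arr, axis, -1).reshape(-1, k) for axis in range(arr.ndim)]
    )

x = np.arange(27).reshape(3, 3, 3)
lines = axis_lines(x)
print(lines.shape)   # (27, 3): 3 axes times 9 lines each
print(lines[0])      # [ 0  9 18], the first line along axis 0
```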
y/z diagonal and anti-diagonal
In [154]: i=np.arange(3)
In [155]: j=np.arange(2,-1,-1)
In [156]: np.concatenate((x[:,i,i],x[:,i,j]),axis=1)
Out[156]:
array([[ 0, 4, 8, 2, 4, 6],
[ 9, 13, 17, 11, 13, 15],
[18, 22, 26, 20, 22, 24]])
np.einsum can be used to build all of these kinds of expressions; for instance:
# 3d diagonals
print(np.einsum('iii->i', a))
# 2d diagonals
print(np.einsum('iij->ij', a))
print(np.einsum('iji->ij', a))
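The einsum expressions only produce the forward diagonals. As far as I can tell, the anti-diagonals can be obtained by reversing one axis before the call. A sketch on the 3x3x3 example, with variable names of my own choosing:

```python
import numpy as np

x = np.arange(27).reshape(3, 3, 3)

main_diag = np.einsum('iii->i', x)               # x[i, i, i]
anti_diag = np.einsum('iii->i', x[:, :, ::-1])   # x[i, i, 2 - i]

# diagonals across the first two axes; row j holds x[i, i, j] for i = 0, 1, 2
diag_01 = np.einsum('iij->ji', x)

print(main_diag)    # [ 0 13 26]
print(anti_diag)    # [ 2 13 24]
print(diag_01[0])   # [ 0 12 24]
```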
I would like to select every nth group of n columns in a numpy array. It means that I want the first n columns, not the n next columns, the n next columns, not the n next columns etc.
For example, with the following array and n=2:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
I would like to get:
[[1, 2, 5, 6, 9, 10],
[11, 12, 15, 16, 19, 20]]
And with n=3:
[[1, 2, 3, 7, 8, 9],
[11, 12, 13, 17, 18, 19]]
With n=1 we can simply use the syntax arr[:,::2], but is there something similar for n>1?
You can use the modulus to create ramps that run from 0 to 2n-1 and then select the first n positions of each ramp. Each ramp is True for its first n entries and False for the rest, which gives a boolean array covering every column. Boolean indexing along the columns with that mask then selects the wanted columns. The implementation would look something like this -
arr[:,np.mod(np.arange(arr.shape[-1]),2*n)<n]
Step by step code runs to give a better idea -
In [43]: arr
Out[43]:
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
In [44]: n = 3
In [45]: np.mod(np.arange(arr.shape[-1]),2*n)
Out[45]: array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3])
In [46]: np.mod(np.arange(arr.shape[-1]),2*n)<n
Out[46]: array([ True,  True,  True, False, False, False,  True,  True,  True, False])
In [47]: arr[:,np.mod(np.arange(arr.shape[-1]),2*n)<n]
Out[47]:
array([[ 1, 2, 3, 7, 8, 9],
[11, 12, 13, 17, 18, 19]])
Sample runs across various n -
In [29]: arr
Out[29]:
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
In [30]: n = 1
In [31]: arr[:,np.mod(np.arange(arr.shape[-1]),2*n)<n]
Out[31]:
array([[ 1, 3, 5, 7, 9],
[11, 13, 15, 17, 19]])
In [32]: n = 2
In [33]: arr[:,np.mod(np.arange(arr.shape[-1]),2*n)<n]
Out[33]:
array([[ 1, 2, 5, 6, 9, 10],
[11, 12, 15, 16, 19, 20]])
In [34]: n = 3
In [35]: arr[:,np.mod(np.arange(arr.shape[-1]),2*n)<n]
Out[35]:
array([[ 1, 2, 3, 7, 8, 9],
[11, 12, 13, 17, 18, 19]])
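The same mask can be wrapped in a small helper so it is easy to check against the arr[:,::2] case. A sketch only; the function name is hypothetical.

```python
import numpy as np

def every_other_block(arr, n):
    """Keep the first n columns out of every 2n (hypothetical helper name)."""
    keep = np.arange(arr.shape[-1]) % (2 * n) < n   # True for the first n of each ramp
    return arr[..., keep]

arr = np.arange(1, 21).reshape(2, 10)
print(every_other_block(arr, 2))
print(np.array_equal(every_other_block(arr, 1), arr[:, ::2]))  # True
```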
I have a multidimensional NumPy array:
In [1]: m = np.arange(1,26).reshape((5,5))
In [2]: m
Out[2]:
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
and another array p = np.asarray([[1,1],[3,3]]). I wanted p to act as an array of indexes for m, i.e.:
m[p]
array([7, 19])
However I get:
In [4]: m[p]
Out[4]:
array([[[ 6, 7, 8, 9, 10],
[ 6, 7, 8, 9, 10]],
[[16, 17, 18, 19, 20],
[16, 17, 18, 19, 20]]])
How can I get the desired slice of m using p?
Numpy is using your array to index the first dimension only. As a general rule, indices for a multidimensional array should be in a tuple. This will get you a little closer to what you want:
>>> m[tuple(p)]
array([9, 9])
But now you are indexing the first dimension twice with 1, and the second twice with 3. To index the first dimension with a 1 and a 3, and then the second with a 1 and a 3 also, you could transpose your array:
>>> m[tuple(p.T)]
array([ 7, 19])
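Spelled out, the transpose just separates p into one index array per dimension. A short sketch of the equivalent explicit form (rows, cols and picked are names I chose):

```python
import numpy as np

m = np.arange(1, 26).reshape(5, 5)
p = np.asarray([[1, 1], [3, 3]])   # one (row, col) pair per row of p

rows, cols = p.T                   # rows = [1, 3], cols = [1, 3]
picked = m[rows, cols]             # same result as m[tuple(p.T)]
print(picked)                      # [ 7 19]
```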