Collapse nD numpy array into a 1D array - python

I am trying to sum the values of a nD array along a particular axis to effectively collapse it into a 1D array.
I have been looking through the docs but haven't been able to find the right function. I will try to explain my question better with some code:
In [46]: g
Out[46]:
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
The output I need is:
array([5,10,15])
My actual data is a 7 MB file so I don't really want to use a for loop.
Thank you for your help

Just doing
numpy.sum(g, axis=0)
should work.
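As a minimal sketch of the same idea (the names `g`, `cube`, and the 3D shape are illustrative, not the asker's real data), summing over axis 0 collapses the rows, and a tuple of axes collapses a higher-dimensional array to 1D in one call:

```python
import numpy as np

# Rebuild the 5x3 example from the question.
g = np.tile([1, 2, 3], (5, 1))

# Summing over axis 0 collapses the rows, leaving one total per column.
col_sums = g.sum(axis=0)          # equivalent to np.sum(g, axis=0)
print(col_sums)                   # [ 5 10 15]

# For an nD array, pass a tuple of axes: every listed axis is summed away.
cube = np.arange(24).reshape(2, 3, 4)    # hypothetical 3D data
last_axis_sums = cube.sum(axis=(0, 1))   # only the last axis survives
print(last_axis_sums)             # [60 66 72 78]
```

Both calls run in compiled code, so no Python-level loop is involved regardless of the array size.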

Related

Slicing Numpy Array by 2 index arrays

I have a set of indices stored in two Numpy arrays, and my goal is to slice a given input array based on the corresponding start and stop indices in those arrays. For example:
index_arr1 = np.asarray([2,3,4])
index_arr2 = np.asarray([5,5,6])
input_arr = np.asarray([1,2,3,4,4,5,7,2])
The output to my code should be [[3,4,4],[4,4],[4,5]] which is basically [input_arr[2:5], input_arr[3:5], input_arr[4:6]]
Can anybody suggest a way to solve this problem using numpy functions, avoiding any for loops, to be as efficient as possible?
Do you mean:
[input_arr[x:y] for x,y in zip(index_arr1, index_arr2)]
Output:
[array([3, 4, 4]), array([4, 4]), array([4, 5])]
Or if you really want list of lists:
[input_arr[x:y].tolist() for x,y in zip(index_arr1, index_arr2)]
Output:
[[3, 4, 4], [4, 4], [4, 5]]
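If the Python-level loop ever becomes a bottleneck, one possible sketch is to build a single flat index with NumPy and split the gathered values afterwards. The function name `slice_by_index_arrays` is hypothetical, and the code assumes every stop is at least its start and that there is at least one slice. The result is still a list of arrays, because slices of different lengths cannot form one rectangular array.

```python
import numpy as np

def slice_by_index_arrays(arr, starts, stops):
    """Return [arr[s:e] for each (s, e)] using one fancy-indexing pass."""
    lengths = stops - starts
    ends = np.cumsum(lengths)        # where each slice ends in the flat result
    begins = ends - lengths          # where each slice begins in the flat result
    # Position inside each slice (0, 1, 2, ...) plus that slice's start index.
    flat_idx = np.arange(ends[-1]) - np.repeat(begins, lengths) + np.repeat(starts, lengths)
    return np.split(arr[flat_idx], ends[:-1])

index_arr1 = np.asarray([2, 3, 4])
index_arr2 = np.asarray([5, 5, 6])
input_arr = np.asarray([1, 2, 3, 4, 4, 5, 7, 2])

pieces = slice_by_index_arrays(input_arr, index_arr1, index_arr2)
print([p.tolist() for p in pieces])   # [[3, 4, 4], [4, 4], [4, 5]]
```

For a handful of slices the list comprehension above is simpler and likely just as fast; this variant only pays off when there are many short slices.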

NumPy using the reshape function to reshape an array [duplicate]

I have an array: [1, 2, 3, 4, 5, 6]. I would like to use the numpy.reshape() function so that I end up with this array:
[[1, 4],
[2, 5],
[3, 6]
]
I'm not sure how to do this. I keep ending up with this, which is not what I want:
[[1, 2],
[3, 4],
[5, 6]
]
These do the same thing:
In [57]: np.reshape([1,2,3,4,5,6], (3,2), order='F')
Out[57]:
array([[1, 4],
       [2, 5],
       [3, 6]])
In [58]: np.reshape([1,2,3,4,5,6], (2,3)).T
Out[58]:
array([[1, 4],
       [2, 5],
       [3, 6]])
Normally values are 'read' across the rows in Python/numpy. This is called row-major or 'C' order. Reading down the columns is 'F' order, for FORTRAN, and is common in MATLAB, which has Fortran roots.
If you take the 'F' order, make a new copy and string it out, you'll get a different order:
In [59]: np.reshape([1,2,3,4,5,6], (3,2), order='F').copy().ravel()
Out[59]: array([1, 4, 2, 5, 3, 6])
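A small sketch of the two orders side by side may help; the variable names are illustrative. The `order` argument controls both how values are placed when reshaping and how they are read back when flattening:

```python
import numpy as np

a = np.arange(1, 7)

c_filled = a.reshape(3, 2)              # default 'C': fill row by row
f_filled = a.reshape(3, 2, order='F')   # 'F': fill column by column

print(c_filled.tolist())   # [[1, 2], [3, 4], [5, 6]]
print(f_filled.tolist())   # [[1, 4], [2, 5], [3, 6]]

# Reading the 'F'-filled array back out depends on the order used to read it.
print(f_filled.ravel(order='C').tolist())   # [1, 4, 2, 5, 3, 6]
print(f_filled.ravel(order='F').tolist())   # [1, 2, 3, 4, 5, 6]
```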
You can set the order in np.reshape; in your case you can use 'F'. See the docs for details.
>>> arr
array([1, 2, 3, 4, 5, 6])
>>> arr.reshape(-1, 2, order = 'F')
array([[1, 4],
       [2, 5],
       [3, 6]])
The reason that you are getting that particular result is that arrays are normally allocated in C order. That means that reshaping by itself is not sufficient. You have to tell numpy to change the order of the axes when it steps along the array. Any number of operations will allow you to do that:
Set the axis order to F. F is for Fortran, which, like MATLAB, conventionally uses column-major order:
a.reshape(3, 2, order='F')
Swap the axes after reshaping:
np.swapaxes(a.reshape(2, 3), 0, 1)
Transpose the result:
a.reshape(2, 3).T
Roll the second axis forward:
np.rollaxis(a.reshape(2, 3), 1)
Notice that all but the first case require you to reshape to the transpose.
You can even manually arrange the data
np.stack((a[:3], a[3:]), axis=1)
Note that this will make many unnecessary copies. If you want the data copied, just do
a.reshape(3, 2, order='F').copy()
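The approaches listed above can be checked against each other with a short sketch. The target is the 3x2 array from the question, and `np.moveaxis` is used here in place of `np.rollaxis` as the newer equivalent for this case; the dictionary and its labels are illustrative only.

```python
import numpy as np

a = np.arange(1, 7)
expected = np.array([[1, 4], [2, 5], [3, 6]])

candidates = {
    "order='F'": a.reshape(3, 2, order='F'),
    "swapaxes": np.swapaxes(a.reshape(2, 3), 0, 1),
    "transpose": a.reshape(2, 3).T,
    "moveaxis": np.moveaxis(a.reshape(2, 3), 1, 0),
    "stack": np.stack((a[:3], a[3:]), axis=1),
}

# Every approach should produce the same 3x2 result.
for name, result in candidates.items():
    assert np.array_equal(result, expected), name

# The transpose is a view on the original buffer; stack allocates new memory.
print(np.shares_memory(a, candidates["transpose"]))   # True
print(np.shares_memory(a, candidates["stack"]))       # False
```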

No fortran order in numpy.array

I see no fortran order in:
import numpy as np
In [143]: np.array([[1,2],[3,4]],order='F')
Out[143]:
array([[1, 2],
       [3, 4]])
But in the following it works:
In [139]: np.reshape(np.arange(9),(3,3),order='F')
Out[139]:
array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])
So what am I doing wrong in the first one?
When you call numpy.array to create an array from an existing Python object, it will give you an array with whatever shape the original Python object has. So,
np.array([[1,2],[3,4]], ...)
Will always give you,
np.array([[1, 2],
          [3, 4]])
Which is exactly what you typed in, so it should not come as a surprise. Fortran order and C order do not describe the shape of the data; they describe the memory layout. When you print an array, NumPy doesn't show you the memory layout; it only shows you the shape and the values.
You can witness that the array truly is stored in Fortran order when you flatten it with the "K" order, which keeps the original order of the elements:
>>> a = np.array([[1,2],[3,4]], order="F")
>>> a.flatten(order="K")
array([1, 3, 2, 4])
This is what truly distinguishes Fortran from C order: the memory layout. Most NumPy functions do not force you to consider memory layout; instead, different layouts are handled transparently.
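Besides flattening with "K" order, the layout can be inspected directly through the array's flags and strides. A minimal sketch, with illustrative variable names:

```python
import numpy as np

c = np.array([[1, 2], [3, 4]])               # default C (row-major) layout
f = np.array([[1, 2], [3, 4]], order='F')    # same values, Fortran layout

# The two arrays compare equal element by element.
print(np.array_equal(c, f))                                # True

# The contiguity flags reveal the layout that printing hides.
print(c.flags['C_CONTIGUOUS'], c.flags['F_CONTIGUOUS'])    # True False
print(f.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])    # False True

# Strides are the byte steps per axis; they swap between the two layouts.
step = c.itemsize
print(c.strides == (2 * step, step))    # True
print(f.strides == (step, 2 * step))    # True
```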
It sounds like what you want is to transpose, reversing the axis order. This can be done simply:
>>> b = np.transpose(a)
>>> b
array([[1, 3],
       [2, 4]])
This does not create a new array, but a new view of the same array:
>>> b.base is a
True
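Because the transpose is a view, writes through it show up in the original array. A short sketch of that behaviour, with the value 99 and the name `independent` chosen purely for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], order='F')
b = a.T                          # shorthand for np.transpose(a)

print(np.shares_memory(a, b))    # True

# b[0, 1] and a[1, 0] refer to the same element in memory.
b[0, 1] = 99
print(a[1, 0])                   # 99

# An explicit copy breaks the link to the original buffer.
independent = a.T.copy()
independent[0, 0] = -1
print(a[0, 0])                   # 1
```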
If you want the data to have the memory layout 1 2 3 4 and have a Fortran order view of that [[1, 3], [2, 4]], the efficient way to do this is to store the existing array with C order and then transpose it, which results in a Fortran-order array with the desired contents and requires no extra copies.
>>> a = np.array([[1, 2], [3, 4]]).transpose()
>>> a.flatten(order="K")
array([1, 2, 3, 4])
>>> a
array([[1, 3],
       [2, 4]])
If you store the original with Fortran order, the transposition will result in C order, so you don't want that (or maybe all you care about is the transposition, and memory order is not important?). In either case, the array will look the same in NumPy.
>>> a = np.array([[1, 2], [3, 4]], order="F").transpose()
>>> a.flatten(order="K")
array([1, 3, 2, 4])
>>> a
array([[1, 3],
       [2, 4]])
Your two means of constructing the 2D array are not at all equivalent. In the first, you specified the structure of the array. In the second, you formed an array and then reshaped to your liking.
>>> np.reshape([1,2,3,4],(2,2),order='F')
array([[1, 3],
       [2, 4]])
Again, for comparison, even if you ask for the reshape and format change to FORTRAN, you'll get your specified structure:
>>> np.reshape([[1,2],[3,4]],(2,2),order='F')
array([[1, 2],
       [3, 4]])
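One way to see why the last call changes nothing: `order='F'` in reshape affects both how elements are read from the input and how they are written to the output, so reshaping a 2x2 input to 2x2 round-trips. The effect only becomes visible when the shape actually changes. A sketch, with `nested`, `flat_c`, and `flat_f` as illustrative names:

```python
import numpy as np

nested = [[1, 2], [3, 4]]

# Same shape in and out: the values stay where they were typed.
print(np.array(nested, order='F').tolist())               # [[1, 2], [3, 4]]
print(np.reshape(nested, (2, 2), order='F').tolist())     # [[1, 2], [3, 4]]

# Changing the shape exposes the read order.
flat_c = np.reshape(nested, (4,), order='C')   # read across the rows
flat_f = np.reshape(nested, (4,), order='F')   # read down the columns
print(flat_c.tolist())    # [1, 2, 3, 4]
print(flat_f.tolist())    # [1, 3, 2, 4]
```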

How to split my numpy array

So I'm pretty new to numpy, and I'm working on a project, but have encountered a problem that I can't seem to solve.
Imagine we had an NDarray in the following format
[4,5,6,1]
[3,5,2,0]
[4,7,3,1]
How would I split it into two parts such that the first part is:
[4,5,6]
[3,5,2]
[4,7,3]
and the second part is
[1,0,1]
I know the solution must be pretty simple but I can't seem to figure it out
Thanks in advance!
Try:
a = np.array([[4,5,6,1],
              [3,5,2,0],
              [4,7,3,1]])
b,c = a[:,:-1], a[:,-1]
This uses numpy's slicing to keep all rows and split the columns on the last one.
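A slightly fuller sketch of the same split, showing the resulting shapes and an alternative with `np.split`; the names `left` and `right` are illustrative. Note that basic slices are views, so modifying `b` or `c` would also modify `a`.

```python
import numpy as np

a = np.array([[4, 5, 6, 1],
              [3, 5, 2, 0],
              [4, 7, 3, 1]])

# All rows; every column except the last, then only the last column.
b, c = a[:, :-1], a[:, -1]
print(b.shape, c.shape)            # (3, 3) (3,)

# Basic slicing returns views on the original data, not copies.
print(np.shares_memory(a, b))      # True

# np.split cuts at column index 3 and keeps both pieces two-dimensional.
left, right = np.split(a, [3], axis=1)
print(right.shape)                 # (3, 1)
print(right.ravel().tolist())      # [1, 0, 1]
```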
>>> import numpy as np
>>> a=np.array([[4,5,6,1],[3,5,2,0],[4,7,3,1]])
>>> a
array([[4, 5, 6, 1],
       [3, 5, 2, 0],
       [4, 7, 3, 1]])
>>> b=a[:,0:3]
>>> b
array([[4, 5, 6],
       [3, 5, 2],
       [4, 7, 3]])
>>> c=a[:,3]
>>> c
array([1, 0, 1])
>>>
This is array slicing. The start:stop notation comes from Python itself; NumPy extends it to multiple axes, so a[:,0:3] means all rows and columns 0 through 2.
For more details about array slice, see Explain Python's slice notation

Numpy: Concatenating multidimensional and unidimensional arrays

I have a 2x2 numpy array :
x = array(([[1,2],[4,5]]))
which I must merge (or stack, if you wish) with a one-dimensional array :
y = array(([3,6]))
by adding it to the end of the rows, thus making a 2x3 numpy array that would output like so :
array([[1, 2, 3],
       [4, 5, 6]])
now the proposed method for this in the numpy guides is :
hstack((x,y))
however this doesn't work, returning the following error :
ValueError: arrays must have same number of dimensions
The only workaround possible seems to be to do this :
hstack((x, array(([y])).T ))
which works, but looks and sounds rather hackish. It seems there is no other way to transpose the given array so that hstack is able to digest it. I was wondering, is there a cleaner way to do this? Wouldn't there be a way for numpy to guess what I wanted to do?
unutbu's answer works in general, but in this case there is also np.column_stack
>>> x
array([[1, 2],
       [4, 5]])
>>> y
array([3, 6])
>>> np.column_stack((x,y))
array([[1, 2, 3],
       [4, 5, 6]])
Also works:
In [22]: np.append(x, y[:, np.newaxis], axis=1)
Out[22]:
array([[1, 2, 3],
       [4, 5, 6]])
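For comparison, here is a sketch of several ways to attach `y` as a new column. The hstack error in the question comes from `y` being 1D, so most variants first give it shape (2, 1); the names `y_col`, `results`, and `expected` are illustrative.

```python
import numpy as np

x = np.array([[1, 2], [4, 5]])
y = np.array([3, 6])
expected = np.array([[1, 2, 3], [4, 5, 6]])

# Turn the 1D array into a column of shape (2, 1).
y_col = y[:, np.newaxis]

results = [
    np.hstack((x, y_col)),                            # hstack needs matching dimensions
    np.concatenate((x, y.reshape(-1, 1)), axis=1),    # the general-purpose form
    np.column_stack((x, y)),                          # treats 1D input as a column
    np.c_[x, y],                                      # index-trick shorthand
]

for r in results:
    assert np.array_equal(r, expected)
print(results[0].tolist())    # [[1, 2, 3], [4, 5, 6]]
```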
