How to put 3D and 2D arrays into pandas DataFrame - python

I'm having problems with putting 3D and 2D arrays into a dataframe.
The first array is (2905, 150, 150);
The second array is (2905, 3)
I want a dataframe with 2905 rows, in which I have, for each item, 2D (150,150) arrays for column 1 and 1D arrays (3) for column 2.
Here is a smaller example with the same intent, using a (3,2,2) 3D array and a (3,2) 2D array:
import numpy as np
a = np.array([[[2,6],[1,95]],[[88,42],[21,90]],[[54,78],[47,70]]])
b = np.array([[1,0],[0,0],[0,1]])
[Image: 3D (a) and 2D (b) arrays]
I want this as a result:
[Image: expected result]
I want to find a way to do this iteratively. Does anyone have any idea?
Thank you so much!

This can be done easily with:
import pandas as pd
pd.DataFrame(zip(a,b))
Result:
                      0       1
0     [[2, 6], [1, 95]]  [1, 0]
1  [[88, 42], [21, 90]]  [0, 0]
2  [[54, 78], [47, 70]]  [0, 1]
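If named columns are preferred, a minimal sketch along the same lines is shown here. It assumes that splitting each array along its first axis with list() is acceptable, and the column names "image" and "label" are purely illustrative, not taken from the question.

```python
import numpy as np
import pandas as pd

a = np.array([[[2, 6], [1, 95]], [[88, 42], [21, 90]], [[54, 78], [47, 70]]])
b = np.array([[1, 0], [0, 0], [0, 1]])

# list(a) splits the 3D array along its first axis into 2D arrays;
# list(b) does the same for the 2D array, giving 1D arrays.
# "image" and "label" are hypothetical column names chosen for illustration.
df = pd.DataFrame({"image": list(a), "label": list(b)})

print(df.shape)             # one row per item, two columns
print(df.loc[1, "image"])   # the 2D array stored in row 1
```

Each cell then holds a NumPy array, so both columns have object dtype. The same construction should carry over to the (2905, 150, 150) and (2905, 3) arrays.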

Related

How to convert 2D matrix to 3D by embeddings in numpy? [duplicate]

I'm trying to index the last dimension of a 3D matrix with a matrix consisting of indices that I wish to keep.
I have a matrix of thrust values with shape:
(3, 3, 5)
I would like to filter the last index according to some criteria so that it is reduced from size 5 to size 1. I have already found the indices in the last dimension that fit my criteria:
[[0 0 1]
[0 0 1]
[1 4 4]]
What I want to achieve: for the first row and first column I want the 0th index of the last dimension. For the first row and third column I want the 1st index of the last dimension. In terms of indices to keep the final matrix will become a (3, 3) 2D matrix like this:
[[0,0,0], [0,1,0], [0,2,1];
[1,0,0], [1,1,0], [1,2,1];
[2,0,1], [2,1,4], [2,2,4]]
I'm pretty confident numpy can achieve this, but I'm unable to figure out exactly how. I'd rather not build a construction with nested for loops.
I have already tried:
minValidPower = totalPower[:, :, tuple(indexMatrix)]
But this results in a (3, 3, 3, 3) matrix, so I am not entirely sure how I'm supposed to approach this.
With a as input array and idx as the indexing one -
np.take_along_axis(a,idx[...,None],axis=-1)[...,0]
Alternatively, with open-grids -
I,J = np.ogrid[:idx.shape[0],:idx.shape[1]]
out = a[I,J,idx]
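As a quick check, the sketch below runs both approaches on a small test array and compares them with explicit loops. The array a built with np.arange is only a stand-in for the real thrust values.

```python
import numpy as np

a = np.arange(3 * 3 * 5).reshape(3, 3, 5)          # stand-in for the thrust values
idx = np.array([[0, 0, 1], [0, 0, 1], [1, 4, 4]])  # index to keep per (row, column)

# Approach 1: take_along_axis needs idx to have as many dimensions as a,
# so a trailing axis is added and removed again afterwards.
out_take = np.take_along_axis(a, idx[..., None], axis=-1)[..., 0]

# Approach 2: open grids of shape (3, 1) and (1, 3) that broadcast against idx.
I, J = np.ogrid[:idx.shape[0], :idx.shape[1]]
out_grid = a[I, J, idx]

# Reference: explicit loops, only to confirm the result.
expected = np.empty(idx.shape, dtype=a.dtype)
for i in range(idx.shape[0]):
    for j in range(idx.shape[1]):
        expected[i, j] = a[i, j, idx[i, j]]

print(np.array_equal(out_take, expected), np.array_equal(out_grid, expected))
```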
You can build corresponding index arrays for the first two dimensions. For the second dimension (columns) this would be:
[[0 1 2]
 [0 1 2]
 [0 1 2]]
and for the first dimension (rows):
[[0 0 0]
 [1 1 1]
 [2 2 2]]
You can construct these using the meshgrid function. In the example they are stored as m1 (rows) and m2 (columns):
import numpy as np
vals = np.arange(3*3*5).reshape(3, 3, 5) # test sample
m1, m2 = np.meshgrid(range(3), range(3), indexing='ij')
m3 = np.array([[0, 0, 1], [0, 0, 1], [1, 4, 4]])
sel_vals = vals[m1, m2, m3]
The shape of the result matches the shape of the indexing arrays m1, m2 and m3.
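The full index grids are not strictly required, because NumPy broadcasts index arrays against each other. The sketch below, which assumes the same test sample as above, compares the meshgrid version with a (3, 1) row index and a (3,) column index.

```python
import numpy as np

vals = np.arange(3 * 3 * 5).reshape(3, 3, 5)
m3 = np.array([[0, 0, 1], [0, 0, 1], [1, 4, 4]])

# Full (3, 3) index grids, as in the meshgrid approach.
m1, m2 = np.meshgrid(range(3), range(3), indexing="ij")
sel_full = vals[m1, m2, m3]

# Same selection with a (3, 1) column and a (3,) row that broadcast to (3, 3).
rows = np.arange(3)[:, None]
cols = np.arange(3)
sel_broadcast = vals[rows, cols, m3]

print(np.array_equal(sel_full, sel_broadcast))
```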

How to access multiple values of 3D numpy array using array of 3D indices

I have a 3D array (referred to as array A) with shape (360,360,360) which I would like to access using another array (referred to as array B) with shape (259200,3). Array B is composed of 259200 index triplets, each of which is an index into array A.
Is there a way for me to quickly generate a numpy array that contains the selection of 3D array A values that are associated with the indices in array B? I'm trying to avoid having to write an intensive for loop like this:
SubArray = []
for i in range(len(ArrayB)):
    Val = ArrayA[ArrayB[i][0], ArrayB[i][1], ArrayB[i][2]]
    SubArray.append(Val)
I had almost the same question and did not find a solution at first.
For a 1D numpy array, indexing with an array of indices works as expected:
Input:
import numpy as np
np.random.seed(0)
arr = np.random.randint(0, 10, size=20)
Output:
array([5, 0, 3, 3, 7, 9, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1])
Input:
arr[ np.array([0, 5, 10]) ]
Output:
array([5, 9, 7])
But I had problems accessing a 3D numpy array. I'd like to specify indices into a 3D array and get the corresponding values.
For example, I have a 3D array data with shape (512, 221, 35) and a 2D array table with columns ir, iphi, itheta, which are integer indices into data.
For example:
| table | ir | iphi | itheta
| ------|------|-------|--------
| row0 | 0 | 0 | 0
| row1 | 1 | 80 | 15
| row2 | 50 | 151 | 18
...
So I tried to access it the same way:
data[ [[0, 0, 0], [1, 80, 15]]], i.e.
data[ array_of_3D_indices], but I got an incorrect result: an array with shape (3, 35), whereas I expected a 1D array with shape (2,), where every triplet (3 indices into the 3D array) corresponds to 1 number from data.
SOLUTION.
I tried data[ [ir1, iphi1, itheta1], [ir2, iphi2, itheta2]] and got the wrong result,
but data[ [ir1, ir2], [iphi1, iphi2], [itheta1, itheta2]] gave the correct answer!
You should group the indices of each dimension into one array, which gives 3 arrays of indices, one per dimension.
For example:
data[ [0,1,50], [0,80,151], [0,15,18]] will have shape (3,)
And this works without a for loop.
P.S. It is better to write:
data[ tuple( table.T )], where table is a 2D array of indices with one (ir, iphi, itheta) triplet per row, i.e. shape (N, 3). The transpose yields one index array per dimension; if the table is already stored with shape (3, N), data[ tuple( table )] is enough.
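A small sketch of this, assuming the table stores one index triplet per row (shape (N, 3)); the array sizes and values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((6, 5, 4))      # small stand-in for the (512, 221, 35) array
table = np.array([[0, 0, 0],      # one (ir, iphi, itheta) triplet per row
                  [1, 3, 2],
                  [5, 4, 3]])

# Transpose so that each dimension gets its own index array,
# then pass the three arrays as a tuple.
values = data[tuple(table.T)]

# Equivalent explicit form.
ir, iphi, itheta = table.T
values_explicit = data[ir, iphi, itheta]

# Reference loop, only to confirm the result.
values_loop = np.array([data[i, j, k] for i, j, k in table])

print(values.shape, np.array_equal(values, values_loop))
```

The same pattern applies to the original question: with ArrayB of shape (259200, 3), ArrayA[tuple(ArrayB.T)] selects one value per row of ArrayB.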

Slicing a 2D NumPy Array by all zero rows

This is essentially the 2D array equivalent of slicing a Python list into smaller lists at the indexes that store a particular value. I'm running a program that extracts a large amount of data from a CSV file and copies it into a 2D NumPy array. The basic format of these arrays is something like this:
[[0 8 9 10]
[9 9 1 4]
[0 0 0 0]
[1 2 1 4]
[0 0 0 0]
[1 1 1 2]
[39 23 10 1]]
I want to separate my NumPy array along rows that contain all zero values to create a set of smaller 2D arrays. The successful result for the above starting array would be the arrays:
[[0 8 9 10]
[9 9 1 4]]
[[1 2 1 4]]
[[1 1 1 2]
[39 23 10 1]]
I've thought about simply iterating down the array and checking if the row has all zeros but the data I'm handling is substantially large. I have potentially millions of rows of data in the text file and I'm trying to find the most efficient approach as opposed to a loop that could waste computation time. What are your thoughts on what I should do? Is there a better way?
a is your array. You can use any to find the rows that are not all zero, keep only those, and then use split to cut at the positions of the all-zero rows:
# indices of rows that are not all zero
idx = np.flatnonzero(a.any(1))
# indices of all-zero rows
idx_zero = np.delete(np.arange(a.shape[0]),idx)
# select the non-zero rows and split at the (shifted) all-zero row indices
output = np.split(a[idx],idx_zero-np.arange(idx_zero.size))
output:
[array([[ 0, 8, 9, 10],
[ 9, 9, 1, 4]]),
array([[1, 2, 1, 4]]),
array([[ 1, 1, 1, 2],
[39, 23, 10, 1]])]
You can use the np.all function to check for rows which are all zeros, and then index appropriately.
# assume `x` is your data
indices = np.all(x == 0, axis=1)
zeros = x[indices]
nonzeros = x[np.logical_not(indices)]
The all function accepts an axis argument (as do many NumPy functions), which indicates the axis along which to operate. axis=1 here reduces across the columns of each row, so you get back a boolean array of shape (x.shape[0],), which can be used to directly index x.
Note that this will be much faster than a for-loop over the rows, especially for large arrays.
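One case the split-based approach does not cover on its own is two or more consecutive all-zero rows, or an all-zero row at the very start or end: np.split then returns empty chunks. The sketch below is one possible way to handle that, assuming empty chunks should simply be dropped. The function name split_on_zero_rows is hypothetical.

```python
import numpy as np

def split_on_zero_rows(a):
    """Split a 2D array at all-zero rows, dropping empty chunks.

    Hypothetical helper for illustration; not part of NumPy.
    """
    nonzero = a.any(axis=1)                  # True for rows with any non-zero entry
    kept = a[nonzero]                        # all-zero rows removed
    zero_pos = np.flatnonzero(~nonzero)      # positions of all-zero rows in a
    # Each removed row shifts later positions up by one, hence the subtraction.
    cuts = zero_pos - np.arange(zero_pos.size)
    chunks = np.split(kept, cuts)
    return [c for c in chunks if c.size]     # discard empty chunks

a = np.array([[0, 8, 9, 10],
              [9, 9, 1, 4],
              [0, 0, 0, 0],
              [1, 2, 1, 4],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [1, 1, 1, 2],
              [39, 23, 10, 1]])

for chunk in split_on_zero_rows(a):
    print(chunk)
```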

Python: Access saved points from 2d array in 3d numpy array

I got a 2d numpy array (shape(y,x)=601,1200) and a 3d numpy array (shape(z,y,x)=137,601,1200).
In my 2d array, I saved the z values at the y, x point which I now want to access from my 3d array and save it into a new 2d array.
I tried something like this without success.
levels = array2d.reshape(-1)
y = np.arange(601)
x = np.arange(1200)
newArray2d=oldArray3d[levels,y,x]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (721200,) (601,) (1200,)
I don't want to try something with loops, so is there any faster method?
This is the data you have:
x_len = 12 # In your case, 1200
y_len = 6 # In your case, 601
z_len = 3 # In your case, 137
import numpy as np
my2d = np.random.randint(0,z_len,(y_len,x_len))
my3d = np.random.randint(0,5,(z_len,y_len,x_len))
This is one way to build your new 2d array:
yindices,xindices = np.indices(my2d.shape)
new2d = my3d[my2d[:], yindices, xindices]
Notes:
We're using Integer Advanced Indexing.
This means we index the 3d array my3d with 3 integer index arrays.
For more explanation on how integer array indexing works, please refer to my answer on this other question
In your attempt, there was no need to flatten your 2d array with reshape(-1), since the shape of the integer index arrays that we pass (after any broadcasting) becomes the shape of the resulting 2d array.
Also, in your attempt, your second and third index arrays need to have opposite orientations. That is, they must be of shape (y_len,1) and (1, x_len). Notice the different positions of the 1. This ensures that these two index arrays are broadcast against each other.
There's some vagueness in your question, but I think you want advanced indexing like this:
In [2]: arr = np.arange(24).reshape(4,3,2)
In [3]: levels = np.random.randint(0,4,(3,2))
In [4]: levels
Out[4]:
array([[1, 2],
[3, 1],
[0, 2]])
In [5]: arr
Out[5]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15],
[16, 17]],
[[18, 19],
[20, 21],
[22, 23]]])
In [6]: arr[levels, np.arange(3)[:,None], np.arange(2)]
Out[6]:
array([[ 6, 13],
[20, 9],
[ 4, 17]])
levels is (3,2). I created the other 2 indexing arrays so that they broadcast with it: (3,1) and (2,). The result is a (3,2) array of values from arr, selected by their combined indices.
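Applied to the names used in the question, the fix can be sketched as follows. The sizes are scaled down and the data is random, so this is only an illustration. np.take_along_axis is shown as an alternative that avoids building the y and x index arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
z_len, y_len, x_len = 3, 6, 12                      # small stand-ins for 137, 601, 1200
array2d = rng.integers(0, z_len, (y_len, x_len))    # z index stored per (y, x)
array3d = rng.integers(0, 100, (z_len, y_len, x_len))

# Keep array2d two-dimensional and give y a trailing axis so that
# the shapes (y_len, x_len), (y_len, 1) and (x_len,) broadcast together.
y = np.arange(y_len)[:, None]
x = np.arange(x_len)
new2d = array3d[array2d, y, x]

# Same selection via take_along_axis along the z axis.
new2d_take = np.take_along_axis(array3d, array2d[None], axis=0)[0]

print(new2d.shape, np.array_equal(new2d, new2d_take))
```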

Numpy indexing multidimensional arrays with array and slice

My doubt is about this example in the numpy docs.
y = np.arange(35).reshape(5,7)
This is the operation that I am trying to clarify:
y[np.array([0,2,4]),1:3]
According to the docs:
"In effect, the slice is converted to an index array np.array([[1,2]]) (shape (1,2)) that is broadcast with the index array to produce a resultant array of shape (3,2)."
This does not work, so I am assuming it is not equivalent
y[np.array([0,2,4]), np.array([1,2])]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-140-f4cd35e70141> in <module>()
----> 1 y[np.array([0,2,4]), np.array([1,2])]
ValueError: shape mismatch: objects cannot be broadcast to a single shape
What does this broadcast array of shape (3,2) look like?
The broadcasting is more like:
In [280]: y[np.array([0,2,4])[...,None], np.array([1,2])]
Out[280]:
array([[ 1, 2],
[15, 16],
[29, 30]])
I added a dimension to [0,2,4] making it 2d. broadcast_arrays can be used to see what the broadcasted arrays look like:
In [281]: np.broadcast_arrays(np.array([0,2,4])[...,None], np.array([1,2]))
Out[281]:
[array([[0, 0],
[2, 2],
[4, 4]]),
array([[1, 2],
[1, 2],
[1, 2]])]
np.broadcast_arrays([[0],[2],[4]], [1,2]) is the same thing without the array wrappers. np.meshgrid([0,2,4], [1,2], indexing='ij') is another way of producing these indexing arrays.
(The arrays produced by meshgrid or broadcast_arrays could be used as the index for y[_], after converting the returned sequence to a tuple.)
So it's right to say [1,2] is broadcast with the index array, but the docs omit the bit about adjusting dimensions.
A little earlier they have this example:
y[np.array([0,2,4])]
which is equivalent to y[np.array([0,2,4]), :]. It picks 3 rows, and all items from them. The 1:3 case can be thought of as an extension of this, picking 3 rows, and then 2 columns.
y[[0,2,4],:][:,1:3]
This might be a better way of thinking about the indexing if broadcasting is too confusing.
There's another docs page that might handle this better
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
In these docs, basic indexing involves slices and integers
y[:,1:3], y[1,:], y[1, 1:3]
Advanced indexing involves an array (or list)
y[[0,2,4],:]
This produces the same result as y[::2,:], except the list case produces a copy, the slice (basic) a view.
y[[0,2,4], [1,2,3]] is a case of pure advanced indexing; the result is 3 items, the ones at (0,1), (2,2), and (4,3).
y[[0,2,4], 1:3] is a case that the docs call "Combining advanced and basic indexing": 'advanced' from '[0,2,4]', 'basic' from '1:3'.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
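The copy-versus-view distinction and the element-by-element pairing of pure advanced indexing can be checked directly. A minimal sketch, using np.shares_memory to tell a view from a copy:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

basic = y[::2, :]            # basic indexing (slice): a view of y
advanced = y[[0, 2, 4], :]   # advanced indexing (list): a copy

same_values = np.array_equal(basic, advanced)
basic_shares = np.shares_memory(y, basic)
advanced_shares = np.shares_memory(y, advanced)
print(same_values, basic_shares, advanced_shares)

# Pure advanced indexing pairs the index lists element by element:
# (0, 1), (2, 2), (4, 3)
pairs = y[[0, 2, 4], [1, 2, 3]]
print(pairs)
```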
Looking at a more complex index array might add some insight.
In [222]: i=[[0,2],[1,4]]
Used with another list, it is 'pure' advanced, and the result is broadcasted:
In [224]: y[i, [1,2]]
Out[224]:
array([[ 1, 16],
[ 8, 30]])
The index arrays are:
In [234]: np.broadcast_arrays(i, [1,2])
Out[234]:
[array([[0, 2],
[1, 4]]),
array([[1, 2],
[1, 2]])]
The [1,2] list is just expanded to a (2,2) array.
Using it with a slice is an example of this mixed advanced/basic, and the result is 3d (2,2,2).
In [223]: y[i, 1:3]
Out[223]:
array([[[ 1, 2],
[15, 16]],
[[ 8, 9],
[29, 30]]])
The equivalent with broadcasting is
y[np.array(i)[...,None], [1,2]]
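The equivalences described above can be verified with a short sketch. np.ix_ is included as one more way to build the broadcastable index arrays. It is not mentioned in the answer, but it is a standard NumPy helper.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
rows = np.array([0, 2, 4])

mixed = y[rows, 1:3]                             # advanced + basic indexing
explicit = y[rows[:, None], np.array([1, 2])]    # (3, 1) with (2,) -> (3, 2)
via_ix = y[np.ix_(rows, [1, 2])]                 # np.ix_ builds the same open grid

i = np.array([[0, 2], [1, 4]])
mixed_3d = y[i, 1:3]                             # (2, 2) index plus slice -> (2, 2, 2)
explicit_3d = y[i[..., None], [1, 2]]            # (2, 2, 1) with (2,) -> (2, 2, 2)

print(np.array_equal(mixed, explicit), np.array_equal(mixed, via_ix))
print(np.array_equal(mixed_3d, explicit_3d))
```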
y[data,beginIndex:endIndex]
import numpy as np
y = np.arange(35).reshape(5,7)
print(y)
[[ 0 1 2 3 4 5 6]
[ 7 8 9 10 11 12 13]
[14 15 16 17 18 19 20]
[21 22 23 24 25 26 27]
[28 29 30 31 32 33 34]]
print(y[np.array([0,2,4]),1:3])
[[ 1 2]
[15 16]
[29 30]]
You're right that the documentation may be incorrect here, or at least something is missing. I'd file an issue asking for clarification in the documentation.
In fact, this part of the documentation shows a very similar example, together with the exception that it raises:
>>> y[np.array([0,2,4]), np.array([0,1])]
<type 'exceptions.ValueError'>: shape mismatch: objects cannot be
broadcast to a single shape
