Let's say I have data for 3 variable pairs, A, B, and C (in my actual application the number of variables is anywhere from 1000-3000 but could be even higher).
Let's also say that there are pieces of the data that come in arrays.
For example:
Array X:
np.array([[ 0., 2., 3.],
[ -2., 0., 4.],
[ -3., -4., 0.]])
Where:
X[0,0] = corresponds to data for variables A and A
X[0,1] = corresponds to data for variables A and B
X[0,2] = corresponds to data for variables A and C
X[1,0] = corresponds to data for variables B and A
X[1,1] = corresponds to data for variables B and B
X[1,2] = corresponds to data for variables B and C
X[2,0] = corresponds to data for variables C and A
X[2,1] = corresponds to data for variables C and B
X[2,2] = corresponds to data for variables C and C
Array Y:
np.array([[2,12],
[-12, 2]])
Y[0,0] = corresponds to data for variables A and C
Y[0,1] = corresponds to data for variables A and B
Y[1,0] = corresponds to data for variables B and A
Y[1,1] = corresponds to data for variables C and A
Array Z:
np.array([[ 99, 77],
[-77, -99]])
Z[0,0] = corresponds to data for variables A and C
Z[0,1] = corresponds to data for variables B and C
Z[1,0] = corresponds to data for variables C and B
Z[1,1] = corresponds to data for variables C and A
I want to concatenate the above arrays keeping the variable position fixed as follows:
END_RESULT_ARRAY index 0 corresponds to variable A
END_RESULT_ARRAY index 1 corresponds to variable B
END_RESULT_ARRAY index 2 corresponds to variable C
Basically, there are N variables in the universe, but N can change every month (new variables can be introduced, and existing ones can drop out and then either return or never return). Within the N variables in the universe I compute pairwise permutations, and the position of each variable is fixed, i.e. index 0 corresponds to variable A, index 1 corresponds to variable B, and so on (as described above).
Given the above requirement, END_RESULT_ARRAY should look like the following:
array([[[ 0., 2., 3.],
[ -2., 0., 4.],
[ -3., -4., 0.]],
[[ nan, 12., 2.],
[-12., nan, nan],
[ 2., nan, nan]],
[[ nan, nan, 99.],
[ nan, nan, 77.],
[-99., -77., nan]]])
Keep in mind that the above is an illustration.
In my actual application, I have about 125 arrays and a new one is generated every month. Each monthly array may have different sizes and may only have data for a portion of the variables defined in my universe. Also, as new arrays are created each month there is no way of knowing what its size will be or which variables will have data (or which ones will be missing).
So up until the most recent monthly array, we can determine the max size from the available historical data. Each month we will have to re-check the max size of all the arrays as a new array becomes available. Once we have the max size we can re-stitch/concatenate all the arrays together, if this is something that is doable in numpy. This will be an ongoing operation done every month.
I want a general mechanism to be able to stitch these arrays together keeping the requirements I describe regarding the index position for the variables fixed.
I eventually want to use h5py arrays, as my data set will grow exponentially in the not too distant future. However, I would like to get this working with numpy as a first step.
Based on the comment made by user3483203, the next step is to concatenate the arrays:
a = np.array([[ 0., 2., 3.],
[ -2., 0., 4.],
[ -3., -4., 0.]])
b = np.array([[0,12], [-12, 0]])
out = np.full_like(a, np.nan)  # NaN-filled array with the shape of a
i, j = b.shape
out[:i, :j] = b                # copy b into the top-left corner
res = np.array([a, out])
print(res)
This answers the original question which has since been changed:
Let's say I have the following arrays:
np.array([[ 0., 2., 3.],
[ -2., 0., 4.],
[ -3., -4., 0.]])
np.array([[0,12],
[-12, 0]])
I want to concatenate the above 2 arrays such that the end result is
as follows:
array([[[0, 2, 3],
[-2, 0, 4],
[-3,-4, 0]],
[[0,12, np.nan],
[-12, 0, np.nan],
[np.nan, np.nan, np.nan]]])
Find the maximum size along each dimension, work out how far each array falls short of it, use np.pad to pad with NaN at the end of each dimension, and finally use np.stack to stack the padded arrays together:
import numpy as np
a = np.arange(12).reshape(4,3).astype(float)  # np.float was removed from NumPy; use the builtin float
b = np.arange(4).reshape(1,4).astype(float)
arrs = (a,b)
dims = len(arrs[0].shape)
maxshape = tuple( max(( x.shape[i] for x in arrs)) for i in range(dims))
# build a list (not a generator), since np.stack needs a sequence of arrays
paddedarrs = [ np.pad(x, tuple((0, maxshape[i]-x.shape[i]) for i in range(dims)), 'constant', constant_values=np.nan) for x in arrs]
c = np.stack(paddedarrs,0)
print (a)
print(b,"\n======================")
print(c)
[[ 0. 1. 2.]
[ 3. 4. 5.]
[ 6. 7. 8.]
[ 9. 10. 11.]]
[[0. 1. 2. 3.]]
======================
[[[ 0. 1. 2. nan]
[ 3. 4. 5. nan]
[ 6. 7. 8. nan]
[ 9. 10. 11. nan]]
[[ 0. 1. 2. 3.]
[nan nan nan nan]
[nan nan nan nan]
[nan nan nan nan]]]
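For the revised question, where each value is tied to a named pair of variables, padding at the end of each axis does not put the values in the right cells. Below is a minimal sketch of one way to handle that, assuming every monthly array comes with a list naming the (row variable, column variable) pair for each of its elements in row-major order. The `universe` list, the `*_pairs` lists and the `place` helper are hypothetical names used for illustration; they are not part of the original post.

```python
import numpy as np

# Fixed position of every variable in the universe (assumed known each month).
universe = ["A", "B", "C"]
pos = {name: k for k, name in enumerate(universe)}

def place(values, pairs, pos):
    """Scatter values into an N x N NaN-filled grid.

    pairs[k] names the (row variable, column variable) of values.ravel()[k].
    """
    n = len(pos)
    out = np.full((n, n), np.nan)
    rows = [pos[r] for r, _ in pairs]
    cols = [pos[c] for _, c in pairs]
    out[rows, cols] = np.asarray(values, dtype=float).ravel()
    return out

X = np.array([[0., 2., 3.], [-2., 0., 4.], [-3., -4., 0.]])
X_pairs = [(r, c) for r in universe for c in universe]
Y = np.array([[2, 12], [-12, 2]])
Y_pairs = [("A", "C"), ("A", "B"), ("B", "A"), ("C", "A")]
Z = np.array([[99, 77], [-77, -99]])
Z_pairs = [("A", "C"), ("B", "C"), ("C", "B"), ("C", "A")]

monthly = [(X, X_pairs), (Y, Y_pairs), (Z, Z_pairs)]
result = np.stack([place(v, p, pos) for v, p in monthly])
print(result)
```

When a new variable appears, one option is to append it to `universe`, rebuild `pos`, and re-run `place` over the stored arrays so that every slice has the new size.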
I would like to create in python (using numpy) an upper triangular matrix in the form:
[[ 1, c, c^2],
[ 0, 1, c ],
[ 0, 0, 1 ]])
where c is a rational number and the rank of the matrix may vary (2, 3, 4, ...). Is there any smart way to do it other than creating rows and stacking them?
r = 3
c = 3
i,j = np.indices((r,r))
np.triu(float(c)**(j-i))
Result:
array([[1., 3., 9.],
[0., 1., 3.],
[0., 0., 1.]])
There are probably more straightforward solutions but this is what I came up with:
import numpy as np
c=5
m=np.triu(c**np.triu(np.ones((3,3)), 1).cumsum(axis =1))
print(m)
output:
[[ 1. 5. 25.]
[ 0. 1. 5.]
[ 0. 0. 1.]]
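Both answers hard-code the size. A minimal sketch that wraps the same index-difference idea in a function of the size `n` is shown below; `upper_power_matrix` is a hypothetical name. It clamps the exponents at zero before taking powers, on the assumption that `c` may be passed as an integer, since NumPy refuses to raise integers to negative integer powers.

```python
import numpy as np

def upper_power_matrix(c, n):
    """Return an n x n upper triangular matrix with c**(j - i) for j >= i."""
    i, j = np.indices((n, n))
    exponent = np.maximum(j - i, 0)  # no negative exponents below the diagonal
    return np.triu(np.asarray(c) ** exponent)  # triu zeroes the lower triangle

m_int = upper_power_matrix(3, 3)
m_float = upper_power_matrix(0.5, 4)
print(m_int)
print(m_float)
```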
I am using the following code and getting an output numpy ndarray of size (2,9) that I am then trying to reshape into size (3,3,2). My hope was that calling reshape using (3,3,2) as the dimensions of the new array would take each row of the 2x9 array and shape it into a 3x3 array and wrap these two 3x3 arrays into another array.
For instance, when I index the result I would like the following behavior:
input: print(result)
output: [[ 2. 2. 1. 0. 8. 5. 2. 4. 5.]
[ 4. 7. 5. 6. 4. 3. -3. 2. 1.]]
result = result.reshape((3,3,2))
DESIRED NEW BEHAVIOR
input: print(result[:,:,0])
output: [[2. 2. 1.]
[0. 8. 5.]
[2. 4. 5.]]
input: print(result[:,:,1])
output: [[ 4. 7. 5.]
[ 6. 4. 3.]
[-3. 2. 1.]]
ACTUAL NEW BEHAVIOR
input: print(result[:,:,0])
output: [[2. 1. 8.]
[2. 5. 7.]
[6. 3. 2.]]
input: print(result[:,:,1])
output: [[ 2. 0. 5.]
[ 4. 4. 5.]
[ 4. -3. 1.]]
Is there a way to specify to reshape that I would like to go row by row along the depth dimension? I'm very confused as to why numpy by default makes the choice it does for reshape.
Here is the code I am using to produce result matrix, this code may or may not be necessary to analyze my issue. I feel as if it will not be necessary but am including it for completeness:
import numpy as np
# im2col implementation assuming width/height dimensions of filter and input_vol
# are the same (i.e. input_vol_width is equal to input_vol_height and the same
# for the filter spatial dimensions, although input_vol_width need not equal
# filter_vol_width)
def im2col(input, filters, input_vol_dims, filter_size_dims, stride):
    receptive_field_size = 1
    for dim in filter_size_dims:
        receptive_field_size *= dim
    output_width = output_height = int((input_vol_dims[0]-filter_size_dims[0])/stride + 1)
    X_col = np.zeros((receptive_field_size,output_width*output_height))
    W_row = np.zeros((len(filters),receptive_field_size))
    pos = 0
    for i in range(0,input_vol_dims[0]-1,stride):
        for j in range(0,input_vol_dims[1]-1,stride):
            X_col[:,pos] = input[i:i+stride+1,j:j+stride+1,:].ravel()
            pos += 1
    for i in range(len(filters)):
        W_row[i,:] = filters[i].ravel()
    bias = np.array([[1], [0]])
    result = np.dot(W_row, X_col) + bias
    print(result)
if __name__ == '__main__':
    x = np.zeros((7, 7, 3))
    x[:,:,0] = np.array([[0,0,0,0,0,0,0],
                         [0,1,1,0,0,1,0],
                         [0,2,2,1,1,1,0],
                         [0,2,0,2,1,0,0],
                         [0,2,0,0,1,0,0],
                         [0,0,0,1,1,0,0],
                         [0,0,0,0,0,0,0]])
    x[:,:,1] = np.array([[0,0,0,0,0,0,0],
                         [0,2,0,1,0,2,0],
                         [0,0,1,2,1,0,0],
                         [0,2,0,0,2,0,0],
                         [0,2,1,0,0,0,0],
                         [0,1,2,2,2,0,0],
                         [0,0,0,0,0,0,0]])
    x[:,:,2] = np.array([[0,0,0,0,0,0,0],
                         [0,0,0,2,1,1,0],
                         [0,0,0,2,2,0,0],
                         [0,2,1,0,2,2,0],
                         [0,0,1,2,1,2,0],
                         [0,2,0,0,2,1,0],
                         [0,0,0,0,0,0,0]])
    w0 = np.zeros((3,3,3))
    w0[:,:,0] = np.array([[1,1,0],
                          [1,-1,1],
                          [-1,1,1]])
    w0[:,:,1] = np.array([[-1,-1,0],
                          [1,-1,1],
                          [1,-1,-1]])
    w0[:,:,2] = np.array([[0,0,0],
                          [0,0,1],
                          [1,0,1]])
    w1 = np.zeros((3,3,3))
    w1[:,:,0] = np.array([[0,-1,1],
                          [1,1,0],
                          [1,1,0]])
    w1[:,:,1] = np.array([[-1,-1,1],
                          [1,0,1],
                          [0,1,1]])
    w1[:,:,2] = np.array([[-1,-1,0],
                          [1,-1,0],
                          [1,1,0]])
    filters = np.array([w0,w1])
    im2col(x,np.array([w0,w1]),x.shape,w0.shape,2)
Let's reshape a bit differently and then do a depth-wise dstack:
arr = np.dstack(result.reshape((-1,3,3)))
arr[..., 0]
array([[2., 2., 1.],
[0., 8., 5.],
[2., 4., 5.]])
Reshape keeps the original order of the elements
In [215]: x=np.array(x)
In [216]: x.shape
Out[216]: (2, 9)
Reshaping the size 9 dimension into a 3x3 keeps the element order that you want:
In [217]: x.reshape(2,3,3)
Out[217]:
array([[[ 2., 2., 1.],
[ 0., 8., 5.],
[ 2., 4., 5.]],
[[ 4., 7., 5.],
[ 6., 4., 3.],
[-3., 2., 1.]]])
But you have to index it with [0,:,:] to see one of those blocks.
To see the same blocks with [:,:,0], you have to move that size 2 dimension to the end. COLDSPEED's dstack does that by iterating on the first dimension and joining the 2 blocks (each 3x3) on a new third dimension. Another way is to use transpose to reorder the dimensions:
In [218]: x.reshape(2,3,3).transpose(1,2,0)
Out[218]:
array([[[ 2., 4.],
[ 2., 7.],
[ 1., 5.]],
[[ 0., 6.],
[ 8., 4.],
[ 5., 3.]],
[[ 2., -3.],
[ 4., 2.],
[ 5., 1.]]])
In [219]: y = _
In [220]: y.shape
Out[220]: (3, 3, 2)
In [221]: y[:,:,0]
Out[221]:
array([[2., 2., 1.],
[0., 8., 5.],
[2., 4., 5.]])
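As a cross-check of the reshape-then-reorder idea, here is a minimal sketch using np.moveaxis, plus an equivalent route that transposes first. The variable names are illustrative; `result` is typed in from the printed output in the question.

```python
import numpy as np

result = np.array([[2., 2., 1., 0., 8., 5., 2., 4., 5.],
                   [4., 7., 5., 6., 4., 3., -3., 2., 1.]])

# Each row of length 9 becomes one 3x3 block: shape (2, 3, 3).
blocks = result.reshape(2, 3, 3)

# Move the block axis (axis 0) to the end: shape (3, 3, 2).
moved = np.moveaxis(blocks, 0, -1)

# Same values: transpose to (9, 2) first, then split the 9 into 3x3.
via_T = result.T.reshape(3, 3, 2)

print(moved[:, :, 0])
print(moved[:, :, 1])
```

Either way, the size 2 dimension ends up last, so `[:, :, 0]` and `[:, :, 1]` select the two 3x3 blocks.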
I was going through NumPy documentation, and am not able to understand one point. It mentions, for the example below, the array has rank 2 (it is 2-dimensional). The first dimension (axis) has a length of 2, the second dimension has a length of 3.
[[ 1., 0., 0.],
[ 0., 1., 2.]]
How does the first dimension (axis) have a length of 2?
Edit:
The reason for my confusion is the below statement in the documentation.
The coordinates of a point in 3D space [1, 2, 1] is an array of rank
1, because it has one axis. That axis has a length of 3.
In the original 2D ndarray, I assumed that the number of lists identifies the rank/dimension, and I wrongly assumed that the length of each list denotes the length of each dimension (in that order). So, as per my understanding, the first dimension should be having a length of 3, since the length of the first list is 3.
In numpy, axis ordering follows zyx convention, instead of the usual (and maybe more intuitive) xyz.
Visually, it means that for a 2D array where the horizontal axis is x and the vertical axis is y:
x -->
y 0 1 2
| 0 [[1., 0., 0.],
V 1 [0., 1., 2.]]
The shape of this array is (2, 3) because it is ordered (y, x), with the first axis y of length 2.
And verifying this with slicing:
import numpy as np
a = np.array([[1, 0, 0], [0, 1, 2]], dtype=float)
>>> a
Out[]:
array([[ 1., 0., 0.],
[ 0., 1., 2.]])
>>> a[0, :] # Slice index 0 of first axis
Out[]: array([ 1., 0., 0.]) # Get values along second axis `x` of length 3
>>> a[:, 2] # Slice index 2 of second axis
Out[]: array([ 0., 2.]) # Get values along first axis `y` of length 2
You may be confusing the other sentence with the example below it. Think of it like this: rank is the number of nesting levels of lists in the array (the number of axes), and the length of an axis is the number of items in a list at that level.
I think they are trying to describe the definition of shape, which in this case is (2,3).
In that documentation I think the key sentence is this:
In NumPy dimensions are called axes. The number of axes is rank.
If you print the numpy array
print(np.array([[1., 0., 0.], [0., 1., 2.]]))
You'll get the following output
#col1 col2 col3
[[ 1. 0. 0.] # row 1
[ 0. 1. 2.]] # row 2
Think of it as a 2 by 3 matrix: 2 rows, 3 columns. It is a 2d array because it is a list of lists (the [[ at the start is a hint that it's 2d).
The 2d numpy array
np.array([[1., 0., 0., 6.], [0., 1., 2., 7.], [3., 4., 5., 8.]])
would print as
#col1 col2 col3 col4
[[1. 0. 0. 6.] # row 1
[0. 1. 2. 7.] # row 2
[3. 4. 5. 8.]] # row 3
This is a 3 by 4 2d array (3 rows, 4 columns)
The length of the first dimension is the length of the array itself:
In [11]: a = np.array([[ 1., 0., 0.], [ 0., 1., 2.]])
In [12]: a
Out[12]:
array([[ 1., 0., 0.],
[ 0., 1., 2.]])
In [13]: len(a) # "length of first dimension"
Out[13]: 2
The second is the length of each "row":
In [14]: [len(aa) for aa in a] # 3 is "length of second dimension"
Out[14]: [3, 3]
Many numpy functions take axis as an argument, for example you can sum over an axis:
In [15]: a.sum(axis=0)
Out[15]: array([ 1., 1., 2.])
In [16]: a.sum(axis=1)
Out[16]: array([ 1., 3.])
The thing to note is that you can have higher dimensional arrays:
In [21]: b = np.array([[[1., 0., 0.], [ 0., 1., 2.]]])
In [22]: b
Out[22]:
array([[[ 1., 0., 0.],
[ 0., 1., 2.]]])
In [23]: b.sum(axis=2)
Out[23]: array([[ 1., 3.]])
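One way to read these results is that summing over an axis removes that axis from the shape. A minimal sketch of that, using an illustrative array of shape (1, 2, 3) rather than the values above:

```python
import numpy as np

b = np.arange(6, dtype=float).reshape(1, 2, 3)

# Collect the shape left over after summing along each axis in turn.
shapes_after_sum = {axis: b.sum(axis=axis).shape for axis in range(b.ndim)}

for axis, shape in shapes_after_sum.items():
    print("axis", axis, "->", shape)
```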
Keep the following points in mind when considering Numpy axes:
Each sub-level of a list (or array) represents an axis. For example:
import numpy as np
a = np.array([1,2]) # 1 axis
b = np.array([[1,2],[3,4]]) # 2 axes
c = np.array([[[1,2],[3,4]],[[5,6],[7,8]]]) # 3 axes
Axis labels correspond to the level of the sub-list they represent, starting with axis 0 for the outer most list.
To illustrate this, consider the following arrays of different shapes, each with 24 elements:
# 1D Array
a0 = np.array(
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
)
a0.shape # (24,) - here, the length along the 0-axis is 24
# 2D Array
a01 = np.array(
[
[1.1, 1.2, 1.3, 1.4],
[2.1, 2.2, 2.3, 2.4],
[3.1, 3.2, 3.3, 3.4],
[4.1, 4.2, 4.3, 4.4],
[5.1, 5.2, 5.3, 5.4],
[6.1, 6.2, 6.3, 6.4]
]
)
a01.shape # (6, 4) - now, the length along the 0-axis is 6
# 3D Array (labels encode block, row, column: 111 stands for "1.1.1",
# which is not a valid Python literal)
a012 = np.array(
[
[
[111, 112],
[121, 122],
[131, 132]
],
[
[211, 212],
[221, 222],
[231, 232]
],
[
[311, 312],
[321, 322],
[331, 332]
],
[
[411, 412],
[421, 422],
[431, 432]
]
]
)
a012.shape # (4, 3, 2) - and finally, the length along the 0-axis is 4
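The same point can be checked without typing the literals out. A minimal sketch that reshapes one run of 24 numbers three ways and records the number of axes, the shape, and the length along axis 0 (the `summary` list is just an illustrative name):

```python
import numpy as np

flat = np.arange(1, 25)  # 24 elements

summary = []
for shape in [(24,), (6, 4), (4, 3, 2)]:
    arr = flat.reshape(shape)
    # len(arr) is always the length along axis 0, the outermost level.
    summary.append((arr.ndim, arr.shape, len(arr)))

for ndim, shape, length in summary:
    print(ndim, shape, length)
```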
Hi, I want to join multiple arrays in Python, using numpy, to form a multidimensional array. The joining happens inside a for loop; this is pseudocode:
import numpy as np
h = np.zeros(4)
for x in range(3):
    x1 = some array of length of 4 returned from a previous function (3,5,6,7)
    h = np.concatenate((h,x1), axis =0)
The first iteration goes fine, but during the second iteration on the for loop I get the following error,
ValueError: all the input arrays must have same number of dimensions
The output array should look something like this
[[0,0,0,0],[3,5,6,7],[6,3,6,7]]
etc
So how can I join the arrays?
Thanks
You need to use vstack. It allows you to stack arrays. You take a sequence of arrays and stack them vertically to make a single array
import numpy as np
h = np.zeros(4)
for x in range(3):
    x1 = [3,5,6,7]
    h = np.vstack((h,x1))
    # not h = np.concatenate((h,x1), axis =0)
print(h)
Output:
[[ 0. 0. 0. 0.]
[ 3. 5. 6. 7.]
[ 3. 5. 6. 7.]
[ 3. 5. 6. 7.]]
more edits later.
If you want to use only concatenate, you can do it the following way as well:
import numpy as np
h1 = np.zeros(4)
for x in range(3):
    x1 = np.array([3,5,6,7])
    h1 = np.concatenate([h1,x1.T], axis =0)
print(h1.shape)
print(h1.reshape(4,4))
Output:
(16,)
[[ 0. 0. 0. 0.]
[ 3. 5. 6. 7.]
[ 3. 5. 6. 7.]
[ 3. 5. 6. 7.]]
Both have different applications. You can choose according to your need.
There are multiple ways of doing this. I'll list a few examples:
First, we import numpy and define a function that generates those arrays of length 4.
import numpy as np
def previous_function_returning_array_of_length_4(x):
return np.array(range(4)) + x
The first way involves creating a list of arrays, then calling numpy.array() to convert the list to a 2D array.
h0 = np.zeros(4)
arrays = [h0]
for x in range(3):
    x1 = previous_function_returning_array_of_length_4(x)
    arrays.append(x1)
h = np.array(arrays)
You can do the same with np.vstack():
h0 = np.zeros(4)
arrays = [h0]
for x in range(3):
    x1 = previous_function_returning_array_of_length_4(x)
    arrays.append(x1)
h = np.vstack(arrays)
Alternatively, if you know how many arrays you are going to create, you can create the 2D array first and fill in the values:
h = np.zeros((4, 4))
for ii in range(3):
    x1 = previous_function_returning_array_of_length_4(ii)
    h[ii + 1, ...] = x1
There are more ways, but hopefully, this will give you an idea of what to do.
It is best to collect values in a list, and perform the concatenate or array creation once, at the end.
h = [np.zeros(4)]
for x in range(3):
    x1 = np.array([3, 5, 6, 7])  # stand-in for the array of length 4 returned from a previous function
    h.append(x1)  # list.append works in place and returns None, so do not reassign h
h = np.array(h)
# or h = np.vstack(h)
All the concatenate/stack/array functions take a list of multiple items. It is faster to append to a list than to do a concatenate of 2 items.
======================
Let's try your approach step by step:
In [189]: h=np.zeros(4)
In [190]: h
Out[190]: array([ 0., 0., 0., 0.]) # 1d array (4,) shape
In [191]: x1=np.array([3,5,6,7]) # another 1d
In [192]: h1=np.concatenate((h,x1),axis=0)
In [193]: h1
Out[193]: array([ 0., 0., 0., 0., 3., 5., 6., 7.])
In [194]: h1.shape
Out[194]: (8,) # also a 1d array, but with 8 items
In [195]: x1=np.array([6,3,6,7])
In [196]: h1=np.concatenate((h1,x1),axis=0)
In [197]: h1
Out[197]: array([ 0., 0., 0., 0., 3., 5., 6., 7., 6., 3., 6., 7.])
In this case I'm adding (4,) arrays one after the other, still getting a 1d array.
If I go back and create x1 as 2d (1,4):
In [198]: h=np.zeros(4)
In [199]: x1=np.array([[6,3,6,7]])
In [200]: h1=np.concatenate((h,x1),axis=0)
...
ValueError: all the input arrays must have same number of dimensions
I get this dimension error right away.
The fact that you get the error on the 2nd iteration suggests that the 1st x1 is (4,), but the 2nd is 2d.
When you have dimensions errors like this, check the shapes.
vstack adds dimensions to the inputs, as needed, so you can build 2d arrays:
In [207]: h=np.zeros(4)
In [208]: x1=np.array([3,5,6,7])
In [209]: h=np.vstack((h,x1))
In [210]: h
Out[210]:
array([[ 0., 0., 0., 0.],
[ 3., 5., 6., 7.]])
In [211]: x1=np.array([6,3,6,7])
In [212]: h=np.vstack((h,x1))
In [213]: h
Out[213]:
array([[ 0., 0., 0., 0.],
[ 3., 5., 6., 7.],
[ 6., 3., 6., 7.]])
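If you would rather stay with np.concatenate, one option is to normalize every piece to 2-D before joining. This is a minimal sketch, assuming the previous function sometimes returns a (4,) array and sometimes a (1, 4) array; the two hard-coded inputs stand in for those return values.

```python
import numpy as np

h = np.zeros((1, 4))  # start with a 2-D row so every piece has 2 dimensions

# One 1-D input and one 2-D input, mimicking the mixed shapes that cause the error.
pieces = [np.array([3, 5, 6, 7]), np.array([[6, 3, 6, 7]])]

for x1 in pieces:
    x1 = np.atleast_2d(x1)  # (4,) becomes (1, 4); (1, 4) is left unchanged
    h = np.concatenate((h, x1), axis=0)

print(h.shape)
print(h)
```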
I have two arrays A and B:
A=array([[ 5., 5., 5.],
[ 8., 9., 9.]])
B=array([[ 1., 1., 2.],
[ 3., 2., 1.]])
Anywhere there is a "1" in B I want to sum the values at the same row and column locations in A.
So, for example, for the 1's the answer would be 5+5+9=19.
I would want this to continue for 2, 3, ..., n (all unique values in B).
So for the 2's it would be 9+5=14, and for the 3's it would be 8.
I found the unique values by using:
numpy.unique(B)
I realize this may take multiple steps, but I can't really wrap my head around using the index matrix to sum those locations in another matrix.
For each unique value x, you can do
A[B == x].sum()
Example:
>>> A[B == 1.0].sum()
19.0
I think numpy.bincount is what you want. If B is an array of small integers like in your example, you can do something like this:
import numpy
A = numpy.array([[ 5., 5., 5.],
[ 8., 9., 9.]])
B = numpy.array([[ 1, 1, 2],
[ 3, 2, 1]])
print(numpy.bincount(B.ravel(), weights=A.ravel()))
# [ 0. 19. 14. 8.]
Or, if B has anything but small integers, you can do something like this:
import numpy
A = numpy.array([[ 5., 5., 5.],
[ 8., 9., 9.]])
B = numpy.array([[ 1., 1., 2.],
[ 3., 2., 1.]])
uniqB, inverse = numpy.unique(B, return_inverse=True)
# ravel() keeps bincount happy on NumPy versions where inverse has the shape of B
print(uniqB, numpy.bincount(inverse.ravel(), weights=A.ravel()))
# [ 1. 2. 3.] [ 19. 14. 8.]
[(val, np.sum(A[B==val])) for val in np.unique(B)] gives you a list of tuples where the first element is one of the unique values in B, and the second element is the sum of elements in A where the corresponding value in B is that value.
>>> [(val, np.sum(A[B==val])) for val in np.unique(B)]
[(1.0, 19.0), (2.0, 14.0), (3.0, 8.0)]
The key is that you can use A[B==val] to access items in A at positions where B equals val.
Edit: If you just want the sums, just do [np.sum(A[B==val]) for val in np.unique(B)].
I'd use numpy masked arrays. These are standard numpy arrays with an associated mask that blocks off certain values. The process is pretty straightforward: create a masked array using
numpy.ma.masked_array(data, mask)
where mask is generated by using a masked function
mask = numpy.ma.masked_not_equal(B, 1).mask
and data is A
for i in numpy.unique(B):
    print(numpy.ma.masked_array(A, numpy.ma.masked_not_equal(B, i).mask).sum())
19.0
14.0
8.0
I found an old question about this; one of its answers was:
def sum_by_group(values, groups):
    order = np.argsort(groups)       # sort so that equal groups are adjacent
    groups = groups[order]
    values = values[order]
    values.cumsum(out=values)        # running total over the sorted values
    index = np.ones(len(groups), 'bool')
    index[:-1] = groups[1:] != groups[:-1]   # True at the last item of each group
    values = values[index]
    groups = groups[index]
    values[1:] = values[1:] - values[:-1]    # differences of totals give per-group sums
    return values, groups
In your case, you can flatten your arrays:
aflat = A.flatten()
bflat = B.flatten()
sum_by_group(aflat, bflat)