Python: Resize an existing array and fill with zeros

I think that my issue should be really simple, yet I cannot find any help
on the Internet whatsoever. I am very new to Python, so it is possible that
I am missing something very obvious.
I have an array, S, like this [x x x] (one-dimensional). I now create a
diagonal matrix, sigma, with np.diag(S) - so far, so good. Now, I want to
resize this new diagonal array so that I can multiply it by another array that
I have.
import numpy as np
...
shape = np.shape((6, 6)) #This will be some pre-determined size
sigma = np.diag(S) #diagonalise the matrix - this works
my_sigma = sigma.resize(shape) #Resize the matrix and fill with zeros - returns "None" - why?
However, when I print the contents of my_sigma, I get "None". Can someone please
point me in the right direction, because I cannot imagine that this should be
so complicated.
Thanks in advance for any help!
Casper
Graphical:
I have this:
[x x x]
I want this:
[x 0 0]
[0 x 0]
[0 0 x]
[0 0 0]
[0 0 0]
[0 0 0] - or some similar size, but the diagonal elements are important.

There is a new numpy function in version 1.7.0, numpy.pad, that can do this in one line. Like the other answers, you can construct the diagonal matrix with np.diag before the padding.
The tuple ((0,N),(0,0)) used in this answer indicates the side of each axis to pad: nothing before and N rows after along axis 0, and nothing along axis 1.
import numpy as np
A = np.array([1, 2, 3])
N = A.size
B = np.pad(np.diag(A), ((0,N),(0,0)), mode='constant')
B is now equal to:
[[1 0 0]
[0 2 0]
[0 0 3]
[0 0 0]
[0 0 0]
[0 0 0]]
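The pad spec generalizes to any side: each inner tuple is (pad_before, pad_after) for one axis. For example, to add N zero columns on the right instead (a quick sketch reusing A and N from above):
C = np.pad(np.diag(A), ((0, 0), (0, N)), mode='constant')
# [[1 0 0 0 0 0]
#  [0 2 0 0 0 0]
#  [0 0 3 0 0 0]]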

sigma.resize() returns None because it operates in-place. np.resize(sigma, shape), on the other hand, returns the result, but instead of padding with zeros it pads with repeated copies of the array.
Also, np.shape() returns the shape of its input, so np.shape((6, 6)) is (2,) - the shape of the tuple itself. If you just want to predefine a shape, just use a tuple.
import numpy as np
...
shape = (6, 6) #This will be some pre-determined size
sigma = np.diag(S) #diagonalise the matrix - this works
sigma.resize(shape) #Resize the matrix and fill with zeros
However, this will first flatten out your original array and then reconstruct it into the given shape, destroying the original ordering (for a square target like (6, 6), the diagonal gets scrambled). If you just want to "pad" with zeros, instead of using resize() you can directly index into a generated zero matrix.
# This assumes that you have a 2-dimensional array
zeros = np.zeros(shape, dtype=sigma.dtype)  # match sigma's dtype so values are not truncated
zeros[:sigma.shape[0], :sigma.shape[1]] = sigma
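For a quick check of the two behaviors described above (a minimal sketch; refcheck=False avoids the reference-count error ndarray.resize raises in interactive sessions):
import numpy as np
S = np.array([1, 2, 3])
sigma = np.diag(S)
print(np.shape((6, 6)))                       # (2,) - the shape of the tuple itself
print(sigma.resize((6, 6), refcheck=False))   # None - resize works in-place
print(sigma[:3, :3])                          # the diagonal is scrambled by the flat repacking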

I see the edit... you do have to create the zeros first and then move some numbers into it. np.diag_indices_from might be useful for you:
bigger_sigma = np.zeros(shape, dtype=sigma.dtype)
diag_ij = np.diag_indices_from(sigma)
bigger_sigma[diag_ij] = sigma[diag_ij]
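For completeness, a self-contained sketch of this approach, assuming S and shape as in the question:
import numpy as np
S = np.array([1, 2, 3])
shape = (6, 6)
sigma = np.diag(S)
bigger_sigma = np.zeros(shape, dtype=sigma.dtype)
diag_ij = np.diag_indices_from(sigma)
bigger_sigma[diag_ij] = sigma[diag_ij]
print(bigger_sigma[:4, :4])  # diagonal preserved, zeros elsewhere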

This solution works with the resize function.
Take a sample array:
S = np.ones(3)
print(S)
# [ 1. 1. 1.]
d= np.diag(S)
print(d)
"""
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]
"""
This doesn't work; it just adds repeating values:
np.resize(d,(6,3))
"""
adds a repeating value
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
"""
This does work (refcheck=False avoids the error that ndarray.resize raises when other references to the array exist, e.g. in an interactive session):
d.resize((6,3), refcheck=False)
print(d)
"""
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
"""

Another pure-Python solution is
a = [1, 2, 3]
b = []
for i in range(6):
    b.append((([0] * i) + a[i:i+1] + ([0] * (len(a) - 1 - i)))[:len(a)])
b is now
[[1, 0, 0], [0, 2, 0], [0, 0, 3], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
it's a hideous solution, I'll admit that.
However, it illustrates some functions of the list type that can be used.
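A tidier pure-Python variant of the same idea (a sketch using a nested comprehension; same output):
a = [1, 2, 3]
rows, cols = 6, len(a)
b = [[a[i] if i == j else 0 for j in range(cols)] for i in range(rows)]
# [[1, 0, 0], [0, 2, 0], [0, 0, 3], [0, 0, 0], [0, 0, 0], [0, 0, 0]]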

Related

Finding the average of the x component of an array of coordinates, based on the y component

I have the following example array of x-y coordinate pairs:
A = np.array([[0.33703753, 3.],
[0.90115394, 5.],
[0.91172016, 5.],
[0.93230994, 3.],
[0.08084283, 3.],
[0.71531777, 2.],
[0.07880787, 3.],
[0.03501083, 4.],
[0.69253184, 4.],
[0.62214452, 3.],
[0.26953094, 1.],
[0.4617873 , 3.],
[0.6495549 , 0.],
[0.84531478, 4.],
[0.08493308, 5.]])
My goal is to reduce this to an array with six rows by taking the average of the x-values for each y-value, like so:
array([[0.6495549 , 0. ],
[0.26953094, 1. ],
[0.71531777, 2. ],
[0.41882167, 3. ],
[0.52428582, 4. ],
[0.63260239, 5. ]])
Currently I am achieving this by converting to a pandas dataframe, performing the calculation, and converting back to a numpy array:
>>> df = pd.DataFrame({'x':A[:, 0], 'y':A[:, 1]})
>>> df.groupby('y').mean().reset_index()
y x
0 0.0 0.649555
1 1.0 0.269531
2 2.0 0.715318
3 3.0 0.418822
4 4.0 0.524286
5 5.0 0.632602
Is there a way to perform this calculation using numpy, without having to resort to the pandas library?
Here's a completely vectorized solution that only uses numpy methods and no python iteration:
sort_indices = np.argsort(A[:, 1])
unique_y, unique_indices, group_count = np.unique(A[sort_indices, 1], return_index=True, return_counts=True)
Once we have the indices and counts of all the unique elements, we can use the np.ufunc.reduceat method to collect the results of np.add for each group, and then divide by their counts to get the mean:
group_sum = np.add.reduceat(A[sort_indices, :], unique_indices, axis=0)
group_mean = group_sum / group_count[:, None]
# array([[0.6495549 , 0. ],
# [0.26953094, 1. ],
# [0.71531777, 2. ],
# [0.41882167, 3. ],
# [0.52428582, 4. ],
# [0.63260239, 5. ]])
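If np.add.reduceat is unfamiliar: given an array and a sorted list of start indices, it sums each slice between consecutive starts (a tiny sketch):
x = np.array([1, 2, 3, 4, 5])
np.add.reduceat(x, [0, 2, 3])
# array([3, 3, 9])  -> sums of x[0:2], x[2:3], x[3:5]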
Benchmarks:
Comparing this solution with the other answers here (Code at tio.run) for
A contains 10k rows, with A[:, 1] containing N groups, N varies from 1 to 10k
A contains N rows (N varies from 1 to 10k), with A[:, 1] containing min(N, 1000) groups
Observations:
The numpy-only solutions (Dani's and mine) win easily -- they are significantly faster than the pandas approach (possibly since the time taken to create the dataframe is an overhead that doesn't exist for the former).
The pandas solution is slower than the python+numpy solutions (Jaimu's and mine) for smaller arrays, since it's faster to just iterate in python and get it over with than to create a dataframe first, but these solutions become much slower than pandas as the array size or number of groups increases.
Note: The previous version of this answer iterated over the groups as returned by the accepted answer to Is there any numpy group by function? and individually calculated the mean:
First, we need to sort the array on the column we want to group by:
A_s = A[A[:, 1].argsort(), :]
Then run the following snippet. np.split splits its first argument at the indices given by the second argument.
unique_elems, unique_indices = np.unique(A_s[:, 1], return_index=True)
# (array([0., 1., 2., 3., 4., 5.]), array([ 0, 1, 2, 3, 9, 12]))
split_indices = unique_indices[1:] # No need to split at the first index
groups = np.split(A_s, split_indices)
# [array([[0.6495549, 0. ]]),
# array([[0.26953094, 1. ]]),
# array([[0.71531777, 2. ]]),
# array([[0.33703753, 3. ],
# [0.93230994, 3. ],
# [0.08084283, 3. ],
# [0.07880787, 3. ],
# [0.62214452, 3. ],
# [0.4617873 , 3. ]]),
# array([[0.03501083, 4. ],
# [0.69253184, 4. ],
# [0.84531478, 4. ]]),
# array([[0.90115394, 5. ],
# [0.91172016, 5. ],
# [0.08493308, 5. ]])]
Now, groups is a list containing multiple np.arrays. Iterate over the list and take the mean of each array:
means = np.zeros((len(groups), groups[0].shape[1]))
for i, grp in enumerate(groups):
    means[i, :] = grp.mean(axis=0)
# array([[0.6495549 , 0. ],
# [0.26953094, 1. ],
# [0.71531777, 2. ],
# [0.41882167, 3. ],
# [0.52428582, 4. ],
# [0.63260239, 5. ]])
Here is a workaround using numpy.
unique_ys, indices = np.unique(A[:, 1], return_inverse=True)
result = np.empty((unique_ys.shape[0], 2))
for i, y in enumerate(unique_ys):
    result[i, 0] = np.mean(A[indices == i, 0])
    result[i, 1] = y
print(result)
Alternative:
To make the code more pythonic, you can use a list comprehension to create the result array, instead of using a for loop.
unique_ys, indices = np.unique(A[:, 1], return_inverse=True)
result = np.array([[np.mean(A[indices == i, 0]), y] for i, y in enumerate(unique_ys)])
print(result)
Output:
[[0.6495549 0. ]
[0.26953094 1. ]
[0.71531777 2. ]
[0.41882167 3. ]
[0.52428582 4. ]
[0.63260239 5. ]]
Use np.bincount + np.unique:
sums = np.bincount(A[:, 1].astype(np.int64), weights=A[:, 0])
values, counts = np.unique(A[:, 1], return_counts=True)
res = np.vstack((sums / counts, values)).T
print(res)
Output
[[0.6495549 0. ]
[0.26953094 1. ]
[0.71531777 2. ]
[0.41882167 3. ]
[0.52428582 4. ]
[0.63260239 5. ]]
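Note that np.bincount only accepts non-negative integer labels, so the cast above relies on the y values being small non-negative integers. For arbitrary y values you could first map them to 0..k-1 with return_inverse (a sketch of that generalization):
values, inverse, counts = np.unique(A[:, 1], return_inverse=True, return_counts=True)
sums = np.bincount(inverse, weights=A[:, 0])
res = np.column_stack((sums / counts, values))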
If you know the y values beforehand, you could test the array against each of them.
For example, A[(A[:,1]==1),0] will give you all the x values where the y value is equal to 1.
So you could go through each value of y: sum A[:,1]==y[n] to get the number of matches, sum the x values that match, divide to get the average, and place the result in a new array:
B = np.zeros([6, 2])
for i in range(6):
    nmatch = sum(A[:, 1] == i)
    nsum = sum(A[(A[:, 1] == i), 0])
    B[i, 0] = nsum / nmatch  # mean of the matching x values
    B[i, 1] = i              # the y value, matching the requested column order
There must be a more pythonic way of doing this ....

Returning list of arrays from a function having as argument a vector

I have a function such as:
def f(x):
    A = np.array([[0, 1], [0, -1/x]])
    return A
If I use a scalar, I will obtain:
>>x=1
>>f(x)
array([[ 0., 1.],
[ 0., -1.]])
and if I use an array as an input, I will obtain:
>>x=np.linspace(1,3,3)
>>f(x)
array([[0, 1],
[0, array([-1. , -0.5 , -0.33333333])]], dtype=object)
Actually I would like to obtain a list of arrays, namely:
A = [A_1, A_2, ..., A_n]
Right now I do not care much about whether it is an array of arrays or a list that contains several arrays.
I know I can do that using a for loop over x. But I think there is probably another way to do it, and maybe more efficient.
So the output that I would like would be something like:
>>x=np.linspace(1,3,3)
>>r=f(x)
array([[[0, 1],[0,-1]],
[[0, 1],[0,-0.5]],
[[0, 1],[0,-0.33333]]])
>>r[0]
array([[0, 1],[0,-1]])
or something like
>>x=np.linspace(1,3,3)
>>r=f(x)
[array([[0, 1],[0,-1]]),
array([[0, 1],[0,-0.5]]),
array([[0, 1],[0,-0.33333]])]
>>r[0]
array([[0, 1],[0,-1]])
Thanks
In your function, we could check the type of the given parameter. If x is of type np.ndarray, we create the nested list we want; otherwise we return the output as before.
import numpy as np
def f(x):
    if isinstance(x, np.ndarray):
        v = -1/x
        A = np.array([[[0, 1], [0, i]] for i in v])
    else:
        A = np.array([[0, 1], [0, -1/x]])
    return A
x = np.linspace(1,3,3)
print(f(x))
Output:
[[[ 0. 1. ]
[ 0. -1. ]]
[[ 0. 1. ]
[ 0. -0.5 ]]
[[ 0. 1. ]
[ 0. -0.33333333]]]
You can do something like:
import numpy as np
def f(x):
    x = np.array([x]) if type(x) == float or type(x) == int else x
    A = np.stack([np.array([[0, 1], [0, -1/i]]) for i in x])
    return A
The first line deals with the cases when x is an int or a float, since those are not iterable. Then:
f(1)
array([[[ 0., 1.],
[ 0., -1.]]])
f(np.linspace(1,3,3))
array([[[ 0. , 1. ],
[ 0. , -1. ]],
[[ 0. , 1. ],
[ 0. , -0.5 ]],
[[ 0. , 1. ],
[ 0. , -0.33333333]]])
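An alternative sketch that skips the Python-level loop entirely by filling a preallocated 3-D array (f_vec is a hypothetical name; assumes x is a scalar or 1-D array):
import numpy as np
def f_vec(x):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    A = np.zeros((x.size, 2, 2))
    A[:, 0, 1] = 1        # constant entry, same in every 2x2 block
    A[:, 1, 1] = -1 / x   # the only entry that varies with x
    return A
f_vec(np.linspace(1, 3, 3))  # same stacked output as above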

How to vectorize increments in Python

I have a 2d array, and I have some numbers to add to some cells. I want to vectorize the operation in order to save time. The problem is when I need to add several numbers to the same cell. In this case, the vectorized code only adds the last one.
'a' is my array, 'x' and 'y' are the coordinates of the cells I want to increment, and 'z' contains the numbers I want to add.
import numpy as np
a=np.zeros((4,4))
x=[1,2,1]
y=[0,1,0]
z=[2,3,1]
a[x,y]+=z
print(a)
As you see, a[1,0] should be incremented twice: once by 2 and once by 1. So the expected array should be:
[[0. 0. 0. 0.]
[3. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
but instead I get:
[[0. 0. 0. 0.]
[1. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
The problem would be easy to solve with a for loop, but I wonder if I can correctly vectorize this operation.
Use np.add.at for that:
import numpy as np
a = np.zeros((4,4))
x = [1, 2, 1]
y = [0, 1, 0]
z = [2, 3, 1]
np.add.at(a, (x, y), z)
print(a)
# [[0. 0. 0. 0.]
# [3. 0. 0. 0.]
# [0. 3. 0. 0.]
# [0. 0. 0. 0.]]
When you're doing a[x,y] += z, we can decompose the operations as:
a[1, 0], a[2, 1], a[1, 0] = [a[1, 0] + 2, a[2, 1] + 3, a[1, 0] + 1]
# Since a starts as zeros, this is equivalent to:
a[1, 0] = 2
a[2, 1] = 3
a[1, 0] = 1
That's why it doesn't work: the right-hand side is evaluated once, so the two updates to a[1, 0] do not accumulate.
But if you increment the array element by element in a loop, it works as expected.
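For reference, a minimal loop version that does accumulate correctly (a sketch):
import numpy as np
a = np.zeros((4, 4))
for xi, yi, zi in zip([1, 2, 1], [0, 1, 0], [2, 3, 1]):
    a[xi, yi] += zi  # duplicate indices are applied one at a time
print(a)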
You could create a multi-dimensional array of size 3x4x4, write each z into its own 4x4 layer, and then sum the layers:
import numpy as np
x = [1,2,1]
y = [0,1,0]
z = [2,3,1]
a = np.zeros((3,4,4))
n = range(a.shape[0])
a[n,x,y] += z
print(a.sum(axis=0))  # collapse the 3 layers back into a 4x4 result
which will result in
[[0. 0. 0. 0.]
[3. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
Approach #1: Bincount-based method for performance
We can use np.bincount for efficient bin-based summation, basically inspired by this post -
def accumulate_arr(x, y, z, out):
    # Get output array shape
    shp = out.shape
    # Get linear indices to be used as IDs with bincount
    lidx = np.ravel_multi_index((x, y), shp)  # conceptually, row*ncols + col
    # Accumulate into out with IDs from lidx
    out += np.bincount(lidx, z, minlength=out.size).reshape(out.shape)
    return out
If you are working with a zeros-initialized output array, you can skip the addition: just reshape the bincount output for the given shape and use that as the final result.
Output on given sample -
In [48]: accumulate_arr(x,y,z,a)
Out[48]:
array([[0., 0., 0., 0.],
[3., 0., 0., 0.],
[0., 3., 0., 0.],
[0., 0., 0., 0.]])
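Under the hood, np.ravel_multi_index just converts (row, col) pairs into flat indices for the given shape, so duplicate coordinates map to the same bincount bin (a quick sketch):
np.ravel_multi_index(([1, 2, 1], [0, 1, 0]), (4, 4))
# array([4, 9, 4]) - i.e. row*ncols + col; the two 4s accumulate together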
Approach #2: Using sparse-matrix for memory-efficiency
In [54]: from scipy.sparse import coo_matrix
In [56]: coo_matrix((z,(x,y)), shape=(4,4)).toarray()
Out[56]:
array([[0, 0, 0, 0],
[3, 0, 0, 0],
[0, 3, 0, 0],
[0, 0, 0, 0]])
If you are okay with a sparse matrix, skip the .toarray() part for a memory-efficient solution. Note that the COO format sums duplicate coordinates, which is exactly the accumulation behavior needed here.

Get output after matrix operation

I have a matrix A:
[[ 1 2]
[ 3 4]
[ 5 6]
[ 7 8]
[ 9 10]]
And I have matrix B:
[[1 0 0]
[0 1 0]
[1 0 0]
[0 0 1]
[0 1 0]]
And my desired Output is :
Matrix C:
[[1 0 0]
[0 3 0]
[5 0 0]
[0 0 7]
[0 9 0]]
i.e. I would like to take the first column of matrix A and substitute its values into matrix B wherever it says "1". The problem is that I need to do it using matrix operations in Numpy, i.e. without using loops.
So far, I have done the following. Please help me do it in easy steps.
mat_A = np.array([[1,2],[3,4],[5,6],[7,8],[9,10]])
mat_B = np.array([[1,0,0],[0,1,0],[1,0,0],[0,0,1],[0,1,0]])
mat_A1 = np.zeros(mat_B.shape)
mat_A1[:mat_A.shape[0],:mat_A.shape[1]] = mat_A
mat_A1[:,1] = np.zeros(5)
print(mat_A1)
mat_A2 = np.zeros(mat_B.shape)
mat_A2[:mat_A.shape[0],:mat_A.shape[1]] = mat_A
mat_A2[:,0] = np.zeros(5)
print(mat_A2)
print(mat_B)
My Output is :
[[1. 0. 0.]
[3. 0. 0.]
[5. 0. 0.]
[7. 0. 0.]
[9. 0. 0.]]
[[ 0. 2. 0.]
[ 0. 4. 0.]
[ 0. 6. 0.]
[ 0. 8. 0.]
[ 0. 10. 0.]]
[[1 0 0]
[0 1 0]
[1 0 0]
[0 0 1]
[0 1 0]]
If I multiply them, I get a different output. Please help me get matrix C.
I want to do it WITHOUT USING A LOOP, using only numpy and matrix operations.
Here's a solution without the use of for loops:
import numpy as np
mat_A = np.array([[1,2],[3,4],[5,6],[7,8],[9,10]])
mat_B = np.array([[1,0,0],[0,1,0],[1,0,0],[0,0,1],[0,1,0]])
mat_C = mat_B.copy()
mask = (mat_C == 1)  # Create a mask of the positions holding a 1
mat_C[mask] = mat_A[:, 0]  # Replace masked values with mat_A's first column
print(mat_C)
Create a mask and use it to index into mat_C, assigning the values of the first column of mat_A to the positions where mat_B had a 1. This works because each row contains exactly one 1, so the masked positions line up with the rows in row-major order.
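Since mat_B contains only 0s and 1s, you can also get matrix C by broadcasting, with no mask at all (a one-line sketch):
mat_C = mat_B * mat_A[:, [0]]  # scale each row of mat_B by mat_A's first-column entry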
You could do this:
C = np.zeros(B.shape)
for i in range(A.shape[0]):
    C[i, :] = B[i, :] * A[i, 0]
result:
array([[1., 0., 0.],
[0., 3., 0.],
[5., 0., 0.],
[0., 0., 7.],
[0., 9., 0.]])
You could also do this, which is a bit more generalized, in case the data you provided is just an example of what you are really working on:
replace_val = 1
for i in range(B.shape[0]):
    for j in range(B.shape[1]):
        if B[i, j] == replace_val:
            C[i, j] = A[i, 0]
same result
EDIT: this way works with no loops:
vals_to_change = np.where(B==1)
C[vals_to_change] = A[vals_to_change[0],0]*B[vals_to_change]
same result

Where clause with numpy with single array and / or empty_like

I am trying to figure out how the np.where clause works. I create a simple df:
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10, size=(3, 4)), columns=list('ABCD'))
print(df)
A B C D
0 5 8 9 5
1 0 0 1 7
2 6 9 2 4
Now when I implement:
print(np.where(df.values, 1, np.nan))
I receive:
[[ 1. 1. 1. 1.]
[ nan nan 1. 1.]
[ 1. 1. 1. 1.]]
But when I create an empty_like array from df and put it into the where clause, I receive this:
print(np.where(np.empty_like(df.values), 1, np.nan))
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
I could really use help understanding how the where clause works on a single array.
np.empty_like()
Docs:-
numpy.empty_like(prototype, dtype=None, order='K', subok=True)
Return a new array with the same shape and type as a given array.
>>> a = ([1,2,3], [4,5,6]) # a is array-like
>>> np.empty_like(a)
array([[-1073741821, -1073741821, 3], #random
[ 0, 0, -1073741821]])
np.empty_like() creates an array of the same shape and type as the given array, but filled with arbitrary (uninitialized) values rather than zeros. This array then goes into np.where().
numpy.where()
Docs:-
numpy.where(condition[, x, y])
Return elements that are chosen from x or y depending on condition.
Example:-
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.where(a < 5, a, 10*a)
array([ 0, 1, 2, 3, 4, 50, 60, 70, 80, 90])
>>> np.where(a, 1, np.nan)
array([nan, 1., 1., 1., 1., 1., 1., 1., 1., 1.])
In Python, any number other than zero is considered to be TRUE, whereas zero is considered to be FALSE.
When np.where() gets a np.array, it checks the condition element-wise. Here the array itself acts as the condition, i.e., np.where evaluates to TRUE where the array elements are non-zero and FALSE where they are 0. So the "True" elements are replaced by 1 and the "False" elements by np.nan. (Since np.empty_like returns uninitialized memory, its values are merely very unlikely to be zero, which is why the second example came out as all 1s.)
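A minimal sketch of that rule:
a = np.array([[5, 0], [2, 7]])
np.where(a, 1, np.nan)
# array([[ 1., nan],
#        [ 1.,  1.]])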
Reference:-
numpy.where()
numpy.empty_like()
