I'm setting a numpy array with a power-law equation. The problem is that part of my domain tries to do numpy.power(x, n) when x is negative and n is not an integer. In this part of the domain I want the value to be 0.0. Below is code that has the correct behavior, but is there a more Pythonic way to do this?
# note mesh.x is a numpy array of length nx
import numpy as npy
myValues = npy.zeros(nx)
para = [5.8780046, 0.714285714, 2.819250868]
for j in range(nx):
    if mesh.x[j] > para[1]:
        myValues[j] = para[0]*npy.power(mesh.x[j]-para[1], para[2])
    else:
        myValues[j] = 0.0
Is "numpythonic" a word? It should be a word. The following is really neither pythonic nor unpythonic, but it is much more efficient than using a for loop, and close(r) to the way Travis would probably do it:
import numpy
mesh_x = numpy.array([0.5,1.0,1.5])
myValues = numpy.zeros_like(mesh_x)
para = [5.8780046, 0.714285714, 2.819250868]
mask = mesh_x > para[1]
myValues[mask] = para[0] * numpy.power(mesh_x[mask] - para[1], para[2])
print(myValues)
For very large problems you would probably want to avoid holding several temporary arrays at once (each masked statement below still copies the masked selection, but only one temporary is alive at a time):
mask = mesh.x > para[1]
myValues[mask] = mesh.x[mask]
myValues[mask] -= para[1]
myValues[mask] **= para[2]
myValues[mask] *= para[0]
Here's one approach with np.where to choose, elementwise, between the power calculation and 0 -
import numpy as np
np.where(mesh.x>para[1],para[0]*np.power(mesh.x-para[1],para[2]),0)
Explanation:
np.where(mask, A, B) chooses elements from A or B depending on the mask elements. In our case the mask is mesh.x > para[1], a vectorized comparison over all mesh.x elements in one go.
para[0]*np.power(mesh.x-para[1],para[2]) gives the elements to be chosen wherever a mask element is True. Otherwise we choose 0, the third argument to np.where.
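One caveat worth noting (my own aside, not part of the original answer): np.where evaluates both branches over the full array, so the negative bases still go through np.power and emit a RuntimeWarning, even though those NaNs are discarded. A minimal sketch that silences the warning with np.errstate:
import numpy as np
mesh_x = np.array([0.5, 1.0, 1.5])
para = [5.8780046, 0.714285714, 2.819250868]
# both branches are evaluated, so suppress the invalid-value warning
with np.errstate(invalid='ignore'):
    myValues = np.where(mesh_x > para[1],
                        para[0]*np.power(mesh_x - para[1], para[2]),
                        0.0)
print(myValues)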
More of an explanation of the answers given by @jez and @Divakar, with simple examples, than an answer itself. They both rely on some form of boolean indexing.
>>> import numpy as np
>>> a = np.arange(-4.5, 4.5).reshape(3, 3)
>>> a
array([[-4.5, -3.5, -2.5],
[-1.5, -0.5, 0.5],
[ 1.5, 2.5, 3.5]])
>>> n = 2.2
>>> a ** n
array([[ nan, nan, nan],
[ nan, nan, 0.21763764],
[ 2.44006149, 7.50702771, 15.73800567]])
np.where is made for this: it selects one of two values based on a boolean array.
>>> np.where(np.isnan(a**n), 0, a**n)
array([[ 0. , 0. , 0. ],
[ 0. , 0. , 0.21763764],
[ 2.44006149, 7.50702771, 15.73800567]])
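As a small efficiency tweak (my own aside, not from the original answer), a**n need only be computed once if you store the intermediate:
>>> p = a ** n
>>> np.where(np.isnan(p), 0, p)   # same result as above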
>>> b = np.where(a < 0, 0, a)
>>> b
array([[ 0. , 0. , 0. ],
[ 0. , 0. , 0.5],
[ 1.5, 2.5, 3.5]])
>>> b ** n
array([[ 0. , 0. , 0. ],
[ 0. , 0. , 0.21763764],
[ 2.44006149, 7.50702771, 15.73800567]])
Use of boolean indexing on the left-hand side and the right-hand side. This is similar to np.where:
>>> a[a >= 0] = a[a >= 0] ** n
>>> a
array([[ -4.5 , -3.5 , -2.5 ],
[ -1.5 , -0.5 , 0.21763764],
[ 2.44006149, 7.50702771, 15.73800567]])
>>> a[a < 0] = 0
>>> a
array([[ 0. , 0. , 0. ],
[ 0. , 0. , 0.21763764],
[ 2.44006149, 7.50702771, 15.73800567]])
Related
I want to create an N x N array in numpy such that the diagonal is zero and [x,y] = -[y,x].
For example:
np.array([[  0,  12,   2],
          [-12,   0,   3],
          [ -2,  -3,   0]])
The values inside the array can be any float.
One way would be with scipy.spatial.distance.squareform -
import numpy as np
from scipy.spatial.distance import squareform

def diag_inverted(n):
    l = n*(n-1)//2
    out = squareform(np.random.randn(l))
    out[np.tri(len(out), k=-1, dtype=bool)] *= -1
    return out
Another with array-assignment and masking -
def diag_inverted_v2(n):
    l = n*(n-1)//2
    m = np.tri(n, k=-1, dtype=bool)
    out = np.zeros((n, n), dtype=float)
    out[m] = np.random.randn(l)
    out[m.T] = -out.T[m.T]
    return out
Sample runs -
In [148]: diag_inverted(2)
Out[148]:
array([[ 0. , -0.97873798],
[ 0.97873798, 0. ]])
In [149]: diag_inverted(3)
Out[149]:
array([[ 0. , -2.2408932 , -1.86755799],
[ 2.2408932 , 0. , 0.97727788],
[ 1.86755799, -0.97727788, 0. ]])
In [150]: diag_inverted(4)
Out[150]:
array([[ 0. , -0.95008842, 0.15135721, -0.4105985 ],
[ 0.95008842, 0. , 0.10321885, -0.14404357],
[-0.15135721, -0.10321885, 0. , -1.45427351],
[ 0.4105985 , 0.14404357, 1.45427351, 0. ]])
Here you go:
import numpy as np
size = 3
a = np.random.normal(0, 1, (size, size))
ret = (a - a.transpose())/2
Output (random):
array([[ 0. , 0.11872306, 0.46792054],
[-0.11872306, 0. , 0.12530741],
[-0.46792054, -0.12530741, 0. ]])
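A quick sanity check one might add (my own sketch, not part of the original answer): the result is antisymmetric by construction, so it equals the negative of its transpose and the diagonal is exactly zero (note the off-diagonal entries are no longer unit-variance, since the difference is halved):
assert np.allclose(ret, -ret.T)        # ret[x, y] == -ret[y, x]
assert np.allclose(np.diag(ret), 0.0)  # zero diagonal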
I have this big series of length t (t = 200K rows):
prices = [200, 100, 500, 300, ...]
and I want to calculate a matrix (tXt) where a value is calculated as:
matrix[i][j] = prices[j]/prices[i] - 1
I tried this using a double for loop, but it's too slow. Any ideas how to perform it better?
for i, p0 in enumerate(prices):
    for j, p1 in enumerate(prices):
        matrix[i][j] = p1/p0 - 1
A vectorized solution uses np.meshgrid, with prices and 1/prices as arguments (note that prices must be an array); multiplying the results and subtracting 1 computes matrix[i][j] = prices[j]/prices[i] - 1:
a, b = np.meshgrid(p, 1/p)
a * b - 1
As an example:
p = np.array([1,4,2])
Would give:
a, b = np.meshgrid(p, 1/p)
a * b - 1
array([[ 0. , 3. , 1. ],
[-0.75, 0. , -0.5 ],
[-0.5 , 1. , 0. ]])
Quick check of some of the cells (using 0-based indices):
(i,j)  prices[j]/prices[i] - 1
--------------------------------
(0,0)  1/1 - 1 = 0
(0,1)  4/1 - 1 = 3
(0,2)  2/1 - 1 = 1
(1,0)  1/4 - 1 = -0.75
Another solution:
[p] / np.array([p]).T - 1
array([[ 0. , 3. , 1. ],
[-0.75, 0. , -0.5 ],
[-0.5 , 1. , 0. ]])
There are two idiomatic ways of doing an outer product-type operation. Either use the .outer method of universal functions, here np.divide:
In [2]: p = np.array([10, 20, 30, 40])
In [3]: np.divide.outer(p, p)
Out[3]:
array([[ 1. , 0.5 , 0.33333333, 0.25 ],
[ 2. , 1. , 0.66666667, 0.5 ],
[ 3. , 1.5 , 1. , 0.75 ],
[ 4. , 2. , 1.33333333, 1. ]])
Alternatively, use broadcasting:
In [4]: p[:, None] / p[None, :]
Out[4]:
array([[ 1. , 0.5 , 0.33333333, 0.25 ],
[ 2. , 1. , 0.66666667, 0.5 ],
[ 3. , 1.5 , 1. , 0.75 ],
[ 4. , 2. , 1.33333333, 1. ]])
This p[None, :] itself can be spelled as a reshape, p.reshape((1, len(p))), but the None (newaxis) form is more readable.
Both are equivalent to a double for-loop:
In [6]: o = np.empty((len(p), len(p)))
In [7]: for i in range(len(p)):
...: for j in range(len(p)):
...: o[i, j] = p[i] / p[j]
...:
In [8]: o
Out[8]:
array([[ 1. , 0.5 , 0.33333333, 0.25 ],
[ 2. , 1. , 0.66666667, 0.5 ],
[ 3. , 1.5 , 1. , 0.75 ],
[ 4. , 2. , 1.33333333, 1. ]])
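To get exactly the matrix the question defines, matrix[i][j] = prices[j]/prices[i] - 1, the axes just need swapping relative to the example above (my own note):
In [9]: p[None, :] / p[:, None] - 1
Out[9]:
array([[ 0.        ,  1.        ,  2.        ,  3.        ],
       [-0.5       ,  0.        ,  0.5       ,  1.        ],
       [-0.66666667, -0.33333333,  0.        ,  0.33333333],
       [-0.75      , -0.5       , -0.25      ,  0.        ]])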
I guess it can be done this way:
import numpy
prices = [200., 300., 100., 500., 600.]
x = numpy.array(prices).reshape(1, len(prices))
matrix = (1/x.T) * x - 1
Let me explain in detail. The matrix is the outer product of a column vector of element-wise reciprocal prices and a row vector of the original prices; then 1 is subtracted from every element of the result.
First of all, we create a row vector from the prices list:
x = numpy.array(prices).reshape(1, len(prices))
Reshaping is required here. Otherwise the vector would have shape (len(prices),), not the required (1, len(prices)).
Then we compute a column vector of element-wise reciprocal price values:
(1/x.T)
Finally, we compute the resulting matrix
matrix = (1/x.T) * x - 1
Here the trailing - 1 is broadcast to every element of the matrix (1/x.T) * x.
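A quick spot-check of this construction against the definition (my own sketch):
import numpy as np
prices = [200., 300., 100., 500., 600.]
x = np.array(prices).reshape(1, len(prices))
matrix = (1/x.T) * x - 1
# matrix[i, j] should equal prices[j]/prices[i] - 1
assert np.isclose(matrix[0, 1], 300./200. - 1)   # 0.5
assert np.isclose(matrix[1, 0], 200./300. - 1)   # -0.333...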
I have a numpy array A with shape (M,N). I want to create a new array B with shape (M,N,3) where the result would be the same as the following:
import numpy as np
def myfunc(A, sx=1.5, sy=3.5):
    M, N = A.shape
    B = np.zeros((M, N, 3))
    for i in range(M):
        for j in range(N):
            B[i, j, 0] = i*sx
            B[i, j, 1] = j*sy
            B[i, j, 2] = A[i, j]
    return B
A=np.array([[1,2,3],[9,8,7]])
print(myfunc(A))
Giving the result:
[[[0. 0. 1. ]
[0. 3.5 2. ]
[0. 7. 3. ]]
[[1.5 0. 9. ]
[1.5 3.5 8. ]
[1.5 7. 7. ]]]
Is there a way to do it without the loop? I was thinking whether numpy would be able to apply a function element-wise using the indexes of the array. Something like:
def myfuncEW(indx, value, out, vars):
    out[0] = indx[0]*vars[0]
    out[1] = indx[1]*vars[1]
    out[2] = value

M, N = A.shape
B = np.zeros((M, N, 3))
np.applyfunctionelementwise(myfuncEW, A, B, (sx, sy))
You could use mgrid and moveaxis:
>>> M, N = A.shape
>>> I, J = np.mgrid[:M, :N] * np.array((sx, sy))[:, None, None]
>>> np.moveaxis((I, J, A), 0, -1)
array([[[ 0. , 0. , 1. ],
[ 0. , 3.5, 2. ],
[ 0. , 7. , 3. ]],
[[ 1.5, 0. , 9. ],
[ 1.5, 3.5, 8. ],
[ 1.5, 7. , 7. ]]])
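Equivalently (my own aside, not part of the original answer), the tuple-to-array step can be made explicit with np.stack along a new last axis:
>>> np.stack((I, J, A), axis=-1)   # same (M, N, 3) result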
You could use meshgrid and dstack, like this:
import numpy as np
def myfunc(A, sx=1.5, sy=3.5):
    M, N = A.shape
    J, I = np.meshgrid(range(N), range(M))
    return np.dstack((I*sx, J*sy, A))
A=np.array([[1,2,3],[9,8,7]])
print(myfunc(A))
# array([[[ 0. , 0. , 1. ],
# [ 0. , 3.5, 2. ],
# [ 0. , 7. , 3. ]],
#
# [[ 1.5, 0. , 9. ],
# [ 1.5, 3.5, 8. ],
# [ 1.5, 7. , 7. ]]])
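As an aside (my own note, not part of the original answer), np.meshgrid's indexing='ij' keyword avoids the swapped J, I unpacking:
I, J = np.meshgrid(range(M), range(N), indexing='ij')   # I varies down rows, J across columns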
By preallocating the 3d array B, you save about half the time compared to stacking I, J and A.
def myfunc(A, sx=1.5, sy=3.5):
    M, N = A.shape
    B = np.zeros((M, N, 3))
    B[:, :, 0] = np.arange(M)[:, None]*sx
    B[:, :, 1] = np.arange(N)[None, :]*sy
    B[:, :, 2] = A
    return B
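For completeness, calling it on the question's example (my own check) reproduces the loop version's output:
A = np.array([[1, 2, 3], [9, 8, 7]])
print(myfunc(A))   # same array as printed above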
I have the following numpy array:
import numpy as np
np.random.seed(20)
arr = np.random.rand(20).reshape(5, 4)
array([[ 0.5881308 , 0.89771373, 0.89153073, 0.81583748],
[ 0.03588959, 0.69175758, 0.37868094, 0.51851095],
[ 0.65795147, 0.19385022, 0.2723164 , 0.71860593],
[ 0.78300361, 0.85032764, 0.77524489, 0.03666431],
[ 0.11669374, 0.7512807 , 0.23921822, 0.25480601]])
For each column I would like to slice it starting at the given position, shifting the remaining values to the top and padding the bottom with zeros:
position_for_slicing=[0, 3, 4, 4]
So I will get following array:
array([[ 0.5881308 ,  0.85032764,  0.23921822,  0.25480601],
       [ 0.03588959,  0.7512807 ,  0.        ,  0.        ],
       [ 0.65795147,  0.        ,  0.        ,  0.        ],
       [ 0.78300361,  0.        ,  0.        ,  0.        ],
       [ 0.11669374,  0.        ,  0.        ,  0.        ]])
Is there a fast way to do this? I know I can use a for loop over each column, but I was wondering if there is a more elegant way.
If "elegant" means "no loop" the following would qualify, but probably not under many other definitions (arr is your input array):
m, n = arr.shape
arrf = np.asanyarray(arr, order='F')
padded = np.r_[arrf, np.zeros_like(arrf)]
assert padded.flags['F_CONTIGUOUS']
expnd = np.lib.stride_tricks.as_strided(padded, (m, m+1, n), padded.strides[:1] + padded.strides)
expnd[:, [0,3,4,4], range(4)]
# array([[ 0.5881308 , 0.85032764, 0.23921822, 0.25480601],
# [ 0.03588959, 0.7512807 , 0. , 0. ],
# [ 0.65795147, 0. , 0. , 0. ],
# [ 0.78300361, 0. , 0. , 0. ],
# [ 0.11669374, 0. , 0. , 0. ]])
Please note that order='C' and then 'C_CONTIGUOUS' in the assertion also works. My hunch is that 'F' could be a bit faster because the indexing then operates on contiguous slices.
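A simpler vectorized alternative (my own sketch, not from the answer above): gather each column with its offset using fancy indexing with wraparound, then zero out the wrapped entries:
import numpy as np
np.random.seed(20)
arr = np.random.rand(20).reshape(5, 4)
pos = np.array([0, 3, 4, 4])
m, n = arr.shape
rows = np.arange(m)[:, None]                     # shape (m, 1)
gathered = arr[(rows + pos) % m, np.arange(n)]   # shift each column up by pos[j]
out = np.where(rows + pos < m, gathered, 0.0)    # zero the wrapped-around tail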
When using scipy.sparse.spdiags or scipy.sparse.diags I have noticed what I consider to be a bug in the routines, e.g.
scipy.sparse.spdiags([1.1,1.2,1.3],1,4,4).toarray()
returns
array([[ 0. , 1.2, 0. , 0. ],
[ 0. , 0. , 1.3, 0. ],
[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ]])
That is, for positive diagonals it drops the first k data values. One might argue that there is some grand programming reason for this and that I just need to pad with zeros. OK, annoying as that may be, one can use scipy.sparse.diags, which gives the correct result. However this routine has a bug that can't be worked around:
scipy.sparse.diags([1.1,1.2],0,(4,2)).toarray()
gives
array([[ 1.1, 0. ],
[ 0. , 1.2],
[ 0. , 0. ],
[ 0. , 0. ]])
nice, and
scipy.sparse.diags([1.1,1.2],-2,(4,2)).toarray()
gives
array([[ 0. , 0. ],
[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.2]])
but
scipy.sparse.diags([1.1,1.2],-1,(4,2)).toarray()
gives an error saying ValueError: Diagonal length (index 0: 2 at offset -1) does not agree with matrix size (4, 2). Obviously the answer is
array([[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.2],
[ 0. , 0. ]])
and for extra random behaviour we have
scipy.sparse.diags([1.1],-1,(4,2)).toarray()
giving
array([[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.1],
[ 0. , 0. ]])
Anyone know if there is a function for constructing diagonal sparse matrices that actually works?
Executive summary: spdiags works correctly, even if the matrix input isn't the most intuitive. diags has a bug that affects some offsets in rectangular matrices. There is a bug fix on scipy github.
The example for spdiags is:
>>> from numpy import array
>>> from scipy.sparse import spdiags
>>> data = array([[1,2,3,4],[1,2,3,4],[1,2,3,4]])
>>> diags = array([0,-1,2])
>>> spdiags(data, diags, 4, 4).todense()
matrix([[1, 0, 3, 0],
[1, 2, 0, 4],
[0, 2, 3, 0],
[0, 0, 3, 4]])
Note that the 3rd column of data always appears in the 3rd column of the sparse matrix. The other columns also line up, but they are omitted where they 'fall off the edge'.
The input to this function is a matrix, while the input to diags is a ragged list. The diagonals of the sparse matrix all have different numbers of values, so the specification has to accommodate this one way or another: spdiags does it by ignoring some values, diags by taking a list input.
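Under spdiags' convention, then, the questioner's positive-offset case can be made to work by front-padding the data row (a sketch of the workaround, my own illustration):
import numpy as np
from scipy import sparse

vals = [1.1, 1.2, 1.3]
k = 1                                      # positive offset
row = np.concatenate((np.zeros(k), vals))  # the first k entries are ignored anyway
print(sparse.spdiags([row], [k], 4, 4).toarray())
# [[ 0.   1.1  0.   0. ]
#  [ 0.   0.   1.2  0. ]
#  [ 0.   0.   0.   1.3]
#  [ 0.   0.   0.   0. ]]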
The sparse.diags([1.1,1.2],-1,(4,2)) error is puzzling.
The spdiags equivalent does work:
In [421]: sparse.spdiags([[1.1,1.2]],-1,4,2).A
Out[421]:
array([[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.2],
[ 0. , 0. ]])
The error is raised in this block of code:
for j, diagonal in enumerate(diagonals):
    offset = offsets[j]
    k = max(0, offset)
    length = min(m + offset, n - offset)
    if length <= 0:
        raise ValueError("Offset %d (index %d) out of bounds" % (offset, j))
    try:
        data_arr[j, k:k+length] = diagonal
    except ValueError:
        if len(diagonal) != length and len(diagonal) != 1:
            raise ValueError(
                "Diagonal length (index %d: %d at offset %d) does not "
                "agree with matrix size (%d, %d)." % (
                    j, len(diagonal), offset, m, n))
        raise
The actual matrix constructor in diags is:
dia_matrix((data_arr, offsets), shape=(m, n))
This is the same constructor that spdiags uses, but without any manipulation.
In [434]: M = sparse.dia_matrix(([[1.1,1.2]],-1),shape=(4,2)); M.A
Out[434]:
array([[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.2],
[ 0. , 0. ]])
In dia format, the inputs are stored exactly as given (in the spdiags case, complete with the extra values that fall off the edge):
In [436]: M.data
Out[436]: array([[ 1.1, 1.2]])
In [437]: M.offsets
Out[437]: array([-1], dtype=int32)
As @user2357112 points out, length = min(m + offset, n - offset) is wrong, producing 3 in the test case. Changing it to length = min(m + k, n - k) makes all cases for this (4, 2) matrix work, but it fails with the transpose: diags([1.1,1.2], 1, (2, 4)).
The correction, as of Oct 5, for this issue is:
https://github.com/pv/scipy-work/commit/529cbde47121c8ed87f74fa6445c05d71353eb6c
length = min(m + offset, n - offset, min(m,n))
With this fix, diags([1.1,1.2], 1, (2, 4)) works.
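Until a fixed scipy is available, a workaround sketch (my own, building on the dia_matrix construction shown above) that sidesteps diags' length check for rectangular shapes:
import numpy as np
from scipy import sparse

def diags_rect(values, offset, shape):
    # place values on the requested diagonal of an (m, n) dia_matrix,
    # padding the data row to the matrix width as dia format expects
    m, n = shape
    data = np.zeros((1, n))
    k = max(0, offset)
    data[0, k:k+len(values)] = values
    return sparse.dia_matrix((data, [offset]), shape=shape)

print(diags_rect([1.1, 1.2], -1, (4, 2)).toarray())
# [[ 0.   0. ]
#  [ 1.1  0. ]
#  [ 0.   1.2]
#  [ 0.   0. ]]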