I am fairly new to numpy. I want to apply a custom function to 1, 2 or more rows (or columns). How can I do this? Before this is marked as a duplicate, I want to point out that the only thread I found that does this is "how to apply a generic function over numpy rows?". There are two issues with that post:
a) As a beginner, I am not quite sure what operation like A[:,None,:] does.
b) That operation doesn't work in my case. Please see below.
Let's assume that Matrix M is:
import numpy as np
M = np.array([[8, 3, 2],
[6, 1, 2],
[1, 2, 4]])
Now, I would want to calculate product of combination of all three rows. For this, I have created a custom function. Actual operation of the function could be different from multiplication. Multiplication is just an example.
def myf(a,b): return(a*b)
I have taken numpy array product as an example. Actual custom function could be different, but no matter what the operation is, the function will always return a numpy array. i.e. it will take two equally-sized numpy 1-D array and return 1-D array. In myf I am assuming that a and b are each np.array.
I want to be able to apply custom function to any two rows or columns, or even three rows (recursively applying function).
Expected output after multiplying two rows recursively:
If I apply pairwise row-operation:
[[48,3,4],
[6,2,8],
[8,6,8]]
OR ( The order of application of custom function doesn't matter. Hence, the actual position of rows in the output matrix won't matter. Below matrix will be fine as well.)
[[6,2,8],
[48,3,4], #row1 and 2 are swapped
[8,6,8]]
Similarly, if I apply pairwise operation on columns, I would get
[[24, 6, 16],
[6, 2, 12],
[2, 8, 4]]
Similarly, if I apply custom function to all three rows, I would get:
[48,6,16] #row-wise
OR
[48,12,8] #column-wise
I tried a few approaches after reading SO:
1:
vf=np.vectorize(myf)
vf(M,M)
However, the vectorized function above applies the custom function element-wise rather than row-wise or column-wise.
2:
I also tried:
M[:,None,:].dot(M) #dot mimics multiplication. Python wouldn't accept `*`
There are two problems with this:
a) I don't know what the output is.
b) I cannot apply custom function.
Can someone please help me? I'd appreciate any help.
I am open to numpy and scipy.
Some experts have requested desired output. Let's assume that the desired output is
[[48,3,4],
[6,2,8],
[8,6,8]].
However, I'd appreciate some guidance on customizing the solution for 2 or more columns and 2 or more rows.
You can simply roll your array along the 0th axis
np.roll(M, -1, axis=0)
# array([[6, 1, 2],
# [1, 2, 4],
# [8, 3, 2]])
And multiply the result with your original array
M * np.roll(M, -1, axis=0)
# array([[48, 3, 4],
# [ 6, 2, 8],
# [ 8, 6, 8]])
If you want to incorporate more than two rows, you can roll it more than once:
M * np.roll(M, -1, axis=0) * np.roll(M, -2, axis=0)
# array([[48, 6, 16],
# [48, 6, 16],
# [48, 6, 16]])
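The rolling trick relies on the function broadcasting over whole arrays. For a function that only accepts two 1-D arrays, the same pairing can be done slice by slice. This is a minimal sketch, not a tuned solution: the helper name `pairwise_apply` is made up for illustration, and `myf` stands in for the custom function from the question.

```python
from functools import reduce

import numpy as np

def myf(a, b):
    # stand-in for the custom function from the question
    return a * b

def pairwise_apply(func, M, axis=0):
    # hypothetical helper: combine every row (axis=0) or column (axis=1)
    # with its cyclic successor, passing 1-D slices to func
    A = M if axis == 0 else M.T
    out = np.array([func(a, b) for a, b in zip(A, np.roll(A, -1, axis=0))])
    return out if axis == 0 else out.T

M = np.array([[8, 3, 2],
              [6, 1, 2],
              [1, 2, 4]])

rows = pairwise_apply(myf, M, axis=0)   # row i combined with row i+1
cols = pairwise_apply(myf, M, axis=1)   # column j combined with column j+1
all_rows = reduce(myf, M)               # func applied across all rows in turn
all_cols = reduce(myf, M.T)             # func applied across all columns in turn
```

`functools.reduce` applies the function cumulatively, which covers the "three or more rows" case without writing out each roll by hand.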
I'm attempting to get the 'power' of a Python list/matrix using numpy. My only current working solution is an iterative function using np.dot():
def matr_power(matrix, power):
    matrix_a = list(matrix)
    matrix_b = list(matrix)
    for i in range(0, power-1):
        matrix_a = np.dot(matrix_a, matrix_b)
    return matrix_a
This works for my needs, but I'm aware it's probably not the most efficient method.
I've tried converting my list to a numpy array, performing power operations on it, and then back to a list so it's usable in the form I need. The conversions seem to happen, but the power calculation does not.
while (foo != bar):
    matr_x = np.asarray(matr_a)
    matr_y = matr_x ** n
    matr_out = matr_y.tolist()
    n += 1
    # Other code here to output certain results
The issue is, the matrix gets converted to an array as expected, but when performing the power operation (**) matr_y ends up being the same as matr_x as if no calculation was ever performed. I have tried using np.power(matr_y, n) and some other solutions found in related questions on Stack Overflow.
I've tried using the numpy documentation, but (either I'm misunderstanding it, or) it just confirms that this should be working as expected.
When checking the debugging console in PyCharm everything seems fine (all matrices / lists / arrays are converted as expected) except that the calculation matr_x ** n never seems to be calculated (or else never stored in matr_y).
Answer
Although it's possible to use a numpy matrix with the ** operator, the best solution is to use numpy arrays (as numpy matrices are deprecated) combined with numpy's linalg matrix_power method.
matr_x = np.array(mat_a)
matr_y = np.linalg.matrix_power(matr_x, path_length)
work_matr = matr_y.tolist()
It is also now apparent that the function of ** being element-wise may have been discovered earlier had I not been using an adjacency matrix (only zeros and ones).
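To make that last point concrete, here is a small sketch using the 3x3 adjacency matrix that also appears in the other answer. It only illustrates the difference between the two operations; the variable names are mine.

```python
import numpy as np

a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

# ** on an ndarray raises each entry to the power; 0 and 1 stay 0 and 1
elementwise = a ** 3

# matrix_power computes the algebraic power a @ a @ a
matrix_cube = np.linalg.matrix_power(a, 3)
```

With only zeros and ones in the array, `elementwise` is identical to `a`, which is why it looked as if no calculation had happened.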
There are (at least) two options for computing the power of a matrix using numpy without multiple calls to dot:
Use numpy.linalg.matrix_power.
Use the numpy matrix class, which defines ** to be the matrix algebraic power.
For example,
In [38]: a
Out[38]:
array([[0, 1, 0],
[1, 0, 1],
[0, 1, 0]])
In [39]: np.linalg.matrix_power(a, 2)
Out[39]:
array([[1, 0, 1],
[0, 2, 0],
[1, 0, 1]])
In [40]: np.linalg.matrix_power(a, 3)
Out[40]:
array([[0, 2, 0],
[2, 0, 2],
[0, 2, 0]])
In [41]: m = np.matrix(a)
In [42]: m ** 2
Out[42]:
matrix([[1, 0, 1],
[0, 2, 0],
[1, 0, 1]])
In [43]: m ** 3
Out[43]:
matrix([[0, 2, 0],
[2, 0, 2],
[0, 2, 0]])
Warren's answer is perfectly good.
Upon special request by the OP I briefly explain how to build an efficient integer power operator by hand.
I don't know what this algorithm is called, but it works like this:
Suppose you want to calculate X^35. If you do that naively it will cost you 34 multiplications. But you can do much better than that. Write X^35 = X^32 x X^2 x X. What you've done here is split the product according to the binary representation of 35, which is 100011. Now, calculating X^32 is actually cheap, because you only have to repeatedly (5 times) square X to get there. So in total you need just 7 multiplications, much better than 34.
In code:
def my_power(x, n):
    # n must be a positive integer
    out = None
    p = x
    while True:
        if n % 2 == 1:
            if out is None:
                out = p
            else:
                out = out @ p  # this requires a fairly up-to-date python
                               # if yours is too old use np.dot instead
        if n == 1:
            return out
        n //= 2
        p = p @ p
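This technique is commonly called exponentiation by squaring. As a sanity check on the count of 7 multiplications for X^35, here is a sketch of the same loop with a counter added. The function name `power_by_squaring` and the counter are additions for illustration only.

```python
import numpy as np

def power_by_squaring(x, n):
    """Return (matrix power x^n, number of matrix products used); n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    out = None
    p = x
    count = 0
    while True:
        if n % 2 == 1:
            if out is None:
                out = p          # first factor, no product needed
            else:
                out = out @ p
                count += 1
        if n == 1:
            return out, count
        n //= 2
        p = p @ p                # square for the next binary digit
        count += 1

a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
result, n_products = power_by_squaring(a, 35)
```

For n = 35 the counter ends at 7 (five squarings plus two products), and the result agrees with `np.linalg.matrix_power(a, 35)`.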
I wanted to repeat the rows of a scipy csr sparse matrix, but when I tried to call numpy's repeat method, it simply treated the sparse matrix like an object, and would only repeat it as an object in an ndarray. I looked through the documentation, but I couldn't find any utility to repeat the rows of a scipy csr sparse matrix.
I wrote the following code that operates on the internal data, which seems to work
import numpy as np
from scipy import sparse

def csr_repeat(csr, repeats):
    if isinstance(repeats, int):
        repeats = np.repeat(repeats, csr.shape[0])
    repeats = np.asarray(repeats)
    rnnz = np.diff(csr.indptr)      # number of stored entries in each row
    ndata = rnnz.dot(repeats)       # total stored entries in the result
    if ndata == 0:
        return sparse.csr_matrix((np.sum(repeats), csr.shape[1]),
                                 dtype=csr.dtype)
    # indmap[i] will be the position in csr.data of output entry i
    indmap = np.ones(ndata, dtype=int)  # np.int was removed from recent numpy
    indmap[0] = 0
    rnnz_ = np.repeat(rnnz, repeats)
    indptr_ = rnnz_.cumsum()
    mask = indptr_ < ndata
    indmap -= np.int_(np.bincount(indptr_[mask],
                                  weights=rnnz_[mask],
                                  minlength=ndata))
    jumps = (rnnz * repeats).cumsum()
    mask = jumps < ndata
    indmap += np.int_(np.bincount(jumps[mask],
                                  weights=rnnz[mask],
                                  minlength=ndata))
    indmap = indmap.cumsum()
    return sparse.csr_matrix((csr.data[indmap],
                              csr.indices[indmap],
                              np.r_[0, indptr_]),
                             shape=(np.sum(repeats), csr.shape[1]))
and be reasonably efficient, but I'd rather not monkey patch the class. Is there a better way to do this?
Edit
As I revisit this question, I wonder why I posted it in the first place. Almost everything I could think to do with the repeated matrix would be easier to do with the original matrix, and then apply the repetition afterwards. My assumption is that post repetition will always be the better way to approach this problem than any of the potential answers.
from scipy.sparse import csr_matrix
repeated_row_matrix = csr_matrix(np.ones([repeat_number,1])) * sparse_row
It's not surprising that np.repeat does not work. It delegates the action to the hardcoded a.repeat method, and failing that, first turns a into an array (object if needed).
In the linear algebra world where sparse code was developed, most of the assembly work was done on the row, col, data arrays BEFORE creating the sparse matrix. The focus was on efficient math operations, and not so much on adding/deleting/indexing rows and elements.
I haven't worked through your code, but I'm not surprised that a csr format matrix requires that much work.
I worked out a similar function for the lil format (working from lil.copy):
def lil_repeat(S, repeat):
    # row repeat for lil sparse matrix
    # test for lil type and/or convert
    shape=list(S.shape)
    if isinstance(repeat, int):
        shape[0]=shape[0]*repeat
    else:
        shape[0]=sum(repeat)
    shape = tuple(shape)
    new = sparse.lil_matrix(shape, dtype=S.dtype)
    new.data = S.data.repeat(repeat) # flat repeat
    new.rows = S.rows.repeat(repeat)
    return new
But it is also possible to repeat using indices. Both lil and csr support indexing that is close to that of regular numpy arrays (at least in new enough versions). Thus:
S = sparse.lil_matrix([[0,1,2],[0,0,0],[1,0,0]])
print(S.A.repeat([1,2,3], axis=0))
print(S.A[(0,1,1,2,2,2),:])
print(lil_repeat(S,[1,2,3]).A)
print(S[(0,1,1,2,2,2),:].A)
give the same result
and best of all?
print(S[np.arange(3).repeat([1,2,3]),:].A)
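Since the question was about csr, here is the same index-based repeat as a sketch on a csr matrix, checked against the dense `np.repeat`. It assumes a scipy version where csr supports integer-array row indexing; the variable names are mine.

```python
import numpy as np
from scipy import sparse

S = sparse.csr_matrix([[0, 1, 2],
                       [0, 0, 0],
                       [1, 0, 0]])
repeats = [1, 2, 3]

# one entry per output row, naming the source row: [0, 1, 1, 2, 2, 2]
row_index = np.repeat(np.arange(S.shape[0]), repeats)
repeated = S[row_index, :]          # result is still sparse

dense_reference = np.repeat(S.toarray(), repeats, axis=0)
```

The indexing route copies the data of each repeated row, so it trades memory for simplicity compared with manipulating `indptr` directly.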
After someone posted a really clever response for how best to do this I revisited my original question, to see if there was an even better way. I came up with one more way that has some pros and cons. Instead of repeating all of the data (as is done with the accepted answer), we can instead instruct scipy to reuse the data of the repeated rows, creating something akin to a view of the original sparse array (as you might do with broadcast_to). This can be done by simply tiling the indptr field.
repeated = sparse.csr_matrix((orig.data, orig.indices, np.tile(orig.indptr, repeat_num)))
This technique repeats the vector repeat_num times, while only modifying the indptr. The downside is that due to the way the csr matrices encode data, instead of creating a matrix that's repeat_num x n in dimension, it creates one that's (2 * repeat_num - 1) x n where every odd row is 0. This shouldn't be too big of a deal as any operation will be quick given that each row is 0, and they should be pretty easy to slice out afterwards (with something like [::2]), but it's not ideal.
I think the marked answer is probably still the "best" way to do this.
One of the most efficient ways to repeat the sparse matrix would be the way OP suggested. I modified indptr so that it doesn't output rows of 0s.
## original sparse matrix
indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
x = scipy.sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
x.toarray()
array([[1, 0, 2],
[0, 0, 3],
[4, 5, 6]])
To repeat this, you need to repeat data and indices, and you need to fix-up the indptr. This is not the most elegant way, but it works.
## repeated sparse matrix
repeat = 5
new_indptr = indptr
for r in range(1,repeat):
    new_indptr = np.concatenate((new_indptr, new_indptr[-1]+indptr[1:]))
x = scipy.sparse.csr_matrix((np.tile(data,repeat), np.tile(indices,repeat), new_indptr))
x.toarray()
array([[1, 0, 2],
[0, 0, 3],
[4, 5, 6],
[1, 0, 2],
[0, 0, 3],
[4, 5, 6],
[1, 0, 2],
[0, 0, 3],
[4, 5, 6],
[1, 0, 2],
[0, 0, 3],
[4, 5, 6],
[1, 0, 2],
[0, 0, 3],
[4, 5, 6]])
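For comparison, stacking whole copies of the matrix can also be left to `scipy.sparse.vstack`. This is a different technique from the manual `indptr` fix-up above and I have not benchmarked it against that approach; it is only a sketch showing that it yields the same tiled matrix.

```python
import numpy as np
from scipy import sparse

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
x = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))

repeat = 5
# stack `repeat` copies of x on top of each other, keeping csr format
tiled = sparse.vstack([x] * repeat, format="csr")
```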
I am trying to figure out how to calculate covariance with the Python Numpy function cov. When I pass it two one-dimensional arrays, I get back a 2x2 matrix of results. I don't know what to do with that. I'm not great at statistics, but I believe covariance in such a situation should be a single number. This is what I am looking for. I wrote my own:
def cov(a, b):
    if len(a) != len(b):
        return
    a_mean = np.mean(a)
    b_mean = np.mean(b)
    total = 0  # avoid shadowing the built-in sum
    for i in range(0, len(a)):
        total += ((a[i] - a_mean) * (b[i] - b_mean))
    return total/(len(a)-1)
That works, but I figure the Numpy version is much more efficient, if I could figure out how to use it.
Does anybody know how to make the Numpy cov function perform like the one I wrote?
Thanks,
Dave
When a and b are 1-dimensional sequences, numpy.cov(a,b)[0][1] is equivalent to your cov(a,b).
The 2x2 array returned by np.cov(a,b) has elements equal to
cov(a,a) cov(a,b)
cov(a,b) cov(b,b)
(where, again, cov is the function you defined above.)
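A quick sketch of that layout, with made-up sample data: the off-diagonal entry matches the hand-written formula, and the diagonal holds the sample variances.

```python
import numpy as np

# illustrative data only
a = np.array([1.0, 2.0, 4.0, 7.0])
b = np.array([2.0, 1.0, 5.0, 6.0])

c = np.cov(a, b)   # 2x2 covariance matrix

# the same quantity as the hand-written cov(a, b) in the question
manual = np.sum((a - a.mean()) * (b - b.mean())) / (len(a) - 1)

cov_ab = c[0, 1]   # equals c[1, 0]
var_a = c[0, 0]    # sample variance of a
var_b = c[1, 1]    # sample variance of b
```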
Thanks to unutbu for the explanation. By default numpy.cov calculates the sample covariance. To obtain the population covariance you can specify normalisation by the total N samples like this:
numpy.cov(a, b, bias=True)[0][1]
or like this:
numpy.cov(a, b, ddof=0)[0][1]
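The two spellings give the same number, and it differs from the default by a factor of (N - 1) / N. A small sketch with made-up data to show the relationship:

```python
import numpy as np

# illustrative data only
a = np.array([1.0, 2.0, 4.0, 7.0])
b = np.array([2.0, 1.0, 5.0, 6.0])
n = len(a)

sample = np.cov(a, b)[0, 1]                    # divides by N - 1 (default)
population = np.cov(a, b, ddof=0)[0, 1]        # divides by N
population_bias = np.cov(a, b, bias=True)[0, 1]

# population covariance written out directly
direct = np.mean((a - a.mean()) * (b - b.mean()))
```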
Note that starting in Python 3.10, one can obtain the covariance directly from the standard library.
statistics.covariance returns a measure (the single number you're looking for) of the joint variability of two inputs:
from statistics import covariance
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
covariance(x, y)
# 0.75
I've tried using matrices, and it has failed. I've looked at external modules and external programs, but none of it has worked. If someone could share some tips or code that would be helpful, thanks.
import numpy
import scipy.linalg
m = numpy.matrix([
[1, 1, 1, 1, 1],
[16, 8, 4, 2, 1],
[81, 27, 9, 3, 1],
[256, 64, 16, 4, 1],
[625, 125, 25, 5, 1]
])
res = numpy.matrix([[1],[2],[3],[4],[8]])
print(scipy.linalg.solve(m, res))
returns
[[ 0.125]
[-1.25 ]
[ 4.375]
[-5.25 ]
[ 3. ]]
(your solution coefficients for a,b,c,d,e)
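The rows of that matrix are the powers x^4, x^3, x^2, x, 1 for x = 1..5, so the system looks like fitting a degree-4 polynomial through five points. That reading is my inference from the numbers, not something stated in the question. Under that assumption, the same result can be sketched with plain arrays, `np.vander` and `np.linalg.solve`:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 3, 4, 8])

# Vandermonde matrix: columns are x**4, x**3, x**2, x**1, x**0
m = np.vander(x, 5)

# coefficients a, b, c, d, e of a*x^4 + b*x^3 + c*x^2 + d*x + e
coeffs = np.linalg.solve(m, y)

# evaluating the polynomial at x should reproduce y
check = np.polyval(coeffs, x)
```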
I'm not sure what you mean when you say the matrix methods don't work. That's the standard way of solving these types of problems.
From a linear algebra standpoint, solving 5 linear equations is trivial. It can be solved using any number of methods. You can use Gaussian elimination, finding the inverse, Cramer's rule, etc.
If you're lazy, you can always resort to libraries. Sympy and Numpy can both solve linear equations with ease.
Perhaps you're using matrices in a wrong way.
Matrices are just like lists within lists.
[[1,1,1,1,1],[1,1,1,1,1],[1,1,1,1,1],[1,1,1,1,1],[1,1,1,1,1]]
The code above makes a list that you access as mylist[y][x]: the row index comes first and the column index second, so the axes are swapped relative to (x, y).
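A tiny sketch of that indexing order, with a made-up 2x3 grid, next to the equivalent numpy lookup:

```python
import numpy as np

grid = [[1, 2, 3],
        [4, 5, 6]]

y, x = 1, 2                 # second row, third column
from_list = grid[y][x]      # nested lists: row first, then column

arr = np.array(grid)
from_array = arr[y, x]      # numpy uses the same (row, column) order
```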