I plan to use Python to solve the following task.
There is an equation:
E = n[1]*W[1] + n[2]*W[2] + ... + n[N]*W[N]
The W[i] and E are known, fixed values; the n[i] are integer variables.
I need to find all combinations of n[i] and write them out.
How can I do this using numpy in Python?
Looks like a Diophantine equation.
There is no support for this in numpy/scipy, and the usual suspect, integer programming (which can solve this), is not available within scipy either!
The general case is NP-hard!
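If you can put finite bounds on each n[i], brute-force enumeration is at least a workable fallback. A minimal sketch (the values of W, E, and the bound are made-up assumptions):

import itertools

W = [2, 3, 5]   # hypothetical weights
E = 10          # hypothetical target
BOUND = 10      # assumed range: each n[i] in [-BOUND, BOUND]

solutions = [
    n for n in itertools.product(range(-BOUND, BOUND + 1), repeat=len(W))
    if sum(ni * wi for ni, wi in zip(n, W)) == E
]
for n in solutions:
    print(n)

The search space grows as (2*BOUND + 1)**N, which is exactly where the NP-hardness bites; for larger instances you would need an integer-programming solver outside scipy.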
I am trying to find the inverse of this 9x9 covariance matrix so I can use it with the Mahalanobis distance. However, the result I get from the matrix inverse is a matrix full of values like 1.02939420e+16. I have been trying to find out why, given that Wolfram gives me the correct answer, and it seems to have something to do with the condition number of the matrix, which in this case is 3.98290435292e+16.
Although I would like to understand the math behind this, what I really need right now is a solution to this problem so I can continue with the implementation. Is there a way to find the inverse of such a matrix? Or is it somehow possible to find the inverse covariance matrix directly from the data instead?
Edit: Matrix data (same as the pastebin link)
[[ 0.46811097 0.15024959 0.01806486 -0.03029948 -0.12472314 -0.11952018 -0.14738093 -0.14655549 -0.06794621]
[ 0.15024959 0.19338707 0.09046136 0.01293189 -0.05290348 -0.07200769 -0.09317139 -0.10125269 -0.12769464]
[ 0.01806486 0.09046136 0.12575072 0.06507481 -0.00951239 -0.02944675 -0.05349869 -0.07496244 -0.13193147]
[-0.03029948 0.01293189 0.06507481 0.12214787 0.04527352 -0.01478612 -0.02879678 -0.06006481 -0.1114809 ]
[-0.12472314 -0.05290348 -0.00951239 0.04527352 0.164018 0.05474073 -0.01028871 -0.02695087 -0.03965366]
[-0.11952018 -0.07200769 -0.02944675 -0.01478612 0.05474073 0.13397166 0.06839442 0.00403321 -0.02537928]
[-0.14738093 -0.09317139 -0.05349869 -0.02879678 -0.01028871 0.06839442 0.14424203 0.0906558 0.02984426]
[-0.14655549 -0.10125269 -0.07496244 -0.06006481 -0.02695087 0.00403321 0.0906558 0.17054466 0.14455264]
[-0.06794621 -0.12769464 -0.13193147 -0.1114809 -0.03965366 -0.02537928 0.02984426 0.14455264 0.32968928]]
The matrix m you provide has a determinant that is numerically zero and is hence not invertible from a numerical point of view (this explains the huge values you get, which tend to blow up toward Inf):
In [218]: np.linalg.det(m)
Out[218]: 2.8479946613617788e-16
If you are going to do linear algebra operations and problem solving, I strongly advise reviewing some basic concepts, which will help you avoid numerical mistakes:
https://en.wikipedia.org/wiki/Invertible_matrix
You are faced with a very important and fundamental mathematical problem. If your method produces a non-invertible matrix, the method is in trouble: it is trying to solve an ill-posed problem. (Probably all the well-posed problems were solved in the 19th century.) The most common way to handle ill-posed problems is regularization. Sometimes the Moore-Penrose pseudoinverse is convenient; scipy.linalg has a pseudoinverse. But the pseudoinverse is not a shortcut: by using it you replace the unsolvable problem A with a solvable problem B. Sometimes the solution of problem B can successfully stand in for the non-existent solution of problem A, but whether it does is a matter of mathematical research.
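For illustration, a minimal pseudoinverse sketch. It uses a small rank-deficient stand-in rather than the 9x9 covariance matrix above, and it says nothing about whether the pseudoinverse is appropriate for your model:

import numpy as np
from scipy import linalg

# Rank-deficient stand-in: the second row is twice the first,
# so the determinant is 0 and a plain inverse would blow up.
m = np.array([[1.0, 2.0],
              [2.0, 4.0]])

m_pinv = linalg.pinv(m)  # Moore-Penrose pseudoinverse
print(m_pinv)
print(np.allclose(m.dot(m_pinv).dot(m), m))  # True: defining property of the pseudoinverse

np.linalg.pinv works the same way if you prefer to stay within numpy.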
Zero determinant means that your matrix has linearly dependent rows (or columns). In other words, some information in your model is redundant (it contains excessive or duplicate information). Re-develop your model in order to exclude redundancy.
Please provide a working example of solving an equation over a finite field with at least two variables. I can't seem to figure out how to do this from the documentation.
I have tried:
solve(Eq(poly(x + y,domain=FF(7)),0),x,y)
but this outputs
[]
which is incorrect and appears to be a type issue. Is there a way to get around this?
>>> solve(Poly(x + y,domain=FF(7)),[x,y])
[{x: -y}]
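Dropping the Eq(..., 0) wrapper and passing the symbols as a list is what makes this work. A self-contained version of the same call (the imports and symbol setup are my assumptions about the session):

from sympy import FF, Poly, solve, symbols

x, y = symbols('x y')
print(solve(Poly(x + y, domain=FF(7)), [x, y]))  # [{x: -y}]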
Which method does Pandas use for computing the variance of a Series?
For example, using Pandas (v0.14.1):
pandas.Series(numpy.repeat(500111,2000000)).var()
12.579462289731145
This is obviously due to some numerical instability. However, in R we get:
var(rep(500111,2000000))
0
I wasn't able to make enough sense of the Pandas source-code to figure out what algorithm it uses.
This link may be useful: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
Update: To summarize the comments below - if the Python bottleneck package for fast NumPy array functions is installed, a stabler two-pass algorithm similar to np.sqrt(((arr - arr.mean())**2).mean()) is used and gives 0.0 (as indicated by @Jeff); whereas if it is not installed, the naive implementation indicated by @BrenBarn is used.
The algorithm can be seen in nanops.py, in the function nanvar, the last line of which is:
return np.fabs((XX - X ** 2 / count) / d)
This is the "naive" implementation at the beginning of the Wikipedia article you mention. (d will be set to N-1 in the default case.)
The behavior you're seeing appears to be due to the sum of squared values overflowing the numpy datatypes. It's not an issue of how the variance is calculated per se.
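A quick way to see this (a sketch; the exact garbage value depends on platform integer width and the numpy/pandas versions, so treat the numbers as illustrative):

import numpy as np

arr = np.repeat(500111, 2000000)  # integer dtype, like the Series in the question
n = arr.size
d = n - 1

# Naive one-pass formula from nanvar. X ** 2 is about 1e24 here,
# which silently overflows int64 and produces garbage.
XX = (arr ** 2).sum()
X = arr.sum()
print(np.fabs((XX - X ** 2 / n) / d))  # nonzero garbage

# Two-pass approach: subtracting the mean first keeps the values tiny.
diff = arr - arr.mean()  # float64
print((diff ** 2).sum() / d)  # 0.0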
I don't know the answer, but it seems related to how Series are stored, not necessarily the var function.
np.var(pd.Series(repeat(100000000,100000)))
26848.788479999999
np.var(repeat(100000000,100000))
0.0
Using Pandas 0.11.0.
I have a vector of floats (coming from an operation on an array) and a float value (which is actually an element of the array, but that's unimportant), and I need to find the smallest float out of them all.
I'd love to be able to find the minimum of them all in one line, in a Pythonic way.
MinVec = N[i, :] + N[:, j]
Answer = min(min(MinVec), N[i, j])
Clearly I'm performing two minimisation calls, and I'd love to be able to replace this with one call. Perhaps I could eliminate the vector MinVec as well.
As an aside, this is for a short program in Dynamic Programming.
TIA.
EDIT: My apologies, I didn't specify I was using numpy. The variable N is an array.
You can append the value, then minimize. I'm not sure what the relative time considerations of the two approaches are, though - I wouldn't necessarily assume this is faster:
Answer = min(np.append(MinVec, N[i, j]))
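A small variation (my addition, not part of the answer above): np.append copies the whole array, so for a large MinVec it may be cheaper to take the array minimum directly and compare it with the scalar:

Answer = min(MinVec.min(), N[i, j])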
This is the same idea as the answer above but without numpy. Note that a plain list's append returns None, so it cannot be nested inside min(); list concatenation works instead:
Answer = min(list(MinVec) + [N[i, j]])
cost = 0
for i in range(12):
    cost = cost + math.pow(float(float(q[i]) - float(w[i])), 2)
cost = math.sqrt(cost)
Is there a faster alternative to this? I need to improve my entire code, so I am trying to improve the performance of each statement. Thanks.
In addition to the general optimization remarks already made (to which I subscribe), there is a more "optimized" way of doing what you want: you are manipulating arrays of values and combining them mathematically, which is a job for the very useful and widely used NumPy package!
Here is how you would do it:
import math
import numpy

q_array = numpy.array(q, dtype=float)
w_array = numpy.array(w, dtype=float)
cost = math.sqrt(((q_array - w_array) ** 2).sum())
(If your arrays q and w already contain floats, you can remove the dtype=float.)
This is almost as fast as it can get, since NumPy's operations are optimized for arrays. It is also much more legible than a loop, because it is both simple and short.
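Equivalently (my addition, not part of the original answer), the quantity being computed is just the Euclidean norm of the difference vector, which numpy provides directly:

cost = numpy.linalg.norm(q_array - w_array)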
Just a hint: real performance improvements usually come when you evaluate the code at the function level or even higher.
During a good evaluation, you may find whole blocks of code that can be thrown away or rewritten to simplify the process.
Profilers are useful AFTER you've cleaned up crufty, not-very-legible code. Irrespective of whether it's to be run once or N zillion times, you should not write code like that.
Why are you doing float(q[i]) and float(w[i])? What type(s) are the elements of q and w?
If x and y are floats, then x - y will be a float too, so that's 3 apparently redundant occurrences of float() already.
Calling math.pow() instead of using the ** operator bears the overhead of lookups on 'math' and 'pow'.
Etc etc
See if the following code gives the same answers and reads better and is faster:
costsq = 0.0
for i in xrange(12):
    costsq += (q[i] - w[i]) ** 2
cost = math.sqrt(costsq)
After you've tested that and understood why the changes were made, you can apply the lessons to other Python code. Then if you have a lot more array or matrix work to do, consider using numpy.
Assuming q and w contain numbers, the conversions to float are not necessary; otherwise, you should convert the lists to a usable representation earlier (and separately from your calculation).
Given that your function seems to only be doing the equivalent of this:
cost = sum((qi - wi) ** 2 for qi, wi in zip(q[:12], w)) ** 0.5
Perhaps this form would execute faster.
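For a quick sanity check (hypothetical 12-element inputs, just to show the expression in use):

q = [float(i) for i in range(12)]
w = [float(i + 1) for i in range(12)]
cost = sum((qi - wi) ** 2 for qi, wi in zip(q[:12], w)) ** 0.5
print(cost)  # sqrt(12), about 3.4641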