I am calculating a trend line slope using numpy:
xs = []
ys = []
my_x = 0
for i in range(2000):
    my_x += 1
    ys.append(5*my_x+random.rand())
    xs.append(my_x)
A = matrix(xs).T;
b = matrix(ys).T;
N = A.T*A
U = A.T*b
print N,U
a = (N.I*U)[0,0]
print a
The result I get is a=-8.2053307679 instead of the expected 5. It probably happens because the number in variable N is too big.
How can I overcome this problem? Any help will be appreciated.
When I run the code, the answer is as you would expect:
[[2668667000]] [[ 1.33443472e+10]]
5.00037927592
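It's probably due to the fact that you're on a 32-bit system, and I'm on a 64-bit system. On your machine the default integer type is most likely 32 bits wide, and N = 2668667000 is larger than the largest signed 32-bit integer (2147483647), so the sum overflows. Instead, you can use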
A = matrix(xs, dtype='float64').T;
b = matrix(ys, dtype='float64').T;
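To see the effect on any machine, here is a minimal sketch that forces 32-bit integers with astype. The variable names (xs32, n_int32, and so on) are mine and not from the question, and I am assuming the asker's default integer was int32.

```python
import numpy as np

xs = np.arange(1, 2001)

# Simulate a platform whose default integer is 32 bits wide (an assumption).
xs32 = xs.astype(np.int32)
n_int32 = np.dot(xs32, xs32)      # result is stored as int32, so it wraps around

# Same sum of squares in double precision.
xs64 = xs.astype(np.float64)
n_float64 = np.dot(xs64, xs64)

print(n_int32, n_float64)
```

The float64 result is the 2668667000 shown above, while the int32 result is a wrapped value that no longer represents the sum. Dividing U by such a wrapped N would explain a slope that is far from 5.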
Just FYI, when using numpy you'll be much more efficient if you work on vectorizing your algorithms. For example, you could replace the first several lines with this:
xs = np.arange(2000)
ys = 5 * xs + np.random.rand(2000)
Edit: one more thing. Numerically, it is a bad idea to explicitly invert matrices when doing computations like these. It would be better to use something like a = np.linalg.solve(N, U)[0, 0] in your algorithm. It won't make a big difference here, but if you move to more complicated problems it definitely will! For some discussion of this, take a look at this article.
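Putting the pieces together, a sketch with plain arrays instead of matrix might look like this. The seed and the names a_solve and a_lstsq are only for illustration, and np.linalg.lstsq is an alternative I am adding, not something the question used.

```python
import numpy as np

rng = np.random.default_rng(0)               # seed chosen only for repeatability
xs = np.arange(1, 2001, dtype=np.float64)    # float64 avoids the integer overflow
ys = 5 * xs + rng.random(xs.size)

A = xs.reshape(-1, 1)                        # one-column design matrix (no intercept)
N = A.T @ A
U = A.T @ ys

# Solve the normal equations without forming an explicit inverse.
a_solve = np.linalg.solve(N, U)[0]

# Or let NumPy do the least-squares fit directly on A.
a_lstsq = np.linalg.lstsq(A, ys, rcond=None)[0][0]

print(a_solve, a_lstsq)
```

Both values should come out close to 5. They sit slightly above it because the noise has a positive mean and the model has no intercept term.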
:) The problem was solved by using:
A = matrix(xs,float64).T;
b = matrix(ys,float64).T;
Related
I believe that my problem is really straightforward and there must be a really easy way to solve this issue. However, as I am quite new to Python, I could not sort it out on my own.
I will post a made-up example rather than the complex script I am currently working on, in case you want to test it yourself. Please consider the following:
import numpy as np
nData = 100
sigma_alpha = np.array([1,1])
alpha = [-23,0]
data_alpha1 = np.random.randn(nData)*sigma_alpha[0]+alpha[0]
data_alpha2 = np.random.randn(nData)*sigma_alpha[1]+alpha[1]
My issue is that I have to limit data_alpha1 and data_alpha2 to -25 as the lower limit and 25 as the upper limit. That means all the elements of both arrays have to lie between those values. The solution I am looking for also has to handle a case like the following, where multiple values will be beyond 25:
nData = 100
sigma_alpha = np.array([1,1])
alpha = [25,0]
data_alpha1 = np.random.randn(nData)*sigma_alpha[0]+alpha[0]
data_alpha2 = np.random.randn(nData)*sigma_alpha[1]+alpha[1]
The variable alpha is in a loop, so it has a dynamic value and is constantly being updated.
To sum up: what I have been trying to figure out is a way to make sure that data_alpha1 and data_alpha2 contain only values between -25 and 25. If any value violates this condition, it should be set to the boundary value it exceeds. For example, if an element of data_alpha1 is < -25, it should be replaced by -25.
Hope that I managed to be succinct and precise. I would really appreciate your help on this one!
Like this:
data_alpha1[data_alpha1 > 25] = 25
data_alpha1[data_alpha1 < -25] = -25
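The same thing can be done in one call with np.clip. A small sketch, using made-up data centred on the upper limit so that many values exceed it (the name clipped is mine):

```python
import numpy as np

rng = np.random.default_rng(0)                  # seed only for repeatability
data_alpha1 = rng.standard_normal(100) * 1 + 25

# Returns a new array with every value limited to [-25, 25].
clipped = np.clip(data_alpha1, -25, 25)

# Or clip in place, overwriting the original array.
np.clip(data_alpha1, -25, 25, out=data_alpha1)
```

Since alpha changes inside a loop, the in-place form may be convenient because it avoids allocating a new array on every iteration.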
I'm trying to write a program that will allow me to solve a system of equations using numpy, however, I want the solution to be non-trivial (not all zeros). Obviously the program is just going to set everything to 0, and boom, problem solved. I attempted to use a while loop (like below), but quickly found out it's going to continue to spit 0 back at me. I don't care if I end up using numpy, I'm open to other solutions if it's more elegant.
I actually haven't solved this particular set by hand, maybe the trivial solution is the only solution. If so, the principle still applies. Numpy seems to always spit 0 back.
Any help would be appreciated! Thanks.
x1 = .5
x2 = .3
x3 = .2
x4 = .05
a = np.array([[x1,x2],[x3,x4]])
b = np.array([0,0])
ans = np.linalg.solve(a,b)
while ans[0] == 0 and ans[1] == 0:
    print ("got here")
    ans = np.linalg.solve(a,b)
print(ans)
In your case, the matrix a is invertible. Therefore your system of linear equations has only one solution and the solution is [0, 0]. Are you wondering why you only get that unique solution?
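For the matrix in the question the determinant is .5*.05 - .3*.2 = -0.035, which is not zero, so only the trivial solution exists. Non-trivial solutions of a x = 0 only appear when a is singular. As a rough sketch, here is one way to find them with the SVD for a square matrix. The matrix below is a hypothetical singular example and the tolerance is an arbitrary choice of mine.

```python
import numpy as np

# Hypothetical singular matrix: the second row is twice the first.
a = np.array([[0.5, 0.3],
              [1.0, 0.6]])

_, s, vh = np.linalg.svd(a)
tol = 1e-12 * s.max()          # illustrative tolerance for "zero"

# Rows of vh whose singular value is (numerically) zero span the null space.
null_vectors = vh[s <= tol]

v = null_vectors[0]
print(v, a @ v)                # a @ v should be approximately [0, 0]
```

Any multiple of v is also a solution, which is why np.linalg.solve cannot pick one for you. It requires an invertible matrix and will not return a null-space vector.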
Check out Sympy and its use of solve and matrix calculations. Here are the pages for both.
http://docs.sympy.org/latest/tutorial/matrices.html
http://docs.sympy.org/latest/tutorial/solvers.html
So, in my previous question wflynny gave me a really neat solution (Surface where height is a function of two functions, and a sum over the third). I've got that part working for my simple version, but now I'm trying to improve on this.
Consider the following lambda function:
x = np.arange(0,100, 0.1)
y = np.sin(x)
f = lambda xx: (xx-y[x==xx])**2
values = f(x)
Now, in this scenario it works. In fact, the [x==xx] is trivial in the example. However, the example can be extended:
x = np.arange(0,100, 0.1)
z = np.sin(x)
f = lambda xx, yy: ( (xx-z[x==xx])**2 + yy**2)**0.5
y = np.arange(0,100,0.1)
xgrid, ygrid = np.meshgrid(x,y)
values = f(xgrid,ygrid)
In this case, the error ValueError: boolean index array should have 1 dimension is generated. This is because z.shape is different from xgrid.shape, I think.
Note that here, z = np.sin(x) is a simplification. In practice z is not computed from a function; it is an array of arbitrary values, so we really need to go to that array to retrieve them.
I do not know what the proper way to implement this is. I am going to try some things, but I hope that somebody here will give me hints or provide me with the proper way to do this in Python.
EDIT: I originally thought I had solved it by using the following:
retrieve = lambda pp: map(lambda pp: dataArray[pp==phiArray][0], phi)
However, this merely returns the dataArray. Suppose dataArray contains a number of 'maximum' values for the polar radius. Then, you would normally incorporate this by saying something like g = lambda xx, yy: f(xx,yy) * Heaviside( dataArray - radius(xx,yy)). Then g would properly be zero if the radius is too large.
However, this doesn't work. I'm not fully sure but the behaviour seems to be something like taking a single value of dataArray instead of the entire array.
Thanks!
EDIT: Sadly, this stuff has to work and I can't spend more time on making it nice. Therefore, I've opted for the dirty implementation. The actual thing I was interested in would be of the sort as the g = lambda xx, yy written above, so I can implement that directly (dirty) instead of nicely (without nested for loops).
def envelope(xx, yy):
    value = xx * 0.
    for i in range(0, N):  # N is defined somewhere, and xx.shape = (N,N)
        for j in range(0, N):
            if dataArray[x == xx[i,j]][0] > radius(xx[i,j], yy[i,j]):
                value[i,j] = 1.
            else:
                value[i,j] = 0.
    return value
A last resort, but it works. And, sometimes results matter over writing good code, especially when there's a deadline coming up (and you are the only one that cares about good code).
I would still be very much interested in learning how to do this properly, if there is a proper way, and thus increase my fluency in clean Python.
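One possible way to do the lookup without loops is to turn each grid value into an index into x and then index the data array with that. This is only a sketch. It assumes x is sorted and that every grid value occurs exactly in x (true for a meshgrid built from x itself), and the names idx and z_on_grid are mine.

```python
import numpy as np

x = np.arange(0, 100, 0.1)
z = np.sin(x)                       # stand-in for an array of arbitrary values
y = np.arange(0, 100, 0.1)
xgrid, ygrid = np.meshgrid(x, y)

# For each grid value, find its position in the sorted array x.
idx = np.searchsorted(x, xgrid)     # integer array with the same shape as xgrid

# Integer-array indexing returns values of z arranged in the grid's shape.
z_on_grid = z[idx]

values = np.sqrt((xgrid - z_on_grid) ** 2 + ygrid ** 2)
```

The same idx could be used on any other per-x array, for example a comparison of the dataArray[idx] > radius(xgrid, ygrid) kind, which would then replace the nested loops in envelope.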
A question from a complete Python novice.
I have a column array where I need to force certain values to zero depending on a conditional statement applied to another array. I have found two solutions, which both provide the correct answer. But they are both quite time consuming for the larger arrays I typically need (>1E6 elements) - also I suspect that it is poor programming technique. The two versions are:
from numpy import zeros,abs,multiply,array,reshape
def testA(y, f, FC1, FC2):
    c = zeros((len(f),1))
    for n in xrange(len(f)):
        if abs(f[n,0]) >= FC1 and abs(f[n,0]) <= FC2:
            c[n,0] = 1.
    w = multiply(c,y)
    return w
def testB(y, f, FC1, FC2):
    z = [(abs(f[n,0])>=FC1 and abs(f[n,0])<=FC2) for n in xrange(len(f))]
    z = multiply(array(z,dtype=float).reshape(len(f),1), y)
    return z
The input arrays are column arrays as this matches the post processing to be done. The test can be done like:
>>> from numpy import arange
>>> from numpy.random import normal as randn
>>> fs, N = 1.E3, 2**22
>>> f = fs/N*arange(N).reshape((N,1))
>>> x = randn(size=(N,1))
>>> w1 = testA(x,f,200.,550.)
>>> z1 = testB(x,f,200.,550.)
On my laptop testA takes 18.7 seconds and testB takes 19.3 - both for N=2**22. In testB I also tried to include "z = [None]*len(f)" to preallocate as suggested in another thread but this doesn't really make any difference.
I have two questions, which I hope to have the same answer:
What is the "correct" Python solution to this problem?
Is there anything I can do to get the answer faster?
I have deliberately not spent any time on compiled Python, for example; I wanted to have some working code first, and hopefully also something that is good Python style. I hope to be able to get the execution time for N=2**22 below two seconds or so. This particular operation will be used many times, so the execution time does matter.
I apologize in advance if the question is stupid - I haven't been able to find an answer in the overwhelming amount of not always easily accessible Python documentation or in another thread.
Use a bool array to access elements in array y:
def testC(y, f, FC1, FC2):
    f2 = abs(f)
    idx = (f2>=FC1) & (f2<=FC2)
    y[~idx] = 0  # note: this modifies y in place
    return y
All of these are slower than HYRY's solution by a large factor:
How about
( x[1] if FC1<=abs(x[0])<=FC2 else 0 for x in itertools.izip(f,x) )
If you need to do random access (very slow)
[ x[1] if FC1<=abs(x[0])<=FC2 else 0 for x in itertools.izip(f,x) ]
or you can also use map
map(lambda x: x[1] if FC1<=abs(x[0])<=FC2 else 0 , itertools.izip(f,x))
or using vectorize (faster than A and B but much much slower than C)
b1v = np.vectorize(lambda a,b: a if 200<=abs(b)<=550 else 0)
b1 = b1v(x,f)  # a is the data value, b is the frequency it is tested against
What's the best (fastest) way to do this?
This generates what I believe is the correct answer, but obviously at N = 10e6 it is painfully slow. I think I need to keep the Xi values so I can correctly calculate the standard deviation, but are there any techniques to make this run faster?
def randomInterval(a,b):
    r = ((b-a)*float(random.random(1)) + a)
    return r
N = 10e6
Sum = 0
x = []
for sample in range(0,int(N)):
    n = randomInterval(-5.,5.)
    while n == 5.0:
        n = randomInterval(-5.,5.) # since X is [-5,5)
    Sum += n
    x = np.append(x, n)
A = Sum/N
summation = 0.0
for sample in range(0,int(N)):
    summation += (x[sample] - A)**2.0  # accumulate the squared deviations
standard_deviation = np.sqrt((1./N)*summation)
You made a decent attempt, but since this is homework, make sure you understand this rather than copying it verbatim:
import numpy as np
N = int(1e6)
a = np.random.uniform(-5,5,size=(N,))
standard_deviation = np.std(a)
This assumes you can use a package like numpy (you tagged it as such). If you can, there are a whole host of methods that allow you to create and do operations on arrays of data, thus avoiding explicit looping (it's done under the hood in an efficient manner). It would be good to take a look at the documentation to see what features are available and how to use them:
http://docs.scipy.org/doc/numpy/reference/index.html
Using the formulas found on this wiki page for Variance, you could compute it in one loop without storing a list of the random numbers (assuming you didn't need them elsewhere).
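As an illustration of the one-loop idea, here is a sketch using Welford's online update. I am assuming that is the kind of formula the wiki page means. The function name, the seed and the sample count are mine, and it uses the standard library's random module rather than numpy.

```python
import random

def running_mean_std(n_samples, low=-5.0, high=5.0, seed=None):
    """One pass over uniform samples, keeping only running statistics."""
    rng = random.Random(seed)
    mean = 0.0
    m2 = 0.0                       # running sum of squared deviations from the mean
    for count in range(1, n_samples + 1):
        value = rng.uniform(low, high)
        delta = value - mean
        mean += delta / count      # update the running mean
        m2 += delta * (value - mean)
    # Population standard deviation (divide by N, as in the question).
    return mean, (m2 / n_samples) ** 0.5

mean, std = running_mean_std(100_000, seed=0)
print(mean, std)
```

For a uniform distribution on an interval of width 10 the standard deviation should approach 10/sqrt(12), roughly 2.89, which gives a quick sanity check. A pure Python loop like this will still be much slower than np.std on an array, so it mainly helps when memory is the constraint.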