Writing a double sum in Python - python

I am new to StackOverflow, and I am extremely new to Python.
My problem is this... I am needing to write a double-sum, as follows:
The motivation is that this is the angular correction to the gravitational potential used for the geoid.
I am having difficulty writing the sums. And please, before you say "Go to such-and-such a resource," or get impatient with me, this is the first time I have ever done coding/programming/whatever this is.
Is this a good place to use a "for" loop?
I have data for the two indices (n,m) and for the coefficients c_{nm} and s_{nm} in a .txt file. Each of those items is a column. When I say usecols, do I number them 0 through 3, or 1 through 4?
(the equation above)
\begin{equation}
V(r, \phi, \lambda) = \sum_{n=2}^{360}\left(\frac{a}{r}\right)^{n}\sum_{m=0}^{n}\left[c_{nm}*\cos{(m\lambda)} + s_{nm}*\sin{(m\lambda)}\right]*\sqrt{\frac{(n-m)!}{(n+m)!}(2n + 1)(2 - \delta_{m0})}P_{nm}(\sin{\lambda})
\end{equation}

(2) Yes, a "for" loop is fine. As @jpmc26 notes, a generator expression is a good alternative to a "for" loop. IMO, you'll want to use numpy if efficiency is important to you.
(3) As @askewchan notes, "usecols" refers to an argument of genfromtxt; as specified in that documentation, column indexes start at 0, so you'll want to use 0 to 3.
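To make the column numbering concrete, here is a minimal sketch. The four-column layout (n, m, c_nm, s_nm) follows the question; the sample values and the in-memory stand-in for the .txt file are made up for illustration.

```python
import io
import numpy as np

# Stand-in for the .txt file: columns are n, m, c_nm, s_nm (values are made up).
sample = io.StringIO(
    "2 0 -0.5 0.0\n"
    "2 1 0.1 0.2\n"
    "2 2 0.3 -0.4\n"
)

# usecols counts from 0, so the four columns are 0, 1, 2 and 3.
n, m, c, s = np.genfromtxt(sample, usecols=(0, 1, 2, 3), unpack=True)

print(n)  # first column (the n index)
print(s)  # fourth column (the s_nm coefficients)
```

With a real file you would pass the file name instead of the StringIO object.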
A naive implementation might be okay since the larger factorial is the denominator, but I wouldn't be surprised if you run into numerical issues. Here's something to get you started. Note that you'll need to define P() and a. I don't understand how "0 through 3" relates to c and s since their indexes range much further. I'm going to assume that each (and delta) has its own file of values.
import math
import numpy
# Assumes one file per quantity, as stated above; P() and a must be defined elsewhere.
c = numpy.genfromtxt("the_c_file.txt")
s = numpy.genfromtxt("the_s_file.txt")
delta = numpy.genfromtxt("the_delta_file.txt")
def V(r, phi, lam):
    ret = 0
    for n in range(2, 361):
        for m in range(0, n + 1):
            inner = c[n,m]*math.cos(m*lam) + s[n,m]*math.sin(m*lam)
            # Needs true division (Python 3, or "from __future__ import division");
            # integer division of the factorials would give 0.
            inner *= math.sqrt(math.factorial(n-m)/math.factorial(n+m)*(2*n+1)*(2-delta[m,0]))
            inner *= P(n, m, math.sin(lam))
            ret += math.pow(a/r, n) * inner
    return ret
Make sure to write unit tests to check the math. Note that "lambda" is a reserved word in Python, which is why the code above uses "lam".
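As a variation on the code above, here is a sketch that keeps the factorial ratio in floating point by going through math.lgamma, and treats delta_{m0} as the Kronecker delta instead of reading it from a file. It follows the formula exactly as written in the question (P evaluated at sin(lambda)). The names norm_factor, potential, coeffs and legendre are made up for illustration, and the Legendre function is assumed to be supplied by the caller.

```python
import math

def norm_factor(n, m):
    """sqrt((n-m)!/(n+m)! * (2n+1) * (2 - delta_m0)), using log-gamma for the factorials."""
    log_ratio = math.lgamma(n - m + 1) - math.lgamma(n + m + 1)
    delta_m0 = 1 if m == 0 else 0
    return math.sqrt((2 * n + 1) * (2 - delta_m0)) * math.exp(0.5 * log_ratio)

def potential(r, lam, a, coeffs, legendre, n_max=360):
    """Double sum over n and m.

    coeffs maps (n, m) -> (c_nm, s_nm); legendre(n, m, x) is a caller-supplied
    stand-in for P_nm(x).
    """
    total = 0.0
    for n in range(2, n_max + 1):
        inner = 0.0
        for m in range(n + 1):
            c_nm, s_nm = coeffs[(n, m)]
            term = c_nm * math.cos(m * lam) + s_nm * math.sin(m * lam)
            inner += term * norm_factor(n, m) * legendre(n, m, math.sin(lam))
        total += (a / r) ** n * inner
    return total

# Tiny made-up example: degree 2 only, with a dummy Legendre function.
coeffs = {(2, 0): (1.0, 0.0), (2, 1): (0.0, 0.0), (2, 2): (0.0, 0.0)}
print(potential(1.0, 0.0, 1.0, coeffs, lambda n, m, x: 1.0, n_max=2))
```

For the highest degrees the normalization factor is still extremely small and can underflow to zero, so this only illustrates the structure of the double sum, not a numerically robust geoid computation.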

Related

Speed Up Program Below

I have written this for loop program below where I go through element by element of an array and do some math to those elements. Once the math is calculated it gets stored into another array.
for i in range(0, 1024):
    x[i] = a * data[i] + b * x[i-1] + c * x[i-2]
So in my program a, b, and c are just scalar numbers. Data and x are arrays. Data has an array size 1024 filled with numbers in each element. X is also an array size 1024 but it's filled with all zeros initially. In order to calculate the new elements of x I use the previous two elements of x. Initially the previous two are 0 and 0 since it takes the last two element from the x array of zeros. I multiply the current element of data by a, the last element of x by b, and the second to last element of x by c. Then I add everything up and save it to the current element of x. Then I do the same thing for every element in data and x.
This loop works, but I was wondering if there is a faster way to do it. Maybe using a combination of numpy functions like cumsum or the dot product? Can someone help me make the program faster? Thank you!
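The best you can do while keeping the recursive method:
import numpy as np
x = a * data
coef = np.array([c, b])
x[1] += b * x[0]  # x[1] has only one predecessor, so handle it before the loop
for i in range(2, 1024):
    x[i] += np.dot(coef, x[i-2:i])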
But even better, you can solve this recurrence equation to a closed form solution and apply directly without loop. (This is a basic 2nd order linear equation)
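A loop-free option in practice is to treat the recurrence as a linear filter. This is not the closed form mentioned above; it swaps in scipy.signal.lfilter, which evaluates the same recurrence in compiled code. The sketch below assumes SciPy is installed, and the function names and sample coefficients are made up for illustration.

```python
import numpy as np
from scipy.signal import lfilter

def recurrence_loop(data, a, b, c):
    """Reference: x[i] = a*data[i] + b*x[i-1] + c*x[i-2], starting from zeros."""
    x = np.zeros(len(data))
    for i in range(len(data)):
        prev1 = x[i - 1] if i >= 1 else 0.0
        prev2 = x[i - 2] if i >= 2 else 0.0
        x[i] = a * data[i] + b * prev1 + c * prev2
    return x

def recurrence_lfilter(data, a, b, c):
    """Same recurrence written as an IIR filter with zero initial state."""
    # lfilter solves y[n] = a*x[n] - (-b)*y[n-1] - (-c)*y[n-2]
    return lfilter([a], [1.0, -b, -c], data)

rng = np.random.default_rng(0)
data = rng.standard_normal(1024)
same = np.allclose(recurrence_loop(data, 0.5, 0.3, -0.2),
                   recurrence_lfilter(data, 0.5, 0.3, -0.2))
print(same)
```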
In general, if you want a program that is fast, Python is not the best option. Python is great for prototyping since it is easy and has a lot of tools, but it is not very computationally efficient in its raw form compared to, for example, C. What I usually do is use Cython, which compiles Python-like code to C and then to machine code, and which can greatly increase the speed of the application.
It lets you declare the types of your variables, for example:
cdef double a, b, c
When you use a variable in Python, its type (int, float, string, etc.) has to be checked every single time it is used. In C that is not an issue, since you have to decide from the start what type the variable has, which reduces the time each operation takes.
I would try to turn the for loop into a list comprehension, which is often faster in Python, although here each element depends on the previous two, so a comprehension does not map onto this loop directly.

How can I get my function to add together its output?

So this is my line of code so far,
def Adder (i,j,k):
    if i<=j:
        for x in range (i, j+1):
            print(x**k)
    else:
        print (0)
What it's supposed to do is get inputs (i,j,k) so that each number between [i,j] is multiplied the power of k. For example, Adder(3,6,2) would be 3^2 + 4^2 + 5^2 + 6^2 and eventually output 86. I know how to get the function to output the list of numbers between i and j to the power of K but I don't know how to make it so that the function sums that output. So in the case of my given example, my output would be 9, 16, 25, 36.
Is it possible to make it so that under my if conditional I can generate an output that adds up the numbers in the range after they've been taken to the power of K?
If anyone can give me some advice I would really appreciate it! First week of any coding ever and I don't quite know how to ask this question so sorry for vagueness!
Question now Answered, thanks to everyone who responded so quickly!
You could use built-in function sum()
def adder(i,j,k):
    if i <= j:
        print(sum(x**k for x in range(i,j+1)))
    else:
        print(0)
The documentation is here
I'm not sure if this is what you want but
if i<=j:
    sum = 0
    for x in range (i, j+1):
        sum = sum + x**k #sum += x**k for simplicity
this will give you the sum of the powers
Looking at a few of the answers posted, they do a good job of giving you pythonic code for your solution, I thought I could answer your specific questions:
How can I get my function to add together its output?
A perhaps reasonable way is to iteratively and incrementally perform your calculations and store your interim solutions in a variable. See if you can visualize this:
Let's say (i,j,k) = (3,7,2)
We want the output to be: 135 (i.e., the result of the calculation 3^2 + 4^2 + 5^2 + 6^2 + 7^2)
Use a variable, call it result and initialize it to be zero.
As your for loop kicks off with x = 3, perform x^2 and add it to result. So result now stores the interim result 9. Now the loop moves on to x = 4. Same as the first iteration, perform x^2 and add it to result. Now result is 25. You can now imagine that result, by the time x = 7, contains the answer to the calculation 3^2+4^2+5^2+6^2. Let the loop finish, and you will find that 7^2 is also added to result.
Once loop is finished, print result to get the summed up answer.
A thing to note:
Consider where in your code you need to set and initialize the _result_ variable.
If anyone can give me some advice I would really appreciate it! First week of any coding ever and I don't quite know how to ask this question so sorry for vagueness!
Perhaps a bit advanced for you, but helpful to be made aware I think:
Alright, let's get some nuance added to this discussion. Since this is your first week, I wanted to jot down some things I had to learn which have helped greatly.
Iterative and Recursive Algorithms
First off, identify that the solution is an iterative type of algorithm. Where the actual calculation is the same, but is executed over different cumulative data.
In this example, if we were to represent the calculation as an operation called ADDER(i,j,k), then:
ADDER(3,7,2) = ADDER(3,6,2)+ 7^2
ADDER(3,6,2) = ADDER(3,5,2) + 6^2
ADDER(3,5,2) = ADDER(3,4,2) + 5^2
ADDER(3,4,2) = ADDER(3,3,2) + 4^2
ADDER(3,3,2) = 0 + 3^2
Problems like these can be solved iteratively (like using a loop, be it while or for) or recursively (where a function calls itself using a subset of the data). In your example, you can envision a function calling itself and each time it is called it does the following:
(1) calculates the square of j and
(2) adds it to the value returned from calling itself with j decremented by 1 until
(3) j < i, at which point it returns 0
Once the limiting condition (Point 3) is reached, a bunch of additions that were queued up along the way are triggered.
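The recursive description above can be sketched as follows. The name adder_recursive is made up for illustration, and the exponent k is used in place of the fixed square.

```python
def adder_recursive(i, j, k):
    """Sum of x**k for x in [i, j], computed by recursion on j."""
    if j < i:  # limiting condition: nothing left to add
        return 0
    # The addition is queued until the recursive call returns.
    return adder_recursive(i, j - 1, k) + j ** k

print(adder_recursive(3, 7, 2))  # 135
print(adder_recursive(3, 6, 2))  # 86
```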
Learn to Speak The Language before using Idioms
I may get down-voted for this, but you will encounter a lot of advice displaying pythonic idioms for standard solutions. The idiomatic solution for your example would be as follows:
def adder(i,j,k):
    return sum(x**k for x in range(i,j+1)) if i<=j else 0
But for a beginner this obscures a lot of the "science". It is far more rewarding to tread the simpler path as a beginner. Once you develop your own basic understanding of devising and implementing algorithms in python, then the idioms will make sense.
Just so you can lean into the above idiom, here's an explanation of what it does:
It calls the built-in function sum, which can operate over a list as well as an iterator. We feed it a generator expression as its argument, which does the job of the iterator by "drip feeding" the sum function with x^k values as it iterates over range(i, j+1). When N (which is j-i) is very large, building a full list first can cost a lot of memory and hurt performance. A generator expression avoids this, because it produces each value only when it is needed and never holds the whole sequence in memory.
Of course it only does all this if i <= j else it will return 0.
Lastly, make mistakes and ask questions. The community is great and very helpful
Well, do not use print. It is easy to modify your function like this,
def adder(i, j, k):
    if i<=j:
        s = 0
        for x in range (i, j+1):
            s += x**k
        return s # print(s) if you really want to
    else:
        return 0
Usually functions do not print anything. Instead they return values for their caller to either print or further process. For example, someone may want to find the value of Adder(3, 6, 2)+1, but if you return nothing, they have no way to do this, since the result is not passed to the program. A side note, do not capitalize functions. Those are for classes.

Optimizing a nested for loop

I'm trying to avoid using for loops to run my calculations, but I don't know how to do it. I have a matrix w with shape (40,100). Each row holds the position of a wave at a time t. For example, the first row w[0] is the initial condition (as is w[1], for reasons that I will show).
To calculate the next line elements I use, for every t and x on shape range:
w[t+1,x] = a * w[t,x] + b * ( w[t,x-1] + w[t,x+1] ) - w[t-1,x]
Where a and b are some constants based on equation solution (it really doesn't matter), a = 2(1-r), b=r, r=(c*(dt/dx))**2. Where c is the wave speed and dt, dx are related to the increment on x and t direction.
Is there any way to avoid a for loop like:
for t in range(1,nt-1):
    for x in range(1,nx-1):
        w[t+1,x] = a * w[t,x] + b * ( w[t,x-1] + w[t,x+1] ) - w[t-1,x]
nt and nx are the shape of w matrix.
I assume you're setting w[:,0] and w[:,-1] beforehand (to some constants?) because I don't see it in the loop.
If so, you can eliminate the for x loop by vectorizing this part of the code:
for t in range(1,nt-1):
    w[t+1,1:-1] = a*w[t,1:-1] + b*(w[t,:-2] + w[t,2:]) - w[t-1,1:-1]
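A quick way to convince yourself that the sliced version matches the nested loops is to run both on the same starting matrix. The sketch below does that; the value of r, the sine-shaped initial condition and the function names are made up for illustration.

```python
import numpy as np

def step_loops(w, a, b):
    """Original nested-loop update, applied in place."""
    nt, nx = w.shape
    for t in range(1, nt - 1):
        for x in range(1, nx - 1):
            w[t + 1, x] = a * w[t, x] + b * (w[t, x - 1] + w[t, x + 1]) - w[t - 1, x]
    return w

def step_sliced(w, a, b):
    """Same update with the x loop replaced by slices."""
    nt, _ = w.shape
    for t in range(1, nt - 1):
        w[t + 1, 1:-1] = a * w[t, 1:-1] + b * (w[t, :-2] + w[t, 2:]) - w[t - 1, 1:-1]
    return w

r = 0.25                      # made-up value of (c*dt/dx)**2
a, b = 2 * (1 - r), r
w0 = np.zeros((40, 100))
w0[0] = w0[1] = np.sin(np.linspace(0, np.pi, 100))  # rows 0 and 1 hold the initial condition

same = np.allclose(step_loops(w0.copy(), a, b), step_sliced(w0.copy(), a, b))
print(same)
```

The loop over t has to stay, because each row depends on the two rows before it.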
Not really. If you want to do something for every element in your matrix (which you do), you're going to have to operate on each element in some way or another (most obvious way is with a for loop. Less obvious methods will either perform the same or worse).
If you're trying to avoid loops because loops are slow, know that sometimes loops are necessary to solve a certain kind of problem. However, there are lots of ways to make loops more efficient.
Generally with matrix problems like this where you're looking at the neighboring elements, a good solution is using some kind of dynamic programming or memoization (saving your work so you don't have to repeat calculations frequently). Like, suppose for each element you wanted to take the average of it and all the things around it (this is how blurring images works). Each pixel has 8 neighbors, so the average will be the sum / 9. Well, let's say you save the sums of the columns (save NW + W + SW, N + me + S, NE + E + SE). Well when you go to the next one to the right, just sum the values of your previous middle column, your previous last column, and the values of a new column (the new ones to the right). You just replaced adding 9 numbers with adding 5. In operations that are more complicated than addition, reducing 9 to 5 can mean a huge performance increase.
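Here is a small sketch of the column-sum idea for a 3x3 neighbourhood sum. It is a variant of what is described above: each 3-pixel column sum is computed once per row and then reused by the three windows that contain it. The function names and the test image are made up for illustration.

```python
def box_sums_naive(img):
    """3x3 neighbourhood sums for interior pixels, adding 9 values each time."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            out[y][x] = sum(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return out

def box_sums_columns(img):
    """Same sums, but every vertical 3-pixel column sum is saved and reused."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        # Saved work: one sum per column of the current 3-row band.
        col = [img[y - 1][x] + img[y][x] + img[y + 1][x] for x in range(cols)]
        for x in range(1, cols - 1):
            out[y][x] = col[x - 1] + col[x] + col[x + 1]
    return out

img = [[(3 * y + 5 * x) % 7 for x in range(6)] for y in range(5)]
print(box_sums_naive(img) == box_sums_columns(img))
```

Dividing each sum by 9 gives the average used for blurring.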
I looked at what you have to do and I couldn't think of a good way to do something like I just described. But see if you can think of something similar.
Also, remember multiplication is much more expensive than addition. So if you had a loop where, for instance, you had to multiply some number by the loop variable, instead of doing 1x, 2x, 3x, ..., you could do (value last time + x).

Is there a way to use this MACRO-based language to calculate Fibonacci iteratively?

I tried to work out a Python-like language combined with a MACRO feature (weird, but just for fun). For example, the code to calculate the Fibonacci sequence analytically looks like this:
from math import *
def analytic_fibonacci(n):
    sqrt_5 = sqrt(5);
    p = (1 + sqrt_5) / 2;
    q = 1/p;
    return int( (p**n + q**n) / sqrt_5 + 0.5 )
print analytic_fibonacci(10),
And I can rewrite it in the python-like-with-MACRO language like this:
from math import sqrt
sqrt
def analytic_fibonacci(n):
    _2(5)
    (1+_1)/2
    1/_1
    return int((_2**n+_1**n)/_3+0.5)
print analytic_fibonacci(10)
The idea is to use line numbers to expand expressions so that no explicit assignment is needed. _2 means "replace this with the expression that appears 2 lines above the current line", so the _2 in the 4th line becomes the expression in the 2nd line, which is sqrt, and _2(5) is expanded to sqrt(5). (References to lines before the current line start with _, references to lines after the current line start with |.)
The example above is simple. When I tried to rewrite a more complex example, I encountered problem:
def fibIter(n):
    if n < 2:
        return n
    fibPrev = 1
    fib = 1
    for num in xrange(2, n):
        fibPrev, fib = fib, fib + fibPrev
    return fib
I don't know how to use the line-number-based MACRO to express fibPrev, fib = fib, fib + fibPrev. I think some feature is missing in this "MACRO language", and fibPrev, fib = fib, fib + fibPrev would be expressible if I added it. (I heard that MACRO in Lisp is Turing complete, so I think the example above should be expressible by MACRO.) Does anyone have ideas about this?
I see two ways to interpret your language. Neither is very powerful.
The first way is to literally expand the macros to expressions, rather than values. Then analytic_fibonacci expands to
def analytic_fibonacci(n):
    return int(((1+sqrt(5))/2**n+1/(1+sqrt(5))/2**n)/sqrt(5)+0.5)
You probably want some parentheses in there; depending on how you define the language, those may or may not be added for you.
This is pretty useless. Multiple-evaluation problems abound (where a function is reexecuted every time a macro refers to it), and it only lets you do things you could have done with ordinary expressions.
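To illustrate this first interpretation, here is a minimal sketch of a textual expander written in ordinary Python. It is only a guess at the intended semantics: it handles backward references of the form _d, wraps every substitution in parentheses, and ignores everything else. The function name expand and the list-of-lines representation are made up for illustration.

```python
import re
from math import sqrt

def expand(lines, k):
    """Return line k (0-based) with every _d replaced by the expanded line k-d."""
    def substitute(match):
        d = int(match.group(1))
        return "(" + expand(lines, k - d) + ")"
    return re.sub(r"\b_(\d+)\b", substitute, lines[k].strip())

program = [
    "from math import sqrt",
    "sqrt",
    "def analytic_fibonacci(n):",
    "    _2(5)",
    "    (1+_1)/2",
    "    1/_1",
    "    return int((_2**n+_1**n)/_3+0.5)",
]

expanded = expand(program, 6)
print(expanded)

# The expanded text mentions sqrt three times: the multiple-evaluation problem.
print(expanded.count("(sqrt)(5)"))

# Evaluating the expanded expression for n = 10.
expression = expanded[len("return "):]
print(eval(expression, {"sqrt": sqrt, "n": 10}))  # 55
```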
The second interpretation is that every statement consisting of a Python expression implicitly assigns that expression to a variable. This is also pretty useless, because only one statement can assign to any of these implicit variables. There's no way to do
x = 0
for i in range(5):
    x += i
because you can't have the equivalent of x refer to either _2 or _0 depending on where the last assignment came from. Also, this really isn't a macro system at all.
Using the second interpretation, we can add a new operator to bring back the power of ordinary variable assignments. We'll call this the merge operator.
merge(_1, _2)
evaluates to either _1 or _2, depending on which was evaluated most recently. If one of the arguments hasn't yet been evaluated, it defaults to the other. fibIter then becomes
def fibIter(n):
    if n < 2:
        return n
    1 # fibPrev
    1 # fib
    for num in xrange(2, n):
        merge(_2, _-1) # temp
        merge(_4, _-1) + merge(_3, _0) # fib
        _2 # fibPrev
    return merge(_2, _5)
This is quite awkward; essentially, we have to replace every use of a variable like x by a merge of every location it could have been assigned. It also requires awkward line counting, making it hard to tell which "variable" is which, and it doesn't handle multiple assignments, for loop targets, etc. I had to use negative indices to refer to future lines, because we need some way to refer to things assigned later.
Lisp macros are more powerful than your language because they let you apply arbitrary Lisp code to your Lisp code. Your language only allows a macro to expand to fixed expressions. A Lisp macro can take arbitrary code as arguments, cut it up, rearrange it, replace parts of it with different things depending on conditionals, recurse, etc. Your macros can't even take arguments.

DNA alignment -- score is additive or not?

So I have a recursive code that gives the best alignment for 2 DNA strands, but the problem is that it performs very slowly (I need it to be recursive). Then I read on an MIT website that the results are additive, which is great for me, but then I thought about it a little bit and I found out there is a problem:
website: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-096-algorithms-for-computational-biology-spring-2005/lecture-notes/lecture5_newest.pdf
The MIT website says that for a given split (i, j):
first_strand(0, i) and second_strand(0,j) alignment
+
first_strand(i, len) and second_strand(j, len) alignment
equals
first_strand and second strand alignment
but:
GTC GTAA
G with GTA alignment is G-- and GTA
TC with A alignment is TC and A-
result = G--TC and GTAA-
real best result = GTC- GTAA
Can anyone explain what they mean on the MIT website? I'm probably getting it all wrong!
I assume you're talking about this link.
If so, read it very carefully hundreds of times ;-) It's "additive" given that you're only considering alignments where the split is fixed at a specific (i, j) pair.
In your supposed counterexample, you started by breaking the initial G off of GTC and the initial GTA off of GTAA. Then G-- is the shortest way to change GTA into G. Fine. Continuing with the same split, you still needed to align the remaining right-hand parts: TC with A. Also fine.
There is no claim that this is the best possible split. There's only the claim that it's the best possible alignment given that you're only considering that specific split.
It's one small step in the dynamic programming approach, which is the part you're missing. It remains to compute the best alignments across all possible splits.
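The difference between "best for one fixed split" and "best over all splits" can be checked on the example from the question. The sketch below uses plain edit distance as a stand-in for the alignment score, which may differ from the scoring used in the slides; split_cost is a made-up helper name.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def lev(a, b):
    """Edit distance with unit cost for mismatch, insertion and deletion (memoized)."""
    if not a:
        return len(b)
    if not b:
        return len(a)
    return min(lev(a[:-1], b[:-1]) + (a[-1] != b[-1]),
               lev(a[:-1], b) + 1,
               lev(a, b[:-1]) + 1)

def split_cost(a, b, i, j):
    """Cost of the best alignment that is forced through the split (i, j)."""
    return lev(a[:i], b[:j]) + lev(a[i:], b[j:])

a, b = "GTC", "GTAA"
print(split_cost(a, b, 1, 3))                                  # 4: the split from the question
print(min(split_cost(a, b, 1, j) for j in range(len(b) + 1)))  # 2: best split for i = 1
print(lev(a, b))                                               # 2: overall optimum
```

The costs add up for any one split, but only the minimum over all splits recovers the overall optimum.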
Dynamic programming is tricky at first. You shouldn't expect to learn it from staring at telegraphic slides. Read a real textbook, or search the web for tutorials.
Speeding a recursive version
The comments indicate that the code for this "must" be recursive. Oh well ;-)
Caution: I just threw this together to illustrate a general procedure for speeding suitable recursive functions. It's barely been tested at all.
First an utterly naive recursive version:
def lev(a, b):
    if not a:
        return len(b)
    if not b:
        return len(a)
    return min(lev(a[:-1], b[:-1]) + (a[-1] != b[-1]),
               lev(a[:-1], b) + 1,
               lev(a, b[:-1]) + 1)
I'll be using "absd31-km" and "ldk3-1fjm" as arguments in all runs discussed here.
On my box, using Python 3, that simple function returns 7 after about 1.6 seconds. It's horribly slow.
The most obvious problem is the endlessly repeated string slicing. Each : in an index takes time proportional to the current length of the string being sliced. So the first refinement is to pass string indices instead. Since the code always slices off a prefix of a string, we only need to pass the "end of string" indices:
def lev2(a, b):
    def inner(j1, j2):
        if j1 < 0:
            return j2 + 1
        if j2 < 0:
            return j1 + 1
        return min(inner(j1-1, j2-1) + (a[j1] != b[j2]),
                   inner(j1-1, j2) + 1,
                   inner(j1, j2-1) + 1)
    return inner(len(a)-1, len(b)-1)
Much better! This version returns 7 in "only" about 1.44 seconds. Still horridly slow, but better than the original. Its advantage would increase on longer strings, but who cares ;-)
We're almost done! The important thing to notice now is that the function passes the same arguments many times over the course of a run. We capture those in "a memo" to avoid all the redundant computation:
def lev3(a, b):
    memo = {}
    def inner(j1, j2):
        if j1 < 0:
            return j2 + 1
        if j2 < 0:
            return j1 + 1
        args = j1, j2
        if args in memo:
            return memo[args]
        result = min(inner(j1-1, j2-1) + (a[j1] != b[j2]),
                     inner(j1-1, j2) + 1,
                     inner(j1, j2-1) + 1)
        memo[args] = result
        return result
    return inner(len(a)-1, len(b)-1)
That version returns 7 in about 0.00026 seconds, over 5000 times faster than lev2 did it.
Now if you've studied the matrix-based algorithms, and squint a little, you'll see that lev3() effectively builds a 2-dimensional matrix mapping index pairs to results in its memo dictionary. They're really the same thing, except that the recursive version builds the matrix in a more convoluted way. On the other hand, the recursive version may be easier to understand and to reason about. Note that the slides you found called the memoization approach "top down" and the nested-loop matrix approach "bottom up". Those are nicely descriptive.
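For comparison, here is a sketch of the "bottom up" nested-loop matrix version of the same computation. The name lev_bottom_up is made up for illustration; the recurrence is the same one lev3() memoizes.

```python
def lev_bottom_up(a, b):
    """Edit distance via a (len(a)+1) x (len(b)+1) table filled row by row."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # turning a[:i] into the empty string costs i deletions
    for j in range(cols):
        table[0][j] = j  # turning the empty string into b[:j] costs j insertions
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = min(table[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                              table[i - 1][j] + 1,
                              table[i][j - 1] + 1)
    return table[-1][-1]

print(lev_bottom_up("kitten", "sitting"))  # 3
```

Every table cell corresponds to one (j1, j2) key in the memo dictionary, shifted by one because the table has an extra row and column for the empty prefixes.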
You haven't said anything about how your recursive function works, but if it suffers any similar kinds of recursive excess, you should be able to get similar speedups using similar techniques :-)
