I've read that one of the key beliefs of Python is that flat > nested. However, if I have several variables counting up, what is the alternative to multiple for loops?
My code is for counting grid sums and goes as follows:
def horizontal():
    for x in range(20):
        for y in range(17):
            temp = grid[x][y: y + 4]
            sum = 0
            for n in temp:
                sum += int(n)
            print sum  # EDIT: the return instead of print was a mistype
This seems to me like it is too heavily nested. Firstly, what is considered too many nested loops in Python? (I have certainly seen 2 nested loops before.) Secondly, if this is too heavily nested, what is an alternative way to write this code?
from itertools import product

def horizontal():
    for x, y in product(range(20), range(17)):
        print 1 + sum(int(n) for n in grid[x][y: y + 4])
You should be using the built-in sum function. Of course, you can't if you shadow it with a variable, so rename yours to something like my_sum.
grid = [range(20) for i in range(20)]
sum(sum( 1 + sum(grid[x][y: y + 4]) for y in range(17)) for x in range(20))
The above outputs 13260, for the particular grid created in the first line of code. It uses sum() three times. The innermost sum adds up the numbers in grid[x][y: y + 4], plus the slightly strange initial value sum = 1 shown in the code in the question. The middle sum adds up those values for the 17 possible y values. The outer sum adds up the middle values over possible x values.
If elements of grid are strings instead of numbers, replace
sum(grid[x][y: y + 4])
with
sum(int(n) for n in grid[x][y: y + 4])
You can use a dictionary to optimize performance significantly
This is another example:
locations = {}
for i in range(len(airports)):
    locations[airports["abb"][i][1:-1]] = (airports["height"][i], airports["width"][i])

for i in range(len(uniqueData)):
    h, w = locations[uniqueData["dept_apt"][i]]
    uniqueData["dept_apt_height"][i] = h
    uniqueData["dept_apt_width"][i] = w
Here I'm calculating the Pearson correlation such that I'm accounting for every comparison.
import numpy as np
import pandas as pd

x = pd.DataFrame({'a':[3,6,4,7,9],'b':[6,2,4,1,5],'c':[7,9,1,2,9]}, index=['aa','bb','cc','dd','ee']).T
y = pd.DataFrame({'A':[9,4,1,3,5],'B':[9,8,9,5,7],'C':[1,1,3,1,2]}, index=['aa','bb','cc','dd','ee']).T

table = pd.DataFrame(columns=['Correlation Coeff'])
for i in range(len(x)):
    for j in range(len(y)):
        xf = list(x.iloc[i])
        yf = list(y.iloc[j])
        n = np.corrcoef(xf, yf)[0, 1]
        name = x.index[i] + '|' + y.index[j]
        table.at[name, 'Correlation Coeff'] = n

table
This is the result:
Correlation Coeff
a|A -0.232973
a|B -0.713392
a|C -0.046829
b|A 0.601487
b|B 0.662849
b|C 0.29654
c|A 0.608993
c|B 0.16311
c|C -0.421398
Now when I apply these tables directly to numpy's function, removing duplicate values and 'ones', it looks like this.
x = pd.DataFrame({'a':[3,6,4,7,9],'b':[6,2,4,1,5],'c':[7,9,1,2,9]},index=['aa','bb','cc','dd','ee']).T.to_numpy()
y = pd.DataFrame({'A':[9,4,1,3,5],'B':[9,8,9,5,7],'C':[1,1,3,1,2]},index=['aa','bb','cc','dd','ee']).T.to_numpy()
n = np.corrcoef(x,y)
n = n.tolist()
n = [element for sub in n for element in sub]
# Rounding to ensure no duplicates are being picked up.
rnd = [round(num, 13) for num in n]
X = [i for i in rnd if i != 1]
X = list(dict.fromkeys(X))
X
[-0.3231828652987,
0.3157400783243,
-0.232972779074,
-0.7133922984085,
-0.0468292905791,
0.3196502842345,
0.6014868821052,
0.6628489803599,
0.2965401263095,
0.608993434846,
0.1631095635753,
-0.4213976904463,
0.2417468892076,
-0.5841782301194,
0.3674842076296]
There are 6 extra values not accounted for. I'm assuming that they are correlation values calculated within a single matrix and, if so, why? Is there a way to use this function without generating these additional values?
You are right in assuming that those are the correlations from variables within x and y, and so far as I can tell there is no way to turn this behaviour off.
You can see that this is true by looking at the implementation of numpy.corrcoef. As expected, most of the heavy lifting is being done by a separate function that computes covariance - if you look at the implementation of numpy.cov, particularly line 2639, you will see that, if you supply an additional y argument, this is simply concatenated onto x before computing the covariance matrix.
If necessary, it wouldn't be too hard to implement your own version of corrcoef that works how you want. Note that you can do this in pure numpy, which in most cases will be faster than the iterative approach from the example code above.
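For example, a minimal sketch of such a cross-only version in pure numpy (the name cross_corrcoef is mine, and it assumes variables are rows and observations are columns, as in the transposed DataFrames above):

import numpy as np

def cross_corrcoef(x, y):
    # Pearson correlation between every row of x and every row of y only,
    # without the within-x and within-y entries that np.corrcoef(x, y) also returns.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean(axis=1, keepdims=True)   # centre each row
    yc = y - y.mean(axis=1, keepdims=True)
    num = xc @ yc.T                                        # row-by-row covariances (up to a constant factor)
    den = np.outer(np.sqrt((xc**2).sum(axis=1)), np.sqrt((yc**2).sum(axis=1)))
    return num / den                                       # shape (len(x), len(y))

Called on the two arrays from the question, this should reproduce the nine coefficients from the loop-based table, without the six extra within-matrix values.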
I have written code based on the two-pointer algorithm to find the sum of two squares. My problem is that I run into a memory error when running this code for the input n = 55555**2 + 66666**2. I am wondering how to correct this memory error.
def sum_of_two_squares(n):
    look = tuple(range(n))
    i = 0
    j = len(look) - 1
    while i < j:
        x = (look[i])**2 + (look[j])**2
        if x == n:
            return (j, i)
        elif x < n:
            i += 1
        else:
            j -= 1
    return None

n = 55555**2 + 66666**2
print(sum_of_two_squares(n))
The problem I'm trying to solve using the two-pointer algorithm is:
return a tuple of two positive integers whose squares add up to n, or return None if the integer n cannot be so expressed as a sum of two squares. The returned tuple must present the larger of its two numbers first. Furthermore, if some integer can be expressed as a sum of two squares in several ways, return the breakdown that maximizes the larger number. For example, the integer 85 allows two such representations 7*7 + 6*6 and 9*9 + 2*2, of which this function must therefore return (9, 2).
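For reference, a small brute-force sketch of that specification (illustrative only, and fine for small n), which also respects the larger-number-first and maximize-the-larger-number rules:

def sum_of_two_squares_bruteforce(n):
    # Try every a with a*a <= n and check whether n - a*a is a perfect square
    best = None
    a = 1
    while a * a <= n:
        b2 = n - a * a
        b = int(b2 ** 0.5)
        if b >= 1 and b * b == b2:
            pair = (max(a, b), min(a, b))
            if best is None or pair > best:
                best = pair
        a += 1
    return best

print(sum_of_two_squares_bruteforce(85))   # (9, 2)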
You're creating a tuple of size 55555^2 + 66666^2 = 7530713581
So if each element of the tuple takes one byte, the tuple will take up 7.01 GiB.
You'll need to either reduce the size of the tuple, or possibly make each element take up less space by specifying the type of each element: I would suggest looking into Numpy for the latter.
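As a rough illustration of the size difference (exact figures vary by Python version and platform):

import sys
import numpy as np

n = 1_000_000
as_tuple = tuple(range(n))
as_array = np.arange(n, dtype=np.int64)

print(sys.getsizeof(as_tuple))   # roughly 8 MB of pointers alone; the int objects are extra
print(as_array.nbytes)           # exactly 8 bytes per element, with no per-object overhead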
Specifically for this problem:
Why use a tuple at all?
You create the variable look, which is just a tuple of consecutive integers:
look=tuple(range(n)) # = (0, 1, 2, ..., n-1)
Then you reference it, but never modify it. So: look[i] == i and look[j] == j.
So you're looking up numbers in a list of numbers. Why look them up? Why not just use i in place of look[i] and remove look altogether?
As others have pointed out, there's no need to use tuples at all.
One reasonably efficient way of solving this problem is to generate a series of integer square values (0, 1, 4, 9, etc...) and test whether or not subtracting these values from n leaves you with a value that is a perfect square.
You can generate a series of perfect squares efficiently by adding successive odd numbers together: 0 (+1) → 1 (+3) → 4 (+5) → 9 (etc.)
There are also various tricks you can use to test whether or not a number is a perfect square (for example, see the answers to this question), but — in Python, at least — it seems that simply testing the value of int(n**0.5) is faster than iterative methods such as a binary search.
def integer_sqrt(n):
    # If n is a perfect square, return its (integer) square
    # root. Otherwise return -1
    r = int(n**0.5)
    if r * r == n:
        return r
    return -1

def sum_of_two_squares(n):
    # If n can be expressed as the sum of two squared integers,
    # return these integers as a tuple. Otherwise return <None>
    # i: iterator variable
    # x: value of i**2
    # y: value we need to add to x to obtain (i+1)**2
    i, x, y = 0, 0, 1
    # If i**2 > n / 2, then we can stop searching
    max_x = n >> 1
    while x <= max_x:
        r = integer_sqrt(n - x)
        if r >= 0:
            return (i, r)
        i, x, y = i + 1, x + y, y + 2
    return None
This returns a solution to sum_of_two_squares(55555**2 + 66666**2) in a fraction of a second.
You do not need the ranges at all, and certainly do not need to convert them into tuples. They take a ridiculous amount of space, but you only need their current elements, numbers i and j. Also, as the friendly commenter suggested, you can start with sqrt(n) to improve the performance further.
def sum_of_two_squares(n):
    i = 1
    j = int(n ** (1/2))
    while i < j:
        x = i * i + j * j
        if x == n:
            return j, i
        if x < n:
            i += 1
        else:
            j -= 1
Bear in mind that the problem can take a very long time to solve, so be patient. And no, NumPy won't help: there is nothing here to vectorize.
I need help solving a problem related to multiplying 2D lists.
The problem is that I have two lists, a and b:
a = [[-0.104],[-0.047],[-0.046]]
b = [[0.183, 0.366, 0.456], [0.971, 0.156, 0.856]]
I want to multiply each element in a with the corresponding element in the first sub-list of b, such that:
(-0.104 * 0.183) + (-0.047 * 0.366) + (-0.046 * 0.456)
Then, similarly, I multiply each element in a with the corresponding element in the second sub-list of b, such that:
(-0.104 * 0.971) + (-0.047 * 0.156) + (-0.046 * 0.856)
The result should be a list of 2 elements.
I've implemented my own code in Python, but unfortunately it doesn't work correctly, so I need some help fixing the error in my code.
The code is below:
a = [[-0.104], [-0.047], [-0.046]]
b = [[0.183, 0.366, 0.456], [0.971, 0.156, 0.856]]

sumR = 0
res2 = []
for i in range(0, len(a)):
    for j in range(0, len(b[0])):
        for k in range(j):
            r = (a[i][j] * b[k][j])
            sumR = sumR + r
    res2.append(round(sumR, 6))
print(res2)
Your question is something common to programmers coming to Python from a different language.
Try using Python's strengths, instead of writing C/Java/whatever in Python:
xss = [[-0.104],[-0.047],[-0.046]]
yss = [[0.183, 0.366, 0.456], [0.971, 0.156, 0.856]]
answers = [sum([xs[0] * y for xs, y in zip(xss, ys)]) for ys in yss]
print(answers)
(or, if you don't object to using Python's further strengths, i.e. its many great third party libraries, use something like numpy, like #GilPinsky suggests)
A bit of an explanation of the list comprehension: something like [ys for ys in yss] causes Python to loop over yss, assigning each value of yss to ys in turn, collecting the results in a list. You can of course apply an operation to ys to make it useful.
zip(xss, ys) pairs each element of xss with an element from ys and returns an iterable. [(xs, y) for xs, y in zip(xss, ys)] would get you a list of all the tuples from that combination. And so sum([xs[0] * y for xs, y in zip(xss, ys)]) gets you the sum of all the products of each pair from xss and ys. It has xs[0] because elements from xss are themselves lists and you're only interested in the first element of each.
I've renamed the variables to make it a bit easier to keep track of what's what. x is just some value, xs is a list of a number of values, xss is a list of such lists, etc. - similar for y and ys.
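A quick illustration of what zip produces here, using the first row of yss:

xss = [[-0.104], [-0.047], [-0.046]]
ys = [0.183, 0.366, 0.456]

print(list(zip(xss, ys)))
# [([-0.104], 0.183), ([-0.047], 0.366), ([-0.046], 0.456)]
print(sum(xs[0] * y for xs, y in zip(xss, ys)))
# about -0.05721, give or take floating-point rounding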
Not sure if this is the answer you are looking for: [-0.05721, -0.147692]
a = [[-0.104], [-0.047], [-0.046]]
b = [[0.183, 0.366, 0.456], [0.971, 0.156, 0.856]]

res2 = []
for i in range(0, len(b)):
    sumR = 0
    for j in range(0, len(a)):
        # every list in a has only 1 element
        r = a[j][0] * b[i][j]
        # print(a[j][0], b[i][j], end=" + ")
        sumR = sumR + r
    # print()
    res2.append(round(sumR, 6))
print(res2)
Uncomment the print statements to see how the calculation is going
A better solution will be to not use loops in this case.
Take note that what you are trying to implement is a matrix multiplication by a vector, so you can use numpy to do this efficiently as follows:
import numpy as np
a = np.array([[-0.104], [-0.047], [-0.046]])
b = np.array([[0.183, 0.366, 0.456], [0.971, 0.156, 0.856]])
res2 = (b @ a).tolist()
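Note that b @ a is a (2, 1) column vector, so res2 above comes back as a nested list. If a flat two-element list is wanted (matching the other answer), one small variation is to flatten first:

import numpy as np

a = np.array([[-0.104], [-0.047], [-0.046]])
b = np.array([[0.183, 0.366, 0.456], [0.971, 0.156, 0.856]])

# ravel() flattens the (2, 1) result before converting to a list
res2 = (b @ a).ravel().tolist()   # roughly [-0.05721, -0.147692]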
I want to write a list comprehension that will give the Fibonacci numbers up to 4 million, and then use it to sum the even-valued terms.
from math import sqrt
Phi = (1 + sqrt(5)) / 2
phi = (1 - sqrt(5)) / 2
series = [int((Phi**n - phi**n) / sqrt(5)) for n in range(1, 10)]
print(series)
[1, 1, 2, 3, 5, 8, 13, 21, 34]
Below is sample code that works; I want to write similar code using a list comprehension. Please do help.
a, b = 1, 1
total = 0
while a <= 4000000:
    if a % 2 == 0:
        total += a
    a, b = b, a + b
print(total)
Since there's no actual list required for what you need to do, it's a bit wasteful having a list comprehension. Far better would be to just provide a function to do all the heavy lifting for you, something like:
def sumEvenFibsBelowOrEqualTo(n):
    a, b = 1, 1
    total = 0
    while a <= n:
        if a % 2 == 0:
            total += a
        a, b = b, a + b
    return total
Then just call it with print(sumEvenFibsBelowOrEqualTo(4000000)).
If you really do want a list of Fibonacci numbers (perhaps you want to run different comprehensions on it), you can make a small modification to do this - this returns a list rather than the sum of the even values:
def listOfFibsBelowOrEqualTo(n):
    a, b = 1, 1
    mylist = []
    while a <= n:
        mylist.append(a)
        a, b = b, a + b
    return mylist
You can then use the following list comprehension to sum the even ones:
print(sum([x for x in listOfFibsBelowOrEqualTo(4000000) if x % 2 == 0]))
This is probably not too bad given that the Fibonacci numbers get very big very fast (so the list won't be that big) but, for other sequences that don't do that (or for much larger limits), constructing a list may use up large chunks of memory unnecessarily.
A better method may be to use a generator; if you want a list, you can always construct one from it, and if you don't need a list, you can still use the generator directly in comprehensions:
def fibGen(limit):
    a, b = 1, 1
    while a <= limit:
        yield a
        a, b = b, a + b

mylist = list(fibGen(4000000))                            # a list
print(sum([x for x in fibGen(4000000) if x % 2 == 0]))    # sum evens, no list
A list comprehension is by its nature a parallel process: an input iterable is fed in, some function is applied to each element, and an output list is created. The function is applied to each element independently of the other elements, so list comprehensions are not well suited to iterative algorithms such as the one you present. A comprehension can, however, be used with your closed-form formula:
sum([int((Phi**n - phi**n) / sqrt(5)) for n in range(1, 10) if int((Phi**n - phi**n) / sqrt(5))%2 == 0])
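A small variant of the same idea that avoids evaluating the Binet term twice, by filtering over an inner generator:

from math import sqrt

Phi = (1 + sqrt(5)) / 2
phi = (1 - sqrt(5)) / 2

# Each closed-form term is computed only once per n, then filtered for evenness
print(sum(f for f in (int((Phi**n - phi**n) / sqrt(5)) for n in range(1, 10)) if f % 2 == 0))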
If you want to use an iterative algorithm, a generator is more suitable.
I am trying to run the following script. My intention was to get output with one high-intensity gauss in the middle, followed by two small gauss on both sides of the big one. It is a Fourier summation of all the y values, taking a different n value each time, plotted against x. But somehow I am not getting the desired result. Some help would be appreciated. The code:
from pylab import *

n = 6
D = 6
x = linspace(-3, 3, 13000)
y = [1, 1, 1, 1, 1]
F = []
for i in range(1, n):
    F = sum((item * cos(2 * pi * i * x / D) for item in y))
plot(x, F, 'r')
show()
You are reassigning F each time through the loop. This might be your problem, since at the end of your loop for i in range(1,n) you have F as a number, and not as a list of numbers.
To create a list of F values, simply change F = sum(....) to F.append(sum(...)) and you will end up with a list of values at the end.
Also please note that range(1, n) is the range from 1 to n-1. This may or may not be what you desired.
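A minimal sketch of that append-based fix, written with explicit numpy/matplotlib imports rather than pylab's star import; plotting the sum of all collected terms is my assumption about what should be drawn:

import numpy as np
import matplotlib.pyplot as plt

n = 6
D = 6
x = np.linspace(-3, 3, 13000)
y = [1, 1, 1, 1, 1]

# Collect one term per harmonic instead of overwriting F on every iteration
terms = []
for i in range(1, n):
    terms.append(np.sum([item * np.cos(2 * np.pi * i * x / D) for item in y], axis=0))

# Plot the Fourier sum of all collected terms against x
plt.plot(x, np.sum(terms, axis=0), 'r')
plt.show()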