Python comprehension with multiple 'for' clauses and single 'if'

Imagine a discrete x,y,z space: I am trying to create an iterator which will return all points which lie within a sphere of some radial distance from a point.
My approach was to first look at all points within a larger cube which is guaranteed to contain all the points needed and then cull or skip points which are too far away.
My first attempt was:
import sys

x, y, z = (0, 0, 1)
dist = 2
# this doesn't work
it_0 = ((x+xp, y+yp, z+zp)
        for xp in range(-dist, dist+1)
        for yp in range(-dist, dist+1)
        for zp in range(-dist, dist+1)
        if ((x-xp)**2 + (y-yp)**2 + (z-zp)**2) <= dist**2 + sys.float_info.epsilon)
A simple
for d, e, f in it_0:
    #print(d, e, f)
    print(((x-d)**2 + (y-e)**2 + (z-f)**2) <= dist**2 + sys.float_info.epsilon, d, e, f)
verifies that it_0 does not produce correct results. I believe it is applying the conditional only to the third (i.e. z) 'for' clause.
The following works:
it_1 = ((x+xp, y+yp, z+zp)
        for xp in range(-dist, dist+1)
        for yp in range(-dist, dist+1)
        for zp in range(-dist, dist+1))
it_2 = filter(lambda p: ((x-p[0])**2 + (y-p[1])**2 + (z-p[2])**2) <= dist**2 + sys.float_info.epsilon, it_1)
It collects all the points, then filters out those which don't fit the conditional.
I was hoping there might be a way to correct the first attempted implementation, or make these expressions more readable or compact.

First of all, I suggest you replace the triply-nested for loop with itertools.product(), like so:
import itertools as it
it_1 = it.product(range(-dist, dist+1), repeat=3)
If you are using Python 2.x, you should use xrange() here instead of range().
Next, instead of using filter() you could just use a generator expression:
# note: these are offsets from the origin; the full generator below adds the
# origin back in
it_2 = ((x, y, z) for x, y, z in it_1
        if (x**2 + y**2 + z**2) <= dist**2 + sys.float_info.epsilon)
This would avoid some overhead in Python 2.x (since filter() builds a list), but for Python 3.x would be about the same; and even in Python 2.x you could use itertools.ifilter().
But for readability, I would package the whole thing up into a generator, like so:
import itertools as it
import sys

def sphere_points(radius=0, origin=(0, 0, 0), epsilon=sys.float_info.epsilon):
    x0, y0, z0 = origin
    limit = radius**2 + epsilon
    for x, y, z in it.product(range(-radius, radius+1), repeat=3):
        if (x**2 + y**2 + z**2) <= limit:
            yield (x+x0, y+y0, z+z0)
This is only lightly changed from your original code. Each range for x, y, and z is centred on zero, and the origin offset is applied to each yielded point. When I test this code with a radius of 0, I correctly get back a single point: the origin.
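For example (a quick check of my own, not from the original post), that radius-0 behaviour is easy to confirm:
# With the defaults, only the origin itself survives the distance test:
print(list(sphere_points()))               # [(0, 0, 0)]
print(list(sphere_points(0, (5, 5, 5))))   # [(5, 5, 5)]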
Note that I provided arguments to the function letting you specify radius, origin point, and even the value to use for epsilon, with defaults for each. I also unpacked the origin point tuple into explicit variables; I'm not sure if Python would optimize away the indexing operation or not, but this way we know there won't be any indexing going on inside the loop. (I think the Python compiler would probably hoist the limit calculation out of the loop, but I actually prefer it on its own line as shown here, for readability.)
I think the above is about as fast as you can write it in native Python, and I think it is a big improvement in readability.
P.S. This code would probably run a lot faster if it was redone using Cython.
http://cython.org/
EDIT: Code simplified as suggested by @eryksun in comments.

You've confused the meanings of xp, yp, zp in your generator expression:
it_0 = ((x+xp, y+yp, z+zp)
        for xp in range(-dist, dist+1)
        for yp in range(-dist, dist+1)
        for zp in range(-dist, dist+1)
        if ((x-xp)**2 + (y-yp)**2 + (z-zp)**2) <= dist**2 + sys.float_info.epsilon)
xp, yp, and zp are already the distances from the center of the sphere along the various axes. Thus you should not be taking the difference from x, y, z again. The expression should be:
it_0 = ((x+xp, y+yp, z+zp)
        for xp in range(-dist, dist+1)
        for yp in range(-dist, dist+1)
        for zp in range(-dist, dist+1)
        if (xp**2 + yp**2 + zp**2) <= dist**2 + sys.float_info.epsilon)
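You can verify the fix with the question's own check (my addition, assuming x, y, z, dist and the sys import from the question): every generated point now satisfies the predicate.
# Every point produced by the corrected it_0 passes the original distance test:
assert all((x-d)**2 + (y-e)**2 + (z-f)**2 <= dist**2 + sys.float_info.epsilon
           for d, e, f in it_0)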

Your use of epsilon is a bit off. The documentation for sys.float_info describes it as the difference between 1 and the next representable float, so it is too small to make a difference on 2, let alone 2**2.
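A quick demonstration of my own: epsilon is about 2.2e-16, so adding it to a number of magnitude 4 is lost entirely to rounding:
import sys

print(sys.float_info.epsilon)                  # ~2.220446049250313e-16
# The sum rounds straight back to 4.0, so the comparison is unaffected:
print(2**2 + sys.float_info.epsilon == 2**2)   # True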
Secondly, all your points are measured from x,y,z in the filter/if clause, but then offset by the same amounts in the result expression (x+xp,y+yp,z+zp). You'll get a rather off-centre sphere that way, so use (xp,yp,zp) in the condition instead.

Others have pointed out the logic errors. I'll address readability. Also note with the data given, there is no floating point involved, so an epsilon is not needed.
from itertools import product
from pprint import pprint
x, y, z = 0, 0, 1
r = 2

def points_in_circle(radius):
    '''Return a generator of all integral points in a sphere of the given radius.'''
    return ((x, y, z)
            for x, y, z in product(range(-radius, radius+1), repeat=3)
            if x**2 + y**2 + z**2 <= radius**2)

# List integral points within radius r around point (x,y,z).
pprint([(x+xp, y+yp, z+zp) for xp, yp, zp in points_in_circle(r)])
Output
[(-2, 0, 1),
(-1, -1, 0),
(-1, -1, 1),
(-1, -1, 2),
(-1, 0, 0),
(-1, 0, 1),
(-1, 0, 2),
(-1, 1, 0),
(-1, 1, 1),
(-1, 1, 2),
(0, -2, 1),
(0, -1, 0),
(0, -1, 1),
(0, -1, 2),
(0, 0, -1),
(0, 0, 0),
(0, 0, 1),
(0, 0, 2),
(0, 0, 3),
(0, 1, 0),
(0, 1, 1),
(0, 1, 2),
(0, 2, 1),
(1, -1, 0),
(1, -1, 1),
(1, -1, 2),
(1, 0, 0),
(1, 0, 1),
(1, 0, 2),
(1, 1, 0),
(1, 1, 1),
(1, 1, 2),
(2, 0, 1)]

Related

Generate grid of coordinate tuples

Assume a d-dimensional integer grid, containing n^d (n >= 1) points.
I am trying to write a function that takes the number of domain points n and the number of dimensions d and returns a set that contains all the coordinate points in the grid, as tuples.
Example: intGrid (n=2, dim=2) should return the set:
{(0,0), (0,1), (1,0), (1,1)}
Note: I cannot use numpy or any external imports.
Python has a good set of built-in modules that provide most of the basic functionality you will probably need to get things done.
One such module is itertools, where you will find all sorts of functions related to iteration and combinatorics. The perfect function for you is product, which you can use as below:
from itertools import product

def grid(n, dim):
    return set(product(range(n), repeat=dim))

print(grid(2, 2))
# {(0, 0), (0, 1), (1, 0), (1, 1)}
print(grid(2, 3))
# {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}

Is it possible to vectorize scipy.optimize.fminbound?

I have some data points of a trajectory that is parameterized by time, and I want to know the shortest distance of each point to the curve fitted to them. There seem to be several ways to go about this (e.g. here or here), but I chose scipy.optimize.fminbound, as proposed here (because it seemed to be the slickest way of going about it, and because I actually got it to work):
import numpy as np
import pandas as pd
from scipy.optimize import fminbound
data = np.array(
[
(212.82275865, 1650.40828168, 0., 0),
(214.22056952, 1649.99898924, 10.38, 0),
(212.86786868, 1644.25228805, 116.288, 0),
(212.78680031, 1643.87461108, 122.884, 0),
(212.57489485, 1642.01124032, 156.313, 0),
(212.53483954, 1641.61858242, 162.618, 0),
(212.43922274, 1639.58782771, 196.314, 0),
(212.53726315, 1639.13842423, 202.619, 0),
(212.2888428, 1637.24641296, 236.306, 0),
(212.2722447, 1636.92307229, 242.606, 0),
(212.15559302, 1635.0529813, 276.309, 0),
(212.17535631, 1634.60618711, 282.651, 0),
(212.02545613, 1632.72139574, 316.335, 0),
(211.99988779, 1632.32053329, 322.634, 0),
(211.33419846, 1631.07592039, 356.334, 0),
(211.58972239, 1630.21971902, 362.633, 0),
(211.70332122, 1628.2088542, 396.316, 0),
(211.74610735, 1627.67591368, 402.617, 0),
(211.88819518, 1625.67310022, 436.367, 0),
(211.90709414, 1625.39410321, 442.673, 0),
(212.00090348, 1623.42655008, 476.332, 0),
(211.9249017, 1622.94540583, 482.63, 0),
(212.34321938, 1616.32949197, 597.329, 0),
(213.09638942, 1615.2869643, 610.4, 0),
(219.41313491, 1580.22234313, 1197.332, 0),
(220.38660128, 1579.20043302, 1210.37, 0),
(236.35472669, 1542.30863041, 1798.267, 0),
(237.41755384, 1541.41679119, 1810.383, 0),
(264.08373622, 1502.66620597, 2398.244, 0),
(265.65655239, 1501.64308908, 2410.443, 0),
(304.66999824, 1460.94068336, 2997.263, 0),
(306.22550945, 1459.75817211, 3010.38, 0),
(358.88879764, 1416.472238, 3598.213, 0),
(361.14046402, 1415.40942931, 3610.525, 0),
(429.96379858, 1369.7972467, 4198.282, 0),
(432.06565776, 1368.22265539, 4210.505, 0),
(519.30493383, 1319.01141844, 4798.277, 0),
(522.12134083, 1317.68234967, 4810.4, 0),
(630.00294242, 1265.05368942, 5398.236, 0),
(633.67624272, 1263.63633508, 5410.431, 0),
(766.29767476, 1206.91262814, 5997.266, 0),
(770.78300649, 1205.48393374, 6010.489, 0),
(932.92308019, 1145.87780431, 6598.279, 0),
(937.54373403, 1141.55438694, 6609.525, 0),
], dtype=[
('x', 'f8'), ('y', 'f8'), ('t', 'f8'), ('dmin', 'f8'),
]
)
# fyi my data comes as a structured array; unfortunately, simply passing
# data[['x', 'y']] to np.polyfit does not work. using
# pd.DataFrame(data[['x', 'y']]).values seems to be the easiest solution:
# https://stackoverflow.com/a/26175750/5472354
coeffs = np.polyfit(
data['t'], pd.DataFrame(data[['x', 'y']]).values, 3
)
def curve(t):
    # this can probably also be done in one statement, but I don't know how
    x = np.polyval(coeffs[:, 0], t)
    y = np.polyval(coeffs[:, 1], t)
    return x, y

def f(t, p):
    x, y = curve(t)
    return np.hypot(x - p['x'], y - p['y'])

# instead of this:
for point in data:
    tmin = fminbound(f, -50, 6659.525, args=(point, ))
    point['dmin'] = f(tmin, point)
# do something like this:
# tmin = fminbound(f, -50, 6659.525, args=(data, ))
# data['dmin'] = f(tmin, data)
But as you can see, I use a for-loop to calculate the shortest distances for each data point, which slows down my program significantly, as this is performed several thousand times. Thus I would like to vectorize the operation / improve the performance, but haven't found a way. There are related posts to this (e.g. here or here), but I don't know how the suggested solutions apply in my case.
No, it is not possible to vectorize fminbound since it expects a scalar function of one variable. However, you can still vectorize the loop by reformulating the underlying mathematical optimization problem:
The N scalar optimization problems
min f_1(t) s.t. t_l <= t <= t_u
min f_2(t) s.t. t_l <= t <= t_u
.
.
.
min f_N(t) s.t. t_l <= t <= t_u
for scalar functions f_i are equivalent to one optimization problem in N variables:
min f_1(t_1)**2 + ... + f_N(t_N)**2 s.t. t_l <= t_i <= t_u for all i = 1, .., N
which can be solved by means of scipy.optimize.minimize. Depending on your whole algorithm, you could use this approach to further eliminate more loops, i.e. you only solve one large-scale optimization problem instead of multiple thousands of scalar optimization problems.
After cleaning up your code, this can be done as follows:
import numpy as np
from scipy.optimize import minimize
data = np.array([
(212.82275865, 1650.40828168, 0., 0),
(214.22056952, 1649.99898924, 10.38, 0),
(212.86786868, 1644.25228805, 116.288, 0),
(212.78680031, 1643.87461108, 122.884, 0),
(212.57489485, 1642.01124032, 156.313, 0),
(212.53483954, 1641.61858242, 162.618, 0),
(212.43922274, 1639.58782771, 196.314, 0),
(212.53726315, 1639.13842423, 202.619, 0),
(212.2888428, 1637.24641296, 236.306, 0),
(212.2722447, 1636.92307229, 242.606, 0),
(212.15559302, 1635.0529813, 276.309, 0),
(212.17535631, 1634.60618711, 282.651, 0),
(212.02545613, 1632.72139574, 316.335, 0),
(211.99988779, 1632.32053329, 322.634, 0),
(211.33419846, 1631.07592039, 356.334, 0),
(211.58972239, 1630.21971902, 362.633, 0),
(211.70332122, 1628.2088542, 396.316, 0),
(211.74610735, 1627.67591368, 402.617, 0),
(211.88819518, 1625.67310022, 436.367, 0),
(211.90709414, 1625.39410321, 442.673, 0),
(212.00090348, 1623.42655008, 476.332, 0),
(211.9249017, 1622.94540583, 482.63, 0),
(212.34321938, 1616.32949197, 597.329, 0),
(213.09638942, 1615.2869643, 610.4, 0),
(219.41313491, 1580.22234313, 1197.332, 0),
(220.38660128, 1579.20043302, 1210.37, 0),
(236.35472669, 1542.30863041, 1798.267, 0),
(237.41755384, 1541.41679119, 1810.383, 0),
(264.08373622, 1502.66620597, 2398.244, 0),
(265.65655239, 1501.64308908, 2410.443, 0),
(304.66999824, 1460.94068336, 2997.263, 0),
(306.22550945, 1459.75817211, 3010.38, 0),
(358.88879764, 1416.472238, 3598.213, 0),
(361.14046402, 1415.40942931, 3610.525, 0),
(429.96379858, 1369.7972467, 4198.282, 0),
(432.06565776, 1368.22265539, 4210.505, 0),
(519.30493383, 1319.01141844, 4798.277, 0),
(522.12134083, 1317.68234967, 4810.4, 0),
(630.00294242, 1265.05368942, 5398.236, 0),
(633.67624272, 1263.63633508, 5410.431, 0),
(766.29767476, 1206.91262814, 5997.266, 0),
(770.78300649, 1205.48393374, 6010.489, 0),
(932.92308019, 1145.87780431, 6598.279, 0),
(937.54373403, 1141.55438694, 6609.525, 0)])
# the coefficients
coeffs = np.polyfit(data[:, 2], data[:, 0:2], 3)
# the points
points = data[:, :2]
# vectorized version of your objective function
# i.e. evaluates f_1, ..., f_N
def f(t, points):
    poly_x = np.polyval(coeffs[:, 0], t)
    poly_y = np.polyval(coeffs[:, 1], t)
    return np.hypot(poly_x - points[:, 0], poly_y - points[:, 1])

# the scalar objective function we want to minimize
def obj_vec(t, points):
    vals = f(t, points)
    return np.sum(vals**2)
# variable bounds
bnds = [(-50, 6659.525)]*len(points)
# solve the optimization problem
res = minimize(lambda t: obj_vec(t, points), x0=np.zeros(len(points)), bounds=bnds)
dmins = f(res.x, points)
In order to further accelerate the optimization, it's highly recommended to pass the exact gradient of the objective function to minimize. Currently, the gradient is approximated by finite differences, which is quite slow:
In [7]: %timeit res = minimize(lambda t: obj_vec(t, points), x0=np.zeros(len(points)), bounds=bnds)
91.5 ms ± 868 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
The gradient can easily be computed by the chain rule. It reads as:
def grad(t, points):
    poly_x = np.polyval(coeffs[:, 0], t)
    poly_y = np.polyval(coeffs[:, 1], t)
    poly_x_deriv = np.polyval(np.polyder(coeffs[:, 0], m=1), t)
    poly_y_deriv = np.polyval(np.polyder(coeffs[:, 1], m=1), t)
    return 2*poly_x_deriv*(poly_x - points[:, 0]) + 2*poly_y_deriv*(poly_y - points[:, 1])
and passing it to minimize significantly reduces the runtime:
In [9]: %timeit res = minimize(lambda t: obj_vec(t, points), jac=lambda t: grad(t, points),
   ...:                        x0=np.zeros(len(points)), bounds=bnds)
6.13 ms ± 63.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Finally, setting another starting point might also lead to fewer iterations, as you already noted in the comments.
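For example (my own suggestion of a starting point, not taken from the comments), the measured times themselves are a natural initial guess, since each optimal t_i should lie near the corresponding sample time:
# Warm-start each t_i at the sample's own time value instead of zero:
res = minimize(lambda t: obj_vec(t, points), jac=lambda t: grad(t, points),
               x0=data[:, 2], bounds=bnds)
dmins = f(res.x, points)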

When finding derivatives, how do you use the filter function to return only the terms whose derivatives are not multiplied by zero, in python 3?

I wrote a function that, given the terms of an equation, can find derivatives. However when one of the terms is a zero, the function breaks down. How would I use filter to make sure terms that are multiplied by zero don't return?
Here's my baseline code which works but doesn't include the filter yet:
def find_derivative(function_terms):
    return [(function_terms[0][0]*function_terms[0][1], function_terms[0][1]-1),
            (function_terms[1][0]*function_terms[1][1], function_terms[1][1]-1)]
The function_terms[1][1]-1 reduces the power of the term of the derivative by 1.
It works like this.
Input:
# Represent each polynomial term with a tuple of (coefficient, power)
# f(x) = 4 x^3 - 3 x
four_x_cubed_minus_three_x = [(4, 3), (-3, 1)]
find_derivative(four_x_cubed_minus_three_x)
Output:
[(12, 2), (-3, 0)]
This is the correct answer of 12 x^2 - 3
But here it breaks down:
Input:
# f(x) = 3 x^2 - 11
three_x_squared_minus_eleven = [(3, 2), (-11, 0)]
find_derivative(three_x_squared_minus_eleven)
It is supposed to find the derivative, given the equation.
Output:
[(6, 1), (0, -1)]
This has a "ghost" term of 0 * x^(-1); I don't want this term printed.
Expected Output:
[(6, 1)]
You can use the filter() function to filter the list of tuples and then apply logic on the filtered list. Something like this should work.
filtered_terms = list(filter(lambda x: x[1]!=0, function_terms))
Now you have the tuples without constants. So rather than hard-coding derivatives, try looping through the list to get the derivative.
# inside find_derivative, after the filtering step above:
result = []
for term in filtered_terms:
    result.append((term[0]*term[1], term[1]-1))
return result
There is a symbolic math library for Python called sympy. Maybe it can be useful for you:
from sympy import *
x = symbols('x')
init_printing(use_unicode=True)
equation = 4*x**3 -3*x
diff_equation = equation.diff()
solution = diff_equation.subs({x:2})
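For reference (my own check, not part of the original answer), printing the intermediate results of this sketch gives:
print(diff_equation)   # 12*x**2 - 3
print(solution)        # 45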
Two changes:
Make your routine iterate through the polynomial terms, handling them one at a time, rather than depending on having exactly two terms.
Apply the filtering to the individual term as you encounter it.
I've extended this to also eliminate anything with a zero coefficient, as well as a zero exponent. I added a test case with both of those and a negative exponent, since the symbolic differentiation theorem applies equally.
def find_derivative(function_terms):
    return [(term[0]*term[1], term[1]-1)
            for term in function_terms
            if term[0] * term[1] != 0]

four_x_cubed_minus_three_x = [(4, 3), (-3, 1)]
print(find_derivative(four_x_cubed_minus_three_x))

three_x_squared_minus_eleven = [(3, 2), (-11, 0)]
print(find_derivative(three_x_squared_minus_eleven))

fifth_degree = [(1, 5), (-1, 4), (0, 3), (8, 2), (-16, 0), (1, -2)]
print(find_derivative(fifth_degree))
Output:
[(12, 2), (-3, 0)]
[(6, 1)]
[(5, 4), (-4, 3), (16, 1), (-2, -3)]

Check if some elements in a matrix are cohesive

I have to write a very little Python program that checks whether some group of coordinates are all connected together (by a line, not diagonally). The next 2 pictures show what I mean: in the left picture all colored groups are cohesive, in the right picture they are not.
I've already made this piece of code, but it doesn't seem to work and I'm quite stuck, any ideas on how to fix this?
def cohesive(container):
    co = container.pop()
    container.add(co)
    return connected(co, container)

def connected(co, container):
    done = {co}
    todo = set(container)
    while len(neighbours(co, container, done)) > 0 and len(todo) > 0:
        done = done.union(neighbours(co, container, done))
    return len(done) == len(container)

def neighbours(co, container, done):
    output = set()
    for i in range(-1, 2):
        if i != 0:
            if (co[0] + i, co[1]) in container and (co[0] + i, co[1]) not in done:
                output.add((co[0] + i, co[1]))
            if (co[0], co[1] + i) in container and (co[0], co[1] + i) not in done:
                output.add((co[0], co[1] + i))
    return output
This is some reference material that should return True:
cohesive({(1, 2), (1, 3), (2, 2), (0, 3), (0, 4)})
and this should return False:
cohesive({(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)})
Both tests work, but when I try to test it with different numbers the functions fail.
You can just take an element and attach its neighbors while it is possible.
def dist(A, B):
    return abs(A[0] - B[0]) + abs(A[1] - B[1])

def grow(K, E):
    return {M for M in E for N in K if dist(M, N) <= 1}

def cohesive(E):
    K = {min(E)}  # start from an arbitrary element
    L = grow(K, E)
    while len(K) < len(L):
        K, L = L, grow(L, E)
    return len(L) == len(E)

grow(K, E) returns the neighbourhood of K within E.
In [1]: cohesive({(1, 2), (1, 3), (2, 2), (0, 3), (0, 4)})
Out[1]: True
In [2]: cohesive({(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)})
Out[2]: False
Usually, to check whether something is connected, you use disjoint-set data structures; the more efficient variations include weighted quick union and weighted quick union with path compression.
Here's an implementation, http://algs4.cs.princeton.edu/15uf/WeightedQuickUnionUF.java.html, which you can modify to your needs. Also, the implementation found in the book "The Design and Analysis of Computer Algorithms" by A. Aho allows you to specify the name of the group that you add two connected elements to, so I think that's the modification you're looking for (it just involves using one extra array which keeps track of group numbers).
As a side note, since disjoint sets usually apply to arrays, don't forget that you can represent an N by N matrix as an array of size N*N.
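For illustration, here is a minimal weighted quick-union with path compression in Python (my own sketch, not a port of the linked Java code):
class WeightedQuickUnion(object):
    def __init__(self, n):
        self.parent = list(range(n))  # parent[i] == i means i is a root
        self.size = [1] * n           # size of the tree rooted at i

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path compression
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return
        if self.size[ri] < self.size[rj]:  # always hang the smaller tree
            ri, rj = rj, ri                # under the larger one
        self.parent[rj] = ri
        self.size[ri] += self.size[rj]

# A cell (row, col) of an N-by-N matrix maps to the index row*N + col;
# union each pair of horizontally/vertically adjacent cells in the group,
# then the group is cohesive iff all its cells share a single root.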
EDIT: I just realized that I misread what you were asking at first; you also mentioned that diagonal components aren't connected. In that case the algorithm is as follows:
0) Check if all elements refer to the same group.
1) Iterate through the array of pairs that represent coordinates in the matrix in question.
2) For each pair make a set of pairs that satisfies the following formula:
|entry.x - otherEntry.x| + |entry.y - otherEntry.y|=1.
'entry' refers to the element that the outer for loop is referring to.
3) Check if all of the sets overlap. That can be done by "unioning" the sets you're looking at, at the end if you get more than 1 set, then the elements are not cohesive.
The complexity is O(n^2 + n^2 * log(n)).
Example:
(0,4), (1,2), (1,4), (2,2), (2,3)
0) check that they are all in the same group:
all of them belong to group 5.
1) make sets:
set1: (0,4), (1,4)
set2: (1,2), (2,2)
set3: (0,4), (1,4) // here we suppose that sets are sorted; otherwise it would be (1,4), (0,4)
set4: (1,2), (2,2), (2,3)
set5: (2,2), (2,3)
2) check for overlap:
set1 overlaps with set3, so we get:
set1' : (0,4), (1,4)
set2 overlaps with set4 and set 5, so we get:
set2' : (1,2), (2,2), (2,3)
as you can see set1' and set2' don't overlap, hence you get 2 disjoint sets that are in the same group, so the answer is 'false'.
Note that this is inefficient, but I have no idea how to do it more efficiently, but this answers your question.
The logic in your connected function seems wrong. You make a todo variable, but then never change its contents. You always look for neighbours around the same starting point.
Try this code instead:
def connected(co, container):
    done = {co}
    todo = {co}
    while len(todo) > 0:
        co = todo.pop()
        n = neighbours(co, container, done)
        done = done.union(n)
        todo = todo.union(n)
    return len(done) == len(container)
todo is a set of all the points we are still to check.
done is a set of all the points we have found to be 4-connected to the starting point.
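With this fix (and the cohesive and neighbours functions from the question unchanged), the question's reference cases pass:
print(cohesive({(1, 2), (1, 3), (2, 2), (0, 3), (0, 4)}))  # True
print(cohesive({(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)}))  # False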
I'd tackle this problem differently... if you're looking for five exactly, that means:
Every coordinate in the line has to be neighbouring another coordinate in the line, because anything less means that coordinate is disconnected.
At least three of the coordinates have to be neighbouring another two or more coordinates in the line, because anything less and the groups will be disconnected.
Hence, you can just get the coordinate's neighbours and check whether both conditions are fulfilled.
Here is a basic solution:
def cells_are_connected(connections):
    return all(c > 0 for c in connections)

def groups_are_connected(connections):
    return len([1 for c in connections if c > 1]) > 2

def cohesive(coordinates):
    connections = []
    for x, y in coordinates:
        neighbours = [(x-1, y), (x+1, y), (x, y-1), (x, y+1)]
        connections.append(len([1 for n in neighbours if n in coordinates]))
    return cells_are_connected(connections) and groups_are_connected(connections)

print cohesive([(1, 2), (1, 3), (2, 2), (0, 3), (0, 4)])  # True
print cohesive([(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)])  # False
No need for a general-case solution or union logic. :) Do note that it's specific to the five-in-a-line problem, however.

Project Euler #101 - how to work around numpy polynomial overflow?

Project Euler #101
I just started learning Numpy and it so far looks pretty straightforward to me.
One thing I ran into is that when I evaluate the polynomial, the result is an int32, so an overflow occurs.
import numpy

u = numpy.poly1d([1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1])
for i in xrange(1, 11):
    print(i, u(i))
The results are:
(1, 1)
(2, 683)
(3, 44287)
(4, 838861)
(5, 8138021)
(6, 51828151)
(7, 247165843)
(8, 954437177)
(9, -1156861335)
(10, 500974499)
The last two items are clearly incorrect.
The workaround I can think of is scaling the coefficients down by 100:
u = numpy.poly1d([0.01, -0.01, 0.01, -0.01, 0.01, -0.01, 0.01, -0.01, 0.01, -0.01, 0.01])
for i in xrange(1, 11):
    print(i, int(u(i) * 100))
This time the results are correct
(1, 1)
(2, 682)
(3, 44286)
(4, 838860)
(5, 8138020)
(6, 51828151)
(7, 247165843)
(8, 954437177)
(9, 3138105961L)
(10, 9090909091L)
Is there a better way? Does Numpy allow me to change the data type? Thanks.
It wasn't the scaling by 100 that helped, but the fact that the numbers given were floats instead of ints, and thus had a larger range. Due to the floating-point calculations, some inaccuracies are introduced, as you have seen.
You can specify the type manually like this:
u = numpy.poly1d(numpy.array([1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1], dtype=numpy.int64))
The calculations for this problem fit in 64-bit ints, so this will work.
The supported types are listed here.
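For instance (my own re-run of the question's loop with the int64 coefficients), the last two values now come out correctly:
import numpy

u = numpy.poly1d(numpy.array([1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1],
                             dtype=numpy.int64))
print(u(9))   # 3138105961
print(u(10))  # 9090909091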
Interjay posted a better answer while I was writing this up, but I figured you might want an alternative anyway.
Here's a simple implementation for the examples you showed:
class Poly(object):
    def __init__(self, coefficients):
        assert len(coefficients) > 0
        self.coefficients = coefficients

    def __call__(self, value):
        # Horner's method: no explicit powers, so plain Python ints
        # (which never overflow) are used throughout
        total = self.coefficients[0]
        for c in self.coefficients[1:]:
            total = total * value + c
        return total
along with some tests
assert Poly([5])(1) == 5
assert Poly([7])(1) == 7
assert Poly([2,3])(5) == 13
assert Poly([1,0,0,0,0])(-2) == 16
u = Poly([1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1])
for i in range(1, 11):
    print(i, u(i))
and the rather useless
assert Poly([2,"!"])("Hello ") == "Hello Hello !"
