Python, combinations, permutations without repeat - python

Python. I have two lists of the same length. The idea is to build paired data (for regression analysis). I worked it out with loops, and it looks like this.
a=(1,3,5,7) #first list
b=(2,4,6,10) #second list
w=zip(a,b) #paired values from both lists
i=0
j=0
for each in w:
    x= w[i]
    for that in xrange(i,len(w)-1):
        i+=1
        print x, w[i]
    j+=1
    i=j
The output is as I expected: the first pair together with the second, third, and so on, then the second pair with the third, fourth, and so on (skipping the combination of the second pair with the first pair, because it is effectively the same as the combination of the first and second pairs).
(1, 2) (3, 4)
(1, 2) (5, 6)
(1, 2) (7, 10)
(3, 4) (5, 6)
(3, 4) (7, 10)
(5, 6) (7, 10) [..] and so on as I was expecting.
The question is: is there a shorter, more optimized way to rewrite this code, maybe using itertools?

You can use itertools.combinations with itertools.izip:
>>> from itertools import izip, combinations
>>> for a, b in combinations(izip(a, b), 2):
...     print a, b
...
(1, 2) (3, 4)
(1, 2) (5, 6)
(1, 2) (7, 10)
(3, 4) (5, 6)
(3, 4) (7, 10)
(5, 6) (7, 10)

Yes: itertools.combinations
print list(itertools.combinations(w, 2))
You mention this in your question - I'm not sure why you wouldn't look in the docs before asking on StackOverflow.
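For readers on Python 3, where izip no longer exists and the built-in zip is already lazy, the same idea can be sketched as follows. This is a minimal sketch rather than a drop-in replacement for the answers above; the name pairs is an assumption introduced only for illustration.

```python
from itertools import combinations

a = (1, 3, 5, 7)    # first list
b = (2, 4, 6, 10)   # second list

# zip(a, b) yields (1, 2), (3, 4), ...; combinations(..., 2) then yields every
# unordered pair of those tuples exactly once, in input order.
pairs = list(combinations(zip(a, b), 2))

for first, second in pairs:
    print(first, second)
```

With four input pairs this produces the six lines shown in the question.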

Related

Getting the correct max value from a list of tuples

My list of tuples looks like this:
[(0, 0), (3, 0), (3, 3), (0, 3), (0, 0), (0, 6), (3, 6), (3, 9), (0, 9), (0, 6), (6, 0), (9, 0), (9, 3), (6, 3), (6, 0), (0, 3), (3, 3), (3, 6), (0, 6), (0, 3)]
It has the format of (X, Y) where I want to get the max and min of all Xs and Ys in this list.
It should be min(X)=0, max(X)=9, min(Y)=0, max(Y)=9
However, when I do this:
min(listoftuples)[0], max(listoftuples)[0]
min(listoftuples)[1], max(listoftuples)[1]
...for the Y values, the maximum value shown is 3 which is incorrect.
Why is that?
"for the Y values, the maximum value shown is 3"
That is because max(listoftuples) returns the tuple (9, 3), so max(listoftuples)[0] is 9 and max(listoftuples)[1] is 3.
By default, tuples (like other sequences) are compared element by element: first by the value at index 0, then, on a tie, by the value at index 1, and so on.
If you want to find the tuple with the maximum value at the second index, you need to use a key function:
from operator import itemgetter
li = [(0, 0), (3, 0), ... ]
print(max(li, key=itemgetter(1)))
# or max(li, key=lambda t: t[1])
outputs
(3, 9)
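To make the comparison rule concrete, here is a small sketch that shows the element-by-element comparison and then uses key functions to get all four extremes. The points list is a shortened, de-duplicated version of the question's data, chosen here only for brevity, and the variable names are illustrative.

```python
from operator import itemgetter

# Shortened version of the question's data (an assumption made for brevity).
points = [(0, 0), (3, 0), (3, 9), (0, 9), (9, 0), (9, 3), (6, 3)]

# Tuples compare on the first element; the second only breaks ties.
assert (9, 3) > (3, 9)
largest_tuple = max(points)  # the tuple with the largest x, not the largest y

# A key function tells min/max which element to compare on.
x_min = min(points, key=itemgetter(0))[0]
x_max = max(points, key=itemgetter(0))[0]
y_min = min(points, key=itemgetter(1))[1]
y_max = max(points, key=itemgetter(1))[1]

print(largest_tuple, (x_min, x_max), (y_min, y_max))
```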
Here is a simple way to do it using list comprehensions:
min([t[0] for t in arr])  # smallest X
max([t[0] for t in arr])  # largest X
min([t[1] for t in arr])  # smallest Y
max([t[1] for t in arr])  # largest Y
In this code, I have used a list comprehension to create a list of all X and all Y values and then found the min/max for each list. This produces your desired answer.
The first two lines are for the X values and the last two lines are for the Y values.
Tuples are ordered by their first value, then in case of a tie, by their second value (and so on). That means max(listoftuples) is (9, 3). See How does tuple comparison work in Python?
So to find the highest y-value, you have to look specifically at the second elements of the tuples. One way you could do that is by splitting the list into x-values and y-values, like this:
xs, ys = zip(*listoftuples)
Or if you find that confusing, you could use this instead, which is roughly equivalent:
xs, ys = ([t[i] for t in listoftuples] for i in range(2))
Then get each of their mins and maxes, like this:
x_min_max, y_min_max = [(min(L), max(L)) for L in (xs, ys)]
print(x_min_max, y_min_max) # -> (0, 9) (0, 9)
Another way is to use NumPy to treat listoftuples as a matrix.
import numpy as np
a = np.array(listoftuples)
x_min_max, y_min_max = [(min(column), max(column)) for column in a.T]
print(x_min_max, y_min_max) # -> (0, 9) (0, 9)
(There's probably a more idiomatic way to do this, but I'm not super familiar with NumPy.)

Need to output the top 3 outcomes of rolling 2 dice, which have a variable number of sides

The following code computes the probability distribution of outcomes of rolling two dice with a variable number of equal sides:
def compute_probability_distribution(sides):
    dist = {x+y: 0 for x in range(1, sides+1) for y in range(1, sides+1)}
    for die_1 in range(1, sides+1):
        for die_2 in range(1, sides+1):
            dist[die_1+die_2] = dist[die_1+die_2] + 1
    probs = dist.items()
    print "Prob dist: ", probs
E.g., for ordinary 6-sided dice the prob dist is [(2,1),(3,2),(4,3),(5,4),(6,5),(7,6),(8,5),(9,4),(10,3),(11,2),(12,1)], where the first element of each tuple is the sum of the 2 dice, and the second element is the number of ways it can occur on one roll. Can anyone tell me how to sort the above prob dist list by the second element of each tuple so I can output the top (1 or 3) most likely occurrences? I am thinking of using the built-in list sort with some sort of comparison function.
>>> probs = [(2,1),(3,2),(4,3),(5,4),(6,5),(7,6),(8,5),(9,4),(10,3),(11,2),(12,1)]
>>> sorted(probs, key=lambda x: x[1]) # x[1] is second element of tuple pair.
[(2, 1),
 (12, 1),
 (3, 2),
 (11, 2),
 (4, 3),
 (10, 3),
 (5, 4),
 (9, 4),
 (6, 5),
 (8, 5),
 (7, 6)]
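Since the goal is the most likely outcomes, it may be more convenient to sort in descending order and slice off the first few entries. The sketch below assumes the counts for two 6-sided dice, where a sum of 2 can occur in only one way; the helper name top_outcomes is hypothetical.

```python
# Counts for two 6-sided dice: (sum, number of ways to roll it).
probs = [(2, 1), (3, 2), (4, 3), (5, 4), (6, 5), (7, 6),
         (8, 5), (9, 4), (10, 3), (11, 2), (12, 1)]

def top_outcomes(pairs, n=3):
    """Return the n pairs with the highest count, most likely first."""
    # sorted() is stable, so ties keep their original relative order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]

top3 = top_outcomes(probs)
print(top3)  # [(7, 6), (6, 5), (8, 5)]
```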
You can do it with nested comprehensions, but you can also compute the most common values by hand if you know the number of sides.
In order of increasing probability:
2 and sides+sides, which each have only one "chance".
Then 3 and sides+sides-1, which each have 2.
4 and sides+sides-2 each have 3.
...
Finally, sides+1 is the most likely sum, with sides ways to roll it.
If you don't trust me, look at the probability distribution for different numbers of sides.
So to get the 3 most common values, you can simply calculate them based on the number of sides:
def compute_probability_distribution(sides):
    print([(sides+1, sides), (sides, sides-1), (sides+2, sides-1)])
However, that only works for dice with at least 2 sides. For a one-sided die, this function gives a weird result.
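One way to convince yourself that the closed form holds is to compare it against a brute-force count for a range of side counts. This is only a sanity check under the stated restriction of at least 2 sides; both function names are hypothetical.

```python
from collections import Counter
from itertools import product

def top3_brute_force(sides):
    """Count every roll of two dice and return the 3 most common sums."""
    counts = Counter(a + b for a, b in product(range(1, sides + 1), repeat=2))
    return counts.most_common(3)

def top3_closed_form(sides):
    """Top 3 sums computed directly from the number of sides (needs sides >= 2)."""
    return [(sides + 1, sides), (sides, sides - 1), (sides + 2, sides - 1)]

# Compare as sorted lists so the order of tied entries does not matter.
for sides in range(2, 21):
    assert sorted(top3_brute_force(sides)) == sorted(top3_closed_form(sides))
```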
I would simply use the data structure that is designed for this: a Counter:
from collections import Counter
def compute_probability_distribution(sides):
    dist = Counter(die_1 + die_2 for die_1 in range(1, sides+1)
                   for die_2 in range(1, sides+1))
    probs = dist.most_common(3)
    print "Prob dist: ", probs
For two 6-sided dice, this will then produce:
>>> compute_probability_distribution(6)
Prob dist: [(7, 6), (6, 5), (8, 5)]
So we obtained six times a sum of seven; five times a sum of six; and five times a sum of eight.
In case you want to make the number of dice arbitrary, you can use:
from collections import Counter
from itertools import product
def compute_probability_distribution(sides,ndices=2,common=3):
    dist = Counter(sum(d) for d in product(range(1,sides+1),repeat=ndices))
    probs = dist.most_common(common)
    print "Prob dist: ", probs
So now we can calculate the 10 most common sums when we roll three 5-sided dice:
>>> compute_probability_distribution(5,3,10)
Prob dist: [(9, 19), (8, 18), (10, 18), (7, 15), (11, 15), (6, 10), (12, 10), (5, 6), (13, 6), (4, 3)]

Issue with python recursion

I have the following code written in Python 2.7 to find the n-fold Cartesian product of a set (AxAxA...xA):
prod=[]
def cartesian_product(set1,set2,n):
    if n>=1:
        for x in set1:
            for y in set2:
                prod.append('%s,%s'%(x,y))
        #prod='[%s]' % ', '.join(map(str, prod))
        #print prod
        cartesian_product(set1,prod,n-1)
    else:
        print prod
n=raw_input("Number of times to roll: ")
events=["1","2","3","4","5","6"]
cartesian_product(events,events,1)
This works properly when n=1. But changing the call from cartesian_product(events,events,1) to cartesian_product(events,events,2) doesn't work. It seems an infinite loop is running. I can't figure out where exactly I'm making a mistake.
When you pass the reference to the global variable prod to the recursive call, you are modifying the list that set2 also references. This means that set2 is growing as you iterate over it, meaning the iterator never reaches the end.
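The effect described above can be reproduced in isolation. The sketch below appends to a list while iterating over it and uses an artificial safety cap (an assumption added only so the example terminates); the function name is hypothetical.

```python
def count_iterations(limit=20):
    """Iterate over a list that grows during iteration, stopping at a cap."""
    items = [1, 2, 3]
    steps = 0
    for value in items:
        steps += 1
        if steps >= limit:    # safety cap; without it the loop would not end
            break
        items.append(value)   # the list grows as fast as the loop consumes it
    return steps, len(items)

steps, size = count_iterations()
print(steps, size)
```

The loop only stops because of the cap: the list started with 3 items, yet the iterator never ran out of elements.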
You don't need a global variable here. Return the computed product instead.
def cartesian_product(set1, n):
    # Return a set of n-tuples
    rv = set()
    if n == 0:
        # Degenerate case: A^0 == the set containing the empty tuple
        rv.add(())
    else:
        for x in set1:
            for y in cartesian_product(set1, n-1):
                rv.add((x,) + y)
    return rv
If you want to preserve the order of the original argument, use rv = [] and rv.append instead.
def cartesian_product(*X):
    if len(X) == 1: #special case, only X1
        return [ (x0, ) for x0 in X[0] ]
    else:
        return [ (x0,)+t1 for x0 in X[0] for t1 in cartesian_product(*X[1:]) ]
n=int(raw_input("Number of times to roll: "))
events=[1,2,3,4,5,6]
prod=[]
for arg in range(n+1):
    prod.append(events)
print cartesian_product(*prod)
Output:
Number of times to roll: 1
[(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)]
You can also pass strings in your events list, but then the resulting tuples will contain strings as well.
Inside the recursive call cartesian_product(set1,prod,n-1) you are passing the list prod and then appending values to it again, so it keeps growing and the inner loop never terminates. You will need to change your implementation.
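As a cross-check on the recursive versions above, the standard library's itertools.product with its repeat argument computes the same n-fold product. This is a sketch in Python 3 syntax; the wrapper name cartesian_power is hypothetical, and here n means the number of factors, so n=2 gives AxA.

```python
from itertools import product

events = [1, 2, 3, 4, 5, 6]

def cartesian_power(values, n):
    """Return all n-tuples over values, in the order product() generates them."""
    return list(product(values, repeat=n))

pairs = cartesian_power(events, 2)
print(len(pairs), pairs[:3])
```

As in the set-based answer, n=0 yields a single empty tuple.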

Find maximum equidistant points on a line

I need an algorithm to find the maximum number of equidistant points on the same line.
Input: List of collinear points
For example: My points could be
[(1, 1), (1, 2), (1, 3)]
In this case I could sort the points by their distance from the origin and then compute the distances between consecutive points. However, that approach fails in a scenario such as the one below, where all the points lie on the line y=-x+6 and neighbouring points are equally spaced:
[(3, 3), (2, 4), (4, 2), (5, 1), (1, 5)]
It fails because points on opposite sides of (3, 3), such as (2, 4) and (4, 2), are equally far from the origin, so the sort can place them in any order and a sequential traversal is not possible.
For example, if the sorted list becomes [(3, 3), (5, 1), (4, 2), (2, 4), (1, 5)], we would end up calculating the distance between (3, 3) and (5, 1), which is not correct. Ideally, I want to calculate the distance between closest neighbours, so the order should start (1, 5), (2, 4).
To overcome this problem I created an O(n*n) solution that uses two nested loops and finds the frequency of the minimum distance between any 2 points:
import sys
distance_list=[]
lop=[(1, 3), (2, 4), (3, 5), (4, 6), (10, 12), (11, 13), (12, 14), (13, 15), (14, 16)]
lop.sort(key=lambda x: x[0]*x[0] + x[1]*x[1])
for k in range(0, len(lop)):
    min_dist=sys.maxint
    for l in range(0, len(lop)):
        if k!=l:
            temp_dist = ( (lop[k][0] - lop[l][0])*(lop[k][0] - lop[l][0]) + (lop[k][1] - lop[l][1])*(lop[k][1] - lop[l][1]) )
            min_dist= min(min_dist, temp_dist)
    distance_list.append(min_dist)
print distance_list.count (max(distance_list,key=distance_list.count))
However, the above solution fails for the test case below:
[(1, 3), (2, 4), (3, 5), (4, 6), (10, 12), (11, 13), (12, 14), (13, 15), (14, 16)]
Expected answer should be: 5
However, I'm getting: 9
Essentially, I can't work out how to distinguish between two clusters that each contain equidistant points. In the above example those would be
[(1, 3), (2, 4), (3, 5), (4, 6)] AND [(10, 12), (11, 13), (12, 14), (13, 15), (14, 16)]
If you want to put the points in order, you don't need to sort them by distance from anything. You can just sort them by the default lexicographic order, which is consistent with the order along the line:
lop.sort()
Now you just need to figure out how to find the largest set of equidistant points. That could be tricky, especially if you're allowed to skip points.
Because you want the distance between consecutive points, there is no need to calculate all combinations. You just need the distances of (p0,p1), (p1,p2), (p2,p3), and so on, then group those pairs, in that order, by the value of their distance. Once you have done that, you just need the longest sequence among the groups. The itertools module comes in handy for this:
from itertools import groupby, tee, izip
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)
def distance(a,b):
    ax,ay = a
    bx,by = b
    return (ax-bx)**2 + (ay-by)**2
def longest_seq(points):
    groups = [ list(g) for k,g in groupby(pairwise(points), lambda p:distance(*p)) ]
    max_s = max(groups,key=len) # this is a list of pairs [(p0,p1), (p1,p2), (p2,p3),..., (pn-1,pn)]
    ans = [ p[0] for p in max_s ]
    ans.append( max_s[-1][-1] ) # we need to include the last point manually
    return ans
Here the groupby function groups together consecutive pairs of points that have the same distance, pairwise is a recipe that does the desired pairing, and the rest is self-explanatory.
Here is a test:
>>> test = [(1, 3), (2, 4), (3, 5), (4, 6), (10, 12), (11, 13), (12, 14), (13, 15), (14, 16)]
>>> longest_seq(test)
[(10, 12), (11, 13), (12, 14), (13, 15), (14, 16)]
>>>
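Combining the two suggestions above (sort lexicographically, then group consecutive gaps), a Python 3 sketch might look like the following. It assumes the input points are collinear, as stated in the question, and that skipping points is not allowed; the function name longest_equidistant_run is hypothetical.

```python
from itertools import groupby

def longest_equidistant_run(points):
    """Return the longest run of consecutive, equally spaced collinear points."""
    ordered = sorted(points)  # lexicographic order follows the line
    if len(ordered) < 2:
        return ordered

    def squared_gap(pair):
        (ax, ay), (bx, by) = pair
        return (ax - bx) ** 2 + (ay - by) ** 2

    # Consecutive pairs (p0, p1), (p1, p2), ... grouped by their squared distance.
    pairs = zip(ordered, ordered[1:])
    runs = [list(group) for _, group in groupby(pairs, key=squared_gap)]
    best = max(runs, key=len)
    # First point of every pair in the run, plus the final point of the last pair.
    return [first for first, _ in best] + [best[-1][1]]

clusters = [(1, 3), (2, 4), (3, 5), (4, 6), (10, 12), (11, 13), (12, 14), (13, 15), (14, 16)]
print(len(longest_equidistant_run(clusters)))  # 5
```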

List coordinates between a set of coordinates

This should be fairly easy, but I'm getting a headache from trying to figure it out. I want to list all the coordinates between two points. Like so:
1: (1,1)
2: (1,3)
In between: (1,2)
Or
1: (1,1)
2: (5,1)
In between: (2,1), (3,1), (4,1)
It does not need to work with diagonals.
You appear to be a beginning programmer. A general technique I find useful is to do the job yourself, on paper, then look at how you did it and translate that to a program. If you can't see how, break it down into simpler steps until you can.
Depending on how you want to handle the edge cases, this seems to work:
def points_between(p1, p2):
    xs = range(p1[0] + 1, p2[0]) or [p1[0]]
    ys = range(p1[1] + 1, p2[1]) or [p1[1]]
    return [(x,y) for x in xs for y in ys]
print points_between((1,1), (5,1))
# [(2, 1), (3, 1), (4, 1)]
print points_between((5,6), (5,12))
# [(5, 7), (5, 8), (5, 9), (5, 10), (5, 11)]
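If the two points may arrive in either order, one possible variant in Python 3 syntax handles the horizontal and vertical cases explicitly. This is a sketch under its own edge-case choices, which differ from the answer above: identical or adjacent points return an empty list, and diagonal input raises an error. The name points_between_ordered is hypothetical.

```python
def points_between_ordered(p1, p2):
    """List the grid points strictly between two axis-aligned points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # Vertical segment (or the same point): walk along y.
        low, high = sorted((y1, y2))
        return [(x1, y) for y in range(low + 1, high)]
    if y1 == y2:
        # Horizontal segment: walk along x.
        low, high = sorted((x1, x2))
        return [(x, y1) for x in range(low + 1, high)]
    raise ValueError("points must share an x or a y coordinate")

print(points_between_ordered((1, 1), (5, 1)))  # [(2, 1), (3, 1), (4, 1)]
print(points_between_ordered((5, 1), (1, 1)))  # [(2, 1), (3, 1), (4, 1)]
```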
