I want to use Python to define one set in terms of another, as follows: for some set N that consists of sets, define C as the set such that an element n of N is in C just in case any two elements of n both satisfy some specific condition.
Here is the particular problem I need to solve. Consider the power set N of the set of ordered pairs of elements in x={1,2,3,4,5,6}, and the following subsets of N:
i = {{1,1},{2,2},{3,4},{4,3},{5,6},{6,5}}
j = {{3,3},{4,4},{1,2},{2,1},{5,6},{6,5}}
k = {{5,5},{6,6},{1,2},{2,1},{3,4},{4,3}}
Using Python, I want to define a special class of subsets of N: the subsets of N such that any two of their members are both in i, both in j, or both in k.
More explicitly, I want to define the set C = {n in N | for all a, b in n, either a and b are both in i, or a and b are both in j, or a and b are both in k}.
I'm attaching what I tried to do in Python. But this doesn't give me the right result: the set C I'm defining here is not such that any two of its members are both either in i, j, or k.
Any leads would be much appreciated!
from itertools import chain, combinations
def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
x = [1,2,3,4,5,6]
ordered_pairs = [[j,k] for j in x for k in x if k>=j]
powers = list(powerset(ordered_pairs))
i = [[1,1],[2,2],[3,4],[4,3],[5,6],[6,5]]
j = [[3,3],[4,4],[1,2],[2,1],[5,6],[6,5]]
k = [[5,5],[6,6],[1,2],[2,1],[3,4],[4,3]]
M = [i,j,k]
C = []
for n in powers:
    for a in n:
        for b in n:
            for m in M:
                if a in m:
                    if b in m:
                        if a != b:
                            C.append(n)
    if len(n) == 1:
        C.append(n)
First of all, note that the ordered pairs you list are sets, not pairs. Use tuples, since they're hashable, and you'll be able to easily generate the power set using itertools. With that done, you have an easier time identifying the qualifying subsets.
The code below implements that much of the process. You can accumulate the hits at the HIT line of code. Even better, you can collapse the loop into a nested comprehension using any to iterate over the three zone sets.
test_list = [
    set(((1,1),(2,2))), # trivial hit on i
    set(), # trivial miss
    set(((1, 1), (4, 4), (6, 6))), # one element in each target set
    set(((3, 3), (6, 2), (4, 4), (2, 2))), # two elements in j
]
i = set(((1,1),(2,2),(3,4),(4,3),(5,6),(6,5)))
j = set(((3,3),(4,4),(1,2),(2,1),(5,6),(6,5)))
k = set(((5,5),(6,6),(1,2),(2,1),(3,4),(4,3)))
zone = [i, j, k]
for candidate in test_list:
    for target in zone:
        overlap = candidate.intersection(target)
        if len(overlap) >= 2:
            print("HIT", candidate, target)
            break
    else:
        print("MISS", candidate)
Output:
HIT {(1, 1), (2, 2)} {(5, 6), (4, 3), (2, 2), (3, 4), (1, 1), (6, 5)}
MISS set()
MISS {(4, 4), (1, 1), (6, 6)}
HIT {(6, 2), (3, 3), (4, 4), (2, 2)} {(1, 2), (3, 3), (5, 6), (4, 4), (2, 1), (6, 5)}
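The loop can be collapsed with any, as suggested. Here is a minimal sketch of that; the helper names overlaps_two and pairwise_same_zone are hypothetical. The second helper is only an assumption about how the question's literal "for all a, b in n" definition might be coded, and it is not part of the original answer.

```python
from itertools import combinations_with_replacement

i = {(1, 1), (2, 2), (3, 4), (4, 3), (5, 6), (6, 5)}
j = {(3, 3), (4, 4), (1, 2), (2, 1), (5, 6), (6, 5)}
k = {(5, 5), (6, 6), (1, 2), (2, 1), (3, 4), (4, 3)}
zone = [i, j, k]

test_list = [
    {(1, 1), (2, 2)},
    set(),
    {(1, 1), (4, 4), (6, 6)},
    {(3, 3), (6, 2), (4, 4), (2, 2)},
]

def overlaps_two(candidate):
    # Same test as the HIT/MISS loop: some zone shares at least two members.
    return any(len(candidate & target) >= 2 for target in zone)

def pairwise_same_zone(candidate):
    # Assumed literal reading of the question: every pair (a, b),
    # including a == b, must lie together in at least one of i, j, k.
    return all(
        any(a in target and b in target for target in zone)
        for a, b in combinations_with_replacement(candidate, 2)
    )

hits = [c for c in test_list if overlaps_two(c)]
strict = [c for c in test_list if pairwise_same_zone(c)]
print(hits)
print(strict)
```

The two tests are not equivalent: the first accepts a candidate that also contains stray pairs such as (6, 2), while the second rejects it. Which one you want depends on how you read the definition of C.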
I have a circle-growth algorithm (line-growth with closed links) where new points are added between existing points at each iteration.
The linkage information of each point is stored as a tuple in a list. That list is updated iteratively.
QUESTIONS:
What would be the most efficient way to return the spatial order of these points as a list?
Do I need to recompute the whole order at each iteration, or is there a way to insert the new points cumulatively, in order, into that list?
All I could come up with is the following:
tuples = [(1, 4), (2, 5), (3, 6), (1, 6), (0, 7), (3, 7), (0, 8), (2, 8), (5, 9), (4, 9)]
starting_tuple = [e for e in tuples if e[0] == 0 or e[1] == 0][0]
## note: 'starting_tuple' could be either (0, 7) or (0, 8), starting direction doesn't matter
order = list(starting_tuple) if starting_tuple[0] == 0 else [starting_tuple[1], starting_tuple[0]]
## order will always start from point 0
idx = tuples.index(starting_tuple)
## index of the starting tuple
def findNext():
    global idx
    for i, e in enumerate(tuples):
        if order[-1] in e and i != idx:
            ind = e.index(order[-1])
            c = 0 if ind == 1 else 1
            order.append(e[c])
            idx = tuples.index(e)
for i in range(len(tuples)/2):
    findNext()
print order
It is working but it is neither elegant (non pythonic) nor efficient.
It seems to me that a recursive algorithm may be more suitable but unfortunately I don't know how to implement such solution.
Also, please note that I'm using Python 2 and only have access to pure-Python packages (no numpy).
Rather than recursion, this seems more like a dictionary and generator problem to me:
from collections import defaultdict
def findNext(tuples):
    previous = 0
    yield previous # our first result
    dictionary = defaultdict(list)
    # [(1, 4), (2, 5), (3, 6), ...] -> {0: [7, 8], 1: [4, 6], 2: [5, 8], ...}
    for a, b in tuples:
        dictionary[a].append(b)
        dictionary[b].append(a)
    current = dictionary[0][0] # dictionary[0][1] should also work
    yield current # our second result
    while True:
        a, b = dictionary[current] # possible connections
        following = a if a != previous else b # only one will move us forward
        if following == 0: # have we come full circle?
            break
        yield following # our next result
        previous, current = current, following # reset for next iteration
tuples = [(1, 4), (2, 5), (3, 6), (1, 6), (7, 0), (3, 7), (8, 0), (2, 8), (5, 9), (4, 9)]
generator = findNext(tuples)
for n in generator:
    print n
OUTPUT
% python test.py
0
7
3
6
1
4
9
5
2
8
%
Algorithm currently assumes we have more than two nodes.
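For readers on Python 3, the same adjacency-dictionary walk can be sketched as a function that returns a list instead of yielding. This is only an illustration: ring_order is a hypothetical name, and it assumes, like the generator, that the links form one closed loop in which every point has exactly two neighbours.

```python
from collections import defaultdict

def ring_order(links, start=0):
    # Build {point: [neighbour, neighbour]} from the link tuples.
    neighbours = defaultdict(list)
    for a, b in links:
        neighbours[a].append(b)
        neighbours[b].append(a)

    order = [start]
    previous, current = start, neighbours[start][0]  # pick a direction
    while current != start:  # stop once we are back at the start
        order.append(current)
        a, b = neighbours[current]
        # One neighbour is where we came from; step to the other one.
        previous, current = current, (a if a != previous else b)
    return order

links = [(1, 4), (2, 5), (3, 6), (1, 6), (0, 7), (3, 7), (0, 8), (2, 8), (5, 9), (4, 9)]
print(ring_order(links))  # [0, 7, 3, 6, 1, 4, 9, 5, 2, 8]
```

Building the dictionary is one pass over the links and the walk visits each point once, so the whole thing is linear in the number of points.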
Since the nodes only link to two other nodes, you can bin them by number, then follow the numbers around. This is O(n) sorting, which is pretty solid, but it's not a true sort in the <,>,= sense.
def bin_nodes(node_list):
    #figure out the in and out nodes for each node, and put those into a dictionary.
    node_bins = {} #init the bins
    for node_pair in node_list: #go once through the list
        for i in range(len(node_pair)): #put each node into the other's bin
            if node_pair[i] not in node_bins: #initialize the bin dictionary for unseen nodes
                node_bins[node_pair[i]] = []
            node_bins[node_pair[i]].append(node_pair[(i+1)%2])
    return node_bins
def sort_bins(node_bins):
    #go from bin to bin, following the numbers
    nodes = [0]*len(node_bins) #allocate a list
    nodes[0] = next(iter(node_bins)) #pick an arbitrary one to start
    nodes[1] = node_bins[nodes[0]][0] #pick a direction to go
    for i in range(2, len(node_bins)):
        #one of the two nodes in the bin is the horse we rode in on.
        #The other is the next stop.
        j = 1 if node_bins[nodes[i-1]][0] == nodes[i-2] else 0 #figure out which one ISN'T the one we came in on
        nodes[i] = node_bins[nodes[i-1]][j] #pick the next node, then go to its bin, rinse repeat
    return nodes
if __name__ == "__main__":
    #test
    test = [(1,2),(3,4),(2,4),(1,3)] #should give 1,3,4,2 or some rotation or reversal thereof
    print(bin_nodes(test))
    print(sort_bins(bin_nodes(test)))
I have seen some similar answers, but I can't find something specific for this case.
I have a list of tuples:
[(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]
I want to remove a tuple from this list when its first element has already occurred earlier in the list; of the tuples sharing a first element, the one that remains should have the smallest second element.
So the output should look like this:
[(5, 0), (3, 1), (6, 4)]
Here's a linear time approach that requires two iterations over your original list.
t = [(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)] # test case 1
#t = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)] # test case 2
smallest = {}
inf = float('inf')
for first, second in t:
    if smallest.get(first, inf) > second:
        smallest[first] = second
result = []
seen = set()
for first, second in t:
    if first not in seen and second == smallest[first]:
        seen.add(first)
        result.append((first, second))
print(result) # [(5, 0), (3, 1), (6, 4)] for test case 1
              # [(3, 1), (5, 0), (6, 4)] for test case 2
Here is a compact version I came up with using OrderedDict and skipping replacement if new value is larger than old.
from collections import OrderedDict
a = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]
d = OrderedDict()
for item in a:
    # Get old value in dictionary if exist
    old = d.get(item[0])
    # Skip if new item is larger than old
    if old:
        if item[1] > old[1]:
            continue
        #else:
        #    del d[item[0]]
    # Assign
    d[item[0]] = item
list(d.values())
Returns:
[(5, 0), (3, 1), (6, 4)]
Or if you use the else-statement (commented out):
[(3, 1), (5, 0), (6, 4)]
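On Python 3.7 and later a plain dict preserves insertion order, so the same idea can be sketched without OrderedDict. keep_smallest is a hypothetical helper name; it keeps each first element at the position where it was first seen, which corresponds to the variant without the else branch.

```python
def keep_smallest(pairs):
    # Maps each first element to the smallest second element seen so far.
    # Updating an existing key does not change its position in the dict.
    best = {}
    for first, second in pairs:
        if first not in best or second < best[first]:
            best[first] = second
    return list(best.items())

a = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]
print(keep_smallest(a))  # [(5, 0), (3, 1), (6, 4)]
```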
Seems to me that you need to know two things:
1. The tuple that has the smallest second element for each first element.
2. The order to index each first element in the new list.
We can get #1 by using itertools.groupby and a min function.
import itertools
import operator
lst = [(3, 1), (5, 3), (5, 0), (3, 2), (6, 4)]
# I changed this slightly to make it harder to accidentally succeed.
# correct final order should be [(3, 1), (5, 0), (6, 4)]
tmplst = sorted(lst, key=operator.itemgetter(0))
groups = itertools.groupby(tmplst, operator.itemgetter(0))
# group by first element, in this case this looks like:
# [(3, [(3, 1), (3, 2)]), (5, [(5, 3), (5, 0)]), (6, [(6, 4)])]
# note that groupby only works on sorted lists, so we need to sort this first
min_tuples = {min(v, key=operator.itemgetter(1)) for _, v in groups}
# gives the best possible result for each first element. In this case:
# {(3, 1), (5, 0), (6, 4)}
# (note that this is a set comprehension, for faster lookups later.)
Now that we know what our result set looks like, we can re-tackle lst to get them in the right order.
seen = set()
result = []
for el in lst:
    if el not in min_tuples: # don't add to result
        continue
    elif el not in seen: # add to result and mark as seen
        result.append(el)
        seen.add(el)
This will do what you need:
# I switched (5, 3) and (5, 0) to demonstrate sorting capabilities.
list_a = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]
# Create a list to contain the results
list_b = []
# Create a list to check for duplicates
l = []
# Sort list_a by the second element of each tuple to ensure the smallest numbers
list_a.sort(key=lambda i: i[1])
# Iterate through every tuple in list_a
for i in list_a:
    # Check if the 0th element of the tuple is in the duplicates list; if not:
    if i[0] not in l:
        # Add the tuple the loop is currently on to the results; and
        list_b.append(i)
        # Add the 0th element of the tuple to the duplicates list
        l.append(i[0])
>>> print(list_b)
[(5, 0), (3, 1), (6, 4)]
Hope this helped!
Using enumerate() and list comprehension:
def remove_if_first_index(l):
    return [item for index, item in enumerate(l) if item[0] not in [value[0] for value in l[0:index]]]
Using enumerate() and a for loop:
def remove_if_first_index(l):
    # The list to store the return value
    ret = []
    # Get each index and item from the list passed
    for index, item in enumerate(l):
        # Get the first number in each tuple up to the index we're currently at
        previous_values = [value[0] for value in l[0:index]]
        # If the item's first number is not in the list of previously encountered first numbers
        if item[0] not in previous_values:
            # Append it to the return list
            ret.append(item)
    return ret
Testing
some_list = [(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]
print(remove_if_first_index(some_list))
# [(5, 0), (3, 1), (6, 4)]
I had this idea before seeing @Anton vBR's answer.
import collections
inp = [(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]
od = collections.OrderedDict()
for i1, i2 in inp:
    if i2 <= od.get(i1, i2):
        od.pop(i1, None)
        od[i1] = i2
outp = list(od.items())
print(outp)
I am trying to solve finding the most frequent k-mers with mismatches in a string. The requirements are listed below:
Frequent Words with Mismatches Problem: Find the most frequent k-mers with mismatches in a string.
Input: A string Text as well as integers k and d. (You may assume k ≤ 12 and d ≤ 3.)
Output: All most frequent k-mers with up to d mismatches in Text.
Here is an example:
Sample Input:
ACGTTGCATGTCGCATGATGCATGAGAGCT
4 1
Sample Output:
GATG ATGC ATGT
The simplest (and least efficient) way is to list all the k-mers in the text, compute the Hamming distance between each pair, and pick out the patterns whose Hamming distance is less than or equal to d. Below is my code:
import collections
kmer = 4
in_genome = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
in_mistake = 1
out_result = []
mismatch_list = []
def hamming_distance(s1, s2):
    # Return the Hamming distance between equal-length sequences
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    else:
        return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))
for i in xrange(len(in_genome)-kmer + 1):
    v = in_genome[i:i + kmer]
    out_result.append(v)
for i in xrange(len(out_result) - 1):
    for j in xrange(i+1, len(out_result)):
        if hamming_distance(str(out_result[i]), str(out_result[j])) <= in_mistake:
            mismatch_list.extend([out_result[i], out_result[j]])
mismatch_count = collections.Counter(mismatch_list)
print [key for key,val in mismatch_count.iteritems() if val == max(mismatch_count.values())]
Instead of the expected results, I got 'CATG'. Does anyone know what is wrong with my code?
It all seems great until your last line of code:
print [key for key,val in mismatch_count.iteritems() if val == max(mismatch_count.values())]
Since CATG scored higher than any other kmer, you'll only ever get that one answer. Take a look at:
>>> print mismatch_count.most_common()
[('CATG', 9), ('ATGA', 6), ('GCAT', 6), ('ATGC', 4), ('TGCA', 4), ('ATGT', 4), ('GATG', 4), ('GTTG', 2), ('TGAG', 2), ('TTGC', 2), ('CGCA', 2), ('TGAT', 1), ('GTCG', 1), ('AGAG', 1), ('ACGT', 1), ('TCGC', 1), ('GAGC', 1), ('GAGA', 1)]
to figure out what it is you really want back from this result.
I believe the fix is to change your second top level 'for' loop to read as follows:
for t_kmer in set(out_result):
    for s_kmer in out_result:
        if hamming_distance(t_kmer, s_kmer) <= in_mistake:
            mismatch_list.append(t_kmer)
This produces a result similar to what you're expecting:
>>> print mismatch_count.most_common()
[('ATGC', 5), ('ATGT', 5), ('GATG', 5), ('CATG', 4), ('ATGA', 4), ('GTTG', 3), ('CGCA', 3), ('GCAT', 3), ('TGAG', 3), ('TTGC', 3), ('TGCA', 3), ('TGAT', 2), ('GTCG', 2), ('AGAG', 2), ('ACGT', 2), ('TCGC', 2), ('GAGA', 2), ('GAGC', 2), ('TGTC', 1), ('CGTT', 1), ('AGCT', 1)]
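Putting the corrected loop together, a self-contained Python 3 sketch might look like the following. frequent_with_mismatches is a hypothetical name, and, like the loop above, it only scores k-mers that actually occur in the text and assumes the text is at least k characters long.

```python
from collections import Counter

def hamming_distance(s1, s2):
    # Number of positions at which two equal-length strings differ.
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

def frequent_with_mismatches(text, k, d):
    # Every window of length k, duplicates included.
    windows = [text[i:i + k] for i in range(len(text) - k + 1)]
    counts = Counter()
    for candidate in set(windows):
        # Count the windows within d mismatches of this candidate.
        counts[candidate] = sum(
            hamming_distance(candidate, window) <= d for window in windows
        )
    best = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == best)

print(frequent_with_mismatches("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4, 1))
# ['ATGC', 'ATGT', 'GATG']
```

Computing the maximum once, outside the final comprehension, also avoids re-evaluating max(...) for every key as the original print line does.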
I have a list of tuples that looks like this:
lst = [(0, 0), (2, 3), (4, 3), (5, 1)]
What is the best way to accumulate the sum of the first and second tuple elements? Using the example above, I'm looking for the best way to produce this list:
new_lst = [(0, 0), (2, 3), (6, 6), (11, 7)]
I am looking for a solution in Python 2.6
I would argue the best solution is itertools.accumulate() to accumulate the values, and using zip() to split up your columns and merge them back. This means the generator just handles a single column, and makes the method entirely scalable.
>>> from itertools import accumulate
>>> lst = [(0, 0), (2, 3), (4, 3), (5, 1)]
>>> list(zip(*map(accumulate, zip(*lst))))
[(0, 0), (2, 3), (6, 6), (11, 7)]
We use zip() to take the columns, then apply itertools.accumulate() to each column, then use zip() to merge them back into the original format.
This method will work for any iterable, not just sequences, and should be relatively efficient.
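As a small illustration of the "any iterable" point, here is a sketch (Python 3.2+, where itertools.accumulate exists) that feeds a generator rather than a list through the same expression. running_sums is a hypothetical wrapper name. Note that zip(*rows) still has to read the whole input before any output is produced.

```python
from itertools import accumulate

def running_sums(rows):
    # Split rows into columns, accumulate each column, zip back into rows.
    return list(zip(*map(accumulate, zip(*rows))))

rows = ((n, n * n) for n in range(4))  # (0, 0), (1, 1), (2, 4), (3, 9)
print(running_sums(rows))  # [(0, 0), (1, 1), (3, 5), (6, 14)]
```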
Prior to 3.2, accumulate can be defined as:
def accumulate(iterator):
    total = 0
    for item in iterator:
        total += item
        yield total
(The docs page gives a more generic implementation, but for this use case, we can use this simple implementation).
How about this generator:
def accumulate_tuples(iterable):
    accum_a = accum_b = 0
    for a, b in iterable:
        accum_a += a
        accum_b += b
        yield accum_a, accum_b
If you need a list, just call list(accumulate_tuples(your_list)).
Here's a version that works for arbitrary length tuples:
def accumulate_tuples(iterable):
    it = iter(iterable)
    accum = next(it) # initialize with the first value
    yield accum
    for val in it: # iterate over the rest of the values
        accum = tuple(a+b for a, b in zip(accum, val))
        yield accum
>>> reduce(lambda x,y: (x[0] + y[0], x[1] + y[1]), lst)
(11, 7)
EDIT. I can see your updated question. To get the running list you can do:
>>> [reduce(lambda x,y: (x[0]+y[0], x[1]+y[1]), lst[:i]) for i in range(1,len(lst)+1)]
[(0, 0), (2, 3), (6, 6), (11, 7)]
Not super efficient, but at least it works and does what you want :)
This works for any length of tuples or other iterables.
from collections import defaultdict
def accumulate(lst):
    sums = defaultdict(int)
    for item in lst:
        for index, subitem in enumerate(item):
            sums[index] += subitem
        yield [sums[index] for index in xrange(len(sums))]
print [tuple(x) for x in accumulate([(0, 0), (2, 3), (4, 3), (5, 1)])]
In Python 2.7+ you would use a Counter instead of defaultdict(int).
This is not the fastest way to do this, but it works.
last = lst[0]
new_list = [last]
for t in lst[1:]:
    # add element-wise; `last += t` would concatenate the tuples instead
    last = tuple(a + b for a, b in zip(last, t))
    new_list.append(last)
Simple method:
>>> x = [(0, 0), (2, 3), (4, 3), (5, 1)]
>>> [(sum(a for a,b in x[:t] ),sum(b for a,b in x[:t])) for t in range(1,len(x)+1)]
[(0, 0), (2, 3), (6, 6), (11, 7)]
lst = [(0, 0), (2, 3), (4, 3), (5, 1)]
lst2 = [lst[0]]
for idx in range(1, len(lst)):
    newItem = [0,0]
    for idx2 in range(0, idx + 1):
        newItem[0] = newItem[0] + lst[idx2][0]
        newItem[1] = newItem[1] + lst[idx2][1]
    lst2.append(tuple(newItem)) # store as a tuple, matching the input items
print(lst2)
You can use the following function
>>> def my_accumulate(lst):
...     new_lst = [lst[0]]
...     for x, y in lst[1:]:
...         new_lst.append((new_lst[-1][0]+x, new_lst[-1][1]+y))
...     return new_lst
...
>>> lst = [(0, 0), (2, 3), (4, 3), (5, 1)]
>>> my_accumulate(lst)
[(0, 0), (2, 3), (6, 6), (11, 7)]
Changed my code to a terser version:
lst = [(0, 0), (2, 3), (4, 3), (5, 1)]
def accumulate(the_list):
    the_item = iter(the_list)
    accumulator = next(the_item)
    yield accumulator
    # a for loop ends cleanly; calling next() inside `while True` raises RuntimeError on Python 3.7+
    for item in the_item:
        accumulator = tuple(x+y for (x,y) in zip(accumulator, item))
        yield accumulator
new_lst = list(accumulate(lst))