I'm trying to put together code that replaces certain characters in a given input string with the corresponding values from a dictionary, in a combinatorial manner, while preserving the positions of all other characters.
For example, I have the following dictionary:
d = {'R':['A','G'], 'Y':['C','T']}
How would I go about replacing all instances of 'R' and 'Y', producing all possible combinations of the string while maintaining the positions of 'A' and 'C'?
For instance, the input 'ARCY' would generate the following output:
'AACC'
'AGCC'
'AACT'
'AGCT'
Hopefully that makes sense. If anyone can point me in the right direction, that would be great!
Given the dictionary, we can state a rule that tells us what letters are possible at a given position in the output. If the original letter from the input is in the dictionary, we use the value; otherwise, there is a single possibility - the original letter itself. We can express that very neatly:
def candidates(letter):
    d = {'R':['A','G'], 'Y':['C','T']}
    return d.get(letter, [letter])
Knowing the candidates for each letter (which we can get by mapping our candidates function onto the letters in the pattern), we can create the Cartesian product of the candidates and collapse each result (which is a tuple of single-letter strings) into a single string with ''.join.
import itertools
def substitute(pattern):
    return [
        ''.join(result)
        for result in itertools.product(*map(candidates, pattern))
    ]
Let's test it:
>>> substitute('ARCY')
['AACC', 'AACT', 'AGCC', 'AGCT']
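For long patterns the number of results grows quickly, so a lazy variant may be useful. The following is only a sketch of the same idea as a generator; the name iter_substitutions and the table parameter are hypothetical and not part of the answer above.

```python
from itertools import product

def iter_substitutions(pattern, table):
    """Lazily yield every expansion of pattern (hypothetical helper)."""
    # One pool of candidates per position; unknown letters map to themselves.
    pools = [table.get(letter, [letter]) for letter in pattern]
    for combo in product(*pools):
        yield ''.join(combo)

codes = {'R': ['A', 'G'], 'Y': ['C', 'T']}
print(list(iter_substitutions('ARCY', codes)))
# ['AACC', 'AACT', 'AGCC', 'AGCT']
```

Passing the dictionary as an argument keeps the function reusable with other substitution tables.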
The following generator function produces all of your desired strings, using enumerate, zip, itertools.product, a list comprehension and argument list unpacking, all of which are very handy Python tools and concepts you should read up on:
from itertools import product
def multi_replace(s, d):
    indexes, replacements = zip(*[(i, d[c]) for i, c in enumerate(s) if c in d])
    # indexes: (1, 3)
    # replacements: (['A', 'G'], ['C', 'T'])
    l = list(s)  # turn s into something mutable
    # iterate over cartesian product of all replacement tuples ...
    for p in product(*replacements):
        for index, replacement in zip(indexes, p):
            l[index] = replacement
        yield ''.join(l)
d = {'R': ['A', 'G'], 'Y': ['C', 'T']}
s = 'ARCY'
for perm in multi_replace(s, d):
    print(perm)
AACC
AACT
AGCC
AGCT
s = 'RRY'
AAC
AAT
AGC
AGT
GAC
GAT
GGC
GGT
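One edge case worth noting: if the input contains no replaceable characters, the zip(*[...]) unpacking in multi_replace has nothing to unpack and raises a ValueError. A possible way around this, sketched here under the hypothetical name multi_replace_safe, is to collect the positions first and build the replacement pools from them:

```python
from itertools import product

def multi_replace_safe(s, d):
    """Variant of multi_replace that tolerates inputs with nothing to replace."""
    # Positions of the characters that have replacements in d.
    positions = [i for i, c in enumerate(s) if c in d]
    chars = list(s)
    # product() with no pools yields one empty tuple, so s is yielded unchanged.
    for combo in product(*(d[s[i]] for i in positions)):
        for i, replacement in zip(positions, combo):
            chars[i] = replacement
        yield ''.join(chars)

d = {'R': ['A', 'G'], 'Y': ['C', 'T']}
print(list(multi_replace_safe('ARCY', d)))  # ['AACC', 'AACT', 'AGCC', 'AGCT']
print(list(multi_replace_safe('AC', d)))    # ['AC']
```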
Convert 'ARCY' into a list of per-position candidates and use the code below:
import itertools as it
choices = [['A'], ['A','G'],['C'],['C','T']]
[''.join(item) for item in it.product(*choices)]
or
import itertools as it
choices = ['A', 'AG','C', 'CT']
[''.join(item) for item in it.product(*choices)]
I am writing a function that corrects misspelled English words, but I have a problem.
What I want to do is find the letters common to two words (treated as lists of letters). I know that I can do this using set intersection, but that removes all duplicate letters.
wrong_word='addition'
probably_right_word='addiction'
common_letters=common(wrong_word, probably_right_word)
# the answer should be ['a','d','d','i','t','i','o','n']; the letter 'c' is absent, which is what I want.
# set(wrong_word) & set(probably_right_word) removes the duplicate letters, so it is not a valid answer.
# another example of my problem
list1=[1,1,2,3,4,1]
list2=[1,1,3,6]
result=[1,1,3]
# as shown, result is the list of elements common to both lists.
This builds a dict of counts of the elements in l2 and uses it to decide which elements of l1 to include.
w1='addition'
w2='addiction'
#other example of my problem
list1=[1,1,2,3,4,1]
list2=[1,1,3,6]
def f(l1, l2):
    # build dict of counts of elements in list2
    l2c = {}
    for i in l2:
        l2c[i] = l2c.get(i, 0) + 1
    res = []
    for x in l1:
        if l2c.get(x,0) > 0:
            res.append(x)
            l2c[x]-=1
    return res
print(f(list1,list2))
print(f(w1,w2))
This will achieve what you've asked for, but for real use cases this algorithm may be problematic. I've given examples in my comment above on the main thread of why it could cause issues, depending on what you are trying to do.
Try the following:
>>> [char for char in wrong_word if char in probably_right_word]
['a', 'd', 'd', 'i', 't', 'i', 'o', 'n']
>>>
Simply iterate through the characters in either word and only add them to the list if they are in the other word as well.
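Note that a plain membership test does not track how many times an element occurs in the other sequence, so it can keep more copies than the other side actually has. The sketch below contrasts the two behaviours on the numeric example from the question; common_in_order is a hypothetical name, and using collections.Counter is one possible way to track the remaining counts.

```python
from collections import Counter

def common_in_order(first, second):
    """Keep items of first that second can still 'pay for', preserving order."""
    budget = Counter(second)  # how many of each item second has left
    kept = []
    for item in first:
        if budget[item] > 0:
            kept.append(item)
            budget[item] -= 1
    return kept

list1 = [1, 1, 2, 3, 4, 1]
list2 = [1, 1, 3, 6]
print([x for x in list1 if x in list2])  # [1, 1, 3, 1]  (third 1 is kept)
print(common_in_order(list1, list2))     # [1, 1, 3]
```

For the 'addition' / 'addiction' example both approaches give the same result, because no letter occurs more often in the first word than in the second.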
I have the following list:
l3=[['a','b'],['a','e'],['e','g'],['f','h']]
I can easily generate all combinations of 3 elements from the list.
But now I want to find only the combinations of lists that have a common element between them. For instance:
one possible outcome is ['a','e','b'] (since the lists [a,b] and [a,e] share the element 'a').
However, a combination like [a,e,f,h] is not allowed, since there are no common elements between the lists [a,e] and [f,h].
Sets are easier to work with in this case:
import itertools
[list(a | b) for a, b in itertools.combinations(map(set, l3), 2) if a & b]
# => [['a', 'b', 'e'], ['a', 'e', 'g']]  (element order within each list may vary)
Turn each sublist into a set, then match each set against every other using combinations. If there is a non-empty intersection, yield the union of the two sets (as a list, so you can't see me using set magic :P )
As I understand the question, you are basically trying to get a random sample of nodes from connected components in a graph. This may be overkill, but you could use the Union-Find algorithm / Disjoint-Set data structure to quickly find the "connected components".
from collections import defaultdict
leaders = defaultdict(lambda: None)
def find(x):
    l = leaders[x]
    if l is not None:
        l = find(l)
        leaders[x] = l
        return l
    return x
l3 = [['a','b'],['a','e'],['e','g'],['f','h']]
for x,y in l3:
    lx, ly = find(x), find(y)
    if lx != ly:
        leaders[lx] = ly
groups = defaultdict(set)
for x in leaders:
    groups[find(x)].add(x)
groups = list(groups.values())
print(groups)
# [{'a', 'g', 'e', 'b'}, {'h', 'f'}]
Then just randomly choose one of those and get a sample from it. Optionally, use different probabilities for the different groups based on the number of elements in them. (Left as an exercise to the interested reader.)
import random
s = random.sample(list(random.choice(groups)), 2)  # sample needs a sequence, not a set
print(s)
Note: This may also return samples that are not directly connected, e.g. [a, g]. Not sure whether this is wanted or not.
I am attempting to deduplicate some pandas DataFrames, and I have a function that does this pair-wise (i.e. two dfs at a time). I want to write another function that takes a list of DataFrames of arbitrary length and combines the first two elements in the list, then combines the result with the third element in the list until we reach the end of the list.
For simplicity, I'll assume my deduplication function is simply string concatenation.
I tried some recursive functions, but it's not quite correct.
def dedupe_recursive(input_list):
    if input_list == []:
        return
    else:
        for i in range(0, len(input_list)-1):
            new_list = input_list[i+1:]
            deduped = dedupe(new_list[i], new_list[i+1])
            print(deduped, new_list)
            return dedupe_recursive(new_list)
Input (list): ['a', 'b', 'c', 'd']
Output (list of lists): [['ab'], ['ab', 'c'], ['abc', 'd']]
There's a function for exactly this kind of thing: it's called reduce. You would use it like this:
from functools import reduce
final_df = reduce(dedupe, list_of_dataframes)
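As a small runnable illustration, here is the same call with string concatenation standing in for the real deduplication function, as the question suggests. The dedupe body below is only that stand-in. If the intermediate results are also of interest, itertools.accumulate (Python 3.3+) accepts the same two-argument function and yields each running result.

```python
from functools import reduce
from itertools import accumulate

def dedupe(left, right):
    # Stand-in for the real pairwise deduplication (assumption from the question).
    return left + right

items = ['a', 'b', 'c', 'd']

# Folds left to right: dedupe(dedupe(dedupe('a', 'b'), 'c'), 'd')
print(reduce(dedupe, items))            # abcd

# Same fold, but keeps every intermediate result
print(list(accumulate(items, dedupe)))  # ['a', 'ab', 'abc', 'abcd']
```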
Hi, I'm looking for a way to split a list based on some values, assuming the list's length equals the sum of those values, e.g.:
list: l = ['a','b','c','d','e','f']
values: v = (1,1,2,2)
so len(l) = sum(v)
and I'd like to have a function that returns a tuple or a list, like: (['a'], ['b'], ['c','d'], ['e','f'])
Currently my code looks like this:
(list1,list2,list3,list4) = (
    l[0:v[0]],
    l[v[0]:v[0]+v[1]],
    l[v[0]+v[1]:v[0]+v[1]+v[2]],
    l[v[0]+v[1]+v[2]:v[0]+v[1]+v[2]+v[3]])
I'm thinking about how to make this clearer, but the closest I have so far is the following (note that the results are incorrect, not what I wanted):
s=0
[list1,list2,list3,list4] = [l[s:s+i] for i in v]
The problem is that I can't increase s while iterating over the values in v. I'm hoping for a better way to do this; any suggestion is appreciated, thanks!
If you weren't stuck on ancient Python, I'd point you to itertools.accumulate. Of course, even on ancient Python, you could use the (roughly) equivalent code provided in the docs I linked to do it. Using either the Py3 code or equivalent, you could do:
from itertools import accumulate # Or copy accumulate equivalent Python code
from itertools import chain
# Calls could be inlined in listcomp, but easier to read here
starts = accumulate(chain((0,), v)) # Extra value from starts ignored when ends exhausted
ends = accumulate(v)
list1,list2,list3,list4 = [l[s:e] for s, e in zip(starts, ends)]
Maybe make a generator of the values in l?
def make_list(l, v):
    g = (x for x in l)
    if len(l) == sum(v):
        return [[next(g) for _ in range(val)] for val in v]
    return None
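The same "consume from a shared iterator" idea can also be written with itertools.islice, which takes the next few items from an iterator in one call. This is only a sketch; split_by_sizes is a hypothetical name. Unlike make_list it does no length check, so a list that is too short simply produces shorter trailing chunks.

```python
from itertools import islice

def split_by_sizes(items, sizes):
    """Split items into consecutive chunks whose lengths are given by sizes."""
    it = iter(items)  # one shared iterator, advanced by every islice call
    return [list(islice(it, size)) for size in sizes]

l = ['a', 'b', 'c', 'd', 'e', 'f']
v = (1, 1, 2, 2)
print(split_by_sizes(l, v))
# [['a'], ['b'], ['c', 'd'], ['e', 'f']]
```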
You could just write a simple loop to iterate over v to generate a result:
l = ['a','b','c','d','e','f']
v = (1,1,2,2)
result = []
offset = 0
for size in v:
    result.append(l[offset:offset+size])
    offset += size
print(result)
Output:
[['a'], ['b'], ['c', 'd'], ['e', 'f']]
The idea here is to use a nested loop. Assuming that your condition always holds true, the logic is to run through v and pick up the next num elements from l, where num is a number from v.
data = []  # this is the result list
index = 0  # this is the start index
for num in v:
    temp = []  # this is a temp array, to hold individual elements in your result array.
    for j in range(index, index+num):  # this loop will pick up the next num elements from l
        temp.append(l[j])
    data.append(temp)
    index += num
Output:
[['a'], ['b'], ['c', 'd'], ['e', 'f']]
The first answer https://stackoverflow.com/a/39715361/5759063 is the most pythonic way to do it. This is just the algorithmic backbone.
The best I could find is a two-line solution:
breaks=[0]+[sum(v[:i+1]) for i in range(len(v))] #build a list of section indices
result=[l[breaks[i]:breaks[i+1]] for i in range(len(breaks)-1)] #split array according to indices
print(result)
My aim is to sort a list of strings alphabetically, except that words starting with "s" should come at the start of the list (sorted among themselves), followed by the other words.
The below function does that for me.
def mysort(words):
    mylist1 = sorted([i for i in words if i[:1] == "s"])
    mylist2 = sorted([i for i in words if i[:1] != "s"])
    list = mylist1 + mylist2
    return list
I am just looking for alternative approaches to achieve this or if anyone can find any issues with the code above.
You could do it in one line, with:
sorted(words, key=lambda x: 'a' + x if x.startswith('s') else 'b' + x)
The sorted() function takes a keyword argument key, which is used to translate the values in the list before comparisons are done.
For example:
sorted(words, key=str.lower)
# Will do a sort that ignores the case, since instead
# of checking 'A' vs. 'b' it will check str.lower('A')
# vs. str.lower('b').
sorted(intlist, key=abs)
# Will sort a list of integers by magnitude, regardless
# of whether they're negative or positive:
# >>> sorted([-5,2,1,-8], key=abs)
# [1, 2, -5, -8]
The trick I used translated strings like this when doing the sorting:
"hello" => "bhello"
"steve" => "asteve"
And so "steve" would come before "hello" in the comparisons, since the comparisons are done with the a/b prefix.
Note that this only affects the keys used for comparisons, not the data items that come out of the sort.
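To make the effect concrete, here is a small sketch that applies the same prefix trick to a sample list. The word list and the name prefix_key are made up for illustration.

```python
words = ['hello', 'steve', 'apple', 'sam']

def prefix_key(word):
    # 'a' + word for s-words, 'b' + word for everything else
    return ('a' if word.startswith('s') else 'b') + word

# The keys compared are 'bhello', 'asteve', 'bapple', 'asam'
print(sorted(words, key=prefix_key))
# ['sam', 'steve', 'apple', 'hello']
```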
1. You can use a generator expression inside sorted.
2. You can use str.startswith.
3. Don't use list as a variable name.
4. Use key=str.lower in sorted.
def mysort(words):
    mylist1 = sorted((i for i in words if i.startswith(("s","S"))),key=str.lower)
    mylist2 = sorted((i for i in words if not i.startswith(("s","S"))),key=str.lower)
    return mylist1 + mylist2
Why str.lower?
>>> "abc" > "BCD"
True
>>> "abc" > "BCD".lower() #fair comparison
False
>>> l = ['z', 'a', 'b', 's', 'sa', 'sb', '', 'sz']
>>> sorted(l, key=lambda x:(x[0].replace('s','\x01').replace('S','\x01') if x else '') + x[1:])
['', 's', 'sa', 'sb', 'sz', 'a', 'b', 'z']
This key function replaces, for the purpose of sorting, the leading S or s of every value with \x01, which sorts before all printable characters.
Along the lines of Integer's answer, I like using a tuple slightly better because it is cleaner and also more general (the idea works for arbitrary elements, not just strings):
sorted(words, key=lambda x : ((1 if x[:1] in ("S", "s") else 2), x))
Explanation:
The key parameter allows sorting an array based on the values of f(item) instead of the values of item, where f is an arbitrary function.
In this case the function is anonymous (lambda) and returns a tuple where the first element is the "group" you want your element to end up in (e.g. 1 if the string starts with an "s" and 2 otherwise).
Using a tuple works because tuple comparison is lexicographical on the elements, so in the sorting the group code carries more weight than the element itself.
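The tuple key combines naturally with the case-insensitive comparison suggested in another answer. A minimal sketch, assuming that both "s" and "S" should count as the leading group; the name s_first_key and the sample words are hypothetical:

```python
def s_first_key(word):
    # Group 0 for words starting with s/S, group 1 for everything else;
    # within a group, compare case-insensitively.
    group = 0 if word[:1] in ('s', 'S') else 1
    return (group, word.lower())

words = ['zebra', 'Sun', 'apple', 'sand', '', 'Banana']
print(sorted(words, key=s_first_key))
# ['sand', 'Sun', '', 'apple', 'Banana', 'zebra']
```

Slicing with word[:1] rather than indexing with word[0] keeps the key safe for empty strings.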