I want to generate every word that can be built from a set of characters, subject to terms on how many times each character appears. Example:
word = 'aan'
result = ['ana', 'naa', 'aan']
terms :
number of character 'a' -> 2
number of character 'n' -> 1
Here is a one-liner that returns the result as a list.
You can use permutations from the itertools package to get all of the permutations (not combinations):
from itertools import permutations
word = 'aan'
list(set(''.join(p) for p in permutations(word)))
If I understand correctly what you want, I would do it like this:
from itertools import permutations
result = set()
for combination in permutations("aan"):
    # permutations() yields tuples of characters, so join them into a string
    result.add(''.join(combination))
You can use recursion with a generator:
from collections import Counter
def combo(d, c = []):
    if len(c) == len(d):
        yield ''.join(c)
    else:
        _c1, _c2 = Counter(d), Counter(c)
        for i in d:
            if _c2.get(i, 0) < _c1[i]:
                yield from combo(d, c+[i])
word = 'aan'
print(list(set(combo(word))))
Output:
['aan', 'naa', 'ana']
word = 'ain'
print(list(set(combo(word))))
Output:
['ina', 'nia', 'nai', 'ani', 'ian', 'ain']
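As a sanity check on the answers above, the number of distinct words can be computed directly from the character counts (the "terms" in the question): it is the factorial of the word length divided by the factorial of each character's count. A minimal sketch, where the helper name distinct_permutation_count is made up for illustration:

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_permutation_count(word):
    """Number of distinct arrangements of the letters in `word`."""
    total = factorial(len(word))
    # Divide out the orderings of each repeated character.
    for repeats in Counter(word).values():
        total //= factorial(repeats)
    return total

word = 'aan'
unique = sorted(set(''.join(p) for p in permutations(word)))
print(unique)                            # ['aan', 'ana', 'naa']
print(distinct_permutation_count(word))  # 3
```

For 'aan' this gives 3! / (2! * 1!) = 3, which matches the length of the deduplicated list.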
Background
I'm working on a HackerRank problem Word Order. The task is to
Read the following input from stdin
4
bcdef
abcdefg
bcde
bcdef
Produce the output that reflects:
Number of unique words in first line
Count of occurrences for each unique words
Example:
3 # Number of unique words
2 1 1 # count of occurring words, 'bcdef' appears twice = 2
Problem
I've coded two solutions. The second one passes the initial tests but then fails by exceeding the time limit. The first one would also work, but it sorts the output unnecessarily (and it would hit the time limit as well).
Notes
In the first solution I was unnecessarily sorting the values; this is fixed in the second solution.
I'm keen to make better (proper) use of standard Python data structures and list/dictionary comprehensions. I would particularly like a solution that doesn't import any additional modules, with the exception of import os if needed.
Code
import os
def word_order(words):
    # Output no of distinct words
    distinct_words = set(words)
    n_distinct_words = len(distinct_words)
    print(str(n_distinct_words))
    # Count occurrences of each word
    occurrences = []
    for distinct_word in distinct_words:
        n_word_appearances = 0
        for word in words:
            if word == distinct_word:
                n_word_appearances += 1
        occurrences.append(n_word_appearances)
    occurrences.sort(reverse=True)
    print(*occurrences, sep=' ')
    # for o in occurrences:
    #     print(o, end=' ')
def word_order_two(words):
    '''
    Run through all words and only count multiple occurrences, do the maths
    to calculate unique words, etc. Attempt to construct a dictionary to make
    the operation more memory efficient.
    '''
    # Construct a count of word occurrences
    dictionary_words = {word:words.count(word) for word in words}
    # Unique words are equivalent to dictionary keys
    unique_words = len(dictionary_words)
    # Obtain sorted dictionary values
    # sorted_values = sorted(dictionary_words.values(), reverse=True)
    result_values = " ".join(str(value) for value in dictionary_words.values())
    # Output results
    print(str(unique_words))
    print(result_values)
    return 0
if __name__ == '__main__':
    q = int(input().strip())
    inputs = []
    for q_itr in range(q):
        s = input()
        inputs.append(s)
    # word_order(words=inputs)
    word_order_two(words=inputs)
Those nested loops are very bad performance-wise (they make your algorithm quadratic) and quite unnecessary. The same goes for calling words.count() once per word. You can get all counts in a single iteration. You could use a plain dict or the dedicated collections.Counter:
from collections import Counter
def word_order(words):
    c = Counter(words)
    print(len(c))
    print(" ".join(str(v) for _, v in c.most_common()))
The "manual" implementation that shows the workings of the Counter and its methods:
def word_order(words):
    c = {}
    for word in words:
        c[word] = c.get(word, 0) + 1
    print(len(c))
    print(" ".join(str(v) for v in sorted(c.values(), reverse=True)))
    # print(" ".join(map(str, sorted(c.values(), reverse=True))))
Without any imports, you could count unique elements by
len(set(words))
and count their occurrences by
def counter(words):
    count = dict()
    for word in words:
        if word in count:
            count[word] += 1
        else:
            count[word] = 1
    return count.values()
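Putting the two pieces together, here is a minimal import-free sketch of the whole task. It assumes Python 3.7+, where a plain dict keeps insertion order, and it assumes the counts should be reported in order of first appearance (the example output 2 1 1 is consistent with that, but the question does not state it outright). The function returns its results instead of printing them so that it is easy to test:

```python
def word_order(words):
    """Return (number of distinct words, counts in order of first appearance)."""
    counts = {}
    for word in words:
        # One pass over the input: each word costs a single dict lookup.
        counts[word] = counts.get(word, 0) + 1
    return len(counts), list(counts.values())

n_distinct, occurrences = word_order(['bcdef', 'abcdefg', 'bcde', 'bcdef'])
print(n_distinct)     # 3
print(*occurrences)   # 2 1 1
```

This does a single pass over the words, so the running time grows linearly with the input rather than quadratically.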
You can use Counter then print output like below:
>>> from collections import Counter
>>> def counter_words(words):
...     cnt = Counter(words)
...     print(len(cnt))
...     print(*[str(v) for k, v in cnt.items()], sep=' ')
...
>>> inputs = ['bcdef' , 'abcdefg' , 'bcde' , 'bcdef']
>>> counter_words(inputs)
3
2 1 1
Say I have a string in alphabetical order, based on the amount of times that a letter repeats.
Example: "BBBAADDC".
There are 3 B's, so they go at the start, 2 A's and 2 D's, so the A's go in front of the D's because they are in alphabetical order, and 1 C. Another example would be CCCCAAABBDDAB.
Note that there can be 4 letters in the middle somewhere (i.e. CCCC), as there could be 2 pairs of 2 letters.
However, let's say I can only have n letters in a row. For example, if n = 3 in the second example, then I would have to omit one "C" from the first substring of 4 C's, because there can only be a maximum of 3 of the same letters in a row.
Another example would be the string "CCCDDDAABC"; if n = 2, I would have to remove one C and one D to get the string CCDDAABC
Example input/output:
n=2: Input: AAABBCCCCDE, Output: AABBCCDE
n=4: Input: EEEEEFFFFGGG, Output: EEEEFFFFGGG
n=1: Input: XXYYZZ, Output: XYZ
How can I do this with Python? Thanks in advance!
This is what I have right now, although I'm not sure if it's on the right track. Here, z is the length of the string.
for k in range(z+1):
    if final_string[k] == final_string[k+1] == final_string[k+2] == final_string[k+3]:
        final_string = final_string.translate({ord(final_string[k]): None})
return final_string
Ok, based on your comment, you're either pre-sorting the string or it doesn't need to be sorted by the function you're trying to create. You can do this more easily with itertools.groupby():
import itertools
def max_seq(text, n=1):
    result = []
    for k, g in itertools.groupby(text):
        result.extend(list(g)[:n])
    return ''.join(result)
max_seq('AAABBCCCCDE', 2)
# 'AABBCCDE'
max_seq('EEEEEFFFFGGG', 4)
# 'EEEEFFFFGGG'
max_seq('XXYYZZ')
# 'XYZ'
max_seq('CCCDDDAABC', 2)
# 'CCDDAABC'
In each group g, it's expanded and then sliced until n elements (the [:n] part) so you get each letter at most n times in a row. If the same letter appears elsewhere, it's treated as an independent sequence when counting n in a row.
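To make the grouping step concrete, here is a small sketch that prints what itertools.groupby() produces for one of the example strings, as (letter, run length) pairs, and then caps each run at n. The variable names are only illustrative:

```python
from itertools import groupby

text = 'CCCDDDAABC'
# groupby() yields one (key, iterator) pair per run of equal adjacent items.
runs = [(letter, len(list(group))) for letter, group in groupby(text)]
print(runs)
# [('C', 3), ('D', 3), ('A', 2), ('B', 1), ('C', 1)]

n = 2
# Keep at most n letters from every run.
capped = ''.join(letter * min(length, n) for letter, length in runs)
print(capped)  # CCDDAABC
```

Note that the trailing C forms its own run of length 1, which is why it survives even though three C's already appeared earlier in the string.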
Edit: Here's a shorter version, which may also perform better for very long strings. And while we're using itertools, this one additionally utilises itertools.chain.from_iterable() to create the flattened list of letters. And since each of these is a generator, it's only evaluated/expanded at the last line:
import itertools
def max_seq(text, n=1):
    sequences = (list(g)[:n] for _, g in itertools.groupby(text))
    letters = itertools.chain.from_iterable(sequences)
    return ''.join(letters)
hello = "hello frrriend"
def replacing() -> str:
    global hello
    j = 0
    for i in hello:
        if j == 0:
            pass
        else:
            if i == prev:
                hello = hello.replace(i, "")
                prev = i
        prev = i
        j += 1
    return hello
replacing()
It looks a bit primitive, but I think it works. That's what I came up with on the go, anyway. Hope it helps :D
Here's my solution:
def snip_string(string, n):
    list_string = list(string)
    list_string.sort()
    chars = set(string)
    for char in chars:
        while list_string.count(char) > n:
            list_string.remove(char)
    return ''.join(list_string)
Calling the function with various values for n gives the following output:
>>> string = "AAAABBBCCCDDD"
>>> snip_string(string, 1)
'ABCD'
>>> snip_string(string, 2)
'AABBCCDD'
>>> snip_string(string, 3)
'AAABBBCCCDDD'
>>>
Edit
Here is the updated version of my solution, which only removes characters if the group of repeated characters exceeds n.
import itertools
def snip_string(string, n):
    groups = [list(g) for k, g in itertools.groupby(string)]
    string_list = []
    for group in groups:
        while len(group) > n:
            del group[-1]
        string_list.extend(group)
    return ''.join(string_list)
Output:
>>> string = "DDDAABBBBCCABCDE"
>>> snip_string(string, 3)
'DDDAABBBCCABCDE'
from itertools import groupby
n = 2
def rem(string):
    out = "".join(["".join(list(g)[:n]) for _, g in groupby(string)])
    print(out)
That is the entire code for your question. I tested it with the following strings:
s = "AABBCCDDEEE"
s2 = "AAAABBBDDDDDDD"
s3 = "CCCCAAABBDDABBB"
s4 = "AAAAAAAA"
z = "AAABBCCCCDE"
Calling rem() on each of them in turn prints:
AABBCCDDEE
AABBDD
CCAABBDDABB
AA
AABBCCDE
I am trying to solve a problem where the user inputs a string, say str = "aaabbcc", and an integer n = 2.
The function is supposed to remove the characters that appear n times from str and output only "aaa".
I tried a couple of approaches, but I'm not able to obtain the right output.
Are there any regular expression functions I could use, or recursive functions, or just plain old iteration?
Thanks in advance.
Using itertools.groupby
Ex:
from itertools import groupby
s = "aaabbcc"
n = 2
result = ""
for k, v in groupby(s):
    value = list(v)
    if not len(value) == n:
        result += "".join(value)
print(result)
Output:
aaa
You can use itertools.groupby:
>>> s = "aaabbccddddddddddeeeee"
>>> from itertools import groupby
>>> n = 3
>>> groups = (list(values) for _, values in groupby(s))
>>> "".join("".join(v) for v in groups if len(v) < n)
'bbcc'
from collections import Counter
counts = Counter(string)
string = "".join(c for c in string if counts[c] != 2)
Edit: Wait, sorry, I missed "consecutive". This will remove characters that occur exactly two times in the whole string (fitting your example, but not the general case).
Consecutive filter is a bit more complex, but doable - just find the consecutive runs first, then filter out the ones which have length two.
runs = [[string[0], 0]]
for c in string:
    if c == runs[-1][0]:
        runs[-1][1] += 1
    else:
        runs.append([c, 1])
string = "".join(c*length for c,length in runs if length != 2)
Edit2: As the other answers correctly point out, the first part of this is done natively by groupby
from itertools import groupby
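# groupby yields (key, group iterator) pairs, so measure each group's length first
runs = ((c, len(list(g))) for c, g in groupby(string))
string = "".join(c*length for c,length in runs if length != 2)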
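Since the question also asks about regular expressions, here is a sketch of the same consecutive-run filter using re.sub with a replacement function. The function name drop_runs_of_length is made up for illustration, and the sketch assumes "appearing n times" means a run of exactly n identical adjacent characters:

```python
import re

def drop_runs_of_length(text, n):
    """Remove every run of one repeated character whose length is exactly n."""
    # (.)\1* greedily matches a maximal run of a single repeated character;
    # re.DOTALL lets '.' match newlines too, so no character is skipped.
    return re.sub(
        r'(.)\1*',
        lambda m: '' if len(m.group(0)) == n else m.group(0),
        text,
        flags=re.DOTALL,
    )

print(drop_runs_of_length('aaabbcc', 2))  # aaa
```

The groupby versions are arguably easier to read; the regex form is mainly useful if the rest of the processing is already regex-based.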
In [15]: some_string = 'aaabbcc'
In [16]: n = 2
In [17]: final_string = ''
In [18]: for k, v in Counter(some_string).items():
    ...:     if v != n:
    ...:         final_string += k * v
    ...:
In [19]: final_string
Out[19]: 'aaa'
You'll need: from collections import Counter
from collections import defaultdict
def fun(string,n):
    dic = defaultdict(int)
    for i in string:
        dic[i]+=1
    check = []
    for i in dic:
        if dic[i]==n:
            check.append(i)
    for i in check:
        del dic[i]
    return dic
string = "aaabbcc"
n = 2
result = fun(string, n)
sol =''
for i in result:
    sol+=i*result[i]
print(sol)
output
aaa
I'm trying to write python code that will take a string and a length, and search through the string to tell me which sub-string of that particular length occurs the most, prioritizing the first if there's a tie.
For example, "cadabra abra" 2 should return ab
I tried:
import sys
def main():
    inputstring = str(sys.argv[1])
    length = int(sys.argv[2])
    Analyze(inputstring, length)
def Analyze(inputstring, length):
    count = 0;
    runningcount = -1;
    sequence = ""
    substring = ""
    for i in range(0, len(inputstring)):
        substring = inputstring[i:i+length]
        for j in range(i+length,len(inputstring)):
            #print(runningcount)
            if inputstring[j:j+2] == substring:
                print("runcount++")
                runningcount += 1
        print(runningcount)
        if runningcount > count:
            count = runningcount
            sequence = substring
    print(sequence)
main()
But can't seem to get it to work. I know I'm at least doing something wrong with the counts, but I'm not sure what. This is my first program in Python too, but I think my problem is probably more with the algorithm than the syntax.
Try to use the built-in methods; they will make your life easier. For example:
>>> s = "cadabra abra"
>>> x = 2
>>> l = [s[i:i+x] for i in range(len(s)-x+1)]
>>> l
['ca', 'ad', 'da', 'ab', 'br', 'ra', 'a ', ' a', 'ab', 'br', 'ra']
>>> max(l, key=lambda m:s.count(m))
'ab'
EDIT:
A much simpler syntax, as per Stefan Pochmann's comment:
>>> max(l, key=s.count)
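One caveat with using s.count as the key: str.count counts non-overlapping occurrences and rescans the whole string for every window. The sketch below is one way around both issues, counting the windows themselves with collections.Counter. The function name most_common_substring is made up for illustration, and the sketch assumes length is no larger than len(s), since max() raises ValueError on an empty list:

```python
from collections import Counter

def most_common_substring(s, length):
    """Most frequent substring of the given length; earliest one wins ties."""
    windows = [s[i:i + length] for i in range(len(s) - length + 1)]
    counts = Counter(windows)
    # max() returns the first maximal item, so iterating the windows in
    # order of appearance breaks ties in favour of the earliest substring.
    return max(windows, key=counts.__getitem__)

print(most_common_substring('cadabra abra', 2))  # ab
print('aaaa'.count('aa'))                         # 2 (non-overlapping)
print(Counter(['aa', 'aa', 'aa'])['aa'])          # 3 (overlapping windows)
```

For the example in the question both approaches return 'ab'; they can only differ when a substring overlaps with itself, as in 'aaaa'.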
import sys
from collections import OrderedDict
def main():
    inputstring = sys.argv[1]
    length = int(sys.argv[2])
    analyze(inputstring, length)
def analyze(inputstring, length):
    d = OrderedDict()
    for i in range(0, len(inputstring) - length + 1):
        substring = inputstring[i:i+length]
        if substring in d:
            d[substring] += 1
        else:
            d[substring] = 1
    maxlength = max(d.values())
    for k,v in d.items():
        if v == maxlength:
            print(k)
            break
main()
Pretty good stab at a solution for a first Python program. As you learn the language, spend some time reading the excellent documentation. It is full of examples and tips.
For example, the standard library includes a Counter class for counting things (obviously) and an OrderedDict class which remembers the order in which keys are entered. The documentation also includes an example that combines the two to make an OrderedCounter, which can be used to solve your problem like this:
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
    pass
def analyze(s, n):
    substrings = (s[i:i+n] for i in range(len(s)-n+1))
    counts = OrderedCounter(substrings)
    return max(counts.keys(), key=counts.__getitem__)
analyze("cadabra abra", 2)
How can I compare all strings in a list e.g:
"A-B-C-D-E-F-H-A",
"A-B-C-F-G-H-M-P",
And output until which character they are identical:
In the example above it would be:
Character 6
And output the most similar strings.
I tried with collections.Counter but that did not work.
You're trying to go character by character in the two strings in lockstep. This is a job for zip:
A = "A-B-C-D-E-F-H-A"
B = "A-B-C-F-G-H-M-P"
count = 0
for a, b in zip(A, B):
    if a == b:
        count += 1
    else:
        break
Or, if you prefer, "…as long as they are…" is a job for takewhile:
from itertools import takewhile
from operator import eq
def ilen(iterable): return sum(1 for _ in iterable)
count = ilen(takewhile(lambda ab: eq(*ab), zip(A, B)))
If you have a list of these strings, and you want to compare every string to every other string:
First, you turn the above code into a function. I'll do it with the itertools version, but you can do it with the other just as easily:
def shared_prefix(A, B):
return ilen(takewhile(lambda ab: eq(*ab), zip(A, B)))
Now, for every string, you compare it to all the rest of the strings. There's an easy way to do it with combinations:
from itertools import combinations
counts = [shared_prefix(*pair) for pair in combinations(list_o_strings, 2)]
But if you don't understand that, you can write it as a nested loop. The only tricky part is what "the rest of the strings" means. You can't loop over all the strings in both the outer and inner loops, or you'll compare each pair of strings twice (once in each order), and compare each string to itself. So it has to mean "all the strings after the current one". Like this:
counts = []
for i, s1 in enumerate(list_o_strings):
    for s2 in list_o_strings[i+1:]:
        counts.append(shared_prefix(s1, s2))
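The question also asks for the most similar strings. Here is a self-contained sketch that combines the prefix count with combinations and picks the pair with the longest shared prefix. The third string in the list is made up for illustration, and "most similar" is assumed to mean "longest common prefix":

```python
from itertools import combinations, takewhile

def shared_prefix(a, b):
    """Number of leading characters that a and b have in common."""
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(a, b)))

strings = [
    "A-B-C-D-E-F-H-A",
    "A-B-C-F-G-H-M-P",
    "A-B-X-D-E-F-H-A",  # hypothetical extra string, not from the question
]

# Each unordered pair is compared exactly once; max() keeps the best pair.
best = max(combinations(strings, 2), key=lambda pair: shared_prefix(*pair))
print(best, shared_prefix(*best))
```

For the two strings from the question the shared prefix is "A-B-C-", which gives the "Character 6" result the asker expects.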
I think this code will solve your problem.
listA = "A-B-C-D-E-F-H-A"
listB = "A-B-C-F-G-H-M-P"
newListA = listA.replace("-", "")
newListB = listB.replace("-", "")
# newListA = "ABCDEFHA"
# newListB = "ABCFGHMP"
i = 0
exit = 0
# Stop at the first mismatch, or when either string runs out
while i < len(newListA) and i < len(newListB) and exit == 0:
    if newListA[i] != newListB[i]:
        exit = 1
    i = i + 1
print("Character: " + str(i))