Find common elements of two strings including characters that occur many times

Find common elements of two strings including characters that occur many times - python

I would like to get common elements in two given strings such that duplicates will be taken care of. It means that if a letter occurs 3 times in the first string and 2 times in the second one, then in the common string it has to occur 2 times. The length of the two strings may be different. eg
s1 = 'aebcdee'
s2 = 'aaeedfskm'
common = 'aeed'
I can not use the intersection between two sets. What would be the easiest way to find the result 'common' ? Thanks.

Well there are multiple ways in which you can get the desired result. For me the simplest algorithm to get the answer would be:
Define an empty dict. Like d = {}
Iterate through each character of the first string:
if the character is not present in the dictionary, add the character to the dictionary.
else increment the count of character in the dictionary.
Create a variable as common = ""
Iterate through the second string characters, if the count of that character in the dictionary above is greater than 0: decrement its value and add this character to common
Do whatever you want to do with the common
The complete code for this problem:
s1 = 'aebcdee'
s2 = 'aaeedfskm'
d = {}
for c in s1:
if c in d:
d[c] += 1
else:
d[c] = 1
common = ""
for c in s2:
if c in d and d[c] > 0:
common += c
d[c] -= 1
print(common)

You can use two arrays (length 26).
One array is for the 1st string and 2nd array is for the second string.
Initialize both the arrays to 0.
The 1st array's 0th index denotes the number of "a" in 1st string,
1st index denotes number of "b" in 1st string, similarly till - 25th index denotes number of "z" in 1st string.
Similarly, you can create an array for the second string and store the count of
each alphabet in their corresponding index.
s1 = 'aebcdee'
s2 = 'aaeedfs'
Below is the array example for the above s1 and s2 values
Now you can run through the 1st String
s1 = 'aebcdee'
for each alphabet find the
K = minimum of ( [ count(alphabet) in Array 1 ], [ count(alphabet) in Array 2 ] )
and print that alphabet K times.
then make that alphabet count to 0 in both the arrays. (Because if you dint make it zero, then our algo might print the same alphabet again if it comes in the future).
Complexity - O( length(S1) )
Note - You can also run through the string having a minimum length to reduce the complexity.
In that case Complexity - O( minimum [ length(S1), length(S2) ] )
Please let me know if you want the implementation of this.

you can use collection.Counter and count each char in two string and if each char exist in two string using min of list and create a new string by join of them.
from collections import Counter, defaultdict
from itertools import zip_longest
s1 = 'aebcdee'
s2 = 'aaeedfskm'
# Create a dictionary the value is 'list' and can append char in each 'list'
res = defaultdict(list)
# get count of each char
cnt1 = Counter(s1) # -> {'e': 3, 'a': 1, 'b': 1, 'c': 1, 'd': 1}
cnt2 = Counter(s2) # -> {'a': 2, 'e': 2, 'd': 1, 'f': 1, 's': 1, 'k': 1, 'm': 1}
# for appending chars in one step, we can zip count of chars in two strings,
# so Because maybe two string have different length, we can use 'itertools. zip_longest'
for a,b in zip_longest(cnt1 , cnt2):
# list(zip_longest(cnt1 , cnt2)) -> [('a', 'a'), ('e', 'e'), ('b', 'd'),
# ('c', 'f'), ('d', 's'), (None, 'k'),
# (None, 'm')]
# Because maybe we have 'none', before 'append' we need to check 'a' and 'b' don't be 'none'
if a: res[a].append(cnt1[a])
if b: res[b].append(cnt2[b])
# res -> {'a': [1, 2], 'e': [3, 2], 'b': [1], 'd': [1, 1], 'c': [1], 'f': [1], 's': [1], 'k': [1], 'm': [1]}
# If the length 'list' of each char is larger than one so this char is duplicated and we repeat this char in the result base min of each char in the 'list' of count char of two strings.
out = ''.join(k* min(v) for k,v in res.items() if len(v)>1)
print(out)
# aeed
We can use this approach for multiple string, like three strings.
s1 = 'aebcdee'
s2 = 'aaeedfskm'
s3 = 'aaeeezzxx'
res = defaultdict(list)
cnt1 = Counter(s1)
cnt2 = Counter(s2)
cnt3 = Counter(s3)
for a,b,c in zip_longest(cnt1 , cnt2, cnt3):
if a: res[a].append(cnt1[a])
if b: res[b].append(cnt2[b])
if c: res[c].append(cnt3[c])
out = ''.join(k* min(v) for k,v in res.items() if len(v)>1)
print(out)
# aeed

s1="ckglter"
s2="ancjkle"
final_list=[]
if(len(s1)<len(s2)):
for i in s1:
if(i in s2):
final_list.append(i)
else:
for i in s2:
if(i in s1):
final_list.append(i)
print(final_list)
you can also do it like this also, just iterate through both the string using for loop and append the common character into the empty list

Related

how to find the most popular letter in a string that also has the lowest ascii value

Implement the function most_popular_character(my_string), which gets the string argument my_string and returns its most frequent letter. In case of a tie, break it by returning the letter of smaller ASCII value.
Note that lowercase and uppercase letters are considered different (e.g., ‘A’ < ‘a’). You may assume my_string consists of English letters only, and is not empty.
Example 1: >>> most_popular_character("HelloWorld") >>> 'l'
Example 2: >>> most_popular_character("gggcccbb") >>> 'c'
Explanation: cee and gee appear three times each (and bee twice), but cee precedes gee lexicographically.
Hints (you may ignore these):
Build a dictionary mapping letters to their frequency;
Find the largest frequency;
Find the smallest letter having that frequency.
def most_popular_character(my_string):
char_count = {} # define dictionary
for c in my_string:
if c in char_count: #if c is in the dictionary:
char_count[c] = 1
else: # if c isn't in the dictionary - create it and put 1
char_count[c] = 1
sorted_chars = sorted(char_count) # sort the dictionary
char_count = char_count.keys() # place the dictionary in a list
max_per = 0
for i in range(len(sorted_chars) - 1):
if sorted_chars[i] >= sorted_chars[i+1]:
max_per = sorted_chars[i]
break
return max_per
my function returns 0 right now, and I think the problem is in the last for loop and if statement - but I can't figure out what the problem is..
If you have any suggestions on how to adjust the code it would be very appreciated!

Your dictionary didn't get off to a good start by you forgetting to add 1 to the character count, instead you are resetting to 1 each time.
Have a look here to get the gist of getting the maximum value from a dict: https://datagy.io/python-get-dictionary-key-with-max-value/
def most_popular_character(my_string):
# NOTE: you might want to convert the entire sting to upper or lower case, first, depending on the use
# e.g. my_string = my_string.lower()
char_count = {} # define dictionary
for c in my_string:
if c in char_count: #if c is in the dictionary:
char_count[c] += 1 # add 1 to it
else: # if c isn't in the dictionary - create it and put 1
char_count[c] = 1
# Never under estimate the power of print in debugging
print(char_count)
# max(char_count.values()) will give the highest value
# But there may be more than 1 item with the highest count, so get them all
max_keys = [key for key, value in char_count.items() if value == max(char_count.values())]
# Choose the lowest by sorting them and pick the first item
low_item = sorted(max_keys)[0]
return low_item, max(char_count.values())
print(most_popular_character("HelloWorld"))
print(most_popular_character("gggcccbb"))
print(most_popular_character("gggHHHAAAAaaaccccbb 12 3"))
Result:
{'H': 1, 'e': 1, 'l': 3, 'o': 2, 'W': 1, 'r': 1, 'd': 1}
('l', 3)
{'g': 3, 'c': 3, 'b': 2}
('c', 3)
{'g': 3, 'H': 3, 'A': 4, 'a': 3, 'c': 4, 'b': 2, ' ': 2, '1': 1, '2': 1, '3': 1}
('A', 4)
So: l and 3, c and 3, A and 4

def most_popular_character(my_string):
history_l = [l for l in my_string] #each letter in string
char_dict = {} #creating dict
for item in history_l: #for each letter in string
char_dict[item] = history_l.count(item)
return [max(char_dict.values()),min(char_dict.values())]
I didn't understand the last part of minimum frequency, so I make this function return a maximum frequency and a minimum frequency as a list!

Use a Counter to count the characters, and use the max function to select the "biggest" character according to your two criteria.
>>> from collections import Counter
>>> def most_popular_character(my_string):
... chars = Counter(my_string)
... return max(chars, key=lambda c: (chars[c], -ord(c)))
...
>>> most_popular_character("HelloWorld")
'l'
>>> most_popular_character("gggcccbb")
'c'
Note that using max is more efficient than sorting the entire dictionary, because it only needs to iterate over the dictionary once and find the single largest item, as opposed to sorting every item relative to every other item.

how to print a string showing the number of times each character is repeated in a string?

Given a string S, how to print a string containing number of times the character is repeated?
for example:
input: aaabbbbccaa
output: a3b4c2a2
my approach:
s = input()
len_string = ''
cur_char = s[0]
cur_counter = 0
for i in range(len(s)):
if s[i] == cur_char:
cur_counter += 1
if s[i] != cur_char or i == len(s) - 1:
len_string += cur_char + str(cur_counter)
cur_char = s[i]
cur_counter = 1
print(len_string)

Because you have shared your code, here is one concise way using groupby:
from itertools import groupby
s = 'aaabbbbccaa'
print(''.join([k + str(len(list(g))) for k, g in groupby(s)]))
# a3b4c2a2

I would use collections.Counter https://docs.python.org/2/library/collections.html#counter-objects
Init signature: collections.Counter(*args, **kwds)
Docstring:
Dict subclass for counting hashable items. Sometimes called a bag
or multiset. Elements are stored as dictionary keys and their counts
are stored as dictionary values.
>>> c = Counter('abcdeabcdabcaba') # count elements from a string
>>> c.most_common(3) # three most common elements
[('a', 5), ('b', 4), ('c', 3)]
>>> sorted(c) # list all unique elements
['a', 'b', 'c', 'd', 'e']
>>> ''.join(sorted(c.elements())) # list elements with repetitions
'aaaaabbbbcccdde'
>>> sum(c.values()) # total of all counts
15
>>> c['a'] # count of letter 'a'
5
>>> for elem in 'shazam': # update counts from an iterable
... c[elem] += 1 # by adding 1 to each element's count
>>> c['a'] # now there are seven 'a'
7
>>> del c['b'] # remove all 'b'
>>> c['b'] # now there are zero 'b'
0
>>> d = Counter('simsalabim') # make another counter
>>> c.update(d) # add in the second counter
>>> c['a'] # now there are nine 'a'
9
>>> c.clear() # empty the counter
>>> c
Counter()
Note: If a count is set to zero or reduced to zero, it will remain
in the counter until the entry is deleted or the counter is cleared:
>>> c = Counter('aaabbc')
>>> c['b'] -= 2 # reduce the count of 'b' by two
>>> c.most_common() # 'b' is still in, but its count is zero
[('a', 3), ('c', 1), ('b', 0)]

I think there is a way to do this much more easy:
x='aaabbbbccaa'
noreplist = list(dict.fromkeys(x))
countstring=''
for i in noreplist:
z=z+i+str(x.count(i))
print(countstring)
First, you have x that is your string.
Then you make a list with every char from that string, but without repeating any char.
And last, just counts how many times is that char repeated on the original string, and concatenate in a 'count string'.

#y is a list that contains every character in the string
#z is a list parallel to y but it contains the number of times each element in list y #has
x=input("Input string: ")
y=[]
z=[]
acc=0
for i in range(len(x)):
if(x[i] not in y):
y.append(x[i])
for i in range(len(y)):
for j in range(len(x)):
if(y[i]==x[j]):
acc=acc+1
z.append(acc)
acc=0
for k in range(len(y)):
print(str(y[k])+str(z[k]))

Counting subsequent letters

So I am trying to implement code that will count the next letter in a sentence, using python.
so for instance,
"""So I am trying to implement code that will count the next letter in a sentence, using
python"""
most common letters one after the other
for 's'
'o' :1
'e' :1
for 'o'
' ' :1
'd' :1
'u' :1
'n' :1
I think you get the idea
I already have written code for counting letters prior
def count_letters(word, char):
count = 0
for c in word:
if char == c:
count += 1
return count
As you can see this just counts for letters, but not the next letter. can someone give me a hand on this one?

from collections import Counter, defaultdict
counts = defaultdict(Counter)
s = """So I am trying to implement code that will count the next letter in a sentence, using
python""".lower()
for c1, c2 in zip(s, s[1:]):
counts[c1][c2] += 1
(apart from being simpler, this should be significantly faster than pault's answer by not iterating over the string for every letter)
Concepts to google that aren't named in the code:
for c1, c2 in ... (namely the fact that there are two variables): tuple unpacking
s[1:]: slicing. Basically this is a copy of the string after the first character.

Here is a relatively terse way to do it:
from itertools import groupby
from collections import Counter
def countTransitionFrequencies(text):
prevNext = list(zip(text[:-1], text[1:]))
prevNext.sort(key = lambda pn: pn[0])
transitions = groupby(prevNext, lambda pn: pn[0])
freqs = map(
lambda kts: (kts[0], Counter(map(lambda kv: kv[1], kts[1]))),
transitions
)
return freqs
Explanation:
zip creates list of pairs with (previous, next) characters
The pairs are sorted and grouped by the previous character
The frequencies of the next characters (extracted from pairs by kv[1]) are then counted using Counter.
Sorting is not really necessary, but unfortunately, this is how the provided groupby works.
An example:
for k, v in countTransitionFrequencies("hello world"):
print("%r -> %r" % (k, v))
This prints:
' ' -> Counter({'w': 1})
'e' -> Counter({'l': 1})
'h' -> Counter({'e': 1})
'l' -> Counter({'l': 1, 'o': 1, 'd': 1})
'o' -> Counter({' ': 1, 'r': 1})
'r' -> Counter({'l': 1})
'w' -> Counter({'o': 1})

Here's a way using collections.Counter:
Suppose the string you provided was stored in a variable s.
First we iterate over the set of all lower case letters in s. We do this by making another string s_lower which will convert the string s to lowercase. We then wrap this with the set constructor to get unique values.
For each char, we iterate through the string and check to see if the previous letter is equal to char. If so, we store this in a list. Finally, we pass this list into the collections.Counter constructor which will count the occurrences.
Each counter is stored in a dictionary, counts, where the keys are the unique characters in the string.
from collections import Counter
counts = {}
s_lower = s.lower()
for char in set(s_lower):
counts[char] = Counter(
[c for i, c in enumerate(s_lower) if i > 0 and s_lower[i-1] == char]
)
For your string, this has the following outputs:
>>> print(counts['s'])
#Counter({'i': 1, 'e': 1, 'o': 1})
>>> print(counts['o'])
#Counter({' ': 2, 'd': 1, 'n': 1, 'u': 1})
One caveat is that this method will iterate through the whole string for each unique character, which could potentially make it slow for large lists.
Here is an alternative approach using collections.Counter and collections.defaultdict that only loops through the string once:
from collections import defaultdict, Counter
def count_letters(s):
s_lower = s.lower()
counts = defaultdict(Counter)
for i in range(len(s_lower) - 1):
curr_char = s_lower[i]
next_char = s_lower[i+1]
counts[curr_char].update(next_char)
return counts
counts = count_letters(s)
We loop over each character in the string (except the last) and on each iteration we update a counter using the next character.

This should work, the only thing is it doesn't sort the values, but that can be solved by creating a new dictionary with list of tuples (char, occurrences) and using sorted function on tuple[1].
def countNext(word):
d = {}
word = word.lower()
for i in range(len(word) - 1):
c = word[i]
cc = word[i+1]
if(not c.isalpha() or not cc.isalpha()):
continue
if c in d:
if cc in d[c]:
d[c][cc] += 1
else:
d[c][cc] = 1
else:
d[c] = {}
d[c][cc] = 1
return d

How to count elements on each position in lists

I have a lot of lists like:
SI821lzc1n4
MCap1kr01lv
All of them have the same length. I need to count how many times each symbol appears on each position. Example:
abcd
a5c1
b51d
Here it'll be a5cd

One way is to use zip to associate characters in the same position. We can then send all of the characters from each position to a Counter, then use Counter.most_common to get the most common character
from collections import Counter
l = ['abcd', 'a5c1', 'b51d']
print(''.join([Counter(z).most_common(1)[0][0] for z in zip(*l)]))
# a5cd

from statistics import mode
[mode([x[i] for x in y]) for i in xrange(len(y[0]))]
where y is your list.
Python 3.4 and up

You could use combination of zip and Counter
a = ("abcd")
b = ("a5c1")
c = ("b51d")
from collections import Counter
zippedList = list(zip(a,b,c))
print("zipped: {}".format(zippedList))
final = ""
for x in zippedList:
countLetters = Counter(x)
print(countLetters)
final += countLetters.most_common(3)[0][0]
print("output: {}".format(final))
output:
zipped: [('a', 'a', 'b'), ('b', '5', '5'), ('c', 'c', '1'), ('d', '1', 'd')]
Counter({'a': 2, 'b': 1})
Counter({'5': 2, 'b': 1})
Counter({'c': 2, '1': 1})
Counter({'d': 2, '1': 1})
output: a5cd

This all depends on where your list is. Is your list coming from another file or is it an actual array? At the end of the day, the best way to do this simply is going to be to use a dictionary and a for loop.
new_dict = {}
for i in range(len(line)):
if i in new_dict:
new_dict[i].append(line[i])
else:
new_dict[i] = [line[i]]
Then after that I'm assuming that you'd like to output the four most common element appearances. For that I'd recommend importing statistics and using the mode method...
from statistics import mode
new_line = ""
for key in new_dict:
x = mode(new_dict[key])
new_line = new_line + x
However, your question is quite vague, please elaborate more next time.
P.s. I'm a newbie so all you experienced programmers plz don't hate :)

I would use a combination of defaultdict, enumerate, and Counter:
>>> from collections import Counter, defaultdict
>>> data = '''abcd
a5c1
b51d
'''
>>> poscount = defaultdict(Counter)
>>> for line in data.split():
for i, character in enumerate(line):
poscount[i][character] += 1
>>> ''.join([poscount[i].most_common(1)[0][0] for i in sorted(poscount)])
'a5cd'
Here's how it works:
The defaultdict() creates new entries when it sees a new key.
The enumerate() function returns both the character and its position in the line.
The Counter counts the occurences of individual characters
Combining the three makes a defaultdict whose keys are the column positions and whose values are character counters. That gives you one character counter per column.
The most_common() method returns the highest frequency (character, count) pair for that counter.
The [0][0] extracts the character from the list of (character, count) tuples.
The str.join() method combines the results back together.

Counting number of permutations of each element

I need some help. I check, there are few questions about 'counting permutations', but I didn't find an answer suitable for my case.
I would like to count the total number of permutations of each item in a list of items. Say, you have two lists ('first', 'second' see below) and for each element of 'first', I would like to have its total number of unique permutations. e.g for 'a' in 'first' we have
ab
ac
ad
ab
ac
ad
ab
ab
by removing duplicates, we have
ab ac ad
So the number of permutations of 'a' will be '3'
The final result I would like to get should be like
(a, 3)
(b, 3)
(c, 3)
(d, 3)
I start with
import itertools
from collections import Counter
first = ['a','b','c','d']
second = [['a','b','c','d'], ['a','b'], ['a','c','d'], ['a','b','d']]
c = Counter()
for let in second:
letPermut = list(set(itertools.permutations(let, 2)))
for i in first:
for permut in letPermut:
if permut[0] == i:
c[i] += 1
for item in c.items():
print(item)
But in the output I get different counts for each element in first list, and the Counter's results are higher than the expected output. I don't know what I am doing wrong.
Any help?

Well, the question is still not very clear, but here my 0.02$:
def do_the_stuff(first, second):
second = list(set(second))
return {
el1: sum(1 for el2 in second if el1 in el2)
for el1 in first
}
With some test data:
>>> first = ['a','b','c','d', 'j']
>>> second = ['abcd', 'ab', 'ab', 'acd', 'abd']
>>> print do_the_stuff(first, second)
{'a': 4, 'c': 2, 'b': 3, 'd': 3, 'j': 0}

If I did understand well your problem, these changes make your code ignore duplicate permutations:
import itertools
from collections import Counter
first = ['a','b','c','d']
second = [['a','b','c','d'], ['a','b'], ['a','c','d'], ['a','b','d']]
uniques = []
c = Counter()
for let in second:
letPermut = list(set(itertools.permutations(let, 2)))
for i in first:
for permut in letPermut:
if permut[0] == i and not permut in uniques:
c[i] += 1
uniques.append(permut)
for item in c.items():
print(item)
The changes:
Declare an empty list called uniques
We check against uniques if permutation is a duplicate before counting +1
After increasing the counter we add the permutation to uniques for future check
Took the printing loop out of the for let in second loop. Thus, each counter is only printed once at the end.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Find common elements of two strings including characters that occur many times - python

Related

how to find the most popular letter in a string that also has the lowest ascii value

how to print a string showing the number of times each character is repeated in a string?

Counting subsequent letters

How to count elements on each position in lists

Counting number of permutations of each element

Categories

Resources