Finding common letters between 2 strings in Python

For a homework assignment, I have to take two user-entered strings and figure out how many letters are common in the same position of both strings, as well as how many letters they have in common overall. For example, for the two strings 'cat' and 'rat', there are 2 common letter positions (positions 2 and 3 in this case), and there are also 2 common letters, because 'a' is found once and 't' is found once too.
So I made a program and it worked fine, but then my teacher updated the homework with more examples, specifically examples with repeated letters, and my program doesn't work for those. For example, with the strings 'ahahaha' and 'huhu', there are 0 common letters in the same positions, but there are 3 common letters between them (because the 'h' in string 2 appears in string 1 three times).
My whole issue is that I can't figure out how to count 'h' when it appears multiple times in the first string, and I also don't know how to NOT check the SECOND 'h' in huhu, because it should only count unique letters, so the overall common letter count should be 2.
This is my current code:
S1 = input("Enter a string: ")
S2 = input("Enter a string: ")
i = 0
big_string = 0
short_string = 0
same_letter = 0
common_letters = 0
if len(S1) > len(S2):
    big_string = len(S1)
    short_string = len(S2)
elif len(S1) < len(S2):
    big_string = len(S2)
    short_string = len(S1)
elif len(S1) == len(S2):
    big_string = short_string = len(S1)
while i < short_string:
    if (S1[i] == S2[i]) and (S1[i] in S2):
        same_letter += 1
        common_letters += 1
    elif (S1[i] == S2[i]):
        same_letter += 1
    elif (S1[i] in S2):
        common_letters += 1
    i += 1
print("Number of positions with the same letter: ", same_letter)
print("Number of letters from S1 that are also in S2: ", common_letters)
This code worked for strings without repeated letters, but when I try it with "ahahaha" and "huhu" I get 0 common positions (which makes sense) and 2 common letters (when it should be 3). I figured it might work if I added the following:
while x < short_string:
    if S1[i] in S2[x]:
        common_letters += 1
    else:
        pass
    x += 1
However, this doesn't work either.
I am not asking for a direct answer or piece of code to do this, because I want to do it on my own, but I just need a couple of hints or ideas how to do this..
Note: I can't use any functions we haven't taken in class, and in class we've only done basic loops and strings..

You need a data structure like multidict. To my knowledge, the most similar data structure in standard library is Counter from collections.
For simple frequency counting:
>>> from collections import Counter
>>> strings = ['cat', 'rat']
>>> counters = [Counter(s) for s in strings]
>>> sum((counters[0] & counters[1]).values())
2
With index counting:
>>> counters = [Counter(zip(s, range(len(s)))) for s in strings]
>>> sum((counters[0] & counters[1]).values())
2
For your examples ahahaha and huhu, you should get 2 and 0, respectively, since the strings share two h's but never in the same position.
Since you can't use advanced constructs, you just need to simulate the counter with arrays:
Create two 26-element arrays, one per string, initialised to zero.
Loop over each string and increment the entry at the index of each letter.
Loop over both arrays simultaneously and sum the minimum of each pair of entries.
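The three steps above could look roughly like this. It is only a sketch: it assumes lowercase a-z input and skips anything else, and the name common_letter_count is made up for illustration.

```python
def common_letter_count(s1, s2):
    """Count letters shared by s1 and s2, respecting how often each occurs."""
    counts1 = [0] * 26  # counts1[0] is the number of 'a's in s1, and so on
    counts2 = [0] * 26
    for ch in s1:
        if 'a' <= ch <= 'z':
            counts1[ord(ch) - ord('a')] += 1
    for ch in s2:
        if 'a' <= ch <= 'z':
            counts2[ord(ch) - ord('a')] += 1
    total = 0
    for k in range(26):
        # a letter is shared as many times as it appears in BOTH strings
        total += min(counts1[k], counts2[k])
    return total


print(common_letter_count('cat', 'rat'))       # 2
print(common_letter_count('ahahaha', 'huhu'))  # 2
```

This gives the same totals as the Counter version for the two example pairs.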

A shorter version is this:
def gen1(listItem):
    # collect each distinct non-space character, in order of first appearance
    returnValue = []
    for character in listItem:
        if character not in returnValue and character != " ":
            returnValue.append(character)
    return returnValue

st = "first string"
r1 = gen1(st)
st2 = "second string"
r2 = gen1(st2)
# set intersection is symmetric, so the order of the operands does not matter
print(list(set(r1).intersection(r2)))
Note:
This is a pretty old post, but since it's got new activity, I posted my version.

Since you can't use arrays or lists, maybe try adding every common character to a string (say var_string), then test
if c not in var_string:
before incrementing your common counter, so you are not counting the same character multiple times.
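As a rough sketch of that hint (the function name count_unique_common and the variable seen are made up here, and whether "each distinct letter once" is the count the assignment wants is an assumption), using only loops and strings:

```python
def count_unique_common(s1, s2):
    """Count the distinct letters of s1 that also occur in s2."""
    seen = ""    # letters that have already been counted
    common = 0
    for c in s1:
        if c in s2 and c not in seen:
            seen += c      # remember it so a repeat is not counted again
            common += 1
    return common


print(count_unique_common('cat', 'rat'))       # 2
print(count_unique_common('ahahaha', 'huhu'))  # 1
```

With 'ahahaha' and 'huhu' only 'h' is shared, so this variant reports 1; a different counting rule (for example, counting every 'h' in the first string) needs a different test inside the loop.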

You are only getting '2' because you're only going to look at 4 total characters out of ahahaha (because huhu, the shortest string, is only 4 characters long). Change your while loop to go over big_string instead, and then add (len(S2) > i) and to your two conditional tests; the last test performs an in, so it won't cause a problem with index length.
NB: All of the above implicitly assumes that len(S1) >= len(S2); that should be easy enough to ensure, using a conditional and an assignment, and it would simplify other parts of your code to do so. You can replace the first block entirely with something like:
if (len(S2) > len(S1)): (S2, S1) = (S1, S2)
big_string = len(S1)
short_string = len(S2)
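Put together, the suggestion might look something like the sketch below. The wrapper function count_matches is hypothetical, and the three-way if/elif of the question is collapsed into two independent tests on the assumption that equal letters at the same position always also count as "in S2".

```python
def count_matches(S1, S2):
    """Return (same_letter, common_letters) for two strings."""
    # make sure S1 is the longer string, as the note above suggests
    if len(S2) > len(S1):
        S1, S2 = S2, S1
    same_letter = 0
    common_letters = 0
    i = 0
    while i < len(S1):
        # only compare positions that exist in the shorter string
        if len(S2) > i and S1[i] == S2[i]:
            same_letter += 1
        # "in" never indexes S2, so it is safe for every i
        if S1[i] in S2:
            common_letters += 1
        i += 1
    return same_letter, common_letters


print(count_matches('cat', 'rat'))       # (2, 2)
print(count_matches('ahahaha', 'huhu'))  # (0, 3)
```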

We can solve this by using one for loop inside another, as follows (C-style code, with s1 as the big string and s2 as the small one):
int y = 0;
for (i = 0; i < big_string; i++)
{
    for (j = 0; j < short_string; j++)
    {
        if (s1[i] == s2[j])
        {
            y++;
            break;  /* count each letter of the big string at most once */
        }
    }
}
If you enter 'ahahaha' and 'huhu', the outer loop first takes the first character of the big string, 'a'. The inner loop then compares it against each letter of the small string in turn: first 'h', then 'u', and so on. None of them equals 'a', so y is not incremented and remains zero. y is incremented in the following cases:
when it compares the second letter of the big string, 'h', with the first letter of the small string, y is incremented once, i.e. y = 1;
when it compares the fourth letter of the big string, 'h', with the first letter of the small string, y is incremented again, i.e. y = 2;
when it compares the sixth letter of the big string, 'h', with the first letter of the small string, y is incremented again, i.e. y = 3;
Final output is 3. I think that is what we want.
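Since the question is about Python, the same nested-loop idea can be sketched in Python. The function name count_letters_found is made up, and the break is an assumption that each letter of the longer string should be counted at most once, which is what produces 3 for the example.

```python
def count_letters_found(longer, shorter):
    """Count letters of `longer` that occur somewhere in `shorter`."""
    y = 0
    for a in longer:
        for b in shorter:
            if a == b:
                y += 1
                break  # stop at the first match so 'h' vs 'huhu' counts once
    return y


print(count_letters_found('ahahaha', 'huhu'))  # 3
```

Without the break, every 'h' in the first string would be counted once per 'h' in the second string.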

Related

How can I make this 2 strings matching algorithm more efficient?

So I am working on this exercise on Codewars, and my code does what it is supposed to do, but it needs to be more efficient and I don't know what else I can do. Below are the exercise and the code I wrote.
Complete the function scramble(str1, str2) that returns true if a portion of str1 characters can be rearranged to match str2, otherwise returns false.
Notes:
Only lower case letters will be used (a-z). No punctuation or digits will be included.
Performance needs to be considered
Input strings s1 and s2 are null terminated.
Examples
scramble('rkqodlw', 'world') ==> True
scramble('cedewaraaossoqqyt', 'codewars') ==> True
scramble('katas', 'steak') ==> False
def scramble(s1, s2):
    #initiate variables
    i,j,count =0,0,0
    #sorting our 2 strings
    s1, s2="".join(sorted(s1)),"".join(sorted(s2))
    #for loop to go over each character in the str we want to match, s2
    for j in range(len(s2)):
        #while loop to go over s1 to match characters to s2 char
        i=0
        while i<len(s1):
            #when 2 chars match, count increases by 1, i increases to exit while loop
            #s1 sliced for increasing efficiency in the next loop
            if s1[i]==s2[j]:
                count+=1
                x=i
                i=len(s1)
                s1=s1[x+1:]
            #if character larger in s1, no need to go through the whole string
            #therefore, exits while loop and slices
            elif s1[i]>s2[j]:
                x=i
                i=len(s1)
                s1=s1[x+1:]
            #increases iterator
            else: i+=1
    #return statement, if count equals length of s2 then it must be true
    if len(s2)==count: return True
    else: return False
Side question: is the time complexity for this code O(n^2)?

You could make use of the fact that list.index raises an exception (ValueError) when the element you are looking for is not in the list:
def scramble(s1, s2):
    l1 = list(s1)
    for c in s2:
        try:
            i = l1.index(c)
            l1.pop(i)
        except ValueError:
            # c is not (or no longer) available in s1
            return False
    return True

So here you are sorting both strings which gives O(n log n + m log m) complexity for n and m being sizes of s1 and s2. Then you are running two nested loops to check for characters from s2 to exist in s1 and then rewriting s1 if the character in question does exist. This is actually super inefficient, and it indeed gives you O(nm) complexity for that part with quite a big constant on top (since you are rewriting a string over and over).
Instead you can just count occurrences of each character in each string to see if you have enough characters to compose s2 from s1. That would give you linear time plus the size of an alphabet, which is a small constant (26 in your case since only lower-case alphabetic characters are allowed according to the challenge prompt).
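A minimal sketch of that counting approach, assuming collections.Counter is acceptable for the challenge (the original answer does not show code, so this is one possible reading of it):

```python
from collections import Counter


def scramble(s1, s2):
    """Return True if the letters of s2 can be taken from s1."""
    available = Counter(s1)  # how many of each letter s1 offers
    needed = Counter(s2)     # how many of each letter s2 requires
    # a missing letter has count 0 in a Counter, so no KeyError here
    return all(available[ch] >= n for ch, n in needed.items())


print(scramble('rkqodlw', 'world'))               # True
print(scramble('cedewaraaossoqqyt', 'codewars'))  # True
print(scramble('katas', 'steak'))                 # False
```

Each string is scanned once, and the final comparison touches at most one entry per distinct letter of s2.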

Deciding whether a string is a palindrome

This is a Python question. The answer should have O(n) time complexity and use no additional memory. As input I get a string which should be classified as a palindrome or not (a palindrome is a word or a phrase that reads the same from left to right and from right to left, e.g. "level"). The input can contain punctuation marks and spaces between words.
For example: "I. did,,, did I????". The main goal is to decide whether the input is a palindrome.
When I tried to solve this question I faced several challenges. When I try to delete non-letter characters
for element in string:
    if ord(element) not in range(97, 122):
        string.remove(element)
    if ord(element) == 32:
        string.remove(element)
I get O(n^2) complexity, because for every element in the string I call remove, which itself has O(n) complexity, where n is the length of the list. I need help optimizing the removal of non-letter characters so that it runs in O(n).
Also, once the spaces and punctuation marks are gone, I know how to check whether a word is a palindrome, but my method uses additional memory.

Here is your O(n) solution without creating a new string:
def is_palindrome(string):
    left = 0
    right = len(string) - 1
    while left < right:
        if not string[left].isalpha():
            left += 1
            continue
        if not string[right].isalpha():
            right -= 1
            continue
        if string[left] != string[right]:
            return False
        left += 1
        right -= 1
    return True
print(is_palindrome("I. did,,, did I????"))
Output:
True

I'm assuming you mean you want to test whether a string is a palindrome once all punctuation and digits are removed from it. In that case, the following code should suffice:
from string import ascii_letters

def is_palindrome(s):
    s = ''.join(c for c in s if c in ascii_letters)
    return s == s[::-1]

# some test cases:
print(is_palindrome('hello')) # False
print(is_palindrome('ra_ceca232r')) # True

Here's a one-liner using assignment expression syntax (Python 3.8+):
>>> s = "I. did,,, did I????"
>>> (n := [c.lower() for c in s if c.isalpha()]) == n[::-1]
True
I mostly showed the above as a demonstration; for readability's sake I'd recommend something more like SimonR's solution (although still using isalpha over comparing to ascii_letters).
Alternatively, you can use generator expressions to do the same comparison without allocating O(n) extra memory:
def is_palindrome(s):
    forward = (c.lower() for c in s if c.isalpha())
    back = (c.lower() for c in reversed(s) if c.isalpha())
    return all(a == b for a, b in zip(forward, back))
Note that zip still allocates in Python 2; you'll need to use itertools.izip there.

Will this help:
word = input('Input your word: ')
# keep only letters and digits
word1 = ''
for l in word:
    if l.isalnum():
        word1 += l
# build the reversed string by walking the indices backwards
word2 = ''
for index in sorted(range(len(word1)), reverse=True):
    word2 += word1[index]
if word1 == word2:
    print('It is a palindrome.')
else:
    print('It is not a palindrome.')

Backward search implementation python

I am working on some string search tasks to learn more efficient ways of searching.
I am trying to count how many times a substring occurs in a given string by using backward search.
For example, given the following strings:
original = 'panamabananas$'
s = 'smnpbnnaaaaa$a'
s1 = '$aaaaaabmnnnps' #sorted version of s
I am trying to find how many times the substring 'ban' occurs. To do so, I was thinking of iterating through both strings with the zip function. In the backward search, I should first look for the last character of ban (n) in s1 and see where it lines up with the next character, a, in s. It matches at indexes 9, 10 and 11, which are the third, fourth and fifth a in s. The next character to look for is b, but only among the matches found so far (that is, where n in s1 matched a in s). So we take those a's (the third, fourth and fifth) from s and see whether the third, fourth or fifth a in s1 lines up with a b in s. This way we would have found an occurrence of 'ban'.
It seems complex to me to iterate and save quasi-occurrences, so what I was trying is something like this:
n = 0 #counter of occurrences
for i, j in zip(s1, s):
    if i == 'n' and j == 'a': # this should save the match
        if i[3:6] == 'a' and any(j[3:6] == 'b'):
            n += 1
I think nested if statements may be needed, but I am still a beginner. I am getting 0 occurrences when there is one occurrence of 'ban' in the original.
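For reference, the procedure described above can be sketched without zip by tracking a range of rows in the sorted string. This is only an illustration: it assumes s is the Burrows-Wheeler transform of original (which it appears to be), the function name backward_search_count is made up, and the rank helper is a deliberately simple, slow way to count characters.

```python
def backward_search_count(bwt, pattern):
    """Count occurrences of pattern in the text whose last column is bwt."""
    # first_occurrence[c]: index of the first c in the sorted column (s1)
    first_occurrence = {}
    for idx, ch in enumerate(sorted(bwt)):
        first_occurrence.setdefault(ch, idx)

    def rank(ch, end):
        # number of times ch appears in bwt[:end]
        return bwt[:end].count(ch)

    top, bottom = 0, len(bwt)  # half-open range of candidate rows
    for ch in reversed(pattern):
        if ch not in first_occurrence:
            return 0
        # narrow the range to rows that start with ch and continue the match
        top = first_occurrence[ch] + rank(ch, top)
        bottom = first_occurrence[ch] + rank(ch, bottom)
        if top >= bottom:
            return 0
    return bottom - top


s = 'smnpbnnaaaaa$a'
print(backward_search_count(s, 'ban'))  # 1
print(backward_search_count(s, 'ana'))  # 3
```

For 'ban' the range goes from rows 9 to 11 (the n's), to rows 3 to 5 (the third, fourth and fifth a), to row 7 (the b), which mirrors the walk-through in the question.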

You can run a loop with find to count the number of occurrences of the substring.
s = 'panamabananasbananasba'
ss = 'ban'
count = 0
idx = s.find(ss, 0)
while idx != -1:
    count += 1
    idx += len(ss)  # skip past this match (counts non-overlapping occurrences)
    idx = s.find(ss, idx)
print(count)
If you really want a backward search, reverse the string and the substring and apply the same mechanism.
s = 'panamabananasbananasban'
s = s[::-1]
ss = 'ban'
ss = ss[::-1]

How to compare words in two lists according to same order in python?(I have def a function)

Recently, I defined a function which compares the letters of two words. However, I found some problems with it.
def printcorrectletters():
    x=0
    for letters in correctanswer:
        for letters2 in userinput:
            if letters == letters2:
                x = x+1
                break
    return x
With this function, if correctanswer='HUNTING' and I input 'GHUNTIN', it reports that 6 letters are correct. However, I want it to compare the words' letters one by one, position by position, so it should match 0. For example, 'H' should be compared with the first letter of userinput, and so on.
I also thought of another function which could solve it by using zip. However, our TA asked me to do it without things like zip.

If the strings are different lengths, you want to compare each letter of the shorter string:
shortest_length = min(len(correctanswer), len(userinput))
min just gives you the minimum of two or more values. You could code it yourself as:
def min(a, b):
    return a if a < b else b
You can index a character in a string, using [index]:
>>> 'Guanfong'[3]
'n'
So you can loop over all the letter indices:
correct = 0
for index in range(shortest_length):
    if correctanswer[index] == userinput[index]:
        correct += 1
If you did use zip and sum:
correct = sum(1 for (correct_char, user_char) in zip(correctanswer, userinput)
              if correct_char == user_char)
Python provides great facilities for simplifying ideas and for communicating with the computer and programmers (including yourself, tomorrow).

Without zip, you can use enumerate() to loop over the elements of correctanswer and get the index and the element at the same time. Example:
def printcorrectletters():
    x=0
    for i, letter in enumerate(correctanswer):
        if i < len(userinput) and letter == userinput[i]:
            x = x+1
    return x
Or, if even enumerate() is not allowed, simply use a range() loop up to len(correctanswer) and get the elements at each index.
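A sketch of that range() variant. To keep it self-contained, the two words are passed as parameters instead of being read from globals, and the name count_correct_letters is made up:

```python
def count_correct_letters(correctanswer, userinput):
    """Count positions where both words have the same letter."""
    x = 0
    for i in range(len(correctanswer)):
        # guard against userinput being shorter than correctanswer
        if i < len(userinput) and correctanswer[i] == userinput[i]:
            x = x + 1
    return x


print(count_correct_letters('HUNTING', 'GHUNTIN'))  # 0
print(count_correct_letters('HUNTING', 'HUNTER'))   # 4
```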

Find symmetric words in a text [duplicate]

I have to write a function which takes one arguments text containing a block of text in the form of a str, and returns a sorted list of “symmetric” words. A symmetric word is defined as a word where for all values i, the letter i positions from the start of the word and the letter i positions from the end of the word are equi-distant from the respective ends of the alphabet. For example, bevy is a symmetric word as: b (1 position from the start of the word) is the second letter of the alphabet and y (1 position from the end of the word) is the second-last letter of the alphabet; and e (2 positions from the start of the word) is the fifth letter of the alphabet and v (2 positions from the end of the word) is the fifth-last letter of the alphabet.
For example:
>>> symmetrics("boy bread aloz bray")
['aloz','boy']
>>> symmetrics("There is a car and a book;")
['a']
All I can think of for a solution is this, but it doesn't run because it's wrong:
def symmetrics(text):
    func_char= ",.?!:'\/"
    for letter in text:
        if letter in func_char:
            text = text.replace(letter, ' ')
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    sym = []
    for word in text.lower().split():
        n = range(0,len(word))
        if word[n] == word[len(word)-1-n]:
            sym.append(word)
    return sym
The code above doesn't take the positions in alpha1 and alpha2 into account, as I don't know how to do that. Can anyone help me?

Here is a hint:
In [16]: alpha1.index('b')
Out[16]: 1
In [17]: alpha2.index('y')
Out[17]: 1
An alternative way to approach the problem is by using the str.translate() method:
def is_sym(word):
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    # Python 3 spelling; in Python 2 this was string.maketrans
    tr = str.maketrans(alpha1, alpha2)
    n = len(word) // 2
    # compare the first half with the mirrored, translated second half
    return word[:n] == word[::-1][:n].translate(tr)
print(is_sym('aloz'))
print(is_sym('boy'))
print(is_sym('bread'))
(The building of the translation table can be easily factored out.)

The for loop could be modified as:
    for word in text.lower().split():
        for n in range(0,len(word)//2):
            if alpha1.index(word[n]) != alpha2.index(word[len(word)-1-n]):
                break
        else:
            sym.append(word)
    return sym

According to your symmetric rule, we can verify a symmetric word with the following is_symmetric_word function:
def is_symmetric_word(word):
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    length = len(word)
    # integer division, so this also works in Python 3
    for i in range(length // 2):
        if alpha1.index(word[i]) != alpha2.index(word[length - 1 - i]):
            return False
    return True
And then the whole function to get all unique symmetric words out of a text can be defined as:
def is_symmetrics(text):
    func_char= ",.?!:'\/;"
    for letter in text:
        if letter in func_char:
            text = text.replace(letter, ' ')
    sym = []
    for word in text.lower().split():
        if is_symmetric_word(word) and not (word in sym):
            sym.append(word)
    return sym
The following are your two test cases:
is_symmetrics("boy bread aloz bray") #['boy', 'aloz']
is_symmetrics("There is a car and a book;") #['a']

Code first. Discussion below the code.
import string

# get alphabet and reversed alphabet
try:
    # Python 2.x
    alpha1 = string.lowercase
except AttributeError:
    # Python 3.x and newer
    alpha1 = string.ascii_lowercase
alpha2 = alpha1[::-1] # use slicing to reverse alpha1

# make a dictionary where the key, value pairs are symmetric
# for example symd['a'] == 'z', symd['b'] == 'y', and so on
_symd = dict(zip(alpha1, alpha2))

def is_symmetric_word(word):
    if not word:
        return False # zero-length word is not symmetric
    i1 = 0
    i2 = len(word) - 1
    while True:
        if i1 >= i2:
            return True # we have checked the whole string
        # get a pair of chars
        c1 = word[i1]
        c2 = word[i2]
        if _symd[c1] != c2:
            return False # the pair wasn't symmetric
        i1 += 1
        i2 -= 1

# note, added a space to list of chars to filter to a space
_filter_to_space = ",.?!:'\/ "

def _filter_ch(ch):
    if ch in _filter_to_space:
        return ' ' # return a space
    elif ch in alpha1:
        return ch # it's an alphabet letter so return it
    else:
        # It's something we don't want. Return empty string.
        return ''

def clean(text):
    return ''.join(_filter_ch(ch) for ch in text.lower())

def symmetrics(text):
    # filter text: keep only chars in the alphabet or spaces
    for word in clean(text).split():
        if is_symmetric_word(word):
            # use of yield makes this a generator.
            yield word

lst = list(symmetrics("The boy...is a yob."))
print(lst) # prints: ['boy', 'a', 'yob']
No need to type the alphabet twice; we can reverse the first one.
We can make a dictionary that pairs each letter with its symmetric letter. This will make it very easy to test whether any given pair of letters is a symmetric pair. The function zip() makes pairs from two sequences; they need to be the same length, but since we are using a string and a reversed copy of the string, they will be the same length.
It's best to write a simple function that does one thing, so we write a function that does nothing but check if a string is symmetric. If you give it a zero-length string it returns False, otherwise it sets i1 to the first character in the string and i2 to the last. It compares characters as long as they continue to be symmetric, and increments i1 while decrementing i2. If the two meet or pass each other, we know we have seen the whole string and it must be symmetric, in which case we return True; if it ever finds any pair of characters that are not symmetric, it returns False. We have to do the check for whether i1 and i2 have met or passed at the top of the loop, so it won't try to check if a character is its own symmetric character. (A character can't be both 'a' and 'z' at the same time, so a character is never its own symmetric character!)
Now we write a wrapper that filters out the junk, splits the string into words, and tests each word. Not only does it convert the chosen punctuation characters to spaces, but it also strips out any unexpected characters (anything not an approved punctuation char, a space, or a letter). That way we know nothing unexpected will get through to the inner function. The wrapper is "lazy"... it is a generator that yields up one word at a time, instead of building the whole list and returning that. It's easy to use list() to force the generator's results into a list. If you want, you can easily modify this function to just build a list and return it.
If you have any questions about this, just ask.
EDIT: The original version of the code didn't do the right thing with the punctuation characters; this version does. Also, as @heltonbiker suggested, why type the alphabet when Python has a copy of it you can use? So I made that change too.
EDIT: @heltonbiker's change introduced a dependency on Python version! I left it in with a suitable try:/except block to handle the problem. It appears that Python 3.x has improved the name of the lowercase ASCII alphabet to string.ascii_lowercase instead of plain string.lowercase.
