Permutations to compare two different strings for a match - python

I'm cracking my head here to solve this problem.
I'm trying to compare two strings like:
a = 'apple'
b = 'ppale'
If I use the permutation method, the letters of variable b can be rearranged so that it matches variable a.
I wrote the code, but god knows why, I'm only getting False, even when I expect True.
import itertools
def gen(char, phrase):
    char = list(char)
    wordsList = list(map(''.join, itertools.permutations(char)))
    for word in wordsList:
        if word == phrase:
            return True
        else:
            return False
print(gen('apple', 'appel'))  # should be True
print(gen('apple', 'elppa'))  # should be True
print(gen('apple', 'ap'))  # should be False

The problem is you're returning on the first iteration of the loop, regardless of whether there's a match or not. In particular, you're returning False the first time there's a mismatch. Instead, you need to complete the entire loop before returning False:
import itertools
def gen(char, phrase):
    char = list(char)
    wordsList = list(map(''.join, itertools.permutations(char)))
    for word in wordsList:
        if word == phrase:
            return True
    # only reached after every permutation has failed to match
    return False
Note that there are some improvements that can be made. One is there's no need to do char = list(char), since char is already an iterable. Also, there's no need to expand the map result into a list. It's just being used as an iterable, so it can be used directly, which can potentially save a lot of memory:
import itertools
def gen(char, phrase):
    # map() yields the joined permutations lazily, one at a time
    wordsIter = map(''.join, itertools.permutations(char))
    for word in wordsIter:
        if word == phrase:
            return True
    return False
However, since you're just comparing two words for the same characters, you don't really need to generate all the permutations. Instead, you just need to check whether the two sets of characters are the same (allowing for multiple instances of characters, so technically you want to compare two multisets). You can do this much more efficiently as follows:
import collections
def gen(char, phrase):
    counter1 = collections.Counter(char)
    counter2 = collections.Counter(phrase)
    return counter1 == counter2
This is an O(n) algorithm, which is the best that can be done for this problem, since every character has to be examined at least once. It is much faster than generating the permutations, of which there are n!. For long strings, it should also be faster than sorting the letters and comparing the sorted results, which is O(n*log(n)).
Example output:
>>> gen("apple", "elxpa")
False
>>> gen("apple", "elppa")
True
>>> gen("apple", "elpa")
False
>>>
Note that this only returns True if the letters are the same, and the number of each letter is the same.
If you want to speed up the case where the two strings have different lengths, you could add a fast check up front that returns False if the lengths differ, before counting the characters.
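A minimal sketch of that fast path, keeping the Counter comparison from above. len() on a Python string is O(1), so strings of different lengths are rejected before any counting happens:

```python
import collections

def gen(char, phrase):
    # Fast path: strings of different lengths can never be anagrams,
    # and len() is O(1), so this skips building two Counters
    if len(char) != len(phrase):
        return False
    # Same length: compare the multisets of characters
    return collections.Counter(char) == collections.Counter(phrase)

print(gen("apple", "elppa"))  # True
print(gen("apple", "elpa"))   # False (rejected by the length check)
```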

The main reason it's not working is that your loop returns False to the caller as soon as the first non-match occurs. What you want is something like this:
for word in wordsList:
    if word == phrase:
        return True
return False
which will test one or more permutations; if one matches, it will return True immediately, but only after they all fail to match will it return False.
Also, there's no need to do the char = list(char). A string is an iterable, just like a list, so you can use it as an argument to permutations().
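The corrected loop can also be written with the built-in any(), which stops at the first match just like the early return True does. This sketch keeps the permutation approach (so it is still exponential in the word length); it only shows the same loop logic more compactly:

```python
import itertools

def gen(char, phrase):
    # any() short-circuits: it returns True at the first matching
    # permutation and False only after every permutation has failed
    return any(''.join(p) == phrase for p in itertools.permutations(char))

print(gen('apple', 'appel'))  # True
print(gen('apple', 'ap'))     # False
```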

You can do this much more simply: sort the letters in each word and compare the results for equality.
def gen(a, b):
    return sorted(a) == sorted(b)

An alternative method is to loop through the letters of one word and remove each letter from the other word if it exists there. If the two words have the same length and the second word is empty after you have gone through every letter of the first, they are anagrams:
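def anagram(word1, word2):
    # Without this check a shorter word2 such as "ap" gets emptied
    # early, and anagram("apple", "ap") would wrongly return True
    if len(word1) != len(word2):
        return False
    for letter in word1:
        # remove only the first occurrence of this letter, if present
        word2 = word2.replace(letter, "", 1)
    return word2 == ""
a = "apple"
b = "pleap"
print(anagram(a, b))  # prints True
a = "apple"
b = "plaap"
print(anagram(a, b))  # prints False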

Related

How can I count the spaces and symbols in a string in python?

I'm trying to create a function that can determine whether a word or sentence is an anagram. I've come this far, but I can't figure out how to make my function handle special characters such as '!' or '?', or spaces in the string. Right now, the function counts spaces and symbols too, so it returns False for some real anagrams. Here's the code:
def is_anagram(string_a, string_b):
    string_a.lower()
    string_b.lower()
    if len(string_a) != len(string_b):
        return False
    char_times_a = dict()
    char_times_b = dict()
    for i in range(len(string_a)):
        if string_a[i] not in char_times_a.keys():
            char_times_a[string_a[i]] = 0
        else:
            char_times_a[string_a[i]] += 1
        if string_b[i] not in char_times_b.keys():
            char_times_b[string_b[i]] = 0
        else:
            char_times_b[string_b[i]] += 1
    return char_times_a == char_times_b
>>> is_anagram('scar', 'cars')
True
>>> is_anagram('Tom Marvolo Riddle', 'I am Lord Voldemort')
False
That last call should return True, because it is an anagram.
Change the first two lines of the function from:
string_a.lower()
string_b.lower()
To:
string_a = string_a.lower().replace(' ', '')
string_b = string_b.lower().replace(' ', '')
You need to assign the result back, because lower() returns a new string instead of changing the original in place, and you also need to remove the spaces.
Your problem is that your function treats whitespace as a character just like the letters. That might be what you want, but then your second example is indeed not an anagram, because the two strings contain a different number of spaces (two versus three).
Specifically, this line returns False for your example:
if len(string_a) != len(string_b):
    return False
But even if you removed that check, your function still counts the spaces, and the characters are not lowercased (the results of lower() are discarded), so it would return False either way.
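You could write the function like this:
from collections import Counter
def is_anagram(string_a, string_b):
    # Compare how often each character occurs; comparing set()s instead
    # would ignore repeats and wrongly accept e.g. 'aab' and 'abb'
    return Counter(string_a.lower()) == Counter(string_b.lower())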
or
def is_anagram(string_a, string_b):
    return sorted(string_a.lower()) == sorted(string_b.lower())
To remove all special characters and whitespace from a string, you can filter on str.isalnum, which returns True only for alphanumeric characters. lower() is applied as well so that the comparison is case-insensitive.
string_a = ''.join(filter(str.isalnum, string_a)).lower()
string_b = ''.join(filter(str.isalnum, string_b)).lower()
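Putting the pieces together, here is one way the whole function might look: normalize both strings with the isalnum filter and lower(), then compare character counts with collections.Counter (comparing sorted() results would work just as well):

```python
from collections import Counter

def is_anagram(string_a, string_b):
    # Keep only letters and digits, lowercased, so spaces and
    # punctuation such as '!' or '?' are ignored
    clean_a = ''.join(filter(str.isalnum, string_a)).lower()
    clean_b = ''.join(filter(str.isalnum, string_b)).lower()
    # Anagrams have exactly the same count of every character
    return Counter(clean_a) == Counter(clean_b)

print(is_anagram('scar', 'cars'))                               # True
print(is_anagram('Tom Marvolo Riddle', 'I am Lord Voldemort'))  # True
print(is_anagram('Dormitory!', 'Dirty room?'))                  # True
```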

How can I print double letters from string in a list without using regular expressions

I would like to know how to find the appearance of double letters in a list of strings without using regular expressions. Below is what I have so far.
word="kookss"
new_words=["laal","mkki"]
def double_letter(word):
    for i in range(len(word)-1):
        if word[i]== word[i+1]:
            return (word[i],word[i+1])
print(double_letter(word))
for w in range(len(new_words)-1):
    print(double_letter(new_words))
Desired output:
["oo","ss"]
["aa"]
["kk"]
word="kookss"
new_words=["laal","mkki"]
def double_letter(word):
    # each double letter found should be put in this list.
    double_letters = []
    for i in range(len(word)-1):
        if word[i]== word[i+1]:
            double_letters.append(word[i] + word[i+1])
    return double_letters
print(double_letter(word))
for w in new_words:
    # for each word `w` in list `new_words` call double_letter method
    print(double_letter(w))
output:
['oo', 'ss']
['aa']
['kk']
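Your code is not working because of this loop:
for w in range(len(new_words)-1):
    print(double_letter(new_words))
Here you are passing new_words (which is a list) to the double_letter function, which expects a single word.
Inside the function, word[i] == word[i+1] then compares "laal" == "mkki", which is False, so the loop finishes without returning anything and the function returns None.
Also, range(len(new_words)-1) is range(1) for a two-element list, so the loop body runs only once (and w is never used). Finally, your double_letter returns as soon as it finds the first pair, so "kookss" gives ('o', 'o') instead of both pairs; collecting the pairs in a list fixes that.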
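As a variation on the list-collecting version above, zip(word, word[1:]) pairs each character with the one after it, which avoids the index arithmetic. Like that version, it reports an overlapping run such as "ooo" as two pairs:

```python
def double_letter(word):
    # zip pairs each character with its right-hand neighbour:
    # "kookss" -> ('k','o'), ('o','o'), ('o','k'), ('k','s'), ('s','s')
    return [a + b for a, b in zip(word, word[1:]) if a == b]

new_words = ["laal", "mkki"]
print(double_letter("kookss"))                # ['oo', 'ss']
print([double_letter(w) for w in new_words])  # [['aa'], ['kk']]
```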

Evaluating whether a string is a subanagram of another

I would like to create a function with two string arguments, x and y, that returns True if x is a sub-anagram of y. For example, "red" is a sub-anagram of "reda", but "reda" is not a sub-anagram of "red".
So far what I have got:
I have turned x and y into lists and then sorted them. That way I can compare the letters from each string.
def sub_anagram(str1, str2):
    s1 = list(str1)
    s2 = list(str2)
    s1.sort()
    s2.sort()
    for letters in s2:
        if letters in s1:
            return True
        else:
            return False
What I am confused with:
I want to compare the string y to x and if y contains all the characters from x then it returns true otherwise false
You can use collections.Counter.
from collections import Counter
def subanagram(str1, str2):
    str1_counter, str2_counter = Counter(str1), Counter(str2)
    return all(str1_counter[char] <= str2_counter[char]
               for char in str1_counter)
In the code above, str1_counter is essentially a dictionary that maps each character appearing in str1 (the key) to its frequency (the value). Similarly for str2_counter.
Then the code checks that every character in str1 appears at least as many times in str2 as it does in str1. A Counter returns 0 for a missing key, so a character that doesn't occur in str2 at all simply fails the comparison instead of raising a KeyError.
Edit: If a subanagram is defined to be strictly smaller than the original, e.g. you want subanagram("red", "red") to be False, then first compare the two counters for equality.
from collections import Counter
def subanagram(str1, str2):
    str1_counter, str2_counter = Counter(str1), Counter(str2)
    if str1_counter == str2_counter:
        return False
    return all(str1_counter[char] <= str2_counter[char]
               for char in str1_counter)
If I were not using Counter for some reason, it would be something along the lines of:
def subanagram(str1, str2):
    if len(str1) == len(str2):
        return False  # Ensures strict subanagram
    s2 = list(str2)
    try:
        for char in str1:
            s2.remove(char)
    except ValueError:
        return False
    return True
But as you can see, it is longer, less declarative and less efficient than using Counter.
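Two shorter ways to write the Counter check, offered as a sketch. Subtracting counters keeps only positive counts, so c1 - c2 is empty exactly when str2 has at least as many of every character as str1. On Python 3.10 or newer, Counter also supports <= as a multiset-inclusion test, so the last line could read return c1 <= c2 (check your Python version before relying on that):

```python
from collections import Counter

def subanagram(str1, str2):
    c1, c2 = Counter(str1), Counter(str2)
    # Strict version: an exact anagram does not count
    if c1 == c2:
        return False
    # c1 - c2 keeps only characters that str1 uses more often than
    # str2 provides; an empty result means str1 "fits" inside str2.
    # (On Python 3.10+ this could be written as: return c1 <= c2)
    return not (c1 - c2)

print(subanagram("red", "reda"))  # True
print(subanagram("reda", "red"))  # False
print(subanagram("red", "red"))   # False
```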
I don't think you can just check for each character in x being present in y, as this does not account for a character being repeated in x. In other words, 'reeeeed' is not a sub-anagram of 'reda'.
This is one way to do it:
1. Make a copy of y.
2. For each character in x, if that character is present in the y-copy, remove it from the y-copy. If it isn't present, return False.
3. If you reach the end of the loop and the y-copy is empty, return False (x is an anagram, but not a sub-anagram).
4. Otherwise, return True.
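A minimal sketch of those four steps, using a list as the y-copy so that single characters can be removed with list.remove():

```python
def sub_anagram(x, y):
    # Step 1: make a mutable copy of y
    y_copy = list(y)
    # Step 2: consume one matching character of y_copy per character of x
    for char in x:
        if char in y_copy:
            y_copy.remove(char)  # removes only the first occurrence
        else:
            return False         # x needs a character y can't supply
    # Step 3: nothing left over means x is a full anagram of y
    if not y_copy:
        return False
    # Step 4: otherwise x is a strict sub-anagram
    return True

print(sub_anagram("red", "reda"))      # True
print(sub_anagram("reda", "red"))      # False
print(sub_anagram("reeeeed", "reda"))  # False
```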

make a program return True if there is more than one dot in the string?

So I'm new to programming (and Python), and I have to make a program that returns True if the string has zero or one dot characters ("." characters) and returns False if the string contains two or more dots.
here is what I currently have, I cannot get it to work for me, please correct me if I am wrong, thanks!
def check_dots(text):
    text = []
    for char in text:
        if '.' < 2 in text:
            return True
        else:
            return False
Use Python's built-in string method str.count():
if text.count('.') < 2:
    return True
It can be even shorter if instead of an if-else statement, you do
return text.count('.') < 2
Also, there are some errors in your function. All you need to do is:
def check_dots(text):
    return text.count('.') < 2
An equivalent way to write it:
return text.count('.') <= 1
Python strings have a method called count().
You can do the following:
if text.count('.') < 2:  # counts how many times '.' occurs in your string
    return True
else:
    return False
A shortcut would be:
return text.count('.')<2
Let's analyze the above statement.
The expression text.count('.') < 2 basically says "count the periods in the string, and give True if there are fewer than two, or False otherwise." So if text.count('.') were 3, the expression would become 3 < 2, which is False.
Another example: say you want to get True if a string is longer than 7 characters.
x = input("Enter a string.")
print(len(x) > 7)
The code snippet len(x)>7 means that the program checks for the length of x. Let's pretend the string length is 9. In this case, len(x) would evaluate to 9, then it would evaluate to 9>7, which is True.
I shall now analyze your code.
def check_dots(text):
    text = []  ################ Don't do this. This makes it a list,
               # and the thing that you are trying to
               # do involves strings, not lists. Remove it.
    for char in text:  # not needed, delete
        if '.' < 2 in text:  # I see your thinking, but you can use count()
                             # to do this, so -> if text.count('.') < 2: <- That
                             # will do the same thing as you attempted.
            return True
        else:
            return False
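For comparison, here is a sketch of the loop the question seems to be reaching for, counting the dots by hand instead of calling text.count('.'). It gives the same result; count() is simply shorter:

```python
def check_dots(text):
    dots = 0
    for char in text:  # text is the string argument; don't reset it to []
        if char == '.':
            dots += 1
    # True for zero or one dot, False for two or more
    return dots < 2

print(check_dots("hello"))   # True
print(check_dots("hello."))  # True
print(check_dots("a.b.c"))   # False
```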

Search for a pattern in a string in python

Question: I am very new to python so please bear with me. This is a homework assignment that I need some help with.
So, for the matchPat function, I need to write a function that takes two arguments, str1 and str2, and returns a Boolean indicating whether str1 is in str2. But I have to use an asterisk as a wildcard in str1. The * can only be used in str1, and it represents zero or more characters that I need to ignore. Examples of matchPat are as follows:
matchPat ( 'a*t*r', 'anteaters' ) : True
matchPat ( 'a*t*r', 'albatross' ) : True
matchPat ( 'a*t*r', 'artist' ) : False
My current matchPat function can tell whether the characters of str1 are in str2 but I don't really know how I could tell python (by using the * as a wild card) to look for 'a' (the first letter) and after it finds a, skip the next 0 or more characters until it finds the next letter(which would be 't' in the example) and so on.
def matchPat(str1,str2):
    ## str(*)==str(=>1)
    if str1=='':
        return True
    elif str2=='':
        return False
    elif str1[0]==str2[0]:
        return matchPat(str1[2],str2[len(str1)-1])
    else: return True
Python strings have the in operator; you can check if str1 is a substring of str2 using str1 in str2.
You can split a string into a list of substrings based on a token. "a*b*c".split("*") is ["a","b","c"].
You can find the offset of next occurrence of a substring in a string using the string's find method.
So the problem of wildcard matching becomes:
split the pattern into the parts that were separated by asterisks
for each part of the pattern:
    can we find this part after the previous part's location?
You are going to have to cope with corner cases like patterns that start or end with an asterisk, or have two asterisks beside each other, and so on. Good luck!
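The steps above might be sketched like this, assuming (as the examples imply) that * matches zero or more characters and that the pattern may occur anywhere in str2. Searching for each chunk at its earliest position is safe here, because finding a chunk earlier never leaves less room for the chunks after it. The empty chunks produced by leading, trailing or doubled asterisks are harmless, since str.find('', pos) simply returns pos:

```python
def matchPat(str1, str2):
    pos = 0  # where the search for the next chunk starts in str2
    for chunk in str1.split('*'):
        idx = str2.find(chunk, pos)
        if idx == -1:
            return False        # this chunk doesn't occur after the previous one
        pos = idx + len(chunk)  # the next chunk must start after this one ends
    return True

print(matchPat('a*t*r', 'anteaters'))  # True
print(matchPat('a*t*r', 'albatross'))  # True
print(matchPat('a*t*r', 'artist'))     # False
```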
There is a find() method of strings that searches for a substring from a particular point, returning either its index (if found) or -1 if not found. The index() method is similar but raises an exception if the target string is not found.
I'd suggest that you first split the pattern string on "*". This will give you a list of chunks to look for. Set the starting position to zero, and for each element in the list of chunks, do a find() or index() from the current position.
If you find the current chunk then work out from its starting position and length where to start searching for the next chunk and update the starting position. If you find all the chunks then the target string matches the pattern. If any chunk is missing then the pattern search should fail.
Since this is homework I am hoping that gives you enough of an idea to move on.
The basic idea here is to compare str1 and str2 character by character, and when the character in str1 is "*", search str2 for the character that follows the "*" in str1.
Assuming that you are not going to use any library functions (except find(), which is easy to implement yourself), this is the hard way. The code is straightforward but messy, and I've commented it wherever possible:
def matchPat(str1, str2):
    index1 = 0
    index2 = 0
    while index1 < len(str1):
        c = str1[index1]
        # Check if str2 has run its course.
        if index2 >= len(str2):
            # This needs to be checked, assuming matchPat("*", "") to be True
            if len(str2) == 0 and str1 == "*":
                return True
            return False
        # If c is not "*", then it's a normal comparison.
        if c != "*":
            if c != str2[index2]:
                return False
            index2 += 1
        # If c is "*", then you need to advance index1,
        # search for the next character in str2,
        # and update index2
        else:
            index1 += 1
            if index1 == len(str1):
                return True
            c = str1[index1]
            # Search for the character in str2. This takes the first
            # occurrence and never backtracks, which is why
            # matchPat("ab*cd", "abacacd") below returns False
            i = str2.find(c, index2)
            # If the search fails, return False
            if i == -1:
                return False
            index2 = i + 1
        index1 += 1
    return True
Output:
print(matchPat("abcde", "abcd"))
# False
print(matchPat("a", ""))
# False
print(matchPat("", "a"))
# True
print(matchPat("", ""))
# True
print(matchPat("abc", "abc"))
# True
print(matchPat("ab*cd", "abacacd"))
# False
print(matchPat("ab*cd", "abaascd"))
# True
print(matchPat('a*t*r', 'anteater'))
# True
print(matchPat('a*t*r', 'albatross'))
# True
print(matchPat('a*t*r', 'artist'))
# False
Without giving you the complete answer, first, split the str1 string into a list of strings on the '*' character. I usually call str1 the "needle" and str2 the "haystack", since you are looking for the needle in the haystack.
needles = needle.split('*')
Next, have a counter (which I will call i) start at 0. You will always be looking at haystack[i:] for the next string in needles.
In pseudocode, it'll look like this:
needles = needle.split('*')
i = 0
loop through all strings in needles:
    if current needle not in haystack[i:], return false
    increment i to just after the occurrence of the current needle in haystack (use the find() string method or write your own function to handle this)
return true
Are you allowed to use regular expressions? If so, the function you're looking for already exists: re.search. Note that in a regular expression "." matches exactly one character, so the wildcard has to be written ".*" (any run of characters, including none):
import re
bool(re.search('a.*t.*r', 'anteaters'))  # True
bool(re.search('a.*t.*r', 'artist'))  # False
And if asterisks are a strict necessity, you can use regular expressions to convert the pattern, too:
newstr = re.sub(r'\*', '.*', 'a*t*r')  # Replace each * with .*
bool(re.search(newstr, 'anteaters'))  # Search using the new pattern
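If a ready-made function is acceptable, the standard library's fnmatch module already treats * as "any run of characters, including none". fnmatchcase() matches the whole string, so one way to get the "anywhere in str2" behaviour is to pad the pattern with an extra * on each side. Keep in mind that fnmatch also gives ? and [...] special meanings, so this only follows the assignment's rules when the pattern contains no such characters:

```python
from fnmatch import fnmatchcase

def matchPat(str1, str2):
    # fnmatchcase is case-sensitive and matches the entire string, so
    # surrounding the pattern with '*' allows a match anywhere in str2
    return fnmatchcase(str2, '*' + str1 + '*')

print(matchPat('a*t*r', 'anteaters'))  # True
print(matchPat('a*t*r', 'albatross'))  # True
print(matchPat('a*t*r', 'artist'))     # False
```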
If regular expressions aren't allowed, the simplest way to do that would be to look at substrings of the second string that are the same length as the first string, and compare the two. Note that this treats each * as matching exactly one character, which is simpler than what the assignment asks for (it returns False for 'albatross', for example). Something like this:
def matchpat(str1, str2):
    if len(str1) > len(str2):
        return False  # Can't match if the first string is longer
    for i in range(len(str2) - len(str1) + 1):
        substring = str2[i:i+len(str1)]  # substring of the same length as the first string
        for j in range(len(str1)):
            # check each character; '*' matches any single character
            if str1[j] != '*' and str1[j] != substring[j]:
                break  # mismatch, try the next substring
        else:
            return True  # no mismatch: we don't need to keep searching
    return False
