Evaluating whether a string is a subanagram of another - python

I would like to create a function that takes two string arguments, x and y, and returns True if x is a sub anagram of y. Example: "red" is a sub anagram of "reda", but "reda" is not a sub anagram of "red".
So far what I have got:
I have turned x and y into lists and then sorted them. That way I can compare the letters from each string.
def sub_anagram(str1, str2):
    s1 = list(str1)
    s2 = list(str2)
    s1.sort()
    s2.sort()
    for letters in s2:
        if letters in s1:
            return True
        else:
            return False
What I am confused with:
I want to compare the string y to x, and if y contains all the characters from x, return True; otherwise return False.

You can use collections.Counter.
from collections import Counter
def subanagram(str1, str2):
    str1_counter, str2_counter = Counter(str1), Counter(str2)
    return all(str1_counter[char] <= str2_counter[char]
               for char in str1_counter)
In the code above, str1_counter is basically a dictionary whose keys are the characters appearing in str1 and whose values are their frequencies. Similarly for str2_counter.
Then the code checks that for all characters in str1, that character appears at least as many times in str2 as it does in str1.
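As a side note, Counter also supports subtraction, which keeps only positive counts, so the same test can be sketched without an explicit loop. The function name is_subanagram below is just an illustrative choice, not something from the question:

```python
from collections import Counter

def is_subanagram(x, y):
    # Counter subtraction drops zero and negative counts, so the
    # difference is empty exactly when y covers every letter of x.
    return not (Counter(x) - Counter(y))

print(is_subanagram("red", "reda"))      # True
print(is_subanagram("reda", "red"))      # False
print(is_subanagram("reeeeed", "reda"))  # False
```

Note that this version treats a full anagram (or an identical string) as a subanagram.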
Edit: If a subanagram is defined to be strictly smaller than the original, e.g. you want subanagram("red", "red") to be False, then first compare the two counters for equality.
from collections import Counter
def subanagram(str1, str2):
    str1_counter, str2_counter = Counter(str1), Counter(str2)
    if str1_counter == str2_counter:
        return False
    return all(str1_counter[char] <= str2_counter[char]
               for char in str1_counter)
If I were not using Counter for some reason, it would be something along the lines of:
def subanagram(str1, str2):
    if len(str1) == len(str2):
        return False  # Ensures strict subanagram
    s2 = list(str2)
    try:
        for char in str1:
            s2.remove(char)
    except ValueError:
        return False
    return True
But as you can see, it is longer, less declarative and less efficient than using Counter.

I don't think you can just check for each character in x being present in y, as this does not account for a character being repeated in x. In other words, 'reeeeed' is not a sub-anagram of 'reda'.
This is one way to do it:
make a copy of y
for each character in x, if that character is present in the y-copy, remove it from the y-copy. if it isn't present, return false.
if you reach the end of the loop and the y-copy is empty, return false. (x is an anagram, but not a sub-anagram.)
otherwise return true.
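The steps above could be sketched roughly as follows. The name sub_anagram_by_removal is made up for illustration, and a list is used for the y-copy because strings are immutable:

```python
def sub_anagram_by_removal(x, y):
    remaining = list(y)           # make a copy of y
    for ch in x:
        if ch not in remaining:   # x needs a letter the copy no longer has
            return False
        remaining.remove(ch)      # use up one occurrence
    # An empty copy means x used every letter: an anagram, not a sub-anagram.
    return bool(remaining)

print(sub_anagram_by_removal("red", "reda"))      # True
print(sub_anagram_by_removal("red", "red"))       # False
print(sub_anagram_by_removal("reeeeed", "reda"))  # False
```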

Related

Permutations to compare two different string to match

I'm cracking my head here to solve this problem.
I'm trying to compare two strings like:
a = 'apple'
b = 'ppale'
If I use a permutation method, variable b could be rearranged until it matches variable a.
I wrote the code, but god knows why, I'm only getting False even when I'm expecting True.
import itertools
def gen(char, phrase):
    char = list(char)
    wordsList = list(map(''.join, itertools.permutations(char)))
    for word in wordsList:
        if word == phrase:
            return True
        else:
            return False
print(gen('apple', 'appel')) # should be True
print(gen('apple', 'elppa')) # should be True
print(gen('apple', 'ap')) # should be False
The problem is you're returning on the first iteration of the loop, regardless of whether there's a match or not. In particular, you're returning False the first time there's a mismatch. Instead, you need to complete the entire loop before returning False:
def gen(char, phrase):
    char = list(char)
    wordsList = list(map(''.join, itertools.permutations(char)))
    for word in wordsList:
        if word == phrase:
            return True
    return False
Note that there are some improvements that can be made. One is there's no need to do char = list(char), since char is already an iterable. Also, there's no need to expand the map result into a list. It's just being used as an iterable, so it can be used directly, which can potentially save a lot of memory:
def gen(char, phrase):
    wordsIter = map(''.join, itertools.permutations(char))
    for word in wordsIter:
        if word == phrase:
            return True
    return False
However, since you're just comparing two words for the same characters, you don't really need to generate all the permutations. Instead, you just need to check whether the two sets of characters are the same (allowing for multiple instances of characters, so technically you want to compare two multisets). You can do this much more efficiently as follows:
import collections
def gen(char, phrase):
    counter1 = collections.Counter(char)
    counter2 = collections.Counter(phrase)
    return counter1 == counter2
This is an O(n) algorithm, which is the best that can be done for this problem. This is much faster than generating the permutations. For long strings, it's also significantly faster than sorting the letters and comparing the sorted results, which is O(n*log(n)).
Example output:
>>> gen("apple", "elxpa")
False
>>> gen("apple", "elppa")
True
>>> gen("apple", "elpa")
False
>>>
Note that this only returns True if the letters are the same, and the number of each letter is the same.
If you want to speed up the case where the two strings have different lengths, you could add a fast check up front that returns False if the lengths differ, before counting the characters.
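A minimal sketch of that length check, assuming the same two-argument shape as gen above (the name is_anagram is only illustrative):

```python
from collections import Counter

def is_anagram(a, b):
    # Strings of different lengths can never be anagrams, so skip the counting.
    if len(a) != len(b):
        return False
    return Counter(a) == Counter(b)

print(is_anagram("apple", "elppa"))  # True
print(is_anagram("apple", "elpa"))   # False
```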
The main reason it's not working is that your loop returns False to the caller as soon as the first non-match occurs. What you want is something like this:
for word in wordsList:
    if word == phrase:
        return True
return False
which will test one or more permutations; if one matches, it will return True immediately, but only after they all fail to match will it return False.
Also, there's no need to do the char = list(char). A string is an iterable, just like a list, so you can use it as an argument to permutations().
You can do this much more simply: sort the letter in each word and compare for equality.
def gen(a, b):
    return sorted(a) == sorted(b)
An alternate method is to loop through the letters of one word and remove each letter from the other word if it exists there. If the two words have the same length and the second word is empty after iterating over the first, it is an anagram:
def anagram(word1, word2):
    if len(word1) != len(word2):
        return False  # different lengths can never be anagrams
    for letter in word1:
        word2 = word2.replace(letter, "", 1)  # remove one occurrence, if present
    return word2 == ""
a = "apple"
b = "pleap"
print(anagram(a,b)) #returns True
a = "apple"
b = "plaap"
print(anagram(a,b)) #returns False

palindrome error with 8 or 9 letters

I have the following code. When running it against the tests, the inputs aaabaaaa and zzzazzazz give the wrong result. Here is the code; I am not sure how to fix it.
def checkPalindrome(inputString):
    n=len(inputString)
    #if string is one letter
    if n==1:
        return True
    #if string has more than one letter
    for i in range (0, math.floor(n/2)) :
        if inputString[i]!=inputString[n-1-i]:
            return False
        else:
            return True
You have a few issues. The main issue here is that your else clause has a return True inside the loop. What you'd want to do is finish iterating over the string before returning True. If you are familiar with boolean logic, this is the equivalent of short circuiting with AND.
The other issue (not really an issue, more a nitpick) is that you can just use integer division //, instead of having to import math's floor function.
So,
def isPalindrome(string):
    for i in range(0, len(string) // 2):
        if string[i] != string[-(i + 1)]:
            return False
    return True
Another way of handling this would be using all:
def isPalindrome(string):
    return all(x == y for x, y in zip(string, reversed(string)))
Or, taking advantage of python's convenient slice notation for the most concise solution possible, we have:
def isPalindrome(string):
    return string == string[::-1]
Try this which uses array slicing (reversing an array of chars)
def checkPalindrome(inputString):
    n=len(inputString)
    #if string is one letter
    if n==1:
        return True
    #if string has more than one letter
    return inputString==inputString[::-1]
Another approach could be using slicing. Strings can be accessed by index like arrays/lists and also be inverted like this.
def isPalindrom(string):
    return string == string[::-1]
The [::-1] slice returns the reversed string; the comparison with the original string is True if they are the same, otherwise False.

How to see if a string only contains certain characters, and if they do, return True, else return False: Python

For python 3.4.1, how would you go about finding if certain characters are in your string? I tried doing it this way:
def isItBinary(myString):
    for ele in myString:
        if ele == '1' or ele == '0':
            return True
        else:
            return False
The problem with this code is that if I type isItBinary('102'), it will return True. I want it to return True if and only if the string contains nothing but '1' and '0'.
I would just use the all function.
def isItBinary(myString):
    return all(x in ('0', '1') for x in myString)
The x in ('0', '1') checks that the character in x is either '0' or '1'.
The check has to be applied to every character; as written, your function returns as soon as the first character is checked.
A simple way to do what you want would be:
def binaryChar(myCharacter):
    return myCharacter == '1' or myCharacter == '0'
and then apply it to all of the chars in a string, like this:
def isItBinary(myString):
    return all(binaryChar(c) for c in myString)
Of course, these can be simplified in a more readable way:
def isItBinary(myString):
    return all(c in '01' for c in myString)
or via a lambda function:
isItBinary = lambda myString: all(c in '01' for c in myString)
Your program was returning a value as soon as the first character was encountered; it never iterated over the whole string. The approach below returns False as soon as it finds an invalid character; otherwise it iterates over the whole string and returns True.
def isItBinary(myString):
    for ele in myString:
        if ele not in ("0","1"):
            return False
    return True
print(isItBinary("102")) # False
print(isItBinary("101")) # True
Use sets:
def is_it_binary(s):
    return not (set(s) - set("01"))
If the set constructed from the string contains any characters not in the second set the subtraction gives a non-empty set. Otherwise if it contains only the characters in the second set you get an empty set which the not flips to True.
Alternatively just:
def is_it_binary(s):
    allowed = set("01")
    return all(c in allowed for c in s)
which has the advantage of short-circuiting (i.e. it bombs out as soon as an invalid character is found).
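The set-difference test can also be phrased as a subset check, which some readers find easier to follow. This is only a sketch, and the name is_binary_subset is hypothetical. Like the all() and set versions above, it returns True for an empty string:

```python
def is_binary_subset(s):
    # True when every distinct character of s is one of '0' or '1'.
    return set(s) <= set("01")

print(is_binary_subset("101"))  # True
print(is_binary_subset("102"))  # False
```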

Search for a pattern in a string in python

Question: I am very new to python so please bear with me. This is a homework assignment that I need some help with.
So, for the matchPat function, I need to write a function that will take two arguments, str1 and str2, and return a Boolean indicating whether str1 is in str2. But I have to use an asterisk as a wild card in str1. The * can only be used in str1 and it will represent one or more characters that I need to ignore. Examples of matchPat are as follow:
matchPat ( 'a*t*r', 'anteaters' ) : True
matchPat ( 'a*t*r', 'albatross' ) : True
matchPat ( 'a*t*r', 'artist' ) : False
My current matchPat function can tell whether the characters of str1 are in str2 but I don't really know how I could tell python (by using the * as a wild card) to look for 'a' (the first letter) and after it finds a, skip the next 0 or more characters until it finds the next letter(which would be 't' in the example) and so on.
def matchPat(str1,str2):
    ## str(*)==str(=>1)
    if str1=='':
        return True
    elif str2=='':
        return False
    elif str1[0]==str2[0]:
        return matchPat(str1[2],str2[len(str1)-1])
    else: return True
Python strings have the in operator; you can check if str1 is a substring of str2 using str1 in str2.
You can split a string into a list of substrings based on a token. "a*b*c".split("*") is ["a","b","c"].
You can find the offset of next occurrence of a substring in a string using the string's find method.
So the problem of wildcard matching becomes:
split the pattern into the parts that were separated by asterisks
for each part of the pattern
    can we find this part after the previous part's location?
You are going to have to cope with corner cases like patterns that start or end with an asterisk, or have two asterisks next to each other, and so on. Good luck!
There is a find() method of strings that searches for a substring from a particular point, returning either its index (if found) or -1 if not found. The index() method is similar but raises an exception if the target string is not found.
I'd suggest that you first split the pattern string on "*". This will give you a list of chunks to look for. Set the starting position to zero, and for each element in the list of chunks, do a find() or index() from the current position.
If you find the current chunk then work out from its starting position and length where to start searching for the next chunk and update the starting position. If you find all the chunks then the target string matches the pattern. If any chunk is missing then the pattern search should fail.
Since this is homework I am hoping that gives you enough of an idea to move on.
The basic idea here is to compare str1 and str2 character by character, and if the character in str1 is "*", to find in str2 the character that follows the "*" in str1.
Assuming that you are not going to use any function (except find(), which can be implemented easily), this is the hard way (the code is straightforward but messy, and I've commented wherever possible):
def matchPat(str1, str2):
    index1 = 0
    index2 = 0
    while index1 < len(str1):
        c = str1[index1]
        #Check if str2 has run its course.
        if index2 >= len(str2):
            #This needs to be checked, assuming matchPat("*", "") to be true
            if(len(str2) == 0 and str1 == "*"):
                return True
            return False
        #If c is not "*", then it's a normal comparison.
        if c != "*":
            if c != str2[index2]:
                return False
            index2 += 1
        #If c is "*", then you need to increment index1,
        #search for the next value in str2,
        #and update index2
        else:
            index1 += 1
            if(index1 == len(str1)):
                return True
            c = str1[index1]
            #Search the character in str2
            i = str2.find(c, index2)
            #If search fails, return False
            if(i == -1):
                return False
            index2 = i + 1
        index1 += 1
    return True
OUTPUT -
print(matchPat("abcde", "abcd"))
#False
print(matchPat("a", ""))
#False
print(matchPat("", "a"))
#True
print(matchPat("", ""))
#True
print(matchPat("abc", "abc"))
#True
print(matchPat("ab*cd", "abacacd"))
#False
print(matchPat("ab*cd", "abaascd"))
#True
print(matchPat('a*t*r', 'anteater'))
#True
print(matchPat('a*t*r', 'albatross'))
#True
print(matchPat('a*t*r', 'artist'))
#False
Without giving you the complete answer, first, split the str1 string into a list of strings on the '*' character. I usually call str1 the "needle" and str2 the "haystack", since you are looking for the needle in the haystack.
needles = needle.split('*')
Next, have a counter (which I will call i) start at 0. You will always be looking at haystack[i:] for the next string in needles.
In pseudocode, it'll look like this:
needles = needle.split('*')
i = 0
loop through all strings in needles:
    if current needle not in haystack[i:], return false
    increment i to just after the occurrence of the current needle in haystack (use the find() string method or write your own function to handle this)
return true
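Filled in, the pseudocode might look something like the sketch below. It assumes * may stand for zero or more characters and that the pattern may match anywhere in the haystack, which is how the examples in the question behave; the function name match_pat is a stand-in.

```python
def match_pat(needle, haystack):
    i = 0  # position in haystack where the next search starts
    for part in needle.split('*'):
        found = haystack.find(part, i)  # -1 when part is absent from haystack[i:]
        if found == -1:
            return False
        i = found + len(part)  # continue just after this occurrence
    return True

print(match_pat('a*t*r', 'anteaters'))  # True
print(match_pat('a*t*r', 'albatross'))  # True
print(match_pat('a*t*r', 'artist'))     # False
```

Empty parts (from a leading, trailing, or doubled asterisk) are found immediately at the current position, so they need no special handling in this version.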
Are you allowed to use regular expressions? If so, the function you're looking for already exists in the re.search function. In a regular expression, .* matches any run of zero or more characters:
import re
bool(re.search('a.*t.*r', 'anteaters')) # True
bool(re.search('a.*t.*r', 'artist' )) # False
And if asterisks are a strict necessity, you can use regular expressions for that, too:
newstr = re.sub(r'\*', '.*', 'a*t*r') # Replace * with .*
bool(re.search(newstr, 'anteaters')) # Search using the new string
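One caveat with translating the pattern by hand is that other characters in it (a literal . or +, say) would be read as regex syntax. Here is a hedged sketch that escapes the literal pieces with re.escape and joins them with .*, treating * as zero or more characters; the helper name wildcard_search is an assumption for illustration:

```python
import re

def wildcard_search(pattern, text):
    # Escape each literal chunk, then let '.*' stand in for every '*'.
    regex = '.*'.join(re.escape(chunk) for chunk in pattern.split('*'))
    return bool(re.search(regex, text))

print(wildcard_search('a*t*r', 'albatross'))  # True
print(wildcard_search('a*t*r', 'artist'))     # False
print(wildcard_search('a.b', 'axb'))          # False
```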
If regular expressions aren't allowed, the simplest way to do that would be to look at substrings of the second string that are the same length as the first string, and compare the two (note that this treats each * as standing for exactly one character). Something like this:
def matchpat(str1, str2):
    if len(str1) > len(str2): return False #Can't match if the first string is longer
    matched = False # stays False if no characters are ever compared
    for i in range(0, len(str2)-len(str1)+1):
        substring = str2[i:i+len(str1)] # create substring of same length as first string
        for j in range(0, len(str1)):
            matched = False # assume False until match is found
            if str1[j] != '*' and str1[j] != substring[j]: # check each character
                break
            matched = True
        if matched == True: break # we don't need to keep searching if we've found a match
    return matched

How do I check existence of a string in a list of strings, including substrings?

I have written a function to check for the existence of a value in a list and return True if it exists. It works well for exact matches, but I need for it to return True if the value exists anywhere in the list entry (e.g. value <= listEntry, I think.) Here is the code I am using for the function:
def isValInLst(val,lst):
    """check to see if val is in lst. If it doesn't NOT exist (i.e. != 0),
    return True. Otherwise return false."""
    if lst.count(val) != 0:
        return True
    else:
        print('val is '+str(val))
        return False
Without looping through the entire character string and/or using RegEx's (unless those are the most efficient), how should I go about this in a pythonic manner?
This is very similar to another SO question, but I need to check for the existence of the ENTIRE val string anywhere in the list. It would also be great to return the index / indices of matches, but I'm sure that's covered elsewhere on Stackoverflow.
If I understood your question then I guess you need any:
return any(val in x for x in lst)
Demo:
>>> lst = ['aaa','dfbbsd','sdfdee']
>>> val = 'bb'
>>> any(val in x for x in lst)
True
>>> val = "foo"
>>> any(val in x for x in lst)
False
>>> val = "fde"
>>> any(val in x for x in lst)
True
Mostly covered, but if you want to get the index of the matches I would suggest something like this:
indices = [index for index, content in enumerate(lst) if val in content]
If you want the true/false answer as well, you can use the result of this list comprehension directly: it is an empty list when no entry contains the substring, and an empty list evaluates to False.
In the terms of your first function:
def isValInLst(val, lst):
    return bool([index for index, content in enumerate(lst) if val in content])
where the bool() just converts the answer into a boolean value; without the bool() this returns a list of all the positions in the list where the substring appears.
There are multiple possibilities to do that. For example:
def valInList1 (val, lst):
    # check `in` for each element in the list
    return any(val in x for x in lst)
def valInList2 (val, lst):
    # join the list to a single string using some character
    # that definitely does not occur in val
    return val in ';;;'.join(lst)
