Creating anagram detector - python

I'm having trouble getting this anagram function to work. The aim is for
the function to take 2 strings abc and cba, convert them into a list;
sort them in to alphabetical order, compare the elements of the list and print whether they are anagrams or not.
My code is as follows...
def anagram(str1, str2):
    x = str1
    y = str2
    x1 = x.sort()
    y1 = y.sort()
    if (x1) == (y1):
        print("Anagram is True")
    else:
        print("Anagram is False")

str1 = str('abc')
str2 = str('cba')
print(anagram(str1, str2))

Your problem is that you can't call String.sort(). Try changing:
x1 = x.sort()
y1 = y.sort()
to:
x1 = sorted(x)
y1 = sorted(y)

The specific issue
x.sort() works in place if x is a list. This means the sort method changes the object's internal representation. It also returns None, which is why the comparison doesn't work as intended.
If x is a string, there is no .sort() method, because strings are immutable.
I recommend using the sorted() function instead, which accepts any iterable (including a string) and returns a new sorted list of its characters.
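A minimal sketch of the difference, using made-up variable names (`letters`, `word`) purely for illustration:

```python
letters = list("cba")

# list.sort() sorts the list in place and returns None
result = letters.sort()

word = "cba"
# sorted() builds a new list and leaves the string untouched
sorted_word = sorted(word)

# strings do not have a .sort() method at all
string_has_sort = hasattr(word, "sort")
```

Here `result` is None while `letters` itself is now sorted, which is why comparing the return values of two `.sort()` calls always compares None with None.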
The more general issues
There are three more general issues:
Runtime: Sorting makes this an O(n log n) solution.
Unicode modifiers and compound glyphs
Print: You print the value, but you should return the result instead. How would you test your code?
Unicode modifiers
Let's say you wrote the function more compactly:
def is_anagram(a: str, b: str) -> bool:
    return sorted(a) == sorted(b)
This works fine for normal characters, but fails for compound glyphs. For example, the thumbs-up / thumbs-down emoji can be modified to have different skin tones. The change in color is actually a second Unicode "character" (a modifier) that follows the emoji. The modifier and the preceding character belong together, but sorted() just looks at the individual code points, which results in this:
>>> is_anagram("👍👎🏿", "👎👍🏿")
True # <-- This should be False!
Both strings consist of the same three code points (U+1F44D, U+1F44E and the skin-tone modifier U+1F3FF), only in a different order.
You can easily fix this by using the grapheme package:
from grapheme import graphemes

def is_anagram(a: str, b: str) -> bool:
    return sorted(graphemes(a)) == sorted(graphemes(b))
Runtime
You can get O(n) runtime if you don't sort, but instead count characters:
from collections import Counter
from grapheme import graphemes

def is_anagram(a: str, b: str) -> bool:
    # Two strings are anagrams when every grapheme occurs equally often in both
    return Counter(graphemes(a)) == Counter(graphemes(b))
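One thing to watch for when counting: subtracting one Counter from another discards zero and negative counts, so an empty difference only shows that the first string fits inside the second. A small standard-library sketch of this (the name `is_anagram_by_count` is hypothetical, and it compares plain code points, not graphemes):

```python
from collections import Counter

def is_anagram_by_count(a: str, b: str) -> bool:
    """Compare per-character counts; O(n) instead of sorting."""
    return Counter(a) == Counter(b)

# Counter("ab") - Counter("abc") is empty, although "ab" is not an anagram of "abc"
one_sided = not (Counter("ab") - Counter("abc"))
symmetric = is_anagram_by_count("ab", "abc")
```

Comparing the two counters for equality checks both directions at once.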

You cannot call .sort() on a string, nor should you, because it is a list method that sorts the list in place and returns None. Instead, use sorted(x):
>>> def anagram(str1, str2):
...     x1 = sorted(str1)
...     y1 = sorted(str2)
...     if (x1) == (y1):
...         print("Anagram is True")
...     else:
...         print("Anagram is False")
...
>>> anagram('abc','bca')
Anagram is True

Convert list of similar ints to tuple of int and occurrences [duplicate]

I'm trying to write a simple Python algorithm to solve this problem. Can you please help me figure out how to do this?
If any character is repeated more than 4 times, the entire set of
repeated characters should be replaced with a slash '/', followed by a
2-digit number which is the length of this run of repeated characters,
and the character. For example, "aaaaa" would be encoded as "/05a".
Runs of 4 or less characters should not be replaced since performing
the encoding would not decrease the length of the string.
I see many great solutions here, but none that feels very Pythonic to my eyes, so I'm contributing an implementation I wrote myself today for this problem.
from itertools import groupby
from typing import Iterator, Tuple

def run_length_encode(data: str) -> Iterator[Tuple[str, int]]:
    """Returns run length encoded Tuples for string"""
    # A memory efficient (lazy) and pythonic solution using generators
    return ((x, sum(1 for _ in y)) for x, y in groupby(data))
This will return a generator of Tuples with the character and number of instances, but can easily be modified to return a string as well. A benefit of doing it this way is that it's all lazy evaluated and won't consume more memory or cpu than needed if you don't need to exhaust the entire search space.
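As a rough illustration of the lazy behaviour, the sketch below repeats the generator version with its imports and consumes it one run at a time. The variable names `pairs`, `first` and `rest` are only for this example:

```python
from itertools import groupby
from typing import Iterator, Tuple

def run_length_encode(data: str) -> Iterator[Tuple[str, int]]:
    """Yield (character, run length) pairs lazily."""
    return ((char, sum(1 for _ in run)) for char, run in groupby(data))

pairs = run_length_encode("aaabccdddd")
first = next(pairs)   # only the first run has been counted so far
rest = list(pairs)    # exhaust the remaining runs
```

Nothing past the first run is inspected until `list(pairs)` asks for it.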
If you still want string encoding the code can quite easily be modified for that use case like this:
def run_length_encode(data: str) -> str:
    """Returns run length encoded string for data"""
    # A memory efficient (lazy) and pythonic solution using generators
    return "".join(f"{x}{sum(1 for _ in y)}" for x, y in groupby(data))
This is a more generic run length encoding for all lengths, and not just for those of over 4 characters. But this could also quite easily be adapted with a conditional for the string if wanted.
Rosetta Code has a lot of implementations that should be easy to adapt to your use case.
Here is Python code with regular expressions:
from re import sub

def encode(text):
    '''
    Doctest:
        >>> encode('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW')
        '12W1B12W3B24W1B14W'
    '''
    return sub(r'(.)\1*', lambda m: str(len(m.group(0))) + m.group(1),
               text)

def decode(text):
    '''
    Doctest:
        >>> decode('12W1B12W3B24W1B14W')
        'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW'
    '''
    return sub(r'(\d+)(\D)', lambda m: m.group(2) * int(m.group(1)),
               text)

textin = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
assert decode(encode(textin)) == textin
Aside from setting a=i after encoding a sequence and setting a width for your int when it is formatted into the string, you could also do the following, which takes advantage of Python's groupby. It's also a good idea to use format when constructing strings.
from itertools import groupby

def runLengthEncode (plainText):
    res = []
    for k,i in groupby(plainText):
        run = list(i)
        if(len(run) > 4):
            res.append("/{:02}{}".format(len(run), k))
        else:
            res.extend(run)
    return "".join(res)
Just observe the behaviour:
>>> runLengthEncode("abcd")
'abc'
Last character is ignored. You have to append what you've collected.
>>> runLengthEncode("abbbbbcd")
'a/5b/5b'
Oops, problem after encoding. You should set a=i even if you found a long enough sequence.
I know this is not the most efficient solution, but we haven't studied functions like groupby() yet so here's what I did:
def runLengthEncode (plainText):
    res=''
    a=''
    count = 0
    for i in plainText:
        count+=1
        if a.count(i)>0:
            a+=i
        else:
            if len(a)>4:
                if len(a)<10:
                    res+="/0"+str(len(a))+a[0][:1]
                else:
                    res+="/" + str(len(a)) + a[0][:1]
                a=i
            else:
                res+=a
                a=i
        if count == len(plainText):
            if len(a)>4:
                if len(a)<10:
                    res+="/0"+str(len(a))+a[0][:1]
                else:
                    res+="/" + str(len(a)) + a[0][:1]
            else:
                res+=a
    return(res)
Split=(list(input("Enter string: ")))
Split.append("")
a = 0
for i in range(len(Split)):
    try:
        if (Split[i] in Split) >0:
            a = a + 1
        if Split[i] != Split[i+1]:
            print(Split[i],a)
            a = 0
    except IndexError:
        print()
This is much easier and works every time.
def RLE_comp_encode(text):
    # an empty input has no runs to encode
    if not text:
        return ''
    comp_text, r = '', 1
    for i in range(1, len(text)):
        if text[i] == text[i-1]:
            r += 1
        else:
            # the run ended at the previous character: emit it and start over
            comp_text += str(r) + text[i-1]
            r = 1
    # emit the final run, which the loop itself never flushes
    comp_text += str(r) + text[-1]
    return comp_text
This worked for me,
You can use the groupby() function combined with a list/generator comprehension:
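from itertools import groupby
# imap only exists in Python 2; a nested generator expression does the same job in Python 3.
# Short runs are kept verbatim (x * reps), long runs become "/NNx".
''.join(x * reps if reps <= 4 else "/%02d%s" % (reps, x) for x, reps in ((k, len(list(g))) for k, g in groupby(s)))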
An easy solution to run-length encoding which I can think of:
To encode into a string like "a4b5c6d7...":
def encode(s):
    counts = {}
    for c in s:
        if counts.get(c) is None:
            counts[c] = s.count(c)
    return "".join(k+str(v) for k,v in counts.items())
To decode back into a string like "aaaaaabbbdddddccccc....":
def decode(s):
    return "".join((map(lambda tup: tup[0] * int(tup[1]), zip(s[0:len(s):2], s[1:len(s):2]))))
Fairly easy to read and simple.
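Note that s.count(c) gives the total number of occurrences of a character, not the length of each run, and that slicing in steps of two assumes every count is a single digit. A possible per-run variant is sketched below. The names `rle_encode` and `rle_decode` are hypothetical, and the sketch assumes the original text contains no digits:

```python
import re
from itertools import groupby

def rle_encode(s: str) -> str:
    """Encode each run as <char><count>, e.g. 'aaab' -> 'a3b1'."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(s))

def rle_decode(encoded: str) -> str:
    """Invert rle_encode; relies on the original text being digit-free."""
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(char * int(count) for char, count in pairs)

encoded = rle_encode("aabaa")                 # the two runs of 'a' stay separate
round_trip = rle_decode(rle_encode("a" * 12 + "b"))
```

Because the regular expression reads one non-digit followed by all of its digits, counts of ten or more decode correctly as well.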
text=input("Please enter the string to encode")
encoded=[]
index=0
amount=1
while index<=(len(text)-1):
    if index==(len(text)-1) or text[index]!=text[(index+1)]:
        encoded.append((text[index],amount))
        amount=1
    else:
        amount=amount+1
    index=index+1
print(encoded)

python, printing longest length of string in a list

My question is to write a function which returns the longest string and ignores any non-strings, and if there are no strings in the input list, then it should return None.
My answer:
def longest_string(x):
    for i in max(x, key=len):
        if not type(i)==str:
            continue
        if
    return max
longest_string(['cat', 'dog', 'horse'])
I'm a beginner so I have no idea where to start. Apologies if this is quite simple.
This is how I would do it:
def longest_string(x):
    Strings = [i for i in x if isinstance(i, str)]
    return(max(Strings, key=len)) if Strings else None
Based on your code:
def longest_string(x):
    l = 0
    r = None
    for s in x:
        if isinstance(s, str) and len(s) > l:
            l = len(s)
            r = s
    return r

print(longest_string([None, 'cat', 1, 'dog', 'horse']))
# horse
def longest_string(items):
    try:
        return max([x for x in items if isinstance(x, str)], key=len)
    except ValueError:
        return None
def longest_string(items):
    # Build a list rather than a generator so the emptiness check below is meaningful
    strings = [s for s in items if isinstance(s, str)]
    longest = max(strings, key=len) if strings else None
    return longest

print(longest_string(['cat', 'dog', 'horse']))
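A detail worth knowing when filtering before calling max(): a generator object is always truthy, even when it yields nothing, so a check like `if strings` is only meaningful for a list. A small sketch of this, with a hypothetical helper `longest_string_or_none` that sidesteps the check through the `default` argument of max():

```python
items = [1, 2, 3]

empty_gen = (s for s in items if isinstance(s, str))
gen_is_truthy = bool(empty_gen)     # True: the generator object itself is truthy

empty_list = [s for s in items if isinstance(s, str)]
list_is_truthy = bool(empty_list)   # False: an empty list is falsy

def longest_string_or_none(values):
    """Return the longest str in values, or None if there is none."""
    return max((s for s in values if isinstance(s, str)), key=len, default=None)
```

Without `default`, max() raises ValueError on an empty iterable.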
Your syntax is wrong (second-to-last line: if with no condition) and you are returning max which you did not define manually. In actuality, max is a built-in Python function which you called a few lines above.
In addition, you are not looping through all strings, you are looping through the longest string. Your code should instead be
def longest_string(l):
    strings = [item for item in l if type(item) == str]
    if len(strings):
        return max(strings, key=len)
    return None
You're on the right track. You could iterate over the list and check whether each item is the longest so far:
def longest_string(x):
    # handle the case of an empty list
    if len(x) == 0:
        return None
    current_longest = ""
    # iterate over the items
    for i in x:
        # skip non-strings
        if type(i) != str:
            continue
        # if the current string is longer than the longest so far, replace it
        if len(i) > len(current_longest):
            current_longest = i
    # if no (non-empty) string was found, return None
    if len(current_longest) > 0:
        return current_longest
    else:
        return None
Since you are a beginner, I recommend that you start by using Python's built-in functions to filter and manage lists. They keep the logic simple and leave less room for bugs.
def longest_string(x):
    x = filter(lambda obj: isinstance(obj, str), x)
    longest = max(list(x), key=lambda obj: len(obj), default=None)
    return longest
Nonetheless, you were on the right track. Just avoid using the names of Python's built-ins (such as max, type, list, etc.) as variable names.
EDIT: I see a lot of answers using one-liner conditionals, list comprehension, etc. I think those are fantastic solutions, but for the level of programming the OP is at, my answer attempts to document each step of the process and be as readable as possible.
First of all, I would highly suggest defining the type of the x argument in your function.
For example; since I see you are passing a list, you can define the type like so:
def longest_string(x: list):
....
This not only makes it more readable for potential collaborators but helps enormously when creating docstrings and/or combined with using an IDE that shows type hints when writing functions.
Next, I highly suggest you break down your "specs" into some pseudocode, which is enormously helpful for taking things one step at a time:
returns the longest string
ignores any non-strings
if there are no strings in the input list, then it should return None.
So to elaborate on those "specifications" further, we can write:
Return the longest string from a list.
Ignore any element from the input arg x that is not of type str
if no string is present in the list, return None
From here we can proceed to writing the function.
def longest_string(x: list):
    # Immediately verify the input is the expected type. if not, return None (or raise Exception)
    if type(x) != list:
        return None # input should always be a list
    # create an empty list to add all strings to
    str_list = []
    # Loop through list
    for element in x:
        # check type. if not string, skip to the next element
        if type(element) != str:
            continue
        # at this point in our loop the element has passed our type check, and is a string.
        # add the element to our str_list
        str_list.append(element)
    # we should now have a list of strings
    # however we should handle an edge case where a list is passed to the function that contains no strings at all, which would mean we now have an empty str_list. let's check that
    if not str_list: # an empty list evaluates to False. if not str_list is basically saying "if str_list is empty"
        return None
    # if the program has not hit one of the return statements yet, we should now have a list of strings (or at least 1 string). you can check with a simple print statement (eg. print(str_list), print(len(str_list)) )
    # now we can check for the longest string
    # we can use the max() function for this operation
    longest_string = max(str_list, key=len)
    # return the longest string!
    return longest_string

Evaluating whether a string is a subanagram of another

I would like to create a function with 2 arguments (x, y), where x and y are strings, that returns True if x is a sub-anagram of y. Example: "red" is a sub-anagram of "reda", but "reda" is not a sub-anagram of "red".
So far what I have got:
I have turned x,y into list and then sorted them. That way I can compare the alphabets from each string.
def sub_anagram(str1, str2):
    s1 = list(str1)
    s2 = list(str2)
    s1.sort()
    s2.sort()
    for letters in s2:
        if letters in s1:
            return True
        else:
            return False
What I am confused with:
I want to compare the string y to x and if y contains all the characters from x then it returns true otherwise false
You can use collections.Counter.
from collections import Counter

def subanagram(str1, str2):
    str1_counter, str2_counter = Counter(str1), Counter(str2)
    return all(str1_counter[char] <= str2_counter[char]
               for char in str1_counter)
In the code above, str1_counter is basically a dictionary with the characters appearing in str1 and their frequency as the key, value. Similarly for str2_counter.
Then the code checks that for all characters in str1, that character appears at least as many times in str2 as it does in str1.
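To make the counting concrete, here is a small sketch with made-up inputs. It relies on the fact that a Counter returns 0 for a key it has never seen, so characters missing from the second string fail the comparison without any special handling:

```python
from collections import Counter

small, big = Counter("red"), Counter("reda")

missing = small["a"]   # 0: absent keys count as zero instead of raising KeyError
fits = all(small[ch] <= big[ch] for ch in small)

repeated = Counter("reeed")
too_many = all(repeated[ch] <= big[ch] for ch in repeated)
```

The second check fails because "reeed" needs three copies of 'e' while "reda" only offers one.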
Edit: If a subanagram is defined to be strictly smaller than the original, e.g. you want subanagram("red", "red") to be False, then first compare the two counters for equality.
from collections import Counter

def subanagram(str1, str2):
    str1_counter, str2_counter = Counter(str1), Counter(str2)
    if str1_counter == str2_counter:
        return False
    return all(str1_counter[char] <= str2_counter[char]
               for char in str1_counter)
If I were not using Counter for some reason, it would be something along the lines of:
def subanagram(str1, str2):
    if len(str1) == len(str2):
        return False #Ensures strict subanagram
    s2 = list(str2)
    try:
        for char in str1:
            s2.remove(char)
    except ValueError:
        return False
    return True
But as you can see, it is longer, less declarative and less efficient than using Counter.
I don't think you can just check for each character in x being present in y, as this does not account for a character being repeated in x. In other words, 'reeeeed' is not a sub-anagram of 'reda'.
This is one way to do it:
make a copy of y
for each character in x, if that character is present in the y-copy, remove it from the y-copy. if it isn't present, return false.
if you reach the end of the loop and the y-copy is empty, return false. (x is an anagram, but not a sub-anagram.)
otherwise return true.
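The steps above can be sketched roughly as follows. The function name `is_strict_subanagram` is an assumption for illustration, not something from the question:

```python
def is_strict_subanagram(x: str, y: str) -> bool:
    remaining = list(y)              # make a copy of y
    for char in x:
        if char not in remaining:    # x needs a character that y cannot supply
            return False
        remaining.remove(char)       # use up one occurrence
    # an empty copy means x used every character of y: a full anagram, not a sub-anagram
    return bool(remaining)
```

Removing one occurrence per character is what makes repeated letters such as the extra 'e's in "reeeeed" fail.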

Python: can a function return a string?

I am making a recursive function that slices string until it is empty. When it is empty it alternatively selects the characters and is supposed to print or return the value. In this case I am expecting my function to return two words 'Hello' and 'World'. Maybe I have got it all wrong but what I don't understand is that my function doesn't let me print or return string. I am not asking for help but I'd like some explanation :) thanks
def lsubstr(x):
    a= ''
    b= ''
    if x == '':
        return ''
    else:
        a = a + x[0:]
        b = b + x[1:]
        lsubstr(x[2:])
        #print (a,b)
        return a and b

lsubstr('hweolrllod')
so I changed my code to this:
def lsubstr(x):
    if len(x) <1:
        return x
    else:
        return (lsubstr(x[2:])+str(x[0]),lsubstr(x[2:])+str(x[1]))

lsubstr('hweolrllod')
What I am trying to build is a tuple that stores a pair of characters and concatenates the following ones onto it.
The error I get is
TypeError: Can't convert 'tuple' object to str implicitly
What exactly is going wrong? I have checked it in a visualizer, and it has trouble with the concatenation.
The and keyword is a boolean operator: a and b evaluates to a if a is falsy and to b otherwise, so it only ever returns one of the two values. I think you want to return a tuple instead, like this:
...
return (a, b)
And then you can access the values using the indexing operator like this:
a = lsubstr( ... )
a[0]
a[1]
Or:
word1, word2 = lsubstr( ... )
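Putting this together, one possible recursive version that returns both words as a tuple is sketched below. The name `split_alternating` is hypothetical, and this is only one way to structure the recursion:

```python
def split_alternating(x: str) -> tuple:
    """Return (characters at even positions, characters at odd positions)."""
    if len(x) < 2:
        return (x, "")               # base case: zero or one character left
    first, second = split_alternating(x[2:])
    return (x[0] + first, x[1] + second)

word1, word2 = split_alternating("hweolrllod")

# `and` returns one of its operands, never both
truthy_case = "hello" and "world"    # 'world'
falsy_case = "" and "world"          # ''
```

Unpacking the tuple from the recursive call before concatenating avoids adding a str to a tuple, which is what triggered the TypeError.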

How do I check existence of a string in a list of strings, including substrings?

I have written a function to check for the existence of a value in a list and return True if it exists. It works well for exact matches, but I need for it to return True if the value exists anywhere in the list entry (e.g. value <= listEntry, I think.) Here is the code I am using for the function:
def isValInLst(val,lst):
    """check to see if val is in lst. If it doesn't NOT exist (i.e. != 0),
    return True. Otherwise return false."""
    if lst.count(val) != 0:
        return True
    else:
        print('val is '+str(val))
        return False
Without looping through the entire character string and/or using RegEx's (unless those are the most efficient), how should I go about this in a pythonic manner?
This is very similar to another SO question, but I need to check for the existence of the ENTIRE val string anywhere in the list. It would also be great to return the index / indices of matches, but I'm sure that's covered elsewhere on Stackoverflow.
If I understood your question then I guess you need any:
return any(val in x for x in lst)
Demo:
>>> lst = ['aaa','dfbbsd','sdfdee']
>>> val = 'bb'
>>> any(val in x for x in lst)
True
>>> val = "foo"
>>> any(val in x for x in lst)
False
>>> val = "fde"
>>> any(val in x for x in lst)
True
Mostly covered, but if you want to get the index of the matches I would suggest something like this:
indices = [index for index, content in enumerate(input) if substring in content]
If you also want the true/false result, you can use the result of this list comprehension directly: it is an empty list if your input doesn't contain the substring, and an empty list evaluates to False.
In terms of your first function:
def isValInLst(val, lst):
    return bool([index for index, content in enumerate(lst) if val in content])
Here bool() just converts the answer into a boolean value; without the bool(), this returns a list of all positions in the list where the substring appears.
There are multiple possibilities to do that. For example:
def valInList1 (val, lst):
    # check `in` for each element in the list
    return any(val in x for x in lst)

def valInList2 (val, lst):
    # join the list to a single string using some character
    # that definitely does not occur in val
    return val in ';;;'.join(lst)
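The requirement that the separator must not occur in val matters in practice. The sketch below uses made-up data and hypothetical function names to show a search value that spans the separator and therefore matches the joined string without being contained in any single entry:

```python
def val_in_list_joined(val, lst, sep=";;;"):
    # join-based check: only safe when sep cannot appear inside val
    return val in sep.join(lst)

def val_in_list_any(val, lst):
    # element-by-element check
    return any(val in item for item in lst)

entries = ["ab", "cd"]
spanning = "b;;;c"    # straddles the boundary between "ab" and "cd"

joined_result = val_in_list_joined(spanning, entries)   # True (a false positive)
any_result = val_in_list_any(spanning, entries)         # False
```

The any() version has no such caveat, at the cost of a Python-level loop over the entries.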
