Understanding isomorphic strings algorithm

Understanding isomorphic strings algorithm - python

I am understanding the following code to find if the strings are isomorphic or not. The code uses two hashes s_dict and t_dict respectively. I am assuming the strings are of same length.
def isIsomorphic(s, t):
s_dict = {}
t_dict = {}
for i in range(len(s)):
if s[i] in s_dict.keys() and s_dict[s[i]] != t[i]:
return False
if t[i] in t_dict.keys() and t_dict[t[i]] != s[i]:
return False
s_dict[s[i]] = t[i]
t_dict[t[i]] = s[i]
return True
Now, if I modify the above code such that only one hash s_dict() is used, then also it gives desired results to my limited test cases. The modified code is as follows:
def isIsomorphic(s, t):
s_dict = {}
for i in range(len(s)):
if s[i] in s_dict.keys() and s_dict[s[i]] != t[i]:
return False
s_dict[s[i]] = t[i]
return True
What are the test cases in which the above modified code will fail? Is my understanding of the isomorphic strings wrong?

One simple example, your code doesn't work on s='ab',t='aa'.
Basically you have to have both way to be isomorphic. Your code only checked that t can be modified from s, but not the other way around.

This was kind of fun to look at. Just for kicks, here's my solution using itertools.groupby
from itertools import groupby
from collections import defaultdict
def get_idx_count(word):
"""Turns a word into a series of tuples (idx, consecutive_chars)
"aabbccaa" -> [[(0, 2), (3, 2)], [(1, 2)], [(2, 2)]]
"""
lst = defaultdict(list)
for idx, (grp, val) in enumerate(groupby(word)):
lst[grp].append((idx, sum(1 for _ in val)))
return sorted(list(lst.values()))
def is_isomorphic(a, b):
return get_idx_count(a) == get_idx_count(b)
is_isomorphic('aabbcc', 'bbddcc') # True
is_isomorphic('aabbaa', 'bbddcc') # False
Rather than building the lists, I feel like I could do something more like:
from itertools import groupby
from collections import defaultdict
def is_isomorphic(a, b):
a_idxs, b_idxs = defaultdict(set), defaultdict(set)
for idx, ((a_grp, a_vals), (b_grp, b_vals)) in enumerate(zip(groupby(a), groupby(b))):
if sum(1 for _ in a_vals) != sum(1 for _ in b_vals):
return False
# ensure sequence is of same length
if a_grp in a_idxs and b_idxs[b_grp] != a_idxs[a_grp] or\
b_grp in b_idxs and a_idxs[a_grp] != b_idxs[b_grp]:
return False
# ensure previous occurrences are matching groups
a_idxs[a_grp].add(idx)
b_idxs[b_grp].add(idx)
# save indexes for future checks
return True
But I haven't had a chance to test that whatsoever. It does, however, have the likelihood of exiting early, rather than having to build all the lists and comparing them at the end. It also skips a few sets of sorting that shouldn't be necessary anyway, except that we're pulling from dicts, so bleh. Pretty sure list.sort is faster than saving to to a collections.OrderedDict. I think.

Here is a solution for Isomorphic String problem in Python:
Solution #1 (the fastest one):
def isometric_strings_translate(str1: str, str2: str) -> bool:
trans = str.maketrans(str1, str2)
return str1.translate(trans) == str2
Solution #2 (using set):
def isometric_strings_set(str1: str, str2: str) -> bool:
return len(set(zip(str1, str2))) == len(set(str1))
Solution #3 (using dictionary):
def isometric_strings_dict(str1: str, str2: str) -> bool:
map = {}
for s1, s2 in zip(str1, str2):
if map.setdefault(s1, s2) != s2:
return False
return True
Here you can view full code.

Thinking about it without maps or dictionaries can help you understand:
def isIso(x,y):
if len(x) != len(y):
return False
for i in range(len(x)):
count = 0
if x.count(x[i]) != y.count(y[i]):
return False
return True

Related

Checking interweaving strings in pythonic way

I have a problem for checking the interweaving strings. So, I have 3 strings and I need to check whether the third string can be formed by interweaving the first two strings.
To simplify, interweaving strings means to merge them by alternating their letters without any specific pattern.
For example: "abc" and "123" are two strings and possible interweaving can be "abc123" or "a1b2c3" or "ab1c23" etc mixing them without any pattern but following order sequence.
another example, for false case:
str1 = "aabcc",
str2 = "dbbca",
str3 = "aadbbbaccc" # This is false case, where str3 is not interweaving from str1 and str2
what i have tried:
I have tried below solution which is working for me. returning True if string c is interweaving of string a and b else returning False
def weavingstring(a,b,c):
for i in c:
if len(a)>0:
if i == a[0]:
a=a[1:]
if len(b)>0:
if i == b[0]:
b=b[1:]
if len(a) > 0 or len(b) > 0:
return False
return True
print(weavingstring("abc", "def", "abcdef")) # True here
print(weavingstring("aabcc", "dbbca", "aadbbbaccc")) # False here
What i want
I have the normal solution, I wanted to know if there is any pythonic way of doing this by using list comprehension, map, filter, reduce or lambda functions etc.

One approach of using the lru_cache with lambda to speed things up:
from functools import lru_cache
class Solution:
def isInterleave(self, s1: str, s2: str, s3: str) -> bool:
return (memo:=lru_cache()
( lambda i=0, j=0, k=0:
(i, j, k) == (len(s1), len(s2), len(s3))
or ( i < len(s1) and k < len(s3) and s1[i] == s3[k] and memo( i+1, j, k+1 ) )
or ( j < len(s2) and k < len(s3) and s2[j] == s3[k] and memo( i, j+1, k+1 ) ) )
)()

May I interest you in a simple recursive solution?
def weaving(a: str, b: str, res: str) -> bool:
if not a: return b == res
if not b: return a == res
if not res: return False
if res[0] == a[0]:
return weaving(a[1:], b, res[1:])
elif res[0] == b[0]:
return weaving(a, b[1:], res[1:])
else:
return False

A simple solution would be to use regexes.
import re
def iswoven(first, second, test):
return all((re.search(r'.*'.join(x), test) for x in (first, second)))
>>> iswoven('abc', '123', 'a1b2c23')
True
>>> iswoven('abc', '123', 'a1b223')
False
This will check that the test string matches both 'a.*b.*c' and '1.*2.*3' and return the result.
Depending on the length of your strings, this could end up trying to match on some pretty hairy regexes, so buyer beware.

Isomorphic Strings (the easy way?)

So I have been trying to solve the Easy questions on Leetcode and so far I dont understand most of the answers I find on the internet. I tried working on the Isomorphic strings problem (here:https://leetcode.com/problems/isomorphic-strings/description/)
and I came up with the following code
def isIso(a,b):
if(len(a) != len(b)):
return false
x=[a.count(char1) for char1 in a]
y=[b.count(char1) for char1 in b]
return x==y
string1 = input("Input string1..")
string2 = input("Input string2..")
print(isIso(string1,string2))
Now I understand that this may be the most stupid code you have seen all day but that is kinda my point. I'd like to know why this would be wrong(and where) and how I should further develop on this.

If I understand the problem correctly, because a character can map to itself, it's just a case of seeing if the character counts for the two words are the same.
So egg and add are isomorphic as they have character counts of (1,2). Similarly paper and title have counts of (1,1,1,2).
foo and bar aren't isomorphic as the counts are (1,2) and (1,1,1) respectively.
To see if the character counts are the same we'll need to sort them.
So:
from collections import Counter
def is_isomorphic(a,b):
a_counts = list(Counter(a).values())
a_counts.sort()
b_counts = list(Counter(b).values())
b_counts.sort()
if a_counts == b_counts:
return True
return False
Your code is failing because here:
x=[a.count(char1) for char1 in a]
You count the occurrence of each character in the string for each character in the string. So a word like 'odd' won't have counts of (1,2), it'll have (1,2,2) as you count d twice!

You can use two dicts to keep track of the mapping of each character in a to b, and the mapping of each character in b to a while you iterate through a, and if there's any violation in a corresponding character, return False; otherwise return True in the end.
def isIso(a, b):
m = {} # mapping of each character in a to b
r = {} # mapping of each character in b to a
for i, c in enumerate(a):
if c in m:
if b[i] != m[c]:
return False
else:
m[c] = b[i]
if b[i] in r:
if c != r[b[i]]:
return False
else:
r[b[i]] = c
return True
So that:
print(isIso('egg', 'add'))
print(isIso('foo', 'bar'))
print(isIso('paper', 'title'))
print(isIso('paper', 'tttle')) # to test reverse mapping
would output:
True
False
True
False

I tried by creating a dictionary, and it resulted in 72ms runtime.
here's my code -
def isIsomorphic(s: str, t: str) -> bool:
my_dict = {}
if len(s) != len(t):
return False
else:
for i in range(len(s)):
if s[i] in my_dict.keys():
if my_dict[s[i]] == t[i]:
pass
else:
return False
else:
if t[i] in my_dict.values():
return False
else:
my_dict[s[i]] = t[i]
return True

There are many different ways on how to do it. Below I provided three different ways by using a dictionary, set, and string.translate.
Here I provided three different ways how to solve Isomorphic String solution in Python.

from itertools import zip_longest
def isomorph(a, b):
return len(set(a)) == len(set(b)) == len(set(zip_longest(a, b)))
here is the second way to do it:
def isomorph(a, b):
return [a.index(x) for x in a] == [b.index(y) for y in b]

How to check if a sequence in a tuple is part of a tuple? python [duplicate]

If I have this:
a='abcdefghij'
b='de'
Then this finds b in a:
b in a => True
Is there a way of doing an similar thing with lists?
Like this:
a=list('abcdefghij')
b=list('de')
b in a => False
The 'False' result is understandable - because its rightly looking for an element 'de', rather than (what I happen to want it to do) 'd' followed by 'e'
This is works, I know:
a=['a', 'b', 'c', ['d', 'e'], 'f', 'g', 'h']
b=list('de')
b in a => True
I can crunch the data to get what I want - but is there a short Pythonic way of doing this?
To clarify: I need to preserve ordering here (b=['e','d'], should return False).
And if it helps, what I have is a list of lists: these lists represents all possible paths (a list of visited nodes) from node-1 to node-x in a directed graph: I want to 'factor' out common paths in any longer paths. (So looking for all irreducible 'atomic' paths which constituent all the longer paths).
Related
Best Way To Determine if a Sequence is in another sequence in Python

I suspect there are more pythonic ways of doing it, but at least it gets the job done:
l=list('abcdefgh')
pat=list('de')
print pat in l # Returns False
print any(l[i:i+len(pat)]==pat for i in xrange(len(l)-len(pat)+1))

Don't know if this is very pythonic, but I would do it in this way:
def is_sublist(a, b):
if not a: return True
if not b: return False
return b[:len(a)] == a or is_sublist(a, b[1:])
Shorter solution is offered in this discussion, but it suffers from the same problems as solutions with set - it doesn't consider order of elements.
UPDATE:
Inspired by MAK I introduced more concise and clear version of my code.
UPDATE:
There are performance concerns about this method, due to list copying in slices. Also, as it is recursive, you can encounter recursion limit for long lists. To eliminate copying, you can use Numpy slices which creates views, not copies. If you encounter performance or recursion limit issues you should use solution without recursion.

I think this will be faster - It uses C implementation list.index to search for the first element, and goes from there on.
def find_sublist(sub, bigger):
if not bigger:
return -1
if not sub:
return 0
first, rest = sub[0], sub[1:]
pos = 0
try:
while True:
pos = bigger.index(first, pos) + 1
if not rest or bigger[pos:pos+len(rest)] == rest:
return pos
except ValueError:
return -1
data = list('abcdfghdesdkflksdkeeddefaksda')
print find_sublist(list('def'), data)
Note that this returns the position of the sublist in the list, not just True or False. If you want just a bool you could use this:
def is_sublist(sub, bigger):
return find_sublist(sub, bigger) >= 0

I timed the accepted solution, my earlier solution and a new one with an index. The one with the index is clearly best.
EDIT: I timed nosklo's solution, it's even much better than what I came up with. :)
def is_sublist_index(a, b):
if not a:
return True
index = 0
for elem in b:
if elem == a[index]:
index += 1
if index == len(a):
return True
elif elem == a[0]:
index = 1
else:
index = 0
return False
def is_sublist(a, b):
return str(a)[1:-1] in str(b)[1:-1]
def is_sublist_copylist(a, b):
if a == []: return True
if b == []: return False
return b[:len(a)] == a or is_sublist_copylist(a, b[1:])
from timeit import Timer
print Timer('is_sublist([99999], range(100000))', setup='from __main__ import is_sublist').timeit(number=100)
print Timer('is_sublist_copylist([99999], range(100000))', setup='from __main__ import is_sublist_copylist').timeit(number=100)
print Timer('is_sublist_index([99999], range(100000))', setup='from __main__ import is_sublist_index').timeit(number=100)
print Timer('sublist_nosklo([99999], range(100000))', setup='from __main__ import sublist_nosklo').timeit(number=100)
Output in seconds:
4.51677298546
4.5824368
1.87861895561
0.357429027557

So, if you aren't concerned about the order the subset appears, you can do:
a=list('abcdefghij')
b=list('de')
set(b).issubset(set(a))
True
Edit after you clarify: If you need to preserve order, and the list is indeed characters as in your question, you can use:
''.join(a).find(''.join(b)) > 0

This should work with whatever couple of lists, preserving the order.
Is checking if b is a sub list of a
def is_sublist(b,a):
if len(b) > len(a):
return False
if a == b:
return True
i = 0
while i <= len(a) - len(b):
if a[i] == b[0]:
flag = True
j = 1
while i+j < len(a) and j < len(b):
if a[i+j] != b[j]:
flag = False
j += 1
if flag:
return True
i += 1
return False

>>>''.join(b) in ''.join(a)
True

Not sure how complex your application is, but for pattern matching in lists, pyparsing is very smart and easy to use.

Use the lists' string representation and remove the square braces. :)
def is_sublist(a, b):
return str(a)[1:-1] in str(b)
EDIT: Right, there are false positives ... e.g. is_sublist([1], [11]). Crappy answer. :)

How to find all occurrences of a list of strings in a string and return a list of list of int in python?

For example
def find_all_occurrences(a,b):
'''
>>>find_all_occurrences('wswwswwwswwwws', ['ws', 'wws'])
[[0,3,7,12], [2,6,11]]
'''
How can I return a list of lists that have all the occurrences without import any modules.

You can use regular expressions
import re
def all_occurrences(a, b):
return [[occur.start() for occur in re.finditer(word, a)] for word in b]
Without imports it gets a little messy, but definitely still doable
def all_occurrences(a, b):
result = []
for word in b:
word_res = []
index = a.find(word)
while index != -1:
word_res.append(index)
index = a.find(word, index+1)
result.append(word_res)
return result

You can find all the occurrences by using the last found position as the start of the next search:
str.find(...)
S.find(sub [,start [,end]]) -> int
Return the lowest index in S where substring sub is found,
such that sub is contained within S[start:end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
A loop that calls haystack.find(needle, last_pos + 1) repeatedly until it returns -1 should work.

you can also have simple list comprehensions to help with problems like these
[[i for i in range(len(a)-len(strng)+1) if strng == a[i:i+len(strng)]] for strng in b]
where
>>> a
'wswwswwwswwwws'
>>> b
['ws', 'wws']

Solution with a recursive procedure. I used a nested/inner function to maintain the OP's function signature:
def find_all_occurrences(a,b):
'''
>>>find_all_occurrences('wswwswwwswwwws', ['ws', 'wws'])
[[0,3,7,12], [2,6,11]]
'''
def r(a, b, count = 0, result = None):
if not a:
return result
if result is None:
# one sublist for each item in b
result = [[] for _ in b]
for i, thing in enumerate(b):
if a.startswith(thing):
result[i].append(count)
return r(a[1:], b, count = count + 1, result = result)
return r(a, b)

Basic Anagram VS more advanced one

I'm currently working my way through a "List" unit and in one of the exercises we need to create an anagram (for those who don't know; two words are an anagram if you can rearrange the letters from one to spell the other).
The easier solution that comes to mind is:
def is_anagram(a, b):
return sorted(a) == sorted(b)
print is_anagram('god', 'dog')
It works, but it doesn't really satisfy me. If we were in this situation for example:
def is_anagram(a, b):
return sorted(a) == sorted(b)
print is_anagram('god', 'dyog') #extra letter 'd' in second argument
>>> False
Return is False, although we should be able to to build the word 'god' out of 'dyog'. Perhaps this game/problem isn't called an anagram, but I've been trying to figure it out anyway.
Technically my solution is to:
1- Iterate through every element of b.
2- As I do, I check if that element exists in a.
3- If all of them do; then we can create a out of b.
4- Otherwise, we can't.
I just can't seem to get it to work. For the record, I do not know how to use lambda
Some tests:
print is_anagram('god', 'ddog') #should return True
print is_anagram('god', '9d8s0og') #should return True
print is_anagram('god', '_#10d_o g') #should return True
Thanks :)

Since the other answers do not, at time of writing, support reversing the order:
Contains type hints for convenience, because why not?
# Just strip hints out if you're in Python < 3.5.
def is_anagram(a: str, b: str) -> bool:
long, short = (a, b) if len(a) > len(b) else (b, a)
cut = [x for x in long if x in short]
return sorted(cut) == sorted(short)
If, in future, you do learn to use lambda, an equivalent is:
# Again, strip hints out if you're in Python < 3.5.
def is_anagram(a: str, b: str) -> bool:
long, short = (a, b) if len(a) > len(b) else (b, a)
# Parentheses around lambda are not necessary, but may help readability
cut = filter((lambda x: x in short), long)
return sorted(cut) == sorted(short)

If you need to check if a word could be created from b you can do this
def is_anagram(a,b):
b_list = list(b)
for i_a in a:
if i_a in b_list:
b_list.remove(i_a)
else:
return False
return True
UPDATE(Explanation)
b_list = list(b) makes a list of str objects(characters).
>>> b = 'asdfqwer'
>>> b_list = list(b)
>>> b_list
['a', 's', 'd', 'f', 'q', 'w', 'e', 'r']
What basically is happening in the answer: We check if every character in a listed in b_list, when occurrence happens we remove that character from b_list(we do that to eliminate possibility of returning True with input 'good', 'god'). So if there is no occurrence of another character of a in rest of b_list then it's not advanced anagram.

def is_anagram(a, b):
test = sorted(a) == sorted(b)
testset = b in a
testset1 = a in b
if testset == True:
return True
if testset1 == True:
return True
else:
return False
No better than the other solution. But if you like verbose code, here's mine.

Try that:
def is_anagram(a, b):
word = [filter(lambda x: x in a, sub) for sub in b]
return ''.join(word)[0:len(a)]
Example:
>>> is_anagram('some','somexcv')
'some'
Note: This code returns the common word if there is any or '' otherwise. (it doesn't return True/False as the OP asked - realized that after, but anyway this code can easily change to adapt the True/False type of result - I won't fix it now such there are great answers in this post that already do :) )
Brief explanation:
This code takes(filter) every letter(sub) on b word that exists in a word also. Finally returns the word which must have the same length as a([0:len(a)]). The last part with len needs for cases like this:
is_anagram('some','somenothing')

I think all this sorting is a red herring; the letters should be counted instead.
def is_anagram(a, b):
return all(a.count(c) <= b.count(c) for c in set(a))
If you want efficiency, you can construct a dictionary that counts all the letters in b and decrements these counts for each letter in a. If any count is below 0, there are not enough instances of the character in b to create a. This is an O(a + b) algorithm.
from collections import defaultdict
def is_anagram(a, b):
b_dict = defaultdict(int)
for char in b:
b_dict[char] += 1
for char in a:
b_dict[char] -= 1
if b_dict[char] < 0:
return False
return True
Iterating over b is inevitable since you need to count all the letters there. This will exit as soon as a character in a is not found, so I don't think you can improve this algorithm much.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Understanding isomorphic strings algorithm - python

One simple example, your code doesn't work on s='ab',t='aa'. Basically you have to have both way to be isomorphic. Your code only checked that t can be modified from s, but not the other way around.

Thinking about it without maps or dictionaries can help you understand: def isIso(x,y): if len(x) != len(y): return False for i in range(len(x)): count = 0 if x.count(x[i]) != y.count(y[i]): return False return True

Related

Checking interweaving strings in pythonic way

Isomorphic Strings (the easy way?)

How to check if a sequence in a tuple is part of a tuple? python [duplicate]

How to find all occurrences of a list of strings in a string and return a list of list of int in python?

Basic Anagram VS more advanced one

Categories

Resources