Find longest unique substring in string python

I am trying that age-old question (there are multitudes of versions around) of finding the longest substring of a string that doesn't contain repeated characters. I can't work out why my attempt doesn't work properly:
def findLongest(inputStr):
    resultSet = []
    substr = []
    for c in inputStr:
        print ("c: ", c)
        if substr == []:
            substr.append([c])
            continue
        print(substr)
        for str in substr:
            print ("c: ",c," - str: ",str,"\n")
            if c in str:
                resultSet.append(str)
                substr.remove(str)
            else:
                str.append(c)
        substr.append([c])
    print("Result set:")
    print(resultSet)
    return max(resultSet, key=len)
print (findLongest("pwwkewambb"))
When my output gets to the second 'w', it doesn't iterate over all the substr elements. I think I've done something silly, but I can't see what it is so some guidance would be appreciated! I feel like I'm going to kick myself at the answer...
The beginning of my output:
c: p
c: w
[['p']]
c: w - str: ['p']
c: w
[['p', 'w'], ['w']]
c: w - str: ['p', 'w'] # I expect the next line to say c: w - str: ['w']
c: k
[['w'], ['w']] # it is like the w was ignored as it is here
c: k - str: ['w']
c: k - str: ['w']
...
EDIT:
I replaced the for loop with
for idx, str in enumerate(substr):
    print ("c: ",c," - str: ",str,"\n")
    if c in str:
        resultSet.append(str)
        substr[idx] = []
    else:
        str.append(c)
and it produces the correct result. The only thing is that the empty element arrays get set with the next character. It seems a bit pointless; there must be a better way.
My expected output is kewamb.
e.g.
c: p
c: w
[['p']]
c: w - str: ['p']
c: w
[['p', 'w'], ['w']]
c: w - str: ['p', 'w']
c: w - str: ['w']
c: k
[[], [], ['w']]
c: k - str: []
c: k - str: []
c: k - str: ['w']
c: e
[['k'], ['k'], ['w', 'k'], ['k']]
c: e - str: ['k']
c: e - str: ['k']
c: e - str: ['w', 'k']
c: e - str: ['k']
...

Edit, per comment by @seymour on incorrect responses:
from itertools import groupby
def find_longest(s):
    _longest = set()
    def longest(x):
        if x in _longest:
            _longest.clear()
            return False
        _longest.add(x)
        return True
    return ''.join(max((list(g) for _, g in groupby(s, key=longest)), key=len))
And test:
In [101]: assert find_longest('pwwkewambb') == 'kewamb'
In [102]: assert find_longest('abcabcbb') == 'abc'
In [103]: assert find_longest('abczxyabczxya') == 'abczxy'
Old answer:
from itertools import groupby
s = set() ## for mutable access
''.join(max((list(g) for _, g in groupby('pwwkewambb', key=lambda x: not ((s and x == s.pop()) or s.add(x)))), key=len))
'kewamb'
groupby returns an iterator grouped based on the function provided in the key argument, which by default is lambda x: x. Instead of the default we are utilizing some state by using a mutable structure (which could have been done a more intuitive way if using a normal function)
lambda x: not ((s and x == s.pop()) or s.add(x))
What is happening here is since I can't reassign a global assignment in a lambda (again I can do this, using a proper function), I just created a global mutable structure that I can add/remove. The key (no pun) is that I only keep elements that I need by using a short circuit to add/remove items as needed.
max and len are fairly self explanatory, to get the longest list produced by groupby
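For readers unfamiliar with groupby: it starts a new group each time the key function's return value changes between consecutive items. A small illustration with a plain, stateless key function (the sample string is arbitrary and not from the question):

```python
from itertools import groupby

# Consecutive characters with the same key value end up in the same group.
runs = [(is_digit, "".join(group)) for is_digit, group in groupby("aab1122c", key=str.isdigit)]
print(runs)  # [(False, 'aab'), (True, '1122'), (False, 'c')]
```

The stateful key functions in this answer rely on the same behaviour: whenever the key flips, the current run ends.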
Another version without the mutable global structure business:
def longest(x):
    if hasattr(longest, 'last'):
        result = not (longest.last == x)
        longest.last = x
        return result
    longest.last = x
    return True
''.join(max((list(g) for _, g in groupby('pwwkewambb', key=longest)), key=len))
'kewamb'

Not sure what is wrong in your attempt, but it's complex and in:
for str in substr:
    print ("c: ",c," - str: ",str,"\n")
    if c in str:
        resultSet.append(str)
        substr.remove(str)
you're removing elements from a list while iterating on it: don't do that, it gives unexpected results.
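The skipping can be reproduced in isolation. A minimal sketch with made-up data shaped like the question's list: removing the current element shifts the later ones one slot left, so the loop's internal index steps over the next one. Building a new list instead avoids that.

```python
# Hypothetical data, shaped like the question's list of candidate substrings.
items = [["p", "w"], ["w"], ["k"]]
visited = []
for sub in items:
    visited.append(list(sub))
    if "w" in sub:
        items.remove(sub)  # later elements shift left; the loop index still advances
print(visited)  # [['p', 'w'], ['k']]  (['w'] was never visited)

# Building a new list leaves the one being iterated untouched.
items = [["p", "w"], ["w"], ["k"]]
kept = [sub for sub in items if "w" not in sub]
print(kept)  # [['k']]
```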
Anyway, my solution, not sure it's intuitive, but it's probably simpler & shorter:
slice the string with an increasing index
for each slice, create a set and store letters until you reach the end of the string or a letter is already in the set. Your index is the max length
compute the max of this length for every iteration & store the corresponding string
Code:
def findLongest(s):
    maxlen = 0
    longest = ""
    for i in range(0,len(s)):
        subs = s[i:]
        chars = set()
        for j,c in enumerate(subs):
            if c in chars:
                break
            else:
                chars.add(c)
        else:
            # add 1 when end of string is reached (no break)
            # handles the case where the longest string is at the end
            j+=1
        if j>maxlen:
            maxlen=j
            longest=s[i:i+j]
    return longest
print(findLongest("pwwkewambb"))
result:
kewamb
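The nested loops rescan the string from every start index. The same result can probably be obtained in a single pass by remembering where each character was last seen and sliding the left edge of a window past any repeat. A minimal sketch of that sliding-window idea; the function name is hypothetical and this is not the answerer's code:

```python
def longest_unique_window(s: str) -> str:
    last_seen = {}               # character -> index of its most recent occurrence
    start = 0                    # left edge of the current repeat-free window
    best_start, best_len = 0, 0
    for i, ch in enumerate(s):
        # If ch already occurs inside the window, move the left edge past it.
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return s[best_start:best_start + best_len]

print(longest_unique_window("pwwkewambb"))  # kewamb
```

When several windows tie for the longest length, this version returns the first one.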

Depends on your definition of repeated characters: if you mean consecutive, then the approved solution is slick, but not of characters appearing more than once (e.g.: pwwkewabmb -> 'kewabmb' ).
Here's what I came up with (Python 2):
def longest(word):
    begin = 0
    end = 0
    longest = (0,0)
    for i in xrange(len(word)):
        try:
            j = word.index(word[i],begin,end)
            # longest?
            if end-begin >= longest[1]-longest[0]:
                longest = (begin,end)
            begin = j+1
            if begin==end:
                end += 1
        except:
            end = i+1
    end=i+1
    if end-begin >= longest[1]-longest[0]:
        longest = (begin,end)
    return word[slice(*longest)]
Thus
>>> print longest('pwwkewabmb')
kewabm
>>> print longest('pwwkewambb')
kewamb
>>> print longest('bbbb')
b

My 2-cents:
from collections import Counter
def longest_unique_substr(s: str) -> str:
    # get all substr-ings from s, starting with the longest one
    for substr_len in range(len(s), 0, -1):
        for substr_start_index in range(0, len(s) - substr_len + 1):
            substr = s[substr_start_index : substr_start_index + substr_len]
            # check if all substr characters are unique
            c = Counter(substr)
            if all(v == 1 for v in c.values()):
                return substr
    # ensure empty string input returns ""
    return ""
Run:
In : longest_unique_substr('pwwkewambb')
Out: 'kewamb'

s=input()
ma=0
n=len(s)
l=[]
a=[]
d={}
st=0;i=0
while i<n:
    if s[i] not in d:
        d[s[i]]=i
        l.append(s[i])
    else:
        t=d[s[i]]
        d[s[i]]=i
        s=s[t+1:]
        d={}
        n=len(s)
        if len(l)>=3:
            a.append(l)
            ma=max(ma,len(l))
        l=[];i=-1
    i=i+1
if len(l)!=0 and len(l)>=3:
    a.append(l)
    ma=max(ma,len(l))
if len(a)==0:
    print("-1")
else:
    for i in a:
        if len(i)==ma:
            for j in i:
                print(j,end="")
            break

Related

How to get the index of a repeating element in list?

I wanted to make a Japanese transliteration program.
I won't explain the details, but some characters in pairs have different values than if they were separated, so I made a loop that gets two characters (current and next)
b = "きゃきゃ"
b = list(b)
name = ""
for i in b:
    if b.index(i) + 1 <= len(b) - 1:
        if i in "き / キ" and b[b.index(i) + 1] in "ゃ ャ":
            if b[b.index(i) + 1] != " ":
                del b[b.index(i) + 1]
                del b[int(b.index(i))]
                cur = "kya"
                name += cur
print(name)
But it always gives index 0 for "き", so I can't check it more than once.
How can I change that?
I tried deleting each element after analyzing it, but that didn't help.

Rather than looking ahead a character, it may be easier to store a reference to the previous character, and replacing the previous transliteration if you found a combo match.
Example (I'm not sure if I got all of the transliterations correct):
COMBOS = {('き', 'ゃ'): 'kya', ('き', 'ャ'): 'kya', ('キ', 'ゃ'): 'kya', ('キ', 'ャ'): 'kya'}
TRANSLITERATIONS = {'き': 'ki', 'キ': 'ki', 'ャ': 'ya', 'ゃ': 'ya'}
def transliterate(text: str) -> str:
    transliterated = []
    last = None
    for c in text:
        try:
            combo = COMBOS[(last, c)]
        except KeyError:
            transliterated.append(TRANSLITERATIONS.get(c, c))
        else:
            transliterated.pop() # remove the last value that was added
            transliterated.append(combo)
        last = c
    return ''.join(transliterated) # combine the transliterations into a single str
That being said, rather than re-inventing the wheel, it may make more sense to use an existing library that already handles transliterating Japanese to romaji, such as Pykakasi.
Example:
>>> import pykakasi
>>> kks = pykakasi.kakasi()
>>> kks.convert('きゃ')
[{'orig': 'きゃ', 'hira': 'きゃ', 'kana': 'キャ', 'hepburn': 'kya', 'kunrei': 'kya', 'passport': 'kya'}]

if you are looking for the indices of 'き':
b = "きゃきゃ"
b = list(b)
indices = [i for i, x in enumerate(b) if x == "き"]
print(indices)
[0, 2]
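list.index always returns the position of the first match, which is why the loop in the question keeps landing on index 0. A sketch that walks the string by position instead, so each occurrence is examined where it actually sits. The PAIRS and SINGLES tables and the function name are hypothetical and cover only the characters from the question:

```python
# Hypothetical lookup tables, limited to the characters used in the question.
PAIRS = {"きゃ": "kya", "キャ": "kya"}
SINGLES = {"き": "ki", "キ": "ki", "ゃ": "ya", "ャ": "ya"}

def to_romaji(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]          # current character plus the next one, if any
        if pair in PAIRS:
            out.append(PAIRS[pair])
            i += 2                    # both characters are consumed
        else:
            out.append(SINGLES.get(text[i], text[i]))
            i += 1
    return "".join(out)

print(to_romaji("きゃきゃ"))  # kyakya
```

Advancing the index by hand means nothing has to be deleted from the list while it is being scanned.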

Replace string one by one

I have a string and I need to replace "e" with "x" one at a time. For e.g.
x = "three"
Then the expected output is:
("thrxe", "threx")
and if I have 3 characters to replace, for e.g.
y = "threee"
Then the expected output will be:
("thrxee", "threxe", "threex")
I have tried this:
x.replace("e", "x", 1) # -> 'thrxe'
But not sure how to return the second string "threx".

Try this
x = "threee"
# build a generator expression that yields the position of "e"s
# change "e"s with "x" according to location of "e"s yielded from the genexp
[f"{x[:i]}x{x[i+1:]}" for i in (i for i, e in enumerate(x) if e=='e')]
['thrxee', 'threxe', 'threex']

You could use a generator to replace e with x sequentially through the string. For example:
def replace(string, old, new):
    l = len(old)
    # start from the first match so an occurrence at index 0 is not skipped
    start = string.find(old)
    while start != -1:
        yield string[:start] + new + string[start + l:]
        start = string.find(old, start + l)
z = replace('threee', 'e', 'x')
for s in z:
    print(s)
Output:
thrxee
threxe
threex
Note I've generalised the code to allow for arbitrary length match and replacement strings, if you don't need that just replace l (len(old)) with 1.

def replace(string,old,new):
    f = string.index(old)
    l = list(string)
    i = 0
    for a in range(string.count(old)):
        l[f] = new
        yield ''.join(l)
        l[f]=old
        try:
            f = string.index(old,f+1)
        except ValueError:
            pass
        i+=1
z = replace('threee', 'e', 'x')
for a in z:
    print(a)
OUTPUT
thrxee
threxe
threex
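Another way to locate each occurrence is re.finditer, which reports the span of every non-overlapping match. A minimal sketch, with a hypothetical function name, that escapes the search text so it is matched literally:

```python
import re

def replace_each(string: str, old: str, new: str):
    # re.escape makes `old` match literally even if it contains regex metacharacters.
    for m in re.finditer(re.escape(old), string):
        yield string[:m.start()] + new + string[m.end():]

print(list(replace_each("threee", "e", "x")))  # ['thrxee', 'threxe', 'threex']
```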

Python taking too long to execute simple code... might have entered an infinite loop

I've tried changing variables in case I made a scope error, etc., but nothing seems to work.
I've defined multiple functions for finding frequency of words that appear in a string. It evaluates till two functions but the last function always enters infinite loop... except when there is no repetition.
def freq_finder(k):
    dict = {}
    k = k.split(' ')
    for word in k:
        if word in dict:
            dict[word] += 1
        else:
            dict[word] = 1
    return dict
def freq_max(l):
    to_ = freq_finder(l)
    values = to_.values()
    best = max(values)
    words = []
    for t in to_:
        if to_[t] == best:
            words.append(t)
    return (words, best)
def freq_maxi(h):
    values = h.values()
    best = max(values)
    words = []
    for t in h:
        if h == best:
            words.append(t)
    return (words, best)
def words_above_freq(r, freq):
    result = []
    temp_faltu = freq_finder(r)
    done = False
    while not done:
        temp = freq_maxi(temp_faltu)
        if temp[1] >= freq: # temp[1] is 'best' that was a return from freq_max
            result.append(temp)
            for w in temp[0]: # temp[0] is the 'words'
                del(temp_faltu[w])
        else:
            done = True
    return result
horde = "I was not was I not"
print(freq_finder(horde))
print(freq_max(horde))
print(words_above_freq(horde, 2))

The function below always returns an empty list for words, because h == best compares the whole dictionary with an integer and is never true (it should be h[t] == best):
def freq_maxi(h):
    values = h.values()
    best = max(values)
    words = []
    for t in h:
        if h == best:
            words.append(t)
    return (words, best)
This in turn causes the infinite loop: nothing is ever deleted from temp_faltu, so the exit condition in words_above_freq is never met. The code is also more complicated than it needs to be for this task. Be careful, too, about deleting entries from a collection while you are looping over it.
Finally, look into a module called collections that can do this and more...
https://pymotw.com/2/collections/counter.html
Or, more simply, use str.count to count the occurrences of a single substring:
horde = "I was not was I not"
substring = "I"
count = horde.count(substring)
# print count
print("The count is:", count)
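A sketch of how the whole task could be written on top of collections.Counter, which avoids the delete-and-rescan loop altogether. The function name and return shape mirror the question's words_above_freq, but the body is an assumption about what was intended, not the asker's code:

```python
from collections import Counter

def words_above_freq(text: str, freq: int):
    counts = Counter(text.split())
    # Bucket words by their count, keeping only counts at or above the threshold.
    by_count = {}
    for word, n in counts.items():
        if n >= freq:
            by_count.setdefault(n, []).append(word)
    # Highest frequency first, as (words, count) pairs.
    return [(by_count[n], n) for n in sorted(by_count, reverse=True)]

print(words_above_freq("I was not was I not", 2))  # [(['I', 'was', 'not'], 2)]
```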

>>> def freq_finder(k):
...     d = {}
...     k = k.split(' ')
...     for word in k:
...         if word in d:
...             d[word] += 1
...         else:
...             d[word] = 1
...     return d
...
>>> horde = "I was not was I not"
>>> print(freq_finder(horde))
{'I': 2, 'was': 2, 'not': 2}
One problem is that you're shadowing dict, which is Python's built-in dictionary class.
Using another name (for example d) fixes that.
>>> def freq_max(l):
...     to_ = freq_finder(l)
...     values = to_.values()
...     best = max(values)
...     words = []
...     for t in to_:
...         if to_[t] == best:
...             words.append(t)
...     return (words, best)
...
>>> print(freq_max(horde))
(['I', 'was', 'not'], 2)
This seems OK. I don't think you had an infinite loop here; that cannot happen, because freq_finder cannot return an infinite dictionary.
Hint:
add breakpoint() in the code where you want to understand things and use
!<variable name> to print variables
n to go ahead by 1 line
l to print where you're in the code
c to run till the end
exit to exit
for example create a file called freq_utils.py and put the code inside:
def freq_finder(k):
    breakpoint()
    d = {}
    k = k.split(' ')
    for word in k:
        if word in d:
            d[word] += 1
        else:
            d[word] = 1
    return d
horde = "I was not was I not"
print(freq_finder(horde))
$ python3 /tmp/freq_utils.py
> /tmp/freq_utils.py(4)freq_finder()
-> d = {}
(Pdb) !d
*** NameError: name 'd' is not defined
(Pdb) n
> /tmp/freq_utils.py(5)freq_finder()
-> k = k.split(' ')
(Pdb) !d
{}
(Pdb) n
> /tmp/freq_utils.py(6)freq_finder()
-> for word in k:
(Pdb) n
> /tmp/freq_utils.py(7)freq_finder()
-> if word in d:
(Pdb) n
> /tmp/freq_utils.py(10)freq_finder()
-> d[word] = 1
(Pdb) n
> /tmp/freq_utils.py(6)freq_finder()
-> for word in k:
(Pdb) !d
{'I': 1}
(Pdb) c
{'I': 2, 'was': 2, 'not': 2}

Count of sub-strings that contain character X at least once. E.g Input: str = “abcd”, X = ‘b’ Output: 6

This question was asked in an exam but my code (given below) passed just 2 cases out of 7 cases.
Input format: a single line of input, separated by a comma
Input: str = “abcd,b”
Output: 6
“ab”, “abc”, “abcd”, “b”, “bc” and “bcd” are the required sub-strings.
def slicing(s, k, n):
    loop_value = n - k + 1
    res = []
    for i in range(loop_value):
        res.append(s[i: i + k])
    return res
x, y = input().split(',')
n = len(x)
res1 = []
for i in range(1, n + 1):
    res1 += slicing(x, i, n)
count = 0
for ele in res1:
    if y in ele:
        count += 1
print(count)

When the target string (ts) is found in the string S, you can compute the number of substrings containing that instance by multiplying the number of characters before the target by the number of characters after the target (plus one on each side).
This will cover all substrings that contain this instance of the target string leaving only the "after" part to analyse further, which you can do recursively.
def countsubs(S,ts):
    if ts not in S: return 0 # shorter or no match
    before,after = S.split(ts,1) # split on target
    result = (len(before)+1)*(len(after)+1) # count for this instance
    return result + countsubs(ts[1:]+after,ts) # recurse with right side
print(countsubs("abcd","b")) # 6
This will work for single character and multi-character targets and will run much faster than checking all combinations of substrings one by one.
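For the single-character case, the same count can also be reached without recursion. A sketch under that assumption (both function names are hypothetical): a substring ending at position end contains x exactly when it starts at or before the most recent occurrence of x, so each position contributes last + 1 substrings. A brute-force helper is included only to cross-check the result.

```python
def count_substrings_with_char(s: str, x: str) -> int:
    total = 0
    last = -1  # index of the most recent occurrence of x, or -1 if none yet
    for end, ch in enumerate(s):
        if ch == x:
            last = end
        # Valid start positions for a substring ending here are 0..last.
        total += last + 1
    return total

def brute_force(s: str, x: str) -> int:
    # Check every substring directly; quadratic in the number of substrings.
    return sum(x in s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1))

print(count_substrings_with_char("abcd", "b"))  # 6
```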

Here is a simple solution without recursion:
def my_function(s):
    l, target = s.split(',')
    result = []
    for i in range(len(l)):
        for j in range(i+1, len(l)+1):
            ss = l[i] + l[i+1:j]
            if target in ss:
                result.append(ss)
    return f'count = {len(result)}, substrings = {result}'
print(my_function("abcd,b"))
#count = 6, substrings = ['ab', 'abc', 'abcd', 'b', 'bc', 'bcd']

Here you go, this should help:
from itertools import combinations
# Ask for input such as "abcd,b"
initial = input('Enter string and needed letter separated by a comma: ')
# Split the input into the text and the letter that must appear in each result
text, letter = initial.split(',')
# Every pair of cut positions (i, j) with i < j gives one contiguous substring text[i:j]
final = [text[i:j] for i, j in combinations(range(len(text) + 1), 2)]
# Keep only the substrings that contain the required letter
output = [sub for sub in final if letter in sub]
print(len(output))

Printing alphabets advanced by n in Python

How can I write a Python program that takes some letters as input and prints each letter advanced by n? Example:
my_string = 'abc'
expected_output = 'cde' # n=2
One way I've thought is by using str.maketrans, and mapping the original input to (alphabets + n). Is there any other way?
PS: xyz should translate to abc
I've tried to write my own code as well for this, (apart from the infinitely better answers mentioned):
number = 2
prim = """abc! fgdf """
final = prim.lower()
for x in final:
    if(x =="y"):
        print("a", end="")
    elif(x=="z"):
        print("b", end="")
    else:
        conv = ord(x)
        x = conv+number
        print(chr(x),end="")
Any comments on how to not convert special chars? thanks

If you don't care about wrapping around, you can just do:
def shiftString(string, number):
    return "".join(map(lambda x: chr(ord(x)+number),string))
If you do want to wrap around (think Caesar cipher), you'll need to specify where the alphabet starts and how many symbols it has:
def shiftString(string, number, start=97, num_of_symbols=26):
    return "".join(map(lambda x: chr(((ord(x)+number-start) %
        num_of_symbols)+start) if start <= ord(x) < start+num_of_symbols
        else x,string))
That would, e.g., convert abcxyz, when given a shift of 2, into cdezab.
If you actually want to use it for "encryption", make sure to exclude non-alphabetic characters (like spaces etc.) from it.
edit: Shameless plug of my Vigenère tool in Python
edit2: Now only converts in its range.
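The question mentions str.maketrans. A minimal sketch of that approach, assuming only ASCII letters should be shifted and everything else (spaces, punctuation) should pass through unchanged; the function name is hypothetical:

```python
from string import ascii_lowercase, ascii_uppercase

def shift_letters(text: str, n: int) -> str:
    n %= 26  # normalise the shift so slicing wraps correctly
    table = str.maketrans(
        ascii_lowercase + ascii_uppercase,
        ascii_lowercase[n:] + ascii_lowercase[:n]
        + ascii_uppercase[n:] + ascii_uppercase[:n],
    )
    # Characters missing from the table are left as they are.
    return text.translate(table)

print(shift_letters("abc! fgdf ", 2))  # cde! hifh
print(shift_letters("xyz", 2))         # zab
```

Because the table is built once, the per-character work is a single dictionary lookup inside translate.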

How about something like
>>> my_string = "abc"
>>> n = 2
>>> "".join([ chr(ord(i) + n) for i in my_string])
'cde'
Note: As mentioned in the comments, the question is a bit vague about what to do when edge cases like xyz are encountered.
Edit: To take care of edge cases, you can write something like
>>> from string import ascii_lowercase
>>> lower = ascii_lowercase
>>> input = "xyz"
>>> "".join([ lower[(lower.index(i)+2)%26] for i in input ])
'zab'
>>> input = "abc"
>>> "".join([ lower[(lower.index(i)+2)%26] for i in input ])
'cde'

I've made the following change to the code:
number = 2
prim = """Special() ops() chars!!"""
final = prim.lower()
for x in final:
    if(x =="y"):
        print("a", end="")
    elif(x=="z"):
        print("b", end="")
    elif (ord(x) in range(97, 123)): # 97..122 covers 'a'..'z'
        conv = ord(x)
        x = conv+number
        print(chr(x),end="")
    else:
        print(x, end="")
Output: urgekcn() qru() ejctu!!

test_data = (('abz', 2), ('abc', 3), ('aek', 26), ('abcd', 25))
# translate every character
def shiftstr(s, k):
    if not (isinstance(s, str) and isinstance(k, int) and k >=0):
        return s
    a = ord('a')
    return ''.join([chr(a+((ord(c)-a+k)%26)) for c in s])
for s, k in test_data:
    print(shiftstr(s, k))
print('----')
# translate at most 26 characters, rest look up dictionary at O(1)
def shiftstr(s, k):
    if not (isinstance(s, str) and isinstance(k, int) and k >=0):
        return s
    a = ord('a')
    d = {}
    l = []
    for c in s:
        v = d.get(c)
        if v is None:
            v = chr(a+((ord(c)-a+k)%26))
            d[c] = v
        l.append(v)
    return ''.join(l)
for s, k in test_data:
    print(shiftstr(s, k))
Testing shiftstr_test.py (above code):
$ python3 shiftstr_test.py
cdb
def
aek
zabc
----
cdb
def
aek
zabc
It covers wrapping.
