iterating through loops without using Regex [duplicate]


lst = 'AB[CD]EF[GH]'
Output: ['A','B','CD','E','F','GH']
This is what I've tried but it's not working...
while(index < len(my_string)):
    curr_char = my_string[index]
    if(curr_char == '['):
        while(curr_char != ']'):
            multi = my_string[index + 1]
            index += 1
        lst += multi
Can anybody please help? I want to do this without importing regex or anything like it.

The problems with the original code seemed to be:
1) lst, index and multi are not initialised
2) the loop is infinite because the loop variable (index) isn't incremented on each iteration.
3) the close bracket needs to be skipped when detected to avoid including it in the final list
This code is an example of how to fix those issues:
def getList(s):
    outList=[]
    lIndex=0
    while lIndex < len(s):
        if s[lIndex] == "[":
            letters=""
            lIndex+=1
            while s[lIndex] != "]":
                letters+=s[lIndex]
                lIndex+=1
            outList.append(letters)
        else:
            outList.append(s[lIndex])
        lIndex+=1
    return outList
print(getList('AB[CD]EF[GH]'))

The line
lst += multi
does not do what you want. If lst is a list, += extends it with each character of multi separately instead of adding multi as a single item; use lst.append(multi) for that.
Moreover, your code enters an infinite loop, because you aren't updating the curr_char variable inside the inner loop, so the condition will always be True.
Also, you are not handling the case when curr_char != '['. There are other errors as well.
You can use this code, which fixes the above errors while using the same basic logic as your code:
index = 0
multi = ""
res = []
my_str = 'AB[CD]EF[GH]'
while (index < len(my_str)):
    curr_char = my_str[index]
    if curr_char == '[':
        multi += curr_char
        while curr_char != ']':
            index += 1
            curr_char = my_str[index]
            multi += curr_char
        res.append(multi)
        multi = ""
    else:
        res.append(curr_char)
    index += 1
print(res)
Output (note that the brackets are kept in the grouped items):
['A', 'B', '[CD]', 'E', 'F', '[GH]']

Please try the following code snippet.
my_string = 'AB[CD]EF[GH]'
lst = []
ind = 0
n = len(my_string)
while (ind < n):
    if my_string[ind] == '[':
        # if '[' is found, look for the next ']' but ind should not exceed n.
        # Your code does not do an ind < n check. It may enter an infinite loop.
        ind += 1 # this is done to skip the '[' in result list
        temp = '' # create a temporary string to store chars inside '[]'
        while ind < n and my_string[ind] != ']':
            temp = temp + my_string[ind]
            ind+=1
        lst.append(temp) # add this temp string to list
        ind += 1 # do this to skip the ending ']'.
    else:
        # If it's not '[', simply append char to list.
        lst.append(my_string[ind])
        ind += 1
print(lst)

Related

Trouble trying to find length of longest substring

I wrote the following code. It should return the length of the longest substring in a string without repeated letters.
def lengthOfLongestSubstring(s):
    lst = []
    y = 0
    final = 0
    count = len(s)
    while len(s) > 0:
        s = s[y:]
        for i in range(len(s)):
            if s[i] in lst:
                y += 1
                count = len(lst)
                lst =[]
                break
            else:
                lst.append(s[i])
        if count > final:
            final=count
    return(final)
When entering the string "tmmzuxt" I expect to get an output of 5 (the length of "mzuxt") but instead get 4. I have debugged it and the problem seems to be that my function skips over the second 'm' when indexing, but I can't figure out why. Any suggestions?
Realized I somehow missed a line. Hope this makes more sense.

Your issue here is that you are modifying s while you are running your code.
Consider that in the first iteration, you are getting s = s[0:], so s will now be 'tmmzuxt'. In your next iteration, you are getting s = s[1:], from the modified s. This is still not a problem, because you just get 'mmzuxt'. However, in your third iteration, you are getting s = s[2:], which is now 'zuxt'.
So you need a different variable than s to hold the substring of s that you are actually testing.
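As a rough illustration (a sketch only; the helper names slice_in_place and slice_from_original are made up here, and the offsets 0, 1, 2 are the values y takes in the first three passes for "tmmzuxt"), compare re-slicing the already shortened string with slicing the untouched original:

```python
def slice_in_place(s, offsets):
    """Re-slice the already shortened string, as the question's loop does."""
    seen = []
    for y in offsets:
        s = s[y:]          # s itself shrinks, so the offsets accumulate
        seen.append(s)
    return seen


def slice_from_original(s, offsets):
    """Slice the original string each time, leaving it unmodified."""
    return [s[y:] for y in offsets]


print(slice_in_place("tmmzuxt", [0, 1, 2]))       # third entry starts after the second 'm'
print(slice_from_original("tmmzuxt", [0, 1, 2]))  # third entry still starts at the second 'm'
```

The first list ends with 'zuxt', which is why the second 'm' is never tested; the second list ends with 'mzuxt'.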

In your code (line 7) you update the string s inside the function every time the while loop iterates.
For example, after every break inside the for loop, your string (which starts as "tmmzuxt") gets shorter and shorter.
I created a new variable that holds your original string.
def lengthOfLongestSubstring(s):
    lst = []
    y = 0
    final = 0
    count = len(s)
    main_string = s  # change done here
    while len(s) > 0:
        s = main_string[y:]  # change done here
        for i in range(len(s)):
            if s[i] in lst:
                y += 1
                count = len(lst)
                lst =[]
                break
            else:
                lst.append(s[i])
        if count > final:
            final =count
    print(final)
    return(final)
lengthOfLongestSubstring("tmmzuxt")

The main problem with your code is that you keep incrementing y, even though each pass should only ever remove the first character of s. There is no need for a variable y. Try this:
def lengthOfLongestSubstring(s):
    final = 0
    while len(s) > 0:
        count = len(s)
        lst = []
        for i in range(len(s)):
            if s[i] in lst:
                # s[0:i] holds i distinct characters before the repeat
                count = i
                break
            lst.append(s[i])
        if count > final:
            final = count
        s = s[1:]
    return final
print(lengthOfLongestSubstring("tmmzuxt"))

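Here is the edited code, with the lst = [] and break lines commented out. Note that without them lst is never reset, so the function effectively counts the distinct letters seen so far: it returns 5 for "tmmzuxt", but it also returns 4 for "pwwkew", where the longest substring without repeats has length 3.
[Code]
def lengthOfLongestSubstring(s):
    lst = []
    y = 0
    final = 0
    count = len(s)
    while len(s) > 0:
        s = s[y:]
        for i in range(len(s)):
            if s[i] in lst:
                y += 1
                count = len(lst)
                #lst =[]
                #break
            else:
                lst.append(s[i])
        if count > final:
            final=count
    return(final)
s="tmmzuxt"
print(lengthOfLongestSubstring(s))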
[Output]
5

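I'm not sure I understand your code, or whether the while loop is actually needed here. A single pass that remembers where each character was last seen is enough. Try this instead:
def lengthOfLongestSubstring(s):
    max_length = 0
    start = 0        # index where the current repeat-free window begins
    last_seen = {}   # character -> index of its most recent occurrence
    for index, thisCharacter in enumerate(s):
        # a repeat inside the window moves the start just past the earlier copy
        if thisCharacter in last_seen and last_seen[thisCharacter] >= start:
            start = last_seen[thisCharacter] + 1
        last_seen[thisCharacter] = index
        max_length = max(max_length, index - start + 1)
    return max_length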

smallest window contains all the elements in an array

I need to write a function to find the smallest window that contains all the elements in an array. Below is what I have tried:
def function(item):
    x = len(set(item))
    i = 0
    j = len(item) - 1
    result = len(item)
    while i <= j:
        if len(set(item[i + 1: j + 1])) == x:
            result = min(result, len(item[i + 1: j + 1]))
            i += 1
        elif len(set(item[i:j])) == x:
            result = min(result, len(item[i:j]))
            j -= 1
        else:
            return result
    return result
print(function([8,8,8,8,1,2,5,7,8,8,8,8]))
The time complexity is O(N^2). Can someone help me improve it to O(N) or better? Thanks.

You can use the idea from "How to find smallest substring which contains all characters from a given string?" for this specific case and get an O(N) solution.
Keep a counter of how many copies of each unique number are included in the window, and move the end of the window to the right until all unique numbers are included at least once. Then move the start of the window to the right until one unique number disappears. Then repeat:
from collections import Counter
def smallest_window(items):
    element_counts = Counter()
    n_unique = len(set(items))
    characters_included = 0
    start_enumerator = enumerate(items)
    min_window = len(items)
    for end, element in enumerate(items):
        element_counts[element] += 1
        if element_counts[element] == 1:
            characters_included += 1
        while characters_included == n_unique:
            start, removed_element = next(start_enumerator)
            min_window = min(end-start+1, min_window)
            element_counts[removed_element] -= 1
            if element_counts[removed_element] == 0:
                characters_included -= 1
    return min_window
>>> smallest_window([8,8,8,8,1,2,5,7,8,8,8,8])
5

This problem can be solved as below.
def lengthOfLongestSublist(s):
    result = 0
    # dictionary that maps each item in s to the index where it was last seen
    d = {}
    i = 0
    j = 0
    while j < len(s):
        # if s[j] is already in the dictionary, move the window start i
        # to just past its previous occurrence (never backwards)
        if s[j] in d:
            i = max(d[s[j]] + 1, i)
        # on every iteration, compare the current window length to the best so far
        result = max(result, j - i + 1)
        # store s[j] as key and the index of s[j] as value
        d[s[j]] = j
        j = j + 1
    return result
print(lengthOfLongestSublist([8,8,8,8,8,5,6,7,8,8,8,8,]))
Output: 4
Set up a dictionary that stores each value of the input list as a key and its index in the list as the value: d[s[j]] = j.
In the loop, check whether the current value already exists in the dictionary. If it does, move the start point i to just past the previous occurrence.
Update result.
The complexity is O(n).

Alternate letters in a string - code not working

I am trying to make a string alternate between upper and lower case letters. My current code is this:
def skyline (str1):
    result = ''
    index = 0
    for i in str1:
        result += str1[index].upper() + str1[index + 1].lower()
        index += 2
    return result
When I run the above code I get an error saying String index out of range. How can I fix this?

One way is to use join with enumerate:
s = 'asdfghjkl'
''.join(v.upper() if i%2==0 else v.lower() for i, v in enumerate(s))
#'AsDfGhJkL'

This is the way I would rewrite your logic:
from itertools import islice, zip_longest
def skyline(str1):
    result = ''
    for i, j in zip_longest(str1[::2], islice(str1, 1, None, 2), fillvalue=''):
        result += i.upper() + j.lower()
    return result
res = skyline('hello')
'HeLlO'
Explanation
Use itertools.zip_longest to iterate chunks of your string.
Use itertools.islice to extract every second character without building a separate string.
Now just iterate through your zipped iterable and append as before.
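To see what the loop actually iterates over, here is a small sketch (the variable name pairs is only for illustration) that materialises the zipped pairs for 'hello':

```python
from itertools import islice, zip_longest

word = 'hello'
# even-indexed characters paired with odd-indexed ones;
# fillvalue='' pads the shorter side when the length is odd
pairs = list(zip_longest(word[::2], islice(word, 1, None, 2), fillvalue=''))
print(pairs)  # [('h', 'e'), ('l', 'l'), ('o', '')]
```

The last pair carries an empty string, which is why calling .lower() on it adds nothing and no index error can occur.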

Try for i in range(len(str1)): and substitute index for i in the code. After, you could do
if i % 2 == 0: result += str1[i].upper()
else: result += str1[i].lower()

For every character in your input string, you are incrementing the index by 2. That's why you are going out of bounds.
Try using length of string for that purpose.
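One possible way to apply this, as a sketch rather than the only fix: drive the loop from the string's length in steps of 2, and check that the second letter of each pair exists before reading it.

```python
def skyline(str1):
    result = ''
    # visit indices 0, 2, 4, ... so each pass handles one pair of letters
    for index in range(0, len(str1), 2):
        result += str1[index].upper()
        # an odd-length string has no partner for its last letter
        if index + 1 < len(str1):
            result += str1[index + 1].lower()
    return result


print(skyline('hello'))  # HeLlO
```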

You do not check whether your index is still inside the string.
You need a condition that handles only even values of i (i % 2 == 0, which also covers i == 0 for the first character) and that verifies i + 1 is still smaller than the length of the string before reading the next letter.
With i % 2 == 0, upper() is applied to every second letter:
for i, __ in enumerate(str1):
    if i % 2 == 0:
        result += str1[i].upper()
        if i + 1 < len(str1):
            result += str1[i + 1].lower()

I tried to change as little as possible in your code, so that it is easy to follow. I just added a for loop with a step of 2 so that you don't end up with an index out of range. The final character of an odd-length string is handled separately.
def skyline (str1):
    result = ''
    length = len(str1)
    for index in range(0, length - 1, 2):
        result += str1[index].upper() + str1[index + 1].lower()
    if length % 2 == 1:
        result += str1[length - 1].upper()
    return result

You can use the following code:
def myfunc(str1):
    result=''
    for i in range(0,len(str1)):
        if i % 2 == 0:
            result += str1[i].upper()
        else:
            result += str1[i].lower()
    return result

In your code you take two letters at a time, so the loop should run only half as many times as the string has letters. Create a variable such as peak, set it to the length of your input string, then use peak = int(peak/2). That will solve your problem.
def func(name):
    counter1 = 0
    counter2 = 1
    string = ''
    peak = len(name)
    peak = int(peak/2)  # number of complete two-letter pairs
    for letter in range(1,peak+1):
        string += name[counter1].lower() + name[counter2].upper()
        counter1 +=2
        counter2 +=2
    if len(name) % 2 == 1:
        string += name[-1].lower()  # leftover letter of an odd-length string
    return string

How to return alphabetical substrings?

I'm trying to write a function that takes a string s as an input and returns a list of those substrings within s that are alphabetical. For example, s = 'acegibdh' should return ['acegi', 'bdh'].
Here's the code I've come up with:
s = 'acegibdh'
ans = []
subs = []
i = 0
while i != len(s) - 1:
    while s[i] < s[i+1]:
        subs.append(s[i])
        i += 1
    if s[i] > s[i-1]:
        subs.append(s[i])
        i += 1
    subs = ''.join(subs)
    ans.append(subs)
    subs = []
print(ans)
It keeps having trouble with the last letter of the string, because of the i+1 test going beyond the index range. I've spent a long time tinkering with it to try and come up with a way to avoid that problem. Does anyone know how to do this?

Why not hard-code the first letter into ans, and then just work with the rest of the string? You can just iterate over the string itself instead of using indices.
>>> s = 'acegibdh'
>>> ans = []
>>> ans.append(s[0])
>>> for letter in s[1:]:
...     if letter >= ans[-1][-1]:
...         ans[-1] += letter
...     else:
...         ans.append(letter)
...
>>> ans
['acegi', 'bdh']

s = 'acegibdh'
ans = []
subs = []
subs.append(s[0])
for x in range(len(s)-1):
    if s[x] <= s[x+1]:
        subs.append(s[x+1])
    if s[x] > s[x+1]:
        subs = ''.join(subs)
        ans.append(subs)
        subs = []
        subs.append(s[x+1])
subs = ''.join(subs)
ans.append(subs)
print(ans)
I decided to change your code a bit. Let me know if you have any questions.

Just for fun, a one line solution.
>>> s='acegibdh'
>>> [s[l:r] for l,r in (lambda seq:zip(seq,seq[1:]))([0]+[idx+1 for idx in range(len(s)-1) if s[idx]>s[idx+1]]+[len(s)])]
['acegi', 'bdh']

You should try to avoid loops that increment the position by more than one char per iteration.
Often it is clearer to introduce an additional variable to store information about the previous state:
s = 'acegibdh'
prev = None
ans = []
subs = []
for ch in s:
    if prev is None or ch > prev:
        subs.append(ch)
    else:
        ans.append(''.join(subs))
        subs = [ch]
    prev = ch
ans.append(''.join(subs))
I think this reads more straightforwardly: if there is no previous character, or the current char is still in alphabetical order after it, append; otherwise start a new substring. You also can't get index out of range problems with this approach.

More than one while loop is overkill. I think this is simpler and satisfies your requirement. Note, this fails on empty string.
s = 'acegibdh'
ans = []
current = str(s[0])
i = 1
while i < len(s):
    if s[i] > s[i-1]:
        current += s[i]
    else:
        ans.append(current)
        current = s[i]  # start the next substring with the current letter
    i += 1
if current != '':
    ans.append(current)
print(ans)

Just for fun, because I like doing things a little differently sometimes:
from itertools import groupby
def my_gen(s):
    # group neighbouring pairs by whether they are in ascending order
    for k, v in groupby(zip(s, s[1:]), lambda x: x[0] < x[1]):
        if k:
            # list() is needed because zip returns an iterator in Python 3
            v = list(zip(*v))
            yield v[0] + (v[1][-1],)
print(list(my_gen('acegibdhabcdefghijk')))

Some of the solutions posted have an index error for empty strings.
Also, instead of keeping a list of characters, or doing repeated string concatenations, you can track the start index, i, of a solution substring and yield s[i:j] where s[j] < s[j-1], then set i to j.
Generator that yields substrings when the next letter is lexicographically less than the previous:
def alpha_subs(s):
    i, j = 0, 1
    while j < len(s):
        if s[j] < s[j-1]:
            yield s[i:j]
            i = j
        j += 1
    if s[i:j]:
        yield s[i:j]
print(list(alpha_subs('')))
print(list(alpha_subs('acegibdh')))
print(list(alpha_subs('acegibdha')))
[]
['acegi', 'bdh']
['acegi', 'bdh', 'a']
For case insensitivity:
def alpha_subs(s, ignore_case=False):
    qs = s.lower() if ignore_case else s
    i, j = 0, 1
    while j < len(s):
        if qs[j] < qs[j-1]:
            yield s[i:j]
            i = j
        j += 1
    if s[i:j]:
        yield s[i:j]
print(list(alpha_subs('acEgibDh', True)))
print(list(alpha_subs('acEgibDh')))
['acEgi', 'bDh']
['ac', 'Egi', 'b', 'Dh']

Python symbol comparison

I have st = 'aaaabbcaa'. My task: when characters repeat in the string, I must write each character followed by the number of times it repeats in a row.
My code (but it doesn't work):
st = "aaaabbcaa"
cnt = 0
cnt2 = 0
cnt3 = 0
j = len(st)
i = 0
while i < j:
    if st[i] == st[i - 1]:
        cnt += 1
        print("a" + str(cnt), end="")
    elif st[i] == st[i - 1]:
        cnt2 += 1
        print("b" + str(cnt2), end="")
    elif st[i] == st[i - 1]:
        cnt3 += 1
        print("c" + str(cnt3), end="")
    i += 1
Sample Input 1: aaaabbcaa
Sample Output 1: a4b2c1a2
Sample Input 2: abc
Sample Output 2: a1b1c1

This looks like a task for itertools.groupby.
from itertools import groupby
data = 'aaaabbcaa'
compressed = ''.join('{}{}'.format(key, len(list(group))) for key, group in groupby(data))
print(compressed)
Result
a4b2c1a2
This might help to understand what's happening here.
data = 'aaaabbcaa'
for key, group in groupby(data):
    print(key, len(list(group)))
Result
a 4
b 2
c 1
a 2

You've got three problems with your code.
First, as gnibbler points out, all of your if/elif conditions are the same. And you don't need a separate condition for each letter, you just need to print the variable (like st[i]) instead of a literal (like "a").
Second, you're trying to print out the current run length for each character in the run, instead of after the entire run. So, if you get this working, instead of a4b2c1a2 you're going to get a1a2a3a4b1b2c1a1a2. You need to keep track of the current run length for each character in the run, but then only print it out when you get to a different character.
Finally, you've got two off-by-one errors. First, when i starts at 0, st[i - 1] is st[-1], which is the last character; you don't want to compare with that. Second, when i finally gets to j-1 at the end, you've got a leftover run that you need to deal with.
So, the smallest change to your code is:
st = "aaaabbcaa"
cnt = 0
j = len(st)
i = 0
while i < j:
    if i == 0 or st[i] == st[i - 1]:
        cnt += 1
    else:
        print(st[i - 1] + str(cnt), end="")
        cnt = 1
    i += 1
print(st[i - 1] + str(cnt))
As a side note, one really easy way to improve this: range(len(st)) gives you all the numbers from 0 up to but not including len(st), so you can get rid of j and the manual i loop and just use for i in range(len(st)):.
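A sketch of that side note, under the assumption that collecting the pieces in a list and returning a string is acceptable in place of printing (the function name run_lengths is made up for illustration):

```python
def run_lengths(st):
    pieces = []
    cnt = 0
    # range(len(st)) replaces the manual i/j bookkeeping
    for i in range(len(st)):
        if i == 0 or st[i] == st[i - 1]:
            cnt += 1
        else:
            # the run that just ended belongs to the previous character
            pieces.append(st[i - 1] + str(cnt))
            cnt = 1
    if st:
        # flush the leftover run at the end of the string
        pieces.append(st[-1] + str(cnt))
    return "".join(pieces)


print(run_lengths("aaaabbcaa"))  # a4b2c1a2
```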
But you can improve this even further by looping over an iterable of st[i], st[i-1] pairs; then you don't need the indexes at all. This is pretty easy with zip and slicing. And then you don't need the special handling for the edges either:
st = "aaaabbcaa"
cnt = 1
for current, previous in zip(st[1:]+" ", st):
    if current == previous:
        cnt += 1
    else:
        print(previous + str(cnt), end="")
        cnt = 1
I think Matthias's groupby solution is more pythonic, and simpler (there are still a lot of things you could get wrong with this, like starting with cnt = 0), but this should be mostly understandable to a novice out of the box. (If you don't understand the zip(st[1:]+" ", st), try printing out st[1:], list(zip(st[1:], st)), and list(zip(st[1:]+" ", st)) and it should be clearer.)
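For instance, with a shorter string (the value "aabc" is just an example), the padded zip looks like this:

```python
st = "aabc"
# each tuple is (current, previous); the trailing " " adds one final pair
# so the last run is flushed by the else branch
pairs = list(zip(st[1:] + " ", st))
print(pairs)  # [('a', 'a'), ('b', 'a'), ('c', 'b'), (' ', 'c')]
```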

This is kind of a silly way to go about it, but:
def encode(s):
    _lastch = s[0]
    out = []
    count = 0
    for ch in s:
        if ch == _lastch:
            count +=1
        else:
            out.append(_lastch + str(count))
            _lastch = ch
            count = 1
    out.append(_lastch + str(count))
    return ''.join(out)
Example
>>> st = "aaaabbcaa"
>>> encode(st)
'a4b2c1a2'
