StarKill riddle in Python - python

Riddle:
Return a version of the given string, where for every star (*) in the string the star and the chars immediately to its left and right are gone. So "ab*cd" yields "ad" and "ab**cd" also yields "ad".
I'm wondering if there's a pythonish way to improve this algorithm:
def starKill(string):
    result = ''
    for idx in range(len(string)):
        if(idx == 0 and string[idx] != '*'):
            result += string[idx]
        elif (idx > 0 and string[idx] != '*' and (string[idx-1]) != '*'):
            result += string[idx]
        elif (idx > 0 and string[idx] == '*' and (string[idx-1]) != '*'):
            result = result[0:len(result) - 1]
    return result
starKill("wacy*xko") yields wacko

Here's a numpy solution just for fun:
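import numpy as np

def star_kill(string, target='*'):
    arr = np.array(list(string))
    mask = arr != target       # True where the character is not a star
    mask[1:] &= mask[:-1]      # drop characters right after a star
    mask[:-1] &= mask[1:]      # drop characters right before a star
    arr = arr[mask]
    # Reinterpret the 1-character elements as a single string
    # (assumes at least one character survives).
    return arr.view(dtype=f'U{arr.size}').item()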

Regular expression?
>>> import re
>>> for s in "ab*cd", "ab**cd", "wacy*xko", "*Mad*Physicist*":
...     print(re.sub(r'\w?\*\w?', '', s))
...
ad
ad
wacko
ahysicis

You can do this by iterating over the string three times in parallel. Each iteration will be shifted relative to the next by one character. The middle one is the one that will provide the valid letters; the other two let us check if adjacent characters are stars. The two flanking iterators require dummy values to represent "before the start" and "after the end" of the string. There are a variety of ways to set that up; I'm using itertools.chain (and itertools.islice) to fill in None for the dummy values. But you could use plain string and iterator manipulation if you prefer (i.e. iter('x' + string) and iter(string[1:] + 'x')):
import itertools

def star_kill(string):
    main_iterator = iter(string)
    look_behind = itertools.chain([None], string)
    look_ahead = itertools.chain(itertools.islice(string, 1, None), [None])
    return "".join(a for a, b, c in zip(main_iterator, look_behind, look_ahead)
                   if a != '*' and b != '*' and c != '*')

Not sure whether or not it's "Pythonic," but the problem can be solved with regular expressions.
import re

def starkill(s):
    s = re.sub(".{0,1}\\*{1,}.{0,1}", "", s)
    return s
For those not familiar with regex, I'll break that long string down:
Prefix
".{0,1}"
This specifies we want the replaced section to begin with either 0 or 1 of any character. If there is a character before the star, we want to replace it; otherwise, we still want the expression to hit if the star is at the very beginning of the input string.
Star
"\\*{1,}"
This specifies that the middle of the expression must contain an asterisk character, but it can also contain more than one. For instance, "a****b" will still hit, even though there are four stars. We need a backslash before the asterisk because regex has asterisk as a reserved character, and we need a second backslash before that because Python strings reserve the backslash character.
Suffix
".{0,1}"
Same as the prefix. The expression can either end with one or zero of any character.
Hope that helps!
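As a side note, the doubled backslash can be avoided with a raw string, and .{0,1} and {1,} have the shorthands ? and +. Below is a minimal sketch of the same pattern written that way. The name starkill_raw is only illustrative, and the re.DOTALL flag is an assumption added here so that a newline next to a star is removed as well.

```python
import re

# Same idea as above: optional char, one or more stars, optional char.
STAR_RUN = re.compile(r".?\*+.?", re.DOTALL)

def starkill_raw(s):
    """Remove every run of stars together with the characters touching it."""
    return STAR_RUN.sub("", s)

for sample in ("ab*cd", "ab**cd", "wacy*xko", "*Mad*Physicist*"):
    print(sample, "->", starkill_raw(sample))
```

Compiling the pattern once is optional; it mainly keeps the pattern in one named place.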

Related

Deciding whether a string is a palindrome

This is a Python question. The answer should run in O(n) time and use no additional memory. As input I get a string that should be classified as a palindrome or not (a palindrome is a word or phrase that reads the same from left to right and from right to left, e.g. "level"). The input can contain punctuation marks and gaps between words.
For example: "I. did,,, did I????" The main goal is to decide whether the input is a palindrome.
When I tried to solve this I faced several challenges. When I try to delete the non-letter characters
for element in string:
    if ord(element) not in range(97, 122):
        string.remove(element)
    if ord(element) == 32:
        string.remove(element)
this takes O(n^2) time, because for every element in the string I call remove, which is itself O(n), where n is the length of the list. I need help removing the non-letter characters in O(n).
Also, once the spaces and punctuation marks are gone I know how to check whether a word is a palindrome, but my method uses additional memory.
Here is your O(n) solution without creating a new string:
def is_palindrome(string):
    left = 0
    right = len(string) - 1
    while left < right:
        if not string[left].isalpha():
            left += 1
            continue
        if not string[right].isalpha():
            right -= 1
            continue
        if string[left] != string[right]:
            return False
        left += 1
        right -= 1
    return True
print(is_palindrome("I. did,,, did I????"))
Output:
True
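Note that the comparison above is case-sensitive; the example passes because both ends happen to be a capital I. If mixed case should be ignored, the same two-pointer idea can compare lowercased characters. This is a sketch of that variant, not part of the answer above, and the function name is made up for illustration.

```python
def is_palindrome_ignore_case(text):
    """Two-pointer palindrome check that skips non-letters and ignores case."""
    left, right = 0, len(text) - 1
    while left < right:
        if not text[left].isalpha():
            left += 1                      # skip punctuation/space on the left
        elif not text[right].isalpha():
            right -= 1                     # skip punctuation/space on the right
        elif text[left].lower() != text[right].lower():
            return False
        else:
            left += 1
            right -= 1
    return True

print(is_palindrome_ignore_case("A man, a plan, a canal: Panama"))
```

It still runs in O(n) time and allocates no new string.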
I'm assuming you mean you want to test if a string is a palindrome once all punctuation and digits are removed from the string. In that case, the following code should suffice:
from string import ascii_letters

def is_palindrome(s):
    s = ''.join(c for c in s if c in ascii_letters)
    return s == s[::-1]

# some test cases:
print(is_palindrome('hello')) # False
print(is_palindrome('ra_ceca232r')) # True
Here's a one-liner using assignment expression syntax (Python 3.8+):
>>> s = "I. did,,, did I????"
>>> (n := [c.lower() for c in s if c.isalpha()]) == n[::-1]
True
I mostly showed the above as a demonstration; for readability's sake I'd recommend something more like SimonR's solution (although still using isalpha over comparing to ascii_letters).
Alternatively, you can use generator expressions to do the same comparison without allocating O(n) extra memory:
def is_palindrome(s):
    forward = (c.lower() for c in s if c.isalpha())
    back = (c.lower() for c in reversed(s) if c.isalpha())
    return all(a == b for a, b in zip(forward, back))
Note that zip still allocates a full list in Python 2; you'll need to use itertools.izip there.
Will this help?
word = input('Input your word: ')
word1 = ''
for l in word:
    if l.isalnum():
        word1 += l
word2 = ''
for index in sorted(range(len(word1)), reverse=True):
    word2 += word1[index]
if word1 == word2:
    print('It is a palindrome.')
else:
    print('It is not a palindrome.')

partition a string by dash (-) python

I want to get a string and divide it into parts separated by "-".
Input:
aabbcc
And output:
aa-bb-cc
Is there a way to do so?
If you want to do it based on the same letter then you can use itertools.groupby() to do this, e.g.:
In []:
import itertools as it
s = 'aabbcc'
'-'.join(''.join(g) for k, g in it.groupby(s))
Out[]:
'aa-bb-cc'
Or if you want it in chunks of 2 you can use iter() and zip():
In []:
n = 2
'-'.join(''.join(p) for p in zip(*[iter(s)]*n))
Out[]:
'aa-bb-cc'
Note: if the string length is not divisible by n, zip drops the leftover characters at the end. You can replace zip(...) with itertools.zip_longest(..., fillvalue='') to keep them, but it is unclear whether the OP has this issue.
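For illustration, here is a small sketch of the zip_longest variant mentioned in the note. The helper name chunk_join and its parameters are hypothetical.

```python
from itertools import zip_longest

def chunk_join(s, n=2, sep="-"):
    """Join consecutive n-character chunks of s with sep, keeping a short tail."""
    # Repeating the same iterator n times makes zip_longest pull n items per tuple.
    chunks = zip_longest(*[iter(s)] * n, fillvalue="")
    return sep.join("".join(chunk) for chunk in chunks)

print(chunk_join("aabbcc"))  # aa-bb-cc
print(chunk_join("aabbc"))   # aa-bb-c
```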
If you want to create pairs divided by a dash, you can use the function below:
def pair_div(string):
    newString = str()  # for storing the divided string
    for i, s in enumerate(string):
        # Add a dash after every second character, but not after the last character of the string.
        if i % 2 != 0 and i < (len(string) - 1):
            newString += s + '-'  # second member of a pair: add a dash after it
        else:
            newString += s  # otherwise just add the character
    return newString
And for example:
[In]:string="aazzxxcceewwqqbbvvaa"
[Out]:'aa-zz-xx-cc-ee-ww-qq-bb-vv-aa'
But if you want to group identical characters and separate the groups with a dash, you are better off using regex methods.
BR,
Shend
You can try
data = "aabbcc"
"-".join([data[x:x+2] for x in range(0, len(data), 2)])
If you want to divide the string into blocks of 2 characters, then this will help you.
import textwrap
s='aabbcc'
lst=textwrap.wrap(s,2)
print('-'.join(lst))
The second argument sets the maximum number of characters in each group (this works as long as the string contains no whitespace, since textwrap breaks lines on whitespace).
s = 'aabbccdd'
#index 01234567
new_s = ''
1)
for idx, char in enumerate(s):
    new_s += char
    if idx % 2 != 0:
        new_s += '-'
print(new_s.strip('-'))
# aa-bb-cc-dd
2)
new_s = ''.join([s[i]+'-' if i%2 != 0 else s[i] for i in range(len(s))]).strip('-')
print(new_s)
# aa-bb-cc-dd

Remove punctuation items from end of string

I have a seemingly simple problem, which I cannot seem to solve. Given a string containing a DOI, I need to keep removing the last character while it is a punctuation mark, until the last character is a letter or number.
For example, if the string was:
sampleDoi = "10.1097/JHM-D-18-00044.',"
I want the following output:
"10.1097/JHM-D-18-00044"
ie. remove .',
I wrote the following script to do this:
invalidChars = set(string.punctuation.replace("_", ""))
a = "10.1097/JHM-D-18-00044.',"
i = -1
for each in reversed(a):
    if any(char in invalidChars for char in each):
        a = a[:i]
        i = i - 1
    else:
        print (a)
        break
However, this produces 10.1097/JHM-D-18-00 but I would like it to produce 10.1097/JHM-D-18-00044. Why is the 44 removed from the end?
The string function rstrip() is designed to do exactly this:
>>> sampleDoi = "10.1097/JHM-D-18-00044.',"
>>> sampleDoi.rstrip(",.'")
'10.1097/JHM-D-18-00044'
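To strip any trailing punctuation rather than only those three characters, rstrip() can be given the same character set the question builds (all punctuation except the underscore). This is a sketch under that assumption; the function name is only illustrative.

```python
import string

# Characters to strip from the end; the underscore is kept, as in the question.
INVALID = string.punctuation.replace("_", "")

def strip_trailing_punctuation(text):
    """Remove every trailing character that is in INVALID."""
    return text.rstrip(INVALID)

print(strip_trailing_punctuation("10.1097/JHM-D-18-00044.',"))
```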
Corrected code:
import string
invalidChars = set(string.punctuation.replace("_", ""))
a = "10.1097/JHM-D-18-00044.',"
i = -1
for each in reversed(a):
    if any(char in invalidChars for char in each):
        a = a[:i]
        i = i  # well, really this line can just be removed altogether
    else:
        print (a)
        break
This gives the output you want, while keeping the original code mostly the same.
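To see why the question's loop ate the trailing 44: reversed(a) keeps iterating over the original string, while a = a[:i] cuts abs(i) characters off the already shortened string. The three passes therefore remove 1, 2 and 3 characters (six in total) instead of one each. The sketch below replays that loop and records each intermediate string; the name trace_original is made up for illustration.

```python
import string

INVALID = set(string.punctuation.replace("_", ""))  # same set as in the question

def trace_original(a):
    """Replay the question's loop and return the string after each cut."""
    steps = []
    i = -1
    for ch in reversed(a):   # walks the ORIGINAL string, not the shortened one
        if ch not in INVALID:
            break
        a = a[:i]            # cuts abs(i) characters: 1, then 2, then 3, ...
        i -= 1
        steps.append(a)
    return steps

for step in trace_original("10.1097/JHM-D-18-00044.',"):
    print(step)
```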
This is one way using next and str.isalnum with a generator expression utilizing enumerate / reversed.
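sampleDoi = "10.1097/JHM-D-18-00044.',"
n = len(sampleDoi)
idx = next((i for i, j in enumerate(reversed(sampleDoi)) if j.isalnum()), n)
res = sampleDoi[:n - idx]
print(res)
10.1097/JHM-D-18-00044
The default value n is used so that, if no alphanumeric character is found, an empty string is returned. Slicing with n - idx rather than -idx also keeps the string intact when it already ends in an alphanumeric character (idx == 0), where sampleDoi[:-0] would be empty.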
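If you don't want to use regex:
import string
the_str = "10.1097/JHM-D-18-00044.',"
while the_str and the_str[-1] in string.punctuation:
    the_str = the_str[:-1]
This removes the last character until it is no longer a punctuation character (the extra the_str check stops the loop if the string becomes empty).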

Splitting a number pattern

I want to know how to split a string like
44664212666666 into [44664212, 666666] or
58834888888888 into [58834, 888888888]
without knowing where the final run of the recurring digit starts.
So passing it to a function, say separate(str) --> [non_recurring_part, end_recurring_digits]
print(re.findall(r'^(.+?)((.)\3+)$', '446642126666')[0][:-1]) # ('44664212', '6666')
As pointed out in the comments, the last group should be made optional to handle strings with no repeated symbols correctly:
print(re.findall(r'^(.+?)((.)\3+)?$', '12333')[0][:-1]) # ('12', '333')
print(re.findall(r'^(.+?)((.)\3+)?$', '123')[0][:-1]) # ('123', '')
Same answer as Justin:
>>> s = '44664212666666'
>>> for i in range(len(s) - 1, 0, -1):
...     if s[i] != s[-1]:
...         break
...
>>> non_recurring_part, end_recurring_digits = s[:i + 1], s[i + 1:]
>>> non_recurring_part, end_recurring_digits
('44664212', '666666')
Here is a non-regex answer that deals with cases when there are no repeating digits.
def separate(s):
    last = s[-1]
    t = s.rstrip(last)
    if len(t) + 1 == len(s):
        return (s, '')
    else:
        return t, last * (len(s) - len(t))
Examples:
>>> separate('123444')
('123', '444')
>>> separate('1234')
('1234', '')
>>> separate('11111')
('', '11111')
Can't you just scan from the last character to the first character and stop when the next char doesn't equal the previous. Then split at that index.
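The scan described above can be sketched as follows. The function name split_trailing_run is hypothetical, and unlike some of the other answers this version treats a single trailing character as a run of length one.

```python
def split_trailing_run(s):
    """Split s into (prefix, trailing run of the repeated last character)."""
    if not s:
        return "", ""
    i = len(s) - 1
    # Walk left while the previous character still equals the last one.
    while i > 0 and s[i - 1] == s[-1]:
        i -= 1
    return s[:i], s[i:]

print(split_trailing_run("44664212666666"))
print(split_trailing_run("58834888888888"))
```

The scan touches each trailing character once, so it is linear in the length of the run.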
import re

def separate(n):
    s = str(n)
    return re.match(r'^(.*?)((.)\3*)$', s).groups()
def seperate(s):
    return re.findall('^(.+?)('+s[-1]+'+)$',s)
>>> import re
>>> m = re.match(r'(.*?)((.)\3+)$', '1233333')
>>> print(list(m.groups())[:2])
['12', '33333']
Here you use regular expressions. The last part of the pattern, ((.)\3+)$, says that the same digit must be repeated until the end of the string, and everything before it is the first part of the string. m.groups() returns a tuple of the strings matched by the parenthesized groups of the pattern: element 0 contains the first part and element 1 contains the second part. The third element is not needed, so we can just ignore it.
Another important point is the ? in .*?. It requests a non-greedy search, which means the match switches to the second part of the pattern as soon as possible.
Start iterating from the end towards the first digit and find the position where the character changes; that position is the split point. If that index is i, the result is {substring [0, i), substring [i, size)}, which solves your problem.
int pos=0;
String str="ABCDEF";
for (int i = str.length()-1; i > 0; i--)
{
    if(str.charAt(i) != str.charAt(i-1))
    {
        pos=i;
        break;
    }
}
String sub1=str.substring(0, pos);
String sub2=str.substring(pos);

String manipulation weirdness when incrementing trailing digit

I got this code:
import re
myString = 'blabla123_01_version6688_01_01Long_stringWithNumbers'
versionSplit = re.findall(r'-?\d+|[a-zA-Z!##$%^&*()_+.,<>{}]+|\W+?', myString)
for i in reversed(versionSplit):
    id = versionSplit.index(i)
    if i.isdigit():
        digit = '%0'+str(len(i))+'d'
        i = int(i) + 1
        i = digit % i
        versionSplit[id]=str(i)
        break
final = ''
myString = final.join(versionSplit)
print(myString)
This is supposed to increase ONLY the last number in the given string. But if you run the code, you will see that when the same number as the last one appears elsewhere in the string, those occurrences get increased one after the other as you keep running the script. Can anyone help me find out why?
Thank you in advance for any help.
Is there a reason why you aren't doing something like this instead:
prefix, version = re.match(r"(.*[^\d]+)([\d]+)$", myString).groups()
newstring = prefix + str(int(version)+1).rjust(len(version), '0')
Notes:
This will actually "carry over" the version numbers properly: ("09" -> "10") and ("99" -> "100")
This regex assumes at least one non-numeric character before the final version substring at the end. If this is not matched, it will throw an AttributeError. You could restructure it to throw a more suitable or specific exception (e.g. if re.match(...) returns None; see comments below for more info).
Adjust accordingly.
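As one possible way to "restructure it to throw a more suitable exception", the sketch below checks for a failed match explicitly. Note that it swaps in a simpler pattern (only the trailing digits, via re.search) rather than the regex above, so it also accepts strings that consist of digits only. The function name and the choice of ValueError are assumptions made for illustration.

```python
import re

def bump_trailing_number(text):
    """Increment the digits at the very end of text, keeping zero padding."""
    match = re.search(r"(\d+)$", text)
    if match is None:
        # No digits at the end: fail with a clearer error than AttributeError.
        raise ValueError("no trailing digits in %r" % (text,))
    digits = match.group(1)
    bumped = str(int(digits) + 1).zfill(len(digits))
    return text[:match.start()] + bumped

print(bump_trailing_number("file_09"))
```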
The issue is the use of the list.index() function on line 5. This returns the index of the first occurrence of a value in a list, from left to right, but the code is iterating over the reversed list (right to left). There are lots of ways to straighten this out, but here's one that makes the fewest changes to your existing code: Iterate over indices in reverse (avoids reversing the list).
for idx in range(len(versionSplit)-1, -1, -1):
    i = versionSplit[idx]
    if i.isdigit():
        digit = '%0'+str(len(i))+'d'
        i = int(i) + 1
        i = digit % i
        versionSplit[idx]=str(i)
        break
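The list.index() behaviour described above can be seen in isolation with a tiny made-up list (the values are illustrative, not taken from the question's string):

```python
parts = ["v", "01", "_", "01"]

# list.index() always reports the first match, even when we mean the last element.
first = parts.index(parts[-1])

# Searching a reversed copy and converting the position back gives the last match.
last = len(parts) - 1 - parts[::-1].index(parts[-1])

print(first, last)
```

Iterating over indices directly, as in the loop above, avoids the lookup altogether.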
import re

myString = 'blabla123_01_version6688_01_01veryLong_stringWithNumbers01'
versionSplit = re.findall(r'-?\d+|[^\-\d]+', myString)
for i in range(len(versionSplit) - 1, -1, -1):
    s = versionSplit[i]
    if s.isdigit():
        n = int(s) + 1
        versionSplit[i] = "%0*d" % (len(s), n)
        break
myString = ''.join(versionSplit)
print(myString)
Notes:
It is silly to use the .index() method to try to find the string. Just use a decrementing index to try each part of versionSplit. This was where your problem was, as commented above by @David Robinson.
Don't use id as a variable name; you are covering up the built-in function id().
This code is using the * in a format template, which will accept an integer and set the width.
I simplified the pattern: either you are matching a digit (with optional leading minus sign) or else you are matching non-digits.
I tested this and it seems to work.
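The * width specifier mentioned in the notes can be tried on its own. This is a small sketch; the helper name bump_padded is hypothetical.

```python
def bump_padded(number_text):
    """Add 1 to a zero-padded number string, preserving its width."""
    width = len(number_text)
    # '*' takes the field width from the argument tuple; '0' pads with zeros.
    return "%0*d" % (width, int(number_text) + 1)

print(bump_padded("01"))    # 02
print(bump_padded("0099"))  # 0100
print(bump_padded("99"))    # 100
```

The width is a minimum, so a carry such as 99 to 100 simply produces a longer string.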
First, three notes:
id is the name of a Python built-in function (not a reserved word), so it is best not to reuse it as a variable name;
For joining, a more pythonic idiom is ''.join(), using a literal empty string
reversed() returns an iterator, not a list. That's why I use list(reversed()), in order to do rev.index(i) later.
Corrected code:
import re
myString = 'blabla123_01_version6688_01_01veryLong_stringWithNumbers01'
print(myString)
versionSplit = re.findall(r'-?\d+|[a-zA-Z!##$%^&*()_+.,<>{}]+|\W+?', myString)
rev = list(reversed(versionSplit)) # create a reversed list to work with from now on
for i in rev:
    idd = rev.index(i)
    if i.isdigit():
        digit = '%0'+str(len(i))+'d'
        i = int(i) + 1
        i = digit % i
        rev[idd]=str(i)
        break
myString = ''.join(reversed(rev)) # reverse again only just before joining
print(myString)
