Remove punctuation items from end of string - python

I have a seemingly simple problem that I cannot seem to solve. Given a string containing a DOI, I need to keep removing the last character while it is a punctuation mark, until the last character is a letter or a number.
For example, if the string was:
sampleDoi = "10.1097/JHM-D-18-00044.',"
I want the following output:
"10.1097/JHM-D-18-00044"
i.e. remove the trailing .',
I wrote the following script to do this:
import string
invalidChars = set(string.punctuation.replace("_", ""))
a = "10.1097/JHM-D-18-00044.',"
i = -1
for each in reversed(a):
    if any(char in invalidChars for char in each):
        a = a[:i]
        i = i - 1
    else:
        print(a)
        break
However, this produces 10.1097/JHM-D-18-00 but I would like it to produce 10.1097/JHM-D-18-00044. Why is the 44 removed from the end?

The string function rstrip() is designed to do exactly this:
>>> sampleDoi = "10.1097/JHM-D-18-00044.',"
>>> sampleDoi.rstrip(",.'")
'10.1097/JHM-D-18-00044'
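To strip any trailing punctuation rather than a fixed set of three characters, the same call can be given the full punctuation set. A minimal sketch, assuming (as in the question) that underscores should be kept; the helper name strip_trailing_punctuation is made up for illustration:

```python
import string

# Same character set as the question: all ASCII punctuation except "_".
TRAILING = string.punctuation.replace("_", "")

def strip_trailing_punctuation(doi):
    """Remove every trailing character that appears in TRAILING."""
    return doi.rstrip(TRAILING)

print(strip_trailing_punctuation("10.1097/JHM-D-18-00044.',"))
# 10.1097/JHM-D-18-00044
```

rstrip treats its argument as a set of characters, not as a suffix, so the order of the characters in TRAILING does not matter.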

Corrected code:
import string
invalidChars = set(string.punctuation.replace("_", ""))
a = "10.1097/JHM-D-18-00044.',"
for each in reversed(a):
    if each in invalidChars:
        a = a[:-1]  # always drop exactly one trailing character
    else:
        print(a)
        break
This gives the output you want while keeping the original code mostly the same. The bug was the line i = i - 1: a is already shorter on every pass, so a[:i] with a decreasing i cuts off one, then two, then three characters. Slicing with a fixed -1 removes exactly one character per pass.
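The effect of the decreasing index in the question's loop can be replayed by hand. This sketch applies the same three slices the original loop performs before it reaches a digit (the list name steps is only there for illustration):

```python
a = "10.1097/JHM-D-18-00044.',"

# Replay the slices of the original loop: a[:-1], then a[:-2], then a[:-3].
steps = []
for i in (-1, -2, -3):
    a = a[:i]
    steps.append(a)

for step in steps:
    print(step)
# 10.1097/JHM-D-18-00044.'
# 10.1097/JHM-D-18-00044
# 10.1097/JHM-D-18-00
```

The second slice already produces the wanted result; the third one is what eats the trailing 044.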

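This is one way using next and str.isalnum with a generator expression utilizing enumerate / reversed.
sampleDoi = "10.1097/JHM-D-18-00044.',"
idx = next((i for i, j in enumerate(reversed(sampleDoi)) if j.isalnum()), len(sampleDoi))
res = sampleDoi[:len(sampleDoi) - idx]
print(res)
# 10.1097/JHM-D-18-00044
The default len(sampleDoi) is used so that, if no alphanumeric character is found, an empty string is returned. Slicing with len(sampleDoi) - idx rather than -idx also keeps the string intact when it already ends in an alphanumeric character (idx == 0), because sampleDoi[:-0] would be empty.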

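If you don't want to use regex:
import string
the_str = "10.1097/JHM-D-18-00044.',"
while the_str and the_str[-1] in string.punctuation:
    the_str = the_str[:-1]
This removes the last character until it is no longer a punctuation character. The extra the_str check stops the loop if the string becomes empty, where the_str[-1] would raise an IndexError.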
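For comparison, a regex version is short as well. This is only a sketch and not taken from the answers above; it assumes that anything other than an ASCII letter or digit at the end of the string should be dropped, which also covers whitespace and underscores:

```python
import re

def strip_trailing_non_alnum(text):
    """Drop the trailing run of characters that are not ASCII letters or digits."""
    return re.sub(r"[^0-9A-Za-z]+$", "", text)

print(strip_trailing_non_alnum("10.1097/JHM-D-18-00044.',"))
# 10.1097/JHM-D-18-00044
```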

Related

python string replace using for loop with if else

I am new to Python and am trying to replace parts of a string using a for loop with an if/else condition.
I have a string and want to replace its characters as follows: take the first character of the string and look it up in old_list. If it matches, replace it with the corresponding entry of new_list. If it does not match, combine it with the next character of the string, look the combined characters up in old_list again, and so on.
The pieces should be picked from the string in this order = 010,101,010,010,100,101,00,00,011,1101,011,00,101,010,00,011,1111,1110,00,00,00,010,101,010,
replacing value = 1001,0000,0000,1000,1111,1001,1111,1111,100,1010101011,100,1111,1001,0000,1111,100,10100101,101010,1111,1111,1111,0000,1001,
With the example string above, the result of this operation would be
final or result string = 10010000000010001111100111111111100101010101110011111001000011111001010010110101011111111111100001001
string = '01010101001010010100000111101011001010100001111111110000000010101010'
old_list = ['00','011','010','101','100','1010','1011','1101','1110','1111']
new_list = ['1111','100','0000','1001','1000','1111','0101','1010101011','101010','10100101']
i = 0
for i in range((old), 0):
    if i == old:
        my_str = my_str.replace(old[i],new[i], 0)
    else:
        i = i + 1
print(my_str)
As a result, the string becomes = 10010000000010001111100111111111100101010101110011111001000011111001010010110101011111111111100001001
new = ['a ','local ','is ']
my_str = 'anindianaregreat'
old = ['an','indian','are']
for i, string in enumerate(old):
    my_str = my_str.replace(string, new[i], 1)
print(my_str)
Your usage of range is incorrect: range goes from lower (inclusive) to higher (exclusive), or simply from 0 to higher (exclusive), and it takes integers, not a list.
Your i == old condition is incorrect as well (i is an integer, while old is a list). Also, what is it supposed to do?
You can simply do:
for old_str, new_str in zip(old, new):
    my_str = my_str.replace(old_str, new_str, 1)
https://docs.python.org/3/library/stdtypes.html#str.replace
You can pass a third argument to replace to specify how many occurrences to replace.
No conditional is required: if old_str is absent, nothing is replaced anyway.
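Note that repeated str.replace calls work on the whole string, while the question describes scanning it from left to right and growing a chunk until it matches an entry of old_list. A minimal sketch of that scan follows. The function name translate and the three-entry table are illustrative assumptions (three of the question's old/new pairs), and the table is assumed to be prefix-free, i.e. no key is the beginning of another key:

```python
def translate(bits, table):
    """Scan bits left to right, emitting table[chunk] as soon as chunk is a key."""
    out = []
    chunk = ""
    for ch in bits:
        chunk += ch
        if chunk in table:
            out.append(table[chunk])
            chunk = ""
    if chunk:
        # Leftover characters that never matched any key.
        raise ValueError("unmatched trailing input: " + chunk)
    return "".join(out)

# Hypothetical table built from three of the question's old/new pairs.
table = {"00": "1111", "011": "100", "010": "0000"}
print(translate("00010011", table))
# 11110000100
```

With the question's full old_list this shortest-match scan would never reach '1010' or '1011', because '101' is a prefix of both; how such overlaps should be resolved is not stated in the question.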

Python: Is there a way to find and remove the first and last occurrence of a character in a string?

The problem:
Given a string in which the letter h occurs at least twice.
Remove from that string the first and the last occurrence of
the letter h, as well as all the characters between them.
How do I find the first and last occurrence of h? And how can I remove them and the characters in between them?
#initialize the index of the input string
index_count =0
#create a list to have indexes of 'h's
h_indexes = []
#accept input strings
origin_s = input("input:")
#search 'h' and save the index of each 'h' (and save indexes of searched 'h's into h_indexes
for i in origin_s:
first_h_index =
last_h_index =
#print the output string
print("Output:"+origin_s[ : ]+origin_s[ :])
Using a combination of index, rindex and slicing:
string = 'abc$def$ghi'
char = '$'
print(string[:string.index(char)] + string[string.rindex(char) + 1:])
# abcghi
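You can also use a regex. The greedy .* spans from the first h to the last one and, unlike .+, still matches when the two h are adjacent:
>>> import re
>>> s = 'jusht exhamplhe'
>>> re.sub(r'h.*h', '', s)
'juse'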
How do I find the first and last occurrence of h?
First occurrence:
first_h_index = origin_s.find("h")
Last occurrence:
last_h_index = origin_s.rfind("h")
And how can I remove them and the characters in between them?
With slicing: origin_s[:first_h_index] + origin_s[last_h_index + 1:]
string = '1234-123456789'
char_list = list(string)  # one list item per character
char_list.remove('-')  # '-' stands in for the character to remove
According to the documentation, list.remove(x) acts on a mutable sequence (for example a list) and removes the first item equal to x; it raises a ValueError if there is no such item.
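A small sketch of that behaviour, using a made-up input string: only the first match is removed, and a missing value raises ValueError.

```python
chars = list("a-b-c")
chars.remove("-")          # only the first "-" is removed
print("".join(chars))
# ab-c

try:
    chars.remove("x")      # "x" is not in the list
except ValueError:
    print("not found")
# not found
```

This is why remove alone does not solve the question: it takes out a single character and leaves everything between the first and last occurrence in place.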
This will help you to understand more clearly:
string = 'abchdef$ghi'
first=string.find('h')
last=string.rfind('h')
res=string[:first]+string[last+1:]
print(res)
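The find / rfind approach can be wrapped in a small function. This is a sketch; the name remove_between is made up here, and returning the text unchanged when the character is absent is an assumption, since the problem statement guarantees at least two occurrences:

```python
def remove_between(text, ch):
    """Remove the first and last ch and everything between them."""
    first = text.find(ch)
    if first == -1:
        return text  # ch does not occur: nothing to remove
    last = text.rfind(ch)
    return text[:first] + text[last + 1:]

print(remove_between("jusht exhamplhe", "h"))
# juse
```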

Error while Trying To Print the First Occurrence of a repeating Character in a String using Python 3.6

I am writing a simple program to replace the repeated characters in a string with an * (asterisk). The first occurrence of each character should be kept, and only its later occurrences should be replaced.
For example,
if my input is Google, my output should be Go**le.
I am able to replace the characters that repeat with an asterisk, but I just can't find a way to keep the 1st occurrence of the character. In other words, my output right now is ****le.
Have a look at my Python3 code for this:
s = 'Google'
s = s.lower()
for i in s:
    if s.count(i)>1:
        s = s.replace(i,'*')
print(s)
Can someone suggest what should be done to get the required output?
replace will replace ALL occurrences of the character. You need to keep track of the characters you have already seen and, when one is repeated, replace JUST that character (at its specific index).
Strings don't support index assignment, so we can build a list that represents the new string and ''.join() it afterwards.
Using a set, you can keep track of the items you have already seen.
It would look like this:
s = 'Google'
seen = set()
new_string = []
for c in s:
    if c.lower() in seen:
        new_string.append('*')
    else:
        new_string.append(c)
        seen.add(c.lower())
new_string = ''.join(new_string)
print(new_string)
Go**le
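The same idea can be packaged as a reusable function. A sketch, with the function name mask_repeats and the mask parameter chosen here for illustration:

```python
def mask_repeats(text, mask="*"):
    """Keep the first occurrence of each character (case-insensitive), mask the rest."""
    seen = set()
    out = []
    for ch in text:
        key = ch.lower()
        out.append(mask if key in seen else ch)
        seen.add(key)
    return "".join(out)

print(mask_repeats("Google"))
# Go**le
print(mask_repeats("Mississippi"))
# Mis*****p**
```

Unlike approaches that look up only the second occurrence, this handles characters that repeat any number of times.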
This is my approach:
First, you need to find the nth occurrence of the character. Then, you can replace other occurrences by using this snippet:
s = s[:position] + '*' + s[position+1:]
Full example code:
def find_nth(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+len(needle))
        n -= 1
    return start
s = 'Google'
s_lower = s.lower()
for c in s_lower:
    if s_lower.count(c) > 1:
        position = find_nth(s_lower, c, 2)
        s = s[:position] + '*' + s[position+1:]
print(s)
Runnable link: https://repl.it/Mc4U/4
Regex approach:
import re
s = 'Google'
s_lower = s.lower()
for c in s_lower:
    if s_lower.count(c) > 1:
        # re.escape keeps the search literal if c is a regex metacharacter
        position = [m.start() for m in re.finditer(re.escape(c), s_lower)][1]
        s = s[:position] + '*' + s[position+1:]
print(s)
Runnable link: https://repl.it/Mc4U/3
How about using list comprehensions? When constructing a list from another list (which is kind of what you are doing here, since we're treating strings as lists), a list comprehension is a great tool:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
inputstring = 'Google'.lower()
outputstring = ''.join(
    [char if inputstring.find(char, 0, index) == -1 else '*'
     for index, char in enumerate(inputstring)])
print(outputstring)
This results in go**le.
Hope this helps!

Count spaces in text (treat consecutive spaces as one)

How would you count the number of spaces or newline characters in a text in such a way that consecutive spaces are counted only as one?
For example, this is very close to what I want:
string = "This is an example text.\n But would be good if it worked."
counter = 0
for i in string:
    if i == ' ' or i == '\n':
        counter += 1
print(counter)
However, instead of returning 15, the result should be only 11.
The default str.split() function will treat consecutive runs of spaces as one. So simply split the string, get the size of the resulting list, and subtract one.
len(string.split())-1
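One caveat with the split approach: split() discards leading and trailing whitespace, so it counts only the gaps between words. A small sketch with a made-up input that has whitespace at both ends, compared against a regex count of every whitespace run:

```python
import re

text = "  a  b \n c "

by_split = len(text.split()) - 1          # gaps between words only
by_regex = len(re.findall(r"\s+", text))  # every whitespace run

print(by_split, by_regex)
# 2 4
```

Which of the two numbers is "right" depends on whether leading and trailing runs should count, which the question does not say.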
Assuming you are permitted to use Python regex:
import re
print(len(re.findall(r"[ \n]+", string)))
Quick and easy!
UPDATE: Additionally, use \s instead of [ \n] to match any whitespace character.
You can do this:
string = "This is an example text.\n But would be good if it worked."
counter = 0
# A boolean flag indicating whether the previous character was a space
previous = False
for i in string:
    if i == ' ' or i == '\n':
        # The current character is a space
        previous = True # Setup for the next iteration
    else:
        # The current character is not a space, check if the previous one was
        if previous:
            counter += 1
        previous = False
print(counter)
re to the rescue.
>>> import re
>>> string = "This is an example text.\n But would be good if it worked."
>>> spaces = sum(1 for match in re.finditer(r'\s+', string))
>>> spaces
11
This consumes minimal memory. An alternative solution that builds a temporary list would be
>>> len(re.findall(r'\s+', string))
11
If you only want to consider space characters and newline characters (as opposed to tabs, for example), use the regex '(\n| )+' instead of '\s+'.
Just store a character that was the last character found. Set it to i each time you loop. Then within your inner if, do not increase the counter if the last character found was also a whitespace character.
You can iterate through numbers to use them as indexes.
counter = 0
for i in range(1, len(string)):
    if string[i] in ' \n' and string[i-1] not in ' \n':
        counter += 1
if string[0] in ' \n':
    counter += 1
print(counter)
Pay attention to the first character: this construction starts from the second character so that string[i-1] never wraps around to the end of the string, and the first character is checked separately.
You can use enumerate, checking the next char is not also whitespace so consecutive whitespace will only count as 1:
string = "This is an example text.\n But would be good if it worked."
print(sum(ch.isspace() and not string[i:i+1].isspace() for i, ch in enumerate(string, 1)))
You can also use iter with a generator function, keeping track of the last character and comparing:
def con(s):
    it = iter(s)
    prev = next(it, "")  # "" keeps an empty string from raising StopIteration
    for ele in it:
        yield prev.isspace() and not ele.isspace()
        prev = ele
    yield prev.isspace()  # prev is the last character here, even for a one-character string
print(sum(con(string)))
An itertools version:
string = "This is an example text.\n But would be good if it worked. "
from itertools import tee, zip_longest  # izip_longest in Python 2
cur, nxt = tee(string)
next(nxt)
print(sum(a.isspace() and not b.isspace() for a, b in zip_longest(cur, nxt, fillvalue="")))
Try:
def word_count(my_string):
    word_count = 1
    for i in range(1, len(my_string)):
        if my_string[i] == " ":
            if not my_string[i - 1] == " ":
                word_count += 1
    return word_count
You can use the function groupby() to find groups of consecutive spaces:
from collections import Counter
from itertools import groupby
s = 'This is an example text.\n But would be good if it worked.'
c = Counter(k for k, _ in groupby(s, key=lambda x: ' ' if x == '\n' else x))
print(c[' '])
# 11
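A variation on the groupby idea that treats every whitespace character alike, by grouping on str.isspace instead of mapping '\n' to ' '. This is a sketch, and the function name count_whitespace_runs is made up for illustration:

```python
from itertools import groupby

def count_whitespace_runs(text):
    """Count maximal runs of whitespace characters in text."""
    return sum(1 for is_space, _ in groupby(text, key=str.isspace) if is_space)

print(count_whitespace_runs("a  b \n c"))
# 2
```

groupby yields one group per run of equal key values, so each run of whitespace contributes exactly one group whose key is True.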

String manipulation weirdness when incrementing trailing digit

I have this code:
import re
myString = 'blabla123_01_version6688_01_01Long_stringWithNumbers'
versionSplit = re.findall(r'-?\d+|[a-zA-Z!@#$%^&*()_+.,<>{}]+|\W+?', myString)
for i in reversed(versionSplit):
    id = versionSplit.index(i)
    if i.isdigit():
        digit = '%0'+str(len(i))+'d'
        i = int(i) + 1
        i = digit % i
        versionSplit[id]=str(i)
        break
final = ''
myString = final.join(versionSplit)
print(myString)
It is supposed to increase ONLY the last number in the given string. But if you keep running the script, you will see that when the same number also appears earlier in the string, the earlier occurrences are incremented one after the other instead of the last one. Can anyone help me find out why?
Thank you in advance for any help
Is there a reason why you aren't doing something like this instead:
prefix, version = re.match(r"(.*[^\d]+)([\d]+)$", myString).groups()
newstring = prefix + str(int(version)+1).rjust(len(version), '0')
Notes:
This will actually "carry over" the version numbers properly: ("09" -> "10") and ("99" -> "100").
This regex assumes at least one non-numeric character before the final version substring at the end. If the string does not match, re.match(...) returns None and calling .groups() on it raises an AttributeError. You could check for None first and raise a more suitable or specific exception.
Adjust accordingly.
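The question's example string does not end in digits, so a pattern anchored on trailing digits would not match it. As a hedged alternative, the sketch below increments the last run of digits wherever it sits; the function name bump_last_number and the lookahead pattern are choices made here, not part of the answer above:

```python
import re

def bump_last_number(text):
    """Increment the last run of digits in text, keeping its zero padding."""
    def plus_one(match):
        digits = match.group(1)
        return str(int(digits) + 1).zfill(len(digits))
    # (\d+)(?=\D*$): a digit run followed only by non-digits up to the end.
    return re.sub(r"(\d+)(?=\D*$)", plus_one, text, count=1)

print(bump_last_number("blabla123_01_version6688_01_01Long_stringWithNumbers"))
# blabla123_01_version6688_01_02Long_stringWithNumbers
```

A string without any digits is returned unchanged, because re.sub leaves the text alone when nothing matches.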
The issue is the use of the list.index() function inside the loop. This returns the index of the first occurrence of a value in a list, from left to right, but the code is iterating over the reversed list (right to left). There are lots of ways to straighten this out, but here's one that makes the fewest changes to your existing code: iterate over indices in reverse (this also avoids reversing the list).
for idx in range(len(versionSplit)-1, -1, -1):
    i = versionSplit[idx]
    if i.isdigit():
        digit = '%0'+str(len(i))+'d'
        i = int(i) + 1
        i = digit % i
        versionSplit[idx]=str(i)
        break
import re
myString = 'blabla123_01_version6688_01_01veryLong_stringWithNumbers01'
versionSplit = re.findall(r'-?\d+|[^\-\d]+', myString)
for i in range(len(versionSplit) - 1, -1, -1):
    s = versionSplit[i]
    if s.isdigit():
        n = int(s) + 1
        versionSplit[i] = "%0*d" % (len(s), n)
        break
myString = ''.join(versionSplit)
print(myString)
Notes:
There is no need to use the .index() method to find the string again. Just use a decrementing index to try each part of versionSplit. This was where your problem was, as commented above by @David Robinson.
Don't use id as a variable name; you are covering up the built-in function id().
This code is using the * in a format template, which will accept an integer and set the width.
I simplified the pattern: either you are matching a digit (with optional leading minus sign) or else you are matching non-digits.
I tested this and it seems to work.
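The * width in a %-format template can be tried on its own. A minimal sketch; the helper name zero_pad is only for illustration:

```python
def zero_pad(number, width):
    """Format number with leading zeros to at least width characters."""
    # The * takes the width from the tuple, so "%0*d" % (2, 7) acts like "%02d" % 7.
    return "%0*d" % (width, number)

print(zero_pad(2, 2), zero_pad(10, 2), zero_pad(100, 2))
# 02 10 100
```

The width is a minimum, so a number that outgrows it (such as 99 + 1) is printed in full rather than truncated.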
First, three notes:
id is the name of a built-in function (not a reserved word), so it is better not to reuse it as a variable name;
For joining, a more pythonic idiom is ''.join(), using a literal empty string
reversed() returns an iterator, not a list. That's why I use list(reversed()), in order to do rev.index(i) later.
Corrected code:
import re
myString = 'blabla123_01_version6688_01_01veryLong_stringWithNumbers01'
print(myString)
versionSplit = re.findall(r'-?\d+|[a-zA-Z!@#$%^&*()_+.,<>{}]+|\W+?', myString)
rev = list(reversed(versionSplit)) # create a reversed list to work with from now on
for i in rev:
    idd = rev.index(i)
    if i.isdigit():
        digit = '%0'+str(len(i))+'d'
        i = int(i) + 1
        i = digit % i
        rev[idd]=str(i)
        break
myString = ''.join(reversed(rev)) # reverse again only just before joining
print(myString)
