I have a string, and I need to check whether it ends with a number. If it does, I need to increment that number by 1.
The strings look like this:
string2 = suppose_name_1
string3 = suppose_name_22
string4 = supp22ose45_na56me_45
The strings are guaranteed to be in the format above: some text, an underscore, then digits.
So, for each of the strings above, I need to check whether it ends with digits after an underscore and, if it does, increment that number by 1, like this:
string2 = suppose_name_2
string3 = suppose_name_23
string4 = supp22ose45_na56me_46
How can this be done in Python, using regular expressions or something else? It needs to be very fast.
I have done something like here, but I want to implement it with re so that it is very fast, which is why I am asking on SO.
Edit:
Sorry, I didn't mention this above.
Sometimes the string is just something_name with no integer at the end, so I first need to check whether it ends with a number.
How about using regular expressions:
import re
def process_string(s):
    try:
        part1, part2 = re.search(r'^(.*_)(\d+)$', s).groups()
        part2 = str(int(part2) + 1)
        return part1 + part2
    except AttributeError:
        # re.search returned None: no trailing "_<digits>", so return s unchanged
        return s
print(process_string("suppose_name_1"))
print(process_string("suppose_name_22"))
print(process_string("supp22ose45_na56me_45"))
print(process_string("suppose_name"))
prints:
suppose_name_2
suppose_name_23
supp22ose45_na56me_46
suppose_name
FYI, there is nothing wrong or scary with using regular expressions.
You don't need regex. You can just use simple str.replace:
>>> s = 'suppose_name_1'
>>> index = s.rfind('_') # Last index of '_'
>>> s.replace(s[index+1:], str(int(s[index+1:]) + 1))
'suppose_name_2'
If you need to first check whether you have digits at the end, you can check that using str.isdigit() method:
>>> s = 'suppose_name'
>>>
>>> index = s.rfind('_')
>>> if s[index+1:].isdigit():
...     s = s.replace(s[index+1:], str(int(s[index+1:]) + 1))
...
>>> s
'suppose_name'
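One caveat with the replace call above: str.replace substitutes every occurrence of the suffix, so 'supp22ose45_na56me_45' would come out as 'supp22ose46_na56me_46'. A minimal sketch that rebuilds the string from its parts instead is shown below. The function name increment_suffix is made up for illustration, and the sketch assumes the suffix uses ordinary ASCII digits.

```python
def increment_suffix(s):
    """Return s with the number after the last underscore incremented by 1.

    Strings without an all-digit suffix are returned unchanged.
    """
    # rpartition splits at the LAST underscore: (head, '_', tail)
    head, sep, tail = s.rpartition('_')
    # isdigit() is False for the empty string, so 'name_' is left alone
    if sep and tail.isdigit():
        return head + sep + str(int(tail) + 1)
    return s


print(increment_suffix('suppose_name_1'))         # suppose_name_2
print(increment_suffix('supp22ose45_na56me_45'))  # supp22ose45_na56me_46
print(increment_suffix('suppose_name'))           # suppose_name
```

Only the part after the last underscore is touched, so digits earlier in the string stay as they are.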
Here's a short regex solution that increments the number with re.sub(...):
from re import sub
string2 = 'suppose_name_1'
string3 = 'suppose_name_22'
string4 = 'supp22ose45_na56me_45'
print([sub(r'^(?P<pretext>.*_)(?P<number>\d+)$', lambda x: '%s%d' % (x.group('pretext'), int(x.group('number')) + 1), s) for s in (string2, string3, string4)])
and the output:
['suppose_name_2', 'suppose_name_23', 'supp22ose45_na56me_46']
The easier to read version would be something like this:
from re import sub
string2 = 'suppose_name_1'
string3 = 'suppose_name_22'
string4 = 'supp22ose45_na56me_45'
regex = r'^(?P<pretext>.*_)(?P<number>\d+)$'
def increment(matchObject):
    return '%s%d' % (matchObject.group('pretext'), int(matchObject.group('number')) + 1)
for s in (string2, string3, string4):
    print(sub(regex, increment, s))
and the output:
suppose_name_2
suppose_name_23
supp22ose45_na56me_46
We get a string from the user and want to lowercase it, remove the vowels, and add a '.' before each remaining letter. For example, we get 'aBAcAba' and change it to '.b.c.b'. The first two steps are done, but I need some help with the third one.
str = input()
str=str.lower()
for i in range(0,len(str)):
    str=str.replace('a','')
    str=str.replace('e','')
    str=str.replace('o','')
    str=str.replace('i','')
    str=str.replace('u','')
print(str)
for j in range(0,len(str)):
    str=str.replace(str[j],('.'+str[j]))
print(str)
A few things:
You should avoid the variable name str because this is used by a builtin, so I've changed it to st
In the first part, no loop is necessary; replace will replace all occurrences of a substring
For the last part, it is probably easiest to loop through the string and build up a new string. Limiting this answer to basic syntax, a simple for loop will work.
st = input()
st=st.lower()
st=st.replace('a','')
st=st.replace('e','')
st=st.replace('o','')
st=st.replace('i','')
st=st.replace('u','')
print(st)
st_new = ''
for c in st:
    st_new += '.' + c
print(st_new)
Another potential improvement: for the second part, you can also write a loop (instead of your five separate replace lines):
for c in 'aeiou':
    st = st.replace(c, '')
Other possibilities using more advanced techniques:
For the second part, a regular expression could be used:
st = re.sub('[aeiou]', '', st)
For the third part, a generator expression could be used:
st_new = ''.join(f'.{c}' for c in st)
You can use str.join() to place some character in between all the existing characters, and then you can use string concatenation to place it again at the end:
# st = 'bcb'
st = '.' + '.'.join(st)
# '.b.c.b'
As a side note, please don't use str as a variable name. It's the name of the built-in string type, and assigning to it shadows the built-in, so calls such as str(5) stop working in that scope. Names like string, st, or s are fine, as they don't shadow anything.
z = "aBAcAba"
z = z.lower()
newstring = ''
for i in z:
    if i not in 'aeiou':
        newstring += '.'
        newstring += i
print(newstring)
Here I have gone step by step: first converting the string to lowercase, then checking whether each character is not a vowel, and if so adding a dot and then the character to the final string.
You could try splitting the string into a list of characters and then building a new string from its elements, adding a "." to each one.
It is not very efficient, but it will work.
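The list-based idea could look roughly like the sketch below. The function name dot_consonants is hypothetical, and the sketch assumes, as in the question, that only the five ASCII vowels are removed.

```python
def dot_consonants(text):
    """Lowercase text, drop the vowels, and put a '.' before each remaining character."""
    # The "array": every non-vowel character of the lowercased input
    chars = [c for c in text.lower() if c not in 'aeiou']
    pieces = []
    for c in chars:
        pieces.append('.' + c)  # dot first, then the character
    return ''.join(pieces)


print(dot_consonants('aBAcAba'))  # .b.c.b
```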
Thanks to all of you, especially allani. The code below worked.
st = input()
st=st.lower()
st=st.replace('a','')
st=st.replace('e','')
st=st.replace('o','')
st=st.replace('i','')
st=st.replace('u','')
print(st)
st_new = ''
for c in st:
    st_new += '.' + c
print(st_new)
This does everything.
import re
data = 'KujhKyjiubBMNBHJGJhbvgqsauijuetystareFGcvb'
matches = re.compile('[^aeiou]', re.I).finditer(data)
final = f".{'.'.join([m.group().lower() for m in matches])}"
print(final)
#.k.j.h.k.y.j.b.b.m.n.b.h.j.g.j.h.b.v.g.q.s.j.t.y.s.t.r.f.g.c.v.b
s = input()
s = s.lower()
for i in s:
    for x in ['a','e','i','o','u']:
        if i == x:
            s = s.replace(i,'')
new_s = ''
for i in s:
    new_s += '.'+ i
print(new_s)
def add_dots(n):
    return ".".join(n)
print(add_dots("test"))
def remove_dots(a):
    return a.replace(".", "")
print(remove_dots("t.e.s.t"))
How can I remove all the lowercase letters before and after "Johnson" in these strings?
str1 = 'aBcdJohnsonzZz'
str2 = 'asdVJohnsonkkk'
Expected results are as below:
str1 = 'BJohnsonZ'
str2 = 'VJohnson'
You can partition the string, check that the separator was found, then translate out the lowercase letters, e.g.:
from string import ascii_lowercase as alc
str1 = 'aBcdJohnsonzZz'
p1, sep, p2 = str1.partition('Johnson')
if sep:
    # Python 2 only: str.translate(None, chars) deletes the given characters
    str1 = p1.translate(None, alc) + sep + p2.translate(None, alc)
print str1
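The two-argument translate(None, chars) form above exists only on Python 2 byte strings. On Python 3 the same idea can be sketched with a table built by str.maketrans. The names DROP_LOWER and strip_lower_around are made up for this example, and only ASCII lowercase letters are removed.

```python
from string import ascii_lowercase

# Translation table that deletes every ASCII lowercase letter
DROP_LOWER = str.maketrans('', '', ascii_lowercase)


def strip_lower_around(text, keep):
    """Remove lowercase letters before and after the first occurrence of keep."""
    before, sep, after = text.partition(keep)
    if not sep:
        # keep was not found, so leave the text untouched
        return text
    return before.translate(DROP_LOWER) + sep + after.translate(DROP_LOWER)


print(strip_lower_around('aBcdJohnsonzZz', 'Johnson'))  # BJohnsonZ
print(strip_lower_around('asdVJohnsonkkk', 'Johnson'))  # VJohnson
```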
str.partition() is your friend here:
def munge(text, match):
    prefix, match, suffix = text.partition(match)
    prefix = "".join(c for c in prefix if not c.islower())
    suffix = "".join(c for c in suffix if not c.islower())
    return prefix + match + suffix
Example use:
>>> munge("aBcdJohnsonzZz", "Johnson")
'BJohnsonZ'
>>> munge("asdVJohnsonkkk", "Johnson")
'VJohnson'
import re
def foo(input_st, keep_st):
    parts = input_st.split(keep_st)
    clean_parts = [re.sub("[a-z]*", "", part) for part in parts]
    return keep_st.join(clean_parts)
The other answers, which use the str.partition() method, don't seem to take into account that your trigger word may be repeated. This example also works for a string such as 'aBcJohnsonDeFJohnsonHiJkL', in case that is a concern for you.
There are a couple of ways you could tackle this. Here's the simplest one I could think of. The idea is to tackle it in three parts. First off, you need to know the middle string, in your case 'Johnson'. Then you can remove the lowercase letters from the part before and the part after.
def removeLowercaseAround(full, middle):
    stop_at = full.index(middle)  # the beginning of the name
    start_again = stop_at + len(middle)  # the end of the name
    new_str = ''  # the string we'll return at the end
    for i in range(stop_at):  # for each char until the middle starts
        if not full[i].islower():  # if it is not a lowercase char
            new_str += full[i]  # add it to the end of the new string
    new_str += middle  # then add the middle string
    for i in range(start_again, len(full)):  # do the same thing with the end
        if not full[i].islower():  # if it is not a lowercase char
            new_str += full[i]  # add it to the string
    return new_str
print(removeLowercaseAround('ABcdJohnsonzZZ', 'Johnson'))
Not exactly very simple or streamlined, but you could do this sort of thing (based partially on Zero Piraeus')
(edited to reflect errors)
def remove_lower(string):
    return ''.join(filter(str.isupper, string))
def strip_johnson(input_str):
    prefix, match, postfix = input_str.partition("Johnson")
    return (
        remove_lower(prefix) +
        match +
        remove_lower(postfix)
    )
If I have a series of python strings that I'm working with that will always take the form of
initialword_content
and I want to strip out the initialword portion, which will always be the same number of characters, and then I want to turn all instances of _ into spaces -- since content may have some underscores in it -- what's the easiest way to do that?
strs = "initialword_content"
strs = strs[12:].replace("_", " ")
print(strs)
Because initialword always has the same number of characters, you can simply slice off the prefix and keep the rest of the string. Then use str.replace to turn every "_" into a space.
First, split the string once (with the parameter 1 to split) to get two parts: the throw-away 'initialword' and the rest, where you replace all underscores with spaces.
s = 'initialword_content'
a, b = s.split('_', 1)
b = b.replace('_', ' ')
# b == 'content'
s = 'initialword_content_with_more_words'
a, b = s.split('_', 1)
b = b.replace('_', ' ')
# b == 'content with more words'
This can be done with a single command:
s.split('_', 1)[1].replace('_', ' ')
another way:
' '.join(s.split('_')[1:])
Or, if the length of "initialword" is always the same (and you don't have to calculate it each time), use JunHu's slicing solution.
I used slicing and the replace() function. replace() simply... replaces!
string = 'initialword_content'
content = string[12:] # You mentioned that initialword will always be the same length, so I used slicing.
content = content.replace('_', ' ')
For example:
>>> string = 'elephantone_con_ten_t' # elephantone was the first thing I thought of xD
>>> content = string[12:]
>>> content
'con_ten_t'
>>> content = content.replace('_', ' ')
>>> content
'con ten t'
However, if you also want to reference "elephantone" somewhere else, do this:
>>> string = 'elephantone_con_ten_t'
>>> l = string.split('_', 1) # This will only split the string ONCE, from the left.
>>> l[0]
'elephantone'
>>> l[1].replace('_', ' ')
'con ten t'
I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between all pairs of brackets (including the brackets themselves) and store the result in a new string like this: new_string = "AX&E"
I tried doing this:
p = re.compile("\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives the output: AX&EUr)
Is there any way to correct this, rather than iterating each element in the string?
Another simple option is removing the innermost parentheses at every stage, until there are no more parentheses:
p = re.compile("\([^()]*\)")
count = 1
while count:
    s, count = p.subn("", s)
Working example: http://ideone.com/WicDK
You can just use string manipulation without regular expression
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.
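Joining the pieces can be sketched as below. The function name drop_bracketed is hypothetical. Note that this split-based approach only works when the brackets are not nested: for the nested string from the question it returns 'AX&EUr'.

```python
def drop_bracketed(s):
    """Keep only the text outside non-nested (...) groups."""
    # Each piece ends where a ")" was; the text before its "(" is what we keep
    return ''.join(piece.split('(')[0] for piece in s.split(')'))


print(drop_bracketed("AX(p>q)&E(qUr)"))  # AX&E
```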
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile("""\([^\)]*\)""").sub('', s)
'AX&E'
Yeah, it should be:
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile("\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'
Nested brackets (or tags, ...) cannot be handled in a general way using regex. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details on why. You would need a real parser.
It's possible to construct a regex that can handle two levels of nesting, but it is already ugly, and one for three levels will be quite long. And you don't want to think about four levels. ;-)
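For this particular task a full parser may be more than is needed. A minimal sketch that walks the string once and tracks the nesting depth is shown below. The function name strip_parenthesized is made up, and the sketch does iterate over each character, which the question hoped to avoid.

```python
def strip_parenthesized(s):
    """Remove every parenthesized group, including nested ones, in a single pass."""
    depth = 0   # how many "(" are currently open
    kept = []
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            # A stray ")" with nothing open is dropped and depth stays at 0
            if depth:
                depth -= 1
        elif depth == 0:
            kept.append(ch)  # only keep characters outside all brackets
    return ''.join(kept)


print(strip_parenthesized("AX(p>q)&E((-p)Ur)"))  # AX&E
```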
You can use PyParsing to parse the string:
from pyparsing import nestedExpr
import sys
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)
Most code is from: How can a recursive regexp be implemented in python?
You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)
Output
AX&E
this is just how you do it:
# strings
# double and single quotes use in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# use of 'r' before string
print(r"\new code", "\n")
first = "code in"
last = "python"
first + last #concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5]) # print an element
print(user[-3]) # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])
print(user[2:])
print(len(user)) # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
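print(max(user)) # character with the highest code point in the string
print(min(user)) # character with the lowest code point in the string
print(user.join(["1", "2", "3", "4"])) # join() needs an iterable of strings, not ints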
input()
There is a string that contains numbers and other characters.
I need to find every whole number in that string that contains the digits 467033.
e.g. 1.467033777777777
Thanks
Try this:
import re
RE_NUM = re.compile(r'(\d*\.\d+)', re.M)
text = 'eghwodugo83o135.13508yegn1.4670337777777773u87208t'
for num in RE_NUM.findall(text):
    if '467033' in num:
        print(num)
Prints:
1.4670337777777773
Generalized / optimized in response to comment:
def find(text, numbers):
    pattern = '|'.join(r'[\d.]*%s[\d.]*' % n for n in numbers)
    re_num = re.compile(pattern, re.M)
    return [m.group() for m in re_num.finditer(text)]
print(find(text, ['467033', '13']))
Prints:
['135.13508', '1.4670337777777773']
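The find function above interpolates each search term straight into the pattern, so a term containing a dot, such as '1.4', would have its dot treated as "any character". A hedged variant that escapes the terms first might look like this; the name find_numbers_containing is hypothetical.

```python
import re


def find_numbers_containing(text, needles):
    """Return every run of digits and dots in text that contains one of needles."""
    # re.escape makes characters such as "." in a needle match literally
    pattern = '|'.join(r'[\d.]*%s[\d.]*' % re.escape(n) for n in needles)
    return re.findall(pattern, text)


text = 'eghwodugo83o135.13508yegn1.4670337777777773u87208t'
print(find_numbers_containing(text, ['467033', '13']))
# ['135.13508', '1.4670337777777773']
```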
If you're just searching for a substring within another string, you can use in:
>>> sub_num = "467033"
>>> my_num = "1.467033777777777"
>>> sub_num in my_num
True
However, I suspect there's more to your problem than just searching strings, and that doing it this way might not be optimal. Can you be more specific about what you're trying to do?
import re
a = 'e.g. 1.467033777777777\nand also 576575567467033546.90 Thanks '
r = re.compile('[0-9.]*467033[0-9.]*')
r.findall(a)
['1.467033777777777', '576575567467033546.90']