looking to parse a string in python

If I have a series of Python strings that will always take the form of
initialword_content
and I want to strip out the initialword portion (which will always be the same number of characters) and then turn all instances of _ into spaces, since content may have some underscores in it, what's the easiest way to do that?

strs = "initialword_content"
strs = strs[12:].replace("_", " ")
print(strs)
Because initialword always has the same number of characters, you can just take the rest of the string after it: "initialword_" is 12 characters (11 letters plus the underscore), hence strs[12:]. Then use str.replace to turn every "_" into a space.
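Hard-coding 12 works, but it silently breaks if the prefix ever changes. A minimal sketch, assuming the prefix is literally the string "initialword_" (the names PREFIX and strip_prefix are hypothetical), that derives the slice length from the prefix itself. str.removeprefix needs Python 3.9 or later:

```python
PREFIX = "initialword_"  # assumed prefix; replace with your real one

def strip_prefix(s):
    # Slice off exactly len(PREFIX) characters instead of a magic number
    return s[len(PREFIX):].replace("_", " ")

print(strip_prefix("initialword_content_with_more_words"))  # content with more words

# Python 3.9+: removeprefix strips the prefix only if it is actually present
print("initialword_content".removeprefix(PREFIX).replace("_", " "))  # content
```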

First, split the string once (passing 1 as split's maxsplit argument) to get two parts: the throwaway 'initialword' and the rest, in which you replace all underscores with spaces.
s = 'initialword_content'
a, b = s.split('_', 1)
b = b.replace('_', ' ')
# b == 'content'
s = 'initialword_content_with_more_words'
a, b = s.split('_', 1)
b = b.replace('_', ' ')
# b == 'content with more words'
This can be done in a single expression:
s.split('_', 1)[1].replace('_', ' ')
Another way:
' '.join(s.split('_')[1:])
Or, if the length of "initialword" is always the same (so you don't have to find the underscore each time), use JunHu's slicing solution.
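A variation not in the answer above: str.partition always returns a 3-tuple, so a string with no underscore gives an empty remainder instead of raising IndexError the way s.split('_', 1)[1] would. A minimal sketch; the name content_part is hypothetical:

```python
def content_part(s):
    # partition splits at the first '_' only: (before, separator, after)
    _head, _sep, rest = s.partition('_')
    return rest.replace('_', ' ')

print(content_part('initialword_content_with_more_words'))  # content with more words
print(repr(content_part('initialword')))  # '' (no underscore, no exception)
```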

I used slicing and the replace() function. replace() simply... replaces!
string = 'initialword_content'
content = string[12:]  # You mentioned that initialword will always be the same length, so I used slicing.
content = content.replace('_', ' ')
For example:
>>> string = 'elephantone_con_ten_t'  # elephantone was the first thing I thought of xD
>>> content = string[12:]
>>> content
'con_ten_t'
>>> content = content.replace('_', ' ')
>>> content
'con ten t'
However, if you also want to reference "elephantone" somewhere else, do this:
>>> string = 'elephantone_con_ten_t'
>>> l = string.split('_', 1)  # Splits only once, at the first underscore from the left.
>>> l[0]
'elephantone'
>>> l[1].replace('_', ' ')
'con ten t'

Related

how to add a dot before each letter in a string in python

We get a string from the user and want to lowercase it, remove the vowels, and add a '.' before each remaining letter. For example, we get 'aBAcAba' and change it to '.b.c.b'. The first two steps are done, but I want some help with the third one.
str = input()
str=str.lower()
for i in range(0,len(str)):
    str=str.replace('a','')
    str=str.replace('e','')
    str=str.replace('o','')
    str=str.replace('i','')
    str=str.replace('u','')
print(str)
for j in range(0,len(str)):
    str=str.replace(str[j],('.'+str[j]))
print(str)

A few things:
You should avoid the variable name str because it is the name of a built-in, so I've changed it to st.
In the first part, no loop is necessary; replace will replace all occurrences of a substring.
For the last part, it is probably easiest to loop through the string and build up a new string. Limiting this answer to basic syntax, a simple for loop will work.
st = input()
st=st.lower()
st=st.replace('a','')
st=st.replace('e','')
st=st.replace('o','')
st=st.replace('i','')
st=st.replace('u','')
print(st)
st_new = ''
for c in st:
    st_new += '.' + c
print(st_new)
Another potential improvement: for the second part, you can also write a loop (instead of your five separate replace lines):
for c in 'aeiou':
    st = st.replace(c, '')
Other possibilities using more advanced techniques:
For the second part, a regular expression could be used:
import re
st = re.sub('[aeiou]', '', st)
For the third part, a generator expression could be used:
st_new = ''.join(f'.{c}' for c in st)
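Another possibility in the same vein, not part of the answer above: str.translate with a deletion table from str.maketrans removes all vowels in one pass. A minimal sketch of the whole pipeline; the function name dot_consonants is hypothetical:

```python
def dot_consonants(text):
    # The third argument of str.maketrans lists characters to delete
    no_vowels = text.lower().translate(str.maketrans('', '', 'aeiou'))
    # Prefix every remaining character with '.'
    return ''.join('.' + c for c in no_vowels)

print(dot_consonants('aBAcAba'))  # .b.c.b
```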

You can use str.join() to place a character between all the existing characters, and then use string concatenation to add it once more at the start:
# st = 'bcb'
st = '.' + '.'.join(st)
# '.b.c.b'
As a side note, please don't use str as a variable name. It's the name of the built-in string type, and if you assign to it you shadow that type, so calls like str(42) stop working. string, st, s, etc. are fine, as they don't shadow the built-in str (which is a built-in name, not a reserved keyword).
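To illustrate the side note: inside the function below, the local name str hides the built-in type, so calling str(42) fails, while outside the function the built-in still works. The function name demo is just for illustration:

```python
def demo():
    str = 'abc'  # local variable shadows the built-in str inside demo()
    try:
        return str(42)  # str is now a string object, which is not callable
    except TypeError as exc:
        return f'TypeError: {exc}'

print(demo())
print(str(42))  # outside demo(), str is the built-in type again
```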

z = "aBAcAba"
z = z.lower()
newstring = ''
for i in z:
    if i not in 'aeiou':
        newstring += '.'
        newstring += i
print(newstring)
Here I have gone step by step: first convert the string to lowercase, then check whether each character is not a vowel, and if so add a dot to the final string followed by the character itself.

You could split the string into a list of characters and then build a new string by going over the list's indexes and prepending a "." to each element.
It's not very efficient, but it will work.
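A minimal sketch of that idea, assuming "splitting into an array" means list(s) (one element per character) and that the vowels were already removed; the name dot_each is hypothetical:

```python
def dot_each(s):
    chars = list(s)  # 'bcb' -> ['b', 'c', 'b']
    result = ''
    for index in range(len(chars)):
        result += '.' + chars[index]  # prepend '.' to each element
    return result

print(dot_each('bcb'))  # .b.c.b
```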

Thanks to all of you, especially allani. The code below worked:
st = input()
st=st.lower()
st=st.replace('a','')
st=st.replace('e','')
st=st.replace('o','')
st=st.replace('i','')
st=st.replace('u','')
print(st)
st_new = ''
for c in st:
    st_new += '.' + c
print(st_new)

This does everything (removing vowels, lowercasing, and adding the dots) in one go:
import re
data = 'KujhKyjiubBMNBHJGJhbvgqsauijuetystareFGcvb'
matches = re.compile('[^aeiou]', re.I).finditer(data)
final = f".{'.'.join([m.group().lower() for m in matches])}"
print(final)
#.k.j.h.k.y.j.b.b.m.n.b.h.j.g.j.h.b.v.g.q.s.j.t.y.s.t.r.f.g.c.v.b

s = input()
s = s.lower()
for i in s:
    for x in ['a','e','i','o','u']:
        if i == x:
            s = s.replace(i,'')
new_s = ''
for i in s:
    new_s += '.'+ i
print(new_s)

def add_dots(n):
    return ".".join(n)
print(add_dots("test"))
def remove_dots(a):
    return a.replace(".", "")
print(remove_dots("t.e.s.t"))

How to extract a specific symbols from string if they follow after the number

I need to extract one or more # symbols from a string. I only need symbols that follow one another directly, not separated by any other characters or whitespace.
The # symbol (or run of # symbols) must come right after a number. If it doesn't, the symbols should be disregarded and not returned.
From string a I would need to extract only three ### symbols, since the fourth symbol is separated by a whitespace character.
a = 'some text 1 a8 777### # more text here 123 456'
result would be:
###
From variable b the function would return None, since not a single # symbol follows a number.
b = 'some text ### # more text here 123 456'
From variable c only a single # symbol is returned, since it is the only one that directly follows a number (not separated from it):
c = 'some text ### 777# more text here 123 456'
result: #

You can use regex for this:
>>> import re
>>> r = re.compile(r'\d(#+)')
>>> a = 'some text 1 a8 777### # more text here 123 456'
>>> r.search(a).group(1)
'###'
>>> b = 'some text ### # more text here 123 456'
>>> r.search(b) #None
>>> c = 'some text ### 777# more text here 123 456'
>>> r.search(c).group(1)
'#'
Combine it with an if condition to check whether the regex matched anything in the string:
>>> m = r.search(c)
>>> if m:
...     print(m.group(1))
...
#
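Wrapped into the function the question describes (returning None when no # run directly follows a digit), the same regex might look like this minimal sketch; the names HASH_AFTER_DIGIT and find_hash_run are hypothetical:

```python
import re

# One or more '#' right after a digit; group 1 captures only the '#' run
HASH_AFTER_DIGIT = re.compile(r'\d(#+)')

def find_hash_run(text):
    m = HASH_AFTER_DIGIT.search(text)
    return m.group(1) if m else None

print(find_hash_run('some text 1 a8 777### # more text here 123 456'))  # ###
print(find_hash_run('some text ### # more text here 123 456'))          # None
print(find_hash_run('some text ### 777# more text here 123 456'))       # #
```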

While there's probably a regular expression to do this, a loop is easier to understand if you don't know what regexes are.
def extract_hashes(string):
    i = 0
    found = False
    while i < len(string) and not found:
        if i != 0 and string[i] == '#':
            if string[i-1].isnumeric():
                found = True
            else:
                i += 1
        else:
            i += 1
    if not found:
        return None
    out = ''
    # Also stop at the end of the string, otherwise string[i] raises IndexError
    while i < len(string) and string[i] == '#':
        out += '#'
        i += 1
    return out
This could probably be written better, but that's the simple way to do it.
Footnote: a regex would be better, though.

import re
print(re.findall('[0-9]#+', a))
This would print a list containing all the matches; in the above case it would print
['7###']
Now you can slice each match to drop the leading digit and get what you want.
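For example, slicing off the first character of each match leaves just the # symbols (a sketch reusing string a from the question):

```python
import re

a = 'some text 1 a8 777### # more text here 123 456'
matches = re.findall(r'[0-9]#+', a)  # ['7###']
hashes = [m[1:] for m in matches]     # drop the leading digit of each match
print(hashes)  # ['###']
```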
Hope this helps!

Does this work for you?
>>> import re
>>> re.search(r'\d(#+)', a).groups()[0]
'###'
>>> re.search(r'\d(#+)', b)
>>> re.search(r'\d(#+)', c).groups()[0]
'#'

Check whether a string contains a numeric/digit/number in python

I have a string, and I need to check whether it ends with a number/digit, and if so increment that number by 1.
I will get strings like these:
string2 = suppose_name_1
string3 = suppose_name_22
string4 = supp22ose45_na56me_45
The strings will always be in the format above: suppose_something + underscore + digits.
So, for the strings above,
I need to check whether a string ends with a number/digit after the last underscore.
If it does, I need to increment that number by 1, like below:
string2 = suppose_name_2
string3 = suppose_name_23
string4 = supp22ose45_na56me_46
How can we do this in Python, using regular expressions or something else? It should be very fast.
I have done something like here, but I want to implement it with re, which should be very fast, so I'm asking on SO.
Edit:
Sorry, I didn't mention this above:
Sometimes the string is just something_name without an integer, hence I need to check whether it contains a number first.

How about using regular expressions:
import re
def process_string(s):
    try:
        part1, part2 = re.search(r'^(.*_)(\d+)$', s).groups()
        part2 = str(int(part2) + 1)
        return part1 + part2
    except AttributeError:
        return s
print(process_string("suppose_name_1"))
print(process_string("suppose_name_22"))
print(process_string("supp22ose45_na56me_45"))
print(process_string("suppose_name"))
prints:
suppose_name_2
suppose_name_23
supp22ose45_na56me_46
suppose_name
FYI, there is nothing wrong or scary with using regular expressions.
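A variation on the regex answer above, in case you'd rather check the match object explicitly than catch AttributeError (a broad except can also swallow unrelated AttributeErrors). re.fullmatch anchors the pattern to the whole string, so ^ and $ aren't needed. A minimal sketch:

```python
import re

def process_string(s):
    # Group 1: everything up to and including the last '_'; group 2: trailing digits
    m = re.fullmatch(r'(.*_)(\d+)', s)
    if m is None:
        return s  # no trailing number: return the string unchanged
    return m.group(1) + str(int(m.group(2)) + 1)

for s in ("suppose_name_1", "suppose_name_22", "supp22ose45_na56me_45", "suppose_name"):
    print(process_string(s))
```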

You don't need a regex. You can just use str.rfind and slicing (a plain str.replace of the digits would also change any earlier copy of them, e.g. turning supp22ose45_na56me_45 into supp22ose46_na56me_46):
>>> s = 'suppose_name_1'
>>> index = s.rfind('_') # Last index of '_'
>>> s[:index+1] + str(int(s[index+1:]) + 1)
'suppose_name_2'
If you need to check first whether you have digits at the end, you can use the str.isdigit() method:
>>> s = 'suppose_name'
>>>
>>> index = s.rfind('_')
>>> if s[index+1:].isdigit():
...     s = s[:index+1] + str(int(s[index+1:]) + 1)
...
>>> s
'suppose_name'
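The same idea can be sketched with str.rpartition, which splits at the last '_' and never raises when there is no underscore at all (this swaps rpartition in for the rfind/slicing approach; the name increment_suffix is hypothetical). isdecimal is used so that int() is guaranteed to accept the tail:

```python
def increment_suffix(s):
    head, sep, tail = s.rpartition('_')  # split at the LAST underscore
    if sep and tail.isdecimal():
        return head + sep + str(int(tail) + 1)
    return s  # no '_<digits>' suffix: leave unchanged

for s in ('suppose_name_1', 'suppose_name_22', 'supp22ose45_na56me_45', 'suppose_name'):
    print(increment_suffix(s))
```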

Here's a short regex solution that increments the number with re.sub(...):
from re import sub
string2 = 'suppose_name_1'
string3 = 'suppose_name_22'
string4 = 'supp22ose45_na56me_45'
print([sub(r'^(?P<pretext>.*_)(?P<number>\d+)$', lambda x: '%s%d' % (x.group('pretext'), int(x.group('number')) + 1), s) for s in (string2, string3, string4)])
and the output:
['suppose_name_2', 'suppose_name_23', 'supp22ose45_na56me_46']
An easier-to-read version would be something like this:
from re import sub
string2 = 'suppose_name_1'
string3 = 'suppose_name_22'
string4 = 'supp22ose45_na56me_45'
regex = r'^(?P<pretext>.*_)(?P<number>\d+)$'
def increment(matchObject):
    return '%s%d' % (matchObject.group('pretext'), int(matchObject.group('number')) + 1)
for s in (string2, string3, string4):
    print(sub(regex, increment, s))
and the output:
suppose_name_2
suppose_name_23
supp22ose45_na56me_46

python string manipulation

I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between all pairs of brackets and store in a new string like this: new_string = AX&E
I tried this:
p = re.compile(r"\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives the output: AX&EUr)
Is there any way to correct this, rather than iterating over each character in the string?

Another simple option is removing the innermost parentheses at every stage, until there are no more parentheses:
import re
p = re.compile(r"\([^()]*\)")
count = 1
while count:
    s, count = p.subn("", s)
Working example: http://ideone.com/WicDK

You can just use string manipulation without a regular expression (this example uses a string without nested brackets):
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.
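Joining the pieces is one more call to str.join. A sketch using the same non-nested input; note that on the question's nested string "AX(p>q)&E((-p)Ur)" this approach yields 'AX&EUr', so it only covers one level of brackets:

```python
s = "AX(p>q)&E(qUr)"
parts = [i.split("(")[0] for i in s.split(")")]  # ['AX', '&E', '']
print(''.join(parts))  # AX&E
```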

>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile(r"\([^)]*\)").sub('', s)
'AX&E'

Yeah, your pattern works as it is, as long as the brackets aren't nested:
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile(r"\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'

Nested brackets (or tags, ...) cannot be handled in a general way with regular expressions. See Mastering Regular Expressions by Jeffrey Friedl (http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell) for details on why. You would need a real parser.
It's possible to construct a regex that handles two levels of nesting, but it is already ugly; one for three levels will be quite long, and you don't want to think about four levels. ;-)
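When only one kind of bracket is involved, a "real parser" can be as small as a depth counter. It does iterate over the characters, which the question hoped to avoid, but it handles any nesting depth in a single linear pass. A minimal sketch, not from the answer above; the name remove_bracketed is hypothetical and stray closing brackets are simply ignored:

```python
def remove_bracketed(s, open_ch='(', close_ch=')'):
    depth = 0
    kept = []
    for ch in s:
        if ch == open_ch:
            depth += 1                 # entering a (possibly nested) group
        elif ch == close_ch:
            depth = max(depth - 1, 0)  # leaving a group; ignore stray ')'
        elif depth == 0:
            kept.append(ch)            # keep only characters outside all brackets
    return ''.join(kept)

print(remove_bracketed("AX(p>q)&E((-p)Ur)"))  # AX&E
```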

You can use PyParsing to parse the string:
from pyparsing import nestedExpr
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)
Most code is from: How can a recursive regexp be implemented in python?

You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)
Output
AX&E

Here is a quick tour of basic string operations in Python:
# strings
# using double and single quotes in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# raw strings: with the 'r' prefix, backslashes are kept literally
print(r"\new code", "\n")
first = "code in"
last = "python"
first + last #concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5]) # print an element
print(user[-3]) # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])
print(user[2:])
print(len(user)) # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
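print(max(user)) # largest character (by code point) in the string
print(min(user)) # smallest character (by code point) in the string
print(user.join(['1', '2', '3', '4'])) # join needs a list of strings, not ints
input() # wait for Enter so the console window stays open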

`' '.join(e)` isn't changing value of e

I'm creating a function to create all 26 combinations of words with a fixed suffix. The script works except for the JOIN in the second-to-last line.
def create_word(suffix):
    e=[]
    letters="abcdefghijklmnopqrstuvwxyz"
    t=list(letters)
    for i in t:
        e.append(i)
        e.append(suffix)
    ' '.join(e)
    print(e)
Currently, it is printing ['a', 'suffix', 'b', 'suffix', ...etc]. I want it to print one long string: 'aSuffixbSuffixcSuffix...etc.' Why isn't the join working here? How can I fix this?
In addition, how would I separate the characters once I have the string? For example, "take the last character of the suffix and add a space after it every time" ('aSuffixbSuffixcSuffix' --> 'aSuffix bSuffix cSuffix'). Or, more generally, how would I replace every xth character, where x is any integer (e.g., replace the 3rd, 6th, 9th, etc. character with something I choose)?

str.join returns a new string; it doesn't transform the existing one. Here's one way to accomplish it:
result = ' '.join(e)
print(result)
But if you're feeling clever, you can streamline a lot of the setup:
import string
def create_word(suffix):
    return ' '.join(i + suffix for i in string.ascii_lowercase)

join doesn't change its arguments; it just returns a new string:
result = ' '.join(e)
return result
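A short check of that behaviour: the list passed to join is left untouched, and only the returned value holds the joined string.

```python
e = ['a', 'suffix', 'b', 'suffix']
result = ' '.join(e)
print(e)       # ['a', 'suffix', 'b', 'suffix'] (unchanged)
print(result)  # a suffix b suffix
```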

If you really want the output you specified (all of the results concatenated together):
>>> import string
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> letters = string.ascii_lowercase
>>> suffix = 'Suffix'
>>> ''.join('%s%s' % (l, suffix) for l in letters)
'aSuffixbSuffixcSuffixdSuffixeSuffixfSuffixgSuffixhSuffixiSuffixjSuffixkSuffixlSuffixmSuffixnSuffixoSuffixpSuffixqSuffixrSuffixsSuffixtSuffixuSuffixvSuffixwSuffixxSuffixySuffixzSuffix'

Besides the problem already mentioned by rekursive, you should have a look at list comprehensions:
def create_word(suffix):
    return ''.join(
        [i+suffix for i in "abcdefghijklmnopqrstuvwxyz"]
    )
print(create_word('suffix'))
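The question's second part (splitting the long string back up, or replacing every xth character) isn't covered by the answers above. A minimal sketch, assuming the suffix length is known: every chunk is one letter plus the suffix, so the string can be cut into fixed-size pieces, and every xth character can be replaced by its position. The helper names are hypothetical:

```python
def space_chunks(s, size):
    # Cut s into pieces of `size` characters and join them with spaces
    return ' '.join(s[i:i + size] for i in range(0, len(s), size))

def replace_every_nth(s, n, new_char):
    # Replace the nth, 2nth, 3nth, ... character (1-based positions)
    return ''.join(new_char if (i + 1) % n == 0 else c for i, c in enumerate(s))

chunk = 1 + len('Suffix')  # one letter plus the suffix
print(space_chunks('aSuffixbSuffixcSuffix', chunk))  # aSuffix bSuffix cSuffix
print(replace_every_nth('abcdefghi', 3, '*'))        # ab*de*gh*
```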
