Let us say I have a string
c = "a string is like this and roberta a a thanks"
I want the output to be as
' string is like this and roberta thanks'
This is what I am trying
c.replace('a', ' ')
' string is like this nd robert thnks'
But this replaces each 'a' in the string
So I tried this
c.replace(' a ', ' ')
'a string is like this and roberta thanks'
But this leaves the 'a' at the start of the string.
How do I do this?
This looks like a job for re:
import re
txt = c
# Repeat until no standalone 'a' is left (handles consecutive ones like "a a").
while re.subn(r'(\s+a\s+|^a\s+)', ' ', txt)[1] != 0:
    txt = re.subn(r'(\s+a\s+|^a\s+)', ' ', txt)[0]
I myself figured it out.
c = "a string is like this and roberta a a thanks"
import re
re.sub('\\ba\\b', ' ', c)
' string is like this and roberta thanks'
Here you go myself! Enjoy!
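Replacing each match with a space leaves a run of spaces wherever standalone 'a's sat next to each other. Here is a minimal sketch that deletes the word and then collapses the whitespace, assuming it is acceptable to lose the leading space as well. The helper name `remove_word` is made up for illustration.

```python
import re

def remove_word(text, word):
    # Delete every standalone occurrence of `word` (hypothetical helper).
    without = re.sub(r'\b' + re.escape(word) + r'\b', '', text)
    # Collapse the whitespace runs left behind and trim both ends.
    return ' '.join(without.split())

c = "a string is like this and roberta a a thanks"
print(remove_word(c, 'a'))
```

The `\b` anchors keep the 'a' inside "roberta", "and" and "thanks" untouched, because a word boundary only occurs between a word character and a non-word character.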
Related
Is it possible to ignore string in quotes for python replace()?
I have a string variable like this:
a = "I like bananas 'I like bananas'"
I want to get a result like this via replace():
"I like apples 'I like bananas'".
But when I execute print(a.replace("bananas", "apples")),the result is:
"I like apples 'I like apples'".
How can I make replace() ignore text inside quotes?
Split the string on ', process only the even-indexed elements of the resulting list (the parts outside the quotes), and reassemble the string:
a = "I like bananas 'I like bananas'"
ap = a.split("'")
ar = [ ai.replace("bananas", "apples") if i%2==0 else ai for i,ai in enumerate(ap)]
print("'".join(ar))
Here is a regexp example:
import re
text = "I like bananas 'I like bananas' 'I like also bananas'"
def replace2(orginal_text, b, c):
    pattern = re.compile(r".*? (\'.*?\')")  # pattern to match text inside single quotes
    matches = []
    for match in pattern.findall(orginal_text):  # collect every quoted span the pattern finds
        matches.append(match)
    for match in matches:
        replace_with = match.replace(b, c)  # replace b with c in the matched string
        orginal_text = orginal_text.replace(match, replace_with)  # literal swap; re.sub would treat match as a regex
    return orginal_text
result = replace2(text, "bananas", "apples")
print(result)
It finds all text between single quotes, replaces the old string (b) with the new one (c) inside each match, and then substitutes the edited matches back into the original string. Note that this changes the quoted parts, not the text outside them.
No, it is not possible, you cannot make replace ignore those matches. You will have to code your own solution.
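One way to code such a solution, as a sketch only: let a regex match either a whole quoted span or the target word, and use a callback to decide what to return. This assumes single quotes are balanced and never nested or escaped; `replace_outside_quotes` is a hypothetical name, not a library function.

```python
import re

def replace_outside_quotes(text, old, new):
    # Match either a whole single-quoted span or the target word.
    pattern = re.compile(r"'[^']*'|" + re.escape(old))

    def swap(match):
        found = match.group(0)
        # Quoted spans are returned unchanged; bare matches are replaced.
        return found if found.startswith("'") else new

    return pattern.sub(swap, text)

a = "I like bananas 'I like bananas'"
print(replace_outside_quotes(a, "bananas", "apples"))
```

Because the quoted alternative comes first, the regex engine consumes a quoted span in one piece, so occurrences inside it are never seen as separate matches.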
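You can use the count argument (an optional parameter of replace()) to limit how many occurrences of the old value are replaced. This only helps when the occurrences you want to replace come before the quoted ones, as in these examples.
It works for both double and single quotes.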
a = "I like bananas \"I like bananas\""
print(a.replace("bananas", "apples",1))
a = "I like bananas 'I like bananas'"
print(a.replace("bananas", "apples",1))
Output:
I like apples "I like bananas"
I like apples 'I like bananas'
It's absolutely possible. Here is a complete answer to this question:
import re
original_str = "I like bananas 'I Love banana' somthing 'I like banana' I love banana ' I like babana again' "
pattern = r"('(.+?)')"
replaced_str = ''
quoted_strings = re.compile(pattern)
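x_start = 0
print("original_str = (", original_str+")\n")
for m in quoted_strings.finditer(original_str):
    print(m.span(), m.group())
    x_end, x_next = m.span()
    # Replace only in the unquoted text before this quoted span.
    w = original_str[x_start:x_end]
    w = w.replace("banana", "apple")
    # Append the quoted span itself unchanged.
    replaced_str = replaced_str + w + original_str[x_end:x_next]
    x_start = x_next
# Handle the unquoted text after the last quoted span.
replaced_str += original_str[x_start:].replace("banana", "apple")
print(replaced_str)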
output :
original_str = ( I like bananas 'I Love banana' somthing 'I like banana' I love banana ' I like babana again' )
(15, 30) 'I Love banana'
(42, 57) 'I like banana'
(73, 95) ' I like babana again'
I like apples 'I Love banana' somthing 'I like banana' I love apple ' I like babana again'
As per your update to your requirements in your reply to gnight
a = "I like bananas 'I like \'bananas\' ' "
print (a)
Gives:
I like bananas 'I like 'bananas' '
because \' is converted to ' when the string literal is evaluated;
that is, it is the same as
a = "I like bananas 'I like 'bananas' ' "
As gnight says, the only real option is to replace only in the first and last sections of the string, which are not in quotes, i.e.
a = "I like bananas 'I like \'bananas\' ' "
ap = a.split("'")
if len(ap)>0:
    ap[0]=ap[0].replace("bananas", "apples")
if len(ap)>1:
    ap[-1]=ap[-1].replace("bananas", "apples")
print("'".join(ap))
that gives:
I like apples 'I like 'bananas' '
In the past I have written parsers to handle the triple-quote escaping that Excel uses, with a state machine to track the quote state; it is not fun to implement if you end up having to do that. If you can give some more examples of desired input and output, it may help.
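For reference, a quote-tracking state machine can be quite small in the simplest case. The sketch below has only two states (inside or outside quotes) and assumes no escaping or nesting rules at all; handling those would need extra states. The function name `replace_unquoted` is invented for this example.

```python
def replace_unquoted(text, old, new, quote="'"):
    # Hypothetical helper: a two-state scanner (inside / outside quotes).
    pieces = []       # finished segments, in order
    current = []      # characters of the segment being built
    in_quote = False
    for ch in text:
        if ch == quote:
            segment = ''.join(current)
            # Only segments collected outside quotes get the replacement.
            pieces.append(segment if in_quote else segment.replace(old, new))
            pieces.append(ch)
            current = []
            in_quote = not in_quote   # flip state on every quote character
        else:
            current.append(ch)
    tail = ''.join(current)
    pieces.append(tail if in_quote else tail.replace(old, new))
    return ''.join(pieces)

print(replace_unquoted("I like bananas 'I like bananas' and bananas", "bananas", "apples"))
```

With an unbalanced quote, everything after the last opening quote is treated as quoted and left alone.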
I need to modify some Python code.
This code scrapes information from a .csv file, to finally integrate it in a new .csv file, in a different structure.
In one of the columns of the source files, I have a string value that is, 99% of the time, formed this way: 'block1 block2 block3'.
Block2 ends with 'm' 99% of the time.
example: 'R2 180m RFT'.
By browsing the source dataset, I realized that in 1% of the cases, the block2 can end with 'M'.
As I need all the values after the 'm' or 'M' value, I'm a bit stuck.
I used the .split() function, like this:
'Newcolumn': getattr(row_unique_ids, 'COLUMNINTHEDATASET').split('m')[1],
With this, my script raises an error when it encounters a value like this:
'R2 180M AST'.
So I would like to know how to add an additional argument that makes the split work whether the script encounters 'm' or 'M'.
Thank you for your help.
One solution is to lowercase the string before splitting:
s = getattr(row_unique_ids, 'COLUMNINTHEDATASET')
s = s.lower()
s.split('m')[1]
But that will mess up your casing. If you want to preserve casing,
another solution is to do:
x = ''
s = getattr(row_unique_ids, 'COLUMNINTHEDATASET')
for c in s:
    # Normalize only 'M' so every other character keeps its case.
    if c == 'M':
        x += 'm'
    else:
        x += c
x.split('m')[1]
In general, one way to split on multiple delimiters is:
import re
string = "this is 3an infamous String4that I need to s?plit in an infamou.s way"
#Preserve the original char
print (re.sub(r"([0-9]|[?.]|[A-Z])",r'\1'+"DELIMITER",string).split('DELIMITER'))
#Discard the original char
print (re.sub(r"([0-9]|[?.]|[A-Z])","DELIMITER",string).split('DELIMITER'))
Output:
['this is 3', 'an infamous S', 'tring4', 'that I', ' need to s?', 'plit in an infamou.', 's way']
['this is ', 'an infamous ', 'tring', 'that ', ' need to s', 'plit in an infamou', 's way']
In your context:
import re
string = "R2 180m RFT R2 180M RFT"
print (re.sub(r"\b([0-9]+)[mM]\b",r'\1'+"M",string).split('M'))
#print (re.sub(r"\b([0-9]+)[mM]\b",r'\1'+"M",getattr(row_unique_ids, 'COLUMNINTHEDATASET')).split('M'))
Output:
['R2 180', ' RFT R2 180', ' RFT']
It will split on m and M if those are preceded by a number.
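Splitting on a literal 'M' after the substitution would also split on any other capital M in the value. As an alternative sketch, `re.split` can split directly on an 'm' or 'M' that follows a digit, and `maxsplit=1` keeps everything after the first marker in one piece. The helper name `text_after_m` is hypothetical, and returning None for values without the marker is an assumption about what the script should do.

```python
import re

def text_after_m(value):
    # Split once on an 'm' or 'M' that directly follows a digit.
    parts = re.split(r'(?<=\d)[mM]\b', value, maxsplit=1)
    # Return None when the marker is missing instead of raising IndexError.
    return parts[1] if len(parts) > 1 else None

print(text_after_m('R2 180m RFT'))
print(text_after_m('R2 180M AST'))
```

The returned text keeps its leading space; call `.strip()` on it if that is not wanted.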
I am using str.maketrans in Python 3 to do simple text preprocessing like lowercasing and removing digits and punctuation. The problem is that during punctuation removal all words are joined together with no space between them! For example, let's say I have the following text:
text='[{"Hello":"List:","Test"321:[{"Hello":"Airplane Towel for Kitchen"},{"Hello":2 " Repair massive utilities "2},{"Hello":"Some 3 appliance for our kitchen"2}'
import string  # needed for string.digits and string.punctuation
text=text.lower()
text=text.translate(str.maketrans(' ',' ',string.digits))
Works just fine, it gives:
'[{"hello":"list:","test":[{"hello":"airplane towel for kitchen"},{"hello": " repair massives utilities "},{"hello":"some appliance for our kitchen"}'
But once I want to remove the punctuation:
text=text.translate(str.maketrans(' ',' ',string.punctuation))
It gives me this:
'hellolisttesthelloairplane towel for kitchenhello nbsprepair massives utilitiesnbsphellosome appliance for our kitchen'
Ideally it should yield:
'hello list test hello airplane towel for kitchen hello nbsp repair massives utilities nbsp hello some appliance for our kitchen'
There is no specific reason I am doing it with maketrans, but I like it because it is fast and easy, and I am stuck on solving this. Thanks!
Disclaimer: I already know how to do it with re like the following:
import re
s = "string.]With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)
Well, this works:
txt = text.translate(str.maketrans(string.punctuation, ' ' * len(string.punctuation))).replace(' '*4, ' ').replace(' '*3, ' ').replace(' '*2, ' ').strip()
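The chained replace() calls only cover runs of up to a few spaces. A sketch of the same idea that handles runs of any length: map each punctuation character to a space with maketrans, then let split() and join() normalize the whitespace. The function name `strip_punctuation` is made up here.

```python
import string

def strip_punctuation(text):
    # Map every punctuation character to a space instead of deleting it.
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    # split() with no arguments discards all whitespace runs; join re-spaces.
    return ' '.join(text.translate(table).split())

print(strip_punctuation('[{"hello":"list:","test":[{"hello":"airplane towel"}'))
```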
I have a string like this as an example:
"[greeting] Hello [me] my name is John."
I want to split it and get a result like this:
('[greeting]', 'Hello', '[me]', 'my name is John')
Can it be done in one line of code?
OK another example as it seems that many misunderstood the question.
"[greeting] Hello my friends [me] my name is John. [bow] nice to meet you."
then I should get
('[greeting]', ' Hello my friends ' , '[me]', ' my name is John. ', '[bow]', ' nice to meet you.')
I basically want to send this kind of string to my robot. It will automatically decompose it and perform the motions corresponding to [greeting], [me] and [bow], and speak the other strings in between.
Using regex:
>>> import re
>>> s = "[greeting] Hello my friends [me] my name is John. [bow] nice to meet you."
>>> re.findall(r'\[[\w\s.]+\]|[\w\s.]+', s)
['[greeting]', ' Hello my friends ', '[me]', ' my name is John. ', '[bow]', ' nice to meet you.']
Edit:
>>> s = "I can't see you"
>>> re.findall(r'\[.*?\]|.*?(?=\[|$)', s)[:-1]
["I can't see you"]
>>> s = "[greeting] Hello my friends [me] my name is John. [bow] nice to meet you."
>>> re.findall(r'\[.*?\]|.*?(?=\[|$)', s)[:-1]
['[greeting]', ' Hello my friends ', '[me]', ' my name is John. ', '[bow]', ' nice to meet you.']
The function you're after is split(). str.split() accepts only a single delimiter string and returns a list made by splitting the string at every occurrence of it. To split on either "[" or "]", use re.split() with a regular expression:
import re
s = "[greeting] Hello [me] my name is John."
re.split(r"\]|\[", s)
# returns ['', 'greeting', ' Hello ', 'me', ' my name is John.']
This uses a regular expression to split the string.
\] # escape the right bracket
| # OR
\[ # escape the left bracket
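Splitting on the brackets themselves discards them, so the tags can no longer be told apart from the spoken text. `re.split` keeps the delimiters when the pattern contains a capturing group. Here is a minimal sketch of that, assuming tags never nest; `split_tags` is a name invented for this example.

```python
import re

def split_tags(text):
    # The capturing group makes re.split keep each [tag] in the result.
    parts = re.split(r'(\[[^\]]*\])', text)
    # Drop the empty strings produced at the edges or between adjacent tags.
    return [p for p in parts if p]

s = "[greeting] Hello my friends [me] my name is John. [bow] nice to meet you."
print(split_tags(s))
```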
I think it can't be done in one line; you need to split by ] first, then by [:
# Run in the python shell
sentence = "[greeting] Hello [me] my name is John."
for part in sentence.split(']'):
    print(part.split('['))
# Output
['', 'greeting']
[' Hello ', 'me']
[' my name is John.']
I'm trying to decode the strings in the list below. They were all encoded in utf-8 format.
_str=['."\n\nThe vicar\'',':--\n\nIn the', 'cathedral']
Expected output:
['.The vicar', ':--In the', 'cathedral']
My attempts
>>> for x in _str:
...     x.decode('string_escape')
...     print x
'."\n\nThe vicar\''
."
The vicar'
':--\n\nIn the'
:--
In the
'cathedral'
cathedral
>>> print [x.decode('string_escape') for x in _str]
['."\n\nThe vicar\'', ':--\n\nIn the', 'cathedral']
Both attempts failed.
Any ideas?
So you want to remove some characters from the strings in your list. That can be done with a simple regex like the following:
import re
print [re.sub(r'[."\'\n]','',x) for x in _str]
This regex removes every ., ", ' and \n, and the result will be:
['The vicar', ':--In the', 'cathedral']
hope this helps.
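The snippet above uses the Python 2 print statement. Under Python 3 the same idea might look like the sketch below; the variable name `cleaned` is introduced here for illustration.

```python
import re

_str = ['."\n\nThe vicar\'', ':--\n\nIn the', 'cathedral']
# Remove periods, both quote characters, and newlines from each string.
cleaned = [re.sub(r'[."\'\n]', '', x) for x in _str]
print(cleaned)
```

Note that this also strips the leading period of the first string, so it yields 'The vicar' and not the '.The vicar' shown in the question's expected output.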