Is it possible to ignore string in quotes for replace()? - python

Is it possible to make Python's replace() ignore text inside quotes?
I have a string variable like this:
a = "I like bananas 'I like bananas'"
I want to get a result like this via replace():
"I like apples 'I like bananas'".
But when I execute print(a.replace("bananas", "apples")), the result is:
"I like apples 'I like apples'".
How can I make replace() ignore text inside quotes?

Split the string on ', process only the even-indexed elements of the resulting list (the parts outside the quotes), then reassemble the string:
a = "I like bananas 'I like bananas'"
ap = a.split("'")
ar = [ ai.replace("bananas", "apples") if i%2==0 else ai for i,ai in enumerate(ap)]
print("'".join(ar))

Here is a regexp example:
import re
text = "I like bananas 'I like bananas' 'I like also bananas'"
def replace2(original_text, b, c):
    # Pattern to match text inside single quotes.
    pattern = re.compile(r".*? (\'.*?\')")
    # Collect every quoted section the pattern finds.
    matches = pattern.findall(original_text)
    for match in matches:
        # Replace b with c in the matched string.
        replace_with = match.replace(b, c)
        # str.replace treats the match literally; re.sub would read it as a regex.
        original_text = original_text.replace(match, replace_with)
    return original_text
result = replace2(text, "bananas", "apples")
print(result)
It finds all text between single quotes, replaces the old string (b) with the new one (c) inside each match, and then substitutes the edited matches back into the original string. Note that, as written, this changes the text inside the quotes and leaves the text outside them untouched, which is the reverse of what the question asks for.

No, it is not possible, you cannot make replace ignore those matches. You will have to code your own solution.
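One way to code such a solution, sketched here under the assumption that quoted sections use single quotes and contain no escaped quotes, is to let a regular expression match either a whole quoted section or the target word, and to hand quoted sections back unchanged from a callback. The name replace_outside_quotes is made up for this illustration.

```python
import re

def replace_outside_quotes(text, old, new):
    # Hypothetical helper. The first alternative captures a complete '...'
    # section; the second matches `old` anywhere else.
    pattern = re.compile(r"('[^']*')|" + re.escape(old))

    def substitute(match):
        # Group 1 is set only when a quoted section matched: keep it as is.
        if match.group(1) is not None:
            return match.group(1)
        return new

    return pattern.sub(substitute, text)

a = "I like bananas 'I like bananas'"
print(replace_outside_quotes(a, "bananas", "apples"))
# I like apples 'I like bananas'
```

Because the regex engine scans left to right, a quoted section is consumed as a single match before the word inside it can be seen by the second alternative.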

You can use the count argument (an optional parameter of the replace method) to specify how many occurrences of the old value you want to replace.
It works for both kinds of quotes, but only because the occurrence to replace comes before the quoted one.
a = "I like bananas \"I like bananas\""
print(a.replace("bananas", "apples",1))
a = "I like bananas 'I like bananas'"
print(a.replace("bananas", "apples",1))
Output:
I like apples "I like bananas"
I like apples 'I like bananas'

It's absolutely possible. Here is a complete answer to this question:
import re
original_str = "I like bananas 'I Love banana' somthing 'I like banana' I love banana ' I like babana again' "
pattern = r"('(.+?)')"
replaced_str = ''
quoted_strings = re.compile(pattern)
x_start = 0
print("original_str = (", original_str+")\n")
for m in quoted_strings.finditer(original_str):
    print(m.span(), m.group())
    x_end, x_next = m.span()
    # Replace only in the unquoted text that precedes this quoted section.
    w = original_str[x_start:x_end]
    w = w.replace("banana", "apple")
    # Append the quoted section itself unchanged.
    replaced_str = replaced_str + w + original_str[x_end:x_next]
    x_start = x_next
# Handle any unquoted text after the last quoted section.
replaced_str += original_str[x_start:].replace("banana", "apple")
print(replaced_str)
Output:
original_str = ( I like bananas 'I Love banana' somthing 'I like banana' I love banana ' I like babana again' )
(15, 30) 'I Love banana'
(42, 57) 'I like banana'
(73, 95) ' I like babana again'
I like apples 'I Love banana' somthing 'I like banana' I love apple ' I like babana again'

As per the update to your requirements in your reply to gnight:
a = "I like bananas 'I like \'bananas\' ' "
print(a)
gives:
I like bananas 'I like 'bananas' '
because the \' is converted to ' when the string literal is evaluated; that is, it is the same as
a = "I like bananas 'I like 'bananas' ' "
As gnight says, the only real option is to replace only in the first and last sections of the string, which are not in quotes, i.e.
a = "I like bananas 'I like \'bananas\' ' "
ap = a.split("'")
if len(ap)>0:
    ap[0]=ap[0].replace("bananas", "apples")
if len(ap)>1:
    ap[-1]=ap[-1].replace("bananas", "apples")
print("'".join(ap))
That gives:
I like apples 'I like 'bananas' '
In the past I have written parsers to handle the triple-quote escaping that Excel uses, with a state machine to track the quote state; it is not fun to implement if you end up having to do that. If you can give some more examples of desired input and output, it may help.
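For reference, a quote-tracking state machine of the kind mentioned above might look roughly like the following. This is only a sketch under stated assumptions: the backslashes must really be present in the string (for example via a raw string literal), a backslash protects exactly one following character, and the function name replace_outside_quotes_escaped is invented for the example.

```python
def replace_outside_quotes_escaped(text, old, new, quote="'", escape="\\"):
    # Hypothetical helper: walk the text once, tracking whether we are
    # inside a quoted section. An escaped quote does not toggle the state.
    out = []        # finished pieces of the result
    outside = []    # characters of the current unquoted run
    in_quote = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            # Copy the escape and the character it protects verbatim.
            (out if in_quote else outside).append(text[i:i + 2])
            i += 2
            continue
        if ch == quote:
            if not in_quote:
                # An unquoted run ends here: apply the replacement to it.
                out.append("".join(outside).replace(old, new))
                outside = []
            in_quote = not in_quote
            out.append(ch)
        elif in_quote:
            out.append(ch)
        else:
            outside.append(ch)
        i += 1
    # Flush the final unquoted run (empty if the text ended inside quotes).
    out.append("".join(outside).replace(old, new))
    return "".join(out)

a = r"I like bananas 'I like \'bananas\' here' and bananas"
print(replace_outside_quotes_escaped(a, "bananas", "apples"))
# I like apples 'I like \'bananas\' here' and apples
```

The raw string literal matters: without the r prefix, Python drops the backslashes before the function ever sees them, which is exactly the situation described above.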

Related

How can I find an unknown word after a specific word?

I have this string "Hello, I bought apples, and sold bananas".
How can I get the word after "bought" and the word after "sold" in Python?
You can do this:
string = "Hello, I bought apples, and sold bananas"
string = string.replace(",","")
list_word = string.split(" ")
# Stop one short of the end so list_word[i+1] always exists.
for i in range(len(list_word) - 1):
    if list_word[i]=="bought" or list_word[i]=="sold":
        print(list_word[i+1])
Output:
apples
bananas
There are several ways to do it. One way is to use regular expressions:
import re
s = "Hello, I bought apples, and sold bananas"
print(re.findall(r'\W(?:bought|sold)\W+(\w+)', s)) # ['apples', 'bananas']
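If it matters which keyword each word followed, the regular expression can capture the keyword as well. The sketch below assumes the keywords are plain words; the helper name words_after is hypothetical.

```python
import re

def words_after(text, keywords):
    # Hypothetical helper: map each keyword to the words that follow it.
    # \b keeps "sold" from matching inside a longer word such as "resold".
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(" + alternatives + r")\W+(\w+)")
    found = {}
    for keyword, following in pattern.findall(text):
        found.setdefault(keyword, []).append(following)
    return found

s = "Hello, I bought apples, and sold bananas"
print(words_after(s, ["bought", "sold"]))
# {'bought': ['apples'], 'sold': ['bananas']}
```

With two capturing groups, findall returns (keyword, word) tuples, which is what lets the loop group the results per keyword.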

Python: How to slice string using string?

Assuming that the user entered:
"i like eating big apples"
I want to remove "eating" and "apples" together with whatever is in between these two words. The output in this case should be
"i like"
In another case, if the user entered:
"i like eating apples very much"
Expected output:
"i like very much"
And I want to slice the input starting from "eating" to "apples".
(However, a fixed index cannot be used because you don't know how much the user is going to type, but it is guaranteed that "eating" and "apples" will be entered.)
So, is there any way to slice without a fixed index, indicating the start and end of the slice with another string instead?
Slicing a string in Python looks like this:
mystr = "i like eating big apples"
print(mystr[10:20])
This takes the characters from index 10 up to (but not including) index 20, so it prints: ing big ap.
Now the question is how to find the index where 'eating' starts and the index where 'apples' ends.
Use the .index method to find the beginning of a substring in a string.
mystr.index('eating') returns 7, so mystr[7:] (from index 7 to the end of the string) gives 'eating big apples'.
The second part is a little trickier. mystr.index('apples') returns the index where 'apples' begins (18), so mystr[7:18] gives 'eating big '.
To include the word 'apples' itself you have to go exactly 6 characters further, which is the number returned by len('apples'). So the final result is:
start = mystr.index('eating')
stop = mystr.index('apples') + len('apples')
print(mystr[start:stop])
You can do the following:
s = "i like eating big apples"
start_ = s.find("eating")
end_ = s.find("apples") + len("apples")
s[start_:end_] # 'eating big apples'
This uses find() to get the starting indices of the desired words in the string; adjust start_/end_ to your needs.
To remove the substring:
s[:start_] + s[end_:] # 'i like ' (with a trailing space)
And for:
s = "i like eating apples very much"
end_ = s.find("apples") + len("apples")
start_ = s.find("eating")
s[:start_] + s[end_:] # 'i like  very much' (two spaces remain where the text was removed)
Maybe you can use this:
txt = "Hello, welcome to my world."
x = txt.find("welcome")
print(x)
Which outputs: 7
To find "eating" and "apple":
S = "i like eating big apples"
Index = S.find("eating")
output = S[Index:-1] # 'eating big apple' (the -1 drops the final character; use S[Index:] to keep it)
Use the find() or rfind() method to get the index of a substring's occurrence, then use the results as slice bounds:
s = "i like eating big apples"
substr = s[s.rfind("eating"):s.rfind("apples") + len("apples")] # 'eating big apples'
You can use str.partition to split string into three parts.
In [112]: s = "i like eating apples very much"
In [113]: h, _, t = s.partition('eating')
In [114]: _, _, t = t.partition('apples')
In [115]: h + t
Out[115]: 'i like  very much'
In [116]: s = "i like eating big apples"
In [117]: h, _, t = s.partition('eating')
In [118]: _, _, t = t.partition('apples')
In [119]: h + t
Out[119]: 'i like '
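The partition approach can be wrapped in a small function that also tidies the leftover whitespace. This is a sketch, not a definitive implementation: the name remove_between is invented here, and collapsing all whitespace runs with split()/join() is an assumption about what the asker wants.

```python
def remove_between(text, start, end):
    # Hypothetical helper: drop everything from `start` through `end`.
    head, found_start, rest = text.partition(start)
    if not found_start:
        return text            # `start` is absent: nothing to remove
    _, found_end, tail = rest.partition(end)
    if not found_end:
        return text            # `end` never follows `start`
    # split()/join() collapses the doubled space left at the seam and trims
    # the ends (it also normalizes any other whitespace runs in the text).
    return " ".join((head + tail).split())

print(remove_between("i like eating big apples", "eating", "apples"))
# i like
print(remove_between("i like eating apples very much", "eating", "apples"))
# i like very much
```

Unlike index(), partition() does not raise when the separator is missing; it returns an empty separator element, which the two checks rely on.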

How to use backreferences as index to substitute via list?

I have a list
fruits = ['apple', 'banana', 'cherry']
I would like to replace all occurrences of these elements in a text with their index in the list. I know that I can loop over the list and use the string replace method, like this:
text = "I like to eat apple, but banana are fine too."
for i, fruit in enumerate(fruits):
text = text.replace(fruit, str(i))
How about using regular expression? With \number we can backreference to a match. But
import re
text = "I like to eat apple, but banana are fine too."
text = re.sub('apple|banana|cherry', fruits.index('\1'), text)
doesn't work. I get an error saying that \x01 is not in fruits, but \1 should refer to 'apple'.
I am interested in the most efficient way to do the replacement, but I would also like to understand regex better. How can I get the matched string from the backreference in a regex?
Thanks a lot.
Using regex, you can pass a function as the replacement argument; it receives the match object.
Example:
import re
text = "I like to eat apple, but banana are fine too."
fruits = ['apple', 'banana', 'cherry']
pattern = re.compile("|".join(fruits))
text = pattern.sub(lambda x: str(fruits.index(x.group())), text)
print(text)
Output:
I like to eat 0, but 1 are fine too.
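As for why the attempt in the question fails: '\1' in a non-raw string literal is the single character \x01, and fruits.index('\1') is evaluated by Python before re.sub is ever called, so no backreference is involved. Regarding efficiency, the following variation is a sketch that replaces the repeated list.index call with a dictionary lookup and restricts matches to whole words; the names index_of and to_index are illustrative only.

```python
import re

fruits = ['apple', 'banana', 'cherry']
# Precompute word -> index once instead of calling list.index per match.
index_of = {fruit: str(i) for i, fruit in enumerate(fruits)}

# Longest alternatives first, so a longer word is tried before any shorter
# word that is its prefix; \b restricts matches to whole words.
alternatives = sorted(map(re.escape, fruits), key=len, reverse=True)
pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")

def to_index(match):
    # re.sub calls this with a match object; group(0) is the matched text.
    return index_of[match.group(0)]

text = "I like to eat apple, but banana are fine too. No pineapple."
print(pattern.sub(to_index, text))
# I like to eat 0, but 1 are fine too. No pineapple.
```

The word boundaries are a design choice: without them, the "apple" inside "pineapple" would also be replaced.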

How to return a word in a string if it starts with a certain character? (Python)

I'm building a reddit bot for practice that converts US dollars into other commonly used currencies, and I've managed to get the conversion part working fine, but now I'm a bit stuck trying to pass the characters that directly follow a dollar sign to the converter.
This is sort of how I want it to work:
def run_bot():
    subreddit = r.get_subreddit("randomsubreddit")
    comments = subreddit.get_comments(limit=25)
    for comment in comments:
        comment_text = comment.body
        # If comment contains a string that starts with '$'
        # Pass the rest of the 'word' to a variable
So for example, if it were going over a comment like this:
"I bought a boat for $5000 and it's awesome"
It would assign '5000' to a variable that I would then put through my converter
What would be the best way to do this?
(Hopefully that's enough information to go off, but if people are confused I'll add more)
You could use the re.findall function.
>>> import re
>>> re.findall(r'\$(\d+)', "I bought a boat for $5000 and it's awesome")
['5000']
>>> re.findall(r'\$(\d+(?:\.\d+)?)', "I bought two boats for $5000 $5000.45")
['5000', '5000.45']
OR
>>> s = "I bought a boat for $5000 and it's awesome"
>>> [i[1:] for i in s.split() if i.startswith('$')]
['5000']
If you are dealing with prices as floating-point numbers, you can use this:
import re
s = "I bought a boat for $5000 and it's awesome"
matches = re.findall(r"\$(\d*\.\d+|\d+)", s)
print(matches) # ['5000']
s2 = "I bought a boat for $5000.52 and it's awesome"
matches = re.findall(r"\$(\d*\.\d+|\d+)", s2)
print(matches) # ['5000.52']
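Since the matches are meant to be passed to a currency converter, it may help to turn them into numbers right away. The sketch below assumes amounts may contain thousands separators such as $5,000.50, which the question does not mention, and uses Decimal to avoid floating-point rounding; the helper name dollar_amounts is hypothetical.

```python
import re
from decimal import Decimal

# Digits with optional thousands separators and an optional decimal part.
AMOUNT = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d+)?)")

def dollar_amounts(comment_text):
    # Hypothetical helper: return every "$..." amount as a Decimal.
    return [Decimal(m.replace(",", "")) for m in AMOUNT.findall(comment_text)]

print(dollar_amounts("I bought a boat for $5,000.50 and a car for $300"))
# [Decimal('5000.50'), Decimal('300')]
```

In the bot loop, this could be called as dollar_amounts(comment.body) and the resulting list passed to the converter.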

String splitting issue problem with multiword expressions

I have a series of strings like:
'i would like a blood orange'
I also have a list of strings like:
["blood orange", "loan shark"]
Operating on the string, I want the following list:
["i", "would", "like", "a", "blood orange"]
What is the best way to get the above list? I've been using re throughout my code, but I'm stumped with this issue.
This is a fairly straightforward generator implementation: split the string into words, group together words which form phrases, and yield the results.
(There may be a cleaner way to handle skip, but for some reason I'm drawing a blank.)
def split_with_phrases(sentence, phrase_list):
    words = sentence.split(" ")
    phrases = set(tuple(s.split(" ")) for s in phrase_list)
    max_phrase_length = max(len(p) for p in phrases)
    # Find a phrase within words starting at the specified index. Return the
    # phrase as a tuple, or None if no phrase starts at that index.
    def find_phrase(start_idx):
        # Iterate backwards, so we'll always find longer phrases before shorter ones.
        # Otherwise, if we have a phrase set like "hello world" and "hello world two",
        # we'll never match the longer phrase because we'll always match the shorter
        # one first.
        for phrase_length in range(max_phrase_length, 0, -1):
            test_word = tuple(words[start_idx:start_idx+phrase_length])
            if test_word in phrases:
                return test_word
        return None
    skip = 0
    for idx in range(len(words)):
        if skip:
            # This word was returned as part of a previous phrase; skip it.
            skip -= 1
            continue
        phrase = find_phrase(idx)
        if phrase is not None:
            # The first word of the phrase is consumed by this iteration,
            # so only the remaining words need to be skipped.
            skip = len(phrase) - 1
            yield " ".join(phrase)
            continue
        yield words[idx]
print([s for s in split_with_phrases('i would like a blood orange',
                                     ["blood orange", "loan shark"])])
Ah, this is crazy, crude and ugly, but it looks like it works. You may want to clean it up and optimize it, but certain ideas here might be useful.
list_to_split = ['i would like a blood orange', 'i would like a blood orange ttt blood orange']
input_list = ["blood orange", "loan shark"]
for item in input_list:
    for str_lst in list_to_split:
        if item in str_lst:
            tmp = str_lst.split(item)
            lst = []
            for itm in tmp:
                if itm != '':
                    lst.append(itm)
                    lst.append(item)
            print(lst)
Output:
['i would like a ', 'blood orange']
['i would like a ', 'blood orange', ' ttt ', 'blood orange']
One quick and dirty, completely un-optimized approach might be to just replace the compounds in the string with a version including a different separator (preferably one that does not occur anywhere else in your target string or compound words). Then split and replace. A more efficient approach would be to iterate only once through the string, matching the compound words where appropriate - but you may have to watch out for instances where there are nested compounds, etc., depending on your array.
#!/usr/bin/python
import re
my_string = "i would like a blood orange"
compounds = ["blood orange", "loan shark"]
# Join the words of each compound with a placeholder separator.
for i in range(0,len(compounds)):
    my_string = my_string.replace(compounds[i],compounds[i].replace(" ","&"))
my_segs = re.split(r"\s+",my_string)
# Restore the spaces inside the compounds.
for i in range(0,len(my_segs)):
    my_segs[i] = my_segs[i].replace("&"," ")
print(my_segs)
Edit: Glenn Maynard's solution is better.
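Since the question mentions already using re, here is one more possibility, offered as a sketch rather than as any of the answerers' methods: build a single pattern that tries the known phrases first and falls back to a run of non-whitespace characters. The function name split_keeping_phrases is invented for the example, and it assumes the phrases begin and end with word characters.

```python
import re

def split_keeping_phrases(sentence, phrases):
    # Hypothetical regex variant: try known phrases first (longest first),
    # then fall back to any run of non-whitespace characters.
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = [r"\b" + re.escape(p) + r"\b" for p in ordered]
    pattern = re.compile("|".join(alternatives + [r"\S+"]))
    return pattern.findall(sentence)

print(split_keeping_phrases("i would like a blood orange",
                            ["blood orange", "loan shark"]))
# ['i', 'would', 'like', 'a', 'blood orange']
```

The pattern contains no capturing groups, so findall returns the full text of each match, and the string is scanned only once.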
