I have a variable device='A/B/C/X1' that is commented out in another file. There can be multiple instances of the same device such as 'A/B/C/X1#1', ..#2 and so on. All of these devices are commented out in another file with a prefix *.
I want to remove the * but not affect similar devices like 'A/B/C/X**10**'.
I tried using regex to simply substitute a pattern using the following line of code, but I'm getting an InvalidExpression error.
line=re.sub('^*'+device+'#',device+'#',line)
Please help.
You need to escape the asterisk since it has a meaning in regex syntax:
line=re.sub(r'^\*'+device+'#',device+'#',line).
Escaping the variables you use to construct the regex is also always a good idea:
line=re.sub(r'^\*'+re.escape(device)+'#',device+'#',line)
def replace_all(text):
if text in ['*']:
text = text.replace(text, '')
return text
my_text = 'adsada*asd*****dsa*****'
result = "".join(map(replace_all, my_text))
print result
Or
import re
my_text = 'adsada*asd*****dsa*****'
print (re.sub('\*', '', my_text))
Related
I have a string:
s3://tester/test.pdf
I want to exclude s3://tester/ so even if i have s3://tester/folder/anotherone/test.pdf I am getting the entire path after s3://tester/
I have attempted to use the split & partition method but I can't seem to get it.
Currently am trying:
string.partition('/')[3]
But i get an error saying that it out of index.
EDIT: I should have specified that the name of the bucket will not always be the same so I want to make sure that it is only grabbing anything after the 3rd '/'.
You can use str.split():
path = 's3://tester/test.pdf'
print(path.split('/', 3)[-1])
Output:
test.pdf
UPDATE: With regex:
import re
path = 's3://tester/test.pdf'
print(re.split('/',path,3)[-1])
Output:
test.pdf
Have you tried .replace?
You could do:
string = "s3://tester/test.pdf"
string = string.replace("s3://tester/", "")
print(string)
This will replace "s3://tester/" with the empty string ""
Alternatively, you could use .split rather than .partition
You could also try:
string = "s3://tester/test.pdf"
string = "/".join(string.split("/")[3:])
print(string)
To answer "How to get everything after x amount of characters in python"
string[x:]
PLEASE SEE UPDATE
ORIGINAL
Using the builtin re module.
p = re.search(r'(?<=s3:\/\/tester\/).+', s).group()
The pattern uses a lookbehind to skip over the part you wish to ignore and matches any and all characters following it until the entire string is consumed, returning the matched group to the p variable for further processing.
This code will work for any length path following the explicit s3://tester/ schema you provided in your question.
UPDATE
Just saw updates duh.
Got the wrong end of the stick on this one, my bad.
Below re method should work no matter S3 variable, returning all after third / in string.
p = ''.join(re.findall(r'\/[^\/]+', s)[1:])[1:]
I need to find a string in a file and reformat it.
String to find format:
[title](link)
example:
[template application](https://stackoverflow.com/sample/base-app)
I would like to change that to HTML link:
title
example:
template application
What is the best way to do it?
I am thinking regular expressions but I have no clue to how achieve that. Is there a simple way?
You could take advantage of the fact that the sub function receives another function as a parameter for replacement:
import re
line = '[template application](https://stackoverflow.com/sample/base-app)'
def repl(match):
return '{}'.format(match.group(2), match.group(1))
result = re.sub('\[(.+?)\]\((https?.+?)\)', repl, line)
print(result)
Output
template application
The pattern '\[(.+?)\]\((https?.+?)\)' captures everything between brackets followed by something link like (starts with http), notice that you must escape brackets and parenthesis because they have a special meaning inside a regex.
Or as suggested by #JonClements, you could use:
re.sub('\[(.+?)\]\((https?.+?)\)', r'\1', line)
instead of the repl function.
You can reach your desired result using re as Daniel said, But if you dont want to use regex, you can do this with str.split:
line = '[template application](https://stackoverflow.com/sample/base-app)'
link = line.split('(')[1][:-1]
title = line.split(']')[0][1:]
result = '{}'.format(link, title)
if you are using Python 3.6 or higher:
result = f'{title}'
Please help with my regex problem
Here is my string
source="http://www.amazon.com/ref=s9_hps_bw_g200_t2?pf_rd_m=ATVPDKIKX0DER&pf_rd_i=3421"
source_resource="pf_rd_m=ATVPDKIKX0DER"
The source_resource is in the source may end with & or with .[for example].
So far,
regex = re.compile("pf_rd_m=ATVPDKIKX0DER+[&.]")
regex.findall(source)
[u'pf_rd_m=ATVPDKIKX0DER&']
I have used the text here. Rather using text, how can i use source_resource variable with & or . to find this out.
If the goal is to extract the pf_rd_m value (which it apparently is as you are using regex.findall), than I'm not sure regex are the easiest solution here:
>>> import urlparse
>>> qs = urlparse.urlparse(source).query
>>> urlparse.parse_qs(qs)
{'pf_rd_m': ['ATVPDKIKX0DER'], 'pf_rd_i': ['3421']}
>>> urlparse.parse_qs(qs)['pf_rd_m']
['ATVPDKIKX0DER']
You also have to escape the .
pattern=re.compile(source_resource + '[&\.]')
You can just build the string for the regular expression like a normal string, utilizing all string-formatting options available in Python:
import re
source_and="http://rads.stackoverflow.com/amzn/click/B0030DI8NA/pf_rd_m=ATVPDKIKX0DER&"
source_dot="http://rads.stackoverflow.com/amzn/click/B0030DI8NA/pf_rd_m=ATVPDKIKX0DER."
source_resource="pf_rd_m=ATVPDKIKX0DER"
regex_string = source_resource + "[&\.]"
regex = re.compile(regex_string)
print regex.findall(source_and)
print regex.findall(source_dot)
>>> ['pf_rd_m=ATVPDKIKX0DER&']
['pf_rd_m=ATVPDKIKX0DER.']
I hope this is what you mean.
Just take note that I modified your regular expression: the . is a special symbol and needs to be escaped, as is the + (I just assumed the string will only occur once, which makes the use of + unnecessary).
I am trying to use re.findall with look-behind and look-forward to extract data. The regular expression works fine when I am not using a raw_input variable, but I need users to be able to input a variety of different search terms.
Here is the current code:
me = re.findall(r"(?<='(.+)'+variable+'(.+)')(.*?)(?='(.+)+variable+(.+)')", raw)
As you can see, I am trying to pull out strings between one search term.
However, each time I use this type of formatting, I get a fixed width error. Is there anyway around this?
I have also tried the following formats with no success.
variable = raw_input('Term? ')
'.*' + variable + '.*'
and
'.*%s.*' % (variable, )
and
'.*{0}.*'.format(variable)
and
'.*{variable}.*'.format(variable=variable)
I'm not sure if this is what you mean, but it may get you started. As far as I understand your question, you don't need lookaheads or lookbehinds. This is for Python 2.x (won't work with Python 3):
>>> import re
>>> string_to_search = 'fish, hook, swallowed, reeled, boat, fish'
>>> entered_by_user = 'fish'
>>> search_regex = r"{0}(.+){0}".format(entered_by_user)
>>> match = re.search(search_regex, string_to_search)
>>> if match:
... print "result:", match.group(1).strip(' ,')
...
result: hook, swallowed, reeled, boat
If you really want the last 'fish' in the result as in your comment above, then just remove the second {0} from the format() string.
This solution should work:
me = re.findall(rf"(?<='(.+)'+{variable}+'(.+)')(.*?)(?='(.+)+{variable}+(.+)')", raw)
You also can add many different variables as you wish.
Add rf for the regular expression and the desired variables in between {}
import re
text = "regex is the best"
var1 = "is the"
var2 = "best"
yes = re.findall(rf"regex {var1} {var2}", text)
print(yes)
['regex is the best']
The way lookbehind is usually implemented (including its Python implementation) has an inherent limitation that you are unfortunately running into: lookbehinds cannot be variable-length. The "Important Notes About Lookbehind" section here explains why. I think you should be able to do the regex without a lookbehind, though.
I have an XML in which I'd like to rename one of the tag groups like this:
<string>ABC</string>
<string>unknown string</string>
should be
<xyz>ABC</xyz>
<xyz>unknown string</xyz>
ABC is always the same, so that's no issue. However, "unknown string" is always different, but since I need this information extracted, I also want to keep the same string in the replacement.
Here's what I got so far:
import re
#open the xml file for reading:
file = open('path/file','r+')
#convert to string:
data = file.read()
file.write(re.sub("<string>ABC</string>(\s+)<string>(.*)</string>","<xyz>ABC</xyz>[\1]<xyz>[\2]</xyz>",data))
print (data)
file.close()
I tried to use capture groups, but didn't do it correctly. The string is replaced with weird symbols in my XML. Plus, it's printed twice. I have both the unchanged and the changed version in my XML, which I don't want.
The problem you're experiencing is not due to your regex pattern. The backslash (\) in the strings are escaping proceeding characters thus resulting in the weird symbols that you see.
>>> print "hello\1world"
helloworld
>>> print r"hello\1world"
hello\1world
Always use the raw string notation to define your re patterns.
>>> data = """
... <string>ABC</string>
... <string>unknown string</string>
... """
>>> print re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data)
<xyz>ABC</xyz>
<xyz>unknown string</xyz>
Why are you including the content in your replacement operation? All you need to do is:
Replace <string> by <xyz>.
Replace </string> by </xyz>.
It would take two operations but the intent of your code would be clear and you don't need to know what unknown string is.