I'm working on a Python script that removes some attributes from a function call in a Java class. The problem is that I can't find the right regex to match both the name of the attribute and the brackets that follow it.
The string I want to remove is, for example, 'withContentDescription("random text")'.
What is the correct way to match the () brackets and the arbitrary content inside them in my code?
import re
filein = '/path/file.java'
fileout = '/path/newfile.java'
f = open(filein,'r')
filedata = f.read()
f.close()
print("Removing Content Descriptor")
newdata = filedata.strip("withContentDescription\)")
f = open(fileout,'w')
f.write(newdata)
print("--- Done")
f.close()
I'd like to obtain something like
old string: allOf(withId(someinfo), withContentDescription("Text"))
new string: allOf(withId(someinfo))
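Use re.sub. str.strip only removes a set of characters from the two ends of a string, so it cannot delete text from the middle.
Example: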
import re
s = 'allOf(withId(someinfo), withContentDescription("Text"))'
print(re.sub(r",\s*(withContentDescription\(.*?\))", "", s))
Output:
allOf(withId(someinfo))
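Note that the non-greedy `.*?\)` stops at the first closing bracket, so a description that itself contains `)` would be cut short. Here is a minimal sketch of a stricter pattern, assuming the argument is always a single double-quoted Java string literal. The helper name `remove_content_description` is made up for illustration:

```python
import re

# Matches a leading comma, optional whitespace, then withContentDescription("...").
# The quoted part may contain brackets and backslash-escaped characters such as \".
PATTERN = re.compile(r',\s*withContentDescription\("(?:[^"\\]|\\.)*"\)')


def remove_content_description(source: str) -> str:
    """Return source with every ', withContentDescription("...")' removed."""
    return PATTERN.sub("", source)


line = 'allOf(withId(someinfo), withContentDescription("Text (with) parens"))'
print(remove_content_description(line))
```

The same function can be applied to the whole file contents (filedata in the question) before writing them back out.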
I have a CSV file 'svclist.csv' which contains a single-column list as follows:
pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1
pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs
I need to strip each line of everything except the PL5 directory and the two digits in the last directory,
so the result should look like this:
PL5,00
PL5,01
I started the code as follows:
clean_data = []
with open('svclist.csv', 'rt') as f:
    for line in f:
        if line.__contains__('profile'):
            print(line, end='')
and I'm stuck here.
Thanks in advance for the help.
You can use the regular expression (PL5)[^/].{0,}([0-9]{2,2})
For an explanation, paste the regex into https://regexr.com. It shows how the regex works, so you can make any changes you need.
import re
test_string_list = ['pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1',
                    'pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs']
regex = re.compile("(PL5)[^/].{0,}([0-9]{2,2})")
result = []
for test_string in test_string_list:
    # findall returns a list of (group 1, group 2) tuples
    matchArray = regex.findall(test_string)
    result.append(matchArray[0])
with open('outfile.txt', 'w') as f:
    for row in result:
        # join the tuple ('PL5', '00') into the line PL5,00
        f.write(f'{",".join(row)}\n')
In the above code, I create an empty list to hold the tuples returned by findall, then write them to the file. Slicing with str(row)[1:-1] would remove the () at the start and end but leave the quotes around each value, so I join the tuple items with a comma instead.
Then I use a formatted string to write each line into 'outfile.txt'.
You can use a regex for this (in general, a regex is a good option when you need to extract a pattern).
import re
# compile the pattern so that findall can be called on it
pattern = re.compile(r"pf=/usr/sap/PL5/SYS/profile/PL5_.*(\d{2})")
with open('svclist.csv', 'rt') as f:
    for line in f:
        if 'profile' in line:
            last_two_numbers = pattern.findall(line)[0]
            print(f'PL5,{last_two_numbers}')
This code goes over each line, checks whether "profile" is in the line (this is the same as calling __contains__), then extracts the last two consecutive digits according to the pattern.
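The greedy patterns above pick up the last pair of consecutive digits on the line, so they rely on the host name not containing two digits in a row. Here is a sketch that anchors on the profile file name instead. It assumes the name always has the form `<SID>_<letters><two digits>_<host>`, which is a guess based on the two sample lines. `parse_profile` is a hypothetical helper:

```python
import re

# Capture the SID (e.g. PL5) and the two digits that follow the instance letters.
PROFILE = re.compile(r"/profile/([A-Z0-9]+)_[A-Z]+(\d{2})_")


def parse_profile(line):
    """Return 'SID,NN' for a profile line, or None if the line does not match."""
    match = PROFILE.search(line)
    if match is None:
        return None
    return f"{match.group(1)},{match.group(2)}"


sample_lines = [
    "pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1",
    "pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs",
]
for line in sample_lines:
    print(parse_profile(line))
```

Returning None for non-matching lines lets the caller skip them instead of failing on an empty findall result.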
I assumed that the number is always between the two underscores. You could run something similar to this within your for-loop.
test_str = "pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1"
test_list = test_str.split("_")  # splits the string at the underscores
letters = "abcdefghijklmnopqrstuvwxyz"
output = test_list[1].strip(letters + letters.upper())  # removes leading and trailing letters
try:
    int(output)  # raises ValueError if non-numeric characters are left
    print(f"PL5,{output}")
except ValueError:
    print(f'Something went wrong! Output is PL5,{output}')
I have several blocks of text that look like this:
steps:
- class: pipe.steps.extract.Extract
conf:
unzip_patterns:
- .*EstimatesDaily_RealEstate_Q.*_{FD_YYYYMMDD}.*
id: extract
- class: pipe.steps.validate.Validate
conf:
schema_def:
fields:
I want to replace this block of text with this:
global:
global:
schema_def:
fields:
The catch here is that the text spans several lines in each text file. Maybe there is an easy workaround for this; I'm not sure. More troublesome is that I don't always have '- .*EstimatesDaily_RealEstate_Q.*_{FD_YYYYMMDD}.*'. Sometimes the text is '- .*EstimatesDaily_RealEstate_Y.*_{FD_YYYYMMDD}.*' or it could be '- .*EstimatesDaily_RealEstate_EAP_Nav.*_{FD_YYYYMMDD}.*'. One thing that is always the same in each block is that it starts with ' steps:' and ends with ' fields:'.
My sample code looks like this:
import glob
import re
path = 'C:\\Users\\ryans\\OneDrive\\Desktop\\output\\*.yaml'
regex = re.compile("steps:.*fields:", re.DOTALL)
print(regex)
replace = """global:
global:
schema_def:
fields:"""
for fname in glob.glob(path):
    #print(str(fname))
    with open(fname, 'r+') as f:
        text = re.sub(regex, replace, '')
        f.seek(0)
        f.write(text)
        f.truncate()
Of course, my example isn't simple.
Since you're doing a general replacement of things between strings, I'd say this calls for a regular expression [EDIT: Sorry, I see you've since replaced your string "replace" statements with regexp code]. So if your file is "myfile.txt", try this:
>>> import re
>>> f = open('myfile.txt', 'r')
>>> content = f.read()
>>> f.close()
>>> replacement = ' global:\n global:\n schema_def:\n fields:'
>>> print(re.sub(r"(\ssteps\:)(.*?)(\sfields\:)", replacement, content, flags=re.DOTALL))
The output here should be the original contents of "myfile.txt" with all of the substitutions.
Instead of editing files directly, the usual convention in Python is to just copy what you need from a file, change it, and write everything back to a new file. It's less error prone this way, and should be fine unless you're dealing with an astronomically huge amount of content. So you could replace the last line I have here with something like this:
>>> newcontent = re.sub(r"(\ssteps\:)(.*?)(\sfields\:)", replacement, content, flags=re.DOTALL)
>>> f = open('newfile.txt', 'w')
>>> f.write(newcontent)
>>> f.close()
A regex is probably the best answer here, and it keeps this simple. Your mileage may vary with my example regex: make it as tight as you need so that you only replace what you intend to and don't get false positives.
import glob
import re
path = 'C:\\Users\\ryans\\OneDrive\\Desktop\\output\\*.yaml'  # same path as in the question
# re.DOTALL makes . match newlines too, so the pattern can span lines
regex = re.compile("steps:.*?fields:", flags=re.DOTALL)
replace = """global:
global:
schema_def:
fields:"""
def do_replace(fname):
    with open(fname, 'r') as f:
        text = f.read()
    with open(fname, 'w') as f:
        # count belongs to sub(), not compile(): replace only the first match
        f.write(regex.sub(replace, text, count=1))
for fname in glob.glob(path):
    print(str(fname))
    do_replace(fname)
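To see why the `?` matters when a file holds more than one block, here is a small sketch comparing the greedy and non-greedy patterns. The two-block sample text and the shortened replacement are made up for illustration:

```python
import re

# Made-up sample with two steps/fields blocks.
text = (
    "steps:\n- class: A\nfields:\n- one\n"
    "steps:\n- class: B\nfields:\n- two\n"
)
replacement = "global:\nfields:"

greedy = re.compile(r"steps:.*fields:", re.DOTALL)
lazy = re.compile(r"steps:.*?fields:", re.DOTALL)

# subn returns the new string together with the number of replacements made.
greedy_result, greedy_count = greedy.subn(replacement, text)
lazy_result, lazy_count = lazy.subn(replacement, text)

print(greedy_count)  # 1: one match running from the first steps: to the last fields:
print(lazy_count)    # 2: one match per block
```

With the greedy pattern, everything between the first block and the last `fields:` is swallowed, including the `- one` line of the first block.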
I am a total newb to Python. I pieced together some code that works great except when I have brackets in the string for the find_str variable. I tried using double brackets, but that doesn't work either.
The goal is to replace all text in a list of CSVs that contain _(FAIL)_ with SUCCESS.
Here is my code:
import glob
import re
filenames = sorted(glob.glob('*.csv'))
filenames = filenames
for f2 in filenames:
    csv_name=f2
    # open your csv and read as a text string
    with open(csv_name, 'r') as f:
        my_csv_text = f.read()
    find_str = "_(FAIL)_"
    replace_str = "SUCCESS"
    # substitute
    new_csv_str = re.sub(find_str, replace_str, my_csv_text)
    # open new file and save
    new_csv_path = csv_name
    with open(new_csv_path, 'w') as f:
        f.write(new_csv_str)
No need to use a regular expression and re.sub() for this, str.replace() will do the job:
find_str = "_(FAIL)_"
replace_str = "SUCCESS"
my_csv_text = 'We go through life and _(FAIL)_ and _(FAIL)_ and _(FAIL)_'
new_csv_str = my_csv_text.replace(find_str, replace_str)
print(new_csv_str)
Gives:
We go through life and SUCCESS and SUCCESS and SUCCESS
Looks like you need to escape the brackets, because ( and ) are special characters in a regular expression.
Try:
find_str = r"_\(FAIL\)_"
replace_str = "SUCCESS"
# substitute
new_csv_str = re.sub(find_str, replace_str, my_csv_text)
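If the search string comes from a variable and should always be matched literally, re.escape can do the escaping for you. A minimal sketch, where the sample CSV text is made up:

```python
import re

find_str = "_(FAIL)_"
replace_str = "SUCCESS"
my_csv_text = "id,status\n1,_(FAIL)_\n2,OK\n"  # made-up sample data

# re.escape backslash-escapes the characters that are special in a regex,
# so the resulting pattern matches find_str literally.
pattern = re.escape(find_str)
new_csv_str = re.sub(pattern, replace_str, my_csv_text)
print(new_csv_str)
```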
I am a beginner and I have an issue with a short piece of code. I want to replace a string in a CSV with another string and write the result to a new
CSV with a new name. The strings are separated by commas.
My code is a catastrophe:
import csv
f = open('C:\\User\\Desktop\\Replace_Test\\Testreplace.csv')
csv_f = csv.reader(f)
g = open('C:\\Users\\Desktop\\Replace_Test\\Testreplace.csv')
csv_g = csv.writer(g)
findlist = ['The String, that should replaced']
replacelist = ['The string that should replace the old striong']
#the function ?:
def findReplace(find,replace):
    s = f.read()
    for item, replacement in zip(findlist,replacelist):
        s = s.replace(item,replacement)
    g.write(s)
for row in csv_f:
    print(row)
f.close()
g.close()
You can do this with the regex module re. Also, if you use with, you don't have to remember to close your files.
EDIT: Keep in mind that find_str is used as the pattern as-is. It contains no regex special characters here, so it matches the exact string, which also means the match is case-sensitive. If you don't want that, you need to use an actual regex to find the strings that need replacing: replace find_str in the re.sub() call with r'your_regex_here'.
import re
my_csv_path = './my_csv.csv' # placeholder: path to your input csv
# open your csv and read as a text string
with open(my_csv_path, 'r') as f:
    my_csv_text = f.read()
find_str = 'The String, that should replaced'
replace_str = 'The string that should replace the old striong'
# substitute
new_csv_str = re.sub(find_str, replace_str, my_csv_text)
# open new file and save
new_csv_path = './my_new_csv.csv' # or whatever path and name you want
with open(new_csv_path, 'w') as f:
    f.write(new_csv_str)
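For the case-insensitive variant mentioned in the EDIT, one option is to keep the search text literal with re.escape and add the re.IGNORECASE flag. A minimal sketch, where the sample text and the replacement value are made up:

```python
import re

find_str = "The String, that should replaced"
replace_str = "The new string"  # hypothetical replacement text
my_csv_text = 'a,"the string, that should replaced",b\n'  # made-up sample data

# Escape the search text so it is matched literally, and ignore case.
pattern = re.compile(re.escape(find_str), flags=re.IGNORECASE)

# A function replacement is used so that any backslashes in replace_str
# are inserted as-is instead of being treated as escape sequences.
new_csv_str = pattern.sub(lambda match: replace_str, my_csv_text)
print(new_csv_str)
```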
I have an input file in the below format:
<ftnt>
<p><su>1</su> aaaaaaaaaaa </p>
</ftnt>
...........
...........
...........
... the <su>1</su> is available in the .........
I need to convert this to the below format by replacing the value and deleting the whole data in ftnt tags:
"""...
...
... the aaaaaaaaaaa is available in the ..........."""
Please find below the code I have written. First I saved the keys and values in a dictionary, then tried to replace each value based on its key using a group reference.
import re
dict = {}
in_file = open("in.txt", "r")
outfile = open("out.txt", "w")
File1 = in_file.read()
infile1 = File1.replace("\n", " ")
for mo in re.finditer(r'<p><su>(\d+)</su>(.*?)</p>',infile1):
    dict[mo.group(1)] = mo.group(2)
subval = re.sub(r'<p><su>(\d+)</su>(.*?)</p>','',infile1)
subval = re.sub('<su>(\d+)</su>',dict[\\1], subval)
outfile.write(subval)
I tried to use the dictionary in re.sub, but I am getting a KeyError. I don't know why this happens. Could you please tell me how to use it? I'd appreciate any help here.
Try using a lambda for the second argument to re.sub, rather than a string with backreferences:
subval = re.sub(r'<su>(\d+)</su>', lambda m: dict[m.group(1)], subval)
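Here is a self-contained sketch of that idea on a one-line version of the sample input. The function name `inline_footnotes` is made up, and using `.get` is my own choice so that a reference without a matching footnote is left untouched instead of raising KeyError:

```python
import re

DEFINITION = re.compile(r"<p><su>(\d+)</su>(.*?)</p>")
REFERENCE = re.compile(r"<su>(\d+)</su>")


def inline_footnotes(text):
    """Replace each <su>N</su> reference with the text of footnote N."""
    # Collect footnote number -> footnote text from the <p> definitions.
    footnotes = {m.group(1): m.group(2).strip() for m in DEFINITION.finditer(text)}
    # Drop the <ftnt> blocks together with any whitespace that follows them.
    body = re.sub(r"<ftnt>.*?</ftnt>\s*", "", text)
    # The replacement is a function: it receives each match object and
    # looks the captured number up in the dictionary.
    return REFERENCE.sub(lambda m: footnotes.get(m.group(1), m.group(0)), body)


sample = (
    "<ftnt> <p><su>1</su> aaaaaaaaaaa </p> </ftnt> "
    "... the <su>1</su> is available in the ..."
)
print(inline_footnotes(sample))
```

As in the question, this assumes newlines have already been replaced with spaces, since `.` does not match a newline without re.DOTALL.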
First off, don't name a dictionary dict, or you'll shadow the built-in dict type. Second, \\1 doesn't work outside of a string, hence the syntax error. I think the best bet is to take advantage of str.format:
import re
# store the substitutions
subs = {}
# read the data
in_file = open("in.txt", "r")
contents = in_file.read().replace("\n", " ")
in_file.close()
# save some regexes for later
ftnt_tag = re.compile(r'<ftnt>.*</ftnt>')
var_tag = re.compile(r'<p><su>(\d+)</su>(.*?)</p>')
# pull the ftnt tag out
ftnt = ftnt_tag.findall(contents)[0]
contents = ftnt_tag.sub('', contents)
# pull the su
for match in var_tag.finditer(ftnt):
    # added s so they aren't numbers, useful for format
    subs["s" + match.group(1)] = match.group(2)
# replace <su>1</su> with {s1}
contents = re.sub(r"<su>(\d+)</su>", r"{s\1}", contents)
# now that the <su> are the keys, we can just use str.format
out_file = open("out.txt", "w")
out_file.write( contents.format(**subs) )
out_file.close()