I have an input file containing JavaScript code with many five-figure IDs. I want these IDs in a list like:
53231,53891,72829 etc.
This is my current Python file:
import re
fobj = open("input.txt", "r")
text = fobj.read()
output = re.findall(r'[0-9][0-9][0-9][0-9][0-9]' ,text)
outp = open("output.txt", "w")
How can I get these IDs into the output file in the format I want?
Thanks
import re
# Use "with" so the file will automatically be closed
with open("input.txt", "r") as fobj:
    text = fobj.read()
# Use word boundary anchors (\b) so only five-digit numbers are matched.
# Otherwise, 123456 would also be matched (and the match result would be 12345)!
output = re.findall(r'\b\d{5}\b', text)
# Join the matches together
out_str = ",".join(output)
# Write them to a file, again using "with" so the file will be closed.
with open("output.txt", "w") as outp:
    outp.write(out_str)
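To see why the comment about word boundaries matters, here is a small standalone check (the sample string is made up):

```python
import re

sample = "ids: 53231, 53891, 123456, 72829"

# With \b anchors, the six-digit 123456 is rejected entirely.
print(re.findall(r'\b\d{5}\b', sample))
# Without them, 123456 contributes a bogus five-digit prefix.
print(re.findall(r'\d{5}', sample))
```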
I have multiple instances of Fortran subroutines within a text file like the following:
SUBROUTINE ABCDEF(STRING1)
STRING2
STRING3
.
.
.
STRINGN
END
How can I delete the subroutines along with their content in Python using a regex?
I have already tried this piece of code, without success:
with open(input, 'r') as file:
    output = open(stripped, 'w')
    try:
        for line in file:
            result = re.sub(r"(?s)SUBROUTINE [A-Z]{6}(.*?)\bEND\b", input)
            output.write("\n")
    finally:
        output.close()
Does this work? I replaced input with input_file, since input is a built-in function and shadowing it is bad practice.
pattern = r"(?s)SUBROUTINE [A-Z]{6}(.*?)\bEND\b"
regex = re.compile(pattern, re.MULTILINE | re.DOTALL)
with open(input_file, 'r') as file:
    with open(stripped, 'w') as output_file:
        result = regex.sub('', file.read())
        output_file.write(result)
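One caveat: [A-Z]{6} only matches six-letter subroutine names. If the names vary in length, a variation like the following sketch (with a made-up sample) may be closer:

```python
import re

# Sketch: strip every SUBROUTINE ... END block, allowing any
# subroutine name, not just six capital letters.
pattern = re.compile(r"SUBROUTINE\s+\w+\s*\(.*?\).*?^\s*END\b",
                     re.DOTALL | re.MULTILINE)

sample = """KEEP ME
SUBROUTINE ABCDEF(STRING1)
STRING2
STRING3
END
KEEP ME TOO
"""
cleaned = pattern.sub("", sample)
print(cleaned)
```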
I need to apply the following patterns:
regex pattern 1 =>
search1: ^this, replace1: these
regex pattern 2 =>
search2: tests$, replace2: \t tests
regex pattern 3
.....
The following code only executes one search-and-replace operation.
How do I combine multiple search operations? I might need to apply perhaps 10-20 patterns.
Thank you
import re

fin = open("data.txt", "r")
fout = open("data2.txt", "w")
for line in fin:
    pattern1 = re.sub('test\.', 'tests', line)
    fout.write(pattern2)
I don't know what you meant by using pattern1 and pattern2; here's my solution based on what I understood.
Simply put them in a list and iterate through?
import re

patterns = [('^this', 'these'), ('tests$', 'tests')]
fin = open("data.txt", "r")
fout = open("data2.txt", "w")
for line in fin:
    for regex, replace in patterns:
        line = re.sub(regex, replace, line)
    fout.write(line)
You can put your regex patterns and replacements in one dictionary and loop over it.
Check the example below:
import re

patterns = {"pattern1": "replacetxt"}
fin = open("data.txt", "r")
fout = open("data2.txt", "w")
for line in fin:
    for patt, replace in patterns.items():
        line = re.sub(patt, replace, line)
    fout.write(line)
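Since the question mentions applying perhaps 10-20 patterns, it may be worth compiling them once instead of recompiling on every line. A small sketch, using the question's two example patterns:

```python
import re

# Precompile the pattern list once, outside the per-line loop.
patterns = [(re.compile(r"^this"), "these"),
            (re.compile(r"tests$"), "\t tests")]

def apply_all(line):
    # Apply every substitution in order to a single line.
    for regex, replacement in patterns:
        line = regex.sub(replacement, line)
    return line

print(apply_all("this is one of my tests"))
```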
I'm getting URLs with the forward slashes removed, and I basically need to correct the URLs inside a text file.
The URLs in the file look like this:
https:www.ebay.co.ukitmReds-Challenge-184-214-Holo-Shiny-Rare-Pokemon-Card-SM-Unbroken-Bonds-Rare124315281970?hash=item1cf1c4aa32%3Ag%3AXBAAAOSwJGRfSGI1&LH_BIN=1
I need to correct it to:
https://www.ebay.co.uk/itm/Reds-Challenge-184-214-Holo-Shiny-Rare-Pokemon-Card-SM-Unbroken-Bonds-Rare/124315281970?hash=item1cf1c4aa32%3Ag%3AXBAAAOSwJGRfSGI1&LH_BIN=1
So basically I need a regex, or another way, that will edit those forward slashes back into each URL within the file, replacing the broken URLs.
import time
import re

while True:
    # input file
    fin = open("ebay2.csv", "rt")
    # output file to write the result to
    fout = open("out.txt", "wt")
    # for each line in the input file, replace the strings and write to the output file
    for line in fin:
        fout.write(line.replace('https://www.ebay.co.uk/sch/', 'https://')
                       .replace('itm', '/itm/')
                       .replace('https:www.ebay', 'https://www.ebay'))
    # close (and flush) the files before reading the output back
    fin.close()
    fout.close()
    with open('out.txt') as f:
        regex = r"\d{12}"
        subst = "/\\g<0>"
        for l in f:
            result = re.sub(regex, subst, l, 0, re.MULTILINE)
            if result:
                print(result)
    time.sleep(1)
I eventually came up with this. It's a bit clumsy but it does the job fast enough.
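For comparison, the same repair can be sketched as a few targeted substitutions on the example URL from the question; the lookarounds encode assumptions about where the slashes were dropped (after the domain, around itm, and before the 12-digit item ID):

```python
import re

broken = ("https:www.ebay.co.ukitmReds-Challenge-184-214-Holo-Shiny-Rare-"
          "Pokemon-Card-SM-Unbroken-Bonds-Rare124315281970"
          "?hash=item1cf1c4aa32%3Ag%3AXBAAAOSwJGRfSGI1&LH_BIN=1")

fixed = broken.replace("https:www.ebay", "https://www.ebay")
# Insert a slash between the domain and "itm".
fixed = re.sub(r"\.uk(?=itm)", ".uk/", fixed)
# Insert a slash between "itm" and the capitalised title.
fixed = re.sub(r"itm(?=[A-Z])", "itm/", fixed)
# Insert a slash before the 12-digit item ID glued onto the title.
fixed = re.sub(r"(?<=[A-Za-z-])(\d{12})", r"/\1", fixed)
print(fixed)
```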
I'm trying to open a text file and look for the string Num_row_labels. If the value of Num_row_labels is greater than or equal to 10, print the name of the file.
In the example below, my text file test.mrk has some text in the format shown. (P.S.: my text file doesn't have Num_row_labels >= 10; it always has "equal to".)
Format= { Window_Type="Tabular", Tabular= { Num_row_labels=10 } }
So I created a variable teststring to hold the pattern I will be looking at.
Then I opened the file.
Then, using re, I got Num_row_labels=10 into my variable called match.
Using group() on match, I extracted the threshold number I wanted, and using int() converted the string to an int.
My purpose is to read the text file and find/print the value of Num_row_labels, along with the name of the file, if the text file has Num_row_labels = 10 or any number greater than 10.
Here's my test code:
import os
import os.path
import re

teststring = """Format= { Window_Type="Tabular", Tabular= { Num_row_labels=10 } }"""
# Use a raw string so backslashes in the path (e.g. \t) are not treated as escapes.
fname = r"E:\MyUsers\ssbc\test.mrk"
fo = open(fname, "r")
match = re.search('Num_row_labels=(\d+)', teststring)
tnum = int(match.group(1))
if tnum >= 10:
    print(fname)
How do I make sure I'm searching for the match in the content of the opened file and checking the condition tnum >= 10? My test code would print the file name based only on the last four lines. I want to be sure the search covers the entire content of my text file.
What you want to do is read out the whole file as a string, and search for your pattern on that string:
with open(fname, "r") as fo:
    content_as_string = fo.read()
match = re.search('Num_row_labels=(\d+)', content_as_string)
# do what you want with the matches
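Putting the pieces together, a minimal sketch of the whole check; the content string here is a stand-in for what fo.read() would return for test.mrk:

```python
import re

# Illustrative stand-in for the file's actual contents.
content = 'Format= { Window_Type="Tabular", Tabular= { Num_row_labels=10 } }'

match = re.search(r'Num_row_labels=(\d+)', content)
if match and int(match.group(1)) >= 10:
    print("threshold met:", match.group(1))
```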
Python code to read file content based on a condition:
file = '../input/testtxt/kaggle.txt'
output = []

with open(file, 'r') as fp:
    lines = fp.readlines()
    for i in lines:
        if 'Image for' in i:
            output.append(i)
print(output)
When I print the group with print(a), the entire group is shown. When I save it to a text file with open("sirs1.txt", "w").write(a), only the last row is saved to the file.
import re

def main():
    f = open('sirs.txt')
    for lines in f:
        match = re.search('(AA|BB|CC|DD)......', lines)
        if match:
            a = match.group()
            print(a)
            open("sirs1.txt", "w").write(a)
How do I save the entire group to the text file?
nosklo is correct: the main problem is that you are overwriting the whole file each time you write to it. mehmattski is also correct in that you will need to explicitly add \n to each write to make the output file readable.
Try this:
import re

def main():
    f = open('sirs.txt')
    outputfile = open('sirs1.txt', 'w')
    for lines in f:
        match = re.search('(AA|BB|CC|DD)......', lines)
        if match:
            a = match.group()
            print(a)
            outputfile.write(a + "\n")
    f.close()
    outputfile.close()
The open command in "w" mode creates a new (empty) file, so you're recreating the file on every iteration.
Open the file outside the for loop instead:
import re

def main():
    with open('sirs.txt') as f:
        with open("sirs1.txt", "w") as fw:
            for lines in f:
                match = re.search('(AA|BB|CC|DD)......', lines)
                if match:
                    a = match.group()
                    print(a)
                    fw.write(a)
You need to add a newline character after each string to get them to print on separate lines:
import re

def main():
    f = open('sirs.txt')
    outputfile = open('sirs1.txt', 'w')
    for lines in f:
        match = re.search('(AA|BB|CC|DD)......', lines)
        if match:
            a = match.group()
            print(a)
            outputfile.write(a + '\n')
    f.close()
    outputfile.close()