Regex to delete multi line content between two specific words - python

I have multiple instances of Fortran subroutines within a text file like the following:
SUBROUTINE ABCDEF(STRING1)
STRING2
STRING3
.
.
.
STRINGN
END
How can I delete the subroutines with their content in python using regex?
I have already tried this piece of code without success:
with open(input, 'r') as file:
    output = open(stripped, 'w')
    try:
        for line in file:
            result = re.sub(r"(?s)SUBROUTINE [A-Z]{6}(.*?)\bEND\b", input)
            output.write("\n")
    finally:
        output.close()

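Does this work? I replaced input with input_file because input is a built-in function, so shadowing it is bad practice. The whole file is read at once, since a line-by-line loop can never match a pattern that spans several lines.
import re
pattern = r"SUBROUTINE [A-Z]{6}(.*?)\bEND\b"
# DOTALL lets "." match newlines, so the match can span several lines
regex = re.compile(pattern, re.DOTALL)
with open(input_file, 'r') as file:
    with open(stripped, 'w') as output_file:
        result = regex.sub('', file.read())
        output_file.write(result)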
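As a rough sketch of the same idea on an in-memory string: the sample text and the name strip_subroutines below are made up for illustration, and the pattern assumes six-letter subroutine names and a bare END line, as in the question. A block containing END IF or END DO would be cut short by this pattern.

```python
import re

# Assumption: names are exactly six capital letters and each subroutine
# closes with a bare END, as in the question's example.
SUBROUTINE_RE = re.compile(r"SUBROUTINE [A-Z]{6}\(.*?\bEND\b\n?", re.DOTALL)

def strip_subroutines(text):
    """Return text with every SUBROUTINE ... END block removed."""
    return SUBROUTINE_RE.sub("", text)

# Hypothetical input, only for demonstration
sample = (
    "PROGRAM MAIN\n"
    "SUBROUTINE ABCDEF(STRING1)\n"
    "STRING2\n"
    "STRING3\n"
    "END\n"
    "CALL FOO\n"
)
stripped_text = strip_subroutines(sample)
print(stripped_text)
```

The optional \n? at the end of the pattern also removes the line break after END, so no empty line is left behind.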

Replace incorrect urls in text file and fix them in Python

I'm getting URLs with the forward slashes removed, and I need to correct the URLs inside a text file.
The URLs in the file look like this:
https:www.ebay.co.ukitmReds-Challenge-184-214-Holo-Shiny-Rare-Pokemon-Card-SM-Unbroken-Bonds-Rare124315281970?hash=item1cf1c4aa32%3Ag%3AXBAAAOSwJGRfSGI1&LH_BIN=1
I need to correct it to:
https://www.ebay.co.uk/itm/Reds-Challenge-184-214-Holo-Shiny-Rare-Pokemon-Card-SM-Unbroken-Bonds-Rare/124315281970?hash=item1cf1c4aa32%3Ag%3AXBAAAOSwJGRfSGI1&LH_BIN=1
So basically I need a regex, or another way, to put those forward slashes back into each URL within the file and replace the broken URLs in the file.
import re
import time

while True:
    # Repair the scheme and the /itm/ path segment, writing the result to out.txt
    with open("ebay2.csv", "rt") as fin, open("out.txt", "wt") as fout:
        for line in fin:
            fout.write(line.replace('https://www.ebay.co.uk/sch/', 'https://')
                           .replace('itm', '/itm/')
                           .replace('https:www.ebay', 'https://www.ebay'))
    # out.txt is closed and flushed at this point, so it is safe to read it back
    with open('out.txt') as f:
        for l in f:
            # Put a slash in front of the 12-digit item number
            result = re.sub(r"\d{12}", r"/\g<0>", l)
            if result:
                print(result)
    time.sleep(1)
I eventually came up with this. It's a bit clumsy but it does the job fast enough.
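For comparison, here is a minimal sketch that repairs a single URL string in memory. The function name fix_ebay_url is hypothetical, and the code assumes every broken URL has the shape shown in the question: an ebay.co.uk host, an itm segment, and a 12-digit item number right before the query string.

```python
import re

def fix_ebay_url(url):
    """Re-insert the slashes that were stripped from an eBay item URL."""
    # Restore the "//" after the scheme and the "/" after the host
    url = url.replace("https:www.ebay.co.uk", "https://www.ebay.co.uk/", 1)
    # Restore the "/" after the itm path segment
    url = url.replace("/itm", "/itm/", 1)
    # Put a "/" before the 12-digit item number that precedes "?" or the end
    url = re.sub(r"(\d{12})(?=\?|$)", r"/\1", url, count=1)
    return url

broken = ("https:www.ebay.co.ukitmReds-Challenge-184-214-Holo-Shiny-Rare-"
          "Pokemon-Card-SM-Unbroken-Bonds-Rare124315281970"
          "?hash=item1cf1c4aa32%3Ag%3AXBAAAOSwJGRfSGI1&LH_BIN=1")
fixed = fix_ebay_url(broken)
print(fixed)
```

Limiting each replacement to one occurrence keeps the later parts of the URL, such as the hash parameter, untouched.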

File parsing with python

How to replace a specific line or a string with another string on a text file with Python? I tried this method:
with open("textfile.txt","r") as f:
    newline = []
    for word in f.readlines():
        newline.append(word.replace("previous_line","new_line"))
Try the following:
with open("test.txt", "r+") as f:
    data = f.read()
Now the variable data holds the whole contents of the file. Assuming each line ends with \n, we can split it into lines:
    # still inside the with block
    mylines = data.splitlines()
    for x in range(1,len(mylines)-1):
        if mylines[x]=='thanks for help':
            mylines[x] = mylines[x-1]
    data = "\n".join(mylines)
    # go back to the start and clear the file before writing the new contents
    f.seek(0)
    f.truncate()
    f.write(data)
Of course, if the line that should replace 'thanks for help' is not the previous line, you may need to add some logic there.
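A self-contained sketch of the read, modify, write-back approach follows. The helper name replace_line, the temporary file, and its contents are all invented for the demonstration. The file is reopened in "w" mode for writing, which avoids the seek and truncate bookkeeping that "r+" requires.

```python
import os
import tempfile

def replace_line(path, old_line, new_line):
    """Replace every line equal to old_line with new_line, in place."""
    with open(path, "r") as f:
        lines = f.read().splitlines()
    lines = [new_line if line == old_line else line for line in lines]
    # Opening with "w" truncates the file before the new contents are written
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Hypothetical demo file
demo_path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(demo_path, "w") as f:
    f.write("first\nthanks for help\nlast\n")

replace_line(demo_path, "thanks for help", "you're welcome")
with open(demo_path) as f:
    demo_result = f.read()
print(demo_result)
```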

Send keylogger log files to e-mail [duplicate]

I have a text file that looks like:
ABC
DEF
How can I read the file into a single-line string without newlines, in this case creating a string 'ABCDEF'?
For reading the file into a list of lines, but removing the trailing newline character from each line, see How to read a file without newlines?.
You could use:
with open('data.txt', 'r') as file:
    data = file.read().replace('\n', '')
Or, if the file content is guaranteed to be a single line:
with open('data.txt', 'r') as file:
    data = file.read().rstrip()
In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:
from pathlib import Path
txt = Path('data.txt').read_text()
and then you can use str.replace to remove the newlines:
txt = txt.replace('\n', '')
You can read from a file in one line:
text = open('very_Important.txt', 'r').read()
Please note that this does not close the file explicitly.
CPython closes the file when the file object is garbage-collected, which normally happens as soon as the last reference to it goes away.
Other Python implementations may not close it that promptly. To write portable code, it is better to use with or to close the file explicitly. Shorter is not always better. See https://stackoverflow.com/a/7396043/362951
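A small sketch of the portable form is below. The temporary path is made up so the example runs anywhere; the point is that the with statement closes the file as soon as the block ends, on any Python implementation.

```python
import os
import tempfile

# Hypothetical file created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "very_Important.txt")
with open(path, "w") as f:
    f.write("ABC\nDEF\n")

with open(path) as f:
    text = f.read()

# The file object still exists here, but it has already been closed
closed_after_with = f.closed
print(closed_after_with)
```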
To join all lines into a string and remove the newlines, I normally use:
with open('t.txt') as f:
    s = " ".join([l.rstrip("\n") for l in f])
with open("data.txt") as myfile:
    data = "".join(line.rstrip() for line in myfile)
join() will join a list of strings, and rstrip() with no arguments will trim whitespace, including newlines, from the end of strings.
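The difference between rstrip() and rstrip("\n") can be seen in a short sketch. io.StringIO stands in for a real file, and the sample contents (with trailing spaces on the first line) are an assumption made for the demonstration.

```python
import io

# Hypothetical file contents; the first line ends with two spaces
sample = io.StringIO("ABC  \nDEF\n")

# rstrip() with no argument removes all trailing whitespace
stripped_all = "".join(line.rstrip() for line in sample)

# rstrip("\n") removes only the newline and keeps the trailing spaces
sample.seek(0)
stripped_newline_only = "".join(line.rstrip("\n") for line in sample)

print(repr(stripped_all))
print(repr(stripped_newline_only))
```

If trailing spaces are significant in your data, rstrip("\n") is the safer choice.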
This can be done using the read() method :
text_as_string = open('Your_Text_File.txt', 'r').read()
Or as the default mode itself is 'r' (read) so simply use,
text_as_string = open('Your_Text_File.txt').read()
I'm surprised nobody mentioned splitlines() yet.
with open("data.txt", "r") as myfile:
    data = myfile.read().splitlines()
Variable data is now a list that looks like this when printed:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
Note there are no newlines (\n).
At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:
for line in data:
    print(line)
It's hard to tell exactly what you're after, but something like this should get you started:
with open("data.txt", "r") as myfile:
    data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
I have fiddled around with this for a while and prefer to use read in combination with rstrip. Without rstrip("\n"), the string keeps the newline that ends the file, which in most cases is not very useful.
with open("myfile.txt") as f:
    file_content = f.read().rstrip("\n")
    print(file_content)
Here are four options to choose from:
with open("my_text_file.txt", "r") as file:
    data = file.read().replace("\n", "")
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().split("\n"))
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().splitlines())
or
with open("my_text_file.txt", "r") as file:
    data = "".join([line.rstrip("\n") for line in file])
You can compress this into two lines of code:
content = open('filepath','r').read().replace('\n',' ')
print(content)
if your file reads:
hello how are you?
who are you?
blank blank
python output
hello how are you? who are you? blank blank
You can also strip each line and concatenate into a final string.
myfile = open("data.txt","r")
data = ""
lines = myfile.readlines()
for line in lines:
    data = data + line.strip()
myfile.close()
This would also work out just fine.
This is a one line, copy-pasteable solution that also closes the file object:
_ = open('data.txt', 'r'); data = _.read(); _.close()
f = open('data.txt','r')
string = ""
while 1:
    line = f.readline()
    if not line: break
    string += line
f.close()
print(string)
python3: Google "list comprehension" if the square bracket syntax is new to you.
with open('data.txt') as f:
    lines = [line.strip('\n') for line in f]
Oneliner:
List: "".join([line.rstrip('\n') for line in open('file.txt')])
Generator: "".join((line.rstrip('\n') for line in open('file.txt')))
A list is faster than a generator but heavier on memory; a generator is slower but lighter on memory, much like iterating over the lines. For "".join() I think both should work well. Remove the "".join() call to get the list or the generator itself.
Note: an explicit close() of the file is probably not needed here.
Have you tried this?
x = "yourfilename.txt"
y = open(x, 'r').read()
print(y)
To remove line breaks using Python you can use replace function of a string.
This example removes all 3 types of line breaks:
my_string = open('lala.json').read()
print(my_string)
my_string = my_string.replace("\r","").replace("\n","")
print(my_string)
Example file is:
{
"lala": "lulu",
"foo": "bar"
}
You can try it using this replay scenario:
https://repl.it/repls/AnnualJointHardware
I don't feel that anyone addressed the [ ] part of your question. When you read each line into your variable, there were multiple lines, so even after replacing the \n with '' you ended up with a list. If you have a variable x and print it out with
x
or print(x)
or str(x)
you will see the entire list, brackets included. If you access a single element of the list,
x[0]
then the brackets are omitted. If you use the str() function you will see just the data, without the '' either.
str(x[0])
Maybe you could try this? I use this in my programs.
Data = open('data.txt', 'r')
data = Data.readlines()
for i in range(len(data)):
    data[i] = data[i].strip() + ' '
data = ''.join(data).strip()
Regular expression works too:
import re
with open("depression.txt") as f:
    l = re.split(' ', re.sub('\n',' ', f.read()))[:-1]
    print(l)
['I', 'feel', 'empty', 'and', 'dead', 'inside']
with open('data.txt', 'r') as file:
    data = [line.strip('\n') for line in file.readlines()]
    data = ''.join(data)
from pathlib import Path
line_lst = Path("to/the/file.txt").read_text().splitlines()
This is the best way to get all the lines of a file; the '\n' characters are already stripped by splitlines(), which recognizes Windows, Mac, and Unix line endings.
If you nonetheless want to strip each line:
line_lst = [line.strip() for line in Path("to/the/file.txt").read_text().splitlines()]
strip() is just a useful example; you can process each line as you please.
And if, in the end, you just want the concatenated text:
txt = ''.join(Path("to/the/file.txt").read_text().splitlines())
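To illustrate the point about line-ending styles, this sketch compares splitlines() with split("\n") on a string that mixes all three conventions. The sample string is invented for the demonstration.

```python
# Hypothetical text mixing Windows (\r\n), old Mac (\r) and Unix (\n) endings
mixed = "first\r\nsecond\rthird\nfourth"

# splitlines() treats all three as line boundaries
lines = mixed.splitlines()

# split("\n") only knows about \n and leaves the \r characters in place
naive = mixed.split("\n")

joined = "".join(lines)
print(lines)
print(naive)
```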
This works:
Change your file to:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
Then:
file = open("file.txt")
line = file.read()
words = line.split()
This creates a list named words that equals:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
That got rid of the "\n". To answer the part about the brackets getting in your way, just do this:
for word in words: # Assuming words is the list above
    print(word) # Prints each word in the file on a separate line
Or:
print(words[0] + ",", words[1]) # The "+" joins the strings with no space in between
# The comma between the arguments makes print insert a space
This returns:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
with open(player_name, 'r') as myfile:
    data = myfile.readline()
    words = data.split(" ")
    word = words[0]
This code reads the first line and then uses split to store the space-separated words of that line in a list.
Then you can easily access any word, or even store it in a string.
You can do the same thing with a for loop.
file = open("myfile.txt", "r")
lines = file.readlines()
text = '' # start with an empty string
for i in range(len(lines)):
    text += lines[i].rstrip('\n') + ' '
print(text)
Try the following:
with open('data.txt', 'r') as myfile:
    data = myfile.read()
    sentences = data.split('\n')
    for sentence in sentences:
        print(sentence)
Caution: this does not remove the \n from data. It only prints the text line by line, as if there were no \n.

Multiple string replacements on a 100mb file in python 2.6

I have a large 100 MB file on which I would like to perform about 5000 string replacements. What is the most efficient way of achieving this?
Is there no better way than reading the file line by line and performing the 5000 replacements on each line?
I also tried reading the file as a string using the .read method when opening the file and performing the 5000 replacements on the string, but this is even slower since it makes 5000 copies of the whole file.
This script has to run on windows using python 2.6
Thanks in advance
Try the following, in this order, until you get one that is fast enough.
Read the file into a large string and do each replacement in turn, overwriting the same variable.
with open(..., 'r+') as f:
    s = f.read()
    for src, dest in replacements:
        s = s.replace(src, dest)
    # rewind and clear the file so a shorter result leaves no stale tail
    f.seek(0)
    f.truncate()
    f.write(s)
Memory map the file, and write a custom replacement function that does the replacements.
I suggest, instead of doing 5000 searches, do one search for 5000 items:
import re
replacements = {
"Abc-2454": "Gb-43",
"This": "that",
"you": "me"
}
pat = re.compile('(' + '|'.join(re.escape(key) for key in replacements.iterkeys()) + ')')
repl = lambda match: replacements[match.group(0)]
You can now apply re.sub either to the entire file,
with open("input.txt") as inf:
    s = inf.read()
s = pat.sub(repl, s)
with open("result.txt", "w") as outf:
    outf.write(s)
or line-by-line (nested with blocks, because Python 2.6 does not accept two files in one with statement),
with open("input.txt") as inf:
    with open("result.txt", "w") as outf:
        outf.writelines(pat.sub(repl, line) for line in inf)
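The single-search idea can be tried on a short string. This sketch is written in Python 3 syntax rather than 2.6 (so there is no iterkeys()), and the function name replace_all and the sample sentence are made up for illustration. Sorting the keys longest first is an extra precaution, not part of the answer above: it stops a key that is a prefix of another key from shadowing it in the alternation.

```python
import re

replacements = {
    "Abc-2454": "Gb-43",
    "This": "that",
    "you": "me",
}

# Longest keys first, so a key that is a prefix of another cannot shadow it
keys = sorted(replacements, key=len, reverse=True)
pat = re.compile("|".join(re.escape(key) for key in keys))

def replace_all(text):
    """Apply every replacement in one pass over text."""
    return pat.sub(lambda match: replacements[match.group(0)], text)

result = replace_all("This is for you: Abc-2454")
print(result)
```

Because the text is scanned only once, a replacement's output is never re-examined by a later replacement, which differs from chaining str.replace calls.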
You should read in the text using open() and read(), and then use (compiled) regular expressions to do the string replacement. A short example:
import re
# read data
f = open("file.txt", "r")
txt = f.read()
f.close()
# list of patterns and what to replace them with
xs = [("foo","bar"), ("baz","foo")]
# do replacements
for (x,y) in xs:
    regexp = re.compile(x)
    txt = regexp.sub(y, txt)
# write back data
f = open("file.txt", "w")
f.write(txt)
f.close()
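One caveat with the loop above is that each search string is compiled as a regular expression, so characters such as "." keep their special meaning. A small sketch with a made-up sample string shows the difference that re.escape makes when the search strings are meant literally.

```python
import re

# Hypothetical sample text
txt = "axb a.b"

# Unescaped: "." matches any character, so both words are replaced
as_regex = re.sub("a.b", "X", txt)

# Escaped: only the literal text "a.b" is replaced
as_literal = re.sub(re.escape("a.b"), "X", txt)

print(as_regex)
print(as_literal)
```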

Python regex findall into output file

I have an input file containing JavaScript code with many five-digit IDs. I want to have these IDs in a list like:
53231,53891,72829 etc
This is my actual python file:
import re
fobj = open("input.txt", "r")
text = fobj.read()
output = re.findall(r'[0-9][0-9][0-9][0-9][0-9]' ,text)
outp = open("output.txt", "w")
How can I get these IDs into the output file in the format I want?
Thanks
import re
# Use "with" so the file will automatically be closed
with open("input.txt", "r") as fobj:
    text = fobj.read()
# Use word boundary anchors (\b) so only five-digit numbers are matched.
# Otherwise, 123456 would also be matched (and the match result would be 12345)!
output = re.findall(r'\b\d{5}\b', text)
# Join the matches together
out_str = ",".join(output)
# Write them to a file, again using "with" so the file will be closed.
with open("output.txt", "w") as outp:
    outp.write(out_str)
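The effect of the word boundaries can be checked on a short string without touching any files. The JavaScript-like sample below is invented for the demonstration.

```python
import re

# Hypothetical snippet standing in for the contents of input.txt
text = "var a = 53231; id=123456; x(53891)"

# \b on both sides rejects the six-digit number 123456
ids = re.findall(r"\b\d{5}\b", text)
out_str = ",".join(ids)

# Without the boundaries, the first five digits of 123456 would match too
loose = re.findall(r"\d{5}", text)

print(out_str)
print(loose)
```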
