Writing to start of gzip file with python

Writing to the start of a txt file can be achieved like this:
with open('foo.txt', 'wt') as outfn:
    for i in range(10):
        outfn.write('{}\n'.format(i))
with open('foo.txt', 'r+') as fn:
    content = fn.read()
    fn.seek(0, 0)
    fn.write('foo\n{}'.format(content))
However, when I try to write to the start of a gzip file:
import gzip
with gzip.open('foo.txt.gz', 'wt') as outfn:
    for i in range(10):
        outfn.write('{}\n'.format(i))
with gzip.open('foo.txt.gz', 'r+') as fn:
    content = fn.read()
    fn.seek(0, 0)
    fn.write('foo\n{}'.format(content))
The following error is thrown:
OSError: [Errno 9] write() on read-only GzipFile object
I tried multiple alternatives, but couldn't come up with a decent way to write text to the start of a gzip file.

gzip.open doesn't have a '+' option the way a normal file open does; a GzipFile is opened either for reading or for writing, which is why the write() call fails. See the gzip docs.
What exactly are you trying to do by writing to the beginning of the file? It may be easier to open the file again and overwrite it.

I have come up with this solution:
import gzip
content = str()
for i in range(10):
    content += '{}\n'.format(i)
with gzip.open('foo.txt.gz', 'wt') as outfn:
    outfn.write('foo\n{}'.format(content))
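The snippet above regenerates the content rather than reading the existing archive. If the goal is to prepend to a gzip file that already exists, a minimal sketch is to read everything, reopen the file in 'wt' mode, and write the new text followed by the old content. This assumes the decompressed content fits in memory; `prepend_to_gzip` is a hypothetical helper name, not part of the gzip module.

```python
import gzip
import os
import tempfile


def prepend_to_gzip(path, text, encoding='utf-8'):
    """Rewrite the gzip file at `path` so that `text` comes first.

    Hypothetical helper: a gzip stream cannot be edited in place, so the
    whole file is decompressed into memory and then compressed again.
    """
    with gzip.open(path, 'rt', encoding=encoding) as fn:
        content = fn.read()
    with gzip.open(path, 'wt', encoding=encoding) as fn:
        fn.write(text)
        fn.write(content)


# Usage: build a small archive in a temporary directory, then prepend.
demo_path = os.path.join(tempfile.mkdtemp(), 'foo.txt.gz')
with gzip.open(demo_path, 'wt', encoding='utf-8') as outfn:
    for i in range(3):
        outfn.write('{}\n'.format(i))

prepend_to_gzip(demo_path, 'foo\n')

with gzip.open(demo_path, 'rt', encoding='utf-8') as fn:
    result = fn.read()
print(result)
```

For archives too large to hold in memory, the same idea can be applied by streaming into a second gzip file and renaming it over the original.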

Related

os.write() appends file instead of overwriting, but O_APPEND isn't used [duplicate]

I have the following code:
import re
#open the xml file for reading:
file = open('path/test.xml','r+')
#convert to string:
data = file.read()
file.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
file.close()
where I'd like to replace the old content that's in the file with the new content. However, when I execute my code, the new content is appended to the file "test.xml", i.e. I have the old content followed by the new "replaced" content. What can I do in order to delete the old stuff and only keep the new?

You need to seek to the beginning of the file before writing, and then call file.truncate() if you want to replace the content in place:
import re
myfile = "path/test.xml"
with open(myfile, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
    f.truncate()
The other way is to read the file then open it again with open(myfile, 'w'):
with open(myfile, "r") as f:
    data = f.read()
with open(myfile, "w") as f:
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
Neither truncate nor open(..., 'w') will change the inode number of the file (I tested twice, once with Ubuntu 12.04 NFS and once with ext4).
By the way, this is not really related to Python. The interpreter calls the corresponding low level API. The method truncate() works the same in the C programming language: See http://man7.org/linux/man-pages/man2/truncate.2.html
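To see why truncate() matters, here is a small sketch on a throwaway temporary file (the file name and contents are made up for illustration). When the replacement text is shorter than the original, the tail of the old content survives unless the file is truncated.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('0123456789')

# Overwrite from the start without truncating: the old tail remains.
with open(path, 'r+') as f:
    f.seek(0)
    f.write('abc')
with open(path) as f:
    without_truncate = f.read()

# Same again, but cut the file off at the current position afterwards.
with open(path, 'r+') as f:
    f.seek(0)
    f.write('xy')
    f.truncate()
with open(path) as f:
    with_truncate = f.read()

print(repr(without_truncate))  # 'abc3456789'
print(repr(with_truncate))     # 'xy'
```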

file='path/test.xml'
with open(file, 'w') as filetowrite:
    filetowrite.write('new content')
Open the file in 'w' mode; this replaces its current text, so the file is saved with only the new contents.

Using truncate(), the solution could be
import re
#open the xml file for reading:
with open('path/test.xml','r+') as f:
    #convert to string:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
    f.truncate()

import os  # must import this library
if os.path.exists('TwitterDB.csv'):
    os.remove('TwitterDB.csv')  # this deletes the file
else:
    print("The file does not exist")  # add this to prevent errors
I had a similar problem, and instead of overwriting my existing file using the different 'modes', I just deleted the file before using it again, so that it would be as if I was appending to a new file on each run of my code.

The approach from "How to Replace String in File" is simple and works with str.replace:
fin = open("data.txt", "rt")
fout = open("out.txt", "wt")
for line in fin:
    fout.write(line.replace('pyton', 'python'))
fin.close()
fout.close()

In my case the following code did the trick:
import json
# result_plot is the data to serialise (defined elsewhere in my script)
with open("output.json", "w+") as outfile:  # w+ creates the file if it does not exist and overwrites existing content
    json.dump(result_plot, outfile)

Using python3 pathlib library:
import re
from pathlib import Path
import shutil
shutil.copy2("/tmp/test.xml", "/tmp/test.xml.bak") # create backup
filepath = Path("/tmp/test.xml")
content = filepath.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
A similar method using a different approach to backups:
import re
from pathlib import Path
filepath = Path("/tmp/test.xml")
backup = filepath.with_suffix('.bak')
filepath.rename(backup)  # move the original aside as the backup
content = backup.read_text()  # the original path no longer exists, so read the backup
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))

Not able to fix file handling issue in python

I wrote python code to search a pattern in a tcl file and replace it with a string, it prints the output but the same is not saved in the tcl file
import re
import fileinput
filename=open("Fdrc.tcl","r+")
for i in filename:
    if i.find("set qa_label")!=-1:
        print(i)
        a=re.sub(r'REL.*','harsh',i)
        print(a)
filename.close()
actual result
set qa_label
REL_ts07n0g42p22sadsl01msaA04_2018-09-11-11-01
set qa_label harsh
Expected result is that in my file it should reflect the same result as above but it is not

You need to actually write your changes back to disk if you want to see them reflected there. As @ImperishableNight says, you don't want to do this by trying to write to a file you're also reading from; you want to write to a new file. Here's an expanded version of your code that does that:
import re
import fileinput
fin=open("/tmp/Fdrc.tcl")
fout=open("/tmp/FdrcNew.tcl", "w")
for i in fin:
    if i.find("set qa_label")!=-1:
        print(i)
        a=re.sub(r'REL.*','harsh',i)
        print(a)
        fout.write(a)
    else:
        fout.write(i)
fin.close()
fout.close()
Input and output file contents:
> cat /tmp/Fdrc.tcl
set qa_label REL_ts07n0g42p22sadsl01msaA04_2018-09-11-11-01
> cat /tmp/FdrcNew.tcl
set qa_label harsh
If you wanted to overwrite the original file, then you would want to read the entire file into memory and close the input file stream, then open the file again for writing, and write modified content to the same file.
Here's a cleaner version of your code that does this...produces an in memory result and then writes that out using a new file handle. I am still writing to a different file here because that's usually what you want to do at least while you're testing your code. You can simply change the name of the second file to match the first and this code will overwrite the original file with the modified content:
import re
lines = []
with open("/tmp/Fdrc.tcl") as fin:
    for i in fin:
        if i.find("set qa_label")!=-1:
            print(i)
            i=re.sub(r'REL.*','harsh',i)
            print(i)
        lines.append(i)
with open("/tmp/FdrcNew.tcl", "w") as fout:
    fout.writelines(lines)
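The snippets above import fileinput without using it. As an alternative sketch, the standard library's fileinput.input(..., inplace=True) redirects print output back into the file being read, which gives the in-place behaviour the question asks for. The file here is created in a temporary directory purely for the demo; substitute your own path.

```python
import fileinput
import os
import re
import tempfile

# Demo file standing in for Fdrc.tcl.
path = os.path.join(tempfile.mkdtemp(), 'Fdrc.tcl')
with open(path, 'w') as f:
    f.write('set other 1\n')
    f.write('set qa_label REL_ts07n0g42p22sadsl01msaA04_2018-09-11-11-01\n')

# With inplace=True, anything printed inside the loop is written to the file.
with fileinput.input(files=[path], inplace=True) as lines:
    for line in lines:
        if 'set qa_label' in line:
            line = re.sub(r'REL.*', 'harsh', line)
        print(line, end='')

with open(path) as f:
    updated = f.read()
```

Because standard output is redirected while the loop runs, debugging prints inside it would end up in the file, so keep the loop body limited to the lines you want written.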

Open a temporary file for writing the updated contents and open the original file for reading.
After modifying the lines, write them back to the original file.
import re
import fileinput
from tempfile import TemporaryFile
with TemporaryFile() as t:
    with open("Fdrc.tcl", "r") as file_reader:
        for line in file_reader:
            if line.find("set qa_label") != -1:
                t.write(
                    str.encode(
                        re.sub(r'REL.*', 'harsh', str(line))
                    )
                )
            else:
                t.write(str.encode(line))
    t.seek(0)
    with open("Fdrc.tcl", "wb") as file_writer:
        file_writer.writelines(t)

Open binary file in zip archive as ZipExtFile

I'm trying to access a binary stream (via a ZipExtFile object) from a data file contained in a Zip archive. To incrementally read in a text file object from the archive, this would be fairly straightforward:
with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigtextfile.txt", 'r') as txtfile:
        for line in txtfile:
            ....
Ideally the byte stream equivalent would be something like:
with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigbinary.bin", 'rb') as binfile:
        while notEOF:
            binchunk = binfile.read(MYCHUNKSIZE)
            ....
Unfortunately, ZipFile.open doesn't seem to support reading binary data to a ZipExtFile object. From the docs:
The mode parameter, if included, must be one of the following: 'r'
(the default), 'U', or 'rU'.
Given this constraint, how best to incrementally read in the binary file directly from the archive? Since the uncompressed file is quite large I'd like to avoid extracting it first.

I managed to solve the issue that I described in my comment to the OP. I have adapted it here for your purpose, but I think that there is probably a way to just change the encoding of chunk_str, to avoid using BytesIO.
Anyway - here's my code if it helps:
from io import BytesIO
from zipfile import ZipFile
MYCHUNKSIZE = 10
archive_file = r"test_resources\0000232514_bom.zip"
src_file = r"0000232514_bom.xls"
no_of_chunks_to_read = 10
with ZipFile(archive_file,'r') as zf:
    with zf.open(src_file) as src_f:
        while no_of_chunks_to_read > 0:
            chunk_str = src_f.read(MYCHUNKSIZE)
            chunk_stream = BytesIO(chunk_str)
            chunk_bytes = chunk_stream.read()
            print(type(chunk_bytes), len(chunk_bytes), chunk_bytes)
            if len(chunk_str) < MYCHUNKSIZE:
                # End of file
                break
            no_of_chunks_to_read -= 1

For line by line reading:
with ZipFile("myziparchive.zip", 'r') as ziparchive:
    with ziparchive.open("mybigtextfile.txt", 'r') as binfile:
        for line in binfile:
            line = line.decode() # bytes to str
            ...
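In Python 3, ZipFile.open() returns a binary file-like object, so read(n) already yields bytes and no BytesIO wrapper is needed. A minimal sketch of chunked reading follows, using an in-memory archive built just for the demo; the member name and chunk size are placeholders.

```python
import io
from zipfile import ZipFile

MYCHUNKSIZE = 4  # placeholder; something like 64 * 1024 suits real data

# Build a tiny archive in memory so the example is self-contained.
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as zf:
    zf.writestr('mybigbinary.bin', bytes(range(10)))

chunks = []
with ZipFile(buffer, 'r') as ziparchive:
    with ziparchive.open('mybigbinary.bin', 'r') as binfile:
        while True:
            binchunk = binfile.read(MYCHUNKSIZE)
            if not binchunk:  # an empty bytes object signals end of file
                break
            chunks.append(binchunk)

print(sum(len(c) for c in chunks))
```

The member is decompressed as it is read, so the archive never has to be extracted to disk first.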

How to unzip gz file using Python

I need to extract a gz file that I have downloaded from an FTP site to a local Windows file server. I have the variables set for the local path of the file, and I know it can be handled by the gzip module.
How can I do this? The file inside the GZ file is an XML file.

import gzip
import shutil
with gzip.open('file.txt.gz', 'rb') as f_in:
    with open('file.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

From the documentation:
import gzip
with gzip.open('file.txt.gz', 'rb') as f:
    file_content = f.read()

You may also want to pass it to pandas:
import gzip
import pandas as pd
with gzip.open('features_train.csv.gz') as f:
    features_train = pd.read_csv(f)
features_train.head()

from sh import gunzip
gunzip('/tmp/file1.gz')

Not an exact answer because you're using xml data and there is currently no pd.read_xml() function (as of v0.23.4), but pandas (starting with v0.21.0) can uncompress the file for you! Thanks Wes!
import pandas as pd
import os
fn = '../data/file_to_load.json.gz'
print(os.path.isfile(fn))
df = pd.read_json(fn, lines=True, compression='gzip')
df.tail()
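Since the question involves an XML file inside the .gz, one option that avoids pandas entirely is to hand the gzip file object to xml.etree.ElementTree. This is a sketch with a made-up document; the element names and the temporary path are illustrative only.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

# Create a small gzipped XML file to parse.
path = os.path.join(tempfile.mkdtemp(), 'sample.xml.gz')
with gzip.open(path, 'wb') as f:
    f.write(b'<items><item id="1">a</item><item id="2">b</item></items>')

# ET.parse accepts a binary file object, so no intermediate file is needed.
with gzip.open(path, 'rb') as f:
    root = ET.parse(f).getroot()

ids = [item.get('id') for item in root.findall('item')]
print(root.tag, ids)
```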

If you are parsing the file after unzipping it, don't forget to call decode() on each line; this is necessary when the file is opened in binary mode.
import gzip
with gzip.open('file.gz', 'rb') as f:
    for line in f:
        print(line.decode().strip())
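Alternatively, opening the archive in text mode ('rt') lets gzip do the decoding, so each line arrives as str. This is a small sketch with a temporary file created for the demo; the encoding is an assumption and should match your data.

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'file.gz')
with gzip.open(path, 'wt', encoding='utf-8') as f:
    f.write('first\nsecond\n')

# 'rt' wraps the binary stream in a text layer that decodes for us.
with gzip.open(path, 'rt', encoding='utf-8') as f:
    stripped = [line.strip() for line in f]

print(stripped)
```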

It is very simple. Here you go:
import gzip
# path of the file to be extracted
ip = "sample.gzip"
# output file to be filled
with open("output_file", "w") as op:
    with gzip.open(ip, "rb") as ip_byte:
        op.write(ip_byte.read().decode("utf-8"))

You can use gzip.decompress() to do it:
read input file using rb mode;
open output file using w mode and utf8 encoding;
gzip.decompress() input bytes;
decode what you get to str.
write str to output file.
import gzip
def decompress(infile, tofile):
    with open(infile, 'rb') as inf, open(tofile, 'w', encoding='utf8') as tof:
        decom_str = gzip.decompress(inf.read()).decode('utf-8')
        tof.write(decom_str)

If you have the gzip (and gunzip) programs installed on your computer a simple way is to call that command from python:
import os
filename = 'file.txt.gz'
os.system('gunzip ' + filename)
optionally, if you want to preserve the original file, use
os.system('gunzip --keep ' + filename)

If you have a Linux environment, it is very easy to unzip using the gunzip command.
Go to the folder containing the file and run:
gunzip file-name

How can I create files , read and write files in Python?

All the tutorials I can find follow the same format, which isn't working. I don't get an error message, but I don't get the expected output. What I get appears to be the file description at some memory location.
# file_test
ftpr= open("file","w")
ftpr.write("This is a sample line/n")
a=open("file","r")
print a
#This is the result
<open file 'file', mode 'r' at 0x00000000029DDDB0>
>>>

Do you want to read the contents of the file? Try print a.readlines().
I.e.:
with open('file', 'w') as f:
    f.write("Hello, world!\nGoodbye, world!\n")
with open('file', 'r') as f:
    print f.readlines() # ["Hello, world!\n", "Goodbye, world!\n"]
FYI, the with blocks, if you're unfamiliar with them, ensure that the open()-d files are close()-d.

This is not the correct way to read the file. You are printing the return value of the open() call, which is a file object. Read and write like this instead.
For writing:
f = open("myfile", "w")
f.write("hello\n")
f.write("This is a sample line\n")
f.close()
For reading:
f = open("myfile", "r")
string = f.read()
print(string)
f.close()
