I'm trying to compress a text file into a gzip (.gz) file, in Python, using the following code:
import shutil, gzip
text_file = 'C:\\Users\\lenovo-miguel\\Downloads\\python\\text_file.txt'
gz_file = text_file + '.GZ'
with open(text_file, 'rb') as f_in:
    with gzip.open(gz_file, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
.GZ file image
The code works and the .gz file is created with the correct name (text_file.txt.GZ), but when I open the .GZ file, the name of the compressed file inside it is not 'text_file.txt'; it is also 'text_file.txt.GZ' (see image).
I need to keep the original name. Is there a way to do it?
Thanks in advance for any help!
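One possible explanation, offered as a sketch rather than a verified diagnosis of this setup: the gzip module records the base name of the output file in the gzip header and strips only a trailing lowercase '.gz' when doing so, so an uppercase '.GZ' suffix stays in the stored name. Passing the name explicitly through the filename argument of gzip.GzipFile, with the compressed bytes going to fileobj, should record the original name instead. The helper names gzip_copy and stored_name are hypothetical, and in-memory buffers stand in for the real paths.

```python
import gzip
import io
import shutil

def gzip_copy(f_in, f_out, stored_name):
    """Compress f_in into f_out, recording stored_name in the gzip header."""
    # filename= only sets the name written to the header; the compressed
    # bytes go to fileobj=, which can be a file opened under any name.
    with gzip.GzipFile(filename=stored_name, mode="wb", fileobj=f_out) as gz:
        shutil.copyfileobj(f_in, gz)

def stored_name(gz_bytes):
    """Return the FNAME header field, or None if it is absent.

    Assumes no FEXTRA field precedes the name, which holds for archives
    written by Python's gzip module.
    """
    if not gz_bytes[3] & 0x08:  # FLG byte, FNAME bit
        return None
    end = gz_bytes.index(b"\x00", 10)  # the name follows the 10-byte fixed header
    return gz_bytes[10:end].decode("latin-1")

source = io.BytesIO(b"some text\n")
target = io.BytesIO()  # stands in for open(gz_file, "wb")
gzip_copy(source, target, "text_file.txt")
print(stored_name(target.getvalue()))
```

With real files, f_in would be open(text_file, 'rb'), f_out would be open(gz_file, 'wb'), and the stored name would be os.path.basename(text_file).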
I am trying to go through all files in a folder, read each file's data decoded as UTF-8, and then write that data to a new file to create a copy of the original. However, the new copy ends up corrupted.
- Should I be using UTF-8 text encoding for all file types (.py, .txt, .docx, .jpg)?
- Is there one standard text encoding that works for all file types?
def read_files():
    files = ["program.py", "letter.docx", "cat.jpg", "hello_world.py"]
    for file in files:
        # open existing file
        f = open(file, encoding="utf-8")
        file_content = f.read()
        # get file name info
        file_extension = file.split(".")[1]
        file_name = file.split(".")[0]
        # write encoded data to new file
        f = open(file_name + "_converted." + file_extension, "wb")
        f.write(bytes(file_content, encoding="utf-8"))
        f.close()
read_files()
The proper way to copy files is with shutil:
import shutil
source = file
destination = file_name + "_converted." + file_extension
shutil.copy(source, destination)
A bad and slow way to copy files:
def read_files():
    files = ["program.py", "letter.docx", "cat.jpg", "hello_world.py"]
    for file in files:
        # open existing file
        f = open(file, 'rb')  # read file in binary mode
        file_content = f.read()
        f.close()  # don't forget to close the file!
        # get file name info
        file_extension = file.split(".")[1]
        file_name = file.split(".")[0]
        # write raw data to new file
        f = open(file_name + "_converted." + file_extension, "wb")
        f.write(file_content)
        f.close()
read_files()
If you don't need to decode the files to text, open them only in binary mode: files such as .jpg and .docx will break in text mode.
Alternatively, if you actually need to work on the contents of the .docx or .jpg files, use the proper modules to do so, such as Pillow for .jpg and the docx module for .docx files.
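To illustrate the point about binary mode, here is a minimal sketch. The helper copy_with_suffix and the sample bytes are made up for the demonstration, and it uses os.path.splitext instead of split(".") so that names with several dots are handled. It copies a file that is not valid UTF-8 byte for byte, and shows that reading the same file in text mode fails.

```python
import os
import shutil
import tempfile

def copy_with_suffix(path, suffix="_converted"):
    """Copy path to '<stem><suffix><ext>' byte for byte and return the new path."""
    stem, ext = os.path.splitext(path)  # splits on the last dot only
    destination = stem + suffix + ext
    shutil.copy(path, destination)
    return destination

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "cat.jpg")
    jpeg_like = b"\xff\xd8\xff\xe0 not valid UTF-8"  # starts like a JPEG header
    with open(source, "wb") as f:
        f.write(jpeg_like)

    copy_path = copy_with_suffix(source)
    with open(copy_path, "rb") as f:
        copied = f.read()

    # Decoding the same bytes as UTF-8 text is what breaks binary files.
    try:
        with open(source, encoding="utf-8") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

print(os.path.basename(copy_path), copied == jpeg_like, text_mode_failed)
```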
from io import BytesIO
import zipfile
mem_zip = BytesIO()
with zipfile.ZipFile(mem_zip, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("filename.txt", b"test")
data = mem_zip.getvalue()
with open('/path/to/file/test.zip', 'wb') as f:
    f.write(data)
The code snippet above creates a zip file in memory and writes it to disk. This works as expected: I can extract it and see the text file with the content "test".
I wish to password protect my zipfile. How do I do that?
I tried using the setpassword method but that had no effect on the output. The file written to disk was not password protected.
with zipfile.ZipFile(mem_zip, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.setpassword(b"test_password")
    zf.writestr("filename.txt", b"test")
I am writing to disk here just to test if the zipfile looks as I expect. My goal is to send the file as an email attachment and I wish to keep the zip file in-memory. So using pyminizip is not an option for me.
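For what it's worth, the standard library documentation describes ZipFile.setpassword() as setting the default password for extracting encrypted members, and states that the zipfile module cannot create encrypted archives, which would explain why the call had no effect. The following sketch checks this on the in-memory archive by looking at the encryption bit (bit 0) of the member's flag_bits. It only confirms the behaviour and does not add password protection; that would need a third-party library.

```python
import io
import zipfile

mem_zip = io.BytesIO()
with zipfile.ZipFile(mem_zip, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.setpassword(b"test_password")  # only affects reading encrypted members
    zf.writestr("filename.txt", b"test")

# Reopen the archive from memory and inspect the member.
with zipfile.ZipFile(io.BytesIO(mem_zip.getvalue())) as zf:
    info = zf.getinfo("filename.txt")
    is_encrypted = bool(info.flag_bits & 0x1)  # bit 0 marks an encrypted member
    content = zf.read("filename.txt")          # readable without any password

print(is_encrypted, content)
```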
I have a .gz file. How can I unzip it and save the contents to a .txt file in Python?
I have already imported gzip:
file_path = gzip.open(file_name, 'rb')
How about opening a second file and writing to it?
import gzip
# gzip.open(..., 'rb') yields bytes, so the output file is opened in binary mode too
with gzip.open('file.txt.gz', 'rb') as f, open('file.txt', 'wb') as f_out:
    f_out.write(f.read())
Gzip's open method should open the file in such a way that its contents can be read like a normal file:
import gzip
#Define the file's location
file_path = "/path/to/file.gz"
#Open the file and read its contents
with gzip.open(file_path, "rb") as file:
    file_content = file.read()
#Save the new txt file (binary mode, because file_content is bytes)
txt_file_name = "txtFile.txt"
with open(txt_file_name, "wb") as file:
    file.write(file_content)
I am trying to add a file to a gzipped tarfile in python
import tarfile
# create test file
with open("testfile.txt", "w") as f:
    f.write("TESTTESTTEST")
# create archive
with tarfile.open("archfile.tar.gz", "x:gz") as archive:
    with open("testfile.txt", 'rb') as f:
        archive.addfile(tarfile.TarInfo("testfile.txt"), f)
# read test file out of archive
with tarfile.open("archfile.tar.gz", "r:gz") as archive:
    print(archive.extractfile("testfile.txt").read())
The result is b'' - an empty bytestring.
The file is not empty - if I try to read the file using the following code:
with open("testfile.txt", 'rb') as f:
    print(f.read())
... I get b'TESTTESTTEST'
Is there something obvious I am missing? My end goal is to add the string in memory using f = io.StringIO('TESTTESTTEST')
I also tried removing the :gz and I see the same problem with a raw tar archive.
For additional info - I'm using Python 3 in a jupyter session on Windows 10. I see the same problem in Windows/Python 3.5.2/PyCharm.
I hit a similar problem. The documentation says that when you call tar.addfile it will write TarInfo.size bytes from the given file. That means that you have to either create the TarInfo with the file size or use tar.add() instead of tar.addfile:
import io
import tarfile
# create archive V1
with tarfile.open("archfile.tar.gz", "x:gz") as archive:
    with open("testfile.txt", 'rb') as f:
        info = archive.gettarinfo("testfile.txt")
        archive.addfile(info, f)
# create archive V2
with tarfile.open("archfile.tar.gz", "x:gz") as archive:
    archive.add("testfile.txt")
# create archive V3
with tarfile.open("archfile.tar.gz", "w:gz") as archive:
    with io.BytesIO(b"TESTTESTTEST") as f:
        info = tarfile.TarInfo("testfile.txt")
        f.seek(0, io.SEEK_END)
        info.size = f.tell()
        f.seek(0, io.SEEK_SET)
        archive.addfile(info, f)
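Since the end goal in the question is to add a string held in memory, here is a round-trip sketch along the lines of V3. It assumes the text should be stored as UTF-8 and, purely for the demonstration, keeps the archive itself in an io.BytesIO buffer instead of writing archfile.tar.gz to disk.

```python
import io
import tarfile

text = "TESTTESTTEST"
data = text.encode("utf-8")  # tar members hold bytes, so encode the str first

buffer = io.BytesIO()  # in-memory stand-in for archfile.tar.gz
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    info = tarfile.TarInfo("testfile.txt")
    info.size = len(data)  # addfile() copies exactly info.size bytes
    archive.addfile(info, io.BytesIO(data))

# Rewind and read the member back out of the archive.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    restored = archive.extractfile("testfile.txt").read()

print(restored)
```

Setting info.size from the encoded bytes rather than from the str matters whenever the text contains non-ASCII characters, because the two lengths then differ.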
You can use io.BytesIO (the Python 3 counterpart of the Python 2 StringIO module) to write the content to the tar file as a file object.
Sample:
import io
import tarfile
# "w:gz" so the archive is actually gzip-compressed, matching the .tar.gz name
tar = tarfile.open("archfile.tar.gz", "w:gz")
with open("testfile.txt", 'rb') as f:
    data = f.read()
s = io.BytesIO(data)
info = tarfile.TarInfo(name="testfile.txt")
info.size = len(data)  # addfile() copies exactly info.size bytes
tar.addfile(tarinfo=info, fileobj=s)
tar.close()
Not a perfect answer but I managed to work around this with zipfile.
import zipfile
import io
# create archive
with zipfile.ZipFile("archfile.zip", "w") as archive:
    with io.StringIO("TESTTESTTEST") as f:
        archive.writestr("1234.txt", f.read())
# read test file out of archive
with zipfile.ZipFile("archfile.zip", "r") as archive:
    print(archive.read("1234.txt"))
produces b'TESTTESTTEST'
I need to extract a .gz file that I have downloaded from an FTP site to a local Windows file server. I have the variables set for the local path of the file, and I know it can be handled by the gzip module.
How can I do this? The file inside the GZ file is an XML file.
import gzip
import shutil
with gzip.open('file.txt.gz', 'rb') as f_in:
    with open('file.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
From the documentation:
import gzip
with gzip.open('file.txt.gz', 'rb') as f:
    file_content = f.read()
Maybe you also want to pass it to pandas:
import gzip
import pandas as pd
with gzip.open('features_train.csv.gz') as f:
    features_train = pd.read_csv(f)
features_train.head()
from sh import gunzip
gunzip('/tmp/file1.gz')
Not an exact answer because you're using xml data and there is currently no pd.read_xml() function (as of v0.23.4), but pandas (starting with v0.21.0) can uncompress the file for you! Thanks Wes!
import pandas as pd
import os
fn = '../data/file_to_load.json.gz'
print(os.path.isfile(fn))
df = pd.read_json(fn, lines=True, compression='gzip')
df.tail()
If you parse the file after unzipping it, don't forget to call decode() on each line; this is necessary when you open the file in binary mode.
import gzip
with gzip.open('file.gz', 'rb') as f:
    for line in f:
        print(line.decode().strip())
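As an alternative to decoding each line by hand, gzip.open() also accepts text modes such as "rt" together with an encoding, in which case iteration yields str lines directly. The sketch below assumes the content is UTF-8 and writes a small sample archive to a temporary directory first, so that it is self-contained.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.gz")

    # Create a small sample archive to read back.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    # "rt" decodes while reading, so no explicit decode() call is needed.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        lines = [line.strip() for line in f]

print(lines)
```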
It is very simple. Here you go!
import gzip
# path to the file to be extracted
ip = "sample.gzip"
# output file to be filled
op = open("output_file", "w")
with gzip.open(ip, "rb") as ip_byte:
    op.write(ip_byte.read().decode("utf-8"))
op.close()
You can use gzip.decompress() to do it:
read the input file using rb mode;
open the output file using w mode and utf8 encoding;
gzip.decompress() the input bytes;
decode what you get to str;
write the str to the output file.
import gzip
def decompress(infile, tofile):
    with open(infile, 'rb') as inf, open(tofile, 'w', encoding='utf8') as tof:
        decom_str = gzip.decompress(inf.read()).decode('utf-8')
        tof.write(decom_str)
If you have the gzip (and gunzip) programs installed on your computer, a simple way is to call the command from Python:
import os
filename = 'file.txt.gz'
os.system('gunzip ' + filename)
Optionally, if you want to preserve the original file, use
os.system('gunzip --keep ' + filename)
If you have a Linux environment, it is very easy to unzip using the gunzip command.
Go to the folder containing the file and run:
gunzip file-name