Python gzip folder structure when zipping single file - python

I'm using Python's gzip module to gzip content for a single file, using code similar to the example in the docs:
import gzip
content = b"Lots of content here"  # bytes, because the file is opened in binary mode
f = gzip.open('/home/joe/file.txt.gz', 'wb')
f.write(content)
f.close()
If I open the gz file in 7-zip, I see a folder hierarchy matching the path I wrote the gz to and my content is nested several folders deep, like /home/joe in the example above, or C: -> Documents and Settings -> etc in Windows.
How can I get the one file that I'm zipping to just be in the root of the gz file?

It looks like you will have to use GzipFile directly:
import gzip
content = b"Lots of content here"
real_f = open('/home/joe/file.txt.gz', 'wb')
# The first argument is only the name stored in the gzip header
f = gzip.GzipFile('file.txt.gz', mode='wb', fileobj=real_f)
f.write(content)
f.close()
real_f.close()
It looks like gzip.open doesn't allow you to specify the fileobj separately from the filename.

You must use gzip.GzipFile and supply a fileobj. If you do that, you can specify an arbitrary filename for the header of the gz file.
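As a minimal sketch of that approach (the helper name write_gz and the temporary output directory are assumptions made for illustration, not part of the original answers), the name stored in the gzip header can be chosen independently of the path on disk:

```python
import gzip
import os
import tempfile


def write_gz(path, data, header_name):
    """Write `data` (bytes) to `path`, storing `header_name` in the gzip header."""
    with open(path, "wb") as raw:
        # filename= only sets the name recorded inside the gzip header;
        # the real location on disk comes from the file object.
        with gzip.GzipFile(filename=header_name, mode="wb", fileobj=raw) as gz:
            gz.write(data)


# Hypothetical output location used only for this demonstration
out_path = os.path.join(tempfile.mkdtemp(), "file.txt.gz")
write_gz(out_path, b"Lots of content here", "file.txt")

# Read the raw bytes to inspect the header, then decompress to check the content
with open(out_path, "rb") as fh:
    raw_bytes = fh.read()
with gzip.open(out_path, "rb") as gz:
    roundtrip = gz.read()
```

Using with blocks closes the GzipFile before the underlying file, which is the same order as the explicit f.close() and real_f.close() calls above.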

Why not just open the file without specifying a directory hierarchy, i.e. gzip.open("file.txt.gz")? That seems to work for me. You can always copy the file to another location afterwards if you need to.

If you set your current working directory to your output folder, you can then call gzip.open("file.txt.gz") and the gz file will be created without the hierarchy:
import os
import gzip
content = b"Lots of content here"
outputPath = '/home/joe/file.txt.gz'
origDir = os.getcwd()
os.chdir(os.path.dirname(outputPath))
try:
    f = gzip.open(os.path.basename(outputPath), 'wb')
    f.write(content)
    f.close()
finally:
    os.chdir(origDir)  # always restore the original working directory

Related

How to open and read text files in a folder python

I have a folder with text files in it. I want to be able to put in a path to this folder and have Python go through the folder, open each file and append its content to a list.
import os
folderpath = "/Users/myname/Downloads/files/"
inputlst = [os.listdir(folderpath)]
filenamelist = []
for filename in os.listdir(folderpath):
    if filename.endswith(".txt"):
        filenamelist.append(filename)
print(filenamelist)
So far this outputs:
['test1.txt', 'test2.txt', 'test3.txt', 'test4.txt', 'test5.txt', 'test6.txt', 'test7.txt', 'test8.txt', 'test9.txt', 'test10.txt']
I want the code to take each of these files, open it and put all of its content into a single huge list, not just print the file names. Is there any way to do this?
You should use the built-in open() function for this.
The documentation for open() describes its more advanced options.
Anyway, here is one way to do it:
import os
folderpath = r"yourfolderpath"
filenamecontent = []
for filename in os.listdir(folderpath):
    if filename.endswith(".txt"):
        # join folder and file name so the file is found regardless of the working directory
        with open(os.path.join(folderpath, filename), 'r') as f:
            filenamecontent.append(f.read())
print(filenamecontent)
If you are using Python 3, you can use:
for filename in filename_list:
    with open(filename, "r") as file_handler:
        data = file_handler.read()
Please note that filename must contain the full (either relative or absolute) path to your file.
This way, your file handler will be automatically closed when you leave the with block.
More information here: https://docs.python.org/fr/3/library/functions.html#open
On a side note, in order to list files, you might want to have a look at glob and use:
import glob
filename_list = glob.glob("/path/to/files/*.txt")
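Putting the glob and with-open pieces together, a minimal sketch could look like the following. The function name read_text_files, the UTF-8 encoding and the temporary demo folder are assumptions for illustration only:

```python
import glob
import os
import tempfile


def read_text_files(folder):
    """Return a list holding the full content of every .txt file in `folder`."""
    contents = []
    # glob returns full paths, so no extra os.path.join is needed when opening
    for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        with open(path, "r", encoding="utf-8") as handle:
            contents.append(handle.read())
    return contents


# Hypothetical demo data: two .txt files and one file that should be ignored
demo_folder = tempfile.mkdtemp()
for name, text in (("test1.txt", "one"), ("test2.txt", "two"), ("notes.md", "skip me")):
    with open(os.path.join(demo_folder, name), "w", encoding="utf-8") as handle:
        handle.write(text)

all_contents = read_text_files(demo_folder)
```

If one list entry per line is wanted instead of one per file, contents.extend(handle.read().splitlines()) would be the corresponding change.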
You can use fileinput.
Code:
import fileinput
import os
folderpath = "your_path_to_directory_where_files_are_stored"
# Full paths of all the files in folderpath that end in .txt
file_list = [os.path.join(folderpath, a) for a in os.listdir(folderpath) if a.endswith(".txt")]
get_all_files = fileinput.input(file_list)
with open("alldata.txt", 'a') as writefile:
    for line in get_all_files:
        writefile.write(line)  # each line already ends with a newline
The above code reads all the data from the .txt files in the specified directory (folderpath) and stores it in alldata.txt. So the long list you wanted is now stored in a .txt file; if you only want it in memory, you can drop the write step and collect the lines in a list instead.
Links:
https://docs.python.org/3/library/fileinput.html
https://docs.python.org/3/library/functions.html#open

gzip multiple files in python

I have to compress a lot of XML files and split them by the date in the file name. For clarification, there is a parser which collects information from each XML file and then moves it to a backup folder. My code needs to gzip the files according to the date in the filename and group them in a compressed .gz file.
Please find the code below:
import os
import re
import gzip
import shutil
import sys
import time
#
timestr = time.strftime("%Y%m%d%H%M")
logfile = 'D:\\Coleta\\log_compactador_xml_tar'+timestr+'.log'
ptm_dir = "D:\\PTM\\monitored_programs\\"
count_files_mdc = 0
count_files_3gpp = 0
count_tar = 0
#
log_handle = open(logfile, 'a')  # print(..., file=...) needs a file object, not a path
for subdir, dir, files in os.walk(ptm_dir):
    for file in files:
        path = os.path.join(subdir, file)
        try:
            backup_files_dir = path.split(sep='\\')[4]
            parser_id = path.split(sep='\\')[3]
            if re.match('backup_files_*', backup_files_dir):
                if file.endswith('xml'):
                    # print(time.strftime("%Y-%m-%d %H:%M:%S"), path)
                    data_arq = file[1:14]
                    if parser_id in ('parser-924',):
                        gzip_filename_mdc = os.path.join(subdir,'E4G_PM_MDC_IP51_'+timestr+'_'+data_arq)
                        # 'at' appends every XML file to the same compressed stream
                        with open(path, 'r') as f_in, gzip.open(gzip_filename_mdc + ".gz", 'at') as f_out_mdc:
                            shutil.copyfileobj(f_in, f_out_mdc)
                            count_files_mdc += 1
                        print(time.strftime("%Y-%m-%d %H:%M:%S"), "Compressing file MDC: ",path)
                        os.remove(path)
        except PermissionError:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "Permission error on file:", path, file=log_handle)
        except IndexError:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "IndexError: ", path, file=log_handle)
log_handle.close()
As far as I can tell, it creates a stream of data, then compresses it and writes it to a new file with the specified filename. However, instead of storing each XML file separately inside the ".gz" file, it creates a single big file (one big stream of data?) inside the ".gz" file, with the same name as the output file but without any extension. After the files have all been compressed, it's not possible to uncompress the big file generated inside the ".gz" output file. Does anyone know what the problem is with my code?
PS: I have edited the code for readability purposes.
Not sure whether the solution is still needed, but I will just leave it here for anyone who faces the same issue.
gzip compresses a single stream of data and does not keep separate files apart, so to group several files you can create a gzip-compressed tar archive in Python using tarfile. The code is quite simple:
with tarfile.open(filename, mode="w:gz") as archive:
    archive.add(name=name_of_file_to_add, recursive=True)
In this case name_of_file_to_add can be a directory, in which case tarfile will add it recursively with all its contents. You will need to import the tarfile module.
If you need to add individual files rather than a directory, a simple for loop with calls to add will do (the recursive flag is not required in this case).
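The for loop mentioned above could be sketched roughly as follows. The helper name make_tar_gz, the file names and the temporary directory are hypothetical, and arcname is used here so that each file is stored under its bare name rather than its full path:

```python
import os
import tarfile
import tempfile


def make_tar_gz(archive_path, file_paths):
    """Add each file in `file_paths` to a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="w:gz") as archive:
        for path in file_paths:
            # arcname controls the name inside the archive (no directory prefix)
            archive.add(path, arcname=os.path.basename(path))


# Hypothetical demo data: two small XML files in a temporary directory
workdir = tempfile.mkdtemp()
sources = []
for name in ("a.xml", "b.xml"):
    source_path = os.path.join(workdir, name)
    with open(source_path, "w") as fh:
        fh.write("<root/>")
    sources.append(source_path)

archive_path = os.path.join(workdir, "backup.tar.gz")
make_tar_gz(archive_path, sources)

# Each file remains a separate member that can be extracted on its own
with tarfile.open(archive_path, "r:gz") as archive:
    members = sorted(archive.getnames())
```

To group by the date in the file name, one archive per date could be built by calling such a helper once for each group of paths.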

how to zip all pdf files under a static folder? django

I have a folder named pdfs under static folder.
I am trying to have a returned zip which contains all the pdf files in the pdfs folder.
I have looked at a few threads and used their code, but I can't solve the last part: I get a message saying there is no such file or directory.
I know static folders are a bit different from usual folders.
Can someone please give me a hand and see what I have missed?
Thanks in advance
from StringIO import StringIO
import zipfile
pdf_list = os.listdir(pdf_path)
print('###pdf list################################')
print(pdf_path) # this does show me the whole path up to the pdfs folder
print(pdf_list) # returns ['abc.pdf', 'efd.pdf']
zip_subdir = "somefiles"
zip_filename = "%s.zip" % zip_subdir
# Open StringIO to grab in-memory ZIP contents
s = StringIO()
# Grab ZIP file from in-memory, make response with correct MIME-type
resp = HttpResponse(content_type='application/zip')
# ..and correct content-disposition
resp['Content-Disposition'] = 'attachment; filename=%s' % zip_filename
# The zip compressor
zf = zipfile.ZipFile(s, "w")
for pdf_file in pdf_list:
    print(pdf_file)
    zf.write(pdf_file, pdf_path + pdf_file)
zf.writestr('file_name.zip', pdf_file.getvalue())
zf.close()
return resp
Here I am getting an error saying the file or directory 'abc.pdf' cannot be found.
P.S. I don't really need any sub folders zipped into the zip file. As long as all files are inside the zip, it'll be all good. (There won't be any sub folders in the pdfs folder)
I solved it myself and made it into a function with comments.
I had overcomplicated things earlier.
import os
from zipfile import ZipFile
from django.http import HttpResponse
# two params
# 1. the directory containing the files to be zipped
#    e.g. of file directory is /et/ubuntu/vanfruits/vanfruits/static/pdfs/
# 2. filename of the zip file
def render_respond_zip(self, file_directory, zip_file_name):
    response = HttpResponse(content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=' + zip_file_name
    # open the zip for writing, using the response as the output file
    zip = ZipFile(response, 'w')
    # loop through the directory provided
    for single_file in os.listdir(file_directory):
        # the full path to the file, including file name and extension, is needed;
        # PDFs are binary, so open them in 'rb' mode
        with open(os.path.join(file_directory, single_file), 'rb') as f:
            # first param is the name of the file inside the zip
            # second param is the content of the file
            zip.writestr(single_file, f.read())
    zip.close()
    return response
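For testing the zipping step without Django, a framework-free sketch can build the same archive in memory. The function name zip_directory_to_bytes and the placeholder files are assumptions for illustration; the resulting bytes could then be handed to whatever response object the framework provides:

```python
import io
import os
import tempfile
import zipfile


def zip_directory_to_bytes(directory):
    """Zip every regular file directly inside `directory` and return the zip as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(directory)):
            full_path = os.path.join(directory, name)
            if os.path.isfile(full_path):
                # First argument: path on disk; arcname: name inside the zip
                zf.write(full_path, arcname=name)
    return buffer.getvalue()


# Hypothetical demo data standing in for the pdfs folder
demo_dir = tempfile.mkdtemp()
for name in ("abc.pdf", "efd.pdf"):
    with open(os.path.join(demo_dir, name), "wb") as fh:
        fh.write(b"%PDF-1.4 placeholder")

zip_bytes = zip_directory_to_bytes(demo_dir)
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    zipped_names = zf.namelist()
```

Note the argument order of ZipFile.write: the path on disk comes first and the name inside the archive second, which is the reverse of the call in the question.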

How to sequentially read all the files in a directory and export the contents in Python?

I have a directory /directory/some_directory/ and in that directory I have a set of files. Those files are named in the following format: <letter>-<number>_<date>-<time>_<dataidentifier>.log, for example:
ABC1-123_20162005-171738_somestring.log
DE-456_20162005-171738_somestring.log
ABC1-123_20162005-153416_somestring.log
FG-1098_20162005-171738_somestring.log
ABC1-123_20162005-031738_somestring.log
DE-456_20162005-171738_somestring.log
I would like to read a subset of those files (for example, only files named ABC1-123*.log) and export all their contents to a single CSV file (for example, output.csv), that is, a CSV file that holds the data from all the individual files.
The code that I have written so far:
#!/usr/bin/env python
import os
file_directory = os.getcwd()
m_class = "ABC1"
m_id = "123"
device = m_class + "-" + m_id
for data_file in sorted(os.listdir(file_directory)):
    if str(device)+"*" in os.listdir(file_directory):
        print(data_file)
I don't know how to read only a subset of filtered files, or how to export them to a common CSV file.
How can I achieve this?
Just use the re library to match the file name pattern, and the csv library to export.
Only a few adjustments are needed; you were close:
filesFromDir = os.listdir(os.getcwd())
fileList = [file for file in filesFromDir if file.startswith(device)]
f = open("LogOutput.csv", "ab")
for file in fileList:
    # print("Processing", file)
    with open(file, "rb") as log_file:
        txt = log_file.read()
        f.write(txt)
        f.write(b"\n")  # bytes, because the output file is opened in binary mode
f.close()
Your question could be better stated. Based on your current code snippet, I'll assume that you want to:
1. Filter files in a directory based on a glob pattern.
2. Concatenate their contents to a file named output.csv.
In Python you can achieve (1.) by using glob to list filenames.
import glob
for filename in glob.glob('foo*bar'):
    print(filename)
That would print all files starting with foo and ending with bar in the current directory.
For (2.) you just read the file and write its content to your desired output, using Python's open() builtin function:
open('filename', 'r')
(Using 'r' as the mode you are asking Python to open the file for "reading"; using 'w' you are asking Python to open the file for "writing".)
The final code would look like the following:
import glob
device = 'ABC1-123'
with open('output.csv', 'w') as output:
    for filename in glob.glob(device+'*'):
        with open(filename, 'r') as input_file:
            output.write(input_file.read())
You can use the os module to list the files.
import os
files = os.listdir(os.getcwd())
m_class = "ABC1"
m_id = "123"
device = m_class + "-" + m_id
file_extension = ".log"
# filter the files by their extension and the starting name
files = [x for x in files if x.startswith(device) and x.endswith(file_extension)]
f = open("output.csv", "a")
for file in files:
    with open(file, "r") as data_file:
        f.write(data_file.read())
        f.write(",\n")
f.close()
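The answers above can be combined into one small function. This is only a sketch: the name concatenate_logs, the "_*.log" pattern and the demo file contents are assumptions, chosen so that a device such as ABC1-123 does not also match a hypothetical ABC1-1234:

```python
import glob
import os
import shutil
import tempfile


def concatenate_logs(directory, device, output_path):
    """Copy every `<device>_*.log` file in `directory` into `output_path`, in sorted order."""
    pattern = os.path.join(directory, device + "_*.log")
    matched = sorted(glob.glob(pattern))
    with open(output_path, "w") as output:
        for path in matched:
            with open(path, "r") as log_file:
                # Stream the file instead of loading it fully into memory
                shutil.copyfileobj(log_file, output)
    return matched


# Hypothetical demo data named after the pattern in the question
log_dir = tempfile.mkdtemp()
samples = {
    "ABC1-123_20162005-171738_somestring.log": "b,2\n",
    "ABC1-123_20162005-031738_somestring.log": "a,1\n",
    "DE-456_20162005-171738_somestring.log": "c,3\n",
}
for name, text in samples.items():
    with open(os.path.join(log_dir, name), "w") as fh:
        fh.write(text)

output_csv = os.path.join(log_dir, "output.csv")
matched_files = concatenate_logs(log_dir, "ABC1-123", output_csv)
with open(output_csv) as fh:
    combined = fh.read()
```

This assumes the log files already contain comma-separated lines; if they do not, the csv module would be needed to build proper rows.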

How to read text files in a zipped folder in Python

I have a compressed data file (all the files in a folder, then zipped). I want to read each file without unzipping. I tried several methods, but none of them works for entering the folder in the zip file. How can I achieve that?
Without folder in the zip file:
with zipfile.ZipFile('data.zip') as z:
    for filename in z.namelist():
        data = z.open(filename).readlines()
With one folder:
with zipfile.ZipFile('data.zip') as z:
    for filename in z.namelist():
        if filename.endswith('/'):
            # This is where I got stuck
namelist() returns a list of all items in an archive recursively.
Directory entries in a zip archive have names ending in '/', so you can skip them by checking the name (os.path.isdir() would look at the local filesystem, not inside the archive):
import zipfile
with zipfile.ZipFile('archive.zip') as z:
    for filename in z.namelist():
        if not filename.endswith('/'):
            # read the file
            with z.open(filename) as f:
                for line in f:
                    print line
Hope that helps.
I got Alec's code to work. I made some minor edits: (note, this won't work with password-protected zipfiles)
import sys
import zipfile
z = zipfile.ZipFile(sys.argv[1]) # Flexibility with regard to zipfile
for filename in z.namelist():
    if not filename.endswith('/'): # skip directory entries
        # read the file
        for line in z.open(filename):
            print line
z.close() # Close the file after opening it
del z # Cleanup (in case there's further work after this)
I got RichS' code to work. I made some minor edits:
import sys
import zipfile
archive = sys.argv[1] # assuming launched with `python my_script.py archive.zip`
with zipfile.ZipFile(archive) as z:
    for filename in z.namelist():
        if not filename.endswith('/'): # skip directory entries
            # read the file
            for line in z.open(filename):
                print(line.decode('utf-8'))
As you can see the edits are minor. I've switched to Python 3, the ZipFile class has a capital F, and the output is converted from b-strings to unicode strings. Only decode if you are trying to unzip a text file.
PS I'm not dissing RichS at all. I just thought it would be hilarious. Both useful and a mild shitpost.
PPS You can read a file from a password-protected archive: both ZipFile.open(name, mode='r', pwd=None, *, force_zip64=False) and ZipFile.read(name, pwd=None) accept a pwd argument (as bytes). If you use .read then there's no context manager, so you would simply do
# read the file
print(z.read(filename).decode('utf-8'))
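For Python 3.6 and later, a sketch of the same idea can rely on ZipInfo.is_dir() to skip the folder entries. The function name read_text_members, the UTF-8 default and the in-memory demo archive are assumptions for illustration:

```python
import io
import zipfile


def read_text_members(zip_source, encoding="utf-8"):
    """Return {member name: decoded text} for every non-directory entry in the zip."""
    contents = {}
    with zipfile.ZipFile(zip_source) as z:
        for info in z.infolist():
            if info.is_dir():
                # Folder entries carry no data; their files are listed separately
                continue
            contents[info.filename] = z.read(info).decode(encoding)
    return contents


# Hypothetical demo archive built in memory: one folder holding two text files
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("data/", "")
    z.writestr("data/a.txt", "first\n")
    z.writestr("data/b.txt", "second\n")
buffer.seek(0)

texts = read_text_members(buffer)
```

Files inside a folder appear in the listing with the folder as a prefix (for example data/a.txt), so nothing special is needed to "enter" the folder.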
