I wrote a simple, rough program that automatically zips everything inside the current working directory. It works very well on Linux, but there is a huge problem when running it on Windows.
Here is my code:
import os, zipfile
zip = zipfile.ZipFile('zipped.zip', 'w') #Create a zip file
zip.close()
zip = zipfile.ZipFile('zipped.zip', 'a') #Make zip file append instead of overwriting
for dir, subdir, file in os.walk(os.path.relpath('.')): #Loop for walking thru the directory
    for subdirectory in subdir:
        subdirs = os.path.join(dir, subdirectory)
        zip.write(subdirs, compress_type=zipfile.ZIP_DEFLATED)
    for files in file:
        fil = os.path.join(dir, files)
        zip.write(fil, compress_type=zipfile.ZIP_DEFLATED)
zip.close()
When I run this on Windows, it never stops compressing: it keeps adding "zipped.zip" to the zip file itself, and after running for a few seconds it had generated a file of a few hundred MB. On Linux, the program stops after zipping all the files, excluding the newly created zipped.zip.
Screenshot: A "zipped.zip" inside the "zipped.zip"
I am wondering whether I missed some code that would make this work well on Windows.
I would zip the folder into a temporary zip file outside the folder, then move that temporary zip file into the folder.
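A minimal sketch of that idea, assuming Python 3. The function name `zip_directory` and its parameters are hypothetical, not from the original post. The archive is written to the system temporary directory (assumed to be outside the folder being zipped), so os.walk() never sees it growing, and it is moved into place only after it has been closed.

```python
import os
import shutil
import tempfile
import zipfile


def zip_directory(src_dir, zip_name="zipped.zip"):
    """Zip everything under src_dir, then move the archive into src_dir."""
    final_path = os.path.join(src_dir, zip_name)
    # Create the temporary archive in the system temp directory,
    # which is assumed to lie outside src_dir.
    fd, tmp_path = tempfile.mkstemp(suffix=".zip")
    os.close(fd)
    with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            for name in dirs + files:
                full = os.path.join(root, name)
                # Skip an archive left over from a previous run.
                if os.path.abspath(full) == os.path.abspath(final_path):
                    continue
                zf.write(full, os.path.relpath(full, src_dir))
    # Remove any stale archive first so the move also succeeds on Windows.
    if os.path.exists(final_path):
        os.remove(final_path)
    shutil.move(tmp_path, final_path)
    return final_path
```

Calling `zip_directory('.')` would then mirror the original script's behaviour for the current working directory.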
That seems to be because you are saving the zip to the same folder that you are trying to compress, and that must be confusing os.walk() somehow.
One possible solution, as long as you don't have a giant directory to compress, is to use os.walk() to build a full list of what will be compressed, and once the list is complete, use it to populate the zip, instead of writing to the zip while os.walk() is still running.
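As a rough sketch of the list-first approach, assuming Python 3: the helper names `collect_paths` and `zip_listed` are made up for illustration. The walk finishes and returns a plain list before the archive is created, so the archive cannot show up in its own listing.

```python
import os
import zipfile


def collect_paths(top):
    """Return every directory and file path under top as a finished list."""
    paths = []
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            paths.append(os.path.join(root, name))
    return paths


def zip_listed(top, zip_path):
    """Zip the contents of top into zip_path using a pre-built path list."""
    # Take the snapshot before the archive is opened for writing.
    paths = collect_paths(top)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # An archive left over from an earlier run must not be zipped into itself.
            if os.path.abspath(path) == os.path.abspath(zip_path):
                continue
            zf.write(path, os.path.relpath(path, top))
```

The trade-off is memory: the whole list of paths is held at once, which is why this suits directories that are not enormous.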
I'm still very new to Python, so I'm trying to apply Python to my own situation for some experience.
One useful program deletes files from a directory, in this case by file type:
import os
target = "H:\\documents\\"
for x in os.listdir(target):
    if x.endswith(".rtf"):
        os.unlink(target + x)
Taking this program, I have tried to expand it to delete ost files in every local profile:
import os
list = []
folder = "c:\\Users"
for subfolder in os.listdir(folder):
    list.append(subfolder)
ost_folder = "c:\\users\\%s\\AppData\\Local\\Microsoft\\Outlook"
for users in list:
    ost_list = os.listdir(ost_folder%users)
    for file in ost_list:
        if file.endswith(".txt"):
            print(file)
This should print the file names, but instead it raises an error saying that the directory cannot be found.
Not every folder under C:\Users has an AppData\Local\Microsoft\Outlook subdirectory. There are typically hidden directories there that you may not see in Windows Explorer; they don't correspond to a real user and have never run Outlook, so they don't have that folder at all, but os.listdir(folder) still returns them. When you then call os.listdir on the Outlook path for such a directory, it dies with the exception you're seeing. Skip the directories that don't have it. The simplest way to do so is to have the glob module do the work for you (which avoids the need for your first loop entirely):
import glob
import os
for folder in glob.glob(r"c:\users\*\AppData\Local\Microsoft\Outlook"):
    for file in os.listdir(folder):
        if file.endswith(".txt"):
            print(os.path.join(folder, file))
You can simplify it even further by pushing all the work to glob:
for txtfile in glob.glob(r"c:\users\*\AppData\Local\Microsoft\Outlook\*.txt"):
    print(txtfile)
Or do the more modern OO-style pathlib alternative:
import pathlib
for txtfile in pathlib.Path(r'C:\Users').glob(r'*\AppData\Local\Microsoft\Outlook\*.txt'):
    print(txtfile)
One easy way is to create a directory and populate it with files. Then archive and compress that directory into a zip file called, say, file.zip. But this approach is needless since my files are in memory already, and needing to save them to disk is excessive.
Is it possible to create the directory structure right in memory, without saving the unzipped files/directories, so that I end up saving only the final file.zip (without the intermediate stage of saving files/directories on the file system)?
You can use zipfile:
import json
from zipfile import ZipFile
# data is whatever object you already hold in memory
with ZipFile("file.zip", "w") as zip_file:
    zip_file.writestr("root/file.json", json.dumps(data))
    zip_file.writestr("README.txt", "hello world")
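To illustrate how writestr can lay out a whole directory tree from in-memory data, here is a small sketch assuming Python 3. The `build_zip` helper and the `files` mapping with its contents are invented for illustration. Entry names containing "/" are extracted as nested directories, so no intermediate files are needed on disk.

```python
import json
import zipfile


def build_zip(target, files):
    """Write a mapping of archive name -> str/bytes content into target.

    target may be a path or a writable binary file-like object.
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for arcname, content in files.items():
            # writestr accepts both text (str) and binary (bytes) content.
            zf.writestr(arcname, content)


# Hypothetical in-memory content; the keys are paths inside the archive.
files = {
    "root/file.json": json.dumps({"answer": 42}),
    "root/nested/blob.bin": b"\x00\x01\x02",
    "README.txt": "hello world",
}
```

Calling `build_zip("file.zip", files)` would write only the final archive to disk; passing an `io.BytesIO()` object instead keeps even the archive in memory.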
I'm trying to create a Python script which places a number of files in a "staging" directory tree, and then uses ZipFile to create a .zip archive of them. This will later be copied to a Linux machine, which will extract the files and use them. The staging directory contains a mix of text and binary data files. The section doing the writing is in this "try" block:
try:
    import zipfile
    zipf = zipfile.ZipFile(out_file, 'w', zipfile.ZIP_DEFLATED)
    for root, dirs, files in os.walk(staging_dir):
        for d in dirs:
            # Write directories so even empty directories are copied:
            arcname = os.path.relpath(os.path.join(root, d), staging_dir)
            zipf.write(os.path.join(root, d), arcname)
        for f in files:
            arcname = os.path.relpath(os.path.join(root, f), staging_dir)
            zipf.write(os.path.join(root, f), arcname)
This works on a linux machine running python 2.7 (my main goal) or 3.x (secondary goal). It can also run on a Windows machine (sort of an afterthought, it might be useful), but there's a problem with permissions in that case. Normally the script sets permissions in the files in the staging_dir with "os.chmod", and then zip creates the archive with the right permissions. But running this on windows, the "os.chmod" command doesn't really set all linux file modes (not possible), so the zipfile contents aren't at the right permissions. I'm trying to figure out if there's a way to fix the permissions when making the zipfile in the code above. In particular, files in staging_dir/bin need to have "0o750" permissions.
I've seen the answer to "How do I set permissions (attributes) on a file in a ZIP file using Python's zipfile module", so I see how you could set permissions with "external_attr", and then write a file with "ZipFile.writestr". But the "external_attr" doesn't seem to apply to "ZipFile.write", only "ZipFile.writestr". And I'd like to do this on a zip archive that contains some binary files. Is there any other option than "writestr"? Is it possible to use "writestr" on large binary files?
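One way this could be approached, sketched here under the assumption of Python 3: build a ZipInfo by hand, store the desired Unix mode in the upper 16 bits of external_attr, and pass the file's bytes to writestr, which accepts binary data as well as text. The helper name `write_with_mode` is hypothetical. Note that this reads each file fully into memory, so it may not suit very large binaries.

```python
import os
import stat
import time
import zipfile


def write_with_mode(zipf, src_path, arcname, mode):
    """Add src_path to zipf as arcname, recording mode as its Unix permissions."""
    # Keep the source file's modification time in the archive entry.
    mtime = time.localtime(os.path.getmtime(src_path))
    info = zipfile.ZipInfo(arcname, date_time=mtime[:6])
    info.compress_type = zipfile.ZIP_DEFLATED
    # 3 marks the entry as created on Unix, so that the upper attribute
    # bits are interpreted as a Unix mode even when zipping on Windows.
    info.create_system = 3
    # Upper 16 bits: file type (regular file) plus permission bits.
    info.external_attr = (stat.S_IFREG | mode) << 16
    with open(src_path, "rb") as fh:
        zipf.writestr(info, fh.read())
```

Inside the loop over files, a call such as `write_with_mode(zipf, os.path.join(root, f), arcname, 0o750)` could replace `zipf.write(...)` for the entries under bin.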
I have a Python script that compresses specific files into a zip file. However, I have noticed that a ".DS_Store" file ends up inside this zip file. Is there a way I can remove it from the zip file, or avoid it being created in the first place, in my Python script? From what I have found online, I think on a Windows machine this hidden file appears as a "macosx" file.
I've tested the zip file with and without the ".DS_Store" hidden file (I manually deleted it). When I remove it, the zip file is able to be processed correctly and when I leave it in, errors are thrown.
This is how I create the zip file in my python script:
#Create zip file of all necessary files
zipf = zipfile.ZipFile(new_path+zip_file_name, 'w', zipfile.ZIP_DEFLATED)
create_zip(new_path,zipf)
zipf.close()
Any advice on how to approach removing this hidden file would be appreciated.
Your code uses a function, create_zip, but you haven't shared the code of that function. Presumably, it loops through the contents of a directory and calls the .write method of the ZipFile instance in order to write each file into the archive. If this is the case, just add some logic to that function to exclude any files called .DS_Store.
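import os

def create_zip(path, zipf):
    # Add every entry directly inside path, except macOS's .DS_Store file
    for file in os.listdir(path):
        if file != '.DS_Store':
            # Join with path so this works whatever the current directory is
            zipf.write(os.path.join(path, file), file)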
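If the files live in nested folders, the same exclusion can be applied while walking the tree. The following is only a sketch, assuming Python 3: the function name `zip_without_junk` and the `SKIP_NAMES` set are hypothetical, and including "__MACOSX" in that set is an assumption based on the "macosx" entry mentioned in the question. The archive path is assumed to lie outside the directory being zipped.

```python
import os
import zipfile

# Names to leave out of the archive; "__MACOSX" is an assumed extra.
SKIP_NAMES = {".DS_Store", "__MACOSX"}


def zip_without_junk(src_dir, zip_path):
    """Zip the files under src_dir into zip_path, skipping SKIP_NAMES."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            # Prune skipped directories in place so os.walk does not descend into them.
            dirs[:] = [d for d in dirs if d not in SKIP_NAMES]
            for name in files:
                if name in SKIP_NAMES:
                    continue
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, src_dir))
```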
I have a Python script that reads through a text csv file and creates a playlist file. However I can only do one at a time, like:
python playlist.py foo.csv foolist.txt
However, I have a directory of files that need to be made into a playlist, with different names, and sometimes a different number of files.
So far I have looked at creating a txt file with a list of all the names of the files in the directory, then looping through each line of that; however, I know there must be an easier way to do it.
for f in *.csv; do
    python playlist.py "$f" "${f%.csv}list.txt"
done
Will that do the trick? This will put foo.csv in foolist.txt and abc.csv in abclist.txt.
Or do you want them all in the same file?
Just use a for loop with the asterisk glob, making sure you quote things appropriately for spaces in filenames:
for file in *.csv; do
    python playlist.py "$file" >> outputfile.txt;
done
Is it a single directory, or nested?
Ex.
topfile.csv
topdir
  --dir1
    --file1.csv
    --file2.txt
  --dir2
    --file3.csv
    --file4.csv
For nested, you can use os.walk(topdir) to get all the files and dirs recursively within a directory.
You could set up your script to accept dirs or files:
python playlist.py topfile.csv topdir
import sys
import os
def main():
    files_toprocess = set()
    paths = sys.argv[1:]
    for p in paths:
        if os.path.isfile(p) and p.endswith('.csv'):
            files_toprocess.add(p)
        elif os.path.isdir(p):
            for root, dirs, files in os.walk(p):
                files_toprocess.update([os.path.join(root, f)
                                        for f in files if f.endswith('.csv')])
If you have the directory name, you can use os.listdir:
os.listdir(dirname)
If you want to select only a certain type of file, e.g. only csv files, you could use the glob module.
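For instance, a minimal sketch of the glob approach, assuming Python 3 (the helper name `csv_files` is made up for illustration):

```python
import glob
import os


def csv_files(dirname):
    """Return the sorted paths of the .csv files directly inside dirname."""
    # "*.csv" matches only names ending in .csv; subdirectories are not searched.
    return sorted(glob.glob(os.path.join(dirname, "*.csv")))
```

Each returned path could then be passed to the playlist script in a loop.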