I have a Python script that compresses specific files into a zip file. However, I have noticed that a ".DS_Store" file ends up inside this zip file. Is there a way to remove it from the zip file, or to avoid it being added in the first place in my Python script? From what I have found online, I think on a Windows machine this hidden file appears as a "macosx" file.
I've tested the zip file with and without the ".DS_Store" hidden file (I manually deleted it). When I remove it, the zip file is able to be processed correctly and when I leave it in, errors are thrown.
This is how I create the zip file in my python script:
#Create zip file of all necessary files
zipf = zipfile.ZipFile(new_path+zip_file_name, 'w', zipfile.ZIP_DEFLATED)
create_zip(new_path,zipf)
zipf.close()
Any advice how to approach removing this hidden file would be appreciated.
Your code uses a function, create_zip, but you haven't shared the code of that function. Presumably, it loops through the contents of a directory and calls the .write method of the ZipFile instance in order to write each file into the archive. If this is the case, just add some logic to that function to exclude any files called .DS_Store.
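def create_zip(path, zipf):
    # Add every entry in path to the open ZipFile, skipping .DS_Store
    for file in os.listdir(path):
        if file != '.DS_Store':
            # Join with path so the file is found whatever the working directory is
            zipf.write(os.path.join(path, file), arcname=file)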
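If the files sit in nested folders, the same filter can be applied while walking the tree. This is a minimal sketch rather than the asker's actual create_zip; the names `zip_without_ds_store` and `EXCLUDED` are hypothetical, and it assumes the archive may live inside the folder being zipped, so it skips the archive itself as well.

```python
import os
import zipfile

# Hypothetical set of file names to leave out of the archive.
EXCLUDED = {'.DS_Store'}


def zip_without_ds_store(src_dir, zip_path):
    """Zip everything under src_dir into zip_path, skipping EXCLUDED names."""
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Skip unwanted names and the archive being written.
                if name in EXCLUDED or os.path.abspath(full_path) == zip_abs:
                    continue
                # Store paths relative to src_dir so no leading folders appear.
                zf.write(full_path, arcname=os.path.relpath(full_path, src_dir))
```

Something like zip_without_ds_store(new_path, new_path + zip_file_name) would then stand in for the three lines in the question.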
this might be a silly question but I'm struggling a lot finding solution to it.
So I have a file in the given folder:
Output\20190101_0100\20190101_0100.csv
Now I want to zip the file and save it to same location. So here's my try:
zipfile.ZipFile('Output/20190101_0100/20190101_0100_11.zip', mode='w', compression=zipfile.ZIP_DEFLATED).write('Output/20190101_0100/20190101_0100_11.csv')
But it's making a folder inside the zip file and saving it there, as shown below:
Output\20190101_0100\20190101_0100_11.zip\Output\20190101_0100\20190101_0100_11.csv
Can someone tell me how can I save my file directly in the same location or location mentioned below:
Output\20190101_0100\20190101_0100_11.zip\20190101_0100_11.csv
Rephrasing of question
The question is slightly confusing because Output\20190101_0100\20190101_0100_11.zip\Output\20190101_0100\20190101_0100_11.csv won't be a file, but rather Output\20190101_0100\20190101_0100_11.csv will be a file within the zip file Output\20190101_0100\20190101_0100_11.zip (if I am not mistaken)
Just to restate your problem (if I understood it correctly):
You have a file Output\20190101_0100\20190101_0100.csv (a file 20190101_0100.csv in the Output -> 20190101_0100 sub directory)
You want to create the zip file Output/20190101_0100/20190101_0100_11.zip (20190101_0100_11.zip in the Output -> 20190101_0100 sub directory)
You want to add the aforementioned CSV file Output\20190101_0100\20190101_0100.csv but without the leading path, i.e. as 20190101_0100_11.csv rather than Output\20190101_0100\20190101_0100.csv.
Or to not get confused with too many similar directories, let's simplify it as:
You have a file test.csv in the sub directory sub-folder
You want to create the zip file test.zip
You want to add the aforementioned CSV file test.csv but without the leading path, i.e. as test.csv rather than sub-folder/test.csv.
Answer
From the ZipFile.write documentation:
Write the file named filename to the archive, giving it the archive
name arcname (by default, this will be the same as filename, but
without a drive letter and with leading path separators removed).
That means that arcname will default to the passed in filename (it doesn't have a drive letter or leading path separator).
If you want to remove the sub folder part, just pass in arcname as well. e.g.:
import zipfile
with zipfile.ZipFile('path-to-zip/test.zip', 'w') as zf:
    zf.write('sub-folder/test.csv', arcname='test.csv')
You could try using a raw path:
zipfile.ZipFile('Output/20190101_0100/20190101_0100_11.zip', mode='w', compression=zipfile.ZIP_DEFLATED).write(r'C:\...\Output\20190101_0100\20190101_0100_11.csv')
I wrote a simple, rough program that automatically zips everything inside the current working directory. It works very well on Linux, but there is a huge problem when running it on Windows.
Here is my code:
import os, zipfile
zip = zipfile.ZipFile('zipped.zip', 'w') #Create a zip file
zip.close()
zip = zipfile.ZipFile('zipped.zip', 'a') #Make zip file append instead of overwriting
for dir, subdir, file in os.walk(os.path.relpath('.')): #Loop for walking thru the directory
    for subdirectory in subdir:
        subdirs = os.path.join(dir, subdirectory)
        zip.write(subdirs, compress_type=zipfile.ZIP_DEFLATED)
    for files in file:
        fil = os.path.join(dir, files)
        zip.write(fil, compress_type=zipfile.ZIP_DEFLATED)
zip.close()
When I run this on Windows, it doesn't stop compressing; it keeps adding the "zipped.zip" file to the zipped file indefinitely, and after I left it running for a few seconds it had generated a file of a few hundred MB. On Linux, the program stops after it has zipped all the files, excluding the newly created zipped.zip.
Screenshot: A "zipped.zip" inside the "zipped.zip"
I am wondering whether I missed some code that would make this work well on Windows?
I would zip the folder in a temporary zipfile, then move the temporary zipfile in the folder.
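A rough sketch of that idea, assuming the archive should end up as zipped.zip inside the folder being compressed. The function name `zip_dir_via_temp` is made up for illustration, and storing paths relative to the folder is a choice of this sketch, not something the question requires.

```python
import os
import shutil
import tempfile
import zipfile


def zip_dir_via_temp(src_dir, zip_name='zipped.zip'):
    """Zip src_dir into a temporary archive, then move it into src_dir."""
    final_path = os.path.join(src_dir, zip_name)
    final_abs = os.path.abspath(final_path)
    # The temporary file lives outside src_dir, so os.walk never sees it.
    fd, tmp_path = tempfile.mkstemp(suffix='.zip')
    os.close(fd)  # ZipFile reopens the path itself
    with zipfile.ZipFile(tmp_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Leave out an archive left over from an earlier run.
                if os.path.abspath(full_path) == final_abs:
                    continue
                zf.write(full_path, arcname=os.path.relpath(full_path, src_dir))
    shutil.move(tmp_path, final_path)
    return final_path
```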
That seems to be because you are saving the zip to the same folder that you are trying to compress, and that must be confusing os.walk() somehow.
One possible solution, as long as you don't have a giant directory to compress, is to use os.walk() to build a full list of what will be compressed, and after the list is complete, use it to populate the zip, instead of adding files while os.walk() is still running.
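The list-first approach could look roughly like this. It is only a sketch: `zip_from_snapshot` is a hypothetical name, and the check that skips an existing zipped.zip is an extra precaution that the answer itself does not mention.

```python
import os
import zipfile


def zip_from_snapshot(src_dir='.', zip_name='zipped.zip'):
    """Collect all file paths first, then write them into the archive."""
    zip_path = os.path.join(src_dir, zip_name)
    zip_abs = os.path.abspath(zip_path)
    # Step 1: finish walking the tree before the new archive is created.
    to_add = []
    for root, dirs, files in os.walk(src_dir):
        for name in files:
            full_path = os.path.join(root, name)
            if os.path.abspath(full_path) != zip_abs:
                to_add.append(full_path)
    # Step 2: populate the zip from the finished list.
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for full_path in to_add:
            zf.write(full_path, arcname=os.path.relpath(full_path, src_dir))
    return zip_path
```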
I have a *.tar.gz compressed file that I would like to read in with Python 2.7. The file contains multiple h5 formatted files as well as a few text files. I'm a novice with Python. Here is the code I'm trying to adapt:
subset_path='c:\data\grant\files'
f=gzip.open(filename,'subset_full.tar.gz')
subset_data_path=os.path.join(subset_path,'f')
The first statement identifies the path to the folder with the data. The second statement tells Python to open a specific compressed file and the third statement (hopefully) executes a join of the prior two statements.
Several lines below this code I get an error when Python tries to use the 'subset_data_path' assignment.
What's going on?
The gzip module will only open a single file that has been compressed, i.e. my_file.gz. You have a tar archive of multiple files that are also compressed. This needs to be both untarred and uncompressed.
Try using the tarfile module instead, see https://docs.python.org/2/library/tarfile.html#examples
edit: To add a bit more information on what has happened, you have successfully opened the zipped tarball into a gzip file object, which will work almost the same as a standard file object. For instance you could call f.readlines() as if f was a normal file object and it would return the uncompressed lines.
However, this did not actually unpack the archive into new files in the filesystem. You did not create a subdirectory 'c:\data\grant\files\f', and so when you try to use the path subset_data_path you are looking for a directory that does not exist.
The following ought to work:
import os
import tarfile
# Raw string, so that \f in the path is not read as a form feed character
subset_path = r'c:\data\grant\files'
with tarfile.open("subset_full.tar.gz") as tar:
    tar.extractall(subset_path)
subset_data_path = os.path.join(subset_path, 'subset_full')
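The folder name 'subset_full' is a guess at what the archive contains. A small sketch for checking the actual member names before building the path; `list_tar_members` is a hypothetical helper name, not part of the tarfile module.

```python
import tarfile


def list_tar_members(tar_path):
    """Return the member names stored in a (possibly compressed) tar archive."""
    # tarfile.open detects gzip compression on its own in the default read mode.
    with tarfile.open(tar_path) as tar:
        return tar.getnames()
```

If the names it returns start with a different top-level folder, that folder is the one to join onto subset_path.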
import os, csv
f=open("C:\\tempa\\file.csv", 'wb') #write to an existing blank csv file
w=csv.writer(f)
for path, dirs, files, in os.walk("C:\\tempa"):
    for filename in files:
        w.writerow([filename])
I'm running Win7 64-bit with the latest Python, using Anaconda Spyder and PyScripter; the issue persists regardless of the IDE.
I have some media in folders in tempa (jpg, pdf and mov) and I wanted to get a file list of all of them. The code works, but it stops without any error at row 113. There is nothing special about the file it stops on, no weird characters.
I could have 3 blocks of code, one for each folder, to work around this weird bug, but it shouldn't have an issue. The folders are all in the root folder without going too deep into sub folders:
C:\
  -tempa
    -jpg
    -pdf
    -mov
I have heard there are issues with os.walk but I didn't expect anything weird like this.
Maybe I need an f=close?
You were examining the file before it was fully closed. (f won't be closed until, at least, it is no longer referenced by any in-scope variable name.) If you examine a file before it is closed, you may not see the final, partial, data buffer.
Use the file object's context manager to ensure that the file is flushed and closed in all cases:
import os, csv
with open("C:\\tempa\\file.csv", 'wb') as f: #write to an existing blank csv file
    w=csv.writer(f)
    for path, dirs, files, in os.walk("C:\\tempa"):
        for filename in files:
            w.writerow([filename])
# Now no need for f.close()
I'm trying to replace a file in a zipped archive with a script using zipfile. The file is in one directory, the archive is in another. To do this, I copy everything from the original archive into another, excluding the file I want to replace. Then I write the new version of the file into the new archive, close it, delete the old archive and rename the new one. Should be easy, right? Wrong.
For whatever reason, the zipfile.write() method has this silly thing it does where it assumes that the second (optional) argument, arcname is the same as your file name, unless you specify it. So, if I have the following:
fileName = "C:\\Documents\\file"
archive.write(fileName)
I will get an archive with a subfolder called "Documents", and within that will be the file. I want the file to be in the root directory of the archive (sidenote: is 'root directory' the right term for what I'm referring to?)
Things I've Tried:
archive.write(fileName,'') This produced a weird file in the archive, which could not be opened.
archive.write(fileName, archive) I really thought this would work, but the system really didn't like it.
archive.write(fileNameWithoutPath) This one returned an error, since Python could no longer find the file.
So how do I specify that I want to put the file in the root directory of the archive and still specify its path so Python can find it?
Minor, and semi-related question: Is there a way to create the new archive such that it is hidden in windows explorer?
I am assuming you want an entry in the zipfile called file, containing the contents of C:\Documents\file.
From the Python docs:
ZipFile.write(filename[, arcname[, compress_type]])
Write the file named filename to the archive, giving it the archive name arcname
so you want
archive.write(fileName, fileNameWithoutPath)
The first argument is the file that goes into the zip and the second is the name to be used in the archive. As it contains no path separators, it will not create any directories.
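Putting this together with the copy-and-rename workflow described in the question, a sketch might look like the following. The function name `replace_in_archive` and the '.tmp' suffix are assumptions made for illustration; os.path.basename stands in for fileNameWithoutPath.

```python
import os
import zipfile


def replace_in_archive(zip_path, file_path):
    """Rebuild zip_path so the entry named after file_path holds its new contents."""
    # basename has no path separators, so the entry lands at the archive root.
    arcname = os.path.basename(file_path)
    tmp_path = zip_path + '.tmp'  # hypothetical temporary name
    with zipfile.ZipFile(zip_path, 'r') as old, \
            zipfile.ZipFile(tmp_path, 'w', zipfile.ZIP_DEFLATED) as new:
        # Copy every entry except the one being replaced.
        for item in old.infolist():
            if item.filename != arcname:
                new.writestr(item, old.read(item.filename))
        new.write(file_path, arcname=arcname)
    # Swap the rebuilt archive in for the original.
    os.remove(zip_path)
    os.rename(tmp_path, zip_path)
```

With the names from the question, replace_in_archive(archivePath, "C:\\Documents\\file") would leave an entry called file at the top level of the archive, where archivePath is a placeholder for wherever the archive lives.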