Remove auto-generated __MACOSX folder from inside a zip file in Python

I have zip files uploaded by clients through a web server that sometimes contain pesky __MACOSX directories inside that gum things up. How can I remove these?
I thought of using ZipFile, but this answer says that isn't possible and gives this suggestion:
Read out the rest of the archive and write it to a new zip file.
How can I do this with ZipFile? Another Python based alternative like shutil or something similar would also be fine.

The examples below look for '__MACOSX' entries inside a zip file. If any exist, a new zip archive is created and every entry that is not under __MACOSX is written to it. The code can be extended to skip .DS_Store files as well. Please let me know if you need to delete the old zip file and replace it with the new, clean one.
Hopefully, these answers help you solve your issue.
Example One
from zipfile import ZipFile
with ZipFile('original.zip', 'r') as original_zip, ZipFile('new_archive.zip', 'w') as new_zip:
    for item in original_zip.infolist():
        # copy every entry except those under __MACOSX/
        if not item.filename.startswith('__MACOSX/'):
            new_zip.writestr(item, original_zip.read(item.filename))
Example Two
from zipfile import ZipFile
def check_archive_for_bad_filename(file):
    # return True as soon as an entry under __MACOSX/ is found
    with ZipFile(file, 'r') as zip_file:
        for filename in zip_file.namelist():
            print(filename)
            if filename.startswith('__MACOSX/'):
                return True
    return False
def remove_bad_filename_from_archive(original_file, temporary_file):
    # copy every entry except those under __MACOSX/ into a new archive
    with ZipFile(original_file, 'r') as zip_file, ZipFile(temporary_file, 'w') as new_zip:
        for item in zip_file.namelist():
            if not item.startswith('__MACOSX/'):
                new_zip.writestr(item, zip_file.read(item))
archive_filename = 'old.zip'
temp_filename = 'new.zip'
results = check_archive_for_bad_filename(archive_filename)
if results:
    print('Removing MACOSX file from archive.')
    remove_bad_filename_from_archive(archive_filename, temp_filename)
else:
    print('No MACOSX file in archive.')
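The two examples above can be folded into one helper that also skips .DS_Store entries and swaps the cleaned archive in for the original. This is only a minimal sketch: the names is_macos_junk and strip_macos_metadata are hypothetical, and it assumes each entry is small enough to be read into memory in one go.

```python
import os
import tempfile
from zipfile import ZipFile


def is_macos_junk(name):
    """Return True for __MACOSX entries and .DS_Store files (hypothetical helper)."""
    parts = name.split('/')
    return '__MACOSX' in parts or '.DS_Store' in parts


def strip_macos_metadata(zip_path):
    """Rewrite zip_path without macOS metadata; return how many entries were dropped."""
    removed = 0
    # Build the cleaned copy next to the original so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(zip_path))
    fd, tmp_path = tempfile.mkstemp(suffix='.zip', dir=directory)
    os.close(fd)
    try:
        with ZipFile(zip_path, 'r') as src, ZipFile(tmp_path, 'w') as dst:
            for item in src.infolist():
                if is_macos_junk(item.filename):
                    removed += 1
                    continue
                # Passing the ZipInfo keeps the entry's name, timestamp and compression type.
                dst.writestr(item, src.read(item.filename))
        os.replace(tmp_path, zip_path)
    except Exception:
        # Leave the original untouched and clean up the partial copy.
        os.remove(tmp_path)
        raise
    return removed
```

Calling strip_macos_metadata('old.zip') would then clean the uploaded file in place, so no separate new.zip has to be renamed afterwards.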

The idea would be to use ZipFile to extract the contents into some defined folder, remove the __MACOSX entry (os.remove for files, os.rmdir for directories once they are empty), and then compress it again.
Depending on whether your OS has a zip command, you might be able to skip the re-compression step. You could also drive that command from Python with os.system or the subprocess module.
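The extract, delete, and re-compress idea could look roughly like the sketch below. It swaps in shutil.rmtree for os.rmdir/os.remove (so the folder does not have to be emptied first) and shutil.make_archive for the re-compression; the function name rebuild_without_macosx is hypothetical, and it assumes dst_zip ends in .zip and that only a top-level __MACOSX folder needs removing.

```python
import os
import shutil
import tempfile
from zipfile import ZipFile


def rebuild_without_macosx(src_zip, dst_zip):
    """Extract src_zip, delete a top-level __MACOSX folder, and re-zip into dst_zip."""
    with tempfile.TemporaryDirectory() as workdir:
        with ZipFile(src_zip) as archive:
            archive.extractall(workdir)
        # ignore_errors=True makes this a no-op when the folder is absent.
        shutil.rmtree(os.path.join(workdir, '__MACOSX'), ignore_errors=True)
        # make_archive appends '.zip' itself, so strip the extension first.
        base_name = os.path.splitext(dst_zip)[0]
        return shutil.make_archive(base_name, 'zip', root_dir=workdir)
```

Compared with copying entries straight from one archive to another, this touches the disk twice, so it is mainly worth it when you want to inspect or modify the extracted files anyway.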

Related

Creating one csv file containing data from many csv files

I have a folder full of csv files that contain results for different participants in an experiment. I'm trying to create one large csv file containing all the csv files in the directory. I'm using listdir() to create a list of all the files but I haven't been able to open the individual files in this list. I think I need to loop over each file but I haven't been able to figure out how.
This is the code I've come up with so far. I get the following error: FileNotFoundError: [Errno 2] No such file or directory: 'results_262.csv' because it appears that the loop only reads one file in the files variable even though there should be many.
from os import listdir
path = "results"
files = listdir(path)
print(files)
results = open("results.csv", "w")
data = []
for file in files:
    participant = open(f"{file}", "r")
    data.append(participant.readlines())
Would anyone be able to help?

Your issue is that listdir() returns the filenames without their path information, so you need to join each filename to the path.
import os
...
for file in files:
    participant = open(os.path.join(path, file), "r")
    ...
Other details:
f"{file}" is the same thing as just file, i.e., open(file, "r") is equivalent to open(f"{file}", "r") but simpler and slightly more efficient. There is no need to use interpolation when you already have the value you want in a variable.
And, you're not closing your files. You should add participant.close() in your loop, or, even better, use a context manager:
for file in files:
    with open(os.path.join(path, file), "r") as participant:
        ...
The with puts your file handle in a context manager, and the file gets closed automatically when you leave the scope of the context manager.
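Putting these points together, the merge could be sketched as follows. The function name merge_csv_files is hypothetical, and the sketch assumes that every participant file starts with the same header row and that the output file lives outside the results folder.

```python
import csv
import os


def merge_csv_files(folder, output_path):
    """Concatenate every .csv file in folder into output_path, keeping one header row."""
    header_written = False
    with open(output_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for name in sorted(os.listdir(folder)):
            if not name.endswith(".csv"):
                continue
            # listdir() returns bare names, so join them to the folder first.
            with open(os.path.join(folder, name), newline="") as participant:
                reader = csv.reader(participant)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to copy
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # Copy the remaining data rows of this participant.
                writer.writerows(reader)
```

With the layout from the question this would be called as merge_csv_files("results", "results.csv").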

How to add a password and output directory using Zipfile module in Python?

I got the code below from online and I am trying to add a password. I also want to change the result directory to "C:\#SFTPDWN" (the final zip file should be in this folder).
I tried changing it as shown below, but it did not work.
with ZipFile('CC-Data.zip', 'w', 'pass word') as zip:
Can anybody please tell how to change this code to add password and change result folder?
One last thing: currently it zips the #SFTPDWN folder itself, and I just want to zip everything inside it (right now the result is CC-Data.zip with a #SFTPDWN folder inside it). Can anybody please tell me how to zip only the contents of the #SFTPDWN folder?
Code
from zipfile import ZipFile
import os
def get_all_file_paths(directory):
    file_paths = []
    for root, directories, files in os.walk(directory):
        for filename in files:
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)
    return file_paths
def main():
    # path to folder which needs to be zipped
    directory = 'C:\#SFTPDWN'
    file_paths = get_all_file_paths(directory)
    print('Following files will be zipped:')
    for file_name in file_paths:
        print(file_name)
    with ZipFile('CC-Data.zip', 'w') as zip:
        # writing each file one by one
        for file in file_paths:
            zip.write(file)
    print('Zipped successfully!')
if __name__ == "__main__":
    main()

For the password question: from the documentation:
This module [...] supports decryption of encrypted files in ZIP archives, but it currently cannot create an encrypted file. Decryption is extremely slow as it is implemented in native Python rather than C.
https://docs.python.org/3/library/zipfile.html
You would need to use a 3rd party library to create an encrypted zip, or encrypt the archive some other way.
For the second part, in ZipFile.write the documentation also mentions:
ZipFile.write(filename, arcname=None, compress_type=None, compresslevel=None)
Write the file named filename to the archive, giving it the archive name arcname (by default, this will be the same as filename, but without a drive letter and with leading path separators removed). [...]
Note: Archive names should be relative to the archive root, that is, they should not start with a path separator.
https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.write
So you would need to strip the leading folder off each path in your file variable and pass the result as the arcname parameter. Using os.path.relpath may help, e.g. (I'm on Linux, but it should work with Windows paths under Windows):
>>> os.path.relpath("/folder/subpath/myfile.txt", "/folder/")
'subpath/myfile.txt'
Sidebar: a string like "C:\Something" contains the invalid escape sequence \S. Python currently tolerates this and keeps the backslash literally, but it has emitted a DeprecationWarning since 3.6 (a SyntaxWarning since 3.12), and the documentation says it will eventually become an error. Use "C:\\Something", r"C:\Something", or "C:/Something" instead. Something like "C:\Users" actually raises a SyntaxError, because \U starts a Unicode escape, and "C:\nothing" silently embeds a newline.
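As a rough sketch of the arcname suggestion, the helper below zips only what is inside a folder and lets you choose where the archive is written. The function name zip_folder_contents is hypothetical; the check that skips the archive itself is an added assumption for the case where the zip file is created inside the folder being zipped.

```python
import os
from zipfile import ZipFile


def zip_folder_contents(directory, zip_path):
    """Zip everything under directory, storing paths relative to directory itself."""
    zip_abs = os.path.abspath(zip_path)
    with ZipFile(zip_path, 'w') as archive:
        for root, _dirs, files in os.walk(directory):
            for filename in files:
                file_path = os.path.join(root, filename)
                # Do not add the archive to itself if it lives inside directory.
                if os.path.abspath(file_path) == zip_abs:
                    continue
                # arcname drops the leading folder, so there is no wrapper directory.
                archive.write(file_path, arcname=os.path.relpath(file_path, directory))
```

For the paths in the question this would be called as zip_folder_contents(r'C:\#SFTPDWN', r'C:\#SFTPDWN\CC-Data.zip'). It does not address the password part, which the standard zipfile module cannot do.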

Collecting comment data from multiple Rar files without unzipping

I wanted to collect the comment data from multiple archive files (the optional comment shown on the side when opening a Zip or a Rar file), but now I realize that they are not Zip but Rar files. What do I need to change in order for it to work on Rar files?
import os
import unicodedata
from zipfile import ZipFile
rootFolder = u"C:/Users/user/Desktop/archives/"
zipfiles = [os.path.join(rootFolder, f) for f in os.listdir(rootFolder)]
for zfile in zipfiles:
    print("Opening: {}".format(zfile))
    with ZipFile(zfile, 'r') as testzip:
        print(testzip.comment) # comment for entire zip
        l = testzip.infolist() #list all files in archive
        for finfo in l:
            # per file/directory comments
            print("{}:{}".format(finfo.filename, finfo.comment))

You need to use the rarfile module, a third-party package modeled on zipfile. ZipFile.comment is an attribute that can only read the comment of a ZIP file.

taking data from files which are in folder

How do I get the data from multiple txt files that are placed in a specific folder? I started with the code below but could not fix it. It gives an error like 'No such file or directory: '.idea' (??)
(Let's say I have an A folder and in that, there are x.txt, y.txt, z.txt and so on. I am trying to get and print the information from all the files x,y,z)
def find_get(folder):
    for file in os.listdir(folder):
        f = open(file, 'r')
        for data in open(file, 'r'):
            print data
find_get('filex')
Thanks.

If you just want to print each line:
import glob
import os
def find_get(path):
    # glob.glob already returns each match with the path prefix attached
    for f in glob.glob(os.path.join(path, "*.txt")):
        with open(f) as data:
            for line in data:
                print(line)
glob will find only your .txt files in the specified path.
Your error comes from not joining the path to the filename: unless the file is in the directory you run the code from, Python cannot find it without the full path. Another issue is that you seem to have a directory named .idea, which would also raise an error when you try to open it as a file. This also presumes you actually have permission to read the files in the directory.
If your files were larger I would avoid reading all into memory and/or storing the full content.

First of all make sure you add the folder name to the file name, so you can find the file relative to where the script is executed.
To do so you want to use os.path.join, which, as its name suggests, joins paths. So, using a generator:
def find_get(folder):
    for filename in os.listdir(folder):
        relative_file_path = os.path.join(folder, filename)
        with open(relative_file_path) as f:
            # read() gives the entire data from the file
            yield f.read()
# this consumes the generator to a list
files_data = list(find_get('filex'))
See what we got in the list that consumed the generator:
print(files_data)
It may be more convenient to produce tuples which can be used to construct a dict:
def find_get(folder):
    for filename in os.listdir(folder):
        relative_file_path = os.path.join(folder, filename)
        with open(relative_file_path) as f:
            # read() gives the entire data from the file
            yield (relative_file_path, f.read(), )
# this consumes the generator into a dict
files_data = dict(find_get('filex'))
You will now have a mapping from each file's path to its content.
Also, take a look at the answer by @Padraic Cunningham. He brought up the glob module, which is suitable in this case.

The error you're facing is simple: listdir returns filenames, not full pathnames. To turn them into pathnames you can access from your current working directory, you have to join them to the directory path:
for filename in os.listdir(directory):
    pathname = os.path.join(directory, filename)
    with open(pathname) as f:
        # do stuff
So, in your case, there's a file named .idea in the folder directory, but you're trying to open a file named .idea in the current working directory, and there is no such file.
There are at least four other potential problems with your code that you also need to think about and possibly fix after this one:
You don't handle errors. There are many very common reasons you may not be able to open and read a file--it may be a directory, you may not have read access, it may be exclusively locked, it may have been moved since your listdir, etc. And those aren't logic errors in your code or user errors in specifying the wrong directory, they're part of the normal flow of events, so your code should handle them, not just die. Which means you need a try statement.
You don't do anything with the files but print out every line. Basically, this is like running cat folder/* from the shell. Is that what you want? If not, you have to figure out what you want and write the corresponding code.
You open the same file twice in a row, without closing in between. At best this is wasteful, at worst it will mean your code doesn't run on any system where opens are exclusive by default. (Are there such systems? Unless you know the answer to that is "no", you should assume there are.)
You don't close your files. Sure, the garbage collector will get to them eventually--and if you're using CPython and know how it works, you can even prove the maximum number of open file handles that your code can accumulate is fixed and pretty small. But why rely on that? Just use a with statement, or call close.
However, none of those problems are related to your current error. So, while you have to fix them too, don't expect fixing one of them to make the first problem go away.
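To illustrate the first point, the loop can record failures instead of dying on the first unreadable entry. This is a sketch under assumptions, not the only reasonable policy: the function name read_text_files is hypothetical, and collecting errors in a dict is just one way to handle them.

```python
import os


def read_text_files(directory):
    """Return ({pathname: text}, {pathname: error}) for the entries in directory."""
    contents, failures = {}, {}
    for filename in os.listdir(directory):
        pathname = os.path.join(directory, filename)
        try:
            with open(pathname) as f:
                contents[pathname] = f.read()
        except (OSError, UnicodeDecodeError) as exc:
            # Directories, missing permissions, vanished files and undecodable
            # bytes end up here instead of aborting the whole loop.
            failures[pathname] = exc
    return contents, failures
```

An entry such as the .idea directory would then show up in the second dict while the readable .txt files are still returned in the first.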

Full variant:
import os
def find_get(path):
    files = {}
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path,file)):
            with open(os.path.join(path,file), "r") as data:
                files[file] = data.read()
    return files
print(find_get("filex"))
Output:
{'1.txt': 'dsad', '2.txt': 'fsdfs'}
After that you could generate one file from that content, etc.
Key things:
os.listdir returns a list of names without the full path, so you need to join the initial path with each found item to operate on it.
a dict is a natural fit for the result :)
os.listdir returns files and folders, so you need to check whether each item is really a file

You should check whether each entry is actually a file and not a folder, since you can't open folders for reading. Also, you can't just open the bare filename, since the file is under a folder, so you should build the correct path with os.path.join. Check below:
import os
def find_get(folder):
    for file in os.listdir(folder):
        full_path = os.path.join(folder, file)
        if not os.path.isfile(full_path):
            continue # skip directories
        with open(full_path, 'r') as f:
            for line in f:
                print(line)

How to open a list of files in Python

I'm reading a data file (text) and generating a number of reports, each of which is written to a different output file (also text). I'm opening them the long way:
fP = open('file1','w')
invP = open('inventory','w')
orderP = open('orders','w')
... and so on, with a corresponding group of close() lines at the end.
If I could open them with a for loop, using a list of fP names and file names, I could guarantee closing the same files.
I tried using a dictionary of fp:filename, but that [obviously] didn't work, because either the fP variable is undefined, or a string 'fP' isn't a good file object name.
Since these are output files, I probably don't need to check for open errors - if I can't open one or more, I can't go on anyway.
Is there any way to open a group of files (not more than 10 or so) from a list of names, in a loop?

Good news! Python 3.3 brings in a standard safe way to do this:
contextlib.ExitStack
From the docs:
Each instance maintains a stack of registered callbacks that are called in reverse order when the instance is closed.
(...)
Since registered callbacks are invoked in the reverse order of registration, this ends up behaving as if multiple nested with statements had been used with the registered set of callbacks.
Here's an example of how to use it:
from contextlib import ExitStack
with ExitStack() as stack:
    files = [
        stack.enter_context(open(filename))
        for filename in filenames
    ]
    # ... use files ...
When the code leaves the with statement, all files that have already been opened will be closed.
This way you also know that if 2 files get opened and then third file fails to open, the two already-opened files will be closed correctly. Also if an exception is raised anytime inside the with block, you'll see correct cleanup.
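Applied to the report-writing case from the question, the same pattern might look like the sketch below. The function name write_reports and the layout of the reports mapping (filename to lines) are hypothetical choices made for illustration.

```python
from contextlib import ExitStack


def write_reports(reports):
    """Open every output file at once, write its lines, and close them all on exit.

    reports maps a filename to an iterable of text lines (a hypothetical layout).
    """
    with ExitStack() as stack:
        # One open file object per report, all registered with the same stack.
        handles = {
            name: stack.enter_context(open(name, 'w'))
            for name in reports
        }
        for name, lines in reports.items():
            for line in lines:
                handles[name].write(line + '\n')
    # Leaving the with block closes every file that was opened,
    # even if a later open() or write() raised.
```

Keeping the handles in a dict keyed by filename gives each report a readable name without needing one variable per file.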

Yes, you can use a dictionary comprehension:
filenames = ['file1.txt', 'file2.txt', 'file3.txt']  # ...and so on
filedata = {filename: open(filename, 'w') for filename in filenames}
Now all of the open file objects are saved in filedata, keyed by filename.
To close them:
for file in filedata.values():
    file.close()

Since you say there are many data files, instead of entering the filenames manually into a list, you can collect them like this:
from os import listdir
from os.path import isfile, join
files_in_dir = [ f for f in listdir('/home/cam/Desktop') if isfile(join('/home/cam/Desktop',f)) ]
Now you can
for file in files_in_dir:
    # join the directory back on; note that mode 'w' truncates existing files
    with open(join('/home/cam/Desktop', file), 'w') as f:
        f.do_something

Use the with keyword to guarantee that opened files (and other similar resources, known as "context managers") are closed:
with open(file_path, 'w') as output_file:
    output_file.write('whatever')
Upon exiting the with block, the file will be properly closed -- even if an exception occurs.
You could easily loop over a list of paths to the desired files:
files = ['fp1.txt', 'inventory', 'orders']
for file in files:
    with open(file, 'w') as current_file:
        current_file.do_some_stuff()

You can open as many files as you want and keep them in a list to close them later:
fds = [open(path, 'w') for path in paths]
# ... do stuff with files
# close files
for fd in fds:
    fd.close()
Or you could use a dictionary for better readability:
# map each file name to an open file object
files = {path: open(path, 'w') for path in paths}
# use the file objects through the mapping
files['inventory'].write("Something")
# close files
for path in files:
    files[path].close()

Both answers above are good if you know or define the list of files ahead of time. If you want a more generic solution, you can build that list just in time: use your OS to create empty files on disk (how you do this depends on your OS), then build the list of files interactively this way:
import os
working_folder = input("Enter full path for the working folder/directory: ")
os.chdir(working_folder)
filenames_list = os.listdir()
#you can filter too, if you need so:
#filenames_list = [filename for filename in os.listdir() if '.txt' in filename]
#then you can do as Reut Sharabani and A.J. suggest above and make a list of file descriptors
file_descriptors = [open(filename, 'w') for filename in filenames_list]
#or a dictionary as Reut Sharabani suggests (I liked that one Reut :)
#do whatever you need to do with all those files already opened for writing
#then close the files
for fd in file_descriptors:
    fd.close()
It is OK to use "with", as some suggest, if you work with only one file at a time (from start to finish), but if you want to work with all the files at the same time, a list or dictionary of file descriptors is better.
