Unzipping files in Python

Unzipping files in Python - python

I read through the zipfile documentation, but couldn't understand how to unzip a file, only how to zip a file. How do I unzip all the contents of a zip file into the same directory?

import zipfile
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
zip_ref.extractall(directory_to_extract_to)
That's pretty much it!

If you are using Python 3.2 or later:
import zipfile
with zipfile.ZipFile("file.zip","r") as zip_ref:
zip_ref.extractall("targetdir")
You dont need to use the close or try/catch with this as it uses the
context manager construction.

zipfile is a somewhat low-level library. Unless you need the specifics that it provides, you can get away with shutil's higher-level functions make_archive and unpack_archive.
make_archive is already described in this answer. As for unpack_archive:
import shutil
shutil.unpack_archive(filename, extract_dir)
unpack_archive detects the compression format automatically from the "extension" of filename (.zip, .tar.gz, etc), and so does make_archive. Also, filename and extract_dir can be any path-like objects (e.g. pathlib.Path instances) since Python 3.7.

Use the extractall method, if you're using Python 2.6+
zip = ZipFile('file.zip')
zip.extractall()

You can also import only ZipFile:
from zipfile import ZipFile
zf = ZipFile('path_to_file/file.zip', 'r')
zf.extractall('path_to_extract_folder')
zf.close()
Works in Python 2 and Python 3.

try this :
import zipfile
def un_zipFiles(path):
files=os.listdir(path)
for file in files:
if file.endswith('.zip'):
filePath=path+'/'+file
zip_file = zipfile.ZipFile(filePath)
for names in zip_file.namelist():
zip_file.extract(names,path)
zip_file.close()
path : unzip file's path

If you want to do it in shell, instead of writing code.
python3 -m zipfile -e myfiles.zip myfiles/
myfiles.zip is the zip archive and myfiles is the path to extract the files.

from zipfile import ZipFile
ZipFile("YOURZIP.zip").extractall("YOUR_DESTINATION_DIRECTORY")
The directory where you will extract your files doesn't need to exist before, you name it at this moment
YOURZIP.zip is the name of the zip if your project is in the same directory.
If not, use the PATH i.e : C://....//YOURZIP.zip
Think to escape the / by an other / in the PATH
If you have a permission denied try to launch your ide (i.e: Anaconda) as administrator
YOUR_DESTINATION_DIRECTORY will be created in the same directory than your project

import os
zip_file_path = "C:\AA\BB"
file_list = os.listdir(path)
abs_path = []
for a in file_list:
x = zip_file_path+'\\'+a
print x
abs_path.append(x)
for f in abs_path:
zip=zipfile.ZipFile(f)
zip.extractall(zip_file_path)
This does not contain validation for the file if its not zip. If the folder contains non .zip file it will fail.

Related

How to use relative paths

I recently made a small program (.py) that takes data from another file (.txt) in the same folder.
Python file path: "C:\Users\User\Desktop\Folder\pythonfile.py"
Text file path: "C:\Users\User\Desktop\Folder\textfile.txt"
So I wrote: with open(r'C:\Users\User\Desktop\Folder\textfile.txt', encoding='utf8') as file
And it works, but now I want to replace this path with a relative path (because every time I move the folder I must change the path in the program) and I don't know how... or if it is possible...
I hope you can suggest something... (I would like it to be simple and I also forgot to say that I have windows 11)

Using os, you can do something like
import os
directory = os.path.dirname(__file__)
myFile = with open(os.path.join(directory, 'textfile.txt'), encoding='utf8') as file

If you pass a relative folder to the open function it will search for it in the loacl directory:
with open('textfile.txt', encoding='utf8') as f:
pass
This unfortunately will only work if you launch your script form its folder. If you want to be more generic and you want it to work regardless of which folder you run from you can do it as well. You can get the path for the python file being launched via the __file__ build-in variable. The pathlib module then provides some helpfull functions to get the parent directory. Putting it all together you could do:
from pathlib import Path
with open(Path(__file__).parent / 'textfile.txt', encoding='utf8') as f:
pass

import os
directory = os.path.dirname(os.path.abspath(__file__))
file_path = os.path.join(directory, 'textfile.txt')
with open(file_path, encoding='utf8') as file:
# and then the code here

Remove auto-generated __MACOSX folder from inside a zip file in Python

I have zip files uploaded by clients through a web server that sometimes contain pesky __MACOSX directories inside that gum things up. How can I remove these?
I thought of using ZipFile, but this answer says that isn't possible and gives this suggestion:
Read out the rest of the archive and write it to a new zip file.
How can I do this with ZipFile? Another Python based alternative like shutil or something similar would also be fine.

The examples below are designed to determine if a '__MACOSX' file is contained within a zip file. If this pesky exist then a new zip archive is created and all the files that are not __MACOSX files are written to this new archive. This code can be extended to include .ds_store files. Please let me if you need to delete the old zip file and replace it with the new clean zip file.
Hopefully, these answers help you solve your issue.
Example One
from zipfile import ZipFile
original_zip = ZipFile ('original.zip', 'r')
new_zip = ZipFile ('new_archve.zip', 'w')
for item in original_zip.infolist():
buffer = original_zip.read(item.filename)
if not str(item.filename).startswith('__MACOSX/'):
new_zip.writestr(item, buffer)
new_zip.close()
original_zip.close()
Example Two
def check_archive_for_bad_filename(file):
zip_file = ZipFile(file, 'r')
for filename in zip_file.namelist():
print(filename)
if filename.startswith('__MACOSX/'):
return True
def remove_bad_filename_from_archive(original_file, temporary_file):
zip_file = ZipFile(original_file, 'r')
for item in zip_file.namelist():
buffer = zip_file.read(item)
if not item.startswith('__MACOSX/'):
if not os.path.exists(temporary_file):
new_zip = ZipFile(temporary_file, 'w')
new_zip.writestr(item, buffer)
new_zip.close()
else:
append_zip = ZipFile(temporary_file, 'a')
append_zip.writestr(item, buffer)
append_zip.close()
zip_file.close()
archive_filename = 'old.zip'
temp_filename = 'new.zip'
results = check_archive_for_bad_filename(archive_filename)
if results:
print('Removing MACOSX file from archive.')
remove_bad_filename_from_archive(archive_filename, temp_filename)
else:
print('No MACOSX file in archive.')

The idea would be to use ZipFile to extract the contents into some defined folder then remove the __MACOSX entry (os.rmdir, os.remove) and then compress it again.
Depending if you have zip command on your OS you might be able to skip the re-compressing part. You could as well control this command from python by using os.system or subprocess module.

Get the list of all the files with ".txt" after changing the current working directory

I changed the current working directory and then I want to list all the files in that directory with extension .txt. How can I achieve this?
# !/usr/bin/python
import os
path = "/home/pawn/Desktop/projects_files"
os.chdir(path)
## checking working directory
print ("working directory "+ os.getcwd()+'\n')
Now how to list all the files in the current directory (projects_files) with extension .txt?

Use glob, which is a wildcard search for files, and you don't need to change the directory either:
import glob
for f in glob.iglob('/home/pawn/Desktop/project_files/*.txt'):
print(f)

You can use glob. With glob you can use wildcards as you would do in your system command line. For example:
import glob
print( glob.glob('*.txt') )

You would need to use glob in python.
It will help you to fetch the files of a particular format..
Usage
>>> import glob
>>> glob.glob('*.gif')
Check the documentation for more details

if you don't want to import glob as suggested in the other good answers here, you can also do it with the same os moudle:
path = "/home/pawn/Desktop/projects_files"
[f for f in os.listdir(path) if f.endswith('.txt')]

Open and read sequential XML files with unknown files names in Python

I wish to read incoming XML files that have no specific name (e.g. date/time naming) to extract values and perform particular tasks.
I can step through the files but am having trouble opening them and reading them.
What I have that works is:-
import os
path = 'directory/'
listing = os.listdir(path)
for infile in listing:
print infile
But when I add the following to try and read the files it errors saying No such file or directory.
file = open(infile,'r')
Thank you.

You need to provide the path to file too:
file = open(os.path.join(path,infile),'r')

os.listdir provides the base names, not absolute paths. You'll need to do os.path.join(path, infile) instead of just infile (that may still be a relative path, which should be fine; if you needed an absolute path, you'd feed it through os.path.abspath).

As an alternative to joining the directory path and filename as in the other answers, you can use the glob module. This can also be handy when your directories might contain other (non-XML) files that you don't want to process:
import glob
for infile in glob.glob('directory/*.xml'):
print infile

You have to join the directory path and filename, using
os.path.join(path, infile)
Also use the path without the / :
path = 'directory'
Something like this (Not optimized, just a small change in your code):
import os
path = 'directory'
listing = os.listdir(path)
for infile in listing:
print infile
file_abs = os.path.join(path, infile)
file = open(file_abs,'r')

create zip of complete directory using zipfile python module

zip = zipfile.ZipFile(destination+ff_name,"w")
zip.write(source)
zip.close()
Above is the code that I am using, and here "source" is the path of the directory. But when I run this code it just zips the source folder and not the files and and folders contained in it. I want it to compress the source folder recursively. Using tarfile module I can do this without passing any additional information.

The standard os.path.walk() function will likely be of great use for this.
Alternatively, reading the tarfile module to see how it does its work will certainly be of benefit. Indeed, looking at how pieces of the standard library were written was an invaluable part of my learning Python.

I haven't tested this exactly, but it's something similar to what I use.
zip = zipfile.ZipFile(destination+ff_name, 'w', zipfile.ZIP_DEFLATED)
rootlen = len(source) + 1
for base, dirs, files in os.walk(source):
for file in files:
fn = os.path.join(base, file)
zip.write(fn, fn[rootlen:])
This example is from here:
http://bitbucket.org/jgrigonis/mathfacts/src/ff57afdf07a1/setupmac.py

I'd like to add a "new" python 2.7 feature to this topic: ZipFile can be used as a context manager and therefore you can do things like this:
with zipfile.ZipFile(my_file, 'w') as myzip:
rootlen = len(xxx) #use the sub-part of path which you want to keep in your zip file
for base, dirs, files in os.walk(pfad):
for ifile in files:
fn = os.path.join(base, ifile)
myzip.write(fn, fn[rootlen:])

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Unzipping files in Python - python

I read through the zipfile documentation, but couldn't understand how to unzip a file, only how to zip a file. How do I unzip all the contents of a zip file into the same directory?

import zipfile with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref: zip_ref.extractall(directory_to_extract_to) That's pretty much it!

If you are using Python 3.2 or later: import zipfile with zipfile.ZipFile("file.zip","r") as zip_ref: zip_ref.extractall("targetdir") You dont need to use the close or try/catch with this as it uses the context manager construction.

Use the extractall method, if you're using Python 2.6+ zip = ZipFile('file.zip') zip.extractall()

You can also import only ZipFile: from zipfile import ZipFile zf = ZipFile('path_to_file/file.zip', 'r') zf.extractall('path_to_extract_folder') zf.close() Works in Python 2 and Python 3.

try this : import zipfile def un_zipFiles(path): files=os.listdir(path) for file in files: if file.endswith('.zip'): filePath=path+'/'+file zip_file = zipfile.ZipFile(filePath) for names in zip_file.namelist(): zip_file.extract(names,path) zip_file.close() path : unzip file's path

If you want to do it in shell, instead of writing code. python3 -m zipfile -e myfiles.zip myfiles/ myfiles.zip is the zip archive and myfiles is the path to extract the files.

Related

How to use relative paths

Remove auto-generated __MACOSX folder from inside a zip file in Python

Get the list of all the files with ".txt" after changing the current working directory

Open and read sequential XML files with unknown files names in Python

create zip of complete directory using zipfile python module

Categories

Resources