How to unzip a file in Python on all OSes?

Is there a simple Python function that would allow unzipping a .zip file like so?
unzip(ZipSource, DestinationDirectory)
I need the solution to behave the same on Windows, Mac and Linux: always produce a file if the zip contains a single file, a directory if the zip contains a directory, and a directory if the zip contains multiple files; always inside, not at, the given destination directory.
How do I unzip a file in Python?

Use the zipfile module in the standard library:
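import os.path
import shutil
import zipfile

def unzip(source_filename, dest_dir):
    with zipfile.ZipFile(source_filename) as zf:
        for member in zf.infolist():
            # Path traversal defense copied from
            # http://hg.python.org/cpython/file/tip/Lib/http/server.py#l789
            words = member.filename.split('/')
            path = dest_dir
            for word in words[:-1]:
                while True:
                    drive, word = os.path.splitdrive(word)
                    head, word = os.path.split(word)
                    if not drive:
                        break
                if word in (os.curdir, os.pardir, ''):
                    continue
                path = os.path.join(path, word)
            # `path` is now a sanitized directory inside dest_dir. Write the
            # member there directly: zf.extract(member, path) would append the
            # member's full archive name again and duplicate the directories.
            if not os.path.isdir(path):
                os.makedirs(path)
            name = os.path.basename(words[-1])
            if name in (os.curdir, os.pardir, ''):
                continue  # directory entry (or unsafe name): nothing to write
            with zf.open(member) as src, open(os.path.join(path, name), 'wb') as dst:
                shutil.copyfileobj(src, dst)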
Note that using extractall would be a lot shorter, but that method does not protect against path traversal vulnerabilities before Python 2.7.4. If you can guarantee that your code runs on a recent version of Python, you can simply call zf.extractall(dest_dir) instead.
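A minimal sketch of the extractall variant, assuming Python 2.7.4 or later (including Python 3). There, extractall() drops absolute paths, drive letters and '..' components from member names and creates dest_dir plus any intermediate directories as needed. The zipfile documentation describes this sanitizing as an attempt, so archives from untrusted sources still deserve caution. The function name unzip simply mirrors the question:

```python
import zipfile


def unzip(source_filename, dest_dir):
    # Assumes Python 2.7.4+ / 3.x: extractall() creates dest_dir and any
    # intermediate directories, and strips absolute paths, drive letters
    # and '..' components from member names, so every member lands
    # inside dest_dir.
    with zipfile.ZipFile(source_filename) as zf:
        zf.extractall(dest_dir)
```

For example, a member stored as ../evil.txt ends up as dest_dir/evil.txt rather than outside dest_dir.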

In Python 3.x, use the -e argument, not -h, for example:
python -m zipfile -e compressedfile.zip c:\output_folder
The arguments are as follows:
zipfile.py -l zipfile.zip # Show listing of a zipfile
zipfile.py -t zipfile.zip # Test if a zipfile is valid
zipfile.py -e zipfile.zip target # Extract zipfile into target dir
zipfile.py -c zipfile.zip src ... # Create zipfile from sources
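If you need the same extraction from inside a script, one option is to run that command line through subprocess. A sketch, assuming Python 3.5+ (for subprocess.run); the wrapper name unzip_with_cli is hypothetical. Using sys.executable runs the current interpreter, so the call does not depend on a python or python3 command being on PATH:

```python
import subprocess
import sys


def unzip_with_cli(zip_path, target_dir):
    # Equivalent to: python -m zipfile -e zip_path target_dir
    # sys.executable is the interpreter running this script, so the call
    # behaves the same on Windows, macOS and Linux.
    subprocess.run(
        [sys.executable, '-m', 'zipfile', '-e', zip_path, target_dir],
        check=True,  # raise CalledProcessError if extraction fails
    )
```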

Related

Using regular expression in subprocess module

I am trying to automate a particular process using the subprocess module in Python. For example, I have a set of files whose names start with the word plot followed by 8 digits, and I want to copy them using subprocess.run.
copyfiles = subprocess.run(['cp', '-r', 'plot*', 'dest'])
When I run it, the code above returns the error "cp: plot*: No such file or directory".
How can I execute such commands using the subprocess module? If I give a full filename, the code above works without any errors.
I found a useful, though probably not the most efficient, code fragment in this post, which uses an additional Python library (shlex). What I propose is: use os.listdir to list the folder containing the files you want to copy, save the names in a list file_list, filter it with a lambda function to keep only the matching file names, define the command as a string, and use subprocess.Popen() to run a child process that copies each file to a destination folder.
import os
import shlex
import subprocess
# change to the directory where your files are located
os.chdir('C:/example_folder/')
# you can use the os.getcwd function to check the current working directory
print(os.getcwd())
# extract file names
file_list = os.listdir()
file_list = list(filter(lambda file_name: file_name.startswith('plot'), file_list))
# command template
cmd = 'find test_folder -iname %s -exec cp {} dest_folder ;'
for file in file_list:
    subprocess.Popen(shlex.split(cmd % file))
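The original error happens because, when subprocess.run gets a list of arguments and no shell, nothing expands the plot* wildcard, so cp receives the literal string plot*. One way around this is to expand the pattern in Python with glob and copy with shutil, which also removes the dependency on cp. A minimal sketch, assuming Python 3.4+; the function name copy_plot_files and the exact pattern for "plot followed by 8 digits" are assumptions:

```python
import glob
import os
import shutil


def copy_plot_files(src_dir, dest_dir):
    # glob performs the wildcard expansion a shell would normally do:
    # "plot", exactly eight digits, then anything (e.g. an extension).
    pattern = os.path.join(glob.escape(src_dir), 'plot' + '[0-9]' * 8 + '*')
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):  # skip directories (cp -r would copy them)
            copied.append(shutil.copy2(path, dest_dir))
    return copied
```

If you prefer to keep calling cp, pass it the already-expanded names instead, e.g. subprocess.run(['cp', '-r', *glob.glob('plot*'), 'dest'], check=True).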

Make a directory and copy a file on the go: Python equivalent of cp -n in bash

I have a JSON file which I am parsing, and it gives me some paths like abc1/xyz2/file1.txt.
I have to copy this file to another location (e.g. /scratch/userid/pp).
I know there is a bash way to do this:
cp -n file.txt --parents /scratch/userid/pp
and I can use it with os.system() in Python; it creates the directory structure and copies the file in one go.
This is the summarized script:
#!/usr/bin/python
import os

def parse_json():
    # parse json file
    pass

def some():
    # get a list and create one dir per list element
    for i in range(len(list)):
        dir = TASK + str(i)
        os.makedirs(dir)
    path = 'abc1/xyz2/file1.txt'
    os.system('cp -n ' + path + ' --parents /scratch/userid/pp')
This has to be done for several files, several times.
I know this works, but I am looking for a more Pythonic way (maybe a one-liner) to do this.
I tried:
os.chdir('/scratch/userid/pp')
# split path to get folder and file
os.makedirs(path)
os.chdir(path)
shutil.copy(src, dest)
But that involves a lot of makedirs and chdir calls for every file, compared to the one-liner in bash.
You can use shutil from Python directly, without the os module.
Example:
from shutil import copyfile
source_file = "/home/user/file.txt"
destination_file = "/home/user/folder/file.txt"
# note: copyfile does not create /home/user/folder; it must already exist
copyfile(source_file, destination_file)
You can also use subprocess (Python 3) or commands (Python 2) to execute shell copy commands from Python.
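shutil.copyfile alone does not create missing directories and silently replaces an existing destination, so it does not reproduce cp -n --parents. A rough sketch that covers both parts with the standard library, assuming Python 3.2+ (for exist_ok); the function name copy_with_parents and its parameters are hypothetical. The exists check followed by the copy is not atomic, so treat it as a no-clobber approximation:

```python
import os
import shutil


def copy_with_parents(rel_path, dest_root, src_root='.'):
    """Rough equivalent of `cp -n --parents rel_path dest_root` run in src_root."""
    src = os.path.join(src_root, rel_path)
    dst = os.path.join(dest_root, rel_path)
    # Like --parents: recreate the relative directory structure under dest_root.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    # Like -n (no-clobber): never overwrite an existing destination file.
    if os.path.exists(dst):
        return False
    shutil.copy2(src, dst)
    return True
```

Called as copy_with_parents('abc1/xyz2/file1.txt', '/scratch/userid/pp'), it would create /scratch/userid/pp/abc1/xyz2/ if needed and copy the file there once.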

Python tar.add files but omit parent directories

I am trying to create a tar file from a list of files stored in a text file. I have working code to create the tar, but I want the archive to start from a certain directory (app and all its subdirectories) and to drop the parent directories, because the software only opens the file from a certain directory.
The package.list file looks like this:
app\myFile
app\myDir\myFile
app\myDir\myFile2
If I omit the path in restore.add, it cannot find the files, because my program runs from elsewhere. How do I tell tar to start at a particular directory, or to add the files while keeping the directory structure from the text file, e.g. starting with app rather than all the parent directories?
My objective is to do tar cf restore.tar -T package.list, but with Python on Windows.
I have tried basename from here: How to compress a tar file in a tar.gz without directory?, but this strips out ALL the directories.
I have also tried using arcname='app' in the .add method, but this gives weird results: it breaks the directory structure and renames loads of files to app.
import tarfile
path = foo + '\\' + bar
file = open(path + '\\package.list', 'r')
restore = tarfile.open(path + '\\restore.tar', 'w')
for line in file:
    restore.add(path + '\\' + line.strip())
restore.close()
file.close()
Using Python 2.7
You can use the second argument of TarFile.add (arcname); it specifies the name inside the archive.
So, assuming every path is sane, something like this would work:
import tarfile
prefix = "some_dir/"
archive_path = "inside_dir/file.txt"
with tarfile.open("test.tar", "w") as tar:
    tar.add(prefix+archive_path, archive_path)
Usage:
> cat some_dir/inside_dir/file.txt
test
> python2 test_tar.py
> tar --list -f ./test.tar
inside_dir/file.txt
In production, I'd advise using an appropriate module for path handling (e.g. os.path) to make sure every slash and backslash is in the right place.
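Applied to the question, the same idea means passing each line of package.list as arcname while reading the file from the base folder. A sketch under these assumptions: base_dir plays the role of path in the question's code, each non-blank line is a path relative to it, and the function name tar_from_list is hypothetical. tarfile converts the platform separator in arcname to /, so on Windows the members come out as app/myFile, app/myDir/myFile and so on:

```python
import os
import tarfile


def tar_from_list(base_dir, list_file, tar_path):
    """Rough equivalent of `tar cf tar_path -T list_file` run inside base_dir."""
    with open(list_file) as entries, tarfile.open(tar_path, 'w') as tar:
        for line in entries:
            rel = line.strip()
            if not rel:
                continue  # skip blank lines
            # Read the file from base_dir, but store it under its relative
            # name (e.g. app\myDir\myFile), so the archive starts at "app".
            tar.add(os.path.join(base_dir, rel), arcname=rel)
```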

Python os.walk/fnmatch.filter does not find file when in file's current directory

I'm trying to recursively walk through a directory and find files that match a certain pattern. The relevant snippet of my code is:
import sys, os, xlrd, fnmatch
for root, dirnames, filenames in os.walk('/myfilepath/'):
    for dir in dirnames:
        os.chdir(os.path.join(root, dir))
        for filename in fnmatch.filter(filenames, 'filepattern*'):
            print os.path.abspath(filename)
            print os.getcwd()
            print filename
            wb = xlrd.open_workbook(filename)
My print lines show that os.getcwd() is equal to the directory of filename, so it seems to me the file should be found, but IOError: [Errno 2] No such file or directory is raised by wb = xlrd.open_workbook(filename) when the first pattern is matched.
The dirnames returned by os.walk don't represent the directories in which filenames exist. Rather, root is the directory in which filenames exist; dirnames are its subdirectories, so changing into one of them moves you away from the files. For your application you can effectively ignore the directories returned.
Try something like this:
import os
import fnmatch
for root, _, filenames in os.walk('/tmp'):
    print root, filenames
    for filename in fnmatch.filter(filenames, '*.py'):
        filename = os.path.join(root, filename)
        # `filename` now unambiguously refers to a file that
        # exists. Open it, delete it, xlrd.open it, whatever.
        # For example:
        if os.access(filename, os.R_OK):
            print "%s can be accessed" % filename
        else:
            print "%s cannot be accessed" % filename
Aside: It is probably not safe to call os.chdir() inside an os.walk() iteration. This is especially true if the parameter to os.walk() is relative.
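A Python 3 version of the same fix, written as a small generator, might look like this. The name find_matching is hypothetical; the key point is joining root and the file name instead of calling os.chdir():

```python
import fnmatch
import os


def find_matching(top, pattern):
    """Yield the full path of every file under top whose name matches pattern."""
    for root, _dirs, filenames in os.walk(top):
        for name in fnmatch.filter(filenames, pattern):
            # filenames are bare names inside root, so join them with root
            # instead of changing the working directory.
            yield os.path.join(root, name)
```

For the question's case that would be: for path in find_matching('/myfilepath/', 'filepattern*'): wb = xlrd.open_workbook(path).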
Like Linux find, in Python?
A more powerful technique to find files exists: use glob, which works somewhat like Linux find, and combine it with the very powerful pathlib to handle the different path formats.
It works with Python 3.5 or higher. I'm using version 3.7 and it works on an old Mac, on Linux and even on Windows 10.
Clear example (no partial scraps of code permitted).
import glob
import subprocess
from pathlib import PurePath
# Return a list of matching files
def find_files(start_dir, pattern, recurse=True):
    if recurse:
        # with recursive=True, '**' matches start_dir and every subdirectory
        patt = start_dir.strip() + '/**/' + pattern
    else:
        patt = start_dir.strip() + '/' + pattern
    files = []
    for f in glob.iglob(patt, recursive=recurse):
        files.append(PurePath(f))
    return files
When recurse is True, it finds the files matching the pattern in start_dir and all of its subdirectories.
When recurse is False, only start_dir itself is searched.
glob.iglob() is an iterator version of glob.glob(); both produce the same results, except that the iterator version yields them one at a time instead of building the whole list up front.
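If you already use pathlib, Path.rglob() and Path.glob() (available since Python 3.4) can do the same search without building a glob string by hand. A sketch; find_files_pathlib is a hypothetical name, and unlike the function above it returns concrete Path objects, sorted:

```python
from pathlib import Path


def find_files_pathlib(start_dir, pattern, recurse=True):
    # rglob(pattern) searches start_dir and every directory below it;
    # glob(pattern) only looks directly inside start_dir.
    root = Path(start_dir.strip())  # strip() drops e.g. a trailing newline
    matches = root.rglob(pattern) if recurse else root.glob(pattern)
    return sorted(matches)
```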
USAGE EXAMPLE from inside a git project, searching for source files using Python:
cmd = ['git', 'rev-parse', '--show-toplevel']
result = subprocess.run(cmd, stdout=subprocess.PIPE)
gitroot = result.stdout.decode('utf-8')
print(gitroot)
ret = find_files(gitroot, "os.c")
print(ret)
That found the root directory of a git project, then searched from there for os.c (or any pattern you like).
Yes, it executed the git command:
git rev-parse --show-toplevel
That is powerful: it means you can search for files and act on them.
Hope that helps. Look at the output and notice that the path formats are mixed.
That is because Windows accepts both \ and / as path separators, while Linux (POSIX) uses only /.
OUTPUT
C:\Users\me\Python37\python.exe C:/Users/me/7pych/git_find_file.py *.c
C:/Users/me/git_projects/demo
[PureWindowsPath('C:/Users/me/git_projects/demo/device/src/os/os.c')]
Process finished with exit code 0
To obtain just the directory name or just the file name, use split(). See this example:
cross-platform splitting of path in python
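With pathlib, the pure path classes expose these parts directly, and PureWindowsPath and PurePosixPath let you handle either path style on any OS. A short illustration; the POSIX path below is a made-up example:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only manipulate strings, so both flavours work on any OS.
win = PureWindowsPath('C:/Users/me/git_projects/demo/device/src/os/os.c')
posix = PurePosixPath('/home/me/git_projects/demo/device/src/os/os.c')  # made-up path

for p in (win, posix):
    # .parent is the directory part, .name the file name,
    # .stem the file name without its suffix.
    print(p.parent, p.name, p.stem)
```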

Unzipping files in Python

I read through the zipfile documentation, but couldn't understand how to unzip a file, only how to zip a file. How do I unzip all the contents of a zip file into the same directory?
import zipfile
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)
That's pretty much it!
If you are using Python 3.2 or later:
import zipfile
with zipfile.ZipFile("file.zip","r") as zip_ref:
    zip_ref.extractall("targetdir")
You don't need to call close or use try/except with this, because it uses the context manager construct.
zipfile is a somewhat low-level library. Unless you need the specifics that it provides, you can get away with shutil's higher-level functions make_archive and unpack_archive.
make_archive is already described in this answer. As for unpack_archive:
import shutil
shutil.unpack_archive(filename, extract_dir)
unpack_archive detects the archive format automatically from the "extension" of filename (.zip, .tar.gz, etc.). make_archive, by contrast, takes the format as an explicit argument and adds the matching extension itself. Also, filename and extract_dir can be any path-like objects (e.g. pathlib.Path instances) since Python 3.7.
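A round trip with both functions might look like this; the folder and file names are made up for the example. make_archive takes the base name without extension plus an explicit format, and returns the path of the archive it wrote:

```python
import os
import shutil
import tempfile

# Build a small directory to archive (for illustration only).
work = tempfile.mkdtemp()
src = os.path.join(work, 'data')
os.makedirs(src)
with open(os.path.join(src, 'hello.txt'), 'w') as f:
    f.write('hello')

# make_archive appends ".zip" itself and archives the contents of root_dir.
archive = shutil.make_archive(os.path.join(work, 'bundle'), 'zip', root_dir=src)

# unpack_archive picks the unpacker from the file extension (.zip here)
# and creates the target directory if it does not exist.
out = os.path.join(work, 'unpacked')
shutil.unpack_archive(archive, out)
print(os.listdir(out))
```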
Use the extractall method if you're using Python 2.6+:
import zipfile
zf = zipfile.ZipFile('file.zip')
zf.extractall()
zf.close()
You can also import only ZipFile:
from zipfile import ZipFile
zf = ZipFile('path_to_file/file.zip', 'r')
zf.extractall('path_to_extract_folder')
zf.close()
Works in Python 2 and Python 3.
Try this:
import os
import zipfile
def un_zipFiles(path):
    files = os.listdir(path)
    for file in files:
        if file.endswith('.zip'):
            filePath = os.path.join(path, file)
            zip_file = zipfile.ZipFile(filePath)
            for names in zip_file.namelist():
                zip_file.extract(names, path)
            zip_file.close()
path: the directory that contains the zip files (they are extracted into the same directory)
If you want to do it in the shell instead of writing code:
python3 -m zipfile -e myfiles.zip myfiles/
myfiles.zip is the zip archive and myfiles/ is the directory to extract the files into.
from zipfile import ZipFile
ZipFile("YOURZIP.zip").extractall("YOUR_DESTINATION_DIRECTORY")
The directory you extract your files into doesn't need to exist beforehand; you name it at this point.
YOURZIP.zip is the name of the zip file if it is in the current working directory (typically your project's directory).
If not, use the full path, e.g. C:\\....\\YOURZIP.zip
In a regular Python string, escape each backslash with another backslash (or use a raw string literal, or forward slashes).
If you get a permission denied error, try launching your IDE (e.g. Anaconda) as administrator.
YOUR_DESTINATION_DIRECTORY will be created relative to the current working directory (typically your project's directory).
import os
import zipfile
zip_file_path = r"C:\AA\BB"
file_list = os.listdir(zip_file_path)
abs_path = []
for a in file_list:
    x = os.path.join(zip_file_path, a)
    print(x)
    abs_path.append(x)
for f in abs_path:
    zf = zipfile.ZipFile(f)
    zf.extractall(zip_file_path)
This does not check whether each file is actually a zip; if the folder contains a non-zip file, it will fail.
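One way to add that validation is zipfile.is_zipfile(), which checks the file's contents rather than its extension. A sketch in Python 3; the function name extract_all_zips is hypothetical, and like the code above it extracts into the same folder:

```python
import os
import zipfile


def extract_all_zips(folder):
    """Extract every zip archive in folder into folder, skipping other files."""
    extracted = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # is_zipfile inspects the file's contents (not just its extension),
        # so non-zip files and directories are skipped instead of crashing.
        if os.path.isfile(path) and zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as zf:
                zf.extractall(folder)
            extracted.append(name)
    return extracted
```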
