Extracting inner file from zip file with python - python

I am able to extract the inner file, but it extracts the entire chain.
Suppose the following file structure
a.zip
    folder1
        folder2
            inner.txt
and suppose I want to extract inner.txt to some folder target.
Currently what happens when I try to do this is that I end up extracting folder1/folder2/inner.txt to target. Is it possible to extract the single file instead of the entire chain of directories? So that when target is opened, the only thing inside is inner.txt.
EDIT:
Using python zip module to unzip files and extract only the inner files to the desired location.

You should use the -j option (junk paths: do not make directories); even the old v5.52 has it. The full list of options is in [DIE.Linux]: unzip(1) - Linux man page, or you could simply run (${PATH_TO}/)unzip in the terminal, and it will output the argument list.
Considering that you want to extract the file in a folder called target, use the command (you may need to specify the path to unzip):
"unzip" -j "a.zip" -d "target" "folder1/folder2/inner.txt"
Output (Win, but for Nix it's the same thing):
(py35x64_test) c:\Work\Dev\StackOverflow\q047439536>"unzip" -j "a.zip" -d "target" "folder1/folder2/inner.txt"
Archive: a.zip
inflating: target/inner.txt
Output (without -j):
(py35x64_test) c:\Work\Dev\StackOverflow\q047439536>"unzip" "a.zip" -d "target" "folder1/folder2/inner.txt"
Archive: a.zip
inflating: target/folder1/folder2/inner.txt
Or, since you mentioned Python,
code00.py:
import os
from zipfile import ZipFile

def extract_without_folder(arc_name, full_item_name, folder):
    with ZipFile(arc_name) as zf:
        file_data = zf.read(full_item_name)
    with open(os.path.join(folder, os.path.basename(full_item_name)), "wb") as fout:
        fout.write(file_data)

if __name__ == "__main__":
    extract_without_folder("a.zip", "folder1/folder2/inner.txt", "target")

The zip doesn't have a folder structure in the same way as on the filesystem - each file has a name that is its entire path.
You'll want to use a method that lets you read the file contents (such as ZipFile.open or ZipFile.read), take the part of the member name you actually want to use (usually the base name), and save the contents to that file yourself.
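As a rough illustration of that approach, here is a minimal Python 3 sketch that streams one member to a flat file name. The function name extract_flat and its parameters are made up for this example; only ZipFile.open and shutil.copyfileobj come from the standard library.

```python
import os
import shutil
import zipfile


def extract_flat(archive_path, member_name, target_dir):
    """Copy one archive member into target_dir, dropping its directory prefix.

    extract_flat is a hypothetical helper name, not part of zipfile.
    """
    os.makedirs(target_dir, exist_ok=True)
    # Member names inside a zip archive use "/" as the separator.
    flat_name = member_name.rsplit("/", 1)[-1]
    if not flat_name:
        raise ValueError("member is a directory entry: %r" % member_name)
    out_path = os.path.join(target_dir, flat_name)
    with zipfile.ZipFile(archive_path) as zf:
        # ZipFile.open returns a file-like object, so the data is streamed
        # rather than loaded into memory in one piece.
        with zf.open(member_name) as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_path
```

Calling extract_flat("a.zip", "folder1/folder2/inner.txt", "target") should leave only inner.txt inside target. Two members with the same base name would overwrite each other, so this is only suitable when the base names are unique.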

Related

make directory and copy file on the go , python equivalent of cp -n in bash

I have a JSON file which I am parsing, and it gives me some paths like abc1/xyz2/file1.txt
I have to copy this file to another location (e.g. /scratch/userid/pp)
I know there is a bash command for this
cp -n file.txt --parents /scratch/userid/pp
and I can use it with os.system() in Python; it creates the directory structure and copies the file in one go.
This is a summarized script
#!/usr/bin/python
import os

def parse_json():
    pass  # parse json file

def some():
    # get a list (here called items) and create one directory per entry
    for i in range(len(items)):
        dir = TASK + str(i)
        os.mkdir(dir)
    path = 'abc1/xyz2/file1.txt'
    os.system('cp -n %s --parents /scratch/userid/pp' % path)
This has to be done for several files and several times.
I know this works, but I am looking for a more Pythonic way (a one-liner, maybe) to do this.
I tried
os.chdir('/scratch/userid/pp')
# split path to get folder and file
os.makedirs(path)
os.chdir(path)
shutil.copy(src, dest)
But there is a lot of makedirs and chdir involved for every file, as compared to one liner in bash
You can use shutil from Python directly, without the os package. Note that copyfile does not create missing destination directories.
Example:
from shutil import copyfile
source_file = "/home/user/file.txt"
destination_file = "/home/user/folder/file.txt"
copyfile(source_file, destination_file)
You can also use the subprocess module (or the Python 2-only commands module) to execute shell copy commands from Python.
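For the "create the directories and copy in one go" part of the question, a rough Python 3 sketch of what cp -n --parents does could look like the following. The helper name copy_with_parents and its parameters are invented for this example; it relies only on os.makedirs and shutil.copy.

```python
import os
import shutil


def copy_with_parents(rel_path, dest_root, src_root="."):
    """Copy src_root/rel_path to dest_root/rel_path, creating directories.

    Hypothetical helper: like `cp -n --parents`, an existing destination
    file is left untouched. Returns True if copied, False if skipped.
    """
    src = os.path.join(src_root, rel_path)
    dest = os.path.join(dest_root, rel_path)
    if os.path.exists(dest):
        return False  # no-clobber, like cp -n
    dest_dir = os.path.dirname(dest)
    if dest_dir:
        # Recreate the relative directory chain under dest_root.
        os.makedirs(dest_dir, exist_ok=True)
    shutil.copy(src, dest)
    return True
```

With this, copy_with_parents("abc1/xyz2/file1.txt", "/scratch/userid/pp") would create /scratch/userid/pp/abc1/xyz2/ if needed and copy the file there, without any chdir calls.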

Python tar.add files but omit parent directories

I am trying to create a tar file from a list of files stored in a text file, I have working code to create the tar, but I wish to start the archive from a certain directory (app and all subdirectories), and remove the parents directories. This is due to the software only opening the file from a certain directory.
package.list files are as below:
app\myFile
app\myDir\myFile
app\myDir\myFile2
If I omit the path in restore.add, it cannot find the files due to my program running from elsewhere. How do I tell the tar to start at a particular directory, or to add the files, but maintain the directory structure it got from the text file, e.g starting with app not all the parent dirs
My objective is to do this tar cf restore.tar -T package.list but with Python on Windows.
I have tried basename from here: How to compress a tar file in a tar.gz without directory?, this strips out ALL the directories.
I have also tried using arcname='app' in the .add method, however this gives some weird results by breaking the directory structure and renames loads of files to app
path = foo + '\\' + bar
file = open(path + '\\package.list', 'r')
restore = tarfile.open(path + '\\restore.tar', 'w')
for line in file:
    restore.add(path + '\\' + line.strip())
restore.close()
file.close()
Using Python 2.7
You can use the second argument of TarFile.add (arcname); it specifies the name the file gets inside the archive.
So, assuming every path is sane, something like this would work:
import tarfile
prefix = "some_dir/"
archive_path = "inside_dir/file.txt"
with tarfile.open("test.tar", "w") as tar:
    tar.add(prefix+archive_path, archive_path)
Usage:
> cat some_dir/inside_dir/file.txt
test
> python2 test_tar.py
> tar --list -f ./test.tar
inside_dir/file.txt
In production, I'd advise using an appropriate module for path handling (such as os.path) to make sure every slash and backslash is in the right place.
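Applied to a package.list like the one in the question, the same idea might be sketched as below (Python 3). The function name build_tar and its parameters are hypothetical; the only tarfile feature used is the arcname argument of TarFile.add. The backslash handling assumes the list uses Windows-style separators as shown in the question.

```python
import os
import tarfile


def build_tar(tar_path, base_dir, relative_names):
    """Add base_dir/<name> to the archive under <name> for each listed name.

    build_tar is a hypothetical helper written for this example.
    """
    with tarfile.open(tar_path, "w") as tar:
        for name in relative_names:
            # Normalise backslashes so the archive stores "/" separators.
            parts = [p for p in name.strip().replace("\\", "/").split("/") if p]
            if not parts:
                continue  # skip blank lines
            # The file is read from base_dir/..., but stored as app/...
            tar.add(os.path.join(base_dir, *parts), arcname="/".join(parts))
```

The lines of package.list can be passed directly, for example build_tar("restore.tar", path, open(list_file)), where list_file is the path to package.list. The parent directories in base_dir never reach the archive because only the relative name is used as arcname.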

Is it possible to extract single file from tar bundle in python

I need to fetch a couple of files from a huge svn repo. Whole repo takes almost an hour to be fetched. Files I am looking for are part of tar bundle.
Is it possible to fetch only those two files from tar bundle without extracting the whole bundle through Python Code?
If so, can anybody let me know how should I go about it?
It sounds like you have two parts to your question:
1. Fetching a single tar bundle from the SVN repo, without the rest of the repo's files.
2. Using Python to extract two files from the retrieved bundle.
For the first part, I'll simply refer to this post on svn export and sparse checkouts.
For the second part, here is a solution for extracting the two files from the retrieved tarball:
import tarfile
files_i_want = ['path/to/file1','path/to/file2']
tar = tarfile.open("bundle.tar")
tar.extractall(members=[x for x in tar.getmembers() if x.name in files_i_want])
Here is one way to get a tar file from svn and extract a single file from it:
import tarfile
from subprocess import check_output
# Capture the tar file from subversion
tmp='/home/me/tempfile.tar'
open(tmp, 'wb').write(check_output(["svn", "cat", "svn://url/some.tar"]))
# Extract the file we want, saving to current directory
tarfile.open(tmp).extract('dir1/fname.ext', path='dir2')
where 'dir1/fname.ext' is the full path to the file that you want within the tar archive. It will be saved in 'dir2/dir1/fname.ext'. If you omit the path argument, it will be saved in 'dir1/fname.ext' under the current directory.
The above can be understood as follows. On a normal shell command line, svn cat url tells subversion to send the file defined by url to stdout (see svn help cat for more info). url can be any type of url that svn understands such as svn://..., svn+ssh://..., or file://.... We run this command under python control using the subprocess module. To do this the svn cat url command is broken up into a list: ["svn", "cat", "url"]. The output from this svn command is saved to a local file defined by the tmp variable. We then use the tarfile module to extract the file you want.
Alternatively, you could use the extractfile method to capture the file data in a Python variable:
t = tarfile.open(tmp)
handle = t.extractfile('dir1/fname.ext')
print(handle.readlines())  # show file contents
According to the documentation, tarfile should accept a subprocess's stdout as a file handle. This would simplify the code and eliminate the need to save the tar file locally. However, due to a bug, Issue 10436, that will not work.
Perhaps you want something like this?
#!/usr/local/cpython-3.3/bin/python
import tarfile as tarfile_mod

def main():
    tarfile = tarfile_mod.TarFile('tar-archive.tar', 'r')
    if False:
        file_ = tarfile.extractfile('etc/protocols')
        print(file_.read())
    else:
        tarfile.extract('etc/protocols')
    tarfile.close()

main()
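If the extracted file should not keep its directory chain, the extractfile approach from the answers above can be combined with a plain file write. This is only a sketch in Python 3; extract_member_flat and its parameters are names invented for this example.

```python
import os
import posixpath
import shutil
import tarfile


def extract_member_flat(tar_path, member_name, target_dir):
    """Write one tar member into target_dir without its directory prefix.

    extract_member_flat is a hypothetical helper, not a tarfile API.
    """
    os.makedirs(target_dir, exist_ok=True)
    # Tar member names use "/" separators regardless of platform.
    out_path = os.path.join(target_dir, posixpath.basename(member_name))
    with tarfile.open(tar_path) as tar:
        # extractfile raises KeyError for an unknown member and returns
        # None for members that are not regular files or links.
        src = tar.extractfile(member_name)
        if src is None:
            raise ValueError("not a regular file: %r" % member_name)
        with src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_path
```

For example, extract_member_flat("bundle.tar", "dir1/fname.ext", "dir2") would write dir2/fname.ext rather than dir2/dir1/fname.ext.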

How to unzip file in Python on all OSes?

Is there a simple Python function that would allow unzipping a .zip file like so?:
unzip(ZipSource, DestinationDirectory)
I need the solution to act the same on Windows, Mac and Linux: always produce a file if the zip is a file, directory if the zip is a directory, and directory if the zip is multiple files; always inside, not at, the given destination directory
How do I unzip a file in Python?
Use the zipfile module in the standard library:
import zipfile,os.path

def unzip(source_filename, dest_dir):
    with zipfile.ZipFile(source_filename) as zf:
        for member in zf.infolist():
            # Path traversal defense copied from
            # http://hg.python.org/cpython/file/tip/Lib/http/server.py#l789
            words = member.filename.split('/')
            path = dest_dir
            for word in words[:-1]:
                while True:
                    drive, word = os.path.splitdrive(word)
                    head, word = os.path.split(word)
                    if not drive:
                        break
                if word in (os.curdir, os.pardir, ''):
                    continue
                path = os.path.join(path, word)
            zf.extract(member, path)
Note that using extractall would be a lot shorter, but that method does not protect against path traversal vulnerabilities before Python 2.7.4. If you can guarantee that your code runs on a recent version of Python, extractall is the simpler choice.
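For reference, the shorter extractall variant mentioned above might look like this. It is a minimal sketch that assumes a Python version at or after the 2.7.4 fix; the function name unzip simply mirrors the one requested in the question.

```python
import zipfile


def unzip(source_filename, dest_dir):
    """Extract every member of the archive below dest_dir."""
    with zipfile.ZipFile(source_filename) as zf:
        # extractall recreates the archive's directory structure
        # inside dest_dir, creating dest_dir itself if necessary.
        zf.extractall(dest_dir)
```

The behavior is the same on Windows, macOS, and Linux, since zipfile is part of the standard library and does not call an external unzip program.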
In Python 3.x, the zipfile command line uses the -e argument to extract (not -h), for example:
python -m zipfile -e compressedfile.zip c:\output_folder
The arguments are as follows:
zipfile.py -l zipfile.zip # Show listing of a zipfile
zipfile.py -t zipfile.zip # Test if a zipfile is valid
zipfile.py -e zipfile.zip target # Extract zipfile into target dir
zipfile.py -c zipfile.zip src ... # Create zipfile from sources

How to make SCons not include the base dir in zip files?

SCons provides a Zip builder to produce zip files from groups of files.
For example, suppose we have a folder foo that looks like this:
foo/
foo/foo.txt
and we create the zip file foo.zip from a folder foo:
env.Zip('foo.zip', 'foo/')
This produces a zip file:
$ unzip -l foo.zip
Archive: foo.zip
foo/
foo/foo.txt
However, suppose we are using a VariantDir of bar, which contains foo:
bar/
bar/foo/
bar/foo/foo.txt
Because we are in a VariantDir, we still use the same command to create the zip file, even though it has slightly different effects:
env.Zip('foo.zip', 'foo/')
This produces the zip file:
$ unzip -l bar/foo.zip
Archive: bar/foo.zip
bar/foo/
bar/foo/foo.txt
The problem is the extra bar/ prefix on each of the files within the zip. If this were not SCons, the simple solution would be to cd into bar and call zip from within there with something like cd bar; zip -r foo.zip foo/. However, this is weird/difficult with SCons, and at any rate seems very un-SCons-like. Is there a better solution?
You can create a SCons Builder which accomplishes this task. We can use the standard Python zipfile module to make the zip files. We take advantage of ZipFile.write, which allows us to specify a file to add, as well as what it should be called within the zip:
zf.write('foo/bar', 'bar') # save foo/bar as bar
To get the right paths, we use os.path.relpath to compute each file's path relative to the directory that contains the source (its base directory).
Finally, we use os.walk to walk through the contents of the directories that we want to add, and combine the previous two functions to add each file, under the correct name, to the final zip.
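Outside of SCons, the combination of os.walk, os.path.relpath and ZipFile.write described above can be sketched as a standalone function (Python 3). The name zip_directory and its parameters are made up for this illustration and are not part of the builder in this answer.

```python
import os
import zipfile


def zip_directory(zip_path, source_dir):
    """Zip source_dir so that stored names start at source_dir's own name.

    zip_directory is a hypothetical helper written for this example.
    """
    source_dir = os.path.normpath(source_dir)
    # Names are stored relative to the parent of source_dir, so
    # bar/foo/foo.txt is stored as foo/foo.txt.
    base_dir = os.path.dirname(source_dir) or os.curdir
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(source_dir):
            for fname in filenames:
                path = os.path.join(dirpath, fname)
                zf.write(path, os.path.relpath(path, base_dir))
```

For the layout in the question, zip_directory("foo.zip", "bar/foo") would store foo/foo.txt without the bar/ prefix. The zip file should be written outside source_dir, otherwise the walk may pick up the archive itself.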
import os.path
import zipfile

def zipbetter(target, source, env):
    # Open the zip file with appending, so multiple calls will add more files
    zf = zipfile.ZipFile(str(target[0]), 'a', zipfile.ZIP_DEFLATED)
    for s in source:
        # Find the path of the base file
        basedir = os.path.dirname(str(s))
        if s.isdir():
            # If the source is a directory, walk through its files
            for dirpath, dirnames, filenames in os.walk(str(s)):
                for fname in filenames:
                    path = os.path.join(dirpath, fname)
                    if os.path.isfile(path):
                        # If this is a file, write it with its relative path
                        zf.write(path, os.path.relpath(path, basedir))
        else:
            # Otherwise, just write it to the file
            flatname = os.path.basename(str(s))
            zf.write(str(s), flatname)
    zf.close()

# Make a builder using the zipbetter function, that takes SCons files
zipbetter_bld = Builder(action = zipbetter,
                        target_factory = SCons.Node.FS.default_fs.Entry,
                        source_factory = SCons.Node.FS.default_fs.Entry)

# Add the builder to the environment
env.Append(BUILDERS = {'ZipBetter' : zipbetter_bld})
Call it just like the normal SCons Zip:
env.ZipBetter('foo.zip', 'foo/')
Use the construction variable ZIPROOT.
Directories can indeed be a challenge with SCons. Assuming the files are 'in project', there are a couple of different ways to specify the directory of the files to include in the Zip() file:
- Relative to the root project dir, by prepending the path with '#'. This option will include the complete directory, as you mentioned.
- Relative to a particular SConscript file. Either specify files in the same directory, or specify a subdirectory relative to the SConscript.
Sounds like you want the second option. Do you have a SConscript file in the same dir that you want to zip, foo in your case? This should work even for the variant_dir.
