How to make SCons not include the base dir in zip files? - python

SCons provides a Zip builder to produce zip files from groups of files.
For example, suppose we have a folder foo that looks like this:
foo/
foo/foo.txt
and we create the zip file foo.zip from a folder foo:
env.Zip('foo.zip', 'foo/')
This produces a zip file:
$ unzip -l foo.zip
Archive: foo.zip
foo/
foo/foo.txt
However, suppose we are using a VariantDir of bar, which contains foo:
bar/
bar/foo/
bar/foo/foo.txt
Because we are in a VariantDir, we still use the same command to create the zip file, even though it has slightly different effects:
env.Zip('foo.zip', 'foo/')
This produces the zip file:
$ unzip -l bar/foo.zip
Archive: bar/foo.zip
bar/foo/
bar/foo/foo.txt
The problem is the extra bar/ prefix on each of the files within the zip. If this were not SCons, the simple solution would be to cd into bar and call zip from there, with something like cd bar; zip -r foo.zip foo/. However, this is awkward with SCons, and at any rate seems very un-SCons-like. Is there a better solution?

You can create a SCons Builder which accomplishes this task. We can use the standard Python zipfile to make the zip files. We take advantage of zipfile.write, which allows us to specify a file to add, as well as what it should be called within the zip:
zf.write('foo/bar', 'bar') # save foo/bar as bar
To get the right paths, we use os.path.relpath with the source's parent directory as the start path, so each file is stored relative to that directory rather than to the top of the build tree.
Finally, we use os.walk to walk through the contents of the directories we want to add, and use the previous two functions to add each file, under the correct name, to the final zip.
import os.path
import zipfile
import SCons.Node.FS
def zipbetter(target, source, env):
    # Open the zip file with appending, so multiple calls will add more files
    zf = zipfile.ZipFile(str(target[0]), 'a', zipfile.ZIP_DEFLATED)
    for s in source:
        # The source's parent directory; archive names are made relative to it
        basedir = os.path.dirname(str(s))
        if s.isdir():
            # If the source is a directory, walk through its files
            for dirpath, dirnames, filenames in os.walk(str(s)):
                for fname in filenames:
                    path = os.path.join(dirpath, fname)
                    if os.path.isfile(path):
                        # If this is a file, write it with its relative path
                        zf.write(path, os.path.relpath(path, basedir))
        else:
            # Otherwise, store the single file under its base name
            flatname = os.path.basename(str(s))
            zf.write(str(s), flatname)
    zf.close()
# Make a builder using the zipbetter function, that takes SCons files
zipbetter_bld = Builder(action = zipbetter,
                        target_factory = SCons.Node.FS.default_fs.Entry,
                        source_factory = SCons.Node.FS.default_fs.Entry)
# Add the builder to the environment
env.Append(BUILDERS = {'ZipBetter' : zipbetter_bld})
Call it just like the normal SCons Zip:
env.ZipBetter('foo.zip', 'foo/')
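The builder itself only runs inside an SConstruct, but its path logic can be checked with plain zipfile. The sketch below is a hypothetical standalone helper, zip_relative, that mirrors zipbetter: each file is stored relative to the parent of the source it came from, so bar/foo/foo.txt ends up in the archive as foo/foo.txt. It assumes a fresh archive (mode 'w') and normalizes a trailing slash on the source, which SCons nodes do not have but hand-written paths might.

```python
import os
import zipfile


def zip_relative(zip_path, sources):
    """Hypothetical standalone version of zipbetter (no SCons needed).

    Each source is stored relative to its own parent directory, so
    'bar/foo' is archived as 'foo/...', not 'bar/foo/...'.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for src in sources:
            # normpath drops a trailing '/', so dirname gives the real parent
            basedir = os.path.dirname(os.path.normpath(src))
            if os.path.isdir(src):
                for dirpath, _dirnames, filenames in os.walk(src):
                    for fname in filenames:
                        path = os.path.join(dirpath, fname)
                        # arcname relative to the parent, e.g. 'foo/foo.txt'
                        zf.write(path, os.path.relpath(path, basedir))
            else:
                # A plain file is stored under its base name only
                zf.write(src, os.path.basename(src))
```

For example, zip_relative('foo.zip', ['bar/foo']) should produce an archive whose only member is foo/foo.txt. zipfile converts os.sep in the arcname to '/', so the names look the same on Windows.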

Using the construction variable ZIPROOT

Directories can indeed be a challenge with SCons. There are a couple of ways to specify the directory of the files to include in the Zip() file, assuming the files are in the project:
1. Relative to the root project dir, by prepending the path with '#'. This option will include the complete directory, as you mentioned.
2. Relative to a particular SConscript file: either specify files in the same directory, or specify a subdirectory relative to the SConscript.
It sounds like you want the second option. Do you have an SConscript file in the directory you want to zip (foo in your case)? This should work even with the variant_dir.

Related

unzipping a file with Python and returning all the directories it creates

How can I unzip a .zip file with Python into some directory output_dir and fetch a list of all the directories made by the unzipping as a result? For example, if I have:
unzip('myzip.zip', 'outdir')
outdir is a directory that might have other files/directories in it. When I unzip myzip.zip into it, I'd like unzip to return all the directories made in outdir/ as a result of the unzipping. Here is my code so far:
import zipfile
def unzip(zip_file, outdir):
    """
    Unzip a given 'zip_file' into the output directory 'outdir'.
    """
    zf = zipfile.ZipFile(zip_file, "r")
    zf.extractall(outdir)
How can I make unzip return the dirs it creates in outdir? thanks.
Edit: the solution that makes the most sense to me is to get ONLY the top-level directories in the zip file and then recursively walk through them, which will guarantee that I get all the files made by the zip. Is this possible? The system-specific behavior of namelist makes it virtually impossible to rely on.
You can read the contents of the zip file with the namelist() method. Directory entries, when the archive contains them, have a trailing '/':
>>> import zipfile
>>> zip = zipfile.ZipFile('test.zip')
>>> zip.namelist()
['dir2/', 'file1']
You can do this before or after extracting contents.
Whether directories show up at all depends on the archive, not on your platform: namelist() returns exactly the member names stored in the zip. Some tools store explicit entries for directories (e.g. 'dir2/'), while others store only file paths and leave the directories implied.
namelist() returns a complete listing of the zip archive contents, with stored directory entries marked by a trailing '/'. For instance, a zip archive (with directory entries) of the following file structure:
./file1
./dir2
./dir2/dir21
./dir3
./dir3/file3
./dir3/dir31
./dir3/dir31/file31
results in the following list being returned by zipfile.ZipFile.namelist():
[ 'file1',
'dir2/',
'dir2/dir21/',
'dir3/',
'dir3/file3',
'dir3/dir31/',
'dir3/dir31/file31' ]
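To see that namelist() just reports what was stored, the sketch below builds two small in-memory archives with ZipFile.writestr, one with explicit directory entries and one without. ZipInfo.is_dir() (Python 3.6+) picks out the directory entries; build_zip is a hypothetical helper for the demonstration.

```python
import io
import zipfile


def build_zip(names):
    """Build an in-memory zip; names ending in '/' become directory entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"" if name.endswith("/") else b"data")
    buf.seek(0)
    return zipfile.ZipFile(buf)


with_dirs = build_zip(["file1", "dir2/", "dir2/dir21/", "dir3/", "dir3/file3"])
files_only = build_zip(["file1", "dir3/file3"])

print(with_dirs.namelist())
# ['file1', 'dir2/', 'dir2/dir21/', 'dir3/', 'dir3/file3']
print([info.filename for info in with_dirs.infolist() if info.is_dir()])
# ['dir2/', 'dir2/dir21/', 'dir3/']
print(files_only.namelist())
# ['file1', 'dir3/file3']  (dir3 is only implied by the file path)
```

So code that must handle arbitrary archives should derive directories from the member paths, as the next answer does, rather than expect directory entries to be present.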
ZipFile.namelist will return a list of the names of the items in an archive. Often these are only the full names of files, including their directory paths: a zip file is not required to contain entries for directories, so directories may only be implied by archive member names. To determine the directories created, you need a list of every directory implied by each name.
The dirs_in_zip() function below will do this and collect all dir names into a set.
import zipfile
import os
def parent_dirs(pathname, subdirs=None):
    """Return a set of all individual directories contained in a pathname
    For example, if 'a/b/c.ext' is the path to the file 'c.ext':
    a/b/c.ext -> set(['a','a/b'])
    """
    if subdirs is None:
        subdirs = set()
    parent = os.path.dirname(pathname)
    if parent:
        subdirs.add(parent)
        parent_dirs(parent, subdirs)
    return subdirs
def dirs_in_zip(zf):
    """Return a set of directories that would be created by the ZipFile zf"""
    alldirs = set()
    for fn in zf.namelist():
        alldirs.update(parent_dirs(fn))
    return alldirs
zipfilename = 'myzip.zip'  # path to your archive
zf = zipfile.ZipFile(zipfilename, 'r')
print(dirs_in_zip(zf))
Let the extraction finish and then read the contents of the directory.
Assuming no one else will be writing the target directory at the same time, walk the directory recursively prior to unzipping, then afterwards, and compare the results.
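A minimal sketch of that before/after comparison, applied to the unzip function from the question. It assumes nothing else writes to outdir during extraction; list_dirs is a hypothetical helper that collects every directory under a root as relative paths.

```python
import os
import zipfile


def list_dirs(root):
    """Return the set of all directories under root, as relative paths."""
    found = set()
    for dirpath, dirnames, _filenames in os.walk(root):
        for d in dirnames:
            found.add(os.path.relpath(os.path.join(dirpath, d), root))
    return found


def unzip(zip_file, outdir):
    """Extract zip_file into outdir and return the directories it created.

    Directories that already existed in outdir are not reported.
    """
    os.makedirs(outdir, exist_ok=True)
    before = list_dirs(outdir)
    with zipfile.ZipFile(zip_file) as zf:
        zf.extractall(outdir)
    after = list_dirs(outdir)
    # Only directories that appeared during extraction
    return sorted(os.path.join(outdir, d) for d in after - before)
```

Because it compares the file system rather than the archive, this works whether or not the zip stores explicit directory entries.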

Directory is not being recognized in Python

I'm uploading a zipped folder that contains a folder of text files, but it's not detecting that the folder that is zipped up is a directory. I think it might have something to do with requiring an absolute path in the os.path.isdir call, but can't seem to figure out how to implement that.
zipped = zipfile.ZipFile(request.FILES['content'])
for libitem in zipped.namelist():
    if libitem.startswith('__MACOSX/'):
        continue
    # If it's a directory, open it
    if os.path.isdir(libitem):
        print("You have hit a directory in the zip folder -- we must open it before continuing")
        for item in os.listdir(libitem):
            ...
The file you've uploaded is a single zip file which is simply a container for other files and directories. All of the Python os.path functions operate on files on your local file system which means you must first extract the contents of your zip before you can use os.path or os.listdir.
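os.path cannot tell you whether a zip entry is a file or a directory. The ZipFile object can only tell you when the archive stores explicit directory entries: their names end in '/' (and ZipInfo.is_dir() reports them on Python 3.6+).
A rewrite of your code which does an extract first may look something like this: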
import os
import shutil
import tempfile
import zipfile
# Create a temporary directory into which we can extract zip contents.
tmpdir = tempfile.mkdtemp()
try:
    zipped = zipfile.ZipFile(request.FILES['content'])
    zipped.extractall(tmpdir)
    # Walk through the extracted directory structure doing what you
    # want with each file.
    for (dirpath, dirnames, filenames) in os.walk(tmpdir):
        # Look into subdirectories?
        for dirname in dirnames:
            full_dir_path = os.path.join(dirpath, dirname)
            # Do stuff in this directory
        for filename in filenames:
            full_file_path = os.path.join(dirpath, filename)
            # Do stuff with this file.
finally:
    # Clean up the temporary directory recursively.
    shutil.rmtree(tmpdir)
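If you only need to read the text files, extraction is optional: ZipFile can read members straight from the archive. A minimal sketch with a hypothetical helper, files_by_directory, that infers folders from member names (zip names always use '/', hence posixpath) and skips __MACOSX/ entries as the question does. It assumes the upload is a file-like object that ZipFile accepts, as in the question's own code.

```python
import posixpath
import zipfile


def files_by_directory(zf, skip_prefix="__MACOSX/"):
    """Group the file members of an open ZipFile by their directory.

    Works on member names only, so nothing is extracted to disk.
    Directory entries (names ending in '/') are skipped; folders are
    inferred from the file paths instead. Top-level files group under ''.
    """
    groups = {}
    for name in zf.namelist():
        if name.startswith(skip_prefix) or name.endswith("/"):
            continue
        groups.setdefault(posixpath.dirname(name), []).append(name)
    return groups
```

Each file's bytes can then be read with zf.read(name) (or streamed with zf.open(name)) without touching the file system.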
To handle relative paths reliably when running scripts, you usually want the os.path functions.
It seems you're reading item names from the ZipFile without actually unzipping it, so why would you expect the files/dirs to exist on disk?
I'd usually print os.getcwd() to find out where I am, and use os.path.join to join names with the root of the data directory (I can't tell whether that is the same as the directory containing the script). To get the script's directory, use something like scriptdir = os.path.dirname(os.path.abspath(__file__)).
I'd expect you would have to do something like
libitempath = os.path.join(scriptdir, libitem)
if os.path.isdir(libitempath):
    ...
But I'm guessing at what you're doing as it's a little unclear for me.

Following a nested directory structure until the end

I have some directories that contain other directories which, at the lowest level, contain a bunch of csv files, such as (folder) a -> b -> c -> (csv files). There is usually only one folder at each level. When I process a directory, how can I follow this structure until the end to get the csv files? I was thinking maybe a recursive solution, but I think there may be better ways to do this. I am using Python. Hope I was clear.
The os package has a walk function that will do exactly what you need:
from os import walk
for current_path, directories, files in walk("/some/path"):
    # current_path is the full path of the directory we are currently in
    # directories is a list of the names of its subdirectories
    # files is a list of file names in this directory
    pass
You can use os.path.join(current_path, name) to derive the full path to each file (if you need it).
Alternately, you might find the glob module to be of more use to you:
from glob import glob
for csv_file in glob("/some/path/**/*.csv", recursive=True):
    # csv_file is the full path to the csv file, at any depth (Python 3.5+).
    pass
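Both approaches can be wrapped into small functions and compared. A minimal sketch (the function names are hypothetical): one walks the tree with os.walk, the other uses glob's recursive '**' pattern, and for a tree like a/b/c they return the same files. One difference: glob's '**' skips hidden directories and files (names starting with '.'), while os.walk does not.

```python
import glob
import os


def find_csv_files(root):
    """Return every .csv file below root, however deeply nested (os.walk)."""
    matches = []
    for current_path, _subdirs, files in os.walk(root):
        for name in files:
            if name.endswith(".csv"):
                matches.append(os.path.join(current_path, name))
    return sorted(matches)


def find_csv_files_glob(root):
    """Same search using glob's recursive '**' pattern (Python 3.5+)."""
    # escape() keeps characters like '[' in root from acting as wildcards
    pattern = os.path.join(glob.escape(root), "**", "*.csv")
    return sorted(glob.glob(pattern, recursive=True))
```

Neither needs to know how many levels deep the csv files sit, so there is no need to write the recursion yourself.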

Creating a relative symlink in python without using os.chdir()

Say I have a path to a file:
/path/to/some/directory/file.ext
In python, I'd like to create a symlink in the same directory as the file, that
points to the file. I'd like to end up with this:
/path/to/some/directory/symlink -> file.ext
I can do this fairly easily using os.chdir() to cd into the directory and
create the symlinks. But os.chdir() is not thread safe, so I'd like to avoid
using it. Assuming that the current working directory of the process is not
the directory with the file (os.getcwd() != '/path/to/some/directory'),
what's the best way to do this?
I guess I could create a busted link in whatever directory I'm in, then
move it to the directory with the file:
import os, shutil
os.symlink('file.ext', 'symlink')
shutil.move('symlink', '/path/to/some/directory/.')
Is there a better way to do this?
Note: what I don't want to end up with is this:
/path/to/some/directory/symlink -> /path/to/some/directory/file.ext
You can also use os.path.relpath() so that you can use symlinks with relative paths. Say your script is in a directory foo/ and this directory has subdirectories src/ and dst/, and you want to create relative symlinks in dst/ to point to the files in src/. To do so, you can do:
import os
from glob import glob
for src_path in glob('src/*'):
    os.symlink(
        os.path.relpath(
            src_path,
            'dst/'
        ),
        os.path.join('dst', os.path.basename(src_path))
    )
Listing the contents of dst/ then shows:
1.txt -> ../src/1.txt
2.txt -> ../src/2.txt
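Relative symlinks are useful if you want to create a tarball of the whole foo directory tree, as I don't believe tar rewrites symlinks to point to the corresponding path inside the generated tarball.
You can simply pass the full destination path as the second argument. A relative target such as 'file.ext' is resolved relative to the directory containing the link, not the current working directory: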
import os
os.symlink('file.ext', '/path/to/some/directory/symlink')
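The one-liner above can be wrapped up and checked without any os.chdir(). A minimal sketch, assuming a POSIX system (on Windows, creating symlinks may require extra privileges); symlink_beside is a hypothetical helper name.

```python
import os


def symlink_beside(file_path, link_name):
    """Create link_name next to file_path, pointing to it by bare filename.

    The link target is relative ('file.ext'), and the OS resolves it
    relative to the link's own directory, so the process's current
    working directory never matters and os.chdir() is not needed.
    """
    directory = os.path.dirname(file_path)
    link_path = os.path.join(directory, link_name)
    os.symlink(os.path.basename(file_path), link_path)
    return link_path
```

For the question's example, symlink_beside('/path/to/some/directory/file.ext', 'symlink') would give /path/to/some/directory/symlink -> file.ext.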
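A Python function to create a relative symlink at dst that points to src:
def relative_symlink(src, dst):
    # Directory that will contain the link ('.' if dst has no directory part)
    link_dir = os.path.dirname(dst) or '.'
    # Target expressed relative to the link's directory
    rel_src = os.path.relpath(src, link_dir)
    return os.symlink(rel_src, dst)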
Nowadays, this can be accomplished using pathlib:
from pathlib import Path
target = Path('../target.txt')
my_symlink = Path('symlink.txt')
my_symlink.symlink_to(target)
where target is a relative Path or str; as with os.symlink, a relative target is resolved relative to the symlink's own directory.

Running a python script on all the files in a directory

I have a Python script that reads through a text csv file and creates a playlist file. However I can only do one at a time, like:
python playlist.py foo.csv foolist.txt
However, I have a directory of files that need to be made into a playlist, with different names, and sometimes a different number of files.
So far I have looked at creating a txt file with a list of all the names of the files in the directory, then looping through each line of that; however, I know there must be an easier way to do it.
for f in *.csv; do
python playlist.py "$f" "${f%.csv}list.txt"
done
Will that do the trick? This will put foo.csv in foolist.txt and abc.csv in abclist.txt.
Or do you want them all in the same file?
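The same per-file loop can also be driven from Python with subprocess. A minimal sketch with hypothetical helper names; it assumes playlist.py takes an input CSV and an output path, as in the question, and it mirrors the shell loop's foo.csv -> foolist.txt naming. With dry_run=True it only returns the commands it would run.

```python
import subprocess
import sys
from pathlib import Path


def output_name(csv_path):
    """foo.csv -> foolist.txt (same directory), like "${f%.csv}list.txt"."""
    csv_path = Path(csv_path)
    return csv_path.with_name(csv_path.stem + "list.txt")


def run_playlist_on_dir(directory, script="playlist.py", dry_run=False):
    """Run `python playlist.py <csv> <output>` once per .csv file in directory.

    The script name and its two-argument interface are assumptions taken
    from the question; adjust them to match the real script.
    """
    commands = []
    for csv_path in sorted(Path(directory).glob("*.csv")):
        cmd = [sys.executable, script, str(csv_path), str(output_name(csv_path))]
        commands.append(cmd)
        if not dry_run:
            # check=True stops at the first file the script fails on
            subprocess.run(cmd, check=True)
    return commands
```

Passing the arguments as a list avoids the quoting issues with spaces in filenames that the shell versions have to handle explicitly.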
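Just use a for loop with the asterisk glob, making sure you quote things appropriately for spaces in filenames:
for file in *.csv; do
    python playlist.py "$file" >> outputfile.txt;
done
This version assumes playlist.py writes its playlist to standard output when given only the input file, so all playlists are appended to one outputfile.txt.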
Is it a single directory, or nested?
Ex.
topfile.csv
topdir
--dir1
--file1.csv
--file2.txt
--dir2
--file3.csv
--file4.csv
For nested, you can use os.walk(topdir) to get all the files and dirs recursively within a directory.
You could set up your script to accept dirs or files:
python playlist.py topfile.csv topdir
import sys
import os
def main():
    files_toprocess = set()
    paths = sys.argv[1:]
    for p in paths:
        if os.path.isfile(p) and p.endswith('.csv'):
            files_toprocess.add(p)
        elif os.path.isdir(p):
            for root, dirs, files in os.walk(p):
                files_toprocess.update([os.path.join(root, f)
                                        for f in files if f.endswith('.csv')])
    for csv_path in sorted(files_toprocess):
        # Process each CSV file here (e.g. build its playlist).
        print(csv_path)
if __name__ == '__main__':
    main()
If you have a directory name, you can use os.listdir:
os.listdir(dirname)
If you want to select only a certain type of file, e.g. only CSV files, you could use the glob module.
