I recently managed to embed a Python distribution with an application on Mac OS X, using a Homebrew-installed Python 3.7, as per the methodology outlined in Joao Ventura's very useful two-part series, provided here for reference (http://joaoventura.net/blog/2016/embeddable-python-osx/) and (http://joaoventura.net/blog/2016/embeddable-python-osx-from-src/).
The only remaining issue for me was to reduce the size of the Python distribution in the application by zip-compressing the whole standard library, minus lib-dynload, config-3.7m-darwin and site-packages.
My directory structure is as follows:
- python3.7/
  - include/
  - lib/
    - python3.7/
  - libpython3.7.dylib
  - python3.7 (executable)
The basic initial step is to move lib-dynload and config-3.7m-darwin out of lib/python3.7, so that I can compress the stdlib source files into lib/python37.zip, and then move lib-dynload and config-3.7m-darwin back into the now-empty lib/python3.7 to end up with the desired structure:
- python3.7/
  - include/
  - lib/
    - python3.7/
      - lib-dynload/
      - config-3.7m-darwin
    - python37.zip
  - libpython3.7.dylib
  - python3.7 (executable)
To test whether it worked or not, I would check sys.path from the executable and try to import a module, checking its __file__ attribute to see whether it came from the zip archive.
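For instance, a quick check along these lines (the module choice is arbitrary; any pure-Python stdlib module works):

import sys
import json  # any pure-Python stdlib module will do

# The zip should show up on sys.path without any extra configuration.
print([p for p in sys.path if p.endswith('.zip')])

# If zipimport picked the archive up, __file__ points inside it,
# e.g. .../lib/python37.zip/json/__init__.py
print(json.__file__)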
On this basis, I would cd into lib/python3.7 and try the following:
Selecting all files and folders and compressing them with OS X Finder's Compress command to generate python37.zip
Using the python zipfile module:
python -m zipfile -c python37.zip lib/python3.7/*
Using the zip method from How can you bundle all your python code into a single zip file?
cd lib/python3.7
zip -r9 ../python37.zip *
In all cases, I got it to work by setting PYTHONPATH to the zipped library, as in:
PYTHONPATH=lib/python37.zip ./python3.7
Doing so, I was able to import modules from the zip archive and verify via __file__ that they came from it. But without setting PYTHONPATH, it did not work.
Hence, I would very much appreciate some help establishing the correct and most straightforward way to zip the standard library so that it is recognized automatically via sys.path (without any extra steps such as setting the PYTHONPATH environment variable, which may not be possible on a user's machine).
Thanks in advance for any help provided.
S
Finally figured it out through a long process of elimination.
The only module you have to keep outside the zip, on disk in lib/pythonX.Y, is os.py (the interpreter uses it as a landmark to locate the standard library at startup).
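You can sanity-check this from the embedded interpreter itself; a minimal probe (nothing here is specific to my setup):

import os, sys

# os.py stays on disk, so this should point at lib/pythonX.Y/os.py,
# while sys.prefix should be the embedded distribution's root.
print(os.__file__)
print(sys.prefix)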
Here's a bash script for the whole process, which may or may not work for you. It assumes you have downloaded a Python source distribution from python.org.
Then cd into the resulting source folder and run this script from its root:
#!/usr/bin/env bash
# build_python.sh
# NOTE: os.py must remain on disk in lib/pythonX.Y or startup will fail
NAME=xpython
PWD=$(pwd)
PREFIX=${PWD}/${NAME}
VERSION=3.8
VER="${VERSION//./}"
LIB=${PREFIX}/lib/python${VERSION}
MAC_DEP_TARGET=10.13
remove() {
echo "removing $1"
rm -rf $1
}
rm_lib() {
echo "removing $1"
rm -rf ${LIB}/$1
}
clean() {
echo "removing __pycache__ .pyc/o from $1"
find $1 | grep -E "(__pycache__|\.pyc$|\.pyo$)" | xargs rm -rf
}
clean_tests() {
echo "removing 'test' dirs from $1"
# NB: matches any path containing "test" or "tests", so this also removes unittest, doctest, etc.
find $1 | grep -E "(tests|test)" | xargs rm -rf
}
clean_site_packages() {
echo "removing everything in $LIB/site-packages"
rm -rf $LIB/site-packages/*
}
rm_ext() {
echo "removing $LIB/lib-dynload/$1.cpython-${VER}-darwin.so"
rm -rf $LIB/lib-dynload/$1.cpython-${VER}-darwin.so
}
rm_bin() {
echo "removing $PREFIX/bin/$1"
rm -rf $PREFIX/bin/$1
}
./configure MACOSX_DEPLOYMENT_TARGET=${MAC_DEP_TARGET} \
--prefix=$PREFIX \
--enable-shared \
--with-universal-archs=64-bit \
--with-lto \
--enable-optimizations
make altinstall
clean $PREFIX
clean_tests $LIB
clean_site_packages
remove ${LIB}/site-packages
# remove what you want here...
rm_lib config-${VERSION}-darwin
rm_lib idlelib
rm_lib lib2to3
rm_lib tkinter
rm_lib turtledemo
rm_lib turtle.py
rm_lib ensurepip
rm_lib venv
remove $LIB/distutils/command/*.exe
remove $PREFIX/lib/pkgconfig
remove $PREFIX/share
# remove what you want here...
rm_ext _tkinter
rm_bin 2to3-${VERSION}
rm_bin idle${VERSION}
rm_bin easy_install-${VERSION}
rm_bin pip${VERSION}
mv $LIB/lib-dynload $PREFIX
cp $LIB/os.py $PREFIX
clean $PREFIX
python -m zipfile -c $PREFIX/lib/python${VER}.zip $LIB/*
remove $LIB
mkdir -p $LIB
mv $PREFIX/lib-dynload $LIB
mv $PREFIX/os.py $LIB
mkdir $LIB/site-packages
This is for macOS, but it can easily be adapted to other platforms. It's not very well tested, so post feedback if you encounter any issues.
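As a quick sanity check after the build, something along these lines (run with the freshly built binary; the module choice is arbitrary) should show the stdlib loading from the zip without PYTHONPATH being set:

import sys

# python38.zip should already be on sys.path.
print([p for p in sys.path if p.endswith('python38.zip')])

import json
# Expected something like .../xpython/lib/python38.zip/json/__init__.py
print(json.__file__)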
Someone's just asked me how to list all the virtual environments created with venv.
I could only think of searching for pyvenv.cfg files to find them. Something like:
from pathlib import Path
venv_list = [str(p.parent) for p in Path.home().rglob('pyvenv.cfg')]
This could potentially include some false positives. Is there a better way to list all the virtual environments created with venv?
NB: The question is about venv specifically, NOT Anaconda, virtualenv, etc.
On Linux/macOS this should get most of them:
find ~ -type d -name "site-packages" 2>/dev/null
This looks for directories under your home that are named "site-packages", which is where venv puts its pip-installed packages. The 2>/dev/null bit cuts down on the chattiness about things you don't have permission to look into.
Or you can look at the specifics of a particular expected file. For example, activate contains the word nondestructive. Then you need a pattern that matches venv but not Anaconda and the rest.
find ~ -type f -name "activate" -exec egrep -l nondestructive /dev/null {} \; 2>/dev/null
macOS mdfind
On macOS this is pretty fast, using mdfind (locate on Linux would probably have similar performance):
mdfind -name activate | egrep '/bin/activate$' | xargs -o egrep -l nondestructive 2>/dev/null | xargs -L 1 dirname | xargs -L 1 dirname
So we:
look for all activate files
egrep to match only bin/activate files (mdfind matches things like .../bin/ec2-activate-license)
look for nondestructive and print the filename where there is a match
use the two xargs -L 1 dirname calls to "climb up" from /bin/activate to the virtual env's root
Helper function with -v flag to show details.
jvenvfindall(){ # search for Python virtual envs. -v for verbose details
unset verbose
OPTIND=1
while getopts 'v' OPTION; do
case "$OPTION" in
v)
verbose=1
;;
?)
;;
esac
done
shift "$(($OPTIND -1))"
local bup=$PWD
for dn in $(mdfind -name activate | egrep '/bin/activate$' | xargs -o egrep -l nondestructive 2>/dev/null | xargs -L 1 dirname | xargs -L 1 dirname)
do
if [[ -z "$verbose" ]]; then
printf "$dn\n"
else
printf "\n\nvenv info for $dn:\n"
cd $dn
echo space usage, $(du -d 0 -h)
#requires the jq and jc utilities... to extract create and modification times
echo create, mod dttm: $(stat . | jc --stat | jq '.[]|{birth_time, change_time}')
tree -d -L 1 lib
fi
done
cd $bup
}
output:
...
venv info for /Users/me/kds2/issues2/067.pip-stripper/010.fixed.p1.check_venv/venvtest:
space usage, 12M .
create, mod dttm: { "birth_time": "Apr 16 13:04:43 2019", "change_time": "Sep 30 00:00:39 2019" }
lib
└── python3.6
...
Hmmm, disk usage is not that bad, but something similar for node_modules might save some real space.
The standard library venv does not keep track of the virtual environments it creates. Therefore, the only possibility is to search your hard drive for folders that meet certain criteria.
PEP 405 gives quite a good listing of what a folder should contain in order to be a virtual environment. This blog post also explains part of the virtual environment internals quite well. The definition of a virtual environment is:
A Python virtual environment in its simplest form would consist
of nothing more than a copy or symlink of the Python binary
accompanied by a pyvenv.cfg file and a site-packages directory. (PEP 405)
In summary, you will have to search your hard drive for folders that:
Linux / macOS
Has pyvenv.cfg with home key*
Has bin/python3 or bin/python
Has lib/<python-version>/site-packages/, where <python-version> is for example python3.3.
Optional: if created with venv, it also has bin/activate (source). A folder is considered a virtual environment even if this is lacking. (PEP 405)
Windows
Has pyvenv.cfg with home key*
Has Scripts/python.exe
Has Lib/site-packages/
Optional: if created with venv, it also has Scripts/activate.bat and Scripts/Activate.ps1 (source). A folder is considered a virtual environment even if these are lacking. (PEP 405)
* pyvenv.cfg
The pyvenv.cfg can actually be in the same directory as the Python executable or one directory above it. A pyvenv.cfg that belongs to a virtual environment must have a home = <home> line, where <home> is the directory containing the Python executable used to create the virtual environment (PEP 405).
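Putting those criteria together, here is a sketch of the search in Python (the starting directory and the exact checks are assumptions you may want to tune):

from pathlib import Path

def is_venv(root: Path) -> bool:
    """Rough PEP 405 check: pyvenv.cfg with a home key, plus a python binary."""
    cfg = root / 'pyvenv.cfg'
    if not cfg.is_file():
        return False
    has_home = any(line.split('=')[0].strip() == 'home'
                   for line in cfg.read_text(errors='ignore').splitlines())
    has_python = any((root / rel).exists()
                     for rel in ('bin/python', 'bin/python3', 'Scripts/python.exe'))
    return has_home and has_python

venvs = [p.parent for p in Path.home().rglob('pyvenv.cfg') if is_venv(p.parent)]
print('\n'.join(str(v) for v in venvs))

This filters out the false positives that a bare rglob('pyvenv.cfg') can return.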
I have a situation where I need to run some Python scripts remotely and need to select and copy a few files to a remote folder. I am doing this in two stages: I copy the files to a temp folder and then make an archive ready to send.
I created a makefile to automate the first stage, but it seems to behave a little strangely. The makefile looks as follows:
# Makefile and user paths
mkfile_path = $(dir $(realpath $(firstword $(MAKEFILE_LIST))))
user_path = $(shell echo $$HOME)
# Dependencies
ENTDIR = entropy
BINDIR = binary-files
MODDIR = modules
NORTH = $(BINDIR)/north
SOUTH = $(BINDIR)/south
WEST = $(BINDIR)/west
DISK = $(MODDIR)/disk
GEN = $(MODDIR)/general
PROB = $(MODDIR)/probability
NLTK = nltk_data
METAVAR = obj-meta-vars
# Target
TARGET=scripts-to-run-remotely.tar.gz
# Rules
all : $(TARGET)
#echo "Complete"
$(TARGET) : $(NORTH)/north.obj \
$(SOUTH)/south.obj \
$(WEST)/west.obj \
$(ENTDIR)/mifunction.py \
$(ENTDIR)/miopt.py \
$(ENTDIR)/miprint.py \
$(ENTDIR)/run-logs.py \
$(DISK)/%.py \
$(GEN)/%.py \
$(PROB)/%.py \
$(METAVAR)/%.obj \
$(NLTK)
tar -czf $(TARGET) $(ENTDIR)/* $(BINDIR)/* $(MODDIR)/* $(NLTK)/* $(METAVAR)/*
# Files
$(NORTH)/north.obj: $(NORTH)
cp /home/user/Documents/python/$(NORTH)/north.obj ./$(NORTH)
$(SOUTH)/south.obj: $(SOUTH)
cp /home/user/Documents/python/$(SOUTH)/south.obj ./$(SOUTH)
$(WEST)/west.obj: $(WEST)
cp /home/user/Documents/python/$(WEST)/west.obj ./$(WEST)
$(DISK)/%.py: $(DISK)
cp /home/user/Documents/python/$(DISK)/*.py ./$(DISK)
$(GEN)/%.py: $(GEN)
cp /home/user/Documents/python/$(GEN)/*.py ./$(GEN)
$(PROB)/%.py: $(PROB)
cp /home/user/Documents/python/$(PROB)/*.py ./$(PROB)
$(ENTDIR)/mifunction.py: $(ENTDIR)
cp /home/user/Documents/python/$(ENTDIR)/mifunction.py ./$(ENTDIR)
$(ENTDIR)/miopt.py: $(ENTDIR)
cp /home/user/Documents/python/$(ENTDIR)/miopt.py ./$(ENTDIR)
$(ENTDIR)/miprint.py: $(ENTDIR)
cp /home/user/Documents/python/$(ENTDIR)/miprint.py ./$(ENTDIR)
$(ENTDIR)/run-logs.py: $(ENTDIR)
cp /home/user/Documents/python/$(ENTDIR)/run-logs.py ./$(ENTDIR)
$(METAVAR)/%.obj: $(METAVAR)
cp /home/user/Dropbox/data/outputs/$(METAVAR)/*.obj ./$(METAVAR)
# Folders
$(NORTH):
mkdir -p $@
$(SOUTH):
mkdir -p $@
$(WEST):
mkdir -p $@
$(ENTDIR):
mkdir -p $@
$(DISK):
mkdir -p $@
$(GEN):
mkdir -p $@
$(PROB):
mkdir -p $@
$(METAVAR):
mkdir -p $@
$(NLTK):
mkdir -p $@
@python3 -m nltk.downloader wordnet wordnet_ic averaged_perceptron_tagger -d $(mkfile_path)/$(NLTK)
clean:
@rm -rf ./$(TARGET) ./$(ENTDIR) ./$(BINDIR) ./$(MODDIR) ./$(METAVAR) ./$(NLTK)
@echo "All files and folders removed"
# Always run those:
.PHONY: all
The first things I'd like to ask is how to avoid redundancy; if possible, how to avoid repeating parts of a code.
The second thing is that when I run make for the first time, it runs through all the rules where folders need to be created and then through all the rules where files need to be copied. But when I run make again, it again invokes the rules related to copying files:
cp /home/user/Documents/python/entropy/mifunction.py ./entropy
cp /home/user/Documents/python/entropy/miopt.py ./entropy
cp /home/user/Documents/python/entropy/miprint.py ./entropy
cp /home/user/Documents/python/entropy/run-logs.py ./entropy
cp /home/user/Documents/python/modules/disk/*.py ./modules/disk
cp /home/user/Documents/python/modules/general/*.py ./modules/general
cp /home/user/Documents/python/modules/probability/*.py ./modules/probability
cp /home/user/Dropbox/data/outputs/obj-meta-vars/*.obj ./obj-meta-vars
tar -czf scripts-to-run-remotely.tar.gz entropy/* binary-files/* modules/* nltk_data/* obj-meta-vars/*
Complete
I am guessing there must be something wrong with the dependencies on the existing folders.
Thanks.
The problem is that your copying rules have this form:
$(ENTDIR)/mifunction.py: $(ENTDIR)
cp /home/user/Documents/python/$(ENTDIR)/mifunction.py ./$(ENTDIR)
Notice that the destination directory is a prerequisite. Make will consider the target out of date if the directory has a later timestamp than the target, and the OS updates the timestamp of the directory when a file is added to it (or removed). Since this makefile copies other files to that directory, this target will appear to be out of date the next time you run Make.
There is more than one way to solve this. The simplest is to change the prerequisite to an order-only prerequisite by adding a pipe ('|'):
$(ENTDIR)/mifunction.py: | $(ENTDIR)
cp /home/user/Documents/python/$(ENTDIR)/mifunction.py ./$(ENTDIR)
Once you confirm that this works, you can work on other improvements. You might consider using the original files as prerequisites:
$(ENTDIR)/mifunction.py: /home/user/Documents/python/$(ENTDIR)/mifunction.py | $(ENTDIR)
cp /home/user/Documents/python/$(ENTDIR)/mifunction.py ./$(ENTDIR)
This looks ungainly until you introduce automatic variables:
$(ENTDIR)/mifunction.py: /home/user/Documents/python/$(ENTDIR)/mifunction.py | $(ENTDIR)
cp $< $#
Whether or not you do that, you can introduce another variable:
PYTHON_DIR := /home/user/Documents/python
which will remove a lot of redundancy.
Further improvements are possible, but that's probably enough for now.
Trying to answer the avoid-repetition part of the question: first, use automatic variables to refer to targets or prerequisites inside rules, e.g.
$(WEST)/west.obj: $(WEST)
cp /home/user/Documents/python/$# $<
The recipe should then look identical in quite a few places, which enables the next change: define the identical recipe as a variable:
COPY = cp /home/user/Documents/python/$@ $<
$(WEST)/west.obj: $(WEST)
$(COPY)
Next, use variables for your paths, e.g.
PYTHON_SOURCE_PATH = /home/user/Documents/python/
and use this variable in all your rules that need it (or the COPY variable as shown before). You should be able to change this path by only editing this particular line where the variable is set up. Next, collect your directories that are possibly created in a variable, too. Then, a lot of rules can be replaced by a single one:
DIRECTORIES = $(NORTH) $(SOUTH) # ... the others
$(DIRECTORIES):
mkdir -p $@
Running this on OS X...
cd ${BUILD_DIR}/mydir && for DIR in $(find ./ '.*[^_].py' | sed 's/\/\//\//g' | awk -F "/" '{print $2}' | sort |uniq | grep -v .py); do
if [ -f $i/requirements.txt ]; then
pip install -r $i/requirements.txt -t $i/
fi
cd ${DIR} && zip -r ${DIR}.zip * > /dev/null && mv ${DIR}.zip ../../ && cd ../
done
cd ../
error:
(env) ➜ sh package_lambdas.sh find: .*[^_].py: No such file or directory
why?
find takes as arguments a list of directories to search. You provided what appears to be a regular expression. Because there is no directory named (literally) .*[^_].py, find returns an error.
Below I have revised your script to correct that mistake (if I understand your intention correctly). Because I see so many ill-written shell scripts these days, I've taken the liberty of "traditionalizing" it. Please see if you don't also find it more readable.
Changes:
use #!/bin/sh, guaranteed to be on a Unix-like system. Faster than bash, unless (as on OS X) it is bash.
use lower case for variable names to distinguish from system variables (and not hide them).
eschew braces for variables (${var}); they're not needed in the simple case
do not pipe output to /usr/bin/true; route it to /dev/null if that's what you mean
rm -f by definition cannot fail; if you meant || true, it's superfluous
put then and do on separate lines, easier to read, and that's how the Bourne shell language was meant to be used
Let && and || serve as line-continuation, so you can see what's happening step by step
Other changes I would suggest:
Use a subshell when changing the working directory temporarily. When it terminates, the working directory is restored automatically (retained by the parent), saving you the cd .. step, and errors.
Use set -e to cause the script to terminate on error. For expected errors, use || true explicitly.
Change grep .py to grep '\.py$', just for good measure.
To avoid Tilting Matchstick Syndrome, use something other than / as a sed substitute delimiter, e.g., sed 's://:/:g'. But sed could be avoided altogether with awk -F '/+' '{print $2}'.
Revised version:
#! /bin/sh
src_dir=lambdas
build_dir=bin
mkdir -p $build_dir/lambdas
rm -rf $build_dir/*.zip
cp -r $src_dir/* $build_dir/lambdas
#
# The sed is a bit complicated to be osx / linux cross compatible :
# (.//run.sh vs ./run.sh)
#
cd $build_dir/lambdas &&
for L in $(find . -exec grep -l '.*[^_].py' {} + |
sed 's/\/\//\//g' |
awk -F "/" '{print $2}' |
sort |
uniq |
grep -v .py)
do
if [ -f $L/requirements.txt ]
then
echo "Installing requirements"
pip install -r $L/requirements.txt -t $L/
fi
cd $L &&
zip -r $L.zip * > /dev/null &&
mv $L.zip ../../ &&
cd ../
done
cd ../
The find(1) manpage says its args are [path ...] [expression], where "expression" consists of "primaries" and "operands" (-flags). '.*[^_].py' doesn't look like any expression, so it's being interpreted as a path, and find is reporting that there is no file named '.*[^_].py' in the working directory.
Perhaps you meant:
find ./ -regex '.*[^_].py'
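For comparison, a rough Python equivalent of that find invocation (assuming the intent is "paths ending in .py whose name doesn't end with an underscore before the extension"; adjust the regex to taste):

import os
import re

# Same idea as: find ./ -regex '.*[^_]\.py'
pattern = re.compile(r'.*[^_]\.py$')

for root, dirs, files in os.walk('.'):
    for name in files:
        path = os.path.join(root, name)
        if pattern.match(path):
            print(path)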
I tried to change the default behaviour of cd with virtualenvwrapper via the instructions here: http://virtualenvwrapper.readthedocs.org/en/latest/tips.html#changing-the-default-behavior-of-cd
placing the code in the postactivate and postdeactivate files in my .virtualenvs folder.
postactivate:
#!/bin/bash
# This hook is sourced after every virtualenv is activated.
cd () {
if (( $# == 0 ))
then
builtin cd $VIRTUAL_ENV
else
builtin cd "$#"
fi
}
cd
postdeactivate:
#!/bin/bash
# This hook is sourced after every virtualenv is deactivated.
cd () {
builtin cd "$#"
}
However, it doesn't seem to work properly, and now when I use workon project it doesn't automatically cd to the project folder listed in the .project file (which can be created with the mkproject command).
(Note, if relevant: I'm using zsh & Prezto.)
The recipe you posted isn't supposed to do what you're expecting. What it actually does is make cd navigate to the virtualenv root instead of your home folder whenever you type it without a path.
I'd recommend setting up virtualenvwrapper projects so that you can separate your codebase from the virtualenv (use a requirements file for portability!), i.e. add this to your shell startup file:
PROJECT_HOME='path/to/your/projects/folder'
So that mkproject will create a path/to/your/projects/folder/[PROJECT_NAME] folder for you and workon will automatically cd into it.
However, if you don't want to use projects, you should change your postactivate script like this in order to achieve what you want:
cd $VIRTUAL_ENV
Rope is a refactoring library for Python and RopeVim is a Vim plugin which calls into Rope.
The idea of using RopeVim seems great to me. Is there any documentation on "getting started" with RopeVim?
I've followed what documentation there is: https://bitbucket.org/agr/ropevim/src/tip/README.txt
I suppose I'm looking for either:
- a "look at this blog post / article / link" pointer that makes it all make sense, or
- an alternate recommendation, like "forget about RopeVim, it doesn't work very well" or "use this instead of ropevim".
For basic renaming, hover your vim cursor over the variable/method/etc that you wish to rename and then type:
:RopeRename <enter>
From there it should be self-explanatory. It asks for the root path to the project you wish to do the renaming in. Then it asks you for the new name. Then you can preview/confirm changes.
If you have tab-completion set up in your Vim command area, you can look through the other rope features by typing:
:Rope<Tab>
The documentation you found only shows the Vim particulars. If you want to see what those rope functions can do, see the rope documentation. Note that it's incomplete and points to the unit tests for a full overview of what rope can do.
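For a taste of what the underlying library does outside Vim, here is a minimal sketch using rope's Python API (the project path, file name and identifiers are placeholders, not from the original post):

from rope.base.project import Project
from rope.refactor.rename import Rename

# Open (or create) a rope project rooted at the source tree.
project = Project('src')

# The module that contains the name we want to rename (placeholder name).
resource = project.get_resource('mymodule.py')

# rope identifies the name by its character offset in the file.
offset = resource.read().index('old_name')

# Compute the rename across the whole project, then apply it.
changes = Rename(project, resource, offset).get_changes('new_name')
project.do(changes)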
I use this script, and it is the best way to automate the whole process:
https://gist.github.com/15067
#!/bin/bash
# Plant rope vim's plugin
# This is a script to install or update 'ropevim'
# Copyright Alexander Artemenko, 2008
# Contact me at svetlyak.40wt at gmail com
function create_dirs
{
mkdir -p src
mkdir -p pylibs
}
function check_vim
{
if vim --version | grep '\-python' > /dev/null
then
echo Your vim does not support python plugins.
echo Please, install vim with python support.
echo On debian or ubuntu you can do this:
echo " sudo apt-get install vim-python"
exit 1
fi
}
function get_or_update
{
if [ -e $1 ]
then
cd $1
echo Pulling updates from $2
hg pull > /dev/null
cd ..
else
echo Cloning $2
hg clone $2 $1 > /dev/null
fi
}
function pull_sources
{
cd src
get_or_update rope http://bitbucket.org/agr/rope
get_or_update ropevim http://bitbucket.org/agr/ropevim
get_or_update ropemode http://bitbucket.org/agr/ropemode
cd ../pylibs
ln -f -s ../src/rope/rope
ln -f -s ../src/ropemode/ropemode
ln -f -s ../src/ropevim/ropevim.py
cd ..
}
function gen_vim_config
{
echo "let \$PYTHONPATH .= \":`pwd`/pylibs\"" > rope.vim
echo "source `pwd`/src/ropevim/ropevim.vim" >> rope.vim
echo "Now, just add \"source `pwd`/rope.vim\" to your .vimrc"
}
check_vim
create_dirs
pull_sources
gen_vim_config
If you can live without vim, use Eric, which has rope support.