Distributing a Python script to unpack .tar.xz - python

Is there a way to distribute a Python script that can unpack a .tar.xz file?
Specifically:
This needs to run on other people's machines, not mine, so I can't require any extra modules to have been installed.
I can get away with assuming the presence of Python 2.7, but not 3.x.
So that seems to amount to asking whether out-of-the-box Python 2.7 has such a feature, and as far as I can tell the answer is no, but is there anything I'm missing?

First decompress the xz stream into tar data, then extract the tar data:
import lzma
import tarfile
with lzma.open("file.tar.xz") as fd:
    with tarfile.open(fileobj=fd) as tar:
        tar.extractall('/path/to/extract/to')
The lzma module has only been part of the standard library since Python 3.3. For Python 2.7 you need to install a third-party module such as pylzma with pip.
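If the script has to run on both Python 2.7 and newer interpreters without extra modules, one possible workaround is to fall back to the system tar command when lzma cannot be imported. This is only a sketch: the helper name extract_tar_xz is made up for illustration, and the fallback assumes a tar binary with xz support (the -J flag) is on the PATH, which is common on Linux but is not something Python guarantees.

```python
import os
import subprocess
import tarfile


def extract_tar_xz(archive_path, dest_dir):
    """Extract a .tar.xz archive into dest_dir (hypothetical helper)."""
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    try:
        import lzma  # noqa: F401  (standard library from Python 3.3 onward)
    except ImportError:
        # Assumption: a system tar with xz support is available on the PATH.
        subprocess.check_call(["tar", "-xJf", archive_path, "-C", dest_dir])
        return
    # With lzma available, tarfile can read xz-compressed archives directly.
    with tarfile.open(archive_path, "r:xz") as tar:
        tar.extractall(dest_dir)
```

On machines where neither lzma nor a suitable tar exists, this still fails, so it narrows the problem rather than removing it.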

Related

Trying to read JSON file within a Python package

I am in the process of packaging up a python package that I'll refer to as MyPackage.
The package structure is:
MyPackage/
script.py
data.json
The data.json file comprises cached data that is read in script.py.
I have figured out how to include data files (setting include_package_data=True in setuptools and listing the path to the data file in MANIFEST.in). But when I pip install this package (currently testing by installing from the GitHub repository) and import the installed MyPackage, the script that uses MyPackage raises a FileNotFoundError for data.json. However, I can see that data.json is indeed installed in Lib/site-packages/MyPackage.
Am I doing something wrong here by trying to read in a json file in a package?
Note that in script.py I am attempting to read data.json as open('data.json', 'r')
Am I screwing up something regarding the path to the data file?
You're not screwing something up, accessing package resources is just a little tricky - largely because they can be packaged in formats where your .json might strictly speaking not exist as an actual file on the system where your package is installed (e.g. as zip-app). The right way to access your data file is not by specifying a path to it (like "MyPackage/data.json"), but by accessing it as a resource of your installed package (like "MyPackage.data.json"). The distinction might seem pedantic, but it can matter a lot.
Anyway, the access should be done using the standard-library importlib.resources module:
import importlib.resources
import json
with importlib.resources.open_text("MyPackage", "data.json") as file:
    data = json.load(file)
# you should be able to access 'data' like a dictionary here
If you happen to work on a Python version lower than 3.7, you will have to install it as importlib_resources from PyPI.
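On Python 3.9 and later, importlib.resources also provides a files() API that does the same job. The sketch below shows that variant; the helper name load_package_json is hypothetical, and it assumes the resource is UTF-8 encoded JSON.

```python
import json
from importlib.resources import files  # available from Python 3.9


def load_package_json(package, resource_name):
    """Read a JSON resource bundled inside an importable package."""
    resource = files(package).joinpath(resource_name)
    return json.loads(resource.read_text(encoding="utf-8"))
```

For the package in the question, the call would look like load_package_json("MyPackage", "data.json"), and it does not depend on the current working directory.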
I resolved the issue by getting the 'relative path' to where the package is.
self.data = self.load_data(path=os.path.join(
    os.path.dirname(os.path.abspath(__file__)),
    'data.json'))
load_data just reads the data file
Any constructive criticism is still very much welcome. Not trying to write stupid code if I can't help it :)

Mimic 7zip with python

I am using Python 3.6, and currently I subprocess out to my 7zip program to get the compression I need.
subprocess.call('7z a -t7z -ms=off {0} *'.format(filename))
I know the zipfile module has ‘ZIP_LZMA’ compression, but the application I am passing this to says the output file isn’t correct. So what else do I have to add to the ZipFile class to make it mimic the above command?
If you do not care much for Windows, then perhaps libarchive could help. In Ubuntu, for example:
$ sudo apt install python3-libarchive-c
Then:
import libarchive
with libarchive.file_writer('test.7z', '7zip') as archive:
    archive.add_files('first.file', 'second.file', 'third.file')
Then there is the pylib7zip library, which wraps the existing 7z.dll and seems to offer a Windows-only alternative.
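A likely reason the receiving application rejects ZipFile output is that ZIP_LZMA only changes the compression method inside a ZIP container, whereas 7z a -t7z writes a different container format. The sketch below checks the leading signature bytes to illustrate the difference; the helper name container_kind is made up for illustration.

```python
import io
import zipfile

# Leading signature bytes of the two container formats.
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"
ZIP_MAGIC = b"PK\x03\x04"


def container_kind(data):
    """Classify raw archive bytes by their leading signature."""
    if data.startswith(SEVEN_ZIP_MAGIC):
        return "7z"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


# Build an LZMA-compressed archive in memory with the zipfile module.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_LZMA) as archive:
    archive.writestr("hello.txt", "hello")

print(container_kind(buffer.getvalue()))  # prints "zip", not "7z"
```

So no ZipFile option will produce a .7z file; a 7z-aware library or the 7z program itself is needed for that.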

Python ZipFile module extracts password protected zips slowly

I am trying to write a Python script that should extract a zip file:
Board: Beagle-Bone black ~ 1GHz Arm-Cortex-a8, debian wheezy
Zipfile: /home/milo/my.zip, ~ 8 MB
>>> from zipfile import ZipFile
>>> zip = ZipFile("/home/milo/my.zip")
>>> zip.extractall(pwd="tst")
Other approaches, such as opening, reading, and writing the zip file, or extracting just one particular file, have the same effect: extracting takes about 3-4 minutes.
Extracting the same file with the unzip tool takes less than 2 seconds.
Does anyone know what is wrong with my code, or even with the Python zipfile lib?
Thanks
Ajava
This seems to be a documented issue with the ZipFile module in Python 2.7. If you look at the documentation for ZipFile, it clearly mentions:
Decryption is extremely slow as it is implemented in native Python
rather than C.
If you need faster performance, you can either invoke an external program (like unzip or 7zip) from your code, or make sure the zip files you are working with are not password protected.
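Invoking an external program could look roughly like the following sketch, which assumes the Info-ZIP unzip tool is installed and on the PATH. The two helper names are hypothetical; the flags used are unzip's -o (overwrite without prompting), -q (quiet), -P (password), and -d (target directory).

```python
import subprocess


def build_unzip_command(archive_path, dest_dir, password):
    """Build the argument list for Info-ZIP's unzip (hypothetical helper)."""
    # -o: overwrite without prompting, -q: quiet,
    # -P: password, -d: directory to extract into.
    return ["unzip", "-o", "-q", "-P", password, archive_path, "-d", dest_dir]


def extract_with_unzip(archive_path, dest_dir, password):
    """Run unzip and raise CalledProcessError if it exits with an error."""
    subprocess.check_call(build_unzip_command(archive_path, dest_dir, password))
```

Note that a password passed on the command line may be visible to other users on the same machine, so this trades some security for speed.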
Copy from my answer https://stackoverflow.com/a/72513075/10860732
It's quite stupid that Python doesn't implement zip decryption in C.
So I implemented it in Cython, which is 17 times faster.
Just get the dezip.pyx and setup.py from this gist.
https://gist.github.com/zylo117/cb2794c84b459eba301df7b82ddbc1ec
Then install Cython and build the extension module:
pip3 install cython
python3 setup.py build_ext --inplace
Then run the original script with two more lines.
import zipfile
# add these two lines
from dezip import _ZipDecrypter_C
setattr(zipfile, '_ZipDecrypter', _ZipDecrypter_C)
z = zipfile.ZipFile('./test.zip', 'r')
z.extractall('/tmp/123', None, b'password')

Is there a faster method to load a yaml file than the standard .load method? Django/Python

I am loading a big YAML file and it is taking forever. I am wondering if there is a faster method than the yaml.load() method.
I have read that there is a CLoader method but haven't been able to run it.
The website that suggested this CLoader method asks me to do this:
Download the source package PyYAML-3.08.tar.gz and unpack it.
Go to the directory PyYAML-3.08 and run:
$ python setup.py install
If you want to use LibYAML bindings, which are much faster than the pure Python version, you need to download and install LibYAML.
Then you may build and install the bindings by executing
$ python setup.py --with-libyaml install
In order to use LibYAML based parser and emitter, use the classes CParser and CEmitter:
from yaml import load, dump
try:
    from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
    from yaml import Loader, Dumper
This looks like it will work, but I don't have a setup.py file anywhere in my Django project and therefore can't install or import any of these things.
Can anyone help me figure out how to do this, or let me know about another faster loading method?
Thanks for the help!
I have no idea what's faster - bspymaster's ideas might be the most useful.
When you download PyYAML-3.08.tar.gz, the archive contains a setup.py that you can run.
Note that to use LibYAML, you need to download this: http://pyyaml.org/download/libyaml/yaml-0.1.4.tar.gz
And build it using the instructions from http://pyyaml.org/wiki/LibYAML
You will need a set of build tools, which should be installed on Linux/Unix; on OS X make sure Xcode is installed, and I'm not sure about Windows.
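Once PyYAML is installed with the LibYAML bindings, selecting the faster loader might look like the sketch below. It assumes PyYAML is importable, and it swaps in the safe loader variants (CSafeLoader and SafeLoader) instead of the CLoader and Loader named in the quoted instructions, since those avoid constructing arbitrary Python objects. The helper name load_yaml_text is hypothetical.

```python
import yaml

try:
    # Present only when PyYAML was built against LibYAML.
    from yaml import CSafeLoader as SafeLoader
except ImportError:
    # Pure-Python fallback: same results, just slower.
    from yaml import SafeLoader


def load_yaml_text(text):
    """Parse YAML text, using the C-accelerated loader when available."""
    return yaml.load(text, Loader=SafeLoader)
```

If the import falls back to the pure-Python loader, the code still works but gains no speed, so it is worth checking which branch was taken when measuring.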

Including a Python Library (suds) in a portable way

I'm using suds (brilliant library, btw), and I'd like to make it portable (so that everyone who uses the code that relies on it, can just checkout the files and run it).
I have tracked down 'suds-0.4-py2.6.egg' (in python/lib/site-packages), and put it in with my files, and I've tried:
import path.to.egg.file.suds
from path.to.egg.file.suds import *
import path.to.egg.file.suds-0.4-py2.6
The first two complain that suds doesn't exist, and the last one has invalid syntax.
In the __init__.py file, I have:
__all__ = ["FileOne",
           "FileTwo",
           "suds-0.4-py2.6"]
and have previously tried
__all__ = ["FileOne",
           "FileTwo",
           "suds"]
but neither works.
Is this the right way of going about it? If so, how can I get my imports to work. If not, how else can I achieve the same result?
Thanks
You must add your egg file to sys.path, like this:
import sys
# insert at 0 instead of appending to end to take precedence
# over system-installed suds (if there is one).
sys.path.insert(0, "suds-0.4-py2.6.egg")
import suds
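One caveat with a bare relative path is that it is resolved against the current working directory, not against the location of your code. A minimal sketch that builds the path from a base directory instead is shown here; the helper name add_bundled_archive is made up for illustration.

```python
import os
import sys


def add_bundled_archive(archive_name, base_dir):
    """Put a bundled .egg/.zip at the front of sys.path (hypothetical helper)."""
    archive_path = os.path.join(os.path.abspath(base_dir), archive_name)
    # Insert at position 0 so the bundled copy wins over an installed one.
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    return archive_path
```

In the question's setup this would be called as add_bundled_archive("suds-0.4-py2.6.egg", os.path.dirname(__file__)) before import suds, assuming the egg sits next to the calling module.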
.egg files are zipped archives, which is why you cannot import them directly by path, as you have discovered.
The easy way is to simply unzip the archive and copy the suds directory into your application's source code directory. Since Python stops at the first matching module it finds on its search path, your local copy of suds will be used even if suds is not installed globally for Python.
One step up from that, is to add the egg to your path by appending it to sys.path.
However, the proper way would be to package your application for distribution; or provide a requirements file that lets other people know what external packages your program depends on.
Usually I distribute my program with a requirements.txt file that contains all dependencies and their versions.
The users can then install these libraries with:
pip install -r requirements.txt
I don't think including eggs with your code is a good idea: what if the user runs Python 2.7 instead of Python 2.6?
More info about requirement file: http://www.pip-installer.org/en/latest/requirements.html
