I'm working on a project that imports several packages, and when the script runs, it loads a neural net model.
I want to know if the following is achievable:
If I run the script in another Python environment, I need to install all the packages I'm importing. Is it possible to avoid this? It would remove the need to install all the packages the first time.
Is it possible to embed the neural net .pb into the code? Keep in mind that it weighs 80 MB, so a hex dump doesn't work (a text file with the dump weighs 700 MB).
The idea is to have a single .py file with everything necessary inside it. Is that possible?
Thank you!
If I run the script in another Python environment, I need to install all the packages I'm importing. Is it possible to avoid this?
Well, not really, but kinda (TL;DR: no, but it depends on exactly what you mean). It really just boils down to a limitation of the environment. Somewhere, someplace, you need the packages where you can grab them from disk -- it's as simple as that. They have to be available and locatable.
By available, I mean accessible by means of the filesystem. By locatable, I mean there has to be somewhere you are looking. A system install would place packages somewhere accessible, somewhere that can be reliably used as a place to install, and look for, packages. This is part of the responsibility of your virtual environment; the only difference is that your virtual environment is there to separate you from your system Python's packages.
The advantage of this is straightforward: I can create a virtual environment that uses slamjam==1.2.3, where 1.2.3 is a specific version of the package slamjam, and also run a program that uses slamjam==1.7.9, without causing a conflict in my global environment.
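For example, here's a minimal sketch of creating such an isolated environment with the standard library's venv module (the directory name "env" is just a placeholder; the usual route is simply python -m venv env from a shell):

import venv

# Create an isolated environment in ./env, with pip available inside it.
# Packages installed there won't touch the system Python's site-packages.
venv.create("env", with_pip=True)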
So here's why I give the "kinda" vibe: if your user already has a package on their system, then your user needs to install nothing. They don't need a virtual environment for that package if it's already globally installed on their system. Likewise, they don't need a new one if it's in another virtual environment, although it is a great idea to separate your project's dependencies with one.
Is it possible to embed the neural net .pb into the code? Keep in mind that it weighs 80 MB, so a hex dump doesn't work (a text file with the dump weighs 700 MB).
So, yeah, actually it's extremely doable. The thing is, it depends on what you mean.
As you are aware, a hex dump of the file takes a lot of space. That's very true. But it seems you are talking about raw hex, which takes at least two bytes of text for every byte of input. On top of that, you might be dumping extra information (offsets, ASCII columns) if you used a tool like hexdump, yada, yada, yada.
Moral of the story: you're going to waste a lot of space doing that. So I'll give you a couple of options, and you can choose one or more.
Compress your data even more, if possible.
I haven't worked with TensorFlow data, but after a quick read, it appears TensorFlow serializes models with Protocol Buffers, which are already a fairly compact binary format, so there may not be much left to squeeze. Still, go ahead and see if you can get any more juice out of the fruit.
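A quick way to find out is the standard library's gzip module; this is just a sketch, and "model.pb" is a placeholder path for your frozen graph:

import gzip

# Read the raw model bytes and see how much gzip saves, if anything.
with open("model.pb", "rb") as f:
    raw = f.read()

compressed = gzip.compress(raw, compresslevel=9)
print(len(raw), "->", len(compressed))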
Take binary data, and dump it into a different encoding (hint, hint: base64!)
Watch what happens when we convert something to hex...
>>> binary_data=b'this is a readable string, but really it just boils down to binary information. i can be expressed in a more efficient way than a binary string or hex, however'
>>> hex_data = binary_data.hex()
>>> print(hex_data)
746869732069732061207265616461626c6520737472696e672c20627574207265616c6c79206974206a75737420626f696c7320646f776e20746f2062696e61727920696e666f726d6174696f6e2e20692063616e2062652065787072657373656420696e2061206d6f726520656666696369656e7420776179207468616e20612062696e61727920737472696e67206f72206865782c20686f7765766572
>>> print(len(hex_data))
318
318 characters? We can do better.
>>> import base64
>>> b64_data = base64.b64encode(binary_data)
>>> print(b64_data)
b'dGhpcyBpcyBhIHJlYWRhYmxlIHN0cmluZywgYnV0IHJlYWxseSBpdCBqdXN0IGJvaWxzIGRvd24gdG8gYmluYXJ5IGluZm9ybWF0aW9uLiBpIGNhbiBiZSBleHByZXNzZWQgaW4gYSBtb3JlIGVmZmljaWVudCB3YXkgdGhhbiBhIGJpbmFyeSBzdHJpbmcgb3IgaGV4LCBob3dldmVy'
>>> print(len(b64_data))
212
You've now made your data about 33% smaller than the hex version (212 vs. 318 characters). Base64 is still roughly 33% larger than the raw binary, but that's the price of a text-safe encoding.
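To actually embed the model, a small build step could paste the base64 string into your .py, and the script could decode it back at runtime. A rough sketch, where MODEL_B64 is a hypothetical placeholder for the (very long) generated string:

import base64

# Hypothetical placeholder -- in practice this would be the ~107 MB
# base64 string generated from your 80 MB .pb file.
MODEL_B64 = "dGhpcyBpcyBub3QgYSByZWFsIG1vZGVs"

model_bytes = base64.b64decode(MODEL_B64)

# Write it back out so a loader that expects a file path can use it.
with open("model.pb", "wb") as f:
    f.write(model_bytes)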
Package a non-Python file with your .whl distribution. Yeah, totally doable. Have I done it before? Nope, never needed to yet. Will I ever? Yep. Do I have great advice on how to do it? No. But the setuptools documentation on including data files covers it; it's totally doable.
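A minimal sketch of what that might look like, where my_app and model.pb are placeholder names:

# setup.py
from setuptools import setup, find_packages

setup(
    name="my_app",
    version="0.1.0",
    packages=find_packages(),
    # Ship the model file inside the wheel, next to the Python code.
    package_data={"my_app": ["model.pb"]},
)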
You can download the file from within the application and only provide the URL. Something quick and easy, like:
import wget
file_name = wget.download('some.site.com/a_file')  # saves to disk, returns the filename
Yeah, sure, there are other libraries like requests which do similar things, but for the example I chose wget because it also has a simple interface, and it's always an option.
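If you'd rather keep the download in memory instead of on disk, a requests version might look like this (the URL is a placeholder):

import requests

resp = requests.get('https://some.site.com/a_file', timeout=60)
resp.raise_for_status()
file_contents_in_memory = resp.content  # the raw bytes, never written to disk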
The idea is to have a single .py file with everything necessary inside it. Is that possible?
Well, fine, yeah. For what you're asking -- a .py file, with nothing else, that will install your packages? If you really want to copy and paste library after library and all the data into one massive file nobody will download, I'm sure there's a way.
Let's look at a more supported approach to what you're asking: a .whl file is a single file, and it carries an internal list of the packages it depends on; installing the wheel handles everything for you (installing dependencies, unpacking, etc.). I'd look in that direction.
Anyway, that's a lot of information, I know, but there's some logic to why you can or can't do something. Hope it helped, and best of luck to you.
Related
I have very little experience with (i.e. don't really know) Python, but I'm currently trying to produce a Mac app bundle for a package (and ideally, after that, a script to make it reproducible).
I've tried several approaches but made the most progress by combining py2app, gtk-mac-bundler and good old shell scripting. Still, this produces errors due to missing distributions (I realized yesterday it's caused by missing egg/dist-info directories, which are not supported and have been on the py2app feature request list since forever).
Of course, I could just copy all these files in a dumb way, using glob patterns in my shell but I fear that would go against reproducible builds in the future. So, is there any way by which I can get a reference to the files on disk, taking a package name or similar as an argument?
These libraries are able to read metadata from installed distributions (and more):
https://docs.python.org/3/library/importlib.metadata.html
https://setuptools.readthedocs.io/en/stable/pkg_resources.html
https://distlib.readthedocs.io/en/stable/index.html
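Of these, importlib.metadata is in the standard library (Python 3.8+) and can answer the question directly; here's a rough sketch, using "requests" as a stand-in for whatever distribution you're bundling:

from importlib import metadata

# files() returns the files recorded for an installed distribution
# (or None if that metadata is missing).
for path in metadata.files("requests") or []:
    print(path.locate())  # absolute path on disk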
I wrote 2-3 plugins for pyLoad.
Sometimes they change, and I let users know on the forum that there's a new version.
To avoid that, I'd like to give my scripts an auto self-update function.
https://github.com/Gutz-Pilz/pyLoad-stuff/blob/master/FileBot.py
Is something like that easy to set up?
Or can someone point me in a direction?
Thanks in advance!
It is possible, with some caveats. But it can easily become very complicated. Before you know it, your auto-update "feature" will be bigger than the original code!
First you need to have a URL that always contains the latest version. Since you are using GitHub, raw.githubusercontent.com might do very well.
Have your code download the latest version from that URL (e.g. using requests), and compare the version with that in the current code. For this purpose I would recommend a simple integer version number, so you don't need any complicated parsing logic.
However, you might want to consider only running that check once per day, or once per week. If you do it every time your file is run, the server might get hammered! So now you have to save a file with the date when the check was last done, and read that to see if it is time to run the check again. This file will need to be saved in a location that you can access on every platform your code is liable to run on. That in itself can be a challenge.
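Here's a rough sketch of both ideas together; every name in it (URL, file names, the integer version) is a placeholder to adapt:

import json
import time
import requests

CURRENT_VERSION = 3  # simple integer version, bumped on each release
VERSION_URL = "https://raw.githubusercontent.com/you/repo/master/VERSION"
STAMP_FILE = "last_update_check.json"
CHECK_INTERVAL = 24 * 60 * 60  # at most once per day

def due_for_check():
    try:
        with open(STAMP_FILE) as f:
            last = json.load(f)["last_check"]
    except (OSError, ValueError, KeyError):
        return True  # never checked, or stamp file is unreadable
    return time.time() - last > CHECK_INTERVAL

def check_for_update():
    if not due_for_check():
        return
    latest = int(requests.get(VERSION_URL, timeout=10).text.strip())
    with open(STAMP_FILE, "w") as f:
        json.dump({"last_check": time.time()}, f)
    if latest > CURRENT_VERSION:
        print("Version %d is available (you have %d)." % (latest, CURRENT_VERSION))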
If it is just a single python file, which is installed as the user that is running it, updating is relatively easy. But if the original was installed as root in the global Python directory and your script is running as a nonprivileged user it will be difficult. Especially if it is running as a plugin and cannot ask the user for (temporary) root credentials to install the file.
And what are you going to do if a newer version has more dependencies outside the standard library?
Last but not least, as a sysadmin I don't really like auto-updating software. Especially for critical system infrastructure, I like to be able to estimate the consequences before an update.
I've been looking around here but I haven't found what I was searching for, so I hope it isn't already answered somewhere. If it is, I'll delete my question.
I was wondering if Sublime Text can suggest functions from a module when you write "module.function". For example, if I write "import PyQt4", then Sublime Text suggests "PyQt4.QtCore" when I write "PyQt4.Q".
For now, I've installed "SublimeCodeIntel", and it does just that, but only for some modules (like math or urllib). Is it possible to configure it for any module? Or can you recommend any other plugin?
Thanks for reading!
PS: also, would it be possible to configure it for my own modules? I mean, for example, modules that I have written that are in the same folder as the file I'm editing.
SublimeCodeIntel will work for any module, as long as it's indexed. After you first install the plugin, indexing can take a while, depending on the number and size of third-party modules you have in site-packages. If you're on Linux and have multiple site-packages locations, make sure you define them all in the settings. I'd also recommend changing "codeintel_max_recursive_dir_depth" to 25, especially if you're on OS X, as the default value of 10 may not reach all the way into deep directory trees.
Make sure you read through all the settings, and modify them to suit your needs. The README also contains some valuable information for troubleshooting, so if the indexing still isn't working after a while, and after restarting Sublime a few times, you may want to delete the database and start off fresh.
I'm working on an Inno Setup installer for a Python application for Windows 7, and I have these requirements:
The app shouldn't write anything to the installation directory
It should be able to use .pyc files
The app shouldn't require a specific Python version, so I can't just add a set of .pyc files to the installer
Is there a recommended way of handling this? Like give the user a way to (re)generate the .pyc files? Or is the shorter startup time benefit from the .pyc files usually not worth worrying about?
.pyc files aren't guaranteed to be compatible across different Python versions. If you don't know that all your customers are running the same Python version, you really don't want to distribute .pyc files directly. So you have to choose between distributing .pycs and supporting multiple Python versions.
You could create a build process that compiles all your files using py_compile and zips them up into a version-specific package. You can do this with setuptools; however, it will be awkward, because you'll have to run py_compile under every Python version you need to support.
If you are basically distributing a closed application and don't want people to have trivial access to your source code, then py2exe is probably a simpler alternative. If your code is supposed to be integrated into the user's Python install, then it's probably simpler to just create a zip of your .py files and add a tiny .py stub that puts the zip on sys.path and imports the zipped package(s) (Python's zipimport machinery handles the rest).
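A sketch of such a stub, where my_app.zip and my_app are placeholder names:

# stub.py -- ships next to my_app.zip
import sys

# Putting a zip archive on sys.path lets Python import packages
# directly out of it, no unpacking needed.
sys.path.insert(0, "my_app.zip")

import my_app
my_app.main()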
If it makes you feel better, .pyc files don't provide much extra security, and they don't really boost performance much either :)
If you haven't read PEP 3147, that will probably answer your questions.
I don't mean the solution described in that PEP and implemented as of Python 3.2. That's great if your "multiple Python versions" just means "3.2, 3.3, and probably future 3.x". Or even if it means "2.6+ and 3.1+, but I only really care about 3.2 and 3.3, so if I don't get the pyc speedups for other ones that's OK".
But when I asked your supported versions, you said, "2.7", which means you can't rely on PEP 3147 to solve your problems.
Fortunately, the PEP is full of discussion of earlier attempts to solve the problem, and the pitfalls of each, and there should be more than enough there to figure out what the options are and how to implement them.
The one problem is that the PEP is very linux-centric—mainly because it's primarily linux distros that tried to solve the problem in the past. (Apple also did so, but their solution was (a) pretty much working, and (b) tightly coupled with the whole Mac-specific "framework" thing, so they were mostly ignored…)
So, it largely leaves open the question of "Where should I put the .pyc files on Windows?"
The best choice is probably an app-specific directory under the user's local application data directory. See Known Folders if you can require Vista or later, CSIDL if you can't. Either way, you're looking for the FOLDERID_LocalAppData or CSIDL_LOCAL_APPDATA, which is:
The file system directory that serves as a data repository for local (nonroaming) applications. A typical path is C:\Documents and Settings\username\Local Settings\Application Data.
The point is that it's a place for applications to store data that's separate for each user (and inside that user's profile directory), and also separate for each machine the user's roaming profile might end up on. That means you can safely put stuff there, knowing that the user has permission to write there without UAC getting involved, and knowing (as well as you ever can) that no other user or machine will interfere with what's there.
Within that directory, you create a directory for your program, and put whatever you want there, and as long as you picked a unique name (e.g., My Unique App Name or My Company Name\My App Name or a UUID), you're safe from accidental collision with other programs. (There used to be specific guidelines on this in MSDN, but I can no longer find them.)
So, how do you get to that directory?
The easiest way is to just use the env variable %LOCALAPPDATA%. If you need to deal with older Windows, you can use %USERPROFILE% and tack \Local Settings\Application Data onto the end, which is guaranteed to either be the same, or end up in the same place via junctions.
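In Python, that lookup might look like this (the app directory name is whatever unique name you picked):

import os

# Prefer %LOCALAPPDATA%; fall back to the pre-Vista location.
base = os.environ.get("LOCALAPPDATA") or os.path.join(
    os.environ["USERPROFILE"], "Local Settings", "Application Data")

app_dir = os.path.join(base, "My Unique App Name")
if not os.path.isdir(app_dir):
    os.makedirs(app_dir)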
You can also use pywin32 or ctypes to access the native Windows APIs (since there are at least 3 different APIs for this and at least two ways to access those APIs, I don't want to give all possible ways to write this… but a quick google or SO search for "pywin32 SHGetFolderPath" or "ctypes SHGetKnownFolderPath" or whatever should give you what you need).
Or, there are multiple third-party modules to handle this. The first one both Google and PyPI turned up was winshell.
Re-reading the original question, there's a much simpler answer that probably fits your requirements.
I don't know much about Inno, but most installers give you a way to run an arbitrary command as a post-copy step.
So, you can just use python -m compileall to create the .pyc files for you at install time—while you've still got elevated privileges, so there's no problem with UAC.
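For reference, the same thing is available programmatically via the compileall module; the path here is a placeholder for wherever Inno installed the app:

import compileall

# Byte-compile every .py under the install directory, just like
# `python -m compileall` does on the command line.
compileall.compile_dir(r"C:\Program Files\MyApp", quiet=1)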
In fact, if you look at pywin32, and various other Python packages that come as installer packages, they do exactly this. This is an idiomatic thing to do for installing libraries into the user's Python installation, so I don't see why it wouldn't be considered reasonable for installing an executable that uses the user's Python installation.
Of course if the user later decides to uninstall Python 2.6 and install 2.7, your .pyc files will be hosed… but from your description, it sounds like your entire program will be hosed anyway, and the recommended solution for the user would probably be to uninstall and reinstall anyway, right?
I've created an app using py2app, which works fine, but if I zip/unzip it, the newly unzipped version can't access standard python modules like traceback, or os. The manpage for zip claims that it preserves resource forks, and I've seen other applications packaged this way (I need to be able to put this in a .zip file). How do I fix this?
This is caused by building a semi-standalone version that contains symlinks to the natively installed files, and as you say, the links are lost when zipping/unzipping unless the "-y" option is used.
An alternate solution is to build for standalone instead, which puts (public domain) files inside the application and so survives zipping/unzipping etc. better. It also means the app is more resilient to changes in the underlying OS. The downside is that it is bigger, of course, and is more complicated to get it set up.
To build a stand alone version, you need to install the python.org version which can be repackaged.
An explanation of how to do this is here, but read the comments as there have been some changes since the blog post was written.
Use zip -y ... to create the file whilst preserving symlinks.
You probably need to give it your full PYTHONPATH.
Depends on your OS. Here's how to find out where the standard library lives:
>>> import os  # or any other standard module
>>> os.__file__