Problems when distributing a Python drag-and-drop file

I am writing a script for a project I am working on: you can drag and drop an orthomosaic from a drone survey onto the .py file, and it automatically calculates a number of vegetation indices. The idea is that this script will be distributed to the partners, so that all project members calculate the indices and statistics in the same way.
The problem is that I can only get it to run on my computer at home; on my own or my colleagues' work computers it does not work. On my colleague's computer it seems to struggle with the sys.argv[] arguments of the script, whereas on my work computer I get an error that a variable name is too long (or something like that; I am currently working from home and did not write down the exact error).
Since it works on my computer at home, I suspect it has to do with the packages that have to be installed (gdal, rasterio, geopandas, ...), the environment in which those packages are installed, or possibly multiple Python versions being installed on the work computers. I already reinstalled Python on my work computer, but that did not solve the problem.
I am sorry if my description is sparse and vague.
So my questions are: Do you have experience with distributing a drag-and-drop script? Is there a way to build an executable that can be run without the packages being installed?
If you are interested I can send you my scripts and some example files.
Thanks and best wishes,
Matthias
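(For anyone reading along, here is a minimal sketch of how the drag-and-drop entry point could read the dropped file path from sys.argv and fail loudly when something is off; calculate_indices is a hypothetical placeholder for the actual index calculations.)

    import sys
    from pathlib import Path

    def main():
        # When a file is dropped onto the .py file (or a shortcut to it), Windows
        # passes the file's full path as the first command-line argument.
        if len(sys.argv) < 2:
            print("Usage: drag an orthomosaic (GeoTIFF) onto this script,")
            print("or run:  python indices.py <orthomosaic.tif>")
        else:
            tif_path = Path(sys.argv[1])
            if not tif_path.is_file():
                print(f"File not found: {tif_path}")
            else:
                # calculate_indices(tif_path)  # hypothetical: the real index/statistics code
                print(f"Would process {tif_path}")
        input("Press Enter to close this window...")  # keep the console open so errors stay readable

    if __name__ == "__main__":
        main()

Keeping the console window open at the end makes it much easier to read the real error message on the work computers instead of reconstructing it from memory.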


Explain why Python virtual environments are “better”? [closed]

I have yet to come across an answer that makes me WANT to start using virtual environments. I understand how they work, but what I don't understand is this: someone like me can have hundreds of Python projects on their drive, almost all of which use the same packages (like Pandas and NumPy), but if they were all in separate venvs you'd have to pip install those same packages over and over again, wasting so much space for no reason. Not to mention if any of them also require a package like tensorflow.
The only real benefit I can see to using venvs in my case is to mitigate version issues, but for me that's really not as big of an issue as it's portrayed. Whenever a project of mine becomes out of date, I update its packages.
Why install the same dependency for every project when you can just do it once for all of them at a global scale? I know you can also pass --system-site-packages (or whatever the flag is called) when creating a new venv, but since ALL of my Python packages are installed globally (hundreds of dependencies are already pip installed), I don't want the new venv to make use of ALL of them. Can I specify only specific global packages to use in a venv? That would make more sense.
What else am I missing?
UPDATE
I’m going to elaborate and clarify my question a bit as there seems to be some confusion.
I’m not so much interested in understanding HOW venv’s work, and I understand the benefits that can come with using them. What I’m asking is:
Why would someone with (for example) 100 different projects that all require tensorflow install it into each project's own venv? That would mean installing tensorflow 100 separate times. That's not just a "little" extra space being wasted; that's a lot.
I understand they mitigate dependency versioning issues: you can "freeze" packages at their current working versions and forget about them, great. And maybe I'm just unique in this respect, but the versioning issue (besides the obvious difference between Python 2 and 3) really hasn't been THAT big of an issue. Yes, I've run into it, but isn't it better practice to keep your projects up to date with the current working/stable versions than to freeze them with old, possibly no-longer-supported versions? Sure it works, but that doesn't seem like the "best" option to me either.
To reiterate the second part of my question: if I have (for example) tensorflow installed globally and I create a venv for each of my 100 tensorflow projects, is there not a way to make use of the already globally installed tensorflow inside the venv, without having to install it again? I know that in PyCharm, and probably on the command line, you can use a --system-site-packages argument to make that happen, but I don't want to include ALL of the globally installed dependencies, because I have hundreds of those too. Is --system-site-packages tensorflow (for example) a thing?
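(For reference, the flag in question is all-or-nothing: a venv either sees the entire global site-packages or none of it. A minimal sketch using only the standard library, with an arbitrary directory name:)

    import venv

    # Creates ./tf_project_env whose site-packages falls back to the global
    # site-packages, so an already-installed tensorflow is importable without
    # reinstalling it. Equivalent to: python -m venv --system-site-packages tf_project_env
    # There is no per-package variant of this switch.
    venv.create("tf_project_env", system_site_packages=True, with_pip=True)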
Hope that helps clarify what I'm looking for out of this discussion, because so far I have no use for venvs other than everyone else claiming how great they are; I guess I see it a bit differently :P
(FINAL?) UPDATE
From the great discussions I've had with the contributors below, here is a summation of where I think venv's are of benefit and where they're not:
USE a venv:
You're working on one BIG project with multiple people to mitigate versioning issues among the people
You don't plan on updating your dependencies very often for all projects
To have a clearer separation of your projects
To containerize your project (again, for distribution)
Your portfolio of projects is fairly small (this matters especially in the data science world, where packages like Tensorflow are large and used across most projects, since you'd have to pip install the same package into each venv)
DO NOT use a venv:
Your portfolio of projects is large AND requires a lot of heavy dependencies (like tensorflow), to avoid installing the same package in every venv you create
You're not distributing your projects across a team of people
You're actively maintaining your projects and keeping global dependency versions up to date across all of them (maybe I'm the only one who's actually doing this, but whatever)
As was recently mentioned, I guess it depends on your use case. When working on a website that requires contributions from many people at once, it makes sense for everyone to work out of one environment; but for someone like me with a massive portfolio of Tensorflow projects, with no versioning issues and no other team members, it doesn't. Maybe if you plan on containerizing or distributing a project it makes sense to do so on an individual basis, but to have (going back to this example) 100 Tensorflow projects in your portfolio, it makes no sense to keep 100 different venvs, as you'd have to install tensorflow into each of them. That is no different from having to pip install tensorflow==2.2.0 for a specific old project that you want to run, in which case, just keep your projects up to date.
Maybe I'm missing something else major here, but that's the best I've come up with so far. Hope it helps someone else who's had a similar thought.
I'm a data scientist and sometimes I run into these things called "virtual environments" and I don't get what the use case is? I already have all of these packages and modules and widgets downloaded! Why should I set up a separate place where I manage all of the stuff I'm already managing globally?
Python is a very powerful tool. In this answer, consider two different ways to swing the metaphorical hammer:
Data Science
Software Engineering
For a data scientist (working alone) using Python to write a proof of concept for a research paper, build an LSTM neural network, or predict the price of TSLA from the frequency of Elon Musk's tweets, all that really matters is being able to use the best library (tensorflow, pytorch, sklearn, ...) for whatever task they're trying to get done, in whatever directory they're working in, whenever they need it. It is very tempting to use one global Python installation and just use the same stuff everywhere. Frankly, this is probably fine, as it's just one person managing their own space. So the configuration of their machine would be one single Python environment that everything, everywhere, uses. Or, if the data scientist wanted, they could have a single directory that contains a virtual environment and some subdirectories containing all the scripts (projects) they work on.
Now consider a software engineer who has multiple git repos with complete CI/CD pipelines, each of which builds into a separate artifact that then gets deployed to some cloud environment. They and the nine other people on their team need to be sure that they are all making changes that won't break any piece of the code. For example, in Python 3.7 the function dict.popitem subtly changed from returning an arbitrary element of the dict to guaranteed LIFO order. It's pretty easy to see how that could cause issues if Jerry had implemented a function that relies on the original arbitrary behaviour and Bob implemented a function that assumes the guaranteed LIFO behaviour. This team of engineers would have git repos that each contain a single virtual environment (a single isolated Python environment), which lets them manage dependencies per "project".
The data scientist has one Python installation/environment that allows them to do whatever.
The engineer has a Python installation and a bunch of environments so that they can work across multiple repos with multiple people and (hopefully) nothing breaks.
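To make the dict.popitem example concrete, here is a tiny illustration (the dictionary contents are arbitrary):

    d = {"first": 1, "second": 2, "third": 3}
    # Since Python 3.7, popitem() is guaranteed to remove the most recently
    # inserted pair (LIFO); older versions documented it as removing an
    # arbitrary pair, which is the behaviour Jerry's code relied on.
    print(d.popitem())  # ('third', 3) on Python 3.7+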
I can see where you're coming from with your question. It can seem like a lot of work to set up and maintain multiple virtual environments (venvs), especially when many of your projects might use similar or even the same packages.
However, there are some good reasons for using venvs even in cases where you might be tempted to just use a single global environment. One reason is that it can be helpful to have a clear separation between your different projects. This can be helpful in terms of organization, but it can also be helpful if you need to use different versions of packages in different projects.
If you try to share a single venv among all of your projects, it can be difficult to use different versions of packages in those projects when necessary. This is because the packages in your venv will be shared among all of the projects that use that venv. So, if you need to use a different version of a package in one project, you would need to change the version in the venv, which would then affect all of the other projects that use that venv. This can be confusing and make it difficult to keep track of what versions of packages are being used in which projects.
Another issue with sharing a single venv among all of your projects is that it can be difficult to share your code with others. This is because they would need to have access to the same environment (which contains lots of stuff unrelated to the single project you are trying to share). This can be confusing and inconvenient for them.
So, while it might seem like a lot of work to set up and maintain multiple virtual environments, there are some good reasons for doing so. In most cases, it is worth the effort in order to have a clear separation between your different projects and to avoid confusion when sharing your code with others.
It's the same principle as single-user vs. multi-user, virtualization vs. no virtualization, containers vs. no containers, monolithic apps vs. microservices, et cetera: you isolate things to avoid conflicts, maintain order, and easily identify a failure state, among other reasons such as scalability or portability. Apply it when necessary, always keeping the KISS philosophy in mind as well: manage complexity, don't create more of it.
And, as you have already mentioned, keep in mind that resources are finite.
Besides, a set of projects that all share the same base of dependencies is of course not the best example of the need for separation.
In addition, the tooling has evolved with reuse in mind, precisely so that a commonly used base of resources does not have to be duplicated needlessly.
Well, there are a few advantages:
with virtual environments, you have knowledge about your project's dependencies: without virtual environments your actual environment is going to be a yarn ball of old and new libraries, dependencies and so on, and if you want to deploy a thing somewhere else (which may mean just running it on the new computer you just bought) you have no way to reproduce the environment it was working in
you're eventually going to run into something like the following issue: project alpha needs version 7 of library A, but project beta needs library B, which only runs with version 3 of library A. If you install version 3, alpha will probably break, but you really need to get B working.
it's really not that complicated, and will save you a lot of grief in the long term.
There are several motivations for venvs, or for their moral equivalent: conda environments.
1. author a package
You create a cool "scrape my favorite site" package which graphs a timeseries of some widget product. Naturally it depends on BeautifulSoup. You happened to have html5lib 1.1 lying around from some previous project, so you tested with that. A user downloads your scrape-widget package from PyPI, happens to have lxml 4.7.1 available, and finds that scraping crashes when using that library. Wouldn't it have been better for your package to specify that the user run against the same deps you tested with?
2. use a package
Same scenario, but now you're using someone's scrape-widget package. The author tested with lxml 4.7.1 but you have lxml 4.9.1, which behaves differently; this makes the app behave differently, crashing in ways the author never saw.
3. use two packages
You want to run both scrape-frobozz-magic-widgets and scrape-acme-widget. Their authors tested with different versions of requests and of lxml. Changing a dep changes the app's behavior. You can only use one or the other, unless you're willing to re-run pip quite frequently.
4. collaborate on a team
You write code that has deps. So does your colleague. You have to coordinate things, so that testing on one laptop instills confidence that the tests would succeed on the other laptops.
5. use CI
You have a teammate named Jenkins, and you want to communicate to him that you used a specific version of a dep when you saw the test succeed.
6. get a new laptop
Things were working. Then your laptop exploded, you got a new one, and you (quickly) want to see things work again. Some of your deps were downrev, due to recently released bugs and breaking changes. Reading a file full of dep versions from your GitHub repo lets you immediately reproduce the state of the world back when things were working.
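A hedged sketch of what that "file full of dep versions" workflow can look like when driven from Python (requirements.txt is the conventional file name; the same two steps are usually run from the shell as pip freeze and pip install -r):

    import subprocess
    import sys

    # Record the exact versions of everything installed in the current environment.
    with open("requirements.txt", "w") as f:
        subprocess.run([sys.executable, "-m", "pip", "freeze"], stdout=f, check=True)

    # Later, on the new laptop (ideally inside a fresh venv), restore that exact state.
    subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
                   check=True)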

Best approach to distribute a python package on a local segregated network

Dear community members,
I would like to have your opinions about my situation. I wrote some Python modules to simplify the daily routine at work. I'm the only 'developer' and the user community is a handful of people with limited computer skills. The scripts should be available on several computers (Windows 7, 10 and 11) connected to a segregated network with no internet access.
I'm writing the code on a single PC (Windows 10) using Anaconda as environment and Spyder as IDE. The scripts are saved on a shared network disk that is accessible from all other PC in the segregated LAN.
And here comes my question: how should I package and distribute the code on all client PC?
My first idea was to not distribute it at all. I mean, I wanted to leave the code on the shared disk, and let the users double click on a shortcut on the desktop to have it running. The advantage is that I don't have to care about package creation and distribution.
Nevertheless I can see these limitations:
I need to install Anaconda on all PCs and, in particular on windows 7, I can only install an old release of it.
I need to modify the code in order that user-based configurations are saved on a local file and not on the shared disc.
In order to have access to shared modules between scripts, I need to add all relevant paths to the python search path on all PCs.
My second approach was to build an exe file for each script with pyinstaller and distribute them to all clients. I can automate the build and the copy to all PCs, so that I'm sure everybody is using the same, latest version. The advantage is that I don't need to install Anaconda everywhere, but it has some drawbacks:
Each exe file is huge. All the scripts have a Qt GUI, and the size of the one-file exe generated by pyinstaller can easily reach 500 MB. This means that when the user double-clicks the icon, (s)he may have to wait a couple of seconds (depending on disk speed and caching) before it loads, and (s)he may think that the computer is frozen and not working.
pyinstaller is multi-platform but not cross-platform. This means that I need two other development PCs, one with Win 7 and one with Win 11, to generate the exe files.
My third possibility was to generate a real Python package that can be installed on all PCs. And here the tricky point is: should it be installed with conda or with pip? I'm quite confused about package building. I have seen and followed the tutorial in the Python docs on how to build a source and wheel package, but I don't know whether that is the correct approach given that my Python environment lives inside Anaconda.
I have seen that on GitHub you can automatically build an Anaconda package starting from your Python code, and I would need to read the whole workflow documentation on how to do it, because it doesn't look so easy to me. The drawback is that the client PCs have no access to GitHub, so I would need to manually copy the output package from a PC with internet access to somewhere on the segregated net and then install it on all clients.
So, at the end of this long message, I hope I managed to describe my problem and I'm sure your answers will shed some light on it. I know that the question may sound trivial to more advanced developers, but there are also newbies out here in need of good advice!
Thanks!
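One way to combine the third approach with the segregated network, sketched under the assumption that a plain Python (with pip) is installed on the clients: build or download wheels on the internet-connected PC into a "wheelhouse" folder on the shared disk, then install offline from there. The package name and paths below are illustrative:

    import subprocess
    import sys

    WHEELHOUSE = r"\\server\shared\wheelhouse"  # illustrative UNC path on the shared disk

    # On the PC with internet access: collect your package and all of its
    # dependencies as wheel files into the wheelhouse folder.
    subprocess.run([sys.executable, "-m", "pip", "download", "mytools", "-d", WHEELHOUSE],
                   check=True)

    # On each client PC in the segregated LAN: install without touching the internet.
    subprocess.run([sys.executable, "-m", "pip", "install", "--no-index",
                    "--find-links", WHEELHOUSE, "mytools"],
                   check=True)

The main caveat is that the downloaded wheels must match the Python versions and architectures actually present on the clients, which matters if the old Windows 7 machines are stuck on an older interpreter.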

how to make my own copy of a python package

I have a Python script that I use to analyze data. I rely on number-crunching packages like numpy and others to work with my data. However, the packages constantly evolve, some functions get deprecated, and so on. This forces me to go through the script several times per year to fix errors and make it work again.
One of the solutions is to keep an older version of numpy. However, there are other packages that require a new version of numpy.
So the question I have is: Is there a way to 1) keep multiple versions of a package installed or 2) have a local copy of the package in the directory of my script so I am in control of what I am importing? For example, I can have my own package that contains all the different packages and versions I need.
Later, I can simply import the libraries I want:
import my_package.numpy_1_15 as np115
import my_package.numpy_1_16_4 as np1164
and later in my code, I can decide which function to use from which numpy version. For example:
index = np115.argwhere(x == 0)
This is my vision of the solution to my problem where I want to keep using old functions from previous versions of numpy (or other libraries). In addition, in this way, I can always have all the libraries needed with me in my script directory. So, if I need to run the script on a different machine I don't need to spend hours figuring out if everything is compatible.
Here are possible proposed solutions and why they do not solve my problem.
Virtual Environments in Python or Anaconda.
There are a bunch of introductions (for example) available that explain how to use them. However, virtual environments require maintenance and initial setup. Imagine if I could just have Python code that performs a specific computational task well, independent of what year it is and what packages are installed on any machine. The code could be shared among different research groups and would always work.
python create standalone executable linux
I can create a standalone executable (example). However, it will be compiled and cannot be changed dynamically, losing the really nice feature of Python.
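Picking up the idea of a local copy living next to the script: pip can install a pinned version into an arbitrary folder with --target, and the script can put that folder first on sys.path. This is only a sketch of pinning one version per script, not a way to mix two numpy versions in a single process (compiled extensions make that impractical); the folder name and version are illustrative:

    import sys
    from pathlib import Path

    VENDOR = Path(__file__).parent / "vendored"  # arbitrary folder next to the script

    # One-time step, run wherever internet access is available:
    #   python -m pip install "numpy==1.16.4" --target ./vendored

    # At run time, prefer the vendored copy over whatever is installed globally.
    sys.path.insert(0, str(VENDOR))
    import numpy as np
    print(np.__version__)  # expected to report the pinned 1.16.4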

Is there a Python editor with python built in?

I need an editor with Python built into it. Currently I use Blender, so I do not have to install Python; Blender comes with the python32.dll to run Python. Is there another editor out there in which I can execute Python commands without Python being installed?
I don't understand the question fully either. Why NOT install python? But if the question is how to be able to edit and run python on machines without installing it, there's Movable Python (http://www.voidspace.org.uk/python/movpy/) with a small fee to purchase and Portable Python (http://www.portablepython.com/wiki/About), free, donation requested. I've used Movable Python and can vouch for it. I've never tried Portable Python.
ViennaMike referenced Movable Python, which has a small fee. After I asked the question, I did more searching and found Movable Python at about the same time he suggested it. I seem to have found something different, though:
http://code.google.com/p/movable-python/
This seems to be a free version of Movable Python. It is only the IDLE portion of Python, but it can be used to run *.py files. It is considerably smaller than a normal Python installation, and comes in a zip file.
Several people asked about my ability to install an editor but not Python. At my high school (I work with the IT department as one of my classes) I find Python helps a lot with some tasks. I am unable to install Python due to admin rights (which I will have next year), so any time I did install Python, it would be automatically deleted from the network drive because of its size.
Thank you again, ViennaMike, for finding Movable Python; unfortunately, it only works with Python 2.5, so I may see if there is a way I can get it to 3.2.

Including python and other files

All of the Python I've written so far has been fine on my own computer, but now I'd like to send some programs to friends to have them test certain features. Suppose I wrote an application in Python with wxPython. Assuming the people I send code to will have neither installed, what is the best way to include both Python and the wxPython library so the other person isn't struggling to get it running? I've never had to do this at this point in my learning and would love some feedback!
Thanks.
You can create a bundle using py2exe and an installer using NSIS, and ship it as an executable so that your friend gets a complete working program. But mind you, this will increase the size of the file enormously, and I have often found it easier to just ask them to install the prerequisites themselves via a README.txt.
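If you go the py2exe route, a minimal setup script looks roughly like this (py2exe is Windows-only and, in its classic form, targets Python 2; myapp.py stands in for the actual entry script):

    # setup.py -- build with:  python setup.py py2exe
    from distutils.core import setup
    import py2exe  # registers the py2exe command with distutils

    setup(
        windows=["myapp.py"],  # GUI app; use console=["myapp.py"] for a console script
        options={"py2exe": {"includes": ["wx"]}},  # nudge the bundler to pick up wxPython
    )

The dist folder that py2exe produces is what you would then wrap with NSIS (or Inno Setup, mentioned below) into a single installer.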
There are lots of binary builders: py2exe, cx_freeze, bbfreeze, PyInstaller, GUI2Exe. I have a whole slew of articles on these:
http://www.blog.pythonlibrary.org/2010/08/31/another-gui2exe-tutorial-build-a-binary-series/
http://www.blog.pythonlibrary.org/2010/07/31/a-py2exe-tutorial-build-a-binary-series/
http://www.blog.pythonlibrary.org/2010/08/19/a-bbfreeze-tutorial-build-a-binary-series/
http://www.blog.pythonlibrary.org/2010/08/12/a-cx_freeze-tutorial-build-a-binary-series/
http://www.blog.pythonlibrary.org/2010/08/10/a-pyinstaller-tutorial-build-a-binary-series/
Unless they are going to develop with Python too, I don't see any reason for them to want to install a bunch of multi-megabyte installers rather than a single one of your own. You can read about how to use Inno Setup to create an installer here:
http://www.blog.pythonlibrary.org/2008/08/27/packaging-wxpymail-for-distribution/
