How to download a full package dependency chain using conda?

I have a requirements file that describes the conda packages I need. These are required dependencies of a script I want to run. The script needs to run on a machine that is not connected to a network or the internet. Therefore, I have decided to use --download-only as explained here.
This is not working for me. When I choose just one dependency to download, I get an error. This is my command-line statement:
conda install --download-only grpcio=1.35.0
This is the error I get:
CondaExitZero: Package caches prepared. UnlinkLinkTransaction cancelled with --download-only option
Apparently, in this case --download-only helps me cancel the download... That is not what I want. I want to download the full dependency chain in order to use it in an offline environment.
How do I do that?

Please file a bug report on the Conda repository, since this appears to be something that shouldn't be happening. I suspect the issue is that, for download-only transactions, Conda still determines that a package might need to be removed in order to eventually install the requested package. That shouldn't be necessary, so I'd call it a bug.
In the meantime, try creating a new environment and then using the --download-only flag with that environment activated. Alternatively, this answer shows how to download raw packages for a full solve, ignoring any existing package cache.
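As a rough sketch of that first workaround, the idea is to run the download-only install from inside a fresh, empty environment so that no unlink/link transaction against an existing environment is needed (the environment name below is arbitrary):

conda create -n dl-cache                      # empty throwaway environment
conda activate dl-cache
conda install --download-only grpcio=1.35.0   # downloads grpcio plus its dependency chain
conda info                                    # the "package cache" entries show where the archives are stored

The downloaded .conda / .tar.bz2 archives in the package cache can then be copied over to the offline machine.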

Related

Run a Python script on a host without Internet connection

I read all the proposed related questions but didn't find an answer to my case.
As for the title, I need to run a Python script (i.e. not an executable created with PyInstaller) on a Windows VM that is not connected to the Internet. I can only download packages via a browser.
Installing Python is not a problem, but pip doesn't work, so I have a problem with the libraries.
Do you have any suggestions?
If you go to PyPI, where pip searches for packages, and find a package you are interested in, you can click on "Download Files" in the left panel. Using numpy as an example: https://pypi.org/project/numpy/#files
You can manually download the files that you want for each package and install them directly with pip. Keep in mind that the first couple of rounds will likely produce error messages telling you that there are missing dependencies. You will have to go back to PyPI and get those dependencies as well, until everything installs correctly.
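A rough sketch of the offline install step, once the files are downloaded (the folder and wheel file name below are only examples; use whatever versions match your Python and Windows build):

pip install --no-index --find-links C:\wheels numpy-1.26.4-cp311-cp311-win_amd64.whl

The --no-index flag stops pip from trying to reach PyPI, and --find-links makes it search the given folder for any dependencies, so as long as you have downloaded those too, the install can complete without Internet access.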
You may want to check the following question for information regarding manual dependency management: How to extract dependencies from a PyPi package.

Get the list of packages used in Anaconda

Is there a way to get a list of packages that are being used rather than just installed in the environment?
Example: I can install matplotlib with conda install matplotlib, but if I never used it in any of the files I don't want it to be in the list.
Interesting idea to check the 'frequently used' packages in your environment.
It appears to me that there is no direct way of checking this.
I am also trying to work this out now. My layman's idea is that it can be done in two consecutive stages: (a) find out the most-used packages, which were either updated often (checked using conda list --revisions) or are easily recognized by the user; (b) trace the dependencies of those packages (whether one package is related to another or not) with the pipdeptree command. This Anaconda link might also be useful: Managing Anaconda packages
The first step is to identify the most-used packages in your applications from time to time. Then trace their dependencies on other packages, so that related packages are not inadvertently removed. Despite that, I still think it is better to stick with the default packages provided by Conda and only add more packages when required.
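A rough sketch of those two stages (matplotlib below is only an illustrative package name):

conda list --revisions              # history of installs/updates in this environment
pip install pipdeptree              # pipdeptree is a separate package, installed via pip
pipdeptree --packages matplotlib    # dependency tree of a package you believe you actually use
pipdeptree --reverse                # reverse view: which packages depend on which

The reverse view is handy before removing anything, since it shows whether another package still needs the one you consider unused.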

Can conda perform an install whilst minimally updating dependencies?

The conda install man page says
Conda attempts to install the newest versions of the requested packages. To accomplish this, it may update some packages that are already installed, or install additional packages.
So first, does this also apply to the dependencies that it determines it needs to install or update? Assuming that the answer is "yes": can that behaviour be changed? For example, when working with legacy code it can be useful to update dependencies as little as possible, or to install the oldest version of a dependency that will still work. Is there some way to get the conda dependency resolver to figure this out automatically, or does one have to resort to manually working out the dependency updates in this case?
Or maybe I am wrong entirely and this is the default behaviour? The dependency resolution rules are not clear to me from the documentation.
Conda's Two-Stage Solving
Conda first tries to find a version of the requested package that could be installed without changing any installed packages (a frozen solve). If that fails, it simply re-solves the entire environment from scratch with the new constraint added (a full solve). There is no in-between (e.g., minimizing the number of packages updated). Perhaps this will change in the future, but this has been the state of things for versions 4.6[?]-4.12.
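As far as I know, the first (frozen) stage can also be requested explicitly, by asking the solver not to touch anything already installed (the package name below is illustrative):

conda install --freeze-installed somepkg   # do not update or change already-installed dependencies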
Mamba
If one needs to manually work things out, I'd strongly suggest looking into Mamba. In addition to being a compiled (fast!) drop-in replacement for conda, the mamba repoquery tool could be helpful for identifying the constraints that are problematic. It has a depends subcommand for identifying dependencies and a whoneeds subcommand for reverse dependencies.
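For example (the package names are illustrative):

mamba repoquery depends pandas     # what does pandas depend on?
mamba repoquery whoneeds numpy     # which installed packages require numpy?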
Suggested Workflow
Were I working with legacy code, I might try defining a YAML file for the environment (env.yaml) and placing upper bounds on crucial packages. If I needed a new package, I would dry run adding it (e.g., mamba install -d somepkg) to see how it affects the environment, figure out what constraint (again, an upper bound) it needs, if any, add it to the YAML, and then actually install it with mamba env update -f env.yaml.
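Such an env.yaml might look like the following sketch (the channel, package names, and version bounds are purely illustrative):

name: legacy-env
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy<1.22
  - pandas<1.4

Once a dry run shows what a new package would pull in, add it under dependencies with a suitable upper bound and apply the file with mamba env update -f env.yaml.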

How to manage conda shared environment and local environment?

Say the team has a shared conda environment called 'env1' located at this directory:
/home/share/conda/envs/env1/
While my working directory is at:
/home/my_name/...
I do not have write permission for any files under /home/share/, just read permission.
Now I want to use the 'env1' environment with one additional library installed (this library does not originally appear in /home/share/conda/envs/env1/).
How can I achieve that without re-installing everything from env1 into my own directory? Also, I have to use 'conda install' for that additional package.
I have a feeling this has something to do with 'conda install --use-local' for handling such a shared-plus-local environment situation, but I am not sure about the exact procedure.
Thanks for any help and explanation!
It looks like the --use-local flag only controls whether conda should install a package that you built locally, one that perhaps is not distributed through the usual channels (or that you want installed instead of the version from the usual channels). So I don't think this directly relates to your case.
Perhaps one solution is to clone this shared environment into a new one under your own account, where you have write permissions, and then conda install the new package you need in that environment. If you are concerned that this takes up space or duplicates packages, I recommend reading this answer, which explains that conda tries not to waste space by using hardlinks, so most likely the packages will not actually be re-installed but rather reused in the new environment.
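A rough sketch of that approach, using the paths from the question (the new environment location and the extra package name are illustrative):

conda create --clone /home/share/conda/envs/env1 --prefix /home/my_name/envs/env1-plus
conda activate /home/my_name/envs/env1-plus
conda install some-extra-package

Thanks to the hardlinking mentioned above, the clone usually costs far less disk space than a full copy would.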
Finally, I'd personally create a new environment even just for the sake of clarity. If I later came back to this project, I'd like to know that it requires your "base/shared" env plus an additional package. If it were named identically to the shared one, that difference wouldn't be so obvious.

How to make pip check for already installed pkgs from multiple directories when installing to a --target dir?

For internal reasons my group shares a conda environment with a number of different groups. This limits the flexibility of package installation, because we don't want to accidentally update dependent packages (I know, we live in the past...). To get around the inflexibility, my group wants to install the packages we develop in a remote directory. Using pip to install the packages works fine with the --target flag to designate the new/remote install folder. We then modify our PYTHONPATH in our .bashrc to access the newly installed packages via a standard import x.
The issue I have is that the packages defined in our setup.py under install_requires=['pandas==0.24.1'] are also being installed in the remote directory, even though that requirement is already satisfied by the shared Python's site-packages. What appears to be happening is that pip installs the dependencies while only looking in the remote packages directory. Is there some way to install our packages while also having pip look in multiple places for requirement satisfaction, specifically our Python installation's site-packages?
I was thinking pip would use PYTHONPATH to check if a dependency is met, but that does not seem to be the case.
Please let me know if this does not make sense; packaging is still new to me, so I am sure I used the wrong terms all over the place.
I believe using "path configuration files" might help.
Say you have some packages installed in /path/to/external-packages and the regular location for site packages in the current environment is /path/to/site-packages.
Then you could add a file /path/to/site-packages/external-packages.pth with the following content:
/path/to/external-packages
I believe this should at least work for some pip commands: check, list, show, maybe more.
Be careful to read about and experiment with this technique, as it may have undesired side effects. Additionally, if I am not mistaken, there should be no need to modify the PYTHONPATH environment variable.
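A rough sketch, using the placeholder paths from above (adjust both paths to your actual environment and remote install directory):

echo "/path/to/external-packages" > /path/to/site-packages/external-packages.pth
python -c "import sys; print(sys.path)"    # the external directory should now appear on the path
pip list                                   # should now also list the packages installed there

As noted above, whether this satisfies dependency checks for every pip command is something to verify by experiment.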
