packaging scientific project in python

packaging scientific project in python - python

I am trying to build a package for an apps in python. It uses sklearn, pandas, numpy, boto and some other scientific module from anaconda. Being very unexperienced with python packaging, I have various questions:
1- I have some confidential files .py in my project which I don't want anyone to be able to see. In java I would have defined private files and classes but I am completely lost in python. What is the "good practice" to deal with these private modules? Can anyone link me some tutorial?
2- What is the best way to package my apps? I don't want to publish anything on Pypi, I only need it to execute on Google App engine for instance. I tried a standalone package with PyInstaller but I could not finish it because of numpy and other scipy packages which makes it hard. Is there a simple way to package in a private way python projects made with anaconda?
3- Since I want to build more apps in a close future, shall I try to make sub-packages in order to use them for other apps?

The convention is to lead with a single underscore _ if something is internal. Note that this is a convention. If someone really wants to use it, they still can. Your code is not strictly confidential.
Take a look at http://python-packaging-user-guide.readthedocs.org/en/latest/. You don't need to publish to pypi to create a Python package that uses tools such as pip. You can create a project with a setup.py file and a requirements.txt file and then use pip to install your package from wherever you have it (e.g., a local directory or a repository on github). If you take this approach then pip will install all the dependencies you list.
If you want to reuse your package, just include it in requirements.txt and the install_requires parameter in setup.py (see http://python-packaging-user-guide.readthedocs.org/en/latest/requirements/). For example, if you install your package with pip install https://github/myname/mypackage.git then you could include https://github/myname/mypackage.git in your requirements.txt file in future projects.

Related

Sharing projects with external Python libraries

How can I share a project (through Github) with someone when my project is using multiple personal modules/packages?
Using pip freeze > requirements.txt is obviously not enough, because there is no place to download packages I created specifically for this project.
Is pushing the contents of the project in my repository as a virtual environment viable solution?
What if I also need to have an older version of Python (3.6.5) for example?
Ty,

What these "personal modules/packages" are? The only reason to make a separate package is reusability.
If you do not plan to use it somewhere else, then just include them as modules in your project.
If you plan to use it somewhere else but don't think it could help somebody else then make a package and push it to to Github/Gitlab/Bitbucket/etc. It's possible to install dependencies from there using pip.
If you plan to use it somewhere else and want to share your work with all the python community - deploy package on PyPI.

Is it Possible to Install Part of Python Package Via Pip?

I have an internal utility library that is used by many projects. There is quite a bit of overlap between the projects in the code they pull from the utility library, but as the library grows, so does the amount of extra stuff any individual project gets that it won't be using. This wouldn't be an issue if the library consisted of only python, but the library also bundles in binaries.
Example-
psycopg2 is used in a handful of places within the utility library, but not all projects need db access. Because the development environment isn't the same as the production environment, the utility library also includes psycopg2 binaries for the prod environment.
This grows with openssl libraries, pandas, numpy, scipy, pyarrow, etc. The result is that a small, 50 line, single purpose script that may need db access is bundled into a 100mb+ deployment package.
So what I'd like to do is carve up the utility library into pieces where the projects down stream can choose which pieces to pull in, but keep the utility library code in one easy-to-manage place. That way, this small single purpose application can choose to import internal-util#core, internal-util#db and not include internal-util#numpy and internal-util#openssl
Is what i'm describing possible to do?

Not directly, to my best knowledge. pip installs a package fully, or not at all.
However, if you're careful in your package about how you import things that may require psycopg2 or someotherlargebinarything, you could use the extras_require feature and thus have the package's users choose which dependencies they want to pull in:
setup(
# ...
name='myawesometoolbelt',
extras_require={
'db': ['psycopg2'],
'math': ['numpy'],
},
)
and then, in your requirements.txt, or pip invocation,
myawesometoolbelt[db,math]

Have you tried looking at pip freeze > requirements.txt and pip install -r requirements.txt?
Once you generated your pip list via pip freeze, it is possible to edit which packages you want installed and which to omit from the requirements.txt generated.
You can then pip install -r requirements.txt the things you want back in.

Are there naming conventions for pip requirements files?

Are there conventions for storing multiple requirements.txt files in a Python code repository. For example, one file for simply running the program, another for day-to-day development, another for making a Windows build.
Some repositories contain two files, requirements.txt and requirements_dev.txt, or requirements.txt and requirements_win.txt - this seems pretty ad-hoc.
I have others with a requires subfolder. But I'm not sure what the meaning of requires/requirements.txt is in this context -- running the application, or for development?
There is no mention of storing multiple requirements files in Structuring your project (Hitchhiker's guide to Python) or pip install (pip documentation).

As far as I am aware, there are no hard-and-fast rules here. At least not via PEP (but someone feel free to correct me).
The Hitchhiker's Guide to Python recommends putting the pip requirements file at the root of your project.
There does not appear to be any requirement that a pip requirements file has to be called requirements.txt. The pip install documentation even uses the example of pip install -r example-requirements.txt.
I would think that the conventions vary from project-to-project and largely depend on your deployment process and project-specific documentation.

creating a pip package to allow others to to download - problems installing the created package [duplicate]

I'm writing a django application in my spare time for a footy-tipping competition we're running at work. I figured I'd use this time wisely, and get up to speed on virtualenv, pip, packaging, django 1.3, and how to write an easily redistributable application. So far, so good.
I'm up to the packaging part. A lot of the django apps on GitHub for instance are mostly bundled (roughly) the same way. I'll use django-uni-forms as an example.
An assumption I'm making is that the MANIFEST.in and setup.py are the only required pieces that pip needs to do its job. Is that correct? What other components are necessary if my assumption is wrong?
Are the required packaging files generally generated, or are they crafted by hand? Can dependencies be described and then installed also? My application depends on django-uni-forms, and I have it listed in a requirements.txt file within my app which I used to install the dependency; but is that something that the packaging system can take care of?
What are the steps I need to follow to package my application in such a way that pip will be able to install it and any dependencies?

Yes, MANIFEST.in and setup.py should be sufficient.
This blog post really has some good information on this topic:
Packaging a Django reusable app
And here's another good, detailed overview that helped me a lot:
Python Packaging User Guide
Especially the tips to get your static files (templates) included are important as this might not be obvious at first.
And yes, you can specify required packages in your setup.py which are automatically fetched when installing your app.
For example:
install_requires = [
'django-profiles',
'django-uni-forms',
],
Obviously now we have two places where dependencies are defined, but that doesn't necessarily mean that these information are duplicated: setup.py vs requirements.txt
With this setup your package should be installable via pip.
As Pierre noted in the comments, there's now also a relevant section in Django's official documentation: Packaging your app
And then there is this "completely incomplete" guide, which really gives a great overview over packaging and uploading a package to PyPI: Sharing Your Labor of Love: PyPI Quick And Dirty

setup.py + virtualenv = chicken and egg issue?

I'm a Java/Scala dev transitioning to Python for a work project. To dust off the cobwebs on the Python side of my brain, I wrote a webapp that acts as a front-end for Docker when doing local Docker work. I'm now working on packaging it up and, as such, am learning about setup.py and virtualenv. Coming from the JVM world, where dependencies aren't "installed" so much as downloaded to a repository and referenced when needed, the way pip handles things is a bit foreign. It seems like best practice for production Python work is to first create a virtual environment for your project, do your coding work, then package it up with setup.py.
My question is, what happens on the other end when someone needs to install what I've written? They too will have to create a virtual environment for the package but won't know how to set it up without inspecting the setup.py file to figure out what version of Python to use, etc. Is there a way for me to create a setup.py file that also creates the appropriate virtual environment as part of the install process? If not — or if that's considered a "no" as this respondent stated to this SO post — what is considered "best practice" in this situation?

You can think of virtualenv as an isolation for every package you install using pip. It is a simple way to handle different versions of python and packages. For instance you have two projects which use same packages but different versions of them. So, by using virtualenv you can isolate those two projects and install different version of packages separately, not on your working system.
Now, let's say, you want work on a project with your friend. In order to have the same packages installed you have to share somehow what versions and which packages your project depends on. If you are delivering a reusable package (a library) then you need to distribute it and here where setup.py helps. You can learn more in Quick Start
However, if you work on a web site, all you need is to put libraries versions into a separate file. Best practice is to create separate requirements for tests, development and production. In order to see the format of the file - write pip freeze. You will be presented with a list of packages installed on the system (or in the virtualenv) right now. Put it into the file and you can install it later on another pc, with completely clear virtualenv using pip install -r development.txt
And one more thing, please do not put strict versions of packages like pip freeze shows, most of time you want >= at least X.X version. And good news here is that pip handles dependencies by its own. It means you do not have to put dependent packages there, pip will sort it out.
Talking about deploy, you may want to check tox, a tool for managing virtualenvs. It helps a lot with deploy.

Python default package path always point to system environment, that need Administrator access to install. Virtualenv able to localised the installation to an isolated environment.
For deployment/distribution of package, you can choose to
Distribute by source code. User need to run python setup.py --install, or
Pack your python package and upload to Pypi or custom Devpi. So the user can simply use pip install <yourpackage>
However, as you notice the issue on top : without virtualenv, they user need administrator access to install any python package.
In addition, the Pypi package worlds contains a certain amount of badly tested package that doesn't work out of the box.
Note : virtualenv itself is actually a hack to achieve isolation.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

packaging scientific project in python - python

Related

Sharing projects with external Python libraries

Is it Possible to Install Part of Python Package Via Pip?

Are there naming conventions for pip requirements files?

creating a pip package to allow others to to download - problems installing the created package [duplicate]

setup.py + virtualenv = chicken and egg issue?

Categories

Resources