Context
I write my own library for data analysis purposes. It includes classes to import data from the server, procedures to clean, analyze, and display results, and functions to compare results.
Concerns
I put all of these in a single module and import it whenever I start a new project.
Any newly developed classes/functions go into the same file.
My concern is that the module keeps getting longer and harder to browse and explain.
Questions
I started Python six months ago and want to know common practices:
How do you group your functions/classes and split them into separate files?
By purpose? By project? By class/function?
Or do you not split them at all?
In general, how many lines of code do you keep in a single module?
What is a good way to track the dependencies among your own libraries?
Feel free to share any thoughts.
I believe the best way to answer this question is to look at what the leaders in this field are doing. There is a very healthy ecosystem of modules available on PyPI whose authors have wrestled with this question. Take a look at some of the modules you use frequently and thus already have installed on your system. Better yet, many of those modules have their development versions hosted on GitHub (the PyPI page usually has a pointer). Go there and look around.
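As a concrete starting point, one common pattern is to group by purpose: a single package with one module per concern, which maps naturally onto the library you describe (all names below are illustrative, not a standard):

    mylib/
        __init__.py
        importers.py   # classes that fetch data from the server
        cleaning.py    # procedures that clean raw data
        analysis.py    # analysis routines
        plotting.py    # procedures that display results
        compare.py     # functions that compare results

A new project then imports only what it needs, e.g. from mylib import analysis, and each file stays short enough to browse and explain.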
Related
A little context before the actual questions. As the project I have worked on grew, the number of repositories used for the application we're developing grew as well, to the point that making modifications to the codebase has become unbearable: when something changes in the core library in one repository, we need to make adjustments to every other Python middleware project that uses it. While this may not sound that terrifying, managing it in Yocto has become a burden: on every version bump of the Python library we need to bump all of the dependent projects as well. After some thinking, we decided to try a monorepo approach to deal with the complexity. I'm omitting the reasoning behind this choice, but I can delve into it if you think it is wrong.
1) Phase 1 was easy: just bring every component of our middleware into one repository. Right now we have a big repository with more than 20 git submodules. The submodules will be gone once we finish the transition, but since the project doesn't stop while we're transitioning, submodules were chosen to track changes and keep the monorepo up to date. Every submodule is Python code with a setup.py/Pipfile et al. that does the bootstrapping.
2) Phase 2 is to integrate everything in Yocto. It has turned out to be more difficult than I had anticipated.
Right now we have this:
Application monorepo
Pipfile
python-library1/setup.py
python-library1/Makefile
python-library1/python-library1/<code>
python-library2/setup.py
python-library2/Makefile
python-library2/python-library2/<code>
...
python-libraryN/setup.py
python-libraryN/Makefile
python-libraryN/python-libraryN/<code>
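Each python-libraryN/setup.py above is the usual setuptools boilerplate, along the lines of the following sketch (name and version are illustrative):

    from setuptools import setup, find_packages

    setup(
        name="python-library1",
        version="1.0.0",
        packages=find_packages(),
    )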
Naturally, right now we have python-library1_<some_version>.bb, python-library2_<some_other_version>.bb, and so on in Yocto. Now we want to get rid of this version hell and stick to the monorepo version, so that we have just a single monorepo_1.0.bb that is updated whenever anything changes in any part.
Right now I have this monorepo_1.0.bb (or rather, this is what I would like to have, since this approach doesn't work):
SRCREV=<hash>
require monorepo.inc
monorepo.inc is as follows:
SRC_URI=<gitsm://the location of the monorepo,branch=...,>
S = "${WORKDIR}/git"
require python-library1.inc
require python-library2.inc
...
require python-libraryN.inc
python-libraryN.inc:
S = "${WORKDIR}/git/python-libraryN"
inherit setuptools
This approach doesn't work for multiple reasons. Basically, every new require overrides ${S} to point at another git/python-library, so the only package that actually gets built is the last one.
Even though I know this approach won't end up being the final solution, it shows what I've been striving for: very easy updates. The only thing I should need to do is update SRCREV (or ${PV} in the future, once this is deployed), and the whole stack is brought up to date.
So the question is: how should Yocto recipes be structured to handle a Python monorepo?
P.S.
1) The monorepo structure is not set in stone, so if you have any suggestions, I'm more than open to criticism.
2) The .inc file structure might not be suitable for another reason: this stack is deployed across a dozen different devices, and some of those python-library_N.bb recipes have .bbappends in other layers that are customized per device. That is, some devices may require different system components to be installed, or different configs, files, or modifications.
Any suggestion on how to deal with the complexity of a big Python application in Yocto will be welcome. Thanks in advance.
I'm currently looking into the same issue as you.
I found that the recipe for the boost library also builds several packages from a "monorepo".
This recipe can be found in the poky layer:
http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-support/boost
Look especially into the boost.inc file, which builds the actual package list, while the other files define both SRC_URI and various patches.
Hope that helps!
Suppose I've uploaded a package called foobar to PyPI. Because the package is a Django module, I'd also like to publish it as django-foobar.
What is the general consensus towards releasing the same package under multiple names? Is it allowed or forbidden? Encouraged or discouraged?
(To prevent this question from appearing or becoming too opinion-based, I'm hoping someone can point me to some specific, published norms or obvious precedents. Thank you!)
The published recommendations are in PEP 423, but the status of that document is 'deferred' (not 'rejected', but not 'approved' either). That is, they don't have much official standing, but they are generally good recommendations nonetheless.
Specifically, the Use a single name recommendation seems relevant. If your code works as a standalone package, then foobar (i.e., its own name) would be appropriate. If it is dependent on Django, then django-foobar is more meaningful. Of course, if Django has published recommendations for packaging modules, then those should be followed.
Whichever way you choose, stick to one. It is confusing to have the same code under two project names (not to mention the headaches of having to maintain and push updates to both projects with each release).
Until recently, it was possible to see how many times a Python module indexed on https://pypi.python.org/pypi had been downloaded (each module listed its downloads for the past 24 hours, week, and month). Now that information seems to be missing.
Download numbers are very helpful when evaluating whether to build code on top of one module or another. They also seem to be referenced by sites such as https://img.shields.io/.
Does anyone know what happened? And/or, where I can view/retrieve that information?
This email from Donald Stufft (PyPI maintainer) on the distutils mailing list says:
Just an FYI, I've disabled download counts on PyPI for the time being. The statistics stack is broken and needs engineering effort to fix it back up to deal with changes to PyPI. It was suggested that hiding the counts would help prevent user confusion when they see things like "downloaded 0 times" making people believe that a library has no users, even if it is a significantly downloaded library.
I'm unlikely to get around to fixing the current stack since, as part of Warehouse, I'm working on a new statistics stack which is much better. The data collection and storage parts of that stack are already done and I just need to get querying done (made more difficult by the fact that the new system queries can take 10+ seconds to complete, but can be queried on any dimension) and a tool to process the historical data and put it into the new storage engine.
Anyways, this is just to let folks know that this isn't a permanent loss of the feature and we won't lose any data.
So I guess we'll have to wait for the new stats stack in PyPI.
I just released http://pepy.tech/ to view the downloads of a package. I use the official data, which is stored in BigQuery. I hope you will find it interesting :-)
Also, the site is open source: https://github.com/psincraian/pepy
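If you want to query the underlying data yourself, here is a minimal sketch against the public BigQuery dataset that PyPI downloads are published to (it assumes the google-cloud-bigquery client is installed and Google Cloud credentials are configured; the package name 'requests' is just an example):

    from google.cloud import bigquery

    client = bigquery.Client()  # uses your configured GCP credentials
    query = """
        SELECT COUNT(*) AS num_downloads
        FROM `bigquery-public-data.pypi.file_downloads`
        WHERE file.project = 'requests'
          AND DATE(timestamp) BETWEEN
              DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
    """
    for row in client.query(query):  # runs the query job and iterates result rows
        print("downloads in the last 30 days:", row.num_downloads)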
I don't know what happened (although this has happened before), but you might want to try the PyPI ranking, or any of the several available modules and recipes for this. For example:
Vanity
pyStats
random recipe
But consider that a lot of the downloads might be mirrors and not necessarily "real" user downloads. You should take that into account in your evaluation. The libs mailing list (or another preferred medium) might be a better way to find out what version you should install.
The PyPI count is temporarily disabled, as posted by dmand, but there are some sites that may show you Python package statistics, such as pypi-stats.com (they say it shows real-time information) and pypi-ranking.info (which might not give you real-time information).
You can also find some PyPI packages that give you download information.
Is there a package naming convention for Python like Java's com.company.actualpackage? Most of the time I see simple, potentially colliding package names like "web".
If there is no such convention, is there a reason for it? What do you think of using the Java naming convention in the Python world?
Python has two "mantras" that cover this topic:
Explicit is better than implicit.
and
Namespaces are one honking great idea -- let's do more of those!
There is a convention for the naming and importing of modules, which can be found in the Python Style Guide (PEP 8).
The biggest reason there is no convention of consistently prefixing your module names Java-style is that, over time, you end up with a lot of repetition in your code that doesn't really need to be there.
One of the problems with Java is that it forces you to repeat yourself, constantly. There's a lot of boilerplate in Java code that just isn't necessary in Python (getters/setters being a prime example).
Namespaces aren't so much of a problem in Python because you are able to give modules an alias upon import. Such as:
import com.company.actualpackage as shortername
So you're not only able to create or manipulate the namespace within your programs, but are able to create your own keystroke-saving aliases as well.
Java's convention also has its own drawbacks. Not every open-source package has a stable website behind it. What should a maintainer do if their website changes? Also, under this scheme package names become long and hard to remember. Finally, the name of a package should represent its purpose, not its owner.
An update for anyone else who comes looking for this:
As of 2012, PEP 423 addresses this. PEP 8 touches on the topic briefly, but only to say: all lowercase, possibly with underscores.
The gist of it: pick memorable, meaningful names that aren't already used on PyPI.
There is no Java-like naming convention for Python packages. You can of course adopt one for any package you develop yourself, but you might have to invasively edit any package you adopt from third parties, and the "culturally alien" naming convention will probably sap the chances of your own packages being widely adopted outside of your organization.
Technically, there would be nothing wrong with Java's convention in Python (it would just make some from statements a tad longer, no big deal), but in practice the cultural aspects make it pretty much unfeasible.
The reason there's normally no package hierarchy is because Python packages aren't easily extended that way. Packages are actual directories, and though you can make packages look in multiple directories for sub-modules (by adding directories to the __path__ list of the package) it's not convenient, and easily done wrong. As for why Python packages aren't easily extended that way, well, that's a design choice. Guido didn't like deep hierarchies (and still doesn't) and doesn't think they're necessary.
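For reference, the hook mentioned above is usually spelled with the stdlib's pkgutil inside the package's __init__.py; this is the documented idiom, not something project-specific:

    # mypackage/__init__.py
    # Let this package also pick up same-named packages elsewhere on sys.path.
    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)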
The convention is to pick a toplevel package name that's obvious but unique to your project -- for example, the name of the project itself. You can structure everything inside it however you want (because you are in control of it.) Splitting the package into separate bits with separate owners is a little more work, but with a few guidelines it's possible. It's rarely needed.
There's nothing stopping you using that convention if you want to, but it's not at all standard in the Python world and you'd probably get funny looks. It's not much fun to take care of admin on packages when they're deeply nested in com.
It may sound sloppy to someone coming from Java, but in reality it doesn't really seem to have caused any big difficulties, even with packages as poorly-named as web.py.
The place where you often do get namespace conflicts in practice is relative imports: where code in package.module1 tries to import module2 and there's both a package.module2 and a module2 in the standard library (which there commonly is as the stdlib is large and growing). Luckily, ambiguous relative imports are going away.
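Concretely, this refers to PEP 328, whose behaviour is the default in Python 3; in Python 2 you opt in with a future import, after which sibling modules must be named explicitly (module names here follow the example above):

    # inside package/module1.py
    from __future__ import absolute_import  # PEP 328; the default in Python 3
    import module2                           # now always the top-level/stdlib module2
    from . import module2 as local_module2   # explicitly the sibling package.module2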
I've been using Python for years, and 99.9% of the collisions I have seen come from new developers trying to name a file "xml.py". I can see some advantages to the Java scheme, but most developers are smart enough to pick reasonable package names, so it really isn't that big of a problem.
I am trying to learn Python and am referencing the documentation for the standard Python library on the Python website. I was wondering whether this is really the only library and documentation I will need, or whether there is more. I do not plan to program advanced 3D graphics or anything like that at the moment.
Edit:
Thanks very much for the responses; they were very useful. My problem is where to start on a script I have been thinking of: I want to write a script that converts images into a web format, but I am not completely sure where to begin. Thanks for any more help you can provide.
For the basics, yes, the standard Python library is probably all you'll need. But as you continue programming in Python, eventually you will need some other library for some task -- for instance, I recently needed to generate a tone at a specific, but differing, frequency for an application, and pyAudiere did the job just right.
A lot of the other libraries out there generate their documentation differently from the core Python style -- it's just visually different; the content is the same. Some only have docstrings, which you may be best off reading in a console.
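For example, docstrings are easy to read from the interactive interpreter:

    >>> import json
    >>> help(json.dumps)  # prints the function's signature and docstring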
Regardless of how the other documentation is generated, get used to looking through the Python APIs to find the functions/classes/methods you need. When the time comes to use non-core libraries, you'll know what you want to do, but you'll have to find out how to do it.
For the future, it wouldn't hurt to be familiar with C, either. There are a number of Python libraries that are actually just wrappers around C libraries, and their documentation is the same as the documentation for the underlying C libraries. PyOpenGL comes to mind, but it's been a while since I've personally used it.
As others have said, it depends on what you're into. The package index at http://pypi.python.org/pypi/ has categories and summaries that are helpful in seeing what other libraries are available for different purposes. (Select "Browse packages" on the left to see the categories.)
One very common library, which should also fit your current needs, is the Python Imaging Library (PIL).
Note: the latest version is still in beta and is available only from the Effbot site.
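To make that concrete, here is a minimal sketch of the kind of conversion script you describe, using PIL (the file names are illustrative; JPEG cannot store an alpha channel, hence the RGB conversion):

    from PIL import Image

    img = Image.open("photo.bmp")              # source image in a non-web format
    img = img.convert("RGB")                   # drop alpha/palette data; JPEG needs RGB
    img.save("photo.jpg", "JPEG", quality=85)  # write a web-friendly JPEG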
If you're just beginning, all you'll need is the stuff you can get from the Python website. Failing that, a quick Google search is the fastest way to get (most) Python answers these days.
As you develop your skills and become more advanced, you'll start looking for more exciting things to do, at which point you'll naturally start coming across other libraries (for example, pygame) that you can use for your more advanced projects.
It's very hard to answer this without knowing what you're planning on using Python for. I recommend Dive Into Python as a useful resource for learning Python.
In terms of popular third party frameworks, for web applications there's the Django framework and associated documentation, network stuff there's Twisted ... the list goes on. It really depends on what you're hoping to do!
Assuming the standard library doesn't provide what we need, and we don't have the time or the knowledge to implement the code ourselves, we reuse third-party libraries.
This is a common attitude regardless of the programming language.
If there's a chance that someone else has ever wanted to do what you want to do, there's a chance that someone has created a library for it. A few minutes of Googling something like "python image library" will either find what you need or let you know that no one has created a library for your purposes.