How can I track my python package's user base?

PyPI used to give you the ability to track the downloads of your packages by release and by month. However, since the server setup changed, this feature is no longer available.
Has anyone here come up with another way to get an idea of how your python package's user base is growing?
I know many people dislike the idea of a program "phoning-home" but is there ANY way that this can be done that is considered 'kosher'? Can anyone suggest a less invasive way of tracking user base?

I have never used it, but this looks like it may work for you. Granted, it looks like just a "number of downloads" counter, so people with virtualenvs will get counted multiple times.
https://pypi.python.org/pypi/pypstats/

So it seems that the PyPI folks re-enabled the download tracking function yesterday!
example:
Blacktie Downloads

Related

How to programmatically sync anki flashcard database with local file?

I would like to have a script run by cron or an Anki background job that will automatically read in a file (e.g. csv, tsv) containing all my flashcards and update the flashcard database in Anki automatically, so that I don't have to manually import my flashcards 1000 times a week.
Does anyone have any ideas how this can be achieved?
Some interesting links I've come across, including from answers, that might provide a lead towards solutions:
https://github.com/langfield/ki
https://github.com/patarapolw/ankisync
https://github.com/towercity/anki-cli
The most robust approach there is so far is to have your collection under git, and use Ki to make Anki behave like a remote repository, so it's very easy to synchronise. The only constraint is the format of your collection. Each card is kept as a single file, and there is no real way around this.
I'm the maintainer of ki, one of the tools you linked! I really appreciate the shout-out @BlackBeans.
It's hard to give you perfect advice without more details about your workflow, but it sounds to me like you've got the source-of-truth for your notes in tabular files, and you import these files into Anki when you've made edits or added new notes.
If this is the case, ki may be what you're looking for. As @BlackBeans mentioned, this tool allows you to convert Anki notes into markdown files, and more generally, handles moving your data from your collection to a git repository, and back.
Basically, if the reason why you've got stuff in tabular files is (1) because you want to version it, (2) because you want to use an external editor, or (3) because your content is generated programmatically, then you might gain some use from using ki.
Feel free to open an issue describing your use case in detail. I'd love to give you some support in figuring out a good workflow if you think it would be helpful. I am in need of more user feedback at the moment, so you'd be helping me out, too!
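If your source of truth really is a tabular file and you mainly need new rows to show up in Anki, another route (not mentioned above, so treat the details as assumptions about your setup) is the AnkiConnect add-on, which exposes a small JSON API on localhost while Anki is running. A minimal sketch in Python, with placeholder deck, note type and file names:

import csv
import json
import urllib.request

ANKI_CONNECT_URL = "http://localhost:8765"  # AnkiConnect's default endpoint


def invoke(action, **params):
    """Send one request to the AnkiConnect JSON API and return its result."""
    payload = json.dumps({"action": action, "version": 6, "params": params}).encode("utf-8")
    with urllib.request.urlopen(urllib.request.Request(ANKI_CONNECT_URL, payload)) as resp:
        reply = json.load(resp)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]


# Read a two-column TSV (front, back); adjust to your own file layout.
with open("flashcards.tsv", newline="", encoding="utf-8") as fh:
    rows = list(csv.reader(fh, delimiter="\t"))

notes = [
    {
        "deckName": "Default",   # placeholder deck name
        "modelName": "Basic",    # placeholder note type
        "fields": {"Front": front, "Back": back},
        "options": {"allowDuplicate": False},
        "tags": ["auto-import"],
    }
    for front, back in rows
]

# addNotes returns one note id per entry, or None where a note was rejected
# (e.g. a duplicate), so re-running this from cron only adds what is new.
print(invoke("addNotes", notes=notes))

Updating cards that already exist takes more bookkeeping (AnkiConnect has findNotes/updateNoteFields actions for that), which is where the git-based ki workflow above starts to pay off.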

Total Downloads of Module Missing on PyPi

Up until recently, it was possible to see how many times a python module indexed on https://pypi.python.org/pypi had been downloaded (each module listed downloads for the past 24hrs, week and month). Now that information seems to be missing.
Download numbers are very helpful information when evaluating whether to build code off of one module or another. They also seem to be referenced by sites such as https://img.shields.io/
Does anyone know what happened? And/or, where I can view/retrieve that information?
This email from Donald Stufft (PyPI maintainer) on the distutils mailing list says:
Just an FYI, I've disabled download counts on PyPI for the time being. The statistics stack is broken and needs engineering effort to fix it back up to deal with changes to PyPI. It was suggested that hiding the counts would help prevent user confusion when they see things like "downloaded 0 times" making people believe that a library has no users, even if it is a significantly downloaded library.
I'm unlikely to get around to fixing the current stack since, as part of Warehouse, I'm working on a new statistics stack which is much better. The data collection and storage parts of that stack are already done and I just need to get querying done (made more difficult by the fact that the new system queries can take 10+ seconds to complete, but can be queried on any dimension) and a tool to process the historical data and put it into the new storage engine.
Anyways, this is just to let folks know that this isn't a permanent loss of the feature and we won't lose any data.
So I guess we'll have to wait for the new stats stack in PyPI.
I just released http://pepy.tech/ to view the downloads of a package. I use the official data which is stored in BigQuery. I hope you will find it interesting :-)
Also, the site is open source: https://github.com/psincraian/pepy
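If you would rather query those numbers yourself instead of going through pepy, the raw download logs live in the public BigQuery dataset bigquery-public-data.pypi.file_downloads. A rough sketch with the google-cloud-bigquery client, assuming you have a GCP project and credentials configured (the package name is just an example):

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # uses your default GCP credentials and project

# Monthly download counts for one package over roughly the last six months.
query = """
    SELECT
      FORMAT_TIMESTAMP('%Y-%m', timestamp) AS month,
      COUNT(*) AS downloads
    FROM `bigquery-public-data.pypi.file_downloads`
    WHERE file.project = @package
      AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 180 DAY)
    GROUP BY month
    ORDER BY month
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("package", "STRING", "blacktie")]
)

for row in client.query(query, job_config=job_config).result():
    print(row.month, row.downloads)

Note that the query scans a large table, so it can take a while and may count against your BigQuery free tier.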
Don't know what happened (although it has happened before), but you might want to try the PyPI ranking, or any of the several available modules and recipes to do this. For example:
Vanity
pyStats
random recipe
But consider that a lot of the downloads might be mirrors and not necessarily "real" user downloads. You should take that into account in your evaluation. The libs mailing list (or another preferred channel) might be a better way to know which version you should install.
The PyPI count is disabled temporarily, as posted by dmand, but there are some sites which may tell you Python package statistics, like pypi-stats.com (they say it shows real-time information) and pypi-ranking.info (this might not give you real-time information).
You can also find some PyPI packages which can give you download information.

PyCharm: save refactoring information for application on dependent code

If I do refactoring in a library, PyCharm handles all dependent applications which are known to the currently running PyCharm instance.
But code which is not known to the current PyCharm instance does not get updated.
Is there a way to store the refactoring information in version control, so that dependent applications can be updated when they get the update to the new version of the library?
Use Case:
class Server:
    pass
gets renamed to
class ServerConnection:
    pass
If a teammate updates to the new version of my library, his usage of Server needs to be changed to ServerConnection.
It would be very nice if PyCharm (or another tool) could help my teammate to update his code automatically.
As far as I can tell, this is possible neither with vanilla PyCharm, nor with a plugin, nor with a 3rd-party tool.
It is not mentioned in the official documentation
There is no such plugin in the JetBrains Plugin Repositories
If PyCharm writes refactoring information to its internal logs, you could build this yourself (but would you really want to?)
I am also not aware of any Python-specific refactoring tool that does that. You can check for yourself: there is another SO question about the most popular refactoring tools.
But ...
I am sure there are reasons why your situation is like it is - there always are good reasons (and most of the time the terms 'historic' and 'grown' turn up in explanations of these reasons) - but I still feel obligated to point out what qarma already mentioned in his comment: the fact that you want to do something like replaying a refactoring on a different code base points towards a problem that should be solved in a different way.
Alternative 1: introduce an API
If you have different pieces of software that depend on each other on such a deep level, it might be a good idea to define an API that decouples the code bases from each other's internals. With an API it is clear which parts have to be stable. If changes have to be done at the API level, they must be communicated and coordinated with the involved teams.
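One low-tech way to keep such an API stable across a rename, while still nudging the other teams to update, is to leave a deprecated alias behind in the library. This is a sketch built around the Server/ServerConnection example from the question, not a PyCharm feature:

import warnings


class ServerConnection:
    """The renamed class; all new code should use this name."""


def _deprecated_alias(new_cls, old_name):
    """Return a subclass of new_cls that warns when the old name is used."""
    class _Alias(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_name} was renamed to {new_cls.__name__}",
                DeprecationWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)
    _Alias.__name__ = old_name
    return _Alias


# Old call sites keep working for now, but emit a DeprecationWarning that
# tells the other team exactly what to rename.
Server = _deprecated_alias(ServerConnection, "Server")

That way the refactoring information travels with the library release itself instead of depending on every consumer having the right IDE open.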
Alternative 2: Make it what it actually is: one code base
If A1 for whatever reason is not possible I would conclude that you actually have one system distributed over different code bases and then those should be merged into one code base. Different teams can still work on the same code base (hopefully using a DVCS) but global refactorings can be done with tooling help and they reach all parts of the system.
Alternative 3: Make these refactorings in PyCharm over all involved code bases
Even if you can't merge them into one code base, you could combine them easily in PyCharm by loading different projects into the same window. I do this without problems with two git projects that have to be in different repositories but still share certain aspects. PyCharm handles commits to these repositories transparently: if you make changes in several repositories and commit them, you write one commit message and the commits will be done to all repositories.

What builtin functions shouldn't be run by untrusted users?

I'm creating a corewars type application that runs on django and allows a user to upload some python code that will control their character. Now, I know the real answer to this is that as long as I'm taking code input from untrusted users I'll have security vulnerabilities. I'm just trying to minimize the risk as much as possible. Here are some that spring to mind:
__import__ (I'll probably also do some ast scanning to make sure there aren't any import statements; see the sketch after this list)
open
file
input
raw_input
Are there any others I'm missing?
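On the ast scanning mentioned in the list above, a minimal sketch of what such a check could look like is below. It only catches literal import statements and a fixed set of names, so it is a blacklist and shares all the weaknesses pointed out in the answers that follow:

import ast

BANNED_NAMES = {"__import__", "open", "file", "input", "raw_input", "eval", "exec", "compile"}


def find_violations(source):
    """Return (lineno, message) pairs for obviously disallowed constructs."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            violations.append((node.lineno, "import statement"))
        elif isinstance(node, ast.Name) and node.id in BANNED_NAMES:
            violations.append((node.lineno, "use of " + node.id))
    return violations


print(find_violations("import os\nopen('/etc/passwd')"))
# [(1, 'import statement'), (2, 'use of open')]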
There are lots of answers on what to do in general about restricting Python at http://wiki.python.org/moin/SandboxedPython. When I looked at it some time ago, the Zope RestrictedPython looked like the best solution, working with a whitelist system. You'll still need to take care in your own code so that you don't expose any security vulnerabilities, but that seems to be the best system out there.
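For reference, the RestrictedPython approach boils down to compiling the user's source with a restricted compiler and executing the result against a whitelisted set of globals. A minimal sketch, assuming a current RestrictedPython release (the exact helper names have shifted between versions, and anything touching attributes, subscripts or printing needs extra guard functions):

from RestrictedPython import compile_restricted, safe_globals

user_source = """
def move(x):
    return x + 1
result = move(1)
"""

# compile_restricted rewrites or rejects dangerous constructs; safe_globals
# supplies a curated set of builtins for the compiled code to run against.
byte_code = compile_restricted(user_source, filename="<user code>", mode="exec")

sandbox_globals = dict(safe_globals)  # copy, so user code can't mutate the shared dict
sandbox_locals = {}
exec(byte_code, sandbox_globals, sandbox_locals)

print(sandbox_locals["result"])  # -> 2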
Since you sound determined to do this, I'll link you to the standard rexec module, not because I think you should use it (don't - it has known vulnerabilities), but because it might be a good starting point for your own restricted-execution framework (or for getting your webserver compromised).
In particular, under the heading "Defining restricted environments" several modules and functions are listed that were considered reasonably safe by the rexec designer; these might be usable as an initial whitelist of sorts. I'd also suggest examining its code for other gotchas you might not have thought of.
You will really need to avoid eval.
Imagine code such as:
eval("__impor" + "t__('whatever').destroy_your_server")
This is probably the most important one.
Yeah, you have to whitelist. There are so many ways to hide the bad commands.
This is NOT the worst case scenario:
"the worst case scenario is that someone gets into the database"
The worst case scenario is getting the entire machine rooted and you not noticing as it probes your other machines and keylogs your passwords. Isolate this machine and consider it hostile (DMZ, block it from being able to launch attacks internally and externally, etc.). Run Tripwire or AIDE on non-writeable media and log everything to a second host.
Finally, as plash shows, there are a lot of dangerous system calls that need to be protected against.
If you're not committed to using Python as the language inside the game, one possibility would be to embed Lua using LunaticPython (I suggest the bugfixes branch at https://code.launchpad.net/~dne/lunatic-python/bugfixes).
It's much easier to sandbox Lua than Python, and it's much easier to embed Lua than to create your own programming language.
You should use a whitelist, rather than a blacklist. If you use a blacklist, you will always miss something. Even if you don't, Python will add a function to the standard library, and you won't update your blacklist in time.
Things you're currently allowing but probably should not include:
compile
eval
reload (if they do access the filesystem somehow, this is basically import)
I agree that this would be very tricky to do correctly. One complication (among many) could be a user accessing one of these functions through a field in another class.
I would consider using another isolation mechanism, such as a virtual machine, instead or in addition to this. You might look at how codepad does it.
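To make that complication concrete, here is a small demonstration (plain CPython, nothing game-specific) of how blocked names stay reachable through other objects, which is why the whitelist and isolation advice above matters:

import builtins


def exposed_helper(x):
    """Stand-in for any function the sandbox deliberately hands to user code."""
    return x * 2


# 1. Scanning the source for the literal string "eval" misses names built at runtime.
sneaky_eval = getattr(builtins, "ev" + "al")
print(sneaky_eval("6 * 7"))  # -> 42

# 2. Any exposed function drags its defining module's globals along with it,
#    and those include the builtins (a module here; often a dict inside exec).
leaked = exposed_helper.__globals__["__builtins__"]
print(leaked is builtins)  # -> True when run as a script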

Is there any preferable way to get user/group information from an Active Directory domain in Python?

For a Django application that I'm working on, I wanted to allow group membership to be determined by Active Directory group. After a while of digging through the pywin32 documentation, I came up with this:
>>> import win32net
>>> win32net.NetUserGetGroups('domain_name.com', 'username')
[(u'Domain Users', 7), ...]
I spent a while googling before I figured this out though, and the examples I found almost exclusively used LDAP for this kind of thing. Is there any reason why that's to be preferred over this method? Bear a couple of things in mind:
I'm not using Active Directory to actually perform authentication, only permissions. Authentication is performed by another server.
While it would be nice to have some cross-platform capabilities, this will probably run almost exclusively on Windows.
AD's LDAP interface has quite a few 'quirks' that make it more difficult to use than it might appear on the surface, and it tends to lag significantly behind on features. When I worked with it, I mostly dealt with authentication, but it's probably the same no matter what you're doing. There's a lot of weirdness in terms of having to be bound as a certain user just to do simple searches that a normal LDAP server would let you do as anonymous.
Also, at least as of a year ago, when I worked on this, python-ldap was the only Python LDAP implementation to support anywhere close to the full feature set, since it's built on top of OpenLDAP. However, OpenLDAP is rather difficult to build on Windows (and in general), so most builds will be missing one or more features. Although you're not doing authentication, a lack of SASL/Kerberos support (which was missing at the time I used it) might make things complicated for you.
If you have something that works, and only need to run it on Windows, I would really recommend sticking to it; using AD via LDAP can turn into a big project.
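For comparison, the LDAP route with python-ldap looks roughly like this. The server name, bind account, password and base DN below are placeholders, and real AD setups usually also need paging and TLS sorted out:

import ldap  # python-ldap

# Placeholders: substitute your own domain controller, bind account and base DN.
conn = ldap.initialize("ldap://dc01.domain_name.com")
conn.set_option(ldap.OPT_REFERRALS, 0)  # AD referrals confuse many LDAP clients
conn.simple_bind_s("svc_lookup@domain_name.com", "password")

result = conn.search_s(
    "dc=domain_name,dc=com",
    ldap.SCOPE_SUBTREE,
    "(&(objectClass=user)(sAMAccountName=username))",
    ["memberOf"],
)

for dn, attrs in result:
    if dn is None:  # skip the referral entries AD mixes into search results
        continue
    for group_dn in attrs.get("memberOf", []):
        print(group_dn.decode("utf-8"))

That is roughly the amount of ceremony the LDAP route needs even for a single lookup, which is why sticking with win32net or WMI on a Windows-only deployment is reasonable.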
import wmi

# Query Active Directory through the WMI LDAP provider.
oWMI = wmi.WMI(namespace="directory\ldap")
ADUsers = oWMI.query("select ds_name from ds_user")
for user in ADUsers:
    print user.ds_name
Check out Tim Golden's Python Stuff.
import active_directory
user = active_directory.find_user(user_name)
groups = user.memberOf
