I have a script in Python which uses a resource that cannot be used by more than a certain number of concurrently running scripts.
Classically, this would be solved by a named semaphore, but I cannot find one in the documentation of the multiprocessing or threading modules.
Am I missing something, or are named semaphores not implemented/exposed by Python? And, more importantly, if they are not, what is the best way to emulate one?
Thanks,
Boaz
PS. For reasons which are not so relevant to this question, I cannot aggregate the task into a continuously running process/daemon or work with spawned processes - both of which, it seems, would have worked with the Python API.
I suggest a third-party extension like these, ideally the posix_ipc one -- see in particular the semaphore section in the docs.
These modules are mostly about exposing System V and POSIX IPC (including semaphores) in a Unix-y way, but at least one of them (posix_ipc specifically) is claimed to work with Cygwin on Windows (I haven't verified that claim). There are some documented limitations on FreeBSD 7.2 and Mac OS X 10.5, so take care if those platforms are important to you.
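For illustration, a minimal sketch of throttling concurrent script instances with posix_ipc might look like this (the semaphore name and the limit of 4 are made up for the example):

```python
# Hedged sketch: limit concurrent instances of this script using a POSIX
# named semaphore exposed by the posix_ipc extension. Name and limit are
# illustrative.
import posix_ipc

MAX_CONCURRENT = 4

# O_CREAT creates the semaphore if it does not exist; initial_value is only
# applied at creation time, so every instance can safely run the same code.
sem = posix_ipc.Semaphore("/my_script_limit", posix_ipc.O_CREAT,
                          initial_value=MAX_CONCURRENT)
sem.acquire()  # blocks while MAX_CONCURRENT other instances hold a slot
try:
    pass  # ... use the limited resource here ...
finally:
    sem.release()
    sem.close()
```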
You can emulate them by using the filesystem instead of the kernel (named semaphores are implemented this way on some platforms anyhow). You'll have to implement sem_[open|wait|post|unlink] yourself, but it ought to be relatively trivial to do so. Your synchronization overhead might be significant (depending on how often you have to fiddle with the semaphore in your app), so you might want to initialize a ramdisk in which to store the named semaphores when you launch your process.
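A bare-bones sketch of such an emulation, using one lock file per "slot" and the fact that os.open with O_CREAT | O_EXCL is atomic on local filesystems (names and polling interval are illustrative, and stale lock files left by crashed processes are not handled):

```python
import errno
import os
import time

class FileSemaphore(object):
    """A crude counting semaphore backed by lock files in a directory."""

    def __init__(self, directory, count):
        self.directory = directory
        self.count = count
        self.held = None
        if not os.path.isdir(directory):
            os.makedirs(directory)

    def acquire(self, poll_interval=0.1):
        while True:
            for slot in range(self.count):
                path = os.path.join(self.directory, "slot-%d.lock" % slot)
                try:
                    # O_CREAT | O_EXCL fails if the slot file already exists.
                    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                except OSError as exc:
                    if exc.errno != errno.EEXIST:
                        raise
                    continue  # slot taken, try the next one
                os.write(fd, str(os.getpid()).encode())
                os.close(fd)
                self.held = path
                return
            time.sleep(poll_interval)  # every slot busy, poll again

    def release(self):
        if self.held is not None:
            os.unlink(self.held)
            self.held = None
```

Keeping the lock directory on a ramdisk, as suggested above, keeps the per-acquire overhead down.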
Alternatively if you're not comfortable rolling your own, you could probably wrap boost::interprocess::named_semaphore (docs here) in a simple extension module.
I am starting to read up on possible ways to parallelise Python code.
DISCLAIMER. This is NOT a question about Multiprocessing vs Multithreading.
At this link https://ipyparallel.readthedocs.io/en/latest/demos.html one finds references to several concurrency packages for Python that avoid the GIL (see also https://scipy.github.io/old-wiki/pages/ParallelProgramming):
- IPython1
- mpi4py
- Parallel Python
- Numba
There is also a multiprocessing package:
https://docs.python.org/3/library/multiprocessing.html
And another one called processing:
https://pypi.org/project/processing/
First of all, the difference between the latter two is not at all clear to me; what is the difference between using the multiprocessing module and the processing module?
In general, I fail to understand the differences between all of these -- which must exist, given that some developers made the effort to create mpi4py, a Python version of the MPI used in C++. I guess this is not just about the dualism between "threading" and "multiprocessing" approaches, where in one case memory is shared while in the other each process has its own memory and interpreter; something more must differ between all of these packages.
Thanks to all of those who will dedicate time to answer this!
The difference is that the last version of processing was released in April of 2008 and multiprocessing was added in Python 2.6 in October 2008.
processing was a library that was used before multiprocessing was distributed with Python.
As far as the specific differences between other modules designed for multiprocessing: the scipy page you linked says that "This is a subject for graduate courses in computer science, and I'm not going to address it here....there are some python tools you can use to implement the things you learn in that graduate course." While they admit that may be a bit of an exaggeration, some independent study of multiprocessing in general will be required to discern the differences between these libraries. You should probably just stick to the built-in multiprocessing module for your initial experiments while you learn how it works. Once you're more comfortable with multiprocessing, you might want to check out the pathos framework.
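If you do start with the built-in multiprocessing module, a minimal example looks something like this (the worker function is just a placeholder):

```python
# Minimal multiprocessing example: run a CPU-bound function across a pool
# of worker processes, sidestepping the GIL.
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":  # guard is required on platforms that spawn workers
    with Pool(processes=4) as pool:
        print(pool.map(square, range(10)))  # [0, 1, 4, 9, ..., 81]
```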
But here are the basics for the packages you mention:
Numba adds decorators that automatically compile functions to make them run faster; it isn't really a multiprocessing tool so much as a JIT compilation tool.
Parallel Python overcomes the GIL to utilize multiple cores or multiple computers; it's designed to be easy to use and to handle all the complex stuff behind the scenes.
MPI for Python is like Parallel Python with less emphasis on simplicity.
IPython is a toolkit with many features, including a shell and a Jupyter kernel; it's also not really a multiprocessing tool.
Keep in mind that plenty of libraries/modules do the same thing; there doesn't need to be a reason that more than one exists. Use whatever works for you.
I am working with an ARM Cortex-M3 to which I need to port Python (without an operating system). What would be my best approach? I just need the core Python and basic I/O.
Golly, that's kind of a tall order. There are so many services of a kernel that Python depends upon, and that you'd have to provide yourself. I'd think you'd be far better off looking for a lightweight OS -- maybe Minix 3? -- to put on your embedded processor.
Failing that, I'd be horribly tempted to think about hand-translating to C and building the essentials on that.
You should definitely look at eLua:
http://www.eluaproject.net
"Embedded power, driven by Lua
Quickly prototype and develop embedded software applications with the power of Lua and run them on a wide range of microcontroller architectures"
There are a few projects that have attempted to port Python to the situation you mention, take a look at python-on-a-chip, PyMite or tinypy. These are aimed at lower power microcontrollers without an OS and tend to focus on slightly older versions of the Python language and reduced library support.
One possible approach is to build your own stack machine in software to interpret and execute Python byte code directly. Certainly not a porting job and quite labor-intensive to implement, but a self-contained Python byte code stack processor built for your embedded system gets you around needing an operating system.
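To get a feel for what such a stack processor would have to execute, CPython's own dis module shows the stack-oriented byte code for any function (the function here is just a toy example, and the exact opcode names vary between CPython versions):

```python
import dis

def add(a, b):
    return a + b

# Prints instructions such as LOAD_FAST / BINARY_ADD / RETURN_VALUE --
# the kind of instruction set a byte-code stack processor for the target
# board would have to implement.
dis.dis(add)
```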
Another approach is writing your own low level executive (one step below a general purpose OS) that contains the bare minimum in services that a core Python interpreter port requires. I am not certain if this is more or less labor intensive than building a stack processor.
I am not recommending either of these approaches - personally, I like Charlie Martin's Minix 3 approach best since it is a balanced requirements compromise. On the other hand, what I suggest might be interesting if your project absolutely requires Python without an operating system and if the project has an excellent time and money budget.
Update 5 Mar 2012: Given a strict adherence to your Python/no-OS requirements, another possible path to a solution may lie in using an OS-less Java VM (e.g., jnode, currently in beta) and using Jython to create Java byte code from Python. Certainly not an ideal off-the-shelf solution, but it does seem to meet the OS-less Python requirement.
Compile it to C :)
http://shed-skin.blogspot.com/
FYI, I just ported CPython 2.7.x to a non-POSIX OS. That was easy.
You need to write pyconfig.h the right way, remove most of the unused modules, and disable unused features.
Then fix the compile and link errors. After that it just works, once you fix some simple problems at runtime.
If you are missing some POSIX header, write one yourself. Implement all the POSIX functions that are needed, such as file I/O.
It took 2-3 weeks in my case, although I heavily customized the Python core. Unfortunately I cannot open-source it :(.
After that, I think Python can be ported easily to any platform that has enough RAM.
I'm trying to use a method within a Python program to detect whether a file on the file system has been modified. I know that I could have something run every 5 seconds to check the last modification date from the system, but I was curious whether there's an easier method for doing this, without requiring my program to check repeatedly.
Does anyone know of such a method?
watchdog
Excellent cross-platform library for watching directories.
From the website
Supported Platforms
Linux 2.6 (inotify)
Mac OS X (FSEvents, kqueue)
FreeBSD/BSD (kqueue)
Windows (ReadDirectoryChangesW with I/O completion ports; ReadDirectoryChangesW worker threads)
OS-independent (polling the disk for directory snapshots and comparing them periodically; slow and not recommended)
I've used it on a couple of projects and it seems to work wonderfully.
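As a quick illustration, a minimal watchdog script that reports modifications under a directory could look like this (the watched path is made up):

```python
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class ModifiedHandler(FileSystemEventHandler):
    def on_modified(self, event):
        print("modified:", event.src_path)

observer = Observer()
observer.schedule(ModifiedHandler(), path="/tmp/watched", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)  # the observer runs in its own thread
except KeyboardInterrupt:
    observer.stop()
observer.join()
```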
For Linux, there is pyinotify.
From the homepage:
Pyinotify is a Python module for monitoring filesystem changes. Pyinotify relies on a Linux kernel feature (merged in kernel 2.6.13) called inotify. inotify is an event-driven notifier; its notifications are exported from kernel space to user space through three system calls. pyinotify binds these system calls and provides an implementation on top of them, offering a generic and abstract way to manipulate those functionalities.
Thus it is obviously not cross-platform and relies on a new enough kernel version. However, as far as I can see, requiring kernel support would be true about any non-polling mechanism.
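A short pyinotify sketch along those lines (Linux only; the watched path is illustrative):

```python
import pyinotify

class Handler(pyinotify.ProcessEvent):
    def process_IN_MODIFY(self, event):
        print("modified:", event.pathname)

wm = pyinotify.WatchManager()
wm.add_watch("/tmp/watched.txt", pyinotify.IN_MODIFY)

# Blocks and dispatches events to Handler as the file is modified.
notifier = pyinotify.Notifier(wm, Handler())
notifier.loop()
```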
On Windows there is:
watcher, which is a nice python port of the .NET FileSystemWatcher API.
Also there's (the one I wrote) dirwatch.
Both rely on the Windows ReadDirectoryChangesW function. Though for real work, I'd use watcher (a proper C extension, good API, Python 2 & 3 support).
Mine is mostly an experiment calling the relevant APIs on windows, so it's only interesting if you want an example of calling these things from python.
You should also see inotifyx, which is very similar to the previously mentioned pyinotify but is said to have an API that changes less.
I created a module in Python which provides about a dozen functionalities. While it will mostly be used from within Python, there is a good fraction of legacy users who will be calling it from Perl.
What is the best way to make a plug in to this module? My thoughts are:
Provide the functionalities as command line utilities and make system calls
Create some sort of server and handle RPC calls (say, via JSON RPC)
Any advice?
One other choice is to inline Python directly in your Perl script, using Inline::Python.
This may be simpler than other solutions, and only requires one additional module.
In the short run, the easiest solution is to use Inline::Python, closely followed by calling a command-line script.
In the long run, using a server to provide RPC functionality or simply calling a command-line script will give you the most future-proof solution.
Why?
Because that way you aren't tied to Perl or Python as the language used to build the systems that consume the services provided by your library. Either method creates a clear, language-independent interface that you can use with whatever development environment you adopt.
Depending on your needs any of the presented options may be the "best choice". Depending on how your needs evolve over time, a different choice may be revealed as "best".
My approach to this would be to ask a couple of questions:
How often do you change development tools? You've switched to Python from Perl. Did you start with Tcl and go to Perl? Are you going to switch to the exciting new language X in 1, 5, or 10 years? If you change tools 'often' (whatever that means), emphasize cross-tool compatibility.
How fast is fast enough? Is the start-up time for command-line solutions OK? Does Inline::Python slow things down too much (you are still initializing a Python interpreter, it's just embedded in your Perl interpreter)?
Based on the answers to these questions, I would do the simplest thing that is likely to work.
My guess is that means in order:
Inline::Python
Command line scripts
Build an RPC server
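For the RPC option, a bare-bones sketch of a JSON-over-HTTP endpoint using only the Python standard library might look like the following; it is not a full JSON-RPC implementation, and the greet function merely stands in for whatever the real module exposes:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def greet(name):
    # Placeholder for one of the module's real functions.
    return "Hello, %s" % name

METHODS = {"greet": greet}

class RPCHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        result = METHODS[request["method"]](*request.get("params", []))
        body = json.dumps({"result": result}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), RPCHandler).serve_forever()
```

The Perl side can then talk to it with any HTTP client plus a JSON module, with no Python-specific glue involved.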
Provide the functionalities as command line utilities and make system calls
Works really nicely. This is the way programs like Python (and Perl) are meant to be used.
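As a rough sketch of that approach, a thin wrapper that exposes the module's functions on the command line and prints JSON (so the Perl side can parse the output) might look like this; the module name mylib is hypothetical:

```python
#!/usr/bin/env python
"""Thin command-line front end for a hypothetical module `mylib`."""
import argparse
import json
import sys

import mylib  # hypothetical: the module providing the dozen functions

def main():
    parser = argparse.ArgumentParser(description="Call mylib functions from the shell")
    parser.add_argument("command", help="name of the mylib function to call")
    parser.add_argument("args", nargs="*", help="positional arguments for the function")
    opts = parser.parse_args()

    result = getattr(mylib, opts.command)(*opts.args)

    # Assumes the result is JSON-serializable; Perl can decode it with JSON::PP.
    json.dump(result, sys.stdout)
    return 0

if __name__ == "__main__":
    sys.exit(main())
```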
I write tools that are used in a shared workspace. Since there are multiple OSes in use in this space, we generally use Python and standardize the version that is installed across machines. However, if I wanted to write some things in C, I was wondering if maybe I could have the application wrapped in a Python script that detected the operating system and fired off the correct version of the C application. Each platform has GCC available and uses the same shell.
One idea was to have the C compiled into the user's local ~/bin, with a timestamp comparison against the C source so it is not compiled on each run, but only when the code is updated. Another was to just compile it for each platform and have the wrapper script select the proper executable.
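A rough sketch of the first idea (the tool name, paths, and compiler flags are illustrative):

```python
import os
import subprocess
import sys

SRC = "/shared/tools/mytool.c"            # shared C source (illustrative path)
BIN = os.path.expanduser("~/bin/mytool")  # per-user, per-machine binary

def ensure_built():
    # Rebuild only when the source is newer than the local binary.
    if not os.path.exists(BIN) or os.path.getmtime(BIN) < os.path.getmtime(SRC):
        os.makedirs(os.path.dirname(BIN), exist_ok=True)
        subprocess.check_call(["gcc", "-O2", "-o", BIN, SRC])

if __name__ == "__main__":
    ensure_built()
    # Replace the wrapper process with the freshly built (or cached) tool.
    os.execv(BIN, [BIN] + sys.argv[1:])
```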
Is there an accepted/stable process for this? Are there any catches? Are there alternatives (assuming the absolute need to use native C code)?
Clarification: Multiple OSes are involved that do not share an ABI, e.g. OS X, various Linuxes, BSD, etc. I need to be able to update the code in place in shared folders and have the new code working more or less instantaneously. Distributing binary or source packages is less than ideal.
Launching a Python interpreter instance just to select the right binary to run would be much heavier than you need. I'd distribute a shell .rc file which provides aliases.
In /shared/bin, you put the various binaries: /shared/bin/toolname-mac, /shared/bin/toolname-debian-x86, /shared/bin/toolname-netbsd-dreamcast, etc. Then, in the common shared shell .rc file, you put the logic to set the aliases according to platform, so that on OSX, it gets alias toolname=/shared/bin/toolname-mac, and so forth.
This won't work as well if you're adding new tools all the time, because the users will need to reload the aliases.
I wouldn't recommend distributing tools this way, though. Testing and qualifying new builds of the tools should be taking up enough time and effort that the extra time required to distribute the tools to the users is trivial. You seem to be optimizing to reduce the distribution time. Replacing tools that quickly in a live environment is all too likely to result in lengthy and confusing downtime if anything goes wrong in writing and building the tools--especially when subtle cross-platform issues creep in.
Also, you could use autoconf and distribute your application in source form only. :)
You know, you should look at static linking.
These days, we all have HUGE hard drives, and a few extra megabytes (for carrying around libc and what not) is really not that big a deal anymore.
You could also try running your applications in chroot() jails and distributing those.
Depending on your mix of OSes, you might be better off creating packages for each class of system.
Alternatively, if they all share the same ABI and hardware architecture, you could also compile static binaries.