I have some pretty complex computation code written in Octave and a Python script which receives user input and needs to run the Octave code based on that input. As I see it, I have these options:
Port the Octave code to python.
Use external libraries (e.g. oct2py) which let you run the Octave/Matlab engine from Python.
Communicate between a Python process and an Octave process. One possibility would be to use subprocess from the Python code and wait for the answer.
Since I'm pretty reluctant to port my code to Python and I don't want to rely on the maintenance of external libraries such as oct2py, I am in favor of option 3. However, since the system should scale well, I do not want to spawn a new Octave process for every request; a task queue system seems more reasonable. Is there any (recommended) task queue system for enqueuing tasks in Python and having an Octave worker on the other end process them?
The way it is described here, option 3 degenerates to option 2, because Octave has no obvious way (an API or package) for the 'Octave worker' to connect to a task queue.
The only way Octave does "networking" is through the sockets package, and that means implementing the protocol for communicating with the task queue from scratch (in Octave).
The original motivation for having an 'Octave worker' is to have the main process of Octave launch once and then "direct it" to execute functions and return results, rather than launching the main process of Octave for every call to a function.
Since Octave cannot run 'a worker' (something that launches, listens on a 'channel' and executes code) out of the box, the only other way to achieve this is to have the task queue framework work entirely in Python and only call Octave when you need its functionality, most likely via oct2py (i.e. option 2).
There are many different ways to do this, ranging from Redis to PyPubSub, Celery and RabbitMQ, all of them straightforward and very well documented. PyPubSub does not require any additional components.
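As a minimal sketch of that layout (not the only way to wire it up), a Celery worker could keep one persistent Octave session via oct2py and feed it tasks. The broker URL and the Octave function name `my_computation` are placeholders for your own setup:

```python
# tasks.py -- a hypothetical Celery worker that proxies calls into Octave.
from celery import Celery
from oct2py import Oct2Py

app = Celery("octave_tasks", broker="redis://localhost:6379/0")
octave = Oct2Py()  # one long-lived Octave session per worker process

@app.task
def run_octave(x):
    # Attribute access on the Oct2Py object calls the Octave function
    # of the same name; my_computation.m must be on Octave's path.
    return octave.my_computation(x)
```

The Python side then enqueues work with `run_octave.delay(42)` while the worker (started with `celery -A tasks worker`) does the actual Octave calls.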
(Just as a note: the solution of having an 'executable' Octave script, calling it via Python and blocking until it returns, is not as bad as it sounds, and for some parallel-processing frameworks it is the only way to have multiple copies of the same Octave script operate on different data segments.)
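For reference, that blocking variant is only a few lines with subprocess; this sketch assumes a hypothetical compute.m script that prints its result to stdout:

```python
import subprocess

# Spawns one Octave process per call and blocks until it exits.
result = subprocess.run(
    ["octave", "--no-gui", "--quiet", "compute.m"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```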
All three options are reasonable depending on your particular case.
"I don't want to rely on maintenance of external libraries such as oct2py, I am in favor of option 3"
oct2py is implemented using option 3. You can reinvent what it already does or use it directly. oct2py is pure Python and has a permissive license: if its development were to stop tomorrow, you could include its code alongside yours.
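A minimal sketch of direct usage; note that a single Oct2Py instance keeps one Octave session alive across calls, which already avoids spawning a new Octave process per request:

```python
from oct2py import Oct2Py

oc = Oct2Py()               # launches one Octave session and keeps it running
oc.eval("x = 40 + 2;")      # run arbitrary Octave code in that session
print(oc.pull("x"))         # fetch the variable back into Python -> 42.0
```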
Related
I'm trying to understand how I can develop a system made of two Python scripts running on a Linux box.
The first one starts on system boot and always runs; it basically connects to an MQTT server and waits for the other Python script. The second one is called from the command line, does some work and then passes some data, basically three strings, to the first one, and then exits.
What is the "correct" way to pass data from script two to script one in this situation?
There are multiple ways to accomplish this (blocking and non-blocking). Depending on your MQTT server, you could use it in a pub/sub approach to pass the data from the temporary script to the long-running one.
If that is not possible, another pub/sub server such as redis could be used. The pub/sub functionality of redis is especially useful for this, and redis is well supported in Python.
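A minimal sketch of the redis variant, with both ends shown in one place; the channel name "data" and the string format are arbitrary choices:

```python
import redis

# Long-running script: subscribe and block, waiting for messages.
r = redis.Redis()
sub = r.pubsub()
sub.subscribe("data")
for message in sub.listen():
    if message["type"] == "message":
        print("received:", message["data"])  # bytes from the publisher

# The short-lived script would instead do:
#   redis.Redis().publish("data", "string1|string2|string3")
```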
Another, more lightweight possibility is to use a first-in-first-out (FIFO) queue; cf. this article on using FIFOs in Python, or blocking vs. non-blocking FIFOs.
FIFOs are easy to use if both processes run on the same computer; redis would be preferable if the scripts run on different computers.
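A minimal FIFO sketch (the path is arbitrary); the reader blocks until a writer connects:

```python
import os

FIFO = "/tmp/script_fifo"  # arbitrary path, shared by both scripts
if not os.path.exists(FIFO):
    os.mkfifo(FIFO)

# Long-running script: open() blocks until a writer opens the other end.
with open(FIFO) as f:
    for line in f:
        print("received:", line.strip())

# The short-lived script would instead do:
#   with open(FIFO, "w") as f:
#       f.write("string1\nstring2\nstring3\n")
```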
There are more complex packages for inter-process communication around, e.g. RabbitMQ, ZeroMQ, ..., but they might be overkill for your use case.
I have a running CLI application in Python that uses threads to execute some workers. Now I am writing a GUI for this application using Electron. For simple requests/responses I am using gRPC to communicate between the Python application and the GUI.
I am, however, struggling to find a proper publishing mechanism to push data to the GUI: gRPC's integrated streaming won't work, since it uses generators; as already mentioned, my longer, blocking tasks are executed using threads (subclasses of threading.Thread). I'd also like to emit certain events (e.g. the progress) from within those threads.
Then I found Flask's SocketIO implementation, which, however, blocks while running and is thus not really suited for what I have in mind; I'd again have to run two processes (Flask and my CLI application)...
Another package I've found is websockets, but I can't get my head around how to implement the producer() function that they mention in their patterns.
My last idea would be to deploy a broker-based message system like Redis, or simply fall back to the brokerless zmq, which is a bit of a hassle to set up for the GUI application.
So the simple question:
Is there any easy framework that lets me create a server "task" in Python that I can pass messages to for publishing?
For anyone struggling with concurrency in Python:
No, there isn't any simple framework. IMHO Python's concurrency handling is a bit of a mess (compared to other languages like Go, where concurrency is built in). There are multiple major packages implementing it, one of them asyncio, but most of them are incompatible with each other. I ended up using a solution similar to the one proposed in this question.
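One workable pattern along those lines (a sketch, not necessarily the linked solution; the websockets server, port, and helper names are assumptions) is to run the asyncio loop in a dedicated thread and hand messages over with asyncio.run_coroutine_threadsafe():

```python
import asyncio
import threading
import websockets

queue = None  # asyncio.Queue, created inside the event loop
loop = None   # the running loop, captured so worker threads can reach it

async def handler(websocket):
    # Essentially the producer() pattern: drain a queue, push to the client.
    # Note: older websockets versions call handler(websocket, path) instead.
    while True:
        await websocket.send(await queue.get())

async def main():
    global queue, loop
    queue = asyncio.Queue()
    loop = asyncio.get_running_loop()
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # run forever

def emit(message):
    # Safe to call from any worker thread (e.g. a threading.Thread subclass).
    asyncio.run_coroutine_threadsafe(queue.put(message), loop)

threading.Thread(target=lambda: asyncio.run(main()), daemon=True).start()
```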
I love Python's global interpreter lock because it makes the underlying C code simple.
But it means that each Python interpreter main loop is restricted to one thread at a time.
This is bad because recently the number of cores per processor chip has been doubling frequently.
One of the supposed advantages to zeromq is that it makes multi-threaded programming "easy" or easier.
Is it possible to launch multiple Python interpreters in the same process and have them communicate only using in-process zeromq with no other shared state? Has anyone tried it? Does it work well? Please comment and/or provide links.
I don't know of any way to create multiple instances of the Python interpreter within a single process, but I do have experience with splitting multiple instances across multiple processes and communicating with zmq.
I've been using multiprocessing to implement an island-model architecture for global optimization, with zmq for managing communication between the islands. Each island is its own process with its own Python interpreter, created and managed by the master archipelago process.
Using multiprocessing allows you to launch as many independent Python interpreters as you wish, but they all reside in their own processes with a separate memory space. I believe the OS scheduler takes care of assigning processes to cores and sharing CPU time. The separate memory space is the hardest part, because it means you have to explicitly communicate. To communicate between processes, the objects/data you wish to send must be serializable, because zmq sends byte-strings.
The nice thing about zmq is that it's a piece of cake to scale across systems distributed over a network, and it's pretty lightweight. You can create just about any communication pattern you wish, using REP/REQ, PUB/SUB, or whatever.
But no, it's not as easy as just spinning up a few threads from the threading module.
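A stripped-down sketch of the setup described above, with a master and a single "island" process talking over a REQ/REP pair (the socket address is arbitrary):

```python
import multiprocessing
import zmq

def island():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect("tcp://127.0.0.1:5555")
    sock.send(b"result from island")  # zmq transports byte-strings
    sock.recv()                       # wait for the master's ack

if __name__ == "__main__":
    ctx = zmq.Context()
    master = ctx.socket(zmq.REP)
    master.bind("tcp://127.0.0.1:5555")
    p = multiprocessing.Process(target=island)  # own interpreter, own memory
    p.start()
    print(master.recv())              # blocks until the island reports in
    master.send(b"ack")
    p.join()
```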
Edit: Also, here's a Stack Overflow question similar to yours; it contains some more relevant links indicating that it may be possible to run multiple Python interpreters within a single process, though it doesn't look simple: Multiple independent embedded Python Interpreters on multiple operating system threads invoked from C/C++ program
Does anyone know of a working and well-documented implementation of a daemon in Python? Please post a link here if you know of a project that fits these two requirements.
Three options I can think of:
Make a cron job that calls your script. Cron is a common name for a GNU/Linux daemon that periodically launches scripts according to a schedule you set. You add your script to a crontab, or place a symlink to it in a special directory, and the daemon handles the job of launching it in the background. You can read more on Wikipedia. There is a variety of different cron daemons, but your GNU/Linux system should already have one installed.
A Pythonic approach (a library, for example) that lets your script daemonize itself. Yes, it will require a simple event loop (where your events are timer firings, possibly provided by a sleep function). Here is the one I recommend and use (a stripped-down sketch follows this list): A simple unix/linux daemon in Python
Use the Python multiprocessing module. The nitty-gritty of forking a process and so on is hidden in this implementation. It's pretty neat.
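To make option 2 concrete, here is the core of the double-fork technique from the linked recipe, stripped to its essentials (no pidfile, signal handling, or stream redirection):

```python
import os
import sys
import time

def daemonize():
    if os.fork() > 0:
        sys.exit(0)   # exit the first parent
    os.setsid()       # become session leader, detach from the terminal
    if os.fork() > 0:
        sys.exit(0)   # exit the second parent so we can never reacquire a tty
    os.chdir("/")
    os.umask(0)

daemonize()
while True:           # the simple event loop mentioned in option 2
    # do the periodic work here
    time.sleep(60)
```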
I wouldn't recommend 2 or 3, because you're in fact replicating cron functionality. The Linux system paradigm is to let multiple simple tools interact to solve your problems. Unless there are additional reasons to make a daemon (beyond triggering periodically), choose the other approach.
Also, if you daemonize with a loop and a crash happens, make sure you have logs that will help you debug, and devise a way for the script to start again. If the script is instead added as a cron job, it will simply trigger again after the interval you set.
If you just want to run a daemon, consider Supervisor, a daemon that itself controls and manages daemons.
If you want to look at the nitty-gritty, you can check out Supervisor's launch script or some of the responses to this lazyweb request.
Check this link for a double-fork daemon: http://code.activestate.com/recipes/278731-creating-a-daemon-the-python-way/
The code is readable and well documented. Take a look at chapter 13 of W. Richard Stevens's book 'Advanced Programming in the UNIX Environment' for detailed information on Unix daemons.
Even though Python and Ruby have one kernel thread per interpreter thread, they have a global interpreter lock (GIL) that is used to protect potentially shared data structures, and this inhibits multi-processor execution. Even though the portions of those languages that are written in C or C++ can be free-threaded, that's not possible with pure interpreted code unless you use multiple processes. What's the best way to achieve this? Using FastCGI? Creating a cluster or a farm of virtualized servers? Using their Java equivalents, JRuby and Jython?
I'm not totally sure which problem you want to solve, but if you deploy your Python/Django application via an Apache prefork MPM using mod_python, Apache will start several worker processes for handling different requests.
If one request needs so many resources that you want to use multiple cores for it, have a look at pyprocessing. But I don't think that would be wise.
The "standard" way to do this with Rails is to run a "pack" of Mongrel instances (i.e. 4 copies of the Rails application) and then use Apache or nginx or some other piece of software to sit in front of them and act as a load balancer.
This is probably how it's done with other Ruby frameworks such as Merb etc., but I haven't used those personally.
The OS will take care of running each Mongrel on its own CPU.
If you install mod_rails, a.k.a. Phusion Passenger, it will start and stop multiple copies of the Rails process for you as well, so it will end up spreading the load across multiple CPUs/cores in a similar way.
Use an interface that runs each request in a separate interpreter, such as mod_wsgi for Python. This lets you use multi-threading without running into the GIL.
EDIT: Apparently, mod_wsgi no longer supports multiple interpreters per process, because too many extension modules were never written to cope with multiple interpreters properly. It still supports running requests in separate processes FastCGI-style, though, so that's apparently the current accepted solution.
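For context, the Python side of mod_wsgi is just a WSGI callable; the process/thread fan-out is configured in Apache, not in the code:

```python
# The entry point mod_wsgi looks for is a callable named `application`.
def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from one of several worker processes\n"]
```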
In Python and Ruby, the only way to use multiple cores is to spawn new (heavyweight) processes.
The Java counterparts inherit the capabilities of the Java platform: you can simply use Java threads. That is, for example, one reason why Java application servers like GlassFish are sometimes (often) used for Ruby on Rails applications.
For Python, the PyProcessing project allows you to program with processes much like you would with threads. It is included in the standard library of the recently released 2.6 version as multiprocessing. The module has many features for establishing and controlling access to shared data structures (queues, pipes, etc.) and supports common idioms (e.g. managers and worker pools).
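A minimal sketch of the worker-pool idiom with the stdlib module:

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    # One worker process per core by default; work is farmed out and
    # results are collected back, much like a threaded map.
    with Pool() as pool:
        print(pool.map(square, range(10)))  # [0, 1, 4, ..., 81]
```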