Python multiprocessing avoid global variables

My HandBrake compressor uses a multiprocessing queue to store the movies to compress. In my main class I start a process that takes an element from the queue if one exists. That process spawns another process, which in turn runs HandBrake as a subprocess. Back in the first process, a second process is then started to read HandBrake's current stdout. That works very well.
Now I want to write the state (processing, waiting, finished) and the progress to a global variable. I need this because I will later implement an XML-RPC interface, and over this interface it should be possible to get the status of each compression worker.
Global variables are bad style, and I have read that if I have to use one I am probably doing something wrong. But I don't know how to do it right.
I also have a sqlite3 database where I could store the progress, but if I implement a web GUI later I don't want it to read the database file directly; I want my application to provide that information.
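One way to avoid the global, sketched here rather than taken from the asker's code, is to give every worker process a multiprocessing.Manager dict and let the future XML-RPC handler read from it; the job ids and status fields below are invented for illustration.

    # Hedged sketch: share per-job status through a Manager dict instead of a global.
    # Job ids and status fields are placeholders, not the asker's real data model.
    import multiprocessing
    import time

    def compress(job_id, status):
        status[job_id] = {"state": "processing", "progress": 0}
        for pct in (25, 50, 75, 100):          # stand-in for parsing HandBrake's stdout
            time.sleep(0.1)
            status[job_id] = {"state": "processing", "progress": pct}
        status[job_id] = {"state": "finished", "progress": 100}

    if __name__ == "__main__":
        manager = multiprocessing.Manager()
        status = manager.dict()                # proxy object, visible to all processes
        workers = [multiprocessing.Process(target=compress, args=(i, status))
                   for i in range(2)]
        for w in workers:
            w.start()
        while any(w.is_alive() for w in workers):
            print(dict(status))                # an XML-RPC method could return this
            time.sleep(0.2)
        for w in workers:
            w.join()

The main process could also mirror the same dict into the sqlite3 database, so a later web GUI never has to touch the database file directly.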

Related

Redirect stdout to file, but only within a certain class

There are a ton of resources on how to manage sys.stdout so that it prints to either just a file or tees to both a file and sys.stdout. However, my use case is different.
I am using code from a code base that uses print calls, and it would take too long to swap it out for a logger. Instead, I created a custom Tee class to duplicate output from print calls to a log file. It worked fine for the synchronous case.
Template for Tee class: How to duplicate sys.stdout to a log file?
However, I adapted the code to use multiple threads at once using the multiprocessing module. Because of this, I am unable to track which thread the print calls are coming from, and therefore which Tee to use.
It would be really useful to be able to redirect print calls, but only from within a certain block of code. For example, if I create a Tee instance inside a function and later delete it, all print calls within that function and the functions it calls would be redirected to a specific file; likewise, if I create a Tee instance inside another thread, the same would happen for that thread.
I can imagine a complicated class using locks and synchronization to manage everything, but I am stuck on how to determine the origin of the print calls. Any suggestions?
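If the workers are in fact threads (for example via multiprocessing.dummy or a thread pool), one possible way to tell the print calls apart is to install a single stdout wrapper and decide per thread, via a thread-local attribute, where to copy the output. The class and file names below are made up for this sketch, not the asker's Tee implementation.

    # Sketch: route print() output per thread by consulting thread-local state.
    import sys
    import threading

    _route = threading.local()

    class ThreadAwareTee:
        def __init__(self, fallback):
            self.fallback = fallback            # the real sys.stdout

        def write(self, text):
            target = getattr(_route, "logfile", None)
            if target is not None:
                target.write(text)              # copy to this thread's log file
            self.fallback.write(text)           # still echo to the real stdout

        def flush(self):
            target = getattr(_route, "logfile", None)
            if target is not None:
                target.flush()
            self.fallback.flush()

    sys.stdout = ThreadAwareTee(sys.stdout)

    def worker(name):
        with open(f"{name}.log", "w") as f:
            _route.logfile = f                  # only this thread writes to f
            print(f"hello from {name}")
            _route.logfile = None

    threads = [threading.Thread(target=worker, args=(f"thread{i}",)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()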

Interprocess Communication between two python scripts without STDOUT

I am trying to create a Monitor script that monitors all the threads of a huge Python script which has several loggers and several threads running.
From Monitor.py I can run the script as a subprocess and read its stdout, which might contain the status of the threads, but since several loggers are running I also see their log output mixed into it.
Question: How can I run the main script as a separate process and receive custom messages and thread status without interfering with its logging? (By passing a PIPE as an argument?)
Main_Script.py
* Runs several threads
* Each thread has a separate logger

Monitor.py
* Spins up Main_Script.py
* Monitors each of the threads in Main_Script.py (and may obtain other messages from Main_Script.py in the future)
So far I have tried subprocess and Process from multiprocessing.
subprocess lets me start Main_Script.py and forward its stdout back to the monitor, but I see the threads' log output coming in through the same stdout. I am using the standard logging library to log the data from each thread to separate files.
I also tried Process from multiprocessing. I had to call the main function of Main_Script.py as a process and pass a Pipe to it from Monitor.py, but then I can't see Main_Script.py as a separate process when I run the top command.
Normally, you want to change the child process to work like a typical Unix userland tool: the logging and other side-band information goes to stderr (or to a file, or syslog, etc.), and only the actual output goes to stdout.
Then, the problem is easy: just capture stdout to a PIPE that you process, and either capture stderr to a different PIPE, or pass it through to real stderr.
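A minimal sketch of that capture, assuming Main_Script.py has been changed so that status lines go to stdout and all logging stays on stderr:

    # Assumption: the child writes status lines to stdout and its logging to stderr.
    import subprocess

    proc = subprocess.Popen(
        ["python", "Main_Script.py"],
        stdout=subprocess.PIPE,          # status messages come back to the monitor
        stderr=None,                     # logging passes straight through to the real stderr
        universal_newlines=True,
    )
    for line in proc.stdout:
        print("status:", line.rstrip())
    proc.wait()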
If that's not appropriate for some reason, you need to come up with some other mechanism for IPC: Unix or Windows named pipes, anonymous pipes that you pass by leaking the file descriptor across the fork/exec and then pass the fd as an argument, Unix-domain sockets, TCP or UDP localhost sockets, a higher-level protocol like a web service on top of TCP sockets, mmapped files, anonymous mmaps or pipes that you pass between processes via a Unix-domain socket or Windows API calls, …
As you can see, there are a huge number of options. Without knowing anything about your problem other than that you want "custom messages", it's impossible to tell you which one you want.
While we're at it: If you can rewrite your code around multiprocessing rather than subprocess, there are nice high-level abstractions built in to that module. For example, you can use a Queue that automatically manages synchronization and blocking, and also manages pickling/unpickling so you can just pass any (picklable) object rather than having to worry about serializing to text and parsing the text. Or you can create shared memory holding arrays of int32 objects, or NumPy arrays, or arbitrary structures that you define with ctypes. And so on. Of course you could build the same abstractions yourself, without needing to use multiprocessing, but it's a lot easier when they're there out of the box.
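For instance, a rough sketch of the Queue variant, with made-up status messages: the child puts picklable objects on the queue and the monitor simply gets them back, with no text parsing involved.

    # Sketch of multiprocessing.Queue: pass picklable status objects directly.
    import multiprocessing

    def worker(q):
        q.put({"thread": "downloader", "state": "running", "done": 0.5})
        q.put({"thread": "downloader", "state": "finished", "done": 1.0})
        q.put(None)                          # sentinel: nothing more to report

    if __name__ == "__main__":
        q = multiprocessing.Queue()
        p = multiprocessing.Process(target=worker, args=(q,))
        p.start()
        while True:
            msg = q.get()                    # blocks until a message arrives
            if msg is None:
                break
            print("status:", msg)
        p.join()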
Finally, while your question is tagged ipc and pipe, and titled "Interprocess Communication", your description refers to threads, not processes. If you actually are using a bunch of threads in a single process, you don't need any of this.
You can just stick your results on a queue.Queue, or store them in a list or deque with a Lock around it, or pass in a callback to be called with each new result, or use a higher-level abstraction like concurrent.futures.ThreadPoolExecutor and return a Future object or an iterator of Futures, etc.
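A small sketch of the threads-only case, assuming each task simply returns its own status object:

    # Threads in one process: no IPC needed, futures carry the results back.
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def task(n):
        return {"task": n, "state": "finished"}   # placeholder result

    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(task, n) for n in range(4)]
        for fut in as_completed(futures):
            print(fut.result())                   # arrives as each thread finishes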

How do I get data from a created process outside the program without its original Python object

I am letting users of my Django app call a function which in turn creates a process with the multiprocessing module, but I need to let users check the process's progress (retrieve data inside it) and status (alive or done executing), and possibly terminate it (which I found easy to do with system commands), all without having access to the Process object, since I can't store it anywhere.
Is it possible to store only the PID of the process (or some other convenient identifier) in the database and use it later, perhaps to get information from the process and manage it?
Getting the information directly from the process would be much more reliable than storing it somewhere else and retrieving it from there.
The function that runs in the background as a process/daemon is completely independent of the program and has its own data that changes over time, and I need to let users check its progress.
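Storing only the PID does work for liveness checks and termination, as in the rough, Unix-only sketch below; the progress data itself would still have to be written by the worker to something both sides can reach (the database, a file, or a multiprocessing primitive), because the live Process object cannot be rebuilt from a PID.

    # Sketch (Unix-like systems), assuming the PID was saved to the database
    # when the worker process was started.
    import os
    import signal

    def is_alive(pid):
        try:
            os.kill(pid, 0)              # signal 0 sends nothing, only checks existence
            return True
        except ProcessLookupError:
            return False
        except PermissionError:
            return True                  # process exists but belongs to another user

    def terminate(pid):
        os.kill(pid, signal.SIGTERM)     # ask the worker to shut down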

How to create a socket object using multiprocessing and pass it back to the parent process

I wish to create a socket object from a main program using multiprocessing, to avoid blocking the main program if the socket is not available during the creation process.
The socket gets created but cannot be returned to the main program.
After some reading it seems that objects cannot be shared between Python processes. Is there a possible design pattern to circumvent this? Is multiprocessing suitable for this application, or should I be considering another approach?
You should keep it really simple and have a socket handler thread which:
Opens the Socket,
Retains ownership of it,
Handles it for your main program, i.e. when data arrives, passes it to the parent via an event, pubsub or a callback,
If the connection is lost, either reopens it or lets the parent know and ends cleanly,
On shutdown closes the socket for you.
Then everything becomes non-blocking. This is a much cleaner design pattern especially if you use pubsub.
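A minimal sketch of that handler thread, using a plain queue instead of pubsub and with placeholder host/port values:

    # The handler thread owns the socket and hands incoming data to the parent.
    import queue
    import socket
    import threading

    def socket_handler(host, port, inbox, stop):
        sock = socket.create_connection((host, port))  # may block, but only in this thread
        sock.settimeout(1.0)
        try:
            while not stop.is_set():
                try:
                    data = sock.recv(4096)
                except socket.timeout:
                    continue
                if not data:                           # peer closed the connection
                    inbox.put(None)                    # tell the parent we are done
                    return
                inbox.put(data)                        # pass data to the parent
        finally:
            sock.close()

    inbox = queue.Queue()
    stop = threading.Event()
    t = threading.Thread(target=socket_handler,
                         args=("127.0.0.1", 9000, inbox, stop), daemon=True)
    t.start()
    # The main program stays responsive and polls inbox.get(timeout=...) when convenient.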

User Input Python Script Executing Daemon

I am working on a web service that requires user-submitted Python code to be executed on my server (we have checks for code injection). I have to import a rather large module, so I would like to make sure that I am not starting up Python and importing the module from scratch each time something runs (that takes about 4-6 s).
To do this I was planning to create a Python (3.2) daemon that imports the user-submitted code as a module, executes it, and then deletes/garbage-collects that module. I need to make sure that the module is completely gone from RAM, since this process will keep running until the server is restarted. I have read a number of things that say this is a very difficult thing to do in Python.
What is the best way to do this? Would it be better to use exec to define a function from the user-submitted code (for variable scoping), execute that function, and then somehow remove the function? Or is there a better approach that I have missed?
You could perhaps consider creating a pool of Python daemon processes.
Their purpose would be to serve one request and to die afterwards.
You would have to write a pool-manager that ensures there are always X daemon processes waiting for an incoming request (X being the number of waiting daemon processes, chosen according to the required workload). The pool-manager would have to observe the pool of daemon processes and start new instances every time a process finished.
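A rough sketch of such a pool-manager, with the expensive import and the demo requests as placeholders: each worker serves exactly one piece of user code and then exits, so the operating system reclaims the memory instead of relying on Python's garbage collector.

    # Each worker imports the large module once, runs one piece of user code, then exits.
    # big_module and the demo requests are placeholders.
    import multiprocessing
    import time

    X = 2                                        # number of waiting daemon processes

    def worker(jobs):
        # import big_module                      # hypothetical expensive import, paid once per worker
        code = jobs.get()                        # block until one request arrives
        exec(code, {"__name__": "__user__"})     # run the user code in a throwaway namespace
        # the process exits here, taking everything the user code allocated with it

    def spawn(jobs):
        p = multiprocessing.Process(target=worker, args=(jobs,))
        p.start()
        return p

    if __name__ == "__main__":
        jobs = multiprocessing.Queue()
        pool = [spawn(jobs) for _ in range(X)]
        for request in ['print("job 1")', 'print("job 2")']:
            jobs.put(request)                    # stand-in for an incoming request
            pool = [p for p in pool if p.is_alive()]
            while len(pool) < X:                 # replenish: keep X workers waiting
                pool.append(spawn(jobs))
        time.sleep(1)                            # demo only: let queued jobs be picked up
        for p in pool:                           # demo only: stop the leftover workers
            p.terminate()
            p.join()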
