Multiprocessing & Multithreading - Theoretical Clarification - python

I have two Python concurrency-related questions that I would like clarified.
Task Description:
Let us say I set up two .py scripts. Each script runs two IO-bound tasks (API calls) with multithreading (max_workers=2).
Questions:
If I don't use a virtual environment and run both scripts through the global Python interpreter (the one in the system-wide Python installation), does this make the task I described single-process and multithreaded? Since we are using one interpreter (a single process) and have two scripts running a total of 4 threads?
If I use PyCharm to create two separate projects, where each project has its own Python interpreter, does such a setting turn the task into multiprocessing and multithreaded? Since we have two Python interpreters running, each running two threads?
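For concreteness, each of the two scripts I have in mind looks roughly like this (a minimal sketch; the URLs and the use of concurrent.futures are just for illustration):

from concurrent.futures import ThreadPoolExecutor
import urllib.request

def call_api(url):
    # IO-bound work: the thread mostly waits on the network.
    with urllib.request.urlopen(url) as resp:
        return resp.status

urls = ["https://example.com", "https://example.org"]

# Two worker threads inside one interpreter process.
with ThreadPoolExecutor(max_workers=2) as pool:
    print(list(pool.map(call_api, urls)))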

Each running interpreter process has its own GIL that's separate from any other GILs in other interpreters that happen to be running. The project and virtual environment associated with the script being run are irrelevant. Virtual environments are to isolate different versions of Python and libraries so libraries from one project don't interfere with libraries in another.
If you run the two scripts separately, e.g. with python script.py, this starts two independent interpreter processes, each unaffected by the other.
Does such a setting turn the task into multiprocessing and multithreaded?
I don't think it's really meaningful to call it a "multiprocess task" if the two processes are completely independent of each other and never talk. You have multiple processes running, but "multiprocessing" within the context of a Python program typically means one coherent program that makes use of multiple processes for a common task.
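By contrast, what people usually mean by multiprocessing in Python is one parent program that coordinates several worker processes, e.g. via the multiprocessing module. A minimal sketch (not taken from the question):

from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    # One coherent program farming work out to two child processes,
    # each of which is a separate interpreter with its own GIL.
    with Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3, 4]))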

Related

Running multiple programs in parallel in Spyder

I want to run multiple programs in parallel in Spyder. How do I do it? I tried deactivating "Use a single instance" under Preferences > Applications, but this didn't help.

Processes not spawned properly with unittest, Python 3.9 and Windows

I have a very peculiar bug with unittest (and pytest, which has the same behaviour), only on Windows and only on Python 3.8 and later.
My program depends on a 3rd party application that has to be installed separately. This application adds a DLL to a folder on the PATH, and I have an extension module that depends on that DLL. My program also uses the multiprocessing pool - so this DLL is loaded from the main process as well as all the spawned processes.
Starting with Python 3.8, DLLs are not loaded from everywhere in the PATH, but just from a specific set of directories. I call os.add_dll_directory with the proper directory.
My program works when I run it. It finds the DLL, loads my extension module, fires up child processes which all do their jobs, and everything is great. Until I try the same in a unit test.
When I try the same code in a unit test, it fails. The spawned child processes do not find the DLL. It is as if os.add_dll_directory is not carried over to the child processes. Again, only when running the unit test. It is carried over when running independently.
The unit tests also work with Python 3.7 or on Linux - where os.add_dll_directory is not required.
I'm rather stuck. I really want to have a test that tests this parallel functionality, and it seems as if I can't.
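A stripped-down version of what the program does (the directory, module and function names here are placeholders, not my real code):

import os
import multiprocessing

def worker(x):
    # Placeholder for work done in each spawned child process; the
    # extension module (and therefore the DLL) is needed here as well.
    import my_extension  # placeholder for the extension module that needs the DLL
    return my_extension.compute(x)

if __name__ == "__main__":
    os.add_dll_directory(r"C:\Program Files\SomeApp\bin")  # placeholder path
    import my_extension
    with multiprocessing.Pool(2) as pool:
        print(pool.map(worker, [1, 2]))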

Kill an MPI process in all machines

Suppose that I run an MPI program involving 25 processes on 25 different machines. The program is initiated at one of them called the "master" with a command like
mpirun -n 25 --hostfile myhostfile.txt python helloworld.py
This is executed on Linux with some bash script and it uses mpi4py. Sometimes, in the middle of execution, I want to stop the program on all machines. I don't care if this is done gracefully or not, since the data I might need is already saved.
Usually, I press Ctrl + C in the terminal of the "master", and I think it works as described above. Is this true? In other words, will it stop this specific MPI program on all machines?
Another method I tried is to get the PID of the process in the "master" and kill it. I am not sure about this either.
Do the above methods work as described? If not, what else do you suggest? Note that I want to avoid using MPI calls for that purpose, like MPI_Abort, which some other discussions here and here suggest.
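For context, helloworld.py is a minimal mpi4py program along these lines (simplified, not my exact script):

from mpi4py import MPI

comm = MPI.COMM_WORLD
# Each of the 25 processes started by mpirun reports its own rank.
print("Hello from rank %d of %d" % (comm.Get_rank(), comm.Get_size()))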

Is there any disadvantage in using PYTHONDONTWRITEBYTECODE in Docker?

In many Python-based Docker tutorials (such as this one), the option PYTHONDONTWRITEBYTECODE is used to make Python avoid writing .pyc files when importing source modules (this is equivalent to specifying the -B option).
What are the risks and advantages of setting this option?
When you run a single Python process in the container, and it does not spawn other Python processes during its lifetime, there is no "risk" in doing that.
Writing bytecode to disk means Python compiles a program and its dependent libraries into bytecode only on the first invocation, saving that step on subsequent invocations. In a container the process runs just once, therefore setting this option makes sense.
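As a quick sanity check, you can see from inside Python whether the option is in effect; sys.dont_write_bytecode mirrors both the environment variable and the -B flag:

import sys

# True when PYTHONDONTWRITEBYTECODE is set (or -B was passed), i.e. no
# .pyc files will be written to __pycache__ when modules are imported.
print(sys.dont_write_bytecode)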

Python - Two processes after compiling?

I'm currently working on a small python script, for controlling my home PC (really just a hobby project - nothing serious).
Inside the script, there are two threads running at the same time using thread (I might start using threading instead), like this:
thread.start_new_thread(Function, (Args,))
It works as intended when testing the script... but after compiling the code using PyInstaller there are two processes (one for each thread, I think).
How do I fix this?
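For reference, the threading equivalent I'd switch to would be along these lines (the function here is just a placeholder):

import threading

def function(arg):
    # Placeholder for the real work each thread performs.
    print("running with", arg)

t = threading.Thread(target=function, args=("Args",))
t.start()
t.join()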
Just kill the loader from the main program if it really bothers you. Here's one way to do it.
import os
import win32com.client

proc_name = 'MyProgram.exe'
my_pid = os.getpid()
wmi = win32com.client.GetObject('winmgmts:')
all_procs = wmi.InstancesOf('Win32_Process')
for proc in all_procs:
    if proc.Properties_("Name").Value == proc_name:
        proc_pid = proc.Properties_("ProcessID").Value
        if proc_pid != my_pid:
            # Terminate every other process that has our executable's
            # name (the PyInstaller loader), leaving this process alive.
            print("killed my loader %s" % proc_pid)
            os.kill(proc_pid, 9)
Python code does not need to be "compiled with PyInstaller".
Products like PyInstaller or py2exe are useful for creating a single executable file that you can distribute to third parties, or relocate inside your computer without worrying about the Python installation - however, they don't add "speed", nor is the resulting binary file any more "finished" than your original .py (or .pyw on Windows) file.
What these products do is create another copy of the Python interpreter, along with all the modules your program uses, and pack them inside a single file. It is likely that PyInstaller keeps a second process running to check things on the main script (like launching it; maybe there are options to keep the script running, and so on). This is not part of a standard Python program.
It is not likely that PyInstaller splits the threads into 2 separate processes, as that would cause compatibility problems - threads run in the same process and can transparently access the same data structures.
How a "canonical" Python program runs: the main process seen by the O.S. is the Python binary (python.exe on Windows) - it finds the Python script it was called for - if there is a ".pyc" file for it, that is loaded - else, it loads your ".py" file and compiles it to Python bytecode (not to a Windows executable). This compilation is automatic and transparent to people running the program. It is analogous to Java compiling a .java file to a .class - but there is no explicit step needed by the programmer or user - it is done in place - and other factors control whether Python will store the resulting bytecode as a .pyc file or not.
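That same compilation step can be triggered by hand with the standard-library py_compile module, if you want to see it happen (the file name is just an example):

import py_compile

# Compile a script to bytecode explicitly; on import, CPython does this
# automatically and caches the result (under __pycache__ on Python 3).
py_compile.compile("myscript.py")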
To sum up: there is no performance impact in running the ".py" script directly instead of generating an .exe file with PyInstaller or another product. You do take a disk-space hit if you do, though, as you will have one copy of the Python interpreter and libraries for each of your scripts.
The URL pointed to by Janne Karila in the comment nails it - it's even worse than I thought:
in order to run your script, PyInstaller unpacks Python DLLs and modules into a temporary directory. The time and system resources needed to do that, compared with a single script run, are non-trivial.
http://www.pyinstaller.org/export/v2.0/project/doc/Manual.html?format=raw#how-one-file-mode-works
