How to run a parallel Python script

I am using a script from VAST (https://vast.readthedocs.io/en/latest/VoidFinder_examples.html) called Gadget_VoidFinder_periodic.py. This code is supposed to be parallelized. I have set the parameter num_cpu = 54, since I am running the code on a server. When I check how many cores the code is using, I see that it is using all of them, but not at 100%. How can I change this?
I run python Gadget_VoidFinder_periodic.py to start the script on the server.
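The VoidFinder documentation is the place to check how num_cpu is used internally, but to see whether all 54 workers are actually kept busy you can watch per-core utilization from a second shell while the script runs. A minimal sketch, assuming the psutil package is installed (this is not part of VAST):

import psutil

# Sample per-core utilization once per second for ~10 seconds while
# Gadget_VoidFinder_periodic.py runs in another shell.
for _ in range(10):
    per_core = psutil.cpu_percent(interval=1.0, percpu=True)
    busy = sum(1 for p in per_core if p > 90)   # cores close to full load
    print(f"{busy} cores above 90%:", per_core)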

Related

Python Subprocess in Multiple Terminals in VSCode

I'm using Python's subprocess to spawn new processes. The processes are independent of each other, and each outputs some data related to account creation.
import subprocess
from time import sleep

for token in userToken:
    p = subprocess.Popen(['python3', 'create_account.py', token])
    sleep(1)
I'm trying to find a way to have each of the Python scripts run in a separate VS Code terminal so I can clearly see from the output how the processes are running.
For example, in VS Code you can split the terminals as in the screenshot below. It would be great if each of the processes had its own terminal window.
I've also checked that you can run tasks in VS Code in separate terminals as described here. Is there a way to launch multiple subprocesses in separate terminals like that?
If that's not possible, is there another way I can run subprocess in multiple terminals in VSCode?
By default, VS Code runs Python code in a single terminal.
If you want to run Python code in two or more VS Code terminals separately, rather than sequentially, you can manually enter the run command in each terminal, for example:
The command to run the Python file 'c.py': "..:/.../python.exe ..:/.../c.py".
Apart from manually entering the run command in two or more newly created terminals so that the scripts run at the same time, VS Code currently has no other built-in support for this.
I have submitted a feature request for this on GitHub and look forward to it being implemented:
GitHub link: Can VSCode automatically run python scripts in two or more terminals at the same time?
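If the separate windows do not strictly have to be VS Code terminals, one workaround is to give each child process its own external console. A minimal sketch based on the question's loop; the token values are placeholders, subprocess.CREATE_NEW_CONSOLE is Windows-only, and the xterm branch assumes xterm is installed on Linux:

import subprocess
import sys

userToken = ["token1", "token2"]   # placeholder values

for token in userToken:
    if sys.platform == "win32":
        # Each child gets its own console window on Windows.
        subprocess.Popen(["python", "create_account.py", token],
                         creationflags=subprocess.CREATE_NEW_CONSOLE)
    else:
        # On Linux, launch an explicit terminal emulator instead.
        subprocess.Popen(["xterm", "-e", "python3", "create_account.py", token])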

How to start asyncio server on remote server with Python?

I have a virtual server available which runs Linux, with 8 cores, 32 GB RAM, and 1 TB of storage. It is meant to be a development environment (the same applies for test and prod); this is what I could get from IT. The server can only be accessed via so-called jump servers, using PuTTY or direct TCP/IP ports (SSH is a must).
The application I am working on starts several processes via multiprocessing. In every process an asyncio event loop is started, and in some cases an asyncio socket server. Basically it is a low-level data streaming and processing application (unfortunately no Kafka or similar technology is available yet). The live application runs forever, with no or limited user interaction (it reads/processes/writes data).
I assume IPython is an option for this, but - and maybe I am wrong - I think it starts a new kernel per client request, whereas I need to start new processes from the main code without user interaction. If so, it could be an option for monitoring the application, gathering data from it, and sending new user commands to the main module, but I am not sure how to run processes and asyncio servers remotely.
I would like to understand how this can be done in the given environment. I do not know where to start or what alternatives there are. And I do not understand IPython properly; its documentation is not obvious to me yet.
Please help me out! Thank you in advance!
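For reference, a minimal sketch of the pattern described in the question: several processes started with multiprocessing, each running its own asyncio event loop and TCP server. The ports and the echo handler are placeholders, not part of the original application:

import asyncio
import multiprocessing

async def handle_client(reader, writer):
    data = await reader.readline()       # read one line from the client
    writer.write(data)                   # echo it back (placeholder logic)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def serve(port):
    server = await asyncio.start_server(handle_client, "0.0.0.0", port)
    async with server:
        await server.serve_forever()     # runs until the process is terminated

def worker(port):
    asyncio.run(serve(port))             # one event loop per process

if __name__ == "__main__":
    for port in (9001, 9002, 9003):      # placeholder ports
        multiprocessing.Process(target=worker, args=(port,)).start()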
After lots of research and learning I arrived at a possible solution in our "sandbox" environment. First, I had to split the problem into several sub-problems:
"remote" development
parallelization
scheduling and executing parallel codes
data sharing between these "engines"
controlling these "engines"
Let's look at each in detail:
Remote development means you want to write your code on your laptop, but the code must be executed on a remote server. The easy answer is Jupyter Notebook (or an equivalent solution); it has several trade-offs and other solutions are available, but this was the fastest to deploy and use and had the fewest dependencies, least maintenance, etc.
parallelization: I had several challenges with the IPython kernel when working with multiprocessing, so every piece of code that must run in parallel will be written in a separate Jupyter Notebook. Within a single notebook I can still use an event loop to get async behaviour.
executing parallel code: there are several options I will use:
iPyParallel - "workaround" for multiprocessing
papermill - execute Jupyter Notebooks with parameters from the command line (optional)
using the %%writefile magic command in Jupyter Notebook - create importable modules
an OS task scheduler like cron
async with event loops
Not an option yet: Docker, multiprocessing, multithreading, cloud (AWS, Azure, Google...)
data sharing: I selected ZeroMQ; it took time to learn but was simpler and easier than writing everything on raw sockets. There are alternatives, but they come with extra dependencies and some very useful benefits (will check them later): RabbitMQ, the Redis message broker, etc. The reasons for preferring ZMQ: fast, simple, elegant, and just a library. (Known risk: our IT will prefer RabbitMQ, but that problem comes later :-) )
controlling the engines: now the answer is obvious: a separate Python script (it can be tested as notebook code but is easy to turn into a pure .py file and schedule). It communicates with the other modules via ZMQ sockets: health checks, sending new parameters, commands, etc.; a sketch of such a health check follows below.
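As an illustration of that control channel, here is a minimal ZeroMQ REQ/REP sketch using pyzmq. The port number and message format are placeholders, not taken from the original setup:

import zmq

def engine(port=5555):
    # Engine side: answer healthcheck requests from the controller.
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind(f"tcp://*:{port}")
    while True:
        msg = sock.recv_json()                       # e.g. {"cmd": "healthcheck"}
        if msg.get("cmd") == "healthcheck":
            sock.send_json({"status": "ok"})
        else:
            sock.send_json({"status": "unknown command"})

def controller(port=5555):
    # Controller side: send one healthcheck and print the reply.
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect(f"tcp://localhost:{port}")
    sock.send_json({"cmd": "healthcheck"})
    print(sock.recv_json())                          # expect {"status": "ok"}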

Can the SSH client affect how long it takes to run some code?

I'm running a CNN training script with an interpreter that runs on a remote machine. The code is pretty simple and resembles this: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
But I added a TensorBoard writer to draw the training and validation loss as well as 3 images per epoch.
I used both PyCharm and MobaXterm to access the remote server (an Amazon AWS EC2 instance), and I've noticed significant performance differences depending on whether I run the code from PyCharm or from the MobaXterm command line.
The results are averaged over 10 iterations:
Pycharm: Each iteration takes 55s.
MobaXterm: Each iteration takes 95s.
I've made sure that both are using the same conda environment and the same Python interpreter (Python 3.7).
Maybe I'm missing something essential about MobaXterm and working with a remote interpreter?

Benchmarking a Python script on a remote machine with a distributed access

I want to benchmark the execution time of a Python script that involves some heavy computation. Although I can run it in acceptable time against a few datasets on my local machine, the waiting time becomes unacceptable when I test it against thousands of datasets, which is something I would like to do, so I have to resort to remote computation. The problem is that the remote machine is not used just by me but by a few people, and their code often drains computation power from the CPUs that I use, making time-based benchmarks barely meaningful.
At the moment, I just run the Python script from a bash script like this:
python myscript.py --dataset dataset1
I can't ask the remote server owner to grant me exclusive access to the CPUs, which would of course be the perfect scenario. I would like to do something like this: check whether the CPU is currently being used by anything else; if it is, freeze my process and wait until the CPU is free again, then resume it. Is there a way to accomplish this, or are there alternatives suited to this task?
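One possible approach is a small watcher that launches the benchmark, sends it SIGSTOP when the rest of the machine is busy, and SIGCONT when it goes quiet. This is only a sketch, assuming a Linux host with the psutil package; the 80%/20% thresholds and the poll interval are arbitrary placeholders:

import os
import signal
import subprocess
import time

import psutil

proc = subprocess.Popen(["python", "myscript.py", "--dataset", "dataset1"])
bench = psutil.Process(proc.pid)
paused = False

while proc.poll() is None:                       # while the benchmark is alive
    total = psutil.cpu_percent(interval=1.0)     # system-wide usage, averaged over cores
    own = bench.cpu_percent(interval=None) / psutil.cpu_count()
    others = max(0.0, total - own)               # rough share used by everything else
    if others > 80 and not paused:
        os.kill(proc.pid, signal.SIGSTOP)        # freeze the benchmark
        paused = True
    elif others < 20 and paused:
        os.kill(proc.pid, signal.SIGCONT)        # resume it
        paused = False
    time.sleep(5)

Note that with pauses like this, wall-clock timing stays distorted; measuring the script's own CPU time (for example with time.process_time() inside myscript.py) may give a more stable benchmark.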

What are good ways to deploy and manage Python scripts on a production server?

I've written a lot of Python scripts. Now I want to run them on another computer that runs non-stop, crawling and analyzing data and updating an SQL database.
Normally I open a command prompt and run the scripts:
python [script directory]
But with many scripts I have to open many command prompts, and every script starts its own Python interpreter, so it ends up as a huge mess that uses a lot of memory.
What should I do to manage these scripts?
You haven't specified what OS your server runs, but assuming it's a Linux server, you should probably look into a process management tool such as Supervisord or systemd. These are tools designed to run and monitor your program automatically, and even restart it if it crashes.
If you're using Ubuntu 16.04, it comes with systemd out of the box; however, I personally find Supervisord easier to configure and use for simple tasks.
These programs won't necessarily help with your memory consumption issues, however. You can place caps on a process's memory use, but that's not really going to help you if it stops your program from working. You're probably best off re-evaluating your code and looking for ways to reduce its memory footprint, or using a server with more RAM.
EDIT:
You've just added that the OS is Windows 10, which makes the above irrelevant. You can use the Windows Task Scheduler to automatically execute long-running tasks.
You can use pythonw *.py, and it will run in the background.
