I want to run two Python scripts.
Each one takes a long time to complete.
I am working on a dual-core FreeBSD machine and want to make sure that I use both cores.
When I run both scripts I find that they both end up running on the same CPU.
How can I ensure that the two scripts are picked up by different CPUs?
I know that in Linux we can specify taskset -c X python foo.py, where X is the CPU number like 0, 1, 2.
How can I do something similar on a FreeBSD system?
The term you are looking for is "CPU affinity."
cpuset -c -l X python foo.py
See How to set CPU affinity for a process in FreeBSD for more details.
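For instance, here is a minimal sketch (foo.py and bar.py stand in for your two scripts) that launches both from Python and pins each one to a different CPU using the cpuset command from the answer above:

import subprocess

# Pin each script to its own CPU on FreeBSD via cpuset(1), following the answer above.
# "foo.py" and "bar.py" are placeholders for your two long-running scripts.
procs = [
    subprocess.Popen(["cpuset", "-c", "-l", "0", "python", "foo.py"]),
    subprocess.Popen(["cpuset", "-c", "-l", "1", "python", "bar.py"]),
]
for p in procs:
    p.wait()  # block until both scripts have finished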
I noticed my Python code always takes longer to run on Windows than it does on Mac. Is there some way to improve this? The Windows machine is very powerful, so I don't think it's a hardware issue (36-core Xeon, 96 GB RAM, SSD). The Python versions are similar: I'm running 3.7.9 on Windows 10 and 3.7.7 on macOS Mojave.
For example, a simple print statement takes 7 times longer. Checking the version takes 12 times longer.
I uninstalled all pip modules on Windows.
I'm trying to write some very lightweight scripts where fast runtime is important.
$ time python3 -c "print('hello world')"
hello world
real 0m0.030s
user 0m0.019s
sys 0m0.009s
$ time python3 --version
Python 3.7.7
real 0m0.015s
user 0m0.003s
sys 0m0.005s
And in Windows 10 PowerShell:
(Measure-Command {python -c "print('hello world')"}).TotalSeconds
0.2249363
(Measure-Command {python --version}).TotalSeconds
0.1776381
Edit: I captured the events with SysInternals Process Monitor and it shows 11,222 events for a single invocation of python --version. Wow, no wonder it takes so long! Unfortunately this doesn't really explain it, because it shows a 0.233-second delay between "Thread Create" and "Load Image".
I would like to run two Python scripts at the same time on my laptop without any decrease in their calculation speed.
I have searched and found this question saying that we should use a bash file.
I have searched but did not understand what I should do or how to run those scripts this way with bash.
python script1.py &
python script2.py &
I am inexperienced in this and I need your professional advice.
I do not understand how to do that, or where and how.
I am using Windows 64-bit.
Best
PS: The answer I marked as accepted is a way to run two tasks in parallel, but it does not decrease the calculation time for the two parallel tasks at all.
If you can install GNU Parallel on Windows under Git Bash (ref), then you can run the two scripts on separate CPUs this way:
▶ (cat <<EOF) | parallel --jobs 2
python script1.py
python script2.py
EOF
Note from the parallel man page:
--jobs N
Number of jobslots on each machine. Run up to N jobs in parallel.
0 means as many as possible. Default is 100% which will run one job per
CPU on each machine.
Note that the question has been updated to state that parallelisation does not improve calculation time, which is not generally a correct statement.
While the benefits of parallelisation are highly machine- and workload-dependent, parallelisation significantly improves the processing time of CPU-bound processes on multi-core computers.
Here is a demonstration based on calculating 50,000 digits of Pi using a spigot algorithm (code) on my quad-core MacBook Pro:
Single task (52s):
▶ time python3 spigot.py
...
python3 spigot.py 52.73s user 0.32s system 98% cpu 53.857 total
Running the same computation twice under GNU Parallel (74 s of CPU time, about 38 s of wall-clock time):
▶ (cat <<EOF) | time parallel --jobs 2
python3 spigot.py
python3 spigot.py
EOF
...
parallel --jobs 2 74.19s user 0.48s system 196% cpu 37.923 total
Of course this is on a system that is busy running an operating system and all my other apps, so it doesn't halve the processing time, but it is a big improvement all the same.
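If you want to see the same effect from inside Python rather than through GNU Parallel, here is a rough sketch of mine (not part of the original timings) that compares a CPU-bound function run serially and in two processes with multiprocessing:

import time
from multiprocessing import Pool

def burn(n):
    # CPU-bound busy work standing in for a long-running calculation.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [20_000_000, 20_000_000]

    start = time.time()
    for n in work:
        burn(n)
    print("serial: %.2fs" % (time.time() - start))

    start = time.time()
    with Pool(2) as pool:
        pool.map(burn, work)
    print("two processes: %.2fs" % (time.time() - start))

On a multi-core machine the second timing should come out noticeably lower, for the same reason as the spigot example above.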
See also this related Stack Overflow answer.
I use a batch file which contains these lines:
start python script1.py
start python script2.py
This opens a new window for each start statement.
A quite easy way to run parallel jobs of any kind is to use nohup. By default it redirects the output to a file called nohup.out. In your case you would just write:
nohup python script1.py > output_script1 &
nohup python script2.py > output_script2 &
That's it. With nohup you can also log out, and the scripts will keep running until they have finished.
I'm facing a problem in Python:
My script, at a certain point, has to run some test scripts written in bash, and I have to do it in parallel and wait until they end.
I've already tried :
os.system("./script.sh &")
inside a for loop, but it did not work.
Any suggestions?
Thank you!
Edit
I have not explained my situation correctly:
My Python script resides in the home directory;
my sh scripts reside in other directories, for instance /tests/folder1 and /tests/folder2;
Trying to use os.system means calling os.chdir before os.system (to avoid "no such file or directory" errors, since my .sh scripts contain some relative references), and this method also blocks my terminal output.
Trying to use Popen and passing the full path from the home folder to my .sh scripts leads to zombie processes that never respond.
Hope to find a solution,
Thank you guys!
Have you looked at subprocess? The convenience functions call and check_output block, but the default Popen object doesn't:
import subprocess

# Start both scripts without blocking; Popen returns immediately.
processes = []
processes.append(subprocess.Popen(['./script.sh']))
processes.append(subprocess.Popen(['./script2.sh']))
...
# Wait for all of them to finish and collect the exit codes.
return_codes = [p.wait() for p in processes]
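Since your scripts live in other directories and use relative paths, one option (my sketch, reusing the folder names from your edit) is to pass cwd= so each script runs from its own directory instead of calling os.chdir:

import subprocess

# Run each test script from its own directory so its relative references resolve;
# the folder and script names are taken from the question and are only examples.
processes = [
    subprocess.Popen(["./script.sh"], cwd="/tests/folder1"),
    subprocess.Popen(["./script.sh"], cwd="/tests/folder2"),
]
return_codes = [p.wait() for p in processes]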
Can you use GNU Parallel?
ls test_scripts*.sh | parallel
Or:
parallel ::: script1.sh script2.sh ... script100.sh
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
I am using a Python script to perform some calculations on my images and save the resulting array into a .png file. I deal with 3000 to 4000 images. To process all of them I use a shell script on Ubuntu. It gets the job done, but is there any way to make it faster? I have 4 cores in my machine. How can I use all of them? The script I am using is below:
#!/bin/bash
cd $1
for i in $(ls *.png)
do
    python ../tempcalc12.py $i
done
cd ..
tempcalc12.py is my python script
This question might be trivial. But I am really new to programming.
Thank you
xargs has a --max-procs (or -P) option which runs the jobs in parallel.
The following command does the job with a maximum of 4 processes:
ls *.png | xargs -n 1 -P 4 python ../tempcalc12.py
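As an alternative not covered by the answers, you could drive the same per-image script from a small Python wrapper instead of the shell; a sketch, assuming tempcalc12.py and the directory layout from the question:

import glob
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_one(png):
    # The heavy work happens in the child process, so threads are only used
    # here to cap how many copies of the script run at the same time.
    return subprocess.run(["python", "../tempcalc12.py", png]).returncode

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        return_codes = list(pool.map(run_one, glob.glob("*.png")))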
You can just add an & to the python line to have everything executed in parallel:
python ../tempcalc12.py $i &
This is a bad idea though, as having too many processes will just slow everything down.
What you can do is limit the number of concurrent jobs, like this:
MAX_THREADS=4
for i in $(ls *.png); do
    python ../tempcalc12.py $i &
    while [ $( jobs | wc -l ) -ge "$MAX_THREADS" ]; do
        sleep 0.1
    done
done
Every 100 ms, it checks the number of running jobs, and if that is below MAX_THREADS, it adds new jobs in the background.
This is a nice hack if you just want a quick working solution, but you might also want to investigate what GNU Parallel can do.
If you have GNU Parallel you can do:
parallel python ../tempcalc12.py ::: *.png
It will do The Right Thing by spawning one job per core, even if the names of your PNGs contain spaces, ', or ". It also makes sure the output from different jobs is not mixed together, so if you use the output you are guaranteed that you will not get half a line from two different jobs.
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
I have a question about the Python interpreter. How does it treat the same script running 100 times, for example with different sys.argv entries? Does it create a separate memory space for each run, or something different?
The system is Linux, CentOS 6.5. Is there any operational limit that can be observed and tuned?
You won't have any problem with what you're trying to do. You can run the same script in parallel many times, with different input arguments (sys.argv entries). Each run is a separate process, and a separate memory space is allocated for it.
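For instance, a minimal sketch (worker.py is a placeholder name) that launches the same script several times in parallel with different sys.argv values; each invocation is an independent interpreter process with its own memory:

import subprocess

# Each child is a separate interpreter process with its own memory space;
# a different argument is passed to each one (scale the range up to 100 for the question's case).
procs = [subprocess.Popen(["python", "worker.py", str(i)]) for i in range(4)]
for p in procs:
    p.wait()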