How does os.system() share resources if I call python?

Let's say I have a Python script that reads and processes a CSV in which each line can be processed independently. Then let's say I have another Python script that calls the original script using os.system(), like so:
Script A:
import sys
with open(sys.argv[1], 'r') as f:
    for line in f:
        pass  # do some processing for each line
Script B:
import os
os.system('python Script_A.py somefile.csv')
How are computing resources shared between Script A and Script B? How does the resource allocation change if I call Script A as a subprocess of Script B instead of a system command? What happens if I multiprocess within either of those scenarios?
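For clarity, by calling Script A "as a subprocess" I mean something like this minimal sketch using the standard subprocess module (the script and file names are just the ones from the example above):
import subprocess

# Run Script A as a child process and wait for it to finish,
# similar to os.system() but with more control over I/O and the return code.
result = subprocess.run(['python', 'Script_A.py', 'somefile.csv'])
print(result.returncode)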
To complicate things further, how does the GIL come into play across these different instances of Python?
I am not looking for a library or a solution, but rather I'd like to understand, through the lens of Python, how resources would be allocated in these scenarios so that I can optimize my code for my processing use case.
Cheers!

Related

How can I run a Python file from another file, then have the new file restart the first one?

So far I don't think this is actually possible, but basically what I am trying to do is have one Python program call another and run it, similar to how you would use import.
But then I need to be able to go from the second file back to the beginning of the first.
Doing this with import doesn't work because the first program never closes and is still running, so calling it again only returns to where it left off when it ran the second file.
Without understanding a bit more about what you want to do, I would suggest looking into the threading or multiprocessing libraries. These should allow you to create multiple instances of a program or function.
This is vague and I'm not quite sure what you're trying to do, but you can also explore the Subprocess module for Python. It will allow you to spawn new processes similarly to if you were starting them from the command-line, and your processes will also be able to talk to the child processes via stdin and stdout.
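A minimal sketch of that idea, assuming the second program is a script named child.py that reads a line from stdin and prints a reply (both names are placeholders):
import subprocess

# Start the child script with its stdin/stdout connected to pipes.
p = subprocess.Popen(
    ['python', 'child.py'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# Send one message to the child and collect everything it prints back.
out, _ = p.communicate('hello from the parent\n')
print('child replied:', out)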
If you don't want to import any modules:
exec(open("file.py").read())
Otherwise:
import os
os.system('python file.py')
Or:
import subprocess
subprocess.call(['python', 'file.py'])

Import Python library in terminal

I need to run a Python script in a terminal, several times. This script requires me to import some libraries. So every time I call the script in the terminal, the libraries are loaded again, which results in a loss of time. Is there any way I can import the libraries once and for all at the beginning?
(If I try the "naive" way, calling first a script just to import libraries then running my code, it doesn't work).
EDIT: I need to run the script in a terminal because it actually serves another program developed in Java. The Java code calls the Python script in the terminal, reads its result and processes it, then calls it again.
One solution is to leave the Python script running permanently and use a pipe to communicate between processes, like the code below (taken from this answer).
import os, time

pipe_path = "/tmp/mypipe"
if not os.path.exists(pipe_path):
    os.mkfifo(pipe_path)

# Open the fifo. We need to open in non-blocking mode or it will stall until
# someone opens it for writing
pipe_fd = os.open(pipe_path, os.O_RDONLY | os.O_NONBLOCK)
with os.fdopen(pipe_fd) as pipe:
    while True:
        message = pipe.read()
        if message:
            print("Received: '%s'" % message)
        print("Doing other stuff")
        time.sleep(0.5)
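The other process (the Java program in your case, or any quick test) then only has to open the same FIFO for writing and send its message; in Python that side could be as simple as the sketch below (same /tmp/mypipe path as above):
pipe_path = "/tmp/mypipe"  # same path as in the reader script above

# Opening a FIFO for writing blocks until a reader has it open,
# so start the reader script above first.
with open(pipe_path, "w") as pipe:
    pipe.write("process this request\n")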
The libraries are unloaded once the script finishes, so the best way to handle this is to write the script so it can iterate as many times as you need, rather than running the whole script multiple times. I would likely use input() (or raw_input() if you're running Python 2) to read how many times you want to iterate, or use a library like click to create a command-line argument for it.
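A minimal sketch of that pattern, assuming the Java side writes one request per line to the script's stdin and reads one result line back per request (the uppercase step is just a placeholder for the real work):
import sys

# Import the expensive libraries once, up here, before the loop.

for line in sys.stdin:
    request = line.strip()
    if not request:
        continue
    result = request.upper()  # placeholder for the real processing
    print(result)
    sys.stdout.flush()  # make sure the Java side sees the reply immediately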

Running executables in parallel and consecutively using python

I have two folders. FolderA contains dozens of executables, FolderB contains an initial input file and a subsequent input file (both text files).
I would like to write a script that will do the following:
1. Create a folder for each of the executables
2. Copy the corresponding executable and a copy of the initial input file into this new folder
3. Run the executable on this input file
4. Once that process is complete, copy the subsequent input file and run the executable again
5. End when this second run is done
This could easily be a for loop, and I could accomplish it using the os package. However, I'd like to see if there is a way to run this process in parallel for all the executables, or for some strategic number of executables at each iteration.
I've never done parallel processing before and I also have no idea how it can be accomplished for such a two-step execution process. Any help would be most appreciated. Thanks.
You can easily use multiprocessing for that.
Write a function which runs the entire process for a given executable:
def foo(exe_path):
    ...  # do stuff
Then feed it into map:
import os
import multiprocessing

pool = multiprocessing.Pool(os.cpu_count() - 1)
pool.map(foo, list_of_paths)
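For the two-step part of the question, a fuller sketch might look like the following. This is only a sketch under assumptions: the input files are assumed to be named initial_input.txt and subsequent_input.txt, each executable is assumed to take its input file as a command-line argument, and the working folders are created under a runs/ directory.
import os
import shutil
import subprocess
import multiprocessing

FOLDER_A = 'FolderA'  # contains the executables
FOLDER_B = 'FolderB'  # contains the two input files (names assumed below)

def run_one(exe_name):
    """Set up a working folder for one executable and run it on both inputs."""
    workdir = os.path.join('runs', exe_name)
    os.makedirs(workdir, exist_ok=True)

    # Copy the executable into its own folder and remember its absolute path.
    exe_path = os.path.abspath(shutil.copy(os.path.join(FOLDER_A, exe_name), workdir))

    # First run: initial input file.
    shutil.copy(os.path.join(FOLDER_B, 'initial_input.txt'), workdir)
    subprocess.call([exe_path, 'initial_input.txt'], cwd=workdir)

    # Second run: starts only after the first call has returned.
    shutil.copy(os.path.join(FOLDER_B, 'subsequent_input.txt'), workdir)
    subprocess.call([exe_path, 'subsequent_input.txt'], cwd=workdir)

if __name__ == '__main__':
    executables = os.listdir(FOLDER_A)
    pool = multiprocessing.Pool(os.cpu_count() - 1)
    pool.map(run_one, executables)
    pool.close()
    pool.join()
Because pool.map hands each executable to its own worker process, the two runs for a given executable stay consecutive while different executables proceed in parallel.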

Python multiprocessing from Abaqus/CAE

I am using a commercial application called Abaqus/CAE [1] with a built-in Python 2.6 interpreter and API. I've developed a long-running script that I'm attempting to split into simultaneous, independent tasks using Python's multiprocessing module. However, once spawned, the processes just hang.
The script itself uses various objects/methods available only through Abaqus's proprietary cae module, which can only be loaded by starting up the Python bundled with Abaqus/CAE first, which then executes my script with Python's execfile.
To try to get multiprocessing working, I've attempted to run a script that avoids accessing any Abaqus objects, and instead just performs a calculation and prints the result to a file [2]. This way, I can run the same script from the regular system Python installation as well as from the Python bundled with Abaqus.
The example code below works as expected when run from the command line using either of the following:
C:\some\path>python multi.py # <-- Using system Python
C:\some\path>abaqus python multi.py # <-- Using Python bundled with Abaqus
This spawns the new processes, and each runs the function and writes the result to file as expected. However, when called from the Abaqus/CAE Python environment using:
abaqus cae noGUI=multi.py
Abaqus will then start up, automatically import its own proprietary modules, and then execute my file using:
execfile("multi.py", __main__.__dict__)
where the global namespace arg __main__.__dict__ is set up by Abaqus. Abaqus then checks out licenses for each process successfully, spawns the new processes, and ... and that's it. The processes are created, but they all hang and do nothing. There are no error messages.
What might be causing the hang-up, and how can I fix it? Is there an environment variable that must be set? Are there other commercial systems that use a similar procedure that I can learn from/emulate?
Note that any solution must be available in the Python 2.6 standard library.
System details: Windows 10 64-bit, Python 2.6, Abaqus/CAE 6.12 or 6.14
Example Test Script:
# multi.py
import multiprocessing
import time

def fib(n):
    a, b = 0, 1
    for i in range(n):
        a, b = a + b, a
    return a

def workerfunc(num):
    fname = ''.join(('worker_', str(num), '.txt'))
    with open(fname, 'w') as f:
        f.write('Starting Worker {0}\n'.format(num))
        count = 0
        while count < 1000:  # <-- Repeat a bunch of times.
            count += 1
            a = fib(20)
            line = ''.join((str(a), '\n'))
            f.write(line)
        f.write('End Worker {0}\n'.format(num))

if __name__ == '__main__':
    jobs = []
    for i in range(2):  # <-- Setting the number of processes manually
        p = multiprocessing.Process(target=workerfunc, args=(i,))
        jobs.append(p)
        print 'starting', p
        p.start()
        print 'done starting', p
    for j in jobs:
        print 'joining', j
        j.join()
        print 'done joining', j
[1] A widely known finite element analysis package.
[2] The script is a blend of a fairly standard Python function for fib() and examples from PyMOTW.
I have to write an answer as I cannot comment yet.
What I can imagine as a reason is that Python's multiprocessing spawns a whole new process with its own non-shared memory. So if you create an object in your script and then start a new process, that new process contains a copy of the memory, and you have two objects that can go in different directions. When something of Abaqus is present in the original Python process (which I suspect), that gets copied too, and this copy could cause such behaviour.
As a solution, I think you could extend Python with C (which is capable of using multiple cores in a single process) and use threads there.
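A tiny standard-library illustration of that non-shared-memory point (nothing Abaqus-specific):
import multiprocessing

data = ['created in the parent']

def child():
    # This appends to the child's private copy of `data`, not the parent's list.
    data.append('added in the child')
    print('child sees:', data)

if __name__ == '__main__':
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()
    print('parent still sees:', data)  # the child's append is not visible here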
Just wanted to say that I have run into this exact issue. My solution at the current time is to compartmentalize my scripting. This may work for you if you're trying to run parameter sweeps over a given model, or run geometric variations on the same model, etc.
I first generate scripts to accomplish each portion of my modelling process:
1. Generate the input file using CAE/Python.
2. Extract the data that I want and put it in a text file.
With these created, I use text replacement to quickly generate N python scripts of each type, one for each discrete parameter set I'm interested in.
I then wrote a parallel processing tool in Python to call multiple Abaqus instances as subprocesses. This does the following:
1. Call CAE through subprocess.call for each model-generation script. The tool allows you to choose how many instances to run at once, to keep you from taking every license on the server.
2. Execute the Abaqus solver for the generated models in the same way, with parameters for cores per job and the total number of cores used.
3. Extract data using the same process as in step 1.
There is some overhead in repeatedly checking out licenses for CAE when generating the models, but in my testing it is far outweighed by the benefit of being able to generate 10+ input files simultaneously.
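A minimal sketch of that throttled-subprocess approach (the generated script names and the concurrency limit are placeholders; the abaqus cae noGUI=... invocation is the one already shown in the question):
import subprocess
from multiprocessing.pool import ThreadPool

# One generated CAE/Python script per parameter set (placeholder names).
scripts = ['model_01.py', 'model_02.py', 'model_03.py']

MAX_CONCURRENT = 2  # keeps you from taking every license on the server

def run_cae(script):
    # Each call starts its own Abaqus/CAE instance as a separate process.
    return subprocess.call('abaqus cae noGUI=' + script, shell=True)

pool = ThreadPool(MAX_CONCURRENT)
return_codes = pool.map(run_cae, scripts)
pool.close()
pool.join()
print(return_codes)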
I can put some of the scripts up on Github if you think the process outlined above would be helpful for your application.
Cheers,
Nathan

Control executed program with Python

I want to stop a test run executed via bash if the test takes too much time. So far, I have found some good solutions here. But since the kill command does not work properly (when I use it correctly, it says it is not used correctly), I decided to solve this problem using Python. This is the execution call I want to monitor:
EXE="C:/program.exe"
FILE="file.tpt"
HOME_DIR="C:/Home"
"$EXE" -vm-Xmx4096M --run build "$HOME_DIR/test/$FILE" "Auslieferung (ML) Execute"
(The opened *.exe starts a test run which includes some Simulink simulation runs. Sometimes there are Simulink errors; in this case, the tests take too long and I want to restart the entire process.)
First, I came up with the idea of calling a shell script containing these lines as a subprocess from Python:
import subprocess
import time
process = subprocess.Popen('subprocess.sh', shell=True)
time.sleep(10)
process.terminate()
But when I use this, *.terminate() or *.kill() does not close the program I started with the subprocess call.
That's why I am now trying to implement the entire call in Python. I got the following so far:
import subprocess
file = "somePath/file.tpt"
p = subprocess.Popen(["C:/program.exe", file])
Now I need to know how to pass the second part, "Auslieferung (ML) Execute", from the bash call. This part starts an internal test run named "Auslieferung (ML) Execute". Any ideas? Or is it better to choose one of the other ways? Or can I get the "kill" option for bash working somewhere, somehow?
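One way this could look, assuming the extra strings from the bash line are simply further command-line arguments to program.exe (how the program interprets them is an assumption) and using subprocess's timeout support from Python 3.3+:
import subprocess

exe = "C:/program.exe"
tpt_file = "C:/Home/test/file.tpt"

# Pass every token of the original bash call as its own argument.
cmd = [exe, "-vm-Xmx4096M", "--run", "build", tpt_file, "Auslieferung (ML) Execute"]

p = subprocess.Popen(cmd)
try:
    p.wait(timeout=600)        # give the test run up to 10 minutes (placeholder)
except subprocess.TimeoutExpired:
    p.kill()                   # the run took too long: kill it ...
    p.wait()
    # ... and restart the whole process here if desired
Because there is no shell=True here, p.kill() acts on program.exe itself rather than on an intermediate shell, which may also be why the earlier terminate() attempt did not close the program.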
