I would like to know the best practice for "returning" something from a Python script.
Here is my problem. I'm running a Python child script from a parent script using the subprocess.Popen method. I would like to get a tuple of two floats from the execution of the first script.
Now, the first method I have seen is by using sys.stdout and a pipe in the subprocess call, as follows:
child.py:
import sys

if __name__ == '__main__':
    myTuple = (x, y)  # x and y are the two floats computed earlier
    sys.stdout.write(str(myTuple[0]) + ":" + str(myTuple[1]))
    sys.stdout.flush()
parent.py:
import subprocess

p = subprocess.Popen(["python", "child.py"], stdout=subprocess.PIPE)
out, err = p.communicate()
Here it says that this is not recommended in most cases, but I don't know why...
The second way would be to write my tuple into a text file in Script1.py and open it in Script2.py. But I guess writing and reading a file takes a bit of time, so I don't know whether it is a better way to do it.
Finally, I could use cPickle to dump my tuple and load it from script2.py. I guess that would be a bit faster than using a text file, but would it be better than using sys.stdout?
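Something like this is roughly what I have in mind for the pickle version (the file name is just an example):
child.py:
import pickle  # cPickle on Python 2

myTuple = (1.0, 2.0)
with open('result.pkl', 'wb') as f:
    pickle.dump(myTuple, f)
parent.py (once the child process has finished):
import pickle

with open('result.pkl', 'rb') as f:
    x, y = pickle.load(f)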
What would be the proper way to do this?
---------------------------------------EDIT------------------------------------------------
I forgot to mention that I cannot use import, since parent.py actually generates child.py in a folder: I am doing some multiprocessing.
Parent.py creates, say, 10 directories and copies child.py into each of them. Then I run each child.py from parent.py on several processors, and I want parent.py to gather the results "returned" by all of the child.py scripts. So parent.py cannot import child.py, since it is not generated yet; or maybe I can do some sort of dynamic import? I don't know...
---------------------------------------EDIT2-----------------------------------------------
Another edit to answer a question about why I proceed this way. child.py actually calls IronPython and another script to run a .NET assembly. The reason why I HAVE to copy all the child.py files into specific folders is that this assembly generates a resource file which is then used by the assembly itself. If I don't copy child.py (and the assembly, by the way) into each subfolder, the resource files are copied at the root, which creates conflicts when I run several processes using the multiprocessing module. If you have any suggestions about this overall architecture, they are more than welcome :).
Thanks
Ordinarily, you should use import other_module and call its functions:
import other_module
x, y = other_module.some_function(param='z')
If you can run the script, you also can import it.
If you want to use subprocess.Popen(), then to pass a couple of floats you could use the json format: it is human-readable, exact (in this case), and machine-readable. For example:
child.py:
#!/usr/bin/env python
import json
import sys
numbers = 1.2345, 1e-20
json.dump(numbers, sys.stdout)
parent.py:
#!/usr/bin/env python
import json
import sys
from subprocess import check_output
output = check_output([sys.executable, 'child.py'])
x, y = json.loads(output.decode())
child.py actually calls IronPython and another script to run a .NET assembly. The reason why I HAVE to copy all the child.py files is that this assembly generates a resource file which is then used by the assembly itself. If I don't copy child.py into each subfolder, the resource files are copied at the root, which creates conflicts when I run several processes using the multiprocessing module. If you have any suggestions about this overall architecture, they are more than welcome :).
You can put the code from child.py into parent.py and call os.chdir() (after the fork) to execute each multiprocessing.Process in its own working directory, or use the cwd parameter (it sets the current working directory for the subprocess) if you run the assembly using the subprocess module:
#!/usr/bin/env python
import os
import shutil
import tempfile
from multiprocessing import Pool
def init(topdir='.'):
    dir = tempfile.mkdtemp(dir=topdir)  # parent is responsible for deleting it
    os.chdir(dir)

def child(n):
    return os.getcwd(), n*n

if __name__ == "__main__":
    pool = Pool(initializer=init)
    results = pool.map(child, [1, 2, 3])
    pool.close()
    pool.join()
    for dirname, _ in results:
        try:
            shutil.rmtree(dirname)
        except EnvironmentError:
            pass  # ignore errors
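If you keep launching child.py as a separate process instead, a rough sketch of the cwd variant could look like this (the child's name and arguments are placeholders for whatever actually drives the assembly):
#!/usr/bin/env python
# Each subprocess gets its own working directory, so resource files written to
# "the current directory" land there instead of at the root.
import os
import sys
import tempfile
from subprocess import check_output

def run_child(args):
    child = os.path.abspath('child.py')   # placeholder: the script that runs the assembly
    workdir = tempfile.mkdtemp()          # the parent is responsible for cleaning it up
    out = check_output([sys.executable, child] + [str(a) for a in args],
                       cwd=workdir)       # cwd= sets the subprocess's working directory
    return workdir, out

if __name__ == '__main__':
    print(run_child([1.0, 2.0]))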
Related
I want to call a Python program from the current program:
import sys

def multiply(a):
    return a*5

mul = sys.argv[1]
I saved this file as test.py. I'm calling it from the current file, but I want to run the code in parallel, like a multiprocessing queue, and it's not working.
What I have tried so far:
import os
import sys
import numpy as np
cwd = os.getcwd()
l = [20,5,12,24]
for i in np.arange(len(l)):
    os.system('python test.py multiply[i]')
I want to run the script for all the list items in parallel, like multiprocessing. How can I achieve that?
If you want to make your program work like that using os.system, you need to change the test.py file a bit:
import sys
def multiply(a):
    return a*5

n = int(sys.argv[1])
print(multiply(n))
Written like this, the code takes the second element of argv (the first one is just the name of the file) that it received when you executed it with os.system, converts it to an integer, and then calls your function (multiply by 5).
In these cases though, it's way better to just import this file as a module in your project.
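For instance, a minimal sketch of that import-based approach, run in parallel with a process pool (this assumes the argv handling in test.py is moved under an if __name__ == '__main__': guard so that importing it has no side effects):
from multiprocessing import Pool

from test import multiply  # test.py from above, with its command-line part guarded

if __name__ == '__main__':
    l = [20, 5, 12, 24]
    with Pool() as pool:
        results = pool.map(multiply, l)  # apply multiply() to the list items in parallel
    print(results)  # [100, 25, 60, 120]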
How to run another Python script in a different folder?
I have main program:
calculation_control.py
In the folder calculation_folder, there is calculation.py
How do I run calculation_folder/calculation.py from within calculation_control.py?
So far I have tried the following code:
calculation_file = folder_path + "calculation.py"
if not os.path.isfile(parser_file):
    continue
subprocess.Popen([sys.executable, parser_file])
There are more than a few ways. I'll list them in order of inverted preference (i.e., best first, worst last):
Treat it like a module: import file. This is good because it's secure, fast, and maintainable. Code gets reused as it's supposed to be done. Most Python libraries run using multiple methods stretched over lots of files. Highly recommended. Note that if your file is called file.py, your import should not include the .py extension at the end.
The infamous (and unsafe) exec command: execfile('file.py'). Insecure, hacky, usually the wrong answer. Avoid where possible.
Spawn a shell process: os.system('python file.py'). Use when desperate.
Source: How can I make one python file run another?
Solution
By default, Python won't look inside that subfolder for modules to import; it only searches the directories on sys.path. However, you can work around this by adding the following code snippet to calculation_control.py...
import sys
sys.path.insert(0, 'calculation_folder') # Note: if this relative path doesn't work or produces errors, try replacing it with an absolute path
import calculation
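If calculation.py is written as a plain script, the import above already runs its top-level code once. If it instead exposes a function (say main(), as a hypothetical name), the whole thing would be:
import sys

sys.path.insert(0, 'calculation_folder')  # or an absolute path
import calculation

calculation.main()  # hypothetical entry point; use whatever calculation.py actually defines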
As we all know, we need to protect the main() with if __name__ == '__main__' when running code with multiprocessing in Python.
I understand that this is necessary in some cases to give access to functions defined in the main, but I do not understand why this is necessary in this case:
file2.py
import numpy as np
from multiprocessing import Pool
class Something(object):
    def get_image(self):
        return np.random.rand(64,64)

    def mp(self):
        image = self.get_image()
        p = Pool(2)
        res1 = p.apply_async(np.sum, (image,))
        res2 = p.apply_async(np.mean, (image,))
        print(res1.get())
        print(res2.get())
        p.close()
        p.join()
main.py
from file2 import Something
s = Something()
s.mp()
All of the functions or imports necessary for Something to work are part of file2.py. Why does the subprocess need to re-run the main.py?
I think the __name__ solution is not very nice, as it prevents me from distributing the code of file2.py, since I can't make sure that users are protecting their main.
Isn't there a workaround for Windows?
How do packages solve that? (I have never encountered any problem from not protecting my main with any package - are they just not using multiprocessing?)
edit:
I know that this is because fork() is not implemented on Windows. I was just asking whether there is a hack to let the interpreter start at file2.py instead of main.py, as I can be sure that file2.py is self-sufficient.
When using the "spawn" start method, new processes are Python interpreters that are started from scratch. It's not possible for the new Python interpreters in the subprocesses to figure out what modules need to be imported, so they import the main module again, which in turn will import everything else. This means it must be possible to import the main module without any side effects.
If you are on a different platform than Windows, you can use the "fork" start method instead, and you won't have this problem.
That said, what's wrong with using if __name__ == "__main__":? It has a lot of additional benefits, e.g. documentation tools will be able to process your main module, unit testing is easier, etc., so you should use it in any case.
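For the code in the question, the guarded main.py would simply be:
from file2 import Something

if __name__ == '__main__':
    s = Something()
    s.mp()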
As others have mentioned, the "spawn" start method on Windows will re-import the code for each instance of the interpreter. This import will execute your code again in the child process (and this would make it create its own child, and so on).
A workaround is to pull the multiprocessing script into a separate file and then use subprocess to launch it from the main script.
I pass variables into the script by pickling them in a temporary directory, and I pass the temporary directory into the subprocess with argparse.
I then pickle the results into the temporary directory, where the main script retrieves them.
Here is an example file_hasher() function that I wrote:
main_program.py
import os, pickle, shutil, subprocess, sys, tempfile
def file_hasher(filenames):
    try:
        subprocess_directory = tempfile.mkdtemp()
        input_arguments_file = os.path.join(subprocess_directory, 'input_arguments.dat')
        with open(input_arguments_file, 'wb') as func_inputs:
            pickle.dump(filenames, func_inputs)
        current_path = os.path.dirname(os.path.realpath(__file__))
        file_hasher = os.path.join(current_path, 'file_hasher.py')
        python_interpreter = sys.executable
        proc = subprocess.call([python_interpreter, file_hasher, subprocess_directory],
                               timeout=60,
                               )
        output_file = os.path.join(subprocess_directory, 'function_outputs.dat')
        with open(output_file, 'rb') as func_outputs:
            hashlist = pickle.load(func_outputs)
    finally:
        shutil.rmtree(subprocess_directory)
    return hashlist
file_hasher.py
#! /usr/bin/env python
import argparse, hashlib, os, pickle
from multiprocessing import Pool
def file_hasher(input_file):
    with open(input_file, 'rb') as f:
        data = f.read()
        md5_hash = hashlib.md5(data)
        hashval = md5_hash.hexdigest()
        return hashval

if __name__ == '__main__':
    argument_parser = argparse.ArgumentParser()
    argument_parser.add_argument('subprocess_directory', type=str)
    subprocess_directory = argument_parser.parse_args().subprocess_directory
    arguments_file = os.path.join(subprocess_directory, 'input_arguments.dat')
    with open(arguments_file, 'rb') as func_inputs:
        filenames = pickle.load(func_inputs)
    hashlist = []
    p = Pool()
    for r in p.imap(file_hasher, filenames):
        hashlist.append(r)
    output_file = os.path.join(subprocess_directory, 'function_outputs.dat')
    with open(output_file, 'wb') as func_outputs:
        pickle.dump(hashlist, func_outputs)
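The main script then calls file_hasher() like any other function, e.g. (the file names are made up):
if __name__ == '__main__':
    hashes = file_hasher(['data1.bin', 'data2.bin'])
    print(hashes)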
There must be a better way...
The main module is imported (but with __name__ != '__main__' because Windows is trying to simulate forking-like behavior on a system that doesn't have forking). multiprocessing has no way to know that you didn't do anything important in your main module, so the import is done "just in case" to create an environment similar to the one in your main process. If it didn't do this, all sorts of stuff that happens by side-effect in main (e.g. imports, configuration calls with persistent side-effects, etc.) might not be properly performed in the child processes.
As such, if they're not protecting their __main__, the code is not multiprocessing safe (nor is it unittest safe, import safe, etc.). The if __name__ == '__main__': protective wrapper should be part of all correct main modules. Go ahead and distribute it, with a note about requiring multiprocessing-safe main module protection.
The if __name__ == '__main__' guard is needed on Windows since Windows doesn't have a "fork" option for processes.
On Linux, for example, you can fork the process, so the parent process will be copied and the copy becomes the child process (and it will have access to the already imported code you had loaded in the parent process).
Since you can't fork on Windows, Python simply imports, in the child process, all the code that was imported by the parent process. This creates a similar effect, but if you don't do the __name__ trick, this import will execute your code again in the child process (and this will make it create its own child, and so on).
So even in your example main.py will be imported again (since all the files are imported again). Python can't guess what specific Python script the child process should import.
FYI, there are other limitations you should be aware of, like using globals; you can read about them here: https://docs.python.org/2/library/multiprocessing.html#windows
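A quick way to see the re-import for yourself (the file name is made up): put a print at module level; under "spawn" it is printed once by the parent and once more by every worker process.
spawn_demo.py:
import multiprocessing as mp

print('importing', __name__)  # runs again in each child process under "spawn"

def work(x):
    return x * x

if __name__ == '__main__':
    with mp.Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))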
I have a python script, distributed across several files, all within a single directory. I would like to distribute this as a single executable file (for linux systems, mainly), such that the file can be moved around and copied easily.
I've come as far as renaming my main file to __main__.py, and zipping everything into myscript.zip, so now I can run python myscript.zip. But that's one step too short. I want to run this as ./myscript, without creating an alias or a wrapper script.
Is this possible at all? What's the best way? I thought that maybe the zip file could be embedded in a (ba)sh script that passes it to python (but without creating a temporary file, if possible).
EDIT:
After having another go at setuptools (I had not managed to make it work before), I could create a sort of self-contained script, an "eggsecutable script". The trick is that in this case you cannot have your main module named __main__.py. Then there are still a couple of issues: the resulting script cannot be renamed, and it still creates the __pycache__ directory when run. I solved these by modifying the shell-script code at the beginning of the file and adding the -B flag to the python command there.
EDIT (2): It is not that easy either; it was only working because I still had my source .py files next to the "eggsecutable". Move something and it stops working.
You can edit the raw zip file as a binary and insert the shebang into the first line.
#!/usr/bin/env python
PK...rest of the zip
Of course you need an appropriate editor for this, one that can handle binary files (e.g. vim -b), or you can do it with a small bash script:
{ echo '#!/usr/bin/env python'; cat myscript.zip; } > myscript
chmod +x myscript
./myscript
Firstly, there's the obligatory "this isn't the way things are normally done, are you sure you want to do it THIS way?" warning. That said, to answer your question and not try to substitute it with what someone else thinks you should do...
You can write a script and prepend it to a Python egg. The script would extract the egg from itself, and obviously call exit before the egg file data is encountered. Egg files are importable but not executable, so the script would have to:
Extract the egg from itself as an egg file with a known name in the current directory
Run python -m egg
Delete the file
Exit
Sorry, I'm on a phone at the moment, so I'll update with actual code later
Continuing my own attempts, I concocted something that works for me:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import glob, os.path
from setuptools import setup

files = [os.path.splitext(x)[0] for x in glob.glob('*.py')]
thisfile = os.path.splitext(os.path.basename(__file__))[0]
files.remove(thisfile)

setup(
    script_args=['bdist_wheel', '-d', '.'],
    py_modules=files,
)

# Now try to make the package runnable:
# Add the self-running trick code to the top of the file,
# rename it and make it executable
import sys, os, stat

exe_name = 'my_module'

magic_code = '''#!/bin/sh
name=`readlink -f "$0"`
exec {0} -c "import sys, os; sys.path.insert(0, '$name'); from my_module import main; sys.exit(main(my_name='$name'))" "$@"
'''.format(sys.executable)

wheel = glob.glob('*.whl')[0]
with open(exe_name, 'wb') as new:
    new.write(bytes(magic_code, 'ascii'))
    with open(wheel, 'rb') as original:
        data = True
        while data:
            data = original.read(4096)
            new.write(data)

os.remove(wheel)
st = os.stat(exe_name)
os.chmod(exe_name, st.st_mode | stat.S_IEXEC)
This creates a wheel with all *.py files in the current directory (except itself), and then adds the code to make it executable. exe_name is the final name of the file, and the from my_module import main; sys.exit(main(my_name='$name')) should be modified depending on each script, in my case I want to call the main method from my_module.py, which takes an argument my_name (the name of the actual file being run).
There's no guarantee this will run in a system different from the one it was created, but it is still useful to create a self-contained file from the sources (to be placed in ~/bin, for instance).
Another less hackish solution (I'm sorry for answering my own question twice, but this doesn't fit in a comment, and I think it is better in a separate box).
#!/usr/bin/env python3

modules = [
    ['my_aux', '''
def my_aux():
    return 7
'''],
    ['my_func', '''
from my_aux import my_aux

def my_func():
    print("and I'm my_func: {0}".format(my_aux()))
'''],
    ['my_script', '''
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from __future__ import (unicode_literals, division, absolute_import, print_function)

import sys

def main(my_name):
    import my_aux
    print("Hello, I'm my_script: {0}".format(my_name))
    print(my_aux.my_aux())

    import my_func
    my_func.my_func()

if (__name__ == '__main__'):
    sys.exit(main(__file__))
'''],
]

import sys, types

for m in modules:
    module = types.ModuleType(m[0])
    exec(m[1], module.__dict__)
    sys.modules[m[0]] = module
del modules

from my_script import main
main(__file__)
I think this is more clear, although probably less efficient. All the needed files are included as strings (they could be zipped and b64-encoded first, for space efficiency). Then they are imported as modules and the main method is run. Care should be taken to define the modules in the right order.
I have 3 Python files (first.py, second.py, third.py). I'm executing the 2nd Python file from the 1st Python file. The 2nd Python file uses the 'import' statement to make use of the 3rd Python file. This is what I'm doing.
This is my code.
first.py
import os
file_path = "folder\second.py"
os.system(file_path)
second.py
import third
...
(rest of the code)
third.py (which contains ReportLab code for generating a PDF)
....
canvas.drawImage('xyz.jpg',0.2*inch, 7.65*inch, width=w*scale, height=h*scale)
....
When I execute this code, it gives this error:
IOError: Cannot open resource "xyz.jpg"
But when I execute the second.py file directly by writing python second.py, everything works fine!
I even tried this code:
file_path = "folder\second.py"
execfile(file_path)
But it gives this error:
ImportError: No module named third
But as I stated, everything works fine if I directly execute the second.py file!
Why is this happening? Is there a better way of executing nested Python files like this?
Any ideas or suggestions would be greatly appreciated.
I used these three files just to give the basic idea of my structure. You can consider this flow of execution as a single process. There are many processes like this, and each file contains thousands of lines of code. That's why I can't modularize the whole codebase so that it can be used via import statements. :-(
So the question is how to make a single Python file that will take care of executing all the other processes. (If we execute each process individually, everything works fine.)
This should be easy if you do it the right way. There are a couple of steps you can follow to set it up.
Step 1: Set your files up to be run or imported
#!/usr/bin/env python
def main():
    do_stuff()

if __name__ == '__main__':
    main()
The __name__ special variable will contain __main__ when invoked as a script, and the module name if imported. You can use that to provide a file that can be used either way.
Step 2: Make your subdirectory a package
If you add an empty file called __init__.py to folder, it becomes a package that you can import.
Step 3: Import and run your scripts
from folder import first, second, third
first.main()
second.main()
third.main()
The way you are doing things is invalid.
You should create a main application and import 1, 2, and 3.
In 1, 2, 3: you should define things as functions, then call them from the main application.
IMHO: I don't think you have so much code that it needs to be split across separate files; you could also just put it into one file with function definitions and call them properly.
I second S.Lott: You really should rethink your design.
But just to provide an answer to your specific problem:
From what I can guess so far, you have second.py and third.py in folder, along with xyz.jpg. To make this work, you will have to change your working directory first. Try it in this way in first.py:
import os
....
os.chdir('folder')
execfile('second.py')
Try reading about the os module.
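Note that execfile() only exists in Python 2; on Python 3 the rough equivalent of the snippet above would be:
import os

os.chdir('folder')
with open('second.py') as f:
    exec(compile(f.read(), 'second.py', 'exec'))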
Future readers:
Pradyumna's answer from here solved Moin Ahmed's second issue for me:
import sys, change "sys.path" by appending the path during run time, then import the module that will help [i.e. sys.path.append(execfile's directory)]
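Concretely, for the layout in this question that would be something like the following sketch (execfile() is Python 2; see the note above for a Python 3 equivalent):
import os
import sys

script = os.path.join('folder', 'second.py')
sys.path.append(os.path.dirname(os.path.abspath(script)))  # so that `import third` resolves
execfile(script)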