I'm working on a web application (using Django) that uses another piece of software to do some processing. This software requires its working directory to be set through an environment variable. When a client makes a request, the app creates the working directory (and the data the external software will use), then sets the environment variable the external software reads to that directory, and finally calls the external software and gets the result.
Here's a summary of what the app is doing:
def request(data):
    path = create_working_directory(data)
    os.environ['WORKING_DIRECTORY'] = path
    result = call_the_external_software()
I haven't tested this yet (in reality it's not as simple as in this example). I'm thinking of executing this function in a new process. Will I have problems when multiple clients make simultaneous requests? If so, what should I do to fix them?
PS: I can't change anything in the external program.
See https://docs.python.org/2/library/subprocess.html#subprocess.Popen. Note that Popen takes an env argument that you can use to define the environment variables for the child call.
import subprocess

def request(data):
    path = create_working_directory(data)
    env = {"WORKING_DIRECTORY": path}
    result = subprocess.call([ext_script] + ext_args, env=env)
    return result  # presumably
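If the external script also needs the rest of the normal environment (PATH and so on), a common variant, sketched below reusing the names from the question, is to copy the parent environment and override only the one variable. Because the copy is built per call, simultaneous requests won't stomp on each other the way a shared os.environ would:
import os
import subprocess

def request(data):
    path = create_working_directory(data)
    env = dict(os.environ)              # copy the parent environment
    env["WORKING_DIRECTORY"] = path     # override only what the external software needs
    # Each call builds its own env dict, so concurrent requests don't interfere.
    return subprocess.call([ext_script] + ext_args, env=env)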
I'd like to avoid using spark-submit and instead start my PySpark code with python driver_file.py
We have some proxy settings we set up using spark.driver.extraJavaOptions with spark-submit or spark-defaults config file. I would instead like to set this option inside my Python code so I can run it with python driver_file.py
For some reason, though, when I try to do so with the following code, I cannot access the resource I need. Setting the same option in spark-defaults works. What am I doing wrong?
sconf = SparkConf().set("spark.serializer","org.apache.spark.serializer.KryoSerializer").set('spark.driver.extraJavaOptions', 'proxy_stuffness')
sconf.setAppName("something")
sc = SparkContext(conf = sconf)
print 'Config: ', sc.getConf().getAll()
The issue is that some settings cannot be set at runtime in your driver. This depends somewhat on what type of environment/cluster you are running Spark in and how you submit the application to it. I believe the Java options can only be set through spark-defaults.conf or on the command-line call to spark-submit.
From the docs:
Spark properties mainly can be divided into two kinds: one is related to deploy, like “spark.driver.memory”, “spark.executor.instances”, this kind of properties may not be affected when setting programmatically through SparkConf in runtime, or the behavior is depending on which cluster manager and deploy mode you choose, so it would be suggested to set through configuration file or spark-submit command line options; another is mainly related to Spark runtime control, like “spark.task.maxFailures”, this kind of properties can be set in either way.
https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties
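One workaround that is sometimes suggested when you really want to launch the driver with python driver_file.py is to set PYSPARK_SUBMIT_ARGS before the SparkContext is created, so the options reach the JVM at launch time. Treat the following as a sketch: the exact behavior is version dependent, and the proxy options shown are placeholders.
import os
from pyspark import SparkConf, SparkContext

# Placeholder proxy options; substitute your real host/port.
proxy_opts = "-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080"

# PYSPARK_SUBMIT_ARGS is read when the JVM gateway is launched from a plain
# Python driver; recent Spark versions expect it to end with "pyspark-shell".
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--driver-java-options '%s' pyspark-shell" % proxy_opts
)

sconf = SparkConf().set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sconf.setAppName("something")
sc = SparkContext(conf=sconf)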
I am working on migrating an existing Python GAE (Google App Engine) standard environment app to the flexible environment. I read through the guide and decided to try out the python-compat runtime, as it's always good to re-use as much code as possible.
In the standard environment app, we use background_thread.start_new_background_thread() to spawn a bunch of infinite-loop threads that work on background tasks forever. However, I couldn't get start_new_background_thread working in the flexible environment, even for a really simple app like this sample app:
github.com/GoogleCloudPlatform/python-docs-samples/tree/master/appengine/background
I keep getting the following error while running the app in the cloud (it works fine locally, though).
I debugged it using the cloud debugger, but there was no error message available at all when the exception was raised in background_thread.py.
Any idea how I can run a long-lived background thread in the flexible environment with the python-compat runtime? Thanks!
One of the differences between App Engine standard and App Engine flexible is that with Flex we're really just running a docker container. I can think of 2 approaches to try out.
1. Just use Python multiprocessing
App Engine standard enforces a sandbox that mostly means no direct use of threads or processes. With Flex, you should be able to just use the standard Python library for starting a new subprocess:
https://docs.python.org/3/library/subprocess.html
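For example, something along these lines (a minimal sketch where background_worker stands in for one of your own infinite-loop jobs, using the multiprocessing module) should be possible on Flex, since there is no sandbox preventing new processes:
import multiprocessing
import time

def background_worker():
    # Placeholder for one of your infinite-loop background jobs.
    while True:
        # ... do some background work ...
        time.sleep(10)

# Start the worker as a separate process when the app boots.
worker = multiprocessing.Process(target=background_worker)
worker.daemon = True
worker.start()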
2. Use supervisord and docker
If that doesn't work, another approach you could take here is to customize the Docker image you're using in Flex and use supervisord to start multiple processes. First, generate the Dockerfile by cd-ing into the folder with your sources and running:
gcloud preview app gen-config --custom
This will create a Dockerfile that you can customize. Now, you are going to want to start 2 processes - the process we were starting (I think for python-compat it's gunicorn) and your background process. The easiest way to do that with docker is to use supervisord:
https://docs.docker.com/engine/admin/using_supervisord/
After modifying your Dockerfile and adding a supervisord.conf, you can just deploy your app as you normally would with gcloud preview app deploy.
Hope this helps!
I wish the documentation said that background_thread was not a supported API.
Anyway, I've found some hacks to help with some thread incompatibilities. App Engine uses os.environ to read a lot of settings. The "real" threads in your application will have a bunch of environment variables set there; the background threads you start will have none. One hack I've used is to copy some of those environment variables over. For example, I needed to set the SERVER_SOFTWARE variable in the background threads in order to get the App Engine cloud storage library to work. We use something like:
import os
import threading

_global_server_software = None
_SERVER_SOFTWARE = 'SERVER_SOFTWARE'

def environ_wrapper(function, args):
    if _global_server_software is not None:
        os.environ[_SERVER_SOFTWARE] = _global_server_software
    function(*args)

def start_thread_with_app_engine_environ(function, *args):
    # HACK: Required for the cloudstorage API to work on Flexible environment threads.
    # App Engine relies on a lot of environment variables to work correctly. New threads get none
    # of those variables. cloudstorage uses SERVER_SOFTWARE to determine if it is a test instance.
    global _global_server_software
    if _global_server_software is None and os.environ.get(_SERVER_SOFTWARE) is not None:
        _global_server_software = os.environ[_SERVER_SOFTWARE]

    t = threading.Thread(target=environ_wrapper, args=(function, args))
    t.start()
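Usage is then the same as with start_new_background_thread; for example (my_worker is just a placeholder for your own function):
import time

def my_worker(interval):
    # Placeholder for one of the infinite-loop background jobs.
    while True:
        # ... do some background work ...
        time.sleep(interval)

start_thread_with_app_engine_environ(my_worker, 30)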
I've been digging into the world of Python and GUI applications and have made some considerable progress. However, I'd like some advice on how to proceed with the following:
I've created a GUI application using Python (2.6.6; cannot upgrade the system due to it being legacy) and GTK that displays several buttons, e.g. app1, app2, app3
When I click on a button, it then runs a bash shell script. This script will set up some required environment variables and then execute another external application (that uses these env variables)
Example:
1) user clicks on button app1
2) GUI then launches app1.sh to set up environment variables
3) GUI then runs external_app1
# external_app1 is an example application
# that requires some environment
# variables to be set before it can launch
Example app1.sh contents:
#!/bin/bash
export DIR=/some/location/
export LICENSE=/some/license/
export SOMEVAR='some value'
NOTE: Due to the way the environment is configured, it has to launch shell scripts first to set up the environment etc., and then launch the external applications. The shell scripts will be locked down so they cannot be edited by anyone once I've tested them.
So I've thought about how to have the python GUI execute this and so far, I am doing the following:
When user clicks on app1, check if app1.sh is executable/readable, if not, return error
Create another helper script, say helper1.sh, that will source app1.sh and then run the external_app1 command, and have Python execute that helper1.sh script via the below:
subprocess.Popen('./helper1.sh', shell=True, stdout=out, stderr=subprocess.PIPE, close_fds=True)
Example helper1.sh contents:
#!/usr/bin/env bash
source app1.sh # sets up env variables
if [ $? = 0 ]; then
    external_app1 &    # Runs the actual application in the background
else
    echo "Error executing app1.sh" >&2
fi
This is done so that the helper script executes in its own subshell and so that I can run multiple environment setup / external applications (app2, app3 etc).
So I ask:
Is there a better, perhaps more Pythonic, way of doing this? Can someone point me in the right direction?
And when it comes to logging and error handling, how do I effectively capture stderr or stdout from the helper scripts (e.g. helper1.sh) without blocking/freezing the GUI? Using threads or queues?
Thank you.
As I understand your question, you're trying to execute an external command with one of n sets of environment variables, where n is specified by the user. The fact that it's a GUI application doesn't seem relevant to the problem. Please correct me if I'm missing something.
You have several choices:
Execute a command in Python with custom environment variables
Rather than store the environment variables in separate files, you can set them directly with the env argument to Popen():
If env is not None, it must be a mapping that defines the environment variables for the new process; these are used instead of inheriting the current process’ environment, which is the default behavior.
So instead of having app1.sh, app2.sh, app3.sh, and so on, store your environment variable sets in Python, and add them to the environment you pass to Popen(), like so:
env_vars = {
    1: {
        'DIR': '/some/location/',
        'LICENSE': '/some/license/',
        'SOMEVAR': 'some value'
    },
    2: ...
}
...
environ_copy = dict(os.environ)
environ_copy.update(env_vars[n])
subprocess.Popen('external_application', shell=True, env=environ_copy, ...)
Modify the environment with a wrapper script
If your environment vars must live in separate, dedicated shell scripts, then something like your helper is the best you can do.
We can clean it up a little, though:
#!/usr/bin/env bash
if source "$1"; then # Set up env variables
external_app # Runs the actual application
else
echo "Error executing $1" 2>/dev/stderr
exit 1 # Return a non-zero exit status
fi
This lets you pass app1.sh to the script, rather than creating n separate helper files. It's not clear why you're using & to background the process - Popen starts a separate process, which doesn't block the Python process from continuing. With subprocess.PIPE you can use Popen.communicate() to get back the process' stdout and stderr.
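For the logging question, one common pattern, sketched below without tying it to a particular GUI toolkit, is to run communicate() in a worker thread and hand the output back through a queue that the GUI polls with a timer:
import subprocess
import threading
import Queue  # 'queue' on Python 3

output_queue = Queue.Queue()

def run_helper(cmd, env=None):
    # Runs in a worker thread so the GUI main loop stays responsive.
    proc = subprocess.Popen(cmd, env=env,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    output_queue.put((proc.returncode, out, err))

t = threading.Thread(target=run_helper, args=(['./helper1.sh'],))
t.daemon = True
t.start()

# In the GUI, poll output_queue periodically (e.g. with a toolkit timer)
# and fetch results with output_queue.get_nowait().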
Avoid setting environment variables at all
If you have control of external_process (i.e. you wrote it, and can modify it), you'd be much better off changing it to use command line arguments, rather than environment variables. That way you could call:
subprocess.Popen(['external_command', '/some/location/', '/some/license/', 'some value'])
and avoid needing shell=True or a wrapper script entirely. If external_command expects a number of variables, it might be better to use --flags (e.g. --dir /some/location/) rather than positional arguments. Most programming languages have an argument-processing library (or several) to make this easy; Python provides argparse for this purpose.
Using command line arguments rather than environment variables will make external_process much more user friendly, especially for the use case you're describing. This is what I would suggest doing.
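For illustration, if external_process could be modified, a flag-based interface might look something like this sketch (the flag names are made up for the example):
import argparse

def main():
    parser = argparse.ArgumentParser(description="Example external command")
    parser.add_argument("--dir", required=True, help="working location")
    parser.add_argument("--license", required=True, help="license file path")
    parser.add_argument("--somevar", default="some value")
    args = parser.parse_args()
    # ... use args.dir, args.license and args.somevar instead of os.environ ...

if __name__ == "__main__":
    main()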
I'm trying to use setfsuid() with python 2.5.4 and RHEL 5.4.
Since it's not included in the os module, I wrapped it in a C module of my own and installed it as a python extension module using distutils.
However when I try to use it I don't get the expected result.
setfsuid() returns a value indicating success (changing from a superuser), but I can't access files to which only the newly set user should have access (using open()), indicating that the fsuid was not truly changed.
I tried to verify that setfsuid() worked by running it twice consecutively with the same user input.
The result was as if nothing had changed: on every call the returned value was the old user id, different from the new one. I also called getpid() from the module and from the Python script; both returned the same id, so this is not the problem.
Just in case it's significant, I should note that I'm doing all of this from within an Apache daemon process (WSGI).
Can anyone provide an explanation for this?
Thank you
The ability to change the fsuid is limited to root, or to non-root processes with the CAP_SETUID capability. These days it's usually considered bad practice to run a web server with root permissions, so most likely you'll need to set that capability on the server (see man capabilities for details). Please note that doing this could severely affect your overall system's security. Before mucking with the security of a high-profile target like Apache, I'd recommend considering spawning a small backend process that runs as root and converses with your WSGI app via a local UNIX socket.
I'm working on a grid system which has a number of very powerful computers. These can be used to execute python functions very quickly. My users have a number of python functions which take a long time to calculate on workstations, ideally they would like to be able to call some functions on a remote powerful server, but have it appear to be running locally.
Python has an old function called "apply"; it's mostly useless these days now that Python supports the extended call syntax (e.g. **arguments). However, I need to implement something that works a bit like this:
rapply = Rapply( server_hostname ) # Set up a connection
result = rapply( fn, args, kwargs ) # Remotely call the function
assert result == fn( *args, **kwargs ) #Just as a test, verify that it has the expected value.
Rapply should be a class which can be used to remotely execute some arbitrary code (fn could be literally anything) on a remote server. It will send back the result which the rapply function will return. The "result" should have the same value as if I had called the function locally.
Now let's suppose that fn is a user-provided function. I need some way of sending it over the wire to the execution server. If I could guarantee that fn was always something simple, it could just be a string containing Python source code... but what if it were not so simple?
What if fn has local dependencies? It could be a simple function which uses a class defined in a different module. Is there a way of encapsulating fn and everything that fn requires which is not in the standard library? An ideal solution would not require the users of this system to have much knowledge of Python development. They simply want to write their function and call it.
Just to clarify, I'm not interested in discussing what kind of network protocol might be used to implement the communication between the client & server. My problem is how to encapsulate a function and its dependencies as a single object which can be serialized and remotely executed.
I'm also not interested in the security implications of running arbitrary code on remote servers - let's just say that this system is intended purely for research and it is within a heavily firewalled environment.
Take a look at Pyro (Python Remote Objects). It has the ability to set up services on all the computers in your cluster and invoke them directly, or indirectly through a name server and a publish-subscribe mechanism.
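As a rough sketch of that idea (using the Pyro4 API, which postdates this question, so treat the details as an assumption), you would expose a service on each powerful machine and call it from the workstation:
# server.py, run on the powerful machine
import Pyro4

@Pyro4.expose
class Worker(object):
    def heavy_computation(self, x):
        # Placeholder for one of the users' long-running functions.
        return x * x

daemon = Pyro4.Daemon(host="0.0.0.0", port=9999)
uri = daemon.register(Worker(), "example.worker")
print(uri)
daemon.requestLoop()

# client.py, run on the workstation:
#   import Pyro4
#   worker = Pyro4.Proxy("PYRO:example.worker@server-hostname:9999")
#   print(worker.heavy_computation(42))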
It sounds like you want to do the following.
Define a shared filesystem space.
Put ALL your python source in this shared filesystem space.
Define simple agents or servers that will "execfile" a block of code.
Your client then contacts the agent (a REST protocol with POST methods works well for this) with the block of code.
The agent saves the block of code and does an execfile on that block of code.
Since all agents share a common filesystem, they all have the same Python library structure.
We do this with a simple WSGI application we call the "batch server". We have a RESTful protocol for creating and checking on remote requests.
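A stripped-down version of such an agent might look like the sketch below (Python 2, since execfile is mentioned; the paths and names are illustrative only, not the original "batch server"):
# agent.py: a minimal WSGI agent sketch that execfiles posted code.
def application(environ, start_response):
    if environ["REQUEST_METHOD"] == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        code = environ["wsgi.input"].read(length)
        path = "/shared/jobs/job.py"          # lives on the shared filesystem
        with open(path, "w") as f:
            f.write(code)
        namespace = {}
        execfile(path, namespace)             # run the posted block of code
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [str(namespace.get("result", ""))]
    start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
    return ["POST a block of Python code\n"]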
Stackless Python had the ability to pickle and unpickle running code, but unfortunately the current implementation doesn't support this feature.
You could use a ready-made clustering solution like Parallel Python. You can relatively easily set up multiple remote slaves and run arbitrary code on them.
You could use an SSH connection to the remote PC and run the commands on the other machine directly. You could even copy the Python code to the machine and execute it.
Syntax:
cat ./test.py | sshpass -p 'password' ssh user@remote-ip "python - script-arguments-if-any for test.py script"
1) here "test.py" is the local python script.
2) sshpass used to pass the ssh password to ssh connection