Execute gcloud commands with Python subprocess in an Airflow task

I want to build Airflow tasks that use multiple gcloud commands.
A simple example:

def worker(**kwargs):
    exe = subprocess.run(["gcloud", "compute", "instances", "list"],
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(exe.returncode)
    for line in exe.stdout.splitlines():
        print(line.decode())

    exe = subprocess.run(["gcloud", "compute", "ssh", "user@host", "--command=pwd"],
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(exe.returncode)
    for line in exe.stdout.splitlines():
        print(line.decode())

dag = DAG("TEST", default_args=default_args, schedule_interval=None)
worker_task = PythonOperator(task_id='sample-task', python_callable=worker, provide_context=True, dag=dag)
worker_task
I get this error:
ERROR: gcloud crashed (AttributeError): 'NoneType' object has no attribute 'isatty'
Outside of Airflow, these commands work fine.
I've already tried disabling gcloud interactive mode with "--quiet", but that doesn't help.
I don't want to use Airflow's GcloudOperator, because these commands must be integrated into a custom operator.
Thank you in advance for your help.

As I see it, your two commands are independent, so you can run them as two separate tasks using the BashOperator, and if you want to access the output of the commands, the output of each one will be available as an XCom; you can read it using ti.xcom_pull(task_ids='<the task id>').
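A minimal sketch of that approach, assuming Airflow 1.10-style imports to match the question; the task ids, the user@host placeholder and the downstream reader task are illustrative, and default_args is assumed to be defined as in the question. Note that the BashOperator pushes only the last line of stdout as the XCom value.

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

dag = DAG("TEST", default_args=default_args, schedule_interval=None)

# One BashOperator per gcloud command; the last line of stdout is pushed as an XCom.
list_instances = BashOperator(
    task_id='list_instances',
    bash_command='gcloud compute instances list',
    xcom_push=True,
    dag=dag,
)

remote_pwd = BashOperator(
    task_id='remote_pwd',
    bash_command='gcloud compute ssh user@host --command=pwd',
    xcom_push=True,
    dag=dag,
)

def read_outputs(**kwargs):
    ti = kwargs['ti']
    # Pull whatever each bash task pushed (the last line of its stdout).
    print(ti.xcom_pull(task_ids='list_instances'))
    print(ti.xcom_pull(task_ids='remote_pwd'))

reader = PythonOperator(
    task_id='read_outputs',
    python_callable=read_outputs,
    provide_context=True,
    dag=dag,
)

[list_instances, remote_pwd] >> reader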

Maybe use BashOperator?
worker_task = BashOperator(task_id="sample-task", bash_command='gcloud compute instances list', dag=dag)

Related

subprocess doesn't write the output file

I'm working with Python in a Jupyter notebook.
I want to execute the following command:
$ gdalbuildvrt tmp_merge files
with tmp_merge being the output file of the function, set to /home/prambaud/gfc_results/test/tmp_tile.vrt,
and files being all the tiles to merge into the vrt file, set to /home/prambaud/gfc_results/test/tile_*.tif.
This function allows the use of wildcards.
To run it in my Jupyter notebook I use the subprocess module:
command = [
    'gdalbuildvrt',
    '/home/prambaud/gfc_results/test/tmp_tile.vrt',
    '/home/prambaud/gfc_results/test/tile_*.tif'
]
process = subprocess.run(
    command,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
    # cwd=os.path.expanduser('~')
)
print(process.stdout)
As a result I obtain the following:
0...10...20...30...40...50...60...70...80...90...100 - done.
with no error messages, BUT the output file is not created. Does anyone know what could prevent the subprocess.run function from creating and writing to a file?
PS:
I've also tried to run the command from the Jupyter notebook with ! and the same parameters, and the tmp file was of course created...
My command needs the shell to expand the * wildcard, so with subprocess I have to add the shell keyword and pass the command as a single string (with shell=True, a list would make the shell run only the first element as the command); an alternative without the shell is sketched after the snippet:

process = subprocess.run(
    ' '.join(command),
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
    shell=True
)
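An alternative sketch that avoids shell=True entirely: expand the wildcard in Python with the glob module and pass the resulting file list to gdalbuildvrt directly (paths taken from the question):

import glob
import subprocess

# Expand the wildcard in Python instead of relying on the shell.
tiles = sorted(glob.glob('/home/prambaud/gfc_results/test/tile_*.tif'))

process = subprocess.run(
    ['gdalbuildvrt', '/home/prambaud/gfc_results/test/tmp_tile.vrt'] + tiles,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
print(process.stdout)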

How can I run a nohup command using Airflow's ssh_operator?

I'm new to Airflow and I'm trying to run a job on an EC2 instance using Airflow's ssh_operator, as shown below:
t2 = SSHOperator(
    ssh_conn_id='ec2_ssh_connection',
    task_id='execute_script',
    command="nohup python test.py &",
    retries=3,
    dag=dag)
The job takes a few hours, and I want Airflow to execute the Python script and end. However, when the command is executed and the DAG completes, the script is terminated on the EC2 instance. I also noticed that the above code doesn't create a nohup.out file.
I'm looking at how to run nohup using SSHOperator. It seems like this might be a Python-related issue, because I'm getting the following error on the EC2 script when nohup has been executed:
[Errno 32] Broken pipe
Thanks!
Airflow's SSHHook uses the Paramiko module for SSH connectivity. There is an SO question regarding Paramiko and nohup. One of the answers suggests adding sleep after the nohup command. I cannot explain exactly why, but it actually works. It is also necessary to set get_pty=True in SSHOperator.
Here is a complete example that demonstrates the solution:
from datetime import datetime
from airflow import DAG
from airflow.contrib.operators.ssh_operator import SSHOperator

default_args = {
    'start_date': datetime(2001, 2, 3, 4, 0),
}

with DAG(
    'a_dag', schedule_interval=None, default_args=default_args, catchup=False,
) as dag:
    op = SSHOperator(
        task_id='ssh',
        ssh_conn_id='ssh_default',
        command=(
            'nohup python -c "import time;time.sleep(30);print(1)" & sleep 10'
        ),
        get_pty=True,  # This is needed!
    )
The nohup.out file is written to the user's $HOME.

Structuring python code to run a message through subprocess.Popen

I am in the process of building a simple remote shell tool to communicate with Windows 10. The server sends a "message" through its own shell to the client, which runs the message. I need this received message to be run by a process other than the default cmd (shell=True): a specified app.exe. Here is the code that runs on the client:
1)
def work(storage, message):
    import subprocess
    process = subprocess.Popen([message], stdout=subprocess.PIPE, stderr=None, shell=True)
    # Launch the shell command:
    output = process.communicate()
    print output[0]
I tried including "app.exe" or "cmd" to execute the message, but with that I get the error: TypeError: bufsize must be an integer.
I have also tried pinpointing the issue locally, and I can run:
2)
import subprocess
import sys
subprocess.Popen(["C:\\Users\\User\\Desktop\\app.exe", "-switch"] + sys.argv[1:], shell=False)
and pass arguments from a command terminal, and it works as it should. Now I am trying to apply the same logic to remote execution with my program and use either solution 1 or 2.
Update:
3) Trying to implement what I did locally to a remote solution:
def work(storage, message):
    import subprocess
    import sys
    process = subprocess.Popen(["C:\\Users\\User\\Desktop\\app.exe", "-switch"] + sys.argv[1:], shell=False)
    # Launch the shell command:
    output = process.communicate()
    print output[0]
I tried replacing sys.argv[1:] with message but I get:
TypeError: can only concatenate list (not "str") to list
shell=True doesn't mean the first argument to Popen is a list of arguments to the shell; it just means the first argument is processed by the shell, rather than being arguments to whatever system call your system would use to execute a new process.
In this case, you appear to want to run app.exe with a given argument; that's simply
cmd = r"C:\Users\User\Desktop\app.exe"
subprocess.Popen([cmd, "-switch", message], stdout=subprocess.PIPE)
@chepner sir, you are very helpful. That was it! I am so happy, thanks for your help.
Your solution:
Popen(["...\\app.exe", "-switch", message], stdout=subprocess.PIPE, stderr=None)
That was the badger!

Bash Operator error: No such file or directory in airflow

I am a newbie to Airflow and struggling with BashOperator. I want to run a shell script using the bash operator in my dag.py.
I checked:
How to run bash script file in Airflow
and
BashOperator doen't run bash file apache airflow
on how to access a shell script through the bash operator.
This is what I did:
cmd = "./myfirstdag/dag/lib/script.sh "
t_1 = BashOperator(
    task_id='start',
    bash_command=cmd
)
On running my recipe and checking in Airflow, I got the below error:
[2018-11-01 10:44:05,078] {bash_operator.py:77} INFO - /tmp/airflowtmp7VmPci/startUDmFWW: line 1: ./myfirstdag/dag/lib/script.sh: No such file or directory
[2018-11-01 10:44:05,082] {bash_operator.py:80} INFO - Command exited with return code 127
[2018-11-01 10:44:05,083] {models.py:1361} ERROR - Bash command failed
Not sure why this is happening. Any help would be appreciated.
Thanks !
EDIT NOTE: I assume that it's searching in some Airflow tmp location rather than the path I provided. But how do I make it search for the right path?
Try this:
bash_operator = BashOperator(
    task_id='task',
    bash_command='${AIRFLOW_HOME}/myfirstdag/dag/lib/script.sh ',
    dag=your_dag)
For those running a Docker version:
I had this same issue; it took me a while to realise the problem, and the behaviour can be different with Docker. When the DAG is run, the command is copied to a tmp file; if you do not run Airflow on Docker, this stays on the same machine. With my Docker version it is moved to another container to run, which of course does not have the script file on it.
Check the task logs carefully; you should see this happen before the task is run.
This may also depend on your airflow-docker setup.
Try the following. It needs to have a full file path to your bash file.
cmd = "/home/notebook/work/myfirstdag/dag/lib/script.sh "
t_1 = BashOperator(
    task_id='start',
    bash_command=cmd
)
Are you sure of the path you defined?
cmd = "./myfirstdag/dag/lib/script.sh "
With the leading . the path is relative to the directory where the command is executed, not to your DAG folder (see the sketch after this answer).
Could you try this?
cmd = "find . -type f"
Try running this:

path = "/home/notebook/work/myfirstdag/dag/lib/script.sh"
copy_script_cmd = 'cp ' + path + ' .;'
execute_cmd = './script.sh'

t_1 = BashOperator(
    task_id='start',
    bash_command=copy_script_cmd + execute_cmd
)

Execute .R script within Python using Rscript.exe shell

I have an .R file saved locally at the following path:
Rfilepath = "C:\\python\\buyback_parse_guide.r"
The command for RScript.exe is:
RScriptCmd = "C:\\Program Files\\R\\R-2.15.2\\bin\\Rscript.exe --vanilla"
I tried running:
subprocess.call([RScriptCmd,Rfilepath],shell=True)
But it returns 1, and the .R script did not run successfully. What am I doing wrong? I'm new to Python, so this is probably a simple syntax error... I also tried these, but they all return 1:
subprocess.call('"C:\Program Files\R\R-2.15.2\bin\Rscript.exe"',shell=True)
subprocess.call('"C:\\Program Files\\R\\R-2.15.2\\bin\\Rscript.exe"',shell=True)
subprocess.call('C:\Program Files\R\R-2.15.2\bin\Rscript.exe',shell=True)
subprocess.call('C:\\Program Files\\R\\R-2.15.2\\bin\\Rscript.exe',shell=True)
Thanks!
The RScriptCmd needs to be just the executable, no command line arguments. So:
RScriptCmd = "\"C:\\Program Files\\R\\R-2.15.2\\bin\\Rscript.exe\""
Then the Rfilepath can actually be all of the arguments - and renamed:
RArguments = "--vanilla \"C:\\python\\buyback_parse_guide.r\""
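The answer stops short of the final call; presumably the two strings are then joined and handed to the shell, along the lines of this sketch:

import subprocess

RScriptCmd = "\"C:\\Program Files\\R\\R-2.15.2\\bin\\Rscript.exe\""
RArguments = "--vanilla \"C:\\python\\buyback_parse_guide.r\""

# Join the quoted executable and its arguments into one command line and let
# the shell parse it; call() returns Rscript's exit code (0 means success).
retcode = subprocess.call(RScriptCmd + " " + RArguments, shell=True)
print(retcode)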
It looks like you have a similar problem to mine. I had to reinstall RScript to a path which has no spaces.
See: Running Rscript via Python using os.system() or subprocess()
This is how I worked out the communication between Python and Rscript:
part in Python:
from subprocess import PIPE, Popen

p = Popen(["path/to/Rscript.exe", "path/to/Script.R", Arg1],
          stdout=PIPE, stderr=PIPE, stdin=PIPE)
out = p.communicate()
outValue = out[0]

outValue contains the output value after executing Script.R.
part in the R-Script:
args <- commandArgs(TRUE)
argument1 <- as.character(args[1])
...
write(output, stdout())
output is the variable to send to Python
