I'm trying to submit my PySpark code through a cron job. When I run it manually, it works fine, but through cron it does not work.
Here is the project structure I have:
my-project
|
|--src
|----jobs
|------execute_metrics.py
|----utils
|------get_spark_session.py
The main code lives in src/jobs/execute_metrics.py. It uses get_spark_session.py through the import from src.utils import get_spark_session.
I created a shell script, execute_metric.sh, with the content below for the cron job to execute:
#!/bin/bash
PATH=<included entire path here>
spark-submit <included required options> src/jobs/execute_metrics.py
my-project
|
|--src
|----jobs
|------execute_metrics.py
|----utils
|------get_spark_session.py
|--execute_metric.sh
When I run this shell script using ./execute_metric.sh, I can see the results.
Now I need the job to run every minute, so I created a cron file with the content below and copied it into the same directory:
* * * * * ./execute_metric.sh > execute_metric_log.log
my-project
|
|--src
|----jobs
|------execute_metrics.py
|----utils
|------get_spark_session.py
|--execute_metric.sh
|--execute_cron.crontab
The cron job runs every minute, but it fails with this error:
ModuleNotFoundError: No module named 'src'
Can someone please tell me what went wrong here?
Thanks in advance
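Your module directories are not on the Python path. Cron does not necessarily start the job in my-project or with the same environment as your terminal, so the src package may not be found. Try one of the following:
Explicitly set PYTHONPATH to the project root (the directory that contains src), export it, and give spark-submit the full path to the script:
#!/bin/bash
PATH=<included entire path here>
export PYTHONPATH=somewhere/my-project
spark-submit <included required options> somewhere/my-project/src/jobs/execute_metrics.py
Invoke spark-submit from your project directory:
#!/bin/bash
PATH=<included entire path here>
cd somewhere/my-project || exit 1
spark-submit <included required options> src/jobs/execute_metrics.py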
I fixed it by adding a main.py file in the project directory and changing my cron job to execute main.py. The project structure now looks like:
my-project
|
|--src
|----jobs
|------execute_metrics.py
|----utils
|------get_spark_session.py
|--execute_metric.sh
|--execute_cron.crontab
|--main.py
In main.py, I'm invoking the functions of execute_metrics.py.
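A minimal sketch of what such a main.py could look like. The original post does not show the file, so the entry function run() in execute_metrics.py is a hypothetical name, and the explicit sys.path handling is an assumption that makes the import independent of the directory the job is launched from.

```python
import os
import sys


def add_project_root(script_file):
    """Put the directory that contains script_file at the front of sys.path."""
    root = os.path.dirname(os.path.abspath(script_file))
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


def main(script_file):
    # Make "src" importable no matter what the current directory is.
    add_project_root(script_file)
    # Hypothetical: assumes src/jobs/execute_metrics.py defines run().
    from src.jobs import execute_metrics
    execute_metrics.run()

# In main.py itself, the last line would be: main(__file__)
```

Because main.py sits next to the src directory, its own location is enough to find the package, and nothing depends on cron's working directory.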
Related
I have a folder called TEST with inside :
script.py
script.sh
The bash file is :
#!/bin/bash
# Run the python script
python script.py
If I run the bash file like this :
./TEST/script.sh
I have the following error :
python: can't open file 'script.py': [Errno 2] No such file or directory
How can I tell script.sh to look in its own directory (which may change), so that I can run it from outside the TEST directory?
The tricky part: my Python file uses a SQLite database, and I have the same problem when calling the script from outside the folder. It does not look inside the folder to find the database!
Alternative
You are able to run the script directly by adding this line to the top of your python file:
#!/usr/bin/env python
and then making the file executable:
$ chmod +x script.py
With this, you can run the script directly with ./TEST/script.py
What you asked for specifically
This works to get the path of the script, and then pass that to python.
#!/bin/sh
SCRIPTPATH="$( cd "$(dirname "$0")" ; pwd -P )"
python "$SCRIPTPATH/script.py"
Also potentially useful:
You mentioned having this problem with accessing a SQLite DB in the same folder. Running the Python file from a shell script will not solve that problem by itself. This question may be of use to you there: How do I get the path of the Python script I am running in?
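As a hedged sketch of that idea, the helper below builds the database path from the script's own location instead of the current directory. The file name data.sqlite and both function names are placeholders, not taken from the question.

```python
import os
import sqlite3


def path_next_to(script_file, name):
    """Return the absolute path of `name` in the same directory as script_file."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, name)


def open_database(script_file, name="data.sqlite"):
    # "data.sqlite" is a placeholder file name, not one from the question.
    return sqlite3.connect(path_next_to(script_file, name))
```

Inside script.py you would call open_database(__file__), so the database is found whether the script is started from TEST or from anywhere else.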
You could use $0 which is the name of the currently executing program, as invoked, combined with dirname which provides the directory component of a file path, to determine the path (absolute or relative) that the shell script was invoked under. Then, you can apply it to the python invocation.
This example worked for me:
$ t/t.sh
Hello, world!
$ cat t/t.sh
#!/bin/bash
python "$(dirname "$0")/t.py"
Take it a step further and change your current working directory, which will also be inherited by python, thus helping it find its database:
$ t/t.sh; cat t/t.sh ; cat t/t.py ; cat t/message.txt
hello, world!
#!/bin/bash
cd "$(dirname "$0")"
python t.py
with open('message.txt') as msgf:
    print(msgf.read())
hello, world!
From the shell script, you can always find the script's own directory: see Getting the source directory of a Bash script from within. While the accepted answer to that question provides a very comprehensive and robust solution, your relatively simple case only really needs something like
#!/bin/bash
dir="$(dirname "${BASH_SOURCE[0]}")"
# Run the python script
python "$dir/script.py"
Another way to do it would be to change the directory from which you run the script:
#!/bin/bash
dir="$(dirname "${BASH_SOURCE[0]}")"
# Run the python script
(cd "$dir"; python script.py)
The parentheses (...) around cd and python create a subshell, so the directory does not change for the rest of your bash script. This may not be necessary if you don't do anything else in the bash portion, but it is still useful if you ever decide to, say, source your script instead of running it as a subprocess.
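The same "change directory only for the child" effect can be sketched from Python with the cwd argument of subprocess.run. The helper name run_in_directory is made up for this illustration, and the code assumes Python 3.7 or later.

```python
import os
import subprocess
import sys


def run_in_directory(directory, code):
    """Run a Python one-liner in a child process whose cwd is `directory`."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=directory,          # only the child process changes directory
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The parent's os.getcwd() is the same before and after the call, which mirrors what the subshell does in the bash version.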
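If you do not change the directory in bash, you can do it in Python using a combination of sys.argv[0], os.path.abspath, os.path.dirname and os.chdir:
import sys
import os
...
# abspath avoids os.chdir('') when the script is started from its own directory
os.chdir(os.path.dirname(os.path.abspath(sys.argv[0])))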
#!/usr/bin/python
import requests, zipfile, StringIO, sys
extractDir = "myfolder"
zip_file_url = "download url"
response = requests.get(zip_file_url)
zipDocument = zipfile.ZipFile(StringIO.StringIO(response.content))
zipinfos = zipDocument.infolist()
for zipinfo in zipinfos:
    extrat = zipDocument.extract(zipinfo, path=extractDir)
System configuration
Ubuntu OS 16.04
Python 2.7.12
$ python extract.py
When I run the code in a terminal with the above command, it works properly: it creates the folder and extracts the files into it.
However, when I create a cron job with sudo rights, the code executes but does not create any folder or extract the files.
crontab command:-
40 10 * * * /usr/bin/sudo /usr/bin/python /home/ubuntu/demo/directory.py > /home/ubuntu/demo/logmyshit.log 2>&1
also tried
40 10 * * * /usr/bin/python /home/ubuntu/demo/directory.py > /home/ubuntu/demo/logmyshit.log 2>&1
Notes :
I checked the syslog; it says the cron job is running successfully
The above code gives no errors
I also made the Python program executable with chmod +x filename.py
Please help me see where I am going wrong.
There is nothing inherently wrong with running a Python script from crontab, but many things can go wrong because the environment is not the one you are used to.
When you type python directory.py in an interactive shell, PATH and any required Python environment variables have already been set during login and interactive shell initialization, and the current directory is your home directory by default, or wherever you have changed to.
When the same command is run from crontab, the current directory is not one you chose (and may not be what you expect), PATH is only a minimal default such as /bin:/usr/bin, and Python environment variables are not set. In your script, extractDir = "myfolder" is a relative path, so the folder is created under whatever directory cron starts in, not necessarily /home/ubuntu/demo. That means you will have to set environment variables in the crontab file until you get a correct Python environment, and set the current directory explicitly.
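One way to see the difference is to record what the interpreter actually sees in both situations. The following is a small sketch, written for Python 3 (the question uses Python 2.7), and the function name environment_snapshot is made up for this example.

```python
import os
import sys


def environment_snapshot():
    """Collect the settings that most often differ between a terminal and cron."""
    return {
        "cwd": os.getcwd(),
        "executable": sys.executable,
        "PATH": os.environ.get("PATH", ""),
        "PYTHONPATH": os.environ.get("PYTHONPATH", ""),
        "sys.path": list(sys.path),
    }


if __name__ == "__main__":
    for key, value in environment_snapshot().items():
        print(f"{key} = {value}")
```

Run it once from the terminal and once from a crontab entry with the output redirected to a file, then compare the two results.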
I had a very similar problem, and it turned out cron didn't like importing matplotlib; I ended up having to specify the Agg backend. I figured it out by putting log statements after each line to see how far the program got before it crapped out. Of course, my log was empty, which tipped me off that it crashed on the imports.
TLDR: log each line inside the script
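A hedged sketch of that approach for the import stage: each import is logged before and after, and a failure writes the full traceback to a file. The log location and the helper name import_logged are assumptions chosen for this illustration.

```python
import importlib
import logging
import os
import tempfile

# Hypothetical log location; pick any path the cron user can write to.
LOG_FILE = os.path.join(tempfile.gettempdir(), "cron_job_debug.log")
logging.basicConfig(
    filename=LOG_FILE,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)


def import_logged(module_name):
    """Import a module, logging success or the full traceback on failure."""
    logging.info("importing %s", module_name)
    try:
        module = importlib.import_module(module_name)
    except Exception:
        logging.exception("import of %s failed", module_name)
        return None
    logging.info("imported %s", module_name)
    return module
```

For the matplotlib case, matplotlib = import_logged("matplotlib") followed by matplotlib.use("Agg") before importing pyplot selects the non-interactive backend mentioned above.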
I currently have a folder structure that will contain a few Python scripts, each of which needs to be fired from a certain folder. I would like to write a global script that runs each Python script via a separate script in each folder.
-Obtainer
--Persona
---Arthur
----start.sh
--Initialise.sh
-Persona
--Arthur
---lib
----pybot
-----pybot.py
When I run Initialise.sh, I want it to run start.sh. Arthur is the bot; there will be more folders with different names, and Initialise.sh will find and fire each start.sh.
In initialise.sh I have:
#!/bin/bash
. ./Persona/Arthur/start.sh
In start.sh I have:
#!/bin/bash
python ../../../Persona/Arthur/lib/pybot/pybot.py
I get this error:
python: can't open file '../../../Persona/Arthur/lib/pybot/pybot.py': [Errno 2] No such file or directory
However, if I run start.sh itself from its own directory, it runs fine. I assume this is because it is then run from the proper shell and, consequently, the proper directory. Is there a way to make the main script run start.sh in its own shell, as if it were being run by itself? The reason is that pybot.py saves a bunch of files to where the start script is, and because there will be more than one bot, I need them to save in each separate folder.
First of all, do not source a script when you mean to execute it:
#!/bin/bash
. ./Persona/Arthur/start.sh
Don't do this.
Your script has a number of issues. It won't work because your current working directory is uncertain. It is better to have the script derive its own path, which spares you the hassle of absolute versus relative paths.
The general code could be
script_dir=`dirname "${BASH_SOURCE[0]}"`
then you can use this to derive the path of your target file,
#!/bin/bash
script_dir=`dirname "${BASH_SOURCE[0]}"`
"$script_dir/Persona/Arthur/start.sh"
Your python invocation becomes:
#!/bin/bash
script_dir=`dirname "${BASH_SOURCE[0]}"`
python "$script_dir/../../../Persona/Arthur/lib/pybot/pybot.py"
This should work out properly.
Regarding BASH_SOURCE, check out https://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html
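If you want the directory of start.sh to be the cwd, you should call cd. After the cd, use a path relative to the new directory, because a relative $script_dir would no longer resolve correctly:
#!/bin/bash
script_dir=`dirname "${BASH_SOURCE[0]}"`
cd "$script_dir" || exit 1
python ../../../Persona/Arthur/lib/pybot/pybot.py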
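For the "find and fire each start.sh" part of the question, here is a rough Python sketch of a launcher. It is not from the answer above, and the function names are made up. It assumes every bot folder holds a file named start.sh and that bash is available.

```python
import os
import subprocess


def find_start_scripts(root, name="start.sh"):
    """Return sorted absolute paths of every `name` found below root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            found.append(os.path.abspath(os.path.join(dirpath, name)))
    return sorted(found)


def run_each(scripts):
    """Run each script with its own directory as the working directory."""
    for script in scripts:
        # cwd makes every bot save its files next to its own start.sh
        subprocess.run(["bash", script], cwd=os.path.dirname(script), check=False)
```

Calling run_each(find_start_scripts("Persona")) from the Obtainer directory would start every bot in its own folder, one after another.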
*
SUMMARY
If Monit or Cron does not start your script, it could be a PATH issue.
See below.
*
I have 2 main files:
the first, A.py, is the main script in Python, which updates an SQLite database, db.sqlite, continuously (it should never stop);
the second, B.sh, is a shell script that, if needed, kills and restarts the first script (it is run by Monit under a pre-configured condition, see below)
Both files are executable:
A.py first line #!/usr/bin/env python
B.sh first line #!/bin/sh
Then:
chmod +x A.py
chmod +x B.sh
I configured Monit to check the timestamp of the db.sqlite file. If the timestamp is older than 1 minute (which implies that, for some unknown reason, the A.py updating function has stopped even though the Python script may still be running, which is why I cannot check the A.py status), Monit runs the B.sh shell script, which restarts A.py.
All works well if I run the scripts by hand, in a shell terminal.
But under Monit it seems not to work.
I add the following in the Monit configuration file (then I check the syntax sudo monit -t and reload the configuration sudo monit reload):
check file db.sqlite with path /right_dir/db.sqlite
if timestamp > 1 minute then exec "/right_dir/restart.sh"
The monit.log report:
error : 'db.sqlite' timestamp for /right_dir/db.sqlite failed -- current timestamp is ...
info : 'db.sqlite' exec: /right_dir/restart.sh
error : 'db.sqlite' timestamp for /right_dir/db.sqlite failed -- current timestamp is ...
error : 'db.sqlite' timestamp for /right_dir/db.sqlite failed -- current timestamp is ...
and so on...
I run ps aux|grep A.py, but the script is not running.
I really appreciate any help.
Thank you for your time,
gil
UPDATE
I tried a simple file, with cron instead of monit: if I run the script on a terminal all work well. Cron does not.
FYI: I use Anaconda (Python suite)
File A.py:
#!/usr/bin/env python
print("OK")
import matplotlib # this import blocks
import math # this import does not block
while True:
    pass
File B.sh:
/usr/bin/pkill -f A.py
nohup /right_path/A.py &
Crontab:
*/1 * * * * /right_path/B.sh
After crontab start I check ps aux|grep A.py.
This line does not block (I see the process with ps aux):
import math
This line blocks (I do not see the process):
import matplotlib
So the problem seems to be related to the module import (some work, some not).
Maybe a PATH/env issue?
Any idea?
SOLVED
It is a PATH problem.
Cron does not see the complete PATH, but only a minimal subset.
Just try for yourself: add to crontab (crontab -e) the following:
* * * * * env > /tmp/env.output
and compare the output to the env command run in your terminal.
They are probably different.
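To make that comparison less error-prone, a short script can list only the variables that differ. This is a sketch under the assumption that each variable fits on one line of env output; parse_env and diff_env are names invented here.

```python
def parse_env(text):
    """Turn the output of `env` into a dict of NAME -> value."""
    variables = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without "=", e.g. continuation of multi-line values
            variables[name] = value
    return variables


def diff_env(terminal_text, cron_text):
    """Return {name: (terminal_value, cron_value)} for every differing variable."""
    terminal, cron = parse_env(terminal_text), parse_env(cron_text)
    names = sorted(set(terminal) | set(cron))
    return {
        name: (terminal.get(name), cron.get(name))
        for name in names
        if terminal.get(name) != cron.get(name)
    }
```

Feed it the contents of /tmp/env.output and of a file saved from env in your terminal; a value of None means the variable is missing on that side.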
The solution is to copy the complete PATH from your terminal and paste it as the second line of B.sh just like that (take it as an example, your situation may be slightly different):
#!/bin/sh
PATH=/home/user/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
# rest of the script
Thanks to this thread: https://askubuntu.com/questions/23009/reasons-why-crontab-does-not-work
crontab fails to execute a Python script, although the same command line runs the script fine when I type it myself.
These are the solutions I have tried:
add #!/usr/bin/env python at the top of the main.py
add PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin at the top of crontab
chmod 777 to the main.py file
service cron restart
my crontab is:
PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
*/1 * * * * python /home/python_prj/main.py
and the log in /var/log/syslog is:
Nov 6 07:08:01 localhost CRON[28146]: (root) CMD (python /home/python_prj/main.py)
and nothing else.
The main.py script calls some methods from other modules under python_prj, does that matter?
Can anyone help me?
The main.py script calls some methods from other modules under python_prj, does that matter?
Yes, it does. All modules need to be findable at run time. You can accomplish this in several ways, but the most appropriate might be to set the PYTHONPATH variable in your crontab.
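To illustrate what PYTHONPATH changes, the sketch below starts a fresh interpreter with the variable set and returns the module search path it ends up with. The helper name child_sys_path is hypothetical, and the code assumes Python 3.7 or later.

```python
import os
import subprocess
import sys


def child_sys_path(pythonpath):
    """Return sys.path as seen by a fresh interpreter started with PYTHONPATH set."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

A crontab line such as PYTHONPATH=/home/python_prj plays the same role for the cron job: the directory is added to the search path, so the other modules under python_prj can be imported.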
You might also want to set the MAILTO variable in crontab so you get emails with any tracebacks.
[update] here is the top of my crontab:
www:~# crontab -l
DJANGO_SETTINGS_MODULE=djangocron.settings
PATH=...
PYTHONPATH=/home/django
MAILTO="cron-notices@example.com"
...
# m h dom mon dow command
10-50/10 * * * * /home/django/cleanup_actions.py
...
(running cleanup actions every 10 minutes, except at the top of the hour).
Any file access in your scripts? And if so, have you used relative paths (or even: no explicit path) in your script?
When run from the command line, the working directory is the folder you start the script from. When run by cron, the working directory may be different, so relative paths resolve somewhere else.
So try using absolute paths to any files you access.
Check the permissions of the script. Make sure that it's executable by cron-- try chmod +x main.py.