Schedule Python script with crontab doesn't work - python

I'm trying to schedule a Python script in crontab, but it doesn't work.
How can I understand the reason why?
I've already added cron and Terminal to Full Disk Access, so this shouldn't be the problem.
Everything works fine when I run the command in a terminal:
python /users/myuser/slots_update.py
Crontab entry which doesn't work:
45 12 * * * /usr/bin/python /users/myuser/slots_update.py
Python script (I put a different SQL query inside to make it simpler):
#!/usr/bin/env python
# coding: utf-8
# In[2]:
# importing the required libraries
import gspread
import pandas as pd
from oauth2client.service_account import ServiceAccountCredentials
# In[50]:
# define the scope
scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']
# add credentials to the account
creds = ServiceAccountCredentials.from_json_keyfile_name('key.json', scope)
# authorize the clientsheet
client = gspread.authorize(creds)
# In[51]:
# get the instance of the Spreadsheet
sheet = client.open('EGE_slots1')
# get the first sheet of the Spreadsheet
sheet_instance = sheet.get_worksheet(0)
# In[ ]:
sheet_instance.col_count
# In[52]:
sheet_instance.cell(col=1,row=1)
# In[12]:
import pandas as pd
import numpy as np
from sqlalchemy import create_engine
from datetime import datetime as dt
# In[13]:
connection = create_engine('postgresql://')
# In[47]:
slots = pd.read_sql("""
select * from teachers
""",connection)
# In[53]:
sheet_instance.update('A2',slots.values.tolist())
# In[ ]:

Use the full path to the JSON file. cron does not start the script from the directory the file lives in, so a relative path like 'key.json' resolves against a different working directory and the file is not found.
creds = ServiceAccountCredentials.from_json_keyfile_name('key.json', scope) --> creds = ServiceAccountCredentials.from_json_keyfile_name('/a/b/c/key.json', scope)
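If you prefer not to hard-code the absolute path, a minimal sketch (assuming key.json sits next to the script; the constant names are only illustrative) resolves it relative to the script's own location, so it works no matter which working directory cron starts the job from:
import os
from oauth2client.service_account import ServiceAccountCredentials

# Build the key path from this script's directory instead of the
# current working directory, which is different under cron.
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
KEY_PATH = os.path.join(SCRIPT_DIR, 'key.json')

scope = ['https://spreadsheets.google.com/feeds',
         'https://www.googleapis.com/auth/drive']
creds = ServiceAccountCredentials.from_json_keyfile_name(KEY_PATH, scope)
To see why a cron job fails in general, it also helps to redirect the job's output to a log file, e.g. by appending >> /tmp/slots_update.log 2>&1 to the crontab entry, and then reading the errors there.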

Related

Reading file remotely using paramiko (or not) in python

I would like to read a file on a remote machine. I can do it using paramiko.
The file is constantly updated with new lines. I have tried to implement a Python script for reading it. Here is the interesting part of the code:
import glob
import sys
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import os
import pandas as pd
from scipy.linalg import norm
import time
import paramiko
import select

if __name__ == "__main__":
    print("...starting")
    # a lot of stuff here in the middle
    ssh_client = paramiko.SSHClient()
    ssh_client.load_system_host_keys()
    ssh_client.connect(hostname='xxx.xx.xx.xxx', username='user', password='pass')
    print("...starting transport:")
    transport = ssh_client.get_transport()
    channel = transport.open_session()
    channel.exec_command("cat /tmp/ciao.txt")
    while True:
        rl, wl, xl = select.select([channel], [], [], 0.0)
        #print(rl.readlines())
        if len(rl) > 0:
            #print("printing")
            string_in_file = channel.recv(1024)
            if len(string_in_file) > 0:
                #print("printing")
                print(string_in_file)
Problem: the file is correctly read at the beginning, but afterwards every newly written line is completely ignored, or at least it does not produce any effect on the output of the proposed script. Any suggestions on how to read new lines as they are written?
Any other idea on how to achieve the same result (even without paramiko) is more than welcome. The only restriction is the use of Python.
cat prints the file's contents as they are at that moment and then exits, so lines appended later never reach your channel. tail -f will keep following the file, giving you more output as you go.
import glob
import sys
import os
import time
import paramiko
import select

if __name__ == "__main__":
    print("...starting")
    # a lot of stuff here in the middle
    ssh_client = paramiko.SSHClient()
    ssh_client.load_system_host_keys()
    # for the test, put "user,pw" in ./test.pw
    user, pw = open('test.pw').readline().strip().split(",")
    ssh_client.connect(hostname='localhost', username=user, password=pw)
    print("...starting transport:")
    transport = ssh_client.get_transport()
    channel = transport.open_session()
    # --lines=1GB is just a way to get all of the file instead of
    # only the last few lines
    # use --follow=name instead of -f if you want to keep following
    # files that are log rotated
    channel.exec_command("tail -f --lines=1GB /tmp/test.txt")
    while True:
        # (don't melt cpus with a zero timeout)
        rl, wl, xl = select.select([channel], [], [])
        #rl, wl, xl = select.select([channel],[],[],0.0)
        if rl:
            string_in_file = channel.recv(1024)
            if len(string_in_file) > 0:
                print(string_in_file)
            else:
                print("channel disconnected")
                break
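If you would rather not rely on tail being available on the remote machine, another option is to poll the file over SFTP and read only the bytes appended since the last check. This is a rough sketch, assuming the file only ever grows and reusing /tmp/test.txt and localhost from the answer above:
import time
import paramiko

ssh_client = paramiko.SSHClient()
ssh_client.load_system_host_keys()
ssh_client.connect(hostname='localhost', username='user', password='pass')

sftp = ssh_client.open_sftp()
offset = 0  # number of bytes already read

while True:
    size = sftp.stat('/tmp/test.txt').st_size
    if size > offset:
        remote_file = sftp.open('/tmp/test.txt')
        remote_file.seek(offset)                   # skip what was read before
        new_data = remote_file.read(size - offset) # only the appended bytes
        remote_file.close()
        offset = size
        print(new_data)
    time.sleep(1)  # poll once per second instead of busy-looping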

Python script gives Syntax error while running it from Jenkins

I have a python script to update a Google sheet. The script works fine when I execute it locally and updates the Google sheet as expected. I want to execute it automatically every 3 hours. We are using Jenkins for job scheduling, and when I tried to execute it from Jenkins it showed a syntax error.
The error and scripts are mentioned below. Any suggestions on how to resolve it?
Started by user admin_123
Running as SYSTEM
[EnvInject] - Loading node environment variables.
Building in workspace /var/lib/jenkins/jobs/update_oos_gs/workspace
[workspace] $ /bin/sh -xe /tmp/jenkins6318169151390457385.sh
+ export PYTHONPATH=/home/etl/bi/
+ cd /home/etl/bi/crm
+ python3 -u oos_gs_update.py
File "oos_gs_update.py", line 22
r = f"{col_name}{header}:{col_name}{len(col)+header}"
^
SyntaxError: invalid syntax
Build step 'Execute shell' marked build as failure
Finished: FAILURE
Below is my Python script:
import os
import sys
import datetime
import psycopg2
import gspread
from oauth2client.service_account import ServiceAccountCredentials
from time import sleep
from utils.config import Configuration as Config
from utils.postgres_helper import get_connection
from utils.utils import get_global_config
sys.path.append('/home/etl/bi/')
GSHEET_CONFIG_SECTION = 'gsheet'
SCOPE = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
SHEET_KEY='1Mq7_********y5WtB1R-ZKfz6o'
def update_sheet(sheet, table, columns="ABC", header=4):
    to_update = []
    table = list(zip(*table))
    for col_name, col in zip(columns, table):
        r = f"{col_name}{header}:{col_name}{len(col)+header}"
        cells = sheet.range(r)
        for cell, value in zip(cells, col):
            cell.value = value
        to_update.extend(cells)
    sheet.update_cells(to_update)
cnx_psql =get_connection(get_global_config(), 'pg_dwh')
print('DB connected')
psql_cursor = cnx_psql.cursor()
METADATA_QUERY = '''SELECT sku,product_name,CAST(oos as TEXT) as oos FROM staging.oos_details order by oos DESC;'''
psql_cursor.execute(METADATA_QUERY)
results = psql_cursor.fetchall()
cell_values = (results)
home_dir = os.path.expanduser('~')
config=get_global_config()
gsheet_config_section = GSHEET_CONFIG_SECTION
secret_file_path = os.path.join(home_dir,config.get(gsheet_config_section, 'service_account_credentials'))
creds = ServiceAccountCredentials.from_json_keyfile_name(secret_file_path, scopes=SCOPE)
client = gspread.authorize(creds)
sheet = client.open_by_key(SHEET_KEY).sheet1
#Function Call
update_sheet(sheet, cell_values)
psql_cursor.close()
cnx_psql.close()
Python 3.6 introduced the f'string{interpolation}' format described in PEP 498. The error message Jenkins gave you about line 22 of your code points at exactly that newer string formatting, so the python3 that Jenkins invokes is older than 3.6. Either run the job with Python 3.6+ or just change the line from
r = f"{col_name}{header}:{col_name}{len(col)+header}"
to
r = "{}{}:{}{}".format(col_name, header, col_name, len(col) + header)
Try this:
r = col_name + str(header) + ':' + col_name + str(len(col)+header)
Or you can use another formatting method as well, or upgrade Python to 3.6 or later.
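A quick way to confirm which interpreter Jenkins is actually using is to fail fast with an explicit version check. This is a minimal sketch, not part of the original script, and it has to live in a module that contains no f-strings itself, otherwise the file fails to parse before the check runs:
import sys

# f-strings require Python 3.6+. Aborting with a clear message makes the
# Jenkins console show the interpreter version instead of a bare SyntaxError.
if sys.version_info < (3, 6):
    raise SystemExit(
        "Python 3.6+ required, found {}.{}.{}".format(*sys.version_info[:3])
    )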

Passing AWS Credentials in Python Script

I have a Python script that gets called by a PHP script. The user that invokes the PHP script is apache, and hence the Python file also gets invoked by apache. So it gives "Unable to locate credentials". I've set the default credentials via the AWS CLI, and when I invoke the Python script as root, it works.
This is my line of code :
client = boto3.client('ses', region_name=awsregion, aws_access_key_id='AJHHJHJHJ', aws_secret_access_key='asdasd/asdasd/asd')
But this gives an "Invalid Syntax" error, so I tried this:
client = boto3.Session(aws_access_key_id='ASDASD', aws_secret_access_key='asd/asdasd/asdasd')
client = boto3.client('ses', region_name=awsregion, aws_access_key_id='ASDASD', aws_secret_access_key='asd/asdasd/asdasd')
This gives the same error as above. The weird thing is that this exact usage is mentioned in the documentation; even though it's not recommended, it should work.
Can somebody help me fix this?
Did you ever get this resolved? Here is how I connect to boto3 in my Python scripts:
import boto3
from botocore.exceptions import ClientError
import re
from io import BytesIO
import gzip
import datetime
import dateutil.parser as dparser
from datetime import datetime
import tarfile
import requests
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## Needed glue stuff
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
##

## currently this will run for everything that is in the staging directory of omniture
# set needed parms
myProfileName = 'MyDataLake'
dhiBucket = 'data-lake'

# create boto3 session
try:
    session = boto3.Session(aws_access_key_id='aaaaaaaaaaaa',
                            aws_secret_access_key='abcdefghijklmnopqrstuvwxyz',
                            aws_session_token=None,
                            region_name='us-east-1',
                            botocore_session=None)
    s3 = session.resource('s3')  # establish connection to s3
except Exception as conne:
    print("Unable to connect: " + str(conne))
    errtxt = requests.post("https://errorcapturesite",
                           data={'message': 'Unable to connect to : ' + myProfileName, 'notify': True, 'color': 'red'})
    print(errtxt.text)
    exit()
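For the SES case in the question, the same session-based pattern would look roughly like this (a sketch with placeholder values; in practice the cleaner fix for "Unable to locate credentials" is usually to give the apache user a readable credentials file or an IAM role rather than hard-coding keys):
import boto3

# Placeholder credentials and region; do not commit real keys to code.
session = boto3.Session(
    aws_access_key_id='ASDASD',
    aws_secret_access_key='asd/asdasd/asdasd',
    region_name='us-east-1',
)
ses_client = session.client('ses')  # SES client built from that session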

How can I use the requests module in Spark?

This is the code I used:
from __future__ import print_function
import sys
from pyspark.sql import SparkSession

sys.path.append('/usr/local/lib/python2.7/site-packages')
import requests

if __name__ == "__main__":
    s = requests.Session()
    toGet = s.get
    spark = SparkSession\
        .builder\
        .appName("PythonDockerRepoStat")\
        .getOrCreate()
    lines = spark.read.text('/data/urls.txt').rdd.map(lambda r: r[0])
    res = lines.flatMap(lambda x: x.split("\n"))\
        .map(lambda x: toGet(x))
    output = res.collect()
    print(output)
However, I got this error: ImportError: No module named requests.sessions
When launching Spark jobs, all dependencies have to be accessible for:
the driver interpreter, and
the executor interpreters.
Extending the path with
sys.path.append('/usr/local/lib/python2.7/site-packages')
affects only the local driver interpreter. To set executor environment variables you can:
modify $SPARK_HOME/conf/spark-env.sh, or
use the spark.executorEnv.[EnvironmentVariableName] configuration option (for example by editing $SPARK_HOME/conf/spark-defaults.conf or setting the corresponding SparkConf key), as sketched below.
At the same time you should make sure that requests is installed / accessible on every worker node (if not using local / pseudo-distributed mode).
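As a rough illustration of the second option (the site-packages path is just the one from the question, and every worker node is assumed to have requests installed at that same location), the executor environment variable can also be set from the application itself:
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Point the executor Python processes at the directory where requests
# is installed on the worker nodes.
conf = SparkConf().set(
    "spark.executorEnv.PYTHONPATH",
    "/usr/local/lib/python2.7/site-packages",
)

spark = SparkSession.builder \
    .appName("PythonDockerRepoStat") \
    .config(conf=conf) \
    .getOrCreate()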

Appengine logservice with remote_api

I am trying to get my App Engine application logs remotely.
I am using remote_api. I tried appcfg first, but I discarded it because it has a limit on the download/buffer, so I can't download all the logs.
Now I am using logservice, but when I use it in my code it doesn't return anything.
Here is my code:
import time
import urllib2
from google.appengine.ext.remote_api import remote_api_stub
from google.appengine.api.logservice import logservice
import getpass
import base64
import os
from appcfg import *
import getpass
import subprocess

os.environ['HTTP_X_APPENGINE_TASKRETRYCOUNT'] = '1'
os.environ["SERVER_SOFTWARE"] = "Developement"
os.environ['HTTP_HOST'] = 'unitTest'
os.environ['CURRENT_MODULE_ID'] = 'default'
os.environ['CURRENT_VERSION_ID'] = '1.0'

email_address = "iacopo#indiegala.com"
application_url = "store-indiegala.appspot.com"

def aut():
    app_name = "store-indiegala.appspot.com"
    f = lambda: ("*EMAIL*", "*PASSWORD*")
    remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func, app_name)
    print("successfully authenticated")
    fetch_logs()

def fetch_logs():
    end_time = time.time()
    print("starting")
    for req_log in logservice.fetch(end_time=end_time, offset=None,
                                    minimum_log_level=logservice.LOG_LEVEL_INFO,
                                    include_app_logs=True, include_incomplete=True):
        print req_log.ip

def auth_func():
    global email_address
    return (email_address, getpass.getpass('Password:'))

aut()
It successfully connects to my app and makes the logservice.fetch() call, but it returns an empty object... why?
Go to your logs in the App Engine admin and make sure you have the right module and version. They can be found in each log entry, for example:
2015-01-24 21:58:43.425 / active start=2015-01-24,13:57:36.773 AppEngine-Google; (+http://code.google.com/appengine) module=default version=baseline
Becomes:
import os
os.environ["CURRENT_MODULE_ID"] = "default"
os.environ["CURRENT_VERSION_ID"] = "baseline"
