I'm using PyGitHub to update my GitHub.com repos from inside an Ubuntu server with a Python script.
I have noticed that there are times when my script just hangs, with no error message indicating what went wrong.
This is my script:
from pathlib import Path
from typing import Optional

import typer
from github import Github, GithubException

app = typer.Typer()


@app.command()
def add_deploy_key(
    token: Optional[str] = None, user_repo: Optional[str] = None, repo_name: Optional[str] = None
):
    typer.echo("Starting to access GitHub.com... ")
    try:
        # using an access token
        g = Github(token)

        # I skipped a bunch of code to save space

        for key in repo.get_keys():
            if str(key.key) == str(pure_public_key):
                typer.echo(
                    "We found an existing public key in " + user_repo + ", so we're NOT adding it"
                )
                return
        rep_key = repo.create_key(
            "DigitalOcean for " + repo_name, current_public_key, read_only=True
        )
        if rep_key:
            typer.echo("Success with adding public key to repo in GitHub.com!")
            typer.echo("")
            typer.echo("The url to the deposited key is: " + rep_key.url)
        else:
            typer.echo("There's some issue when adding public key to repo in GitHub.com")
    except GithubException as e:
        typer.echo("There's some issue")
        typer.echo(str(e))
        return


if __name__ == "__main__":
    app()
The way I trigger it is from inside a bash script:
output=$(python /opt/github-add-deploy-keys.py --token="$token" --user-repo="$user_repo" --repo-name="$repo_name")
It works, but sometimes it just hangs there without any output. And since it happens intermittently rather than consistently, it's hard to debug.
I cannot be sure whether it's a typer issue, a network issue, or a GitHub.com issue. There's just nothing.
I want it to fail fast and often. I know there are timeout and retry parameters for the Github object.
See https://pygithub.readthedocs.io/en/latest/github.html?highlight=retry#github.MainClass.Github
I wonder if I can do anything with these two parameters so that at least I have some visual indication that something is being done. I could add a lot of typer.echo statements, but that would be extremely verbose.
I'm also unfamiliar with the retry object. I wish that, even when a retry is made, there were some echo statement telling me a retry is being attempted.
What can I try?
The timeout should prevent the GitHub request from hanging, and retries should make it work, but according to the documentation there is already a default timeout. Since this question is about debugging, I would suggest using the Python logging library to log the steps your script runs to a file. You can find a good logging tutorial here.
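If you want to tune those two parameters yourself and get some visibility into retries, here is a minimal sketch (the concrete numbers are just assumptions, not recommendations); urllib3 logs each retry attempt, so turning its logger up gives you a message whenever a retry happens:

import logging

from github import Github
from urllib3.util.retry import Retry

# Show urllib3's internal messages, including its retry attempts.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)

# timeout is per request (seconds); retry accepts an int or a urllib3 Retry object.
retry = Retry(total=3, backoff_factor=2, status_forcelist=[500, 502, 503, 504])
g = Github("<your access token>", timeout=15, retry=retry)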
As for the logging style, since your case has a lot of unknowns, I would log when the script starts, before the "create key" step, after the "create key" step, and maybe in case of errors. You can then go over the log file when the script hangs.
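A minimal sketch of that logging style, reusing the names from the script in the question (the log file path is just an assumption):

import logging

logging.basicConfig(
    filename="/var/log/github-add-deploy-keys.log",  # assumed path, pick your own
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

logging.info("script started for %s", user_repo)
logging.info("about to call create_key")
# repo, repo_name and current_public_key come from the script in the question
rep_key = repo.create_key("DigitalOcean for " + repo_name, current_public_key, read_only=True)
logging.info("create_key finished")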
You can create a bash script that runs your script with a timeout and notifies you if the program exits with a non-zero exit code, and leave it to run overnight:
timeout 5 python <yourscript>.py
Related
How do I implement a try and except in my Python script? I am using the Kubernetes Python client to communicate with my GKE cluster, and I want to create a deployment if it doesn't exist. I am currently doing this (code below), but it doesn't seem to work: it returns an API exception error and the program crashes.
Here is my current implementation:
def get_job():
    conn.connect()
    api = client.CustomObjectsApi()
    for message in consumer:
        message = message.value
        try:
            resource = api.get_namespaced_custom_object(
                group="machinelearning.seldon.io",
                version="v1",
                name=message["payload"]["metadata"]["name"],
                namespace="seldon",
                plural="seldondeployments",
            )  # execution stops here and doesn't move to the next step.
            api.create_namespaced_custom_object(
                group="machinelearning.seldon.io",
                version="v1",
                namespace="seldon",
                plural="seldondeployments",
                body=message['payload'])
            print("Resource created")
            print(e)
        else:
            pass
How can I fix this?
The logic I am trying to follow is this:
If the deployment exists, return an "already exists" message without crashing the script.
If the deployment doesn't exist, create a new deployment.
I am currently doing this (code below), but it doesn't seem to work: it returns an API exception error and the program crashes.
The reason it crashes is that there is no exception handling in the code. Your approach also seems flawed. Your try statement can be split into something like this (credit):
try:
    statements  # statements that can raise exceptions
except:
    statements  # statements that will be executed to handle exceptions
else:
    statements  # statements that will be executed if there is no exception
So what you have to do is add the operations where they need to be.
If the deployment exists, return an "already exists" message without crashing the script.
You can handle this in the else block and simply print a message stating that the deployment already exists.
If the deployment doesn't exist, create a new deployment.
You can add this operation within the except part of the try statement, so when the error is thrown it will be caught and the operation performed as you specified.
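Putting that together, a minimal sketch of the idea (assuming the Kubernetes client's ApiException; checking for a 404 status is an extra refinement so other API errors still surface, and the group/version/plural values are copied from the question):

from kubernetes import client
from kubernetes.client.rest import ApiException


def ensure_seldon_deployment(api: client.CustomObjectsApi, payload: dict) -> None:
    try:
        api.get_namespaced_custom_object(
            group="machinelearning.seldon.io",
            version="v1",
            name=payload["metadata"]["name"],
            namespace="seldon",
            plural="seldondeployments",
        )
    except ApiException as e:
        if e.status == 404:
            # The deployment doesn't exist, so create it.
            api.create_namespaced_custom_object(
                group="machinelearning.seldon.io",
                version="v1",
                namespace="seldon",
                plural="seldondeployments",
                body=payload,
            )
            print("Resource created")
        else:
            raise
    else:
        # The deployment exists; don't crash, just report it.
        print("Resource already exists")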
I have a Python script that pulls from various internal network sources. With how our systems are set up, we initiate a urllib pull from a network location, and on certain parts of the network it gets hung up waiting forever for a response. I would like my script to check that, if it hasn't finished the pull in, let's say, 5 minutes, it skips that function, attempts to pull from the next address, and records the failure to a bad directory repository (so we can go check which systems get hung up; there are over 20,000 IP addresses we are checking, some with older scripts running on them that no longer work but will still try to run when requested, and they never stop trying to run).
I'm familiar with having a script pause at a certain point:
import time
time.sleep(300)
What I'm thinking, from a pseudocode perspective (not proper Python, just illustrating the idea):
import time
import urllib2

url_dict = ['http://1', 'http://2', 'http://3', ...]
fail_log_path = 'C:/Temp/fail_log.txt'

for addresses in url_dict:
    clock_value = time.start()
    while clock_value <= 300:
        print str(clock_value)
        res = urllib2.retrieve(url)
    if res != []:
        pass
    else:
        fail_log = open(fail_log_path, 'a')
        fail_log.write("Failed to pull from site location: " + str(url) + "\n")
        faile_log.close
Update: a specific option for dealing with URLs: timeout for urllib2.urlopen() in pre Python 2.6 versions
I found this answer, which is more in line with the overall problem in my question:
kill a function after a certain time in windows
Your code as-is doesn't seem to do what you describe. It seems you want the if/else check inside your while loop. On top of that, you would want to loop over the IP addresses and not over a time period as your code is currently written (otherwise you will keep requesting the same IP address every time). Instead of keeping track of time yourself, I would suggest reading up on urllib.request.urlopen, specifically the timeout parameter. Once set, that function call will throw a socket.timeout exception once the time limit is reached. Surround it with a try/except block catching that error and then handle it appropriately.
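A minimal sketch of that approach (the URLs and the log path are placeholders; this uses the Python 3 urllib.request API referred to above):

import socket
import urllib.error
import urllib.request

urls = ["http://host-1/status", "http://host-2/status"]  # placeholder addresses
fail_log_path = "C:/Temp/fail_log.txt"

for url in urls:
    try:
        # Give up on this host after 300 seconds instead of waiting forever.
        with urllib.request.urlopen(url, timeout=300) as response:
            data = response.read()
    except (socket.timeout, urllib.error.URLError) as e:
        with open(fail_log_path, "a") as fail_log:
            fail_log.write("Failed to pull from site location: %s (%s)\n" % (url, e))
        continue
    # ... process `data` for the hosts that did respond ...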
I am in the process of upgrading an older legacy system that uses BizTalk, MSMQs, Java, and Python.
Currently, I am trying to upgrade a particular piece of the project which, when complete, will allow me to begin an in-place replacement of many of the legacy systems.
What I have done so far is recreate the legacy system in a newer version of BizTalk (2010) and on a machine that isn't on its last legs.
Anyway, the problem I am having is that there is a piece of Python code that picks up a message from an MSMQ queue and places it on another server. This code has been in place on our legacy system since 2004, has worked since then, and, as far as I know, has never been changed.
Now that I have rebuilt this, I started getting errors on the remote server and, after checking a few things and eliminating many possible problems, I have established that the error occurs somewhere around the time the Python code picks the message up from the MSMQ queue.
The error can be reproduced using just two messages. Please note that I am using sample XMLs here, as the actual ones are pretty long.
Message one:
<xml>
<field1>Text 1</field1>
<field2>Text 2</field2>
</xml>
Message two:
<xml>
<field1>Text 1</field1>
</xml>
Now if I submit message one followed by message two to the MSMQ, they both appear correctly on the queue. If I then call the Python script, message one is returned correctly but message two gains extra characters.
Post-Python message two:
<xml>
<field1>Text 1</field1>
</xml>1>Te
I thought at first that there might be scoping problems within the Python code, but I have gone through that as well as I can and found none. However, I must admit that this project is the first time I've looked seriously at Python code.
The Python code first peeks at a message and then receives it. I have been able to see the message when the script peeks and it has the same error message as when it receives.
Also, this error only shows up when going from a longer message to a shorter message.
I would welcome any suggestions of things that might be wrong, or things I could do to identify the problem.
I have googled and searched and gone a little crazy. This is holding up an entire project, as we can't begin replacing the older systems until this piece is in place to act as the new bridge.
Thanks for taking the time to read through my problem.
Edit: Here's the relevant Python code:
import sys
import pythoncom
from win32com.client import gencache

msmq = gencache.EnsureModule('{D7D6E071-DCCD-11D0-AA4B-0060970DEBAE}', 0, 1, 0)


def Peek(queue):
    qi = msmq.MSMQQueueInfo()
    qi.PathName = queue
    myq = qi.Open(msmq.constants.MQ_PEEK_ACCESS, 0)
    if myq.IsOpen:
        # Don't lose this pythoncom.Empty thing (it took a while)
        tmp = myq.Peek(pythoncom.Empty, pythoncom.Empty, 1)
    myq.Close()
    return tmp
Another function calls this piece of code. I don't have access to that calling code until Monday, but the call is basically:
msg= MSMQ.peek()
2nd Edit:
I am attaching the first half of the script. It basically loops around:
import base64, xmlrpclib, time
import MSMQ, Config, Logger
import XmlRpcExt, os, whrandom

QueueDetails = Config.InQueueDetails
sleeptime = Config.SleepTime
XMLRPCServer = Config.XMLRPCServer
usingBase64 = Config.base64ing
version = Config.version
verbose = Config.verbose
LogO = Logger.Logger()


def MSMQToIAMS():
    # moved svr cons out of daemon loop
    LogO.LogP(version)
    svr = xmlrpclib.Server(XMLRPCServer, XmlRpcExt.getXmlRpcTransport())
    while 1:
        GotOne = 0
        for qd in QueueDetails:
            queue, agency, messagetype = qd
            #LogO.LogD('['+version+"] Searching queue %s for messages"%queue)
            try:
                msg = MSMQ.Peek(queue)
            except Exception, e:
                LogO.LogE("Peeking at \"%s\" : %s" % (queue, e))
                continue
            if msg:
                try:
                    msg = msg.__call__().encode('utf-8')
                except:
                    LogO.LogE("Could not convert massege on \"%s\" to a string, leaving it on queue" % queue)
                    continue
                if verbose:
                    print "++++++++++++++++++++++++++++++++++++++++"
                    print msg
                    print "++++++++++++++++++++++++++++++++++++++++"
                LogO.LogP("Found Message on \"%s\" : \"%s...\"" % (queue, msg[:40]))
                try:
                    rv = svr.accept(msg, agency, messagetype)
                    if rv[0] != "OK":
                        raise Exception, rv[0]
                    LogO.LogP('Message has been sent successfully to IAMS from %s' % queue)
                    MSMQ.Receive(queue)
                    GotOne = 1
                    StoreMsg(msg)
                except Exception, e:
                    LogO.LogE("%s" % e)
        if GotOne == 0:
            time.sleep(sleeptime)
        else:
            gotOne = 0
This is the full code that calls MSMQ. It creates a little program that watches MSMQ and, when a message arrives, picks it up and sends it off to another server.
This sounds really Python-specific (of which I know nothing) rather than MSMQ-specific. Isn't this just a case of a memory variable being used twice without being cleared in between? The second message is shorter than the first, so there are characters from the first that aren't being overwritten. What do the relevant parts of the Python code look like?
Edit (21st April):
The code just shows you populating the tmp variable with a message. What happens to tmp before the next message is accessed? I'm assuming it is not cleared.
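If it helps to test that hypothesis, here is a rough, purely diagnostic sketch that reuses the Peek function from the question: it logs exactly what comes back from each peek, so you can see whether the stray characters are already present in what the COM layer returns or are introduced later.

def PeekDebug(queue):
    # Same as Peek() above, but prints exactly what the COM layer returned.
    msg = Peek(queue)
    if msg is not None:
        body = msg.__call__()
        print "Peeked %d chars from %s: %r" % (len(body), queue, body)
    return msg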
Normally Fabric quits as soon as a run() call returns a non-zero exit code. For some calls, however, this is expected. For example, PNGOut returns an error code of 2 when it is unable to compress a file.
Currently I can only circumvent this limitation by using shell logic (do_something_that_fails || true or do_something_that_fails || do_something_else), but I'd rather keep my logic in plain Python (as is the Fabric promise).
Is there a way to check for an error code and react to it, rather than having Fabric panic and die? I still want the default behaviour for other calls, so changing the behaviour by modifying the environment doesn't seem like a good option (and as far as I recall, you can only use that to tell it to warn instead of dying anyway).
You can prevent aborting on non-zero exit codes by using the settings context manager and the warn_only setting:
from fabric.api import settings

with settings(warn_only=True):
    result = run('pngout old.png new.png')
    if result.return_code == 0:
        do something
    elif result.return_code == 2:
        do something else
    else:  # print error to user
        print result
        raise SystemExit()
Update: My answer is outdated. See comments below.
Yes, you can. Just change the environment's abort_exception. For example:
from fabric.api import settings

class FabricException(Exception):
    pass

with settings(abort_exception = FabricException):
    try:
        run(<something that might fail>)
    except FabricException:
        <handle the exception>
The documentation on abort_exception is here.
Apparently messing with the environment is the answer.
fabric.api.settings can be used as a context manager (with the with statement) to apply it to individual statements. The return value of run(), local() and sudo() calls isn't just the output of the shell command; it also has special properties (return_code and failed) that allow reacting to errors.
I guess I was looking for something closer to the behaviour of subprocess.Popen or Python's usual exception handling.
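For example, a minimal sketch of that combination, using the PNGOut call from the question (Fabric 1.x assumed):

from fabric.api import settings, run

with settings(warn_only=True):
    result = run("pngout old.png new.png")

if result.return_code == 2:
    # PNGOut's "could not compress further" case from the question: expected, keep going.
    pass
elif result.failed:
    print("pngout failed with exit code %s" % result.return_code)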
Try this:
from fabric.api import run, env

env.warn_only = True  # if you want to ignore exceptions and handle them yourself

command = "your command"
x = run(command, capture=True)  # run or local or sudo
if x.stderr != "":
    error = "On %s: %s" % (command, x.stderr)
    print error
    print x.return_code  # which may be 1 or 2
    # do what you want or
    raise Exception(error)  # optional
else:
    print "the output of %s is: %s" % (command, x)
    print x.return_code  # which is 0
Using: Django with Python
Overall objective: call a function that processes a video conversion (internally it makes a curl request to the media server) and immediately return to the user.
Using a message queue would be overkill for the app.
So I decided to use threads. I have written a class that overrides the __init__ and run methods and calls the curl command:
import logging
import subprocess
from threading import Thread

from django.conf import settings


class process_video(Thread):
    def __init__(self, video_id, video_title, fileURI):
        Thread.__init__(self)
        self.video_id = video_id
        self.video_title = video_title
        self.fileURI = fileURI
        self.status = -1

    def run(self):
        logging.debug("FileURi" + self.fileURI)
        curlCmd = "curl --data-urlencode \"fileURI=%s\" %s/finalize" % (self.fileURI, settings.MEDIA_ROOT)
        logging.debug("Command to be executed" + str(curlCmd))
        #p = subprocess.call(str(curlCmd), shell=True)
        output_media_server, error = subprocess.Popen(curlCmd, stdout=subprocess.PIPE).communicate()
        logging.debug("value returned from media server:")
        logging.debug(output_media_server)
And I instantiate this class from another function called createVideo,
which calls it like this: success = process_video(video_id, video_title, fileURI)
Problem:
The user gets redirected back to the other view from createVideo and process_video gets called; however, for some reason the created thread (process_video) doesn't wait for the output from the media server.
I wouldn't rely on threads being executed correctly within web applications. Depending on the web server's MPM, the process that executes the request might get killed after the request is done (I guess).
I'd recommend making the media server request synchronously, but letting the media server return immediately after it has started the encoding without problems (if you have control over its source code). Then a background process (or cron job) could poll for the result regularly. This is only one solution; you should provide more information about your infrastructure (e.g. do you control the media server?).
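As a very rough sketch of the polling idea (the status URL and the record of pending videos are purely hypothetical placeholders, since the real media server API isn't shown here):

import urllib.request

# Hypothetical record of videos whose encoding was kicked off earlier.
pending_videos = [{"video_id": 42, "status_url": "http://media-server/status/42"}]


def poll_encoding_results():
    # Run this from cron (or a management command) every few minutes.
    for video in pending_videos:
        with urllib.request.urlopen(video["status_url"], timeout=10) as response:
            status = response.read().decode("utf-8").strip()
        if status == "done":
            # Mark the video as ready here, e.g. update the corresponding model.
            print("video %s finished encoding" % video["video_id"])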
Also check the duplicates in another question's comments for some answers about using task queues in such a scenario.
BTW I assume that no exception occurs in the background thread?!
Here is what I did to get around the issue I was facing.
I used django-piston to create an API for calling process_video with the parameters passed as GET; I was getting a 403 CSRF error when I tried to send the parameters as POST.
And from the createVideo function I was calling the API like this:
cmd = "curl \"%s/api/process_video/?video_id=%s&fileURI=%s&video_title=%s\" > /dev/null 2>&1 &" %(settings.SITE_URL, str(video_id),urllib.quote(fileURI),urllib.quote(video_title))
and this worked.
I feel it would have been helpful if I could have got the session_id and POST parameters to work. I'm not sure how I could get that CSRF thing to work.