How to design a failsafe upload mechanism? - python

In my Python application, I do a lot of data processing that ends up generating a lot of small files, sometimes more than 20,000 per job.
Later in my processing flow, I upload all these files to S3 storage. The problem is that sometimes, for some reason, not all files reach the S3 storage, which I don't understand because I explicitly check whether each file is there:
count_lock = threading.Lock()
obj_count = 0

def __upload(object_path_pair):
    global obj_count
    sleep_time = 5
    num_retries = 10
    for x in range(0, num_retries):
        try:
            sleep(random.uniform(1, 5))
            with count_lock:
                libera_resource_status = libera_resource.Object(object_path_pair[1]).get()['ResponseMetadata'].get('HTTPStatusCode')
                if libera_resource_status == 200 and obj_count > 0:
                    print(f'Item: {file_name} - HLS segment {obj_count} / {len(segment_upload_list)} uploaded successfully.')
                elif libera_resource_status != 200:
                    print(f'Item: {file_name} - HLS segment {obj_count} / {len(segment_upload_list)} uploaded failed, will be tried again.')
                obj_count += 1
            upload_error = None
        except Exception as upload_error:
            pass
        if upload_error or libera_resource_status != 200:
            sleep(sleep_time) # wait before trying to fetch the data again
            sleep_time *= 2

def upload_segments(segment_upload_list):
    global obj_count
    obj_count = 0
    with ThreadPoolExecutor(max_workers=100) as executor:
        executor.map(__upload, segment_upload_list)
Here, libera_resource is basically boto3.resource. Can somebody tell me where and why I might sometimes miss a file?
Thanks in advance

This code probably isn't doing what you expect when an exception is encountered:
# (stuff)
upload_error = None
except Exception as upload_error:
if upload_error or libera_resource_status != 200:
# more stuff
If an exception is raised, it is bound to upload_error for the duration of the except clause, but upload_error is then deleted on exit from the except clause. See PEP 3110 and this Reddit discussion.
So if you get an exception, the subsequent if statement itself raises (because upload_error is now unbound) and you've crashed out of your __upload function without retrying.
This won't cause the other threads in your pool to fail, so it's easy to miss if you're not checking for it.
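To make the scoping rule concrete, here is a minimal sketch, assuming Python 3. The names scoping_demo and retry are made up for illustration, and action stands in for whatever performs one upload attempt. It shows the name disappearing after the except clause, and one way to structure a retry loop that copies the exception to a name that survives.

```python
import time


def scoping_demo():
    """Show that the name bound by `except ... as` is gone after the clause."""
    try:
        raise ValueError("boom")
    except ValueError as err:
        pass
    try:
        return repr(err)  # err was deleted when the except clause ended
    except NameError as missing:  # UnboundLocalError is a subclass of NameError
        return type(missing).__name__


def retry(action, num_retries=10, first_delay=5, sleep=time.sleep):
    """Call action() until it succeeds, backing off exponentially in between."""
    delay = first_delay
    last_error = None
    for attempt in range(num_retries):
        try:
            return action()
        except Exception as exc:
            last_error = exc  # copy to a name that outlives the except clause
        if attempt < num_retries - 1:
            sleep(delay)
            delay *= 2
    raise last_error  # every attempt failed: surface the last error
```

Because retry re-raises after the last attempt, a failed upload shows up when you iterate over the results of executor.map (or call result() on a future) instead of vanishing inside the worker thread.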


pytest: TypeError: int() can't convert non-string with explicit base

def _get_trace(self) -> None:
"""Retrieves the stack trace via debug_traceTransaction and finds the
return value, revert message and event logs in the trace.
# check if trace has already been retrieved, or the tx warrants it
if self._raw_trace is not None:
self._raw_trace = []
if self.input == "0x" and self.gas_used == 21000:
self._modified_state = False
self._trace = []
if not web3.supports_traces:
raise RPCRequestError("Node client does not support `debug_traceTransaction`")
trace = web3.provider.make_request( # type: ignore
"debug_traceTransaction", (self.txid, {"disableStorage": CONFIG.mode != "console"})
except (requests.exceptions.Timeout, requests.exceptions.ConnectionError) as e:
msg = f"Encountered a {type(e).__name__} while requesting "
msg += "`debug_traceTransaction`. The local RPC client has likely crashed."
if CONFIG.argv["coverage"]:
msg += " If the error persists, add the `skip_coverage` marker to this test."
raise RPCRequestError(msg) from None
if "error" in trace:
self._modified_state = None
self._trace_exc = RPCRequestError(trace["error"]["message"])
raise self._trace_exc
self._raw_trace = trace = trace["result"]["structLogs"]
if not trace:
self._modified_state = False
# different nodes return slightly different formats. its really fun to handle
# geth/nethermind returns unprefixed and with 0-padding for stack and memory
# erigon returns 0x-prefixed and without padding (but their memory values are like geth)
fix_stack = False
for step in trace:
if not step["stack"]:
check = step["stack"][0]
if not isinstance(check, str):
if check.startswith("0x"):
fix_stack = True
> c:\users\xxxx\appdata\local\programs\python\python310\lib\site-packages\brownie\network\
-> step["pc"] = int(step["pc"], 16)
I am doing Patrick's Solidity course and ran into this error. I ended up copying and pasting his code:
def test_only_owner_can_withdraw():
    if network.show_active() not in LOCAL_BLOCKCHAIN_ENVIRONMENTS:
        pytest.skip("only for local testing")
    fund_me = deploy_fund_me()
    bad_actor = accounts.add()
    with pytest.raises(exceptions.VirtualMachineError):
        fund_me.withdraw({"from": bad_actor})
Pytest worked for my other tests, but when I tried to run this one it wouldn't work.
OK, after looking at my scripts and contracts I found the issue. There was a problem in my .sol contract: instead of returning a variable, it was returning the error message from my retrieve function in the contract. It's fixed and working now.
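For anyone wondering what the Python-level message in the title means on its own: int() only accepts an explicit base when it is given a string, so the failing line int(step["pc"], 16) implies the value was not a string at that point. A minimal sketch of just that behaviour follows. The helper to_int is made up for illustration, is not part of brownie, and says nothing about the contract problem described above.

```python
def to_int(value):
    """Accept an int or a hex string (with or without the 0x prefix)."""
    if isinstance(value, int):
        return value
    return int(value, 16)


# Passing a non-string together with a base reproduces the error in the title.
try:
    int(31, 16)
except TypeError as exc:
    message = str(exc)
```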

What's the fastest way to expand url in python

I have a checkin list that contains about 600,000 checkins, each with a URL in it, and I need to expand these back to the original long URLs. I do so with:
now = time.time()
files_without_url = 0
for i, checkin in enumerate(NYC_checkins):
    try:
        foursquare_url = urllib2.urlopen(re.search("(?P<url>https?://[^\s]+)", checkin[5]).group("url")).url
    except:
        files_without_url += 1
    if i%1000 == 0:
        print("from %d to %d: %2.5f seconds" %(i-1000, i, time.time()-now))
        now = time.time()
But this takes far too long: from 0 to 1000 checkins, it takes 3241 seconds! Is this normal? What's the most efficient way to expand URLs in Python?
MODIFIED: Some URLs are from Bitly while others are not, and I am not sure where they come from. In this case, I simply want to use the urllib2 module.
for your information, here is an example of checkin[5]:
I'm at The Diner (2453 18th Street NW, Columbia Rd., Washington) w/ 4 others. http...... (this is the short url)
I thought I would expand on my comment regarding the use of multiprocessing to speed up this task.
Let's start with a simple function that will take a url and resolve it as far as possible (following redirects until it gets a 200 response code):
import requests

def resolve_url(url):
    try:
        r = requests.get(url)
    except requests.exceptions.RequestException:
        return (url, None)

    if r.status_code != 200:
        longurl = None
    else:
        longurl = r.url

    return (url, longurl)
This will either return a (shorturl, longurl) tuple, or it will return (shorturl, None) in the event of a failure.
Now, we create a pool of workers:
import multiprocessing
pool = multiprocessing.Pool(10)
And then ask our pool to resolve a list of urls:
resolved_urls = []
for shorturl, longurl in pool.map(resolve_url, urls):
    resolved_urls.append((shorturl, longurl))
Using the above code...
With a pool of 10 workers, I can resolve 500 URLs in 900 seconds.
If I increase the number of workers to 100, I can resolve 500 URLs in 30 seconds.
If I increase the number of workers to 200, I can resolve 500 URLs in 25 seconds.
This is hopefully enough to get you started.
(NB: you could write a similar solution using the threading module rather than multiprocessing. I usually just grab for multiprocessing first, but in this case either would work, and threading might even be slightly more efficient.)
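A thread-based variant of the same idea could look roughly like this. It is only a sketch: it uses concurrent.futures from the standard library rather than the threading module directly, and resolve_all is a made-up helper name. The resolve argument would be something like resolve_url from above.

```python
from concurrent.futures import ThreadPoolExecutor


def resolve_all(urls, resolve, max_workers=100):
    """Run resolve(url) for every URL on a pool of threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the input URLs.
        return list(executor.map(resolve, urls))
```

Usage would then be resolved_urls = resolve_all(urls, resolve_url). Since the workers spend nearly all their time waiting on the network, threads avoid the cost of starting processes and pickling results.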
Threads are the most appropriate choice for network I/O. But you could try the following first.
import re
import urllib2
from urllib2 import URLError

pat = re.compile(r"(?P<url>https?://[^\s]+)") # always compile it
missing_urls = 0
bad_urls = 0

def check(checkin):
    match = pat.search(checkin[5])
    if not match:
        global missing_urls
        missing_urls += 1
    else:
        url = match.group("url")
        try:
            urllib2.urlopen(url) # don't lookup .url if you don't need it later
        except URLError: # or just Exception
            global bad_urls
            bad_urls += 1

for i, checkin in enumerate(NYC_checkins):
    check(checkin)

print(bad_urls, missing_urls)
If you get no improvement, now that we have a nice check function, create a thread pool and feed it. That should give a clear speedup, since the work is dominated by waiting on the network. Using processes for network I/O is pointless.

Using sys.exit in for loops

I am writing a small Python script that iterates through a large JSON output, grabs the information I need and puts it into small dictionaries. It then iterates through the dictionaries to look for a key called restartcount. If the count is more than 3 but less than 5 it prints a warning. If it is greater than 5 it prints critical. However, this script is meant to be a Nagios plugin, which requires exit codes: sys.exit(1) for warning and sys.exit(2) for critical. If you look at my script, I use my function to grab the info I need into a small dictionary, then run a for loop. If I place a sys.exit inside any if statement, I iterate only through the first dictionary and the rest are not checked. Any help would be appreciated on how to incorporate the exit codes without skipping or missing any information.
import urllib2
import json
import argparse
from sys import exit

def get_content(pod):
    kube = {}
    kube['name'] = pod["metadata"]["name"]
    kube['phase'] = pod["status"]["phase"]
    kube['restartcount'] = pod["status"]["containerStatuses"][0]["restartCount"]
    return kube

if __name__ == '__main__':
    parser = argparse.ArgumentParser( description='Monitor Kubernetes Pods' )
    parser.add_argument('-w', '--warning', type=int, help='levels we should look into',default=3)
    parser.add_argument('-c', '--critical', type=int, help='its gonna explode',default=5)
    parser.add_argument('-p', '--port', type=int, help='port to access api server',default=8080)
    args = parser.parse_args()
    api_call = "http://localhost:{}/api/v1/namespaces/default/pods/".format(args.port)
    try:
        req = urllib2.urlopen(api_call).read()
        content = json.loads(req)
    except urllib2.URLError:
        print 'URL Error. Please re-check the API call'
    for pods in content.get("items"):
        try:
            block = get_content(pods)
            print block
        except KeyError:
            print 'Container Failed'
        if block["restartcount"] >= args.warning and block["restartcount"] < args.critical:
            print "WARNING | {} restart count is {}".format(block["name"], block["restartcount"])
        if block["restartcount"] >= args.critical:
            print "CRITICAL | {} restart count is {}".format(block["name"], block["restartcount"])
what the block variable looks like:
{'phase': u'Running', 'restartcount': 0, 'name': u'pixels-1.0.9-k1v5u'}
Create a variable called something like exit_status. Initialize it to 0, and set it as needed in your code (e.g. where you are currently calling exit). At the end of program execution, call sys.exit(exit_status) (and nowhere else).
Rewriting the last section of your code:
exit_status = 0
for pods in content.get("items"):
    try:
        block = get_content(pods)
        print block
    except KeyError:
        print 'Container Failed'
    if block["restartcount"] >= args.warning and block["restartcount"] < args.critical:
        print "WARNING | {} restart count is {}".format(block["name"], block["restartcount"])
        if exit_status < 1: exit_status = 1
    if block["restartcount"] >= args.critical:
        print "CRITICAL | {} restart count is {}".format(block["name"], block["restartcount"])
        exit_status = 2
exit(exit_status)
The variable approach is correct.
The problem is that as you check further pods you might set it to 1 when it was already 2, so I would suggest adding a condition not to set it to 1 if it is already 2.
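One way to get the "never downgrade from 2 to 1" behaviour without extra conditions is to keep the maximum status seen so far. This is a minimal sketch in Python 3 syntax (the question's script is Python 2). The helper names status_for and worst_status are made up for illustration, and the thresholds mirror the -w and -c defaults from the question.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def status_for(restart_count, warning=3, critical=5):
    """Map one restart count to a Nagios-style status code."""
    if restart_count >= critical:
        return CRITICAL
    if restart_count >= warning:
        return WARNING
    return OK


def worst_status(restart_counts, warning=3, critical=5):
    """Check every count and keep only the most severe status seen."""
    exit_status = OK
    for count in restart_counts:
        exit_status = max(exit_status, status_for(count, warning, critical))
    return exit_status
```

The script would then finish with a single sys.exit(worst_status(counts)), where counts holds the restart count of every pod, so all pods are checked before the process exits.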

Automatic Stack Trace not being given in threads other than the main thread

I have a rather large program that loads some data from an Excel file and populates a form. This can take a long time because of the size of the file, so I have been moving the loading function onto a separate thread. The only problem is that, for some reason, I am not getting an automatic stack trace in the console when an error occurs in this new thread. It just fails silently, which makes debugging a real pain.
I am using PyDev in Eclipse. I wrote the following test case to be sure everything is working correctly.
from PyQt4 import QtCore
class OtherThread(QtCore.QThread):
def __init__(self):
super(OtherThread, self).__init__()
def run(self):
except Exception as e:
print("exception caught in other thread: \n{0}".format(e))
class MainThread():
def __init__(self):
self.otherThread = OtherThread()
def run(self):
except Exception as e:
print("exception caught in main thread: \n{0}".format(e))
def main():
mainThread = MainThread()
if __name__ == '__main__':
When I run this, both exceptions are caught properly, and when I comment out the try block in the thread object it also works just fine: I get my stack trace as expected. I am really at a loss as to what is going on. Is there something I could have done to cause this behavior?
Here is the code of the program I am working on.
def run(self):
    print("excel thread running")
    workbook = xlrd.open_workbook(self.path)
    worksheet = workbook.sheet_by_name('PNA Form')
    currentRow = 13 # start grabbing pna data
    numRowStart = currentRow
    newPartCol= 0
    oldPartCol = 10
    descriptionCol = 2
    numberOfRows = worksheet.nrows - 1
    print("number of rows = {0}".format(numberOfRows))
    PNA = []
    current_color = False
    while (currentRow < numberOfRows):
        print("about to parse excel rows")
        newPartCell = int(worksheet.cell(currentRow,newPartCol).value)
        oldPartCell = int(worksheet.cell(currentRow,oldPartCol).value)
        descriptionCell = QtCore.QString(worksheet.cell(currentRow,descriptionCol).value)
        print("excel rows parsed: {0}, {1}, {2}, {3}".format(oldPartCell,newPartCell,descriptionCell,current_color))
        print("running line excel row {0}: {1}".format(currentRow, str(descriptionCell)))
        if not self.isStrikethrough(currentRow,0): #make sure the line does not have strike through
            #self.guiHandel.BOMVal.addPNARow(oldPN = oldPartCell, newPN = newPartCell, disc = descriptionCell)
            print("about to emit pna row tracker for {0}".format(descriptionCell))
            print("thread still running after pna row tracker emit")
        if (oldPartCell != "" and not self.isStrikethrough(currentRow,0)):
            current_color = not current_color
            #self.guiHandel.pnaVerticalLayoutScroll.addWidget(PNACell(oldPartCell,newPartCell,descriptionCell,color = current_color))
            print("about to emit addPNARow: {0}, {1}, {2}, {3}".format(oldPartCell,newPartCell,descriptionCell,current_color))
            #self.guiHandel.widgetStack.append(PNACell(oldPartCell,newPartCell,descriptionCell,color = current_color))
            print("thread still running after add pna row emit")
        currentRow += 1
        print("currentRow =",currentRow)
Here is the console output when it fails.
slot add pna row tracker called
running is about to return
about to emit addPNARow: 28458820, 28489881, INST CSTR-ASM,DIESEL,KM,UP,GAT, False
thread still running after add pna row emit
('currentRow =', 29slot add pna row called)
about to parse excel rows
Added addPNARow: 28458820, 28489881, INST CSTR-ASM,DIESEL,KM,UP,GAT, False
excel progress update called ------- progress = 20
When running through a debugger it stops at this line:
newPartCell = int(worksheet.cell(currentRow,newPartCol).value)
I tried wrapping it in a try block but it never got to the exception. The cell it is trying to read is blank.
What is going on here? Any ideas would be greatly appreciated.
Answer to the question found here: error in pyqt qthread not printed
Basically, you need to wrap the entire run method of the QThread object in a try/except block yourself and print any caught exception to stderr manually.
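The wrapping pattern can be sketched as follows. To keep the example runnable without PyQt, it uses threading.Thread as a stand-in for QThread, and the class name LoudThread and its work and stream parameters are made up for illustration. In the real program the same try/except would go around the body of the QThread's run method.

```python
import sys
import threading
import traceback


class LoudThread(threading.Thread):
    """Thread whose work is wrapped so that failures are printed, not lost."""

    def __init__(self, work, stream=None):
        super().__init__()
        self.work = work
        self.stream = stream  # where to print tracebacks; None means stderr
        self.error = None

    def run(self):
        try:
            self.work()
        except Exception as exc:
            self.error = exc  # keep it so the starting thread can inspect it
            traceback.print_exc(file=self.stream or sys.stderr)
```

With this in place, something like int("") on a blank cell value prints a full ValueError traceback from the worker instead of stopping it silently.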

twython search api rate limit: Header information will not be updated

I want to handle the Search-API rate limit of 180 requests / 15 minutes. The first solution I came up with was to check the remaining requests in the header and wait 900 seconds. See the following snippet:
results = search_interface.cursor(search_interface.search, q=k, lang=lang, result_type=result_mode)
while True:
tweet = next(results)
if limit_reached(search_interface):
def limit_reached(search_interface):
remaining_rate = int(search_interface.get_lastfunction_header('X-Rate-Limit-Remaining'))
return remaining_rate <= 2
But it seems that the header information is not reset to 180 after it reaches the two remaining requests.
The second solution I came up with was to handle the twython exception for rate limitation and wait the remaining amount of time:
results = search_interface.cursor(search_interface.search, q=k, lang=lang, result_type=result_mode)
while True:
tweet = next(results)
except TwythonError as inst:
except StopIteration:
def wait_for_reset(search_interface):
reset_timestamp = int(search_interface.get_lastfunction_header('X-Rate-Limit-Reset'))
now_timestamp =
seconds_offset = 10
t = reset_timestamp - now_timestamp + seconds_offset
logging.info('Waiting {0} seconds for Twitter rate limit reset.'.format(t))
But with this solution I receive the message "INFO: Resetting dropped connection:" and the loop does not continue with the last element of the generator. Has anybody faced the same problem?
My suggestion is to just rate-limit yourself (assuming you are constantly hitting the limit ...)
QUERY_PER_SEC = 15*60/180.0 #180 per 15 minutes
#~5 seconds per query
class TwitterBot:
    last_update = 0
    def doQuery(self,*args,**kwargs):
        tdiff = time.time()-self.last_update
        if tdiff < QUERY_PER_SEC:
            time.sleep(QUERY_PER_SEC-tdiff)
        self.last_update = time.time()
        return search_interface.cursor(*args,**kwargs)
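If you want to try the self-throttling idea without touching Twitter at all, here is a standalone sketch. The Throttle class and its clock and sleep parameters are made up for illustration and are not part of twython. The constant is named for what it actually holds, seconds per query.

```python
import time

SECONDS_PER_QUERY = 15 * 60 / 180.0  # 180 queries per 15 minutes


class Throttle:
    """Block in wait() so consecutive calls are at least min_interval apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_call = None  # no call made yet, so the first one never waits

    def wait(self):
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

You would create one throttle = Throttle(SECONDS_PER_QUERY) and call throttle.wait() right before each request. Note that the throttle has to sit in front of every HTTP request the cursor makes, not only in front of creating the cursor.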
