Making my python-requests GET calls faster [duplicate] - python

This question already has answers here:
What is the fastest way to send 100,000 HTTP requests in Python?
(21 answers)
Closed 6 years ago.
I have a Python script with a lot of exception handlers. I'm trying to make around 50,000 requests, and it is very slow right now. I also want the script to keep running through failures, so I added handlers for almost every exception requests has, which mostly have to do with ConnectionError etc.
Is there a way I can make this script much faster than it is now, and more modular?
for i in range(50450000,50500000):
    try:
        try:
            try:
                try:
                    try:
                        try:
                            try:
                                try:
                                    try:
                                        try:
                                            try:
                                                try:
                                                    check_response = 'http://www.barneys.com/product/adidas--22human-race-22-nmd-sneakers-'+str(i)+'.html'
                                                    make_requests = requests.get(check_response,headers=headers).text
                                                    soup = BeautifulSoup(make_requests)
                                                    try:
                                                        main_wrapper = soup.find('h1',attrs={'class':'title'}).text
                                                        print main_wrapper + ' ' + str(i)
                                                    except AttributeError:
                                                        arr.append(check_response)
                                                        with open('working_urls.json','wb') as outfile:
                                                            json.dump(arr,outfile,indent=4)
                                                except requests.exceptions.InvalidURL:
                                                    continue
                                            except requests.exceptions.InvalidSchema:
                                                continue
                                        except requests.exceptions.MissingSchema:
                                            continue
                                    except requests.exceptions.TooManyRedirects:
                                        continue
                                except requests.exceptions.URLRequired:
                                    continue
                            except requests.exceptions.ConnectTimeout:
                                continue
                        except requests.exceptions.Timeout:
                            continue
                    except requests.exceptions.SSLError:
                        continue
                except requests.exceptions.ProxyError:
                    continue
            except requests.exceptions.HTTPError:
                continue
        except requests.exceptions.ReadTimeout:
            continue
    except requests.exceptions.ConnectionError:
        continue

First, please replace all these ugly try/except blocks with a single one, like:
for i in range(50450000,50500000):
    try:
        check_response = 'http://www.barneys.com/product/adidas--22human-race-22-nmd-sneakers-'+str(i)+'.html'
        make_requests = requests.get(check_response,headers=headers).text
        soup = BeautifulSoup(make_requests)
        try:
            main_wrapper = soup.find('h1',attrs={'class':'title'}).text
            print main_wrapper + ' ' + str(i)
        except AttributeError:
            arr.append(check_response)
            with open('working_urls.json','wb') as outfile:
                json.dump(arr,outfile,indent=4)
    except requests.exceptions.InvalidURL:
        continue
    except requests.exceptions.InvalidSchema:
        continue
    except requests.exceptions.MissingSchema:
        continue
...
And if all you do in every case is continue, use the base class RequestException. It becomes:
try:
    check_response = 'http://www.barneys.com/product/adidas--22human-race-22-nmd-sneakers-'+str(i)+'.html'
    make_requests = requests.get(check_response,headers=headers).text
    soup = BeautifulSoup(make_requests)
    try:
        main_wrapper = soup.find('h1',attrs={'class':'title'}).text
        print main_wrapper + ' ' + str(i)
    except AttributeError:
        arr.append(check_response)
        with open('working_urls.json','wb') as outfile:
            json.dump(arr,outfile,indent=4)
except requests.exceptions.RequestException:
    pass
Maybe not faster, but for sure far easier to read!
As for the speed issue, you should consider using threads/processes. Take a look at the threading and multiprocessing modules.
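For instance, a minimal sketch using concurrent.futures (Python 3 here; the headers dict is a placeholder and the worker count is a guess, the URL pattern is yours) that spreads the same checks over a thread pool:

import json
import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor

headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder; substitute your real headers
working_urls = []

def check(i):
    url = 'http://www.barneys.com/product/adidas--22human-race-22-nmd-sneakers-' + str(i) + '.html'
    try:
        html = requests.get(url, headers=headers, timeout=10).text
    except requests.exceptions.RequestException:
        return  # any connection/HTTP problem: skip this id
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.find('h1', attrs={'class': 'title'})
    if title is not None:
        print(title.text + ' ' + str(i))
    else:
        working_urls.append(url)  # list.append is atomic in CPython, so this is thread-safe

with ThreadPoolExecutor(max_workers=20) as pool:
    pool.map(check, range(50450000, 50500000))

# write the file once at the end instead of rewriting it on every hit
with open('working_urls.json', 'w') as outfile:
    json.dump(working_urls, outfile, indent=4)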

Related

try-catch in a while-loop (python)

while var == 1:
    test_url = 'https://testurl.com'
    get_response = requests.get(url=test_url)
    parsed_json = json.loads(get_response.text)
    test = requests.get('https://api.telegram.org/botid/' + 'sendMessage', params=dict(chat_id='0815', text="test"))
    ausgabe = json.loads(test.text)
    print(ausgabe['result']['text'])
    time.sleep(3)
How do I add a try/catch routine to this code? Once every two days I get an error on line 4 at json.loads(), and I can't reproduce it. What I'm trying to do is put the while loop inside a "try:" block, with a catch block that only triggers when an error occurs inside the while loop. Additionally, it would be great if the while loop didn't stop on an error. How could I do this? Thank you very much for your help. (I started programming in Python just a week ago.)
If you just want to catch the error on the fourth line, wrapping that line in a try/except will show you what error happened.
while var == 1:
    test_url = 'https://testurl.com'
    get_response = requests.get(url=test_url)
    try:
        parsed_json = json.loads(get_response.text)
    except Exception as e:
        print(str(e))
        print('error data is {}'.format(get_response.text))
    test = requests.get('https://api.telegram.org/botid/' + 'sendMessage', params=dict(chat_id='0815', text="test"))
    ausgabe = json.loads(test.text)
    print(ausgabe['result']['text'])
    time.sleep(3)
You can simply do:
while var == 1:
    try:
        test_url = 'https://testurl.com'
        get_response = requests.get(url=test_url)
        parsed_json = json.loads(get_response.text)
        test = requests.get('https://api.telegram.org/botid/' + 'sendMessage', params=dict(chat_id='0815', text="test"))
        ausgabe = json.loads(test.text)
        print(ausgabe['result']['text'])
        time.sleep(3)
    except Exception as e:
        print("an exception {} of type {} occurred".format(e, type(e).__name__))

Exception not caught in multiprocessing

I'm using the multiprocessing module to process files in parallel, which works perfectly fine almost every time.
I've also wrapped the code in a try/except block to catch any exception.
I've come across a situation where the except block doesn't catch the exception.
Since the code is huge, I'm only including the relevant block that is causing the problem.
def reader(que, ip, start, end, filename):
    """ Reader function checks each line of the file
    and if the line contains any of the ip addresses which are
    being scanned, then it writes to its buffer.
    If the line field doesn't match date string it skips the line.
    """
    logging.info("Processing : %s" % os.path.basename(filename))
    ip_pat = re.compile("(\d+\.\d+\.\d+\.\d+\:\d+)")
    chunk = 10000000 # taking chunk of 10MB data
    buff = ""
    with bz2.BZ2File(filename,"rb", chunk) as fh: # open the compressed file
        for line in fh:
            output = []
            fields = line.split()
            try:
                ts = fields[1].strip() + "/" +fields[0]+"/"+fields[3].split("-")[0]+" "+fields[2]
                times = da.datetime.strptime(ts,"%d/%b/%Y %H:%M:%S")
                if times < start:
                    continue
                if times > end:
                    break
                ips = re.findall(ip_pat,line)
                if len(ips) < 3:
                    continue
                if ips[0].split(":")[0] == ip:
                    output.append(times.strftime("%d/%m/%Y %H:%M:%S"))
                    status = "SESSION_OPEN" if "SESSION_OPEN" in line or "CREATE" in line else "SESSION_CLOSE"
                    protocol = "TCP" if "TCP" in line else "UDP"
                    output.append(status)
                    output.append(protocol)
                    ips[1], ips[2] = ips[2], ips[1]
                    output.extend(ips)
                    res = "|".join(output)
                    buff += res + "\n"
            except IndexError, ValueError:
                continue
    logging.info("Processed : %s of size [ %d ]" % (os.path.basename(filename), os.path.getsize(filename)))
    if buff:
        que.put((ip,buff))
    return buff
And this is the error I receive:
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in get
raise self._value
ValueError: time data '2/Dec/20 10:59:59' does not match format '%d/%b/%Y %H:%M:%S'
What I don't understand is why the exception is not caught; I've listed ValueError in the except block. What's the best way to get around this problem?
Provide the multiple exceptions as a tuple:
except (IndexError, ValueError):
    continue
The relevant doc is https://docs.python.org/2/tutorial/errors.html#handling-exceptions
Snippet from the page:
Note that the parentheses around this tuple are required, because except ValueError, e: was the syntax used for what is normally written as except ValueError as e: in modern Python (described below). The old syntax is still supported for backwards compatibility. This means except RuntimeError, TypeError is not equivalent to except (RuntimeError, TypeError): but to except RuntimeError as TypeError: which is not what you want.
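A tiny Python 2 sketch of the difference the docs describe (the two helper functions are illustrative only):

# Python 2 only: demonstrates why the parentheses matter
def comma_form():
    try:
        int("oops")                    # raises ValueError
    except IndexError, ValueError:     # parsed as: except IndexError as ValueError
        return "caught"                # never reached; the ValueError propagates

def tuple_form():
    try:
        int("oops")
    except (IndexError, ValueError):   # tuple form: catches either exception type
        return "caught"                # reached as expected

try:
    comma_form()
except ValueError:
    print "comma form let the ValueError escape"   # this prints
print tuple_form()                                 # prints: caught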

Python Exception Handling within a loop

An exception occurs when my program can't find the element it's looking for. I want to log the event in the CSV, display a message that the error occurred, and continue. I have successfully logged the event in the CSV and displayed the message, but then my program jumps out of the loop and stops. How can I instruct Python to continue? Please check out my code.
sites = ['TCF00670','TCF00671','TCF00672','TCF00674','TCF00675','TCF00676','TCF00677']
with open('list4.csv','wb') as f:
    writer = csv.writer(f)
    try:
        for s in sites:
            adrs = "http://turnpikeshoes.com/shop/" + str(s)
            driver = webdriver.PhantomJS()
            driver.get(adrs)
            time.sleep(5)
            LongDsc = driver.find_element_by_class_name("productLongDescription").text
            print "Working.." + str(s)
            writer.writerows([[LongDsc]])
    except:
        writer.writerows(['Error'])
        print ("Error Logged..")
        pass
driver.quit()
print "Complete."
Just put the try/except block inside the loop. And there is no need for that pass statement at the end of the except block.
with open('list4.csv','wb') as f:
    writer = csv.writer(f)
    for s in sites:
        try:
            adrs = "http://turnpikeshoes.com/shop/" + str(s)
            driver = webdriver.PhantomJS()
            driver.get(adrs)
            time.sleep(5)
            LongDsc = driver.find_element_by_class_name("productLongDescription").text
            print "Working.." + str(s)
            writer.writerows([[LongDsc]])
        except:
            writer.writerows(['Error'])
            print ("Error Logged..")
NOTE: It's generally bad practice to use except without a particular exception class; at minimum you should do except Exception:...
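For example, assuming the failure here is Selenium not finding the element (a guess, since the question doesn't name the exception), a sketch of the same loop reusing driver, writer and sites from above:

from selenium.common.exceptions import NoSuchElementException

for s in sites:
    try:
        adrs = "http://turnpikeshoes.com/shop/" + str(s)
        driver.get(adrs)
        time.sleep(5)
        LongDsc = driver.find_element_by_class_name("productLongDescription").text
        writer.writerows([[LongDsc]])
    except NoSuchElementException:
        writer.writerows(['Error'])  # log the miss and continue with the next site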

Python: Handling requests exceptions the right way

I recently switched from urllib2 to requests and I'm not sure how to deal with exceptions. What is best practice? My current code looks like this, but it's not doing any good:
try:
    response = requests.get(url)
except requests.ConnectionError, e:
    logging.error('ConnectionError = ' + str(e.code))
    return False
except requests.HTTPError, e:
    logging.error('HTTPError = ' + str(e.reason))
    return False
except requests.Timeout, e:
    logging.error('Timeout')
    return False
except requests.TooManyRedirects:
    logging.error('TooManyRedirects')
    return False
except Exception:
    import traceback
    logging.error('generic exception: ' + traceback.format_exc())
    return False
Since it looks bad as a comment, have you tried:
try:
    # some code
except Exception as e:
    print e
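For what it's worth, a middle ground between one handler per exception type and a blanket except Exception is to catch the requests base class. A minimal sketch (the fetch wrapper and the timeout value are mine):

import logging
import requests

def fetch(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # turns 4xx/5xx responses into requests.HTTPError
        return response
    except requests.exceptions.RequestException as e:
        # base class of ConnectionError, HTTPError, Timeout, TooManyRedirects, ...
        logging.error('request failed: %s', e)
        return False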

python "local variable referenced before assignment" with hundreds of threads

I am having a problem with a piece of code that is executed inside a thread in python. Everything works fine until I start using more than 100 or 150 threads, then I get the following error in several threads:
resp.read(1)
UnboundLocalError: local variable 'resp' referenced before assignment.
The code is the following:
try:
    resp = self.opener.open(request)
    code = 200
except urllib2.HTTPError as e:
    code = e.code
    #print e.reason,_url
    #sys.stdout.flush()
except urllib2.URLError as e:
    resp = None
    code = None

try:
    if code:
        # ttfb (time to first byte)
        resp.read(1)
        ttfb = time.time() - start
        # ttlb (time to last byte)
        resp.read()
        ttlb = time.time() - start
    else:
        ttfb = 0
        ttlb = 0
except httplib.IncompleteRead:
    pass
As you can see, if "resp" is not assigned due to an exception, the exception should have propagated, and "code" couldn't have been assigned, so execution shouldn't reach "resp.read(1)".
Does anybody have a clue why it is failing? I guess it is related to scopes, but I don't know how to avoid this or how to implement it differently.
Thanks and regards.
Basic python:
If there is an HTTPError during the open call, resp will not be set, but code will be set to e.code in the exception handler.
Then code is tested and resp.read(1) is called.
This has nothing to do with threads directly, but maybe the high number of threads caused the HTTPError.
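A stripped-down illustration of the same trap (the probe wrapper and its names are hypothetical; urllib2 as in the question):

import urllib2

def probe(opener, request):
    try:
        resp = opener.open(request)  # if this raises HTTPError, resp is never bound
        code = 200
    except urllib2.HTTPError as e:
        code = e.code                # code IS bound here (e.g. 404)...
    if code:
        resp.read(1)                 # ...so this line runs and raises
                                     # UnboundLocalError: resp referenced before assignment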
The resp variable is defined and used in different code blocks: one is inside one try/except, the other is in a separate try/except block. Try merging them:
Edited:
ttfb = 0
ttlb = 0
try:
    resp = self.opener.open(request)
    code = 200
    resp.read(1)
    ttfb = time.time() - start
    resp.read()
    ttlb = time.time() - start
except urllib2.HTTPError as e:
    code = e.code
    #print e.reason,_url
    #sys.stdout.flush()
except urllib2.URLError as e:
    pass
except httplib.IncompleteRead:
    pass
