I'm getting a ConnectionResetError(104, 'Connection reset by peer'), and it's not really in my control. From other posts about this on SO, I've seen people add sleeps and it works for them. Here's my code:
import requests
from time import sleep

for i, id in enumerate(id_list):
    base_endpoint = f"https://endpoint.io/v1/resource/{id}/"
    print("i:", i)
    if i % 100 == 0:
        print("sleeping")
        sleep(10)  # told it to sleep every 100 calls
    with requests.Session() as session:
        session.auth = (key, '')
        sleep(1)  # even added this
        r = session.get(base_endpoint)
This is a toy example, and I know I can add better exception handling, but the point is: is there a better way to get around this stingy API? This is a SaaS product that we pay for; the API isn't meant to be used this way, but going to the devs is a several-week-long haul even to get a meeting.
Is there a different way to do this beyond just increasing the sleep time until it works?
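One alternative, as a sketch (reusing the key and endpoint from the question; the retry settings are illustrative): keep a single Session for all calls so connections are pooled, and mount urllib3's Retry so transient connection problems and retryable status codes are backed off automatically instead of hand-tuned sleeps.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
session.auth = (key, '')

# retry connection problems and retryable status codes with exponential backoff
retries = Retry(total=5, backoff_factor=1,
                status_forcelist=[429, 500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

for i, id in enumerate(id_list):
    r = session.get(f"https://endpoint.io/v1/resource/{id}/")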
Currently I'm making a Python bot for WhatsApp manually, without official APIs or anything like that, because I am clueless. I'm using Selenium to take in messages and auto-reply. I'm noticing that every few messages, one message doesn't get picked up because the loop runs too slowly, and my computer is already pretty fast. Here's the code:
def incoming_msges():
    msges = driver.find_elements_by_class_name("message-in")
    msgq = []
    tq = []
    try:
        for msg in msges:
            txt_msg = msg.find_elements_by_class_name("copyable-text")
            time = msg.find_elements_by_class_name("_18lLQ")
            for t in time:
                tq.append(t.text.lower())
            for txt in txt_msg:
                msgq.append(txt.text.lower())
        msgq = msgq[-1]
        tq = tq[-1]
        if len(msgq) > 0:
            return (msgq, tq)
    except StaleElementReferenceException:
        pass
    return False
Previously, I didn't have the time check, and only the message text was saved; with this loop running continuously, even if the other party sent the same thing again, the code would not recognise it as a new message because it thought it was the same one as before. So now the problem is that my code is super time-consuming and I have no idea how to speed it up. I tried doing this:
def incoming_msges():
    msges = browser.find_elements_by_class_name("message-in")
    try:
        msg = msges[-1]
        txt_msg = msg.find_element_by_xpath("/span[#class=\"copyable-text\"]").text.lower()
        time = msg.find_element_by_xpath("/span[#class=\"_18lLQ\"]").text.lower()
        return (txt_msg, time)
    except Exception:
        pass
    return False
However, like this, the code just doesn't find any messages. I have gotten the elements' types and classes right according to the WhatsApp Web page, but it just doesn't work. What's the correct way of rewriting my first code block so that it stays correct but runs faster? Thanks in advance.
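For what it's worth, one possible reason the second snippet finds nothing is that the XPath starts at the document root (/span...) and uses # where XPath expects @. A rough sketch of the same idea with a relative XPath, assuming the class names from the question are exact:

from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException

def incoming_msges():
    msges = browser.find_elements_by_class_name("message-in")
    if not msges:
        return False
    try:
        msg = msges[-1]  # only inspect the newest incoming message
        txt_msg = msg.find_element_by_xpath(".//span[@class='copyable-text']").text.lower()
        time = msg.find_element_by_xpath(".//span[@class='_18lLQ']").text.lower()
        return (txt_msg, time)
    except (NoSuchElementException, StaleElementReferenceException):
        return False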
First things first ...
I definitely recommend using an API ... because what you are trying to do here is reinvent the wheel. An API has the power of telling you when there is a change in status, and you can queue those changes ... so I definitely recommend using the API ... It might be hard at the beginning, but trust me, it's worth it ...
Next, I would recommend using normal variable names. msges, msgq, tq are kind of unreadable, and I still don't get what they are supposed to be after reading the code twice ...
But to your speed problem ... relying on try/except for control flow gets heavy on performance when exceptions are actually raised (the try itself is cheap) ... I would recommend defensive checks where possible (20 if statements might be faster, but then again they might not be) ... Also, I think you are not fully aware of what some of this Python code does (at least from what I can see here):
msgq = msgq[-1]  # you are telling it to take the last element and rebind the list variable to a single string ... to be more specific:
msgq ([1, 2, 3, 4]) = msgq[-1] (4) will result in -> msgq = 4 (which, in my opinion, hits your performance as well)
tq = tq[-1]  # same here
This would be better :)
if len(msgq[-1]) > 0:
    return (msgq[-1], tq[-1])
If I understand your code correctly, you are trying to scrape the messages. But if, as you say, you want to make an auto-reply bot, I would recommend you either get ready for some JS magic or switch tools. I personally noticed that Selenium has a problem with dynamic content ... to be more specific, once it's at the end of the page it does not scrape it again ... so if you do not want to auto-refresh every 5-10 seconds to get the latest HTML, I recommend either creating this bot in JS (triggered every time an element changes) or using the API and keeping Selenium just for the responses. I was told that Selenium was created to simulate a common user, to check that the user interface works as it should (that buttons exist, that the website contains what it should, etc.) ... I think Selenium is, for this job, something like a small flower sponge for cleaning a car ... you can do it ... but it's going to cost you a lot of time and you might miss some spots (like you missed those messages) ...
Lastly ... working with strings is in general really costly. You are doing O(n^2) operations inside a try block ... which I can imagine can be really costly ... if it's possible, I would reduce the number of inner for loops.
I wish you good luck with this project and hope you find the answer you seek; I also hope my answer was at least a little helpful.
I have a program like this:
import time
import requests
import matplotlib.pyplot as plt

z = []  # collected last traded prices
for i in range(25200):
    time.sleep(1)
    with requests.Session() as s:
        data = {'ContractCode': 'SAFMO98'}
        r = s.post('http://cdn.ime.co.ir/Services/Fut_Live_Loc_Service.asmx/GetContractInfo', json=data).json()
        for key, value in r.items():
            plt.clf()
            last_prices = (r[key]['LastTradedPrice'])
            z.append(last_prices)
            plt.figure(1)
            plt.plot(z)
Sometimes the server rejects the connection and returns an "Exceeds request" message, or sometimes I lose my connection, etc.
Then I have to re-run my program, and I lose my plotted graph, as well as the time my program was disconnected and the data lost during that period. So what I'd like to do is add something to my program to keep it going through interruptions/disconnections. I mean, the program wouldn't stop when it loses the connection or is rejected by the server, and would resume its work once connected again.
How is this possible?
EDIT: I edited my code as follows, but I don't know how good this approach is:
try:
    for i in range(25200):
        time.sleep(1)
        with requests.Session() as s:
            data = {'ContractCode': 'SAFMO98'}
            r = s.post('http://cdn.ime.co.ir/Services/Fut_Live_Loc_Service.asmx/GetContractInfo', json=data).json()
            for key, value in r.items():
                plt.clf()
                last_prices = (r[key]['LastTradedPrice'])
                z.append(last_prices)
                plt.figure(1)
                plt.plot(z)
except:
    pass
You have at least two connection failure events here, and either might result in an inability to connect for undefined amounts of time. A good option here is exponential backoff.
Basically, you attempt an operation, detect failures you know will require retrying, and wait. Each subsequent time the operation fails (in this case, presumably throwing an exception), you wait a multiple of the previous wait time. The idea is that, if you're being rate limited, you'll wait longer and longer until the API you're connecting to stops rejecting your requests. Also, if you've been physically disconnected, you'll attempt fewer connections over time, rather than spamming requests at a dead adapter.
There's a Python library, backoff, that handles most of the work involved in this for you with a decorator.
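For instance, a minimal sketch using that decorator on the request from the question (get_contract_info is a made-up name, and the retry limits are illustrative):

import backoff
import requests

URL = 'http://cdn.ime.co.ir/Services/Fut_Live_Loc_Service.asmx/GetContractInfo'

@backoff.on_exception(backoff.expo,                        # exponential backoff: 1s, 2s, 4s, ...
                      requests.exceptions.RequestException,
                      max_tries=8,                         # give up after 8 attempts
                      max_time=300)                        # or after 5 minutes total
def get_contract_info(session, contract_code):
    # any RequestException raised here triggers another attempt with a longer wait
    return session.post(URL, json={'ContractCode': contract_code}).json()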
I'm writing a Twitter application with tweepy that crawls up the tweets by looking at in_reply_to_status_ID.
Everything works fine until I hit the rate limit; after a few minutes, I have to wait another 15 minutes or so.
This is strange because I used nearly identical code until a few months ago before API 1.0 got deprecated, and it didn't have the rate limit problem.
Is there a known way I can get rid of, or at least increase the rate limit?
Or is there a workaround?
It seems like a lot of people are having trouble with this, but I can't find a definite solution.
I will greatly appreciate it if you could help.
auth1 = tweepy.auth.OAuthHandler('consumer_token', 'consumer_secret')
auth1.set_access_token('access_token', 'access_secret')
api = tweepy.API(auth1)

def hasParent(s):
    # return True if s is not None, i.e., s is an in_reply_to_status_id number
    ....

while hasParent(ps):
    try:
        parent = api.get_status(ps)
    except tweepy.error.TweepError:
        print 'tweeperror'
        break
    newparent = parent.in_reply_to_status_id
    ......
    ps = newparent
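For reference, a minimal sketch of letting tweepy wait out the rate-limit window automatically (assuming a tweepy 3.x release, where these constructor flags exist; it reuses auth1 from above):

api = tweepy.API(auth1,
                 wait_on_rate_limit=True,         # sleep until the window resets instead of raising
                 wait_on_rate_limit_notify=True)  # print a notice while waiting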
I put a limit on the number of items and it worked:
def index(request):
    statuses = tweepy.Cursor(api.user_timeline).items(10)
    return TemplateResponse(request, 'index.html', {'statuses': statuses})
This is because you have reached the max limit. Just disconnect your internet connection and reconnect again; no need to wait.
Use cursor:
statuses = tweepy.Cursor(api.user_timeline).items(2)
If you get the error again, just reduce the number of items.
I have scripts in both Python and Ruby that run for days at a time and rely on the internet to go to certain domains and collect data. Is there a way to implement a network connectivity check into my script so that I could pause/retry iterations of a loop if there is no connectivity and only restart when there is connectivity?
There may be a more elegant solution, but I'd do this:
require 'open-uri'
def internet_connectivity?
  open('http://google.com')
  true
rescue => ex
  false
end
Well, in Python I do something similar with a try/except block, like the following:
import requests
try:
    response = requests.get(URL)
except Exception as e:
    print "Something went wrong:"
    print e
This is just a sample of what you could do; you can check the error code or other information on the exception and decide what to do accordingly. I usually put the script to sleep for 10 minutes when something goes wrong with the request.
import time
time.sleep(600)
Here's a Unix-specific solution:
In [18]: import subprocess
In [19]: subprocess.call(['/bin/ping', '-c1', 'blahblahblah.com'])
Out[19]: 1
In [20]: subprocess.call(['/bin/ping', '-c1', 'google.com'])
Out[20]: 0
i.e., ping will return 0 if the ping is successful.
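Building on that, a rough sketch of how you might pause a loop until connectivity returns (Unix-specific like the example above; the host and delay are arbitrary choices):

import subprocess
import time

def wait_for_network(host='google.com', delay=30):
    # block until a single ping to `host` succeeds, checking every `delay` seconds
    while subprocess.call(['/bin/ping', '-c1', host]) != 0:
        time.sleep(delay)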
Inline way of doing it:
require 'open-uri'
def internet_access?; begin open('http://google.com'); true; rescue => e; false; end; end
puts internet_access?
In Python you can do something like this:
import time
import requests

def get_with_retry(url, tries=5, wait=1, backoff=2, ceil=60):
    while True:
        try:
            return requests.get(url)
        except requests.exceptions.ConnectionError:
            tries -= 1
            if not tries:
                raise
            time.sleep(wait)
            wait = min(ceil, wait * backoff)
This tries each request up to tries times, initially delaying wait seconds between attempts, but increasing the delay by a factor of backoff for each attempt up to a maximum of ceil seconds. (The default values mean it will wait 1 second, then 2, then 4, then 8, then fail.) By setting these values, you can set the maximum amount of time you want to wait for the network to come back, before your main program has to worry about it. For infinite retries, use a negative value for tries since that'll never reach 0 by subtracting 1.
At some point you want the program to tell you if it can't get on the network, and you can do that by wrapping the whole program in a try/except that notifies you in some way if ConnectionError occurs.
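For example, a hypothetical main loop using the function above (urls_to_poll, process, and notify are placeholders, not anything from the question):

try:
    for url in urls_to_poll:
        response = get_with_retry(url, tries=10, wait=1, backoff=2, ceil=120)
        process(response)
except requests.exceptions.ConnectionError:
    notify("network still unreachable after all retries")
    raise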
I have a list of IDs, about 50k per day,
and I have to make 50k requests per day to the server (the server is in the same city), fetch the information, and store it in a database ... I've done that using a loop and threads,
and I've noticed that after some unknown period of time it stops fetching and storing ...
Take a look at my code fragment:
import re, urllib, urllib2
import mysql.connector as sql
import threading
from time import sleep
import idvalid

conn = sql.connect(user="example", password="example", host="127.0.0.1", database="students", collation="cp1256_general_ci")
cmds = conn.cursor()
ids = []  # here is going to be stored the ID's

def fetch():
    while len(ids) > 0:  # it will loop until the list of ID's is finished
        try:
            idnumber = ids.pop()
            content = urllib2.urlopen("http://www.example.com/fetch.php?id=" + idnumber, timeout=120).read()
            if content.find('<font color="red">') != -1:
                pass
            else:
                name = content[-20:]
                cmds.execute("INSERT INTO `students`.`basic` (`id` ,`name`) VALUES ('%s', '%s');" % (idnumber, name))
        except Exception, r:
            print r, "==>", idnumber
            sleep(0.5)  # i think sleep will help in threading? i'm not sure
            pass
        print len(ids)  # print how many ID's are left

for i in range(0, 50):  # i've set 50 threads
    threading.Thread(target=fetch).start()
Output: it keeps printing how many IDs are left, and at some unknown moment it stops printing, fetching, and storing.
Both networking and threading are non-trivial... most probably the cause is a networking event that results in a hanging thread. I'd be interested to hear whether people have solutions for this, because I have suffered the same problem of threads that stop responding.
But there are some things I would definitely change in your code:
I would never catch "Exception". Just catch those exceptions that you know how to deal with. If a network error occurs in one of your threads, you could retry rather than giving up on the id.
There is a race condition in your code: you first check whether there is remaining content, and then you take it out. By that second point in time, the remaining work may have disappeared, resulting in an exception. If you find this difficult to fix, there is a brilliant Python object meant for passing objects between threads without race conditions or deadlocks: the Queue object. Check it out (there's a sketch after this answer).
The "sleep(0.5)" is not helping threading in general. It should not be necessary. It may reduce the chance of hitting race conditions, but it is better to program race conditions totally out. On the other hand, having 50 threads at full spead banging the web server may not be a very friendly thing to do. Make sure to stay within the limits of what the service can offer.