I tried this code. The proxies are good, but when I use them I still see my own IP in the logger. Please help; my 'requests' version is 2.19.1. Some of my code:
for [...]:
    url=''
    proxy={'http':'http://'+get_proxy()} #works
    useragent={'User-Agent':get_useragent()}
    try:
        r=requests.get(url,headers=useragent,proxies=proxy)
        print('sent')
    except:
        print('error')
So I can skip bad proxies, but the good ones don't work: I want to change my IP with them, but I actually see my own IP.
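A likely cause, assuming the elided `url` is an `https://` address: requests chooses the proxy by the URL's scheme, so a mapping that only has an `'http'` key is not used for HTTPS requests and the connection goes out directly from your own IP. The sketch below builds a mapping with both keys. `build_proxies` and `proxy_for` are hypothetical helpers, and `proxy_for` is only a simplified model of the lookup (requests' real selection also considers host-specific keys and proxy environment variables).

```python
from urllib.parse import urlparse


def build_proxies(proxy_hostport):
    """Map both URL schemes to the same (hypothetical) 'host:port' proxy."""
    proxy_url = "http://" + proxy_hostport
    return {"http": proxy_url, "https": proxy_url}


def proxy_for(url, proxies):
    """Simplified model of scheme-based selection: look up the URL's scheme."""
    return proxies.get(urlparse(url).scheme)


http_only = {"http": "http://203.0.113.5:8080"}
print(proxy_for("https://example.com/", http_only))  # None: no proxy for HTTPS
print(proxy_for("https://example.com/", build_proxies("203.0.113.5:8080")))
```

In the loop above, that would mean building the mapping with `build_proxies(get_proxy())` and passing it as `proxies=` to `requests.get`.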
Related
I am trying to write a script that checks whether HTTPS is available for the servers I contacted over HTTP.
My idea was to collect all of the HTTP links and use urllib2 to open a connection to each server over HTTPS, as follows (please ignore any syntax problems; I have simplified the code so the problem itself is easier to understand):
count=0
for packet in trafficPackets:
    if packet["http.host"] is not None:
        if https_supported(packet["ip.dest"]):
            count+=1
where https_supported is the following function:
def https_supported(ip):
    try:
        if len(urlopen("https://"+ip))>0:
            return True
    except:
        return False
    return False
I ran the code on a small traffic file that contains an HTTP connection to a site that supports HTTPS, but the result was unexpected: it always returned zero.
Where did I go wrong? Does anyone have an idea of how I can do it?
Thank you!
Using the exact same code with the http.host field instead of the IP seems to work.
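One plausible explanation for the host-versus-IP difference (an assumption, not confirmed in the thread): recent Python versions verify HTTPS certificates, and a certificate names hostnames, so connecting to a bare IP usually fails verification; many servers also rely on SNI to choose the right certificate. The Python 3 sketch below probes by hostname with the standard `ssl` module instead of `urllib2`. The `port` and `timeout` parameters are illustrative additions.

```python
import socket
import ssl


def https_supported(host, port=443, timeout=5.0):
    """Return True if a verified TLS handshake with host:port succeeds."""
    # The default context checks the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname sends SNI and is the name matched against the cert.
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except (OSError, ssl.CertificateError):
        # Refused or timed-out connections, DNS failures, TLS/certificate errors.
        return False
```

In the packet loop this would be called with `packet["http.host"]` rather than `packet["ip.dest"]`; if the capture's host field carries a `:port` suffix, strip it first.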
In the following code snippet, I know for a fact that https://asdasdasdasd.vm:8080/v2/api-docs does not exist. It fails a DNS lookup. Unfortunately, the get() never seems to return, raise, or timeout. My logs have only "A" in them. I would expect A C D or A B D. But I only ever see A in the logs.
try:
    sys.stderr.write("A")
    resp = requests.get("https://asdasdasdasd.vm:8080/v2/api-docs", timeout=1.0)
    sys.stderr.write("B")
except:
    sys.stderr.write("C")
sys.stderr.write("D")
sys.stderr.flush()
return swag
(That URL is not sanitized for this post. That's actually the URL I'm trying to use while working on this question.)
What am I missing here?
EDIT - I have also tried specifying the timeout as (1.0,1.0) but the behavior did not change.
EDIT2 - Per suggestions below, I ran my code from the python and ipython consoles. The code behaves as I expect (ACD). Of course, in my real application, I am not running this code from the command line. I don't know whether this matters, but the method containing the code is invoked by a web service, specifically a Swagger endpoint. With my browser, I hit an endpoint that's supposed to return our Swagger documentation. The endpoint (which uses flask_swagger) invokes init_swagger(...). init_swagger() calls my method with a Swagger object. That's it. How this matters, I cannot say. It doesn't make any sense to me that something outside of my method should somehow be able to mess with my exception handling.
The only thing I can think of is that Swagger has jacked with the requests class. But now it is dinner time and I am going home.
The following code returns A, C, D for me:
import requests
from requests.exceptions import ConnectionError
try:
    print("A")
    resp = requests.get("https://asdasdasdasd.vm:8080/v2/api-docs", timeout=1.0)
    print("B")
except ConnectionError:
    print("C")
print("D")
This is because the host cannot be resolved for me. If I swap it out for localhost...
resp = requests.get("http://localhost/v2/api-docs", timeout=1.0)
...then I see an A, followed by a period of time before C and D show.
From reading the comments, I know what is up...
builtins has a ConnectionError that can be used without importing anything. Requests doesn't use this exception; instead it uses the one found in requests.exceptions. If you wish to catch the ConnectionError, you must catch the correct exception, or it will drop out and not execute the except clause.
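A sketch of that point, assuming requests is installed: catch the classes from `requests.exceptions`, because requests' `ConnectionError` shares the builtin's name but is not a subclass of it. `fetch_or_none` is a hypothetical helper.

```python
import requests
from requests.exceptions import ConnectionError as RequestsConnectionError
from requests.exceptions import Timeout


def fetch_or_none(url, timeout=1.0):
    """Return the Response, or None when the host is unreachable or too slow."""
    try:
        return requests.get(url, timeout=timeout)
    except (RequestsConnectionError, Timeout):
        # DNS failures, refused connections and timeouts all land here.
        return None


# Same name, different classes: `except ConnectionError` (the builtin)
# would not catch the exception requests raises.
print(issubclass(RequestsConnectionError, ConnectionError))  # False
```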
I am trying to force Python to retry loading the page when I get a timeout error. Is there a way that I can make it retry a specific number of times, possibly after a specific time delay?
Any help would be appreciated.
Thank you.
urllib2 doesn't have anything built-in for that, but you can write it yourself.
The tricky part is that, as the urlopen docs say, no matter what goes wrong, you just get a URLError. So, how do you know whether it was a timeout, or something else?
Well, if you look up URLError, it says it will have a reason which will be a socket.error for remote URLs. And if you look up socket.error it tells you that it's a subclass of either IOError or OSError (depending on your Python version). And if you look up OSError, it tells you that it has an errno that represents the underlying error.
So, which errno value do you get for timeout? I'm willing to bet it's EINPROGRESS, but let's find out for sure:
>>> urllib2.urlopen('http://127.0.0.1', timeout=0)
urllib2.URLError: <urlopen error [Errno 36] Operation now in progress>
>>> errno.errorcode[36]
'EINPROGRESS'
(You could just use the number 36, but that's not guaranteed to be the same across platforms; errno.EINPROGRESS should be more portable.)
So:
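import errno
import urllib2
def retrying_urlopen(retries, *args, **kwargs):
    for i in range(retries):
        try:
            return urllib2.urlopen(*args, **kwargs)
        except urllib2.URLError as e:
            # Retry only on the "timeout" errno, and not after the last attempt.
            # e.reason may be a plain string, so don't assume it has .errno.
            if getattr(e.reason, 'errno', None) == errno.EINPROGRESS and i < retries - 1:
                continue
            raise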
If you think this sucks and should be a lot less clunky… well, I think everyone agrees. Exceptions have been radically improved twice, with another big one coming up, plus various small changes along the way. But if you stick with 2.7, you don't get the benefits of those improvements.
If moving to Python 3.4 isn't possible, maybe moving to a third-party module like requests or urllib3 is. Both of those libraries have a separate exception type for Timeout, instead of making you grub through the details of a generic URLError.
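If Python 3 is an option, here is a sketch using only the standard library's `urllib.request`. In Python 3 a timeout can surface either as a `URLError` whose `reason` is a `socket.timeout` (typically while connecting) or as a bare `socket.timeout` (typically while waiting for the response), so the helper checks both. `is_timeout`, `retrying_urlopen`, and the `delay` parameter are hypothetical names, not part of urllib.

```python
import socket
import time
import urllib.request
from urllib.error import URLError


def is_timeout(exc):
    """True for a bare socket timeout or a URLError that wraps one."""
    if isinstance(exc, socket.timeout):
        return True
    return isinstance(exc, URLError) and isinstance(exc.reason, socket.timeout)


def retrying_urlopen(url, retries=3, delay=1.0, timeout=10):
    """Open url, retrying only on timeouts; other errors are re-raised at once."""
    for attempt in range(1, retries + 1):
        try:
            return urllib.request.urlopen(url, timeout=timeout)
        except (URLError, socket.timeout) as exc:
            if not is_timeout(exc) or attempt == retries:
                raise
            time.sleep(delay)  # wait before the next attempt
```

Unlike the errno check, this does not depend on platform-specific error numbers.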
Check out the requests library. If you'd like to wait only for a specified amount of time (not for the entire download, just until you get a response from the server), just add the timeout argument to the standard URL request, in seconds:
r = requests.get(url, timeout=10)
If the timeout is exceeded, it raises a requests.exceptions.Timeout exception, which can be handled however you wish. For example, you could put the request in a try/except block, catch the exception if it's raised, and retry the connection a specified number of times before failing completely.
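The try/except-and-retry idea can be sketched as a small generic helper, written so it can be tested without a network. `call_with_retries` is hypothetical; with requests you would pass something like `lambda: requests.get(url, timeout=10)` and `exceptions=(requests.exceptions.Timeout,)`.

```python
import time


def call_with_retries(func, attempts=3, delay=2.0, exceptions=(Exception,)):
    """Call func(); on one of `exceptions`, sleep `delay` seconds and retry.

    The last exception is re-raised once `attempts` calls have all failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

Keeping `exceptions` narrow matters: retrying on every `Exception` would also repeat requests that failed for reasons a retry cannot fix.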
You might also want to check out requests.adapters.HTTPAdapter, which has a max_retries argument. It's typically used within a Requests Session, and according to the docs, it provides a general-case interface for Requests sessions to contact HTTP and HTTPS urls by implementing the Transport Adapter interface.
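A sketch of the `HTTPAdapter` route, assuming requests and its urllib3 dependency are installed. Passing a urllib3 `Retry` object instead of a plain integer also allows a backoff between attempts; the numbers are arbitrary examples, and `make_retrying_session` is a hypothetical helper.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=0.5):
    """Session whose http:// and https:// requests go through a retrying adapter."""
    retry = Retry(total=total, backoff_factor=backoff_factor)
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = make_retrying_session()
# r = session.get(url, timeout=10)  # url as in the answer above
```

Here the retries happen inside urllib3, so your own code only sees an exception once all attempts are used up.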
I am new to Python too, but I think even a simple solution like this could do the trick.
Begin by setting stuff to None, where stuff is the page source. Also remember that I have only considered the URLError exception; you might want to add more as desired.
import urllib2
import time
stuff = None
max_attempts = 4
r = 0
while stuff is None and r < max_attempts:
    try:
        response = urllib2.urlopen('http://www.google.com/ncr', timeout=10)
        stuff = response.read()
    except urllib2.URLError:
        r = r + 1
        print "Re-trying, attempt -- ", r
        time.sleep(5)
print stuff
Hope that helps.
Regards,
Md. Mohsin
I've been trying to create a TCP server with gevent without (any major) success so far. I think the problem lies with Windows (I've had some issues with sockets under Windows before). I'm using Python 2.7 and gevent 0.13 under Windows 7. Here's my code:
from gevent import socket
from gevent.server import StreamServer
def handle_echo(sock, address):
    try:
        fp = sock.makefile()
        while True:
            # Just echoes whatever it receives
            try:
                line = fp.readline()
            except Exception:
                break
            if line:
                try:
                    fp.write(line)
                    fp.flush()
                except Exception:
                    break
            else:
                break
    finally:
        sock.shutdown(socket.SHUT_WR)
        sock.close()
server = StreamServer(("", 2345), handle_echo)
server.serve_forever()
This implementation is similar to the one you can find here:
http://blog.pythonisito.com/2012/08/building-tcp-servers-with-gevent.html
Now there are no errors and the server seems to work correctly; however, it is not reading (and thus not sending) anything. Is it possible that sock.makefile() does not work correctly under Windows 7? Or maybe the problem lies somewhere else?
I've tried to replace sock.makefile() with a simple
while True:
    line = sock.recv(2048)
but this operation obviously blocks.
I've also tried to mix gevent's spawn with sock.setblocking(0). This was better and it worked; however, it would not handle more than ~300 connections at a time.
I'm going to do some tests on Linux and see if it makes difference. In the meantime if you have any ideas, then feel free to share them with me. Cheers!
UPDATE: The original code does the same thing under Ubuntu 12.04. So how should I implement a gevent TCP server?
What did you send to the server? Make sure it's terminated by a newline, otherwise readline() won't return.
You could also use tcpdump or Wireshark to see what's happening at the TCP layer if you think you're doing the right things in your code.
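To see the newline point in isolation, here is a Python 3 sketch that swaps gevent for the standard library's `socketserver` (so it runs anywhere) but keeps the same readline-based echo loop. `EchoHandler`, `EchoServer`, and `echo_roundtrip` are hypothetical names, and the one-second wait is arbitrary.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Same loop shape as handle_echo: each line is only yielded once a
        # b"\n" has arrived (or the client has closed the connection).
        for line in self.rfile:
            try:
                self.wfile.write(line)
            except OSError:
                break  # client went away


class EchoServer(socketserver.ThreadingTCPServer):
    daemon_threads = True  # handler threads must not keep the process alive


def echo_roundtrip(payload, wait=1.0):
    """Send payload to a throwaway echo server; return the reply (b"" if none)."""
    with EchoServer(("127.0.0.1", 0), EchoHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        with socket.create_connection(server.server_address, timeout=wait) as client:
            client.sendall(payload)
            try:
                reply = client.recv(4096)
            except socket.timeout:
                reply = b""  # nothing echoed: the handler still waits for b"\n"
        server.shutdown()
    return reply


print(echo_roundtrip(b"hello\n"))
print(echo_roundtrip(b"hello"))
```

The first call should get the line echoed back; the second should return empty after the wait, because the handler never sees a complete line, which matches the "server reads nothing" symptom when the client omits the newline.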
I've got a large bulk-downloading application written in Python/Mechanize, aiming to download something like 20,000 files. Clearly, any downloader that big is occasionally going to run into some ECONNRESET errors. Now, I know how to handle each of these individually, but there are two problems with that:
I'd really rather not wrap every single outbound web call in a try/except block.
Even if I were to do so, it's hard to know how to handle the errors once the exception has been thrown. If the code is just
data = browser.response().read()
then I know precisely how to deal with it, namely:
data = None
while data is None:
    try:
        data = browser.response().read()
    except IOError as e:
        if e.args[1].args[0].errno != errno.ECONNRESET:
            raise
        data = None
but if it's just a random instance of
browser.follow_link(link)
then how do I know what Mechanize's internal state looks like if an ECONNRESET is thrown somewhere in here? For example, do I need to call browser.back() before I try the code again? What's the proper way to recover from that kind of error?
EDIT: The solution in the accepted answer certainly works, and in my case it turned out to be not so hard to implement. I'm still academically interested, however, in whether there's an error handling mechanism that could result in quicker error catching.
Perhaps place the try..except block higher up in the chain of command:
import collections
import errno
def download_file(url):
    # Bundle together the bunch of browser calls necessary to download one file.
    browser.follow_link(...)
    ...
    response=browser.response()
    data=response.read()
urls=collections.deque(urls)
while urls:
    url=urls.popleft()
    try:
        download_file(url)
    except IOError as err:
        if err.args[1].args[0].errno != errno.ECONNRESET:
            raise
        else:
            # if ECONNRESET error, add the url back to urls to try again later
            urls.append(url)
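One possible refinement of that loop, sketched under assumptions: cap the number of attempts per URL so a permanently failing URL cannot cycle forever, and report the ones that never succeeded. The sketch is Python 3, where a reset surfaces as an `OSError` whose `errno` is `ECONNRESET`; with Mechanize the error may arrive wrapped as in the question (`err.args[1].args[0].errno`), so the check would need adjusting. `download_all` and its parameters are hypothetical.

```python
import collections
import errno


def download_all(urls, download_file, max_attempts=3):
    """Download every URL, re-queueing on ECONNRESET up to max_attempts times.

    Returns the URLs that still failed after max_attempts tries; any other
    error is re-raised immediately, as in the loop above.
    """
    queue = collections.deque((url, 1) for url in urls)
    gave_up = []
    while queue:
        url, attempt = queue.popleft()
        try:
            download_file(url)
        except OSError as err:
            if err.errno != errno.ECONNRESET:
                raise
            if attempt < max_attempts:
                queue.append((url, attempt + 1))  # retry later, at the back
            else:
                gave_up.append(url)
    return gave_up
```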