I'm struggling with what should be a very simple problem: I'm failing to set the session timeout on a suds-jurko connection. My WSDL is good, and everything works when pulling a smaller dataset. I've attempted several ways of setting the timeout. The following runs without complaint, but it is also ineffective:
from suds.client import Client
client = Client(authUrl, timeout=600)
My connection appears to fail after the default 90 seconds. Unfortunately, that just isn't long enough to get the data I need. The error I receive is:
ssl.SSLError: ('The read operation timed out',)
Help! My Google-fu is weak, I guess. I've tried many things and, finally, I have to ask for help, which will be greatly appreciated.
While this will not help the OP, I think it is worth mentioning that under Python 3.9 the call to Client(...., timeout=300) seems to work with sudz version 1.0.3 from https://github.com/Skylude/suds - so I guess this issue has been resolved.
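For reference, a minimal sketch of the pattern described in the question and in this answer: pass the timeout to the Client constructor, or set it through suds' option setter. Whether either takes effect depends on the suds fork and version in use, and the service method name below is only a placeholder.

from suds.client import Client

client = Client(authUrl, timeout=600)   # constructor argument, as in the question
client.set_options(timeout=600)         # same option via the suds option setter
result = client.service.SomeLongRunningCall()  # hypothetical service method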
Related
I am using the gspread module to read data from a Google Sheet. However, some sheets are somehow too large, and whenever I try to read (get) the values from the sheet I get a timeout error like the following:
ReadTimeout: HTTPSConnectionPool(host='sheets.googleapis.com', port=443): Read timed out. (read timeout=120)
One solution that comes to mind is to extend the timeout value, but I don't know exactly how to do that.
If you know how, or have any kind of solution to this issue, I would really appreciate your help.
Hi, if you look at the gspread repository, it recently merged a new PR that introduces timeouts in the client. When it is released, just update gspread to the latest version and you'll be able to set a timeout on your requests.
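For illustration, a rough sketch of what that could look like once a gspread version with the timeout support is installed. The set_timeout call, the 300-second value, and the credentials file name are assumptions here; check the documentation of your installed version.

import gspread

gc = gspread.service_account(filename="credentials.json")  # your own credentials file
gc.set_timeout(300)  # raise the read timeout well above the default 120 s
values = gc.open("My large sheet").sheet1.get_all_values()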
Recently I have been getting too-many-connection-error messages, to the point where the host IP is blocked and I have to flush hosts for connections to work again. I even went into the MariaDB configuration to change max_connections and max_connect_errors, as suggested in other forums, but this error still happens sometimes, although not as often anymore.
I just want to double-check whether there is any part of my code where a connection gets opened but never closed, because there aren't that many people issuing requests (as far as I know) when the error happens.
Is there a way in Django, or anywhere else, to print out or see how many connections are open, and whether they are closed once the requests are done?
Thanks in advance for any help and suggestions.
from django.db import connections
print(connections.all())
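If you are on MySQL/MariaDB, you can also ask the server itself how many connections are currently open. A rough sketch, run from a Django shell for example; the exact output format depends on your server:

from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("SHOW STATUS WHERE Variable_name = 'Threads_connected'")
    print(cursor.fetchone())        # ('Threads_connected', '<count>')
    cursor.execute("SHOW PROCESSLIST")
    for row in cursor.fetchall():   # one row per open connection
        print(row)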
I'm using overpy to query the Overpass API, and the nature of the data is such that I have a lot of queries to execute. I've run into the 429 OverpassTooManyRequests exception and I'm trying to play by the rules. I've tried introducing time.sleep methods to space out the requests, but I have no basis for how long the program should wait before continuing.
I found this link which mentions a "Retry-after" header:
How to avoid HTTP error 429 (Too Many Requests) python
Is there a way to access that header in an overpy response? I've been through the docs and the source code, but nothing stood out that would allow me to access that header so I can pause querying until it's acceptable to do so again.
I'm using Python 3.6 and overpy 0.4.
Maybe this isn't quite the answer you're seeking, but I ran into the same issue and fixed it by simply hosting my own OSM database server using Docker. Just clone the repo and follow the instructions:
https://github.com/mediasuitenz/docker-overpass-api
Per http://overpass-api.de/command_line.html, do check that you do not have a single 'runaway' request that is taking up all the resources.
After verifying that I don't have runaway queries, I have taken Peter's advice and added a catch for the TooManyRequests exception that waits 30s and tries again. This seems to be working as an immediate solution.
I will also raise an issue with the originators of OverPy to suggest an enhancement to allow evaluating the /api/status, as per mmd's advice.
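A minimal sketch of that catch-and-retry approach; the 30-second wait and the retry count are arbitrary choices, not values recommended by the Overpass API:

import time
import overpy

api = overpy.Overpass()

def query_with_retry(query, retries=5, wait=30):
    for _ in range(retries):
        try:
            return api.query(query)
        except overpy.exception.OverpassTooManyRequests:
            time.sleep(wait)  # back off before trying again
    raise RuntimeError("Overpass kept returning 429 after %d attempts" % retries)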
I am new to python as well as new to the world of querying the semantic web.
I am using the SPARQLWrapper library to query DBpedia. I searched the library's documentation but failed to find a 'timeout' option for a query fired at DBpedia from SPARQLWrapper.
Does anyone have any idea about this?
As of 2018, you can use SPARQLWrapper.setTimeout() to set the timeout for SPARQLWrapper requests.
As Karoo mentioned, you can use SPARQLWrapper.setTimeout(timeout=(int)).
If you want a timeout as a float, go to the Wrapper.py module and change self.timeout = int(timeout) to self.timeout = float(timeout) in the def setTimeout(self, timeout): function.
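For completeness, a short example of the client-side timeout described in these answers; the endpoint, timeout value, and query are just placeholders:

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setTimeout(60)  # seconds; raise this for slow queries
sparql.setReturnFormat(JSON)
sparql.setQuery("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
results = sparql.query().convert()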
I don't know if this is specifically an answer to your question, but I searched for it for ages, and here's my solution for anyone else having trouble with Virtuoso-specific timeouts in SPARQLWrapper:
You can use this line of code to set a server-side timeout for your queries (not client-side like .setTimeout):
[your SPARQLWrapper entity].addExtraURITag("timeout","[your timeout in ms]")
In my case it looks like this:
s.addExtraURITag("timeout","10000")
This should give you 10 seconds of time before your query stops searching and returns results instead of just giving you a Timeout error.
Hope this helps someone.
DBpedia uses the Virtuoso server for its endpoint, and timeout is a Virtuoso-specific option. SPARQLWrapper doesn't currently support it.
The next version will feature better modularity, and proper vendor-specific extensions might be implemented after that, but I guess you don't have time to wait.
Currently, the only way to add such a parameter is to manually hardcode it into your local version of the library.
I have a large scraping job to do -- most of the script's time is spent blocking due to a lot of network latency. I'm trying to multi-thread the script so I can make multiple requests simultaneously, but about 10% of my threads die with the following error
URLError: <urlopen error [Errno -2] Name or service not known>
The other 90% complete successfully. I am requesting multiple pages from the same domain, so it seems like there may be some DNS issue. I make 25 requests at a time (25 threads). Everything works fine if I limit myself to 5 requests at a time, but once I get to around 10 requests, I sometimes start seeing this error.
I have read Repeated host lookups failing in urllib2
which describes the same issue I have and followed the suggestions therein, but to no avail.
I have also tried using the multiprocessing module instead of multi-threading, and I get the same behaviour -- about 10% of the processes die with the same error -- which leads me to believe this is not an issue with urllib2 but something else.
Can someone explain what is going on and suggest how to fix it?
UPDATE
If I manually code the IP address of the site into my script, everything works perfectly, so this error happens somewhere during the DNS lookup.
Suggestion: Try enabling a DNS cache in your system, such as nscd. This should eliminate DNS lookup problems if your scraper always makes requests to the same domain.
Make sure that the file objects returned by urllib2.urlopen are properly closed after being read, in order to free resources. Otherwise, you may reach the limit of max open sockets in your system.
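For example, a small sketch of that pattern with contextlib.closing (urllib2 is Python 2; on Python 3 the urllib.request response can be used directly as a context manager):

import contextlib
import urllib2

def fetch(url):
    with contextlib.closing(urllib2.urlopen(url, timeout=30)) as resp:
        return resp.read()  # the socket is released when the block exits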
Also, take into account the politeness policy web crawlers should follow to avoid overloading a server with multiple requests.