I was trying to download images with multi-thread, which has a limited max_count in python.
Each time a download_thread is started, I leave it alone and activate another one. I hope the download process could be ended in 5s, which means downloading is failed if opening the url costs more than 5s.
But how can I know it and stop the failed thread???
Can you tell which version of python you are using?
Maybe you could have posted a snippet too.
From Python 2.6, you have a timeout added in urllib2.urlopen.
Hope this will help you. It's from the python docs.
urllib2.urlopen(url[, data][,
timeout]) Open the URL url, which can
be either a string or a Request
object.
Warning HTTPS requests do not do any
verification of the server’s
certificate. data may be a string
specifying additional data to send to
the server, or None if no such data is
needed. Currently HTTP requests are
the only ones that use data; the HTTP
request will be a POST instead of a
GET when the data parameter is
provided. data should be a buffer in
the standard
application/x-www-form-urlencoded
format. The urllib.urlencode()
function takes a mapping or sequence
of 2-tuples and returns a string in
this format. urllib2 module sends
HTTP/1.1 requests with
Connection:close header included.
The optional timeout parameter
specifies a timeout in seconds for
blocking operations like the
connection attempt (if not specified,
the global default timeout setting
will be used). This actually only
works for HTTP, HTTPS and FTP
connections.
This function returns a file-like
object with two additional methods:
geturl() — return the URL of the
resource retrieved, commonly used to
determine if a redirect was followed
info() — return the meta-information
of the page, such as headers, in the
form of an mimetools.Message instance
(see Quick Reference to HTTP Headers)
Raises URLError on errors.
Note that None may be returned if no
handler handles the request (though
the default installed global
OpenerDirector uses UnknownHandler to
ensure this never happens).
In addition, default installed
ProxyHandler makes sure the requests
are handled through the proxy when
they are set.
Changed in version 2.6: timeout was
added.
Related
I would like to test a site with APIs that at this stage of development is classified as "not secure".
In the Locust documentation is said that:
Safe mode:
The HTTP client is configured to run in safe_mode. What this does is that any request that fails due to a connection error, timeout, or similar will not raise an exception, but rather return an empty dummy Response object. The request will be reported as a failure in User’s statistics. The returned dummy Response’s content attribute will be set to None, and its status_code will be 0.
I would like to know if there is a configuration that let you disable this option.
Thank you for your time
safe_mode has nothing to do with SSL/TLS, it refers to requests not throwing an exception when the request fails (for whatever reason).
To make HttpUser/requests ignore SSL issues, add verify=False to your call.
——
You're looking at a very old Locust documentation, right? I dont think the text you mention has been there for quite a while.
safe_mode has been removed from requests (it used to look like this https://github.com/psf/requests/pull/604) but the same behaviour is implemented by Locust here https://github.com/locustio/locust/blob/51b1d5038a2be6e2823b2576c4436f2ff9f7c7c2/locust/clients.py#L195
The documentation I've found explaining http.client for Python seems a bit sparse. I want to use it over requests because requests has not worked for our project.
So, knowing that I'm using Python's http.client, I'm seeing again and again request and putrequest. Both methods are defined here under HTTPConnection.
HTTPConnection.request: This will send a request to the server using
the HTTP request method method and the selector url.
HTTPConnection.putrequest: This should be the first call after the
connection to the server has been made. It sends a line to the server
consisting of the method string, the url string, and the HTTP version
(HTTP/1.1). To disable automatic sending of Host: or Accept-Encoding:
headers (for example to accept additional content encodings), specify
skip_host or skip_accept_encoding with non-False values.
Also, the source code for both is defined in this file.
From my guess and reading things, it seems like request is a more high level API compared to putrequest. Is that correct?
The Answer: request() is an abstracted version of multiple functions, putrequest() being one of them.
Although this is defined in the documentation, it's easy to skip over the line that answers this question.
This is pointed out in this line of the http.client documentation:
As an alternative to using the request() method described above, you can also send your request step by step, by using the four functions below.
I need to somehow extract plain HTTP request message from a Request object in Scrapy (so that I could, for example, copy/paste this request and run from Burp).
So given a scrapy.http.Request object, I would like to get the corresponding request message, such as e.g.
POST /test/demo_form.php HTTP/1.1
Host: w3schools.com
name1=value1&name2=value2
Clearly I have all the information I need in the Request object, however trying to reconstruct the message manually is error-prone as I could miss some edge cases. My understanding is that Scrapy first converts this Request into Twisted object, which then writes headers and body into a TCP transport. So maybe there's away to do something similar, but write to a string instead?
UPDATE
I could use the following code to get HTTP 1.0 request message, which is based on http.py. Is there a way to do something similar with HTTP 1.1 requests / http11.py, which is what's actually being sent? I would obviously like to avoid duplicating code from Scrapy/Twisted frameworks as much as possible.
factory = webclient.ScrapyHTTPClientFactory(request)
transport = StringTransport()
protocol = webclient.ScrapyHTTPPageGetter()
protocol.factory = factory protocol.makeConnection(transport)
request_message = transport.value()
print(request_message.decode("utf-8"))
As scrapy is open source and also has plenty of extension points, this should be doable.
The requests are finally assembled and sent out in scrapy/core/downloader/handlers/http11.py in ScrapyAgent.download_request ( https://github.com/scrapy/scrapy/blob/master/scrapy/core/downloader/handlers/http11.py#L270 )
If you place your hook there you can dump the request type, request headers, and request body.
To place your code there you can either try monkey patching ScrapyAgent.download_request or to subclass ScrapyAgent to do the request logging, then subclass HTTP11DownloadHandler to use your Scrapy Agent and then set HTTP11DownloadHandler as new DOWNLOAD_HANDLER for http / https requests in your project's settings.py (for details see: https://doc.scrapy.org/en/latest/topics/settings.html#download-handlers)
In my opinion this is the closest you can get to logging the requests going out without using a packet sniffer or a logging proxy (which might be a bit overkill for your scenario).
In the Flask documentation on testing (http://flask.pocoo.org/docs/testing/), it has a line of code
rv = self.app.get('/')
And below it, it mentions "By using self.app.get we can send an HTTP GET request to the application with the given path."
Where can the documentation be found for these direct access methods (I'm assuming that there's one for all of the restful methods)? Specifically, I'm wondering what sort of arguments they can take (for example, passing in data, headers, etc). Looking around on flask's documentation for a Flask object, it doesn't seem to list these methods, even though it uses them in the above example.
Alternatively, a knowledgeable individual could answer what I am trying to figure out: I'm trying to simulate sending a POST request to my server, as I would with the following line, if I were doing it over HTTP:
res = requests.post("http://localhost:%d/generate" % port,
data=json.dumps(payload),
headers={"content-type": "application/json"})
The above works when running a Flask app on the proper port. But I tried replacing it with the following:
res = self.app.post("/generate",
data=json.dumps(payload),
headers={"content-type": "application/json"})
And instead, the object I get in response is a 400 BAD REQUEST.
This is documented in the Werkzeug project, from which Flask gets the test client: Werkzeug's test client.
The test client does not issue HTTP requests, it dispatches requests internally, so there is no need to specify a port.
The documentation isn't very clear about support for JSON in the body, but it seems if you pass a string and set the content type you should be fine, so I'm not exactly sure why you get back a code 400. I would check if your /generate view function is invoked at all. A debugger should be useful to figure out where is the 400 coming from.
I'm using elasticsearch and the RESTful API supports supports reading bodies in GET requests for search criteria.
I'm currently doing
response = urllib.request.urlopen(url, data).read().decode("utf-8")
If data is present, it issues a POST, otherwise a GET. How can I force a GET despite the fact that I'm including data (which should be in the request body as per a POST)
Nb: I'm aware I can use a source property in the Url but the queries we're running are complex and the query definition is quite verbose resulting in extremely long urls (long enough that they can interfere with some older browsers and proxies).
I'm not aware of a nice way to do this using urllib. However, requests makes it trivial (and, in fact, trivial with any arbitrary verb and request content) by using the requests.request* function:
requests.request(method='get', url='localhost/test', data='some data')
Constructing a small test web server will show that the data is indeed sent in the body of the request, and that the method perceived by the server is indeed a GET.
*note that I linked to the requests.api.requests code because that's where the actual function definition lives. You should call it using requests.request(...)