Python & urllib2 - request a webpage but don't wait for the response

In Python, how would I go about making an HTTP request but not waiting for the response? I don't care about getting any data back; I just need the server to register a page request.
Right now I use this code:
urllib2.urlopen("COOL WEBSITE")
But obviously this pauses the script until a response is returned. I just want to fire off a request and move on.
How would I do this?

What you want here is called threading or asynchronous I/O.
Threading:
Wrap the call to urllib2.urlopen() in a threading.Thread()
Example:
    from threading import Thread
    import urllib2

    def open_website(url):
        return urllib2.urlopen(url)

    Thread(target=open_website, args=["http://google.com"]).start()
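On Python 3, where urllib2 became urllib.request, the same fire-and-forget pattern might look like the sketch below. It is a sketch, not the answer's exact code: the daemon flag (so a hung request never blocks interpreter shutdown) and the 10-second timeout are my additions, and the URL is whatever you want to hit.

```python
# Fire-and-forget sketch for Python 3 (urllib2 was merged into urllib.request).
import threading
import urllib.request

def fire_and_forget(url):
    """Send a GET in a background thread and return immediately."""
    def _open():
        try:
            urllib.request.urlopen(url, timeout=10).close()
        except Exception:
            pass  # we only care that the request was sent
    # daemon=True: a stuck request won't keep the interpreter alive at exit
    t = threading.Thread(target=_open, daemon=True)
    t.start()
    return t  # handy for tests; callers can simply ignore it
```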
Asynchronous:
Unfortunately there is no standard way of doing this in the Python standard library.
Use the requests library, which has this support (note: the async module was later split out of requests into the separate grequests package).
Example:
    from requests import async
    async.get("http://google.com")
There is also a third option, using the restclient library, which has had built-in asynchronous support for some time:
    from restclient import GET
    res = GET("http://google.com", async=True, resp=True)

Use a thread:
    import threading
    import urllib2

    threading.Thread(target=urllib2.urlopen, args=('COOL WEBSITE',)).start()
Don't forget that the args argument must be a tuple; that's why there's a trailing comma.
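The trailing comma matters because Thread calls target(*args): with the comma, args is a one-element tuple; without it, the parentheses are just grouping and the string itself gets unpacked character by character. A quick way to convince yourself (visit here is a stand-in recorder, not a real request):

```python
import threading

calls = []

def visit(url):
    # stand-in for urllib2.urlopen; just records what it was called with
    calls.append(url)

# correct: args=('COOL WEBSITE',) is a one-element tuple
t = threading.Thread(target=visit, args=('COOL WEBSITE',))
t.start()
t.join()
# without the comma, args='COOL WEBSITE' would be unpacked as
# visit('C'), visit('O'), ... and raise/behave incorrectly
```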

You can do this with the requests library as follows:
    import requests

    try:
        requests.get("http://127.0.0.1:8000/test/", timeout=10)
    except requests.exceptions.ReadTimeout:
        # a read timeout confirms the request reached the server
        do_something
    except:
        print "unable to reach server"
        raise
With the above code you can send the request without waiting for the full response. Specify the timeout according to your need; if you don't set one, the request will never time out.
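A stdlib-only variant of the same "don't wait" idea, for when pulling in requests is overkill: write the raw request bytes and close the socket without ever reading the response. This is a sketch under my own assumptions (plain HTTP, hypothetical send_and_forget helper; host, port, and path are placeholders):

```python
# Send an HTTP GET and discard the response: the socket is closed as soon
# as the request bytes are written, so we never wait for the reply.
import socket

def send_and_forget(host, port, path="/"):
    req = "GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n" % (path, host)
    with socket.create_connection((host, port), timeout=10) as s:
        s.sendall(req.encode("ascii"))
    # socket closed here; the server still receives the request
```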

gevent may be a proper choice.
First patch the socket module:
    import gevent
    from gevent import monkey

    monkey.patch_socket()
    monkey.patch_ssl()
Then use gevent.spawn() to encapsulate your requests as greenlets. It will not block the main thread and is very fast!
Here's a simple tutorial.
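If gevent isn't available, the standard library's concurrent.futures gives a similar submit-and-move-on shape on Python 3. This is my own stdlib analogue, not part of the answer above; the fetch_status and fire helpers and the pool size are illustrative:

```python
# Stdlib analogue of gevent.spawn(): hand the request to a worker pool
# and return immediately; the Future can be ignored or polled later.
import concurrent.futures
import urllib.request

pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def fetch_status(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status

def fire(url):
    # returns a Future at once; the request runs on a pool thread
    return pool.submit(fetch_status, url)
```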

Related

How to avoid logging request in Locust without using context manager?

The Locust documentation explains that a request can be prevented from being logged by using a context manager and raising an exception. For example:
    try:
        with self.client.get('/wont_be_logged', catch_response=True) as response:
            raise RuntimeError
    except RuntimeError:
        pass
Is there a way to achieve the same without having to use a context manager?
Just do the request yourself (without using self.client), for example by using requests.get(...).
(Note that this will use a different session, so it won't share the same cookies or underlying HTTP connection.)
Not to detract from cyberwiz's answer (ultimately you could just do your own requests and get the same behavior), but if you really want to use Locust's client and not manage another client yourself, you can. catch_response=True should be enough for it not to automatically fire success or failure events. You can then manually fire events with whatever you want after that.
Demo Locust file:
    from locust import HttpUser
    from locust.user.task import task

    class TestUser(HttpUser):
        host = "http://localhost:8089"

        @task
        def test_call(self):
            r = self.client.get("/", catch_response=True)
            print("Test")
            print(r.elapsed)
            self.environment.events.request_success.fire(
                request_type="POST",
                name="/somewhere_new",
                response_time=r.elapsed.microseconds / 1000,
                response_length=13,
            )
Again, doing it this way doesn't save you much, but it works.

Deferred callback not being called using Python requests-threads

I am trying to perform async HTTP requests using the requests library in Python. I found that the latest version of the library does not directly support async requests. To achieve this, they provide the requests-threads library, which uses Twisted to handle asynchronicity. I tried modifying the provided examples to use callbacks instead of await/yield, but the callbacks are not being called.
My sample code is:
    import time
    from requests_threads import AsyncSession

    session = AsyncSession(n=10)

    def processResponse(response):
        print(response)

    def main():
        a = session.get('https://reqres.in/api/users')
        a.addCallbacks(processResponse, processResponse)
        time.sleep(5)
The requests-threads library: https://github.com/requests/requests-threads
I suspect the callbacks are not called because you aren't running Twisted's event loop (known as the reactor). Remove your sleep call and replace it with reactor.run().
    from twisted.internet import reactor
    # ...
    def main():
        a = session.get('https://reqres.in/api/users')
        a.addCallbacks(processResponse, processResponse)
        # time.sleep(5)  # never use blocking functions like this with Twisted
        reactor.run()
The catch is that Twisted's reactor cannot be restarted, so once you stop the event loop (i.e. reactor.stop()), an exception will be raised when reactor.run() is executed again. In other words, your script/app will only "run once". To circumvent this issue, I suggest you use crochet. Here's a quick example based on a similar example from requests-threads:
    import crochet
    crochet.setup()
    print('setup')

    from twisted.internet.defer import inlineCallbacks
    from requests_threads import AsyncSession

    session = AsyncSession(n=100)

    @crochet.run_in_reactor
    @inlineCallbacks
    def main(reactor):
        responses = []
        for i in range(10):
            responses.append(session.get('http://httpbin.org/get'))
        for response in responses:
            r = yield response
            print(r)

    if __name__ == '__main__':
        event = main(None)
        event.wait()
And just as an FYI, requests-threads is not for production systems and is subject to significant change (as of Oct 2017). The end goal of the project is to design an awaitable design pattern for requests in the future. If you need production-ready concurrent requests, consider grequests or treq.
I think the only mistake here is that you forgot to run the reactor/event loop.
The following code works for me:
    from twisted.internet import reactor
    from requests_threads import AsyncSession

    session = AsyncSession(n=10)

    def processResponse(response):
        print(response)

    a = session.get('https://reqres.in/api/users')
    a.addCallbacks(processResponse, processResponse)
    reactor.run()

Process Multiple Requests Simultaneously and return the result using Klein Module Python

Hi, I am using the Klein Python module for my web server.
I need to run each request separately as a thread and also need to return the result.
But Klein waits until the completion of a single request before processing another request.
I also tried using deferToThread from the twisted module, but it also processes requests only after completion of the first request.
Similarly, I tried the @inlineCallbacks method; it produces the same result.
Note: these methods work perfectly when there is nothing to return, but I need to return the result.
Here is a sample code snippet:
    import time

    import klein
    import requests
    from twisted.internet import threads

    def test():
        print "started"
        x = requests.get("http://google.com")
        time.sleep(10)
        return x.text

    app = klein.Klein()

    @app.route('/square/submit', methods=['GET'])
    def square_submit(request):
        return threads.deferToThread(test)

    app.run('localhost', 8000)
As @notorious.no suggested, the code is valid and it works. To prove it, check out this code:
    # app.py
    from datetime import datetime
    import json
    import time
    import random
    import string

    import requests
    import treq
    from klein import Klein
    from twisted.internet import task
    from twisted.internet import threads
    from twisted.web.server import Site
    from twisted.internet import reactor, endpoints

    app = Klein()

    def test(y):
        print(f"test called at {datetime.now().isoformat()} with arg {y}")
        x = requests.get("http://www.example.com")
        time.sleep(10)
        return json.dumps([{
            "time": datetime.now().isoformat(),
            "text": x.text[:10],
            "arg": y
        }])

    @app.route('/<string:y>', methods=['GET'])
    def index(request, y):
        return threads.deferToThread(test, y)

    def send_requests():
        # send 3 concurrent requests
        rand_letter = random.choice(string.ascii_letters)
        for i in range(3):
            y = rand_letter + str(i)
            print(f"request sent at {datetime.now().isoformat()} with arg {y}")
            d = treq.get(f'http://localhost:8080/{y}')
            d.addCallback(treq.content)
            d.addCallback(lambda r: print("response", r.decode()))

    loop = task.LoopingCall(send_requests)
    loop.start(15)  # repeat every 15 seconds

    reactor.suggestThreadPoolSize(3)

    # disable unwanted logs
    # app.run("localhost", 8080)
    # this way the reactor logs only print calls
    web_server = endpoints.serverFromString(reactor, "tcp:8080")
    web_server.listen(Site(app.resource()))
    reactor.run()
Install treq, klein, and requests, then run it:
    $ python3.6 -m pip install treq klein requests
    $ python3.6 app.py
The output should be
request send at 2019-12-28T13:22:27.771899 with arg S0
request send at 2019-12-28T13:22:27.779702 with arg S1
request send at 2019-12-28T13:22:27.780248 with arg S2
test called at 2019-12-28T13:22:27.785156 with arg S0
test called at 2019-12-28T13:22:27.786230 with arg S1
test called at 2019-12-28T13:22:27.786270 with arg S2
response [{"time": "2019-12-28T13:22:37.853767", "text": "<!doctype ", "arg": "S1"}]
response [{"time": "2019-12-28T13:22:37.854249", "text": "<!doctype ", "arg": "S0"}]
response [{"time": "2019-12-28T13:22:37.859076", "text": "<!doctype ", "arg": "S2"}]
...
As you can see, Klein does not block the requests.
Furthermore, if you decrease the thread pool size to 2:
    reactor.suggestThreadPoolSize(2)
Klein will execute the first 2 requests and wait until there is a free thread again.
The "async alternatives" suggested by @notorious.no are discussed here.
But Klein waits until the completion of single request to process another request.
This is not true. In fact, there's absolutely nothing wrong with the code you've provided. Simply running your example server on tcp:localhost:8000 and using the following curl commands invalidates your claim:
    curl http://localhost:8000/square/submit &  # run in background
    curl http://localhost:8000/square/submit
Am I correct in assuming you're testing the code in a web browser? If so, you're experiencing a "feature" of most modern browsers: the browser will make a single request per URL at a given time. One way around this in the browser is to add a bogus query string to the end of the URL, like so:
    http://localhost:8000/square/submit
    http://localhost:8000/square/submit?bogus=0
    http://localhost:8000/square/submit?bogus=1
    http://localhost:8000/square/submit?bogus=2
However, a very common mistake new Twisted/Klein developers make is to write blocking code, expecting Twisted to magically make it async. Example:
    @app.route('/square/submit')
    def square_submit():
        print("started")
        x = requests.get('https://google.com')  # blocks the reactor
        time.sleep(5)  # blocks the reactor
        return x.text
Code like this will handle requests sequentially and should be rewritten with async alternatives.

Gevent async server with blocking requests

I have what I would think is a pretty common use case for Gevent. I need a UDP server that listens for requests, and based on the request submits a POST to an external web service. The external web service essentially only allows one request at a time.
I would like to have an asynchronous UDP server so that data can be immediately retrieved and stored so that I don't miss any requests (this part is easy with the DatagramServer gevent provides). Then I need some way to send requests to the external web service serially, but in such a way that it doesn't ruin the async of the UDP server.
I first tried monkey patching everything and what I ended up with was a quick solution, but one in which my requests to the external web service were not rate limited in any way and which resulted in errors.
It seems like what I need is a single non-blocking worker to send requests to the external web service in serial while the UDP server adds tasks to the queue from which the non-blocking worker is working.
What I need is information on running a gevent server with additional greenlets for other tasks (especially with a queue). I've been using the serve_forever function of the DatagramServer and think that I'll need to use the start method instead, but haven't found much information on how it would fit together.
Thanks,
EDIT
The answer worked very well. I've adapted the UDP server example code with the answer from @mguijarr to produce a working example for my use case:
    from __future__ import print_function

    from gevent.server import DatagramServer
    import gevent.queue
    import gevent.monkey
    import urllib

    gevent.monkey.patch_all()

    n = 0

    def process_request(q):
        while True:
            request = q.get()
            print(request)
            print(urllib.urlopen('https://test.com').read())

    class EchoServer(DatagramServer):
        __q = gevent.queue.Queue()
        __request_processing_greenlet = gevent.spawn(process_request, __q)

        def handle(self, data, address):
            print('%s: got %r' % (address[0], data))
            global n
            n += 1
            print(n)
            self.__q.put(n)
            self.socket.sendto('Received %s bytes' % len(data), address)

    if __name__ == '__main__':
        print('Receiving datagrams on :9000')
        EchoServer(':9000').serve_forever()
Here is how I would do it:
Write a function taking a queue object as argument; this function will continuously process items from the queue. Each item is supposed to be a request for the web service.
This function could be a module-level function, not part of your DatagramServer instance:
    def process_requests(q):
        while True:
            request = q.get()
            # do your magic with 'request'
            ...
In your DatagramServer, run the function within a greenlet (like a background task):
    self.__q = gevent.queue.Queue()
    self.__request_processing_greenlet = gevent.spawn(process_requests, self.__q)
When you receive a UDP request in your DatagramServer instance, push it to the queue:
    self.__q.put(request)
This should do what you want. You can still call serve_forever on DatagramServer, no problem.
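The same producer/consumer shape works without gevent too. Here's my own stdlib sketch of the design above: a single worker thread drains the queue serially (matching the one-request-at-a-time constraint of the external service) while producers enqueue and return immediately. The processed list and None sentinel are illustrative stand-ins.

```python
import queue
import threading

q = queue.Queue()
processed = []

def process_requests(q):
    # single serial worker: items are handled one at a time, in order
    while True:
        request = q.get()
        if request is None:  # sentinel to shut the worker down
            break
        processed.append(request)  # stand-in for the POST to the web service
        q.task_done()

worker = threading.Thread(target=process_requests, args=(q,), daemon=True)
worker.start()

# producers (e.g. the UDP handler) just enqueue and return immediately
for n in range(3):
    q.put(n)
q.put(None)
worker.join()
```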

Using python Requests library to consume from Twitter's user streams - how to detect disconnection?

I'm trying to use Requests to create a robust way of consuming from Twitter's user streams. So far, I've produced the following basic working example:
"""
Example of connecting to the Twitter user stream using Requests.
"""
import sys
import json
import requests
from oauth_hook import OAuthHook
def userstream(access_token, access_token_secret, consumer_key, consumer_secret):
oauth_hook = OAuthHook(access_token=access_token, access_token_secret=access_token_secret,
consumer_key=consumer_key, consumer_secret=consumer_secret,
header_auth=True)
hooks = dict(pre_request=oauth_hook)
config = dict(verbose=sys.stderr)
client = requests.session(hooks=hooks, config=config)
data = dict(delimited="length")
r = client.post("https://userstream.twitter.com/2/user.json", data=data, prefetch=False)
# TODO detect disconnection somehow
# https://github.com/kennethreitz/requests/pull/200/files#L13R169
# Use a timeout? http://pguides.net/python-tutorial/python-timeout-a-function/
for chunk in r.iter_lines(chunk_size=1):
if chunk and not chunk.isdigit():
yield json.loads(chunk)
if __name__ == "__main__":
import pprint
import settings
for obj in userstream(access_token=settings.ACCESS_TOKEN, access_token_secret=settings.ACCESS_TOKEN_SECRET, consumer_key=settings.CONSUMER_KEY, consumer_secret=settings.CONSUMER_SECRET):
pprint.pprint(obj)
However, I need to be able to handle disconnections gracefully. Currently, when the stream disconnects, the above just hangs, and there are no exceptions raised.
What would be the best way to achieve this? Is there a way to detect this through the urllib3 connection pool? Should I use a timeout?
I would recommend adding a timeout parameter to the client.post() call. http://docs.python-requests.org/en/latest/user/quickstart/#timeouts
However, it is important to note that requests doesn't set the TCP timeout, so you could set that using the following:
    import socket
    socket.setdefaulttimeout(TIMEOUT)
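To see what a timeout gives you in practice, here's a minimal sketch, independent of requests/urllib3: a recv that stays silent longer than the timeout raises socket.timeout, which you can treat as a disconnection and reconnect on. The read_chunk helper is hypothetical, not part of any library.

```python
import socket

def read_chunk(sock, nbytes, timeout):
    """Return up to nbytes from the stream, or None if the peer went silent."""
    sock.settimeout(timeout)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None  # no data within `timeout` seconds: assume disconnection
```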
