Maximum number of connections per host with twisted.web.client.Agent - python

I have the following code which creates an HTTPConnectionPool using the Twisted Python framework, and an Agent for HTTP requests:
self.pool = HTTPConnectionPool(reactor, persistent=True)
self.pool.retryAutomatically = False
self.pool.maxPersistentPerHost = 1
self.agent = Agent(reactor, pool=self.pool)
then I create requests to connect to a local server:
d = self.agent.request(
    "GET",
    url,
    Headers({"Host": ["localhost:8333"]}),
    None)
The problem is: the local server sometimes behaves incorrectly when multiple simultaneous requests are made, so I would like to limit the number of simultaneous requests to 1.
The additional requests should be queued until the pending request completes.
I've tried setting self.pool.maxPersistentPerHost = 1, but it doesn't work.
Does twisted.web.client.Agent with HTTPConnectionPool support limiting the maximum number of connections per host, or do I have to implement a request FIFO queue myself?

The reason setting maxPersistentPerHost to 1 didn't help is that maxPersistentPerHost only controls the maximum number of persistent connections to cache per host. It does not prevent additional connections from being opened to service new requests; it only causes them to be closed immediately after a response is received, once the maximum number of cached connections has been reached.
You can enforce serialization in a number of ways. One way to have a "FIFO queue" is with twisted.internet.defer.DeferredLock. Use it together with Agent like this:
lock = DeferredLock()
d1 = lock.run(agent.request, "GET", url, ...)
d2 = lock.run(agent.request, "GET", url, ...)
The second request will not run until after the first has completed.
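Putting it together, here's a minimal sketch of serializing requests through a shared lock (the URLs and the Host header value are placeholders based on the question):

from twisted.internet import reactor
from twisted.internet.defer import DeferredLock, gatherResults
from twisted.web.client import Agent, HTTPConnectionPool
from twisted.web.http_headers import Headers

pool = HTTPConnectionPool(reactor, persistent=True)
pool.maxPersistentPerHost = 1
agent = Agent(reactor, pool=pool)
lock = DeferredLock()

def fetch(url):
    # lock.run() acquires the lock, calls agent.request, and releases the
    # lock when the returned Deferred fires, so requests go out one at a time.
    return lock.run(
        agent.request,
        b"GET",
        url,
        Headers({b"Host": [b"localhost:8333"]}),
        None)

d1 = fetch(b"http://localhost:8333/first")
d2 = fetch(b"http://localhost:8333/second")  # queued until d1 fires

gatherResults([d1, d2]).addBoth(lambda _: reactor.stop())
reactor.run()

One caveat: lock.run releases the lock as soon as the Deferred from agent.request fires, i.e. when the response headers arrive, not when the body has been fully read. If the server also can't tolerate an overlapping body download, hold the lock until the body is consumed (e.g. via readBody) before releasing it.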

Related

How to test a single GET request in parallel for specified count?

I want to send a GET request in parallel for a specified count, say 100 times. How can I achieve this using JMeter or Python?
I tried the bzm parallel executor but that didn't work out.
import requests
import threading

totalRequests = 0
numberOfThreads = 10
threads = [0] * numberOfThreads

def worker(thread):
    r = requests.get("url")
    threads[thread] = 0  # free thread

while totalRequests < 100:
    for thread in range(numberOfThreads):
        if threads[thread] == 0:
            threads[thread] = 1  # occupy thread
            t = threading.Thread(target=worker, args=(thread,))
            t.start()
            totalRequests += 1
In JMeter:
Add a Thread Group to your Test Plan and set the number of threads (users) to the desired request count (100 in your case).
Add an HTTP Request sampler as a child of the Thread Group and specify the protocol, host, port, path and parameters.
If you're not certain how to configure the HTTP Request sampler properly, you can just record the request using your browser and JMeter's HTTP(S) Test Script Recorder or the JMeter Chrome Extension.
For Python, the right approach would be to use the Locust framework, as I believe you're interested in metrics like response times, latencies and so on. The official website is down at the moment,
so in the meantime you can check https://readthedocs.org/projects/locust/
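If you go the Locust route, a minimal locustfile sketch could look like this (assuming Locust 1.x; the path is a placeholder and the target host is passed on the command line):

from locust import HttpUser, task, between

class ApiUser(HttpUser):
    wait_time = between(0.1, 0.5)  # small pause between tasks per simulated user

    @task
    def get_endpoint(self):
        # Placeholder path; Locust records response times and failure rates
        # for every request automatically.
        self.client.get("/some/endpoint")

Running something like locust -f locustfile.py --headless -u 100 -r 100 --host http://your-host then spawns 100 concurrent users (flag names differ slightly across Locust versions).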

Rabbitmq remote call with Pika

I am new to RabbitMQ and trying to figure out how I can make a client request a server for information about memory and CPU utilization, following this tutorial (https://www.rabbitmq.com/tutorials/tutorial-six-python.html).
So the client requests CPU and memory values (I believe I will need two queues) and the server responds with the values.
Is there any way to simply create a client.py and server.py for this case using the Pika library in Python?
I would recommend following the first RabbitMQ tutorials if you haven't already. The RPC example builds on concepts covered in the previous examples (direct queues, exclusive queues, acknowledgements, etc.).
The RPC solution proposed in the tutorial requires at least two queues, depending on how many clients you want to use:
One direct queue (rpc_queue), used to send requests from the client to the server.
One exclusive queue per client, used to receive responses.
The request/response cycle:
The client sends a message to the rpc_queue. Each message includes a reply_to property, with the name of the client's exclusive queue the server should reply to, and a correlation_id property, which is just a unique id used to track the request.
The server waits for messages on the rpc_queue. When a message arrives, it prepares the response, adds the correlation_id to the new message, and sends it to the queue defined in the reply_to message property.
The client waits on its exclusive queue until it finds a message with the correlation_id that was originally generated.
Jumping straight to your problem, the first thing to do is to define the message format you'll want to use on your responses. You can use JSON, msgpack or any other serialization library. For example, if using JSON, one message could look something like this:
{
    "cpu": 1.2,
    "memory": 0.3
}
Then, on your server.py:
def on_request(channel, method, props, body):
    response = {'cpu': current_cpu_usage(),
                'memory': current_memory_usage()}
    properties = pika.BasicProperties(correlation_id=props.correlation_id)
    channel.basic_publish(exchange='',
                          routing_key=props.reply_to,
                          properties=properties,
                          body=json.dumps(response))
    channel.basic_ack(delivery_tag=method.delivery_tag)
# ...
And on your client.py:
class ResponseTimeout(Exception): pass

class Client:
    # similar constructor as `FibonacciRpcClient` from tutorial...

    def on_response(self, channel, method, props, body):
        if self.correlation_id == props.correlation_id:
            self.response = json.loads(body.decode())

    def call(self, timeout=2):
        self.response = None
        self.correlation_id = str(uuid.uuid4())
        self.channel.basic_publish(exchange='',
                                   routing_key='rpc_queue',
                                   properties=pika.BasicProperties(
                                       reply_to=self.callback_queue,
                                       correlation_id=self.correlation_id),
                                   body='')
        start_time = time.time()
        while self.response is None:
            if (start_time + timeout) < time.time():
                raise ResponseTimeout()
            self.connection.process_data_events()
        return self.response
As you can see, the code is pretty much the same as the original FibonacciRpcClient. The main differences are:
We use JSON as data format for our messages.
Our client call() method doesn't require a body argument (there's nothing to send to the server)
We take care of response timeouts (if the server is down, or if it doesn't reply to our messages)
Still, there are a lot of things to improve here:
No error handling: for example, if the client "forgets" to send a reply_to queue, our server is going to crash, and will crash again on restart (the broken message will be requeued indefinitely as long as it isn't acknowledged by our server)
We don't handle broken connections (no reconnection mechanism)
...
You may also consider replacing the RPC approach with a publish/subscribe pattern; in this way, the server simply broadcasts its CPU/memory state every X time interval, and one or more clients receive the updates.
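For example, a rough sketch of the publishing side of that variant could look like this (the 'stats' exchange name and the use of psutil to read CPU/memory are assumptions, not part of the tutorial):

import json
import time
import pika
import psutil  # assumption: any way of reading CPU/memory works here

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.exchange_declare(exchange='stats', exchange_type='fanout')

while True:
    payload = json.dumps({'cpu': psutil.cpu_percent(),
                          'memory': psutil.virtual_memory().percent})
    # Fanout exchange: every client queue bound to it receives a copy.
    channel.basic_publish(exchange='stats', routing_key='', body=payload)
    time.sleep(5)  # broadcast every X seconds

Each client would then declare an exclusive queue, bind it to the stats exchange, and consume the updates as they arrive.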

Batching and queueing in a real-time webserver

I need a webserver which routes the incoming requests to back-end workers by batching them every 0.5 seconds or when it has 50 HTTP requests, whichever happens earlier. What would be a good way to implement this in Python/Tornado or any other language?
What I am thinking is to publish the incoming requests to a RabbitMQ queue and then somehow batch them together before sending them to the back-end servers. What I can't figure out is how to pick multiple requests from the RabbitMQ queue. Could someone point me in the right direction or suggest some alternate approach?
I would suggest using a simple python micro web framework such as bottle. Then you would send the requests to a background process via a queue (thus allowing the connection to end).
The background process would then have a continuous loop that would check your conditions (time and number), and do the job once the condition is met.
Edit:
Here is an example webserver that batches the items before sending them to any queuing system you want to use (RabbitMQ always seemed overcomplicated to me with Python; I have used Celery and other simpler queuing systems before). That way the backend simply grabs a single 'item' from the queue, which will contain all 50 required requests.
import bottle
import threading
import Queue

app = bottle.Bottle()
app.queue = Queue.Queue()

def send_to_rabbitMQ(items):
    """Custom code to send to rabbitMQ system"""
    print("50 items gathered, sending to rabbitMQ")

def batcher(queue):
    """Background thread that gathers incoming requests"""
    while True:
        batcher_loop(queue)

def batcher_loop(queue):
    """Loop that will run until it gathers 50 items,
    then will call the function 'send_to_rabbitMQ'"""
    count = 0
    items = []
    while count < 50:
        try:
            next_item = queue.get(timeout=.5)
        except Queue.Empty:
            pass
        else:
            items.append(next_item)
            count += 1
    send_to_rabbitMQ(items)

@app.route("/add_request", method=["PUT", "POST"])
def add_request():
    """Simple bottle request that grabs JSON and puts it in the queue"""
    request = bottle.request.json['request']
    app.queue.put(request)

if __name__ == '__main__':
    t = threading.Thread(target=batcher, args=(app.queue, ))
    t.daemon = True  # Make sure the background thread quits when the program ends
    t.start()
    bottle.run(app)
Code used to test it:
import requests
import json

for i in range(101):
    req = requests.post("http://localhost:8080/add_request",
                        data=json.dumps({"request": 1}),
                        headers={"Content-type": "application/json"})
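The send_to_rabbitMQ function above is only a stub; a minimal sketch of what it could do with pika (the 'batches' queue name is an assumption) might be:

import json
import pika

def send_to_rabbitMQ(items):
    """Publish the gathered batch as a single message."""
    # For simplicity this opens a connection per batch; a long-lived
    # connection would be more efficient in practice.
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='batches', durable=True)
    channel.basic_publish(exchange='',
                          routing_key='batches',
                          body=json.dumps(items),
                          properties=pika.BasicProperties(delivery_mode=2))
    connection.close()

The back-end worker then consumes one message per batch and gets all ~50 requests in a single body.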

URLLib3 Connection Pool creates only one pool

Currently I'm trying to scrape a site, but the site doesn't allow more than 100 requests per TCP connection. So I tried to create multiple connection pools for the requests with the following code. Shouldn't it create 15 connection pools?
from urllib3 import HTTPConnectionPool

for i in range(15):
    pool = HTTPConnectionPool('ajax.googleapis.com', maxsize=15)
    for j in range(15):
        resp = pool.request('GET', '/ajax/services/search/web')
    pool.num_connections
pool.num_connections always prints 1.
The issue is that the requests are made synchronously one after the other.
For this reason the pool will always use the same connection with no need to create any others.
Now let's say we run the code using threads; multiple requests will then be issued concurrently.
In this case pool.num_connections will be greater than 1:
from concurrent.futures.thread import ThreadPoolExecutor
from urllib3 import HTTPConnectionPool

pool = HTTPConnectionPool('ajax.googleapis.com', maxsize=15)

def send_request(_):
    pool.request('GET', '/ajax/services/search/web')
    print(pool.num_connections)

with ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(send_request, range(5))
If you need to close sockets every 100 requests, then you'll need to do that manually. Here's an example which closes all sockets every 5 requests:
import urllib3

urllib3.add_stderr_logger()  # This lets you see when new connections are made

http = urllib3.PoolManager()
url = 'http://ajax.googleapis.com/ajax/services/search/web'

for j in range(15):
    resp = http.request('GET', url)
    if j % 5 == 0:
        # Reset the PoolManager's connections.
        # This might be overkill if you need more granular control per-host.
        http.clear()
You could do something similar using HTTPConnectionPool and doing .close() on it before replacing it with a fresh one. I prefer to use PoolManager when possible (there is generally no downside).
If you'd like to get super granular with connections, you can manually take connections out of an HTTPConnectionPool using pool._get_conn() and .close()'ing it.
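A rough sketch of the coarser HTTPConnectionPool approach, closing and replacing the whole pool every 100 requests (the 100-request limit comes from the question):

from urllib3 import HTTPConnectionPool

def fresh_pool():
    return HTTPConnectionPool('ajax.googleapis.com', maxsize=15)

pool = fresh_pool()
for j in range(300):
    resp = pool.request('GET', '/ajax/services/search/web')
    if (j + 1) % 100 == 0:
        # Close every socket in the old pool and start over, so no single
        # TCP connection ever serves more than 100 requests.
        pool.close()
        pool = fresh_pool()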

Gevent async server with blocking requests

I have what I would think is a pretty common use case for Gevent. I need a UDP server that listens for requests, and based on the request submits a POST to an external web service. The external web service essentially only allows one request at a time.
I would like to have an asynchronous UDP server so that data can be immediately retrieved and stored so that I don't miss any requests (this part is easy with the DatagramServer gevent provides). Then I need some way to send requests to the external web service serially, but in such a way that it doesn't ruin the async of the UDP server.
I first tried monkey patching everything and what I ended up with was a quick solution, but one in which my requests to the external web service were not rate limited in any way and which resulted in errors.
It seems like what I need is a single non-blocking worker to send requests to the external web service in serial while the UDP server adds tasks to the queue from which the non-blocking worker is working.
What I need is information on running a gevent server with additional greenlets for other tasks (especially with a queue). I've been using the serve_forever function of the DatagramServer and think that I'll need to use the start method instead, but haven't found much information on how it would fit together.
Thanks,
EDIT
The answer worked very well. I've adapted the UDP server example code with the answer from @mguijarr to produce a working example for my use case:
from __future__ import print_function
from gevent.server import DatagramServer
import gevent.queue
import gevent.monkey
import urllib

gevent.monkey.patch_all()

n = 0

def process_request(q):
    while True:
        request = q.get()
        print(request)
        print(urllib.urlopen('https://test.com').read())

class EchoServer(DatagramServer):
    __q = gevent.queue.Queue()
    __request_processing_greenlet = gevent.spawn(process_request, __q)

    def handle(self, data, address):
        print('%s: got %r' % (address[0], data))
        global n
        n += 1
        print(n)
        self.__q.put(n)
        self.socket.sendto('Received %s bytes' % len(data), address)

if __name__ == '__main__':
    print('Receiving datagrams on :9000')
    EchoServer(':9000').serve_forever()
Here is how I would do it:
Write a function taking a "queue" object as argument; this function will continuously process items from the queue. Each item is supposed to be a request for the web service.
This function could be a module-level function, not part of your DatagramServer instance:
def process_requests(q):
    while True:
        request = q.get()
        # do your magic with 'request'
        ...
In your DatagramServer, make the function run within a greenlet (like a background task):
self.__q = gevent.queue.Queue()
self.__request_processing_greenlet = gevent.spawn(process_requests, self.__q)
When you receive the UDP request in your DatagramServer instance, push the request to the queue:
self.__q.put(request)
This should do what you want. You still call 'serve_forever' on DatagramServer, no problem.
