To test our server, we designed a test that sends a lot of requests with JSON payloads and compares each response it gets back against an expected result.
I'm currently trying to speed this up by using multiple threads, but I haven't found a solution to the problem I'm facing.
I have a URL and a bunch of JSON files (these files hold the requests, and for each request file there is an 'expected response' JSON to compare the response against).
I would like to use multithreading to send all these requests and still be able to match each response I get back to the request I sent.
Any ideas?
Well, you have a couple of options:
Use multiprocessing.pool.ThreadPool (Python 2.7), where you create a pool of threads and then use them for dispatching requests. map_async may be of interest here if you want to make async requests,
Use concurrent.futures.ThreadPoolExecutor (Python 3), which works in a similar way to ThreadPool and is used for asynchronously executing callables (a sketch follows below),
You even have the option of using multiprocessing.Pool, but I'm not sure that will give you any benefit, since everything you will be doing is I/O-bound, so threads should do just fine,
You can make asynchronous requests with Twisted or asyncio, but that may require a bit more learning if you are not accustomed to asynchronous programming.
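For illustration, a minimal sketch of the ThreadPoolExecutor option (the URL, file names, and naming scheme are assumptions): because map() yields results in input order, each response is trivially matched back to the request file, and expected response, it belongs to.

import json
from concurrent.futures import ThreadPoolExecutor

import requests

URL = 'http://localhost:8080/endpoint'        # stand-in for your server's address
REQUEST_FILES = ['req1.json', 'req2.json']    # hypothetical request file names

def send(request_file):
    with open(request_file) as f:
        payload = json.load(f)
    # Returning the file name ties each response back to the request it came from.
    return request_file, requests.post(URL, json=payload).json()

with ThreadPoolExecutor(max_workers=8) as executor:
    for request_file, response in executor.map(send, REQUEST_FILES):
        with open(request_file.replace('req', 'expected')) as f:  # assumed naming scheme
            expected = json.load(f)
        assert response == expected, request_file + ' did not match'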
Use the Python multiprocessing thread pool, where the return values you get back can be compared.
https://docs.python.org/2/library/multiprocessing.html
https://gist.github.com/wrunk/b689be4b59270c32441c
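A rough sketch of that idea (the URL and payloads are made up), which also works on Python 2.7: ThreadPool.map() returns the results in order, ready to compare against the expected responses.

from multiprocessing.pool import ThreadPool

import requests

payloads = [{'id': 1}, {'id': 2}]   # stand-in request bodies

def fetch(payload):
    # Each call runs in one of the pool's threads; the return values
    # come back from map() in the same order as the inputs.
    return requests.post('http://localhost:8080/endpoint', json=payload).json()

pool = ThreadPool(processes=8)
responses = pool.map(fetch, payloads)
pool.close()
pool.join()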
I'm trying to learn some network/backend stuff.
I now want to build an API that makes an HTTP request, does some processing, sends back a response. Not very useful, but it's for learning.
I noticed that the GET request is a huge bottleneck. I think it is an I/O problem, because the responses are very small.
Now I thought I could maybe do the downloading on multiple threads. If a fictional client of mine makes a request, a URL would need to be added to a pool, then fetched (by some worker thread) and returned, processed, and sent back. Or something like that...
I'm really not an expert and maybe nothing I just said made any sense... but I would really appreciate a little help :)
Multiple solutions exist.
You can use threading (thread pools) or multiprocessing (multiprocessing pools) to perform multiple requests in parallel.
Or you could use libraries like asyncio (or Twisted) to perform multiple requests within one thread, in a way that waiting for I/O is no longer the blocking point.
I suggest you look at:
https://docs.python.org/3/library/threading.html for threading
or https://docs.python.org/3/library/multiprocessing.html?highlight=multiprocessing#module-multiprocessing for multiprocessing.
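Before jumping to asyncio, here's a rough sketch of the thread-pool idea using only the standard library (the URLs and worker count are placeholders): worker threads pull URLs off a queue, so the blocking downloads happen in parallel.

import queue
import threading
import urllib.request

url_queue = queue.Queue()
results = {}

def worker():
    while True:
        url = url_queue.get()
        if url is None:       # sentinel value: no more work
            break
        # The blocking download happens here, but each thread blocks independently.
        with urllib.request.urlopen(url) as resp:
            results[url] = resp.read()
        url_queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()

for url in ['http://example.com/?page=%d' % i for i in range(10)]:  # placeholder URLs
    url_queue.put(url)
url_queue.join()                  # wait until every URL has been processed

for _ in threads:
    url_queue.put(None)           # stop the workers
for t in threads:
    t.join()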
Asynchronous programming is, in my opinion, much more difficult, but if you're curious look at
https://docs.python.org/3/library/asyncio.html?highlight=asyncio#module-asyncio for asyncio basics and at https://docs.aiohttp.org/en/stable/ for performing multiple http requests in 'parallel' with asyncio
Afterwards, after playing a little, you will probably have much more precise questions.
Just post your code then, explain the issues, and you will get more help.
I have been searching for how to run parallel processes in Python and have stumbled upon concurrent.futures and multiprocessing as the most interesting options. Sadly I haven't been able to implement them correctly, since they seem to be running one worker after another instead of at the same time. Right now my process is taking a little too long and I think I can make it faster.
Say I coded a function in Python that connects to a Kafka queue and reads JSON messages; each message is then sent to a REST service using requests to get some information needed to complete the data before it is posted; then I just update my database with the response.
I need to be able to have this function run in several processes, reading from the queue and making the requests at the same time, until I'm out of messages.
What would be the best approach to do so?
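For reference, a minimal sketch of one common shape for this (the topic, group, service URL, and DB helper are all hypothetical), assuming the kafka-python client: several processes join the same consumer group, so Kafka divides the topic's partitions between them and each process reads its own slice of the queue.

import json
import multiprocessing

import requests
from kafka import KafkaConsumer

def worker():
    # Consumers sharing a group_id split the topic's partitions among
    # themselves, so the processes never read the same message twice.
    consumer = KafkaConsumer(
        'my-topic',                                   # hypothetical topic
        bootstrap_servers='localhost:9092',
        group_id='my-workers',                        # hypothetical group
        value_deserializer=lambda m: json.loads(m.decode('utf-8')),
    )
    for message in consumer:
        # Complete the data via the REST service, then store the result.
        resp = requests.post('http://rest-service/enrich', json=message.value)
        update_database(resp.json())                  # hypothetical DB helper

if __name__ == '__main__':
    workers = [multiprocessing.Process(target=worker) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()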
So I have this problem I am trying to solve in a particular way, but I am not sure how hard it is to achieve.
I would like to use the asyncio/coroutines features of Python 3.4 to trigger many concurrent http requests using a blocking http library, like requests, or any Python api that does http requests like boto for aws.
I know about the run_in_executor() method for running tasks in threads/processes, but I would like to avoid that.
I would like to do it in a single thread, using the select features of the Linux/Unix kernel.
Actually I was following David Beazley's presentation on this, and I was trying to use this code: https://github.com/dabeaz/concurrencylive/blob/master/aserver.py
but without the future/pool stuff, using my blocking API call instead of computing the Fibonacci number.
But it seems that the HTTP requests are still running in sequence.
Any ideas if this is possible? And how?
Thanks
Not possible. All the calls that the requests library makes to the underlying socket are blocking (i.e. socket.read) because the socket is in blocking mode. You could put the socket into non-blocking mode, but then socket.read would fail. You basically need an event-loop to tell you when it's possible to do a socket.read, but blocking libraries aren't written with one in mind. This is the whole reason why asyncio exists; providing a default event-loop that different libraries can share and make use of non-blocking file descriptors (e.g. sockets).
Use aiohttp, it's just as easy as requests and in the process you get to learn more about asyncio. asyncio and the new Python 3.5 async/await syntax are the Future of networking IO; yield to it (pun intended).
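To make that concrete, a minimal aiohttp sketch (Python 3.5+ syntax; the URLs are placeholders) that runs many GETs concurrently on a single thread:

import asyncio

import aiohttp

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.text()

async def main():
    urls = ['http://example.com/?page=%d' % i for i in range(10)]  # placeholders
    async with aiohttp.ClientSession() as session:
        # gather() schedules every request at once; the event loop switches
        # between them whenever one is waiting on the network.
        return await asyncio.gather(*(fetch(session, u) for u in urls))

pages = asyncio.get_event_loop().run_until_complete(main())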
In my application, I am sending off several requests.post() requests in threads. Depending on the amount of data I have to post, the number of threads created can be in the hundreds.
The actual creation of the request object is made using requests-oauthlib, which inserts authentication data into the request object when it is used.
My issue is that when there is a large amount of data being sent in parallel, the log is flooded with the following message, and eventually no more output is sent to the log:
Connection pool is full. Discarding connection.
My question is, with the use of requests-oauthlib, is there a way to specify, perhaps within the post method itself, the size of the connection pool, or whether it should block so that other requests can complete before creating more? I ask because with requests-oauthlib it would be tricky to construct a custom request object and ask requests-oauthlib to use it.
One thing I have tried is as follows, but it had no effect - I continued to get the warnings:
import requests
s = requests.Session()
# pool_block=True asks urllib3 to wait for a free connection
# instead of opening extras and then discarding them
a = requests.adapters.HTTPAdapter(pool_block=True)
s.mount('http://', a)
s.mount('https://', a)
Update - The threads are now being created in a controlled manner.
from concurrent import futures

with futures.ThreadPoolExecutor(max_workers=10) as executor:
    executor.submit(function, args)
The easiest way to block the requests so only N of them are trying to use the connection pool at once is to only create N at a time.
The easiest way to do that is to use a pool of N threads servicing queue of M requests, instead of a separate thread for every request. If you're using Python 3.2+, this is very easy with the concurrent.futures library—in fact, it's nearly identical to the first ThreadPoolExecutor example, except that you're using requests instead of urllib. If you're not using 3.2+, there's a backport of the stdlib module named futures that provides the same functionality back to… I think 2.6, but don't quote me on that (PyPI is down at the moment).
There may be an even easier solution: there's a third-party library named requests-futures that, I'm guessing from the name (again, PyPI down…), wraps that up for you in some way.
You may also want to consider using something like grequests to do it all in one thread with gevent greenlets, but that won't be significantly different, as far as your code is concerned, from using a thread pool.
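For instance, a sketch that lines the two up (the URL and payloads are illustrative): give the adapter a pool as large as the thread count, so no connection ever has to be discarded.

from concurrent.futures import ThreadPoolExecutor

import requests

N = 10
payloads = [{'n': i} for i in range(100)]   # stand-in data

session = requests.Session()
# Match the pool size to the number of worker threads.
adapter = requests.adapters.HTTPAdapter(pool_connections=N, pool_maxsize=N)
session.mount('http://', adapter)
session.mount('https://', adapter)

def post(payload):
    return session.post('https://example.com/api', json=payload)  # placeholder URL

with ThreadPoolExecutor(max_workers=N) as executor:
    responses = list(executor.map(post, payloads))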
I need to write a proxy-like program in Python; the workflow is very similar to a web proxy. The program sits between the client and the server, intercepts requests sent by the client to the server, processes each request, then sends it on to the original server. Of course, the protocol used is a private protocol over TCP.
To minimize the effort, I want to use Python Twisted to handle receiving the requests (the part that acts as a server) and resending them (the part that acts as a client).
To maximize performance, I want to use Python multiprocessing (threading has the GIL limit) to separate the program into three parts (processes). The first process runs Twisted to receive requests, puts each request on a queue, and returns success immediately to the original client. The second process takes requests from that queue, processes them further, and puts them on another queue. The third process takes requests from the second queue and sends them to the original server.
I'm a newcomer to Python Twisted. I know it is event-driven, and I have also heard it's better not to mix Twisted with threading or multiprocessing. So I don't know whether this approach is appropriate, or whether there is a more elegant way using just Twisted?
Twisted has its own event-driven way of running subprocesses which is (in my humble, but correct, opinion) better than the multiprocessing module. The core API is spawnProcess, but tools like ampoule provide higher-level wrappers over it.
If you use spawnProcess, you will be able to handle output from subprocesses in the same way you'd handle any other event in Twisted; if you use multiprocessing, you'll need to develop your own queue-based way of getting output from a subprocess into the Twisted mainloop somehow, since the normal callFromThread API that a thread might use won't work from another process. Depending on how you call it, it will either try to pickle the reactor, or just use a different non-working reactor in the subprocess; either way it will lose your call forever.
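A bare-bones spawnProcess sketch (the child script name is made up), showing how the child's output arrives as an ordinary reactor event:

import sys

from twisted.internet import protocol, reactor

class WorkerProtocol(protocol.ProcessProtocol):
    def outReceived(self, data):
        # Whatever the child writes to stdout is delivered here,
        # inside the main loop, like any other event.
        print('child said: %r' % (data,))

    def processEnded(self, reason):
        reactor.stop()

# 'worker.py' is a hypothetical child script.
reactor.spawnProcess(WorkerProtocol(), sys.executable,
                     [sys.executable, 'worker.py'])
reactor.run()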
ampoule is the first thing I think of when reading your question.
It is a simple process pool implementation which uses the AMP protocol to communicate. You can use the deferToAMPProcess function; it's very easy to use.
You can try something like the Cooperative Multitasking technique described here: http://us.pycon.org/2010/conference/schedule/event/73/ . It's similar to the technique Glyph mentioned and it's worth a try.
You can try to use ZeroMQ with Twisted, but it's really hard and experimental for now :)