Sending many post requests [closed] - python

I'm relatively new to Python and requests, so I'm not sure of the best way to go about this.
I need to send a large number of POST requests to a URL. Right now I'm simply using a loop and sending the requests one after another, which yields roughly 100 posts every 10-30 seconds depending on the connection. I'm looking for a way to do this faster and with more posts. Multiprocessing was recommended to me, but my knowledge here is very lacking (I've already frozen my computer trying to spawn too many processes).
How can I effectively implement multiprocessing to increase my throughput?

Here is a code sample taken from http://skipperkongen.dk/2016/09/09/easy-parallel-http-requests-with-python-and-asyncio/ that may solve your problem. It uses the requests library to make the requests and asyncio for the asynchronous calls. The only change you'd have to make is from a GET call to a POST call.
This was written for Python 3.5 (as stated in the article).
# Example 2: asynchronous requests
import asyncio
import requests

async def main():
    loop = asyncio.get_event_loop()
    futures = [
        loop.run_in_executor(
            None,
            requests.get,
            'http://example.org/'
        )
        for i in range(20)
    ]
    for response in await asyncio.gather(*futures):
        pass

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
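The only adaptation needed for POST is wrapping requests.post so the extra arguments get through, since run_in_executor only forwards positional arguments; a minimal sketch, assuming a placeholder URL and JSON body:
# same pattern as above, switched to POST via functools.partial
import asyncio
import functools
import requests

async def main():
    loop = asyncio.get_event_loop()
    payload = {"key": "value"}  # placeholder request body
    futures = [
        loop.run_in_executor(
            None,
            functools.partial(requests.post, 'http://example.org/', json=payload)
        )
        for i in range(20)
    ]
    for response in await asyncio.gather(*futures):
        print(response.status_code)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())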
I would also recommend reading the entire article as it shows time comparisons when using lots of threads.

There's no reason to use multiprocessing here. Making requests of HTTP servers is almost entirely I/O-bound, not CPU-bound, so threads work just fine.
And the very first example of using ThreadPoolExecutor in the stdlib's concurrent.futures documentation does exactly what you're asking for, except with urllib instead of requests.
If you're doing anything complicated, look at requests-futures.
If you really do need to use multiprocessing for some reason (e.g., you're doing a whole lot of text processing on each result, and you want to parallelize that along with the requesting), you can just switch the ThreadPoolExecutor to a ProcessPoolExecutor and change nothing else in your code.
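For reference, a minimal sketch of that stdlib example adapted to requests and POST (the URL, payloads, and worker count are placeholders, not anything from your code):
# a pool of 20 worker threads firing POST requests concurrently;
# the work is I/O-bound, so the GIL is released while waiting on the network
import concurrent.futures
import requests

URL = 'http://example.org/endpoint'          # placeholder
payloads = [{'id': i} for i in range(100)]   # placeholder request bodies

def post_one(payload):
    return requests.post(URL, json=payload, timeout=10).status_code

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    futures = {executor.submit(post_one, p): p for p in payloads}
    for future in concurrent.futures.as_completed(futures):
        payload = futures[future]
        try:
            print(payload, future.result())
        except Exception as exc:
            print(payload, 'failed:', exc)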

Related

Sending requests to different API endpoints every N seconds [closed]

I use an API that has ~30 endpoints, and I have settings for how often I need to send a request to each endpoint. For some endpoints it's seconds and for some it's hours. I want to implement a Python app that will call each API endpoint (and execute some code) every N seconds, where N can be different for each endpoint. If one call is still in progress when the next one kicks in, the new one should be added to a queue (or something similar) and executed after the first one finishes.
What would be the correct way to implement this in Python?
I have some experience with RabbitMQ, but I think that might be overkill for this problem.
You said "executed after the first one finishes", so it's a single-threaded program.
Just use def to create some functions and then execute them one by one.
For example:
import time

def task1(n):
    print("Task1 start")
    time.sleep(n)
    print("Task1 end")

def task2(n):
    print("Task2 start")
    time.sleep(n)
    print("Task2 end")

task1(5)  # after 5 seconds task1 ends and task2 executes
task2(3)  # task2 needs 3 seconds to execute
You could build your code this way:
- Store the URL, method, and parameters for each type of query somewhere. A dictionary would be nice: {"query1": {"url":"/a","method":"GET","parameters":None}, "query2": {"url":"/b","method":"GET","parameters":"c"}}, but you can do this any way you want, including a database if needed.
- Store a relationship between query type and interval somewhere. Again, you could do this with a case statement, with a dict (maybe the same one you used above), or with an interval column in a database.
- Every N seconds, push the corresponding query entry onto a queue (queue.put).
- A consumer using an HTTP client library such as requests runs continuously: it takes an element off the queue, runs the HTTP request, and when it gets a result it moves on to the next element.
Of course, if your code is going to be distributed across multiple nodes for scalability or high availability, you will need a distributed queue such as RabbitMQ, Ray, or similar.
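A minimal single-machine sketch of that design, using the threading and queue modules and a hypothetical endpoint table (the URLs and intervals are made up):
# one scheduler thread per endpoint pushes work every N seconds;
# a single worker drains the queue, so calls never overlap
import queue
import threading
import time
import requests

ENDPOINTS = {                                   # hypothetical configuration
    "query1": {"url": "https://api.example.com/a", "interval": 5},
    "query2": {"url": "https://api.example.com/b", "interval": 3600},
}
work_queue = queue.Queue()

def scheduler(name, spec):
    while True:
        work_queue.put((name, spec))
        time.sleep(spec["interval"])

def worker():
    while True:
        name, spec = work_queue.get()           # blocks until something is queued
        response = requests.get(spec["url"], timeout=10)
        print(name, response.status_code)       # execute your per-endpoint code here
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
for name, spec in ENDPOINTS.items():
    threading.Thread(target=scheduler, args=(name, spec), daemon=True).start()

while True:                                     # keep the main thread alive
    time.sleep(1)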

Python: Async IO Tasks vs. Threads [closed]

I recently heard of a feature in Python 3.7+ where asyncio added a thing called "tasks", which people refer to as background tasks. So that's my first question:
Do these tasks really run in the background?
Also, when comparing asyncio tasks to threads in Python, we know that Python has a GIL, so there's no true parallelism. I know the difference in core structure, i.e. asyncio tasks run in an event loop inside the same thread, while Python threads are OS threads scheduled by the operating system. But when it comes to speed, neither of these is parallel.
We can call them concurrent instead. So the second question is:
Which of these two would be faster?
A few things I have learned about memory consumption:
Threads consume a fair amount of memory, since each thread needs its own stack. With async code, all the code shares the same stack, and the stack is kept small because it is continuously unwound between tasks.
Threads are OS structures and therefore require more memory for the platform to support. There is no such problem with asynchronous tasks.
References:
What does asyncio.create_task() do?
How does asyncio actually work?
Coming to my last question:
When should you use asyncio tasks compared to threads? (This question came to my mind because we can even fire an async task from sync code.)
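To make the comparison concrete, here is a minimal sketch of the same I/O-bound job written both ways (the sleep calls are just stand-ins for real I/O):
# ten concurrent waits as asyncio tasks vs. as threads;
# both finish in roughly one second, and neither runs Python bytecode in parallel
import asyncio
import threading
import time

async def job_async(n):
    await asyncio.sleep(1)            # stand-in for an awaitable I/O call
    return n

async def main():
    tasks = [asyncio.create_task(job_async(i)) for i in range(10)]
    return await asyncio.gather(*tasks)

def job_threaded(n, results):
    time.sleep(1)                     # stand-in for a blocking I/O call
    results.append(n)

print(asyncio.run(main()))            # ~1 second for 10 tasks

results = []
threads = [threading.Thread(target=job_threaded, args=(i, results)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()                          # also ~1 second for 10 threads
print(results)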

API gets stuck after not so many calls [closed]

I'm working with an API, and the documentation doesn't state the exact limits on the requests I make. This causes my app to suddenly stop working because of long waiting periods and eventually timeouts.
Is there a way to find out what the API limits are and build a workaround, such as "if the API limit is 5 requests per minute, then wait a minute before sending the 6th request"?
The API I'm talking about here is the TD Ameritrade API; documentation:
https://developer.tdameritrade.com/home
I'm coding in Python.
Thanks to anybody who helps.
Edit: The problem was solved; the API can handle 120 calls per minute.
Yes, there is a per-minute limit. It says so at the bottom of this page: https://developer.tdameritrade.com/content/authentication-faq
All non-order based requests by personal use non-commercial applications are throttled to 120 per minute. Exceeding this throttle limit will provide a response with a 429 error code to inform you that the throttle limit has been exceeded.
API calls, especially for private accounts, are restricted in order to preserve processing power for people who pay for the service, like companies do.
After about 2 minutes of searching in the documentation, I managed to find this line:
All private, non-commercial use apps are currently limited to 120 requests per minute on all APIs except for Accounts & Trading
Please read the docs carefully before posting here!
By the way, you can calculate that you have 120 calls / 60 seconds, which means 1 call every 0.5 seconds.
You can simply sleep for that amount of time, or delay the next call in a separate thread, if your app is designed that way.
Since you did not provide any code, I will show you a basic example using sleep:
import time

while True:        # main loop
    apicall()      # your API call here
    time.sleep(1)  # sleep 1 second after each call
But I strongly suggest adding your code to the question, so people can provide better solutions.
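A slightly more defensive sketch that also reacts to the 429 responses mentioned above (the endpoint is a placeholder and authentication is omitted):
# pace requests to stay under 120/minute and back off when a 429 arrives
import time
import requests

URL = "https://api.example.com/endpoint"    # placeholder; use your real endpoint and auth

while True:
    response = requests.get(URL, timeout=10)
    if response.status_code == 429:
        # honor Retry-After if the server sends it, otherwise wait a minute
        wait = int(response.headers.get("Retry-After", 60))
        time.sleep(wait)
        continue
    # ... process the response here ...
    time.sleep(0.5)                         # 2 calls/second stays at the 120/minute limit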

Python: Create a microservice that uses a multiprocess worker pool to answer queries [closed]

I have a Python class named Brish. This class takes time to initialize, and can answer queries à la brish.z("QUERY HERE"). I want to create a microservice for my local machine that accepts queries over a REST HTTP API and uses load balancing across a worker pool (with a fixed capacity of, say, 4 Brish instances) to answer these queries. I want these workers to be separate processes, so that they can take full advantage of the available CPU cores. What libraries/design patterns should I use?
I have worked with Scala's Akka, and I am familiar with the actor model. I have taken a look at multiprocessing.Pool, Ray, Pykka, and aioprocessing, but after the ~2 hours I have spent looking at their docs, I am still confused about which of them is the tool I need.
PS: The REST HTTP API can be replaced with any API that I can easily use from bash.
You can use concurrent.futures, which will allow you to create a pool of processes. The number of processes can be set to the number of cores available, or to however many you want.
There is an excellent video that explains how to use this module; I'd suggest watching it first, then reading the docs.
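As a starting point, here is a minimal sketch of a 4-process pool built with concurrent.futures; Brish and its z() method come from your question, everything else is an assumption (results must be picklable to cross process boundaries):
# one Brish instance per worker process, created once via the pool initializer
import concurrent.futures
import brish

worker_brish = None                           # one instance per process

def init_worker():
    global worker_brish
    worker_brish = brish.Brish()              # the slow initialization happens once per process

def answer(query):
    return worker_brish.z(query)

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=4, initializer=init_worker) as pool:
        futures = [pool.submit(answer, q) for q in ["QUERY 1", "QUERY 2"]]
        for f in futures:
            print(f.result())
An HTTP front end (FastAPI, Flask, or anything reachable from bash via curl) can then submit incoming queries to this pool.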
I found an answer myself, though I don't know how buggy it is.
from typing import Optional
from fastapi import FastAPI, Response
app = FastAPI()
import time
import brish
brishes = [brish.Brish() for i in range(4)]
@app.get("/zsh/")
def cmd_zsh(cmd: str, verbose: Optional[int] = 0):
    while len(brishes) <= 0:
        time.sleep(1)
    myBrish = brishes.pop()
    res = myBrish.z(cmd)
    brishes.append(myBrish)
    if verbose == 0:
        return Response(content=res.outerr, media_type="text/plain")
    else:
        return {"cmd": cmd, "brishes": len(brishes), "out": res.out, "err": res.err, "retcode": res.retcode}

Python Script - Improve Speed [closed]

I have a working Python script that checks the 6,300 or so sites we have to ensure they are up, by sending an HTTP request to each and measuring the response. Currently the script takes about 40 minutes to run completely. I was interested in other ways to speed it up; two thoughts were either threading or running multiple instances.
This is the order of execution now:
- MySQL query to get all of the active domains to scan (6,300 give or take)
- Iterate through each domain and use urllib to send an HTTP request to each
- If the site doesn't return '200', log the result
- Repeat until complete
This seems like it could be sped up significantly with threading, but I am not quite sure how that process flow would look, since I am not familiar with threading.
If someone could offer a sample high-level process flow and any other pointers for working with threading, or offer any other insights on how to improve the script in general, it would be appreciated.
The flow would look something like this:
- Create a domain Queue
- Create a result Queue
- Run the MySQL query to get all of the active domains to scan
- Put the domains in the domain Queue
- Spawn a pool of worker threads and start them
- Each worker gets a domain from the domain Queue, sends a request, and puts the result in the result Queue
- Wait for the threads to finish
- Get everything from the result Queue and log it
You'll probably want to tune the number of threads in the pool rather than spawning 6,300 threads, one per domain. A rough sketch of this flow is shown below.
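A sketch, assuming urllib from the standard library and placeholder values for the worker count, timeout, and domain list:
# worker threads pull domains from one queue and push failures to another
import queue
import threading
import urllib.error
import urllib.request

NUM_WORKERS = 50
domain_queue = queue.Queue()
result_queue = queue.Queue()

def worker():
    while True:
        domain = domain_queue.get()
        if domain is None:                      # sentinel: no more work
            break
        try:
            status = urllib.request.urlopen("http://" + domain, timeout=10).getcode()
        except urllib.error.HTTPError as e:
            status = e.code                     # non-2xx responses raise HTTPError
        except Exception:
            status = None                       # DNS failure, timeout, etc.
        if status != 200:
            result_queue.put((domain, status))

domains = ["example.org", "example.com"]        # would come from the MySQL query
for d in domains:
    domain_queue.put(d)
threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for _ in threads:
    domain_queue.put(None)                      # one sentinel per worker
for t in threads:
    t.join()
while not result_queue.empty():
    print(result_queue.get())                   # log the failures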
You can take a look at the Scrapy framework. It's made for web scraping; it's asynchronous, built on Twisted, and pretty fast.
In your case you can just get the list of domains and check whether each returns 200, without actually scraping anything. It should be much faster.
Here's the link:
http://scrapy.org/
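A minimal sketch of such a status-checking spider (the spider name and domain list are placeholders; run it with scrapy runspider):
import scrapy

class UptimeSpider(scrapy.Spider):
    name = "uptime"
    handle_httpstatus_all = True                 # let non-200 responses reach parse()

    def start_requests(self):
        domains = ["example.org", "example.com"] # would come from the MySQL query
        for domain in domains:
            yield scrapy.Request("http://%s/" % domain, callback=self.parse)

    def parse(self, response):
        if response.status != 200:
            self.logger.warning("%s returned %s", response.url, response.status)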
Threading is definitely what you need. It will remove the serialized nature of your algorithm, and since it is mostly I/O-bound, you will gain a lot by sending HTTP requests in parallel.
Your flow would become:
- MySQL query to get all of the active domains to scan (6,300 give or take)
- Iterate through each domain and create a thread that uses urllib to send an HTTP request to it
- Log the results in the threads
You can make this algorithm better by creating n worker threads with queues and adding domains to the queues, instead of creating one thread per domain. I just wanted to make things a little bit easier for you since you're not familiar with threads.
I guess you should go for threading, while investigating the optimal number of workers to start so you don't overwhelm your client. The Python manual offers good examples; take a look at "Download multiple pages concurrently?" and at the docs for urllib, threading, and multiprocessing.
