I was able to get the response of one API from another but unable to store it somewhere(in a file or something before returning the response)
response=RedirectResponse(url="/apiname/") (I want to access a post request with header and body)
I want to store this response content without returning it.
Yes, if I return the function I will get the results but when I print it I don't find results.
Also, if I give post request then I get error Entity not found.
I read the starlette and fastapi docs but couldn't get the workaround. The callbacks also didn't help.
I didn't exactly get the way to store response without returning using fastapi/starlette directly. But I found a workaround for completing this task.
For the people trying to implement same thing, Please consider this
way.
import requests
def test_function(request: Request, path_parameter: path_param):
request_example = {"test" : "in"}
host = request.client.host
data_source_id = path_parameter.id
get_test_url= f"http://{host}/test/{id}/"
get_inp_url = f"http://{host}/test/{id}/inp"
test_get_response = requests.get(get_test_url)
inp_post_response = requests.post(get_inp_url , json=request_example)
if inp_post_response .status_code == 200:
print(json.loads(test_get_response.content.decode('utf-8')))
Please let me know if there are better approaches.
I have the same problem & I needed to call the third-party API with async way
So I tried many ways & I came solution with requests-async library
and it works for me.
import http3
client = http3.AsyncClient()
async def call_api(url: str):
r = await client.get(url)
return r.text
#app.get("/")
async def root():
...
result_1 = await call_api('url_1')
result_2 = await call_api('url_2')
...
httpx also you can use
this video he is using httpx
Related
I have this code in python:
session = requests.Session()
for i in range(0, len(df_1)):
page = session.head(df_1['listing_url'].loc[i], allow_redirects=False, stream=True)
if page.status_code == 200:
df_1['condition'][i] = 'active'
else:
df_1['condition'][i] = 'false'
df_1 is my data frame and the column "listing_url" have more than 500 lines.
I want to Request if the URL list is active and append this in my data frame. But this code demands a long time. How can I reduce my time?
The problem with your current approach is that requests runs sequentially (synchronously), which means that a new request can't be sent before the prior one is finished.
What you are looking for is handling those requests asynchronously. Sadly, requests library does not support asynchronous requests. A newer library that has similar API to requests but can do that is httpx. aiohttp is another popular choice. With httpx you can do something like this:
import asyncio
import httpx
listing_urls = list(df_1['listing_url'])
async def do_tasks():
async with httpx.AsyncClient() as client:
tasks = [client.head(url) for url in listing_urls]
responses = await asyncio.gather(*tasks)
return {r.url: r.status_code for r in responses}
url_2_status = asyncio.run(do_tasks())
This will give you a mapping of {url: status_code}. You should be able to go from there.
This solution assumes you are using Python3.7 or newer. Also remember to install httpx.
I would like to write a FastAPI endpoint, with a Swagger page (or something similar) that will accept a non-encoded URL as input. It should preferably use GET, not POST method.
Here's an example of a GET endpoint that does require double-URL encoding.
#app.get("/get_from_dub_encoded/{double_encoded_url}")
async def get_from_dub_encoded(double_encoded_url: str):
"""
Try https%253A%252F%252Fworld.openfoodfacts.org%252Fapi%252Fv0%252Fproduct%252F7622300489434.json
"""
original_url = urllib.parse.unquote(urllib.parse.unquote(double_encoded_url))
response = requests.get(original_url)
return response.json()
Which generates a Swagger interface as below.
The following PUT request does solve my problem, but the simplicity of a GET request with a form is better for my co-workers.
class InputModel(BaseModel):
unencoded_url: AnyUrl = Field(description="An unencoded URL for an external resource", format="url")
#app.post("/unencoded-url")
def unencoded_url(inputs: InputModel):
response = requests.get(inputs.unencoded_url)
return response.json()
How can I deploy a convenient interface like that without requiring users to write the payload for a PUT request or to perform double URL encoding?
This post has some helpful related discussion, but doesn't explicitly address the FORM solution: How to pass URL as a path parameter to a FastAPI route?
You can use Form instead of query parameters as payload.
from fastapi import FastAPI, Form
import requests
app = FastAPI()
#app.post("/")
def get_url(url: str = Form()):
"""
Try https://world.openfoodfacts.org/api/v0/product/7622300489434.json
"""
response = requests.get(url)
return response.json()
Swagger interface would look like:
You'll need to install python-multipart.
Tip: Don't use async endpoint if you are using requests or any non-async library.
I want to use HTTPX (within FastAPI, if that matters) to make asynchronous http requests to an outside API and store the responses as individual variables for processing in slightly different ways depending on which URL was fetched. I'm modifying the code from this StackOverflow answer.
import asyncio
import httpx
async def perform_request(client, url):
response = await client.get(url)
return response.text
async def gather_tasks(*urls):
async with httpx.AsyncClient() as client:
tasks = [perform_request(client, url) for url in urls]
result = await asyncio.gather(*tasks)
return result
async def f():
url1 = "https://api.com/object=562"
url2 = "https://api.com/object=383"
url3 = "https://api.com/object=167"
url4 = "https://api.com/object=884"
result = await gather_tasks(url1, url2, url3, url4)
# print(result[0])
# print(result[1])
# DO THINGS WITH url2, SOMETHING ELSE WITH url4, ETC.
if __name__ == '__main__':
asyncio.run(f())
What's the best way to access the individual responses? (If I use result[n] I wouldn't know which response I'm working with.)
And I'm pretty new to httpx and async operations in general so please share if you have any suggestions for how to achieve it in a better way.
Regardless of AsyncIO, I would probably put the logic inside gather_tasks. There you know the response, and you can define all the if else logic you want to proceed with the right path.
In my opinion you have two options:
1 - Process the request right away
In this case f would only initialize the urls and trigger the processing, everything else would happen inside gather_tasks.
2 - "Enrich" the response
In gather_tasks you can understand which kind of operation to do next, and "attach" to the response some sort of code to define it. For example, you could return a dict with two keys: response and operation. This would be the most explicit way of doing this, but you could also use a list or a tuple, you just need to know where the response and the "next step code" is within them.
This is useful if the further processing must happen later instead of right away.
Makes sense?
Apologies for asking with what may be considered redundant, but I'm finding it extremely difficult to figure out what are the current recommended best practices for using asyncio and aiohttp.
I'm working with an API that ultimately returns a link to a generated CSV file. There are two steps in using the API.
Submit request the triggers a long running process and returns a status URL.
Poll the status URL until the status_code is 201 and then get the URL of the CSV file from the headers.
Here's a stripped down example of how I can successfully do this synchronously with requests.
import time
import requests
def submit_request(id):
"""Submit request to create CSV for specified id"""
body = {'id': id}
response = requests.get(
url='https://www.example.com/endpoint',
json=body
)
response.raise_for_status()
return response
def get_status(request_response):
"""Check whether the CSV has been created."""
status_response = requests.get(
url=request_response.headers['Location']
)
status_response.raise_for_status()
return status_response
def get_data_url(id, poll_interval=10):
"""Submit request to create CSV for specified ID, wait for it to finish,
and return the URL of the CSV.
Wait between status requests based on poll_interval.
"""
response = submit_request(id)
while True:
status_response = get_status(response)
if status_response.status_code == 201:
break
time.sleep(poll_interval)
data_url = status_response.headers['Location']
return data_url
What I'd like to do is be able to submit a group of requests at once, and then wait on all of them to be finished. But I'm not clear on how to structure this with asyncio and aiohttp.
One option would be to first submit all of the requests and then use await.gather (or something) to get all of the status URLs. Then start another event loop where I continuously poll the status_urls until they have all completed and I end up with a list of data URLs.
Alternatively, I suppose I could create a single function that submits the request, gets the status URL, and then polls that until it completes. In that case I would just have a single event loop where I submit each of the IDs that I want processed.
If some pseudo code for those options would be useful I can try to provide it. I've looked at a lot of different examples where you submit requests for a bunch of URLs asynchronously -- this for example -- but I'm finding that I get a bit lost when trying to translate them to this slightly more complicated scenario where I submit the request and then get back a new URL to poll.
FYI based on the comments above my current solution is something like this.
import asyncio
import aiohttp
async def get_data_url(session, id):
url = 'https://www.example.com/endpoint'
body = {'id': id}
async with session.post(url=url, json=body) as response:
response.raise_for_status()
status_url = response.headers['Location']
while True:
async with session.get(url=status_url) as status_response:
status_response.raise_for_status()
if status_response.status == 201:
return status_response.headers['Location']
await asyncio.sleep(10)
async def main(access_token, id):
headers = {'token': access_token}
async with aiohttp.ClientSession(headers=headers) as session:
data_url = await get_data_url(session, id)
return data_url
This works though I'm still not sure on best practices for submitting a set of IDs. I think asyncio.gather would work but it looks like it's deprecated. Ideally I would have a queue of say 100 IDs and only have 5 requests running at any given time. I've found some examples like this but they depend on asyncio.Queue which is also deprecated.
I am writing a web crawler using asyncio/aiohttp. I want the crawler to only want to download HTML content, and skip everything else. I wrote a simple function to filter URLS based on extensions, but this is not reliable because many download links do not include a filename/extension in them.
I could use aiohttp.ClientSession.head() to send a HEAD request, check the Content-Type field to make sure it's HTML, and then send a separate GET request. But this will increase the latency by requiring two separate requests per page (one HEAD, one GET), and I'd like to avoid that if possible.
Is it possible to just send a regular GET request, and set aiohttp into "streaming" mode to download just the header, and then proceed with the body download only if the MIME type is correct? Or is there some (fast) alternative method for filtering out non-HTML content that I should consider?
UPDATE
As requested in the comments, I've included some example code of what I mean by making two separate HTTP requests (one HEAD request and one GET request):
import asyncio
import aiohttp
urls = ['http://www.google.com', 'http://www.yahoo.com']
results = []
async def get_urls_async(urls):
loop = asyncio.get_running_loop()
async with aiohttp.ClientSession() as session:
tasks = []
for u in urls:
print(f"This is the first (HEAD) request we send for {u}")
tasks.append(loop.create_task(session.get(u)))
results = []
for t in asyncio.as_completed(tasks):
response = await t
url = response.url
if "text/html" in response.headers["Content-Type"]:
print("Sending the 2nd (GET) request to retrive body")
r = await session.get(url)
results.append((url, await r.read()))
else:
print(f"Not HTML, rejecting: {url}")
return results
results = asyncio.run(get_urls_async(urls))
This is a protocol problem, if you do a GET, the server wants to send the body. If you don't retrieve the body you have to discard the connection (this is in fact what it does if you don't do a read() before __aexit__ on the response).
So the above code should do more of less what you want. NOTE the server may send in the first chunk already more than just the headers