Create url without request execution - python

I am currently using the Python requests package to make JSON requests. Unfortunately, the service I need to query has a daily maximum request limit. Right now, I cache the executed request URLs, so in case I go over this limit, I know where to continue the next day.
r = requests.get('http://someurl.com', params=request_parameters)
log.append(r.url)
However, to use this log the next day I need to create the request URLs in my program before actually sending the requests, so I can match them against the strings in the log. Otherwise, it would count against my daily limit. Does anybody have an idea how to do this? I didn't find an appropriate method in the requests package.

You can use prepared requests (PreparedRequest objects).
To get the URL, you can construct your own Request object and prepare it:
from requests import Session, Request
s = Session()
p = Request('GET', 'http://someurl.com', params=request_parameters).prepare()
log.append(p.url)
Later, when you're ready to send, you can just do this:
r = s.send(p)
The relevant section of the documentation is here.
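For example, you could prepare all URLs up front, skip the ones already in yesterday's log, and only send the rest. A minimal sketch, assuming the log file name and the parameter_sets iterable (neither is in the question):

from requests import Session, Request

s = Session()
# Hypothetical: URLs already requested, loaded from yesterday's log file
done = set(open('request_log.txt').read().splitlines())

for request_parameters in parameter_sets:  # parameter_sets is an assumption
    p = s.prepare_request(Request('GET', 'http://someurl.com', params=request_parameters))
    if p.url in done:
        continue  # already fetched within a previous day's quota
    r = s.send(p)
    with open('request_log.txt', 'a') as log:
        log.write(p.url + '\n')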

Related

History of retries using Request Library

I'm building a new retry feature into my Orchestrate script, and I want to know how many times my request method retried and, if possible, which error it got when trying to connect to a specific URL.
For now, I need this for logging purposes: I'm working on a messaging system and may need this retry information to understand when and why I'm facing problems with HTTP requests, since I work in a micro-service environment.
So far, I have debugged and verified that the retries work as expected (I have a mocked Flask server for all the micro-services we use), but I couldn't find a way to get the retry history data.
In other words, I want to see, for example, whether a specific micro-service only responded after the third request, and that kind of thing.
Below is the code that I'm using now:
from requests import exceptions, Session
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter
def open_request_session():
    # Default retries configs
    toggle = True  # loaded from config file
    session = Session()
    if toggle:
        # loaded from config file as well
        parameters = {'total': 5,
                      'backoff_factor': 0.0,
                      'http_error_to_retry': [400]}
        retries = Retry(total=parameters['total'],
                        backoff_factor=parameters['backoff_factor'],
                        status_forcelist=parameters['http_error_to_retry'],
                        # Do not force an exception when False
                        raise_on_status=False)
        session.mount('http://', HTTPAdapter(max_retries=retries))
        session.mount('https://', HTTPAdapter(max_retries=retries))
    return session
# Request
with open_request_session() as request:
    my_response = request.get(url, timeout=10)
I see in the urllib3 documentation that Retry has a history attribute, but when I try to consult the attribute it is empty.
I don't know if I'm doing something wrong or forgetting something, since software development is not my strongest skill.
So, I have two questions:
Does anyone know a way to get this history information?
How can I create tests to verify that the retry behavior works as expected? (So far I have only tested it in debug mode.)
I'm using Python 3.6.8.
I know that I could write a while loop to 'control' this, but I'm trying to avoid that complexity. That's why I'm here: I'm looking for an alternative based on Python and community best practices.
A bit late, but I just figured this out myself so thought I'd share what I found.
Short answer:
response.raw.retries.history
will get you what you are looking for.
Long answer:
You cannot get the history off the original Retry instance created. Under the covers, urllib3 creates a new Retry instance for every attempt.
urllib3 does store the last Retry instance on the response when one is returned. However, the response from the requests library is a wrapper around the urllib3 response. Luckily, requests stores the original urllib3 response on the raw field.
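For example, with the session built by open_request_session() above, the retry history can be read from the urllib3 response attached to the requests response. A sketch; the URL is a placeholder, and the history is only non-empty if retries were actually triggered:

with open_request_session() as session:
    my_response = session.get('http://some-micro-service/endpoint', timeout=10)

# Each entry is a urllib3 RequestHistory namedtuple:
# (method, url, error, status, redirect_location)
for attempt in my_response.raw.retries.history:
    print(attempt.method, attempt.url, attempt.status, attempt.error)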

Debugging a python requests module 400 error

I'm doing a POST request using Python and the Confluence REST API in order to update Confluence pages via a script.
I ran into a problem which caused me to receive a 400 error in response to a
requests.put(url, data = jsonData, auth = (username, passwd), headers = {'Content-Type' : 'application/json'})
I spent some time on this to discover that the reason for it was me not supplying an incremented version when updating the content. I have managed to make my script work, but that is not the point of this question.
During my attempts to make this work, I swapped from requests to an http.client connection. Using this module, I get a lot more information regarding my error:
b'{"statusCode":400,"data":{"authorized":false,"valid":true,"allowedInReadOnlyMode":true,"errors":[],"successful":false},"message":"Must supply an incremented version when updating Content. No version supplied.","reason":"Bad Request"}'
Is there a way for me to get the same feedback information while using requests? I've turned on logging, but this kind of info is never shown.
You're looking for
response.json()
It returns the body of the response parsed as a dictionary, so the same error details shown above are available from requests as well.
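For instance, with the PUT call from the question (url, jsonData, username and passwd as defined there), the Confluence error message can be read straight off the requests response. A minimal sketch:

import requests

r = requests.put(url, data=jsonData, auth=(username, passwd),
                 headers={'Content-Type': 'application/json'})
if r.status_code == 400:
    print(r.text)                   # raw error body as a string
    print(r.json().get('message'))  # e.g. "Must supply an incremented version when updating Content. ..."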

Python requests caching authentication headers

I have used Python's requests module to do a POST call (within a loop) to a single URL with varying sets of data in each iteration. I already reuse a session so that the underlying TCP connection is reused across the calls in the loop.
However, I want to further speed up my processing by 1. caching the URL and the authentication values (user ID and password), as they remain the same in each call, and 2. spawning multiple sub-processes, each taking a batch of calls, so that these smaller batches are processed in parallel.
Please note that I pass my authentication as a base64-encoded header, and pseudocode for my POST call would typically look like this:
s = requests.Session()
url = 'https://example.net/'
for record in data_records:  # loop through data records
    headers = {'authorization': authbase64string}  # plus other headers
    data = "data for this loop"
    # POST call
    r = s.post(url, data=data, headers=headers)
    response = r.json()
# end of loop and program
Please review the scenario and suggest any techniques/tips which might help.
Thanks in advance,
Abhishek
You can:
do it as you described (if you want to make it faster, you can run it using multiprocessing) and, for example, add the headers to the session instead of to each request
modify the target server to accept one POST request carrying multiple data records (so you limit the time spent on connecting, etc.)
do some optimizations on the server side so it replies faster (or have it just store the requests and send you the response later via a callback)
It would be much easier if you described the use case :)
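A minimal sketch of the first suggestion, setting the authorization header on the session once and splitting the records across a small process pool; the URL, the header value and data_records are placeholders:

import requests
from multiprocessing import Pool

url = 'https://example.net/'
authbase64string = '...'  # placeholder for the base64 credentials

def post_batch(batch):
    # One session per worker process: headers set once, TCP connection reused
    with requests.Session() as s:
        s.headers.update({'authorization': authbase64string})
        return [s.post(url, data=data).json() for data in batch]

if __name__ == '__main__':
    data_records = ['data1', 'data2', 'data3', 'data4']  # placeholder records
    batches = [data_records[i::4] for i in range(4)]     # split into 4 batches
    with Pool(processes=4) as pool:
        results = pool.map(post_batch, batches)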

In requests_cache, does the sqlite file contain the cached request time and age?

Consider the following code:
import requests
import requests_cache
requests_cache.install_cache(expire_after=7200)
url = 'http://www.example.com'
with requests.Session() as sess:
    response = sess.get(url)
    print(response.text)
First run
When I first run this code, I am sure that the GET request is sent out to www.example.com, since no cache has been set up yet. I will then see a file named cache.sqlite in the working directory, which contains the request being cached inside it.
The first process will then exit, erasing all traces of it from RAM.
Second run, maybe 2000 seconds later
What else does requests_cache.install_cache do? Aside from "installing" a cache, does it also tell the present Python session: "Hey, there's a cache present right now, you might want to look into it before sending out new requests"?
So, my question is, does the new instance of my script process respect the existing cache.sqlite or does it create an entirely new one from scratch?
If not, how do I make sure that it will look up the existing cache first before sending out new requests, and also consider the age of the cached requests?
Here's what's going on under the hood:
requests_cache.install_cache() globally patches out requests.Session with caching behavior.
install_cache() takes a number of optional arguments to tell it where and how to cache responses, but by default it will create a SQLite database in the current directory, as you noticed.
A cached response will be stored along with its expiration time, in response.expires
The next time you run your script, install_cache() will load the existing database instead of making a new one
The next time you make that request, the expiration time will be checked against the current time. If it's expired, a new request will be sent and the old cached response will be overwritten with the new one.
Here's an example that makes it more obvious what's going on:
from requests_cache import CachedSession
session = CachedSession('~/my_project/requests_cache.db', expire_after=7200)
session.get('http://www.example.com')             # first request is sent to the server and cached
response = session.get('http://www.example.com')  # second request is served from the cache
# Show if the response came from the cache, when it was created, and when it expires
print(f'From cache: {response.from_cache}')
print(f'Created: {response.created_at}')
print(f'Expires: {response.expires}')
# You can also get a summary from just printing the cached response object
print(response)
# Show where the cache is stored, and currently cached request URLs
print(session.cache.db_path)
for url in session.cache.urls:
    print(url)
And for reference, there is now more thorough user documentation that should answer most questions about how requests-cache works and how to make it behave the way you want: https://requests-cache.readthedocs.io

Using ETag in feedparser

I'm writing a Django view that gets the latest blog posts of a wordpress system.
def __get_latest_blog_posts(rss_url, limit=4):
    feed = feedparser.parse(rss_url)
    return something
I tried in a terminal to use ETags:
>>> import feedparser
>>> d = feedparser.parse("http://a real url")
>>> d.etag
u'"2ca34419a999eae486b5e9fddaa2b2b9"'
>>> d2 = feedparser.parse("http://a real url", d.etag)
I'd like to avoid requesting the feed for every user of the web app. Maybe ETags aren't the best option?
Once the first user hits this view, can I store the ETag and use it for all the other users? Or is there a thread for every user, so that I can't share the value of a variable this way?
An ETag marks a unique state of a web resource, so that you can ask for the resource again while telling the server which state you already have.
But to have any version at the client at all, you have to fetch it a first time, so the ETag is irrelevant for the very first request.
See HTTP ETag on Wikipedia; it explains it all.
The typical scenario is:
fetch your page the first time and store the value of the ETag header for future use
the next time you ask for the same page, add the header If-None-Match with the ETag value from your last fetch. The server will check whether there is something new: if the ETag you provide and the ETag of the current version of the resource are the same, it will not return the complete page, but rather HTTP status code 304 Not Modified. If the page has a different state on the server, you get the page with HTTP status code 200 and a new ETag value in the response header.
If you want to optimize your app so that it doesn't make the initial request for the same feed for each user, you have to somehow share the ETag value for a given resource globally across your application (a sketch follows below).
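A minimal sketch of that idea using Django's cache framework to share both the ETag and the parsed entries between users; it assumes a configured cache backend, and the cache keys are arbitrary:

import feedparser
from django.core.cache import cache

def __get_latest_blog_posts(rss_url, limit=4):
    etag = cache.get('blog_feed_etag')
    entries = cache.get('blog_feed_entries')
    # Conditional fetch: feedparser sends If-None-Match when an etag is given
    feed = feedparser.parse(rss_url, etag=etag)
    if getattr(feed, 'status', None) == 304 and entries is not None:
        return entries[:limit]  # unchanged on the server, reuse the shared copy
    entries = feed.entries
    cache.set('blog_feed_etag', getattr(feed, 'etag', None))
    cache.set('blog_feed_entries', entries)
    return entries[:limit]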
On the first request the client can never use any local cache, so for the first request an ETag isn't useful. Remember that the ETag has to be passed in the conditional request headers (If-None-Match, If-Match, etc.); the semantics of non-conditional requests are unaffected.
If your feed is a public feed, then an intermediate caching proxy is also allowed to return an ETagged result for a non-conditional request, although it will always have to contact the origin server if the conditional header doesn't match.
