I am trying to scrape some data (reproducing a POST request I made in a browser) using the Python Requests library. I copied the request headers and the POST form data, expecting to get back the same content I saw in the browser.
However, I am not quite sure what the correct way is to send cookies using Python Requests.
Here is a screenshot of how it looks in Chrome.
It looks like I can either use cookie as a key in the request header or use the cookie argument of the post call.
(1) If I want to send the cookies in the request header, should I treat the whole cookie content as one long string, set the key to cookie, and set the value to that string?
(2) If I want to use the cookie argument of requests.post, should I first manually translate that long string into a dictionary and then pass it to the cookie argument? Something like this?
mycookie = {'firsttimevisitor': 'N', 'cmTPSet': 'Y', 'viewType': 'List'}  # ... plus the remaining cookies
# Then
r = requests.post(myurl, data=myformdata, cookies=mycookie, headers=myheaders)
Just follow the documentation:
>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')
>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
So use a dict for the cookies, and note that the keyword argument is cookies, not cookie.
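If you start from the raw Cookie string copied out of Chrome's developer tools, a minimal sketch of turning it into the dict that cookies= expects could look like this. The helper name cookie_string_to_dict is hypothetical, and it assumes the usual "name=value; name=value" format without quoted values that themselves contain semicolons.

```python
def cookie_string_to_dict(cookie_header):
    """Split a browser-style Cookie header ("a=1; b=2") into a dict."""
    cookies = {}
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled semicolons
        # partition() splits on the first "=" only, so values may contain "="
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


mycookie = cookie_string_to_dict("firsttimevisitor=N; cmTPSet=Y; viewType=List")
print(mycookie)  # {'firsttimevisitor': 'N', 'cmTPSet': 'Y', 'viewType': 'List'}
```

The resulting dict can then be passed as requests.post(myurl, data=myformdata, cookies=mycookie, headers=myheaders).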
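Yes. Name the header "Cookie" (with a capital C), which is how browsers send it. Strictly speaking, HTTP header names are case-insensitive, and requests treats them that way too.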
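For the header route, a minimal sketch: the whole copied string goes in as a single header value. It builds a requests.Request and calls prepare() so you can inspect what would be sent without making a network call; the cookie values, URL and form data are placeholders.

```python
import requests

raw_cookie = "firsttimevisitor=N; cmTPSet=Y; viewType=List"  # placeholder values

# Build the request but do not send it; prepare() shows the final headers.
prepared = requests.Request(
    "POST",
    "http://httpbin.org/post",
    headers={"Cookie": raw_cookie},
    data={"q": "example"},  # hypothetical form data
).prepare()

print(prepared.headers["Cookie"])
print(prepared.headers["cookie"])  # same value: requests' header lookup ignores case
```

To actually send it, pass the same headers dict to requests.post(myurl, data=myformdata, headers=myheaders).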
I always did it with a dict. Requests expects cookies to be a dict (or a CookieJar); passing a string gives the following error:
cookiejar.set_cookie(create_cookie(name, cookie_dict[name]))
TypeError: string indices must be integers, not str
I've just discovered something strange. When downloading data from Facebook with GET using the requests 2.18.4 library, I get an error when I just use
requests.get('https://.../{}/likes?acces_token={}'.format(userID,token))
into which I pass the user ID and access token: the API does not read the access token correctly.
But it works fine as
requests.get('https://../{}'.format(userID), params={"access_token":token})
It also works when I copy and paste the values into the appropriate fields by hand in the Python console.
So my hypothesis is that it has something to do with how the token string gets parsed when passed via params versus inside the string. But I don't understand at all why that would be the case. Or is the ? character somehow special here?
Double-check that both URLs are the same: in your post they differ by the /likes substring, and the first one spells the parameter acces_token (missing an s).
Then you can check how requests appends the parameters from the params argument to the URL:
import requests

url = 'https://facebook.com/.../{}'.format(userID)
# Build the request without sending it, then inspect the final URL
r = requests.Request('GET', url, params={"access_token": token})
pr = r.prepare()
print(pr.url)
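One plausible explanation, assuming the token contains characters that are special in a query string: params URL-encodes its values (requests builds the query with urllib's urlencode), while string formatting pastes them in raw. This stdlib sketch uses a made-up token and a placeholder host to show how an unencoded + arrives at the server as a space.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

token = "abc+def/ghi="  # made-up token containing characters that need escaping
base = "https://graph.example.com/me"  # placeholder URL

naive_url = "{}?access_token={}".format(base, token)  # like the str.format version
encoded_url = "{}?{}".format(base, urlencode({"access_token": token}))  # like params=


def token_seen_by_server(url):
    # Decode the query string the way a typical server would
    return parse_qs(urlsplit(url).query)["access_token"][0]


print(encoded_url)                        # https://graph.example.com/me?access_token=abc%2Bdef%2Fghi%3D
print(token_seen_by_server(naive_url))    # abc def/ghi=
print(token_seen_by_server(encoded_url))  # abc+def/ghi=
```

Printing pr.url from the snippet above for both variants is a quick way to confirm whether this is what happens with the real token.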
I am trying to write a requests.get statement with two URLs in it. What I am aiming for is to have requests (the Python module) make two requests based on a list or two strings I provide. How can I pass multiple strings from a list into requests.get and have requests go to each URL (string) and do something?
Typically, the Python requests library makes one GET request per call. If what you are trying to do is perform multiple requests for a list of known URLs, it's quite easy with a loop:
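import requests

my_links = ['https://www.google.com', 'https://www.yahoo.com']  # requests needs the scheme
my_responses = []
for link in my_links:
    # Each call performs one synchronous GET request
    payload = requests.get(link).text  # these pages return HTML, so .json() would fail
    print('got response from {}'.format(link))
    my_responses.append(payload)
    print(payload)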
my_responses now has all the content from the pages.
You don't. The requests.get() method (or any other method, really) takes a single URL and makes a single HTTP request, because that is what most humans want it to do.
If you need to make two requests, you must call that method twice.
requests.get(url)
requests.get(another_url)
Of course, these calls are synchronous: the second one only begins once the first response has been received.
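If you want the requests to overlap instead of waiting on each other, one option (not something requests does by itself) is to call requests.get from a thread pool. A minimal sketch with the standard library's concurrent.futures; fetch_all is a hypothetical helper, and the demo passes str.upper as a stand-in for the fetch function so it runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=4):
    """Run fetch(url) for every URL in a thread pool.

    Results come back in the same order as `urls`. An exception raised
    by fetch() is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


# Usage with requests (needs network access, so not executed here):
# import requests
# pages = fetch_all(
#     ["https://www.google.com", "https://www.yahoo.com"],
#     lambda url: requests.get(url, timeout=10).text,
# )

print(fetch_all(["a", "b", "c"], str.upper))  # ['A', 'B', 'C']
```

Threads suit this case because the work is mostly waiting on the network; each individual requests.get call is still one synchronous request.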
I am trying to automate a web page request using mechanize in Python.
When I add custom headers like X-Session='abc' and X-Auth='123' using the addheaders attribute:
object=mechanize.Browser()
object.addheaders=[('X-Session','abc'),('X-Auth','123')]
It changes those headers to X-session and X-auth.
I believe that because of this, the server is not able to authenticate me.
Can anybody help me preserve the case?
Mechanize expects each header as a two-item tuple: the first item is the header name and the second is the value, so you must do:
object.addheaders=[('X-Session','abc'), ('X-Auth','123')]
(Two tuples of two elements instead of one tuple with 4 elements).
To check the headers that Mechanize will send with a request, you can do:
print(request.header_items())
This should print something like:
[('X-Session','abc'), ('X-Auth','123')]
Doc: http://wwwsearch.sourceforge.net/mechanize/doc.html#adding-headers
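The renaming described in the question matches what str.capitalize() does, which is how urllib2-style Request.add_header normalizes header names. mechanize is built on a fork of urllib2, so, assuming it behaves the same way, this sketch with the standard library's urllib.request (Python 3) reproduces the effect without any network access. HTTP header names are case-insensitive, so a conforming server should treat X-session and X-Session as the same header.

```python
import urllib.request

req = urllib.request.Request("http://example.com/")  # nothing is sent
req.add_header("X-Session", "abc")
req.add_header("X-Auth", "123")

# add_header stores key.capitalize(): only the first letter stays uppercase
print(req.header_items())        # [('X-session', 'abc'), ('X-auth', '123')]
print("X-Session".capitalize())  # X-session
```

If authentication still fails, the header case is therefore unlikely to be the cause on its own; checking the header values and any required cookies would be a better next step.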
I'm trying to understand what exactly I get back when I make a POST request using the Requests module. Is it always JSON? Every response I get appears to be JSON, but I'm not sure.
Where r is my response object, when I do:
print(r.apparent_encoding)
It always seems to return ascii
And when I try type():
>>> print(type(r))
<class 'requests.models.Response'>
I pasted the output of print(r.text) into a JSON validator, and it reported no errors. So should I assume Requests is providing me with JSON objects here?
A response can be anything. If you've posted to a REST endpoint, it will usually respond with JSON. If so, you can decode it with the response's .json() method.
But it's perfectly possible for you to post to a normal web URL, in effect pretending to be a browser, and unless the server is doing something really clever it will just respond with the standard HTML it would serve to the browser. In that case, doing response.json() will raise a ValueError.
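A minimal sketch of handling both cases: look at the Content-Type header the server sent, try .json() only when it claims JSON, and fall back to the raw text otherwise. The helper body_as_json_or_text and the FakeResponse stand-in are hypothetical; FakeResponse only mimics the three attributes of requests.Response that the helper uses, so the demo runs offline.

```python
import json


def body_as_json_or_text(response):
    """Decode the body as JSON when the server says it is JSON, else return text.

    `response` is expected to be a requests.Response; anything with
    .headers, .json() and .text works the same way.
    """
    content_type = response.headers.get("Content-Type", "")
    if "json" in content_type.lower():  # e.g. "application/json; charset=utf-8"
        try:
            return response.json()
        except ValueError:  # declared JSON, but the body did not parse
            pass
    return response.text


class FakeResponse:
    """Hypothetical stand-in for requests.Response, for offline testing only."""

    def __init__(self, content_type, text):
        self.headers = {"Content-Type": content_type}
        self.text = text

    def json(self):
        return json.loads(self.text)


print(body_as_json_or_text(FakeResponse("application/json", '{"ok": true}')))  # {'ok': True}
print(body_as_json_or_text(FakeResponse("text/html", "<html></html>")))  # <html></html>
```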
No, the response text for a POST request is entirely up to the web service. A good REST API that speaks JSON will respond with JSON consistently, but you will not always get that.
Example
A common pattern in PHP is
<?php
$successful_whatever = false;
if (isset($_POST['whatever'])) {
    # put $_POST['whatever'] in a database
    $successful_whatever = true;
}
echo $twig->render('gallery.twig',
    array('successful_whatever' => $successful_whatever));
?>
As you can see, the response text will be a rendered template (HTML). I'm not saying this is good, just that it is common.
I would like to create a Python script that makes a series of web requests to endpoints that return JSON. After receiving each response, I want to validate that it is well-formed JSON and not some error page. I will also need to insert an API key into the request header to get a proper response. What is the best way to go about this in Python?
To validate the JSON against a schema, you can use jsonschema: http://python-jsonschema.readthedocs.org/en/latest/
To insert the API key in the header you can use Requests http://docs.python-requests.org/en/latest/user/quickstart/
To validate the JSON, parse it with json.loads() (not json.dump(), which writes JSON out). If the text is not valid JSON, it raises a ValueError (json.JSONDecodeError in Python 3.5+), which you can handle with try/except. As for your second question, I think you are talking about an X-AUTH-TOKEN header; if you are sending a JSON body, use these headers:
headers = {
    "Content-Type": "application/json",
    "X-AUTH-TOKEN": "your-api-token",  # replace with your actual API token
}
otherwise, for form-encoded data, use
"Content-Type": "application/x-www-form-urlencoded"
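Putting both answers together, a minimal sketch of the script the question describes: send each request with the API key in a header, then check whether the body parses as JSON. The header name X-AUTH-TOKEN follows the answer above but is an assumption (use whatever name your API documents), and build_headers and validate_json are hypothetical helpers. This only checks well-formedness; checking the structure against a schema would be a further step with jsonschema.

```python
import json


def build_headers(api_key, header_name="X-AUTH-TOKEN"):
    # header_name is an assumption: replace it with the one your API expects
    return {"Accept": "application/json", header_name: api_key}


def validate_json(text):
    """Return (True, data) if text is well-formed JSON, else (False, error message)."""
    try:
        return True, json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        return False, str(exc)


# Usage with requests (placeholder URLs, needs network access):
# import requests
# for url in ["https://api.example.com/a", "https://api.example.com/b"]:
#     r = requests.get(url, headers=build_headers("my-api-key"), timeout=10)
#     ok, result = validate_json(r.text)
#     print(url, r.status_code, "valid JSON" if ok else "not JSON: " + result)

print(validate_json('{"status": "ok"}'))       # (True, {'status': 'ok'})
print(validate_json("<html>Error</html>")[0])  # False
```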