I am new to web requests. I saw a piece of code that does HTML-to-PDF conversion like this:
headers = {'content-type': 'text/html', 'accept': 'application/pdf'}
urllib2.Request(url, data=html, headers=headers) # html is a string and it works fine
The url does the PDF conversion and it needs the HTML as input.
Why is the 'data' keyword argument so important? Why can't it be clubbed in as just another param?
I would have thought that urllib2.Request(url, params={'data': html}) would work, where data is just one of the key-value pairs and the server does its processing accordingly.
Why do we need 'data' as something separate from the other parameters?
Is it because we specify 'content-type' in the headers and that is bound to the data keyword by convention?
I am writing an API that takes everything in a request as a keyword argument, for a simple library. So I would like to know when data is required and when it is not. I do understand params, but is 'data' mandatory, or is it only for POST requests where you have a specific content type to send to the server? What if I have multiple content types?
When the data argument is provided, the request is sent as a POST. It is not mandatory: it can be None, and if it is None (or not provided) the request is sent as a GET. This is all described here: http://docs.python.org/2/library/urllib2.html#urllib2.Request
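As a minimal sketch (using httpbin.org, which is not part of the original question), you can check which method urllib2 will use via get_method():

import urllib2

req_get = urllib2.Request('http://httpbin.org/get')  # no data -> sent as GET
req_post = urllib2.Request('http://httpbin.org/post', data='html goes here')  # data given -> sent as POST
print(req_get.get_method())   # GET
print(req_post.get_method())  # POST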
Does requests also have the same convention? I ask because in
requests we have requests.get. So would requests.get(url, data=something)
be converted to a POST? And how is this data seen on the server
side, any idea?
requests.get(url, data="test") would be sent as a GET request with "test" as the body of the request. This is the raw HTTP request:
GET /headers HTTP/1.1\r\nHost: httpbin.org\r\nContent-Length: 4\r\nAccept-Encoding: gzip, deflate, compress\r\nAccept: */*\r\nUser-Agent: python-requests/2.2.1 CPython/2.7.5 Windows/7\r\n\r\ntest
Formatted:
GET /headers HTTP/1.1
Host: httpbin.org
Content-Length: 4
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/2.2.1 CPython/2.7.5 Windows/7
test
The server will in most cases just ignore it.
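You can verify this yourself; a quick sketch against httpbin.org (which echoes the request back) might look like:

import requests

r = requests.get('https://httpbin.org/anything', data='test')  # a GET with a body
print(r.json()['method'])  # GET
print(r.json()['data'])    # 'test' -- the body travelled with the GET, but most servers ignore it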
Related
I am trying to fill a form like the one below and submit it automatically. To do that, I sniffed the packets while logging in.
POST /?pg=ogrgiris HTTP/1.1
Host: xxx.xxx.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
Origin: http://xxx.xxx.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15
Referer: http://xxx.xxx.com/?pg=ogrgiris
Upgrade-Insecure-Requests: 1
DNT: 1
Content-Length: 60
Connection: close
seviye=700&ilkodu=34&kurumkodu=317381&ogrencino=40&isim=ahm
I replayed that packet with Burp Suite and saw that it works properly; the response was the HTML of the member page.
Now I tried to do the same in Python. The code is below:
import requests
url = 'http://xxx.xxx.com/?pg=ogrgiris'
headers = {'Host': 'xxx.xxx.com',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Encoding': 'gzip, deflate',
           'Content-Type': 'application/x-www-form-urlencoded',
           'Referer': 'http://xxx.xxx.com/?pg=ogrgiris',
           'Content-Lenght': '60',
           'Connection': 'close'}
credentials = {'seviye': '700', 'ilkodu': '34', 'kurumkodu': '317381', 'ogrecino': '40', 'isim': 'ahm'}
r = requests.post(url, headers=headers, data=credentials)
print(r.content)
The problem is that this code prints the HTML of the login page, even though I send all of the credentials needed to log in. How can I get the member page? Thanks.
If the POST request displays a page with the content you want, then the problem is only that you are sending the data as JSON, not in form-data format (application/x-www-form-urlencoded).
If a session is created by the first request and you have to make another request for the data you want, then you have to deal with cookies.
Problem with data format:
r = requests.post(url, headers=headers, data=credentials)
The kwarg json= creates a request body as follows:
{"ogrecino": "40", "ilkodu": "34", "isim": "ahm", "kurumkodu": "317381", "seviye": "700"}
While data= creates a request body like this:
seviye=700&ilkodu=34&kurumkodu=317381&ogrencino=40&isim=ahm
You can try https://httpbin.org:
from requests import post
msg = {"a": 1, "b": True}
print(post("https://httpbin.org/post", data=msg).json()) # Data as Form data, look at key `form`, it's object in JSON because it's Form data format
print(post("https://httpbin.org/post", json=msg).json()) # Data as json, look at key `data`, it's string
If your goal is to replicate the sample request, you are missing a lot of the headers. This one in particular is very important: Content-Type: application/x-www-form-urlencoded, because it tells your HTTP client how to format/encode the payload.
Check the documentation for requests to see how these form posts work.
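Untested, but as a sketch (field names copied from the sniffed packet above, the host is a placeholder): letting requests encode the form itself and keeping a Session for the login cookie would look roughly like this:

import requests

url = 'http://xxx.xxx.com/?pg=ogrgiris'
credentials = {'seviye': '700', 'ilkodu': '34', 'kurumkodu': '317381',
               'ogrencino': '40', 'isim': 'ahm'}

with requests.Session() as s:
    # data= makes requests set Content-Type: application/x-www-form-urlencoded
    # and Content-Length itself; the Session keeps any cookie the login sets
    r = s.post(url, data=credentials)
    print(r.status_code)
    member_page = s.get(url)  # a follow-up request reuses the session cookie
    print(member_page.text)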
I want to get data from this site.
When I get data from the main URL, I get an HTML file that contains the structure but not the values.
import requests
from bs4 import BeautifulSoup
url ='http://option.ime.co.ir/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')  # parse the response text, not the Response object
print(soup.prettify())
I found out that the site gets its values from
url1 = 'http://option.ime.co.ir/GetTime'
url2 = 'http://option.ime.co.ir/GetMarketData'
When I watch the responses from those URLs in the browser, I see a JSON-formatted response and a time in a specific format.
But when I use requests to get the data, it gives me the same HTML that I get from the main URL.
Do you know what the reason is? How should I get the responses that I see in the browser?
I checked the headers for all the URLs and I didn't find anything special that I should send with my request.
You have to provide the proper HTTP headers in the request. In my case, I was able to make it work using the following headers. Note that in my testing the HTTP response was a 200 OK rather than a redirect to the root website (which is what happened when no HTTP headers were provided).
Raw HTTP Request:
GET http://option.ime.co.ir/GetTime HTTP/1.1
Host: option.ime.co.ir
Referer: "http://option.ime.co.ir/"
Accept: "application/json, text/plain, */*"
User-Agent: "Mozilla/5.0 (Windows NT 6.1; rv:45.0) Gecko/20100101 Firefox/45.0"
This should give you the proper JSON response you need.
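For reference, a minimal sketch of the same request made with requests and the headers above:

import requests

headers = {
    'Referer': 'http://option.ime.co.ir/',
    'Accept': 'application/json, text/plain, */*',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:45.0) Gecko/20100101 Firefox/45.0',
}
r = requests.get('http://option.ime.co.ir/GetTime', headers=headers)
print(r.json())  # the JSON body instead of the HTML shell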
Your first connection, using the browser, gets a 302 redirect response (to the same URL).
It then runs some JS, so the second request doesn't redirect anymore and gets the expected JSON.
It is a common technique so that other people don't use the API without permission.
Check the "Preserve log" checkbox in the dev tools so you can see it for yourself.
In my Werkzeug application I am intercepting all error responses and trying to respond with a JSON response if the client expects JSON or return the usual HTML page with 404 or 500:
def handle_error_response(self, environ, start_response, exc):
    if ('application/json' in environ.get('CONTENT_TYPE', '')
            and exc.get_response().content_type != 'application/json'):
        start_response('%s %s' % (exc.code, exc.name),
                       (('Content-Type', 'application/json'),))
        return (json.dumps({"success": False, "error": exc.description},
                           ensure_ascii=False),)
    # go the regular path
    ...
In this solution I am relying on the Content-Type header containing the string 'application/json'.
However, this doesn't look like a correct solution, because Wikipedia says:
Content-Type: The MIME type of the body of the request (used with POST and PUT requests)
Is it a good strategy to check whether 'text/html' is in the Accept header, and return an HTML response if so, otherwise a JSON response?
Any other more robust solutions?
When Chrome requests an HTML page, this header is sent:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
When Ember makes an API request, this one is sent:
Accept: application/json, text/javascript, */*; q=0.01
Maybe X-Requested-With: XMLHttpRequest should be taken into account?
You should probably add AcceptMixin to your request object.
Once you do this, you can use the accept_mimetypes.accept_json, accept_mimetypes.accept_html and accept_mimetypes.accept_xhtml attributes on your request object. The right default content type for the response really depends on what your application is; just try to imagine which choice would result in less confusion.
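A minimal sketch of that setup (using the old-style Werkzeug wrapper classes this question is built on):

from werkzeug.wrappers import BaseRequest, AcceptMixin

class Request(BaseRequest, AcceptMixin):
    pass

def wants_json(environ):
    # quality-weighted checks against the Accept header
    accept = Request(environ).accept_mimetypes
    return accept.accept_json and not accept.accept_html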
This works for us:
if ('text/html' not in environ.get('HTTP_ACCEPT', '')
and 'application/json' not in response.content_type):
# the user agent didn't explicitly request html, so we return json
... # make the JSON response
I.e. if the client expects HTML, do not return JSON. Otherwise return a
JSON response, if the response isn't already JSON.
Within our server we've got this piece of code calling a function inside my app, like this:
import urllib
import urllib2

data = urllib.urlencode(dict(api_key="a-key-goes-here"))
headers = {
"User-Agent" : "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36",
"Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,text/png,*/*;q=0.5",
"Accept-Language" : "en-US,en;q=0.8",
"Accept-Charset" : "ISO-8859-1,utf-8",
"Content-type": "application/x-www-form-urlencoded; charset=UTF-8"
}
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
code = response.code
message = response.read()
response.close()
I know that this is not using url_for() or other ways to call a URL through your app, but there is a reason for that. Our server is just testing that the call goes out correctly, but the URL is expected to be outside our app, and so is the API key.
So, our server handling url looks like this:
@app.route('/loan_performer', methods=["GET", "POST"])
def loan_performer():
    if 'api_key' in request.form and request.form['api_key'] == API_KEY:
        ret = dict()
        # rate1 returns a random number between 3.000 and 4.000, and point1 will be 0
        ret['rate_one'] = random.randint(3000, 4000)
        ret['point_one'] = 0
        # rate2 does it between 3.500 and 4.500, point2 being 0.5
        ret['rate_two'] = random.randint(3500, 4500)
        ret['point_two'] = 0.5
        # rate3 between 4.000 and 5.000 with 1.0
        ret['rate_three'] = random.randint(4000, 5000)
        ret['point_three'] = 1.0
        return json.dumps(ret), 200
    else:
        return u"Your API Key is invalid.", 403
Our error is as the title says: we are constantly receiving "Bad request (GET and HEAD requests may not contain a request body)", which is a 404 error handled by Passenger WSGI for Apache. In other words, for some reason request.form is empty, and it shouldn't be.
Is there anything I'm doing wrong, or is there another way to POST data from Flask to the outside?
If more info is needed I'm willing to update, just ask.
Edit 1
I switched to using requests, like this:
dat = dict( api_key="a-key-goes-here" )
request = requests.post(url, data=dat)
message = request.text
code = request.status_code
And it refuses to work. This is the log line produced in the URL-handling function:
(08/02/2014 07:07:20 AM) INFO This is message ImmutableMultiDict([]) ImmutableMultiDict([]) and this code 403
request.args, request.form, request.data: all of them are empty, whether I use urllib or requests.
Update on Edit 1
After removing "GET" from methods suggested by Martin konecny I got this response:
(08/02/2014 08:35:06 AM) INFO This is message <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>405 Method Not Allowed</title>
<h1>Method Not Allowed</h1>
<p>The method is not allowed for the requested URL.</p>
and this code 405
In other words, for some reason request.form is empty, and it shouldn't be.
I don't think you can draw that conclusion. What appears to be happening is that you have a GET request with a POST header.
From the Python docs:
data may be a string specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided. data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format.
You need to remove the header
"Content-type": "application/x-www-form-urlencoded; charset=UTF-8"
when you are sending requests with an empty data structure, or just remove it altogether (it will be added automatically by urllib2 when needed).
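Sticking with urllib2, a sketch of the working shape (the url here is a placeholder): send a non-empty urlencoded body and let urllib2 add the form Content-Type on its own.

import urllib
import urllib2

url = 'http://example.com/loan_performer'  # placeholder for the real endpoint
data = urllib.urlencode({'api_key': 'a-key-goes-here'})

# A non-empty data argument turns the request into a POST, and urllib2
# adds Content-Type: application/x-www-form-urlencoded automatically.
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
print(response.code)
print(response.read())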
I'm trying to write Python code for Twitter OAuth authentication. I'm getting a "401 Unauthorized" error code when I attempt to request a token.
In the process of trying to diagnose my problem, I'm going through each step of the authentication process and trying to uncover any errors I'm making. With regard to generating the "Signature Base String", I found an online tool that tries to help validate signature base strings: http://quonos.nl/oauthTester/
When I use this tool, it complains:
Bad URL encoding!
Both key and value in the POST body need to be URL encoded.
Here is an example Signature Base String that my Python code generates:
POST&https%3A%2F%2Fapi.twitter.com%2F1.1%2Foauth%2Frequest_token&oauth_callback%3Doob%26oauth_consumer_key%3DeXL46FKblmfiXHvmC3wcew%26oauth_nonce%3DTAHTO%2FmlyeJ1x9FrgFixosZPYVhvWLXmq%2BdKKTL1rTY%3D%26oauth_signature_method%3DHMAC-SHA1%26oauth_timestamp%3D1391813822%26oauth_version%3D1.0
When I paste this string into the validator, it says:
Bad URL encoding!
Both key and value in the POST body need to be URL encoded.
In this case: "TAHTO/mlyeJ1x9FrgFixosZPYVhvWLXmq+dKKTL1rTY" is bad
I'm very confused because all key/value pairs in the URL are, in fact, URL encoded (I'm assuming "URL encoded" means "percent encoded" here.)
Is there anything wrong with my base string here?
Edit:
The actual HTTP request headers I'm sending to Twitter to request a token are:
POST /1.1/oauth/request_token HTTP/1.1
Accept-Encoding: identity
Content-Length: 0
Connection: close
Accept: */*
User-Agent: Python-urllib/3.2
Host: api.twitter.com
Content-Type: application/x-www-form-urlencoded format
Authorization: OAuth oauth_callback="oob", oauth_consumer_key="eXL46FKblmfiXHvmC3wcew", oauth_nonce="nBcVYSqv8FEi0d7MEs8%2BqtqvdYA9JcbnW%2BVqoP%2FGlrI%3D", oauth_signature="WT9c3U5Puam7dEnMt3DWDsyVAHw%3D", oauth_signature_method="HMAC-SHA1", oauth_timestamp="1391815422", oauth_version="1.0"
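For what it's worth, a guess at what the validator is checking (not a confirmed fix): after one round of decoding, the oauth_nonce above still contains raw /, + and = characters, so each parameter value presumably needs its own percent-encoding pass before the whole parameter string is encoded again:

import urllib

nonce = 'TAHTO/mlyeJ1x9FrgFixosZPYVhvWLXmq+dKKTL1rTY='
print(urllib.quote(nonce, safe=''))  # safe='' also encodes '/'
# TAHTO%2FmlyeJ1x9FrgFixosZPYVhvWLXmq%2BdKKTL1rTY%3D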