I'm making a request in Python using pycurl to a URL which returns a reasonably large JSON-formatted response. When I go to the URL in a browser I see the entire contents, but if I use pycurl and print the received data, I only see about half of what I see when browsing to the URL, and I get an error parsing the data with the json library:
ValueError: Unterminated string starting at: line 1 column 16078 (char 16078)
The pycurl request is this:
conn = pycurl.Curl()
conn.setopt(pycurl.URL, myUrl)
conn.setopt(pycurl.WRITEFUNCTION, on_receive)
conn.setopt(pycurl.CONNECTTIMEOUT, 30)
conn.setopt(pycurl.TIMEOUT, 30)
conn.setopt(pycurl.NOSIGNAL, 10)
conn.perform()
with the on_receive function currently just printing the data.
Does anybody know why I am only getting part of the response? I have used massive timeouts just to try to solve this; I had initially not specified any timeouts at all, but was still getting this error.
In pycurl, you could set this:
import pycurl
pycurl.CONTENT_LENGTH_DOWNLOAD
Try using:
import pycurl
con = pycurl.Curl()
con.setopt(pycurl.CONTENT_LENGTH_DOWNLOAD, 9999999999)
con.setopt(pycurl.URL, myUrl)
con.perform()
Also try the following until it works:
pycurl.SIZE_DOWNLOAD
pycurl.REQUEST_SIZE
You could try to access that JSON data with the curl command-line tool.
Once you're able to get the data, just translate the curl options to pycurl options (a rough sketch follows the command below).
curl --help | less
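As a minimal sketch of the pycurl side (reusing myUrl from the question; note that the WRITEFUNCTION callback is invoked once per chunk, so it should accumulate data rather than just print it):
import pycurl
from io import BytesIO   # cStringIO.StringIO also works on Python 2

buf = BytesIO()
conn = pycurl.Curl()
conn.setopt(pycurl.URL, myUrl)                 # myUrl as in the question
conn.setopt(pycurl.WRITEFUNCTION, buf.write)   # called repeatedly, one chunk at a time
conn.setopt(pycurl.CONNECTTIMEOUT, 30)
conn.setopt(pycurl.TIMEOUT, 30)
conn.perform()
conn.close()
body = buf.getvalue()   # the complete response, ready for json.loads()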
I am trying to index into Elasticsearch using a Python 2.7 script, as follows:
from __future__ import print_function
import urllib, urllib2
#FORMDATA is a json format string that has been taken from a file
ELASTIC_URL = 'http://1.2.3.9:9200/indexname/entry/'
req = urllib2.Request(ELASTIC_URL)
req.add_header('contentType', 'application/x-www-form-urlencoded')
response = urllib2.urlopen(req, FORMDATA, timeout=4).read()
print(response)
I keep getting the error HTTP Error 406: Not Acceptable: HTTPError
I have also tried formatting the data with urllib.quote(FORMDATA) and get the same error. The data is not a dictionary; it is a string that, when parsed as JSON, is multi-dimensional.
I think this has something to do with the fact that the request header needs to specify the correct content type, but I'm struggling to work out what that is. I managed to do this import on Elasticsearch 5.x, but now on 3.x it doesn't seem to be working.
Any ideas??
Almost all Elasticsearch API calls use Content-Type: application/json in the headers; this should be what you need here.
Also be aware that if you are submitting data, this will need to be in the form of a POST (or a PUT if generating your own id), not a GET request: https://www.elastic.co/guide/en/elasticsearch/guide/current/index-doc.html
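A rough sketch of how the question's snippet might look with those two changes (still urllib2 on Python 2; ELASTIC_URL and FORMDATA are the question's own placeholders):
import urllib2

ELASTIC_URL = 'http://1.2.3.9:9200/indexname/entry/'
req = urllib2.Request(ELASTIC_URL, data=FORMDATA)    # supplying data makes this a POST
req.add_header('Content-Type', 'application/json')   # the header Elasticsearch expects
response = urllib2.urlopen(req, timeout=4).read()
print(response)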
I actually want to send multipart data to a web application with Python. I'm using the very useful Requests module (http://requests-fr.readthedocs.org/en/latest/).
I have to send an audio file (stored locally on the system) and two parameters (GPS coordinates, for information).
I have already managed to do this with a curl command, but I'm looking for a Python Requests implementation.
This is the curl command:
curl -u "user:pass" -F 'audio=#file.wav' -F "latitude=42.44646" -F "longitude=8.46464" 'http://my_server_ip/web/rest/vocal' -v --digest
This is how I'm trying to do it with Python Requests:
url = "http://my_server_ip/web/rest/vocal"
files = {'audio' : open('/PATH/record.wav','rb'),'latitude':42.44646,'longitude':8.46464}
r = requests.post(url, auth=HTTPDigestAuth('user','pass'),data=files)
r.json
print r.json
For the moment, the only response I get is a 500 error.
Does someone understand what's wrong? Feel free to tell me if you see a better solution. :)
Greetings!
Solved!
The solution is to separate the files and the data like this:
files = {'audio' : open('/PATH/record.wav','rb')}
data = {'latitude':latitude,'longitude':longitude}
And build the request with BOTH the files AND data parameters:
r=requests.post(url,auth=HTTPDigestAuth('user','pass'),files=files,data=data)
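Putting the pieces together with the imports included (paths and credentials are the question's own placeholders):
import requests
from requests.auth import HTTPDigestAuth

url = "http://my_server_ip/web/rest/vocal"
files = {'audio': open('/PATH/record.wav', 'rb')}
data = {'latitude': 42.44646, 'longitude': 8.46464}

r = requests.post(url, auth=HTTPDigestAuth('user', 'pass'), files=files, data=data)
print r.status_code   # 200 on success
print r.content       # raw response body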
I am checking for url status with this code:
h = httplib2.Http()
hdr = {'User-Agent': 'Mozilla/5.0'}
resp = h.request("http://" + url, headers=hdr)
if int(resp[0]['status']) < 400:
    return 'ok'
else:
    return 'bad'
and getting
Error -3 while decompressing data: incorrect header check
The URL I am checking is:
http://www.sueddeutsche.de/wirtschaft/deutschlands-innovationsangst-wir-neobiedermeier-1.2117528
The exception location is:
Exception Location: C:\Python27\lib\site-packages\httplib2-0.9-py2.7.egg\httplib2\__init__.py in _decompressContent, line 403
try:
    encoding = response.get('content-encoding', None)
    if encoding in ['gzip', 'deflate']:
        if encoding == 'gzip':
            content = gzip.GzipFile(fileobj=StringIO.StringIO(new_content)).read()
        if encoding == 'deflate':
            content = zlib.decompress(content)  ## <---- error line
        response['content-length'] = str(len(content))
        # Record the historical presence of the encoding in a way the won't interfere.
        response['-content-encoding'] = response['content-encoding']
        del response['content-encoding']
except IOError:
    content = ""
The HTTP status is 200, which is OK for my case, but I am getting this error.
I actually need only the HTTP status; why is it reading the whole content?
You may have any number of reasons for choosing httplib2, but it is very easy to get the status code of an HTTP request using the Python requests module. Install it with the following command:
$ pip install requests
See an extremely simple example below.
In [1]: import requests as rq
In [2]: url = "http://www.sueddeutsche.de/wirtschaft/deutschlands-innovationsangst-wir-neobiedermeier-1.2117528"
In [3]: r = rq.get(url)
In [4]: r
Out[4]: <Response [200]>
Unless you have a considerable constraint that needs httplib2 explicitly, this solves your problem.
This may be a bug (or just uncommon design decision) in httplib2. I don't get this problem with urllib2 or httplib in the 2.x stdlib, or urllib.request or http.client in the 3.x stdlib, or the third-party libraries requests, urllib3, or pycurl.
So, is there a reason you need to use this particular library?
If so:
I actually need only the HTTP status; why is it reading the whole content?
Well, most HTTP libraries are going to read and parse the whole content, or at least the headers, before returning control. That way they can respond to simple requests about the headers or chunked encoding or MIME envelope or whatever without any delay.
Also, many of them automate things like 100 continue, 302 redirect, various kinds of auth, etc., and there's no way they could do that if they didn't read ahead. In particular, according to the description for httplib2, handling these things automatically is one of the main reasons you should use it in the first place.
Also, the first TCP read is nearly always going to include the headers anyway, so why not read them?
This means that if the headers are invalid, you may get an exception immediately. They may still provide a way to get the status code (or the raw headers, or other information) anyway.
As a side note, if you only want the HTTP status, you should probably send a HEAD request rather than a GET. Unless you're writing and testing a server, you can almost always rely on the fact that, as the RFC says, the status and headers should be identical to what you'd get with GET. In fact, that would almost certainly solve things in this case: if there is no body to decompress, the fact that httplib2 has gotten confused into thinking the body is gzipped when it isn't won't matter anyway.
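As a rough sketch, a HEAD request with httplib2 only needs the method argument (same status check as in the question, printed here for illustration):
import httplib2

h = httplib2.Http()
hdr = {'User-Agent': 'Mozilla/5.0'}
resp, content = h.request("http://" + url, "HEAD", headers=hdr)   # no body to decompress
if int(resp['status']) < 400:
    print 'ok'
else:
    print 'bad'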
I'm having trouble understanding how to issue an HTTP POST request using curl from inside Python.
I'm trying to post to the Facebook Open Graph. Here is the example they give, which I'd like to replicate exactly in Python.
curl -F 'access_token=...' \
-F 'message=Hello, Arjun. I like this new API.' \
https://graph.facebook.com/arjun/feed
Can anyone help me understand this?
You can use httplib to POST with Python, or the higher-level urllib/urllib2:
import urllib
params = {}
params['access_token'] = '*****'
params['message'] = 'Hello, Arjun. I like this new API.'
params = urllib.urlencode(params)
f = urllib.urlopen("https://graph.facebook.com/arjun/feed", params)
print f.read()
There is also a Facebook-specific, higher-level library for Python that does all the POSTing for you:
https://github.com/pythonforfacebook/facebook-sdk/
https://github.com/facebook/python-sdk
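If you go the library route, posting to a feed via that SDK looks roughly like this (a sketch based on the project's README; the GraphAPI/put_object names come from the SDK, while the token and user here are placeholders):
import facebook

graph = facebook.GraphAPI(access_token='*****')
graph.put_object(parent_object='arjun', connection_name='feed',
                 message='Hello, Arjun. I like this new API.')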
Why do you use curl in the first place?
Python has extensive libraries for Facebook and built-in libraries for web requests; calling another program and reading its output is unnecessary.
That said,
First, from the Python docs:
data may be a string specifying additional data to send to the server,
or None if no such data is needed. Currently HTTP requests are the
only ones that use data; the HTTP request will be a POST instead of a
GET when the data parameter is provided. data should be a buffer in
the standard application/x-www-form-urlencoded format. The
urllib.urlencode() function takes a mapping or sequence of 2-tuples
and returns a string in this format. urllib2 module sends HTTP/1.1
requests with Connection:close header included.
So,
import urllib2, urllib
parameters = {}
parameters['token'] = 'sdfsdb23424'
parameters['message'] = 'Hello world'
target = 'http://www.target.net/work'
parameters = urllib.urlencode(parameters)
handler = urllib2.urlopen(target, parameters)
while True:
    if handler.code < 400:
        print 'done'
        # call your job
        break
    elif handler.code >= 400:
        print 'bad request or error'
        # failed
        break
I'm familiar with CURL in PHP but am using it for the first time in Python with pycurl.
I keep getting the error:
Exception Type: error
Exception Value: (2, '')
I have no idea what this could mean. Here is my code:
data = {'cmd': '_notify-synch',
        'tx': str(request.GET.get('tx')),
        'at': paypal_pdt_test
        }
post = urllib.urlencode(data)
b = StringIO.StringIO()
ch = pycurl.Curl()
ch.setopt(pycurl.URL, 'https://www.sandbox.paypal.com/cgi-bin/webscr')
ch.setopt(pycurl.POST, 1)
ch.setopt(pycurl.POSTFIELDS, post)
ch.setopt(pycurl.WRITEFUNCTION, b.write)
ch.perform()
ch.close()
The error is referring to the line ch.setopt(pycurl.POSTFIELDS, post)
I do it like this:
post_params = [
    ('ASYNCPOST', True),
    ('PREVIOUSPAGE', 'yahoo.com'),
    ('EVENTID', 5),
]
resp_data = urllib.urlencode(post_params)
mycurl.setopt(pycurl.POSTFIELDS, resp_data)
mycurl.setopt(pycurl.POST, 1)
...
mycurl.perform()
I know this is an old post, but I've just spent my morning trying to track down this same error. It turns out that there's a bug in pycurl, fixed in 7.16.2.1, that caused setopt() to break on 64-bit machines.
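A quick way to see which pycurl/libcurl build you are running (both attributes are part of pycurl itself):
import pycurl

print pycurl.version          # e.g. 'PycURL/7.19.0 libcurl/7.21.0 ...'
print pycurl.version_info()   # tuple with detailed libcurl build information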
It would appear that your pycurl installation (or curl library) is damaged somehow. From the curl error codes documentation:
CURLE_FAILED_INIT (2)
Very early initialization code failed. This is likely to be an internal error or problem.
You will possibly need to re-install or recompile curl or pycurl.
However, to do a simple POST request like you're doing, you can actually use python's "urllib" instead of CURL:
import urllib
postdata = urllib.urlencode(data)
resp = urllib.urlopen('https://www.sandbox.paypal.com/cgi-bin/webscr', data=postdata)
# resp is a file-like object, which means you can iterate it,
# or read the whole thing into a string
output = resp.read()
# resp.code returns the HTTP response code
print resp.code # 200
# resp has other useful data, .info() returns a httplib.HTTPMessage
http_message = resp.info()
print http_message['content-length'] # '1536' or the like
print http_message.type # 'text/html' or the like
print http_message.typeheader # 'text/html; charset=UTF-8' or the like
# Make sure to close
resp.close()
To open an https:// URL, you may need to install PyOpenSSL:
http://pypi.python.org/pypi/pyOpenSSL
Some distributions include this; others provide it as an extra package right through your favorite package manager.
Edit: Have you called pycurl.global_init() yet? I still recommend urllib/urllib2 where possible, as your script will be more easily moved to other systems.
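If you do want to initialize libcurl explicitly, the call looks like this (a sketch; pycurl normally performs this initialization on first use):
import pycurl

pycurl.global_init(pycurl.GLOBAL_DEFAULT)   # or pycurl.GLOBAL_ALL / pycurl.GLOBAL_SSL
ch = pycurl.Curl()
# ... set options and call ch.perform() as usual ...
ch.close()
pycurl.global_cleanup()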