Indexing to elasticsearch 6.1 with python script

I am trying to index to elasticsearch using a python 2.7 script as follows:
from __future__ import print_function
import urllib, urllib2
#FORMDATA is a json format string that has been taken from a file
ELASTIC_URL = 'http://1.2.3.9:9200/indexname/entry/'
req = urllib2.Request(ELASTIC_URL)
req.add_header('contentType', 'application/x-www-form-urlencoded')
response = urllib2.urlopen(req, FORMDATA, timeout=4).read()
print(response)
I keep getting the error HTTPError: HTTP Error 406: Not Acceptable.
I have also tried formatting the data with urllib.quote(FORMDATA) and get the same error. The data is not a dictionary; it is a string that, when parsed as JSON, is multi-dimensional.
I think this has something to do with the request header needing to specify the correct content type, but I'm struggling to work out what that is. I managed to do this import on Elasticsearch 5.x, but now on 6.1 it doesn't seem to be working.
Any ideas??

Almost all Elasticsearch API calls use Content-Type: application/json in the headers; that should be what you need here.
Also be aware that if you are submitting data, this will need to be in the form of a POST (or a PUT if generating your own id), not a GET request: https://www.elastic.co/guide/en/elasticsearch/guide/current/index-doc.html
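Putting those two points together, a minimal sketch of the corrected request might look like this (Python 2.7, with FORMDATA and the URL taken from the question):
import urllib2

ELASTIC_URL = 'http://1.2.3.9:9200/indexname/entry/'

# Supplying a data argument makes urllib2 issue a POST instead of a GET
req = urllib2.Request(ELASTIC_URL, data=FORMDATA)
# Elasticsearch 6.x enforces strict content-type checking on the request body
req.add_header('Content-Type', 'application/json')
response = urllib2.urlopen(req, timeout=4).read()
print(response)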

Related

cannot access URL due to SSL module not available Python

This is my first time trying to use the Python 3.6 requests library's get() function with data from quandl.com, then loading and dumping the JSON.
import json
import requests
request = requests.get("https://www.quandl.com/api/v3/datasets/CHRIS/MX_CGZ2.json?api_key=api_keyxxxxx", verify=False)
request_text = request.text  # note: .text is a property, not a method
data = json.loads(request_text)
data_serialized = json.dumps(data)
print(data_serialized)
I have an account at quandl.com to access the data. When the Python program is run on the command line, the error says "can't connect to HTTPS URL because the SSL module is not available."
import requests
import urllib3
urllib3.disable_warnings()
r = requests.get(
    "https://www.quandl.com/api/v3/datasets/CHRIS/MX_CGZ2.json?api_key=api_keyxxxxx").json()
print(r)
The output will be the following, since I don't have an API key:
{'quandl_error': {'code': 'QEAx01', 'message': 'We could not recognize your API key. Please check your API key and try again.'}}
You don't need to import the json module, as requests already provides a .json() method on the response.
Although I had verified my Quandl API key, I received the same error when trying to retrieve the data and print it with json.dumps: "quandl_error": {"code": "QEAx01" ... I discovered that the wrong kind of quotation marks around the key in my .env file caused this error. Check your language settings and fonts before making requests after this error message.
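For illustration, a hypothetical sketch of guarding against that (the QUANDL_API_KEY variable name and the python-dotenv library are assumptions, not from the original post):
# .env should use straight ASCII quotes (or none at all), e.g.:
# QUANDL_API_KEY=api_keyxxxxx
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()
api_key = os.getenv('QUANDL_API_KEY', '')
# Curly "smart" quotes pasted from a word processor end up inside the value
if any(ch in api_key for ch in u'\u201c\u201d\u2018\u2019'):
    raise ValueError('smart quotes found in QUANDL_API_KEY; use straight quotes')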

Python 3.4 HTTP Error 505 retrieving json from url

I am trying to connect to a page that takes in some values and returns some data in JSON format, in Python 3.4 using urllib. I want to save the values returned in the JSON to a CSV file.
This is what I tried...
import json
import urllib.request
url = 'my_link/select?wt=json&indent=true&f=value'
response = urllib.request.Request(url)
response = urllib.request.urlopen(response)
data = response.read()
I am getting the error below:
urllib.error.HTTPError: HTTP Error 505: HTTP Version Not Supported
EDIT: Found a solution to my problem. I answered it below.
You have found a server that apparently doesn't want to talk HTTP/1.1. You could try lying to it by claiming to be an HTTP/1.0 client instead, by patching the http.client.HTTPConnection class:
import http.client
http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'
and re-trying your request.
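Combined with the code from the question, that might look like this (same placeholder URL):
import http.client
import urllib.request

# Patch before opening the URL; this downgrades every HTTPConnection in the process
http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'

url = 'my_link/select?wt=json&indent=true&f=value'
response = urllib.request.urlopen(url)
data = response.read()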
I used FancyURLopener and it worked. Found this useful: docs.python.org: urllib.request
url_request = urllib.request.FancyURLopener({})
with url_request.open(url) as url_opener:
    json_data = url_opener.read().decode('utf-8')
with open(file_output, 'w', encoding='utf-8') as output:
    output.write(json_data)
Hope this helps those having the same problems as mine.

python 2.7 requests.get() returning cookie raising TypeError

I'm doing a simple HTTP requests authentication against our internal server, getting the cookie back, then hitting a Cassandra RESTful server to get data. The requests.get() call chokes when the cookie is passed back.
I have a curl script that extracts the data successfully, but I'd rather work with the response JSON data in pure Python.
Any clues to what I'm doing wrong below? I dumped the cookie and it looks fine, very similar to my curl cookie.
Craig
import requests
import rtim
# this makes the auth and gets the cookie returned, save the cookie
myAuth = requests.get(rtim.rcas_auth_url, auth=(rtim.username, rtim.password), verify=False)
print myAuth.status_code
authCookie=myAuth.headers['set-cookie']
IXhost='xInternalHostName.com:9990'
mylink = 'http://%s/v1/BONDISSUE?format=JSONARRAY&issue.isin=%s' % (IXhost, 'US3133XK4V44')
# chokes on next line .... doesn't like the Cookie format
r = requests.get(mylink, cookies=authCookie)
(Pdb) next
TypeError: 'string indices must be integers, not str'
I think the problem is on the last line:
r = requests.get(mylink, cookies=authCookie)
requests assumes that the cookies parameter is a dictionary, but you are passing a string object authCookie to it.
The exception is raised when requests tries to treat the string authCookie as a dictionary.
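A sketch of one possible fix, reusing the names from the question (myAuth.cookies is the cookie jar that requests has already parsed for you):
# Pass the parsed cookie jar instead of the raw Set-Cookie header string
r = requests.get(mylink, cookies=myAuth.cookies)

# Or use a Session, which carries cookies across requests automatically
session = requests.Session()
session.verify = False
session.get(rtim.rcas_auth_url, auth=(rtim.username, rtim.password))
r = session.get(mylink)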

Why is my HTTP POST request data string (probably) being incorrectly encoded?

I'm having some trouble with twisted.web.client.Agent...
I think the string data in my POST request isn't being formatted properly. I'm trying to do something analogous to this synchronous code:
from urllib import urlencode
import urllib2
page = 'http://example.com/'
id_string = 'this:is,my:id:string'
req = urllib2.Request(page, data=urlencode({'id': id_string})) # urlencode call returns 'id=this%3Ais%2Cmy%3Aid%3Astring'
resp = urllib2.urlopen(req)
Here's how I'm building my Agent request as of right now:
from urllib import urlencode
from StringIO import StringIO
# Imports and setup implied by the snippet:
from twisted.internet import reactor
from twisted.web.client import Agent, FileBodyProducer
from twisted.web.http_headers import Headers

agent = Agent(reactor)
page = 'http://example.com/'
id_string = 'my:id_string'
head = {'User-Agent': ['user agent goes here']}
data = urlencode({'id': id_string})
request = agent.request('POST', page, Headers(head), FileBodyProducer(StringIO(data)))
request.addCallback(foo)
Because of the HTTP response I'm getting (null JSON string), I'm beginning to suspect that the id is not being properly encoded in the POST request, but I'm not sure what I can be doing about it. Is using urlencode with the Agent.request call valid? Is there another way I should be encoding these things?
EDIT: Some kind folks on IRC have suggested that the problem may stem from the fact that I didn't send the header information indicating that the data is encoded as a URL string. I know remarkably little about this kind of stuff... Can anybody point me in the right direction?
As requested, here's my comment in the form of an answer:
HTTP requests with bodies should have the Content-Type header set (to tell the server how to interpret the bytes in the body); in this case, it seems the server is expecting URL-encoded data, like a web browser would send when a form is filled out.
urllib2.Request apparently defaults the content type for you, but Twisted seems to need it to be set manually. In this case, you want a content type of application/x-www-form-urlencoded.
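A minimal sketch of that change, assuming the same names as in the question (note that Headers values are lists):
head = {
    'User-Agent': ['user agent goes here'],
    'Content-Type': ['application/x-www-form-urlencoded'],  # tell the server how to decode the body
}
data = urlencode({'id': id_string})
d = agent.request('POST', page, Headers(head), FileBodyProducer(StringIO(data)))  # returns a Deferred
d.addCallback(foo)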

pycurl only getting part of the response

I'm making a request in Python using pycurl to a URL which returns a reasonably large JSON-formatted response. When I go to the URL in a browser I see the entire contents, but if I use pycurl and print the received data, I only see about half of what I see when I browse to the URL, and I get an error parsing the data using the json library:
ValueError: Unterminated string starting at: line 1 column 16078 (char 16078)
The pycurl request is this:
conn = pycurl.Curl()
conn.setopt(pycurl.URL, myUrl)
conn.setopt(pycurl.WRITEFUNCTION, on_receive)
conn.setopt(pycurl.CONNECTTIMEOUT, 30)
conn.setopt(pycurl.TIMEOUT, 30)
conn.setopt(pycurl.NOSIGNAL, 10)
conn.perform()
with the on_receive function currently just printing the data.
Does anybody know why I am only getting part of the response? I have used massive timeouts just to try to solve this; I had initially not specified any timeouts but was still getting this error.
In pycurl, you could try setting this:
import pycurl
pycurl.CONTENT_LENGTH_DOWNLOAD
try using
import Curl, pycurl
con = Curl()
con.set_option(pycurl.CONTENT_LENGTH_DOWNLOAD, 9999999999)
con.get('url' ....
Also try the following until it works:
pycurl.SIZE_DOWNLOAD
pycurl.REQUEST_SIZE
You could try to access that JSON data with the curl command-line tool. When you're able to get the data, just translate the curl options to pycurl options.
curl --help | less
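Separately, it is worth noting that pycurl invokes the WRITEFUNCTION callback once per chunk, so a callback that only prints or parses each chunk in isolation can look like a truncated response. A minimal sketch that accumulates the chunks into a buffer before parsing (myUrl as in the question):
import json
import pycurl
from io import BytesIO  # use StringIO on Python 2

buf = BytesIO()
conn = pycurl.Curl()
conn.setopt(pycurl.URL, myUrl)
conn.setopt(pycurl.WRITEFUNCTION, buf.write)  # every chunk is appended to the buffer
conn.setopt(pycurl.CONNECTTIMEOUT, 30)
conn.setopt(pycurl.TIMEOUT, 30)
conn.perform()
conn.close()

data = json.loads(buf.getvalue())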
