Elasticsearch Python API overwrites the existing field

I'm using the Elasticsearch Python API, and I found that if the _id is the same, the old data gets overwritten. E.g. I first indexed a document with name="Tom"; now I index the same _id with the field age=30, and name="Tom" is gone after reindexing. The result I want is age=30 appended to the existing document. Is there a parameter I should tune?
I'm using the following code:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://10.0.0.1:9200")
doc = {"age": 30}  # the new field; the document already indexed under this id was {"name": "Tom"}
res = es.index(index="panavstream", doc_type='panav', id="123", body=doc)
Thanks in Advance

The update function with a script body can append a field to the existing document. See elasticsearch-py update.
A sample:
doc = {
    'script': 'ctx._source.age = 30'
}
es.update(index="panavstream", doc_type='panav', id="123", body=doc)
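If you only need to merge new fields into an existing document, the partial-document form of update is simpler than a script. A minimal sketch, assuming the same index, type, and id as above:
body = {
    "doc": {"age": 30}  # fields under "doc" are merged into the existing _source
}
es.update(index="panavstream", doc_type='panav', id="123", body=body)
Adding "doc_as_upsert": True to the body makes the same call create the document when it doesn't exist yet.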

Related

Python requests.post does not force Elasticsearch to create missing index

I want to push data to my Elasticsearch server using:
requests.post('http://localhost:9200/_bulk', data=data_1 + data_2)
and it complains that the index does not exist. I try creating the index manually:
curl -X PUT http://localhost:9200/_bulk
and it complains that I am not feeding a body to it:
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
Seems like a bit of a chicken and egg problem here. How can I create that _bulk index, and then post my data?
EDIT:
My data is too large to easily understand the schema. Here is a small snippet:
'{"create":{"_index":"products-guys","_type":"t","_id":"0"}}\n{"url":"http://www.plaisio.gr/thleoraseis/tv/tileoraseis/LG-TV-43-43LH630V.htm","title":"TV LG 43\\" 43LH630V LED Full HD Smart","description":"\\u039a\\u03b1\\u03b9 \\u03cc\\u03bc\\u03bf\\u03c1\\u03c6\\u03b7 \\u03ba\\u03b1\\u03b9 \\u03ad\\u03be\\u03c5\\u03c0\\u03bd\\u03b7, \\u03bc\\u03b5 \\u03b9\\u03c3\\u03c7\\u03c5\\u03c1\\u03cc \\u03b5\\u03c0\\u03b5\\u03be\\u03b5\\u03c1\\u03b3\\u03b1\\u03c3\\u03c4\\u03ae \\u03b5\\u03b9\\u03ba\\u03cc\\u03bd\\u03b1\\u03c2 \\u03ba\\u03b1\\u03b9 \\u03bb\\u03b5\\u03b9\\u03c4\\u03bf\\u03c5\\u03c1\\u03b3\\u03b9\\u03ba\\u03cc webOS 3.0 \\u03b5\\u03af\\u03bd\\u03b1\\u03b9 \\u03b7 \\u03c4\\u03b7\\u03bb\\u03b5\\u03cc\\u03c1\\u03b1\\u03c3\\u03b7 \\u03c0\\u03bf\\u03c5 \\u03c0\\u03ac\\u03b5\\u03b9 \\u03c3\\u03c4\\u03bf \\u03c3\\u03b1\\u03bb\\u03cc\\u03bd\\u03b9 \\u03c3\\u03bf\\u03c5","priceCurrency":"EUR","price":369.0}\n{"create":{"_index":"products-guys","_type":"t","_id":"1"}}\n{"url":"http://www.plaisio.gr/thleoraseis/tv/tileoraseis/Samsung-TV-43-UE43M5502.htm","title":"TV Samsung 43\\" UE43M5502 LED ...
This is essentially someone else's code that I need to make work. It seems that the "data" object I am passing in the request is a string.
When I use requests.post('http://localhost:9200/_bulk', data=data)
I get <Response [406]>.
If you want to do a bulk request using requests:
response = requests.post('http://localhost:9200/_bulk',
                         data=data_1 + data_2,
                         headers={'content-type': 'application/json', 'charset': 'UTF-8'})
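If you build the body by hand, note that _bulk expects newline-delimited JSON: one action line, then one document line, with a trailing newline at the end. A minimal sketch with requests; the index name and documents are illustrative:
import json
import requests

docs = [{"word": "foo"}, {"word": "bar"}]
lines = []
for i, doc in enumerate(docs):
    lines.append(json.dumps({"index": {"_index": "mywords", "_id": str(i)}}))
    lines.append(json.dumps(doc))
body = "\n".join(lines) + "\n"  # the trailing newline is required
response = requests.post('http://localhost:9200/_bulk',
                         data=body.encode('utf-8'),
                         headers={'Content-Type': 'application/x-ndjson'})
print(response.json())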
Old Answer
I recommend using the bulk helper from the Python library:
from elasticsearch import Elasticsearch, helpers

client = Elasticsearch("localhost:9200")

def gendata():
    mywords = ['foo', 'bar', 'baz']
    for word in mywords:
        yield {
            "_index": "mywords",  # per-action target index; overrides the default below
            "word": word,
        }

resp = helpers.bulk(
    client,
    gendata(),
    index="some_index",  # default index for actions that don't set _index
)
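helpers.bulk returns a tuple of (number of successfully executed actions, list of errors), so resp tells you whether the ingest succeeded. The _index set inside each action takes precedence; the index keyword argument is only a default for actions that don't set one.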
If you haven't changed the Elasticsearch configuration, index auto-creation is enabled by default, so a new index is created automatically when documents are indexed into it.
About the ways you tried:
The query is probably malformed. For bulk ingest, the body shape is different from just sending the docs as a JSON array.
You are doing a PUT instead of a POST, and you have to include the documents you want to ingest in the body.
There is no need to create the empty index first, but in case you want to, you can just do:
curl -X PUT http://localhost:9200/index_name

Elasticsearch - Reindex single field with different analyzer using Python

I use dynamic mapping in Elasticsearch to load my JSON file, like this:
import json
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

def extract():
    f = open('tmdb.json')
    if f:
        return json.loads(f.read())

movieDict = extract()

def index(movieDict={}):
    for id, body in movieDict.items():
        es.index(index='tmdb', id=id, doc_type='movie', body=body)

index(movieDict)
How can I update the mapping for a single field? I have a field title to which I want to apply a different analyzer.
title_settings = {"properties" : { "title": {"type" : "text", "analyzer": "english"}}}
es.indices.put_mapping(index='tmdb', body=title_settings)
This fails.
I know that I cannot update an already existing index, but what is the proper way to reindex the mapping generated from my JSON file? My file has a lot of fields, so creating the mapping/settings manually would be very troublesome.
I am able to specify an analyzer for a query, like this:
query = {"query": {
"multi_match": {
"query": userSearch, "analyzer":"english", "fields": ['title^10', 'overview']}}}
How do I specify it for an index or a field?
I am also able to put an analyzer into the settings after closing and reopening the index:
analysis = {'settings': {'analysis': {'analyzer': 'english'}}}
es.indices.close(index='tmdb')
es.indices.put_settings(index='tmdb', body=analysis)
es.indices.open(index='tmdb')
Copying the exact settings for the english analyzer doesn't 'activate' it for my data.
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis-lang-analyzer.html#english-analyzer
By 'activate' I mean that search results are not processed by the english analyzer, i.e. there are still stopwords.
Solved it with a massive amount of googling...
You cannot change the analyzer on already indexed data. This includes closing/opening the index. You have to specify a new index, create the new mapping, and load your data into it (the quickest way).
Specifying an analyzer for the whole index isn't a good solution, as the 'english' analyzer is specific to 'text' fields. It's better to specify the analyzer per field.
If analyzers are specified per field, you also need to specify the field type.
Remember that analyzers can be used at index time, at search time, or both. Reference: Specifying analyzers
Code:
import time

def create_index(movieDict={}, mapping={}):
    es.indices.create(index='test_index', body=mapping)
    start = time.time()
    for id, body in movieDict.items():
        es.index(index='test_index', id=id, doc_type='movie', body=body)
    print("--- %s seconds ---" % (time.time() - start))
Now I've got the mapping from the dynamic mapping of my JSON file. I just saved it back to a JSON file for ease of editing, because I have over 40 fields to map and doing it by hand would be just tiresome.
mapping = es.indices.get_mapping(index='tmdb')
This is an example of how the title key should be specified to use the english analyzer:
'title': {'type': 'text', 'analyzer': 'english','fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}
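Putting the pieces together, the full flow is: pull the dynamic mapping, edit the fields you care about, create a new index with the edited mapping, and copy the documents over. A minimal sketch, assuming a 7.x cluster and a hypothetical new index named tmdb_v2:
# pull the dynamically generated mapping and adjust the title field
mapping = es.indices.get_mapping(index='tmdb')['tmdb']
mapping['mappings']['properties']['title'] = {
    'type': 'text',
    'analyzer': 'english',
    'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}},
}
# create the new index with the edited mapping
es.indices.create(index='tmdb_v2', body=mapping)
# copy the documents over server-side with the _reindex API
es.reindex(body={'source': {'index': 'tmdb'}, 'dest': {'index': 'tmdb_v2'}},
           wait_for_completion=True)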

How to get the options for a custom field in Jira?

In the Jira issue I'm looking at, there are fields with a drop-down list of valid values. I would like to access that drop-down list using Python. Looking at the returned fields for the issue, the object has a value customfield_14651, which is an object with value and id. The Jira documentation shows there is a custom_field_option() method, which should return the options? I call the method like below:
self.jira = JIRA('https://jira.companyname.com', basic_auth=(login['username'], login['password']))
print self.jira.custom_field_option('14651')
and receive back the following error:
response text = {"errorMessages":["A custom field option with id '14651' does not exist"],"errors":{}}
Jira has the .fields() function which returns a list of all fields that are visible to the account you are using.
from jira import JIRA
jira = JIRA(basic_auth=('username', 'password'), options = {'server': 'url'})
# Fetch all fields
allfields = jira.fields()
# Make a map from field name -> field id
name_map = {field['name']:field['id'] for field in allfields}
name_map is now a dict in the format of {"field name":"customfield_xxxx", ... }
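A hypothetical usage of name_map, resolving a drop-down field's id from its display name and reading its value off an issue (the field name and issue key are illustrative):
field_id = name_map['My Dropdown Field']  # e.g. 'customfield_14651'
issue = jira.issue('PRJ-1')
print(getattr(issue.fields, field_id))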
It looks like the way to do this in the API is:
from jira import JIRA
jira = JIRA(basic_auth=('username', 'password'), options = {'server': 'url'})
# get an example issue that has the field you're interested in
issue = jira.issue("PRJ-1")
meta = jira.editmeta(issue)
# inspect the meta to get the field you want to look at
allowed_values = [v['value'] for v in meta['fields']['customfield_99999']['allowedValues']]
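Note that editmeta only describes the fields that appear on the issue's edit screen, so the custom field has to be editable on that issue for its allowedValues to show up.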

Getting issue comments JIRA python

I am trying to get all comments of issues created in JIRA of a certain search query. My query is fairly simple:
import jira
from jira.client import JIRA
def fetch_tickets_open_yesterday(jira_object):
    # JIRA query to fetch the issues
    open_issues = jira_object.search_issues('project = Support AND issuetype = Incident AND \
        (status = "Open" OR status = "Resolved" OR status = "Waiting For Customer")', maxResults=100, expand='changelog')
    # returns all open issues
    return open_issues
However, if I try to access the comments of these tickets using the following notation, I get a KeyError:
for issue in issues:
    print issue.raw['fields']['comment']
If I try to get the comments of a single issue like below, I can access them:
single_issue = jira_object.issue('SUP-136834')
single_issue.raw['fields']['comment']
How do I access these comments through search_issues() function?
The comment field is not returned by the search_issues method; you have to state explicitly which fields should be included by setting the corresponding parameter.
Just include the fields and json_result parameters in the search_issues method and set them like this:
open_issues = jira_object.search_issues('project = Support AND issuetype = Incident AND \
    (status = "Open" OR status = "Resolved" OR status = "Waiting For Customer")', maxResults=100, expand='changelog', fields='comment', json_result=True)
Now you can access the comments without getting a KeyError. Note that with json_result=True the result is a plain dict rather than a list of Issue objects, so the issues live under the 'issues' key:
comm = [issue['fields']['comment']['comments'] for issue in open_issues['issues']]
I struggled with the same issue. Assuming "issue" is an object of type Issue, and "jira" is an object of type JIRA, according to http://jira.readthedocs.org/en/latest/#issues
issue.fields.comment.comments
should work, but the fields object does not have any key "comment".
The other option mentioned there works for me:
jira.comments(issue)
So, to make it work, take the issues from your search result and call jira.comments on them. E.g.
issues = jira.search_issues(query)
comments = jira.comments(issues[index])
(My version of the library is 1.0.3, python 2.7.10)
from jira import JIRA

Jira = JIRA('https://jira.atlassian.com')
issue_num = "ISSUE-123"
issue = Jira.issue(issue_num)
comments = issue.fields.comment.comments
for comment in comments:
    print("Comment text : ", comment.body)
    print("Comment author : ", comment.author.displayName)
    print("Comment time : ", comment.created)

Python, CouchDb: how to Update already existing document by ID

I am trying to update an already existing document by ID. My intention is to find the doc by its id, change its "firstName" to the new value coming in "json", and save it back to the CouchDB database.
Here is my code:
def updateDoc(self, id, json):
    doc = self.db.get(id)
    doc["firstName"] = json["firstName"]
    doc_id, doc_rev = self.db.save(doc)
    print doc_id, doc_rev
    print "Saved"
# "json" is retrieved from the PUT request (request.json)
At self.db.save(doc) I'm getting a "too many values to unpack" exception.
I am using Bottle framework, Python 2.7 and Couch Query.
How do I update the document by id? What is the right way to do it?
In couchdb-python, the db.save(doc) method returns a tuple of _id and _rev. You're using couch-query, a slightly different project that also has a db.save(doc) method, but it returns a different result (the saved document itself). So your code should look like this:
def updateDoc(self, id, json):
    doc = self.db.get(id)
    doc["firstName"] = json["firstName"]
    doc = self.db.save(doc)
    print doc['_id'], doc['_rev']
    print "Saved"
