How to swap two Elasticsearch indexes - python

I want to implement a cache for a heavily loaded Elasticsearch-based search system, and I want to store the cache in a dedicated Elasticsearch index. The problem is cache warm-up: once an hour my system needs to replace the cached results with fresh ones.
So I create a new empty index and fill it with the updated results; then I need to swap the old index and the new one so that users get the fresh cached results.
The question is: how do I swap two Elasticsearch indexes efficiently?

For this kind of scenario you use something that is called "index alias swapping".
You have an alias that points to your current index, you fill a new index with the fresh records, and then you point this alias to the new index.
Something like this:
Current index name is items-2022-11-26-001
Create alias items pointing to items-2022-11-26-001
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "items-2022-11-26-001",
        "alias": "items"
      }
    }
  ]
}
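Create a new index with fresh data, items-2022-11-26-002
When indexing finishes, point the items alias to items-2022-11-26-002. Both actions go in one request, which Elasticsearch applies atomically, so the alias is never left without an index behind it: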
POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "items-2022-11-26-001",
        "alias": "items"
      }
    },
    {
      "add": {
        "index": "items-2022-11-26-002",
        "alias": "items"
      }
    }
  ]
}
Delete items-2022-11-26-001
You run all your queries against the "items" alias, which acts like an index.
References:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html
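Since the question is tagged python, the swap request can also be built as a plain dictionary. This is only a minimal sketch: the helper name build_alias_swap is made up for illustration, and how the body is sent (for example with the official Python client's indices.update_aliases call, or any HTTP client posting to _aliases) is an assumption about your setup.

```python
def build_alias_swap(alias, old_index, new_index):
    """Return a body for POST _aliases that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            # Both actions travel in one request, so the swap happens in one step.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Index names taken from the example above.
body = build_alias_swap("items", "items-2022-11-26-001", "items-2022-11-26-002")
# Assumption: `body` is then sent as a single request to the _aliases endpoint.
```

Keeping the body construction separate from the network call makes the hourly swap easy to unit-test without a running cluster.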

Related

Appending list inside JSON object

This is my dictionary format:
quest_attr = {
"questions": [
{
"Tags": [
{
"tagname": ""
}
],
"Title": "",
"Authors": [
{
"name": ""
}
],
"Answers": [
{
"ans": ""
}
],
"Related_Questions": [
{
"quest": ""
}
]
}
]
}
I want to add list of "Tags" such that the result will be:
"questions":[
{
"Tags": [
{"tagname":"#Education"}, {"tagname":"#Social"}
],
remaining fields...
}
The remaining fields can be assumed to be null. And I want to add multiple questions to the main "questions" list.
I am using this code, but the results are not as expected.
ind=0
size=len(tags)
while ind<size:
    quest_attr["questions"].append({["Tags"].append({"tagname":tags[ind]})})
    ind=ind+1
And if I maintain a variable for looping through the list of questions like:
quest_attr["questions"][ind]["Tags"].append({"tagname":tags[ind]})
It gives an error that the index is out of range. What should I do?
It appears that the index variable ind is intended to iterate only through the list of tags. The way you have the append structured, your loop will attempt to attach the next tag to the next question in the questions list, instead of adding the rest of the tags to the same question.
If you want to add the same set of tags to multiple questions, you need to loop through the questions list separately, with the append statement for the tags nested in an inner loop. On the other hand, if there is only one question you want to target, just use its index number, [0] in this case.
Something like this would perhaps work better but more context would help:
for question in quest_attr["questions"]:
    for tag in tags:
        question["Tags"].append({"tagname":tag})
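If the goal is instead to add several questions, each with its own tags, one possible approach is to build each question dict in full and append it once. This is only a sketch: the helper name make_question and the sample tag values are illustrative assumptions, not part of the original code.

```python
def make_question(tags, title=""):
    """Build one question entry in the same shape as the original dict."""
    return {
        "Tags": [{"tagname": tag} for tag in tags],
        "Title": title,
        "Authors": [],
        "Answers": [],
        "Related_Questions": [],
    }


quest_attr = {"questions": []}
# Each append adds one complete question, so no index bookkeeping is needed.
quest_attr["questions"].append(make_question(["#Education", "#Social"]))
quest_attr["questions"].append(make_question(["#Science"]))
```

Because every question is complete before it is appended, there is no position in the list that can be out of range.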
Mixing nested dicts and lists the way your code does makes it hard to follow.
Here I recommend a simpler layout.
quest_attr = {
    'questions': {
        "Tags":[],
        "Title":"",
        "Authors":[],
        "Answers":[],
        "Related_Questions":[]
    }
}
tags = [ {"tagname":"#Education"},{"tagname":"#Social"} ]
quest_attr["questions"]['Tags'] += tags
print(quest_attr)

Select multiple values from a document

Background
I have data stored in the following format
{
"player_id": "VU3R5HNTAGMK",
"markers": {
"BICF2P964092": "GC",
"BICF2G630653981": "CG",
"BICF2P483996": "CT",
"BICF2S23452916": "CG",
"chr26_19147949": "TC",
}
}
You can imagine I have data stored for multiple players; each has a unique player_id and a varying number of markers with different marker values.
In the above case, a marker is BICF2P964092 and its marker value is GC.
I am trying to query my MongoDB database in various ways. One obvious way is by player_id. To do that I run col.find({"player_id": "VU3R5HNTAGMK"})
I also want to get the value of a specific marker for a specific player. For that I can run col.find({"player_id": "VU3R5HNTAGMK"}, {'markers.BICF2P964092'})
ISSUE
I also want to be able to get the values of multiple markers for a specific player, and I am not able to do so. I have tried the following with no luck.
col.find({"player_id": "VU3R5HNTAGMK"},{'markers': {'$in': ["BICF2P964092", "chr26_19147949"]}})
col.find({"player_id": "VU3R5HNTAGMK"}, {'markers.BICF2P964092'}, {'markers.chr26_19147949'})
I would really appreciate it if someone could help me write a query that returns multiple marker values for a given player_id and list of markers.
You can simply do the following
col.find({"player_id": "VU3R5HNTAGMK"}, {"markers." + m: 1 for m in ["BICF2P964092", "BICF2G630653981"]})
As you've tagged this pymongo, you might be best off processing the marker values in Python after the find, e.g.
docs = col.find({"player_id": "VU3R5HNTAGMK"})
for doc in docs:
    # fall back to an empty dict if a document has no markers
    for marker, value in doc.get('markers', {}).items():
        if marker in ["BICF2P964092", "chr26_19147949"]:
            print(marker, value)
@Belly Buster's solution is good if you want to handle this in Python.
But there is a way to handle this entirely on the MongoDB side using aggregation.
You can combine the $objectToArray, $filter, and $arrayToObject operators in a $project stage.
collection.aggregate([
    {
        "$match": {
            "player_id": "VU3R5HNTAGMK"  # <-- All your match conditions
        }
    },
    {
        "$project": {
            "player_id": 1,  # All the other keys which you want to project
            "markers": {
                "$arrayToObject": {
                    "$filter": {
                        "input": {
                            "$objectToArray": "$markers"
                        },
                        "as": "elem",
                        "cond": {
                            "$in": [
                                "$$elem.k",
                                [
                                    # <-- List of key names you want to project
                                    "BICF2G630653981",
                                    "BICF2P483996"
                                ]
                            ]
                        },
                    },
                },
            },
        }
    },
])
Note: You have to use MongoDB version >= 3.4.4 for this aggregation query to work.
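To make the three operators easier to follow, here is a rough plain-Python imitation of what the $project stage above does to a single document. It is an illustration of the idea only, not how MongoDB implements it; the function name filter_markers is hypothetical, and the sample document is the one from the question.

```python
def filter_markers(doc, wanted):
    """Keep only the markers whose key is listed in `wanted`."""
    # $objectToArray: {"a": 1} becomes [{"k": "a", "v": 1}]
    pairs = [{"k": k, "v": v} for k, v in doc.get("markers", {}).items()]
    # $filter with $in: keep the pairs whose key is in the wanted list
    kept = [pair for pair in pairs if pair["k"] in wanted]
    # $arrayToObject: turn the remaining pairs back into a mapping
    markers = {pair["k"]: pair["v"] for pair in kept}
    return {"player_id": doc["player_id"], "markers": markers}


sample = {
    "player_id": "VU3R5HNTAGMK",
    "markers": {
        "BICF2P964092": "GC",
        "BICF2G630653981": "CG",
        "BICF2P483996": "CT",
        "BICF2S23452916": "CG",
        "chr26_19147949": "TC",
    },
}
result = filter_markers(sample, ["BICF2G630653981", "BICF2P483996"])
```

Keys that are requested but missing from the document are simply absent from the result.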

Mongodb Pymongo using $set to create an array/list/collection

I'm trying to use $set to create an array/list/collection (not sure which is proper terminology), and I'm not sure how to do it. For example:
I have a document inserted into my database that looks like this:
"_id": (unique, auto-generated id)
"Grade": Sophomore
I want to insert a collection/list/array using update. So, basically I want this:
"_id": (unique, auto-generated id)
"Grade": Sophomore
"Information"{
"Class_Info": [
{"Class_Name": "Math"}
]
What I've been doing so far is using .update and dot notation. So, what I was trying to do was use $set like this:
collection.update({'_id': unique ID}, {'$set': {'Information.Class_Info.Class_Name': 'Math'}})
However, what that is doing is making Class_Info a document and not a list/collection/array, so it's doing:
"_id": (unique id)
"Grade": Sophomore
"Information"{
"Class_Info": {
"Class_Name": "Math"
}
How do I specify that I want Class_Info to be a list? If for some reason I absolutely cannot use $set to do this, it is very important that I can still use dot notation because of the way the rest of my program works. So if I'm supposed to use something other than $set, can it take dot notation to specify where to insert the list? (I know $push is another option, but as far as I can tell it doesn't use dot notation, so I can't really use it in my case.)
Thanks!
If you want to do it with only one instruction, starting from a document that does not have any of these keys yet, this is the only way to do it ($set will never create an array unless it is given one explicitly, as in {$set: {"somekey": [] }}):
db.test.update(
  { _id: "(unique id)" },
  { $push: {
    "Information.Class_Info": { "Class_Name": "Math" }
  }}
)
This query does the trick: it pushes the object onto the non-existing key Information.Class_Info, which MongoDB then creates as an array. This is the only working solution that uses a single instruction and dot notation.
There is a way to do it with one instruction, $set and dot notation, as follows:
db.test.updateOne(
  { _id: "my-unique-id" },
  { $set: {
    "Information.Class_Info": [ { "Class_Name": "Math" } ]
  }}
)
There is also a way to do it with two instructions and the array index in the dot notation, allowing you to use similar statements to add more array elements:
db.test.updateOne(
  { _id: "my-unique-id" },
  { $set: { "Information.Class_Info": [] }}
)
db.test.updateOne(
  { _id: "my-unique-id" },
  { $set: {
    "Information.Class_Info.0": { "Class_Name": "Math" },
    "Information.Class_Info.1": { "Class_AltName": "Mathematics" }
  }}
)
Deviating from these options has interesting failure modes:
If you try to combine the two calls of the second option into a single updateOne() call (setting several fields in one call is usually possible), MongoDB will complain that "Updating the path 'Information.Class_Info.0' would create a conflict at 'Information.Class_Info'".
If you try to use the dot notation with the array index ("Information.Class_Info.0.Class_Name": "Math") without creating an empty array first, MongoDB will create an object with numeric keys ("0", "1", …). It refuses to create an array unless told to explicitly using […] (as also noted in the answer by @Maximiliano).
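The behaviour described above can be imitated with a small toy model in Python. This is a deliberately simplified sketch written for illustration, not MongoDB's actual implementation; the function name toy_set is hypothetical. It only captures the rule that missing parents along a dotted path become embedded documents, while a numeric path component is treated as an index only when an array already exists there.

```python
def toy_set(doc, path, value):
    """Toy model of $set with a dotted path (illustrative, not MongoDB's real code)."""
    parts = path.split(".")
    node = doc
    for part in parts[:-1]:
        if isinstance(node, list):
            index = int(part)
            while len(node) <= index:
                node.append(None)      # pad short arrays with nulls
            if node[index] is None:
                node[index] = {}
            node = node[index]
        else:
            # A missing parent is created as a document, never as an array.
            node = node.setdefault(part, {})
    last = parts[-1]
    if isinstance(node, list):
        index = int(last)
        while len(node) <= index:
            node.append(None)
        node[index] = value
    else:
        node[last] = value
    return doc


# No array exists yet: "0" ends up as an ordinary key.
without_array = toy_set({}, "Information.Class_Info.0.Class_Name", "Math")

# The array was created explicitly first: "0" is an index.
with_array = toy_set({"Information": {"Class_Info": []}},
                     "Information.Class_Info.0", {"Class_Name": "Math"})
```

Comparing without_array and with_array shows why the empty array has to be set explicitly before index-based paths behave as array positions.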

Selectively retrieving depending on existence of key in map

Is it possible to selectively retrieve depending on the existence of keys in a map in mongodb? And if so, how do you go about doing it?
Suppose I have a document that looks like this for example..
{ "_id": 1234,
  "parentfield1" : {
    "childfield1" : { ...},
    "childfield2" : { ...},
    "childfield5" : { ...}, // There might be many childfields.. > 50
  },
}
How can I selectively retrieve particular childfields from the document, given a list of candidate names, some of which may not exist in the document?
i.e.
input "childfield1", "childfield2", "childfield3"
-> output
{ "_id": 1234,
  "parentfield1": {
    "childfield1" : { ... },
    "childfield2" : { ... },
  },
}
Is it even doable? Is it possible to do efficiently also?
Any help would be great (python, go).
Yes, that's the purpose of the projection parameter of find:
db.collection.find({_id: 1234}, {
  'parentfield1.childfield1': 1,
  'parentfield1.childfield2': 1,
  'parentfield1.childfield3': 1
});
If a specified field isn't present in a given doc, the other matching fields will still be included.
Build up the projection parameter object programmatically if you want it to be dynamic.
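Since the question mentions Python, building the projection dynamically could look like the sketch below. The helper name build_projection is made up for illustration, and the variable `collection` in the comment is assumed to be an existing PyMongo collection.

```python
def build_projection(parent, children):
    """Map each requested child field to 1 under the given parent field."""
    return {f"{parent}.{child}": 1 for child in children}


projection = build_projection(
    "parentfield1", ["childfield1", "childfield2", "childfield3"]
)
# Assumption: with PyMongo this dict would be passed as the second argument
# of find(), e.g. collection.find({"_id": 1234}, projection).
```

Names in the list that do not exist in a document cost nothing: they are just left out of the result.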

MongoDb: $sort by $in

I am running a mongodb find query with an $in operator:
collection.find({name: {$in: [name1, name2, ...]}})
I would like the results to be sorted in the same order as my name array: [name1, name2, ...]. How do I achieve this?
Note: I am accessing MongoDb through pymongo, but I don't think that's of any importance.
EDIT: as it's impossible to achieve this natively in MongoDb, I ended up using a typical Python solution:
names = [name1, name2, ...]
results = list(collection.find({"name": {"$in": names}}))
results.sort(key=lambda x: names.index(x["name"]))
You can achieve this with the aggregation framework starting with version 3.4 (released November 2016).
Assuming the order you want is the array order=["David", "Charlie", "Tess"] you do it via this pipeline:
m = { "$match" : { "name" : { "$in" : order } } };
a = { "$addFields" : { "__order" : { "$indexOfArray" : [ order, "$name" ] } } };
s = { "$sort" : { "__order" : 1 } };
db.collection.aggregate( [ m, a, s ] );
The "$addFields" stage is new in 3.4 and lets you add new fields to existing documents without knowing all the other existing fields. The new "$indexOfArray" expression returns the position of a particular element in a given array.
The result of this aggregation will be the documents that match your condition, in the order specified in the input array order, and the documents will include all original fields plus an additional field called __order.
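As the question uses pymongo, the same pipeline can be written as a list of Python dicts. This is a sketch under the same 3.4+ assumption; the helper name ordered_in_pipeline is hypothetical, and `collection` in the comment is assumed to be an existing PyMongo collection.

```python
def ordered_in_pipeline(field, order):
    """Build a pipeline that matches `field` against `order` and sorts in that order."""
    return [
        {"$match": {field: {"$in": order}}},
        # Position of each document's value inside the `order` list.
        {"$addFields": {"__order": {"$indexOfArray": [order, "$" + field]}}},
        {"$sort": {"__order": 1}},
    ]


pipeline = ordered_in_pipeline("name", ["David", "Charlie", "Tess"])
# Assumption: this would then be run as collection.aggregate(pipeline).
```

The helper only builds the stages, so it can be checked without a database connection.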
Impossible. The $in operator only checks for presence; the list is treated as a set.
Options:
1. Split it into several queries, one each for name1 ... nameN, or filter the result the same way. More names means more queries.
2. Use itertools groupby/ifilter. In that case, add a "sorting precedence" flag to every document, matching name1 to PREC1, name2 to PREC2, and so on; then sort by PREC and group by PREC.
If your collection has an index on the "name" field, option 1 is better.
If it does not have the index, or you cannot create it due to a high write/read ratio, option 2 is for you.
Vitaly is correct that it's impossible to do this with find, but it can be achieved with aggregation:
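db.collection.aggregate([
  { $match: { name: { $in: [name1, name2, /* ... */] } } },
  {
    $project: {
      name: 1,
      // one boolean flag per name, true when the document matches it
      name1: { $eq: [name1, '$name'] },
      name2: { $eq: [name2, '$name'] },
    },
  },
  // false sorts before true, so sort descending to put name1 matches first
  { $sort: { name1: -1, name2: -1 } },
])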
Tested on 2.6.5.
I hope this points other people in the right direction.
