is it possible to use wildcards for field names in mongodb? - python

I have a set of field names as follows:
"field0.registers.hilo"
"field0.registers.lllo"
...
"field1.registers.hilo"
"field1.registers.lllo"
...
"field2.registers.hilo"
"field2.registers.lllo"
...
"fieldn.registers.hilo"
"fieldn.registers.lllo"
...
Is there a way in MongoDB to indicate, succinctly, that the field index should range from 0 to n, without having to expand it all out beforehand?
Something like this example for $project:
{ $project: { "fieldn.registers.hilo": 1, "fieldn.registers.lllo": 1 } }
For now, I am fully expanding all the project fields from 0 to n in Python before interfacing with the collection using pymongo, roughly as in the sketch below.
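For illustration, a minimal sketch of that expansion (the database name, collection name, and value of n are hypothetical):
from pymongo import MongoClient

collection = MongoClient()['mydatabase']['mycollection']  # hypothetical names

n = 10  # hypothetical upper index
projection = {}
for i in range(n + 1):
    projection[f'field{i}.registers.hilo'] = 1
    projection[f'field{i}.registers.lllo'] = 1

# equivalent to writing every field out by hand in the projection
docs = collection.find({}, projection)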

is it possible to use wildcards for field names in mongodb?
No.
If your data is in this structure, refactor it to use lists. That's exactly what lists are designed for.
Taking the refactored example below, use $elemMatch to project only the array elements needed:
from pymongo import MongoClient

db = MongoClient()['mydatabase']

# one document holding every field as an element of a single array
db.register.insert_many([{
    'registers': [
        {'field': 0, 'hilo': 1, 'lllo': 2},
        {'field': 1, 'hilo': 2, 'lllo': 3},
        {'field': 2, 'hilo': 3, 'lllo': 4},
    ]
}])

# project only the array element whose 'field' is 1
print(db.register.find_one({}, {'registers': {'$elemMatch': {'field': 1}}}))
prints:
{'_id': ObjectId('60b64e57c3214d73c390557b'), 'registers': [{'field': 1, 'hilo': 2, 'lllo': 3}]}
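If you need a range of indices rather than a single one, the same list layout works with an aggregation $filter (a sketch reusing the collection above; the cutoff n is hypothetical):
n = 1  # hypothetical upper index
cursor = db.register.aggregate([
    {'$project': {
        'registers': {
            '$filter': {
                'input': '$registers',
                'as': 'r',
                # keep only elements whose 'field' index is <= n
                'cond': {'$lte': ['$$r.field', n]},
            }
        }
    }}
])
for doc in cursor:
    print(doc)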

Related

How can I transpose a list of documents in MongoDB?

I have a document like:
{
"_id": "6345e01473144cec0073ea95",
"results": [
{"total_cost": 10, "total_time": 20},
{"total_cost": 30, "total_time": 40}
]
}
And I want to 'transpose' the list of documents to get:
{
"total_cost": [10, 30],
"total_time": [20, 40]
}
How can I find an object by ID, and then apply a transform on a list of documents with a Mongo aggregation?
Every question/answer I have seen doing this has been for multiple documents; however, this is for a single document with a list field.
(I am using MongoEngine/PyMongo, so I have included the python tag.)
Simply access the fields with dot notation. You can think of the projection of each field as an individual array, i.e. results.total_cost is an array with content [10, 30]:
db.collection.aggregate([
{
$project: {
total_cost: "$results.total_cost",
total_time: "$results.total_time"
}
}
])
You can experiment with this pipeline in the Mongo Playground.
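Since the question mentions PyMongo, the equivalent call from Python looks roughly like this (a sketch; the database and collection names are assumptions):
from pymongo import MongoClient

collection = MongoClient()['mydatabase']['mycollection']  # hypothetical names

result = collection.aggregate([
    # match on the string id from the example; use bson.ObjectId
    # instead if the collection stores ObjectIds
    {'$match': {'_id': '6345e01473144cec0073ea95'}},
    {'$project': {
        '_id': 0,
        'total_cost': '$results.total_cost',
        'total_time': '$results.total_time',
    }},
])
print(list(result))  # [{'total_cost': [10, 30], 'total_time': [20, 40]}]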

Format an f-string for each dataframe object

Requirement
My requirement is to have Python code extract some records from a database, format them, and upload the formatted JSON to a sink.
Planned approach
1. Create JSON-like templates for each record. E.g.
json_template_str = '''{{
    "type": "section",
    "fields": [
        {{
            "type": "mrkdwn",
            "text": "Today *{total_val}* customers saved {percent_derived}%."
        }}
    ]
}}'''
2. Extract records from DB to a dataframe.
3. Loop over the dataframe and replace the {var} variables in bulk using something like .format(**locals())
Question
I haven't worked with dataframes before.
What would be the best way to accomplish step 3? Currently I am:
3.1 Looping over the dataframe rows one by one: for i, df_row in df.iterrows():
3.2 Assigning
total_val = df_row['total_val']
percent_derived = df_row['percent_derived']
3.3 Formatting inside the loop and appending each block to a list: block.append(json.loads(json_template_str.format(**locals())))
I was trying to use the assign() method of the dataframe but was not able to figure out a way to use something like a lambda function to create a new column with my expected value.
As a novice in pandas, I feel there might be a more efficient way to do this (which may even involve changing the JSON template string, which I can totally do). It will be great to hear thoughts and ideas.
Thanks for your time.
I would not write a JSON string by hand, but rather create a corresponding Python object and then use the json library to convert it into a string. With this in mind, you could try the following:
import copy
import pandas as pd

# some sample data
df = pd.DataFrame({
    'total_val': [100, 200, 300],
    'percent_derived': [12.4, 5.2, 6.5],
})

# template dictionary for a single block
json_template = {
    "type": "section",
    "fields": [
        {
            "type": "mrkdwn",
            "text": "Today *{total_val:.0f}* customers saved {percent_derived:.1f}%.",
        }
    ],
}

# a function that will insert data from each row
# of the dataframe into a block
def format_data(row):
    json_t = copy.deepcopy(json_template)
    text_t = json_t["fields"][0]["text"]
    json_t["fields"][0]["text"] = text_t.format(
        total_val=row['total_val'], percent_derived=row['percent_derived'])
    return json_t

# create a list of blocks
result = df.agg(format_data, axis=1).tolist()
The resulting list looks as follows, and can be converted into a JSON string if needed:
[{
'type': 'section',
'fields': [{
'type': 'mrkdwn',
'text': 'Today *100* customers saved 12.4%.'
}]
}, {
'type': 'section',
'fields': [{
'type': 'mrkdwn',
'text': 'Today *200* customers saved 5.2%.'
}]
}, {
'type': 'section',
'fields': [{
'type': 'mrkdwn',
'text': 'Today *300* customers saved 6.5%.'
}]
}]
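If the final JSON string is what gets uploaded, json.dumps finishes the job on the list built above:
import json

payload = json.dumps(result)  # 'result' is the list produced by df.agg above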

consul json parsing with python

I am trying to pull multiple values from Consul.
After pulling data using the following code:
import consul

c = consul.Consul("consulServer")
index, data = c.kv.get("key", recurse=True)
print(data)
I get the following list of dicts in data:
[ {
'LockIndex': 0,
'ModifyIndex': 54,
'Value': '1',
'Flags': 0,
'Key': 'test/one',
'CreateIndex': 54
}, {
'LockIndex': 0,
'ModifyIndex': 69,
'Value': '2',
'Flags': 0,
'Key': 'test/two',
'CreateIndex': 69
}]
I want to transform this output into a key:value JSON document. For this example it should look like:
{
"one": "1",
"two": "2"
}
I have two questions:
1. Is there a better way to get multiple values from the Consul KV store?
2. Assuming there is no better way, what is the best way to convert the json from the first example to the second one?
Thanks,
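For the conversion, a dict comprehension over the returned list is enough (a sketch, assuming keys of the form 'test/<name>' as in the example above; depending on the client version, 'Value' may come back as bytes rather than str):
result = {item['Key'].split('/')[-1]: item['Value'] for item in data}
print(result)  # {'one': '1', 'two': '2'}
As for pulling multiple values, passing recurse=True to kv.get (as in the code above) is the usual way to fetch every key under a prefix in one call.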

RethinkDB - how to filter arrays in nested objects when updating?

With RethinkDB, how do I update arrays in nested objects so that certain values are filtered out?
Consider the following program; I would like to know how to write an update query that filters out the value 2 from arrays contained in the votes sub-object of documents in the 'dinners' table:
import rethinkdb as r
from pprint import pprint
with r.connect(db='mydb') as conn:
    pprint(r.table('dinners').get('xxx').run(conn))
    r.table('dinners').insert({
        'id': 'xxx',
        'votes': {
            '1': [1, 2],
        },
    }, conflict='replace').run(conn)
# How can I update the 'xxx' document so that the value 2 is
# filtered out from all arrays contained in the 'votes' sub object?
You can use the usual filter method together with object coercion:
def update_dinner(dinner):
    return {
        'votes': dinner['votes']
            .keys()
            .map(lambda key: [
                key,
                dinner['votes'][key].filter(lambda vote_val: vote_val.ne(2)),
            ])
            .coerce_to('object')
    }

r.table('dinners').update(update_dinner).run(conn)
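Reading the document back should show the 2 filtered out (a quick check, given the sample document inserted above):
pprint(r.table('dinners').get('xxx').run(conn))
# expected: {'id': 'xxx', 'votes': {'1': [1]}}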

How to specify explicit document id in elasticsearch while performing a bulk index operation?

I have to perform a bulk index operation in elasticsearch.
The data looks like
[{'code': 12, 'name': 'ABC', 'designation': 'ceo'},
{'code': 13, 'name': 'AIB', 'designation': 'cfo'},
{'code': 14, 'name': 'AXB', 'designation': 'cto'}]
While indexing, I want to explicitly provide code as the id. It is simple when performing a single index operation, but I am not sure how it can be done in a bulk index operation.
The format is different for bulk indexing. There need to be two lines per index request: the first one for metadata like the index name, type name, and ID, and the second one for the actual data:
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
You can specify the id in the _id field of the metadata line.
You can read more on this in the Elasticsearch bulk API documentation.
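From Python, the elasticsearch-py helpers module builds those paired lines for you; a minimal sketch (the index name 'test' and the local connection are assumptions):
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()  # hypothetical local cluster

data = [{'code': 12, 'name': 'ABC', 'designation': 'ceo'},
        {'code': 13, 'name': 'AIB', 'designation': 'cfo'},
        {'code': 14, 'name': 'AXB', 'designation': 'cto'}]

# one action per document, with 'code' used as the explicit _id;
# older clusters may also need a '_type' entry, e.g. '_type': 'type1'
actions = [
    {'_index': 'test', '_id': doc['code'], '_source': doc}
    for doc in data
]
helpers.bulk(es, actions)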
