How to query first and last objects from an array in MongoDB - python

So, I have an array of objects like this one:
"coordinates":[
{
"action":"charge",
"position":{
"city":"City A"
}
},
{
"action":"charge",
"position":{
"city":"City B"
}
},
{
"action":"discharge",
"position":{
"city":"City C"
}
},
{
"action":"discharge",
"position":{
"city":"City D"
}
},
(...)
]
This array holds N objects, and I don't know N in advance.
My question is: How do I query the first and last city from the object position of the coordinates array? I was doing something like this:
db.find({
    'coordinates.0.position.city': city_first_name,
    'coordinates.position': {'$elemMatch': {'city': city_last_name}}
},
{
    'coordinates.$.position': {'$slice': -1}
})
But that didn't work really well. It matches the first position correctly, but for the last one it matches the city anywhere in the array. Should I use aggregation, or is there another way using find?
Thanks for any help.

db.collection.aggregate([
  {
    "$project": {
      // First element; "$a" stands for the array field, e.g. "$coordinates"
      "f": {
        "$slice": [
          "$a",
          1
        ]
      },
      // Last element
      "l": {
        "$slice": [
          "$a",
          -1
        ]
      }
    }
  }
])
You could achieve this using $slice with $project.
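To also filter on the first and last city, as the question asks, the same idea can be sketched from Python. This is a minimal sketch, assuming MongoDB 3.4 or later (for `$addFields`) and an array field named `coordinates` as in the question. The helper name `build_first_last_pipeline` and the output field names `first` and `last` are made up for illustration.

```python
def build_first_last_pipeline(first_city, last_city):
    """Build an aggregation pipeline matching on the first and last array element."""
    return [
        # Copy the first and last element of the array into top-level fields.
        {"$addFields": {
            "first": {"$arrayElemAt": ["$coordinates", 0]},
            "last": {"$arrayElemAt": ["$coordinates", -1]},
        }},
        # Keep only documents whose first and last cities match.
        {"$match": {
            "first.position.city": first_city,
            "last.position.city": last_city,
        }},
        # Return just the two extracted elements.
        {"$project": {"first": 1, "last": 1}},
    ]


pipeline = build_first_last_pipeline("City A", "City D")
# With a PyMongo collection object this would be run as:
# results = list(collection.aggregate(pipeline))
```

The pipeline is built as plain data, so it can be inspected or logged before being sent to the server.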

Related

Updating double nested object in MongoDB with pymongo

I have the following object in MongoDB:
{
"_id":"...",
"username":"XXX",
"keys":[
{
"key":"c443c2cc2754d3",
"ref":"Autos",
"lists":[
{
"list_name":"Toyota",
"key":"c443c2cc2754d3",
"broken_parts":{
"headlights":false,
"bonnet":true,
"interior":{
"dashboard":true,
"electronics":false
}
},
"timestamp":"2023-01-26T13:00:21.803Z",
"status":"parked"
},
{
"list_name":"Nissan",
"key":"c443c2cc2754d3",
"broken_parts":{
"headlights":true,
"bonnet":false,
"interior":{
"dashboard":false,
"electronics":true
}
},
"timestamp":"2023-01-26T13:00:21.803Z",
"status":"garage"
}
]
},
{
"key":"80d54bd834ff60",
"ref":"Trucks",
"lists":[
{
"list_name":"MAN",
"key":"c443c2cc2754d3",
"broken_parts":{
"headlights":false,
"bonnet":false,
"interior":{
"dashboard":false,
"electronics":false
}
},
"timestamp":"2023-01-26T13:00:21.803Z",
"status":"parked"
},
{
"list_name":"Toyota",
"key":"c443c2cc2754d3",
"broken_parts":{
"headlights":true,
"bonnet":false,
"interior":{
"dashboard":true,
"electronics":true
}
},
"timestamp":"2023-01-26T13:00:21.803Z",
"status":"leased"
}
]
}
]
}
and whenever I try to update the status of an item inside a list, it only updates the first object.
If I try to update the status of the 1st list, 2nd object, it will only update the first one:
self.users.update_one({"username": username,
"keys.key": key,
"keys.lists.list_name": listid
},
{"$set": {
"keys.0.lists.$.status": 'repaired'
}
}
)
What am I doing wrong, and how can I update the list item I want, or even the third-level nested object (dashboard, electronics)?
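One way to target a specific nested element, assuming MongoDB 3.6 or later, is the filtered positional operator `$[<identifier>]` together with the `array_filters` argument of PyMongo's `update_one`. The sketch below only builds the arguments and has not been run against the data above. The helper name `build_status_update` and the identifiers `k` and `l` are arbitrary choices for illustration.

```python
def build_status_update(username, key, list_name, status):
    """Return (filter, update, array_filters) for Collection.update_one."""
    query = {"username": username}
    # $[k] selects elements of "keys", $[l] selects elements of "lists".
    update = {"$set": {"keys.$[k].lists.$[l].status": status}}
    # Each identifier is bound to a condition on the array element it names.
    array_filters = [{"k.key": key}, {"l.list_name": list_name}]
    return query, update, array_filters


query, update, array_filters = build_status_update(
    "XXX", "c443c2cc2754d3", "Nissan", "repaired"
)
# With a PyMongo collection this would be called as:
# users.update_one(query, update, array_filters=array_filters)
```

A deeper field could be reached the same way by extending the path, for example `keys.$[k].lists.$[l].broken_parts.interior.dashboard`.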

How to use the sum of two fields when searching for a document in MongoDB?

I have a collection of accounts and I am trying to find an account in which the targetAmount >= totalAmount + N
{
"_id": {
"$oid": "60d097b761484f6ad65b5305"
},
"targetAmount": 100,
"totalAmount": 0,
"highPriority": false,
"lastTimeUsed": 1624283088
}
Now I just select all accounts, iterate over them and check if the condition is met. But I'm trying to do this all in a query:
amount = 10
tasks = ProviderAccountTaskModel.objects(
__raw__={
'targetAmount': {
'$gte': {'$add': ['totalAmount', amount]}
}
}
).order_by('-highPriority', 'lastTimeUsed')
I have also tried using the $sum, but both options do not work.
Can't it be used when searching, or am I just going the wrong way?
You can use $where. Be aware that it will be fairly slow (it has to execute JavaScript on every record), so combine it with indexed query conditions if you can.
db.getCollection('YourCollectionName').find( { $where: function() { return this.targetAmount >= (this.totalAmount + 10) } })
Or, more compactly:
db.getCollection('YourCollectionName').find( { $where: "this.targetAmount >= this.totalAmount + 10" })
A plain find filter cannot reference another field of the same document or do arithmetic on it; that needs $expr, used here in an aggregation $match stage.
Below is the aggregation command you are looking for. Convert it into the equivalent motoengine command.
db.collection.aggregate([
{
"$match": {
"$expr": {
"$gte": [
"$targetAmount",
{
"$sum": [
"$totalAmount",
10
]
},
],
},
},
},
{
"$sort": {
"highPriority": -1,
"lastTimeUsed": 1,
},
},
])
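The same `$expr` condition can be built as a plain Python dict and passed as a raw filter. This is a sketch, assuming MongoDB 3.6 or later (where `$expr` is available) and the field names from the question. The helper name `build_amount_filter` is hypothetical, and the call in the comment has not been tested against the asker's model.

```python
def build_amount_filter(amount):
    """Filter for documents where targetAmount >= totalAmount + amount."""
    return {
        "$expr": {
            "$gte": [
                "$targetAmount",
                {"$add": ["$totalAmount", amount]},
            ]
        }
    }


query = build_amount_filter(10)
# Possible use with the model from the question (untested):
# tasks = ProviderAccountTaskModel.objects(__raw__=query).order_by(
#     '-highPriority', 'lastTimeUsed')
```

Note the `$` prefix on the field names. Without it, as in the original attempt, `'totalAmount'` is treated as a literal string rather than a field reference.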

How to scrape attributes from json values

I am trying to scrape some values through a json that looks like:
{
"attributes":{
"531":{
"id":"531",
"code":"taille",
"label":"taille",
"options":[
{
"id":"30",
"label":"40",
"is_in":"0"
},
{
"id":"31",
"label":"41",
"is_in":"1"
}
]
}
},
"template":"Helloworld"
}
My issue is that the number 531 is different in each JSON file I am trying to scrape, and what I want to grab from the JSON is the label and is_in values.
What I have done so far is something like this, but I am stuck and don't know what to do when 531 changes to something else:
getOption = '''{
"attributes":{
"531":{
"id":"531",
"code":"taille",
"label":"taille",
"options":[
{
"id":"30",
"label":"40",
"is_in":"0"
},
{
"id":"31",
"label":"41",
"is_in":"1"
}
]
}
},
"template":"Helloworld"
}'''
for att, values in getOption.items():
    print(values)
So how can I scrape the label and is_in values?
I'm not sure whether you can have several keys like 531, but you can loop through all of them.
getOption = {
"attributes":{
"531":{
"id":"531",
"code":"taille",
"label":"taille",
"options":[
{
"id":"30",
"label":"40",
"is_in":"0"
},
{
"id":"31",
"label":"41",
"is_in":"1"
}
]
}
},
"template":"Helloworld"
}
attributes = getOption['attributes']
# The key itself (e.g. "531") does not matter; visit every attribute.
for key in attributes.keys():
    for item in attributes[key]['options']:
        print(item['label'], item['is_in'])
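If the JSON arrives as a string, as in the question, it has to be parsed first. Here is a minimal sketch using the standard `json` module with a shortened copy of the sample data. The variable names `raw`, `data` and `pairs` are chosen for illustration.

```python
import json

# JSON text as it might come from a file or an HTTP response.
raw = '''{
  "attributes": {
    "531": {
      "id": "531",
      "code": "taille",
      "label": "taille",
      "options": [
        {"id": "30", "label": "40", "is_in": "0"},
        {"id": "31", "label": "41", "is_in": "1"}
      ]
    }
  },
  "template": "Helloworld"
}'''

data = json.loads(raw)  # str -> dict

pairs = []
# .values() skips the changing key ("531") entirely.
for attribute in data["attributes"].values():
    for option in attribute["options"]:
        pairs.append((option["label"], option["is_in"]))

print(pairs)  # [('40', '0'), ('41', '1')]
```

Iterating over `.values()` avoids having to know the numeric key at all.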

How to find equal values in different fields in Elasticsearch via Python Query?

I've got values in Elasticsearch (+Kibana) and want to make a Graph, where certain nodes are connected.
My fields are "prev" and "curr" and indicate the "previous" and the "current" page, which the user visited.
E.g:
prev: Main_Page, curr: Donald_Trump
prev: other-internal, curr: El_Bienamado
...
So what I'm trying to do is search for values where curr equals prev, so that I can connect them and visualize the result as a NetworkX graph in Kibana.
My problem is that I just started yesterday with query-syntax and don't know if this is even possible.
All in all, my goal is to make a graph, where nodes are connected to a chain, e.g:
Main_Page -> Donald_Trump -> Problems_in_Afrika -> etc.
Meaning that somebody visited those pages in a certain order.
What I've tried for now is:
def getPrevList():
    previous = []
    # Terms aggregation: one bucket per distinct "prev" value
    previousQuery = {
        "size": 0,
        "aggs": {
            "topTerms": {
                "terms": {
                    "field": "prev",
                    "size": 50000
                }
            }
        }
    }
    results = es.search(index="wiki", body=previousQuery)["aggregations"]["topTerms"]["buckets"]
    for bucket in results:
        previous.append({
            "prev" : bucket["key"],
            "numDocs" : bucket["doc_count"]
        })
    return previous
prevs = getPrevList()
rowNum = 0
totalNumDocs = 0
for prevDetails in prevs:
    rowNum += 1
    totalNumDocs += prevDetails["numDocs"]
    prevId = prevDetails["prev"]
    q = {
        "query": {
            "bool": {
                "must": [
                    {
                        "term": {"prev": prevId}
                    }
                ]
            }
        },
        "controls": {
            "sample_size": 10000,
            "use_significance": True
        },
        "vertices": [
            {
                "field": "curr",
                "size": VERTEX_SIZE,
                "min_doc_count": 1
            },
            {
                "field": "prev",
                "size": VERTEX_SIZE,
                "min_doc_count": 1
            }
        ],
        "connections": {
            "query": {
                "match_all": {}
            }
        }
    }
At the end, I'm doing the following:
results = es.transport.perform_request('POST', "/wiki/_xpack/_graph/_explore", body=q)
# Use NetworkX to create a graph of prevs and currs we can analyze
G = nx.Graph()
for node in results["vertices"]:
    G.add_node(nodeId(node), type=node["field"])
for edge in results["connections"]:
    n1 = results["vertices"][int(edge["source"])]
    n2 = results["vertices"][int(edge["target"])]
    G.add_edge(nodeId(n1), nodeId(n2))
I copied it from another example, which worked well, but I can see that the "connections" are important to be able to connect the vertices.
As far as I understood, I need the query to find the correct "prev" field.
The controls are not significant for now.
And here comes the complex part for me: What am I writing in the vertices and connections part? Is it correct that I defined vertices as the prev and curr fields?
And in the connections-query: for now I defined "match_all", but this is obviously not correct. I need a query, where I can "match" those, where prev equals curr and connect them.. but HOW??
Any hint is appreciated!
Thank you in advance.
EDIT:
As @Lupanoide suggested, I altered the code and now have two visualizations:
The first one is the first suggested solution, and it gives me this graph (part of it) (matplotlib, not Kibana yet):
The second solution looks crazier and is more likely to be the correct one, but I need to visualize it in Kibana first:
So the new end of my script is now:
gq = json.dumps(q)
workspaceID ="/f44c95c0-223d-11e9-b49e-bb0f8e1e7bae" # my v6.4.0 workspace
workspaceUrl = "graph#/workspace/"+workspaceID+"?query=" + urllib.quote_plus(gq)
doc = {
"url": workspaceUrl
}
res = es.index(index=connectionsIndexName, doc_type='task', id=0, body=doc)
My only problem now is that when I'm using Kibana to open the URL, I do not see the graph. Instead I get the "new Graph" page.
EDIT2
Okay, I sent the query, but of course the query alone is not enough. I need to pass the graph and its connections, right? Is that possible?
Thank you very much!
EDIT:
For your use case you need to find all the values of the curr field that share the same prev value. So you need to group by all the pages that are clicked after a certain page. You can do that with a terms aggregation.
You need to build a query that first returns, with a terms aggregation, all the values of the prev field, and then aggregates over all the curr values within each prev bucket:
def getOccurrencyDict():
    body = {
        "size": 0,
        "aggs": {
            "getAllThePrevs": {
                "terms": {
                    "field": "prev",
                    "size": 40000
                },
                "aggs": {
                    "getAllTheCurr": {
                        "terms": {
                            "field": "curr",
                            "size": 40000
                        }
                    }
                }
            }
        }
    }
    result = es.search(index="my_index", doc_type="mydoctype", body=body)
    return result
result = getOccurrencyDict()
Then you have to build a data structure that the networkx library accepts. So you should build a dict of lists and then pass it to the from_dict_of_lists function:
dict2Graph = dict()
for res in result["aggregations"]["getAllThePrevs"]["buckets"]:
    # each prev value maps to the flat list of curr values seen after it
    dict2Graph[res["key"]] = [curr["key"] for curr in res["getAllTheCurr"]["buckets"]]
Now you pass it to the networkx ingestion function:
G=nx.from_dict_of_lists(dict2Graph)
I have not tested the networkx ingestion. from_dict_of_lists expects a dict of lists of node names, not a dict of lists of dicts, so only the key of each curr bucket is kept here and the doc_count values are dropped.
If the nested aggregation query is too slow, you should use partitions. See the Elasticsearch terms aggregation documentation on filtering values with partitions.
EDIT:
After reading the networkx documentation, you could also do it this way, without creating the intermediate data structure:
import networkx as nx
from elasticsearch import Elasticsearch
from elasticsearch.client.graph import GraphClient
es = Elasticsearch()
graph_client = GraphClient(es)
def createGraphInKibana(prev):
    q = {
        "query": {
            "bool": {
                "must": [
                    {
                        "term": {"prev": prev}
                    }
                ]
            }
        },
        "controls": {
            "sample_size": 10000,
            "use_significance": True
        },
        "vertices": [
            {
                "field": "curr",
                "size": VERTEX_SIZE,
                "min_doc_count": 1
            },
            {
                "field": "prev",
                "size": VERTEX_SIZE,
                "min_doc_count": 1
            }
        ],
        "connections": {
            "query": {
                "match_all": {}
            }
        }
    }
    graph_client.explore(index="your_index", doc_type="your_doc_type", body=q)
G = nx.Graph()
# result is the nested terms aggregation response from the query above
for prev in result["aggregations"]["getAllThePrevs"]["buckets"]:
    createGraphInKibana(prev['key'])
    for curr in prev["getAllTheCurr"]["buckets"]:
        G.add_edge(prev["key"], curr["key"], weight=curr["doc_count"])
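The step from nested aggregation buckets to graph edges can be tried without a running cluster. The sketch below assumes the response shape produced by the `getAllThePrevs` / `getAllTheCurr` aggregation above. The sample buckets are made-up data, and the helper name `buckets_to_edges` is hypothetical.

```python
def buckets_to_edges(prev_buckets):
    """Flatten nested terms-aggregation buckets into (prev, curr, weight) tuples."""
    edges = []
    for prev in prev_buckets:
        for curr in prev["getAllTheCurr"]["buckets"]:
            # doc_count is the number of documents with this prev -> curr pair.
            edges.append((prev["key"], curr["key"], curr["doc_count"]))
    return edges


# Made-up buckets in the shape of result["aggregations"]["getAllThePrevs"]["buckets"].
sample = [
    {"key": "Main_Page", "doc_count": 3,
     "getAllTheCurr": {"buckets": [
         {"key": "Donald_Trump", "doc_count": 2},
         {"key": "El_Bienamado", "doc_count": 1},
     ]}},
    {"key": "Donald_Trump", "doc_count": 1,
     "getAllTheCurr": {"buckets": [
         {"key": "Problems_in_Afrika", "doc_count": 1},
     ]}},
]

edges = buckets_to_edges(sample)
# With networkx installed, the edges could be loaded as a directed graph:
# G = nx.DiGraph(); G.add_weighted_edges_from(edges)
```

A directed graph may fit the "visited in a certain order" goal better than `nx.Graph()`, since it keeps the prev to curr direction.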

triple nested for-loops in Json python

I need to print the latitude and longitude from the following Python object:
{
"Siri": {
"ServiceDelivery": {
"ResponseTimestamp": "2014-08-09T15:32:13.078-04:00",
"VehicleMonitoringDelivery": [
{
"VehicleActivity": [
{
"MonitoredVehicleJourney": {
"LineRef": "MTA NYCT_B38",
"DirectionRef": "1",
"FramedVehicleJourneyRef": {
"DataFrameRef": "2014-08-09",
"DatedVehicleJourneyRef": "MTA NYCT_FP_C4-Saturday-090900_B38_110"
},
"JourneyPatternRef": "MTA_B380099",
"PublishedLineName": "B38",
"OperatorRef": "MTA NYCT",
"OriginRef": "MTA_504241",
"DestinationRef": "MTA_901070",
"DestinationName": "DNTWN BKLYN TILLARY ST",
"SituationRef": [
{
"SituationSimpleRef": "MTA NYCT_78100"
}
],
"Monitored": true,
"VehicleLocation": {
"Longitude": -73.937414,
"Latitude": 40.692978
},
So far I have written this:
for delivery in theJSON['Siri']['ServiceDelivery']['VehicleMonitoringDelivery']:
    for activity in delivery['VehicleActivity']:
        for locations in activity['MonitoredVehicleJourney']['VehicleLocation']:
            print(locations['VehicleLocation']['Longitude'])
But I am getting the error: TypeError: string indices must be integers.
How do I solve this?
activity['MonitoredVehicleJourney']['VehicleLocation'] is a dict, not a list, so iterating over it is an iteration over the keys, which are strings. If locations is a string, then locations['VehicleLocation']['Longitude'] makes no sense. You want
for delivery in theJSON['Siri']['ServiceDelivery']['VehicleMonitoringDelivery']:
    for activity in delivery['VehicleActivity']:
        # VehicleLocation is a dict, so index it directly instead of looping over it
        print(activity['MonitoredVehicleJourney']['VehicleLocation']['Longitude'])
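To get both latitude and longitude, the same two loops can collect pairs. This is a self-contained sketch using a trimmed-down copy of the structure from the question. The variable names `the_json` and `coords` are chosen for illustration.

```python
# Trimmed-down version of the response, keeping only the keys that are used.
the_json = {
    "Siri": {
        "ServiceDelivery": {
            "VehicleMonitoringDelivery": [
                {
                    "VehicleActivity": [
                        {
                            "MonitoredVehicleJourney": {
                                "VehicleLocation": {
                                    "Longitude": -73.937414,
                                    "Latitude": 40.692978,
                                }
                            }
                        }
                    ]
                }
            ]
        }
    }
}

coords = []
for delivery in the_json["Siri"]["ServiceDelivery"]["VehicleMonitoringDelivery"]:
    for activity in delivery["VehicleActivity"]:
        # VehicleLocation is a dict: read both keys directly.
        location = activity["MonitoredVehicleJourney"]["VehicleLocation"]
        coords.append((location["Latitude"], location["Longitude"]))

print(coords)  # [(40.692978, -73.937414)]
```

Only two loops are needed because only two levels of the structure are lists.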
