Grouping data by date in MongoDB and Python

I'm running a standard find query against my MongoDB database. It looks like this:
MyData = pd.DataFrame(list(db.MyData.find({'datetime': {'$gte': StartTime, '$lt': Endtime}})), columns=['price', 'amount', 'datetime'])
Now I'm trying to do another query, but it's more complicated and I don't know how to do it. Here is a sample of my data:
{"datetime": "2020-07-08 15:10", "price": 21, "amount": 90}
{"datetime": "2020-07-08 15:15", "price": 22, "amount": 50}
{"datetime": "2020-07-08 15:19", "price": 21, "amount": 40}
{"datetime": "2020-07-08 15:30", "price": 21, "amount": 90}
{"datetime": "2020-07-08 15:35", "price": 32, "amount": 50}
{"datetime": "2020-07-08 15:39", "price": 41, "amount": 40}
{"datetime": "2020-07-08 15:49", "price": 32, "amount": 40}
I need to group that data into 30-minute intervals and make the records distinct by price. So all the records before 15:30 must have 15:30 as their datetime, and all the records before 16:00 need to have 16:00. An example of the expected output:
The previous data becomes this:
{"datetime": "2020-07-08 15:30", "price": 21, "amount": 90}
{"datetime": "2020-07-08 15:30", "price": 22, "amount": 50}
{"datetime": "2020-07-08 16:00", "price": 32, "amount": 50}
{"datetime": "2020-07-08 16:00", "price": 41, "amount": 40}
I don't know if this query is doable, so any kind of advice is appreciated. I can also do it in my code if it's not possible in a query.
I tried the code suggested here, but I got the following result, which is not the expected output:
Query = db.myData.aggregate([
{ "$group": {
"_id": {
"$toDate": {
"$subtract": [
{ "$toLong": "$datetime" },
{ "$mod": [ { "$toLong": "$datetime" }, 1000 * 60 * 15 ] }
]
}
},
"count": { "$sum": 1 }
}}
])
for x in Query:
print(x)
Output:
{'_id': datetime.datetime(2020, 7, 7, 9, 15), 'count': 39}
{'_id': datetime.datetime(2020, 7, 6, 18, 30), 'count': 44}
{'_id': datetime.datetime(2020, 7, 7, 16, 30), 'count': 54}
{'_id': datetime.datetime(2020, 7, 7, 11, 45), 'count': 25}
{'_id': datetime.datetime(2020, 7, 6, 22, 15), 'count': 48}
{'_id': datetime.datetime(2020, 7, 7, 15, 0), 'count': 30}
...
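The attempted pipeline rounds each timestamp down, not up: `$toLong` converts a BSON date into milliseconds since the Unix epoch, and subtracting `ms % interval` drops it to the start of its interval. It also uses 15-minute buckets (`1000 * 60 * 15`) and only counts documents. A minimal Python sketch of the same arithmetic, assuming UTC timestamps (BSON dates are stored in UTC) and using a hypothetical helper name `floor_like_pipeline`, shows why 15:10 lands in 15:00 rather than 15:30:

```python
from datetime import datetime, timezone

def floor_like_pipeline(ts: datetime, minutes: int) -> datetime:
    """Mirror {"$subtract": [ms, {"$mod": [ms, interval]}]} in plain Python."""
    # $toLong on a BSON date gives milliseconds since the Unix epoch
    ms = round(ts.timestamp() * 1000)
    interval = 1000 * 60 * minutes
    floored = ms - ms % interval  # drop the remainder: start of the interval
    return datetime.fromtimestamp(floored / 1000, tz=timezone.utc)

ts = datetime(2020, 7, 8, 15, 10, tzinfo=timezone.utc)
print(floor_like_pipeline(ts, 15))  # 2020-07-08 15:00:00+00:00
print(floor_like_pipeline(ts, 30))  # 2020-07-08 15:00:00+00:00
```

Getting 15:30 for this record requires rounding up instead, plus some rule for which document to keep per price.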

What @Gibbs suggested is correct; you just have to modify it a little.
Check whether the aggregation query below is what you are looking for:
Query = db.myData.aggregate([
{
"$group": {
"_id": {
"datetime":{
"$toDate": {
"$subtract": [
{ "$toLong": "$datetime" },
{ "$mod": [ { "$toLong": "$datetime" }, 1000 * 60 * 30 ] }
]
}
},
"price": "$price",
"amount": "$amount"
},
}
},
{
"$replaceRoot": { "newRoot": "$_id"}
}
])
for x in Query:
print(x)
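Note that this pipeline still rounds down (15:10 goes to 15:00), and grouping on both price and amount keeps 21/90 and 21/40 as separate rows, so it does not quite reproduce the expected output. Since the question also allows doing this in code, here is a minimal client-side sketch. It assumes PyMongo returns `datetime` values as `datetime` objects, that a timestamp exactly on a boundary (15:30) stays in that bucket, and that the first document seen for each (bucket, price) pair is the one kept, which is what the sample output implies. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

def ceil_to_interval(ts: datetime, minutes: int = 30) -> datetime:
    """Round ts up to the next multiple of `minutes` (must divide 60); boundaries stay put."""
    floored = ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
    return ts if floored == ts else floored + timedelta(minutes=minutes)

def bucket_distinct_by_price(records, minutes=30):
    """Keep the first record per (rounded-up bucket, price), in chronological order."""
    seen = set()
    result = []
    for rec in sorted(records, key=lambda r: r["datetime"]):
        bucket = ceil_to_interval(rec["datetime"], minutes)
        key = (bucket, rec["price"])
        if key not in seen:
            seen.add(key)
            result.append({"datetime": bucket, "price": rec["price"], "amount": rec["amount"]})
    return result

# Sample data from the question, parsed into datetime objects
raw = [
    ("2020-07-08 15:10", 21, 90), ("2020-07-08 15:15", 22, 50),
    ("2020-07-08 15:19", 21, 40), ("2020-07-08 15:30", 21, 90),
    ("2020-07-08 15:35", 32, 50), ("2020-07-08 15:39", 41, 40),
    ("2020-07-08 15:49", 32, 40),
]
records = [
    {"datetime": datetime.strptime(d, "%Y-%m-%d %H:%M"), "price": p, "amount": a}
    for d, p, a in raw
]
for row in bucket_distinct_by_price(records):
    print(row["datetime"].strftime("%Y-%m-%d %H:%M"), row["price"], row["amount"])
```

With the sample data this prints the four rows of the expected output. In practice `records` would be the result of the original `db.MyData.find(...)` call, and the returned list can go straight into `pd.DataFrame`.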

Related

Fetching multiple values and keys from dict

movies={
'actors':{'prabhas':{'knownAs':'Darling', 'awards':{'nandi':1, 'cinemaa':1, 'siima':1},'remuneration':100, 'hits':{'industry':2, 'super':3,'flops':8}, 'age':41, 'height':6.1, 'mStatus':'single','sRate':'35%'},
'pavan':{'knownAs':'Power Star', 'awards':{'nandi':2, 'cinemaa':2, 'siima':5}, 'hits':{'industry':2, 'super':7,'flops':16}, 'age':48, 'height':5.9, 'mStatus':'married','sRate':'37%','remuneration':50},
},
'actress':{
'tamanna':{'knownAs':'Milky Beauty', 'awards':{'nandi':0, 'cinemaa':1, 'siima':1}, 'remuneration':10, 'hits':{'industry':1, 'super':7,'flops':11}, 'age':28, 'height':5.9, 'mStatus':'single', 'sRate':'40%'},
'rashmika':{'knownAs':'Butter Milky Beauty', 'awards':{'nandi':0, 'cinemaa':0, 'siima':2}, 'remuneration':12,'hits':{'industry':0, 'super':4,'flops':2}, 'age':36, 'height':5.9, 'mStatus':'single', 'sRate':'30%'},
}
}
1. What is the total number of Nandi Awards won by actors?
2. What is the success rate of Prince?
3. What is the name of Prince?
You can answer the first question with this:
import jmespath
movies={
"actors": {
"prabhas": {
"knownAs": "Darling",
"awards": {
"nandi": 1,
"cinemaa": 1,
"siima": 1
},
"remuneration": 100,
"hits": {
"industry": 2,
"super": 3,
"flops": 8
},
"age": 41,
"height": 6.1,
"mStatus": "single",
"sRate": "35%"
},
"pavan": {
"knownAs": "Power Star",
"awards": {
"nandi": 2,
"cinemaa": 2,
"siima": 5
},
"hits": {
"industry": 2,
"super": 7,
"flops": 16
},
"age": 48,
"height": 5.9,
"mStatus": "married",
"sRate": "37%",
"remuneration": 50
}
},
"actress": {
"tamanna": {
"knownAs": "Milky Beauty",
"awards": {
"nandi": 0,
"cinemaa": 1,
"siima": 1
},
"remuneration": 10,
"hits": {
"industry": 1,
"super": 7,
"flops": 11
},
"age": 28,
"height": 5.9,
"mStatus": "single",
"sRate": "40%"
},
"rashmika": {
"knownAs": "Butter Milky Beauty",
"awards": {
"nandi": 0,
"cinemaa": 0,
"siima": 2
},
"remuneration": 12,
"hits": {
"industry": 0,
"super": 4,
"flops": 2
},
"age": 36,
"height": 5.9,
"mStatus": "single",
"sRate": "30%"
}
}
}
total_nandies_by_actors = sum(jmespath.search('[]',jmespath.search('actors.*.*.nandi',movies)))
But there is no Prince in the data you've provided, so the other two questions can't be answered from it.
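If you'd rather not depend on jmespath, plain dictionary access does the same job. This sketch uses a trimmed copy of the data and assumes every actor has an `awards` dict with a `nandi` key, as in the sample. The `find_by_nickname` helper is hypothetical: it searches the `knownAs` field and returns `None` for "Prince", since no such entry exists.

```python
# Trimmed copy of the question's data: only the fields used below
movies = {
    "actors": {
        "prabhas": {"knownAs": "Darling", "awards": {"nandi": 1, "cinemaa": 1, "siima": 1}, "sRate": "35%"},
        "pavan": {"knownAs": "Power Star", "awards": {"nandi": 2, "cinemaa": 2, "siima": 5}, "sRate": "37%"},
    },
    "actress": {
        "tamanna": {"knownAs": "Milky Beauty", "awards": {"nandi": 0, "cinemaa": 1, "siima": 1}, "sRate": "40%"},
        "rashmika": {"knownAs": "Butter Milky Beauty", "awards": {"nandi": 0, "cinemaa": 0, "siima": 2}, "sRate": "30%"},
    },
}

# 1. Total Nandi awards won by actors (actresses are not counted)
total_nandi = sum(info["awards"]["nandi"] for info in movies["actors"].values())
print(total_nandi)  # 3

def find_by_nickname(data, nickname):
    """Hypothetical helper: return (name, info) for the first person whose knownAs matches."""
    for group in data.values():
        for name, info in group.items():
            if info.get("knownAs") == nickname:
                return name, info
    return None

print(find_by_nickname(movies, "Prince"))  # None: nobody in the data is known as "Prince"
name, info = find_by_nickname(movies, "Power Star")
print(name, info["sRate"])  # pavan 37%
```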

Ordering a dictionary by count of items across a number of key-value lists

Hopefully the title is not too confusing. I have a dictionary (sample below) that I'm trying to sort by the number of list items (dictionaries) across several keys beneath a parent. Hopefully the example makes more sense than my description.
{
"data": {
"London": {
"SHOP 1": [
{
"kittens": 10,
"type": "fluffy"
},
{
"puppies": 11,
"type": "squidgy"
}
],
"SHOP 2": [
{
"kittens": 15,
"type": "fluffy"
},
{
"puppies": 3,
"type": "squidgy"
},
{
"fishes": 132,
"type": "floaty"
}
]
},
"Manchester": {
"SHOP 1": [
{
"kittens": 10,
"type": "fluffy"
},
{
"puppies": 11,
"type": "squidgy"
}
],
"SHOP 2": [
{
"kittens": 15,
"type": "fluffy"
},
{
"puppies": 3,
"type": "squidgy"
},
{
"fishes": 132,
"type": "floaty"
}
],
"SHOP 3": [
{
"kittens": 15,
"type": "fluffy"
},
{
"puppies": 3,
"type": "squidgy"
},
]
},
"Edinburgh": {
"SHOP 1": [
{
"kittens": 10,
"type": "fluffy"
},
{
"puppies": 11,
"type": "squidgy"
}
],
"SHOP 2": [
{
"kittens": 15,
"type": "fluffy"
},
],
"SHOP 3": [
{
"puppies": 3,
"type": "squidgy"
},
]
}
}
}
Summary
# London 2 shops, 5 item dictionaries total
# Manchester 3 shops, 7 item dictionaries total
# Edinburgh 3 shops, 4 item dictionaries total
The desired sort order is by total items across the shops, so: Manchester, London, Edinburgh.
I'd usually use something like the line below to sort, but I'm not sure how to do this one, since it means counting the number of items across several keys.
{k: v for k, v in sorted(x.items(), key=lambda item: item[1])}
You need to reverse sort based on the total number of items for each location, which you can generate as:
sum(len(i) for i in s.values())
where s is the shop dictionary for each location.
Putting this into a sorted expression:
dict(sorted(d['data'].items(), key=lambda t:sum(len(i) for i in t[1].values()), reverse=True))
gives:
{
'Manchester': {
'SHOP 1': [{'kittens': 10, 'type': 'fluffy'}, {'puppies': 11, 'type': 'squidgy'}],
'SHOP 2': [{'kittens': 15, 'type': 'fluffy'}, {'puppies': 3, 'type': 'squidgy'}, {'fishes': 132, 'type': 'floaty'}],
'SHOP 3': [{'kittens': 15, 'type': 'fluffy'}, {'puppies': 3, 'type': 'squidgy'}]
},
'London': {
'SHOP 1': [{'kittens': 10, 'type': 'fluffy'}, {'puppies': 11, 'type': 'squidgy'}],
'SHOP 2': [{'kittens': 15, 'type': 'fluffy'}, {'puppies': 3, 'type': 'squidgy'}, {'fishes': 132, 'type': 'floaty'}]
},
'Edinburgh': {
'SHOP 1': [{'kittens': 10, 'type': 'fluffy'}, {'puppies': 11, 'type': 'squidgy'}],
'SHOP 2': [{'kittens': 15, 'type': 'fluffy'}], 'SHOP 3': [{'puppies': 3, 'type': 'squidgy'}]
}
}
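The key function can be pulled out and named, which also makes it easy to add a tie-breaker. In this sketch (trimmed data where only the list lengths matter, and a hypothetical `total_items` helper), locations are sorted by descending item count and, as an assumed extra rule not in the question, alphabetically when counts are equal:

```python
def total_items(shops):
    """Count the item dictionaries across every shop list of one location."""
    return sum(len(items) for items in shops.values())

# Trimmed stand-in for d["data"]: only list lengths matter for the sort
data = {
    "London": {"SHOP 1": [{}, {}], "SHOP 2": [{}, {}, {}]},
    "Manchester": {"SHOP 1": [{}, {}], "SHOP 2": [{}, {}, {}], "SHOP 3": [{}, {}]},
    "Edinburgh": {"SHOP 1": [{}, {}], "SHOP 2": [{}], "SHOP 3": [{}]},
}

# Negate the count for descending order; the city name breaks ties alphabetically
ordered = dict(sorted(data.items(), key=lambda kv: (-total_items(kv[1]), kv[0])))
print(list(ordered))  # ['Manchester', 'London', 'Edinburgh']
```

Negating the count instead of passing `reverse=True` keeps the secondary key ascending.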
No need to make things complex:
adict = adict['data']
result = []
for capital, value in adict.items():
shop_count = len(value)
items = sum([len(obj) for obj in value.values()])
result.append((capital, shop_count, items))
for capital, shop_count, items in sorted(result, key=lambda x: x[2], reverse=True):
print(f'{capital} {shop_count} shops, {items} item dictionaries total')
Output:
Manchester 3 shops, 7 item dictionaries total
London 2 shops, 5 item dictionaries total
Edinburgh 3 shops, 4 item dictionaries total

Change one JSON format to another

I'm new to programming. I want to change the following JSON format by removing the "content" key, as shown in the example below.
[{
"content": "abc",
'entities': [
[44, 55, "SEN"],
[27, 31, "FIN"]
]
}, {
"content": "xyz",
'entities': [
[8, 17, "FIN"]
]
}, {
"content": "klm",
'entities': [
[18, 26, "FIN"]
]
}]
to
[
('abc', {
'entities': [(44, 55, "SEN"), (27, 31, "FIN")]
}),
('xyz', {
'entities': [(8, 17, "FIN")]
}),
('klm', {
'entities': [(18, 26, "FIN")]
})
]
Please help.
Thanks
>>> data = [{
... "content": "abc",
... 'entities': [
... [44, 55, "SEN"],
... [27, 31, "FIN"]
... ]
... }, {
... "content": "xyz",
... 'entities': [
... [8, 17, "FIN"]
... ]
... }, {
... "content": "klm",
... 'entities': [
... [18, 26, "FIN"]
... ]
... }]
>>> [(dct["content"], {"entities": list(map(tuple, dct["entities"]))}) for dct in data]
[('abc', {'entities': [(44, 55, 'SEN'), (27, 31, 'FIN')]}), ('xyz', {'entities': [(8, 17, 'FIN')]}), ('klm', {'entities': [(18, 26, 'FIN')]})]
>>>
In a more readable format:
[
# 2. build a tuple...
(
# 3. whose first element is `content`
dct["content"],
# 4. and the second - a dictionary with one element
{
# 5. which is a list of entities that are converted to `tuple`
"entities": list(map(tuple, dct["entities"]))
}
)
# 1. For each dictionary...
for dct in data
]
You can use a list comprehension:
lst = [{
"content": "abc",
'entities': [
[44, 55, "SEN"],
[27, 31, "FIN"]
]
}, {
"content": "xyz",
'entities': [
[8, 17, "FIN"]
]
}, {
"content": "klm",
'entities': [
[18, 26, "FIN"]
]
}]
output = [(elt["content"], {"entities": [tuple(e) for e in elt["entities"]]}) for elt in lst]
print(output)
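Both answers start from a Python list. If the data really arrives as JSON text (a file or an API response, which would use double quotes throughout), parse it with `json.loads` first. JSON has no tuple type, so the conversion to tuples still has to happen afterwards. A minimal sketch, where `raw` is an assumed JSON string built from the question's sample:

```python
import json

# Assumed input: the question's data as a real JSON string (double quotes only)
raw = '''[
  {"content": "abc", "entities": [[44, 55, "SEN"], [27, 31, "FIN"]]},
  {"content": "xyz", "entities": [[8, 17, "FIN"]]},
  {"content": "klm", "entities": [[18, 26, "FIN"]]}
]'''

data = json.loads(raw)  # JSON arrays become Python lists, never tuples
output = [(d["content"], {"entities": [tuple(e) for e in d["entities"]]}) for d in data]
print(output)
```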

How to combine dups from dictionary with Python [duplicate]

I have this list of dictionaries:
"ingredients": [
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
I want to be able to find duplicate ingredients (by either name or id). If there are duplicates with the same unit_of_measurement, combine them into one dictionary and add up the quantities. So the above data should return:
[
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
How do I go about it?
Assuming you have a dictionary represented like this:
data = {
"ingredients": [
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
}
What you could do is use a collections.defaultdict of lists to group the ingredients by a (name, id) grouping key:
from collections import defaultdict
ingredient_groups = defaultdict(list)
for ingredient in data["ingredients"]:
key = tuple(ingredient["ingredient"].items())
ingredient_groups[key].append(ingredient)
Then you could go through the grouped values of this defaultdict and calculate the sum of the fractional quantities using fractions.Fraction. For unit_of_measurement and ingredient, we can probably just use the values from the first item in each group.
from fractions import Fraction
result = [
{
"unit_of_measurement": value[0]["unit_of_measurement"],
"quantity": str(sum(Fraction(ingredient["quantity"]) for ingredient in value)),
"ingredient": value[0]["ingredient"],
}
for value in ingredient_groups.values()
]
Which will then give you this result:
[{'ingredient': {'id': 12, 'name': 'Balsamic Vinegar'},
'quantity': '1',
'unit_of_measurement': {'id': 13, 'name': 'Pound (Lb)'}},
{'ingredient': {'id': 14, 'name': 'Basil Leaves'},
'quantity': '3',
'unit_of_measurement': {'id': 15, 'name': 'Tablespoon'}}]
You'll probably need to amend the above to account for ingredients with different units or measurements, but this should get you started.
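One way to handle different units, as the question requires, is to put the unit into the grouping key so that only entries with the same ingredient *and* the same unit_of_measurement are merged. This sketch keys on the `id` fields of both (an assumption; names would work as well) and adds a hypothetical fourth entry to show that a different unit stays separate. `Fraction` accepts strings such as "1/2" and "3", but not mixed numbers like "1 1/2", which would need extra parsing.

```python
from collections import defaultdict
from fractions import Fraction

def combine_ingredients(ingredients):
    """Merge entries sharing ingredient id and unit id, summing their quantities."""
    groups = defaultdict(list)
    for item in ingredients:
        key = (item["ingredient"]["id"], item["unit_of_measurement"]["id"])
        groups[key].append(item)
    return [
        {
            "unit_of_measurement": items[0]["unit_of_measurement"],
            "quantity": str(sum(Fraction(i["quantity"]) for i in items)),
            "ingredient": items[0]["ingredient"],
        }
        for items in groups.values()
    ]

vinegar = {"name": "Balsamic Vinegar", "id": 12}
pound = {"name": "Pound (Lb)", "id": 13}
tablespoon = {"name": "Tablespoon", "id": 15}
ingredients = [
    {"unit_of_measurement": pound, "quantity": "1/2", "ingredient": vinegar},
    {"unit_of_measurement": pound, "quantity": "1/2", "ingredient": vinegar},
    {"unit_of_measurement": tablespoon, "quantity": "3", "ingredient": {"name": "Basil Leaves", "id": 14}},
    # Hypothetical extra entry: same ingredient, different unit, so it is not merged
    {"unit_of_measurement": tablespoon, "quantity": "2", "ingredient": vinegar},
]
for row in combine_ingredients(ingredients):
    print(row["ingredient"]["name"], row["quantity"], row["unit_of_measurement"]["name"])
```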

MongoDB | Update rows record by record on basis of one field

I want to update the documents in a MongoDB collection from Python with the min/max/avg temperature over a time range.
In the example below, suppose the time range "20:09-20:15" is given: then the last document is not updated, but all the others are.
Sample Data:
[
{'date': "1-10-2020", 'time': "20:09", 'temperature': 20}, //1
{'date': "1-10-2020", 'time': "20:11", 'temperature': 19}, //2
{'date': "1-10-2020", 'time': "20:15", 'temperature': 18}, //3
{'date': "1-10-2020", 'time': "20:18", 'temperature': 18} //4
]
Required output:
[
{'date': "1-10-2020", 'time': "20:09", 'temperature': 20, 'MIN': 20, 'MAX': 20, 'AVG': 20}, //1
{'date': "1-10-2020", 'time': "20:11", 'temperature': 19, 'MIN': 19, 'MAX': 20, 'AVG': 19.5}, //2
{'date': "1-10-2020", 'time': "20:15", 'temperature': 18, 'MIN': 18, 'MAX': 20, 'AVG': 19}, //3
{'date': "1-10-2020", 'time': "20:18", 'temperature': 18} //4
]
If you're using Mongo version 4.4+, you can use $merge to do this in a single pipeline:
db.collection.aggregate([
{
$match: {
time: {
$gte: "20:09",
$lte: "20:15"
}
}
},
{
$group: {
_id: null,
avg: {
$avg: "$temperature"
},
min: {
$min: "$temperature"
},
max: {
$max: "$temperature"
},
root: {
$push: "$$ROOT"
}
}
},
{
$unwind: "$root"
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
"$root",
{
"MIN": "$min",
"MAX": "$max",
"AVG": "$avg"
}
]
}
}
},
{
$merge: {
into: "collection",
on: "_id",
whenMatched: "replace"
}
}
])
If you're on an older Mongo version, you have to split this into two calls: first use the same $group stage to fetch the results, then use those values to update the documents. (I'll write this one in Python, since you've tagged pymongo.)
results = list(collection.aggregate([
{
"$match": {
"time": {
"$gte": "20:09",
"$lte": "20:15"
}
}
},
{
"$group": {
"_id": None,
"avg": {
"$avg": "$temperature"
},
"min": {
"$min": "$temperature"
},
"max": {
"$max": "$temperature"
},
"root": {
"$push": "$$ROOT"
}
}
}
]))
collection.update_many(
{
"time": {
"$gte": "20:09",
"$lte": "20:15"
}
},
{
"$set": {
"MAX": results[0]["max"],
"MIN": results[0]["min"],
"AVG": results[0]["avg"],
}
}
)
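Both versions above write the same overall MIN/MAX/AVG (18, 20, 19) to every matched document, whereas the required output shows running values (document 2 gets MIN 19, MAX 20, AVG 19.5). If running values are what you need, one option is to compute them in Python and write them back per document. A minimal sketch, assuming the documents in the range each have an `_id` and can be ordered by their zero-padded "HH:MM" `time` strings; the helper name is hypothetical:

```python
def running_stats_updates(docs):
    """Return (filter, update) pairs with cumulative MIN/MAX/AVG of temperature."""
    updates = []
    total = 0
    lo = hi = None
    # Zero-padded "HH:MM" strings sort chronologically
    for count, doc in enumerate(sorted(docs, key=lambda d: d["time"]), start=1):
        temp = doc["temperature"]
        total += temp
        lo = temp if lo is None else min(lo, temp)
        hi = temp if hi is None else max(hi, temp)
        updates.append(({"_id": doc["_id"]}, {"$set": {"MIN": lo, "MAX": hi, "AVG": total / count}}))
    return updates

# Documents already filtered to "20:09"-"20:15"; integer _id values stand in for ObjectIds
docs = [
    {"_id": 1, "date": "1-10-2020", "time": "20:09", "temperature": 20},
    {"_id": 2, "date": "1-10-2020", "time": "20:11", "temperature": 19},
    {"_id": 3, "date": "1-10-2020", "time": "20:15", "temperature": 18},
]
for flt, upd in running_stats_updates(docs):
    print(flt, upd)
```

With PyMongo, the pairs could then be sent in one round trip, e.g. `collection.bulk_write([UpdateOne(f, u) for f, u in running_stats_updates(docs)])` with `UpdateOne` imported from `pymongo`, where `docs` comes from a find query using the same `$gte`/`$lte` filter.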
