Accessing collection field inside python dictionary - python

totalHotelsInTown = hotels.aggregate([
    {"$group": {"_id": "$Town", "TotalRestaurantInTown": {"$sum": 1}}}
])
NumOfHotelsInTown = {}
for item in totalHotelsInTown:
    NumOfHotelsInTown[item['_id']] = item['TotalRestaurantInTown']

results = hotels.aggregate([
    {"$match": {"cuisine": cuisine}},
    {"$group": {"_id": "$town", "HotelsCount": {"$sum": 1}}},
    {"$project": {"HotelsCount": 1,
                  "Percent": {"$multiply": [
                      {"$divide": ["$HotelsCount", NumOfHotelsInTown["$_id"]]}, 100]}}},
    {"$sort": {"Percent": 1}},
    {"$limit": 1}
])
I want to pass the value of the "_id" field as a key to a Python dictionary, but the interpreter is taking "$_id" itself as the key instead of its value, and that causes a KeyError. Any help would be much appreciated. Thanks!
The NumOfHotelsInTown dictionary holds key-value pairs of town and number of hotels.
When I try to retrieve a value from the NumOfHotelsInTown dictionary, I am supplying the key dynamically as "$_id".
The exact error I am getting is:
{"$group": {"_id": "$borough", "HotelsCount": {"$sum": 1} }}, {"$project": {"HotelsCount":1,"Percent": {"$multiply": [{"$divide": ["$HotelsCount", NumOfHotlesInTown["$_id"]]}, 100]}}}, {"$sort": {"Percent": 1}},
KeyError: '$_id'

I see what you're trying to do, but you can't run Python code dynamically inside a MongoDB aggregation: the pipeline is built in Python and evaluated on the server, so NumOfHotelsInTown["$_id"] is looked up once, with the literal string "$_id" as the key, before the query is ever sent.
What you should do instead:
Get the total counts for every borough (which you have already done)
Get the total counts for every borough for a given cuisine (which you have part of)
Use Python, not MongoDB, to compare the two sets of totals and produce the percentages
For example:
group_by_borough = {"$group": {"_id": "$borough", "TotalRestaurantInBorough": {"$sum": 1}}}

# Total restaurants per borough
count_of_restaurants_by_borough = my_collection.aggregate([group_by_borough])
restaurant_count_by_borough = {doc["_id"]: doc["TotalRestaurantInBorough"]
                               for doc in count_of_restaurants_by_borough}

# Restaurants of the given cuisine per borough
count_of_cuisines_by_borough = my_collection.aggregate([{"$match": {"cuisine": cuisine}},
                                                        group_by_borough])
cuisine_count_by_borough = {doc["_id"]: doc["TotalRestaurantInBorough"]
                            for doc in count_of_cuisines_by_borough}

percentages = {}
for borough, count in restaurant_count_by_borough.items():
    percentages[borough] = cuisine_count_by_borough.get(borough, 0) / float(count) * 100

# And if you want it sorted you can use an OrderedDict
from collections import OrderedDict
percentages = OrderedDict(sorted(percentages.items(), key=lambda x: x[1]))
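(An aside, not from the original answer: on Python 3.7+ plain dicts preserve insertion order, so a regular dict works just as well here.)
# Equivalent on Python 3.7+, where dicts keep insertion order
percentages = dict(sorted(percentages.items(), key=lambda x: x[1]))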

Create JSON from specific lines of data

I have test data that I am trying to use to create JSON for just select items. I have the items listed and would like to output JSON containing only the selected keys.
What I have:
import json

# Data to be written
dictionary = {
    "id": "04",
    "name": "sunil",
    "department": "HR"
}

# Serializing json
x = json.dumps(dictionary, indent=0)

# JSON String
y = json.loads(x)

# Goal is to print:
# {
#     "id": "04",
#     "name": "sunil"
# }
If you don't need to keep the department key, you can simply delete it:
del y['department']
Then y contains exactly what you wanted:
{"id": "04", "name": "sunil"}
Other ways to solve the same issue:
# Iterate over a copy of the keys so deleting from y is safe
for key in list(y.keys()):
    # add all keys you want to remain in the final dictionary
    if key == "id" or key == "name":
        continue
    else:
        del y[key]
However, iterating over every key just to delete a few is wasteful. You could instead grab the values you want to keep into temporary variables and then rebuild the dictionary:
temp_id = y['id']
temp_name = y['name']
y.clear()
y['id'] = temp_id
y['name'] = temp_name
This should be faster than iterating over the whole dictionary.
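For what it's worth (not from the original answers), a dict comprehension builds the filtered dictionary in a single pass without mutating y in place:
# Keep only the keys you want; a minimal sketch
wanted = ("id", "name")
y = {k: v for k, v in y.items() if k in wanted}
print(json.dumps(y))  # {"id": "04", "name": "sunil"}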

Create a new dictionary from existing with new keys

I have a dictionary d and I want to rename its keys while creating a new dictionary. What is the best way to do this?
Here's my existing code:
import json

d = json.loads("""{
    "reference": "DEMODEVB02C120001",
    "business_date": "2019-06-18",
    "final_price": 40,
    "products": [
        {
            "quantity": 4,
            "original_price": 10,
            "final_price": 40,
            "id": "123"
        }
    ]
}""")

d2 = {
    'VAR_Reference': d['reference'],
    'VAR_date': d['business_date'],
    'VAR_TotalPrice': d['final_price']
}
Is there a better way to map the values using another mapping dictionary, or a file where the mappings can be kept? For example, something like this:
d3 = {
    'reference': 'VAR_Reference',
    'business_date': 'VAR_date',
    'final_price': 'VAR_TotalPrice'
}
Appreciate any tips or hints.
You can use a dictionary comprehension to iterate over your original dictionary and fetch the new keys from the mapping dictionary:
{d3.get(key): value for key, value in d.items()}
(Note that keys missing from d3, such as 'products', come through with None as their new key.)
You can also iterate over d3 and build the final dictionary, which avoids that problem (thanks @IcedLance for the suggestion):
{value: d.get(key) for key, value in d3.items()}
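As a quick check with the sample data from the question (a sketch; the output is derived from the d defined above):
d2 = {value: d.get(key) for key, value in d3.items()}
print(d2)
# {'VAR_Reference': 'DEMODEVB02C120001', 'VAR_date': '2019-06-18', 'VAR_TotalPrice': 40}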

Merging list of python dictionaries by column value

I have data that is a list of Python dictionaries, each representing a row, and I want to combine several of these into one dictionary.
I need to combine them by a common value in a single column; note that the dictionaries to merge may or may not contain the same columns, and values should be concatenated, not clobbered.
Here is an example (combining dicts by value in column 'a'):
data = [{'a': 0, 'b': 10, 'c': 20},
        {'a': 2, 'd': 30, 'e': 40},
        {'a': 0, 'b': 50, 'c': 60},
        {'a': 1, 'd': 70, 'c': 80},
        {'a': 1, 'b': 90, 'e': 100}]
Desired output is:
new_data = [{'a': 0, 'b': [10, 50], 'c': [20, 60]},
            {'a': 1, 'd': [70], 'c': [80], 'b': [90], 'e': [100]},
            {'a': 2, 'd': [30], 'e': [40]}]
I have a simple function that can accomplish this, but I need a faster method (the data has roughly 1,000,000 rows and 20 columns). My method of finding the dictionaries I want to merge is very expensive.
Here is where I have an issue with computation time:
unique_idx, locations = [], {}
for i, row in enumerate(data):
    _id = row['a']
    if _id not in unique_idx:  # O(n) membership test on a list
        unique_idx.append(_id)
        locations[_id] = [i]
    else:
        locations[_id].append(i)
grouped_data = [[data[i] for i in idxs] for idxs in locations.values()]
I need a faster method to collect dictionaries that contain the same value in one column. Ideally I want a quick method in plain Python, but if this can be done simply with a pandas DataFrame, that is good as well.
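Not from the original thread, but one way to avoid the O(n) list membership test entirely is a single pass with collections.defaultdict, grouping rows as you go; a minimal sketch:

from collections import defaultdict

def merge_rows(data, key='a'):
    # Group rows by `key`, concatenating the values of every other column
    groups = defaultdict(lambda: defaultdict(list))
    for row in data:
        group = groups[row[key]]  # dict lookup is O(1), unlike list membership
        for col, val in row.items():
            if col != key:
                group[col].append(val)
    return [{key: k, **cols} for k, cols in groups.items()]

new_data = merge_rows(data)
# [{'a': 0, 'b': [10, 50], 'c': [20, 60]}, {'a': 2, ...}, {'a': 1, ...}]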

RethinkDB - how to filter arrays in nested objects when updating?

With RethinkDB, how do I update arrays in nested objects so that certain values are filtered out?
Consider the following program. I would like to know how to write an update query that filters the value 2 out of every array contained in the votes sub-object of documents in the 'dinners' table:
import rethinkdb as r
from pprint import pprint

with r.connect(db='mydb') as conn:
    pprint(r.table('dinners').get('xxx').run(conn))
    r.table('dinners').insert({
        'id': 'xxx',
        'votes': {
            '1': [1, 2, ],
        },
    }, conflict='replace').run(conn)
    # How can I update the 'xxx' document so that the value 2 is
    # filtered out from all arrays contained in the 'votes' sub object?
You can use the usual filter method together with object coercion:
def update_dinner(dinner):
    return {
        # Rebuild 'votes' as [key, filtered-array] pairs, then coerce back to an object
        'votes': dinner['votes']
        .keys()
        .map(lambda key: [
            key,
            dinner['votes'][key].filter(lambda vote_val: vote_val.ne(2)),
        ])
        .coerce_to('object'),
    }

r.table('dinners').update(update_dinner).run(conn)

How to count pymongo aggregation cursor without iterating

I want to get the total number of records in an aggregate cursor in pymongo version 3.0+. Is there any way to get the total count without iterating over the cursor?
cursor = db.collection.aggregate([
    {"$match": options},
    {"$group": {"_id": groupby, "count": {"$sum": 1}}}
])
cursorlist = [c for c in cursor]
print(len(cursorlist))
Is there any way to skip the above iteration?
You could add another $group stage where you specify an _id value of None, which accumulates over all the input documents as a whole. That gives you the total count as well as the original grouped counts, albeit pushed into an accumulated array:
>>> pipeline = [
... {"$match": options},
... {"$group": {"_id": groupby, "count": {"$sum":1}}},
... {"$group": {"_id": None, "total": {"$sum": 1}, "details":{"$push":{"groupby": "$_id", "count": "$count"}}}}
... ]
>>> list(db.collection.aggregate(pipeline))
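The second $group collapses everything into a single document, so the total can be read off without iterating the per-group results; a sketch of how you might consume it:

result = list(db.collection.aggregate(pipeline))
if result:
    print(result[0]["total"])    # same count that len(cursorlist) gave above
    print(result[0]["details"])  # the original per-group counts, pushed into an array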
