My documents in a collection look like this:
{'_id' : 'Delhi1', 'loc' : [28.34242,77.656565] }
{'_id' : 'Delhi2', 'loc' : [27.34242,78.626523] }
{'_id' : 'Delhi3', 'loc' : [25.34242,77.612345] }
{'_id' : 'Delhi4', 'loc' : [28.34242,77.676565] }
I want to run an aggregation using pymongo to find the relevant documents based on an input lat/long pair. I have created the index on 'loc'. Here is what I have done so far:
pipeline = [{'$geoNear':{'near': [27.8787, 78.2342],
'distanceField': "distance",
'maxDistance' : 2000 }}]
db['mycollection'].aggregate(pipeline)
But this is not working for me. How do I use this correctly?
Actually, I had created a '2dsphere' index on the collection, and to use geoNear with a 2dsphere index we need to specify 'spherical': True in the pipeline:
pipeline = [{'$geoNear':{'near': [27.8787, 78.2342],
'distanceField': "distance",
'maxDistance' : 2000,
'spherical' : True }}]
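For completeness, a minimal end-to-end sketch (assuming a local MongoDB instance and the collection above; note that MongoDB expects coordinates in [longitude, latitude] order, so check how 'loc' was stored):
from pymongo import MongoClient, GEOSPHERE

client = MongoClient()  # local instance assumed
coll = client['mydb']['mycollection']
coll.create_index([('loc', GEOSPHERE)])  # the 2dsphere index

pipeline = [{
    '$geoNear': {
        # a GeoJSON point; with GeoJSON, maxDistance is in meters
        'near': {'type': 'Point', 'coordinates': [27.8787, 78.2342]},
        'distanceField': 'distance',
        'maxDistance': 2000,
        'spherical': True,
    }
}]
for doc in coll.aggregate(pipeline):
    print(doc['_id'], doc['distance'])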
It looks like you have a few formatting mistakes (in mongo shell syntax): 1) neither the collection nor the operators need quotes or brackets, 2) boolean literals are lowercase.
db.mycollection.aggregate([
{
$geoNear: {
near: { type: "Point", coordinates: [ 27.8787 , 78.2342 ] },
distanceField: "distance",
maxDistance: 2000,
spherical: true
}
}
])
I have a dict that I need to convert to a pandas DataFrame. The dict contains arrays; if the arrays are of the same length it works fine, but arrays of different lengths throw a ValueError. My second question is that I need to access only a few key/value pairs from the dict.
This case works; as expected, I get two rows:
my_dict = {
"ColA" : "No",
"ColB" : [
{
"ColB_a" : "2011-10-26T00:00:00Z",
"ColB_b" : 8.3
},
{
"ColB_a" : "2013-10-26T00:00:00Z",
"ColB_b" : 5.3
}
],
"ColC" : "Graduate",
"ColD" : [
{
"ColD_a" : 5436.0,
"ColD_b" : "RD"
},
{
"ColD_a" : 4658.0,
"ColD_b" : "DV"
}
],
"ColE" : "Work"
}
sa = pd.DataFrame(my_dict)
In this case, ColB has only one value:
my_dict = {
"ColA" : "No",
"ColB" : [
{
"ColB_a" : "2011-10-26T00:00:00Z",
"ColB_b" : 8.3
}
],
"ColC" : "Graduate",
"ColD" : [
{
"ColD_a" : 5436.0,
"ColD_b" : "RD"
},
{
"ColD_a" : 4658.0,
"ColD_b" : "DV"
}
],
"ColE" : "Work"
}
sa = pd.DataFrame(my_dict)
So this throws ValueError: arrays must all be same length. How can this be fixed?
The expected output is the same two-row layout, with a missing value where ColB has no second entry.
I can do
sa = pd.DataFrame.from_dict(my_dict, orient='index').transpose()
But then I have to melt and join again.
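Another workaround (a sketch): pad the dict by hand first, broadcasting scalars and padding the shorter lists with None, so every column ends up the same length:
# longest column determines the row count
n = max(len(v) if isinstance(v, list) else 1 for v in my_dict.values())
padded = {
    k: (v + [None] * (n - len(v)) if isinstance(v, list) else [v] * n)
    for k, v in my_dict.items()
}
sa = pd.DataFrame(padded)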
Second question: if I need to choose only ColA and ColB from the dict to create the DataFrame, how should this be done?
For your second question, you can select a couple of columns from your dictionary using the 'columns' parameter. For example:
sa = pd.DataFrame(my_dict, columns = ['ColA', 'ColD'])
I have a collection with documents like this:
{
"_id" : "1234567890",
"area" : "Zone 63",
"last_state" : "Cloudy",
"recent_indices" : [
21,
18,
33,
...
38,
41
],
"Report_stats" : [
{
"date_hour" : "2017-01-01 01",
"count" : 31
},
{
"date_hour" : "2017-01-01 02",
"count" : 20
},
...
{
"date_hour" : "2018-08-26 13",
"count" : 3
}
]
}
which is supposed to be updated based on some online real-time reports.
Assume each report looks like this:
{
'datetime' : '2018-08-26 13:48:11.677635',
'areas' : 'Zone 3; Zone 45; Zone 63',
'status' : 'Clear',
'index' : '33'
}
Now I have to update the collection in such a way that:
Each time a new 'area' (say Zone 1025) shows up in a report, a new document is added to hold the related data
The new 'index' is appended to the "recent_indices" list, while "last_state" is updated to the report's 'status'
Depending on the 'datetime' (at hour resolution), either the matching "Report_stats" entry has its "count" incremented by 1, or a new "Report_stats" entry with 'count' set to 1 is inserted
The way to do each of these updates separately is fairly obvious; the problem is: how can I do all of these simultaneously in a single update/upsert operation?
I tried to use update_one and find_one_and_update (as well as update and find_and_modify) with PyMongo, but I was not able (for me at least) to solve the problem with them.
So I started to wonder if there possibly is a simple/single task to do so, or I should start trying to fix it in a different way altogether.
Can you please help me how to do this or (since there is a lot of data being gathered and therefore should be processed) suggest a low-cost alternative?
Thank you!
I am unsure if I understand your question, but if your problem revolves around upsert, i.e. update the record or add it if it is not there, you can do it by adding one parameter like this:
update_one({'_id': 1}, {'$set': {}}, upsert=True)
If you want to update multiple fields, you can simply set them all in one update document, for example:

update_one(
    {'name': 'Kanika', 'age': 19},           # filter
    {'$set': {'name': 'Andy', 'age': 30}},   # set document
    upsert=True
)
Please look into https://docs.mongodb.com/manual/reference/method/db.collection.update/ to see if it helps.
Thanks, Kanika
The best solution I have reached so far, is this:
if mycollection.find_one({'area': 'zone 45', 'Report_stats.date_hour': '2018-08-26 13'}):
    mycollection.update_one(
        {'area': 'zone 45', 'Report_stats.date_hour': '2018-08-26 13'},
        {
            '$inc': {'Report_stats.$.count': 1},
            '$set': {'last_state': 'Clear'},
            '$push': {'recent_indices': 33},
        },
    )
else:
    mycollection.update_one(
        {'area': 'zone 45'},
        {
            '$set': {'last_state': 'Clear'},
            '$push': {
                'recent_indices': 33,
                'Report_stats': {'date_hour': '2018-08-26 13', 'count': 1},
            },
        },
        upsert=True,
    )
However, it still performs two operations (a find plus an update) to update one document for one request, which is not quite satisfactory.
Any better suggestions?
What I figured out from your reply above is: if Report_stats.date_hour exists in your document, then you increment the counter, or else you just push a new document.
I believe we can do it using $cond or $switch. Can you please take a look?
https://docs.mongodb.com/manual/reference/operator/aggregation/cond/#exp._S_cond
Meanwhile, I am trying to write the whole query for you; let's see if it works.
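Something along these lines might work — an untested sketch that needs MongoDB 4.2+ (pipeline-style updates), with the values hardcoded from your example report:
mycollection.update_one(
    {'area': 'Zone 63'},
    [
        {'$set': {
            'last_state': 'Clear',
            # append the new index, creating the array on first insert
            'recent_indices': {'$concatArrays': [
                {'$ifNull': ['$recent_indices', []]}, [33]
            ]},
            'Report_stats': {'$cond': [
                # does an entry for this hour already exist?
                {'$in': ['2018-08-26 13',
                         {'$ifNull': ['$Report_stats.date_hour', []]}]},
                # yes: increment that entry's count
                {'$map': {
                    'input': '$Report_stats',
                    'as': 's',
                    'in': {'$cond': [
                        {'$eq': ['$$s.date_hour', '2018-08-26 13']},
                        {'date_hour': '$$s.date_hour',
                         'count': {'$add': ['$$s.count', 1]}},
                        '$$s',
                    ]},
                }},
                # no: append a fresh entry with count 1
                {'$concatArrays': [
                    {'$ifNull': ['$Report_stats', []]},
                    [{'date_hour': '2018-08-26 13', 'count': 1}],
                ]},
            ]},
        }},
    ],
    upsert=True,
)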
Thanks, Kanika
I have a JSON-array from a mongoexport containing data from the Beddit sleeptracker. Below is an example of one of the truncated documents (removed some unneeded detail).
{
"user" : "xxx",
"provider" : "beddit",
"date" : ISODate("2016-11-30T23:00:00.000Z"),
"data" : [
{
"end_timestamp" : 1480570804.26226,
"properties" : {
"sleep_efficiency" : 0.8772404,
"resting_heart_rate" : 67.67578,
"short_term_resting_heart_rate" : 61.36963,
"activity_index" : 50.51958,
"average_respiration_rate" : 16.25667,
"total_sleep_score" : 64,
},
"date" : "2016-12-01",
"session_range_start" : 1480545636.55059,
"start_timestamp" : 1480545636.55059,
"session_range_end" : 1480570804.26226,
"tags" : [
"not_enough_sleep",
"long_sleep_latency"
],
"updated" : 1480570805.25201
}
],
"__v" : 0
}
Several related questions like this and this do not seem to work for the data structure above. As recommended in other related questions I am trying to stay away from looping over each row for performance reasons (the full dataset is ~150MB). How would I flatten out the "data"-key with json_normalize so that each key is at the top-level? I would prefer one DataFrame where e.g. total_sleep_score is a column.
Any help is much appreciated! Even though I know how to 'prepare' the data using JavaScript, I would like to be able to understand and do it using Python.
edit (request from comment to show preferred structure):
{
"user" : "xxx",
"provider" : "beddit",
"date" : ISODate("2016-11-30T23:00:00.000Z"),
"end_timestamp" : 1480570804.26226,
"properties.sleep_efficiency" : 0.8772404,
"properties.resting_heart_rate" : 67.67578,
"properties.short_term_resting_heart_rate" : 61.36963,
"properties.activity_index" : 50.51958,
"properties.average_respiration_rate" : 16.25667,
"properties.total_sleep_score" : 64,
"date" : "2016-12-01",
"session_range_start" : 1480545636.55059,
"start_timestamp" : 1480545636.55059,
"session_range_end" : 1480570804.26226,
"updated" : 1480570805.25201,
"__v" : 0
}
The 'properties.' prefix is not necessary but would be nice.
Try this algorithm for flattening:
def flattenPattern(pattern):
    newPattern = {}
    # lists are collapsed to their first element ('data' has one entry;
    # lists of plain strings such as 'tags' are dropped)
    if type(pattern) is list:
        pattern = pattern[0]
    if type(pattern) is not str:
        for key, value in pattern.items():
            if type(value) in (list, dict):
                returnedData = flattenPattern(value)
                for i, j in returnedData.items():
                    # keys under 'data' are promoted to the top level,
                    # everything else gets a dotted prefix
                    if key == "data":
                        newPattern[i] = j
                    else:
                        newPattern[key + "." + i] = j
            else:
                newPattern[key] = value
    return newPattern

print(flattenPattern(dictFromJson))
Output:
{
'session_range_start':1480545636.55059,
'start_timestamp':1480545636.55059,
'properties.average_respiration_rate':16.25667,
'session_range_end':1480570804.26226,
'properties.resting_heart_rate':67.67578,
'properties.short_term_resting_heart_rate':61.36963,
'updated':1480570805.25201,
'properties.total_sleep_score':64,
'properties.activity_index':50.51958,
'__v':0,
'user':'xxx',
'provider':'beddit',
'date':'2016-12-01',
'properties.sleep_efficiency':0.8772404,
'end_timestamp':1480570804.26226
}
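To get the DataFrame the question asks for, the flattened dicts can then be handed to pandas — a sketch, assuming docs is the parsed list of exported documents:
import pandas as pd

df = pd.DataFrame([flattenPattern(doc) for doc in docs])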
Although not explicitly what I asked for, the following worked for me so far:
Step 1
Normalize the data records using json_normalize on the original dataset (not inside a Pandas DataFrame) and prefix the records with 'data.':
beddit_data = pd.io.json.json_normalize(beddit, record_path='data', record_prefix='data.', meta='_id')
Step 2
The 'data.properties' column was a Series of dicts, so these can be expanded with .apply(pd.Series):
beddit_data_properties = beddit_data['data.properties'].apply(pd.Series)
Step 3
The final step is to merge both DataFrames. In step 1, I kept meta='_id' so that the DataFrame can be merged with the original DataFrame from Beddit. I haven't included that in the final step yet because I can still work with the results so far.
beddit_final = pd.concat([beddit_data_properties[:], beddit_data[:]], axis=1)
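On newer pandas versions (>= 1.0) the three steps can probably be collapsed into a single call — a sketch, assuming beddit is the parsed list of exported documents; json_normalize flattens the nested 'properties' dict of each record into 'properties.*' columns on its own:
import pandas as pd

beddit_final = pd.json_normalize(
    beddit,
    record_path='data',                # one row per entry in 'data'
    meta=['_id', 'user', 'provider'],  # keep top-level fields for merging
)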
If anyone is interested, I can share the final Jupyter Notebook when it is ready :)
My problem is a bit atypical. My Mongo instance records appear as follows:
{
"_id" : ObjectId("559670400084d37ea4cafa29"),
"('7412791816', '3838144', '723031613')" : {
"Customer_Loc_PinCode" : "110035",
"Net_Delivery_Time" : 3,
"Manifest_Date" : ISODate("2015-04-04T00:00:00Z"),
"Shipping_Date" : ISODate("2015-04-05T00:00:00Z"),
"Shipping_Method_Code" : "COD",
"Origin_PinCode" : "382470",
"Net_Manifest_Time" : 0,
"Transition_State" : [
[
"DNE",
"CTD",
"NULL",
"2015-04-05 15:23:22",
"NULL"
],
...# Many more such tuples present within this list.
],
"Net_Shipping_Time" : 2,
"RTD_Date" : "NULL",
"Delivery_Date" : ISODate("2015-04-07T00:00:00Z"),
"Intervening_Distance" : 522.3881079330106,
"Awb_Number" : "723031613",
"SubOrder_Number" : "7412791816",
"Last_Status" : "SHP",
"Customer_LatLong" : [
-,#Some float value
-#Some float value
],
"Order_Date" : ISODate("2015-04-04T00:00:00Z"),
"RTA_Date" : "NULL",
"Return_Direction" : 0,
"New_Status" : "DEL",
"Origin_LatLong" : [
-,#Some float value
-
],
"Rec_ID" : "3838144",
"RTU_Date" : "NULL"
}}
Now I need to obtain the dates and Net_Delivery_Time (as an example here) for all the records, for further processing (plotting).
However, the major obstacle is that each such dictionary is referenced by a composite key, i.e. a tuple consisting of 3 fields. Each such key uniquely identifies the associated record. I wish to extract the required fields from each such dictionary, but I have no means of iterating through all the keys.
I tried an approach of first collecting all the keys and then retrieving the concerned fields, but that method didn't work, as there is no associated support for that in PyMongo.
If I were to use the db.'collection_name'.find() method, how would I craft the query? Can the uniqueness of each key present any potential problems? And what approach should I employ to achieve this task?
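For illustration, the kind of extraction I mean would look like this client-side (a sketch; the collection and field names follow the record above), though I would prefer a single server-side query:
dates, delivery_times = [], []
for doc in db['collection_name'].find():
    for key, record in doc.items():
        if key == '_id':
            continue  # every other key is one composite-key record
        dates.append(record['Order_Date'])
        delivery_times.append(record['Net_Delivery_Time'])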
Thank You
I have a parameter dictionary as below -
paramDict = {
"DataFilter": {
"tableField": [{
"table":"GL_LEDGERS",
"field":"NAME"
}],
"value" : ["ABC."]
}
}
Now I want to use a "like" instead of the "isin" condition, so that the data gets filtered for "ABC" as well as "ABC.":
DataFilter = df['NAME'].isin(
pd.Series(paramDict['DataFilter']['value']))
df = df[DataFilter]
Can you please help me with this? I am using Python 2.7. Thanks.
I assume your Series is of string type.
If so, you can use .str.contains:
DataFilter = df['NAME'].str.contains('ABC')
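Note that .str.contains treats the pattern as a regular expression by default. To drive the filter from paramDict and match the bare prefix literally, something like this sketch might work:
# drop the trailing dot so both 'ABC' and 'ABC.' match, and disable regex
prefix = paramDict['DataFilter']['value'][0].rstrip('.')  # 'ABC'
DataFilter = df['NAME'].str.contains(prefix, regex=False)
df = df[DataFilter]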