pymongo date field converted to unknown date format - python

I am getting data from one collection using python and I will be processing it and storing it in another collection. In the processed collection some of the date fields looks different like Date(-61833715200000).
I use below code to get data and processing it and then I bulk insert the values to new collection.
fleet_managers = taximongo.users.aggregate([{ "$match": { "role" : "fleet_manager"}}])
fleet_managers = pd.DataFrame(list(fleet_managers))
fleet_managers['city_id'] = fleet_managers['region_id'].map({'57ff2e84f39e0f0444000004':'Chennai','57ff2e08f39e0f0444000003':'Hyderabad'})
pros_fleet_managers.insert_many(fleet_managers.to_dict('records'))
The collection looks like this:
{
"_id" : ObjectId("58006678ee5e0e29c5000009"),
"deleted_at" : NaN,
"region_id" : "57ff2e84f39e0f0444000004",
"reset_password_sent_at" : Date(-61833715200000),
"current_sign_in_at" : ISODate("2016-10-14T06:07:55.568Z"),
"last_sign_in_at" : ISODate("2016-10-14T06:07:45.574Z"),
"remember_created_at" : Date(-61833715200000)
}
What did do wrong here. Thanks already.
I have found the solution by using the $ifNull while projecting the fields.
fleet_managers = taximongo.users.aggregate([{ "$match": { "role" : "fleet_manager"}},{"$project":{'_id':1,'deleted_at':{ "$ifNull": [ "$deleted_at", "null" ] },
'reset_password_sent_at':{ "$ifNull": [ "$reset_password_sent_at", "null" ] }, 'region_id':1,'current_sign_in_at':1,'last_sign_in_at':1,'remember_created_at':{ "$ifNull": [ "$remember_created_at", "null" ] }}}])
fleet_managers = pd.DataFrame(list(fleet_managers))
fleet_managers['city_id'] = fleet_managers['region_id'].map({'57ff2e84f39e0f0444000004':'Chennai','57ff2e08f39e0f0444000003':'Hyderabad'})
pros_fleet_managers.insert_many(fleet_managers.to_dict('records'))
The above code gives the solution but I need to handle the null or non existence dynamically i.e., when fetching it from the source collection.
Help me out on this.

Related

MongoDB - Querying a nested boolean field

I have the following mongoclient query:
db = db.getSiblingDB("test-db");
hosts = db.getCollection("test-collection")
db.hosts.aggregate([
{$match: {"ip_str": {$in: ["52.217.105.116"]}}}
]);
Which outputs this:
{
"_id" : ObjectId("..."),
"ip_str" : "52.217.105.116",
"data" : [
{"ssl" : {"cert" : {"expired" : "False"}}}
]
}
I'm trying to build the query so it returns a boolean True or False depending on the value of the ssl.cert.expired field. I'm not quite sure how to do this though. I've had a look into the $lookup and $where operators, but am not overly familiar with querying nested objects in Mongo yet.
As the data is an array, in order to get the (first) element of the nested expired, you should work with $arrayElemAt and provide an index as 0 to indicate the first element.
{
$project: {
ip_str: 1,
expired: {
$arrayElemAt: [
"$data.ssl.cert.expired",
0
]
}
}
}
Demo # Mongo Playgound

Obtaining Distinct values from mongodb as list of dictionaries using pymongo

I have a mongodb collection data in the following format
[{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 11:20", "ml_pred":"Invalid","hum_pred":"valid"},
{"name":"axe2","base-url":"www.example2.com","date":"2022-06-22 12:20", "ml_pred":"Valid","hum_pred":"null"},
{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 22:20", "ml_pred":"Invalid","hum_pred":"valid"},
{"name":"axe3","base-url":"www.example3.com","date":"2022-06-22 02:20", "ml_pred":"Valid","hum_pred":"null"},
{"name":"axe2","base-url":"www.example2.com","date":"2022-06-22 06:20", "ml_pred":"Invalid","hum_pred":"valid"},
{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 14:20", "ml_pred":"Invalid","hum_pred":"null"},
{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 10:20", "ml_pred":"Invalid","hum_pred":"invalid"},
{"name":"axe1","base-url":"www.example1.com","date":"2022-06-22 01:20", "ml_pred":"Invalid","hum_pred":"null"}]
I am trying to get unique base-url and name as a response. For that I use pymongo distinct like below
filter_stuff = {'base-url': 1, 'name':1,'_id': 0}
data = list(crawlcol.find({},filter_stuff).distinct("base-url"))
which returned me a list of base urls. But I am expecting an output like
[{"name":"axe1","base-url":"www.example1.com"},
{"name":"axe2","base-url":"www.example2.com"},
{"name":"axe3","base-url":"www.example3.com"}]
How this can be obtained
This will give the result as required
result = list(crawlcol.aggregate(
[
{"$group": { "_id": { "base-url": "$base-url", "name": "$name" } } }
]
))

How to query PointField - MongoEngine

I was trying to update PointField in my flask app with upsert_one. But it always inserts new document. I know the problem is with the query which I'm passing.
Below is my model.
class Location(db.Document):
location_name = db.StringField(required=True)
geoCoords = db.PointField()
And the update query.
Location.objects(geoCoords=loc["geoCoords"]).upsert_one(location_name=loc["location_name"], geoCoords=loc["geoCoords"])
#loc["geoCoords"] = [77.6309395,12.9539974]
I also tried running get. But I'm getting the error message "Location matching query does not exist." for the below query.
loc = Location.objects(geoCoords=[77.6309395,12.9539974]).get()
I have following entries in my location collection.
> db.location.find()
{ "_id" : ObjectId("59c5019727bae70ad3259e67"), "geoCoords" : { "type" : "Point", "coordinates" : [ 77.6309395, 12.9539974 ] }, "location_name" : "Bengaluru" }
{ "_id" : ObjectId("59c5022d27bae70ad3259ea2"), "geoCoords" : { "type" : "Point", "coordinates" : [ 77.6309395, 12.9539974 ] }, "location_name" : "Bengaluru" }
>
I couldn't find any related information on querying the PointFiled.
To answer to my question. I think there is no way to get the exact points like I have mentioned in the question.
The nearest method works here is to use __near selector. This accepts the range in meters. So, you can give closest range query as per your requirement.
In my case, I gave 100 meters. Which is fine for me.
Example:
Location.objects(geoCoords__near=thelocation["geoCoords"], geoCoords__max_distance=100).upsert_one(location_name=thelocation["location_name"], geoCoords=thelocation["geoCoords"])
Try this:
Location.objects(geoCoords="...").update(location_name=loc["location_name"], geoCoords=loc["geoCoords"])

Remove duplicate values in mongodb

I am learning mongodb using python with tornado.I have a mongodb collection, when I do
db.cal.find()
{
"Pid" : "5652f92761be0b14889d9854",
"Registration" : "TN 56 HD 6766",
"Vid" : "56543ed261be0b0a60a896c9",
"Period" : "10-2015",
"AOs": [
"14-10-2015",
"15-10-2015",
"18-10-2015",
"14-10-2015",
"15-10-2015",
"18-10-2015"
],
"Booked": [
"5-10-2015",
"7-10-2015",
"8-10-2015",
"5-10-2015",
"7-10-2015",
"8-10-2015"
],
"NA": [
"1-10-2015",
"2-10-2015",
"3-10-2015",
"4-10-2015",
"1-10-2015",
"2-10-2015",
"3-10-2015",
"4-10-2015"
],
"AOr": [
"23-10-2015",
"27-10-2015",
"23-10-2015",
"27-10-2015"
]
}
I need an operation to remove the duplicate values from the Booked,NA,AOs,AOr. Finally it should be
{
"Pid" : "5652f92761be0b14889d9854",
"Registration" : "TN 56 HD 6766",
"Vid" : "56543ed261be0b0a60a896c9",
"AOs": [
"14-10-2015",
"15-10-2015",
"18-10-2015",
],
"Booked": [
"5-10-2015",
"7-10-2015",
"8-10-2015",
],
"NA": [
"1-10-2015",
"2-10-2015",
"3-10-2015",
"4-10-2015",
],
"AOr": [
"23-10-2015",
"27-10-2015",
]
}
How do I achieve this in mongodb?
Working solution
I have created a working solution based on JavaScript, which is available on the mongo shell:
var codes = ["AOs", "Booked", "NA", "AOr"]
// Use bulk operations for efficiency
var bulk = db.dupes.initializeUnorderedBulkOp()
db.dupes.find().forEach(
function(doc) {
// Needed to prevent unnecessary operatations
changed = false
codes.forEach(
function(code) {
var values = doc[code]
var uniq = []
for (var i = 0; i < values.length; i++) {
// If the current value can not be found, it is unique
// in the "uniq" array after insertion
if (uniq.indexOf(values[i]) == -1 ){
uniq.push(values[i])
}
}
doc[code] = uniq
if (uniq.length < values.length) {
changed = true
}
}
)
// Update the document only if something was changed
if (changed) {
bulk.find({"_id":doc._id}).updateOne(doc)
}
}
)
// Apply all changes
bulk.execute()
Resulting document with your sample input:
replset:PRIMARY> db.dupes.find().pretty()
{
"_id" : ObjectId("567931aefefcd72d0523777b"),
"Pid" : "5652f92761be0b14889d9854",
"Registration" : "TN 56 HD 6766",
"Vid" : "56543ed261be0b0a60a896c9",
"Period" : "10-2015",
"AOs" : [
"14-10-2015",
"15-10-2015",
"18-10-2015"
],
"Booked" : [
"5-10-2015",
"7-10-2015",
"8-10-2015"
],
"NA" : [
"1-10-2015",
"2-10-2015",
"3-10-2015",
"4-10-2015"
],
"AOr" : [
"23-10-2015",
"27-10-2015"
]
}
Using indices with dropDups
This simply does not work. First, as per version 3.0, this option no longer exists. Since we have 3.2 released, we should find a portable way.
Second, even with dropDups, the documentation clearly states that:
dropDups boolean : MongoDB indexes only the first occurrence of a key and removes all documents from the collection that contain subsequent occurrences of that key.
So if there would be another document which has the same values in one of the billing codes as a previous one, the whole document would be deleted.
You can't use the "dropDups" syntax here first because it has been "deprecated" as of MongoDB 2.6 and removed in MongoDB 3.0 and will not even work.
To remove the duplicate from each list you need to use the set class in python.
import pymongo
fields = ['Booked', 'NA', 'AOs', 'AOr']
client = pymongo.MongoClient()
db = client.test
collection = db.cal
bulk = colllection.initialize_ordered_op()
count = 0
for document in collection.find():
update = dict(zip(fields, [list(set(document[field])) for field in fields]))
bulk.find({'_id': document['_id']}).update_one({'$set': update})
count = count + 1
if count % 200 == 0:
bulk.execute()
bulk = colllection.initialize_ordered_op()
if count > 0:
bulk.execute()
MongoDB 3.2 deprecates Bulk() and its associated methods and provides the .bulkWrite() method. This method is available from Pymongo 3.2 as bulk_write(). The first thing to do using this method is to import the UpdateOne class.
from pymongo import UpdateOne
requests = [] # list of write operations
for document in collection.find():
update = dict(zip(fields, [list(set(document[field])) for field in fields]))
requests.append(UpdateOne({'_id': document['_id']}, {'$set': update}))
collection.bulk_write(requests)
The two queries give the same and expected result:
{'AOr': ['27-10-2015', '23-10-2015'],
'AOs': ['15-10-2015', '14-10-2015', '18-10-2015'],
'Booked': ['7-10-2015', '5-10-2015', '8-10-2015'],
'NA': ['1-10-2015', '4-10-2015', '3-10-2015', '2-10-2015'],
'Period': '10-2015',
'Pid': '5652f92761be0b14889d9854',
'Registration': 'TN 56 HD 6766',
'Vid': '56543ed261be0b0a60a896c9',
'_id': ObjectId('567f808fc6e11b467e59330f')}
have you tried "Distinct()" ?
Link: https://docs.mongodb.org/v3.0/reference/method/db.collection.distinct/
Specify Query with distinct
The following example returns the distinct values for the field sku, embedded in the item field, from the documents whose dept is equal to "A":
db.inventory.distinct( "item.sku", { dept: "A" } )
The method returns the following array of distinct sku values:
[ "111", "333" ]
Assuming that you want to remove duplicate dates from the collection, so you can add a unique index with the dropDups: true option:
db.bill_codes.ensureIndex({"fieldName":1}, {unique: true, dropDups: true})
For more reference:
db.collection.ensureIndex() - MongoDB Manual 3.0
Note: Back up your database first in case it doesn't do exactly as you're expecting.

How to $set sub-sub-array value in MongoDB by pymongo

Data:
{
"_id" : ObjectId("50cda9741d41c81da6000002"),
"template_name" : "common_MH",
"role" : "MH",
"options" : [
{
"sections" : [
{
"tpl_option_name" : "test321",
"tpl_option_type" : "string",
"tpl_default_value" : "test321"
}
],
"tpl_section_name" : "Test"
}
]
}
could I modify tpl_default_value in options.$.section.$.tpl_option_name = 'test321'?
I already try too times, but I can't solve.
please assist me, thanks.
This is a bad schema for doing these kinda of updates, there is a JIRA for multi-level positional operator however it is not yet done: https://jira.mongodb.org/browse/SERVER-831
Ideally you either have to update this client side and then atomically set that section of the array:
$section = {
"tpl_option_name" : "test321",
"tpl_option_type" : "string",
"tpl_default_value" : "test321"
};
db.col.update({}, {$set: {options.$.sections.1: $section}})
Or you need to change your schema. Does the sections really need to be embedded? I noticed that you have a tpl_section_name in the top level but then you are nesting sections within that, it sounds more logical that only one section should be there.
That document would be easier to update.

Categories