Date to string conversion in mongodb collection using python - python

I have downloaded some tweets from Twitter and uploaded them in mongodb through Python.
All records have a creation date attribute in this format:
'created_at': 'Fri Mar 12 04:36:10 +0000 2021',
I would like to analyze these tweets by date and hour so I would need to convert this date.
So far, I was able to run this command:
pipe = [ { "$addFields":
{ "create_date":
{ "$dateFromString": {"dateString": "$created_at"} }
}
},
{ "$group":
{
"_id": "$create_date",
"count": {"$sum": 1}
}
},
{"$sort": {'count':-1}},
{"$limit":10}
]
list(tweets.aggregate(pipeline=pipe))
which gives me some aggregated results, but I would like to create three new attributes in my collection: Date, Hour and Minute, in order to have more flexibility in the analysis.
I cannot find a way to add these three new fields to all my records using Python.
Can someone help me?
Thank you in advance,
Francesca

In the end, I managed this in Mongo directly:
db.tweets.find().forEach(
(tweet) => {
db.tweets.update(
{_id: tweet._id},
{$set: {
created_at: new Date(tweet.created_at)
}}
);
}
);
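
Since the question asked for a Python solution, here is a minimal sketch of how the same conversion might look with pymongo, including the Date, Hour and Minute fields. The helper names, the `create_date` field and the assumption that `tweets` is a pymongo Collection are illustrative and not taken from the original post.

```python
from datetime import datetime

# Format of Twitter's created_at strings, e.g. 'Fri Mar 12 04:36:10 +0000 2021'.
# %a and %b depend on the locale; this assumes English day and month names.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def date_parts(created_at: str) -> dict:
    """Build the extra fields to $set for one tweet (field names are illustrative)."""
    parsed = datetime.strptime(created_at, TWITTER_FORMAT)
    return {
        "create_date": parsed,
        "Date": parsed.strftime("%Y-%m-%d"),
        "Hour": parsed.hour,
        "Minute": parsed.minute,
    }


def add_date_parts(tweets) -> None:
    """Hypothetical helper: `tweets` is assumed to be a pymongo Collection."""
    for tweet in tweets.find({}, {"created_at": 1}):
        tweets.update_one(
            {"_id": tweet["_id"]},
            {"$set": date_parts(tweet["created_at"])},
        )


print(date_parts("Fri Mar 12 04:36:10 +0000 2021"))
```

Keeping the parsing in a separate function makes it easy to check the conversion on a few sample strings before touching the collection.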

Related

How to save json data as it is without data type conversion in dynamo db using python

I want to store key-value JSON data in aws DynamoDB where key is a date string in YYYY-mm-dd format and value is entries which is a python dictionary. When I used boto3 client to save data there, it saved it as a data type object, which I don't want. My purpose is simple: Store JSON data against a key which is a date, so that later I will query the data by giving that date. I am struggling with this issue because I did not find any relevant link which says how to store JSON data and retrieve it without any conversion.
I need help to solve it in Python.
What I am doing now:
item = {
"entries": [
{
"path": [
{
"name": "test1",
"count": 1
},
{
"name": "test2",
"count": 2
}
],
"repo": "test3"
}
],
"date": "2022-10-12"
}
dynamodb_client = boto3.resource('dynamodb')
table = dynamodb_client.Table(table_name)
response = table.put_item(Item=item)
What actually saved:
[{"M":{"path":{"L":[{"M":{"name":{"S":"test1"},"count":{"N":"1"}}},{"M":{"name":{"S":"test2"},"count":{"N":"2"}}}]},"repo":{"S":"test3"}}}]
But I want to save exactly the same JSON data as it is, without any conversion at all.
When I retrieve it programmatically, the output differs from what I stored: it is printed with single quotes and the count values come back as Decimal.
response = table.get_item(
Key={
"date": "2022-10-12"
}
)
Output
{'Item': {'entries': [{'path': [{'name': 'test1', 'count': Decimal('1')}, {'name': 'test2', 'count': Decimal('2')}], 'repo': 'test3'}], 'date': '2022-10-12'}}
Sample picture:
Why not store it as a single attribute of type string? Then you’ll get out exactly what you put in, byte for byte.
When you store this in DynamoDB you get exactly what you want/have provided. Key is your date and you have a list of entries.
If you need it to store in a different format you need to provide the JSON which correlates with what you need. It's important to note that DynamoDB is a key-value store not a document store. You should also look up the differences in these.
I figured out how to solve this issue. I have two columns named date and entries in my DynamoDB table (also visible in the screenshot in the question).
I convert the entries value from a list to a string and then save it in the database. At retrieval time I do the reverse: parse the string, build a proper JSON response and return it.
I am also sharing sample code below so that anybody else dealing with the same situation has at least one option.
# While storing:
entries_string = json.dumps([
{
"path": [
{
"name": "test1",
"count": 1
},
{
"name": "test2",
"count": 2
}
],
"repo": "test3"
}
])
item = {
"entries": entries_string,
"date": "2022-10-12"
}
dynamodb_client = boto3.resource('dynamodb')
table = dynamodb_client.Table(<TABLE-NAME>)
table.put_item(Item=item)
-------------------------
# While fetching:
response = table.get_item(
Key={
"date": "2022-10-12"
}
)['Item']
entries_string=response['entries']
entries_dic = json.loads(entries_string)
response['entries'] = entries_dic
print(json.dumps(response))
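
If you would rather keep `entries` as a native DynamoDB list and map, an alternative is to convert the `Decimal` values only when serializing the result. The sketch below is one possible way to do that with the `default` hook of `json.dumps`; the function name `decimal_default` is made up for illustration, and the `item` literal simply mimics the shape of the `get_item` output shown in the question.

```python
import json
from decimal import Decimal


def decimal_default(value):
    """Fallback for json.dumps: turn Decimal into int (if whole) or float."""
    if isinstance(value, Decimal):
        return int(value) if value == value.to_integral_value() else float(value)
    raise TypeError(f"{type(value).__name__} is not JSON serializable")


# Same shape as table.get_item(...)["Item"] in the question.
item = {
    "entries": [
        {
            "path": [
                {"name": "test1", "count": Decimal("1")},
                {"name": "test2", "count": Decimal("2")},
            ],
            "repo": "test3",
        }
    ],
    "date": "2022-10-12",
}

print(json.dumps(item, default=decimal_default))
```

Converting to `float` can lose precision for very large or very precise numbers, so this is only reasonable when the stored values are ordinary counts or measurements.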

mongodb extract data between dates saved with javascript Date.now

I have a MongoDB collection called foo_collection where the documents contain a field called createdby that uses JavaScript's Date.now() to save the timestamp.
An example would be
{
"_id": {
"$oid": "618a669ea3bff474f6fe4767"
},
"widgetid": "ddbae9a2-4156-11ec-905a-02cf95edae88",
"publisher_id": "938ecebe-1089-11ec-8bd1-0a57868782b0",
"impression_id": "b6bfc0bc-1dc4-11ec-850e-0a57868782b0",
"logid": "ksaeqkqe65",
"createdby": {
"$numberLong": "1636460191573"
}
}
As you can see the field createdby uses $numberLong to save the data.
According to the docs, the static Date.now() method returns the number of milliseconds elapsed since January 1, 1970 00:00:00 UTC.
How can I query the collection foo_collection using pymongo so that I get the documents between two dates?
Since you are using Date.now() which returns a number and not a Date object you can simply do a number comparison
db.collection.find({
$and: [
{
createdby: {
$gt: *some_number*
}
},
{
createdby: {
$lt: *some_number*
}
}
]
})
Converting the Python datetime values into millisecond timestamps:
from datetime import datetime
start = datetime.strptime('2021-11-10', '%Y-%m-%d').timestamp() * 1000
end = datetime.strptime('2021-11-11', '%Y-%m-%d').timestamp() * 1000
db.collection.find({
"createdby": {
"$gt": start,
"$lt": end
}
})
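
One caveat: calling `.timestamp()` on a naive datetime interprets it in the machine's local time zone, so the boundaries can shift by a few hours depending on where the code runs. Below is a small sketch that pins the boundaries to UTC and uses integer arithmetic instead of floats. The function names are illustrative, and the half-open `$gte`/`$lt` range is a design choice rather than something from the original answer.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(day: str) -> int:
    """Milliseconds since the Unix epoch for midnight UTC of a 'YYYY-MM-DD' day."""
    moment = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Dividing two timedeltas with // yields an exact integer.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def created_between(start_day: str, end_day: str) -> dict:
    """Filter for documents with start_day <= createdby < end_day (UTC)."""
    return {
        "createdby": {
            "$gte": to_epoch_ms(start_day),
            "$lt": to_epoch_ms(end_day),
        }
    }


# The resulting dict can be passed to a pymongo find() call.
print(created_between("2021-11-09", "2021-11-10"))
```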

MongoDB (PyMongo) Pagination with distinct not giving consistent result

I am trying to achieve pagination with distinct using pymongo.
I have records
{
name: string,
roll: integer,
address: string,
.
.
}
I only want the name from each record. Names can be duplicated, so I want the distinct names, with pagination.
result = collection.aggregate([
{'$sort':{"name":1}},
{'$group':{"_id":"$name"}},
{'$skip':skip},
{'$limit':limit}
])
The problem is that with this query I get a different result for the same page number each time I run it.
I looked into this answer
Distinct() command used with skip() and limit()
but it didn't help in my case.
How do I resolve this?
Thanks in advance!
I've tried to sort after the group and it seems to solve the problem
db.collection.aggregate([
{
"$group": {
"_id": "$name"
}
},
{
"$sort": {
"_id": 1
}
},
{
"$skip": 0
},
{
"$limit": 1
}
])
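
The likely reason this helps is that `$group` does not guarantee the order of its output, so a `$sort` placed before it has no effect on what `$skip` and `$limit` see. A minimal sketch of the same pipeline built from Python follows; the function name and the 1-based `page` parameter are assumptions for illustration.

```python
def distinct_names_page(page: int, page_size: int) -> list:
    """Pipeline for one page (1-based) of distinct names in a stable order."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return [
        {"$group": {"_id": "$name"}},       # one document per distinct name
        {"$sort": {"_id": 1}},              # sort AFTER grouping for a stable order
        {"$skip": (page - 1) * page_size},
        {"$limit": page_size},
    ]


# With pymongo this would be used as: collection.aggregate(distinct_names_page(3, 20))
print(distinct_names_page(page=3, page_size=20))
```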

Get all MongoDB documents for the whole day [duplicate]

I've been playing around storing tweets inside mongodb, each object looks like this:
{
"_id" : ObjectId("4c02c58de500fe1be1000005"),
"contributors" : null,
"text" : "Hello world",
"user" : {
"following" : null,
"followers_count" : 5,
"utc_offset" : null,
"location" : "",
"profile_text_color" : "000000",
"friends_count" : 11,
"profile_link_color" : "0000ff",
"verified" : false,
"protected" : false,
"url" : null,
"contributors_enabled" : false,
"created_at" : "Sun May 30 18:47:06 +0000 2010",
"geo_enabled" : false,
"profile_sidebar_border_color" : "87bc44",
"statuses_count" : 13,
"favourites_count" : 0,
"description" : "",
"notifications" : null,
"profile_background_tile" : false,
"lang" : "en",
"id" : 149978111,
"time_zone" : null,
"profile_sidebar_fill_color" : "e0ff92"
},
"geo" : null,
"coordinates" : null,
"in_reply_to_user_id" : 149183152,
"place" : null,
"created_at" : "Sun May 30 20:07:35 +0000 2010",
"source" : "web",
"in_reply_to_status_id" : {
"floatApprox" : 15061797850
},
"truncated" : false,
"favorited" : false,
"id" : {
"floatApprox" : 15061838001
}
}
How would I write a query which checks the created_at and finds all objects between 18:47 and 19:00? Do I need to update my documents so the dates are stored in a specific format?
Querying for a Date Range (Specific Month or Day) in the MongoDB Cookbook has a very good explanation on the matter, but below is something I tried out myself and it seems to work.
items.save({
name: "example",
created_at: ISODate("2010-04-30T00:00:00.000Z")
})
items.find({
created_at: {
$gte: ISODate("2010-04-29T00:00:00.000Z"),
$lt: ISODate("2010-05-01T00:00:00.000Z")
}
})
=> { "_id" : ObjectId("4c0791e2b9ec877893f3363b"), "name" : "example", "created_at" : "Sun May 30 2010 00:00:00 GMT+0300 (EEST)" }
Based on my experiments you will need to serialize your dates into a format that MongoDB supports, because the following gave undesired search results.
items.save({
name: "example",
created_at: "Sun May 30 18.49:00 +0000 2010"
})
items.find({
created_at: {
$gte:"Mon May 30 18:47:00 +0000 2015",
$lt: "Sun May 30 20:40:36 +0000 2010"
}
})
=> { "_id" : ObjectId("4c079123b9ec877893f33638"), "name" : "example", "created_at" : "Sun May 30 18.49:00 +0000 2010" }
In the second example no results were expected, but one document was still returned. This is because the values are compared as plain strings, not as dates.
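
The same effect can be reproduced outside MongoDB. The short Python sketch below (sample strings chosen for illustration) shows that comparing Twitter-style date strings looks only at characters, while parsed datetimes compare chronologically.

```python
from datetime import datetime

# Assumes English day and month names in the input strings.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"

earlier = "Sun May 30 18:47:06 +0000 2010"
later = "Mon May 31 09:00:00 +0000 2010"

# Lexicographic comparison: "M" sorts before "S", so the later tweet
# looks "smaller" than the earlier one.
print(later < earlier)

# Parsed datetimes compare by actual point in time.
earlier_dt = datetime.strptime(earlier, TWITTER_FORMAT)
later_dt = datetime.strptime(later, TWITTER_FORMAT)
print(earlier_dt < later_dt)
```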
To clarify. What is important to know is that:
Yes, you have to pass a Javascript Date object.
Yes, it has to be ISODate friendly
Yes, from my experience getting this to work, you need to manipulate the date to ISO
Yes, working with dates is generally always a tedious process, and mongo is no exception
Here is a working snippet of code where we do a little date manipulation to ensure Mongo can handle it correctly. I am using the mongoose module and want the rows whose date attribute is less than or equal to (on or before) the date given as the myDate param:
var inputDate = new Date(myDate.toISOString());
MyModel.find({
'date': { $lte: inputDate }
})
Python and pymongo
Finding objects between two dates in Python with pymongo in collection posts (based on the tutorial):
import datetime
from_date = datetime.datetime(2010, 12, 31, 12, 30, 30, 125000)
to_date = datetime.datetime(2011, 12, 31, 12, 30, 30, 125000)
for post in posts.find({"date": {"$gte": from_date, "$lt": to_date}}):
    print(post)
Where {"$gte": from_date, "$lt": to_date} specifies the range in terms of datetime.datetime types.
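
For the "whole day" case in the title, a half-open range from midnight to the next midnight avoids having to spell out 23:59:59.999. The following is a sketch under the assumption that the field holds real dates (not strings) and that UTC days are wanted; the function name and default field name are illustrative.

```python
from datetime import datetime, timedelta, timezone


def whole_day_filter(year: int, month: int, day: int, field: str = "created_at") -> dict:
    """Filter matching every document whose `field` falls on the given UTC day."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)  # exclusive upper bound: next midnight
    return {field: {"$gte": start, "$lt": end}}


# With pymongo: posts.find(whole_day_filter(2010, 5, 30))
print(whole_day_filter(2010, 5, 30))
```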
db.collection.find({"createdDate":{$gte:new ISODate("2017-04-14T23:59:59Z"),$lte:new ISODate("2017-04-15T23:59:59Z")}}).count();
Replace collection with name of collection you want to execute query
MongoDB actually stores the millis of a date as an int(64), as prescribed by http://bsonspec.org/#/specification
However, it can get pretty confusing when you retrieve dates as the client driver will instantiate a date object with its own local timezone. The JavaScript driver in the mongo console will certainly do this.
So, if you care about your timezones, then make sure you know what it's supposed to be when you get it back. This shouldn't matter so much for the queries, as it will still equate to the same int(64), regardless of what timezone your date object is in (I hope). But I'd definitely make queries with actual date objects (not strings) and let the driver do its thing.
Use this code to find the record between two dates using $gte and $lt:
db.CollectionName.find({"whenCreated": {
'$gte': ISODate("2018-03-06T13:10:40.294Z"),
'$lt': ISODate("2018-05-06T13:10:40.294Z")
}});
Using with Moment.js and Comparison Query Operators
var today = moment().startOf('day');
// "2018-12-05T00:00:00.00"
var tomorrow = moment(today).endOf('day');
// "2018-12-05T23:59:59.999"
Example.find(
{
// find in today
created: { '$gte': today, '$lte': tomorrow }
// Or greater than 5 days
// created: { $lt: moment().add(-5, 'days') },
}, function (err, docs) { ... });
db.collection.find({$and:
[
{date_time:{$gt:ISODate("2020-06-01T00:00:00.000Z")}},
{date_time:{$lt:ISODate("2020-06-30T00:00:00.000Z")}}
]
})
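In case you are making the query directly from your application (note that this string form only matches if date_time is stored as an ISO 8601 string rather than as a Date):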
db.collection.find({$and:
[
{date_time:{$gt:"2020-06-01T00:00:00.000Z"}},
{date_time:{$lt:"2020-06-30T00:00:00.000Z"}}
]
})
You can also check this out. If you are using this method, then use the parse function to get values from Mongo Database:
db.getCollection('user').find({
createdOn: {
$gt: ISODate("2020-01-01T00:00:00.000Z"),
$lt: ISODate("2020-03-01T00:00:00.000Z")
}
})
Save created_at date in ISO Date Format then use $gte and $lte.
db.connection.find({
created_at: {
$gte: ISODate("2010-05-30T18:47:00.000Z"),
$lte: ISODate("2010-05-30T19:00:00.000Z")
}
})
Use $gte and $lte to find the data between two dates in MongoDB:
var tomorrowDate = moment(new Date()).add(1, 'days').format("YYYY-MM-DD");
db.collection.find({"plannedDeliveryDate":{ $gte: new Date(tomorrowDate +"T00:00:00.000Z"),$lte: new Date(tomorrowDate + "T23:59:59.999Z")}})
mongoose.model('ModelName').aggregate([
{
$match: {
userId: mongoose.Types.ObjectId(userId)
}
},
{
$project: {
dataList: {
$filter: {
input: "$dataList",
as: "item",
cond: {
$and: [
{
$gte: [ "$$item.dateTime", new Date(`2017-01-01T00:00:00.000Z`) ]
},
{
$lte: [ "$$item.dateTime", new Date(`2019-12-01T00:00:00.000Z`) ]
},
]
}
}
}
}
}
])
For those using Make (formerly Integromat) and MongoDB:
I was struggling to find the right way to query all records between two dates. In the end, all I had to do was to remove ISODate as suggested in some of the solutions here.
So the full code would be:
"created": {
"$gte": "2016-01-01T00:00:00.000Z",
"$lt": "2017-01-01T00:00:00.000Z"
}
This article helped me achieve my goal.
UPDATE
Another way to achieve the above code in Make (formerly Integromat) would be to use the parseDate function. So the code below will return the same result as the one above :
"created": {
"$gte": "{{parseDate("2016-01-01"; "YYYY-MM-DD")}}",
"$lt": "{{parseDate("2017-01-01"; "YYYY-MM-DD")}}"
}
⚠️ Be sure to wrap {{parseDate("2017-01-01"; "YYYY-MM-DD")}} between quotation marks.
Convert your dates to GMT timezone as you're stuffing them into Mongo. That way there's never a timezone issue. Then just do the math on the twitter/timezone field when you pull the data back out for presentation.
Why not convert the string to an integer of the form YYYYMMDDHHMMSS? Each increment of time would then create a larger integer, and you can filter on the integers instead of worrying about converting to ISO time.
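
A minimal sketch of that idea in Python, assuming Twitter-style input strings and normalizing to UTC first so that the integers stay comparable across offsets. The function name is illustrative.

```python
from datetime import datetime, timezone

# Assumes English day and month names in the input strings.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def sortable_int(created_at: str) -> int:
    """Convert a Twitter-style timestamp into a YYYYMMDDHHMMSS integer in UTC."""
    parsed = datetime.strptime(created_at, TWITTER_FORMAT).astimezone(timezone.utc)
    return int(parsed.strftime("%Y%m%d%H%M%S"))


print(sortable_int("Sun May 30 18:47:06 +0000 2010"))
```

Such integers sort correctly, but arithmetic on them (for example "plus 90 minutes") is not meaningful, which is one reason real date values are usually preferred.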
Scala:
With joda DateTime and BSON syntax (reactivemongo):
val queryDateRangeForOneField = (start: DateTime, end: DateTime) =>
BSONDocument(
"created_at" -> BSONDocument(
"$gte" -> BSONDateTime(start.millisOfDay().withMinimumValue().getMillis),
"$lte" -> BSONDateTime(end.millisOfDay().withMaximumValue().getMillis)),
)
where millisOfDay().withMinimumValue() for "2021-09-08T06:42:51.697Z" will be "2021-09-08T00:00:00.000Z"
and
where millisOfDay().withMaximumValue() for "2021-09-08T06:42:51.697Z" will be "2021-09-08T23:59:59.999Z"
I tried the following approach for my own requirements: I need to store a date whenever an object is created, and later retrieve all the records (documents) between two dates.
In my HTML file I was using the format mm/dd/yyyy:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<script>
//jquery
$(document).ready(function(){
$("#select_date").click(function() {
$.ajax({
type: "post",
url: "xxx",
datatype: "html",
data: $("#period").serialize(),
success: function(data){
alert(data);
} ,//success
}); //event triggered
});//ajax
});//jquery
</script>
<title></title>
</head>
<body>
<form id="period" name='period'>
from <input id="selecteddate" name="selecteddate1" type="text"> to
<input id="select_date" type="button" value="selected">
</form>
</body>
</html>
In my Python file I converted it to ISO format in the following way:
date_str1 = request.POST["selecteddate1"]
SelectedDate1 = datetime.datetime.strptime(date_str1, '%m/%d/%Y').isoformat()
and saved it in my MongoDB collection under the field "SelectedDate".
To retrieve the documents between two dates I used the following query:
db.collection.find({"SelectedDate": {'$gte': SelectedDate1, '$lt': SelectedDate2}})

Query Mongodb on month, day, year... of a datetime

I'm using mongodb and I store datetime in my database in this way
for a date "17-11-2011 18:00" I store:
date = datetime.datetime(2011, 11, 17, 18, 0)
db.mydatabase.mycollection.insert({"date" : date})
I would like to do a request like that
month = 11
db.mydatabase.mycollection.find({"date.month" : month})
or
day = 17
db.mydatabase.mycollection.find({"date.day" : day})
Does anyone know how to do this query?
Dates are stored in their timestamp format. If you want everything that belongs to a specific month, query for the start and the end of the month.
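var start = new Date(2010, 11, 1); // JavaScript months are zero-based, so 11 is December
var end = new Date(2011, 0, 1); // first day of the next month, used as an exclusive upper bound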
db.posts.find({created_on: {$gte: start, $lt: end}});
//taken from http://cookbook.mongodb.org/patterns/date_range/
You cannot directly query MongoDB collections by date components like day or month. But it's possible by using the special $where JavaScript expression (note that getMonth() is zero-based, so 11 is December):
db.mydatabase.mycollection.find({$where : function() { return this.date.getMonth() == 11} })
or simply
db.mydatabase.mycollection.find({$where : 'return this.date.getMonth() == 11'})
(But i prefer the first one)
Check out the below shell commands to get the parts of date
> date = ISODate("2011-09-25T10:12:34Z")
> date.getYear()
111
> date.getMonth()
8
> date.getDate()
25
EDIT:
Use $where only if you have no other choice, because it comes with performance problems. Please check out the comments below by @kamaradclimber and @dcrosta. I will leave this post up so that other folks get the facts about it.
Also check out the link $where Clauses and Functions in Queries for more info.
How about storing the month in its own property, since you need to query for it? It is less elegant than $where, but likely to perform better since it can be indexed.
If you want to search for documents that belong to a specific month, make sure to query like this:
// Anything greater than this month and less than the next month
db.posts.find({created_on: {$gte: new Date(2015, 6, 1), $lt: new Date(2015, 7, 1)}});
Avoid querying like the examples below as much as possible.
// This may not find document with date as the last date of the month
db.posts.find({created_on: {$gte: new Date(2015, 6, 1), $lt: new Date(2015, 6, 30)}});
// don't do this too
db.posts.find({created_on: {$gte: new Date(2015, 6, 1), $lte: new Date(2015, 6, 30)}});
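
The same "start of this month up to, but not including, the start of the next month" rule can be sketched in Python for use with pymongo. Note that, unlike JavaScript's `Date`, Python's `datetime` uses 1-based months. The function name and default field name are illustrative.

```python
from datetime import datetime


def month_filter(year: int, month: int, field: str = "created_on") -> dict:
    """Half-open range covering one calendar month (month is 1-based)."""
    start = datetime(year, month, 1)
    # Roll over to January of the next year after December.
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return {field: {"$gte": start, "$lt": end}}


# With pymongo: db.posts.find(month_filter(2015, 7))
print(month_filter(2015, 7))
```

Because the upper bound is the first instant of the following month, the last day is always included regardless of whether the month has 28, 29, 30 or 31 days.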
Use the $expr operator which allows the use of aggregation expressions within the query language. This will give you the power to use the Date Aggregation Operators in your query as follows:
month = 11
db.mydatabase.mycollection.find({
"$expr": {
"$eq": [ { "$month": "$date" }, month ]
}
})
or
day = 17
db.mydatabase.mycollection.find({
"$expr": {
"$eq": [ { "$dayOfMonth": "$date" }, day ]
}
})
You could also run an aggregate operation with the aggregate() function that takes in a $redact pipeline:
month = 11
db.mydatabase.mycollection.aggregate([
{
"$redact": {
"$cond": [
{ "$eq": [ { "$month": "$date" }, month ] },
"$$KEEP",
"$$PRUNE"
]
}
}
])
For the other request
day = 17
db.mydatabase.mycollection.aggregate([
{
"$redact": {
"$cond": [
{ "$eq": [ { "$dayOfMonth": "$date" }, day ] },
"$$KEEP",
"$$PRUNE"
]
}
}
])
Using OR
month = 11
day = 17
db.mydatabase.mycollection.aggregate([
{
"$redact": {
"$cond": [
{
"$or": [
{ "$eq": [ { "$month": "$date" }, month ] },
{ "$eq": [ { "$dayOfMonth": "$date" }, day ] }
]
},
"$$KEEP",
"$$PRUNE"
]
}
}
])
Using AND
var month = 11,
day = 17;
db.collection.aggregate([
{
"$redact": {
"$cond": [
{
"$and": [
{ "$eq": [ { "$month": "$createdAt" }, month ] },
{ "$eq": [ { "$dayOfMonth": "$createdAt" }, day ] }
]
},
"$$KEEP",
"$$PRUNE"
]
}
}
])
The $redact operator combines the functionality of the $project and $match pipeline stages: it returns all documents that match the condition using the $$KEEP variable and discards from the pipeline those that don't match using the $$PRUNE variable.
You can find records by the month, day, year, etc. of a date using the Date Aggregation Operators, such as $dayOfYear, $dayOfWeek, $month and $year.
As an example, if you want all the orders that were created in April 2016, you can use the query below.
db.getCollection('orders').aggregate(
[
{
$project:
{
doc: "$$ROOT",
year: { $year: "$created" },
month: { $month: "$created" },
day: { $dayOfMonth: "$created" }
}
},
{ $match : { "month" : 4, "year": 2016 } }
]
)
Here created is a date-type field in the documents, and $$ROOT is used to pass the whole original document on to the next stage, so the result contains all the details of each document.
You can optimize the above query as per your needs; it is just an example. To learn more about Date Aggregation Operators, visit the link.
You can use MongoDB_DataObject wrapper to perform such query like below:
$model = new MongoDB_DataObject('orders');
$model->whereAdd('MONTH(created) = 4 AND YEAR(created) = 2016');
$model->find();
while ($model->fetch()) {
var_dump($model);
}
OR, similarly, using direct query string:
$model = new MongoDB_DataObject();
$model->query('SELECT * FROM orders WHERE MONTH(created) = 4 AND YEAR(created) = 2016');
while ($model->fetch()) {
var_dump($model);
}
