I'm struggling with localized datetimes.
All the dates stored in Mongo are converted to UTC automatically, so we have to localize them after retrieving them. I'm fine with that, but...
Problems arise when I make a query that groups records by date, meaning YYYY-MM-DD. Since my local time is GMT-3, any record with a local time of 21:00 or later is stored in Mongo at 00:00 or later, thus corresponding to the following day. When grouping by date in the query I'd be getting records on the wrong day, and I won't be able to recover from that because I lose the hour details.
Is there a way of localizing the dates in the $group stage of a pymongo query?
Here's the code:
def records_by_date(self):
    pipeline = []
    pipeline.append({"$group": {
        "_id": {
            "$concat": [
                {"$substr": [{"$year": "$date"}, 0, 4]},
                "-",
                {"$substr": [{"$month": "$date"}, 0, 2]},
                "-",
                {"$substr": [{"$dayOfMonth": "$date"}, 0, 2]}
            ]},
        "record_id": {"$push": "$_id"},
        "count": {"$sum": 1}
    }})
    pipeline.append({"$project": {
        "_id": 0,
        "date": "$_id",
        "record_id": 1,
        "count": 1
    }})
    pipeline.append({"$sort": {"date": 1}})
    return self.collection.aggregate(pipeline)['result']
If I added the hour details, I could verify the records afterwards, but then I wouldn't be grouping by date.
Any ideas?
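One direction worth checking, assuming MongoDB 3.6 or later: $dateToString accepts a timezone option, so the grouping key can be computed in local time instead of UTC. A minimal sketch building the pipeline (the timezone name is an example for GMT-3; the field names follow the code above):

```python
# Sketch: group by local calendar date using $dateToString with a timezone
# (MongoDB 3.6+). "America/Sao_Paulo" is an assumed example for GMT-3;
# substitute your own zone name.
def records_by_date_pipeline(tz="America/Sao_Paulo"):
    return [
        {"$group": {
            "_id": {"$dateToString": {
                "format": "%Y-%m-%d",
                "date": "$date",
                "timezone": tz,   # shifts the date before extracting Y-M-D
            }},
            "record_id": {"$push": "$_id"},
            "count": {"$sum": 1},
        }},
        {"$project": {"_id": 0, "date": "$_id", "record_id": 1, "count": 1}},
        {"$sort": {"date": 1}},
    ]

# Pass the result to collection.aggregate(...) as before.
```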
Dataset:
{"_id":{"$oid":"61e038a052124accf41cb5e4"},"event_date":{"$date":{"$numberLong":"1642204800000"}},"name":"UFC Fight Night","event_url":"https://www.tapology.com/fightcenter/events/82805-ufc-fight-night","location":"Las Vegas, NV"}
{"_id":{"$oid":"61e038a252124accf41cb5e5"},"event_date":{"$date":{"$numberLong":"1642809600000"}},"name":"UFC 270","event_url":"https://www.tapology.com/fightcenter/events/82993-ufc-270","location":"Anaheim, CA"}
{"_id":{"$oid":"61e038a252124accf41cb5e6"},"event_date":{"$date":{"$numberLong":"1644019200000"}},"name":"UFC Fight Night","event_url":"https://www.tapology.com/fightcenter/events/83125-ufc-fight-night","location":"Las Vegas, NV"}
I'm using Python, which means I can't use shell $commands in my code to query MongoDB. The question is how to find the document whose datetime value is closest to the current date. As I understand it, I have to use Python's datetime.now() to get the current date and compare against it ($currentDate doesn't work from Python). But in order to compare the values I would have to deserialize every document, which looks very heavy. By default Mongo uses the ISO date type.
Can you help me? At least point me in the right direction?
Here is a solution with an aggregation pipeline:
db.collection.aggregate([
  {
    "$addFields": {
      "dateDiff": {
        "$abs": {
          "$subtract": [
            "$event_date",
            { "$literal": new Date() }
          ]
        }
      }
    }
  },
  {
    "$sort": {
      "dateDiff": 1
    }
  }
])
I use $subtract to get the difference between event_date and today.
$abs takes the absolute value, so the closest event can be in the future or in the past.
And then we just have to sort by dateDiff.
In your Python code you have to replace new Date() with a datetime object.
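From PyMongo, the same pipeline can be sketched with a plain datetime standing in for new Date() (collection is a placeholder; PyMongo serializes datetime objects as BSON dates):

```python
from datetime import datetime, timezone

# Same idea as the shell pipeline above: absolute distance from "now",
# then sort ascending so the closest event comes first.
now = datetime.now(timezone.utc)
pipeline = [
    {"$addFields": {"dateDiff": {"$abs": {"$subtract": ["$event_date", now]}}}},
    {"$sort": {"dateDiff": 1}},
    {"$limit": 1},  # keep only the closest document
]
# closest = next(collection.aggregate(pipeline), None)
```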
I am building an HTML form that will query MongoDB and retrieve the entire record based on the year.
How can I query by year only?
Something along the line in SQL:
select * from database where date == '2021'
How to do the equivalent in MongoDB?
If you are storing the dates as Date you have two options:
First one is to create dates and compare (not tested in Python, but it should work):
import datetime

start = datetime.datetime(2021, 1, 1)
end = datetime.datetime(2022, 1, 1)
db.collection.find({"date": {"$gte": start, "$lt": end}})
Note the trick here is to query for dates between the start and end of the desired year.
The other way is using an aggregation pipeline: you extract the year using $year and compare it with your desired value.
db.collection.aggregate([
  {
    "$match": {
      "$expr": {
        "$eq": [
          { "$year": "$date" },
          2021
        ]
      }
    }
  }
])
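For completeness, the same match stage written from PyMongo might look like this (collection is a placeholder):

```python
# $expr lets the $year aggregation operator run inside a $match stage;
# 2021 is the target year.
year_pipeline = [
    {"$match": {"$expr": {"$eq": [{"$year": "$date"}, 2021]}}}
]
# results = list(collection.aggregate(year_pipeline))
```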
I am trying to configure my PuLP problem to ensure an employee does not have more than 10 hours per day.
The employee variable I have set up is:
import pulp as pl
from collections import defaultdict

cost = []
vars_by_shift = defaultdict(list)
for employee, info in employees.iterrows():
    for shift in info['availability']:
        employee_var = pl.LpVariable("%s_%s" % (employee, shift), 0, 1, pl.LpInteger)
        vars_by_shift[shift].append(employee_var)
        cost.append(employee_var * info['base_rate'])
My objective is to minimize cost:
prob = pl.LpProblem("scheduling", pl.LpMinimize)
prob += sum(cost)
An example of my shift data is:
"76749": {
    "start_date": "2019-08-14",
    "start_time": "08:00",
    "end_date": "2019-08-14",
    "end_time": "12:00",
    "duration": 4,
    "number_positions": 1
},
"76750": {
    "start_date": "2019-08-14",
    "start_time": "13:00",
    "end_date": "2019-08-14",
    "end_time": "20:00",
    "duration": 7,
    "number_positions": 1
}
An employee can sometimes be assigned two short shifts on the same day. I want to ensure the total hours an employee is rostered on any given day does not exceed 10 hours. How would I model that constraint?
If I understand your implementation, you have a set of binary decision variables:
pl[e, s]
with one variable for each e in employees and each s in shifts.
I'm also assuming there is (or you can easily create) a list days covering the days spanned by the shifts, and that you can easily write a function returning the number of hours a shift contributes to a particular day.
You then want to add constraints:
for e in employees:
    for d in days:
        prob += lpSum([pl[e, s]*n_hours_of_shift_in_day(s, d) for s in shifts]) <= 10.0
Where the function n_hours_of_shift_in_day(s, d) is a function which returns the number of hours of shift s in day d, so for example if your shift was:
"76749": {
    "start_date": "2019-08-14",
    "start_time": "18:00",
    "end_date": "2019-08-15",
    "end_time": "19:00",
    "duration": 25,
    "number_positions": 1
}
Then n_hours_of_shift_in_day("76749", "2019-08-14") would return 6.0, and n_hours_of_shift_in_day("76749", "2019-08-15") would return 19.0.
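A possible implementation of that helper, as a sketch: it takes the shift dict itself rather than its id, and assumes the start/end date and time fields shown in the question:

```python
from datetime import datetime, timedelta

def n_hours_of_shift_in_day(shift, day):
    """Hours that `shift` (a dict with start/end date and time fields)
    contributes to the calendar day `day` ("YYYY-MM-DD")."""
    start = datetime.strptime(shift["start_date"] + " " + shift["start_time"], "%Y-%m-%d %H:%M")
    end = datetime.strptime(shift["end_date"] + " " + shift["end_time"], "%Y-%m-%d %H:%M")
    day_start = datetime.strptime(day, "%Y-%m-%d")
    day_end = day_start + timedelta(days=1)
    # Overlap of [start, end) with the given calendar day, clamped at zero
    overlap = min(end, day_end) - max(start, day_start)
    return max(overlap.total_seconds() / 3600.0, 0.0)

shift = {"start_date": "2019-08-14", "start_time": "18:00",
         "end_date": "2019-08-15", "end_time": "19:00"}
```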
Also, double-check that all your shift times really use the 24-hour clock; any 12-hour times with no AM/PM indication would give you problems.
Well, you need a grouping variable for the hours. Is start_date considered the day on which you don't want to assign more than 10 hours?
If the answer is yes, then you need something like this (assuming you also keep each variable keyed by (employee, shift), e.g. in a dict vars_by_employee_shift):
for employee, info in employees.iterrows():
    for d in dates:
        prob += pl.lpSum([vars_by_employee_shift[(employee, s)] * shifts[s]['duration']
                          for s in info['availability']
                          if shifts[s]['start_date'] == d]) <= 10
I'm writing a function that takes a vendor_id and a datetime string and returns whether that vendor can deliver, i.e. whether the time/date doesn't overlap with a delivery already being done.
I'm trying to compare datetime strings inside a dict that has a list of nested dicts, each with various elements including a datetime string. I want to compare the datetime strings of the nested dicts, check whether the dates differ, and then check whether a certain number of minutes have passed between them.
I tried looping through the dict items and using datetime.strptime() to parse the datetime strings, but I'm not sure how to compare the dates inside the list of dicts while iterating.
dict = {
    "results": [
        {
            "vendor_id": 1,
            "client_id": 10,
            "datetime": "2017-01-01 13:30:00"
        },
        {
            "vendor_id": 1,
            "client_id": 40,
            "datetime": "2017-01-01 14:30:00"
        }
    ]
}
Hope this helps you. Using the dict as you have advised:
dict = {
    "results": [
        {
            "vendor_id": 1,
            "client_id": 10,
            "datetime": "2017-01-01 13:30:00"
        },
        {
            "vendor_id": 1,
            "client_id": 40,
            "datetime": "2017-01-01 14:30:00"
        }
    ]
}
Having some datetimes, use a for loop and test (plain string comparison works here because the timestamps are zero-padded, fixed-width ISO-style strings):
somedatetime1 = '2017-01-01 14:00:00'
somedatetime2 = '2017-01-01 15:00:00'

for d in dict['results']:
    if d['datetime'] < somedatetime1:
        print('somedatetime1 :: Vendor {} Client {} DateTime {}'.format(d['vendor_id'], d['client_id'], d['datetime']))
    if d['datetime'] < somedatetime2:
        print('somedatetime2 :: Vendor {} Client {} DateTime {}'.format(d['vendor_id'], d['client_id'], d['datetime']))
Returns:
somedatetime1 :: Vendor 1 Client 10 DateTime 2017-01-01 13:30:00
somedatetime2 :: Vendor 1 Client 10 DateTime 2017-01-01 13:30:00
somedatetime2 :: Vendor 1 Client 40 DateTime 2017-01-01 14:30:00
I would think it would be easier to do this in pandas, as you can group by vendor_id and do operations only for that vendor.
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

grp = pd.DataFrame(dict['results']).groupby('vendor_id')  # one group per vendor, useful when you have more than one

dct = {}
for group in grp.groups:
    df_vid = grp.get_group(group).copy()  # data for that vendor
    df_vid['datetime'] = pd.to_datetime(df_vid['datetime'])
    ab = df_vid['datetime'].astype('int64').to_numpy()  # nanoseconds since epoch, so the distances are numeric
    dist = euclidean_distances(ab.reshape(-1, 1), ab.reshape(-1, 1))  # pairwise distance matrix
    dct[group] = dist  # keyed by vendor id
You can convert the string type to datetime type, and then just use -.
The time_diff function below returns a timedelta, which you can keep using if you need to handle the time further.
If you want minutes, use total_seconds() / 60 (the seconds attribute alone ignores the days part and misbehaves for negative differences).
from datetime import datetime

def time_diff(a, b):
    return a - b

dict = {
    "results": [
        {
            "vendor_id": 1,
            "client_id": 10,
            "datetime": "2017-01-01 13:30:00"
        },
        {
            "vendor_id": 1,
            "client_id": 40,
            "datetime": "2017-01-01 14:30:00"
        }
    ]
}

for r in dict['results']:
    r['datetime'] = datetime.strptime(r['datetime'], '%Y-%m-%d %H:%M:%S')

print(int(time_diff(dict['results'][1]['datetime'], dict['results'][0]['datetime']).total_seconds() / 60))
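Putting those pieces together, a hypothetical can_deliver check built on the same strptime parsing; the 60-minute minimum gap is an assumed rule for illustration, and the sample data has the same shape as the question's dict:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical sketch: a vendor can deliver at `when` if no existing
# delivery of theirs is within `gap_minutes` of it.
def can_deliver(data, vendor_id, when, gap_minutes=60):
    when = datetime.strptime(when, FMT)
    for r in data["results"]:
        if r["vendor_id"] != vendor_id:
            continue
        booked = datetime.strptime(r["datetime"], FMT)
        if abs((when - booked).total_seconds()) / 60 < gap_minutes:
            return False
    return True

data = {"results": [
    {"vendor_id": 1, "client_id": 10, "datetime": "2017-01-01 13:30:00"},
    {"vendor_id": 1, "client_id": 40, "datetime": "2017-01-01 14:30:00"},
]}
```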
OK, so I'm looking for advice and suggestions on the best way to comb through JSON data, look for today's date/time, and return the right value.
Here is a sample of what the json data looks like:
[
    {
        "startDateTime": "2018-04-11T14:17:00-05:00",
        "endDateTime": "2018-04-11T14:18:00-05:00",
        "oncallMember": [
            "username1"
        ],
        "shiftType": "historical"
    },
    {
        "startDateTime": "2018-04-11T14:18:00-05:00",
        "endDateTime": "2018-04-16T08:00:00-05:00",
        "oncallMember": [
            "username2"
        ],
        "shiftType": null
    },
    {
        "startDateTime": "2018-04-16T08:00:00-05:00",
        "endDateTime": "2018-04-23T08:00:00-05:00",
        "oncallMember": [
            "username1"
        ],
        "shiftType": null
    },
    {
        "startDateTime": "2018-04-23T08:00:00-05:00",
        "endDateTime": "2018-04-30T08:00:00-05:00",
        "oncallMember": [
            "username2"
        ],
        "shiftType": null
    },
    ......continues on for the year
The start/end dates are weekly rotations between the members; however, when exceptions are set or changed, the start/end dates can vary daily or by any other amount of time. What I want to do is take today's date and find the current "oncallMember". I'm not sure how to check whether today's date falls between the start and end times.
Any help is appreciated.
The arrow module may be helpful.
First, get today's timestamp range:
import arrow

today = arrow.now()
(day_start, day_end) = today.span('day')
day_start_timestamp = day_start.timestamp()  # a method in arrow 1.0+; a property in older versions
day_end_timestamp = day_end.timestamp()
Then you need to parse the detail data into timestamps. Your raw data carries a UTC offset, like "2018-04-16T08:00:00-05:00"; you could slice off the offset part to get "2018-04-16T08:00:00" and parse that with arrow:
raw = "2018-04-16T08:00:00"
FORMAT = "YYYY-MM-DDTHH:mm:ss"
obj = arrow.get(raw, FORMAT)
obj_ts = obj.timestamp()
Then judge whether obj_ts is in range, i.e. day_start_timestamp <= obj_ts <= day_end_timestamp.
But if your code needs to run for days, the timestamp range has to be recomputed every day.
Use the json and datetime libraries.
Use the json library to read the JSON via json.loads, which converts it into a list of dicts, and use dateutil.parser.parse for the str-to-datetime conversion:
import json
from dateutil.parser import parse

dict_ = json.loads(json_str)
# json_str is the JSON you mentioned

startDate = dict_[0]['startDateTime']
# '2018-04-11T14:17:00-05:00'
date = parse(startDate)
# datetime.datetime(2018, 4, 11, 14, 17, tzinfo=tzoffset(None, -18000))
Once you have the datetimes, do the further coding to compare today against each entry's start and end dates and return oncallMember.
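A minimal sketch of that final lookup. It uses the stdlib datetime.fromisoformat, which handles the "-05:00" offset directly in Python 3.7+ (an alternative to dateutil.parser.parse); schedule stands for the parsed JSON list shown in the question:

```python
from datetime import datetime, timezone

# Sketch: return the oncallMember whose [start, end) window contains `now`.
# `schedule` stands for the parsed JSON list from the question.
def current_oncall(schedule, now=None):
    now = now or datetime.now(timezone.utc)
    for entry in schedule:
        start = datetime.fromisoformat(entry["startDateTime"])
        end = datetime.fromisoformat(entry["endDateTime"])
        if start <= now < end:
            return entry["oncallMember"]
    return None

schedule = [
    {"startDateTime": "2018-04-16T08:00:00-05:00",
     "endDateTime": "2018-04-23T08:00:00-05:00",
     "oncallMember": ["username1"]},
]
```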