I have data like this:
startDateTime: {'timeZoneID': 'America/New_York', 'date': {'year': '2014', 'day': '29', 'month': '1'}, 'second': '0', 'hour': '12', 'minute': '0'}
This is the representation of just one attribute. I have five other attributes like it (LastModified, created, etc.).
I want to derive an ISO-style date string in the format yyyy-mm-dd hh:mi:ss. Is this the right way to do it?
def parse_date(datecol):
    x = datecol
    y = str(x.get('date').get('year'))+'-'+str(x.get('date').get('month')).zfill(2)+'-'+str(x.get('date').get('day')).zfill(2)+' '+str(x.get('hour')).zfill(2)+':'+str(x.get('minute')).zfill(2)+':'+str(x.get('second')).zfill(2)
    print(y)
    return
That works, but I'd say it's cleaner to use the string formatting operator here:
def parse_date(c):
    d = c["date"]
    # %d needs integers, so convert the string fields with int, not str
    print("%04d-%02d-%02d %02d:%02d:%02d" % tuple(map(int, (d["year"], d["month"], d["day"], c["hour"], c["minute"], c["second"]))))
Alternatively, you can use the time module to convert your fields into a Python time value, and then format that using strftime. Remember the time zone, though.
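As a rough sketch of that alternative, the fields can be turned into a Python value and formatted with strftime. This version uses the datetime module rather than the time module, and the helper name to_iso_string is made up for illustration. It assumes the wall-clock time is what you want and ignores the timeZoneID field entirely:

```python
from datetime import datetime

def to_iso_string(col):
    """Build a naive datetime from the nested dict and format it."""
    d = col["date"]
    dt = datetime(
        int(d["year"]), int(d["month"]), int(d["day"]),
        int(col["hour"]), int(col["minute"]), int(col["second"]),
    )
    # Assumption: 'timeZoneID' is ignored, so this is local wall-clock time.
    return dt.strftime("%Y-%m-%d %H:%M:%S")

start = {
    "timeZoneID": "America/New_York",
    "date": {"year": "2014", "day": "29", "month": "1"},
    "second": "0", "hour": "12", "minute": "0",
}
print(to_iso_string(start))  # 2014-01-29 12:00:00
```

A side benefit of building a real datetime is that invalid values (month 13, day 32) raise a ValueError instead of silently producing a bad string.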
I used pandas to create a list of dictionaries. The following code is how I create the list:
sheetwork = client.open('RMA Daily Workload').sheet1
list_of_work = sheetwork.get_all_records()
dfr = pd.DataFrame(list_of_work, columns = ['date' , 'value'])
rnow = dfrnow.to_dict('records')
The following is the output of my list:
rnow =
[{'date': '01/02/2020', 'value': 13},
{'date': '01/03/2020', 'value': 2},
{'date': '01/06/2020', 'value': 5},
...
{'date': '01/07/2020', 'value': 6}]
I want to change the date format from MM/DD/YYYY to YYYY-MM-DDT00:00:00.000Z, so that my data is compatible with the JavaScript file where I want to add it.
I want my list to be shown as:
rnow =
[{'date': '2020-01-02T00:00:00.000Z', 'value': 13},
{'date': '2020-01-03T00:00:00.000Z', 'value': 2},
{'date': '2020-01-06T00:00:00.000Z', 'value': 5},
...
{'date': '2020-01-07T00:00:00.000Z', 'value': 6}]
I tried many methods but could only convert the dates to 2020-01-02 00:00:00, not 2020-01-02T00:00:00.000Z. Please advise what I should do.
If you need the exact string T00:00:00.000Z after the date, put it in the output format string when you convert, e.g.:
import datetime
# '2020-01-07T00:00:00.000Z'
datetime.datetime.strptime("01/07/2020", '%m/%d/%Y').strftime('%Y-%m-%dT00:00:00.000Z')
How to apply this to pandas:
def func(x):
    myDate = x['date']
    # The input is MM/DD/YYYY, so parse it with %m/%d/%Y
    return datetime.datetime.strptime(myDate, '%m/%d/%Y').strftime('%Y-%m-%dT00:00:00.000Z')
df['new_date'] = df.apply(func, axis=1)
To keep it simple and stay in UTC, since you are already using pandas:
rnow = [{'date': '01/02/2020', 'value': 13},
{'date': '01/03/2020', 'value': 2},
{'date': '01/06/2020', 'value': 5},
{'date': '01/07/2020', 'value': 6}]
import pandas as pd
def get_isoformat(date):
    return pd.to_datetime(date, dayfirst=False, utc=True).isoformat()
for i in range(len(rnow)):
    rnow[i]['date'] = get_isoformat(rnow[i]['date'])
rnow
which outputs:
[{'date': '2020-01-02T00:00:00+00:00', 'value': 13},
{'date': '2020-01-03T00:00:00+00:00', 'value': 2},
{'date': '2020-01-06T00:00:00+00:00', 'value': 5},
{'date': '2020-01-07T00:00:00+00:00', 'value': 6}]
In fact, you probably want to apply get_isoformat() to your dataframe directly, for simplicity. Also, if you drop utc=True (for example by passing utc=False), the +00:00 part goes away, in case you don't want or need it.
Edit
To get specifically 2020-01-02T00:00:00Z, try:
pd.to_datetime(date, dayfirst=False, utc=False).isoformat()+'Z'
You can use the isoformat function of Python's builtin datetime package:
from datetime import datetime, timezone
# strptime takes no tzinfo argument, so attach UTC afterwards with replace()
formatted = datetime.strptime('01/02/2020', '%m/%d/%Y').replace(tzinfo=timezone.utc).isoformat()
formatted
# Output: '2020-01-02T00:00:00+00:00'
Note that isoformat() does not emit the Z suffix for the UTC timezone. It writes +00:00 instead, which is also valid ISO 8601 and should parse fine in other code.
If this is a problem, you can omit the timezone and instead manually put a Z there:
from datetime import datetime
formatted = datetime.strptime('01/02/2020', '%m/%d/%Y').isoformat() + 'Z'
formatted
# Output: '2020-01-02T00:00:00Z'
Alternatively (in a more "manual" approach), you could format the date using strftime:
from datetime import datetime
formatted = datetime.strptime('01/02/2020', '%m/%d/%Y').strftime('%Y-%m-%dT00:00:00Z')
formatted
# Output: '2020-01-02T00:00:00Z'
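None of the variants above produce the .000 milliseconds part that the question asks for. A minimal sketch that does, assuming the dates are naive midnight values that should simply be labelled as UTC (the helper name to_js_iso is made up for illustration), uses the timespec argument of isoformat:

```python
from datetime import datetime

def to_js_iso(date_str):
    """Convert 'MM/DD/YYYY' to 'YYYY-MM-DDT00:00:00.000Z'."""
    dt = datetime.strptime(date_str, "%m/%d/%Y")
    # timespec="milliseconds" forces the '.000' part; 'Z' is appended by hand
    # on the assumption that the naive value already represents UTC.
    return dt.isoformat(timespec="milliseconds") + "Z"

rnow = [
    {"date": "01/02/2020", "value": 13},
    {"date": "01/03/2020", "value": 2},
]
converted = [{**row, "date": to_js_iso(row["date"])} for row in rnow]
print(converted[0]["date"])  # 2020-01-02T00:00:00.000Z
```

This is the same shape that JavaScript's Date.prototype.toISOString() produces, so it should round-trip through new Date(...) on the JavaScript side.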
I need to get the date and month from various strings such as '14th oct', '14oct', '14.10', '14 10' and '14/10'. For these cases my code below works fine.
query = '14.oct'
print(re.search(r'(?P<date>\b\d{1,2})(?:\b|st|nd|rd|th)?(?:[\s\.\-/_\\,]*)(?P<month>\d{1,2}|[a-z]{3,9})', query, re.I).groupdict())
Result:-
{'date': '14', 'month': 'oct'}
But for this case (1410), it still captures a date and a month. I don't want that, because the whole string is a number in another format and should not be treated as a date and month. The result should be None.
How can I change the search pattern for this? (using groupdict() only)
Edited:-
The parentheses around the number above (1410) are only there to set it apart from the surrounding text. What I mean is 1410 only.
The solution below is what I wanted. I got the idea from the answer by #the-fourth-bird, adding (?!\d{3,}\b) to the regex pattern.
Thanks🙏🏽
Final Solution
import re
queries = ['14 10', '14.10', '1410', '14-10', '14/10', '14,10', '17800', '14th oct', '14thoct', '14th-oct', '14th/oct', '14-oct', '14.oct', '14oct']
max_indent = len(max(queries, key = len)) + 1
for query in queries:
    if resp := re.search(r'(?P<date>\b(?!\d{3,}\b)\d{1,2})(?:\b|st|[nr]d|th)?(?:[\s.-/_\\,-]*)(?P<month>\d{1,2}|[a-z]{3,9})', query, re.I):
        print(f"{query:{max_indent}}- {resp.groupdict()}")
    else:
        print(f"{query:{max_indent}}- 'Not a date'")
Result:-
14 10 - {'date': '14', 'month': '10'}
14.10 - {'date': '14', 'month': '10'}
1410 - 'Not a date'
14-10 - {'date': '14', 'month': '10'}
14/10 - {'date': '14', 'month': '10'}
14,10 - {'date': '14', 'month': '10'}
17800 - 'Not a date'
14th oct - {'date': '14', 'month': 'oct'}
14thoct - {'date': '14', 'month': 'oct'}
14th-oct - {'date': '14', 'month': 'oct'}
14th/oct - {'date': '14', 'month': 'oct'}
14-oct - {'date': '14', 'month': 'oct'}
14.oct - {'date': '14', 'month': 'oct'}
14oct - {'date': '14', 'month': 'oct'}
I'm not sure whether you want to exclude 1410 as a bare 4-digit number or (1410) with the parentheses, but to exclude both you can make sure the match does not start at 4 consecutive digits:
(?P<date>\b(?!\d{4}\b)\d{1,2})(?:st|[nr]d|th)?[\s./_\\,-]*(?P<month>\d{1,2}|[a-z]{3,9})
To not match any date between parentheses:
\([^()]*\)|(?P<date>\b\d{1,2})(?:st|[nr]d|th)?[\s./_\\,-]*(?P<month>\d{1,2}|[a-z]{3,9})
\([^()]*\) Match from opening till closing parenthesis
| Or
(?P<date>\b\d{1,2}) Match 1-2 digits
(?:st|[nr]d|th)? Optionally match st nd rd th
[\s./_\\,-]* Optionally repeat matching any of the listed
(?P<month>\d{1,2}|[a-z]{3,9}) Match 1-2 digits or 3-9 chars a-z
For example
import re
pattern = r"\([^()]*\)|(?P<date>\b\d{1,2})(?:st|[nr]d|th)?(?:[\s./_\\,-]*)(?P<month>\d{1,2}|[a-z]{3,9})"
strings = ["14th oct", "14oct", "14.10", "14 10", "14/10", "1410", "(1410)"]
for s in strings:
    m = re.search(pattern, s, re.I)
    # The named groups are empty when only the parenthesised alternative matched
    if m and m.group("date"):
        print(m.groupdict())
    else:
        print(f"{s} --> Not valid")
Output
{'date': '14', 'month': 'oct'}
{'date': '14', 'month': 'oct'}
{'date': '14', 'month': '10'}
{'date': '14', 'month': '10'}
{'date': '14', 'month': '10'}
{'date': '14', 'month': '10'}
(1410) --> Not valid
How to change the search pattern for this?
You might try a negative lookbehind assertion for a literal ( combined with a negative lookahead assertion for a literal ), as follows:
import re
query = '14.oct'
noquery = '(1410)'
print(re.search(r'(?<!\()(?P<date>\b\d{1,2})(?:\b|st|nd|rd|th)?(?:[\s\.\-/_\\,]*)(?P<month>\d{1,2}|[a-z]{3,9})(?!\))', query, re.I).groupdict())
print(re.search(r'(?<!\()(?P<date>\b\d{1,2})(?:\b|st|nd|rd|th)?(?:[\s\.\-/_\\,]*)(?P<month>\d{1,2}|[a-z]{3,9})(?!\))', noquery, re.I))
output
{'date': '14', 'month': 'oct'}
None
Beware that it does prevent all bracketed forms, i.e. not only (1410) but also (14 10), (14/10) and so on.
I am getting the below output using describe_snapshots function in boto3
u'StartTime': datetime.datetime(2017, 4, 7, 4, 21, 42, tzinfo=tzutc())
I wish to convert it into proper date so that I can proceed with sorting the snapshots and removing the ones which are older than a particular number of days.
Is there any Python functionality that can be used to achieve this?
This is almost certainly already the format you need. datetime objects are easily comparable / sortable. For example:
from datetime import datetime, timezone
import boto3
ec2 = boto3.client('ec2')
account_id = 'MY_ACCOUNT_ID'
response = ec2.describe_snapshots(OwnerIds=[account_id])
snapshots = response['Snapshots']
# January 1st, 2017 (timezone-aware, because StartTime carries tzinfo=tzutc())
target_date = datetime(2017, 1, 1, tzinfo=timezone.utc)
# Get the snapshots older than the target date
old_snapshots = [s for s in snapshots if s['StartTime'] < target_date]
# Sort the old snapshots
old_snapshots = sorted(old_snapshots, key=lambda s: s['StartTime'])
docs: https://docs.python.org/3.6/library/datetime.html
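For the "older than a particular number of days" part of the question, here is a small sketch that works on plain dictionaries shaped like the describe_snapshots entries, so it runs without AWS access. The function name old_snapshots, the now parameter and the sample snapshot IDs are made up for illustration:

```python
from datetime import datetime, timedelta, timezone

def old_snapshots(snapshots, max_age_days, now=None):
    """Return snapshots older than max_age_days, oldest first."""
    # Use an aware "now" so it can be compared with the aware StartTime values.
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    old = [s for s in snapshots if s["StartTime"] < cutoff]
    return sorted(old, key=lambda s: s["StartTime"])

# Hypothetical sample data standing in for response['Snapshots']
sample = [
    {"SnapshotId": "snap-a", "StartTime": datetime(2017, 4, 7, 4, 21, 42, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-b", "StartTime": datetime(2017, 1, 15, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-c", "StartTime": datetime(2017, 4, 1, tzinfo=timezone.utc)},
]
fixed_now = datetime(2017, 4, 10, tzinfo=timezone.utc)
result = old_snapshots(sample, 7, now=fixed_now)
print([s["SnapshotId"] for s in result])  # ['snap-b', 'snap-c']
```

Comparing a naive datetime with an aware one raises a TypeError, which is why both sides of the comparison carry a timezone here.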
Really late to this post, but I recently ran into this. I am assuming you're comparing the dates by hand/eye rather than comparing the datetime objects programmatically, or that you're debugging and just want to see the date/time in the JSON objects in a human-readable format.
I found that the converter in the AWS AHA samples works really well.
import datetime
def myconverter(json_object):
    # Called by json.dumps for objects it cannot serialize on its own
    if isinstance(json_object, datetime.datetime):
        return json_object.__str__()
From there you can just pass your original event/message from boto to json.dumps and get the converted JSON string back:
In [34]: print(json_event)
{'arn': 'arn:aws:service:region::X', 'service': 'SERVICE', 'eventTypeCode': 'SOME_CODE', 'eventTypeCategory': 'CAT', 'eventScopeCode': 'SCOPE', 'region': 'us-east-1', 'startTime': datetime.datetime(YYYY, MM, DD, HH, MM, tzinfo=tzlocal()), 'endTime': datetime.datetime(YYYY, MM, DD, HH, MM, tzinfo=tzlocal()), 'lastUpdatedTime': datetime.datetime(YYYY, MM, DD, HH, MM, SS, tzinfo=tzlocal()), 'statusCode': 'CODE' }
In [35]: json_msg = json.dumps(json_event, default=myconverter)
In [36]: print(json_msg)
{'arn': 'arn:aws:service:region::X', 'service': 'SERVICE', 'eventTypeCode': 'SOME_CODE', 'eventTypeCategory': 'CAT', 'eventScopeCode': 'SCOPE', 'region': 'us-east-1', 'startTime': "YYYY-MM-DD HH:MM:SS-OH:OS", 'endTime': "YYYY-MM-DD HH:MM:SS-OH:OS", 'lastUpdatedTime': "YYYY-MM-DD HH:MM:SS-OH:OS" , 'statusCode': 'CODE' }
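A self-contained sketch of the same idea, runnable without boto: the event dictionary and the name datetime_converter are made up for illustration, and isoformat() is used here instead of __str__() so the output has the ISO 8601 T separator:

```python
import json
from datetime import datetime, timezone

def datetime_converter(obj):
    """Fallback for json.dumps: serialize datetime objects as ISO 8601 strings."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    # Anything else that json cannot handle should still fail loudly.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# Hypothetical event, shaped loosely like a boto3 response
event = {
    "service": "EC2",
    "startTime": datetime(2017, 4, 7, 4, 21, 42, tzinfo=timezone.utc),
}
json_msg = json.dumps(event, default=datetime_converter)
print(json_msg)  # {"service": "EC2", "startTime": "2017-04-07T04:21:42+00:00"}
```

The default callable is only invoked for values the encoder does not know how to serialize, so strings and numbers pass through untouched.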
This probably needs more code from your end, but it looks as though you are the one making it output that way (or something you're using does so by default).
AWS returns this format:
<startTime>YYYY-MM-DDTHH:MM:SS.SSSZ</startTime>
So I'm guessing that somewhere you are using datetime.datetime() on the date fields instead of something else? (https://docs.python.org/2/library/datetime.html)
I have a list with multiple dictionaries, like the following:
[{'Date': '6-1-2017', 'Rate':'0.3', 'Type':'A'},
{'Date': '6-1-2017', 'Rate':'0.4', 'Type':'B'},
{'Date': '6-1-2017', 'Rate':'0.6', 'Type':'F'},
{'Date': '6-1-2017', 'Rate':'0.1', 'Type':'B'}
]
I would now like to change the dates, because they need to be in the format 'yyymmdd', which starts at 1900-01-01. In other words, I would like to change the '6-1-2017' to '1170106'.
As this has to be done every week (with the then current date), I do not want to change this by hand. So next week, '13-1-2017' has to be transformed into '1170113'.
Any ideas how to do this? I have tried several things, but I can't even get my code to select the date values of all the dictionaries.
Many thanks!
You can use the datetime module, which provides a lot of functionality to manipulate datetime objects including converting datetime to string and the way back, accessing different components of the datetime object, etc:
from datetime import datetime
for l in lst:
    l['Date'] = datetime.strptime(l['Date'], "%d-%m-%Y")
    l['Date'] = str(l['Date'].year - 1900) + l['Date'].strftime("%m%d")
lst
#[{'Date': '1170106', 'Rate': '0.3', 'Type': 'A'},
# {'Date': '1170106', 'Rate': '0.4', 'Type': 'B'},
# {'Date': '1170106', 'Rate': '0.6', 'Type': 'F'},
# {'Date': '1170106', 'Rate': '0.1', 'Type': 'B'}]
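If dates before the year 2000 can occur, str(year - 1900) yields only two digits (for example '99'), which would shorten the string. A possible variant, assuming the target format should always be seven characters with the year offset zero-padded to three digits (the helper name to_cyymmdd is made up for illustration):

```python
from datetime import datetime

def to_cyymmdd(date_str):
    """Convert 'd-m-yyyy' to 'yyymmdd', where yyy is the year minus 1900."""
    dt = datetime.strptime(date_str, "%d-%m-%Y")
    # Assumption: the year offset is zero-padded to 3 digits (1999 -> '099').
    return "{:03d}{:%m%d}".format(dt.year - 1900, dt)

print(to_cyymmdd("6-1-2017"))    # 1170106
print(to_cyymmdd("13-1-2017"))   # 1170113
print(to_cyymmdd("31-12-1999"))  # 0991231
```

For years from 2000 onward this gives the same result as the loop above.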
mydict = [{'Counted number': '26', 'Timestamp': '8/10/2015 13:07:38'},{'Counted number': '14','Timestamp': '8/10/2015 11:51:14'},{'Counted number': '28','Timestamp': '8/10/2015 13:06:27'}, {'Counted number': '20','Timestamp': '8/10/2015 12:53:42'}]
How do I sort this list of dicts based on the timestamp?
This should work
import time
mydict.sort(key=lambda x:time.mktime(time.strptime(x['Timestamp'], '%d/%m/%Y %H:%M:%S')))
mydict.sort(key=lambda x:x['Timestamp'])
This compares the elements of mydict by their timestamp strings, which is a lexicographic comparison and not a chronological one. If you want to sort by the actual time, you have to convert each timestamp string to a time object of some sort and then sort mydict on that.
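A short sketch of that conversion using datetime.strptime as the sort key. It assumes the timestamps are day/month/year, which the sample data alone cannot confirm (swap to "%m/%d/%Y %H:%M:%S" if they are month-first); the names records and parse_timestamp are made up for illustration:

```python
from datetime import datetime

records = [
    {"Counted number": "26", "Timestamp": "8/10/2015 13:07:38"},
    {"Counted number": "14", "Timestamp": "8/10/2015 11:51:14"},
    {"Counted number": "28", "Timestamp": "8/10/2015 13:06:27"},
    {"Counted number": "20", "Timestamp": "8/10/2015 12:53:42"},
]

def parse_timestamp(record):
    # Assumption: day/month/year order.
    return datetime.strptime(record["Timestamp"], "%d/%m/%Y %H:%M:%S")

records_sorted = sorted(records, key=parse_timestamp)
print([r["Counted number"] for r in records_sorted])  # ['14', '20', '28', '26']
```

Plain string sorting only happens to work when every field is zero-padded and ordered from largest unit to smallest, which is not the case for timestamps like '8/10/2015'.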