getting date from datetime data - python

I have datetime data in this format:
08:15:54:012 12 03 2016 +0000 GMT+00:00
I need to extract only the date, that is, 12 03 2016, in Python.
I have tried
datetime_object=datetime.strptime('08:15:54:012 12 03 2016 +0000 GMT+00:00','%H:%M:%S:%f %d %m %Y')
I get an
ValueError: unconverted data remains: +0000 GMT+00:00

If you don't mind using an external library, I find the dateparser module much more intuitive than Python's built-in datetime. It can parse pretty much anything if you just do
>>> import dateparser
>>> dateparser.parse('08:15:54:012 12 03 2016 +0000 GMT+00:00')
It claims to handle timezone offsets, though I haven't tested it.

If you need this as a string, use slicing
text = '08:15:54:012 12 03 2016 +0000 GMT+00:00'
print(text[13:23])
# 12 03 2016
but you can also convert to datetime
from datetime import datetime
text = '08:15:54:012 12 03 2016 +0000 GMT+00:00'
datetime_object = datetime.strptime(text[13:23],'%d %m %Y')
print(datetime_object)
# 2016-03-12 00:00:00
BTW:
in your original version you have to remove " +0000 GMT+00:00" using the slice [:-16]
datetime.strptime('08:15:54:012 12 03 2016 +0000 GMT+00:00'[:-16], '%H:%M:%S:%f %d %m %Y')
You can also use split() and join()
>>> x = '08:15:54:012 12 03 2016 +0000 GMT+00:00'.split()
['08:15:54:012', '12', '03', '2016', '+0000', 'GMT+00:00']
>>> x[1:4]
['12', '03', '2016']
>>> ' '.join(x[1:4])
'12 03 2016'

You can do it like this:
d = '08:15:54:012 12 03 2016 +0000 GMT+00:00'
d = d[:23]  # remove the timezone details
from datetime import datetime
d = datetime.strptime(d, "%H:%M:%S:%f %d %m %Y")  # parse the string (day first, then month, as in the question)
d.strftime('%d %m %Y')  # format the date as a string
You get:
'12 03 2016'
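If you would rather keep the numeric offset than slice it away, here is a minimal sketch. The helper name extract_date is made up for illustration, and it assumes the string always ends with a single 'GMT+00:00'-style token that repeats the numeric offset. It drops only that last token and lets %z parse the +0000 part:

```python
from datetime import datetime

def extract_date(text):
    """Return the date part of a string such as
    '08:15:54:012 12 03 2016 +0000 GMT+00:00' (hypothetical helper)."""
    # Drop the trailing 'GMT+00:00' token; it is assumed to repeat the offset.
    without_label = text.rsplit(' ', 1)[0]
    # %f reads the fractional seconds, %z reads the numeric UTC offset.
    parsed = datetime.strptime(without_label, '%H:%M:%S:%f %d %m %Y %z')
    return parsed.date()

d = extract_date('08:15:54:012 12 03 2016 +0000 GMT+00:00')
print(d.strftime('%d %m %Y'))
```

Unlike fixed-index slicing, this does not depend on the exact character positions, only on the order of the space-separated fields.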

Related

convert the datetime stamp in numpy array

This might be simple but I had no luck finding the right solution.
I have a 'date' column in np array with dates in format 'Tue Feb 04 17:04:01 +0000 2020' which I would like to convert to '2020-02-04 17:04:01'
Are there any built-in methods in NumPy that do that?
There are solutions that suggest looping through the elements in the column, but I guess that's not the NumPy-thonic way.
Maybe you can try dateutil to parse dates
from dateutil import parser
date_str = 'Tue Feb 04 17:04:01 +0000 2020'
# %H:%M:%S is the portable spelling of %T, which is not available on every platform
new_date = parser.parse(date_str).strftime('%Y-%m-%d %H:%M:%S')
With NumPy you can then do:
import numpy as np
np.datetime64(new_date)
# Example
date_str = 'Tue Feb 04 17:04:01 +0000 2020'
date_str2 = 'Fri Feb 07 17:04:01 +0000 2020'
new_date = parser.parse(date_str).strftime('%Y-%m-%d %H:%M:%S')
new_date2 = parser.parse(date_str2).strftime('%Y-%m-%d %H:%M:%S')
np.arange(np.datetime64(new_date), np.datetime64(new_date2))
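For a whole column, the same conversion can be sketched with only the standard library. This is a plain Python loop, not a vectorized NumPy operation, and the names IN_FMT, OUT_FMT and reformat_dates are assumptions made for this example. The resulting list of strings could then be handed back to NumPy if needed.

```python
from datetime import datetime

IN_FMT = '%a %b %d %H:%M:%S %z %Y'   # e.g. 'Tue Feb 04 17:04:01 +0000 2020'
OUT_FMT = '%Y-%m-%d %H:%M:%S'        # e.g. '2020-02-04 17:04:01'

def reformat_dates(values):
    """Convert each date string from IN_FMT to OUT_FMT (hypothetical helper)."""
    return [datetime.strptime(v, IN_FMT).strftime(OUT_FMT) for v in values]

column = ['Tue Feb 04 17:04:01 +0000 2020', 'Fri Feb 07 17:04:01 +0000 2020']
converted = reformat_dates(column)
print(converted)
```

Note that the UTC offset is parsed but not applied here: the clock time is written out exactly as it appears in the input.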

Change datetime format to hours only in Pandas

I have a list of date strings, formatted like
Fri Apr 23 12:38:07 +0000 2021
How can I change the format? I want to keep only the hours. I checked other sources, but they all require specifying the date format first, which is what I'm struggling with.
As far as I know, you can write code like
ds['waktu'] = pd.to_datetime(ds['tanggal'], format='%A %b %d %H:%M:%S %z %Y')
to change the format, but I don't know what +0000 means.
If you only want to take the hours from the date strings, you can use .dt.strftime() after the pd.to_datetime() call, as follows:
ds['waktu'] = pd.to_datetime(ds['tanggal'], format='%a %b %d %H:%M:%S %z %Y').dt.strftime('%H:%M:%S')
Note that your format string for pd.to_datetime() is not correct: replace %A (full weekday name) with %a (abbreviated weekday name).
+0000 is the time zone, which you can parse with %z in the format string.
Demo
ds = pd.DataFrame({'tanggal': ['Fri Apr 23 12:38:07 +0000 2021', 'Thu Apr 22 11:28:17 +0000 2021']})
ds['waktu'] = pd.to_datetime(ds['tanggal'], format='%a %b %d %H:%M:%S %z %Y').dt.strftime('%H:%M:%S')
print(ds)
                          tanggal     waktu
0  Fri Apr 23 12:38:07 +0000 2021  12:38:07
1  Thu Apr 22 11:28:17 +0000 2021  11:28:17
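To see what the +0000 field does, here is a small standard-library sketch (no pandas involved). The second input string is a made-up variation with a +0700 offset, added only for contrast. %z turns the field into a UTC offset on the parsed value without changing the clock time that was written in the string:

```python
from datetime import datetime, timedelta

FMT = '%a %b %d %H:%M:%S %z %Y'

utc_stamp = datetime.strptime('Fri Apr 23 12:38:07 +0000 2021', FMT)
# Hypothetical second value: identical except for a +0700 offset.
shifted_stamp = datetime.strptime('Fri Apr 23 12:38:07 +0700 2021', FMT)

print(utc_stamp.utcoffset())      # offset from UTC, as a timedelta
print(shifted_stamp.utcoffset())
print(utc_stamp.hour)             # the hour exactly as written in the string
```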

Changing datetime format in Python Language

I am parsing emails through Gmail API and have got the following date format:
Sat, 21 Jan 2017 05:08:04 -0800
I want to convert it into ISO 2017-01-21 (yyyy-mm-dd) format for MySQL storage. I am not able to do it through strftime()/strptime() and am missing something. Can someone please help?
TIA
You can parse the string with dateutil and then call isoformat() or date() on the result:
import dateutil.parser as parser
text = 'Sat, 21 Jan 2017 05:08:04 -0800'
date = (parser.parse(text))
print(date.isoformat())
print (date.date())
Output :
2017-01-21T05:08:04-08:00
2017-01-21
You can do it with strptime():
import datetime
datetime.datetime.strptime('Sat, 21 Jan 2017 05:08:04 -0800', '%a, %d %b %Y %H:%M:%S %z')
That gives you:
datetime.datetime(2017, 1, 21, 5, 8, 4, tzinfo=datetime.timezone(datetime.timedelta(-1, 57600)))
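You can even do it manually using a simple split and a dictionary. That way, you have more control over the formatting.
def dateconvertor(date):
    parts = date.split(' ')
    # Map month abbreviations to zero-padded month numbers
    names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    month = {name: '%02d' % (i + 1) for i, name in enumerate(names)}
    # parts is ['Sat,', '21', 'Jan', '2017', ...]; print year-month-day
    print(parts[3] + '-' + month[parts[2]] + '-' + parts[1].zfill(2))

def main():
    dt = "Sat, 21 Jan 2017 05:08:04 -0800"
    dateconvertor(dt)

if __name__ == '__main__':
    main()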
Keep it simple.
from datetime import datetime
s="Sat, 21 Jan 2017 05:08:04 -0800"
d=datetime.strptime(s,"%a, %d %b %Y %H:%M:%S %z")  # %z parses the UTC offset, positive or negative
print(d.strftime("%Y-%m-%d"))
Output : 2017-01-21
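Since the value comes from an email Date header, another option is the standard library's email.utils.parsedate_to_datetime, which is a different technique from the strptime() answers above. The sketch below also shows the UTC calendar date, in case the column is meant to be stored in UTC (an assumption; the question does not say):

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

header = 'Sat, 21 Jan 2017 05:08:04 -0800'

# Parse the RFC 2822 style date into a timezone-aware datetime.
parsed = parsedate_to_datetime(header)

# Calendar date as written in the header (sender's offset).
local_date = parsed.date().isoformat()

# Calendar date after converting to UTC; it can differ near midnight.
utc_date = parsed.astimezone(timezone.utc).date().isoformat()

print(local_date, utc_date)
```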

Find date within strings using regex in both Python and grep

I have a log with entries in the following format:
1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4
1533528768 4 2 Thu Jan 5 19:17:45 2017 534040012 3
...
How do I fetch only the timestamp component (eg. Wed Jan 4 11:17:12 2017) using regular expressions?
I have to implement the final product in python, but the requirement is to have part of an automated regression suite in bash/perl (with the final product eventually being in Python).
If the format is fixed in terms of space delimiters, you can simply split, get a slice of a date string and load it to datetime object via datetime.strptime():
In [1]: from datetime import datetime
In [2]: s = "1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4"
In [3]: date_string = ' '.join(s.split()[3:8])
In [4]: datetime.strptime(date_string, "%a %b %d %H:%M:%S %Y")
Out[4]: datetime.datetime(2017, 1, 4, 11, 17, 12)
grep is most often used in this scenario if you are working with syslog, but as the post is also tagged Python, this example uses regular expressions with re:
import re
Define the pattern to match (a raw string, with \d for the numeric fields):
pat = r"\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4}"
Then use re.findall to return all non-overlapping matches of the pattern in txt (a string holding the log contents):
re.findall(pat,txt)
Output:
['Wed Jan 4 11:17:12 2017', 'Thu Jan 5 19:17:45 2017']
If you want to then use datetime:
import datetime
dates = re.findall(pat,txt)
datetime.datetime.strptime(dates[0], "%a %b %d %H:%M:%S %Y")
Output:
datetime.datetime(2017, 1, 4, 11, 17, 12)
You can then utilise these datetime objects:
dateObject = datetime.datetime.strptime(dates[0], "%a %b %d %H:%M:%S %Y").date()
timeObject = datetime.datetime.strptime(dates[0], "%a %b %d %H:%M:%S %Y").time()
print('The date is {} and time is {}'.format(dateObject,timeObject))
Output:
The date is 2017-01-04 and time is 11:17:12
The regex to match the timestamp is:
'[a-zA-Z]{3} +[a-zA-Z]{3} +[0-9]{1,2} +[0-9]{2}:[0-9]{2}:[0-9]{2} +[0-9]{4}'
With grep it can be used like this (if your log file is called log.txt). Digits are written as [0-9] because grep -E does not support \d:
$ grep -oE '[a-zA-Z]{3} +[a-zA-Z]{3} +[0-9]{1,2} +[0-9]{2}:[0-9]{2}:[0-9]{2} +[0-9]{4}' log.txt
# Wed Jan 4 11:17:12 2017
# Thu Jan 5 19:17:45 2017
In Python you can use it like so (\d works here, and the pattern is a raw string so the backslashes reach re unchanged):
import re
log_entry = "1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4"
pattern = r'[a-zA-Z]{3} +[a-zA-Z]{3} +\d{1,2} +\d{2}:\d{2}:\d{2} +\d{4}'
compiled = re.compile(pattern)
match = compiled.search(log_entry)
match.group(0)
# 'Wed Jan 4 11:17:12 2017'
You can use this to get an actual datetime object from the string (expanding on above code):
from datetime import datetime
import re
log_entry = "1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4"
pattern = r'[a-zA-Z]{3} +[a-zA-Z]{3} +\d{1,2} +\d{2}:\d{2}:\d{2} +\d{4}'
compiled = re.compile(pattern)
match = compiled.search(log_entry)
log_time_str = match.group(0)
datetime.strptime(log_time_str, "%a %b %d %H:%M:%S %Y")
# datetime.datetime(2017, 1, 4, 11, 17, 12)
Two approaches: with and without regular expressions.
1) using the re.findall() function:
import re
with open('test.log', 'r') as fh:
    # \s+ accepts one or more spaces before the day of the month
    lines = re.findall(r'\b[A-Za-z]{3}\s+[A-Za-z]{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}\b', fh.read(), re.M)
print(lines)
2) using the str.split() and str.join() functions:
with open('test.log', 'r') as fh:
    lines = [' '.join(d.split()[3:8]) for d in fh.readlines()]
print(lines)
The output in both cases will be as below:
['Wed Jan 4 11:17:12 2017', 'Thu Jan 5 19:17:45 2017']
grep -E '\b(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +[0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}\b' dates
If you just wanted to list the dates, rather than grep, perhaps:
sed -nre 's/^.*([A-Za-z]{3}\s+[A-Za-z]{3}\s+[0-9]+\s+[0-9]+:[0-9]+:[0-9]+\s+[0-9]{4}).*$/\1/p' filename
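Pulling the Python pieces above together, here is a minimal sketch that scans log lines, skips any line without a timestamp, and yields datetime objects. The names TIMESTAMP_RE and iter_timestamps are made up for this example, and the third sample line is a hypothetical entry added to show the skip:

```python
import re
from datetime import datetime

# Weekday, month, day of month, HH:MM:SS, four-digit year.
TIMESTAMP_RE = re.compile(
    r'[A-Za-z]{3} +[A-Za-z]{3} +\d{1,2} +\d{2}:\d{2}:\d{2} +\d{4}'
)

def iter_timestamps(lines):
    """Yield a datetime for every line that contains a timestamp."""
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue  # no timestamp on this line
        yield datetime.strptime(match.group(0), '%a %b %d %H:%M:%S %Y')

sample = [
    '1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4',
    '1533528768 4 2 Thu Jan 5 19:17:45 2017 534040012 3',
    'a line without any timestamp',
]
stamps = list(iter_timestamps(sample))
print(stamps)
```

Because it takes any iterable of strings, the same function can be fed an open file object instead of the sample list.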

how to get a string from split line and compare in python

I have a line after split like in here:
lineaftersplit=Jan 31 00:57:07 2012 GMT
How do I get only year 2012 from this and compare if it falls between (2010) and (2013)
If lineaftersplit is a string value, you can use the datetime module to parse out the information, including the year:
import datetime
parsed_date = datetime.datetime.strptime(lineaftersplit, '%b %d %H:%M:%S %Y %Z')
if 2010 <= parsed_date.year <= 2013:
    # year between 2010 and 2013.
    pass
This has the advantage that you can do further tests on the datetime object, including sorting and date arithmetic.
Demo:
>>> import datetime
>>> lineaftersplit="Jan 31 00:57:07 2012 GMT"
>>> parsed_date = datetime.datetime.strptime(lineaftersplit, '%b %d %H:%M:%S %Y %Z')
>>> parsed_date
datetime.datetime(2012, 1, 31, 0, 57, 7)
>>> parsed_date.year
2012
You can use str.rsplit:
>>> strs = 'Jan 31 00:57:07 2012 GMT'
str.rsplit will return a list like this:
>>> strs.rsplit(None,2)
['Jan 31 00:57:07', '2012', 'GMT']
Now we need the second item:
>>> year = strs.rsplit(None,2)[1]
>>> year
'2012'
>>> if 2010 <= int(year) <= 2013: #apply int() to get the integer value
... #do something
...
Try this:
st="Jan 31 00:57:07 2012 GMT".split()
year=int(st[3])
This works as long as the string is always in this format (the variable is named s to avoid shadowing the built-in str):
s='Jan 31 00:57:07 2012 GMT'
s.split()[3]
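The year check itself can be wrapped in a small function. This is only a sketch: the name year_in_range and its default bounds are assumptions for this example, and it reuses the strptime() format from the first answer, so it raises ValueError if a line is not in that format.

```python
from datetime import datetime

def year_in_range(line, first=2010, last=2013):
    """Return True if the year in `line` lies between first and last, inclusive."""
    parsed = datetime.strptime(line, '%b %d %H:%M:%S %Y %Z')
    return first <= parsed.year <= last

print(year_in_range('Jan 31 00:57:07 2012 GMT'))
print(year_in_range('Jan 31 00:57:07 2009 GMT'))
```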
