I have downloaded an RSS file and saved it as city.txt.
Then I have to grab the date from the <lastBuildDate> tag.
The date is in the format Fri,28 Aug 2020, and I then have to translate the day and month names, all using RegEx.
I have managed to get the date, but I have a problem changing the day and month names after I have found it.
Do I have to use re.sub?
My code:
import re
with open('city.txt', 'r', encoding='utf-8') as f:
    txt = f.read()
tag_pattern =r'<''lastBuildDate'r'\b[^>]*>(.*?)</''lastBuildDate'r'>'
found = re.findall(tag_pattern, txt, re.I)
found = list(set(found))
for f in found :print('\t\t', f)
I have updated your code based on your requirements, please give it a try.
Code
import re
import locale
import datetime
with open('city.txt', 'r', encoding='utf-8') as f:
    txt = f.read()
tag_pattern = r'<lastBuildDate\b[^>]*>(.*?)</lastBuildDate>'
found = re.findall(tag_pattern, txt, re.I)
found = list(set(found))
for f in found:
    # Parse with an English locale (locale names are platform-specific;
    # "en" and "el-GR" are Windows-style names)
    locale.setlocale(locale.LC_TIME, "en")
    temp = datetime.datetime.strptime(f, '%a, %d %b %Y %H:%M:%S GMT')
    # Switch to Greek before formatting
    locale.setlocale(locale.LC_TIME, "el-GR")
    print(temp.strftime("%a, %d %b %Y %H:%M:%S"))
Sample input
<lastBuildDate>Fri, 28 Jan 2020 13:32:12 GMT</lastBuildDate>
<lastBuildDate>Sun, 27 Feb 2020 15:36:53 GMT</lastBuildDate>
<lastBuildDate>Mon, 26 Aug 2020 16:30:43 GMT</lastBuildDate>
Output
Τετ, 26 Αυγ 2020 16:30:43
Πεμ, 27 Φεβ 2020 15:36:53
Τρι, 28 Ιαν 2020 13:32:12
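Since locale names differ between platforms, and setlocale changes a process-wide setting, a slightly more defensive variant may be useful. The following is only a sketch: format_in_locale is a hypothetical helper, and the candidate names 'el_GR.UTF-8' and 'el-GR' are assumptions about what a given system might have installed. If none of them is available, the date is simply formatted in the current locale.

```python
import locale
from datetime import datetime

def format_in_locale(dt, fmt, locale_names):
    """Format dt using the first installed locale from locale_names (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_TIME)  # query the current LC_TIME setting
    try:
        for name in locale_names:
            try:
                locale.setlocale(locale.LC_TIME, name)
                break  # this locale is available, use it
            except locale.Error:
                continue  # not installed on this system, try the next name
        return dt.strftime(fmt)
    finally:
        # Always restore the previous setting, since it is process-wide
        locale.setlocale(locale.LC_TIME, saved)

result = format_in_locale(
    datetime(2020, 8, 28, 17, 36, 59),
    '%a, %d %b %Y %H:%M:%S',
    ['el_GR.UTF-8', 'el-GR'],  # assumed names: POSIX-style first, then Windows-style
)
print(result)
```

The day and month names in the result depend on which locale was actually found, so the printed text will vary from machine to machine.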
Although parsing XML content with regexes is really not recommended, your question is actually about date translation.
One approach is to parse the XML content of your RSS file, retrieve the text value of the <lastBuildDate> node, and then parse that text into a datetime object with datetime.strptime() from the datetime module.
The sample below shows you how to get a datetime object from a string:
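As a rough illustration of that approach, here is a minimal sketch using the standard library's xml.etree.ElementTree. The rss_text string is a made-up stand-in for the contents of city.txt, and the format string assumes the full 'Fri, 28 Aug 2020 17:36:59 GMT' style shown in the other answer.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Made-up stand-in for the contents of city.txt
rss_text = """<rss version="2.0"><channel>
<title>Example feed</title>
<lastBuildDate>Fri, 28 Aug 2020 17:36:59 GMT</lastBuildDate>
</channel></rss>"""

root = ET.fromstring(rss_text)

# Collect the text of every <lastBuildDate> element in the document
raw_dates = [node.text.strip() for node in root.iter('lastBuildDate')]

# Parse each one; %a and %b expect English names in the default C locale
parsed = [datetime.strptime(text, '%a, %d %b %Y %H:%M:%S GMT') for text in raw_dates]
print(parsed[0])
```

For a real file, ET.parse('city.txt').getroot() would replace the ET.fromstring call.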
import datetime
# date_time_str contains the date string as formatted in your RSS
date_time_str = 'Fri,28 Aug 2020'
# date_time_obj contains the parsed value (formatted as '%a,%d %b %Y')
date_time_obj = datetime.datetime.strptime(date_time_str, '%a,%d %b %Y')
Then you just have to retrieve the wanted datetime elements as integers. You can display those values in the current locale with the calendar module if it matches your language. Otherwise, and this is a bit trickier, you can work with TimeEncoding and month_name. (Of course, you can also write your own translation system.)
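If you go the "own translation system" route, re.sub with a replacement function is one way to do it, which also answers the re.sub part of the question. This is only a sketch: the Greek abbreviations in the two dictionaries are illustrative assumptions and should be adjusted to whatever spelling you need, and translate_date is a hypothetical helper name.

```python
import re

# Illustrative Greek abbreviations (assumed spellings, adjust as needed)
DAYS = {'Mon': 'Δευ', 'Tue': 'Τρι', 'Wed': 'Τετ', 'Thu': 'Πεμ',
        'Fri': 'Παρ', 'Sat': 'Σαβ', 'Sun': 'Κυρ'}
MONTHS = {'Jan': 'Ιαν', 'Feb': 'Φεβ', 'Mar': 'Μαρ', 'Apr': 'Απρ',
          'May': 'Μαϊ', 'Jun': 'Ιουν', 'Jul': 'Ιουλ', 'Aug': 'Αυγ',
          'Sep': 'Σεπ', 'Oct': 'Οκτ', 'Nov': 'Νοε', 'Dec': 'Δεκ'}
NAMES = {**DAYS, **MONTHS}

# One alternation of all English names, matched only as whole words
NAME_RE = re.compile(r'\b(' + '|'.join(NAMES) + r')\b')

def translate_date(text):
    """Replace every English day/month abbreviation with its mapped value."""
    return NAME_RE.sub(lambda m: NAMES[m.group(1)], text)

print(translate_date('Fri,28 Aug 2020'))
```

Because the replacement is a function, a single pass over the string handles both the day and the month, and the digits and punctuation are left untouched.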
You can use the locale module in Python to display the date in Greek or any other local language.
Please refer to the code below, and to this Windows documentation for more locale string options.
import datetime
import locale
date_str = 'Fri, 28 Aug 2020 17:36:59 GMT'  # renamed from "input" to avoid shadowing the built-in
date_parsed = datetime.datetime.strptime(date_str, '%a, %d %b %Y %H:%M:%S GMT')
locale.setlocale(locale.LC_TIME, "el-CY")
print(date_parsed.strftime("%a, %d %b %Y %H:%M:%S"))
prints
Παρ, 28 Αύγ 2020 17:36:59
Related
I'm trying to convert a string to datetime and keep getting the error: ValueError: time data 'Mon, 22 Apr 2019 17:04:38 +0200 (CEST)' does not match format '%a, %d %b %Y %H:%M:%S %z %Z'
from datetime import datetime
s = "Mon, 22 Apr 2019 17:04:38 +0200 (CEST)"
d = datetime.strptime(s, '%a, %d %b %Y %H:%M:%S %z %Z')
What am I missing?
%Z matches a time zone name and is mostly useful when formatting a datetime as a string. In any case, what you need here is the offset (%z), not the name of the time zone.
The rest of your code is valid, however:
s = "Mon, 22 Apr 2019 17:04:38 +0200"
d = datetime.strptime(s, '%a, %d %b %Y %H:%M:%S %z')
datetime only comes with the ability to parse UTC and whatever local time zone is listed in time.tzname. It can't match (CEST) because it doesn't know what timezone that is (It would also be redundant because you defined the timezone using the offset +0200).
You will need to implement your own (CEST) using datetime.tzinfo or by importing an external library like pytz or pendulum in order to parse (CEST) from a string into a datetime.timezone.
Also, don't forget to include the parentheses () in your format string.
This code runs; however, I do not know what happens to 'CEST' once the string has been parsed.
from datetime import datetime
tz = 'CEST'
s = "Mon, 22 Apr 2019 17:04:38 +0200 " + tz
d = datetime.strptime(s, '%a, %d %b %Y %H:%M:%S %z ' + tz)
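Another option, if the zone name is not always CEST, is to strip the trailing parenthesised name before parsing and rely on the numeric offset alone. A minimal sketch, assuming the name always appears in parentheses at the end of the string:

```python
import re
from datetime import datetime, timedelta

s = "Mon, 22 Apr 2019 17:04:38 +0200 (CEST)"

# Drop a trailing "(...)" group; the numeric offset already carries the zone information
cleaned = re.sub(r'\s*\([^)]*\)\s*$', '', s)

d = datetime.strptime(cleaned, '%a, %d %b %Y %H:%M:%S %z')
print(d.isoformat())  # 2019-04-22T17:04:38+02:00
```

The result is a timezone-aware datetime whose tzinfo is a fixed +02:00 offset; the name CEST itself is not kept.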
When I pull event start times from Facebook Graph, they come in this form:
2017-09-26T18:00:00+0300
I'd like to convert it into readable format so I use this:
readable_event_date = dateutil.parser.parse(event_date).strftime('%a, %b %d %Y %H:%M:%S')
and it comes out like this:
Tue, 26 Sep 2017 18:00:00
Which is good but it loses the offset from UTC and I'd like it in AM PM format.
Thus, I would like it like this:
Tue, 26 Sep 2017 9:00 PM
To get a 12-hour format and apply the UTC offset when printing:
from dateutil.parser import parse
event_date = '2017-09-26T18:00:00+0300'
date = parse(event_date)
# utcoffset() is the public way to read the parsed +0300 offset
offset = date.utcoffset()
readable_event_date = (date + offset).strftime('%a, %b %d %Y %I:%M:%S %p')
print(readable_event_date)
Output:
Tue, Sep 26 2017 09:00:00 PM
It seems like what you want is this time, expressed in UTC, in the format '%a, %b %d %Y %I:%M:%S %p'. Luckily, all the information you need is contained in the datetime object that you parsed; you just need to convert it to UTC.
Python 2.6+ or Python 3.3+:
The approach you've taken using dateutil will work for Python 2.6+ or Python 3.3+ (and also works for a greater variety of datetime string formats):
from dateutil.parser import parse
# tzutc is dateutil's UTC tzinfo; it also works on Python 2.7, which has no datetime.timezone
from dateutil.tz import tzutc
UTC = tzutc()
dt_str = '2017-09-26T18:00:00+0300'
dt = parse(dt_str)
dt_utc = dt.astimezone(UTC) # Convert to UTC
print(dt_utc.strftime('%a, %b %d %Y %I:%M:%S %p'))
# Tue, Sep 26 2017 03:00:00 PM
One thing I notice is that the date you've provided, as far as I can tell, represents 3PM in UTC, not 9PM (as your example states). This is one reason you should use .astimezone(UTC) rather than some other approach.
If you want to include the time zone offset information, you can also use the %z parameter on the non-converted version of the datetime object.
print(dt.strftime('%a, %b %d %Y %I:%M:%S%z %p'))
# Tue, Sep 26 2017 06:00:00+0300 PM
This %z parameter may also be useful even if you are keeping it in UTC, because then you can at least be clear that the date the user is seeing is a UTC date.
Python 3.2+ only:
Given that you know the exact format of the input string, in Python 3.2+ you can achieve the same thing without pulling in dateutil, and it will almost certainly be faster (which may or may not be a concern for you). In your case, here is how to rewrite the code so that it works with just the standard library:
from datetime import datetime, timezone
UTC = timezone.utc
dt_str = '2017-09-26T18:00:00+0300'
dt = datetime.strptime(dt_str, '%Y-%m-%dT%H:%M:%S%z')
dt_utc = dt.astimezone(UTC)
print(dt_utc.strftime('%a, %b %d %Y %I:%M:%S %p'))
# Tue, Sep 26 2017 03:00:00 PM
print(dt.strftime('%a, %b %d %Y %I:%M:%S%z %p'))
# Tue, Sep 26 2017 06:00:00+0300 PM
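The examples above print a zero-padded hour (03:00:00 PM), while the question asks for something closer to "9:00 PM". The unpadded flags for strftime differ between platforms, so one portable way is to compute the 12-hour value by hand. This is only a sketch, and format_12h is a hypothetical helper name; it assumes Python 3.6+ for f-strings.

```python
from datetime import datetime, timezone

def format_12h(dt):
    """Format dt like 'Tue, 26 Sep 2017 6:00 PM', with no leading zero on the hour."""
    hour12 = dt.hour % 12 or 12  # 0 -> 12, 13 -> 1, and so on
    return f"{dt:%a, %d %b %Y} {hour12}:{dt:%M} {dt:%p}"

dt = datetime.strptime('2017-09-26T18:00:00+0300', '%Y-%m-%dT%H:%M:%S%z')

print(format_12h(dt))                            # wall-clock time at +0300
print(format_12h(dt.astimezone(timezone.utc)))   # same instant in UTC
```

In the default C locale this prints 6:00 PM for the original wall-clock time and 3:00 PM for UTC, which matches the conversion shown above.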
I'm currently trying to convert a file format into a slightly different style to allow easier importing into a program; however, I can't quite get my head around how to convert datetime strings between formats. The original I have is the following:
2016-12-15 17:26:45
However the required format for the date time is:
Thu Dec 15 17:19:03 2016
Does anyone know if there is an easy way to convert between these? These values are always in the same place and format so it doesn't need to be too dynamic so to speak outside of recognising what a certain day of the month is (if that can be done at all?)
Update - The conversion has worked for 1 date but not the other weirdly :/ The code to grab the two dates is the following:
startDate=startDate.replace("Started : ","")
startDate=startDate.replace(" (ISO format YYYY-MM-DD HH:MM:SS)","")
startDate=startDate.strip()
startDt = datetime.strptime(startDate, '%Y-%m-%d %H:%M:%S')
startDt=startDt.strftime('%a %b %d %H:%M:%S %Y ')
print (startDt)
This part works as intended and outputs the required format:
"2016-12-15 17:26:45
Thu Dec 15 17:26:45 2016"
The end date part is a bit "ham fisted" so to speak and I'm sure there are better ways to do the re.sub search just to do anything in brackets but I'll edit that later.
endDate=endDate.replace("Ended : ","")
endDate=endDate.strip()
endDate = re.sub("\(.*?\)", "", endDate)
endDate.strip()
endDt = datetime.strptime(endDate, '%Y-%m-%d %H:%M:%S')
endDt=endDt.strftime('%a %b %d %H:%M:%S %Y ')
print (endDt)
This part, however, despite the printed values being in an identical format,
"2016-12-15 17:26:45
2016-12-15 21:22:11"
produces the following error:
endDt = datetime.strptime(endDate, '%Y-%m-%d %H:%M:%S')
File "C:\Python27\lib\_strptime.py", line 335, in _strptime
data_string[found.end():])
ValueError: unconverted data remains:
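A likely cause, judging only from the code shown: endDate.strip() on its own line discards its result, because strings are immutable and strip() returns a new string. After re.sub removes the bracketed note, a trailing space is left behind, and that is the "unconverted data" strptime complains about. The sketch below assumes a made-up input line in the same shape as the "Started" line; the real file may differ.

```python
import re
from datetime import datetime

# Made-up input line, modelled on the "Started : ..." line from the question
end_line = "Ended : 2016-12-15 21:22:11 (ISO format YYYY-MM-DD HH:MM:SS)"

end_date = end_line.replace("Ended : ", "")
end_date = re.sub(r"\(.*?\)", "", end_date)  # raw string avoids invalid-escape warnings
end_date = end_date.strip()                  # keep the result: strip() does not modify in place

end_dt = datetime.strptime(end_date, '%Y-%m-%d %H:%M:%S')
print(end_dt.strftime('%a %b %d %H:%M:%S %Y'))
```

Printing repr(endDate) just before the strptime call is a quick way to confirm whether stray whitespace is the culprit.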
from datetime import datetime
dt = datetime.strptime('2016-06-01 1:33:45', '%Y-%m-%d %H:%M:%S')
print(dt.strftime('%a %b %d %H:%M:%S %Y'))
# Wed Jun 01 01:33:45 2016
It's a pretty easy task with the datetime module.
As has been pointed out, checking the docs will get you a lot of useful info, starting with the directives to feed to the strptime and strftime (respectively, parse and format time) functions, which you'll need here.
A working example for your case would be:
from datetime import datetime
myDateString = '2016-12-15 17:26:45'
myDateObj = datetime.strptime(myDateString, '%Y-%m-%d %H:%M:%S')
myDateFormat = myDateObj.strftime('%a %b %d %H:%M:%S %Y')
Check out this section of the docs to have a better understanding of the formatting placeholders.
You can use the datetime module:
from datetime import datetime
string = '2016-12-15 17:26:45'
date = datetime.strptime(string, '%Y-%m-%d %H:%M:%S')
date2 = date.strftime("%a %b %d %H:%M:%S %Y")  # %Z dropped: it is empty for a naive datetime and leaves a double space
print(date2)
Output:
Thu Dec 15 17:26:45 2016
I have found a question at this link that almost answers what I need, but not quite. What I need to know is how, using this method, I could convert a string of the format u'Saturday, Feb 27 2016' into a Python date variable in the format 27/02/2016.
Thanks
You have to first remove the weekday name (it's not much use anyway) and parse the rest:
datetime.datetime.strptime('Saturday, Feb 27 2016'.split(', ', 1)[1], '%b %d %Y').date()
Alternatively, use dateutil:
dateutil.parser.parse('Saturday, Feb 27 2016').date()
EDIT
My mistake, you don't need to remove the Weekday (I'd missed it in the list of options):
datetime.datetime.strptime('Saturday, Feb 27 2016', '%A, %b %d %Y').date()
You don't have to remove anything, you can parse it as is and use strftime to get the format you want:
from datetime import datetime
s = u'Saturday, Feb 27 2016'
dt = datetime.strptime(s,"%A, %b %d %Y")
print(dt)
print(dt.strftime("%d/%m/%Y"))
2016-02-27 00:00:00
27/02/2016
%A Locale’s full weekday name.
%b Locale’s abbreviated month name.
%d Day of the month as a decimal number [01,31].
%Y Year with century as a decimal number.
The full listing of directives is here
I am pretty new to regular expressions and they are pretty alien to me. I am parsing an XML feed which produces a date and time as follows:
Wed, 23 July 2014 19:25:52 GMT
But I want to split these up so they are as follows:
date = 23/07/2014
time = 19/25/52
Where would I start? I have looked at a couple of other questions on SO and all of them deviate a bit from what I am trying to achieve.
Use datetime.strptime to parse the date from string and then format it using the strftime method of datetime objects:
>>> from datetime import datetime
>>> dt = datetime.strptime("Wed, 23 July 2014 19:25:52 GMT", "%a, %d %B %Y %H:%M:%S %Z")
>>> dt.strftime('%d/%m/%Y')
'23/07/2014'
>>> dt.strftime('%H/%M/%S')
'19/25/52'
But if you're okay with the ISO format you can call date and time methods:
>>> str(dt.date())
'2014-07-23'
>>> str(dt.time())
'19:25:52'
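One thing worth knowing about the example above: %Z only matches the text "GMT" and does not attach a time zone, so the parsed datetime is naive. If later code needs a timezone-aware value, the zone can be attached explicitly. A minimal sketch, assuming the feed's times really are in GMT/UTC:

```python
from datetime import datetime, timedelta, timezone

raw = "Wed, 23 July 2014 19:25:52 GMT"
dt = datetime.strptime(raw, "%a, %d %B %Y %H:%M:%S %Z")
print(dt.tzinfo)  # None: the name was matched, but no zone was attached

# Assumption: the feed is in GMT/UTC, so label the value explicitly
aware = dt.replace(tzinfo=timezone.utc)
print(aware.isoformat())  # 2014-07-23T19:25:52+00:00
```

replace() only labels the existing wall-clock time; it does not shift it, which is what is wanted when the source is already in UTC.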