Restricting RSS elements by date with feedparser [Python]

I iterate over an RSS feed like so, where _file is the feed:
d = feedparser.parse(_file)
for element in d.entries:
    print(repr(element.date))
The date output comes out like so:
u'Thu, 16 Jul 2009 15:18:22 EDT'
I can't work out how to turn this date output into something I can compare, so that I can use it to limit feed elements. What I am asking is: how can I get an actual time out of this, so I can say "if this element is more than 7 days old, skip it"?

feedparser is supposed to give you a struct_time object from Python's time module. I'm guessing it doesn't recognize that date format and so is giving you the raw string.
See here on how to add support for parsing malformed timestamps:
http://pythonhosted.org/feedparser/date-parsing.html
If you manage to get it to give you the struct_time, you can read more about that here:
http://docs.python.org/library/time.html#time.struct_time
struct_time objects have everything you need. They have these members:
time.struct_time(tm_year=2010, tm_mon=2, tm_mday=4, tm_hour=23, tm_min=44, tm_sec=19, tm_wday=3, tm_yday=35, tm_isdst=0)
I generally convert the structs to seconds, like this:
import time
import calendar
struct = time.gmtime()  # calendar.timegm() expects a struct_time in UTC
seconds = calendar.timegm(struct)
Then you can just do regular math to see how many seconds have elapsed, or use the datetime module to do timedeltas.
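As a rough sketch of the "regular math" described above, the helper below checks whether a UTC struct_time is at most seven days old. The name is_recent and the sample values are made up for illustration, and it assumes the feed entry carries a parsed struct_time in UTC (feedparser documents fields such as published_parsed for this).

```python
import calendar
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # seconds


def is_recent(parsed, now=None, max_age=SEVEN_DAYS):
    """Return True if the UTC struct_time `parsed` is at most max_age seconds old.

    is_recent is a hypothetical helper, not part of feedparser.
    """
    if now is None:
        now = time.time()
    # calendar.timegm() interprets the struct_time as UTC and returns epoch seconds
    return now - calendar.timegm(parsed) <= max_age


# Stand-in for a parsed entry date (15:18:22 EDT is 19:18:22 UTC)
published = time.strptime("2009-07-16 19:18:22", "%Y-%m-%d %H:%M:%S")

# Fixed "now" values so the result is reproducible
three_days_later = calendar.timegm((2009, 7, 19, 19, 18, 22))
three_weeks_later = calendar.timegm((2009, 8, 6, 19, 18, 22))

print(is_recent(published, now=three_days_later))   # True
print(is_recent(published, now=three_weeks_later))  # False
```

In the loop from the question this might become something like "if not is_recent(element.published_parsed): continue", assuming that field is present on the entry.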

One way:
>>> import time
>>> # Note: %Z generally only matches the local zone's own names (plus UTC and GMT)
>>> t = time.strptime("Thu, 16 Jul 2009 15:18:22 EDT", "%a, %d %b %Y %H:%M:%S %Z")
>>> sevendays = 86400 * 7
>>> current = time.time()
>>> if current - time.mktime(t) > sevendays:
...     print("more than 7 days")
You can also look at the datetime module and make use of timedelta() for date calculations.
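For the datetime and timedelta route, here is a minimal sketch that uses only the standard library. It swaps in email.utils.parsedate_to_datetime (Python 3.3+) for the strptime call, since RSS dates follow the RFC 822 style that function is meant for. The helper name older_than is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def older_than(date_string, days, now=None):
    """Return True if an RFC 822-style date string is more than `days` days old.

    older_than is a hypothetical helper name. It assumes the string carries a
    zone that email.utils recognizes (a numeric offset or a common North
    American abbreviation such as EDT); otherwise the parsed value is naive
    and the subtraction below raises TypeError.
    """
    published = parsedate_to_datetime(date_string)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - published > timedelta(days=days)


date_string = "Thu, 16 Jul 2009 15:18:22 EDT"
print(parsedate_to_datetime(date_string))  # 2009-07-16 15:18:22-04:00

# Fixed reference times so the example is reproducible
print(older_than(date_string, 7, now=datetime(2009, 7, 20, tzinfo=timezone.utc)))  # False
print(older_than(date_string, 7, now=datetime(2010, 2, 4, tzinfo=timezone.utc)))   # True
```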

If you install the dateutil module:
import dateutil.parser as dp
import dateutil.tz as dtz
import datetime
date_string=u'Thu, 16 Jul 2009 15:18:22 EDT'
adatetime=dp.parse(date_string)
print(adatetime)
# 2009-07-16 15:18:22-04:00
now=datetime.datetime.now(dtz.tzlocal())
print(now)
# 2010-02-04 23:35:52.428766-05:00
aweekago=now-datetime.timedelta(days=7)
print(aweekago)
# 2010-01-28 23:35:52.428766-05:00
if adatetime < aweekago:
    print('old news')
If you are using Ubuntu, dateutil is provided by the python-dateutil package.

Related

Parse Date string to datetime with timezone

I have a string:
r = 'Thu Dec 17 08:56:41 CST 2020'
Here CST represents China Standard Time ('Asia/Shanghai'). I want to parse it to a datetime. I am doing something like:
from dateparser import parse
r1 = parse(r)
Which is giving me r1 as:
2020-12-17 08:56:41-06:00
And I am also doing this
r2 = r1.replace(tzinfo=pytz.timezone("Asia/Shanghai"))
And this is giving me r2 as:
2020-12-17 08:50:41+08:00
There is a 6 minute lag in r2. Can someone tell me why that is? And how do I correctly convert my raw string r into the desired r2, which is:
2020-12-17 08:56:41 in Asia/Shanghai timezone
Thanks
Using dateutil.parser you can directly parse your date correctly.
Note that CST is an ambiguous timezone, so you need to specify which one you mean. You can either do this directly in the tzinfos parameter of the parse() call or you can define a dictionary that has mappings for timezones and pass this. In this dict, you can either specify the offset, e.g.
timezone_info = {
    "CDT": -5 * 3600,
    "CEST": 2 * 3600,
    "CST": 8 * 3600
}
parser.parse(r, tzinfos=timezone_info)
or (using gettz) directly specify a timezone:
timezone_info = {
    "CDT": gettz("America/Chicago"),
    "CEST": gettz("Europe/Berlin"),
    "CST": gettz("Asia/Shanghai")
}
parser.parse(r, tzinfos=timezone_info)
See also the dateutil.parser documentation and the answers to this SO question.
Be aware that the latter approach is tricky if you have a location with daylight saving time! Depending on the date you apply it to, gettz("America/Chicago") will have UTC-5 or UTC-6 as a result (as Chicago switches between Central Standard Time and Central Daylight Time). So depending on your input data, the second example may actually not really be correct and yield the wrong outcome! Currently, China observes China Standard Time (CST) all year, so for your use case it makes no difference (may depend on your date range though).
Overall:
from dateutil import parser
from dateutil.tz import gettz
timezone_info = {"CST": gettz("Asia/Shanghai")}
r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, tzinfos=timezone_info)
print(d)
print(d.strftime('%Y-%m-%d %H:%M:%S %Z%z'))
gets you
2020-12-17 08:56:41+08:00
2020-12-17 08:56:41 CST+0800
EDIT: Printing the human-readable timezone name instead of the abbreviated one is a little more complicated with this approach, as dateutil.tz.gettz() gets you a tzfile that has no attribute holding just the name. However, you can obtain it from the protected _filename attribute using split():
print(d.strftime('%Y-%m-%d %H:%M:%S') + " in " + "/".join(d.tzinfo._filename.split('/')[-2:]))
yields
2020-12-17 08:56:41 in Asia/Shanghai
This of course only works if you used gettz() to set the timezone in the first place.
EDIT 2: If you know that all your dates are in CST anyway, you can also ignore the timezone when parsing. This gets you naive (or unaware) datetimes, to which you can later add a human-readable timezone. With dateutil you can do this using replace() and gettz() as shown above. With timezone() from the pytz module, use localize() instead: passing a pytz zone to replace() attaches the zone's local mean time offset, which is the source of the 6-minute discrepancy in the question.
from dateutil import parser
from dateutil.tz import gettz
import pytz
r = 'Thu Dec 17 08:56:41 CST 2020'
d = parser.parse(r, ignoretz=True)
d_dateutil = d.replace(tzinfo=gettz('Asia/Shanghai'))
d_pytz = pytz.timezone('Asia/Shanghai').localize(d)
Note that depending on which module you use to add the timezone information, the class of tzinfo differs. For the pytz object, there is a more direct way of accessing the timezone in human readable form:
print(type(d_dateutil.tzinfo))
print("/".join(d_dateutil.tzinfo._filename.split('/')[-2:]))
print(type(d_pytz.tzinfo))
print(d_pytz.tzinfo.zone)
produces
<class 'dateutil.tz.tz.tzfile'>
Asia/Shanghai
<class 'pytz.tzfile.Asia/Shanghai'>
Asia/Shanghai
from datetime import datetime
import pytz
# The datetime string you have
r = "Thu Dec 17 08:56:41 CST 2020"
# The time-zone string you want to use
offset_string = 'Asia/Shanghai'
# convert the time zone string into offset from UTC
# a. datetime.now(pytz.timezone(offset_string)).utcoffset().total_seconds() --- returns seconds offset from UTC
# b. convert seconds into hours (decimal) --- divide by 60 twice
# c. remove the decimal point, we want the structure as: +0800
offset_num_repr = '+{:05.2f}'.format(datetime.now(pytz.timezone(offset_string)).utcoffset().total_seconds()/60/60).replace('.', '')
print('Numeric representation of the offset: ', offset_num_repr)
# replace the CST 2020 with numeric timezone offset
# a. replace it with the offset computed above
updated_datetime = str(r).replace('CST', offset_num_repr)
print('\t Modified datetime string: ', updated_datetime)
# Now parse your string into datetime object
r = datetime.strptime(updated_datetime, "%a %b %d %H:%M:%S %z %Y")
print('\tFinal parsed datetime object: ', r)
Should produce:
Numeric representation of the offset: +0800
Modified datetime string: Thu Dec 17 08:56:41 +0800 2020
Final parsed datetime object: 2020-12-17 08:56:41+08:00
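The same idea as the string substitution above can also be sketched with the standard library alone, by pulling the abbreviation out of the string and attaching a fixed offset. The ABBREVIATIONS table and the helper name are hypothetical, and the mapping of CST to UTC+8 is the assumption taken from the question.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table: the caller decides what each ambiguous abbreviation means.
# Here CST is taken to be China Standard Time (UTC+8), as in the question.
ABBREVIATIONS = {"CST": timezone(timedelta(hours=8), "CST")}


def parse_with_abbreviation(text):
    """Parse strings shaped like 'Thu Dec 17 08:56:41 CST 2020'."""
    parts = text.split()
    zone = ABBREVIATIONS[parts.pop(4)]  # remove the zone token and look it up
    naive = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
    # replace() is safe here because a fixed-offset timezone has only one offset
    return naive.replace(tzinfo=zone)


d = parse_with_abbreviation("Thu Dec 17 08:56:41 CST 2020")
print(d)           # 2020-12-17 08:56:41+08:00
print(d.tzname())  # CST
```

A fixed offset is enough for China, which does not currently observe daylight saving time; for zones that do, a real timezone database entry is the safer choice.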

Extract year, month, day, hour and minute from getctime, getmtime in Python

I want to extract the year, month, day, hour and minute individually from the values below.
import os, time, os.path, datetime
date_of_created = time.ctime(os.path.getctime(folderName))
date_of_modi = time.ctime(os.path.getmtime(folderName))
Right now I can only get something like this:
'Thu Dec 26 19:21:37 2019'
but I want to get the values separately:
2019 // Dec (could I get this as an int?) // 26
I want to extract the year, month, day, hour and minute values from date_of_created and date_of_modi.
Is that possible in Python?
You can convert the string to a datetime object:
import os
import time
from datetime import datetime
# Parse the ctime() string back into a datetime object
date_of_created = datetime.strptime(time.ctime(os.path.getctime(folderName)), "%a %b %d %H:%M:%S %Y")
print("Date created year: {} , month: {} , day: {}".format(date_of_created.year, date_of_created.month, date_of_created.day))
The time.ctime function returns the local time in string form. You might want to use the time.localtime function instead, which returns a struct_time object containing the information you are looking for. For example:
import os, time
date_created_string = time.ctime(os.path.getctime('/home/b-fg/Downloads'))
date_created_obj = time.localtime(os.path.getctime('/home/b-fg/Downloads'))
print(date_created_string) # Mon Feb 10 09:41:03 2020
print('Year: {:4d}'.format(date_created_obj.tm_year)) # Year: 2020
print('Month: {:2d}'.format(date_created_obj.tm_mon)) # Month: 2
print('Day: {:2d}'.format(date_created_obj.tm_mday)) # Day: 10
Note that these are integer values, as requested.
time.ctime([secs])
Convert a time expressed in seconds since the epoch to a string of a form: 'Sun Jun 20 23:21:05 1993' representing local time.
If that's not what you want... use something else? time.localtime will return a struct_time which should have the relevant fields, or for a more modern interface use datetime.datetime.fromtimestamp which... returns a datetime object from a UNIX timestamp.
Furthermore, using os.stat would probably be more efficient, as getctime and getmtime will probably each perform a stat call internally.
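A minimal sketch of that suggestion: call os.stat once and build datetime objects from both timestamps. The helper name created_and_modified is hypothetical, and a temporary directory stands in for folderName so the example runs anywhere.

```python
import os
import tempfile
from datetime import datetime


def created_and_modified(path):
    """Return (ctime, mtime) as local datetime objects using a single stat call."""
    info = os.stat(path)
    # Note: on Unix, st_ctime is the last metadata change, not the creation time
    return (datetime.fromtimestamp(info.st_ctime),
            datetime.fromtimestamp(info.st_mtime))


# A temporary directory stands in for the folder from the question
with tempfile.TemporaryDirectory() as folder:
    created, modified = created_and_modified(folder)

# Each field is a plain int
print(created.year, created.month, created.day, created.hour, created.minute)
print(modified.year, modified.month, modified.day, modified.hour, modified.minute)
```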
You can use the datetime module, more specifically the fromtimestamp() function from the datetime module to get what you expect.
import os, time, os.path, datetime
date_of_created = datetime.datetime.fromtimestamp(os.path.getctime(my_repo))
date_of_modi = datetime.datetime.fromtimestamp(os.path.getmtime(my_repo))
print(date_of_created.strftime("%Y"))
Output will be 2020 for a repo created in 2020.
All formats are available at this link

How do I modify the format of a date string? (Python/Excel)

What is the best method in Python to convert a string to a given format? My problem is that I have scraped dates that have the following format: Dec 13, 2019 6:01 am
Ideally I want to analyse the scraped data in Excel, but unfortunately Excel cannot read this date format.
Do you think it is best to do that in Python or in Excel?
Thanks
You can definitely do this with Python, using either the standard library or the dateparser package.
>>> import dateparser
>>> dateparser.parse('Dec 13, 2019 6:01 am')
datetime.datetime(2019, 12, 13, 6, 1)
Or directly to ISO format:
>>> dateparser.parse('Dec 13, 2019 6:01 am').isoformat()
'2019-12-13T06:01:00'
Another thing to look out for when working with time programmatically is time zones, which is where bugs are very likely to appear. There's a very nice package for working with datetime data in Python called pendulum; I cannot stress enough how convenient it is. Its API is designed to be compatible with Python's standard library datetime, so you can generally do import pendulum as dt instead of import datetime as dt and it will work.
It also has a great parser tool with support for time zones:
>>> import pendulum
>>> dt = pendulum.parse('1975-05-21T22:00:00')
>>> print(dt)
1975-05-21T22:00:00+00:00
# You can pass a tz keyword to specify the timezone
>>> dt = pendulum.parse('1975-05-21T22:00:00', tz='Europe/Paris')
>>> print(dt)
1975-05-21T22:00:00+01:00
# Not ISO 8601 compliant but common
>>> dt = pendulum.parse('1975-05-21 22:00:00')
By passing the tz keyword argument you can parse and specify time zone at the same time.
You can use strptime() to convert the string to a datetime object.
>>> from datetime import datetime
>>> utc_time = datetime.strptime("Dec 13, 2019 6:01 am", "%b %d, %Y %I:%M %p")
>>> utc_time.strftime("%d-%m-%Y %H:%M")
'13-12-2019 06:01'
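Building on the strptime() call above, here is a small sketch that rewrites the scraped strings into a "YYYY-MM-DD HH:MM:SS" form. The helper name and the choice of output format are assumptions; whether Excel picks it up as a date can depend on your locale settings.

```python
from datetime import datetime

SCRAPED_FORMAT = "%b %d, %Y %I:%M %p"  # e.g. "Dec 13, 2019 6:01 am"
OUTPUT_FORMAT = "%Y-%m-%d %H:%M:%S"    # assumed target format


def to_spreadsheet_format(text):
    """Reformat a scraped date string (hypothetical helper)."""
    return datetime.strptime(text, SCRAPED_FORMAT).strftime(OUTPUT_FORMAT)


print(to_spreadsheet_format("Dec 13, 2019 6:01 am"))  # 2019-12-13 06:01:00
print(to_spreadsheet_format("Dec 13, 2019 6:01 pm"))  # 2019-12-13 18:01:00
```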
You can use Python's built-in datetime library.
Check this: https://docs.python.org/3.6/library/datetime.html

Can't make sense of date conversion to UTC

I have a string containing a date and time in UTC (not part of the string, but I know that it's UTC). So I create an aware datetime object using the following code:
>>> import datetime
>>> import pytz
>>> mystr = '01/09/2018 00:15:00'
>>> start_time = pytz.utc.localize(datetime.datetime.strptime(mystr, '%d/%m/%Y %H:%M:%S'))
>>> start_time
datetime.datetime(2018, 9, 1, 0, 15, tzinfo=<UTC>)
>>> str(start_time)
'2018-09-01 00:15:00+00:00'
>>> start_time.strftime('%s')
'1535757300'
All seems fine now but if I do in the shell:
$ TZ=UTC date -d @1535757300
Fri Aug 31 23:15:00 UTC 2018
Shouldn't I be getting Sat Sep 1 00:15:00 UTC 2018 instead (ie, the same date I started with)?
You need to use the following:
int((start_time - datetime.datetime(1970, 1, 1, tzinfo=pytz.utc)).total_seconds())
Because you are using strftime('%s'), the conversion goes through your system's C library, which ignores the tzinfo and interprets the time in your local timezone. It's more reliable to just calculate it yourself as above.
EDIT:
As the follow-up post by OP states, it's easier to just use start_time.timestamp(), but that works only in Python 3.3 and up.
After some more tries, it looks like %s is not supported among the format specifiers listed in the official documentation at https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior. Instead, I get the correct result by using the timestamp() method:
>>> start_time.timestamp()
1535760900.0
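For reference, a standard-library-only sketch of the same conversion (Python 3.3+), using datetime.timezone.utc in place of pytz. It also round-trips the result to show that it maps back to the original UTC time.

```python
import calendar
from datetime import datetime, timezone

# Parse the naive string, then mark it as UTC
start_time = datetime.strptime("01/09/2018 00:15:00", "%d/%m/%Y %H:%M:%S")
start_time = start_time.replace(tzinfo=timezone.utc)

# Two equivalent ways to get epoch seconds from an aware datetime
print(start_time.timestamp())                      # 1535760900.0
print(calendar.timegm(start_time.utctimetuple()))  # 1535760900

# Round trip back to an aware UTC datetime
print(datetime.fromtimestamp(1535760900, tz=timezone.utc))  # 2018-09-01 00:15:00+00:00
```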

How to print the date and time in Python from string?

Should be quick, but I can't for the life of me figure this one out.
I'm given the following strings:
201408110000
201408120001
201408130002
Which I loaded as a date(?) time object via the following:
dt = time.strptime(datestring, '%Y%m%d%H%M%S')
Where datestring is the string.
From there, how do I output the following:
11-Aug-14
12-Aug-14
13-Aug-14
I tried str(dt) but all it gave me was this weird thing:
time.struct_time(tm_year=2014, tm_mon=8, tm_mday=11, tm_hour=12,
tm_min=1, tm_sec=5, tm_wday=0, tm_yday=223, tm_isdst=-1)
What am I doing wrong? Anything I add so far to dt gives me attribute does not exist or something.
Using strftime
>>> dt = time.strftime('%d-%b-%Y', dt)
>>> print(dt)
11-Aug-2014
When you use the time module, it returns a struct_time. Using datetime returns a datetime object.
from datetime import datetime
datestring = '201408110000'
dt = datetime.strptime(datestring, '%Y%m%d%H%M%S')
print(dt)
# 2014-08-11 00:00:00
print(dt.strftime("%d-%b-%Y"))
# 11-Aug-2014
print(dt.strftime("%d-%b-%y"))
# 11-Aug-14
from datetime import datetime
datestring = "201408110000"
print(datetime.strptime(datestring, '%Y%m%d%H%M%S').strftime("%d-%b-%Y"))
# 11-Aug-2014
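To get exactly the two-digit-year output asked for in the question, a short sketch over all three sample strings follows. It assumes the 12-digit strings are laid out as YYYYMMDDHHMM, which the question does not confirm, and short_date is a hypothetical helper name.

```python
from datetime import datetime


def short_date(datestring):
    """Format a 12-digit timestamp string as DD-Mon-YY."""
    # Assumption: the string is laid out as YYYYMMDDHHMM
    return datetime.strptime(datestring, "%Y%m%d%H%M").strftime("%d-%b-%y")


for s in ("201408110000", "201408120001", "201408130002"):
    print(short_date(s))
# 11-Aug-14
# 12-Aug-14
# 13-Aug-14
```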
If you're doing a lot of parsing of a variety of input time formats, consider also installing and using the dateutil package. The parser module will accept a variety of time formats without a format specification, such as in this concise "one-liner":
from dateutil import parser
datestring = '201408110000'
print(parser.parse(datestring).strftime("%d-%b-%Y"))
This uses parser to turn the datestring into a datetime, which is then reformatted using datetime.strftime(), not to be confused with time.strftime(), used in a different answer.
See more at https://pypi.python.org/pypi/python-dateutil or the python-dateutil tag.
