Python Feedparser pubdate to one timezone - python

I need to parse an RSS feed, and I am using feedparser in Python. My task is to run a script every N seconds and check for the latest entry. My idea is to check, on each iteration, whether an entry's date is less than 15 seconds old. The problem is that the pubDate values come in different timezones.
I think 'published_parsed' is not working correctly, because it gives me these:
2020-06-17 05:46:45
-
Wed, 17 Jun 2020 04:46:45 GMT
and this
2020-06-17 11:19:39
-
Wed, 17 Jun 2020 10:19:39 IST
So the dates are not parsed into one timezone. I tried to resolve each timezone using pytz, but it has no IST timezone, which is a problem for me.
How can I convert this variety of dates to a single timezone?
Wed, 17 Jun 2020 13:12:43 IST
Tue, 16 Jun 2020 21:49:32 GMT
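One way to normalize such pubDate strings is to map each abbreviation to a fixed UTC offset and convert everything to UTC before comparing. The sketch below uses only the standard library. The TZ_OFFSETS table and both function names are hypothetical, and reading IST as India Standard Time (UTC+5:30) is an assumption: the same abbreviation is also used for Irish and Israel time, so check what your feeds actually mean.

```python
from datetime import datetime, timedelta, timezone

# Assumption: what each abbreviation means for the feeds being polled.
TZ_OFFSETS = {
    "GMT": timezone.utc,
    "UTC": timezone.utc,
    "IST": timezone(timedelta(hours=5, minutes=30)),  # assumed India Standard Time
}


def pubdate_to_utc(pubdate):
    """Convert e.g. 'Wed, 17 Jun 2020 13:12:43 IST' to an aware UTC datetime."""
    stamp, _, abbr = pubdate.rpartition(" ")
    naive = datetime.strptime(stamp, "%a, %d %b %Y %H:%M:%S")
    # An unknown abbreviation raises KeyError instead of guessing an offset.
    return naive.replace(tzinfo=TZ_OFFSETS[abbr]).astimezone(timezone.utc)


def is_recent(pubdate, max_age_seconds=15, now=None):
    """Return True if the entry is at most max_age_seconds old."""
    now = now or datetime.now(timezone.utc)
    return now - pubdate_to_utc(pubdate) <= timedelta(seconds=max_age_seconds)


print(pubdate_to_utc("Wed, 17 Jun 2020 13:12:43 IST"))
print(pubdate_to_utc("Tue, 16 Jun 2020 21:49:32 GMT"))
```

Comparing aware UTC datetimes avoids depending on the local timezone of the machine running the script.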

Related

Python reading datetime string ( reading T01:00:00-07:00 with python )

Has anyone converted times like these before?
2020-10-12T01:00:00-07:00 to 2020-10-12T09:00:00-07:00
equals
Monday, October 12, 2020 at 10:00 AM – 6:00 PM UTC+02
to datetime objects?
2020-10-12T01:00:00-07:00
<--date--> <-time-><zone>
This means 1 am on October 12th, 2020, in a time zone 7 hours behind UTC (for example, US Pacific Daylight Time or Mountain Standard Time).
It's one of the ISO 8601 formats, used for date/time data interchange.
I believe dateutil.parser.parse() from the dateutil library can handle this in Python.
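As a standard-library alternative to dateutil, datetime.fromisoformat() (Python 3.7 and later) also accepts this form. The sketch below parses both ends of the range from the question and converts them to UTC+02; the variable names are only illustrative.

```python
from datetime import datetime, timedelta, timezone

# Parse the ISO 8601 strings; the -07:00 offset is kept on the result.
start = datetime.fromisoformat("2020-10-12T01:00:00-07:00")
end = datetime.fromisoformat("2020-10-12T09:00:00-07:00")

# Convert to a fixed UTC+02 offset, as in the question's second line.
utc_plus_2 = timezone(timedelta(hours=2))
start_local = start.astimezone(utc_plus_2)
end_local = end.astimezone(utc_plus_2)

print(start_local.isoformat())  # 2020-10-12T10:00:00+02:00
print(end_local.isoformat())    # 2020-10-12T18:00:00+02:00
```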

Python Arrow timezone parsing

I'm trying to convert a string to DateTime object, The string:
Saturday 8th of August 2020 07:48:11 AM CDT
I'm using arrow package
arrow.get('Saturday 8th of August 2020 09:23:34 AM CDT', 'dddd Mt[h] of MMMM YYYY HH:mm:ss A ZZZ')
I'm getting the following error
arrow.parser.ParserError: Could not parse timezone expression "CDT"
I couldn't find any way to convert the CDT part into a timezone.
As said in the documentation, some abbreviations are ambiguous. You can use, for example, America/Chicago instead of CDT.
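If you would rather resolve the abbreviation yourself, a minimal standard-library sketch (not using arrow) is shown below. The ABBREVIATIONS table and parse_stamp() are hypothetical names, and treating CDT as a fixed UTC-5 offset is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: CDT/CST mean US Central time with fixed offsets.
ABBREVIATIONS = {
    "CDT": timezone(timedelta(hours=-5), "CDT"),
    "CST": timezone(timedelta(hours=-6), "CST"),
}


def parse_stamp(text):
    """Parse e.g. 'Saturday 8th of August 2020 07:48:11 AM CDT'."""
    body, _, abbr = text.rpartition(" ")
    # Drop the ordinal suffix ("8th" -> "8") so strptime can read the day.
    body = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", body)
    naive = datetime.strptime(body, "%A %d of %B %Y %I:%M:%S %p")
    return naive.replace(tzinfo=ABBREVIATIONS[abbr])


print(parse_stamp("Saturday 8th of August 2020 07:48:11 AM CDT"))
```

A fixed offset ignores daylight-saving rules, so a named zone such as America/Chicago is the better choice when dates span the whole year.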

Maximum value of timestamp

I am using Python 3.6.0 on Windows 10 x64.
I just found that the seconds parameter of time.ctime(seconds) has an implicit maximum value of 32536799999, which is roughly 2^34.92135.
Is that the maximum value?
The error message just says it's an invalid number.
>>> import time
>>> time.ctime(32536799999)
'Mon Jan 19 15:59:59 3001'
>>> time.ctime(32536799999+1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
I searched the web and the Python documentation, but I didn't find anything about it. I'm going to check this problem on Ubuntu in my lab.
The time documentation doesn't mention any limits, but the datetime documentation does:
fromtimestamp() may raise OverflowError, if the timestamp is out of the range of values supported by the platform C localtime() or gmtime() functions, and OSError on localtime() or gmtime() failure.
[...]
Naive datetime instances are assumed to represent local time and this method relies on the platform C mktime() function to perform the conversion. Since datetime supports wider range of values than mktime() on many platforms, this method may raise OverflowError for times far in the past or far in the future.
Then we head over to the Windows documentation:
_localtime64, which uses the __time64_t structure, allows dates to be expressed up through 23:59:59, December 31, 3000, coordinated universal time (UTC), whereas _localtime32 represents dates through 23:59:59 January 18, 2038, UTC.
localtime is an inline function which evaluates to _localtime64, and time_t is equivalent to __time64_t. If you need to force the compiler to interpret time_t as the old 32-bit time_t, you can define _USE_32BIT_TIME_T. Doing this will cause localtime to evaluate to _localtime32. This is not recommended because your application may fail after January 18, 2038, and it is not allowed on 64-bit platforms.
All the time-related functions (including ctime) work the same way. So the max date you can reliably convert between timestamps on Windows 10 is 3000-12-31T23:59:59Z.
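To see which instant the limit from the question corresponds to without going through the platform's C library, the timestamp can be added to the epoch with plain datetime arithmetic. This is only a sketch; timestamp_to_utc() is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_utc(seconds):
    """Convert a POSIX timestamp to UTC without calling localtime()/gmtime()."""
    return EPOCH + timedelta(seconds=seconds)


limit = timestamp_to_utc(32536799999)
print(limit.isoformat())  # 3001-01-19T07:59:59+00:00
```

The question's output of 15:59:59 for the same value is consistent with a machine set to UTC+8. The observed limit falls on 19 January 3001 UTC, a little later than the 31 December 3000 date in the quoted documentation.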
Trying to get a platform-independent max timestamp is difficult.
I'm using
3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:09:58) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
in an Ubuntu 16.04 VM running on a Windows 10 machine.
I broke your ctime call apart into its components to investigate, but I don't run into the same maximum.
>>> time.asctime(time.localtime(32536799999-1))
'Mon Jan 19 02:59:58 3001'
>>> time.asctime(time.localtime(32536799999+1))
'Mon Jan 19 03:00:00 3001'
>>> time.asctime(time.localtime(32536799999+10))
'Mon Jan 19 03:00:09 3001'
>>> time.asctime(time.localtime(32536799999+10000))
'Mon Jan 19 05:46:39 3001'
>>> time.asctime(time.localtime(32536799999+1000000))
'Fri Jan 30 16:46:39 3001'
>>> time.asctime(time.localtime(32536799999+1000000000))
'Thu Sep 27 05:46:39 3032'
>>> time.ctime(32536799999+1000000000)
'Thu Sep 27 05:46:39 3032'
>>> time.asctime(time.gmtime(32536799999-1))
'Mon Jan 19 07:59:58 3001'
>>> time.asctime(time.gmtime(32536799999+1))
'Mon Jan 19 08:00:00 3001'
>>> time.asctime(time.gmtime(32536799999+1000000000))
'Thu Sep 27 09:46:39 3032'
Either something was fixed from 3.6.0 to 3.6.1, or you have some interesting issue specific to your machine.
I do see the following time related change in 3.6.1:
https://www.python.org/dev/peps/pep-0495/
I wonder if the time you happened to be using happened to fall into a fold or a gap? Could you try adding a little over 1 hour on your system and see if it becomes valid again?
This must have to do with your installation of Python. In version 3.5, I never get such an error:
>>> time.ctime(32536799999)
'Mon Jan 19 07:59:59 3001'
>>> time.ctime(32536799999+1)
'Mon Jan 19 08:00:00 3001'
>>> time.ctime(32536799999+9999999999999999)
'Thu Feb 13 01:46:38 316890386'
>>> time.ctime(32536799999+99999999999999999)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 75] Value too large for defined data type
Even with a gigantic number, it raises a different error.
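Since the limit differs between platforms, one option is to probe it at runtime instead of hard-coding it. The sketch below binary-searches for the largest timestamp that time.gmtime() accepts. max_gmtime_timestamp() is a hypothetical helper, and it assumes the accepted range is contiguous from 0 upward.

```python
import time


def max_gmtime_timestamp(upper=2**63):
    """Return the largest timestamp in [0, upper] that time.gmtime() accepts.

    Assumption: every value between 0 and the limit is accepted.
    """
    def accepted(ts):
        try:
            time.gmtime(ts)
            return True
        except (OverflowError, OSError, ValueError):
            return False

    if accepted(upper):
        return upper
    low, high = 0, upper  # invariant: accepted(low) and not accepted(high)
    while high - low > 1:
        mid = (low + high) // 2
        if accepted(mid):
            low = mid
        else:
            high = mid
    return low


print(max_gmtime_timestamp())
```

The result depends on the operating system and C library, so no particular output is expected.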

This specific str.replace() in Python with BeautifulSoup isn't working

I'm trying to automate a task that occurs roughly monthly, which is adding a hyperlink to a page that looks like:
2013: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011: Jan Feb Mar ...
Whenever we get a new document for a month, we wrap that month's abbreviation (for example, Jul) in hyperlink tags.
So I'm using BeautifulSoup in Python. You can see below that I'm picking out the HTML "p" tag that contains this data and doing a replace() on the first month that it finds (finds Month using the reverse dictionary I created, and the third parameter of replace() indicates to only do the first one it finds).
# Modify link in hr.php:
hrphp = open('\\\\intranet\\websites\\infonet\\hr\\hr.php', 'r').read()
soup = BeautifulSoup(hrphp) # Parsing with BeautifulSoup
Months = {k: v for k,v in enumerate(calendar.month_abbr)} # Creates a reverse dictionary for month abbreviation lookup by month number, ie. "print Months[07]" will print "Jul"
print hrphp+"\n\n\n\n\n" # DEBUGGING: Compare output before
hrphp = hrphp.replace(
    str(soup.findAll('p')[4]),
    str(soup.findAll('p')[4]).replace(
        Months[int(InterlinkDate[1][-5:-3])],
        ""+Months[int(InterlinkDate[1][-5:-3])]+"",
        1),
    1
)
print hrphp # DEBUGGING: Compare output after
See how it's a nested replace()? The logic seems to work out fine, but for some reason it doesn't actually change the value. Earlier in the script I do something similar with the Months[] dictionary and str.replace() on a segment of the page, and that works out, although it doesn't have a nested replace() like this nor does it search for a block of text using soup.findAll().
Starting to bang my head around on the desk, any help would be greatly appreciated. Thanks in advance.
With str(soup.findAll('p')[4]).replace you are only replacing values in a string representation of soup.findAll('p')[4]. That representation will more than likely differ from the corresponding text in hrphp, because "Beautiful Soup gives you Unicode" after it parses, so the outer hrphp.replace() finds no match and leaves hrphp unchanged.
Beautiful Soup's documentation holds the answer. Have a look at the Changing Attribute Values section.
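The silent failure can be reproduced without Beautiful Soup: str.replace() returns the string unchanged when the search text does not occur in it. In the sketch below both strings are hypothetical. The re-serialized form is an assumption about how a parser might write the tag back out, and "[Feb]" stands in for the hyperlink markup.

```python
# Raw markup as it might appear in the file (hypothetical).
raw_source = "<P class=months>2013: Jan Feb Mar</P>"
# What a parser might emit for the same element (assumed normalization).
reserialized = '<p class="months">2013: Jan Feb Mar</p>'


def link_month(source, paragraph, month):
    """Nested replace in the style of the question."""
    marked = paragraph.replace(month, "[" + month + "]", 1)
    return source.replace(paragraph, marked, 1)


# The re-serialized text is not a substring of the raw source: nothing changes.
unchanged = link_month(raw_source, reserialized, "Feb")
# Searching for text that really occurs in the source works as intended.
changed = link_month(raw_source, raw_source, "Feb")

print(unchanged == raw_source)  # True
print(changed)                  # <P class=months>2013: Jan [Feb] Mar</P>
```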

Browsers cookie problem

Well,
Opera and Chrome add 2 hours to the expiration when I only want 15 minutes to be added. They both get the 15 minutes right, but for some reason I don't understand yet, they also add another 2 hours to the date.
Here is the response header:
Content-Type:text/html
Date:Thu, 28 Apr 2011 15:59:27 GMT
Server:lighttpd/1.4.28
Set-Cookie:SID=2554373e-9144-34af-b9ad-a67b2ccdc8cd; expires=Thu, 28 Apr 2011 16:14:27 GMT; Path=/
Thu, 28 Apr 2011 16:14:27 GMT
Transfer-Encoding:chunked
This is also fine; it is the exact date that I want. But when I check the browser's cookie list, I see expires=Thu, 28 Apr 2011 18:14:27 GMT.
What can cause that?
Thanks
Edit: Info:
I use Python to create the cookie. Everything depends on the server time, which is the same for all browsers.
All browsers are tested in the same environment.
Edit Code Sample:
def createCookie(self):
    expiration = datetime.datetime.now() + datetime.timedelta(hours=0,minutes=15)
    self.cookie['SID'] = self.SID
    self.cookie['SID']['path'] = "/"
    self.cookie['SID']['Expires'] = expiration.strftime("%a, %d %b %Y %H:%M:%S GMT")
As you are not posting the related code to your question it is impossible to say what is causing the issue.
But my nose tells me you are probably mixing timezones in your time delta code.
Here is some info when dealing with timezone aware time and datetime objects in Python:
http://blog.mfabrik.com/2008/06/30/relativity-of-time-shortcomings-in-python-datetime-and-workaround/
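One thing worth ruling out is that datetime.datetime.now() returns naive local time, so formatting it with a literal "GMT" suffix is off by the server's UTC offset whenever the server clock is not on UTC. A minimal sketch that builds the value from an aware UTC datetime is shown below; cookie_expires() is a hypothetical helper, not part of the original code.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cookie_expires(minutes=15, now=None):
    """Return an Expires value such as 'Thu, 28 Apr 2011 16:14:27 GMT'."""
    now = now or datetime.now(timezone.utc)
    # usegmt=True requires an aware datetime in UTC and emits the "GMT" suffix.
    return format_datetime(now + timedelta(minutes=minutes), usegmt=True)


fixed = datetime(2011, 4, 28, 15, 59, 27, tzinfo=timezone.utc)
print(cookie_expires(now=fixed))  # Thu, 28 Apr 2011 16:14:27 GMT
```

Whether this accounts for the 2 hours seen in the browsers is not certain; the cookie list may also simply be showing the expiry in local time.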
