Incorrect conversion from epoch time when scraping web - python

import praw,time
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
username=""
password=""
r = praw.Reddit(user_agent='')
r.login(username,password,disable_warning=True)
posts=r.search('china disaster', subreddit=None, sort=None, syntax=None, period=None,limit=7)
title=[];created=[]
for index,post in enumerate(posts):
    date=time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(post.created))
    title.append(post.title);created.append(post.created)
    print date,title[index]
    break #added so it prints one post as an example
Error:
I get incorrect times.
<time title="Fri Jan 23 01:22:20 2015 UTC" datetime="2015-01-22T17:22:20-08:00" class="">5 months ago</time>
I don't understand the issue. I think I am making a mistake in the time-zone conversion, but the reddit posts mention UTC, so I don't see where the error comes from.

I don't fully understand in what way the result is "incorrect".
A post has two attributes for its creation time: "created" and "created_utc". You may want to try the second one instead.
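To see what switching attributes could change, here is a minimal sketch using only the standard library. It takes the UTC time shown in the question's HTML, formats it with time.gmtime, and then formats a second timestamp shifted by eight hours. The eight-hour shift is an assumption chosen to imitate a "created" value that is not in UTC (it matches the -08:00 offset in the HTML); it is not taken from the PRAW documentation. format_epoch is a hypothetical helper name.

```python
import calendar
import time

def format_epoch(epoch_seconds):
    """Format epoch seconds as a UTC wall-clock string."""
    return time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(epoch_seconds))

# UTC time taken from the question's <time title="Fri Jan 23 01:22:20 2015 UTC">
created_utc = calendar.timegm((2015, 1, 23, 1, 22, 20))

# Assumption for illustration only: a "created" value offset by 8 hours from UTC
created_shifted = created_utc + 8 * 3600

print(format_epoch(created_utc))      # 2015-01-23 01:22:20
print(format_epoch(created_shifted))  # 2015-01-23 09:22:20
```

If the printed time is off by a whole number of hours from what the page shows, comparing post.created with post.created_utc in this way is a quick check.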

Related

Logging message inside except exception

I am trying to log a message when a try block fails. I have a for loop that converts date strings into datetime objects.
For example, it converts "03/05/2021" to 2021-05-03. However, some dates are mistyped, such as 03/052021. If the loop encounters such a date, I want it to write a log entry.
for id in range(1,items):
    try:
        dt_bd_lists.append(datetime.strptime(bd_lists[id+1], '%d/%m/%Y'))
        #print(dt_bd_lists[id])
    except:
        dt_bd_lists.append(bd_lists[id+1])
        #LOG_FILENAME = 'error_log'
        #logging.basicConfig(
        #filename=LOG_FILENAME,
        #level=logging.ERROR
        #)
        #logging.error('Error processing line %(lineno)d for ID %d', id)
The log message I want to create is "Error processing (line number) for (ID)."
Unfortunately, I am getting a logging error and am stuck. What would solve this issue?
This will not help you with the logging (I would need the error produced by the logging call for that), but maybe you can "clean" your data.
By replacing every slash / with an empty string "" you can sidestep those typos. You then need to change the date format to "%d%m%Y" (the same format with the slashes removed).
from datetime import datetime
date_strings = ["03/05/2021", "03/052021"]
for date_string in date_strings:
    # replace / with empty string
    date_string = date_string.replace("/", "")
    date_time_obj = datetime.strptime(date_string, '%d%m%Y')
    print(date_time_obj)
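On the logging part of the question, one likely cause (an assumption, since the traceback was not posted) is that the message mixes a mapping-style placeholder, %(lineno)d, with a positional one, %d. A minimal sketch of an alternative is to put %(lineno)d in the handler's format string and pass only the ID as a message argument. The logger name, the in-memory buffer, and the parse_dates helper are hypothetical and only there to keep the example self-contained.

```python
import io
import logging
from datetime import datetime

def parse_dates(raw_dates, logger):
    """Parse dd/mm/YYYY strings; keep and log the ones that fail."""
    parsed = []
    for idx, raw in enumerate(raw_dates, start=1):
        try:
            parsed.append(datetime.strptime(raw, "%d/%m/%Y"))
        except ValueError:
            parsed.append(raw)  # keep the raw value, as in the question
            logger.error("Error processing ID %d: %r", idx, raw)
    return parsed

# Log to an in-memory buffer here; a logging.FileHandler would write to a file.
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
# %(lineno)d belongs in the format string, not in the message itself.
handler.setFormatter(logging.Formatter("%(levelname)s line %(lineno)d: %(message)s"))

logger = logging.getLogger("date_cleanup")
logger.setLevel(logging.ERROR)
logger.addHandler(handler)
logger.propagate = False

result = parse_dates(["03/05/2021", "03/052021"], logger)
print(log_buffer.getvalue())
```

Note that %(lineno)d reports the source line of the logging call, not the position of the bad record in the data, which is why the record's ID is passed separately.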

How do I get website to display local time?

If I run this in VS Code, it gives me the correct local time:
from time import strftime
time = strftime("%H:%M %p")
print(time)
But if I use the same code in a Django project, my website displays a time which is 8 hours behind local time.
from time import strftime,localtime
from django.shortcuts import render

def time_display(request):
    context = {
        "date":strftime("%b %d, %Y",localtime()), #DOESN'T DISPLAY LOCAL TIME
        "time1": strftime("%H:%M %p",localtime()), #DOESN'T DISPLAY LOCAL TIME
        "time3": strftime("%H:%M %p"), #DOESN'T DISPLAY LOCAL TIME
    }
    return render(request,'time_display.html',context)
How do I fix it so that it displays the correct local time for wherever the webpage is being viewed?
I'm not an expert, but you can change the settings.py file like this:
TIME_ZONE = 'Europe/London'
Note: change Europe/London to your own time zone.
You can find a list of timezone names here: http://en.wikipedia.org/wiki/List_of_tz_database_time_zones
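The underlying point is that strftime on the server formats the time in whatever zone the server process uses, not the viewer's. As a rough illustration outside Django, the sketch below converts one UTC moment into two fixed offsets. The fixed offsets are a stand-in chosen to keep the example dependency-free (an assumption; a named zone such as the TIME_ZONE value above also handles daylight saving), and format_for_offset is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

def format_for_offset(utc_moment, offset_hours):
    """Render an aware UTC datetime in a fixed-offset zone as (date, time)."""
    local = utc_moment.astimezone(timezone(timedelta(hours=offset_hours)))
    # %I (12-hour clock) is paired with %p here instead of %H
    return local.strftime("%b %d, %Y"), local.strftime("%I:%M %p")

moment = datetime(2021, 5, 3, 22, 30, tzinfo=timezone.utc)
print(format_for_offset(moment, 0))  # ('May 03, 2021', '10:30 PM')
print(format_for_offset(moment, 8))  # ('May 04, 2021', '06:30 AM')
```

The same moment produces different dates and times depending on the zone, which is the kind of eight-hour gap described in the question.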

Format time output within template using Pelican

I am using this package: https://pypi.python.org/pypi/pelican-next-meetup/0.1.1
I have successfully installed the package and I can fetch data from meetup.com
Here is a snippet of my index.html file:
<h2 class="text-center">
Link to meetup
<p>{{ next_meetup.time }}</p>
</h2>
The problem I have is that the time displays like this: 1471449600000
That timestamp = 17 August 2016 6.00PM (roughly)
I have managed to convert that time to a datetime in the following file called timecon.py:
import datetime
import time
timestamp = 1471449600000
date_display = datetime.datetime.fromtimestamp(timestamp/1000)
However, I cannot get this file to work with my site, or even to run on its own.
(The file won't run if I make timestamp = next_meetup.time)
How can I convert the next_meetup.time into a datetime so that I can display a formatted date and time of the next meetup on my Pelican blog?
EDIT
I tried to integrate the functionality of timecon.py into my pelicanconf.py file using an import like this:
# some other code
from next_meetup
# some other code
#
DATE_DISPLAY = next_meetup.time
But that doesn't work either, because I get this error:
from next_meetup
^
SyntaxError: invalid syntax
EDIT 2
I also tried all of these and none worked:
from next_meetup import get_next_meetup
from next_meetup import next_meetup
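For reference, the conversion described in timecon.py can be wrapped in a small function that takes the millisecond value and returns a formatted string. This is only a sketch: format_meetup_time is a hypothetical name, the output format is an arbitrary choice, and the result is rendered in UTC rather than in the meetup's local zone. Exposing such a function to templates (for example as a custom Jinja filter through Pelican's JINJA_FILTERS setting) is an assumption about the setup and is not shown here.

```python
from datetime import datetime, timezone

def format_meetup_time(ms_since_epoch, fmt="%d %B %Y %H:%M UTC"):
    """Convert milliseconds since the epoch to a formatted UTC string."""
    seconds = int(ms_since_epoch) / 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)

# Timestamp taken from the question
print(format_meetup_time(1471449600000))  # 17 August 2016 16:00 UTC
```

Passing an explicit tz keeps the output independent of the time zone of the machine that builds the site.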

Does django mess up with python's datetime?

I have a file whose creation time I check using ctime. Here is the snippet of code (not complete, just the important part):
import time
import pytz
import os
from datetime import datetime
myfile = SOMEWHERE
myfile_ctime = os.path.getctime(myfile)
d = datetime.strptime(time.ctime(myfile_ctime), '%a %b %d %H:%M:%S %Y')
# d here is Tue Mar 25 00:33:40 2014 for example
ny = pytz.timezone("America/New_York")
d_ny = ny.localize(d)
mytz = pytz.timezone(MY_TZ_WHATEVER)
myd = d_ny.astimezone(mytz)
final_date = myd.strftime('%Y-%m-%d %H:%M:%S')
print(final_date + "some string")
# is now 2014-03-25 01:33:40some string, correctly with the timezone.
When this runs as a standalone Python script, everything is fine. But when I run the same code inside a function in templatetags/myfile.py that renders to a template in a Django app, time.ctime(myfile_ctime) returns Tue Mar 25 04:33:40 instead of the Tue Mar 25 00:33:40 from the snippet above (the code is identical in the standalone script and in Django, and in both cases I concatenate the date with another string).
My question is: I'm using only Python standard libraries, with the same snippet in both places, reading the same file in the same environment. Why the difference? Do the settings in settings.py change something in the standard libraries? Does merely running inside Django change how the standard libraries behave? Why does everything work as expected when run standalone?
(I'm behind apache, don't know if this is relevant)
Check the time zone settings in settings.py. For more information about Django's time zone settings, see: https://docs.djangoproject.com/en/1.6/ref/settings/#time-zone
In django/conf/__init__.py (line 126), the TZ environment variable is set from settings.py:
os.environ['TZ'] = self.TIME_ZONE
My TIME_ZONE is UTC.
That's why the standalone script's result differs from the same snippet inside Django: when running standalone, the TZ environment variable isn't set.
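The effect of the TZ variable on time.ctime can be reproduced without Django. The sketch below is Unix-only (it relies on time.tzset) and uses POSIX-style TZ strings so it does not depend on the zone database being installed. The rule "EST5EDT,M3.2.0,M11.1.0" is used here as a stand-in for America/New_York, the timestamp is chosen to match the times quoted in the question, and ctime_with_tz is a hypothetical helper.

```python
import os
import time

def ctime_with_tz(timestamp, tz_value):
    """Return time.ctime(timestamp) with TZ temporarily set to tz_value."""
    previous = os.environ.get("TZ")
    os.environ["TZ"] = tz_value
    time.tzset()  # make the C library re-read TZ (Unix only)
    try:
        return time.ctime(timestamp)
    finally:
        # Restore the previous setting so the rest of the process is unaffected
        if previous is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = previous
        time.tzset()

stamp = 1395722020  # 2014-03-25 04:33:40 UTC

print(ctime_with_tz(stamp, "UTC0"))                    # Tue Mar 25 04:33:40 2014
print(ctime_with_tz(stamp, "EST5EDT,M3.2.0,M11.1.0"))  # Tue Mar 25 00:33:40 2014
```

The same timestamp yields the two strings from the question, differing only in the value of TZ at the time of the call.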
Now, when creating a datetime object from a myfile_ctime, I just need to add tzinfo from my server (/etc/sysconfig/clock). My code now looks like this:
import time
import pytz
import os
from datetime import datetime
myfile = SOMEWHERE
myfile_ctime = os.path.getctime(myfile)
ny = pytz.timezone("America/New_York")
d = datetime.fromtimestamp(myfile_ctime, tz=ny)
mytz = pytz.timezone(MY_TZ_WHATEVER)
myd = d.astimezone(mytz)
final_date = myd.strftime('%Y-%m-%d %H:%M:%S')
I hope this is useful to someone. As always, read the source. :)

header If-Modified-Since does not give 304 code

I am using the code below to save an HTML file with a timestamp in its name:
import contextlib
import datetime
import urllib2
import lxml.html
import os
import os.path
timestamp=''
filename=''
for dirs, subdirs, files in os.walk("/home/test/Desktop/"):
    for f in files:
        if "_timestampedfile.html" in f.lower():
            timestamp=f.split('_')[0]
            filename=f
            break
if timestamp is '':
    timestamp=datetime.datetime.now()
with contextlib.closing(urllib2.urlopen(urllib2.Request(
        "http://www.google.com",
        headers={"If-Modified-Since": timestamp}))) as u:
    if u.getcode() != 304:
        myfile="/home/test/Desktop/"+str(datetime.datetime.now())+"_timestampedfile.html"
        file(myfile, "w").write(urllib2.urlopen("http://www.google.com").read())
        if os.path.isfile("/home/test/Desktop/"+filename):
            os.remove("/home/test/Desktop/"+filename)
        html = lxml.html.parse(myfile)
    else:
        html = lxml.html.parse("/home/test/Desktop/"+timestamp+"_timestampedfile.html")
    links=html.xpath("//a/@href")
    print u.getcode()
Every time I run this code, I get status code 200 despite the If-Modified-Since header. What am I doing wrong? My goal is to save an HTML file and reuse it, overwriting it only if the page has been modified since it was last fetched.
The problem is that If-Modified-Since is supposed to be a formatted date string:
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
but you're passing in a datetime object.
Try something like this:
timestamp = time.time()
...
time.strftime('%a, %d %b %Y %H:%M:%S GMT', time.gmtime(timestamp))
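As an alternative sketch in Python 3 (the question uses Python 2's urllib2, so this is a swapped-in equivalent rather than the original code), email.utils.formatdate with usegmt=True produces the HTTP date format directly from an epoch timestamp. The request below is only constructed, not sent; conditional_request and the example.com URL are hypothetical.

```python
import calendar
import urllib.request
from email.utils import formatdate

def conditional_request(url, last_fetch_epoch):
    """Build a request carrying If-Modified-Since as an HTTP-date string."""
    http_date = formatdate(last_fetch_epoch, usegmt=True)
    return urllib.request.Request(url, headers={"If-Modified-Since": http_date})

# Same moment as the example header above: Sat, 29 Oct 1994 19:43:31 GMT
last_fetch = calendar.timegm((1994, 10, 29, 19, 43, 31))
req = conditional_request("http://www.example.com/", last_fetch)

# urllib stores header names capitalized as "If-modified-since"
print(req.get_header("If-modified-since"))  # Sat, 29 Oct 1994 19:43:31 GMT
```

When such a request is sent with urllib.request.urlopen, a 304 reply is raised as urllib.error.HTTPError with code 304 instead of being returned as a normal response, so the check for 304 belongs in an except block.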
The second reason your code isn't working as you expect:
http://www.google.com/ does not seem to honor If-modified-since. That's allowed per the RFC, and they may have various reasons for choosing that behavior.
c) If the variant has not been modified since a valid If-Modified-Since date, the server SHOULD return a 304 (Not Modified) response.
If you try http://www.stackoverflow.com/, for example, you'll see a 304. (I just tried it.)
