pdf.getDocumentInfo date format - python

I am using PyPDF2's function for extracting document info. The results look something like this, but I am unable to interpret the creation date format. What do the last few digits represent?
pdf.documentInfo
[Output]: {'/Creator': 'Rave (http://www.nevrona.com/rave)',
'/Producer': 'Nevrona Designs',
'/CreationDate': 'D:20060301072826' }
At one point I also saw this:
'/CreationDate': "D:20170920114835+02'00'"
How can I read it or convert it into a normal, human-readable date-time format?

You can clean the string (strip the apostrophes) and parse it like this:
from datetime import datetime
CreationDate = "D:20170920114835+02'00'"
dt = datetime.strptime(CreationDate.replace("'", ""), "D:%Y%m%d%H%M%S%z")
# UTC offset is set correctly:
print(dt)
# 2017-09-20 11:48:35+02:00
print(repr(dt))
# datetime.datetime(2017, 9, 20, 11, 48, 35, tzinfo=datetime.timezone(datetime.timedelta(seconds=7200)))
...which I think is more straightforward than what the answer to this related question shows.
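To answer the "last few digits" part: in the PDF date format D:YYYYMMDDHHmmSSOHH'mm', the digits after D: are the local year, month, day, hour, minute and second, O is +, - or Z (relative to UT), and HH'mm' is the offset in hours and minutes. So +02'00' means UTC+02:00, and D:20060301072826 simply records no offset. Every field after the year is optional, which the one-line strptime call above does not cover. The following is a minimal sketch of a more lenient parser; the function name parse_pdf_date is hypothetical, and it does not handle every malformed variant that real PDF producers emit.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm': every field after the year is optional.
# (As a leniency choice, this sketch also accepts a missing "D:" prefix.)
_PDF_DATE = re.compile(
    r"^(?:D:)?(?P<Y>\d{4})(?P<mo>\d{2})?(?P<d>\d{2})?"
    r"(?P<H>\d{2})?(?P<mi>\d{2})?(?P<S>\d{2})?"
    r"(?P<tz>Z|[+-]\d{2}'?(?:\d{2}'?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string; the result is aware only if an offset is present."""
    m = _PDF_DATE.match(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    g = m.groupdict()
    # Missing month/day default to 1, missing time fields to 0.
    dt = datetime(int(g["Y"]), int(g["mo"] or 1), int(g["d"] or 1),
                  int(g["H"] or 0), int(g["mi"] or 0), int(g["S"] or 0))
    tz = g["tz"]
    if tz is None:
        return dt  # no offset recorded: return a naive datetime
    if tz == "Z":
        return dt.replace(tzinfo=timezone.utc)
    # "+02'00'" becomes sign "+" and digits "0200"
    digits = tz[1:].replace("'", "")
    offset = timedelta(hours=int(digits[:2]), minutes=int(digits[2:] or 0))
    if tz[0] == "-":
        offset = -offset
    return dt.replace(tzinfo=timezone(offset))


print(parse_pdf_date("D:20060301072826"))         # 2006-03-01 07:28:26
print(parse_pdf_date("D:20170920114835+02'00'"))  # 2017-09-20 11:48:35+02:00
```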

Related

Adding days to an ISO 8601 format date in Python

I need to add 3 hours to a date in ISO 8601 format in Python, for example "2022-09-21T22:31:59Z", because of a time difference. From the timestamp the API returns I only need the Y/M/D information, but because of the +3 hour difference the day sometimes has to move forward by one, as happens with the example I gave. How can I handle this? I think the date is in ISO 8601 format, but please correct me if I am wrong.
Example API response:
"createdDateTime": "2022-09-21T22:31:59Z"
What I need:
"createdDateTime": "2022-09-21T22:31:59Z" to "2022-09-22T01:31:59Z"
Try this code:
from datetime import datetime, timedelta
# The trailing 'Z' is matched literally, so parsed_date is naive (no tzinfo)
parsed_date = datetime.strptime("2022-09-21T22:31:59Z", "%Y-%m-%dT%H:%M:%SZ")
updated_date = parsed_date + timedelta(hours=3)
print(updated_date)  # 2022-09-22 01:31:59
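The question asks for the result as a string ending in Z and, ultimately, only for the Y/M/D part. A minimal sketch using strftime with the same format string to get both; note that re-appending Z labels the shifted value as UTC even though it is really UTC+3 wall-clock time, which is the caveat the next answer discusses.

```python
from datetime import datetime, timedelta

parsed = datetime.strptime("2022-09-21T22:31:59Z", "%Y-%m-%dT%H:%M:%SZ")
shifted = parsed + timedelta(hours=3)

# Put the literal "Z" back to get the exact string asked for in the question.
# Caveat: after the shift this is UTC+3 wall-clock time, not UTC.
as_text = shifted.strftime("%Y-%m-%dT%H:%M:%SZ")
print(as_text)         # 2022-09-22T01:31:59Z

# Only the calendar part, which is all the question needs.
print(shifted.date())  # 2022-09-22
```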
If you have a proper JSON string, you can parse it with json, extract the string value, parse that with datetime.fromisoformat into a datetime value, and then get the date from it:
import json
from datetime import datetime, timedelta, timezone
data = json.loads('{"createdDateTime": "2022-09-21T22:31:59Z"}')
isostr = data['createdDateTime'].replace('Z', '+00:00')
fulldate = datetime.fromisoformat(isostr)
fulldate.date()
-----
datetime.date(2022, 9, 21)
The replacement is necessary because fromisoformat doesn't accept a trailing Z before Python 3.11.
Adding 3 hours to fulldate returns 1 AM on the next day:
fulldate + timedelta(hours=3)
------
datetime.datetime(2022, 9, 22, 1, 31, 59, tzinfo=datetime.timezone.utc)
fulldate is in UTC. It can be converted to another UTC offset using astimezone:
fulldate.astimezone(tz=timezone(timedelta(hours=3)))
---
datetime.datetime(2022, 9, 22, 1, 31, 59, tzinfo=datetime.timezone(datetime.timedelta(seconds=10800)))
Or in a more readable form:
fulldate.astimezone(tz=timezone(timedelta(hours=3))).isoformat()
---------------------------
'2022-09-22T01:31:59+03:00'
This is 1 AM on the next day, but with a +03:00 offset. It is still the same instant as 22:31:59 UTC on September 21.
It's also possible to just replace the UTC offset with another one, without changing the time values, using replace:
fulldate.replace(tzinfo=timezone(timedelta(hours=3))).isoformat()
----------------------------
'2022-09-21T22:31:59+03:00'
That's the original wall-clock time with a different offset, so it is no longer the same instant as 2022-09-21T22:31:59Z.
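Putting the astimezone approach together for the original goal (the calendar date in UTC+3), a minimal sketch might look like this. It assumes the +3 hours is a fixed offset with no daylight-saving changes; for a named zone, a zoneinfo.ZoneInfo object (Python 3.9+) could be passed to astimezone instead. The function name local_date_from_api is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the target zone is a fixed UTC+3 offset (no DST rules).
UTC_PLUS_3 = timezone(timedelta(hours=3))


def local_date_from_api(value: str) -> str:
    """Return the YYYY-MM-DD date of an ISO 8601 UTC timestamp, seen from UTC+3."""
    # fromisoformat before Python 3.11 rejects a trailing "Z", so normalise it first.
    instant = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Convert the same instant to UTC+3, then keep only the calendar date.
    return instant.astimezone(UTC_PLUS_3).date().isoformat()


print(local_date_from_api("2022-09-21T22:31:59Z"))  # 2022-09-22
```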

format error in python string to time conversion

Currently, test32 is of type str.
I would like to convert the string test32 into a datetime value in yyyy-mm-dd hh:mm:ss form.
However, I get the error time data '22/03/0823:33:55.256' does not match format '%Y%m%d%H%M%S'. Is there any way to fix this?
for z in data:
    # raw string so the backslashes in the Windows path are not treated as escapes
    if r'D:\System\iUTILITY\Tool\Curver\ToolBox\STG\i02-K01_S1_CEC_Update_Monintor_Analysis.xpsp' in z:
        test30 = z.split(' ')[0:2]
        test31 = ''.join(test30)
        test32 = test31.split(',')[0]
        print(type(test32))
        test33 = datetime.datetime.strptime(test32, '%Y%m%d%H%M%S')
        print(test33)
        # print(test32)
To parse the date 22/03/0823:33:55.256, you need to use format %y/%m/%d%H:%M:%S.%f, like this:
import datetime
datetime.datetime.strptime('22/03/0823:33:55.256', '%y/%m/%d%H:%M:%S.%f')
# Output: datetime.datetime(2022, 3, 8, 23, 33, 55, 256000)
According to the documentation, %y is the year without century (two digits) and %f is the microsecond field, which accepts the .256 fractional part. Here's the documentation:
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
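The question ultimately wants yyyy-mm-dd hh:mm:ss output, so the parsed value still needs strftime. A minimal sketch: the first part reuses the concatenated string from the question; the second part uses a made-up log line and assumes, as the split(' ') in the question suggests, that the date and time were originally separated by a space, which can simply be kept and matched in the format string.

```python
from datetime import datetime

# Value produced by the question's code (date and time glued together).
raw = "22/03/0823:33:55.256"
parsed = datetime.strptime(raw, "%y/%m/%d%H:%M:%S.%f")
# strftime renders yyyy-mm-dd hh:mm:ss; the .256 fraction is dropped.
formatted = parsed.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)  # 2022-03-08 23:33:55

# Hypothetical log line: keep the space instead of joining with ''
# and match it in the format string.
line = "22/03/08 23:33:55.256,some,other,fields"
stamp = " ".join(line.split(" ")[0:2]).split(",")[0]
print(datetime.strptime(stamp, "%y/%m/%d %H:%M:%S.%f"))  # 2022-03-08 23:33:55.256000
```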

Removing time stamp when converting date format with dateparser in scrapy

I am using dateparser in scrapy to convert the date format.
Original date format: Apr 16, 2019
After using dateparser: 2019-04-16 00:00:00
This is what I wanted to achieve. However, I would still like to remove the time part, so that in the end I only have 2019-04-16. Unfortunately, I haven't managed to do this.
This is my line of code:
import dateparser
...
def parse_site(self, response):
    def get_with_xpath(query):
        return response.xpath(query).get(default='').strip()
    yield {
        'date': dateparser.parse(get_with_xpath('//meta[@name="date"]/@content'))
    }
As I said, it works, but I would like to remove the time part. Any ideas?
dateparser.parse returns a datetime representing the parsed date if successful (and None otherwise). You can use strftime() to format it without the time part, as shown below:
dateparser.parse('Apr 16, 2019').strftime("%Y-%m-%d")
This library's methods return datetime objects, but afterwards you are free to do anything you want with them. Check this example:
>>> import dateparser
>>> dateparser.parse("Apr 16, 2019")
datetime.datetime(2019, 4, 16, 0, 0)
>>> dateparser.parse("Apr 16, 2019").date()
datetime.date(2019, 4, 16)
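Because dateparser.parse returns None when it cannot parse the input, calling .strftime() or .date() directly on it inside the spider can raise AttributeError for a page with a missing or odd date. A minimal sketch of a guard; the helper name date_only is hypothetical, and the block uses only the standard library so it can be tried without dateparser installed.

```python
from datetime import datetime
from typing import Optional


def date_only(value: Optional[datetime]) -> Optional[str]:
    """Return 'YYYY-MM-DD' for a parsed datetime, or None if parsing failed."""
    # dateparser.parse returns None when it cannot parse the input,
    # so guard before calling any datetime method on it.
    if value is None:
        return None
    return value.date().isoformat()


print(date_only(datetime(2019, 4, 16, 0, 0)))  # 2019-04-16
print(date_only(None))                         # None
```

In the spider, the helper would wrap the existing call, e.g. 'date': date_only(dateparser.parse(get_with_xpath(...))).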

Get Unix date (numeric) from string and convert to Python date

I'm trying to extract a Unix date from a rather large body of text returned by a URL, so I've used:
link = Open_URL(url)
match=re.compile('"Date":"(.+?)"').findall(link)
But when I print the Unix date, it's a large number rather than a date. I need to convert it to a usable date format, so I tried:
datetime.fromtimestamp(int(myints)).strftime('%Y-%m-%d %H:%M:%S')
But it won't let me convert the link. Any ideas?
Thanks in advance
Current code:
link = Open_URL(url)
match = re.compile('"End Date":"(.+?)"').findall(link)
for url in match:
    ...  # so on
Please help, I'm stuck! I can't do anything with the list it returns except print it, which is useless in its current format.
Thanks
If your match variable looks like ['1448204858'] when printed, then it is a list containing a single string element. datetime.fromtimestamp() requires a number (an int or a float), so you need to
extract the string from the list,
convert it to a float, and then
convert that to a datetime:
import re
from datetime import datetime
# test data
link = '"Something":"foo","Date":"1448204858","Otherthing":"bar"'
match=re.compile('"Date":"(.+?)"').findall(link)
dt = datetime.fromtimestamp(float(match[0]))
print(repr(dt))
print(dt.strftime('%Y-%m-%d %H:%M:%S'))
Results (fromtimestamp converts to the machine's local time zone, so the time you see may differ):
datetime.datetime(2015, 11, 22, 8, 7, 38)
2015-11-22 08:07:38
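If the page contains several dates, findall returns all of them, and fromtimestamp without a tz argument uses the machine's local time zone, which is why the output above can differ between machines. A minimal sketch that converts every match and pins the result to UTC; the sample link text is made up, and the pattern uses (\d+) instead of (.+?) so that only digits are captured.

```python
import re
from datetime import datetime, timezone

# Hypothetical text standing in for the page returned by Open_URL(url)
link = '"Date":"1448204858","Other":"x","Date":"1448291258"'

# findall returns a list of strings, one per match
matches = re.findall(r'"Date":"(\d+)"', link)

# Convert each string to an int, then to an aware datetime in UTC,
# so the result does not depend on the machine's local time zone.
dates = [datetime.fromtimestamp(int(s), tz=timezone.utc) for s in matches]
for d in dates:
    print(d.strftime('%Y-%m-%d %H:%M:%S'))
# 2015-11-22 15:07:38
# 2015-11-23 15:07:38
```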
I think what you mean by a large number is a Unix timestamp, also known as seconds since the epoch. It can easily be converted to a datetime object in Python like this:
import datetime
a = datetime.datetime.now().timestamp()
print(a)  # e.g. 1448206588.806814
b = datetime.datetime.fromtimestamp(a)  # converts to local time
print(repr(b))  # e.g. datetime.datetime(2015, 11, 22, 15, 36, 28, 806814) on a machine set to UTC
# using strftime on the above object
print(b.strftime('%Y-%m-%d %H:%M:%S'))  # e.g. 2015-11-22 15:36:28 on a machine set to UTC

Convert UTC time to python datetime

I have numerous UTC time stamps in the following format:
2012-04-30T23:08:56+00:00
I want to convert them to python datetime objects but am having trouble.
My code:
for time in data:
    pythondata[i]=datetime.strptime(time,"%y-%m-%dT%H:%M:%S+00:00")
I get the following error:
ValueError: time data '2012-03-01T00:05:55+00:00' does not match format '%y-%m-%dT%H:%M:%S+00:00'
It looks like I have the proper format, so why doesn't this work?
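Change the year directive in your format string to %Y. %y matches only a two-digit year (such as 12), while %Y matches a four-digit year (such as 2012):
from datetime import datetime
time = '2012-03-01T00:05:55+00:00'
datetime.strptime(time, "%Y-%m-%dT%H:%M:%S+00:00")
# => datetime.datetime(2012, 3, 1, 0, 5, 55)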
See strftime() and strptime() behavior.
I highly recommend the python-dateutil library; it can convert strings in many datetime formats into datetime objects, with or without a time zone set:
>>> from dateutil.parser import parse
>>> parse('2012-04-30T23:08:56+00:00')
datetime.datetime(2012, 4, 30, 23, 8, 56, tzinfo=tzutc())
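Matching "+00:00" literally, as the accepted fix does, throws the offset away and returns a naive datetime. With only the standard library, %z (or fromisoformat) keeps it. A minimal sketch over a list like the question's data; the two sample strings are taken from the question, and the list itself is a stand-in.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the question's data list
data = ["2012-04-30T23:08:56+00:00", "2012-03-01T00:05:55+00:00"]

# %z parses the "+00:00" offset (the colon form is accepted since
# Python 3.7), so each result is timezone-aware instead of naive.
pythondata = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%S%z") for t in data]

# fromisoformat (Python 3.7+) reads this exact ISO 8601 form directly.
same = [datetime.fromisoformat(t) for t in data]

print(pythondata[0])       # 2012-04-30 23:08:56+00:00
print(pythondata == same)  # True
```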
