How to extract the date from a paragraph - python

I have a large block of text, as shown below:
how are you
On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:
-------------------------------------------------------------
NOTE: Please do not remove email address from the "To" line of this email when replying. This address is used to capture the email and report it. Please do not remove or change the subject line of this email. The subject line of this email contains information to refer this correspondence back to the originating discrepancy.
I want the date and time specified in the sentence (Tue, Dec 21, 2021 at 1:51 PM).
How can I extract that from the text?

Use a regular expression to extract the date and time.
import re
text = '''how are you
On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:
...
'''
match = re.search('(Mon|Tue|Wed|Thu|Fri|Sat|Sun).*?(AM|PM)', text)
match_date_and_time = match.group() # Tue, Dec 21, 2021 at 1:51 PM
Use datetime.strptime to parse the date and time.
from datetime import datetime
datetime.strptime(match_date_and_time, '%a, %b %d, %Y at %I:%M %p')  # datetime.datetime(2021, 12, 21, 13, 51)
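The two steps above can be combined into one helper. This is only a sketch: the function name `extract_reply_datetime` and the stricter pattern are assumptions, and it returns None instead of raising when no date is found.

```python
import re
from datetime import datetime

# Stricter than ".*?": spell out the expected "Tue, Dec 21, 2021 at 1:51 PM" layout.
REPLY_DATE = re.compile(
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), \w{3} \d{1,2}, \d{4} at \d{1,2}:\d{2} (?:AM|PM)"
)

def extract_reply_datetime(text):
    """Return the first reply date in text as a datetime, or None if absent."""
    match = REPLY_DATE.search(text)
    if match is None:
        return None
    return datetime.strptime(match.group(), "%a, %b %d, %Y at %I:%M %p")

text = "how are you\nOn Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:\n"
print(extract_reply_datetime(text))
```

Checking for None first avoids the AttributeError that `match.group()` raises when the text contains no date.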

The way to go here is to use regular expressions, but for simplicity, and if the format of the text is always the same, you can get the date string by looking for the line that looks like this: On SOME DATE <Someone<someone's email address>> wrote:. Here is an example implementation:
email = """how are you
On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:
-------------------------------------------------------------
NOTE: Please do not remove email address from the \"To\" line of this email when replying. This address is used to capture the email and report it. Please do not remove or change the subject line of this email. The subject line of this email contains information to refer this correspondence back to the originating discrepancy."""
for line in email.splitlines():
    if line.startswith("On ") and line.endswith(" wrote:"):
        date_string = line[3 : line.index(" <")]
        print(f"Found the date: {date_string!r}")
        break
else:
    print("Could not find the date.")

Very dirty, and it only works while the layout stays exactly the same:
string = (
    "how are you \r\n\r\nOn Tue, Dec 21, 2021 at 1:51 PM "
    "<abchttp://localhost> wrote:\r\n\r\n\r\n"
    "-------------------------------------------------------------\r\n"
    'NOTE: Please do not remove email address from the "To" line of this '
    "email when replying. This address is used to capture the email and "
    "report it. Please do not remove or change the subject line of this "
    "email. The subject line of this email contains information to refer "
    "this correspondence back to the originating discrepancy.\r\n"
)
# the second blank-line-separated block is the "On ... wrote:" line
string = string.split("\r\n\r\n")
# keep its first eight space-separated tokens
date = ' '.join(string[1].split(' ')[:8])
print(date)  # On Tue, Dec 21, 2021 at 1:51 PM

Related

Parsing Dates In Python

I have a list of dates from input like the ones below.
I am working on a project and only want to accept dates from user input that follow the format April 1, 1990 or January 13, 2003 (with any month); I do not want to accept any other date format. How would I use the replace or find function to achieve this? Once I have a date in that format, I want to print it in the format 7/19/22. For dates in the right format I used the replace function to replace the space and the comma, but how would I replace the month with its numerical value? Sorry for all these questions; I am just stuck and have been working on this for a while now.
April 1, 1990
November 2 1995
7/19/22
January 13, 2003
userinput = input("Please enter date")
parsed_date = userinput.replace(" ", "/", 2)
new_parsed_date = parsed_date.replace(',',"")
print(new_parsed_date)
March/1/2019 is my output when I parse the date. Is there an easier way to do this task?
You should take a look at the strptime method of the Python datetime object.
Basically you would write code that looks like this:
>>> from datetime import datetime
>>> datetime.strptime("January 13, 2003", "%B %d, %Y")
datetime.datetime(2003, 1, 13, 0, 0)
Documentation for strptime: https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime
I can't really understand what you are trying to do. You said that you only want to accept certain date formats from user input and ignore other date formats.
April 1, 1990 # Accept
January 13, 2003 # Accept
November 2 1995 # Ignore (lack of comma)
7/19/22 # Ignore (use numerical value in month field)
May I assume that you would like to accept a format like January 13, 2003 and print or save the dates in the format 01/13/2003?
Then you should consider the strptime() and strftime() methods of the datetime object.
import datetime
# get the date string from user input
date_input = input("Please Enter Date: ")
input_format = "%B %d, %Y"
output_format = "%m/%d/%Y"
try:
    # parse with the accepted format, then render in the output format
    parsed_date = datetime.datetime.strptime(date_input, input_format).strftime(output_format)
    print(parsed_date)
except ValueError:
    print("This is the incorrect date string format.")
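The question asks for output like 7/19/22, without zero padding and with a two-digit year. strftime has no portable code for an unpadded month or day, so a minimal sketch can build the string from the datetime attributes instead. The function name `to_short_date` is hypothetical.

```python
from datetime import datetime

def to_short_date(text):
    """Return 'M/D/YY' for input like 'April 1, 1990', or None if the format is rejected."""
    try:
        parsed = datetime.strptime(text.strip(), "%B %d, %Y")
    except ValueError:
        # anything that is not "<full month name> <day>, <year>" ends up here
        return None
    return f"{parsed.month}/{parsed.day}/{parsed.year % 100:02d}"

for candidate in ["April 1, 1990", "November 2 1995", "7/19/22", "January 13, 2003"]:
    print(candidate, "->", to_short_date(candidate))
```

Because strptime does the validation, the missing comma in November 2 1995 and the numeric 7/19/22 are both rejected without any replace or find calls.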

How to use RegEx to set up a new alphabet in Python?

I have downloaded an RSS file and saved it as city.txt.
Then I have to grab the date from the <lastBuildDate> tag.
The date is in the format Fri,28 Aug 2020, and I have to translate the day and month, all using RegEx.
I have managed to get the date, but I have a problem changing the day and month after I have found it.
Do I have to use re.sub?
My code:
import re
with open('city.txt', 'r', encoding='utf-8') as f:
    txt = f.read()
tag_pattern = r'<lastBuildDate\b[^>]*>(.*?)</lastBuildDate>'
found = re.findall(tag_pattern, txt, re.I)
found = list(set(found))
for f in found:
    print('\t\t', f)
I have updated your code based on your requirements, please give it a try.
Code
import re
import locale
import datetime
with open('city.txt', 'r', encoding='utf-8') as f:
    txt = f.read()
tag_pattern = r'<lastBuildDate\b[^>]*>(.*?)</lastBuildDate>'
found = re.findall(tag_pattern, txt, re.I)
found = list(set(found))
for f in found:
    # parse with English day and month names
    locale.setlocale(locale.LC_TIME, "en")
    temp = datetime.datetime.strptime(f, '%a, %d %b %Y %H:%M:%S GMT')
    # format with Greek day and month names
    locale.setlocale(locale.LC_TIME, "el-GR")
    print(temp.strftime("%a, %d %b %Y %H:%M:%S"))
Sample input
<lastBuildDate>Fri, 28 Jan 2020 13:32:12 GMT</lastBuildDate>
<lastBuildDate>Sun, 27 Feb 2020 15:36:53 GMT</lastBuildDate>
<lastBuildDate>Mon, 26 Aug 2020 16:30:43 GMT</lastBuildDate>
Output
Τετ, 26 Αυγ 2020 16:30:43
Πεμ, 27 Φεβ 2020 15:36:53
Τρι, 28 Ιαν 2020 13:32:12
Although parsing XML content with regexes is really not recommended, your question is actually about date translation.
One approach is to parse the XML content of your RSS file and retrieve the text value of the <lastBuildDate> node; you can then parse it into a datetime object with datetime.strptime() from the datetime module.
The sample below shows how to get a datetime object from a string:
import datetime
# date_time_str contains the date string as formatted in your RSS
date_time_str = 'Fri,28 Aug 2020'
# date_time_obj contains the parsed value (formatted as '%a,%d %b %Y')
date_time_obj = datetime.datetime.strptime(date_time_str, '%a,%d %b %Y')
Then you just have to retrieve the wanted datetime elements as integers. You can display those values in the current locale with the calendar module if it matches your language. Otherwise, and this is a bit more tricky, you can play with TimeEncoding and month_name. (Of course, you can also write your own translation system.)
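As a rough sketch of such a hand-written translation system, the example below parses the feed with xml.etree.ElementTree and looks the day and month up in two tables, so it does not depend on any installed locale. The function name, the sample feed, and the Greek abbreviations in the tables are illustrative assumptions, and the date format is taken from the sample input shown earlier.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical lookup tables; adjust the abbreviations to your needs.
GREEK_DAYS = ["Δευ", "Τρι", "Τετ", "Πεμ", "Παρ", "Σαβ", "Κυρ"]  # Monday first, like weekday()
GREEK_MONTHS = ["Ιαν", "Φεβ", "Μαρ", "Απρ", "Μαϊ", "Ιουν",
                "Ιουλ", "Αυγ", "Σεπ", "Οκτ", "Νοε", "Δεκ"]

def greek_build_dates(rss_text):
    """Return every <lastBuildDate> in the feed, rewritten with Greek names."""
    root = ET.fromstring(rss_text)
    results = []
    for node in root.iter("lastBuildDate"):
        parsed = datetime.datetime.strptime(
            node.text.strip(), "%a, %d %b %Y %H:%M:%S GMT"
        )
        day = GREEK_DAYS[parsed.weekday()]      # weekday() is 0 for Monday
        month = GREEK_MONTHS[parsed.month - 1]  # month is 1-based
        results.append(
            f"{day}, {parsed.day:02d} {month} {parsed.year} {parsed:%H:%M:%S}"
        )
    return results

sample = ("<rss><channel><lastBuildDate>Fri, 28 Aug 2020 17:36:59 GMT"
          "</lastBuildDate></channel></rss>")
print(greek_build_dates(sample))
```

The weekday is recomputed from the date itself, so a feed whose day name does not match its date still produces a consistent result.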
You can use locale in Python to display the date in Greek or any other local language.
Please refer to the code below, and to the Windows documentation for more locale string options.
import datetime
import locale
date_string = 'Fri, 28 Aug 2020 17:36:59 GMT'
date_parsed = datetime.datetime.strptime(date_string, '%a, %d %b %Y %H:%M:%S GMT')
# locale names are platform-specific; "el-CY" follows the Windows naming
locale.setlocale(locale.LC_TIME, "el-CY")
print(date_parsed.strftime("%a, %d %b %Y %H:%M:%S"))
prints
Παρ, 28 Αύγ 2020 17:36:59

How to format a timestamp in python to a readable format?

I have a slack bot that gets schedule information and prints the start and end times of the user's schedule. I am trying to format the timestamp so that it is a more readable format. Is there a way to format this response?
Here is the code that builds the output:
co = get_co(command.split(' ')[1])
start = get_schedule(co['group'])['schedule']['schedule_layers'][0]['start']
end = get_schedule(co['group'])['schedule']['schedule_layers'][0]['end']
response = 'Start: {} \n End: {}'.format(start,end)
The current time format is 2019-06-28T15:12:49-04:00, but I want it to be something more readable like Fri Jun 28 15:12:49 2019
You can use dateparser to parse the date time string easily.
import dateparser
date = dateparser.parse('2019-06-28T15:12:49-04:00') # parses the date time string
print(date.strftime('%a %b %d %H:%M:%S %Y'))
# Fri Jun 28 15:12:49 2019
To convert timestamps of any format you can use the datetime module as described here:
https://stackabuse.com/converting-strings-to-datetime-in-python/
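For the ISO 8601 timestamp in the question, the standard library alone may be enough. The sketch below assumes Python 3.7 or later, where datetime.fromisoformat is available; the function name `readable` is hypothetical.

```python
from datetime import datetime

def readable(timestamp):
    """Turn '2019-06-28T15:12:49-04:00' into 'Fri Jun 28 15:12:49 2019'."""
    # fromisoformat understands the trailing "-04:00" UTC offset
    parsed = datetime.fromisoformat(timestamp)
    return parsed.strftime("%a %b %d %H:%M:%S %Y")

print(readable("2019-06-28T15:12:49-04:00"))  # Fri Jun 28 15:12:49 2019
```

The time is shown as written in the timestamp, without converting it to another time zone.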

Extract Date from string using regex in Python 3.5.2

I have this data, extracted from an email body:
Data=("""-------- Forwarded Message --------
Subject: Sample Report
Date: Thu, 6 Apr 2017 16:39:19 +0000
From: test1@abc.com
To: test2@xyz.com""")
I want to extract the day and the month from this date and store them in variables.
The output I need is:
Date = 6
Month = "Apr"
Can anyone please help with this using regular expressions?
You can use this regex with multiline mode m:
^Date:[^,]+,\ (\d+) (\w+)
This will capture the date and the month in groups 1 and 2 respectively, so the match can easily be unpacked into two variables like so:
import re
# raw string, so the backslashes reach the regex engine unchanged
date, month = re.search(r"^Date:[^,]+,\ (\d+) (\w+)", Data, re.MULTILINE).groups()
date = int(date)
print(date, month)
# output: 6 Apr
Adding to the solution of @Rakesh:
import re
from datetime import datetime
# remove every space so the header matches a compact strptime format
data1 = re.sub(' ', '', Data)
res = re.search(r'Date(.*)$', data1, re.MULTILINE).group()
res2 = datetime.strptime(res, 'Date:%a,%d%b%Y%X%z')
print(res2.day, res2.month)
You can use regex to extract the date
Ex:
import re
from dateutil.parser import parse
s = """-------- Forwarded Message --------
Subject: Sample Report
Date: Thu, 6 Apr 2017 16:39:19 +0000
From: test1@abc.com
To: test2@xyz.com"""
date = re.search("Date(.*)$", s, re.MULTILINE)
if date:
    date = date.group().replace("Date:", "").strip()
    d = parse(date)
    Date = d.day
    Month = d.strftime("%b")
    print(Date, Month)
Output:
6 Apr
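If installing dateutil is not an option, the standard library's email.utils.parsedate_to_datetime can parse this kind of mail Date header. The sketch below assumes the header looks like the one in the question; the function name `day_and_month` is hypothetical.

```python
import re
from email.utils import parsedate_to_datetime

def day_and_month(message):
    """Return (day, month abbreviation) from the first Date: header, or None."""
    match = re.search(r"^Date:\s*(.+)$", message, re.MULTILINE)
    if match is None:
        return None
    parsed = parsedate_to_datetime(match.group(1))
    return parsed.day, parsed.strftime("%b")

data = """-------- Forwarded Message --------
Subject: Sample Report
Date: Thu, 6 Apr 2017 16:39:19 +0000
From: sender
To: recipient"""
print(day_and_month(data))
```

The regex only locates the header line; the date itself is interpreted by the mail date parser rather than by a hand-written pattern.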

how to generate multiple txt file based on months/year?

I have a large txt file (a log file), where each entry starts with a timestamp such as Sun, 17 Mar 2013 18:58:06.
I want to split the file into multiple txt files by mm/yy, with the entries sorted.
The general code I planned is below, but I do not know how to implement it. I know how to split a file by number of lines etc., but not by a specified timestamp.
import re
f = open("log.txt", "r")
my_regex = re.compile('regex goes here')
body = []
for line in f:
    if my_regex.match(line):
        if body:
            write_one(body)
        body = []
    body.append(line)
f.close()
example of lines from txt
2Sun, 17 Mar 2013 18:58:06 Pro IDS2.0 10E22E37-B2A1-4D55-BE20-84661D420196 nCWgKUtjalmYx053ykGeobwgWW V3
3Sun, 17 Mar 2013 19:17:33 <AwaitingDHKey c i FPdk 1:0 pt 0 Mrse> 0000000000000000000000000000000000000000 wo>
HomeKit keychain state:HomeKit: mdat=2017-01-01 01:41:47 +0000,cdat=2017-01-01 01:41:47 +0000,acct=HEDF3,class=genp,svce=AirPort,labl=HEDF3
4Sun, 13 Apr 2014 19:10:26 values in decoded form...
oak: <C: gen:'[ 21:10 5]' ak>
<PI#0x7fc01dc05d90: [name: Bourbon] [--SrbK-] [spid: zP8H/Rpy] [os: 15G31] [devid: 49645DA6] [serial: C17J9LGKDTY3] -
5Sun, 16 Feb 2014 18:59:41 tLastKVSKeyCleanup:
ak|nCWgKUtjalmYx053ykGeobwgWW:sk1Kv+37Clci7VwR2IGa+DNVEA: DHMessage (0x02): 112
You could use a regex (such as [0-9]{4} ([01]\d|2[0123]):([012345]\d):([012345]\d)), but in the example posted the date is always at the beginning of the line. If that is always the case, you can slice the line at fixed positions to get the date.
import datetime
lines =[]
lines.append("2Sun, 17 Mar 2013 18:58:06 Pro IDS2.0 10E22E37-B2A1-4D55-BE20-84661D420196 nCWgKUtjalmYx053ykGeobwgWW V3")
lines.append("3Sun, 17 Mar 2013 19:17:33 <AwaitingDHKey c i FPdk 1:0 pt 0 Mrse> 0000000000000000000000000000000000000000 wo> HomeKit keychain state:HomeKit: mdat=2017-01-01 01:41:47 +0000,cdat=2017-01-01 01:41:47 +0000,acct=HEDF3,class=genp,svce=AirPort,labl=HEDF3")
lines.append("4Sun, 13 Apr 2014 19:10:26 values in decoded form... oak: <C: gen:'[ 21:10 5]' ak> <PI#0x7fc01dc05d90: [name: Bourbon] [--SrbK-] [spid: zP8H/Rpy] [os: 15G31] [devid: 49645DA6] [serial: C17J9LGKDTY3] -")
for l in lines:
    # characters 6 to 25 hold the "17 Mar 2013 18:58:06" part
    datetime_object = datetime.datetime.strptime(l[6:26], '%d %b %Y %H:%M:%S')
    print(datetime_object)
This gives the correct output for the three examples you provided:
2013-03-17 18:58:06
2013-03-17 19:17:33
2014-04-13 19:10:26
A datetime object has attributes such as month and year, so you can use a simple equality check to see whether two dates fall in the same month and year.
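Putting these pieces together, one possible way to split the log is to group whole entries by (year, month) and then write one file per group. This is a sketch under assumptions: the entry-start pattern treats the leading digits in the sample as optional, continuation lines are attached to the preceding entry, and the function names and output file names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime
from pathlib import Path

# Assumed entry header: optional leading digits, a weekday, then the timestamp.
ENTRY_START = re.compile(
    r"^\d*[A-Z][a-z]{2}, (\d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2})"
)

def group_entries_by_month(lines):
    """Return {(year, month): [lines]} with entries in chronological order."""
    entries = []  # list of (timestamp, [lines of that entry])
    for line in lines:
        match = ENTRY_START.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
            entries.append((stamp, [line]))
        elif entries:
            # continuation line: it belongs to the most recent entry
            entries[-1][1].append(line)
    groups = defaultdict(list)
    for stamp, body in sorted(entries, key=lambda entry: entry[0]):
        groups[(stamp.year, stamp.month)].extend(body)
    return groups

def write_groups(groups, directory="."):
    """Write one file per month, for example log_2013_03.txt."""
    for (year, month), body in groups.items():
        path = Path(directory) / f"log_{year}_{month:02d}.txt"
        path.write_text("".join(body), encoding="utf-8")

sample = [
    "2Sun, 17 Mar 2013 18:58:06 Pro IDS2.0\n",
    "3Sun, 17 Mar 2013 19:17:33 <AwaitingDHKey>\n",
    "HomeKit keychain state\n",
    "4Sun, 13 Apr 2014 19:10:26 values in decoded form...\n",
    "5Sun, 16 Feb 2014 18:59:41 tLastKVSKeyCleanup:\n",
]
groups = group_entries_by_month(sample)
print(sorted(groups))
```

For the real file, the lines can come straight from the open file object, as in `write_groups(group_entries_by_month(open_file))`. Everything is held in memory while sorting, which may not suit very large logs.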
