For a given string, I want to identify the dates in it.
import datefinder
string = str("/plot 23/01/2023 24/02/2021 /cmd")
matches = list(datefinder.find_dates(string))
if matches:
    print(matches)
else:
    print("no date in string")
Output:
no date in string
However, there are clearly dates in the string. Ultimately I want to identify which date is the oldest and store it in a variable Date1, and which date is the newest and store it in a variable Date2.
I believe that if a string contains multiple dates, datefinder is unable to parse it. In your case, splitting the string using string.split() and applying the find_dates method should do the job.
You've only given 1 example, but based on that example, you can use regex.
import re
from datetime import datetime
string = "/plot 23/01/2023 24/02/2021 /cmd"
dates = [datetime.strptime(d, "%d/%m/%Y") for d in re.findall(r"\d{2}/\d{2}/\d{4}", string)]
print(f"earliest: {min(dates)}, latest: {max(dates)}")
Output
earliest: 2021-02-24 00:00:00, latest: 2023-01-23 00:00:00
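To get the two values into the variables named in the question, here is a minimal sketch. It assumes every date in the string is written as DD/MM/YYYY, and it guards against the case where nothing matches, since min() and max() raise ValueError on an empty list:

```python
import re
from datetime import datetime

string = "/plot 23/01/2023 24/02/2021 /cmd"

# Assumption: every date in the string is written as DD/MM/YYYY.
dates = [
    datetime.strptime(d, "%d/%m/%Y")
    for d in re.findall(r"\b\d{2}/\d{2}/\d{4}\b", string)
]

if dates:
    Date1 = min(dates)  # oldest date
    Date2 = max(dates)  # newest date
else:
    # Nothing matched, so there is no oldest or newest date to report.
    Date1 = Date2 = None

print(Date1, Date2)
```

If other layouts (for example MM/DD/YYYY) can appear, the format string passed to strptime would need to change accordingly.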
Related
I am looking to remove the space separating the date and the time from a python datetime object. I am using strptime "%Y-%m-%dT%H:%M:%S.%f", so I do not know why there is a space included to begin with.
Code:
import datetime
start_timestamp = "2022-11-23T10:08:00.000"
date_time_start = datetime.datetime.strptime(start_timestamp, "%Y-%m-%dT%H:%M:%S.%f")
print(date_time_start)
Output:
2022-11-23 10:08:00
Desired output:
2022-11-23_10:08:00
Use isoformat with custom separator:
>>> date_time_start.isoformat(sep="_")
'2022-11-23_10:08:00'
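One thing to watch for: isoformat only omits the fractional part when the microseconds are zero. A small sketch with a made-up timestamp that has non-zero milliseconds shows the difference, along with two ways to always get whole seconds (timespec requires Python 3.6 or later):

```python
from datetime import datetime

# Hypothetical timestamp with a non-zero fractional part.
ts = datetime.strptime("2022-11-23T10:08:00.250", "%Y-%m-%dT%H:%M:%S.%f")

# Default isoformat keeps the microseconds when they are non-zero.
with_fraction = ts.isoformat(sep="_")

# timespec="seconds" truncates the output to whole seconds.
seconds_only = ts.isoformat(sep="_", timespec="seconds")

# strftime gives the same result with an explicit format string.
via_strftime = ts.strftime("%Y-%m-%d_%H:%M:%S")

print(with_fraction)
print(seconds_only)
print(via_strftime)
```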
Suppose I have a string 'Tpsawd_20220320_default_economic_v5_0.xls'.
I want to replace the date part (20220320) with a date variable (i.e. if I define date = 20220410, it should replace 20220320 with that date). How can I do this with a built-in Python package? Please note that the location of the date in the string can vary: it might be 'Tpsawd_default_economic_v5_0_20220320.xls' or 'Tpsawd_default_economic_20220320_v5_0.xls'.
Yes, this can be done fairly easily with regex:
import re
s = 'Tpsawd_20220320_default_economic_v5_0.xls'
date = '20220410'
s = re.sub(r'\d{8}', date, s)
print(s)
Output:
Tpsawd_20220410_default_economic_v5_0.xls
This replaces every run of 8 consecutive digits in the string with the given string, in this case date. To replace only the first occurrence, pass count=1 to re.sub.
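A slightly stricter sketch, assuming the file name contains exactly one 8-digit date: it limits the replacement to a single match with count=1 and uses lookarounds so that 8 digits inside a longer digit run are not touched. The helper name replace_date is made up for this example. Note that \b would not work as a boundary here, because the underscore counts as a word character.

```python
import re

def replace_date(name: str, new_date: str) -> str:
    """Replace the first standalone 8-digit run in name with new_date."""
    # (?<!\d) and (?!\d) reject matches that sit inside a longer digit run.
    return re.sub(r"(?<!\d)\d{8}(?!\d)", new_date, name, count=1)

names = [
    "Tpsawd_20220320_default_economic_v5_0.xls",
    "Tpsawd_default_economic_v5_0_20220320.xls",
    "Tpsawd_default_economic_20220320_v5_0.xls",
]

for n in names:
    print(replace_date(n, "20220410"))
```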
I am looking to identify and extract a date from a number of different strings. The dates may not be formatted the same. I have been using the datefinder package but I am having some issues saving the output.
Goal: Extract the date from a string, which may be formatted in a number of different ways (e.g. April,22 or 4/22 or 22-Apr), and append it to the date list. If there is no date, append 'None' instead.
Please see the examples below.
Example 1: (This returns a date, but does not get appended to my list)
import datefinder
extracted_dates = []
sample_text = 'As of February 27, 2019 there were 28 dogs at the kennel.'
matches = datefinder.find_dates(sample_text)
for match in matches:
    if match == None:
        date = 'None'
        extracted_dates.append(date)
    else:
        date = str(match)
        extracted_dates.append(date)
Example 2: (This does not return a date, and does not get appended to my list)
import datefinder
extracted_dates = []
sample_text = 'As of the date, there were 28 dogs at the kennel.'
matches = datefinder.find_dates(sample_text)
for match in matches:
    if match == None:
        date = 'None'
        extracted_dates.append(date)
    else:
        date = str(match)
        extracted_dates.append(date)
I have tried using your package, but it seemed that there was no fast and general way of extracting the real date on your example.
Instead, I used the dateparser package, and more specifically its search_dates function.
I briefly tested it on your examples only.
from dateparser.search import search_dates
sample_text = 'As of February 27, 2019 there were 28 dogs at the kennel.'
extracted_dates = []
# Returns a list of tuples of (substring containing the date, datetime.datetime object)
dates = search_dates(sample_text)
if dates is not None:
    for d in dates:
        extracted_dates.append(str(d[1]))
else:
    extracted_dates.append('None')
print(extracted_dates)
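As for why the loop in the question never appends 'None': as far as I can tell, datefinder.find_dates returns a generator, and when no date is found it simply yields nothing, so the loop body (including the match == None branch) never runs. A library-independent sketch of the fix is to collect the results into a list first and then check whether it is empty. The helper collect_dates is hypothetical, and plain iterators stand in for the generator here:

```python
from datetime import datetime
from typing import Iterable, List

def collect_dates(matches: Iterable[datetime]) -> List[str]:
    """Return the matches as strings, or ['None'] if there were none."""
    found = [str(m) for m in matches]  # consumes the generator
    return found if found else ["None"]

extracted_dates = []

# Stand-in for a text in which one date was found.
extracted_dates.extend(collect_dates(iter([datetime(2019, 2, 27)])))

# Stand-in for a text in which no date was found.
extracted_dates.extend(collect_dates(iter([])))

print(extracted_dates)
```

In the original code, the same idea would mean passing the result of find_dates to such a helper instead of looping over it directly.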
I need to extract a date from a JPEG image.
I have extracted the text from the JPEG as a string and used a regex to extract the date.
Text from JPEG
Cont:7225811153;
BillNo4896TableNoR306
07-Jun-201921:18:40
Code used
Importing regular expression & Date time
import re as r
from datetime import datetime
regex to identify the date in the above string
id = r.search(r'\d{2}-\w{3}-\d{4}',text)
print(id)
Output
<re.Match object; span=(89, 100), match='07-Jun-2019'>
However, after running the above code, I tried the following to extract the date:
Code
Extracting the date
date = datetime.strptime(id.group(),'%d-%B-%Y').date()
Output
ValueError: time data '07-Jun-2019' does not match format '%d-%B-%Y'
Where am I going wrong, or is there a better way to do this?
Help would be really appreciated
Use %b instead of %B, but make sure you only try to convert the match if it occurred:
import re as r
from datetime import datetime
text = 'Cont:7225811153; BillNo4896TableNoR306 07-Jun-201921:18:40'
id = r.search(r'\d{2}-\w{3}-\d{4}',text)
if id:  # <-- Check if a match occurred
    print(datetime.strptime(id.group(),'%d-%b-%Y').date())
    # => 2019-06-07
See more details on the datetime.strptime format strings.
You had it almost perfect. Just replace the B with b.
>>> datetime.strptime(id.group(),'%d-%b-%Y').date()
datetime.date(2019, 6, 7)
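For context, %b matches the abbreviated month name (Jun) while %B matches the full name (June), which is why the original format failed. Since the time is fused directly onto the date in the extracted text, here is a sketch that captures both parts. It assumes an English locale (month names in strptime are locale-dependent) and that the time always follows the year as HH:MM:SS:

```python
import re
from datetime import datetime

text = "Cont:7225811153; BillNo4896TableNoR306 07-Jun-201921:18:40"

# Group 1 is the date, group 2 is the time that directly follows it.
pattern = r"(\d{2}-[A-Za-z]{3}-\d{4})(\d{2}:\d{2}:\d{2})"
m = re.search(pattern, text)

stamp = None
if m:  # only parse when the pattern was actually found
    # Join the two groups with a space so the format string is unambiguous.
    stamp = datetime.strptime(f"{m.group(1)} {m.group(2)}", "%d-%b-%Y %H:%M:%S")

print(stamp)
```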
I have a variable 'd' that contains dates in this format:
2015-08-03T09:00:00-07:00
2015-08-03T10:00:00-07:00
2015-08-03T11:00:00-07:00
2015-08-03T12:00:00-07:00
2015-08-03T13:00:00-07:00
2015-08-03T14:00:00-07:00
etc.
I need to strip these dates, but I'm having trouble because of the timezone. If I use d = dt.datetime.strptime(d[:19],'%Y-%m-%dT%H:%M:%S'), only the first 19 characters are parsed and the rest of the dates are ignored. If I try d = dt.datetime.strptime(d[:-6],'%Y-%m-%dT%H:%M:%S'), Python doesn't chop off the timezone and I still get the error ValueError: unconverted data remains: -07:00. I don't think I can use the dateutil parser because I've only seen it used for one date rather than for a whole list like mine. What can I do? Thanks!
Since you have a list just iterate over and use dateutil.parser:
d = ["2015-08-03T09:00:00-07:00","2015-08-03T10:00:00-07:00","2015-08-03T11:00:00-07:00","2015-08-03T12:00:00-07:00",
     "2015-08-03T13:00:00-07:00","2015-08-03T14:00:00-07:00"]
from dateutil import parser
for dte in d:
    print(parser.parse(dte))
If for some reason you actually want to ignore the timezone you can use rsplit with datetime.strptime:
from datetime import datetime
for dte in d:
    print(datetime.strptime(dte.rsplit("-",1)[0],"%Y-%m-%dT%H:%M:%S"))
If you had a single string delimited by commas then just use d.split(",")
You can use strftime to format the string in any format you want if you actually want a string:
for dte in d:
    print(datetime.strptime(dte.rsplit("-",1)[0],"%Y-%m-%dT%H:%M:%S").strftime("%Y-%m-%d %H:%M:%S"))
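Note that the rsplit("-", 1) trick relies on the offset being negative; with an offset such as +07:00 the last hyphen is inside the date itself, so the split lands in the wrong place. As a standard-library alternative, here is a sketch using datetime.fromisoformat, assuming Python 3.7 or later and strings in exactly this ISO 8601 layout. It parses the offset whatever its sign, and replace(tzinfo=None) drops it if a naive datetime is wanted:

```python
from datetime import datetime, timedelta

d = [
    "2015-08-03T09:00:00-07:00",
    "2015-08-03T10:00:00-07:00",
    "2015-08-03T11:00:00+02:00",  # made-up positive offset for illustration
]

# Timezone-aware datetimes; the UTC offset is preserved.
aware = [datetime.fromisoformat(s) for s in d]

# Naive datetimes: same wall-clock time, offset discarded (not converted).
naive = [dt.replace(tzinfo=None) for dt in aware]

for dt in naive:
    print(dt.strftime("%Y-%m-%d %H:%M:%S"))
```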