Finding date represented in various formats in a string - python

The code below prints only 2-Nov-2018. How do I modify it so that both date formats are picked up?
import re
string = "some text contains 2-Nov-2018 and 3-11-2018"
date = re.findall(r'\d{1,2}[/-]\D{1,8}[/-]\d{2,4}', string)
print(date)

I think the simplest thing would be to write multiple patterns.
(This assumes you are only looking for these two patterns. It gets more complicated to do yourself if you need to catch every possible date format.)
import re
date_string = "some text contains 2-Nov-2018 and 3-11-2018"
formats = [r'\d{1,2}[/-]\D{1,8}[/-]\d{2,4}', # List of patterns
r'\d{1,2}[/-]\d{1,2}[/-]\d{2,4}']
dates = re.findall('|'.join(formats), date_string) # Join with | operator
dates
# ['2-Nov-2018', '3-11-2018']
To standardize the dates after this, you could try something like pandas.to_datetime:
import pandas as pd
dates = ['2-Nov-2018', '3-11-2018']
std_dates = [pd.to_datetime(d) for d in dates]
std_dates
# [Timestamp('2018-11-02 00:00:00'), Timestamp('2018-03-11 00:00:00')]
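Note that pandas parsed 3-11-2018 month-first here (March 11); if the day comes first in your data, pass dayfirst=True to pd.to_datetime.
As was mentioned in some comments, there may be libraries already built to do all of this for you, so if you are looking for a more general approach, I would take a look at those.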
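If you would rather not depend on pandas, the matched strings can also be normalized with the standard library by trying explicit formats in order, which makes the day/month order explicit. This is a minimal sketch, assuming day-first dates and an English locale for month abbreviations; the helper name normalize_date and the CANDIDATE_FORMATS list are illustrative and not part of the original answer.

```python
from datetime import datetime

# Assumed formats, tried in order; extend this tuple for other layouts.
CANDIDATE_FORMATS = ("%d-%b-%Y", "%d-%m-%Y", "%d/%m/%Y")

def normalize_date(text):
    """Return a date for the first format that parses, or None if none do."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not fit, try the next one
    return None

std_dates = [normalize_date(d) for d in ["2-Nov-2018", "3-11-2018"]]
print(std_dates)
```

Because every format here is day-first, 3-11-2018 is read as 3 November rather than 11 March.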

Related

How to identify multiple dates within a python string?

For a given string, I want to identify the dates in it.
import datefinder
string = str("/plot 23/01/2023 24/02/2021 /cmd")
matches = list(datefinder.find_dates(string))
if matches:
    print(matches)
else:
    print("no date in string")
Output:
no date in string
However, there are clearly dates in the string. Ultimately I want to identify which date is the oldest by putting in a variable Date1, and which date is the newest by putting in a variable Date2.
I believe that if a string contains multiple dates, datefinder is unable to parse it. In your case, splitting the string using string.split() and applying the find_dates method should do the job.
You've only given 1 example, but based on that example, you can use regex.
import re
from datetime import datetime
string = "/plot 23/01/2023 24/02/2021 /cmd"
dates = [datetime.strptime(d, "%d/%m/%Y") for d in re.findall(r"\d{2}/\d{2}/\d{4}", string)]
print(f"earliest: {min(dates)}, latest: {max(dates)}")
Output
earliest: 2021-02-24 00:00:00, latest: 2023-01-23 00:00:00
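The question also asks for the oldest and newest dates in variables Date1 and Date2. One way to get there, sketched with the split-per-token idea and only the standard library, assuming every date is a whitespace-separated dd/mm/yyyy token (the helper name parse_token is illustrative):

```python
from datetime import datetime

def parse_token(token, fmt="%d/%m/%Y"):
    """Return a datetime if the token is a date in the assumed format, else None."""
    try:
        return datetime.strptime(token, fmt)
    except ValueError:
        return None

string = "/plot 23/01/2023 24/02/2021 /cmd"

# Try each whitespace-separated token and keep only the ones that parse.
dates = [d for d in map(parse_token, string.split()) if d is not None]

if dates:
    Date1, Date2 = min(dates), max(dates)  # oldest, newest
    print(Date1.date(), Date2.date())
else:
    print("no date in string")
```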

replace the date section of a string in python

Say I have the string 'Tpsawd_20220320_default_economic_v5_0.xls'.
I want to replace the date part (20220320) with a date variable (e.g. if I define date = 20220410, it will replace 20220320 with this date). How should I do it with built-in Python packages? Please note that the date's location in the string can vary: it might be 'Tpsawd_default_economic_v5_0_20220320.xls' or 'Tpsawd_default_economic_20220320_v5_0.xls'.
Yes, this can be done with regex fairly easily:
import re
s = 'Tpsawd_20220320_default_economic_v5_0.xls'
date = '20220410'
s = re.sub(r'\d{8}', date, s)
print(s)
Output:
Tpsawd_20220410_default_economic_v5_0.xls
This replaces every run of 8 consecutive digits with the given string, in this case date. Pass count=1 to re.sub if only the first occurrence should be replaced.
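A slightly more defensive variant, assuming the date is always exactly eight digits and not embedded in a longer number: count=1 limits the substitution to the first match, and the lookarounds skip longer digit runs. The function name replace_date is illustrative.

```python
import re

def replace_date(filename, new_date):
    """Replace the first standalone run of exactly 8 digits with new_date."""
    # (?<!\d) and (?!\d) ensure the 8 digits are not part of a longer digit run.
    return re.sub(r"(?<!\d)\d{8}(?!\d)", new_date, filename, count=1)

names = [
    "Tpsawd_20220320_default_economic_v5_0.xls",
    "Tpsawd_default_economic_v5_0_20220320.xls",
    "Tpsawd_default_economic_20220320_v5_0.xls",
]
for n in names:
    print(replace_date(n, "20220410"))
```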

Adding day and month fields to date field which doesn't contain them

Say we have a list like this:
['1987', '1994-04', '2001-05-03']
We would like to convert these strings into datetime objects with a consistent format, then back into strings, something like this:
['1987-01-01', '1994-04-01', '2001-05-03']
In this case, we have decided if the date doesn't contain a month or a day, we set it to the first of that respective field. Is there a way to achieve this using a datetime or only by string detection?
Read about the datetime module to understand the basics of handling dates and date-like strings in Python.
The approach below, using try-except, is one of many ways to achieve the desired outcome:
from datetime import datetime
strings = ['1987', '1994-04', '2001-05-03']
INPUT_FORMATS = ('%Y', '%Y-%m', '%Y-%m-%d')
OUTPUT_FORMAT = '%Y-%m-%d'
output = []
for s in strings:
    dt = None
    for f in INPUT_FORMATS:
        try:
            dt = datetime.strptime(s, f)
            break
        except ValueError:
            pass  # try the next format
    if dt is None:
        raise ValueError("%s doesn't match any of the formats" % s)
    output.append(dt.date().strftime(OUTPUT_FORMAT))
print(output)
Output:
['1987-01-01', '1994-04-01', '2001-05-03']
The above code works only with the date formats listed in the INPUT_FORMATS variable. You can add more formats to it if you know them beforehand.
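The question also asks whether plain string handling would do. A minimal sketch of that alternative, assuming the inputs are always hyphen-separated year, year-month, or year-month-day strings (the helper name pad_partial_date is illustrative): missing fields are padded with '01', and strptime is used only to validate the result.

```python
from datetime import datetime

def pad_partial_date(s):
    """Pad a 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' string to a full ISO date."""
    parts = s.split("-")
    parts += ["01"] * (3 - len(parts))  # fill in a missing month and/or day
    # Round-trip through strptime so invalid input raises ValueError.
    return datetime.strptime("-".join(parts), "%Y-%m-%d").strftime("%Y-%m-%d")

print([pad_partial_date(s) for s in ["1987", "1994-04", "2001-05-03"]])
```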

python extracting date

I need to extract a date from a JPEG image.
I have extracted the text from the JPEG as a string and used a regex to extract the date.
Text from JPEG
Cont:7225811153;
BillNo4896TableNoR306
07-Jun-201921:18:40
Code used
Importing regular expression & Date time
import re as r
from datetime import datetime
regex to identify the date in the above string
id = r.search(r'\d{2}-\w{3}-\d{4}',text)
print(id)
Output
<re.Match object; span=(89, 100), match='07-Jun-2019'>
However, after running the above code, I tried the following to extract the date:
Code
Extracting the date
date = datetime.strptime(id.group(),'%d-%B-%Y').date()
Output
ValueError: time data '07-Jun-2019' does not match format '%d-%B-%Y'
Where am I going wrong, or is there a better way to do the same.
Help would be really appreciated
Use %b instead of %B, but make sure you only try to convert the match if it occurred:
import re as r
from datetime import datetime
text = 'Cont:7225811153; BillNo4896TableNoR306 07-Jun-201921:18:40'
id = r.search(r'\d{2}-\w{3}-\d{4}',text)
if id: # <-- Check if a match occurred
    print(datetime.strptime(id.group(),'%d-%b-%Y').date())
    # => 2019-06-07
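See the datetime documentation for the full list of strptime format codes: %b matches the abbreviated month name (Jun), while %B matches the full name (June). Both depend on the current locale.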
You had it almost perfect. Just replace the B with b.
>>> datetime.strptime(id.group(),'%d-%b-%Y').date()
datetime.date(2019, 6, 7)
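The sample text also carries a time glued directly onto the date (07-Jun-201921:18:40). If that is wanted too, the same idea extends to a full timestamp. A hedged sketch, assuming the time always follows the year with no separator and that the locale uses English month abbreviations:

```python
import re
from datetime import datetime

text = "Cont:7225811153; BillNo4896TableNoR306 07-Jun-201921:18:40"

# Date immediately followed by HH:MM:SS, with no separator in between.
match = re.search(r"\d{2}-[A-Za-z]{3}-\d{4}\d{2}:\d{2}:\d{2}", text)

stamp = None
if match:  # only parse when the pattern was actually found
    stamp = datetime.strptime(match.group(), "%d-%b-%Y%H:%M:%S")
    print(stamp)
```

Here %Y consumes exactly four digits, so the following 21 is read as the hour.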

Get filename and date time from string

I have file names in the following format: name_2016_04_16.txt
I'm working with python3 and I would like to extract two things from this file. The prefix, or the name value as a string and the date as a DateTime value for the date represented in the string. For the example above, I would like to extract:
filename: name as a String
date: 04/16/2016 as a DateTime
I will be saving these values to a database, so I would like the DateTime variable to be sql friendly.
Is there a library that can help me do this? Or is there a simple way of going about this?
I tried the following as suggested:
filename = os.path.splitext(filename)[0]
print(filename)
filename.split("_")[1::]
print(filename)
'/'.join(filename.split("_")[1::])
print(filename)
But it outputs:
name_2016_04_16
name_2016_04_16
name_2016_04_16
And does not really extract the name and date.
Thank you!
I would first strip the file extension, then split by underscore and drop the 'name' field. Finally, I would join the remaining parts with a slash (this value could be logged) and parse the date with the datetime library.
import os
from datetime import datetime
file_name = os.path.splitext("name_2016_04_16.txt")[0]
date_string = '/'.join(file_name.split("_")[1::])
parsed_date = datetime.strptime(date_string, "%Y/%m/%d")
To make the date string SQL friendly, I found this Stack Overflow post: Inserting a Python datetime.datetime object into MySQL, which suggests that the following should work:
sql_friendly_string = parsed_date.strftime('%Y-%m-%d %H:%M:%S')
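Splitting on underscores breaks down if the name part itself contains underscores. A regex-based sketch that tolerates this, assuming the filename always ends in _YYYY_MM_DD followed by a single extension (FILENAME_RE and split_filename are illustrative names):

```python
import re
from datetime import datetime

# Greedy name, then a trailing YYYY_MM_DD block, then one extension.
FILENAME_RE = re.compile(r"(?P<name>.+)_(?P<date>\d{4}_\d{2}_\d{2})\.[^.]+")

def split_filename(filename):
    """Return (name, date) for filenames shaped like name_YYYY_MM_DD.ext."""
    m = FILENAME_RE.fullmatch(filename)
    if m is None:
        raise ValueError("unexpected filename format: %r" % filename)
    parsed = datetime.strptime(m.group("date"), "%Y_%m_%d").date()
    return m.group("name"), parsed

name, file_date = split_filename("my_report_2016_04_16.txt")
print(name, file_date.isoformat())
```

The isoformat() string (YYYY-MM-DD) is the usual form for SQL DATE columns, though most database drivers also accept the date object directly as a query parameter.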
How about simply doing this?
filename = 'name_2016_04_16.txt'
date = filename[-14:-4] # counting from the end extracts the date no matter what the "name" is or how long it is
prefix = filename[:-15] # everything before the underscore that precedes the date
from datetime import datetime
date = datetime.strptime(date, '%Y_%m_%d') # this turns the string into a datetime object
(This was tested on Python 2.7; slicing and datetime.strptime behave the same way in Python 3.)
You can split the filename on "." Then split again on "_". This should give you a list of strings. The first being the name, second through fourth being the year, month and day respectively. Then convert the date to SQL friendly form.
Something like this:
rawname ="name_2016_04_16.txt"
filename = rawname.split(".")[0] #drop the .txt
name = filename.split("_")[0] #provided there's no underscore in the name part of the filename
year = filename.split("_")[1]
month = filename.split("_")[2]
day = filename.split("_")[3]
datestring = (month,day,year) # temporary tuple holding the parts in the required order
date = "/".join(datestring) #as shown above
datestring = (year,month,day)
SQL_date = "-".join(datestring) # SQL date
print(name)
print(date)
print(SQL_date)
If you want an actual datetime object rather than a string, use the datetime library.
You can then do something like this:
from datetime import datetime
SQL_date = datetime.strptime(date, '%m/%d/%Y')
This is the most explicit way I can think of right now. I'm sure there are shorter ways :)
Apologies for the bad formatting, I'm posting on mobile.
