Get filename and date time from string - python

I have file names in the following format: name_2016_04_16.txt
I'm working with Python 3 and would like to extract two things from this file name: the prefix (the name value) as a string, and the date represented in the string as a datetime value. For the example above, I would like to extract:
filename: name as a String
date: 04/16/2016 as a DateTime
I will be saving these values to a database, so I would like the DateTime variable to be sql friendly.
Is there a library that can help me do this? Or is there a simple way of going about this?
I tried the following as suggested:
filename = os.path.splitext(filename)[0]
print(filename)
filename.split("_")[1::]
print(filename)
'/'.join(filename.split("_")[1::])
print(filename)
But it outputs:
name_2016_04_16
name_2016_04_16
name_2016_04_16
And does not really extract the name and date.
Thank you!

I would first strip the file extension, then split on underscores and drop the 'name' field. Finally, I would join the remaining parts with slashes (maybe this value could be logged) and parse the date with the datetime library:
import os
from datetime import datetime
file_name = os.path.splitext("name_2016_04_16.txt")[0]
date_string = '/'.join(file_name.split("_")[1::])
parsed_date = datetime.strptime(date_string, "%Y/%m/%d")
To make the date string SQL-friendly, I found this Stack Overflow post, "Inserting a Python datetime.datetime object into MySQL", which suggests that the following should work:
sql_friendly_string = parsed_date.strftime('%Y-%m-%d %H:%M:%S')
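As a rough end-to-end sketch of the steps above, the snippet below extracts both the prefix and the date and stores them through a parameterized query. It uses an in-memory SQLite database purely for illustration; the table name `files` and its columns are made up for this example, and the original question does not say which database is in use.

```python
import os
import sqlite3
from datetime import datetime

file_name = os.path.splitext("name_2016_04_16.txt")[0]

# Split once on the first underscore: prefix on the left, date on the right.
# Assumes the prefix itself contains no underscore.
prefix, date_part = file_name.split("_", 1)
parsed_date = datetime.strptime(date_part, "%Y_%m_%d")

# Hypothetical table, used only to show a parameterized insert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, file_date TEXT)")
conn.execute(
    "INSERT INTO files (name, file_date) VALUES (?, ?)",
    (prefix, parsed_date.date().isoformat()),  # ISO 8601 text: 2016-04-16
)
row = conn.execute("SELECT name, file_date FROM files").fetchone()
print(row)  # ('name', '2016-04-16')
```

Passing the values as query parameters rather than formatting them into the SQL string leaves the quoting to the database driver.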

How about simply doing this?
filename = 'name_2016_04_16.txt'
date = filename[-14:-4] # counting from the end extracts the date no matter what the "name" is or how long it is
prefix = filename[:-15] # everything before the underscore that precedes the date
from datetime import datetime
date = datetime.strptime(date, '%Y_%m_%d') # this turns the string into a datetime object
(This assumes the file name always ends with a YYYY_MM_DD date followed by a four-character extension such as .txt. The slicing and strptime calls behave the same on Python 2.7 and Python 3.)

You can split the filename on "." Then split again on "_". This should give you a list of strings. The first being the name, second through fourth being the year, month and day respectively. Then convert the date to SQL friendly form.
Something like this:
rawname = "name_2016_04_16.txt"
filename = rawname.split(".")[0] # drop the .txt
name = filename.split("_")[0] # provided there's no underscore in the name part of the filename
year = filename.split("_")[1]
month = filename.split("_")[2]
day = filename.split("_")[3]
datestring = (month, day, year) # temporary tuple holding the parts in the required order
date = "/".join(datestring) # as shown above
datestring = (year, month, day)
SQL_date = "-".join(datestring) # SQL date
print(name)
print(date)
print(SQL_date)
If you would rather have a real datetime object than a string, use the datetime library.
You can then do something like this:
from datetime import datetime
SQL_date = datetime.strptime(date, '%m/%d/%Y')
This is the most explicit way I can think of right now. I'm sure there are shorter ways :)
Apologies for the bad formatting, I'm posting on mobile.
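If the name part might itself contain underscores, one possible variation is to split from the right instead, so that only the last three fields are treated as the date. This is a sketch, not something from the answers above; the helper name `split_name_and_date` and the sample file name are made up for illustration.

```python
from datetime import datetime

def split_name_and_date(rawname):
    """Return (name, datetime) for names shaped like <name>_YYYY_MM_DD.<ext>."""
    stem = rawname.rsplit(".", 1)[0]                 # drop the extension
    name, year, month, day = stem.rsplit("_", 3)     # last three fields are the date
    return name, datetime.strptime(f"{year}-{month}-{day}", "%Y-%m-%d")

name, date = split_name_and_date("my_report_2016_04_16.txt")
print(name)                        # my_report
print(date.strftime("%m/%d/%Y"))   # 04/16/2016
```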

Related

Convert custom string date to date

Is there a way to convert a string date that is stored in some non-traditional custom manner into a date using datetime (or something equivalent)? The dates I am dealing with are S3 partitions that look like this:
year=2023/month=2/dayofmonth=3
I can accomplish this with several replaces, but I'm hoping to find a clean single operation to do this.
You might provide datetime.datetime.strptime with format string holding text, in this case
import datetime
dt = datetime.datetime.strptime("year=2023/month=2/dayofmonth=3","year=%Y/month=%m/dayofmonth=%d")
d = dt.date()
print(d) # 2023-02-03
You can do that by converting your string into a datetime object with the strptime() method of datetime.
strptime() takes two arguments: the first is the string to be parsed, and the second is a string with the format.
Here's an example:
from datetime import datetime
# your string
date_string = "year=2023/month=2/dayofmonth=3"
# parse the string into a datetime object
date = datetime.strptime(date_string, "year=%Y/month=%m/dayofmonth=%d")
# print the datetime object
print(date)
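If you also need to go the other way and build a partition path from a date, a small round-trip sketch might look like the following. The helper names are hypothetical, and the f-string is used instead of strftime because the partition values here are not zero-padded.

```python
from datetime import date, datetime

PARTITION_FORMAT = "year=%Y/month=%m/dayofmonth=%d"

def partition_to_date(partition):
    # strptime accepts non-padded values such as month=2 for %m
    return datetime.strptime(partition, PARTITION_FORMAT).date()

def date_to_partition(d):
    # Build the path by hand so month and day stay unpadded
    return f"year={d.year}/month={d.month}/dayofmonth={d.day}"

d = partition_to_date("year=2023/month=2/dayofmonth=3")
print(d)                     # 2023-02-03
print(date_to_partition(d))  # year=2023/month=2/dayofmonth=3
```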

replace the date section of a string in python

Say I have a string 'Tpsawd_20220320_default_economic_v5_0.xls'.
I want to replace the date part (20220320) with a date variable (i.e. if I define date = 20220410, it will replace 20220320 with this date). How should I do it with a built-in Python package? Please note the date location in the string can vary. It might be 'Tpsawd_default_economic_v5_0_20220320.xls' or 'Tpsawd_default_economic_20220320_v5_0.xls'.
Yes, this can be done with regex fairly easily~
import re
s = 'Tpsawd_20220320_default_economic_v5_0.xls'
date = '20220410'
s = re.sub(r'\d{8}', date, s)
print(s)
Output:
Tpsawd_20220410_default_economic_v5_0.xls
This replaces every run of 8 consecutive digits in the string with the given string, in this case date; here there is only one such run. Pass count=1 to re.sub if you only want the first match replaced.
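A slightly stricter variant, sketched below, only matches a run of exactly 8 digits (not 8 digits inside a longer number), replaces just the first match, and checks that the new value is a real date. The function name `replace_date` is made up for this example.

```python
import re
from datetime import datetime

def replace_date(name, new_date):
    # Raises ValueError if new_date is not a valid YYYYMMDD date
    datetime.strptime(new_date, "%Y%m%d")
    # Lookarounds ensure the 8 digits are not part of a longer digit run
    return re.sub(r"(?<!\d)\d{8}(?!\d)", new_date, name, count=1)

names = [
    "Tpsawd_20220320_default_economic_v5_0.xls",
    "Tpsawd_default_economic_v5_0_20220320.xls",
    "Tpsawd_default_economic_20220320_v5_0.xls",
]
for n in names:
    print(replace_date(n, "20220410"))
```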

I am having trouble formatting a date using datetime

I am getting a date from an API that I am trying to pass to my template after formatting using datetime but I keep getting this error:
time data 2021-03-09T05:00:00.000Z does not match format %Y-%m-%d, %I:%M:%S;%f
I know I have to strftime and then strptime, but I can't get past that error.
I would like to split it into two variables, one for the date and one for the time, that will show in the user's timezone.
date = game['schedule']['date']
game_date = datetime.strptime(date, '%Y-%m-%d, %I:%M:%S;%f')
You have a slightly wrong time format:
game_date = datetime.strptime(date, '%Y-%m-%dT%H:%M:%S.%fZ')
Or (if you remove the last Z character from the date string) you can also use datetime.fromisoformat:
game_date = datetime.fromisoformat(date[:-1])
And then you can extract date and time this way:
date = game_date.date()
time = game_date.time()
time_with_timezone = game_date.timetz()
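Note that a literal Z in the strptime format produces a naive datetime, so timetz() will carry no timezone. To show the value in the user's timezone, one option is to mark the parsed value as UTC and then convert it. The sketch below assumes a fixed UTC-5 offset for the user, which is only a placeholder; on Python 3.9+ you could use zoneinfo.ZoneInfo with a named zone instead.

```python
from datetime import datetime, timedelta, timezone

raw = "2021-03-09T05:00:00.000Z"

# The trailing Z means UTC, so attach that explicitly after parsing
utc_dt = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)

# Assumption: the user's timezone is a fixed UTC-5 offset
user_tz = timezone(timedelta(hours=-5))
local_dt = utc_dt.astimezone(user_tz)

local_date = local_dt.date()
local_time = local_dt.time()
print(local_date, local_time)  # 2021-03-09 00:00:00
```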

Adding day and month fields to date field which doesn't contain them

Say we have a list like this:
['1987', '1994-04', '2001-05-03']
We would like to convert these strings into datetime objects with a consistent format, then back into strings, something like this:
['1987-01-01', '1994-04-01', '2001-05-03']
In this case, we have decided if the date doesn't contain a month or a day, we set it to the first of that respective field. Is there a way to achieve this using a datetime or only by string detection?
Read about datetime module to understand the basics of handling dates and date-like strings in Python.
The approach below, using try-except, is one of many ways to achieve the desired outcome:
from datetime import datetime
strings = ['1987', '1994-04', '2001-05-03']
INPUT_FORMATS = ('%Y', '%Y-%m', '%Y-%m-%d')
OUTPUT_FORMAT = '%Y-%m-%d'
output = []
for s in strings:
    dt = None
    for f in INPUT_FORMATS:
        try:
            dt = datetime.strptime(s, f)
            break
        except ValueError:
            pass
        except:
            raise
    if not dt:
        raise ValueError("%s doesn't match any of the formats" % s)
    output.append(dt.date().strftime(OUTPUT_FORMAT))
print(output)
Output:
['1987-01-01', '1994-04-01', '2001-05-03']
The above code works only with the date formats listed in the INPUT_FORMATS variable. You can add further formats to it if you know them beforehand.
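If the inputs are always some prefix of YYYY-MM-DD, another possible approach is to split on the hyphen and fill the missing fields with 1 before building a date. This is a sketch under that assumption; the helper name `pad_partial_date` is made up.

```python
from datetime import date

def pad_partial_date(s):
    # "1994-04" -> [1994, 4]; missing month/day default to 1
    parts = [int(p) for p in s.split("-")]
    parts += [1] * (3 - len(parts))
    # date() validates the values and raises on anything out of range
    return date(*parts).isoformat()

strings = ['1987', '1994-04', '2001-05-03']
output = [pad_partial_date(s) for s in strings]
print(output)  # ['1987-01-01', '1994-04-01', '2001-05-03']
```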

strip date with -07:00 timezone format python

I have a variable 'd' that contains dates in this format:
2015-08-03T09:00:00-07:00
2015-08-03T10:00:00-07:00
2015-08-03T11:00:00-07:00
2015-08-03T12:00:00-07:00
2015-08-03T13:00:00-07:00
2015-08-03T14:00:00-07:00
etc.
I need to strip these dates, but I'm having trouble because of the timezone. If I use d = dt.datetime.strptime(d[:19],'%Y-%m-%dT%H:%M:%S'), only the first 19 characters will appear and the rest of the dates are ignored. If I try d = dt.datetime.strptime(d[:-6],'%Y-%m-%dT%H:%M:%S'), Python doesn't chop off the timezone and I still get the error ValueError: unconverted data remains: -07:00. I don't think I can use the dateutil parser because I've only seen it used for one date instead of a whole list like I have. What can I do? Thanks!
Since you have a list just iterate over and use dateutil.parser:
d = ["2015-08-03T09:00:00-07:00","2015-08-03T10:00:00-07:00","2015-08-03T11:00:00-07:00","2015-08-03T12:00:00-07:00",
"2015-08-03T13:00:00-07:00","2015-08-03T14:00:00-07:00"]
from dateutil import parser
for dte in d:
    print(parser.parse(dte))
If for some reason you actually want to ignore the timezone you can use rsplit with datetime.strptime:
from datetime import datetime
for dte in d:
    print(datetime.strptime(dte.rsplit("-",1)[0],"%Y-%m-%dT%H:%M:%S"))
If you had a single string delimited by commas then just use d.split(",")
You can use strftime to format the string in any format you want if you actually want a string:
for dte in d:
    print(datetime.strptime(dte.rsplit("-",1)[0],"%Y-%m-%dT%H:%M:%S").strftime("%Y-%m-%d %H:%M:%S"))
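On Python 3.7 or later, the standard library can also parse this format without dateutil: datetime.fromisoformat accepts an offset such as -07:00. A minimal sketch, assuming d is a list of strings as above:

```python
from datetime import datetime

d = ["2015-08-03T09:00:00-07:00", "2015-08-03T10:00:00-07:00"]

# Timezone-aware datetimes, offset preserved
parsed = [datetime.fromisoformat(s) for s in d]

# Drop the offset if you really want naive datetimes
naive = [dt.replace(tzinfo=None) for dt in parsed]

print(parsed[0])  # 2015-08-03 09:00:00-07:00
print(naive[0])   # 2015-08-03 09:00:00
```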
