replace the date section of a string in python

I have a string 'Tpsawd_20220320_default_economic_v5_0.xls'.
I want to replace the date part (20220320) with a date variable (i.e. if I define date = 20220410, it should replace 20220320 with this date). How should I do it with built-in Python modules? Please note the date location in the string can vary. It might be 'Tpsawd_default_economic_v5_0_20220320.xls' or 'Tpsawd_default_economic_20220320_v5_0.xls'.

Yes, this can be done fairly easily with a regex:
import re
s = 'Tpsawd_20220320_default_economic_v5_0.xls'
date = '20220410'
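s = re.sub(r'\d{8}', date, s, count=1)  # count=1: replace only the first match
print(s)
Output:
Tpsawd_20220410_default_economic_v5_0.xls
This replaces the first run of 8 consecutive digits with the given string, in this case date. Without count=1, re.sub would replace every such run in the string.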
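If the file name might also contain longer digit runs (an ID, for example), a slightly more defensive sketch uses lookarounds so that only a standalone block of exactly eight digits is replaced. It assumes the date is always written as YYYYMMDD with no digit directly before or after it; replace_date is a hypothetical helper name. If your date is a datetime.date, format it first with date.strftime('%Y%m%d').

```python
import re

# Exactly eight digits, not preceded or followed by another digit,
# so longer digit runs (e.g. 123456789) are left untouched.
DATE_RE = re.compile(r'(?<!\d)\d{8}(?!\d)')

def replace_date(filename, new_date):
    """Replace the first standalone 8-digit run in filename with new_date."""
    return DATE_RE.sub(new_date, filename, count=1)

for name in ['Tpsawd_20220320_default_economic_v5_0.xls',
             'Tpsawd_default_economic_v5_0_20220320.xls',
             'Tpsawd_default_economic_20220320_v5_0.xls']:
    print(replace_date(name, '20220410'))
```

Because the pattern does not depend on the surrounding underscores, it works wherever the date sits in the name.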

Related

How to identify multiple dates within a python string?

For a given string, I want to identify the dates in it.
import datefinder
string = str("/plot 23/01/2023 24/02/2021 /cmd")
matches = list(datefinder.find_dates(string))
if matches:
    print(matches)
else:
    print("no date in string")
Output:
no date in string
However, there are clearly dates in the string. Ultimately I want to store the oldest date in a variable Date1 and the newest date in a variable Date2.
I believe that if a string contains multiple dates, datefinder is unable to parse it. In your case, splitting the string with string.split() and applying find_dates to each piece should do the job.
You've only given 1 example, but based on that example, you can use regex.
import re
from datetime import datetime
string = "/plot 23/01/2023 24/02/2021 /cmd"
dates = [datetime.strptime(d, "%d/%m/%Y") for d in re.findall(r"\d{2}/\d{2}/\d{4}", string)]
print(f"earliest: {min(dates)}, latest: {max(dates)}")
Output
earliest: 2021-02-24 00:00:00, latest: 2023-01-23 00:00:00
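To store the results in Date1 and Date2 as the question asks, a small extension of the regex answer could look like this. It is a sketch that assumes day-first dd/mm/yyyy dates; the pattern is loosened to allow one- or two-digit days and months, and a match that is not a real date (e.g. 31/02/2021) would make strptime raise ValueError.

```python
import re
from datetime import datetime

string = "/plot 23/01/2023 24/02/2021 /cmd"

# Day and month may have one or two digits; \b stops matches inside longer numbers.
pattern = r"\b\d{1,2}/\d{1,2}/\d{4}\b"
dates = [datetime.strptime(d, "%d/%m/%Y") for d in re.findall(pattern, string)]

if dates:
    Date1 = min(dates)  # oldest
    Date2 = max(dates)  # newest
    print(f"Date1 (oldest): {Date1:%d/%m/%Y}, Date2 (newest): {Date2:%d/%m/%Y}")
else:
    print("no date in string")
```

For the example string this prints Date1 (oldest): 24/02/2021, Date2 (newest): 23/01/2023.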

extract datetime from a folder path string

I have a list of files that are arranged in the following format:
'folder/sensor_01/2021/12/31/005_6_0.csv.gz',
'folder/sensor_01/2022/01/01/005_0_0.csv.gz',
'folder/sensor_01/2022/01/02/005_1_0.csv.gz',
'folder/sensor_01/2022/01/03/005_4_0.csv.gz',
....
Now, what I want to do is filter the entries that fall within a time range. In each path, the segment after sensor_01 and before 005 gives the date (to day resolution).
I am getting stuck on how to extract this segment from the folder path and convert it to a Python datetime object. I think I can then use the comparison operators to filter the entries.
What you need is string-to-datetime parsing.
Split
You can split the text to get the Year, Month, and Day part.
file = 'folder/sensor_01/2021/12/31/005_6_0.csv.gz'
file.split("/")
# ['folder', 'sensor_01', '2021', '12', '31', '005_6_0.csv.gz']
Here the elements at indices 2, 3 and 4 are the year, month and day.
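Turning those pieces into a datetime is then a matter of converting them to integers. This sketch counts from the end of the split list instead of from the start, on the assumption that every path ends in year/month/day/filename; that keeps it working even if the folder prefix has a different depth.

```python
from datetime import datetime

file = 'folder/sensor_01/2021/12/31/005_6_0.csv.gz'

# The last four components are year, month, day and the file name.
year, month, day = (int(part) for part in file.split("/")[-4:-1])
file_date = datetime(year, month, day)
print(file_date)  # 2021-12-31 00:00:00
```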
Or
strptime
See https://stackoverflow.com/a/466376/2681662. You can create a datetime object from a string with datetime.strptime, and the format string may contain any literal text, so the separators between year, month, and day can be anything.
So:
from datetime import datetime
file = 'folder/sensor_01/2021/12/31/005_6_0.csv.gz'
datetime.strptime(file, 'folder/sensor_01/%Y/%m/%d/005_6_0.csv.gz')  # valid, but the literal parts must match this exact path
# datetime.datetime(2021, 12, 31, 0, 0)
This is easily done using regex.
\S+\/sensor_[\d]+\/([\S\/]+)\/[\S_]+\.csv\.gz
I have used this regex to match and group the date portion of one of the strings.
In [11]: import re
In [12]: string = 'folder/sensor_01/2021/12/31/005_6_0.csv.gz'
In [13]: reg = r'\S+\/sensor_[\d]+\/([\S\/]+)\/[\S_]+\.csv\.gz'
In [15]: re.match(reg, string).groups()[0]
Out[15]: '2021/12/31'
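Once the date segment is captured, it can be parsed and compared to filter the list, which is what the question is ultimately after. A minimal sketch, assuming the date is always YYYY/MM/DD right after a sensor_NN folder; DATE_IN_PATH, path_date and the start/end values are illustrative choices, not part of the original answer.

```python
import re
from datetime import datetime

files = [
    'folder/sensor_01/2021/12/31/005_6_0.csv.gz',
    'folder/sensor_01/2022/01/01/005_0_0.csv.gz',
    'folder/sensor_01/2022/01/02/005_1_0.csv.gz',
    'folder/sensor_01/2022/01/03/005_4_0.csv.gz',
]

# Capture the YYYY/MM/DD segment between "sensor_NN/" and the file name.
DATE_IN_PATH = re.compile(r'/sensor_\d+/(\d{4}/\d{2}/\d{2})/')

def path_date(path):
    """Return the date encoded in path as a datetime, or None if absent."""
    m = DATE_IN_PATH.search(path)
    return datetime.strptime(m.group(1), '%Y/%m/%d') if m else None

start = datetime(2022, 1, 1)  # example range, inclusive on both ends
end = datetime(2022, 1, 2)

selected = []
for f in files:
    d = path_date(f)
    if d is not None and start <= d <= end:
        selected.append(f)

print(selected)
```

With this range, only the 2022/01/01 and 2022/01/02 files are kept.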

Pandas sets datetime to first day of month if missing day?

When I use pandas to convert a date string, it sets the day to the first of the month if the day is missing.
For example:
pd.to_datetime('2017-06')
OUT[]: Timestamp('2017-06-01 00:00:00')
Is there a way to have it use the 15th (middle) day of the month?
EDIT:
I only want it to use day 15 if the day is missing, otherwise use the actual date - so offsetting all values by 15 won't work.
While this isn't possible with the pd.to_datetime call alone, you could use a regex to check whether the string contains a day component and proceed accordingly. Note: this code only works for '-'-delimited dates:
import re
import pandas as pd
date_str = '2017-06'
if not re.match(r'.+-.+-.+', date_str):
    # no day component: parse, then move to the 15th
    ts = pd.to_datetime(date_str).replace(day=15)
else:
    ts = pd.to_datetime(date_str)
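The same fallback can be expressed with only the standard library by trying the full format first and falling back to year-month. This is a sketch assuming the inputs are either YYYY-MM-DD or YYYY-MM; parse_mid_month is a hypothetical name. If you need a pandas object, the resulting datetime can be passed to pd.Timestamp.

```python
from datetime import datetime

def parse_mid_month(date_str):
    """Parse 'YYYY-MM-DD' as-is; for 'YYYY-MM', default the day to 15."""
    try:
        return datetime.strptime(date_str, '%Y-%m-%d')
    except ValueError:
        # No day component: parse year-month, then move to the 15th.
        return datetime.strptime(date_str, '%Y-%m').replace(day=15)

print(parse_mid_month('2017-06'))     # 2017-06-15 00:00:00
print(parse_mid_month('2017-06-03'))  # 2017-06-03 00:00:00
```

A string matching neither format still raises ValueError, so bad input is not silently accepted.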

Get filename and date time from string

I have file names in the following format: name_2016_04_16.txt
I'm working with Python 3 and I would like to extract two things from this file name: the prefix (the name value) as a string, and the date represented in the string as a datetime value. For the example above, I would like to extract:
filename: name as a string
date: 04/16/2016 as a datetime
I will be saving these values to a database, so I would like the datetime value to be SQL-friendly.
Is there a library that can help me do this? Or is there a simple way of going about this?
I tried the following as suggested:
filename = os.path.splitext(filename)[0]
print(filename)
filename.split("_")[1::]
print(filename)
'/'.join(filename.split("_")[1::])
print(filename)
But it outputs:
name_2016_04_16
name_2016_04_16
name_2016_04_16
And does not really extract the name and date.
Thank you!
I would first strip the file extension, then split on underscores, dropping the 'name' field. Finally, I would join the parts with slashes (this value could be logged) and parse the date with the datetime module.
import os
from datetime import datetime
file_name = os.path.splitext("name_2016_04_16.txt")[0]
date_string = '/'.join(file_name.split("_")[1::])
parsed_date = datetime.strptime(date_string, "%Y/%m/%d")
To make the date string SQL-friendly, I found this SO post, Inserting a Python datetime.datetime object into MySQL, which suggests that the following should work:
sql_friendly_string = parsed_date.strftime('%Y-%m-%d %H:%M:%S')
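The answer above parses the date but drops the name. A sketch that returns both, assuming names always end in _YYYY_MM_DD followed by an extension; split_name_and_date is a hypothetical helper. Using rsplit with maxsplit=3 means the prefix may itself contain underscores.

```python
import os
from datetime import datetime

def split_name_and_date(path):
    """Split 'prefix_YYYY_MM_DD.ext' into (prefix, datetime)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Peel off only the last three fields, keeping any underscores in the prefix.
    prefix, year, month, day = stem.rsplit("_", 3)
    return prefix, datetime(int(year), int(month), int(day))

name, date = split_name_and_date("name_2016_04_16.txt")
print(name)                                # name
print(date.strftime('%m/%d/%Y'))           # 04/16/2016
print(date.strftime('%Y-%m-%d %H:%M:%S'))  # 2016-04-16 00:00:00
```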
How about simply doing this?
from datetime import datetime
filename = 'name_2016_04_16.txt'
date = filename[-14:-4]  # counting from the end extracts the date no matter how long the "name" is
prefix = filename[:-15]  # everything before the date, without the trailing underscore
date = datetime.strptime(date, '%Y_%m_%d')  # turn the string into a datetime object
(This works on both Python 2.7 and Python 3.)
You can split the filename on ".", then split again on "_". This gives you a list of strings: the first is the name, and the second through fourth are the year, month and day respectively. Then convert the date to SQL-friendly form.
Something like this:
rawname = "name_2016_04_16.txt"
filename = rawname.split(".")[0]  # drop the .txt
name = filename.split("_")[0]  # assumes there's no underscore in the name part of the filename
year = filename.split("_")[1]
month = filename.split("_")[2]
day = filename.split("_")[3]
datestring = (month, day, year)  # temporary tuple holding the parts in the required order
date = "/".join(datestring)  # 04/16/2016, as shown above
datestring = (year, month, day)
SQL_date = "-".join(datestring)  # SQL date: 2016-04-16
print(name)
print(date)
print(SQL_date)
If you want an actual datetime object rather than a string, use the datetime module.
You can then do something like this:
from datetime import datetime
SQL_date = datetime.strptime(date, '%m/%d/%Y')
This is the most explicit way I can think of right now. I'm sure there are shorter ways :)
Apologies for the bad formatting, I'm posting on mobile.

Changing list answers in python

I've been trying to insert rows into a MySQL table using Python. I'm trying to create a list of all dates from April 2016 to now so I can insert them individually with an SQL INSERT. I searched, but I couldn't find how to change each list value depending on whether it has 1 or 2 digits:
dates = ['2016-04-'+str(i+1) for i in range(9,30)]
I would like a leading 0 added whenever i is a single digit (e.g. 1, 2, 3),
and the number left as-is when it has two digits (e.g. 10, 11, 12).
dates = ['2016-04-' + '{:02d}'.format(i) for i in range(9, 30)]
>>> print(dates)
['2016-04-09', '2016-04-10', '2016-04-11', '2016-04-12', '2016-04-13', '2016-04-14', '2016-04-15', '2016-04-16', '2016-04-17', '2016-04-18', '2016-04-19', '2016-04-20', '2016-04-21', '2016-04-22', '2016-04-23', '2016-04-24', '2016-04-25', '2016-04-26', '2016-04-27', '2016-04-28', '2016-04-29']
Using C style formatting, all the dates in April:
dates = ['2016-04-%02d'%i for i in range(1,31)]
Need range(1,31) since the last value in the range is not used, or use range(30) and add 1 to i.
The same using .format():
dates = ['2016-04-{:02}'.format(i) for i in range(1,31)]
You can use the dateutil module (from the third-party python-dateutil package):
from datetime import datetime
from dateutil.rrule import rrule, DAILY
start_date = datetime(2016, 4, 1)  # Python 3 rejects literals with leading zeros such as 04
w = [each.strftime('%Y-%m-%d') for each in rrule(freq=DAILY, dtstart=start_date, until=datetime(2016, 5, 9))]
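If you would rather avoid the third-party dependency, the standard library can generate the same daily range with timedelta. This is a sketch; daily_dates is a hypothetical helper, and the fixed end date below is only for illustration (use date.today() for "April 2016 to now"). Incrementing a real date also handles month lengths for you, which the string-building answers above do not.

```python
from datetime import date, timedelta

def daily_dates(start, end):
    """Yield each date from start to end inclusive as 'YYYY-MM-DD'."""
    current = start
    while current <= end:
        yield current.isoformat()
        current += timedelta(days=1)

# For "April 2016 to now", pass date.today() as the end.
dates = list(daily_dates(date(2016, 4, 1), date(2016, 5, 9)))
print(dates[0], dates[-1], len(dates))  # 2016-04-01 2016-05-09 39
```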
