I am trying to convert a column of dates from MonthYear form (e.g. December2009) to mm/dd/yyyy. I can do it with string replacement, but it takes 157 lines of code to change all the data. I want to take the month and year and output the second Wednesday of that month in mm/dd/yyyy form. Is that possible?
I am currently using this code:
df['Column']=df['Column'].str.replace("December2009", "12/11/2009")
I don't know of a standard library tool for this, but it's easy to make your own, something like this:
from datetime import datetime, timedelta
import pandas as pd
test_arr = ['December2009', 'August2012', 'March2015']
def replacer(d):
    # take a date string of format %B%Y and find the second Wednesday
    dt = datetime.strptime(d, '%B%Y')
    x = 0
    # start at day 1 and step forward one day at a time until the conditions are satisfied
    while True:
        s = dt.strftime('%A')
        if s == 'Wednesday':
            x += 1  # if a Wednesday is found, increment the counter
            if x == 2:
                break  # when two Wednesdays have been found, stop
        dt += timedelta(days=1)
    return dt.strftime('%m/%d/%Y')
df = pd.DataFrame(test_arr, columns=['a'])
df['a'] = df['a'].apply(replacer)  # .apply() applies the given Python function to each element in the column
Maybe the calendar module, as recommended in the other comments, could make the code look nicer, but I'm unfamiliar with it, so it might be something you want to look into to improve the solution.
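Following up on the calendar suggestion, here is a minimal sketch that computes the second Wednesday arithmetically instead of stepping through the days. It assumes every value matches the `%B%Y` pattern with English month names (`%B` is locale-dependent), and `second_wednesday` is a hypothetical helper name.

```python
import calendar
from datetime import date, datetime


def second_wednesday(month_year: str) -> str:
    """Return the second Wednesday of a 'MonthYear' string such as 'December2009' as mm/dd/yyyy."""
    first = datetime.strptime(month_year, '%B%Y')  # day defaults to the 1st
    # days from the 1st to the first Wednesday (weekday(): Monday=0 ... Sunday=6)
    offset = (calendar.WEDNESDAY - first.weekday()) % 7
    # the first Wednesday is day 1 + offset; the second is one week later
    return date(first.year, first.month, 1 + offset + 7).strftime('%m/%d/%Y')


for s in ['December2009', 'August2012', 'March2015']:
    print(s, second_wednesday(s))
```

With the question's DataFrame, `df['Column'] = df['Column'].apply(second_wednesday)` would then replace the long chain of `str.replace` calls.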
Related
I am creating an app where I have two values: a committee starting date, e.g. 2022,01,02, and how many months it will continue (here, 4 months). I save some data in my database month by month, and these dates are saved too. The issue is that I only get the right result if the number of months is less than or equal to 12, using this:
number of members = 12
starting date = 2022,01,01
for i in range(1,17):
    print('date', (2022,i,10))
But the issue comes when the number of months is greater than 12: the dates start printing (2022,13,10), which is wrong because I also want the year to increment to 2023. This also feels like an inefficient way to do it. Can anyone tell me if there is another way?
While you can use the datetime library to handle dates, it doesn't provide any methods to increase dates month by month.
Now, previous suggestions/answers suggest you increase the year when month == 12, but that will cause December to be skipped. Also, your code doesn't take the month of the starting date into account. So a better solution would be:
>>> year = 2022
>>> month = 7
>>> day = 23
>>>
>>> for i in range(1, 8):
...     month += 1
...     if month == 13:
...         month = 1
...         year += 1
...     print(f'{year}-{month}-{day}')
...
2022-8-23
2022-9-23
2022-10-23
2022-11-23
2022-12-23
2023-1-23
2023-2-23
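The same rollover can be written without the `if` by counting months from zero and letting `divmod` split the total into whole years and a remaining month. A minimal sketch; `add_months` is a hypothetical helper name, and the day is left untouched, so it assumes a day that exists in every month (such as 10 or 23).

```python
def add_months(year: int, month: int, n: int) -> tuple:
    """Return (year, month) shifted forward by n months."""
    years, month_index = divmod(month - 1 + n, 12)  # month_index runs 0..11
    return year + years, month_index + 1


# Same schedule as the loop above: start in July 2022 and step 7 months forward
for i in range(1, 8):
    y, m = add_months(2022, 7, i)
    print(f'{y}-{m}-23')
```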
You could do something like this:
date = [2022, 1, 10]  # [year, month, day]
for i in range(1, 17):
    print('date', tuple(date))
    date[1] += 1
    if date[1] > 12:  # past December: roll over to January of the next year
        date[0] += 1
        date[1] = 1
By the tone of your question, I think you are a beginner, so I won't recommend the datetime module, and I appreciate that you tried to do it on your own.
What I don't understand is why you can't just use if statements and create variables for the year and month:
yr = 2022
mn = 1
dt = 1
for i in range(1,17):
    print('date', (yr,mn,dt))
    if i % 12 == 0:
        yr += 1  # after every 12 months, move on to January of the next year
        mn = 1
    else:
        mn += 1
I also want to share the more modern approach using the datetime module. It requires an extra package, though.
In your command prompt, enter the command
pip install python-dateutil
Once it is installed, close the command prompt and restart your IDE.
This is the code you may want to use:
from datetime import datetime
from dateutil.relativedelta import relativedelta
date_time = datetime(2022, 1, 1)  # creating a datetime object
for i in range(1, 17):
    date = date_time.date()
    print(date)
    date_time = date_time + relativedelta(months=1)
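If installing python-dateutil is not an option, a standard-library sketch can mimic the part of `relativedelta(months=n)` used here: like relativedelta, it clamps the day to the last day of the target month, so January 31 plus one month becomes February 28 (or 29). `shift_months` is a hypothetical name.

```python
import calendar
from datetime import date


def shift_months(d: date, n: int) -> date:
    """Return d moved n months forward, clamping the day to the target month's length."""
    years, month_index = divmod(d.month - 1 + n, 12)
    year, month = d.year + years, month_index + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in the target month
    return date(year, month, min(d.day, last_day))


start = date(2022, 1, 1)
for i in range(16):
    print(shift_months(start, i))
```

Shifting from the fixed `start` by `i` months, rather than repeatedly adding one month to the previous result, avoids drift: a 31st clamped to the 28th in February would otherwise stay on the 28th for every later month.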
My working code below calculates date/month ranges, but it uses the pandas library, which I want to get rid of.
import pandas as pd
from pandas.tseries.offsets import MonthEnd
dates=pd.date_range("2019-11","2020-02",freq='MS').strftime("%Y%m%d").tolist()
#print dates : ['20191101','20191201','20200101','20200201']
df=(pd.to_datetime(dates,format="%Y%m%d") + MonthEnd(1)).strftime("%Y%m%d").tolist()
#print df : ['20191130','20191231','20200131','20200229']
How can I rewrite this code without using pandas?
I don't want to use the pandas library because I trigger my job through Oozie, and pandas isn't installed on all our nodes.
Pandas offers some nice functionalities when using datetimes which the standard library datetime module does not have (like the frequency or the MonthEnd). You have to reproduce these yourself.
import datetime as DT
def next_first_of_the_month(dt):
    """return a new datetime where the month has been increased by 1 and
    the day is always the first
    """
    new_month = dt.month + 1
    if new_month == 13:
        new_year = dt.year + 1
        new_month = 1
    else:
        new_year = dt.year
    return DT.datetime(new_year, new_month, day=1)
start, stop = [DT.datetime.strptime(dd, "%Y-%m") for dd in ("2019-11", "2020-02")]
dates = [start]
cd = next_first_of_the_month(start)
while cd <= stop:
    dates.append(cd)
    cd = next_first_of_the_month(cd)
str_dates = [d.strftime("%Y%m%d") for d in dates]
print(str_dates)
# prints: ['20191101', '20191201', '20200101', '20200201']
end_dates = [next_first_of_the_month(d) - DT.timedelta(days=1) for d in dates]
str_end_dates = [d.strftime("%Y%m%d") for d in end_dates]
print(str_end_dates)
# prints ['20191130', '20191231', '20200131', '20200229']
Here I used a function to get a datetime corresponding to the first day of the month following the input datetime. Sadly, timedelta does not work with months, and adding 30 days is of course not feasible (not all months have 30 days).
Then a while loop builds the sequence of first days of the month up to the stop date.
And to get the end of the month, again take the next first day of the month for each datetime in your list and subtract one day.
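An alternative for the month-end step: `calendar.monthrange(year, month)` returns the weekday of the 1st and the number of days in the month, so the last day can be read off directly instead of subtracting a day from the next month's start. A stdlib-only sketch, assuming `'YYYY-MM'` inputs as in the answer; `month_bounds` is a hypothetical name.

```python
import calendar
from datetime import date


def month_bounds(start: str, stop: str):
    """Yield (first_day, last_day) for each month from start to stop inclusive, both 'YYYY-MM'."""
    year, month = map(int, start.split('-'))
    stop_key = tuple(map(int, stop.split('-')))
    while (year, month) <= stop_key:
        days_in_month = calendar.monthrange(year, month)[1]  # accounts for leap years
        yield date(year, month, 1), date(year, month, days_in_month)
        # advance to the next month, rolling December over into January
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


bounds = list(month_bounds('2019-11', '2020-02'))
starts = [first.strftime('%Y%m%d') for first, _ in bounds]
ends = [last.strftime('%Y%m%d') for _, last in bounds]
print(starts)  # ['20191101', '20191201', '20200101', '20200201']
print(ends)    # ['20191130', '20191231', '20200131', '20200229']
```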
I want to parse a date string and manipulate the year, month, date in cases where I either get '00' for month or day or in cases where I get a day beyond the possible days of that year/month. Given a '2012-00-00' or a '2020-02-31', I get a ValueError. What I want, is to catch the error and then turn the former into '2012-01-01' and the latter to '2020-02-29'. No results on Google so far.
Clarification: I use try/except/ValueError... what I want is to parse out the year, month, day and fix the day or month when they are having a ValueError... without having to code the parsing and regular expressions myself... which defeats the purpose of using a library to begin with.
import dateutil.parser
from datetime import datetime
# Try dateutil
blah = dateutil.parser.parse(date_string, fuzzy=True)
print(blah)
# Try datetime
date_object = datetime.strptime(date_string, date_format)
return_date_string = date_object.date().strftime('%Y-%m-%d')
I know you don't want to parse the date yourself, but I think you will probably have to. One option is to split the incoming string into its year, month and day parts and check them against valid values, adjusting as required. You can then create a date from those parts and call strftime to get a valid date string:
from datetime import datetime, date
import calendar
def parse_date(dt):
    [y, m, d] = map(int, dt.split('-'))
    # optional error checking on y
    # ...
    # check month
    m = 1 if m == 0 else 12 if m > 12 else m
    # check day
    last = calendar.monthrange(y, m)[-1]
    d = 1 if d == 0 else last if d > last else d
    return date(y, m, d).strftime('%Y-%m-%d')
print(parse_date('2012-00-00'))
print(parse_date('2020-02-31'))
Output:
2012-01-01
2020-02-29
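Since the question mentions catching the ValueError, one variant tries strict parsing first and only falls back to the repair logic when that fails. A sketch assuming the input is always three hyphen-separated integers (YYYY-MM-DD); `parse_or_repair` is a hypothetical name, and it clamps out-of-range months and days the same way as `parse_date` above.

```python
import calendar
from datetime import date, datetime


def parse_or_repair(date_string: str) -> str:
    """Parse a YYYY-MM-DD string, repairing out-of-range month/day values instead of failing."""
    try:
        # valid dates pass straight through
        return datetime.strptime(date_string, '%Y-%m-%d').strftime('%Y-%m-%d')
    except ValueError:
        y, m, d = map(int, date_string.split('-'))
        m = min(max(m, 1), 12)               # month 00 -> 01, 13 or more -> 12
        last = calendar.monthrange(y, m)[1]  # days in that month
        d = min(max(d, 1), last)             # day 00 -> 01, too large -> last day
        return date(y, m, d).strftime('%Y-%m-%d')


print(parse_or_repair('2012-00-00'))
print(parse_or_repair('2020-02-31'))
```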
From daily stock price data, I want to sample and select the end-of-month price. I am doing this with the following code:
import datetime
from pandas_datareader import data as pdr
import pandas as pd
end = datetime.date.today()
begin=end-pd.DateOffset(365*2)
st=begin.strftime('%Y-%m-%d')
ed=end.strftime('%Y-%m-%d')
data = pdr.get_data_yahoo("AAPL",st,ed)
mon_data=pd.DataFrame(data['Adj Close'].resample('M').apply(lambda x: x[-1])).set_index(data.index)
The line above selects the end-of-month data, and here is the output.
If I want to select the penultimate value of the month, I can do it with the following code:
mon_data=pd.DataFrame(data['Adj Close'].resample('M').apply(lambda x: x[-2]))
Here is the output.
However, the index still shows the end-of-month date. When I choose the penultimate value of the month, I want the index to be 2015-12-30 instead of 2015-12-31.
Please suggest the way forward. I hope my question is clear.
Thanking you in anticipation.
Regards,
Abhishek
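I am not sure if there is a way to do it with resample, but you can get what you want using groupby with a monthly grouper (pd.Grouper(freq='M'), which replaces the older pd.TimeGrouper that has been removed from recent pandas versions).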
import datetime
from pandas_datareader import data as pdr
import pandas as pd
end = datetime.date.today()
begin = end - pd.DateOffset(365*2)
st = begin.strftime('%Y-%m-%d')
ed = end.strftime('%Y-%m-%d')
data = pdr.get_data_yahoo("AAPL",st,ed)
data['Date'] = data.index
mon_data = (
    data[['Date', 'Adj Close']]
    .groupby(pd.Grouper(freq='M')).nth(-2)
    .set_index('Date')
)
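The simplest solution is to take the index of your newly created DataFrame and subtract the number of days you want to go back (this assumes the penultimate row falls exactly n calendar days before the month end, which is not the case when a weekend or holiday sits in between):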
n = 1
mon_data=pd.DataFrame(data['Adj Close'].resample('M').apply(lambda x: x[-1-n]))
mon_data.index = mon_data.index - datetime.timedelta(days=n)
Also, looking at your data, I think you should resample not to 'month end frequency' but to 'business month end frequency':
.resample('BM')
But even that won't cover everything: for instance, December 29, 2017 is a business month end, but this date doesn't appear in your data (which ends on December 8, 2017). So you could add a small fix for that (assuming the original data is sorted by date):
end_of_months = mon_data.index.tolist()
end_of_months[-1] = data.index[-1]
mon_data.index = end_of_months
So the full code will look like this:
n = 1
mon_data=pd.DataFrame(data['Adj Close'].resample('BM').apply(lambda x: x[-1-n]))
end_of_months = mon_data.index.tolist()
end_of_months[-1] = data.index[-1]
mon_data.index = end_of_months
mon_data.index = mon_data.index - datetime.timedelta(days=n)
By the way, your .set_index(data.index) throws an error because data and mon_data have different lengths (mon_data is grouped by month).
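Subtracting a fixed number of days only lands on the right date when the penultimate row is exactly n calendar days before the month end. A sketch that keeps the real trading date instead counts rows backwards within each calendar month with `GroupBy.cumcount(ascending=False)` and keeps the rows numbered 1. It uses a made-up business-day series in place of the Yahoo download, so the prices are placeholders.

```python
import pandas as pd

# Made-up stand-in for the downloaded prices: one row per business day
idx = pd.bdate_range('2017-10-02', '2017-12-08')
data = pd.DataFrame({'Adj Close': [float(i) for i in range(len(idx))]}, index=idx)

# Label each row with its calendar month, then number rows from the end of that month:
# 0 = last row present in the month, 1 = penultimate row, ...
month = data.index.to_period('M')
from_end = data.groupby(month).cumcount(ascending=False)

# Keep the penultimate row of every month; the index stays the actual trading date
mon_data = data.loc[from_end.to_numpy() == 1, ['Adj Close']]
print(mon_data)
```

This also covers the incomplete last month from the answer above: the penultimate row of December is simply the second-to-last date that is present in the data.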
I have a non-index column in a pandas DataFrame with a date like 02/03/2017. I would like to extract the day of the week and make it a separate column.
Actually, there is a solution using pandas:
import pandas as pd
your_df = pd.DataFrame(data={'Date': ['31/1/2018', '1/1/2018', '31/12/2018',
'28/2/2016', '3/3/2035']})
your_df['Date'] = pd.to_datetime(your_df['Date'], format="%d/%m/%Y")
your_df['Day of week (int)'] = your_df['Date'].dt.weekday
your_df['Day of week (str)'] = your_df['Date'].dt.day_name()
print(your_df)
More info here: Create a day-of-week column in a Pandas dataframe using Python
Notes as per my other (less elegant) answer...
You might also be interested in the arrow module since it offers quite a few features and advantages. Here I demonstrate its ability to provide weekday names in two forms for one locale, and in one form for a non-English locale.
>>> import arrow
>>> theDate = arrow.get('02/03/2017', 'DD/MM/YYYY')
>>> theDate
<Arrow [2017-03-02T00:00:00+00:00]>
>>> theDate.weekday()
3
>>> theDate.format('ddd', locale='en_GB')
'Thu'
>>> theDate.format('dddd', locale='en_GB')
'Thursday'
>>> theDate.format('dddd', locale='fr_FR')
'jeudi'
First you need to convert the date to a datetime object:
import datetime
date = datetime.datetime.strptime("02/03/2017", "%d/%m/%Y")
print(date.weekday())
See https://docs.python.org/2/library/datetime.html#module-datetime
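If you also want the weekday name without pandas, `calendar.day_name` maps the `weekday()` number to a name. A small sketch; `day_of_week` is a hypothetical helper, and the names come from the current locale, so they are English only under the default locale.

```python
import calendar
from datetime import datetime


def day_of_week(date_string: str, fmt: str = '%d/%m/%Y') -> tuple:
    """Return (weekday number with Monday=0, weekday name) for a date string."""
    d = datetime.strptime(date_string, fmt)
    return d.weekday(), calendar.day_name[d.weekday()]


for s in ['02/03/2017', '31/1/2018', '28/2/2016']:
    print(s, *day_of_week(s))
```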
The solution I've found is a two-step process, as I haven't been able to find a way to get weekday() to work on a pandas Series.
import pandas as pd
your_df = pd.DataFrame(data={'Date': ['31/1/2018', '1/1/2018', '31/12/2018',
'28/2/2016', '3/3/2035']})
dt_series = pd.to_datetime(your_df['Date'], format="%d/%m/%Y")
dow = []
for dt in range(0, len(your_df)):
    dow.append(dt_series[dt].weekday())
your_df.insert(1, 'Day of week', dow)
print(your_df)
The output should look like this:
Date Day of week
0 31/1/2018 2
1 1/1/2018 0
2 31/12/2018 0
3 28/2/2016 6
4 3/3/2035 5
Notes:
I'm using dd/mm/yyyy format. You will need to change the format argument for to_datetime() if your dates are in U.S. or other formats.
Python weekdays: Monday = 0, Sunday = 6.
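The date in the question, 02/03/2017, is ambiguous, and the format string decides which weekday you get. A small sketch showing both readings, plus `isoweekday()`, which numbers Monday as 1 and Sunday as 7 if that convention suits the new column better.

```python
from datetime import datetime

s = '02/03/2017'
day_first = datetime.strptime(s, '%d/%m/%Y')    # read as 2 March 2017
month_first = datetime.strptime(s, '%m/%d/%Y')  # read as 3 February 2017 (U.S. order)

# weekday(): Monday=0 ... Sunday=6; isoweekday(): Monday=1 ... Sunday=7
for label, d in [('dd/mm/yyyy', day_first), ('mm/dd/yyyy', month_first)]:
    print(label, d.date(), d.weekday(), d.isoweekday())
```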