I have a data frame with date columns as 20,190,927 which means: 2019/09/27.
I need to change the format to YYYY/MM/DD or something similar.
I thought of doing it manually like:
x = df_all['CREATION_DATE'].str[:2] + df_all['CREATION_DATE'].str[3:5] + "-" + \
df_all['CREATION_DATE'].str[5] + df_all['CREATION_DATE'].str[7] + "-" + df_all['CREATION_DATE'].str[8:]
print(x)
What's a more creative way of doing this? Could it be done with the datetime module?
I believe this is what you want. First replace the commas with nothing, so you get a yyyymmdd string, then convert it to datetime with pd.to_datetime by passing the matching format. One-liner:
df['dates'] = pd.to_datetime(df['dates'].str.replace(',',''),format='%Y%m%d')
Full explanation:
import pandas as pd
a = {'dates':['20,190,927','20,191,114'],'values':[1,2]}
df = pd.DataFrame(a)
print(df)
Output, showing what the original dataframe looks like:
dates values
0 20,190,927 1
1 20,191,114 2
df['dates'] = df['dates'].str.replace(',','')
df['dates'] = pd.to_datetime(df['dates'],format='%Y%m%d')
print(df)
print(df.info())
Output of the newly formatted dataframe:
dates values
0 2019-09-27 1
1 2019-11-14 2
Printing .info() to ensure we have the correct format:
dates 2 non-null datetime64[ns]
values 2 non-null int64
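Since the question asks for YYYY/MM/DD specifically, here is a minimal sketch of the same approach that also formats the parsed dates back to strings. The column name `dates` follows the example above; the extra `'bad'` row and `errors='coerce'` are assumptions added here so that a malformed entry becomes NaT instead of raising.

```python
import pandas as pd

# Sample data; the third row is a deliberately malformed value (assumption).
df = pd.DataFrame({'dates': ['20,190,927', '20,191,114', 'bad'], 'values': [1, 2, 3]})

# Strip the thousands separators, then parse the result as yyyymmdd.
cleaned = df['dates'].str.replace(',', '', regex=False)
parsed = pd.to_datetime(cleaned, format='%Y%m%d', errors='coerce')

# Render the datetimes as YYYY/MM/DD strings.
df['formatted'] = parsed.dt.strftime('%Y/%m/%d')
print(df)
```

Keeping `parsed` as a datetime column and only formatting for display is usually more convenient if the dates are needed for sorting or arithmetic later.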
Hope this helps,
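date = ['20,190,927', '20,190,928', '20,190,929']
df3 = pd.DataFrame(date, columns=['Date'])
# remove the commas (a plain ',' needs no escaping), then parse as yyyymmdd
df3['Date'] = df3['Date'].str.replace(',', '', regex=False)
df3['Date'] = pd.to_datetime(df3['Date'], format='%Y%m%d')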
Related
Here is my problem.
I have a dataframe imported from a .xlsx file. It contains one column with dates but the problem is it is not presented as datetime.
For instance, the first line holds the date (format: DD/MM/YYYY, type str) and the 24 following lines hold the hours (format: xh, where h stands for hour and x is between 0 and 23). This pattern repeats for three years.
I would like to transform this column so that the cells are datetimes in the format YY-MM-DD HH:MM:SS.
First of all, I created a dataframe df2 containing the hours:
indexNames = df1[df1['Hour'].str.contains('/')].index
df2= df1.drop(indexNames)
I transformed it to get it as datetime format HH:MM
# Conserving the number
new = df2["Hour"].str.split("h", n = 1, expand = True)
df2["new_Hour"] = new[0]
# Dropping old Name columns
df2.drop(columns = ["Hour"], inplace = True)
# Transforming in datetime format
df2['new_Hour'] = pd.to_datetime(df2['new_Hour'], format="%H")
df2['new_Hour'] = df2['new_Hour'].astype(str)
nouv = df2['new_Hour'].str.split(' ', n=1, expand = True)
df2["Hour"]= nouv[1]
df2.drop(columns = ["new_Hour"], inplace = True)
Then, I created a second dataframe having the date and added separated columns for corresponding year, month and day:
df3= df1.loc[df1['Hour'].str.contains('/')].copy()
df3['Hour'] = pd.to_datetime(df3['Hour'], format="%d/%m/%Y")
df3['year'] = df3['Hour'].dt.year
df3['month'] = df3['Hour'].dt.month
df3['day'] = df3['Hour'].dt.day
Here comes my problem,
df3 indexes start at 0 and increase by 25 at each row, i.e. df3.index[0] = 0, df3.index[1] = 25, df3.index[2] = 50, etc.
df2 indexes start at 1 and, more generally, the indexes of df3 are missing from it.
I would like to add the corresponding date of df3 to the corresponding hours of df2.
After resetting the indexes of df2 and df3, I tried:
df4 = df2.copy()
df4['year'] = 2019
df4= df4.reset_index(drop = True)
for i in range(len(df3)-1):
    df4['year'].iloc[df3.index[i]:df3.index[i+1]] = df3['year'][i]
But I get copy problems and probably indexes problems too.
Hope you could help me, thanks.
You might want to start with a cleaner way to create a datetime column, e.g.:
import pandas as pd
# dummy sample...
df = pd.DataFrame({'Hour': ["10/12/2013", "0", "1", "3",
"11/12/2013", "0", "1", "3"]})
# make a date column, forward-fill the dates
df['Datetime'] = pd.to_datetime(df['Hour'], format="%d/%m/%Y", errors='coerce').ffill()
# now we can add the hour
df['Datetime'] = df['Datetime'] + pd.to_timedelta(pd.to_numeric(df['Hour'], errors='coerce'), unit='h')
# and optionally drop nans in the Datetime column, i.e. where we had dates initially
df = df[df["Datetime"].notna()].reset_index(drop=True)
df
Hour Datetime
0 0 2013-12-10 00:00:00
1 1 2013-12-10 01:00:00
2 3 2013-12-10 03:00:00
3 0 2013-12-11 00:00:00
4 1 2013-12-11 01:00:00
5 3 2013-12-11 03:00:00
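The sample above uses bare numbers for the hours, whereas the question describes them as "xh" strings. A minimal sketch of the same forward-fill idea adapted to that layout follows; the sample values and the `is_date` mask are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample in the question's layout: a date row followed by "xh" rows.
df1 = pd.DataFrame({'Hour': ['10/12/2013', '0h', '1h', '23h',
                             '11/12/2013', '0h', '1h']})

# Date rows are the ones containing a slash.
is_date = df1['Hour'].str.contains('/')

# Parse only the date rows, then carry each date down over its hour rows.
dates = pd.to_datetime(df1['Hour'].where(is_date), format='%d/%m/%Y').ffill()

# Strip the trailing "h" on the hour rows and convert to a timedelta.
hours = pd.to_numeric(df1['Hour'].where(~is_date).str.rstrip('h'))
df1['Datetime'] = dates + pd.to_timedelta(hours, unit='h')

# Keep only the hour rows, which now carry a full timestamp.
result = df1[~is_date].reset_index(drop=True)
print(result)
```

This avoids splitting the frame into two and re-aligning indexes, which is where the copy and index problems in the question come from.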
I have followed the instructions from this thread, but have run into issues.
Converting month number to datetime in pandas
I think it may have to do with having an additional variable in my dataframe but I am not sure. Here is my dataframe:
0 Month Temp
1 0 2
2 1 4
3 2 3
What I want is:
0 Month Temp
1 1990-01 2
2 1990-02 4
3 1990-03 3
Here is what I have tried:
df= pd.to_datetime('1990-' + df.Month.astype(int).astype(str) + '-1', format = '%Y-%m')
And I get this error:
ValueError: time data 1990-0-1 doesn't match format specified
IIUC, we can build the datetime manually and then format it as your expected output. Your months start at 0, so add 1 to every value first:
m = df['Month'].add(1).astype(int).astype(str).str.zfill(2)
df['date'] = pd.to_datetime(
    "1990" + "-" + m, format="%Y-%m"
).dt.strftime("%Y-%m")
print(df)
Month Temp date
0 0 2 1990-01
1 1 4 1990-02
2 2 3 1990-03
Use .dt.strftime() to control how the date is displayed, because datetime values are shown in %Y-%m-%d 00:00:00 format by default.
import pandas as pd
df= pd.DataFrame({'month':[1,2,3]})
df['date']=pd.to_datetime(df['month'], format="%m").dt.strftime('%Y-%m')
print(df)
You have to explicitly add 1 to the months, because in your case they run from 0 to 11 rather than from 1 to 12.
df=pd.DataFrame({'month':[11,1,2,3,0]})
df['date']=pd.to_datetime(df['month']+1, format='%m').dt.strftime('1990-%m')
Here is my solution for you
import pandas as pd
Data = {
'Month' : [1,2,3],
'Temp' : [2,4,3]
}
data = pd.DataFrame(Data)
data['Month'] = pd.to_datetime('1990-' + data.Month.astype(int).astype(str) + '-1', format='%Y-%m-%d').dt.to_period('M')
Month Temp
0 1990-01 2
1 1990-02 4
2 1990-03 3
If a Month value of 0 should mean January, add 1 to the column before building the date string.
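As an alternative to concatenating strings, pd.to_datetime can assemble dates from a frame with year, month and day columns. The sketch below assumes the months are 0-based, as in the question; the helper frame `parts` is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Month': [0, 1, 2], 'Temp': [2, 4, 3]})

# Assumption: Month is 0-based (0 = January), so shift it into the range 1-12.
parts = pd.DataFrame({'year': 1990, 'month': df['Month'] + 1, 'day': 1})

# to_datetime builds one timestamp per row from the year/month/day columns.
df['Month'] = pd.to_datetime(parts).dt.to_period('M')
print(df)
```

Because no format string is involved, there is nothing to mismatch, which is what triggered the ValueError in the question.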
How can I add a slash (/) to a string that looks like this:
012019
so it will look like this:
01/2019
And maybe also add a day, like:
01/01/2019
The data:
import pandas as pd
df= pd.DataFrame({ "month": ["012019","152019","222019","142019","302019","012020"]})
My code:
df.month = df.month.apply(lambda x: '{:0>2}'.format(x.split('/')[0]))
But it does not work.
If I understood correctly, you just want to add a slash between the 2nd and 3rd characters. Then it's easy:
df['new'] = df.month.str.slice(0, 2) + '/' + df.month.str.slice(2)
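A regex variant of the same idea is sketched below; it only rewrites values that are exactly six digits and leaves anything else untouched. The column names `with_slash` and `with_day`, and the fixed "01/" prefix, are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'month': ['012019', '152019', '012020']})

# Insert a slash between the first two digits and the last four.
df['with_slash'] = df['month'].str.replace(r'^(\d{2})(\d{4})$', r'\1/\2', regex=True)

# Prepend a fixed "01/" if a third component is wanted (hypothetical choice).
df['with_day'] = '01/' + df['with_slash']
print(df)
```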
IIUC just convert to datetime and use dt.strftime
df['month'] = pd.to_datetime(df['month'],format='%d%Y').dt.strftime('%d/%Y')
output:
print(df)
month
0 01/2019
1 15/2019
2 22/2019
3 14/2019
4 30/2019
5 01/2020
if you want to add a month as well just add it to your string
month = '01'
df['month'] = pd.to_datetime(df['month'].astype(str) +
month,format='%d%Y%m').dt.strftime('%m/%d/%Y')
print(df)
month
0 01/01/2019
1 01/15/2019
2 01/22/2019
3 01/14/2019
4 01/30/2019
5 01/01/2020
I've written an if-else function that, based on the quarter number, concatenates a string (e.g. '1/1/') with an integer converted to a string (e.g. str(2017)). I have three data frames I want to use this on. Two of them produce the expected result (e.g. '1/1/2017'). The last one produces '1/1/2017.0', which prevents it from being converted to a datetime.
I'm at a loss because based on the dtypes, all three dataframes list both quarter and year as int64, and all three dataframes originally come from the same csv.
My first guess was that I had converted my years to a float at some point when I was preparing the last data frame. I tried to ensure that the year column was an integer with .astype(). The year column is listed under .dtypes as an int64 before and after the function is applied.
Data Frame
from pandas import DataFrame
Data = {'quarter': [1,2,3,4],
'year': [2017,2017,2017,2017]}
df = DataFrame(Data, columns = ['quarter', 'year'])
This is the function I am using
def f(row):
    if row['quarter'] == 1:
        val = '1/1/' + str(row['year'])
    elif row['quarter'] == 2:
        val = '4/1/' + str(row['year'])
    elif row['quarter'] == 3:
        val = '7/1/' + str(row['year'])
    else:
        val = '10/1/' + str(row['year'])
    return val
My expected result would be '1/1/2017', '4/1/2017', '7/1/2017', '10/1/2017'
I don't receive any error messages or warnings.
Not sure why your code doesn't work with the third dataset, but you could use pandas' functions instead of writing your own. It might resolve your problem.
>>> df['date'] = pd.to_datetime(
... df['year'].astype(str).str.cat(df['quarter'].astype(str), sep='Q'))
>>> df
quarter year date
0 1 2017 2017-01-01
1 2 2017 2017-04-01
2 3 2017 2017-07-01
3 4 2017 2017-10-01
You could change date format like:
>>> df['date'].dt.strftime('%m/%d/%Y')
0 01/01/2017
1 04/01/2017
2 07/01/2017
3 10/01/2017
Name: date, dtype: object
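One plausible cause of the '2017.0' result, offered as a guess since the third dataframe is not shown: when a function is applied row by row with axis=1, each row is passed as a single Series, so if the frame also contains a float column the integers in that row are upcast to float. The sketch below reproduces this with a hypothetical `value` column and shows a column-wise alternative that keeps the int64 dtype.

```python
import pandas as pd

# Hypothetical third dataframe: same int columns plus one float column.
df = pd.DataFrame({'quarter': [1, 2], 'year': [2017, 2017], 'value': [0.5, 1.5]})

# Row-wise apply builds each row as one Series, so the ints are upcast to float.
via_apply = df.apply(lambda row: '1/1/' + str(row['year']), axis=1)

# Working on whole columns keeps the int64 dtype of 'year'.
first_month = ((df['quarter'] - 1) * 3 + 1).astype(str)
via_column = first_month + '/1/' + df['year'].astype(str)

print(via_apply.tolist())
print(via_column.tolist())
```

If this is the cause, checking .dtypes would not reveal it, because the column itself stays int64; only the per-row Series is float.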
I have a dataframe column with datetime data in 1980-12-11T00:00:00 format.
I need to convert the whole column to DD/MM/YYYY string format.
Is there any easy code for this?
Creating a working example:
df = pd.DataFrame({'date':['1980-12-11T00:00:00', '1990-12-11T00:00:00', '2000-12-11T00:00:00']})
print(df)
date
0 1980-12-11T00:00:00
1 1990-12-11T00:00:00
2 2000-12-11T00:00:00
Convert the column to datetime by pd.to_datetime() and invoke strftime()
df['date_new']=pd.to_datetime(df.date).dt.strftime('%d/%m/%Y')
print(df)
date date_new
0 1980-12-11T00:00:00 11/12/1980
1 1990-12-11T00:00:00 11/12/1990
2 2000-12-11T00:00:00 11/12/2000
You can use pd.to_datetime to convert strings to datetime data:
pd.to_datetime(df['col'])
You can then format the result as strings with .dt.strftime:
pd.to_datetime(df['col']).dt.strftime('%d/%m/%Y')
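A short sketch of the two steps side by side, to make the dtypes explicit: the parsed column is datetime64, while the strftime result is plain text. The column name `col` follows the snippet above; `as_text` and the sample values are assumptions.

```python
import pandas as pd

df = pd.DataFrame({'col': ['1980-12-11T00:00:00', '1990-01-05T00:00:00']})

# Step 1: parse the ISO-style strings into a datetime64 column.
parsed = pd.to_datetime(df['col'])

# Step 2: format as DD/MM/YYYY; the result holds strings, not datetimes.
df['as_text'] = parsed.dt.strftime('%d/%m/%Y')

print(parsed.dtype)
print(df['as_text'].tolist())
```

Keeping `parsed` around is useful if the dates still need to be sorted or compared, since the DD/MM/YYYY strings do not sort chronologically.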
When using pandas, try pandas.to_datetime:
import pandas as pd
df = pd.DataFrame({'date': ['1980-12-%sT00:00:00'%i for i in range(10,20)]})
df.date = pd.to_datetime(df.date).dt.strftime("%d/%m/%Y")
print(df)
date
0 10/12/1980
1 11/12/1980
2 12/12/1980
3 13/12/1980
4 14/12/1980
5 15/12/1980
6 16/12/1980
7 17/12/1980
8 18/12/1980
9 19/12/1980