I have a CSV file that I read with the following code:
df1 = pd.read_csv('dataDate.csv')
df1
Out[57]:
Date
0 01/01/2019
1 01/01/2019
2 01/01/2019
3 01/01/2019
4 01/01/2019
5 01/01/2019
The column currently has dtype('O'). I run the following command to convert it to datetime, using the format %d/%m/%Y:
df1.Date = pd.to_datetime(df1.Date, format='%d/%m/%Y')
It produces this output:
9 2019-01-01
35 2019-01-01
48 2019-01-01
38 2019-01-01
18 2019-01-01
36 2019-01-01
31 2019-01-01
6 2019-01-01
I'm not sure what is wrong here. I want the dates in the same format as the input for my process. Can anyone tell me what is going wrong?
Thanks
The output you see is the default display format for pandas datetime values, so nothing is wrong. If you need a specific text format, you can produce a formatted string with strftime. pandas exposes this standard Python method through the .dt accessor.
You can try the following:
df1.Date = pd.to_datetime(df1.Date, format='%d/%m/%Y')
df1['my_date'] = df1.Date.dt.strftime('%d/%m/%Y')
The 'my_date' column then has the desired format. It holds plain strings, so you cannot do datetime operations on it, but you can use it for display. Keep the Date column for calculations and use my_date to present the results.
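To make the split between a working column and a display column concrete, here is a minimal sketch. The sample values and the next_week column are assumptions for illustration, standing in for the contents of dataDate.csv:

```python
import pandas as pd

# Hypothetical sample standing in for dataDate.csv
df1 = pd.DataFrame({"Date": ["01/01/2019", "15/02/2019", "03/03/2019"]})

# Parse once; this column is datetime64 and supports date arithmetic
df1["Date"] = pd.to_datetime(df1["Date"], format="%d/%m/%Y")

# Derive a display-only string column in the original layout
df1["my_date"] = df1["Date"].dt.strftime("%d/%m/%Y")

# Arithmetic works on the datetime column, not on the string column
df1["next_week"] = df1["Date"] + pd.Timedelta(days=7)

print(df1.dtypes)
print(df1)
```

If the only goal is to write the file back out in the input layout, DataFrame.to_csv also accepts a date_format argument, which may avoid the extra string column.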
Related
I have the following date column that I would like to transform to a pandas datetime object. Is it possible to do this with weekly data? For example, 1-2018 stands for week 1 in 2018 and so on. I tried the following conversion but I get an error message: Cannot use '%W' or '%U' without day and year
import pandas as pd
df1 = pd.DataFrame(columns=["date"])
df1['date'] = ["1-2018", "1-2018", "2-2018", "2-2018", "3-2018", "4-2018", "4-2018", "4-2018"]
df1["date"] = pd.to_datetime(df1["date"], format = "%W-%Y")
You need to add a day to the datetime format
df1["date"] = pd.to_datetime('0' + df1["date"], format='%w%W-%Y')
print(df1)
Output
date
0 2018-01-07
1 2018-01-07
2 2018-01-14
3 2018-01-14
4 2018-01-21
5 2018-01-28
6 2018-01-28
7 2018-01-28
As the error message says, you need to specify the day of the week by adding %w :
df1["date"] = pd.to_datetime( '0'+df1.date, format='%w%W-%Y')
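The '0' prefix above picks Sunday, which %W treats as the last day of the week. If you would rather get the Monday that starts each week, the same trick should work with a '1' prefix. This is a sketch under that assumption, and the week_start column name is made up for the example:

```python
import pandas as pd

df1 = pd.DataFrame({"date": ["1-2018", "2-2018", "4-2018"]})

# Prefix '1' (Monday in %w numbering) so each label maps to the Monday of that week
df1["week_start"] = pd.to_datetime("1" + df1["date"], format="%w%W-%Y")
print(df1)
```

Note that %W counts weeks from the first Monday of the year, which is not the same as ISO week numbering in every year.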
I have been working on a dataframe where one of the columns (flight_time) contains flight durations. The strings come in 3 different formats, for example:
"07 h 05 m"
"13h 55m"
"2h 23m"
I would like to change them all to HH:MM format and finally change the data type from object to time.
Can somebody tell me how to do this?
pandas has no dedicated time dtype. You can have a datetime64 (pd.DatetimeIndex) or a timedelta64 (pd.TimedeltaIndex). In your case a timedelta is the better fit for a duration, so you can use the pd.to_timedelta function:
df['flight_time2'] = pd.to_timedelta(df['flight_time'])
print(df)
# Output
flight_time flight_time2
0 07 h 05 m 0 days 07:05:00
1 13h 55m 0 days 13:55:00
2 2h 23m 0 days 02:23:00
If you want individual datetime.time values instead, use:
df['flight_time2'] = pd.to_datetime(df['flight_time'].str.findall(r'\d+')
.str.join(':')).dt.time
print(df)
# Output
flight_time flight_time2
0 07 h 05 m 07:05:00
1 13h 55m 13:55:00
2 2h 23m 02:23:00
In this case, flight_time2 still has object dtype:
>>> df.dtypes
flight_time object
flight_time2 object
dtype: object
But each value is an instance of datetime.time:
>>> df.loc[0, 'flight_time2']
datetime.time(7, 5)
With the first approach you can use vectorized methods, which is not possible with the second. Furthermore, with the second approach you lose the dt accessor.
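As an illustration of what the timedelta route buys you, here is a minimal sketch that also produces the HH:MM text the question asks for. It extracts hours and minutes with a regular expression instead of relying on the string parser, and the column names duration and hhmm are assumptions made up for the example:

```python
import pandas as pd

df = pd.DataFrame({"flight_time": ["07 h 05 m", "13h 55m", "2h 23m"]})

# Pull hours and minutes out explicitly so the spacing variants do not matter
parts = df["flight_time"].str.extract(r"(?P<h>\d+)\s*h\s*(?P<m>\d+)\s*m").astype(int)

# Build a timedelta64 column, which supports vectorized arithmetic
df["duration"] = pd.to_timedelta(parts["h"], unit="h") + pd.to_timedelta(parts["m"], unit="m")

# HH:MM text for display only
total_minutes = (df["duration"].dt.total_seconds() // 60).astype(int)
df["hhmm"] = (
    (total_minutes // 60).astype(str).str.zfill(2)
    + ":"
    + (total_minutes % 60).astype(str).str.zfill(2)
)

print(df)
print(df["duration"].sum())  # total flight time across all rows
```

The duration column stays usable for sums, means and comparisons, while hhmm is only a label.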
I have a column that contains only times. After reading the CSV file in a Jupyter notebook, the column had object dtype, so I converted it to a datetime dtype. When I try to filter it, I get the error below:
TypeError: Index must be DatetimeIndex
code
newdata = newdata['APPOINTMENT_TIME'].between_time('14:30:00', '20:00:00')
sample_data
APPOINTMENT_TIME Id
13:30:00 1
15:10:00 2
18:50:00 3
14:10:00 4
14:00:00 5
Here I am trying to display the rows whose APPOINTMENT_TIME is between 14:30:00 and 20:00:00.
datatype info
Could anyone help? Thanks in advance.
between_time is a special method that only works when the index is a DatetimeIndex, which is not your case. It would be useful if you had data like 2021-12-21 13:30:00 as the index.
In your case, you can just use the between method on the strings, relying on the fact that times in the zero-padded HH:MM:SS format sort correctly as text:
filtered_data = newdata[newdata['APPOINTMENT_TIME'].between('14:30:00', '20:00:00')]
Output:
APPOINTMENT_TIME Id
1 15:10:00 2
2 18:50:00 3
NB. With this string comparison you can't use a range that starts before midnight and ends after it.
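If you do want between_time, one option is to parse the times and use them as the index. This is a minimal sketch using the sample data from the question; the names times, indexed, afternoon and overnight are made up for the example:

```python
import pandas as pd

newdata = pd.DataFrame({
    "APPOINTMENT_TIME": ["13:30:00", "15:10:00", "18:50:00", "14:10:00", "14:00:00"],
    "Id": [1, 2, 3, 4, 5],
})

# between_time needs a DatetimeIndex, so parse the times and index by them.
# The date part defaults to 1900-01-01 and is ignored by between_time.
times = pd.to_datetime(newdata["APPOINTMENT_TIME"], format="%H:%M:%S")
indexed = newdata.set_index(pd.DatetimeIndex(times))

afternoon = indexed.between_time("14:30:00", "20:00:00")
print(afternoon)

# A start later than the end selects a window that wraps past midnight
overnight = indexed.between_time("18:00:00", "14:05:00")
print(overnight)
```

The trade-off is an extra parsing step in exchange for a filter that also handles windows crossing midnight.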
I have a dataset with a date column called "zeitpunkt". The time is recorded in UTC+1. I have other similar files where I just use parse_dates=["name of column"] in pd.read_csv and it works fine. But with this CSV I can't get the datetime column recognized as a datetime column; it is still an object. Any ideas why I am not able to convert it to datetime?
My goal is to access specific days like the mean of monday or the mean of march.
The head of the column looks like this:
0 2019-01-01 00:30:00+01
1 2019-01-01 00:35:00+01
2 2019-01-01 00:40:00+01
3 2019-01-01 00:45:00+01
4 2019-01-01 00:50:00+01
dtypes still shows me object after I use either parse_dates=True or parse_dates=["zeitpunkt"]
I have the following sheet data in an Excel file:
id data_1 data_2
1 2018/11/11 00:00 123
2 123 2018/11/2 00:00
The date in Excel is actually a float, so I want to change it to str using the following syntax:
df = df.astype(dtype=str)
But pandas changes the date format from YYYY/MM/DD to YYYY-MM-DD, so I get this output:
id data_1 data_2
1 2018-11-11 00:00 123
2 123 2018-11-2 00:00
How do I change all dates to str and keep the format as YYYY/MM/DD?
I'm unable to use pd.to_datetime() or similar syntax, because the dates are not all in one particular column, and I don't want to traverse all columns to achieve it.
The only way I know is to use a regex:
df.replace(['((?<=[0-9]{4})-(?=([0-9]{2}-[0-9]{2})))|((?<=[0-9]{4}-[0-9]{2})-(?=[0-9]{2}))'], ['/'], regex=True)
But this leads to errors when some other str data also contains text in YYYY-MM-DD form.
I only want to change the type of the dates in the sheet, and df.astype can do that. The only problem is that I want YYYY/MM/DD instead of YYYY-MM-DD.
In general, I want to change all dates in the sheet to type str and format them as YYYY/MM/DD HH:MM:SS. astype achieves the first step.
Is there a simple and quick way to achieve this?
Thank you for reading.
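Consider a dataframe that contains datetime objects but also random integers:
import datetime as dt
import pandas as pd
df = pd.DataFrame(pd.date_range(dt.datetime(2018,1,1), dt.datetime(2018,1,6)))
df[0] = df[0].astype(object)  # object dtype lets the column hold mixed values
df.loc[0, 0] = 123
print(df)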
0
0 123
1 2018-01-02
2 2018-01-03
3 2018-01-04
4 2018-01-05
5 2018-01-06
Now you can create a new column with the dates in the desired format by using apply with this convert function:
def convert(x):
    try:
        return x.strftime('%Y/%m/%d')
    except AttributeError:
        return str(x)
df['date'] = df[0].apply(convert)
print(df)
0 date
0 123 123
1 2018-01-02 00:00:00 2018/01/02
2 2018-01-03 00:00:00 2018/01/03
3 2018-01-04 00:00:00 2018/01/04
4 2018-01-05 00:00:00 2018/01/05
5 2018-01-06 00:00:00 2018/01/06
Note: it might be a better idea to clean up the dates first to avoid unexpected behavior, for example by keeping only the rows that hold a Timestamp:
df[df[0].apply(lambda x: isinstance(x, pd.Timestamp))]
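Since the question mentions that dates are scattered over several columns, the same idea can be applied to every cell at once. This is a minimal sketch, assuming the sheet was loaded into object columns that mix Timestamp values and numbers; the frame contents and the to_text helper are hypothetical:

```python
import datetime as dt
import pandas as pd

# Hypothetical frame mimicking the sheet: dates and plain numbers mixed in any column
df = pd.DataFrame({
    "data_1": [pd.Timestamp("2018-11-11"), 123],
    "data_2": [123, pd.Timestamp("2018-11-02")],
}, dtype=object)

def to_text(value):
    """Format datetimes as YYYY/MM/DD HH:MM:SS; fall back to str() for anything else."""
    if isinstance(value, dt.datetime):  # pd.Timestamp subclasses datetime.datetime
        return value.strftime("%Y/%m/%d %H:%M:%S")
    return str(value)

# Series.map applies the function element-wise; DataFrame.apply repeats it per column
as_text = df.apply(lambda col: col.map(to_text))
print(as_text)
```

Because the check is on the value type and not on the text, strings that merely look like YYYY-MM-DD are left untouched, which sidesteps the regex problem described in the question.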