I have a dataset with a date column called "zeitpunkt". The time is recorded in UTC+1. In other, similar files I just use parse_dates=["name of column"] in pd.read_csv and it works fine, but with this CSV I can't get the column recognized as datetime; it stays object. Any ideas why I am not able to convert it to datetime?
My goal is to access specific days, e.g. the mean of Mondays or the mean of March.
The head of the column looks like this:
0 2019-01-01 00:30:00+01
1 2019-01-01 00:35:00+01
2 2019-01-01 00:40:00+01
3 2019-01-01 00:45:00+01
4 2019-01-01 00:50:00+01
dtypes still shows object after I use either parse_dates=True or parse_dates=["zeitpunkt"].
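The hour-only "+01" offset may be the culprit. One thing to try (a sketch; 'data.csv' and the 'value' column are hypothetical names) is parsing explicitly with pd.to_datetime and utc=True, which handles the offset and yields a tz-aware column you can then group by weekday or month:
import pandas as pd

df = pd.read_csv('data.csv')  # hypothetical file name
# parse explicitly; utc=True lets the "+01" offset produce a tz-aware dtype
df['zeitpunkt'] = pd.to_datetime(df['zeitpunkt'], utc=True)
# optionally shift back to the recorded UTC+1 wall-clock time
df['zeitpunkt'] = df['zeitpunkt'].dt.tz_convert('Etc/GMT-1')

# mean of all Mondays / all March rows ('value' is a hypothetical column)
monday_mean = df.loc[df['zeitpunkt'].dt.dayofweek == 0, 'value'].mean()
march_mean = df.loc[df['zeitpunkt'].dt.month == 3, 'value'].mean()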
I'm trying to follow the solution provided in Find all months between two date columns and generate row for each month, and I'm hitting a wall because I'm getting an error. What I want to do is create a Year-Month column for each year-month that exists in the StartDate to EndDate range for each row. When I tried to follow the linked Stack Overflow answer, I got the error
TypeError: Cannot convert input ... Name: ServiceStartDate, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp
I have no idea how to fix this. Please help!
Sample Data
   ID      StartDate   EndDate
1  311566  2021-10-01  2024-09-30
2  235216  2020-11-01  2020-11-30
3  157054  2021-10-01  2023-09-30
4  159954  2021-01-01  2023-12-31
5  255815  2019-11-01  2022-10-31
I have found a solution to my problem (sorry for the long response delay). The problem was that my data had a timestamp associated with it. I needed to change the date field to %Y-%m-01 format (pinning each date to the first of its month) using the following code.
df['date'] = df['date'].apply(lambda x: x.strftime('%Y-%m-01'))
Then I used the solution below to get all the months/years that exist between the min and max dates as a single column.
df.merge(df.apply(lambda s: pd.date_range(df['date'].min(),
                                          df['date'].max(), freq='MS'),
                  axis=1).explode().rename('Month'),
         left_index=True, right_index=True)
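Note that because this uses the global min and max, every row gets the same set of months. If you want the months bounded per row, as the original question describes, a per-row variant could look like this (a sketch assuming StartDate and EndDate are already parsed as datetimes):
# one exploded row per month between each row's own StartDate and EndDate
months = df.apply(lambda s: pd.date_range(s['StartDate'], s['EndDate'],
                                          freq='MS'),
                  axis=1).explode().rename('Month')
df.merge(months, left_index=True, right_index=True)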
I have a column that contains only a time. After reading the CSV file I converted that column to the datetime datatype, as it was object when I read it in a Jupyter notebook. When I try to filter I get the error below
TypeError: Index must be DatetimeIndex
Code
newdata = newdata['APPOINTMENT_TIME'].between_time('14:30:00', '20:00:00')
Sample Data
APPOINTMENT_TIME Id
13:30:00 1
15:10:00 2
18:50:00 3
14:10:00 4
14:00:00 5
Here I am trying to display the rows whose APPOINTMENT_TIME is between 14:30:00 and 20:00:00.
Could anyone help? Thanks in advance.
between_time is a special method that works with datetime objects as the index, which is not your case. It would apply if you had values like 2021-12-21 13:30:00 as the index.
In your case, you can just use the between method on the strings, relying on the fact that times in your HH:MM:SS format sort naturally:
filtered_data = newdata[newdata['APPOINTMENT_TIME'].between('14:30:00', '20:00:00')]
Output:
APPOINTMENT_TIME Id
1 15:10:00 2
2 18:50:00 3
NB: with this approach you can't use a range that starts before midnight and ends after it.
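If you ever do need a range that crosses midnight (say 22:00 to 02:00), one sketch of a workaround is to combine two comparisons instead of between (the times here are illustrative):
# keep rows later than the start OR earlier than the end of the wrap-around range
mask = (newdata['APPOINTMENT_TIME'] >= '22:00:00') | (newdata['APPOINTMENT_TIME'] <= '02:00:00')
filtered_data = newdata[mask]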
Let's say I have idx = pd.DatetimeIndex(...) with a one-minute frequency. I also have a list of bad dates (each of type pd.Timestamp, without the time information) that I want to remove from the original idx. How do I do that in pandas?
Use normalize to remove the time part from your index so you can do a simple ~ + isin selection, i.e. find the dates not in that bad list. You can further ensure your list of dates doesn't have a time part with the same trick, [x.normalize() for x in bad_dates], if you need to be extra safe.
Sample Data
import pandas as pd
df = pd.DataFrame(range(9), index=pd.date_range('2010-01-01', freq='11H', periods=9))
bad_dates = [pd.Timestamp('2010-01-02'), pd.Timestamp('2010-01-03')]
Code
df[~df.index.normalize().isin(bad_dates)]
# 0
#2010-01-01 00:00:00 0
#2010-01-01 11:00:00 1
#2010-01-01 22:00:00 2
#2010-01-04 05:00:00 7
#2010-01-04 16:00:00 8
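The same idea works directly on a bare DatetimeIndex like the idx from the question (a sketch, using the one-minute frequency described there):
idx = pd.date_range('2010-01-01', freq='min', periods=10)
clean_idx = idx[~idx.normalize().isin(bad_dates)]  # drop minutes falling on bad dates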
I have a CSV and am reading it using the following code:
df1 = pd.read_csv('dataDate.csv')
df1
Out[57]:
Date
0 01/01/2019
1 01/01/2019
2 01/01/2019
3 01/01/2019
4 01/01/2019
5 01/01/2019
Currently the column has dtype('O'). I am now running the following command to convert the dates to datetime in the format %d/%m/%Y:
df1.Date = pd.to_datetime(df1.Date, format='%d/%m/%Y')
It produces output such as:
9 2019-01-01
35 2019-01-01
48 2019-01-01
38 2019-01-01
18 2019-01-01
36 2019-01-01
31 2019-01-01
6 2019-01-01
I'm not sure what is wrong here; I want the same format as the input for my process. Can anyone tell me what's wrong?
Thanks
The produced output is the default format for pandas' datetime objects, so there is nothing wrong. You can still play with the format and produce a datetime string with the strftime method. This built-in Python method is implemented in pandas.
You can try the following:
df1.Date = pd.to_datetime(df1.Date, format='%d/%m/%Y')
df1['my_date'] = df1.Date.dt.strftime('%d/%m/%Y')
So the 'my_date' column has the desired format. Note that you cannot do datetime operations with that column; it is only useful for representation. Work with the Date column for your mathematical operations, etc., and represent the results with the my_date column.
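For instance (a minimal illustration), date arithmetic works on the datetime column but not on its string twin:
df1['Date'] + pd.Timedelta(days=1)       # works: Date is datetime64
# df1['my_date'] + pd.Timedelta(days=1)  # would raise: my_date holds strings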
I have a couple of million DateTime objects in pandas, and I could not find anything in the documentation about exploratory data analysis (EDA) for them.
It looks like every single row has the same time in either data frame:
DF1
Timestamp('2018-02-20 00:00:00')
or
DF2
Timestamp('2018-01-01 05:00:00')
Is there a way to use pandas to go through each column and check whether there is a difference in the hours/minutes/seconds?
Everything I have found is about calculating differences between times.
I have tried a couple of basic techniques but all I get back are simple descriptive numbers.
min(data['date'])
data['date'].nunique()
I have tried:
print(data['TIMESTAMP_UTC'])
That does show some dates that have different hours, but I need a way to manage this information:
0 2018-01-16 05:00:00
1 2018-05-04 04:00:00
2 2018-10-22 04:00:00
3 2018-01-02 05:00:00
4 2018-01-03 05:00:00
5 2018-01-04 05:00:00
6 2018-01-05 05:00:00
......
Ideally, I am looking for something that could spit out a .value_counts() of dates that deviate from everything else
You can use pd.to_datetime to transform the column from str to datetime, then use the .dt accessor to handle it. (Applying datetime.strptime on top of that is unnecessary, and would fail anyway because the values are no longer strings after the conversion.)
To convert your column values into datetime:
df['TIMESTAMP_UTC'] = pd.to_datetime(df['TIMESTAMP_UTC'])
Then you can use the power of datetime to compare or extract information, for instance the hour:
df['TIMESTAMP_UTC'].dt.hour
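To get something like the .value_counts() of deviating timestamps the question asks for, one sketch is to count the time-of-day component and flag the rare ones:
# count how often each time-of-day occurs; rare values are the deviants
df['TIMESTAMP_UTC'].dt.time.value_counts()

# or flag rows whose time differs from the most common one
most_common = df['TIMESTAMP_UTC'].dt.time.mode()[0]
deviants = df[df['TIMESTAMP_UTC'].dt.time != most_common]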