I have a CSV file, and in the 'Time' column the time data is saved as integers like
7
20
132
4321
123456
...
and I have to convert them to datetime in Python like
00:00:07
00:00:20
00:01:32
00:43:21
12:34:56
...
The data has almost 250,000 rows.
How do I convert these numbers to a datetime?
I tried the following, but it failed:
change_time=str(int(df_NPA_2020['TIME'])).zfill(6)
change_time=change_time[:2]+":"+change_time[2:4]+":"+change_time[4:]
change_time
and
change_time=df_NPA_2020['ch_time'] = df_NPA_2020['TIME'].apply(lambda x: pd.to_datetime(str(x), format='%H:%M:%S'))
You're almost there. You have to use the .astype(str) method to convert the column to strings, not str(df_NPA_2020['TIME']). The latter turns the whole Series into a single string, as if you had printed it.
df_NPA_2020['ch_time'] = pd.to_datetime(df_NPA_2020['TIME'].astype(str).str.zfill(6), format='%H%M%S').dt.time
print(df_NPA_2020)
# Output
TIME ch_time
0 7 00:00:07
1 20 00:00:20
2 132 00:01:32
3 4321 00:43:21
4 123456 12:34:56
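To see the difference, here is a quick illustration with a throwaway Series (not your data): str(...) turns the whole Series into one printed string, while .astype(str) converts each element individually.
import pandas as pd
s = pd.Series([7, 20, 132])
print(str(s))         # one big string: the printed representation of the whole Series
print(s.astype(str))  # a Series of strings: '7', '20', '132'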
Parse the number into a datetime, then format it:
import pandas as pd
df = pd.DataFrame([7,20,132,4321,123456], columns=['Time'])
print(df)
df.Time = df.Time.apply(lambda x: pd.to_datetime(f'{x:06}', format='%H%M%S')).dt.strftime('%H:%M:%S')
print(df)
Output:
Time
0 7
1 20
2 132
3 4321
4 123456
Time
0 00:00:07
1 00:00:20
2 00:01:32
3 00:43:21
4 12:34:56
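Since the question mentions roughly 250,000 rows, a purely arithmetic variant avoids per-row string parsing. This is only a sketch and assumes every value is a valid HHMMSS-style integer:
import pandas as pd
df = pd.DataFrame([7, 20, 132, 4321, 123456], columns=['Time'])
# Split the HHMMSS-encoded integer arithmetically: 123456 -> 12 h, 34 min, 56 s
hours, rest = divmod(df['Time'], 10000)
minutes, seconds = divmod(rest, 100)
df['Time'] = pd.to_timedelta(hours * 3600 + minutes * 60 + seconds, unit='s')
print(df)
The result is a timedelta column (e.g. 0 days 12:34:56), which is handy if you need to do arithmetic on the times afterwards.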
I am currently having issues with the date-time format, particularly converting string input to the correct Python datetime format.
Date/Time Dry_Temp[C] Wet_Temp[C] Solar_Diffuse_Rate[[W/m2]] \
0 01/01 00:10:00 8.45 8.237306 0.0
1 01/01 00:20:00 7.30 6.968360 0.0
2 01/01 00:30:00 6.15 5.710239 0.0
3 01/01 00:40:00 5.00 4.462898 0.0
4 01/01 00:50:00 3.85 3.226244 0.0
These are examples of the timestamps I have in my data. I have tried splitting date and time so that I now have the following columns:
WC_Humidity[%] WC_Htgsetp[C] WC_Clgsetp[C] Date Time
0 55.553640 18 26 1900-01-01 00:10:00
1 54.204342 18 26 1900-01-01 00:20:00
2 51.896272 18 26 1900-01-01 00:30:00
3 49.007770 18 26 1900-01-01 00:40:00
4 45.825810 18 26 1900-01-01 00:50:00
I have managed to get the date into datetime format, but there are still 2 problems to resolve:
the data was not recorded in 1900, so I would like to change the year in the Date;
I get the following error when trying to convert the Time column into a Python datetime format:
pandas/_libs/tslibs/strptime.pyx in pandas._libs.tslibs.strptime.array_strptime()
ValueError: time data '00:00:00' does not match format ' %m/%d %H:%M:%S' (match)
I tried using 24:00:00, but Python didn't like that either...
Preferences:
I would prefer it if date and time were both kept in the same cell, without having to split this information into two columns.
I would also like to get rid of the seconds data as the data was recorded in 10 min intervals so there is no need for seconds in my case.
Any help would be greatly appreciated.
the data was not recorded in 1900, so I would like to change the year in the Date,
The datetime.datetime.replace method of a datetime.datetime instance is used for this task; consider the following example:
import pandas as pd
df = pd.DataFrame({"when":pd.to_datetime(["1900-01-01","1900-02-02","1900-03-03"])})
df["when"] = df["when"].apply(lambda x:x.replace(year=2000))
print(df)
output
when
0 2000-01-01
1 2000-02-02
2 2000-03-03
Note that it can also be used without pandas, for example:
import datetime
d = datetime.datetime.strptime("","") # use all default values which result in midnight of Jan 1 of year 1900
print(d) # 1900-01-01 00:00:00
d = d.replace(year=2000)
print(d) # 2000-01-01 00:00:00
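To tie this back to the original Date/Time strings, here is a minimal sketch that parses them and sets the year in one pass; it assumes the real year is known (2021 below is only a placeholder) and drops the seconds by flooring to whole minutes, keeping date and time together in a single column as preferred:
import pandas as pd
# Hypothetical sample mirroring the question's Date/Time column
s = pd.Series(["01/01 00:10:00", "01/01 00:20:00", "01/01 00:30:00"])
# Prepend the assumed year so the format is unambiguous, then floor to whole minutes
dt_col = pd.to_datetime("2021/" + s, format="%Y/%m/%d %H:%M:%S").dt.floor("min")
print(dt_col)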
I would like to calculate the mean of a timedelta series, excluding 00:00:00 values.
This is my time series:
1 00:28:00
3 01:57:00
5 00:00:00
7 01:27:00
9 00:00:00
11 01:30:00
I want to replace rows 5 and 9 with NaN and then apply .mean() to the series; mean() doesn't include NaN values, so I would get the desired result.
How can I do that?
I'm trying:
`df["time_column"].replace('0 days 00:00:00', np.NaN).mean()`
but no values are replaced
One idea is to use a zero Timedelta object:
out = df["time_column"].replace(pd.Timedelta(0), np.NaN).mean()
print (out)
0 days 01:20:30
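As a quick check, this self-contained sketch rebuilds the sample series as timedeltas (the column name time_column is assumed from the question) and reproduces that mean:
import numpy as np
import pandas as pd
# Rebuild the sample series from the question as timedeltas
df = pd.DataFrame({"time_column": pd.to_timedelta(
    ["00:28:00", "01:57:00", "00:00:00", "01:27:00", "00:00:00", "01:30:00"])})
# Zero timedeltas become missing values and are then skipped by mean()
out = df["time_column"].replace(pd.Timedelta(0), np.nan).mean()
print(out)  # 0 days 01:20:30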
I have a pandas DataFrame that looks like this:
pta ptd tpl_num
4 05:17 05:18 0
6 05:29:30 05:30 1
9 05:42 05:44:30 2
11 05:53 05:54 3
12 06:03 06:05:30 4
I'm trying to format pta and ptd to %H:%M:%S using this:
df['pta'] = pandas.to_datetime(df['pta'], format="%H:%M:%S")
df['ptd'] = pandas.to_datetime(df['ptd'], format="%H:%M:%S")
This gives:
ValueError: time data '05:17' does not match format '%H:%M:%S' (match)
Makes sense, as some of my timestamps don't have :00 in the seconds column. Is there any way to pad these at the end? Or will I need to pad my input data manually/before adding it to the DataFrame? I've seen plenty of answers that pad leading zeroes, but couldn't find one for this.
Some of the time strings do not match the specified format and hence are not correctly parsed. Let pandas parse them for you, and then use dt.strftime to format them as you want:
df['pta'] = pd.to_datetime(df['pta']).dt.strftime("%H:%M:%S")
df['ptd'] = pd.to_datetime(df['ptd']).dt.strftime("%H:%M:%S")
print(df)
pta ptd tpl_num
4 05:17:00 05:18:00 0
6 05:29:30 05:30:00 1
9 05:42:00 05:44:30 2
11 05:53:00 05:54:00 3
12 06:03:00 06:05:30 4
If you only want the padded strings, you can do:
df['pta'].add(':00').str[:8]
Output:
4 05:17:00
6 05:29:30
9 05:42:00
11 05:53:00
12 06:03:00
Name: pta, dtype: object
Also, for time only, you should consider using pd.to_timedelta instead of pd.to_datetime.
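For example, here is a small sketch of the to_timedelta route, reusing the padding trick above (it rebuilds a toy frame rather than your full data):
import pandas as pd
# Toy frame with the same mixed HH:MM / HH:MM:SS strings as the question
df = pd.DataFrame({'pta': ['05:17', '05:29:30', '05:42'],
                   'ptd': ['05:18', '05:30', '05:44:30']})
# Pad the missing ':00' seconds, then parse as timedeltas rather than datetimes
df['pta'] = pd.to_timedelta(df['pta'].add(':00').str[:8])
df['ptd'] = pd.to_timedelta(df['ptd'].add(':00').str[:8])
print(df.dtypes)  # pta and ptd are now timedelta64[ns]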
I have a csv file that I am trying to import into pandas.
There are two columns of interest, date and hour, and they are the first two columns.
E.g.
date,hour,...
10-1-2013,0,
10-1-2013,0,
10-1-2013,0,
10-1-2013,1,
10-1-2013,1,
How do I import this using pandas so that hour and date are combined, or is that best done after the initial import?
df = DataFrame.from_csv('bingads.csv', sep=',')
If I do the initial import, how do I combine the two into a single date and then delete the hour column?
Thanks
Define your own date_parser:
from dateutil.parser import parse
import datetime as dt
import pandas as pd

def date_parser(x):
    # expects the combined 'date hour' value built from columns 0 and 1
    date, hour = x.split(' ')
    return parse(date) + dt.timedelta(0, 3600 * int(hour))

df = pd.read_csv('test.csv', parse_dates=[[0, 1]], date_parser=date_parser)
print(df)
Output:
date_hour a b c
0 2013-10-01 00:00:00 1 1 1
1 2013-10-01 00:00:00 2 2 2
2 2013-10-01 00:00:00 3 3 3
3 2013-10-01 01:00:00 4 4 4
4 2013-10-01 01:00:00 5 5 5
Use read_csv instead of read_clipboard (which is only used below to grab the sample data) to handle your actual file:
>>> df = pd.read_clipboard(sep=',')
>>> df['date'] = pd.to_datetime(df.date) + pd.to_timedelta(df.hour, unit='D')/24
>>> del df['hour']
>>> df
date ...
0 2013-10-01 00:00:00 NaN
1 2013-10-01 00:00:00 NaN
2 2013-10-01 00:00:00 NaN
3 2013-10-01 01:00:00 NaN
4 2013-10-01 01:00:00 NaN
[5 rows x 2 columns]
Take a look at the parse_dates argument which pandas.read_csv accepts.
You can do something like:
df = pandas.read_csv('some.csv', parse_dates=True)
# in which case pandas will try to parse the index as dates
df = pandas.read_csv('some.csv', parse_dates=[i,j,k])
# in which case pandas will parse the i, j and kth columns for dates
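To then combine the parsed date with the hour column and drop it afterwards, which is what the question asks for, here is a minimal sketch assuming the file layout shown above:
import pandas as pd
# Parse only the date column on import, then add the hour as an offset
df = pd.read_csv('bingads.csv', parse_dates=['date'])
df['date'] = df['date'] + pd.to_timedelta(df['hour'], unit='h')
df = df.drop(columns='hour')
print(df.head())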
Since you are only using the two columns from the csv file and combining them into one, I would squeeze them into a series of datetime objects like so:
import pandas as pd
from io import StringIO
import datetime as dt
txt='''\
date,hour,A,B
10-1-2013,0,1,6
10-1-2013,0,2,7
10-1-2013,0,3,8
10-1-2013,1,4,9
10-1-2013,1,5,10'''
def date_parser(date, hour):
    # date and hour arrive as the two column arrays selected by parse_dates=[[0, 1]]
    dates = []
    for ed, eh in zip(date, hour):
        month, day, year = list(map(int, ed.split('-')))
        dates.append(dt.datetime(year, month, day, int(eh)))
    return dates

p = pd.read_csv(StringIO(txt), usecols=[0, 1],
                parse_dates=[[0, 1]], date_parser=date_parser, squeeze=True)
print(p)
Prints:
0 2013-10-01 00:00:00
1 2013-10-01 00:00:00
2 2013-10-01 00:00:00
3 2013-10-01 01:00:00
4 2013-10-01 01:00:00
Name: date_hour, dtype: datetime64[ns]