Given the following data frame:
Name Loc DateTime RecordId RST
0 SwingBridge (47.57, -122.35) 08/01/2016 12:00:00 AM 1854751 76.06
1 Roosevelt (47.69, -122.31) 08/01/2016 12:00:00 AM 1941744 2.26
which is read in from a csv file using
filename='my.csv'
df=pd.read_csv(filename,
dtype={'Name': str, 'Loc': object,
'RecordId':int, 'RST': float})
If I read the dates using parse_dates=['DateTime'] or recast them with
df['DateTime'] = pd.to_datetime(df['DateTime']), why do the bloody hours disappear?
Name Loc DateTime RecordId RST
0 SwingBridge (47.57, -122.35) 08/01/2016 1854751 76.06
1 Roosevelt (47.69, -122.31) 08/01/2016 1941744 2.26
This is just a display issue. By default, when every time in the column is midnight (00:00:00, or 12:00:00 AM in your case), pandas does not display the time component. If you look at an individual element's value, the time component is still there.
Example:
In[4]:
import pandas as pd
import io
t="""DateTime
08/01/2016 12:00:00 AM"""
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
df
Out[4]:
DateTime
0 2016-08-01
In[5]:
df['DateTime'].iloc[0]
Out[5]: Timestamp('2016-08-01 00:00:00')
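If the goal is to see the time even when every value is at midnight, one option is to format the column as strings for display. A minimal sketch, assuming the same 12-hour format as the csv; the format string and the `shown` variable are my own choices, not part of the original question:

```python
import pandas as pd

df = pd.DataFrame({'DateTime': ['08/01/2016 12:00:00 AM']})
# Parse with an explicit format so the 12-hour clock is read unambiguously.
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%m/%d/%Y %I:%M:%S %p')

# The Timestamp still carries the time; strftime makes it visible again.
shown = df['DateTime'].dt.strftime('%m/%d/%Y %I:%M:%S %p')
print(shown.iloc[0])
```

Note that `shown` holds plain strings, so keep the original datetime column around for any further date arithmetic.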
Related
I have to parse an xml file which gives me datetimes in Excel style; for example: 42580.3333333333.
Does Pandas provide a way to convert that number into a regular datetime object?
OK I think the easiest thing is to construct a TimedeltaIndex from the floats and add this to the scalar datetime for 1900,1,1:
In [85]:
import datetime as dt
import pandas as pd
df = pd.DataFrame({'date':[42580.3333333333, 10023]})
df
Out[85]:
date
0 42580.333333
1 10023.000000
In [86]:
df['real_date'] = pd.TimedeltaIndex(df['date'], unit='d') + dt.datetime(1900,1,1)
df
Out[86]:
date real_date
0 42580.333333 2016-07-31 07:59:59.971200
1 10023.000000 1927-06-12 00:00:00.000000
OK, it seems that Excel is a bit weird with its dates, thanks @ayhan:
In [89]:
df['real_date'] = pd.TimedeltaIndex(df['date'], unit='d') + dt.datetime(1899, 12, 30)
df
Out[89]:
date real_date
0 42580.333333 2016-07-29 07:59:59.971200
1 10023.000000 1927-06-10 00:00:00.000000
See related: How to convert a python datetime.datetime to excel serial date number
You can parse directly with pd.to_datetime, using the keywords unit='D' and origin='1899-12-30':
import pandas as pd
df = pd.DataFrame({'xldate': [42580.3333333333]})
df['date'] = pd.to_datetime(df['xldate'], unit='D', origin='1899-12-30')
df['date']
Out[2]:
0 2016-07-29 07:59:59.999971200
Name: date, dtype: datetime64[ns]
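The trailing .999971200 comes from the truncated decimal fraction of a day in the input. If whole seconds are enough, rounding after the conversion is one way to tidy it up. A minimal sketch under that assumption (the rounding step is my addition, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({'xldate': [42580.3333333333, 10023]})
# Excel serial days counted from 1899-12-30, then rounded to the nearest second.
df['date'] = pd.to_datetime(df['xldate'], unit='D', origin='1899-12-30').dt.round('s')
print(df)
```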
Further reading:
What is story behind December 30, 1899 as base date?
An answer from Martijn Pieters on how to handle Excel ordinal values < 60 correctly
You can use the 3rd party xlrd library before passing to pd.to_datetime:
import pandas as pd
import xlrd
def read_date(date):
    # datemode 0 selects the 1900-based Excel date system
    return xlrd.xldate.xldate_as_datetime(date, 0)
df = pd.DataFrame({'date':[42580.3333333333, 10023]})
df['new'] = pd.to_datetime(df['date'].apply(read_date), errors='coerce')
print(df)
date new
0 42580.333333 2016-07-29 08:00:00
1 10023.000000 1927-06-10 00:00:00
I have a csv file and want to use H2O for deep learning. But it contains some Chinese text and datetimes, and when I finish the deep learning and need to save the output to csv, I can't get the original data back.
I use a small example to show my problem here.
In[1]: df = pd.DataFrame({'datetime':['2016-12-17 00:00:00'],'time':['00:00:30'],'month':['月'], 'weekend':['周六']})
print(df.dtypes)
df
out[1]: datetime object
time object
month object
weekend object
dtype: object
datetime time month weekend
0 2016-12-17 00:00:00 00:00:30 月 周六
In[2]: h2o_frame = h2o.H2OFrame(df);h2o_frame ;h2o_frame.types ;h2o_frame
C:\Users\thi\Anaconda3\lib\site-packages\h2o\utils\shared_utils.py:170: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
data = _handle_python_lists(python_obj.as_matrix().tolist(), -1)[1]
out[2]: Parse progress: |█████████████████████████████████████████████████████████| 100%
datetime time month weekend
2016-12-17 00:00:00 1970-01-01 00:00:30 <0xA4EB> <0xA9>P<0xA4BB>
For time, I want just 00:00:30. Is there any way to fix it?
For month and weekend I haven't found any way to make them show Chinese, but I can still finish my deep learning.
But when I convert the H2OFrame back to a DataFrame and save it to a csv file, it saves <0xA4EB> instead of 月, and datetime changes to int.
In[3]: dff = h2o_frame.as_data_frame();dff
out[3]: datetime time month weekend
0 1481932800000 30000 <0xA4EB> <0xA9>P<0xA4BB>
How do I correctly get the characters back from an H2OFrame to a DataFrame?
How do I correctly get the datetime back from an H2OFrame to a DataFrame?
The simplest way to solve this is to pass the column_types argument when you convert the pandas frame to an H2OFrame, as below:
In [69]: col_types
Out[69]: ['categorical', 'categorical', 'categorical', 'categorical']
In [70]: h2o_frame = h2o.H2OFrame(df,column_types=col_types);h2o_frame ;h2o_frame.types ;h2o_frame
Parse progress: |█████████████████████████████████████████████████████████████████████████████| 100%
Out[70]:
datetime month time weekend
------------------- ------- -------- ---------
2016-12-17 00:00:00 月 00:00:30 周六
[1 row x 4 columns]
In [71]: dff = h2o_frame.as_data_frame();dff
Out[71]:
datetime month time weekend
0 2016-12-17 00:00:00 月 00:00:30 周六
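Alternatively, import the csv files with H2O directly and convert the datetime column back after as_data_frame():
import h2o
import pandas as pd
allfiles = h2o.import_file(path='data/', pattern=".csv")
df = allfiles.as_data_frame()
# as_data_frame() returns datetimes as epoch milliseconds, so convert them back
df['datetime'] = pd.to_datetime(df["datetime"], unit='ms')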
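For reference, the conversion in the last line can be tried without an H2O cluster. A minimal sketch that assumes as_data_frame() returned the epoch-millisecond integers shown in out[3] of the question; treating the time column as a duration via pd.to_timedelta is my assumption about what is wanted:

```python
import pandas as pd

# Values copied from the as_data_frame() output in the question.
dff = pd.DataFrame({'datetime': [1481932800000], 'time': [30000]})

# Milliseconds since the Unix epoch -> Timestamp.
dff['datetime'] = pd.to_datetime(dff['datetime'], unit='ms')
# Milliseconds since midnight -> Timedelta (a duration with no date attached).
dff['time'] = pd.to_timedelta(dff['time'], unit='ms')
print(dff)
```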
I read a csv data with pandas and now I would like to change the layout of my dataset. My dataset from excel looks like this:
I run the code with df = pd.read_csv(Location2)
This is what I get:
I would like to have a separated column for time and Watt and their values.
I looked at the documentation but I couldn't find something to make it work.
It seems you need to set the correct delimiter that separates the two fields. Try adding delimiter=";" to the parameters.
Use read_excel
df = pd.read_excel(Location2)
I think you need the sep parameter in read_csv, because the default separator is a comma:
df = pd.read_csv(Location2, sep=';')
Sample:
import pandas as pd
from io import StringIO
temp=u"""time;Watt
0;00:00:00;50
1;01:00:00;45
2;02:00:00;40
3;00:03:00;35"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep=";")
print (df)
time Watt
0 00:00:00 50
1 01:00:00 45
2 02:00:00 40
3 00:03:00 35
Then it is possible to convert the time column with to_timedelta:
df['time'] = pd.to_timedelta(df['time'])
print (df)
time Watt
0 00:00:00 50
1 01:00:00 45
2 02:00:00 40
3 00:03:00 35
print (df.dtypes)
time timedelta64[ns]
Watt int64
dtype: object
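Once the column is timedelta64, arithmetic and comparisons work on it directly. A short sketch using the same sample rows, minus the leading index column, to keep it self-contained; the `seconds` column and the `late` variable are names I made up for illustration:

```python
from io import StringIO
import pandas as pd

temp = """time;Watt
00:00:00;50
01:00:00;45
02:00:00;40
00:03:00;35"""

df = pd.read_csv(StringIO(temp), sep=';')
df['time'] = pd.to_timedelta(df['time'])

# Elapsed seconds as floats, handy for plotting or numeric work.
df['seconds'] = df['time'].dt.total_seconds()
# Rows recorded at or after the one-hour mark.
late = df[df['time'] >= pd.Timedelta(hours=1)]
print(df)
print(late)
```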
I have a dataset I'm analyzing in pandas where all data is binned monthly. The data originates from a MySQL database where all dates are in the format 'YYYY-MM-01', such that, for example, all rows for October 2013 would have "2013-10-01" in the month column.
I'm currently reading the data into pandas (via a .tsv dump of the MySQL table) with
data = pd.read_table(filename,header=None,names=('uid','iid','artist','tag','date'),index_col=indexes, parse_dates='date')
This is all fine, except that any subsequent analysis in which I do monthly resampling always represents dates using the end-of-month convention (i.e. data from October becomes '2013-10-31' instead of '2013-10-01'). This can lead to inconsistencies: the original data has months labeled as 'YYYY-MM-01', while any resampled data has the months labeled as 'YYYY-MM-31' (or '-30' or '-28', as appropriate).
My question is this: What is the easiest and/or fastest way I can convert all the dates in my dataframe to the end-of-month format from the outset? Keep in mind that the date is one of several indexes in a multi-index, not a column. I think my best bet is to use a modified date_parser in my pd.read_table call that always converts month to the end-of-month convention, but I'm not sure how to approach it.
Read your dates in exactly like you are doing.
Create some test data. I am setting the dates to the start of month, but it doesn't matter.
In [39]: df = DataFrame(np.random.randn(10,2),columns=list('AB'),
index=date_range('20130101',periods=10,freq='MS'))
In [40]: df
Out[40]:
A B
2013-01-01 -0.553482 0.049128
2013-02-01 0.337975 -0.035897
2013-03-01 -0.394849 -1.755323
2013-04-01 -0.555638 1.903388
2013-05-01 -0.087752 1.551916
2013-06-01 1.000943 -0.361248
2013-07-01 -1.855171 -2.215276
2013-08-01 -0.582643 1.661696
2013-09-01 0.501061 -1.455171
2013-10-01 1.343630 -2.008060
Force convert them to the end-of-month in time space regardless of the day
In [41]: df.index = df.index.to_period().to_timestamp('M')
In [42]: df
Out[42]:
A B
2013-01-31 -0.553482 0.049128
2013-02-28 0.337975 -0.035897
2013-03-31 -0.394849 -1.755323
2013-04-30 -0.555638 1.903388
2013-05-31 -0.087752 1.551916
2013-06-30 1.000943 -0.361248
2013-07-31 -1.855171 -2.215276
2013-08-31 -0.582643 1.661696
2013-09-30 0.501061 -1.455171
2013-10-31 1.343630 -2.008060
Back to the start
In [43]: df.index = df.index.to_period().to_timestamp('MS')
In [44]: df
Out[44]:
A B
2013-01-01 -0.553482 0.049128
2013-02-01 0.337975 -0.035897
2013-03-01 -0.394849 -1.755323
2013-04-01 -0.555638 1.903388
2013-05-01 -0.087752 1.551916
2013-06-01 1.000943 -0.361248
2013-07-01 -1.855171 -2.215276
2013-08-01 -0.582643 1.661696
2013-09-01 0.501061 -1.455171
2013-10-01 1.343630 -2.008060
You can also work with (and resample) as periods
In [45]: df.index = df.index.to_period()
In [46]: df
Out[46]:
A B
2013-01 -0.553482 0.049128
2013-02 0.337975 -0.035897
2013-03 -0.394849 -1.755323
2013-04 -0.555638 1.903388
2013-05 -0.087752 1.551916
2013-06 1.000943 -0.361248
2013-07 -1.855171 -2.215276
2013-08 -0.582643 1.661696
2013-09 0.501061 -1.455171
2013-10 1.343630 -2.008060
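Since the question mentions that the date is one level of a multi-index rather than a plain index, the same month-end shift can be applied to just that level. A minimal sketch using pd.offsets.MonthEnd(0) instead of the period round-trip above; the 'uid' and 'date' level names and the toy values are assumptions for illustration:

```python
import pandas as pd

dates = pd.to_datetime(['2013-01-01', '2013-02-01', '2013-10-01'])
index = pd.MultiIndex.from_arrays([['u1', 'u2', 'u3'], dates],
                                  names=['uid', 'date'])
df = pd.DataFrame({'value': [1, 2, 3]}, index=index)

# MonthEnd(0) rolls each date forward to the last day of its own month
# and leaves dates that are already month-ends unchanged.
new_level = df.index.levels[1] + pd.offsets.MonthEnd(0)
# Swap in the shifted values for the 'date' level only.
df.index = df.index.set_levels(new_level, level='date')
print(df)
```

Shifting the level values rather than every row keeps the work proportional to the number of distinct months.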
Use replace() to change the day value. You can get the last day of the month using calendar.monthrange:
from datetime import date
import calendar
d = date(2000,1,1)
d = d.replace(day=calendar.monthrange(d.year, d.month)[1])
UPDATE
I've added an example for pandas.
Sample file date.csv:
2013-01-01, 1
2013-02-01, 2
IPython shell log:
In [27]: import pandas as pd
In [28]: from datetime import datetime, date
In [29]: import calendar
In [30]: def parse(dt):
   ....:     dt = datetime.strptime(dt, '%Y-%m-%d')
   ....:     dt = dt.replace(day=calendar.monthrange(dt.year, dt.month)[1])
   ....:     return dt.date()
   ....:
In [31]: parse('2013-01-01')
Out[31]: datetime.date(2013, 1, 31)
In [32]: r = pd.read_csv('date.csv', header=None, names=('date', 'value'), parse_dates=['date'], date_parser=parse)
In [33]: r
Out[33]:
date value
0 2013-01-31 1
1 2013-02-28 2