Loading pandas DataFrame from dict of series possible glitch? - python

I'm constructing a dictionary using a dictionary comprehension which has read_csv embedded within it. This constructs the dictionary fine, but when I then push it into a DataFrame all of my data goes to null and the dates get very wacky as well. Here's sample code and output:
In [129]: a= {x.split(".")[0] : read_csv(x, parse_dates=True, index_col=[0])["Settle"] for x in t[:2]}
In [130]: a
Out[130]:
{'SPH2010': Date
2010-03-19 1172.95
2010-03-18 1166.10
2010-03-17 1165.70
2010-03-16 1159.50
2010-03-15 1150.30
2010-03-12 1151.30
2010-03-11 1150.60
2010-03-10 1145.70
2010-03-09 1140.50
2010-03-08 1137.10
2010-03-05 1136.50
2010-03-04 1122.30
2010-03-03 1118.60
2010-03-02 1117.40
2010-03-01 1114.60
...
2008-04-10 1370.4
2008-04-09 1367.7
2008-04-08 1378.7
2008-04-07 1378.4
2008-04-04 1377.8
2008-04-03 1379.9
2008-04-02 1377.7
2008-04-01 1376.6
2008-03-31 1329.1
2008-03-28 1324.0
2008-03-27 1334.7
2008-03-26 1340.7
2008-03-25 1357.0
2008-03-24 1357.3
2008-03-20 1329.8
Name: Settle, Length: 495,
'SPM2011': Date
2011-06-17 1279.4
2011-06-16 1269.0
2011-06-15 1265.4
2011-06-14 1289.9
2011-06-13 1271.6
2011-06-10 1269.2
2011-06-09 1287.4
2011-06-08 1277.0
2011-06-07 1284.8
2011-06-06 1285.0
2011-06-03 1296.3
2011-06-02 1312.4
2011-06-01 1312.1
2011-05-31 1343.9
2011-05-27 1329.9
...
2009-07-10 856.6
2009-07-09 861.2
2009-07-08 856.0
2009-07-07 861.7
2009-07-06 877.9
2009-07-02 875.8
2009-07-01 902.6
2009-06-30 900.3
2009-06-29 908.0
2009-06-26 901.1
2009-06-25 903.8
2009-06-24 885.2
2009-06-23 877.6
2009-06-22 876.0
2009-06-19 903.4
Name: Settle, Length: 497}
In [131]: DataFrame(a)
Out[131]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 806 entries, 2189-09-10 03:33:28.879144 to 1924-01-20 06:06:06.621835
Data columns:
SPH2010 0 non-null values
SPM2011 0 non-null values
dtypes: float64(2)
Thanks!
EDIT:
I've also tried doing this with concat and I get the same results.

You should be able to use concat and unstack. Here's an example:
s1 = pd.Series([1, 2], name='a')
s2 = pd.Series([3, 4], index=[1, 2], name='b')
d = {'A': s1, 'B': s2} # a dict of Series
In [4]: pd.concat(d)
Out[4]:
A 0 1
1 2
B 1 3
2 4
In [5]: pd.concat(d).unstack().T
Out[5]:
A B
0 1 NaN
1 2 3
2 NaN 4
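For the question's own data the same idea should give the wide frame directly. A sketch, assuming a is the dict of Settle Series built by the comprehension above:
wide = pd.concat(a).unstack().T    # SPH2010 / SPM2011 as columns, union of dates as the index
# or, equivalently, concatenate along the columns:
wide = pd.concat(a, axis=1)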

Related

Is there a way to iterate through a list and return variables named after its contents?

I've got a pandas dataframe organized by date that I'm trying to split up by year (using a column called 'year'). I want to return one dataframe per year, with a name something like "df19XX".
I was hoping to write a "For" loop that can handle this... something like...
for d in [1980, 1981, 1982]:
    df(d) = df[df['year']==d]
... which would return three data frames called df1980, df1981 and df1982.
thanks!
Something like this? Also using @Andy's df:
variables = locals()
for i in [2012, 2013]:
    variables["df{0}".format(i)] = df.loc[df.date.dt.year == i]
df2012
Out[118]:
A date
0 0.881468 2012-12-28
1 0.237672 2012-12-29
2 0.992287 2012-12-30
3 0.194288 2012-12-31
df2013
Out[119]:
A date
4 0.151854 2013-01-01
5 0.855312 2013-01-02
6 0.534075 2013-01-03
You can iterate through the groupby:
In [11]: df = pd.DataFrame({"date": pd.date_range("2012-12-28", "2013-01-03"), "A": np.random.rand(7)})
In [12]: df
Out[12]:
A date
0 0.434715 2012-12-28
1 0.208877 2012-12-29
2 0.912897 2012-12-30
3 0.226368 2012-12-31
4 0.100489 2013-01-01
5 0.474088 2013-01-02
6 0.348368 2013-01-03
In [13]: g = df.groupby(df.date.dt.year)
In [14]: for k, v in g:
...: print(k)
...: print(v)
...: print()
...:
2012
A date
0 0.434715 2012-12-28
1 0.208877 2012-12-29
2 0.912897 2012-12-30
3 0.226368 2012-12-31
2013
A date
4 0.100489 2013-01-01
5 0.474088 2013-01-02
6 0.348368 2013-01-03
I would strongly argue that it is preferable to simply have a dict, rather than creating variables and messing around with the locals() dictionary (I'd claim that using locals() like this is not "pythonic"):
In [14]: {k: grp for k, grp in g}
Out[14]:
{2012: A date
0 0.434715 2012-12-28
1 0.208877 2012-12-29
2 0.912897 2012-12-30
3 0.226368 2012-12-31, 2013: A date
4 0.100489 2013-01-01
5 0.474088 2013-01-02
6 0.348368 2013-01-03}
Though you might consider calculating this on the fly (rather than storing in a dict or indeed a variable). You can use get_group:
In [15]: g.get_group(2012)
Out[15]:
A date
0 0.865239 2012-12-28
1 0.019071 2012-12-29
2 0.362088 2012-12-30
3 0.031861 2012-12-31
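Tying it back to the question's years, a dict keeps everything accessible without inventing variable names. A sketch, assuming a 'year' column as in the original post rather than the datetime 'date' column used above:
frames = {y: grp for y, grp in df.groupby('year')}
frames[1980]    # what the question wanted to call df1980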

Finding the closest date given a date in a groupby dataframe (Python)

I'm trying to generate the Last_Payment_Date field in my pandas dataframe, and would need to find the closest Payment_Date before the given Order_Date for each customer (i.e. groupby).
Payment_Date will always occur after Order_Date, but the lag varies, which makes it difficult to use sorting and shift to find the nearest date.
Masking seems like a possible way but I've not been able to figure a way on how to use it.
Appreciate all the help I could get!
Cust_No Order_Date Payment_Date Last_Payment_Date
A 5/8/2014 6/8/2014 Nat
B 6/8/2014 1/5/2015 Nat
B 7/8/2014 7/8/2014 Nat
A 8/8/2014 1/5/2015 6/8/2014
A 9/8/2014 10/8/2014 6/8/2014
A 10/11/2014 12/11/2014 10/8/2014
B 11/12/2014 1/1/2015 7/8/2014
B 1/2/2015 2/2/2015 1/1/2015
A 2/5/2015 5/5/2015 1/5/2015
B 3/5/2015 4/5/2015 2/2/2015
Series.searchsorted largely does what you want -- it can be used to find where the Order_Dates fit inside Payment_Dates. In particular, it returns the ordinal indices corresponding to where each Order_Date would need to be inserted in order to keep the Payment_Dates sorted. For example, suppose
In [266]: df['Payment_Date']
Out[266]:
0 2014-06-08
2 2014-07-08
4 2014-10-08
5 2014-12-11
6 2015-01-01
1 2015-01-05
3 2015-01-05
7 2015-02-02
9 2015-04-05
8 2015-05-05
Name: Payment_Date, dtype: datetime64[ns]
In [267]: df['Order_Date']
Out[267]:
0 2014-05-08
2 2014-07-08
4 2014-09-08
5 2014-10-11
6 2014-11-12
1 2014-06-08
3 2014-08-08
7 2015-01-02
9 2015-03-05
8 2015-02-05
Name: Order_Date, dtype: datetime64[ns]
then searchsorted returns
In [268]: df['Payment_Date'].searchsorted(df['Order_Date'])
Out[268]: array([0, 1, 2, 3, 3, 0, 2, 5, 8, 8])
The first value, 0, for example, indicates that the Order_Date, 2014-05-08, would have to be inserted at ordinal index 0 (before the Payment_Date 2014-06-08) to keep the Payment_Dates in sorted order. The second value, 1, indicates that the Order_Date, 2014-07-08, would have to be inserted at ordinal index 1 (after the Payment_Date 2014-06-08 and before 2014-07-08) to keep the Payment_Dates in sorted order. And so on for the other indices.
Now, of course, there are some complications:
The Payment_Dates need to be in sorted order for searchsorted to return a meaningful result:
df = df.sort_values(by=['Payment_Date'])
We need to group by the Cust_No:
grouped = df.groupby('Cust_No')
We want the index of the Payment_Date which comes before the Order_Date. Thus, we really need to decrease the index by one:
idx = grp['Payment_Date'].searchsorted(grp['Order_Date'])
result = grp['Payment_Date'].iloc[idx-1]
So that grp['Payment_Date'].iloc[idx-1] would grab the prior Payment_Date.
When searchsorted returns 0, the Order_Date is less than all Payment_Dates. We want a NaT in this case.
result[idx == 0] = pd.NaT
So putting it all together,
import pandas as pd
NaT = pd.NaT
T = pd.Timestamp
df = pd.DataFrame({
    'Cust_No': ['A', 'B', 'B', 'A', 'A', 'A', 'B', 'B', 'A', 'B'],
    'expected': [
        NaT, NaT, NaT, T('2014-06-08'), T('2014-06-08'), T('2014-10-08'),
        T('2014-07-08'), T('2015-01-01'), T('2015-01-05'), T('2015-02-02')],
    'Order_Date': [
        T('2014-05-08'), T('2014-06-08'), T('2014-07-08'), T('2014-08-08'),
        T('2014-09-08'), T('2014-10-11'), T('2014-11-12'), T('2015-01-02'),
        T('2015-02-05'), T('2015-03-05')],
    'Payment_Date': [
        T('2014-06-08'), T('2015-01-05'), T('2014-07-08'), T('2015-01-05'),
        T('2014-10-08'), T('2014-12-11'), T('2015-01-01'), T('2015-02-02'),
        T('2015-05-05'), T('2015-04-05')]})
def last_payment_date(s, df):
    grp = df.loc[s.index]
    idx = grp['Payment_Date'].searchsorted(grp['Order_Date'])
    result = grp['Payment_Date'].iloc[idx-1]
    result[idx == 0] = pd.NaT
    return result
df = df.sort_values(by=['Payment_Date'])
grouped = df.groupby('Cust_No')
df['Last_Payment_Date'] = grouped['Payment_Date'].transform(last_payment_date, df)
print(df)
yields
Cust_No Order_Date Payment_Date expected Last_Payment_Date
0 A 2014-05-08 2014-06-08 NaT NaT
2 B 2014-07-08 2014-07-08 NaT NaT
4 A 2014-09-08 2014-10-08 2014-06-08 2014-06-08
5 A 2014-10-11 2014-12-11 2014-10-08 2014-10-08
6 B 2014-11-12 2015-01-01 2014-07-08 2014-07-08
1 B 2014-06-08 2015-01-05 NaT NaT
3 A 2014-08-08 2015-01-05 2014-06-08 2014-06-08
7 B 2015-01-02 2015-02-02 2015-01-01 2015-01-01
9 B 2015-03-05 2015-04-05 2015-02-02 2015-02-02
8 A 2015-02-05 2015-05-05 2015-01-05 2015-01-05
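If your pandas is new enough (>= 0.19), pd.merge_asof is another option; this is just a sketch of an alternative, not part of the answer above. allow_exact_matches=False makes it pick the closest Payment_Date strictly before each Order_Date, per Cust_No:
left = df.sort_values('Order_Date')
right = (df[['Cust_No', 'Payment_Date']]
         .rename(columns={'Payment_Date': 'Last_Payment_Date'})
         .sort_values('Last_Payment_Date'))
out = pd.merge_asof(left, right, left_on='Order_Date', right_on='Last_Payment_Date',
                    by='Cust_No', allow_exact_matches=False)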

Pandas get specific rows from HDF5 by index

I have a pandas DataFrame that I have written to an HDF5 file. The data is indexed by Timestamps and looks like this:
In [5]: df
Out[5]:
Codes Price Size
Time
2015-04-27 01:31:08-04:00 T 111.75 23
2015-04-27 01:31:39-04:00 T 111.80 23
2015-04-27 01:31:39-04:00 T 113.00 35
2015-04-27 01:34:14-04:00 T 113.00 85
2015-04-27 01:55:15-04:00 T 113.50 203
... ... ... ...
2015-05-26 11:35:00-04:00 CA 110.55 196
2015-05-26 11:35:00-04:00 CA 110.55 98
2015-05-26 11:35:00-04:00 CA 110.55 738
2015-05-26 11:35:00-04:00 CA 110.55 19
2015-05-26 11:37:01-04:00 110.55 12
What I would like is to create a function that I can pass a pandas DatetimeIndex to, and that will return a DataFrame with the rows at or right before each Timestamp in the DatetimeIndex.
The problem I'm running into is that concatenated read_hdf queries won't work if I am looking for more than 30 rows -- see "pandas read_hdf with 'where' condition limitation?"
What I am doing now is this, but there has to be a better solution:
from pandas import read_hdf, DatetimeIndex
from datetime import timedelta
import pytz
def getRows(file, dataset, index):
    if len(index) == 1:
        start = index.date[0]
        end = (index.date + timedelta(days=1))[0]
    else:
        start = index.date.min()
        end = (index.date.max() + timedelta(days=1))
    where = '(index >= "' + str(start) + '") & (index < "' + str(end) + '")'
    df = read_hdf(file, dataset, where=where)
    df = df.groupby(level=0).last().reindex(index, method='pad')
    return df
This is an example of using a where mask
In [22]: pd.set_option('max_rows',10)
In [23]: df = DataFrame({'A' : np.random.randn(100), 'B' : pd.date_range('20130101',periods=100)}).set_index('B')
In [24]: df
Out[24]:
A
B
2013-01-01 0.493144
2013-01-02 0.421045
2013-01-03 -0.717824
2013-01-04 0.159865
2013-01-05 -0.485890
... ...
2013-04-06 -0.805954
2013-04-07 -1.014333
2013-04-08 0.846877
2013-04-09 -1.646908
2013-04-10 -0.160927
[100 rows x 1 columns]
Store the test frame
In [25]: store = pd.HDFStore('test.h5',mode='w')
In [26]: store.append('df',df)
Create a random selection of dates.
In [27]: dates = df.index.take(np.random.randint(0,100,10))
In [28]: dates
Out[28]: DatetimeIndex(['2013-03-29', '2013-02-16', '2013-01-15', '2013-02-06', '2013-01-12', '2013-02-24', '2013-02-18', '2013-01-06', '2013-03-17', '2013-03-21'], dtype='datetime64[ns]', name=u'B', freq=None, tz=None)
Select the index column (in its entirety)
In [29]: c = store.select_column('df','index')
In [30]: c
Out[30]:
0 2013-01-01
1 2013-01-02
2 2013-01-03
3 2013-01-04
4 2013-01-05
...
95 2013-04-06
96 2013-04-07
97 2013-04-08
98 2013-04-09
99 2013-04-10
Name: B, dtype: datetime64[ns]
Select the indexers that you want. This could actually be somewhat complicated, e.g. you might want a .reindex(method='nearest')
In [34]: c[c.isin(dates)]
Out[34]:
5 2013-01-06
11 2013-01-12
14 2013-01-15
36 2013-02-06
46 2013-02-16
48 2013-02-18
54 2013-02-24
75 2013-03-17
79 2013-03-21
87 2013-03-29
Name: B, dtype: datetime64[ns]
Select the rows that you want
In [32]: store.select('df',where=c[c.isin(dates)].index)
Out[32]:
A
B
2013-01-06 0.680930
2013-01-12 0.165923
2013-01-15 -0.517692
2013-02-06 -0.351020
2013-02-16 1.348973
2013-02-18 0.448890
2013-02-24 -1.078522
2013-03-17 -0.358597
2013-03-21 -0.482301
2013-03-29 0.343381
In [33]: store.close()
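Wrapping the steps above into the sort of helper the question asked for might look like this -- a rough, untested sketch (getRowsByIndex is a made-up name; it assumes the stored index is sorted, as in the example, and approximates "at or right before" with a searchsorted on the stored index column):
import numpy as np
import pandas as pd

def getRowsByIndex(path, key, index):
    with pd.HDFStore(path, mode='r') as store:
        c = store.select_column(key, 'index')                 # the full stored index, in row order
        pos = np.searchsorted(c.values, index.values, side='right') - 1
        pos = pos[pos >= 0]                                   # drop timestamps before the first stored row
        return store.select(key, where=c.index[pos])          # read only those row coordinates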

Optimizing Pandas groupby/apply

I am writing a process which takes a semi-large file as input (~4 million rows, 5 columns) and performs a few operations on it.
Columns:
- CARD_NO
- ID
- CREATED_DATE
- STATUS
- FLAG2
I need to create a file which contains 1 copy of each CARD_NO where STATUS = '1' and CREATED_DATE is the maximum of all CREATED_DATEs for that CARD_NO.
I succeeded but my solution is very slow (3h and counting as of right now.)
Here is my code:
file = 'input.csv'
input = pd.read_csv(file)
input = input.drop_duplicates()
card_groups = input.groupby('CARD_NO', as_index=False, sort=False).filter(lambda x: x['STATUS'] == 1)
def important(x):
    latest_date = x['CREATED_DATE'].values[x['CREATED_DATE'].values.argmax()]
    return x[x.CREATED_DATE == latest_date]
#where the major slowdown occurs
group_2 = card_groups.groupby('CARD_NO', as_index=False, sort=False).apply(important)
path = 'result.csv'
group_2.to_csv(path, sep=',', index=False)
# ~4 minutes for the 154k rows file
# 3+ hours for ~4m rows
I was wondering if you had any advice on how to improve the running time of this little process.
Thank you and have a good day.
Setup (FYI: make sure that you use parse_dates=True when reading your csv)
In [6]: n_groups = 10000
In [7]: N = 4000000
In [8]: dates = date_range('20130101',periods=100)
In [9]: df = DataFrame(dict(id = np.random.randint(0,n_groups,size=N), status = np.random.randint(0,10,size=N), date=np.random.choice(dates,size=N,replace=True)))
In [10]: pd.set_option('max_rows',10)
In [13]: df = DataFrame(dict(card_no = np.random.randint(0,n_groups,size=N), status = np.random.randint(0,10,size=N), date=np.random.choice(dates,size=N,replace=True)))
In [14]: df
Out[14]:
card_no date status
0 5790 2013-02-11 6
1 6572 2013-03-17 6
2 7764 2013-02-06 3
3 4905 2013-04-01 3
4 3871 2013-04-08 1
... ... ... ...
3999995 1891 2013-02-16 5
3999996 9048 2013-01-11 9
3999997 1443 2013-02-23 1
3999998 2845 2013-01-28 0
3999999 5645 2013-02-05 8
[4000000 rows x 3 columns]
In [15]: df.dtypes
Out[15]:
card_no int64
date datetime64[ns]
status int64
dtype: object
Only status == 1, groupby card_no, then return the max date for that group
In [18]: df[df.status==1].groupby('card_no')['date'].max()
Out[18]:
card_no
0 2013-04-06
1 2013-03-30
2 2013-04-09
...
9997 2013-04-07
9998 2013-04-07
9999 2013-04-09
Name: date, Length: 10000, dtype: datetime64[ns]
In [19]: %timeit df[df.status==1].groupby('card_no')['date'].max()
1 loops, best of 3: 934 ms per loop
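To actually produce the file the question describes -- one full row per CARD_NO where STATUS == 1, at its latest CREATED_DATE -- something along these lines should work. A sketch using the question's column names, assuming CREATED_DATE was read with parse_dates:
flt = input[input['STATUS'] == 1]                             # 'input' is the question's frame
idx = flt.groupby('CARD_NO')['CREATED_DATE'].idxmax()         # row label of the latest date per card
flt.loc[idx].to_csv('result.csv', index=False)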
If you need a transform of this (i.e. the same value for every row of each group), note that with pandas < 0.14.1 (releasing this week) you will need to use the soln here; otherwise this will be pretty slow:
In [20]: df[df.status==1].groupby('card_no')['date'].transform('max')
Out[20]:
4 2013-04-10
13 2013-04-10
25 2013-04-10
...
3999973 2013-04-10
3999979 2013-04-10
3999997 2013-04-09
Name: date, Length: 399724, dtype: datetime64[ns]
In [21]: %timeit df[df.status==1].groupby('card_no')['date'].transform('max')
1 loops, best of 3: 1.8 s per loop
I suspect you probably want to merge the final transform back into the original frame (below, res is the transform result from In [20]):
In [24]: df.join(res.to_frame('max_date'))
Out[24]:
card_no date status max_date
0 5790 2013-02-11 6 NaT
1 6572 2013-03-17 6 NaT
2 7764 2013-02-06 3 NaT
3 4905 2013-04-01 3 NaT
4 3871 2013-04-08 1 2013-04-10
... ... ... ... ...
3999995 1891 2013-02-16 5 NaT
3999996 9048 2013-01-11 9 NaT
3999997 1443 2013-02-23 1 2013-04-09
3999998 2845 2013-01-28 0 NaT
3999999 5645 2013-02-05 8 NaT
[4000000 rows x 4 columns]
In [25]: %timeit df.join(res.to_frame('max_date'))
10 loops, best of 3: 58.8 ms per loop
The csv writing will actually take a fair amount of time relative to this. I use HDF5 for things like this; it's MUCH faster.
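For example, writing the result to HDF5 instead of csv might look like this (a sketch; needs PyTables installed):
group_2.to_hdf('result.h5', 'result', mode='w', format='table')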

Get MM-DD-YYYY from pandas Timestamp

Dates seem to be a tricky thing in Python, and I am having a lot of trouble simply stripping the date out of a pandas Timestamp. I would like to get from 2013-09-29 02:34:44 to simply 09-29-2013.
I have a dataframe with a column Created_date:
Name: Created_Date, Length: 1162549, dtype: datetime64[ns]
I have tried applying the .date() method on this Series, eg: df.Created_Date.date(), but I get the error AttributeError: 'Series' object has no attribute 'date'
Can someone help me out?
map over the elements:
In [239]: from operator import methodcaller
In [240]: s = Series(date_range(Timestamp('now'), periods=2))
In [241]: s
Out[241]:
0 2013-10-01 00:24:16
1 2013-10-02 00:24:16
dtype: datetime64[ns]
In [238]: s.map(lambda x: x.strftime('%d-%m-%Y'))
Out[238]:
0 01-10-2013
1 02-10-2013
dtype: object
In [242]: s.map(methodcaller('strftime', '%d-%m-%Y'))
Out[242]:
0 01-10-2013
1 02-10-2013
dtype: object
You can get the raw datetime.date objects by calling the date() method of the Timestamp elements that make up the Series:
In [249]: s.map(methodcaller('date'))
Out[249]:
0 2013-10-01
1 2013-10-02
dtype: object
In [250]: s.map(methodcaller('date')).values
Out[250]:
array([datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)], dtype=object)
Yet another way you can do this is by calling the unbound Timestamp.date method:
In [273]: s.map(Timestamp.date)
Out[273]:
0 2013-10-01
1 2013-10-02
dtype: object
This method is the fastest, and IMHO the most readable. Timestamp is accessible in the top-level pandas module, like so: pandas.Timestamp. I've imported it directly for expository purposes.
The date attribute of DatetimeIndex objects does something similar, but returns a numpy object array instead:
In [243]: index = DatetimeIndex(s)
In [244]: index
Out[244]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-10-01 00:24:16, 2013-10-02 00:24:16]
Length: 2, Freq: None, Timezone: None
In [246]: index.date
Out[246]:
array([datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)], dtype=object)
For larger datetime64[ns] Series objects, calling Timestamp.date is faster than operator.methodcaller which is slightly faster than a lambda:
In [263]: f = methodcaller('date')
In [264]: flam = lambda x: x.date()
In [265]: fmeth = Timestamp.date
In [266]: s2 = Series(date_range('20010101', periods=1000000, freq='T'))
In [267]: s2
Out[267]:
0 2001-01-01 00:00:00
1 2001-01-01 00:01:00
2 2001-01-01 00:02:00
3 2001-01-01 00:03:00
4 2001-01-01 00:04:00
5 2001-01-01 00:05:00
6 2001-01-01 00:06:00
7 2001-01-01 00:07:00
8 2001-01-01 00:08:00
9 2001-01-01 00:09:00
10 2001-01-01 00:10:00
11 2001-01-01 00:11:00
12 2001-01-01 00:12:00
13 2001-01-01 00:13:00
14 2001-01-01 00:14:00
...
999985 2002-11-26 10:25:00
999986 2002-11-26 10:26:00
999987 2002-11-26 10:27:00
999988 2002-11-26 10:28:00
999989 2002-11-26 10:29:00
999990 2002-11-26 10:30:00
999991 2002-11-26 10:31:00
999992 2002-11-26 10:32:00
999993 2002-11-26 10:33:00
999994 2002-11-26 10:34:00
999995 2002-11-26 10:35:00
999996 2002-11-26 10:36:00
999997 2002-11-26 10:37:00
999998 2002-11-26 10:38:00
999999 2002-11-26 10:39:00
Length: 1000000, dtype: datetime64[ns]
In [269]: timeit s2.map(f)
1 loops, best of 3: 1.04 s per loop
In [270]: timeit s2.map(flam)
1 loops, best of 3: 1.1 s per loop
In [271]: timeit s2.map(fmeth)
1 loops, best of 3: 968 ms per loop
Keep in mind that one of the goals of pandas is to provide a layer on top of numpy so that (most of the time) you don't have to deal with the low level details of the ndarray. So getting the raw datetime.date objects in an array is of limited use since they don't correspond to any numpy.dtype that is supported by pandas (pandas only supports datetime64[ns] [that's nanoseconds] dtypes). That said, sometimes you need to do this.
Maybe this only came in recently, but there are built-in methods for this. Try:
In [27]: s = pd.Series(pd.date_range(pd.Timestamp('now'), periods=2))
In [28]: s
Out[28]:
0 2016-02-11 19:11:43.386016
1 2016-02-12 19:11:43.386016
dtype: datetime64[ns]
In [29]: s.dt.to_pydatetime()
Out[29]:
array([datetime.datetime(2016, 2, 11, 19, 11, 43, 386016),
datetime.datetime(2016, 2, 12, 19, 11, 43, 386016)], dtype=object)
You can try using .dt.date on a datetime64[ns] column of the dataframe.
For example: df['Created_date'] = df['Created_date'].dt.date
Input dataframe named as test_df:
print(test_df)
Result:
Created_date
0 2015-03-04 15:39:16
1 2015-03-22 17:36:49
2 2015-03-25 22:08:45
3 2015-03-16 13:45:20
4 2015-03-19 18:53:50
Checking dtypes:
print(test_df.dtypes)
Result:
Created_date datetime64[ns]
dtype: object
Extracting date and updating Created_date column:
test_df['Created_date'] = test_df['Created_date'].dt.date
print(test_df)
Result:
Created_date
0 2015-03-04
1 2015-03-22
2 2015-03-25
3 2015-03-16
4 2015-03-19
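Since the question wants MM-DD-YYYY strings specifically, on reasonably recent pandas the vectorized .dt.strftime accessor gives that directly, e.g.:
df['Created_Date'].dt.strftime('%m-%d-%Y')    # '09-29-2013'-style strings, dtype object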
Well, I would do it this way (with timeStamp, years and i defined elsewhere):
pdTime = pd.date_range(timeStamp, periods=len(years), freq="D")
pdTime[i].strftime('%m-%d-%Y')
