I have sliced the pandas dataframe.
end_date = df[-1:]['end']
type(end_date)
Out[4]: pandas.core.series.Series
end_date
Out[3]:
48173 2017-09-20 04:47:59
Name: end, dtype: datetime64[ns]
How to get rid of end_date's index value 48173 and get only 2017-09-20 04:47:59 string? I have to call REST API with 2017-09-20 04:47:59 as a parameter, so I have to get string from pandas datetime64 series.
How to get rid of end_date's index value 48173 and get only datetime object [something like datetime.datetime.strptime('2017-09-20 04:47:59', '%Y-%m-%d %H:%M:%S')]. I need it because, later I will have to check if '2017-09-20 04:47:59' < datetime.datetime(2017,1,9)
I need to convert just a single cell value, not a whole column.
How to do these conversions?
It seems you need:
import pandas as pd
data = ['2017-09-20 04:47:59','2017-10-20 04:47:59','2017-09-30 04:47:59']
df = pd.DataFrame(data,columns=['end'])
df['end'] = pd.to_datetime(df['end'])
df
df will be:
end
0 2017-09-20 04:47:59
1 2017-10-20 04:47:59
2 2017-09-30 04:47:59
After that you can use below code to get rid of index and use as 'Timestamp' object:
end_date = df['end'].iloc[-1] #get last row of column end
print(type(end_date)) # pandas.tslib.Timestamp
end_date_str = end_date.strftime('%Y-%m-%d %H:%M:%S') #convert to str
print(end_date_str) # '2017-09-30 04:47:59'
print(end_date < datetime.datetime(2017,1,9)) #False
Simply cast the result to a string, and recover it using .values[0]:
In [38]: end_date
Out[38]:
48173 2017-09-20 04:47:59
Name: end, dtype: datetime64[ns]
In [39]: end_date.astype(str).values[0]
Out[39]: '2017-09-20 04:47:59'
If you want a datetime object, you have to convert it to a timestamp, and then back to a datetime object:
In [42]: end_date.values[0].item()
Out[42]: 1505882879000000000
In [43]: datetime.fromtimestamp(end_date.values[0].item()/10**9)
Out[43]: datetime.datetime(2017, 9, 20, 6, 47, 59)
Otherwise, you can strptime the string recovered in step 1:
In [48]: datetime.datetime.strptime(end_date.astype(str).values[0], '%Y-%m-%d %H:%M:%S')
Out[48]: datetime.datetime(2017, 9, 20, 4, 47, 59)
You may wonder why there is a 2 hours difference between the results. This is because the datetime.datetime.fromtimestamp takes my timezone into account (currently CEST, which is UTC+2).
On the other hand, parsing a string to a datetime object doesn't yield any timezone information, srtptime naively parses the timestamp without regards for the timezone, which leads to a 2 hours discrepancy.
Related
I have a dataframe with Date_Birth in the following format: July 1, 1991 (as type object).
How can I change the entire column to datetime?
Thanks
Use the pandas.to_datetime function. You can write out a format specifier in the strptime format or have python guess the format. In your case, guessing works.
>>> import pandas as pd
>>> df=pd.DataFrame({"Date":["July 1, 1991"]})
>>> pd.to_datetime(df["Date"], format="%B %d, %Y")
0 1991-07-01
Name: Date, dtype: datetime64[ns]
>>> pd.to_datetime(df["Date"], infer_datetime_format=True)
0 1991-07-01
Name: Date, dtype: datetime64[ns]
what is the efficient way to convert the column values into dates "DD-MM-YYYY" when the values given like "Feb-15" which needs to be "01-02-2015". if it's "Dec-46" it must return "01-12-1946".
You can pass the format '%b-%y' to to_datetime:
In[42]:
df = pd.DataFrame({'date':["Feb-15","Dec-46"]})
df['new_date'] = pd.to_datetime(df['date'], format='%b-%y')
df
Out[42]:
date new_date
0 Feb-15 2015-02-01
1 Dec-46 2046-12-01
Note that the new dtype is datetime64, you cannot control the display output, if you insist on DD-MM-YYYY then you would have to convert to a string using dt.strftime:
In[43]:
df['str_date'] = df['new_date'].dt.strftime('%d-%m-%Y')
df
Out[43]:
date new_date str_date
0 Feb-15 2015-02-01 01-02-2015
1 Dec-46 2046-12-01 01-12-2046
but then you have strings which is not that useful if you need to perform arithmetic operations or filtering
EDIT
You cannot store dates earlier than 1970 so '01-01-1946' is not a valid datetime that can be represented by datetime64
I have a Pandas Dataframe df:
a date
1 2014-06-29 00:00:00
df.types return:
a object
date object
I want convert column data to data without time but:
df['date']=df['date'].astype('datetime64[s]')
return:
a date
1 2014-06-28 22:00:00
df.types return:
a object
date datetime64[ns]
But value is wrong.
I'd have:
a date
1 2014-06-29
or:
a date
1 2014-06-29 00:00:00
I would start by putting your dates in pd.datetime:
df['date'] = pd.to_datetime(df.date)
Now, you can see that the time component is still there:
df.date.values
array(['2014-06-28T19:00:00.000000000-0500'], dtype='datetime64[ns]')
If you are ok having a date object again, you want:
df['date'] = [x.strftime("%y-%m-%d") for x in df.date]
Here would be ending with a datetime:
df['date'] = [x.date() for x in df.date]
df.date
datetime.date(2014, 6, 29)
Here you go. Just use this pattern:
df.to_datetime().date()
I have created a dataframe with one column as a series of calender dates using
start = datetime.date(2008, 8, 01)
end = datetime.date(2009, 1, 19)
range = pd.date_range(start, end, freq = 'D')
df = pd.DataFrame({'date': pd.Series(range)})
By this I get the type of date column as datetime64[ns] although I used the datetime.date to create the dates. I have looked through a few questions but didn't really find them helpful.
How can I convert the type of the date column of this dataframe to a date object?
date_range indeed returns datetime64, regardless of how you specify start and end (eg these can also be strings).
If you want to convert datetime64 values to datetime.date objects, you can use the .date accessor of DatetimeIndex (date_range returns such an index):
In [22]: s = pd.Series(range.date)
In [23]: s
Out[23]:
0 2008-08-01
1 2008-08-02
2 2008-08-03
3 2008-08-04
4 2008-08-05
...
167 2009-01-15
168 2009-01-16
169 2009-01-17
170 2009-01-18
171 2009-01-19
Length: 172, dtype: object
In [24]: s[0]
Out[24]: datetime.date(2008, 8, 1)
See here for docs on these datetime components: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-date-components. To convert it to datetime.datetime objects, you can use range.to_pydatetime().
But as U2EF1, dependent on the application, it's quite possible you want such datetime64 values, as operations with it will be much more performant.
I am attempting to convert the index of a pandas.DataFrame from string format to a datetime index, using pandas.to_datetime().
Import pandas:
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '0.10.1'
Create an example DataFrame:
In [3]: d = {'data' : pd.Series([1.,2.], index=['26/12/2012', '10/01/2013'])}
In [4]: df=pd.DataFrame(d)
Look at indices. Note that the date format is day/month/year:
In [5]: df.index
Out[5]: Index([26/12/2012, 10/01/2013], dtype=object)
Convert index to datetime:
In [6]: pd.to_datetime(df.index)
Out[6]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-26 00:00:00, 2013-10-01 00:00:00]
Length: 2, Freq: None, Timezone: None
Already at this stage, you can see that the date format for each entry has been formatted differently. The first is fine, the second has swapped month and day.
This is what I want to write, but avoiding the inconsistent formatting of date strings:
In [7]: df.set_index(pd.to_datetime(df.index))
Out[7]:
data
2012-12-26 1
2013-10-01 2
I guess the first entry is correct because the function 'knows' there aren't 26 months, and so does not choose the default month/day/year format.
Is there another/better way to do this? Can I pass the format into the to_datetime() function?
Thank you.
EDIT:
I have found a way to do this, without pandas.to_datetime:
import datetime.datetime as dt
date_string_list = df.index.tolist()
datetime_list = [ dt.strptime(date_string_list[x], '%d/%m/%Y') for x in range(len(date_string_list)) ]
df.index=datetime_list
but it's a bit messy. Any improvements welcome.
There are (hidden?) dayfirst argument to to_datetime:
In [23]: pd.to_datetime(df.index, dayfirst=True)
Out[23]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-26 00:00:00, 2013-01-10 00:00:00]
Length: 2, Freq: None, Timezone: None
In pandas 0.11 (onwards) you'll be able to use the format argument:
In [24]: pd.to_datetime(df.index, format='%d/%m/%Y')
Out[24]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-26 00:00:00, 2013-01-10 00:00:00]
Length: 2, Freq: None, Timezone: None