Prevent Pandas to_json() from adding time component to date object - python

I have a dataframe of that contains some date objects. I need to convert to a json for use in JavaScript, which requires YYYY-MM-DD, but to_json() keeps adding a time component. I've seen a number of answers that convert to a string first, but this is part of a loop of about 15 queries each with many columns (simplified it for the SO question) and I don't want to hardcode each column conversion as there are a lot.
import pandas as pd
from datetime import date
df = pd.DataFrame(data=[[date(year=2018, month=1, day=1)]])
print df.to_json(orient='records', date_format='iso', date_unit='s')
Output:
[{"0":"2018-01-01T00:00:00Z"}]
Desired Output:
[{"0":"2018-01-01"}]

Pandas does not currently have the feature. There is an open issue about this, you should subscribe to the issue in case more options for the date_format argument are added in a future release (which seems like a reasonable feature request):
No way with to_json to write only date out of datetime #16492
Manually converting the relevant columns to string before dumping out json is likely the best option.

You could use strftime('%Y-%m-%d') format like so:
df = pd.DataFrame(data=[[date(year=2018, month=1, day=1).strftime('%Y-%m-
%d')]]
print(df.to_json(orient='records', date_format='iso', date_unit='s'))
# [{"0":"2018-01-01"}]
I think this is the best approach for now until pandas adds a way to write only the date out of datetime.

Demo:
Source DF:
In [249]: df = pd.DataFrame({
...: 'val':np.random.rand(5),
...: 'date1':pd.date_range('2018-01-01',periods=5),
...: 'date2':pd.date_range('2017-12-15',periods=5)
...: })
In [250]: df
Out[250]:
date1 date2 val
0 2018-01-01 2017-12-15 0.539349
1 2018-01-02 2017-12-16 0.308532
2 2018-01-03 2017-12-17 0.788588
3 2018-01-04 2017-12-18 0.526541
4 2018-01-05 2017-12-19 0.887299
In [251]: df.dtypes
Out[251]:
date1 datetime64[ns]
date2 datetime64[ns]
val float64
dtype: object
You can cast datetime columns to strings in one command:
In [252]: df.update(df.loc[:, df.dtypes.astype(str).str.contains('date')].astype(str))
In [253]: df.dtypes
Out[253]:
date1 object
date2 object
val float64
dtype: object
In [254]: df.to_json(orient='records')
Out[254]: '[{"date1":"2018-01-01","date2":"2017-12-15","val":0.5393488718},{"date1":"2018-01-02","date2":"2017-12-16","val":0.3085324043},{"
date1":"2018-01-03","date2":"2017-12-17","val":0.7885879674},{"date1":"2018-01-04","date2":"2017-12-18","val":0.5265407505},{"date1":"2018-0
1-05","date2":"2017-12-19","val":0.887298853}]'
Alternatively you can cast date columns to strings on the SQL side

I had that problem as well, but since I was looking only for the date, discarding the timezone, I was able to go around that using the following expression:
df = pd.read_json('test.json')
df['date_hour'] = [datetime.strptime(date[0:10],'%Y-%m-%d').date() for date in df['date_hour']]
So, if you have 'iso' date_format for df[date_hour] in the json file = "2018-01-01T00:00:00Z" you may use this solution.
This way you can extract the bit that really matters. Important to say that you must do it using this list comprehension, because the conversion can only be done string by string (or row by row), otherwise, the datetime.strptime alone, would throw an error saying that cannot be used with series.

Generic solution would be as follows:
df.assign( **df.select_dtypes(['datetime']).astype(str).to_dict('list') ).to_json(orient="records")
Based on the dtype it selects the datetime columns and set these as str objects so the date format is kept during serialization.

Related

How to remove the time from datetime of the pandas Dataframe. The type of the column is str and objects, but the value is dateime [duplicate]

i have a variable consisting of 300k records with dates and the date look like
2015-02-21 12:08:51
from that date i want to remove time
type of date variable is pandas.core.series.series
This is the way i tried
from datetime import datetime,date
date_str = textdata['vfreceiveddate']
format_string = "%Y-%m-%d"
then = datetime.strftime(date_str,format_string)
some Random ERROR
In the above code textdata is my datasetname and vfreceived date is a variable consisting of dates
How can i write the code to remove the time from the datetime.
Assuming all your datetime strings are in a similar format then just convert them to datetime using to_datetime and then call the dt.date attribute to get just the date portion:
In [37]:
df = pd.DataFrame({'date':['2015-02-21 12:08:51']})
df
Out[37]:
date
0 2015-02-21 12:08:51
In [39]:
df['date'] = pd.to_datetime(df['date']).dt.date
df
Out[39]:
date
0 2015-02-21
EDIT
If you just want to change the display and not the dtype then you can call dt.normalize:
In[10]:
df['date'] = pd.to_datetime(df['date']).dt.normalize()
df
Out[10]:
date
0 2015-02-21
You can see that the dtype remains as datetime:
In[11]:
df.dtypes
Out[11]:
date datetime64[ns]
dtype: object
You're calling datetime.datetime.strftime, which requires as its first argument a datetime.datetime instance, because it's an unbound method; but you're passing it a string instead of a datetime instance, whence the obvious error.
You can work purely at a string level if that's the result you want; with the data you give as an example, date_str.split()[0] for example would be exactly the 2015-02-21 string you appear to require.
Or, you can use datetime, but then you need to parse the string first, not format it -- hence, strptime, not strftime:
dt = datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
date = dt.date()
if it's a datetime.date object you want (but if all you want is the string form of the date, such an approach might be "overkill":-).
simply writing
date.strftime("%d-%m-%Y") will remove the Hour min & sec

Converting into date-time format in pandas?

I need help converting into python/pandas date time format. For example, my times are saved like the following line:
2017-01-01 05:30:24.468911+00:00
.....
2017-05-05 01:51:31.351718+00:00
and I want to know the simplest way to convert this into date time format for essentially performing operations with time (like what is the range in days of my dataset to split up my dataset into chunks by time, what's the time difference from one time to another)? I don't mind losing some of the significance for the times if that makes things easier. Thank you so much!
Timestamp will convert it for you.
>>> pd.Timestamp('2017-01-01 05:30:24.468911+00:00')
Timestamp('2017-01-01 05:30:24.468911+0000', tz='UTC')
Let's say you have a dataframe that includes your timestamp column (let's call it stamp). You can use apply on that column together with Timestamp:
df = pd.DataFrame(
{'stamp': ['2017-01-01 05:30:24.468911+00:00',
'2017-05-05 01:51:31.351718+00:00']})
>>> df
stamp
0 2017-01-01 05:30:24.468911+00:00
1 2017-05-05 01:51:31.351718+00:00
>>> df['stamp'].apply(pd.Timestamp)
0 2017-01-01 05:30:24.468911+00:00
1 2017-05-05 01:51:31.351718+00:00
Name: stamp, dtype: datetime64[ns, UTC]
You could also use Timeseries:
>>> pd.TimeSeries(df.stamp)
0 2017-01-01 05:30:24.468911+00:00
1 2017-05-05 01:51:31.351718+00:00
Name: stamp, dtype: object
Once you have a Timestamp object, it is pretty efficient to manipulate. You can just difference their values, for example.
You may also want to have a look at this SO answer which discusses timezone unaware values to aware.
Let's say I have two strings 2017-06-06 and 1944-06-06 and I wanted to get the difference (what Python calls a timedelta) between the two.
First, I'll need to import datetime. Then I'll need to get both of those strings into datetime objects:
>>> a = datetime.datetime.strptime('2017-06-06', '%Y-%m-%d')
>>> b = datetime.datetime.strptime('1944-06-06', '%Y-%m-%d')
That will give us two datetime objects that can be used in arithmetic functions that will return a timedelta object:
>>> c = abs((a-b).days)
This will give us 26663, and days is the largest resolution that timedelta supports: documentation
Since the Pandas tag is there:
df = pd.DataFrame(['2017-01-01 05:30:24.468911+00:00'])
df.columns = ['Datetime']
df['Datetime'] = pd.to_datetime(df['Datetime'], format='%Y-%m-%d %H:%M:%S.%f', utc=True)
print(df.dtypes)

How do I convert timestamp to datetime.date in pandas dataframe?

I need to merge 2 pandas dataframes together on dates, but they currently have different date types. 1 is timestamp (imported from excel) and the other is datetime.date.
Any advice?
I've tried pd.to_datetime().date but this only works on a single item(e.g. df.ix[0,0]), it won't let me apply to the entire series (e.g. df['mydates']) or the dataframe.
I got some help from a colleague.
This appears to solve the problem posted above
pd.to_datetime(df['mydates']).apply(lambda x: x.date())
Much simpler than above:
df['mydates'].dt.date
For me this works:
from datetime import datetime
df[ts] = [datetime.fromtimestamp(x) for x in df[ts]]
You have to know if the unit of the Unix timestamp is in seconds or milliseconds. Assume that it is in seconds and assume that you have the following pandas
print(df.head())
And you get:
timestamp XETHZUSD
0 1609459200 730.85
1 1609545600 775.01
2 1609632000 979.86
3 1609718400 1042.52
4 1609804800 1103.41
You can convert the timestamp to datetime as follows:
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
print(df.head())
And we get:
timestamp XETHZUSD
0 2021-01-01 730.85
1 2021-01-02 775.01
2 2021-01-03 979.86
3 2021-01-04 1042.52
4 2021-01-05 1103.41
If the Unix timestamp was in milliseconds, then you should have typed
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
Another question was marked as dupe pointing to this, but it didn't include this answer, which seems the most straightforward (perhaps this method did not yet exist when this question was posted/answered):
The pandas doc shows a pandas.Timestamp.to_pydatetime method to "Convert a Timestamp object to a native Python datetime object".
Assume time column is in timestamp integer msec format
1 day = 86400000 ms
Here you go:
day_divider = 86400000
df['time'] = df['time'].values.astype(dtype='datetime64[ms]') # for msec format
df['time'] = (df['time']/day_divider).values.astype(dtype='datetime64[D]') # for day format
If you need the datetime.date objects... then get them through with the .date attribute of the Timestamp
pd.to_datetime(df['mydates']).date
I found the following to be the most effective, when I ran into a similar issue. For instance, with the dataframe df with a series of timestmaps in column ts.
df.ts.apply(lambda x: pd.datetime.fromtimestamp(x).date())
This makes the conversion, you can leave out the .date() suffix for datetimes. Then to alter the column on the dataframe. Like so...
df.loc[:, 'ts'] = df.ts.apply(lambda x: pd.datetime.fromtimestamp(x).date())
I was trying to convert a timestamp column to date/time, here is what I came up with:
df['Timestamp'] = df['Timestamp'].apply(lambda timestamp: datetime.fromtimestamp(timestamp))

How to convert timedelta to time of day in pandas?

I have a SQL table that contains data of the mySQL time type as follows:
time_of_day
-----------
12:34:56
I then use pandas to read the table in:
df = pd.read_sql('select * from time_of_day', engine)
Looking at df.dtypes yields:
time_of_day timedelta64[ns]
My main issue is that, when writing my df to a csv file, the data comes out all messed up, instead of essentially looking like my SQL table:
time_of_day
0 days 12:34:56.000000000
I'd like to instead (obviously) store this record as a time, but I can't find anything in the pandas docs that talk about a time dtype.
Does pandas lack this functionality intentionally? Is there a way to solve my problem without requiring janky data casting?
Seems like this should be elementary, but I'm confounded.
Pandas does not support a time dtype series
Pandas (and NumPy) do not have a time dtype. Since you wish to avoid Pandas timedelta, you have 3 options: Pandas datetime, Python datetime.time, or Python str. Below they are presented in order of preference. Let's assume you start with the following dataframe:
df = pd.DataFrame({'time': pd.to_timedelta(['12:34:56', '05:12:45', '15:15:06'])})
print(df['time'].dtype) # timedelta64[ns]
Pandas datetime series
You can use Pandas datetime series and include an arbitrary date component, e.g. today's date. Underlying such a series are integers, which makes this solution the most efficient and adaptable.
The default date, if unspecified, is 1-Jan-1970:
df['time'] = pd.to_datetime(df['time'])
print(df)
# time
# 0 1970-01-01 12:34:56
# 1 1970-01-01 05:12:45
# 2 1970-01-01 15:15:06
You can also specify a date, such as today:
df['time'] = pd.Timestamp('today').normalize() + df['time']
print(df)
# time
# 0 2019-01-02 12:34:56
# 1 2019-01-02 05:12:45
# 2 2019-01-02 15:15:06
Pandas object series of Python datetime.time values
The Python datetime module from the standard library supports datetime.time objects. You can convert your series to an object dtype series containing pointers to a sequence of datetime.time objects. Operations will no longer be vectorised, but each underlying value will be represented internally by a number.
df['time'] = pd.to_datetime(df['time']).dt.time
print(df)
# time
# 0 12:34:56
# 1 05:12:45
# 2 15:15:06
print(df['time'].dtype)
# object
print(type(df['time'].at[0]))
# <class 'datetime.time'>
Pandas object series of Python str values
Converting to strings is only recommended for presentation purposes that are not supported by other types, e.g. Pandas datetime or Python datetime.time. For example:
df['time'] = pd.to_datetime(df['time']).dt.strftime('%H:%M:%S')
print(df)
# time
# 0 12:34:56
# 1 05:12:45
# 2 15:15:06
print(df['time'].dtype)
# object
print(type(df['time'].at[0]))
# <class 'str'>
it's a hack, but you can pull out the components to create a string and convert that string to a datetime.time(h,m,s) object
def convert(td):
time = [str(td.components.hours), str(td.components.minutes),
str(td.components.seconds)]
return datetime.strptime(':'.join(time), '%H:%M:%S').time()
df['time'] = df['time'].apply(lambda x: convert(x))
found a solution, but i feel like it's gotta be more elegant than this:
def convert(x):
return pd.to_datetime(x).strftime('%H:%M:%S')
df['time_of_day'] = df['time_of_day'].apply(convert)
df['time_of_day'] = pd.to_datetime(df['time_of_day']).apply(lambda x: x.time())
Adapted this code

Convert Column to Date Format (Pandas Dataframe)

I have a pandas dataframe as follows:
Symbol Date
A 02/20/2015
A 01/15/2016
A 08/21/2015
I want to sort it by Date, but the column is just an object.
I tried to make the column a date object, but I ran into an issue where that format is not the format needed. The format needed is 2015-02-20, etc.
So now I'm trying to figure out how to have numpy convert the 'American' dates into the ISO standard, so that I can make them date objects, so that I can sort by them.
How would I convert these american dates into ISO standard, or is there a more straight forward method I'm missing within pandas?
You can use pd.to_datetime() to convert to a datetime object. It takes a format parameter, but in your case I don't think you need it.
>>> import pandas as pd
>>> df = pd.DataFrame( {'Symbol':['A','A','A'] ,
'Date':['02/20/2015','01/15/2016','08/21/2015']})
>>> df
Date Symbol
0 02/20/2015 A
1 01/15/2016 A
2 08/21/2015 A
>>> df['Date'] =pd.to_datetime(df.Date)
>>> df.sort('Date') # This now sorts in date order
Date Symbol
0 2015-02-20 A
2 2015-08-21 A
1 2016-01-15 A
For future search, you can change the sort statement:
>>> df.sort_values(by='Date') # This now sorts in date order
Date Symbol
0 2015-02-20 A
2 2015-08-21 A
1 2016-01-15 A
sort method has been deprecated and replaced with sort_values. After converting to datetime object using df['Date']=pd.to_datetime(df['Date'])
df.sort_values(by=['Date'])
Note: to sort in-place and/or in a descending order (the most recent first):
df.sort_values(by=['Date'], inplace=True, ascending=False)
#JAB's answer is fast and concise. But it changes the DataFrame you are trying to sort, which you may or may not want.
(Note: You almost certainly will want it, because your date columns should be dates, not strings!)
In the unlikely event that you don't want to change the dates into dates, you can also do it a different way.
First, get the index from your sorted Date column:
In [25]: pd.to_datetime(df.Date).order().index
Out[25]: Int64Index([0, 2, 1], dtype='int64')
Then use it to index your original DataFrame, leaving it untouched:
In [26]: df.ix[pd.to_datetime(df.Date).order().index]
Out[26]:
Date Symbol
0 2015-02-20 A
2 2015-08-21 A
1 2016-01-15 A
Magic!
Note: for Pandas versions 0.20.0 and later, use loc instead of ix, which is now deprecated.
Since pandas >= 1.0.0 we have the key argument in DataFrame.sort_values. This way we can sort the dataframe by specifying a key and without adjusting the original dataframe:
df.sort_values(by="Date", key=pd.to_datetime)
Symbol Date
0 A 02/20/2015
2 A 08/21/2015
1 A 01/15/2016
The data containing the date column can be read by using the below code:
data = pd.csv(file_path,parse_dates=[date_column])
Once the data is read by using the above line of code, the column containing the information about the date can be accessed using pd.date_time() like:
pd.date_time(data[date_column], format = '%d/%m/%y')
to change the format of date as per the requirement.
data['Date'] = data['Date'].apply(pd.to_datetime) # non-null datetime64[ns]

Categories