Pandas convert datetime string column to datetime without offset applied - python

I'm new to Python and Pandas, so don't be hard on me :)
I have multiple columns with values in the form "2014-01-01 00:00:00-06:00". Now I want to convert these columns into pandas datetimes, but I'm struggling with the format I need to use. I already tried:
date = pd.to_datetime("2014-01-01 00:00:00-06:00", format='%Y-%m-%d %H:%M:%S%z')
But here I get an error: "ValueError: time data '2014-01-01 00:00:00-06:00' does not match format '%Y-%m-%d %H:%M:%S%Z' (match)"
I don't want the time to be converted into my timezone. I need it in the timezone -06:00.
For this Input:
2014-01-01 00:00:00-06:00
The Output should be:
2014-01-01 00:00:00
I want to use the date variable from the output so I can split my data into seasons. Something like this:
date > springBegining
Thanks for any help

You don't need a format string; pandas is man/woman enough to handle this:
In[2]:
pd.to_datetime('2014-01-01 00:00:00-06:00')
Out[2]: Timestamp('2014-01-01 06:00:00')
Besides, your format string has a couple of issues:
%b is the month as the locale's abbreviated name; you have a numeric month, so it should be %m
%z requires a UTC offset in the form +HHMM or -HHMM
So you'd need to reformat the datetime string to:
'2014-01-01 00:00:00-0600'
If you don't want the offset to be applied, and the offset is always the same, you can strip it from the string:
In[25]:
pd.to_datetime('2014-01-01 00:00:00-06:00'.rsplit('-',1)[0])
Out[25]: Timestamp('2014-01-01 00:00:00')
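Note that rsplit('-', 1) only works for negative offsets (with +06:00 it would split inside the date instead). Slicing off the last six characters works for either sign: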
In[26]:
pd.to_datetime('2014-01-01 00:00:00-06:00'[:-6])
Out[26]: Timestamp('2014-01-01 00:00:00')
So to do the above on an entire column:
pd.to_datetime(df[col].str[:-6])
Example:
In[27]:
df = pd.DataFrame({'date':['2014-01-01 00:00:00-06:00','2014-01-01 00:00:00+06:00']})
df
Out[27]:
date
0 2014-01-01 00:00:00-06:00
1 2014-01-01 00:00:00+06:00
In[28]:
pd.to_datetime(df['date'].str[:-6])
Out[28]:
0 2014-01-01
1 2014-01-01
Name: date, dtype: datetime64[ns]
Here we use the string accessor .str to slice every value in the column in the same way, and pass the result to to_datetime to convert the entire column.
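In pandas 0.24 and later, to_datetime keeps the offset as a fixed-offset timezone instead of converting to UTC, so the Out[2] result above reflects older versions. With a recent pandas, a sketch that avoids string slicing is to parse the column and then drop the timezone with tz_localize(None), which keeps the local wall-clock time. This assumes every value in the column carries the same offset; the second sample row and the spring_beginning boundary are made up for illustration.

```python
import pandas as pd

# Sample data; the second row and the season boundary below are hypothetical.
df = pd.DataFrame({"date": ["2014-01-01 00:00:00-06:00",
                            "2014-04-15 08:30:00-06:00"]})

# With a single shared offset, recent pandas returns a tz-aware column
# carrying the fixed -06:00 offset instead of converting to UTC.
aware = pd.to_datetime(df["date"])

# tz_localize(None) drops the timezone but keeps the local wall-clock time,
# so 00:00:00-06:00 becomes 00:00:00 rather than 06:00:00.
df["date"] = aware.dt.tz_localize(None)

# Naive timestamps can be compared with naive boundaries, e.g. for seasons.
spring_beginning = pd.Timestamp("2014-03-20")
df["after_spring_start"] = df["date"] > spring_beginning
print(df)
```

Dropping the timezone (rather than converting it) is what preserves the "2014-01-01 00:00:00" wall time the question asks for.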

Related

How to remove the time from datetime of the pandas Dataframe. The type of the column is str and objects, but the value is datetime [duplicate]

I have a variable consisting of 300k records with dates, and the dates look like
2015-02-21 12:08:51
From each date I want to remove the time.
The type of the date variable is pandas.core.series.Series.
This is what I tried:
from datetime import datetime,date
date_str = textdata['vfreceiveddate']
format_string = "%Y-%m-%d"
then = datetime.strftime(date_str,format_string)
and I get an error.
In the above code, textdata is my dataset name and vfreceiveddate is a column consisting of dates.
How can I write the code to remove the time from the datetime?
Assuming all your datetime strings are in a similar format then just convert them to datetime using to_datetime and then call the dt.date attribute to get just the date portion:
In [37]:
df = pd.DataFrame({'date':['2015-02-21 12:08:51']})
df
Out[37]:
date
0 2015-02-21 12:08:51
In [39]:
df['date'] = pd.to_datetime(df['date']).dt.date
df
Out[39]:
date
0 2015-02-21
EDIT
If you just want to change the display and not the dtype, you can call dt.normalize:
In[10]:
df['date'] = pd.to_datetime(df['date']).dt.normalize()
df
Out[10]:
date
0 2015-02-21
You can see that the dtype remains as datetime:
In[11]:
df.dtypes
Out[11]:
date datetime64[ns]
dtype: object
You're calling datetime.datetime.strftime, which requires a datetime.datetime instance as its first argument because it's an unbound method; but you're passing it something else (a string, or here a whole Series) instead of a datetime instance, hence the error.
You can work purely at a string level if that's the result you want; with the data you give as an example, date_str.split()[0] for example would be exactly the 2015-02-21 string you appear to require.
Or, you can use datetime, but then you need to parse the string first, not format it -- hence, strptime, not strftime:
dt = datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
date = dt.date()
if it's a datetime.date object you want (but if all you want is the string form of the date, such an approach might be "overkill":-).
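Since vfreceiveddate is a whole Series rather than a single string, the string-level idea above has to go through the .str accessor, which applies it element-wise. A minimal sketch, using a small made-up Series in place of textdata['vfreceiveddate']:

```python
import pandas as pd

# Stand-in for textdata['vfreceiveddate'] (hypothetical sample values).
dates = pd.Series(["2015-02-21 12:08:51", "2015-03-01 07:45:00"])

# String level: split each value on whitespace and keep the first piece.
date_strings = dates.str.split().str[0]

# Parsing level: convert to datetime64, then take the datetime.date part.
date_objects = pd.to_datetime(dates).dt.date

print(date_strings.tolist())
print(date_objects.tolist())
```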
Simply writing
date.strftime("%d-%m-%Y") keeps only the day, month and year, dropping the hours, minutes and seconds.
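For a whole column, the vectorized counterpart is the .dt.strftime accessor. Note that it returns strings (object dtype), not datetimes, and that "%d-%m-%Y" also reorders the fields to day-month-year. A sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample column of datetime strings.
df = pd.DataFrame({"date": ["2015-02-21 12:08:51", "2015-12-31 23:59:59"]})

# Parse once, then format every value without the time part.
df["day"] = pd.to_datetime(df["date"]).dt.strftime("%d-%m-%Y")
print(df["day"].tolist())
```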

How to handle dates which is out of timestamp range in pandas?

I was working with the Crunchbase dataset. I have an entry of Harvard University which was founded in 1636. This entry is giving me an error when I am trying to convert string to DateTime.
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1636-09-08 00:00:00
I found out that pandas only supports timestamps from 1677 onward:
>>> pd.Timestamp.min
Timestamp('1677-09-21 00:12:43.145225')
I checked out some solutions, such as one suggesting errors='coerce', but dropping this entry or making it null is not an option.
Can you please suggest a way to handle this issue?
As mentioned in the comments by Henry, pandas timestamps are limited in range because they are stored as 64-bit integer nanoseconds. You could work around it by parsing the date-time with the datetime library when needed, and otherwise leaving it as a string or converting it to an integer.
Scenario 1: If you plan on showing this value only when you print it
from datetime import datetime
datetime_object = datetime.strptime('1636-09-08 00:00:00', '%Y-%m-%d %H:%M:%S')
Scenario 2: If you want to use it as a date column to retain information in the dataframe, you could additionally
datetime_object.strftime("%Y%m%d%H%M%S")
Using it on a column in a pandas DataFrame would yield this:
df=pd.DataFrame([['1636-09-08 00:00:00'],['1635-09-09 00:00:00']], columns=['dates'])
df['str_date']=df['dates'].apply(lambda x:datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
df.head()
dates str_date
0 1636-09-08 00:00:00 1636-09-08 00:00:00
1 1635-09-09 00:00:00 1635-09-09 00:00:00
pandas treats this column as an object column, but each element you access is a datetime.datetime object:
df['str_date'][0]
>>datetime.datetime(1636, 9, 8, 0, 0)
Also, for the sake of completeness, see the pandas documentation on out-of-bounds spans: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-oob
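The linked documentation suggests representing out-of-bounds dates as Periods instead of Timestamps. Periods are not limited to the nanosecond range, so 1636 works, and the values still support comparisons. A minimal sketch using the two example dates from above, assuming day resolution is enough (freq="D" drops the time of day); the cutoff date is hypothetical:

```python
import pandas as pd

df = pd.DataFrame([["1636-09-08 00:00:00"], ["1635-09-09 00:00:00"]], columns=["dates"])

# Each string becomes a daily Period; no OutOfBoundsDatetime is raised.
df["period"] = df["dates"].apply(lambda s: pd.Period(s, freq="D"))

# Periods with the same frequency can be compared like dates.
cutoff = pd.Period("1636-01-01", freq="D")
df["before_cutoff"] = df["period"] < cutoff
print(df)
print(df["period"].iloc[0].year)
```

Unlike the datetime.datetime objects above, the Period values keep date semantics inside pandas, at the cost of a fixed resolution chosen by freq.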

csv Pandas datetime convert time to seconds

I work with data from a datalogger, and its timestamp format is not recognized by pandas' datetime parsing.
I would like to convert this timestamp into a format pandas knows and then convert the datetimes into seconds, starting from 0.
>>>df.time
0 05/20/2019 19:20:27:374
1 05/20/2019 19:20:28:674
2 05/20/2019 19:20:29:874
3 05/20/2019 19:20:30:274
Name: time, dtype: object
I tried to convert it from object into datetime64[ns], with %m or %b for the month:
df_time = pd.to_datetime(df["time"], format = '%m/%d/%y %H:%M:%S:%MS')
df_time = pd.to_datetime(df["time"], format = '%b/%d/%y %H:%M:%S:%MS')
with error: redefinition of group name 'M' as group 7; was group 5 at position 155
I tried to reduce the data set and remove the milliseconds without success.
df['time'] = pd.to_datetime(df['time'],).str[:-3]
ValueError: ('Unknown string format:', '05/20/2019 19:20:26:383')
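Or is it possible to just subtract the first timestamp from all the other values in the time column?
Use '%m/%d/%Y %H:%M:%S:%f' as the format instead of '%m/%d/%y %H:%M:%S:%MS'. %Y matches the four-digit year and %f matches the millisecond field; %MS is not a directive (it is %M followed by a literal S), and using %M twice is what triggers the "redefinition of group name 'M'" error.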
Here is the format documentation for future reference
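Putting that together with the second part of the question, here is a sketch that parses the datalogger column and then subtracts the first timestamp to get seconds starting from 0. It uses the sample values shown in the question; %f accepts the three-digit millisecond field and pads it on the right.

```python
import pandas as pd

df = pd.DataFrame({"time": ["05/20/2019 19:20:27:374",
                            "05/20/2019 19:20:28:674",
                            "05/20/2019 19:20:29:874",
                            "05/20/2019 19:20:30:274"]})

# %Y is the four-digit year; %f reads the trailing milliseconds field.
df["time"] = pd.to_datetime(df["time"], format="%m/%d/%Y %H:%M:%S:%f")

# Subtracting the first row gives Timedeltas; total_seconds() turns them into floats.
df["seconds"] = (df["time"] - df["time"].iloc[0]).dt.total_seconds()
print(df)
```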
I am not exactly sure what you are looking for, but you can use the above format to parse your data and then strip parts you don't need from the result, such as the microseconds, like this:
from datetime import datetime
date = str(datetime.now())
print(date)
2019-07-28 14:04:28.986601
print(date[11:-7])
14:04:28
time = date[11:-7]
print(time)
14:04:28
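Slicing str(datetime.now()) relies on the microseconds being present: when they are exactly zero, str() omits the fractional part and [11:-7] cuts into the time itself. Formatting with strftime avoids that. A sketch using fixed datetimes so the result is reproducible:

```python
from datetime import datetime

# Fixed value standing in for datetime.now() so the output is predictable.
stamp = datetime(2019, 7, 28, 14, 4, 28, 986601)
print(stamp.strftime("%H:%M:%S"))

# Works the same when the microsecond field is 0, unlike str(...)[11:-7].
whole_second = datetime(2019, 7, 28, 14, 4, 28)
print(whole_second.strftime("%H:%M:%S"))
```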

Prevent Pandas to_json() from adding time component to date object

I have a dataframe that contains some date objects. I need to convert it to JSON for use in JavaScript, which requires YYYY-MM-DD, but to_json() keeps adding a time component. I've seen a number of answers that convert to a string first, but this is part of a loop of about 15 queries, each with many columns (I simplified it for the SO question), and I don't want to hardcode each column conversion as there are a lot.
import pandas as pd
from datetime import date
df = pd.DataFrame(data=[[date(year=2018, month=1, day=1)]])
print(df.to_json(orient='records', date_format='iso', date_unit='s'))
Output:
[{"0":"2018-01-01T00:00:00Z"}]
Desired Output:
[{"0":"2018-01-01"}]
Pandas does not currently have this feature. There is an open issue about it; you should subscribe to the issue in case more options for the date_format argument are added in a future release (which seems like a reasonable feature request):
No way with to_json to write only date out of datetime #16492
Manually converting the relevant columns to string before dumping out json is likely the best option.
You could use strftime('%Y-%m-%d') format like so:
df = pd.DataFrame(data=[[date(year=2018, month=1, day=1).strftime('%Y-%m-%d')]])
print(df.to_json(orient='records', date_format='iso', date_unit='s'))
# [{"0":"2018-01-01"}]
I think this is the best approach for now until pandas adds a way to write only the date out of datetime.
Demo:
Source DF:
In [249]: df = pd.DataFrame({
...: 'val':np.random.rand(5),
...: 'date1':pd.date_range('2018-01-01',periods=5),
...: 'date2':pd.date_range('2017-12-15',periods=5)
...: })
In [250]: df
Out[250]:
date1 date2 val
0 2018-01-01 2017-12-15 0.539349
1 2018-01-02 2017-12-16 0.308532
2 2018-01-03 2017-12-17 0.788588
3 2018-01-04 2017-12-18 0.526541
4 2018-01-05 2017-12-19 0.887299
In [251]: df.dtypes
Out[251]:
date1 datetime64[ns]
date2 datetime64[ns]
val float64
dtype: object
You can cast datetime columns to strings in one command:
In [252]: df.update(df.loc[:, df.dtypes.astype(str).str.contains('date')].astype(str))
In [253]: df.dtypes
Out[253]:
date1 object
date2 object
val float64
dtype: object
In [254]: df.to_json(orient='records')
Out[254]: '[{"date1":"2018-01-01","date2":"2017-12-15","val":0.5393488718},{"date1":"2018-01-02","date2":"2017-12-16","val":0.3085324043},{"
date1":"2018-01-03","date2":"2017-12-17","val":0.7885879674},{"date1":"2018-01-04","date2":"2017-12-18","val":0.5265407505},{"date1":"2018-0
1-05","date2":"2017-12-19","val":0.887298853}]'
Alternatively, you can cast date columns to strings on the SQL side.
I had that problem as well, but since I was looking only for the date, discarding the timezone, I was able to work around it using the following expression:
import pandas as pd
from datetime import datetime
df = pd.read_json('test.json')
df['date_hour'] = [datetime.strptime(date[0:10],'%Y-%m-%d').date() for date in df['date_hour']]
So, if the date_hour values in the JSON file were written with date_format='iso' (e.g. "2018-01-01T00:00:00Z"), you can use this solution.
This way you extract just the part that really matters. Note that the list comprehension is needed because datetime.strptime converts one string at a time; calling it on the whole Series raises an error saying it cannot be used with a Series.
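If the values are ISO strings, the per-row loop can also be replaced with vectorized pandas calls: slice the first ten characters with the .str accessor, parse them with pd.to_datetime, and take .dt.date. A sketch with made-up values standing in for the date_hour column read from the JSON file:

```python
import pandas as pd

# Hypothetical stand-in for df = pd.read_json('test.json').
df = pd.DataFrame({"date_hour": ["2018-01-01T00:00:00Z", "2018-01-02T13:45:00Z"]})

# Keep 'YYYY-MM-DD', parse the whole column at once, then drop to datetime.date.
df["date_hour"] = pd.to_datetime(df["date_hour"].str[:10], format="%Y-%m-%d").dt.date
print(df)
```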
A generic solution would be:
df.assign( **df.select_dtypes(['datetime']).astype(str).to_dict('list') ).to_json(orient="records")
Based on the dtype, it selects the datetime columns and converts them to str objects, so the date format is kept during serialization.

Converting into date-time format in pandas?

I need help converting strings into a Python/pandas datetime format. For example, my times are saved like the following lines:
2017-01-01 05:30:24.468911+00:00
.....
2017-05-05 01:51:31.351718+00:00
and I want to know the simplest way to convert these into a datetime format so I can perform operations with time (for example, finding the range in days of my dataset to split it into chunks by time, or the time difference from one time to another). I don't mind losing some precision in the times if that makes things easier. Thank you so much!
pd.Timestamp will parse it for you:
>>> pd.Timestamp('2017-01-01 05:30:24.468911+00:00')
Timestamp('2017-01-01 05:30:24.468911+0000', tz='UTC')
Let's say you have a dataframe that includes your timestamp column (let's call it stamp). You can use apply on that column together with Timestamp:
df = pd.DataFrame(
{'stamp': ['2017-01-01 05:30:24.468911+00:00',
'2017-05-05 01:51:31.351718+00:00']})
>>> df
stamp
0 2017-01-01 05:30:24.468911+00:00
1 2017-05-05 01:51:31.351718+00:00
>>> df['stamp'].apply(pd.Timestamp)
0 2017-01-01 05:30:24.468911+00:00
1 2017-05-05 01:51:31.351718+00:00
Name: stamp, dtype: datetime64[ns, UTC]
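Older pandas versions also had pd.TimeSeries, which was deprecated and has since been removed; as the output shows, it did not actually convert the values (the dtype stays object):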
>>> pd.TimeSeries(df.stamp)
0 2017-01-01 05:30:24.468911+00:00
1 2017-05-05 01:51:31.351718+00:00
Name: stamp, dtype: object
Once you have Timestamp objects, they are efficient to manipulate; you can simply subtract them, for example.
You may also want to have a look at this SO answer, which discusses converting timezone-unaware values to timezone-aware ones.
Let's say I have two strings, 2017-06-06 and 1944-06-06, and I want to get the difference (what Python calls a timedelta) between the two.
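For the kinds of operations mentioned in the question, here is a sketch that uses pd.to_datetime on the whole column, which is vectorized, unlike apply. Subtracting the minimum from the maximum gives the span of the dataset, and diff() gives the gap between consecutive rows. Both return Timedelta values.

```python
import pandas as pd

df = pd.DataFrame({"stamp": ["2017-01-01 05:30:24.468911+00:00",
                             "2017-05-05 01:51:31.351718+00:00"]})

# Parse the whole column at once; the +00:00 offset yields a UTC-aware dtype.
df["stamp"] = pd.to_datetime(df["stamp"], utc=True)

# Span of the dataset as a Timedelta, and its whole-day component.
span = df["stamp"].max() - df["stamp"].min()
print(span, span.days)

# Time elapsed between consecutive rows (the first row has no predecessor: NaT).
df["gap"] = df["stamp"].diff()
print(df)
```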
First, I'll need to import datetime. Then I'll need to get both of those strings into datetime objects:
>>> a = datetime.datetime.strptime('2017-06-06', '%Y-%m-%d')
>>> b = datetime.datetime.strptime('1944-06-06', '%Y-%m-%d')
That will give us two datetime objects that can be used in arithmetic functions that will return a timedelta object:
>>> c = abs((a-b).days)
This gives us 26663, and days is the largest unit that a timedelta stores (see the documentation).
Since the Pandas tag is there:
import pandas as pd
df = pd.DataFrame(['2017-01-01 05:30:24.468911+00:00'])
df.columns = ['Datetime']
df['Datetime'] = pd.to_datetime(df['Datetime'], format='%Y-%m-%d %H:%M:%S.%f%z', utc=True)
print(df.dtypes)
