I have a data frame created with Pandas. It has 3 columns. One of them has the date in the format %Y%m%d%H. I need to find the rows that match a date with the format %Y%m%d.
I tried
df.loc[df["MESS_DATUM"] == 20170807]
which doesn't work. Only when I do
df.loc[df["MESS_DATUM"] == 2017080723]
it works for that single line. But I need the other lines containing the date only (without the hour). I know there is something like .str.contains(""). Is there something similar for numeric values or a way to use wildcards in the lines above?
We can integer-divide the MESS_DATUM column by 100, which drops the two trailing hour digits:
df.loc[df["MESS_DATUM"]//100 == 20170807]
Demo:
In [29]: df
Out[29]:
MESS_DATUM
0 2017080719
1 2017080720
2 2017080721
3 2017080722
4 2017080723
In [30]: df.dtypes
Out[30]:
MESS_DATUM int64
dtype: object
In [31]: df["MESS_DATUM"]//100
Out[31]:
0 20170807
1 20170807
2 20170807
3 20170807
4 20170807
Name: MESS_DATUM, dtype: int64
But I would consider converting it to datetime dtype:
df["MESS_DATUM"] = pd.to_datetime(df["MESS_DATUM"].astype(str), format='%Y%m%d%H')
If df["MESS_DATUM"] is of float dtype, then we can use the following trick:
In [41]: pd.to_datetime(df["MESS_DATUM"].astype(str).str.split('.').str[0],
format='%Y%m%d%H')
Out[41]:
0 2017-08-07 19:00:00
1 2017-08-07 20:00:00
2 2017-08-07 21:00:00
3 2017-08-07 22:00:00
4 2017-08-07 23:00:00
Name: MESS_DATUM, dtype: datetime64[ns]
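Once the column is parsed as datetime, the original question (all rows for one calendar day) can be answered without integer arithmetic. The following is a minimal sketch, assuming integer input in %Y%m%d%H form; the sample values and the names ts, mask and matches are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data in the same YYYYMMDDHH integer layout as the demo
df = pd.DataFrame({"MESS_DATUM": [2017080619, 2017080720, 2017080723, 2017080800]})

# Parse the integers into datetime64 values
ts = pd.to_datetime(df["MESS_DATUM"].astype(str), format="%Y%m%d%H")

# normalize() resets the time part to midnight, so a plain date comparison works
mask = ts.dt.normalize() == pd.Timestamp("2017-08-07")
matches = df.loc[mask]
print(matches)
```

Here the parsed values are kept in a separate Series so the original column is left unchanged; assigning ts back to df["MESS_DATUM"] would work just as well.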
Related
I have a pandas dataframe with a "time" column, which currently looks like:
ab['ZEIT'].unique()
array([ 0, 165505, 203355, ..., 73139, 75211, 74244], dtype=int64)
How can I get the German time format hh:mm:ss out of it, so that it looks like:
array([00:00:00, 16:55:05, 20:33:55, ..., 07:31:39, 07:52:11, 07:42:44], dtype=?)
Use pd.to_datetime after converting the values to strings zero-padded to six digits, then use the dt accessor to extract the time:
In [12]: pd.to_datetime(df['ZEIT'].astype(str).str.zfill(6), format='%H%M%S').dt.time
Out[12]:
0 00:00:00
1 16:55:05
2 20:33:55
3 07:31:39
4 07:52:11
5 07:42:44
Name: ZEIT, dtype: object
Details
In [13]: df
Out[13]:
ZEIT
0 0
1 165505
2 203355
3 73139
4 75211
5 74244
In [14]: df['ZEIT'].astype(str).str.zfill(6)
Out[14]:
0 000000
1 165505
2 203355
3 073139
4 075211
5 074244
Name: ZEIT, dtype: object
In [15]: pd.to_datetime(df['ZEIT'].astype(str).str.zfill(6), format='%H%M%S')
Out[15]:
0 1900-01-01 00:00:00
1 1900-01-01 16:55:05
2 1900-01-01 20:33:55
3 1900-01-01 07:31:39
4 1900-01-01 07:52:11
5 1900-01-01 07:42:44
Name: ZEIT, dtype: datetime64[ns]
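If the goal is only the hh:mm:ss text rather than datetime.time objects, formatting the parsed values with dt.strftime is one possible alternative. This is a sketch under the assumption that ZEIT holds integers in HHMMSS form; the sample values and the names parsed and formatted are illustrative:

```python
import pandas as pd

# Hypothetical sample values taken from the question
df = pd.DataFrame({"ZEIT": [0, 165505, 203355, 73139]})

# Zero-pad to six digits so that e.g. 73139 is read as 07:31:39
parsed = pd.to_datetime(df["ZEIT"].astype(str).str.zfill(6), format="%H%M%S")

# Render each value as an hh:mm:ss string (the result has object/string dtype)
formatted = parsed.dt.strftime("%H:%M:%S")
print(formatted.tolist())
```

The result is plain strings, so it suits display or export; for calculations the datetime or time values are the better choice.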
I've a dataframe of this format -
var1 date
A 2017/01/01
A 2017/01/02
...
I want the date to be converted into YYYY-MM format but the df['date'].dtype is object.
How can I remove the day part from date while keeping the data type as datetime?
Expected Output -
A - 2017/01
Thanks
You can't have custom representation for the datetime dtype. But you have the following options:
use strings - you can have any representation you wish, but all datetime methods and attributes are lost
use datetime, but set the day part to 1 (as @Kopytok has already shown)
use the period dtype, which still allows you to use some date arithmetic
Demo:
In [207]: df
Out[207]:
var1 date
0 A 2018-12-31
1 A 2017-09-07
2 B 2016-02-29
In [208]: df['new'] = df['date'].dt.to_period('M')
In [209]: df
Out[209]:
var1 date new
0 A 2018-12-31 2018-12
1 A 2017-09-07 2017-09
2 B 2016-02-29 2016-02
In [210]: df.dtypes
Out[210]:
var1 object
date datetime64[ns]
new object
dtype: object
In [211]: df['new'] + 8
Out[211]:
0 2019-08
1 2018-05
2 2016-10
Name: new, dtype: object
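The three options can be compared side by side. This is only a sketch, assuming the date column holds strings in %Y/%m/%d form as in the question; the variable names are illustrative:

```python
import pandas as pd

# Hypothetical input matching the question's layout
df = pd.DataFrame({"var1": ["A", "A"], "date": ["2017/01/01", "2017/01/02"]})
dates = pd.to_datetime(df["date"], format="%Y/%m/%d")

# Option 1: strings, any representation, but no datetime methods afterwards
as_string = dates.dt.strftime("%Y-%m")

# Option 2: stay in datetime64, with every day set to the first of the month
first_of_month = dates.dt.to_period("M").dt.to_timestamp()

# Option 3: period dtype with monthly frequency
as_period = dates.dt.to_period("M")

print(as_string.tolist())
print(first_of_month.tolist())
print(as_period.dtype)
```

Recent pandas versions report the dtype of the period column as period[M] rather than object as in the output above.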
It is possible to replace every date with the first day of its month:
pd.to_datetime(df["date"], format="%Y/%m/%d").apply(lambda x: x.replace(day=1))
Result:
0 2017-01-01
1 2017-01-01
I have a dataframe and there's a column named 'Time' in it like the below(HH:MM:SS:fffff).
>>> df['Time']
0 09:42:29:75284
1 09:42:29:95584
2 09:42:31:15036
3 09:42:35:15138
4 09:42:35:95491
5 09:42:43:55414
6 09:42:45:35866
7 09:42:46:74638
8 09:42:47:35582
9 09:42:47:74774
10 09:42:48:94582
...
Name: Time, Length: 18924, dtype: object
I want to change its type as datetime, in order to make it easier to calculate. Is it possible to change its type, using pandas.to_datetime, as datetime without date?
You can convert it to timedelta64[ns] dtype:
Source DF:
In [164]: df
Out[164]:
Time
0 09:42:29:75284
1 09:42:29:95584
2 09:42:31:15036
3 09:42:35:15138
4 09:42:35:95491
5 09:42:43:55414
6 09:42:45:35866
7 09:42:46:74638
8 09:42:47:35582
9 09:42:47:74774
10 09:42:48:94582
In [165]: df.dtypes
Out[165]:
Time object # <-------- NOTE!
dtype: object
Converted:
In [166]: df.Time = pd.to_timedelta(df.Time.str.replace(r'\:(\d+)$', r'.\1', regex=True),
errors='coerce')
In [167]: df
Out[167]:
Time
0 09:42:29.752840
1 09:42:29.955840
2 09:42:31.150360
3 09:42:35.151380
4 09:42:35.954910
5 09:42:43.554140
6 09:42:45.358660
7 09:42:46.746380
8 09:42:47.355820
9 09:42:47.747740
10 09:42:48.945820
In [168]: df.dtypes
Out[168]:
Time timedelta64[ns] # <-------- NOTE!
dtype: object
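A self-contained version of the conversion, followed by the kind of calculation that timedelta64 makes easy, might look like the sketch below. It assumes the last colon separates a fractional-seconds part; the names td and gaps are illustrative. Note that recent pandas versions need regex=True for str.replace to treat the pattern as a regular expression:

```python
import pandas as pd

# Hypothetical sample taken from the first rows of the question
df = pd.DataFrame({"Time": ["09:42:29:75284", "09:42:29:95584", "09:42:31:15036"]})

# Turn the last ':' into '.', so "09:42:29:75284" becomes "09:42:29.75284"
cleaned = df["Time"].str.replace(r":(\d+)$", r".\1", regex=True)

# Unparseable strings become NaT instead of raising
td = pd.to_timedelta(cleaned, errors="coerce")

# Example calculation: elapsed time between consecutive rows
gaps = td.diff()
print(gaps)
```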
Please refer to the pandas to_datetime documentation.
import pandas as pd
df = pd.DataFrame({'Time': ['09:42:29:75284','09:42:29:95584','09:42:31:15036']})
df
Out[]:
Time
0 09:42:29:75284
1 09:42:29:95584
2 09:42:31:15036
You can convert this into datetime format by specifying format as follows:
pd.to_datetime(df['Time'], format='%H:%M:%S:%f')
Out[]:
0 1900-01-01 09:42:29.752840
1 1900-01-01 09:42:29.955840
2 1900-01-01 09:42:31.150360
Name: Time, dtype: datetime64[ns]
Note, however, that doing this also adds the placeholder date 1900-01-01.
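If the placeholder date is unwanted, one possible follow-up is to keep only the time part through the dt accessor. This is a sketch, not the only way; the result holds datetime.time objects in an object-dtype Series, so vectorised datetime arithmetic is no longer available on it:

```python
import datetime

import pandas as pd

# Hypothetical sample taken from the question
df = pd.DataFrame({"Time": ["09:42:29:75284", "09:42:29:95584"]})

# Parse with an explicit format; the date defaults to 1900-01-01
parsed = pd.to_datetime(df["Time"], format="%H:%M:%S:%f")

# Drop the date and keep datetime.time objects
times = parsed.dt.time
print(times.tolist())
```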
I'm using pandas 0.6.2. I want to convert a dataframe column to datetime type. Here is the data:
>>> tt.head()
0 2015-02-01 00:46:28
1 2015-02-01 00:59:56
2 2015-02-01 00:16:27
3 2015-02-01 00:33:45
4 2015-02-01 13:48:29
Name: TS, dtype: object
I want to change each item in tt to datetime type and get the hour. The code is
for i in tt.index:
    tt[i]=pd.to_datetime(tt[i])
and the warning is
__main__:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Why does the warning occur, and how can I deal with it?
If I change one item at a time, it works. The code is
>>> tt[1]=pd.to_datetime(tt[1])
>>> tt[1].hour
0
Just do it on the entire Series, as to_datetime can operate on array-like arguments, and assign the result directly to the column:
In [72]:
df['date'] = pd.to_datetime(df['date'])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 1 columns):
date 5 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 80.0 bytes
In [73]:
df
Out[73]:
date
index
0 2015-02-01 00:46:28
1 2015-02-01 00:59:56
2 2015-02-01 00:16:27
3 2015-02-01 00:33:45
4 2015-02-01 13:48:29
If you changed your loop to this then it would work:
In [80]:
for i in df.index:
    df.loc[i,'date']=pd.to_datetime(df.loc[i, 'date'])
df
Out[80]:
date
index
0 2015-02-01 00:46:28
1 2015-02-01 00:59:56
2 2015-02-01 00:16:27
3 2015-02-01 00:33:45
4 2015-02-01 13:48:29
The code complains because you are potentially operating on a copy of that row of the df rather than a view; using the new indexers avoids this ambiguity.
EDIT
It looks like you're using an ancient version of pandas; the following should work:
tt[1].apply(lambda x: x.hour)
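On a pandas version that provides the .dt accessor (an assumption here; the 0.6.2 mentioned in the question predates it), the loop can be avoided entirely. A minimal sketch with made-up sample values:

```python
import pandas as pd

# Hypothetical Series shaped like the one in the question
tt = pd.Series(["2015-02-01 00:46:28", "2015-02-01 13:48:29"], name="TS")

# Convert the whole Series in one call instead of item by item
tt = pd.to_datetime(tt)

# Vectorised access to the hour of every element
hours = tt.dt.hour
print(hours.tolist())
```

Because the whole Series is replaced in a single assignment, no element-wise setting takes place and the SettingWithCopyWarning does not come up.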
I have a pandas dataframe with a date column whose data type is datetime64[ns]. There are over 1000 observations in the dataframe. I want to transform the following column:
date
2013-05-01
2013-05-01
to
date
05/2013
05/2013
or
date
05-2013
05-2013
EDIT:
This is my sample code as of now:
test = pd.DataFrame({'a':['07/2017','07/2017',pd.NaT]})
a
0 2017-07-13
1 2017-07-13
2 NaT
test['a'].apply(lambda x: x if pd.isnull(x) == True else x.strftime('%Y-%m'))
0 2017-07-01
1 2017-07-01
2 NaT
Name: a, dtype: datetime64[ns]
Why did only the date change and not the format?
You can convert datetime64 into whatever string format you like using the strftime method. In your case you would apply it like this:
df.date = df.date[df.date.notnull()].map(lambda x: x.strftime('%m/%Y'))
df.date
Out[111]:
0 05/2013
1 05/2013
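A vectorised variant of the same idea is dt.strftime, which should leave missing dates as missing values without an explicit notnull filter. The sketch below assumes the column is already datetime64; the sample frame is made up:

```python
import pandas as pd

# Hypothetical column with one missing date
df = pd.DataFrame({"date": pd.to_datetime(["2013-05-01", "2013-05-01", None])})

# Format every non-missing date as MM/YYYY; the result is strings, not datetimes
formatted = df["date"].dt.strftime("%m/%Y")
print(formatted)
```

As with the map-based version, the result is no longer of datetime dtype, so it is best applied as a final presentation step.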