Select pandas dataframe rows between dates and set column value - python

In the dataframe below, I want to set row values in the column p50 to NaN if they are below 2.0 between the dates May 15th and August 15th 2018.
date p50
2018-03-02 2018-03-02 NaN
2018-03-03 2018-03-03 NaN
2018-03-04 2018-03-04 0.022590
2018-03-05 2018-03-05 NaN
2018-03-06 2018-03-06 -0.042227
2018-03-07 2018-03-07 NaN
2018-03-08 2018-03-08 NaN
2018-03-09 2018-03-09 -0.028646
2018-03-10 2018-03-10 NaN
2018-03-11 2018-03-11 -0.045244
2018-03-12 2018-03-12 NaN
2018-03-13 2018-03-13 NaN
2018-03-14 2018-03-14 -0.020590
2018-03-15 2018-03-15 NaN
2018-03-16 2018-03-16 -0.028317
2018-03-17 2018-03-17 NaN
2018-03-18 2018-03-18 NaN
2018-03-19 2018-03-19 NaN
2018-03-20 2018-03-20 NaN
2018-03-21 2018-03-21 NaN
2018-03-22 2018-03-22 NaN
2018-03-23 2018-03-23 NaN
2018-03-24 2018-03-24 -0.066800
2018-03-25 2018-03-25 NaN
2018-03-26 2018-03-26 -0.104135
2018-03-27 2018-03-27 NaN
2018-03-28 2018-03-28 NaN
2018-03-29 2018-03-29 -0.115200
2018-03-30 2018-03-30 NaN
2018-03-31 2018-03-31 -0.000455
... ...
2018-07-03 2018-07-03 NaN
2018-07-04 2018-07-04 2.313035
2018-07-05 2018-07-05 NaN
2018-07-06 2018-07-06 NaN
2018-07-07 2018-07-07 NaN
2018-07-08 2018-07-08 NaN
2018-07-09 2018-07-09 0.054513
2018-07-10 2018-07-10 NaN
2018-07-11 2018-07-11 NaN
2018-07-12 2018-07-12 3.711159
2018-07-13 2018-07-13 NaN
2018-07-14 2018-07-14 6.583810
2018-07-15 2018-07-15 NaN
2018-07-16 2018-07-16 NaN
2018-07-17 2018-07-17 0.070182
2018-07-18 2018-07-18 NaN
2018-07-19 2018-07-19 3.688812
2018-07-20 2018-07-20 NaN
2018-07-21 2018-07-21 NaN
2018-07-22 2018-07-22 0.876552
2018-07-23 2018-07-23 NaN
2018-07-24 2018-07-24 1.077895
2018-07-25 2018-07-25 NaN
2018-07-26 2018-07-26 NaN
2018-07-27 2018-07-27 3.802159
2018-07-28 2018-07-28 NaN
2018-07-29 2018-07-29 0.077402
2018-07-30 2018-07-30 NaN
2018-07-31 2018-07-31 NaN
2018-08-01 2018-08-01 3.202214
The dataframe has a datetime index. I do the following:
mask = (group['date'] > '2018-5-15') & (group['date'] <= '2018-8-15')
group[mask].loc[group[mask]['p50'] < 2.]['p50'] = np.NaN
However, this does not update the dataframe. How to fix this?

I think you should use .loc in a single call, like:
mask = (group['date'] > '2018-5-15') & (group['date'] <= '2018-8-15')
group.loc[mask & (group['p50'] < 2), 'p50'] = np.nan
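A minimal runnable sketch of this fix, with a few made-up rows standing in for the question's data:

```python
import numpy as np
import pandas as pd

# made-up rows standing in for the question's dataframe
group = pd.DataFrame({
    'date': pd.to_datetime(['2018-05-01', '2018-06-01', '2018-07-14', '2018-08-20']),
    'p50': [1.0, 1.5, 6.58, 0.5],
})

mask = (group['date'] > '2018-5-15') & (group['date'] <= '2018-8-15')
# one .loc call does both the row selection and the assignment,
# so the original frame is updated (no chained-indexing copy)
group.loc[mask & (group['p50'] < 2.0), 'p50'] = np.nan
# only 2018-06-01 (value 1.5, inside the window and below 2.0) becomes NaN;
# the other three rows are left untouched
```

The original two-step version fails because `group[mask]` returns a copy, and the second indexing step assigns into that copy rather than into `group`.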

Related

dataframe data transfer with selected values to another dataframe

My goal is to select the column Sabah in dataframe prdt and enter each of its values into the repeated rows labelled Sabah in dataframe prcal
prcal
Vakit Start_Date End_Date Start_Time End_Time
0 Sabah 2022-01-01 2022-01-01 NaN NaN
1 Güneş 2022-01-01 2022-01-01 NaN NaN
2 Öğle 2022-01-01 2022-01-01 NaN NaN
3 İkindi 2022-01-01 2022-01-01 NaN NaN
4 Akşam 2022-01-01 2022-01-01 NaN NaN
..........................................................
2184 Sabah 2022-12-31 2022-12-31 NaN NaN
2185 Güneş 2022-12-31 2022-12-31 NaN NaN
2186 Öğle 2022-12-31 2022-12-31 NaN NaN
2187 İkindi 2022-12-31 2022-12-31 NaN NaN
2188 Akşam 2022-12-31 2022-12-31 NaN NaN
2189 rows × 5 columns
prdt
Day Sabah Güneş Öğle İkindi Akşam Yatsı
0 2022-01-01 06:51:00 08:29:00 13:08:00 15:29:00 17:47:00 19:20:00
1 2022-01-02 06:51:00 08:29:00 13:09:00 15:30:00 17:48:00 19:21:00
2 2022-01-03 06:51:00 08:29:00 13:09:00 15:30:00 17:48:00 19:22:00
3 2022-01-04 06:51:00 08:29:00 13:09:00 15:31:00 17:49:00 19:22:00
4 2022-01-05 06:51:00 08:29:00 13:10:00 15:32:00 17:50:00 19:23:00
...........................................................................
360 2022-12-27 06:49:00 08:27:00 13:06:00 15:25:00 17:43:00 19:16:00
361 2022-12-28 06:50:00 08:28:00 13:06:00 15:26:00 17:43:00 19:17:00
362 2022-12-29 06:50:00 08:28:00 13:07:00 15:26:00 17:44:00 19:18:00
363 2022-12-30 06:50:00 08:28:00 13:07:00 15:27:00 17:45:00 19:18:00
364 2022-12-31 06:50:00 08:28:00 13:07:00 15:28:00 17:46:00 19:19:00
365 rows × 7 columns
I selected every row labelled Sabah with prcal.iloc[::6,:]
and made a list from prdt['Sabah'].
When assigning prcal.iloc[::6,:] = prdt['Sabah'][0:365] I get a value error:
ValueError: Must have equal len keys and value when setting with an iterable
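This question is left unanswered here, but the error comes from assigning a 1-D Series to the 2-D slice prcal.iloc[::6, :]. A sketch of a likely fix, assuming the times are meant to land in a single column (Start_Time is my guess, not stated in the question), shown on toy frames:

```python
import pandas as pd

# toy frames mirroring the layout (2 days, 2 Vakit rows per day for brevity)
prcal = pd.DataFrame({'Vakit': ['Sabah', 'Güneş', 'Sabah', 'Güneş'],
                      'Start_Time': [None] * 4})
prdt = pd.DataFrame({'Day': ['2022-01-01', '2022-01-02'],
                     'Sabah': ['06:51:00', '06:51:00']})

# assign to ONE column position, not the whole row slice;
# .to_numpy() avoids index alignment between the two frames
col = prcal.columns.get_loc('Start_Time')  # assumed target column
prcal.iloc[::2, col] = prdt['Sabah'].to_numpy()
# every 'Sabah' row now carries that day's time; the other rows stay None
```

With the real data the step would be `prcal.iloc[::6, col] = prdt['Sabah'].to_numpy()`, since Sabah repeats every 6 rows there.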

Pandas: How to replace looping with advanced indexing with two dataframes?

I have the following use case: a dataframe dfp contains prices for two assets a and b, and another dataframe dfm contains metadata about those assets. Given start and end dates between which each asset's prices should be considered, I'd like to set all prices outside those ranges to np.nan, so I have:
import pandas as pd
import numpy as np
# sample prices for two assets
dfp = pd.DataFrame(data=np.random.random_sample((20, 2)),
                   columns=['a', 'b'],
                   index=pd.date_range(end='2020-12-10', periods=20))
print(dfp)
a b
2020-11-21 0.411653 0.001124
2020-11-22 0.773671 0.210065
2020-11-23 0.143332 0.090111
2020-11-24 0.062085 0.475205
2020-11-25 0.160982 0.557469
2020-11-26 0.025793 0.353725
2020-11-27 0.651929 0.794265
2020-11-28 0.266566 0.270451
2020-11-29 0.713030 0.346842
2020-11-30 0.838571 0.969477
2020-12-01 0.701627 0.480349
2020-12-02 0.946619 0.344399
2020-12-03 0.430523 0.857529
2020-12-04 0.202790 0.003393
2020-12-05 0.293010 0.250172
2020-12-06 0.172535 0.932216
2020-12-07 0.508303 0.775843
2020-12-08 0.704445 0.760226
2020-12-09 0.515398 0.193958
2020-12-10 0.219717 0.040269
# metadata information for those two assets
dfm = pd.DataFrame(data=[['a', '2020-11-22', '2020-11-29'],
                         ['b', '2020-12-01', '2020-12-07']],
                   columns=['name', 'start', 'end'])
# set all prices outside the range to np.nan in a loop :(
for index, row in dfm.iterrows():
    dfp.loc[(dfp.index < row['start']) | (row['end'] < dfp.index), row['name']] = np.nan
print(dfp)
a b
2020-11-21 NaN NaN
2020-11-22 0.773671 NaN
2020-11-23 0.143332 NaN
2020-11-24 0.062085 NaN
2020-11-25 0.160982 NaN
2020-11-26 0.025793 NaN
2020-11-27 0.651929 NaN
2020-11-28 0.266566 NaN
2020-11-29 0.713030 NaN
2020-11-30 NaN NaN
2020-12-01 NaN 0.480349
2020-12-02 NaN 0.344399
2020-12-03 NaN 0.857529
2020-12-04 NaN 0.003393
2020-12-05 NaN 0.250172
2020-12-06 NaN 0.932216
2020-12-07 NaN 0.775843
2020-12-08 NaN NaN
2020-12-09 NaN NaN
2020-12-10 NaN NaN
Is it possible to replace the looping with advanced indexing here? If so, how?
If your dataframe isn't too large, you can use melt and merge,
then apply a conditional using np.where:
df1 = pd.merge(
    pd.melt(dfp.reset_index(), id_vars="index", var_name="name"),
    dfm,
    on=["name"],
    how="left",
)
df1['value_new'] = np.where(
    (df1['index'] > df1['start']) &
    (df1['index'] < df1['end']),
    df1['value'],
    np.nan
)
(Note: these comparisons are strict, so the boundary dates themselves come out as NaN in the output below; use >= and <= to keep them, matching the original loop.)
index name value start end value_new
0 2020-11-21 a 0.460695 2020-11-22 2020-11-29 NaN
1 2020-11-22 a 0.818190 2020-11-22 2020-11-29 NaN
2 2020-11-23 a 0.869208 2020-11-22 2020-11-29 0.869208
3 2020-11-24 a 0.466557 2020-11-22 2020-11-29 0.466557
4 2020-11-25 a 0.218630 2020-11-22 2020-11-29 0.218630
5 2020-11-26 a 0.769285 2020-11-22 2020-11-29 0.769285
6 2020-11-27 a 0.066418 2020-11-22 2020-11-29 0.066418
7 2020-11-28 a 0.746973 2020-11-22 2020-11-29 0.746973
8 2020-11-29 a 0.881565 2020-11-22 2020-11-29 NaN
9 2020-11-30 a 0.856797 2020-11-22 2020-11-29 NaN
10 2020-12-01 a 0.303156 2020-11-22 2020-11-29 NaN
11 2020-12-02 a 0.152055 2020-11-22 2020-11-29 NaN
12 2020-12-03 a 0.239251 2020-11-22 2020-11-29 NaN
13 2020-12-04 a 0.579377 2020-11-22 2020-11-29 NaN
14 2020-12-05 a 0.950465 2020-11-22 2020-11-29 NaN
15 2020-12-06 a 0.017557 2020-11-22 2020-11-29 NaN
16 2020-12-07 a 0.459709 2020-11-22 2020-11-29 NaN
17 2020-12-08 a 0.235053 2020-11-22 2020-11-29 NaN
18 2020-12-09 a 0.935113 2020-11-22 2020-11-29 NaN
19 2020-12-10 a 0.121584 2020-11-22 2020-11-29 NaN
20 2020-11-21 b 0.982475 2020-12-01 2020-12-07 NaN
21 2020-11-22 b 0.006563 2020-12-01 2020-12-07 NaN
22 2020-11-23 b 0.863132 2020-12-01 2020-12-07 NaN
23 2020-11-24 b 0.059826 2020-12-01 2020-12-07 NaN
24 2020-11-25 b 0.853701 2020-12-01 2020-12-07 NaN
25 2020-11-26 b 0.494347 2020-12-01 2020-12-07 NaN
26 2020-11-27 b 0.680949 2020-12-01 2020-12-07 NaN
27 2020-11-28 b 0.247310 2020-12-01 2020-12-07 NaN
28 2020-11-29 b 0.777140 2020-12-01 2020-12-07 NaN
29 2020-11-30 b 0.552633 2020-12-01 2020-12-07 NaN
30 2020-12-01 b 0.330672 2020-12-01 2020-12-07 NaN
31 2020-12-02 b 0.295119 2020-12-01 2020-12-07 0.295119
32 2020-12-03 b 0.361580 2020-12-01 2020-12-07 0.361580
33 2020-12-04 b 0.874205 2020-12-01 2020-12-07 0.874205
34 2020-12-05 b 0.754738 2020-12-01 2020-12-07 0.754738
35 2020-12-06 b 0.135053 2020-12-01 2020-12-07 0.135053
36 2020-12-07 b 0.998768 2020-12-01 2020-12-07 NaN
37 2020-12-08 b 0.955664 2020-12-01 2020-12-07 NaN
38 2020-12-09 b 0.330856 2020-12-01 2020-12-07 NaN
39 2020-12-10 b 0.826502 2020-12-01 2020-12-07 NaN
Data:
np.random.seed(44)
dfp = pd.DataFrame(data=np.random.random_sample((20, 2)),
                   columns=['a', 'b'],
                   index=pd.date_range(end='2020-12-10', periods=20))
dfm = pd.DataFrame(data=[['a', '2020-11-22', '2020-11-29'],
                         ['b', '2020-12-01', '2020-12-07']],
                   columns=['name', 'start', 'end'])
dfp:
a b
2020-11-21 0.834842 0.104796
2020-11-22 0.744640 0.360501
2020-11-23 0.359311 0.609238
2020-11-24 0.393780 0.409073
2020-11-25 0.509902 0.710148
2020-11-26 0.960526 0.456621
2020-11-27 0.427652 0.113464
2020-11-28 0.217899 0.957472
2020-11-29 0.943351 0.881824
2020-11-30 0.646411 0.213825
2020-12-01 0.636832 0.139146
2020-12-02 0.458704 0.873863
2020-12-03 0.258450 0.664851
2020-12-04 0.862674 0.148848
2020-12-05 0.562950 0.159155
2020-12-06 0.172895 0.104023
2020-12-07 0.202938 0.455189
2020-12-08 0.794575 0.990823
2020-12-09 0.805017 0.377415
2020-12-10 0.515737 0.058899
dfm:
name start end
0 a 2020-11-22 2020-11-29
1 b 2020-12-01 2020-12-07
x = dfm.apply(lambda row: (dfp.index < row['start']) | (row['end'] < dfp.index),
              axis=1)
final = dfp[~pd.DataFrame({'a': x[0], 'b': x[1]}, index=dfp.index)]
final:
a b
2020-11-21 NaN NaN
2020-11-22 0.744640 NaN
2020-11-23 0.359311 NaN
2020-11-24 0.393780 NaN
2020-11-25 0.509902 NaN
2020-11-26 0.960526 NaN
2020-11-27 0.427652 NaN
2020-11-28 0.217899 NaN
2020-11-29 0.943351 NaN
2020-11-30 NaN NaN
2020-12-01 NaN 0.139146
2020-12-02 NaN 0.873863
2020-12-03 NaN 0.664851
2020-12-04 NaN 0.148848
2020-12-05 NaN 0.159155
2020-12-06 NaN 0.104023
2020-12-07 NaN 0.455189
2020-12-08 NaN NaN
2020-12-09 NaN NaN
2020-12-10 NaN NaN
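Another loop-free option (my own sketch, not from either answer above) is to build the whole keep-mask at once with NumPy broadcasting, comparing the date index against every asset's start/end in one shot:

```python
import numpy as np
import pandas as pd

np.random.seed(44)
dfp = pd.DataFrame(np.random.random_sample((20, 2)), columns=['a', 'b'],
                   index=pd.date_range(end='2020-12-10', periods=20))
dfm = pd.DataFrame([['a', '2020-11-22', '2020-11-29'],
                    ['b', '2020-12-01', '2020-12-07']],
                   columns=['name', 'start', 'end'])

# per-column start/end dates, ordered like dfp's columns
meta = dfm.set_index('name').reindex(dfp.columns)
starts = pd.to_datetime(meta['start']).to_numpy()   # shape (2,)
ends = pd.to_datetime(meta['end']).to_numpy()       # shape (2,)
idx = dfp.index.to_numpy()[:, None]                 # shape (20, 1)

# (20, 1) against (2,) broadcasts to a (20, 2) keep-mask, boundaries inclusive
keep = (idx >= starts) & (idx <= ends)
result = dfp.where(keep)                            # out-of-range cells -> NaN
```

This matches the inclusive boundary behaviour of the original iterrows loop and scales to many assets without a Python-level loop.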

Group by column and resampled date and get rolling sum of other column

I have the following data:
(Pdb) df1 = pd.DataFrame({'id': ['SE0000195570','SE0000195570','SE0000195570','SE0000195570','SE0000191827','SE0000191827','SE0000191827','SE0000191827', 'SE0000191827'],'val': ['1','2','3','4','5','6','7','8', '9'],'date': pd.to_datetime(['2014-10-23','2014-07-16','2014-04-29','2014-01-31','2018-10-19','2018-07-11','2018-04-20','2018-02-16','2018-12-29'])})
(Pdb) df1
id val date
0 SE0000195570 1 2014-10-23
1 SE0000195570 2 2014-07-16
2 SE0000195570 3 2014-04-29
3 SE0000195570 4 2014-01-31
4 SE0000191827 5 2018-10-19
5 SE0000191827 6 2018-07-11
6 SE0000191827 7 2018-04-20
7 SE0000191827 8 2018-02-16
8 SE0000191827 9 2018-12-29
UPDATE:
As per the suggestions of @user3483203 I have gotten a bit further, but not quite there. I've amended the example data above with a new row to illustrate better.
(Pdb) df2.assign(calc=(df2.dropna()['val'].groupby(level=0).rolling(4).sum().shift(-3).reset_index(0, drop=True)))
id val date calc
id date
SE0000191827 2018-02-28 SE0000191827 8 2018-02-16 26.0
2018-03-31 NaN NaN NaT NaN
2018-04-30 SE0000191827 7 2018-04-20 27.0
2018-05-31 NaN NaN NaT NaN
2018-06-30 NaN NaN NaT NaN
2018-07-31 SE0000191827 6 2018-07-11 NaN
2018-08-31 NaN NaN NaT NaN
2018-09-30 NaN NaN NaT NaN
2018-10-31 SE0000191827 5 2018-10-19 NaN
2018-11-30 NaN NaN NaT NaN
2018-12-31 SE0000191827 9 2018-12-29 NaN
SE0000195570 2014-01-31 SE0000195570 4 2014-01-31 10.0
2014-02-28 NaN NaN NaT NaN
2014-03-31 NaN NaN NaT NaN
2014-04-30 SE0000195570 3 2014-04-29 NaN
2014-05-31 NaN NaN NaT NaN
2014-06-30 NaN NaN NaT NaN
2014-07-31 SE0000195570 2 2014-07-16 NaN
2014-08-31 NaN NaN NaT NaN
2014-09-30 NaN NaN NaT NaN
2014-10-31 SE0000195570 1 2014-10-23 NaN
For my requirements, the row (SE0000191827, 2018-03-31) should have a calc value since it has four consecutive rows with a value. Currently the row is being removed with the dropna call and I can't figure out how to solve that problem.
What I need
Calculations: The dates in my initial data are quarterly. However, I need to transform this data into monthly rows ranging between the first and last date of each id, and for each month calculate the sum of the four closest consecutive rows of the input data within that id. That's a mouthful. This led me to resample. See expected output below. I need the data to be grouped by both id and the monthly dates.
Performance: The data I'm testing on now is just for benchmarking but I will need the solution to be performant. I'm expecting to run this on upwards of 100k unique ids which may result in around 10 million rows. (100k ids, dates range back up to 10 years, 10years * 12months = 120 months per id, 100k*120 = 12million rows).
What I've tried
(Pdb) res = df.groupby('id').resample('M',on='date')
(Pdb) res.first()
id val date
id date
SE0000191827 2018-02-28 SE0000191827 8 2018-02-16
2018-03-31 NaN NaN NaT
2018-04-30 SE0000191827 7 2018-04-20
2018-05-31 NaN NaN NaT
2018-06-30 NaN NaN NaT
2018-07-31 SE0000191827 6 2018-07-11
2018-08-31 NaN NaN NaT
2018-09-30 NaN NaN NaT
2018-10-31 SE0000191827 5 2018-10-19
SE0000195570 2014-01-31 SE0000195570 4 2014-01-31
2014-02-28 NaN NaN NaT
2014-03-31 NaN NaN NaT
2014-04-30 SE0000195570 3 2014-04-29
2014-05-31 NaN NaN NaT
2014-06-30 NaN NaN NaT
2014-07-31 SE0000195570 2 2014-07-16
2014-08-31 NaN NaN NaT
2014-09-30 NaN NaN NaT
2014-10-31 SE0000195570 1 2014-10-23
This data looks very nice for my case since it's nicely grouped by id and has the dates nicely lined up by month. Here it seems like I could use something like df['val'].rolling(4) and make sure it skips NaN values and put that result in a new column.
Expected output (new column calc):
id val date calc
id date
SE0000191827 2018-02-28 SE0000191827 8 2018-02-16 26
2018-03-31 NaN NaN NaT
2018-04-30 SE0000191827 7 2018-04-20 NaN
2018-05-31 NaN NaN NaT
2018-06-30 NaN NaN NaT
2018-07-31 SE0000191827 6 2018-07-11 NaN
2018-08-31 NaN NaN NaT
2018-09-30 NaN NaN NaT
2018-10-31 SE0000191827 5 2018-10-19 NaN
SE0000195570 2014-01-31 SE0000195570 4 2014-01-31 10
2014-02-28 NaN NaN NaT
2014-03-31 NaN NaN NaT
2014-04-30 SE0000195570 3 2014-04-29 NaN
2014-05-31 NaN NaN NaT
2014-06-30 NaN NaN NaT
2014-07-31 SE0000195570 2 2014-07-16 NaN
2014-08-31 NaN NaN NaT
2014-09-30 NaN NaN NaT
2014-10-31 SE0000195570 1 2014-10-23 NaN
2014-11-30 NaN NaN NaT
2014-12-31 SE0000195570 1 2014-10-23 NaN
Here the result in calc is 26 for the first date since it sums that row's value with the three following values (8+7+6+5). The rest for that id is NaN since four values are not available.
The problems
While it may look like the data is grouped by id and date, it seems like it's actually grouped by date. I'm not sure how this works. I need the data to be grouped by id and date.
(Pdb) res['val'].get_group(datetime.date(2018,2,28))
7 6.730000e+08
Name: val, dtype: object
The result of the resample above returns a DatetimeIndexResamplerGroupby which doesn't have rolling...
(Pdb) res['val'].rolling(4)
*** AttributeError: 'DatetimeIndexResamplerGroupby' object has no attribute 'rolling'
What to do? My guess is that my approach is wrong but after scouring the documentation I'm not sure where to start.
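One way to sidestep the dropna problem (a sketch of my own, not a confirmed answer): compute the forward-looking rolling sum on the raw quarterly rows per id first, and only then resample to the monthly grid, so the NaN-filled months never enter the rolling window. Using the question's sample data (val cast to numbers):

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['SE0000195570'] * 4 + ['SE0000191827'] * 5,
                    'val': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                    'date': pd.to_datetime(['2014-10-23', '2014-07-16', '2014-04-29',
                                            '2014-01-31', '2018-10-19', '2018-07-11',
                                            '2018-04-20', '2018-02-16', '2018-12-29'])})

df1 = df1.sort_values(['id', 'date'])
# sum of each row's value plus the three following quarterly values, per id
df1['calc'] = (df1.groupby('id')['val']
                  .transform(lambda s: s.rolling(4).sum().shift(-3)))

# only now expand to monthly rows; intermediate months come out as NaN/NaT
res = (df1.set_index('date')
          .groupby('id')[['val', 'calc']]
          .resample('M')
          .first())
```

This yields 26.0 at (SE0000191827, 2018-02-28), 27.0 at 2018-04-30, and 10.0 at (SE0000195570, 2014-01-31); whether the NaT in-between months should also carry a value (as the 2018-03-31 remark suggests) would still need a fill rule on top of this.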

I have data and I need to calculate how many times the median price has been in the open range (160, 170). I am just starting to learn Python

ZILLOW/C25499_MLPFAH - Value
Date
2013-04-30 178.571429
2013-05-31 178.571429
2013-06-30 185.380865
2013-07-31 176.747442
2013-08-31 166.666667
2013-09-30 167.599502
2013-10-31 169.025157
2013-11-30 160.929092
2013-12-31 165.282392
2014-01-31 167.153775
2014-02-28 166.666667
2014-03-31 172.686604
2014-04-30 172.207447
2014-05-31 161.466408
2014-06-30 156.976744
2014-07-31 142.410714
2014-08-31 144.152523
2014-09-30 145.656780
2014-10-31 150.291745
2014-11-30 152.343542
2014-12-31 152.343542
2015-01-31 150.387968
2015-02-28 154.441006
2015-03-31 157.130952
2015-04-30 154.761905
2015-05-31 149.999583
2015-06-30 148.054146
2015-07-31 152.357673
2015-08-31 148.054146
2015-09-30 154.715762
2015-10-31 165.719697
2015-11-30 165.719697
2015-12-31 158.990168
2016-01-31 158.990168
2016-02-29 146.204168
2016-03-31 148.255814
2016-04-30 145.340150
2016-05-31 144.152523
2016-06-30 144.152523
2016-07-31 153.556496
2016-08-31 157.471093
2016-09-30 166.272727
2016-10-31 171.289349
2016-11-30 166.272727
2016-12-31 164.085821
2017-01-31 155.586081
2017-02-28 149.224486
2017-03-31 149.107143
2017-04-30 151.785714
2017-05-31 149.107143
2017-06-30 151.903057
2017-07-31 151.903057
2017-08-31 152.020400
2017-09-30 151.477833
2017-10-31 145.813048
2017-11-30 150.843468
2017-12-31 146.829969
2018-01-31 147.846890
2018-02-28 150.843468
2018-03-31 146.920361
data = '''2013-04-30 178.571429
2013-05-31 178.571429
2013-06-30 185.380865
2013-07-31 176.747442
2013-08-31 166.666667
2013-09-30 167.599502
2013-10-31 169.025157
2013-11-30 160.929092
2013-12-31 165.282392
2014-01-31 167.153775
2014-02-28 166.666667
2014-03-31 172.686604
2014-04-30 172.207447
2014-05-31 161.466408
2014-06-30 156.976744
2014-07-31 142.410714
2014-08-31 144.152523
2014-09-30 145.656780
2014-10-31 150.291745
2014-11-30 152.343542
2014-12-31 152.343542
2015-01-31 150.387968
2015-02-28 154.441006
2015-03-31 157.130952
2015-04-30 154.761905
2015-05-31 149.999583
2015-06-30 148.054146
2015-07-31 152.357673
2015-08-31 148.054146
2015-09-30 154.715762
2015-10-31 165.719697
2015-11-30 165.719697
2015-12-31 158.990168
2016-01-31 158.990168
2016-02-29 146.204168
2016-03-31 148.255814
2016-04-30 145.340150
2016-05-31 144.152523
2016-06-30 144.152523
2016-07-31 153.556496
2016-08-31 157.471093
2016-09-30 166.272727
2016-10-31 171.289349
2016-11-30 166.272727
2016-12-31 164.085821
2017-01-31 155.586081
2017-02-28 149.224486
2017-03-31 149.107143
2017-04-30 151.785714
2017-05-31 149.107143
2017-06-30 151.903057
2017-07-31 151.903057
2017-08-31 152.020400
2017-09-30 151.477833
2017-10-31 145.813048
2017-11-30 150.843468
2017-12-31 146.829969
2018-01-31 147.846890
2018-02-28 150.843468
2018-03-31 146.920361'''
prices = [float(r.split()[1]) for r in data.split('\n')]
print(len([p for p in prices if 160 <= p <= 170]))
This outputs 13. (No value here equals 160 or 170 exactly, so the inclusive bounds give the same count as the open range.)
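Since the data comes from a dataframe in the first place, a pandas alternative is Series.between (a sketch; the inclusive='neither' option, for a strictly open interval, needs pandas >= 1.3):

```python
import pandas as pd

# a handful of the values above, as a Series standing in for the full column
prices = pd.Series([178.571429, 166.666667, 167.599502, 169.025157,
                    160.929092, 156.976744])
# count values strictly inside (160, 170)
count = prices.between(160, 170, inclusive='neither').sum()
# 4 of these 6 sample values lie inside the open range
```

Run on the full column, this gives the same 13 as the list-comprehension version.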

delete the integer in timeindex

This is part of a dataframe. As you can see, there are some integers in the time index; they are not valid timestamps, so I want to delete them. How can we delete the records that have an integer as the time index?
rent_time rent_price_per_square_meter
0 2016-11-28 09:01:58 0.400000
1 2016-11-28 09:02:35 0.400000
2 2016-11-28 09:02:43 0.400000
3 2016-11-28 09:03:21 0.400000
4 2016-11-28 09:03:21 0.400000
5 2016-11-28 09:03:34 0.400000
6 2016-11-28 09:03:34 0.400000
7 2017-06-17 02:49:33 0.933333
8 2017-03-19 01:30:03 0.490196
9 2017-03-10 06:39:03 11.111111
10 2017-03-09 14:40:03 16.666667
11 908797 11.000000
12 2017-06-08 03:27:52 22.000000
13 2017-06-30 03:03:11 22.000000
14 2017-02-20 11:04:48 12.000000
15 2017-03-05 13:53:39 6.842105
16 2017-03-06 14:00:01 6.842105
17 2017-03-15 02:38:54 20.000000
18 2017-03-15 02:19:07 13.043478
19 2017-02-24 15:10:00 25.000000
20 2017-06-26 02:17:31 13.043478
21 82368 11.111111
22 2017-06-30 07:53:55 4.109589
23 2017-07-17 02:42:43 20.000000
24 2017-06-30 07:38:00 5.254237
25 2017-06-30 07:49:00 4.920635
26 2017-06-30 05:26:26 4.189189
You can use boolean indexing with to_datetime and the parameter errors='coerce', which returns NaN for values that are not datetimes, and then add notnull to keep only the rows that are datetimes:
df1 = df[pd.to_datetime(df['rent_time'], errors='coerce').notnull()]
print (df1)
rent_time rent_price_per_square_meter
0 2016-11-28 09:01:58 0.400000
1 2016-11-28 09:02:35 0.400000
2 2016-11-28 09:02:43 0.400000
3 2016-11-28 09:03:21 0.400000
4 2016-11-28 09:03:21 0.400000
5 2016-11-28 09:03:34 0.400000
6 2016-11-28 09:03:34 0.400000
7 2017-06-17 02:49:33 0.933333
8 2017-03-19 01:30:03 0.490196
9 2017-03-10 06:39:03 11.111111
10 2017-03-09 14:40:03 16.666667
12 2017-06-08 03:27:52 22.000000
13 2017-06-30 03:03:11 22.000000
14 2017-02-20 11:04:48 12.000000
15 2017-03-05 13:53:39 6.842105
16 2017-03-06 14:00:01 6.842105
17 2017-03-15 02:38:54 20.000000
18 2017-03-15 02:19:07 13.043478
19 2017-02-24 15:10:00 25.000000
20 2017-06-26 02:17:31 13.043478
22 2017-06-30 07:53:55 4.109589
23 2017-07-17 02:42:43 20.000000
24 2017-06-30 07:38:00 5.254237
25 2017-06-30 07:49:00 4.920635
26 2017-06-30 05:26:26 4.189189
EDIT:
For further data processing, if you need a DatetimeIndex:
df['rent_time'] = pd.to_datetime(df['rent_time'], errors='coerce')
df = df.dropna(subset=['rent_time']).set_index('rent_time')
print (df)
rent_price_per_square_meter
rent_time
2016-11-28 09:01:58 0.400000
2016-11-28 09:02:35 0.400000
2016-11-28 09:02:43 0.400000
2016-11-28 09:03:21 0.400000
2016-11-28 09:03:21 0.400000
2016-11-28 09:03:34 0.400000
2016-11-28 09:03:34 0.400000
2017-06-17 02:49:33 0.933333
2017-03-19 01:30:03 0.490196
2017-03-10 06:39:03 11.111111
2017-03-09 14:40:03 16.666667
2017-06-08 03:27:52 22.000000
2017-06-30 03:03:11 22.000000
2017-02-20 11:04:48 12.000000
2017-03-05 13:53:39 6.842105
2017-03-06 14:00:01 6.842105
2017-03-15 02:38:54 20.000000
2017-03-15 02:19:07 13.043478
2017-02-24 15:10:00 25.000000
2017-06-26 02:17:31 13.043478
2017-06-30 07:53:55 4.109589
2017-07-17 02:42:43 20.000000
2017-06-30 07:38:00 5.254237
2017-06-30 07:49:00 4.920635
2017-06-30 05:26:26 4.189189