How do I replace duplicates for each group with NaNs while keeping the rows?
I need to keep all the rows (no dropping), replacing the duplicated values with NaN while keeping the first occurrence of each value where it first shows up.
import pandas as pd
from datetime import timedelta
df = pd.DataFrame({
    'date': ['2019-01-01 00:00:00', '2019-01-01 01:00:00', '2019-01-01 02:00:00', '2019-01-01 03:00:00',
             '2019-09-01 02:00:00', '2019-09-01 03:00:00', '2019-09-01 04:00:00', '2019-09-01 05:00:00'],
    'value': [10, 10, 10, 10, 12, 12, 12, 12],
    'ID': ['Jackie', 'Jackie', 'Jackie', 'Jackie', 'Zoop', 'Zoop', 'Zoop', 'Zoop'],
})
df['date'] = pd.to_datetime(df['date'])  # infer_datetime_format is deprecated in recent pandas
date value ID
0 2019-01-01 00:00:00 10 Jackie
1 2019-01-01 01:00:00 10 Jackie
2 2019-01-01 02:00:00 10 Jackie
3 2019-01-01 03:00:00 10 Jackie
4 2019-09-01 02:00:00 12 Zoop
5 2019-09-01 03:00:00 12 Zoop
6 2019-09-01 04:00:00 12 Zoop
7 2019-09-01 05:00:00 12 Zoop
Desired Dataframe:
date value ID
0 2019-01-01 00:00:00 10 Jackie
1 2019-01-01 01:00:00 NaN Jackie
2 2019-01-01 02:00:00 NaN Jackie
3 2019-01-01 03:00:00 NaN Jackie
4 2019-09-01 02:00:00 12 Zoop
5 2019-09-01 03:00:00 NaN Zoop
6 2019-09-01 04:00:00 NaN Zoop
7 2019-09-01 05:00:00 NaN Zoop
Edit:
Duplicated values should only be dropped within the same calendar date, regardless of the time frequency. So if value 10 shows up twice on Jan-1 and three times on Jan-2, it should show up only once on Jan-1 and once on Jan-2.
I assume you want to check duplicates on the columns value and ID, further keyed on the calendar date of the date column:
import numpy as np

df.loc[df.assign(d=df.date.dt.date).duplicated(['value', 'ID', 'd']), 'value'] = np.nan
Out[269]:
date value ID
0 2019-01-01 00:00:00 10.0 Jackie
1 2019-01-01 01:00:00 NaN Jackie
2 2019-01-01 02:00:00 NaN Jackie
3 2019-01-01 03:00:00 NaN Jackie
4 2019-09-01 02:00:00 12.0 Zoop
5 2019-09-01 03:00:00 NaN Zoop
6 2019-09-01 04:00:00 NaN Zoop
7 2019-09-01 05:00:00 NaN Zoop
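To check the multi-day rule from the edit, here is a quick test; the extra rows are hypothetical, added only to illustrate the per-date behavior:
import numpy as np
import pandas as pd

# Hypothetical frame: value 10 appears twice on Jan-1 and three times on Jan-2
df_multi = pd.DataFrame({
    'date': pd.to_datetime(['2019-01-01 00:00:00', '2019-01-01 01:00:00',
                            '2019-01-02 00:00:00', '2019-01-02 01:00:00',
                            '2019-01-02 02:00:00']),
    'value': [10, 10, 10, 10, 10],
    'ID': ['Jackie'] * 5,
})
# Duplicates are keyed on (value, ID, calendar day), so the first row of each
# day keeps its value and the rest become NaN
dup = df_multi.assign(d=df_multi['date'].dt.date).duplicated(['value', 'ID', 'd'])
df_multi.loc[dup, 'value'] = np.nan
print(df_multi)  # 10 survives once on Jan-1 and once on Jan-2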
As @Trenton suggests, you may use pd.NA to avoid importing numpy.
(Note: as @rafaelc suggests, this link explains the differences between pd.NA and np.nan in detail: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.0.0.html#experimental-na-scalar-to-denote-missing-values)
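A quick illustration of one difference (np.nan compares unequal to everything, including itself, while pd.NA propagates through comparisons):
import numpy as np
import pandas as pd

print(np.nan == np.nan)  # False: NaN is never equal to anything
print(pd.NA == pd.NA)    # <NA>: comparisons with pd.NA stay missing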
df.loc[df.assign(d=df.date.dt.date).duplicated(['value','ID', 'd']), 'value'] = pd.NA
Out[273]:
date value ID
0 2019-01-01 00:00:00 10 Jackie
1 2019-01-01 01:00:00 <NA> Jackie
2 2019-01-01 02:00:00 <NA> Jackie
3 2019-01-01 03:00:00 <NA> Jackie
4 2019-09-01 02:00:00 12 Zoop
5 2019-09-01 03:00:00 <NA> Zoop
6 2019-09-01 04:00:00 <NA> Zoop
7 2019-09-01 05:00:00 <NA> Zoop
This works if the dataframe is sorted, as in your example:
import numpy as np  # to be used for np.nan

df['duplicate'] = df['value'].shift(1)  # helper column holding the previous row's value
df['value'] = df.apply(lambda x: np.nan if x['value'] == x['duplicate']
                       else x['value'], axis=1)  # conditional replace
df = df.drop('duplicate', axis=1)  # drop helper column
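The same idea can also be written without apply, as a vectorized sketch under the same sorted-data assumption:
# Blank out any value equal to the immediately preceding row's value
df['value'] = df['value'].mask(df['value'].eq(df['value'].shift()))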
Group on the dates and take the first observed value (not necessarily the first when sorted by time), then merge the result back to the original dataframe.
df2 = df.groupby([df['date'].dt.date, 'ID'], as_index=False).first()
>>> df.drop(columns='value').merge(df2, on=['date', 'ID'], how='left')[df.columns]
date value ID
0 2019-01-01 00:00:00 10.0 Jackie
1 2019-01-01 01:00:00 NaN Jackie
2 2019-01-01 02:00:00 NaN Jackie
3 2019-01-01 03:00:00 NaN Jackie
4 2019-09-01 02:00:00 12.0 Zoop
5 2019-09-01 03:00:00 NaN Zoop
6 2019-09-01 04:00:00 NaN Zoop
7 2019-09-01 05:00:00 NaN Zoop
import numpy as np
import pandas as pd
import xarray as xr
validIdx = np.ones(365*5, dtype= bool)
validIdx[np.random.randint(low=0, high=365*5, size=30)] = False
time = pd.date_range("2000-01-01", freq="H", periods=365 * 5)[validIdx]
data = np.arange(365 * 5)[validIdx]
ds = xr.Dataset({"foo": ("time", data), "time": time})
df = ds.to_dataframe()
In the above example, the time series ds (or df) is missing 30 randomly chosen records; they are simply absent rather than stored as NaNs, so the length of the data is 365x5 - 30, not 365x5.
My question is: how can I expand ds and df so that the 30 missing values appear as NaNs (bringing the length back to 365x5)? For example, if the value at "2000-12-02" is missing from the example data, it looks like:
...
2000-12-01 value 1
2000-12-03 value 2
...
, while what I want to have is:
...
2000-12-01 value 1
2000-12-02 NaN
2000-12-03 value 2
...
Perhaps you can try resample with a 1-hour frequency.
The df without NaNs (just after df = ds.to_dataframe()):
>>> df
foo
time
2000-01-01 00:00:00 0
2000-01-01 01:00:00 1
2000-01-01 02:00:00 2
2000-01-01 03:00:00 3
2000-01-01 04:00:00 4
... ...
2000-03-16 20:00:00 1820
2000-03-16 21:00:00 1821
2000-03-16 22:00:00 1822
2000-03-16 23:00:00 1823
2000-03-17 00:00:00 1824
[1795 rows x 1 columns]
The df with NaNs (df_1h):
>>> df_1h = df.resample('1H').mean()
>>> df_1h
foo
time
2000-01-01 00:00:00 0.0
2000-01-01 01:00:00 1.0
2000-01-01 02:00:00 2.0
2000-01-01 03:00:00 3.0
2000-01-01 04:00:00 4.0
... ...
2000-03-16 20:00:00 1820.0
2000-03-16 21:00:00 1821.0
2000-03-16 22:00:00 1822.0
2000-03-16 23:00:00 1823.0
2000-03-17 00:00:00 1824.0
[1825 rows x 1 columns]
Rows with NaN:
>>> df_1h[df_1h['foo'].isna()]
foo
time
2000-01-02 10:00:00 NaN
2000-01-04 07:00:00 NaN
2000-01-05 06:00:00 NaN
2000-01-09 02:00:00 NaN
2000-01-13 15:00:00 NaN
2000-01-16 16:00:00 NaN
2000-01-18 21:00:00 NaN
2000-01-21 22:00:00 NaN
2000-01-23 19:00:00 NaN
2000-01-24 01:00:00 NaN
2000-01-24 19:00:00 NaN
2000-01-27 12:00:00 NaN
2000-01-27 16:00:00 NaN
2000-01-29 06:00:00 NaN
2000-02-02 01:00:00 NaN
2000-02-06 13:00:00 NaN
2000-02-09 11:00:00 NaN
2000-02-15 12:00:00 NaN
2000-02-15 15:00:00 NaN
2000-02-21 04:00:00 NaN
2000-02-28 05:00:00 NaN
2000-02-28 06:00:00 NaN
2000-03-01 15:00:00 NaN
2000-03-02 18:00:00 NaN
2000-03-04 18:00:00 NaN
2000-03-05 20:00:00 NaN
2000-03-12 08:00:00 NaN
2000-03-13 20:00:00 NaN
2000-03-16 01:00:00 NaN
The number of NaNs in df_1h:
>>> df_1h.isnull().sum()
foo 30
dtype: int64
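A hedged alternative to mean-resampling: asfreq (or an explicit reindex against a full hourly range) inserts the same NaN rows without implying any aggregation:
# Same NaN-padded result without aggregation semantics
df_1h = df.asfreq('1H')

# Or reindex against an explicit hourly range
full_idx = pd.date_range(df.index.min(), df.index.max(), freq='1H')
df_1h = df.reindex(full_idx)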
I have a data frame like below. I want to sample it with '3S'.
So there are situations where NaN is present. What I expect is that the data frame is sampled with '3S', and if any 'NaN' is found in between, the sampling stops there and starts again from that index. I tried using the dataframe.apply method to achieve this, but it looks very complex. Is there any short way to achieve it?
df.sample(n=3)
Code to generate the input:
import numpy as np
import pandas as pd

index = pd.date_range('1/1/2000', periods=13, freq='H')
series = pd.DataFrame(range(13), index=index)
series.iloc[4] = np.nan
series.iloc[10] = np.nan
print(series)
I tried to do the sampling, but after that I have no clue how to proceed.
2015-01-01 00:00:00 0.0
2015-01-01 01:00:00 1.0
2015-01-01 02:00:00 2.0
2015-01-01 03:00:00 2.0
2015-01-01 04:00:00 NaN
2015-01-01 05:00:00 3.0
2015-01-01 06:00:00 4.0
2015-01-01 07:00:00 4.0
2015-01-01 08:00:00 4.0
2015-01-01 09:00:00 NaN
2015-01-01 10:00:00 3.0
2015-01-01 11:00:00 4.0
2015-01-01 12:00:00 4.0
The new data frame should sample based on '3S', take any 'NaN' present into account, and restart the sampling where the 'NaN' records are found.
Expected Output:
2015-01-01 02:00:00 2.0 -- sampled after '3S'
2015-01-01 03:00:00 2.0 -- printed because a NaN is found next
2015-01-01 04:00:00 NaN -- print the NaN record
2015-01-01 07:00:00 4.0 -- sampled after '3S'
2015-01-01 08:00:00 4.0 -- printed because a NaN is found next
2015-01-01 09:00:00 NaN -- print the NaN record
2015-01-01 12:00:00 4.0 -- sampled after '3S'
Use:
index = pd.date_range('1/1/2000', periods=13, freq='H')
df = pd.DataFrame({'col': range(13)}, index=index)
df.iloc[4, 0] = np.nan
df.iloc[9, 0] = np.nan
print(df)
col
2000-01-01 00:00:00 0.0
2000-01-01 01:00:00 1.0
2000-01-01 02:00:00 2.0
2000-01-01 03:00:00 3.0
2000-01-01 04:00:00 NaN
2000-01-01 05:00:00 5.0
2000-01-01 06:00:00 6.0
2000-01-01 07:00:00 7.0
2000-01-01 08:00:00 8.0
2000-01-01 09:00:00 NaN
2000-01-01 10:00:00 10.0
2000-01-01 11:00:00 11.0
2000-01-01 12:00:00 12.0
m = df['col'].isna()
s1 = m.ne(m.shift()).cumsum()
t = pd.Timedelta(2, unit='H')
mask = df.index >= df.groupby(s1)['col'].transform(lambda x: x.index[0]) + t
df1 = df[mask | m]
print(df1)
col
2000-01-01 02:00:00 2.0
2000-01-01 03:00:00 3.0
2000-01-01 04:00:00 NaN
2000-01-01 07:00:00 7.0
2000-01-01 08:00:00 8.0
2000-01-01 09:00:00 NaN
2000-01-01 12:00:00 12.0
Explanation:
Create a mask of the missing values with Series.isna.
Create groups of consecutive values by comparing the mask with its shifted self via Series.ne (!=):
print(s1)
2000-01-01 00:00:00 1
2000-01-01 01:00:00 1
2000-01-01 02:00:00 1
2000-01-01 03:00:00 1
2000-01-01 04:00:00 2
2000-01-01 05:00:00 3
2000-01-01 06:00:00 3
2000-01-01 07:00:00 3
2000-01-01 08:00:00 3
2000-01-01 09:00:00 4
2000-01-01 10:00:00 5
2000-01-01 11:00:00 5
2000-01-01 12:00:00 5
Freq: H, Name: col, dtype: int32
Get the first index value per group, add a timedelta (2 hours, matching the expected output) and compare against the DatetimeIndex.
Finally, filter by boolean indexing, chaining the masks with | (bitwise OR).
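To inspect the key intermediate step, the transform broadcasts each group's first timestamp to all of its rows (a sketch reusing df, s1 and t from above):
# First timestamp of each consecutive run, broadcast back to every row
group_start = df.groupby(s1)['col'].transform(lambda x: x.index[0])
print(group_start)
# The mask keeps rows at least 2 hours past their run's start
print(df.index >= group_start + t)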
One way would be to fill the NaNs with 0:
df['Col_of_Interest'] = df['Col_of_Interest'].fillna(0)
and then resample the series (if datetime is your index):
series.resample('30S').asfreq()
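Putting the two steps together in one chain (a sketch assuming a DatetimeIndex; note the order: resampling first and filling afterwards means the newly exposed grid points are zero-filled too):
df['Col_of_Interest'].resample('30S').asfreq().fillna(0)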
I'm working on time series in Python 3 and pandas, and I want to produce a summary of the periods of contiguous missing values, but so far I'm only able to find the indexes of the NaN values.
Sample data :
Valeurs
2018-01-01 00:00:00 1.0
2018-01-01 04:00:00 NaN
2018-01-01 08:00:00 2.0
2018-01-01 12:00:00 NaN
2018-01-01 16:00:00 NaN
2018-01-01 20:00:00 5.0
2018-01-02 00:00:00 6.0
2018-01-02 04:00:00 7.0
2018-01-02 08:00:00 8.0
2018-01-02 12:00:00 9.0
2018-01-02 16:00:00 5.0
2018-01-02 20:00:00 NaN
2018-01-03 00:00:00 NaN
2018-01-03 04:00:00 NaN
2018-01-03 08:00:00 1.0
2018-01-03 12:00:00 2.0
2018-01-03 16:00:00 NaN
Expected results :
Start_Date number of contiguous missing values
2018-01-01 04:00:00 1
2018-01-01 12:00:00 2
2018-01-02 20:00:00 3
2018-01-03 16:00:00 1
How can I manage to obtain this type of result with pandas (shift(), cumsum(), groupby()?)?
Thank you for your advice!
Sylvain
groupby and agg
mask = df.Valeurs.isna()
# group the NaN timestamps by how many non-NaN values precede them, so each
# run of contiguous NaNs gets its own group id, then take its start and size
d = df.index.to_series()[mask].groupby((~mask).cumsum()[mask]).agg(['first', 'size'])
d.rename(columns=dict(size='num of contig null', first='Start_Date')).reset_index(drop=True)
Start_Date num of contig null
0 2018-01-01 04:00:00 1
1 2018-01-01 12:00:00 2
2 2018-01-02 20:00:00 3
3 2018-01-03 16:00:00 1
Working on the underlying numpy array:
a = df.Valeurs.values
# pad with False on both sides and find the edges where NaN runs start/stop
m = np.concatenate(([False], np.isnan(a), [False]))
idx = np.nonzero(m[1:] != m[:-1])[0]
# start of each run: a NaN whose previous value is not NaN
out = df[df.Valeurs.isnull() & ~df.Valeurs.shift().isnull()].index
pd.DataFrame({'Start date': out, 'contiguous': (idx[1::2] - idx[::2])})
Start date contiguous
0 2018-01-01 04:00:00 1
1 2018-01-01 12:00:00 2
2 2018-01-02 20:00:00 3
3 2018-01-03 16:00:00 1
If you have the indices where the values occur, you can use itertools to find the continuous chunks, as in the sketch below.
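A minimal sketch of that itertools idea, grouping consecutive integer positions of the NaNs (the classic recipe: position minus enumeration counter is constant within a run):
from itertools import groupby

import numpy as np

# Integer positions of the NaN values
positions = np.flatnonzero(df['Valeurs'].isna())

# Consecutive positions share the same (position - counter) key
for _, run in groupby(enumerate(positions), key=lambda t: t[1] - t[0]):
    run = [pos for _, pos in run]
    print(df.index[run[0]], len(run))  # start date and length of each NaN run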
Fairly new to python and pandas here.
I make a query that's giving me back a timeseries. I'm never sure how many data points I receive from the query (run for a single day), but what I do know is that I need to resample them to contain 24 points (one for each hour in the day).
Printing m3hstream gives
[(1479218009000L, 109), (1479287368000L, 84)]
Then I try to make a dataframe df with
df = pd.DataFrame(data = list(m3hstream), columns=['Timestamp', 'Value'])
and this gives me an output of
Timestamp Value
0 1479218009000 109
1 1479287368000 84
Then I do this:
daily_summary = pd.DataFrame()
daily_summary['value'] = df['Value'].resample('H').mean()
daily_summary = daily_summary.truncate(before=start, after=end)
print("Now daily summary")
print(daily_summary)
But this is giving me a TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
Could anyone please let me know how to resample it so I have 1 point for each hour in the 24 hour period that I'm querying for?
Thanks.
The first thing you need to do is convert 'Timestamp' to an actual pd.Timestamp; it looks like those values are milliseconds.
Then resample with the on parameter set to 'Timestamp':
df = df.assign(
    Timestamp=pd.to_datetime(df.Timestamp, unit='ms')
).resample('H', on='Timestamp').mean().reset_index()
Timestamp Value
0 2016-11-15 13:00:00 109.0
1 2016-11-15 14:00:00 NaN
2 2016-11-15 15:00:00 NaN
3 2016-11-15 16:00:00 NaN
4 2016-11-15 17:00:00 NaN
5 2016-11-15 18:00:00 NaN
6 2016-11-15 19:00:00 NaN
7 2016-11-15 20:00:00 NaN
8 2016-11-15 21:00:00 NaN
9 2016-11-15 22:00:00 NaN
10 2016-11-15 23:00:00 NaN
11 2016-11-16 00:00:00 NaN
12 2016-11-16 01:00:00 NaN
13 2016-11-16 02:00:00 NaN
14 2016-11-16 03:00:00 NaN
15 2016-11-16 04:00:00 NaN
16 2016-11-16 05:00:00 NaN
17 2016-11-16 06:00:00 NaN
18 2016-11-16 07:00:00 NaN
19 2016-11-16 08:00:00 NaN
20 2016-11-16 09:00:00 84.0
If you want to fill those NaN values, use ffill, bfill, or interpolate:
df.assign(
    Timestamp=pd.to_datetime(df.Timestamp, unit='ms')
).resample('H', on='Timestamp').mean().reset_index().interpolate()
Timestamp Value
0 2016-11-15 13:00:00 109.00
1 2016-11-15 14:00:00 107.75
2 2016-11-15 15:00:00 106.50
3 2016-11-15 16:00:00 105.25
4 2016-11-15 17:00:00 104.00
5 2016-11-15 18:00:00 102.75
6 2016-11-15 19:00:00 101.50
7 2016-11-15 20:00:00 100.25
8 2016-11-15 21:00:00 99.00
9 2016-11-15 22:00:00 97.75
10 2016-11-15 23:00:00 96.50
11 2016-11-16 00:00:00 95.25
12 2016-11-16 01:00:00 94.00
13 2016-11-16 02:00:00 92.75
14 2016-11-16 03:00:00 91.50
15 2016-11-16 04:00:00 90.25
16 2016-11-16 05:00:00 89.00
17 2016-11-16 06:00:00 87.75
18 2016-11-16 07:00:00 86.50
19 2016-11-16 08:00:00 85.25
20 2016-11-16 09:00:00 84.00
Let's try:
daily_summary = daily_summary.set_index('Timestamp')
daily_summary.index = pd.to_datetime(daily_summary.index, unit='ms')
For once an hour:
daily_summary.resample('H').mean()
or for once a day:
daily_summary.resample('D').mean()
I have two dataframes which are datetime-indexed. One is missing a few of these datetimes (df1), while the other is complete (regular timestamps without any gaps) and is full of NaNs (df2).
I'm trying to match the values from df1 to the index of df2, filling with NaNs where such a datetime index doesn't exist in df1.
Example:
In [51]: df1
Out [51]: value
2015-01-01 14:00:00 20
2015-01-01 15:00:00 29
2015-01-01 16:00:00 41
2015-01-01 17:00:00 43
2015-01-01 18:00:00 26
2015-01-01 19:00:00 20
2015-01-01 20:00:00 31
2015-01-01 21:00:00 35
2015-01-01 22:00:00 39
2015-01-01 23:00:00 17
2015-03-01 00:00:00 6
2015-03-01 01:00:00 37
2015-03-01 02:00:00 56
2015-03-01 03:00:00 12
2015-03-01 04:00:00 41
2015-03-01 05:00:00 31
... ...
2018-12-25 23:00:00 41
<34843 rows × 1 columns>
In [52]: df2 = pd.DataFrame(data=None, index=pd.date_range(freq='60Min', start=df1.index.min(), end=df1.index.max()))
df2['value'] = np.nan
df2
Out [52]: value
2015-01-01 14:00:00 NaN
2015-01-01 15:00:00 NaN
2015-01-01 16:00:00 NaN
2015-01-01 17:00:00 NaN
2015-01-01 18:00:00 NaN
2015-01-01 19:00:00 NaN
2015-01-01 20:00:00 NaN
2015-01-01 21:00:00 NaN
2015-01-01 22:00:00 NaN
2015-01-01 23:00:00 NaN
2015-01-02 00:00:00 NaN
2015-01-02 01:00:00 NaN
2015-01-02 02:00:00 NaN
2015-01-02 03:00:00 NaN
2015-01-02 04:00:00 NaN
2015-01-02 05:00:00 NaN
... ...
2018-12-25 23:00:00 NaN
<34906 rows × 1 columns>
Using df2.combine_first(df1) returns the same data as df1.reindex(index=df2.index), which fills the gaps, where there should be no data, with some value instead of NaN.
In [53]: Result = df2.combine_first(df1)
Result
Out [53]: value
2015-01-01 14:00:00 20
2015-01-01 15:00:00 29
2015-01-01 16:00:00 41
2015-01-01 17:00:00 43
2015-01-01 18:00:00 26
2015-01-01 19:00:00 20
2015-01-01 20:00:00 31
2015-01-01 21:00:00 35
2015-01-01 22:00:00 39
2015-01-01 23:00:00 17
2015-01-02 00:00:00 35
2015-01-02 01:00:00 53
2015-01-02 02:00:00 28
2015-01-02 03:00:00 48
2015-01-02 04:00:00 42
2015-01-02 05:00:00 51
... ...
2018-12-25 23:00:00 41
<34906 rows × 1 columns>
This is what I was hoping to get:
Out [53]: value
2015-01-01 14:00:00 20
2015-01-01 15:00:00 29
2015-01-01 16:00:00 41
2015-01-01 17:00:00 43
2015-01-01 18:00:00 26
2015-01-01 19:00:00 20
2015-01-01 20:00:00 31
2015-01-01 21:00:00 35
2015-01-01 22:00:00 39
2015-01-01 23:00:00 17
2015-01-02 00:00:00 NaN
2015-01-02 01:00:00 NaN
2015-01-02 02:00:00 NaN
2015-01-02 03:00:00 NaN
2015-01-02 04:00:00 NaN
2015-01-02 05:00:00 NaN
... ...
2018-12-25 23:00:00 41
<34906 rows × 1 columns>
Could someone shed some light on why this is happening, and how to set how these values are filled?
IIUC you need to resample df1, because you have an irregular frequency and you need a regular frequency:
print(df1.index.freq)
None
print(Result.index.freq)
<60 * Minutes>
EDIT1
You can use the function asfreq instead of resample - see the asfreq doc and the resample vs asfreq discussion.
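For example, a sketch of the asfreq variant; the missing hourly rows appear as NaN, with no aggregation applied:
Result = df1.asfreq('60min')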
EDIT2
First I thought that resample didn't work, because after resampling the Result was the same as df1. But printing df1.info() and Result.info() gives different results - 34857 entries vs 34920 entries.
So I tried to find the rows with NaN values, and this returned 63 rows.
So I think resample works well.
import pandas as pd

df1 = pd.read_csv('test/GapInTimestamps.csv', sep=",", index_col=[0], parse_dates=[0])
print(df1.head())
# value
#Date/Time
#2015-01-01 00:00:00 52
#2015-01-01 01:00:00 5
#2015-01-01 02:00:00 12
#2015-01-01 03:00:00 54
#2015-01-01 04:00:00 47
print(df1.info())
#<class 'pandas.core.frame.DataFrame'>
#DatetimeIndex: 34857 entries, 2015-01-01 00:00:00 to 2018-12-25 23:00:00
#Data columns (total 1 columns):
#value 34857 non-null int64
#dtypes: int64(1)
#memory usage: 544.6 KB
#None
Result = df1.resample('60min').mean()  # modern pandas: resample needs an explicit aggregation
print(Result.head())
# value
#Date/Time
#2015-01-01 00:00:00 52
#2015-01-01 01:00:00 5
#2015-01-01 02:00:00 12
#2015-01-01 03:00:00 54
#2015-01-01 04:00:00 47
print(Result.info())
#<class 'pandas.core.frame.DataFrame'>
#DatetimeIndex: 34920 entries, 2015-01-01 00:00:00 to 2018-12-25 23:00:00
#Freq: 60T
#Data columns (total 1 columns):
#value 34857 non-null float64
#dtypes: float64(1)
#memory usage: 545.6 KB
#None
# find rows with NaN values
resultnan = Result[Result.isnull().any(axis=1)]
# temporarily display 999 rows and 15 columns
with pd.option_context('display.max_rows', 999, 'display.max_columns', 15):
    print(resultnan)
# value
#Date/Time
#2015-01-13 19:00:00 NaN
#2015-01-13 20:00:00 NaN
#2015-01-13 21:00:00 NaN
#2015-01-13 22:00:00 NaN
#2015-01-13 23:00:00 NaN
#2015-01-14 00:00:00 NaN
#2015-01-14 01:00:00 NaN
#2015-01-14 02:00:00 NaN
#2015-01-14 03:00:00 NaN
#2015-01-14 04:00:00 NaN
#2015-01-14 05:00:00 NaN
#2015-01-14 06:00:00 NaN
#2015-01-14 07:00:00 NaN
#2015-01-14 08:00:00 NaN
#2015-01-14 09:00:00 NaN
#2015-02-01 00:00:00 NaN
#2015-02-01 01:00:00 NaN
#2015-02-01 02:00:00 NaN
#2015-02-01 03:00:00 NaN
#2015-02-01 04:00:00 NaN
#2015-02-01 05:00:00 NaN
#2015-02-01 06:00:00 NaN
#2015-02-01 07:00:00 NaN
#2015-02-01 08:00:00 NaN
#2015-02-01 09:00:00 NaN
#2015-02-01 10:00:00 NaN
#2015-02-01 11:00:00 NaN
#2015-02-01 12:00:00 NaN
#2015-02-01 13:00:00 NaN
#2015-02-01 14:00:00 NaN
#2015-02-01 15:00:00 NaN
#2015-02-01 16:00:00 NaN
#2015-02-01 17:00:00 NaN
#2015-02-01 18:00:00 NaN
#2015-02-01 19:00:00 NaN
#2015-02-01 20:00:00 NaN
#2015-02-01 21:00:00 NaN
#2015-02-01 22:00:00 NaN
#2015-02-01 23:00:00 NaN
#2015-11-01 00:00:00 NaN
#2015-11-01 01:00:00 NaN
#2015-11-01 02:00:00 NaN
#2015-11-01 03:00:00 NaN
#2015-11-01 04:00:00 NaN
#2015-11-01 05:00:00 NaN
#2015-11-01 06:00:00 NaN
#2015-11-01 07:00:00 NaN
#2015-11-01 08:00:00 NaN
#2015-11-01 09:00:00 NaN
#2015-11-01 10:00:00 NaN
#2015-11-01 11:00:00 NaN
#2015-11-01 12:00:00 NaN
#2015-11-01 13:00:00 NaN
#2015-11-01 14:00:00 NaN
#2015-11-01 15:00:00 NaN
#2015-11-01 16:00:00 NaN
#2015-11-01 17:00:00 NaN
#2015-11-01 18:00:00 NaN
#2015-11-01 19:00:00 NaN
#2015-11-01 20:00:00 NaN
#2015-11-01 21:00:00 NaN
#2015-11-01 22:00:00 NaN
#2015-11-01 23:00:00 NaN