I have data in a csv file which appears as:
DateTime Temp
10/1/2016 0:00 20.35491156
10/1/2016 1:00 19.75320845
10/1/2016 4:00 17.62411292
10/1/2016 5:00 18.30190001
10/1/2016 6:00 19.37101638
I am reading this CSV file as:
import numpy as np
import pandas as pd
d2 = pd.Series.from_csv(r'C:\PowerCurve.csv')
d3 = d2.interpolate(method='time')
My goal is to fill the missing hours 2 and 3 by interpolating from nearby values, i.e. whenever data are missing it should interpolate.
However, d3 doesn't show any interpolation.
Edit:
Based on the suggestions below, my Python 2.7 setup still errors out. I am trying the following:
import pandas as pd
d2 = pd.Series.from_csv(r'C:\PowerCurve.csv')
d2.set_index('DateTime').resample('H').interpolate()
Error is:
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 2672, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'set_index'
Use resample with the datetime column as the index and pick whichever resampling method fits your need. For instance:
df.set_index('DateTime').resample('1H').pad()
Out[23]:
Temp
DateTime
2016-10-01 00:00:00 20.354912
2016-10-01 01:00:00 19.753208
2016-10-01 02:00:00 19.753208
2016-10-01 03:00:00 19.753208
2016-10-01 04:00:00 17.624113
2016-10-01 05:00:00 18.301900
2016-10-01 06:00:00 19.371016
Use the interpolate method after resampling on an hourly basis:
d2.set_index('DateTime').resample('H').interpolate()
If d2 is already a Series with a DatetimeIndex, the set_index call is unnecessary:
d2.resample('H').interpolate()
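Putting it all together for the original file, a minimal sketch (assuming a standard comma-separated file with DateTime and Temp columns; pd.read_csv replaces the deprecated Series.from_csv, which returns a Series and is why set_index raised the AttributeError):
import pandas as pd

# Parse the DateTime column and make it the index up front.
d2 = pd.read_csv(r'C:\PowerCurve.csv', parse_dates=['DateTime'],
                 index_col='DateTime')

# Insert the missing hourly rows (02:00, 03:00) and fill them by interpolation.
d3 = d2.resample('H').interpolate()
print(d3)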
I am a Korean student, so please excuse my awkward English.
I want to split a datetime column into separate columns: year, month, day, hour, minute, and second.
train = pd.read_csv('input/Train.csv')
The DateTime column looks like this
(this is head(20); I removed the other columns to make it easier to see):
datetime
0 2011-01-01 00:00:00
1 2011-01-01 01:00:00
2 2011-01-01 02:00:00
3 2011-01-01 03:00:00
4 2011-01-01 04:00:00
5 2011-01-01 05:00:00
6 2011-01-01 06:00:00
7 2011-01-01 07:00:00
8 2011-01-01 08:00:00
9 2011-01-01 09:00:00
10 2011-01-01 10:00:00
11 2011-01-01 11:00:00
12 2011-01-01 12:00:00
13 2011-01-01 13:00:00
14 2011-01-01 14:00:00
15 2011-01-01 15:00:00
16 2011-01-01 16:00:00
17 2011-01-01 17:00:00
18 2011-01-01 18:00:00
19 2011-01-01 19:00:00
Then I wrote this code to create each column (year, month, day, hour, minute, second):
train['year'] = train['datetime'].dt.year
train['month'] = train['datetime'].dt.month
train['day'] = train['datetime'].dt.day
train['hour'] = train['datetime'].dt.hour
train['minute'] = train['datetime'].dt.minute
train['second'] = train['datetime'].dt.second
and I get this error:
AttributeError: Can only use .dt accessor with datetimelike values
please help me ㅠㅅㅠ
Note that by default read_csv is able to deduce column types only
for numeric and boolean columns.
Unless explicitly specified (e.g. by passing converters or dtype
parameters), all other input columns are left as strings,
and their pandas dtype is object.
That is exactly what happened in your case.
So, since this column is of object type, you cannot invoke the dt accessor
on it; it works only on columns of datetime type.
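For completeness, a minimal sketch of the direct fix, converting the column first and only then using the accessor (assuming the strings parse cleanly with pd.to_datetime):
import pandas as pd

train = pd.read_csv('input/Train.csv')

# Convert the object (string) column to datetime64 first...
train['datetime'] = pd.to_datetime(train['datetime'])

# ...after which the .dt accessor works as expected.
train['year'] = train['datetime'].dt.year
train['second'] = train['datetime'].dt.second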
Alternatively, you can take the following approach:
- do not specify any conversion for this column (it will be parsed as object),
- split the datetime column into its "parts" using str.split (all 6 columns in a single instruction),
- set proper column names in the resulting DataFrame,
- join it to the original DataFrame (then delete the working frame),
- only then change the type of the original column.
To do it, you can run:
# Split on '-', ' ' and ':' to get all six components in one pass
wrk = df['datetime'].str.split(r'[- :]', expand=True).astype(int)
wrk.columns = ['year', 'month', 'day', 'hour', 'minute', 'second']
df = df.join(wrk)
del wrk
# Only now convert the original column to datetime
df['datetime'] = pd.to_datetime(df['datetime'])
Note that I added astype(int); otherwise these columns would be left as
object (actually string) type.
Or maybe the original column is not needed any more (as you have extracted
all the date / time components)? In that case, drop the column instead of
converting it.
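If so, the final conversion line in the snippet above becomes a drop instead (same df as before):
# Drop the original string column instead of converting it
df = df.drop(columns=['datetime'])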
One last hint: datetime is widely used as a type name (in various
spellings), so it is better to use some other name for the column, at least
differing in character case, e.g. DateTime.
I am trying to extract only the time from a datetime column but cannot find a solution. I am not good at string manipulation either.
Example:
Datetime
2017-01-17 00:40:00
2017-01-17 01:40:00
2017-01-17 02:40:00
2017-01-17 03:40:00
2017-01-17 04:40:00
Desired Output:
Time
00:40:00
01:40:00
02:40:00
03:40:00
04:40:00
You can do this with the dt.time accessor:
df = pd.DataFrame({'date': {0: '26-1-2014 04:40:00', 1: '26-1-2014 03:40:00', 2:'26-1-2015 02:40:00', 3:'30-1-2014 01:40:00'}})
df['time'] = pd.to_datetime(df.date).dt.time
This will add a time column.
Let's assume the column holding your datetime objects is called 'DatetimeColumn'. You can iterate over the DataFrame and format each datetime to just its time portion. Here is how you would handle them individually:
for index, row in df.iterrows():
    timeValue = row['DatetimeColumn'].strftime('%H:%M:%S')
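Iterating row by row is slow on larger frames; a one-line sketch of the same thing vectorized (assuming the column already has datetime64 dtype, as in the first answer):
# Format the whole column at once instead of looping
df['Time'] = df['DatetimeColumn'].dt.strftime('%H:%M:%S')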
So I have a dataframe of the following form: the index is a date, and the single column consists of np.arrays with shape 180x360. What I want to do is calculate the weekly mean of the dataset. Example of the dataframe:
vika geop
1990-01-01 06:00:00 [[50995.954225, 50995.954225, 50995.954225, 50...
1990-01-02 06:00:00 [[51083.0576138, 51083.0576138, 51083.0576138,...
1990-01-03 06:00:00 [[51045.6321168, 51045.6321168, 51045.6321168,...
1990-01-04 06:00:00 [[50499.8436192, 50499.8436192, 50499.8436192,...
1990-01-05 06:00:00 [[49823.5114237, 49823.5114237, 49823.5114237,...
1990-01-06 06:00:00 [[50050.5148846, 50050.5148846, 50050.5148846,...
1990-01-07 06:00:00 [[50954.5188533, 50954.5188533, 50954.5188533,...
1990-01-08 06:00:00 [[50995.954225, 50995.954225, 50995.954225, 50...
1990-01-09 06:00:00 [[50628.1596088, 50628.1596088, 50628.1596088,...
What I've tried so far is the simple
df = df.resample('W-MON')
But I get this error:
pandas.core.groupby.DataError: No numeric types to aggregate
I've tried changing the datatype of the column to list, but it still does not work. Any idea of how to do it with resample, or any other method?
You can use Panel to represent 3d data:
import pandas as pd
import numpy as np
index = pd.date_range("2012/01/01", "2012/02/01")
p = pd.Panel(np.random.rand(len(index), 3, 4), items=index)
p.resample("W-MON")
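Note that Panel was deprecated in pandas 0.20 and removed in 1.0, so on current pandas a different route is needed. A sketch that keeps the arrays in an object column and averages the stacked arrays per weekly bucket (the frame built here is a hypothetical stand-in for the data shown above):
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: one 180x360 array per day
index = pd.date_range('1990-01-01 06:00', periods=9, freq='D')
df = pd.DataFrame({'geop': [np.full((180, 360), float(i)) for i in range(9)]},
                  index=index)

# Group rows into weekly buckets ending on Monday, stack each group's arrays
# along a new axis, and average over that axis to get one 180x360 array back.
weekly = df['geop'].groupby(pd.Grouper(freq='W-MON')).apply(
    lambda s: np.stack(s.to_list()).mean(axis=0))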
I need to shift a column of uneven timestamps by a column of uneven timedeltas. I tried to add the two columns but get a TypeError.
I have one pandas time series (timestamps) with the datetimes as values:
time
2011-01-01 00:00:00+01:00 2011-01-01 00:00:00+01:00
2011-01-01 00:15:00+01:00 2011-01-01 00:15:00+01:00
2011-01-01 00:30:00+01:00 2011-01-01 00:30:00+01:00
and another timeseries with delta values (deltas):
2011-01-01 00:00:00+01:00 00:15:00
2011-01-01 00:15:00+01:00 00:15:00
2011-01-01 00:30:00+01:00 00:30:00
Now I try to add the two via timestamps.add(deltas) or timestamps + deltas,
but both throw the error:
TypeError: ufunc 'add' not supported for the input types, and the inputs could not
be safely coerced to any supported types according to the casting rule 'safe'
Individual calculations work fine:
timestamps[1] + deltas[1]
results in Timestamp('2011-01-01 00:30:00+0100', tz='Europe/Berlin').
What am I doing wrong? What would be the best way to solve this task?
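A likely cause, given that element-wise addition works while the vectorized form fails, is that deltas carries object dtype, so NumPy's add ufunc cannot coerce it. A minimal sketch of the fix under that assumption (the two series here are hypothetical stand-ins for the data above):
import pandas as pd

# Hypothetical stand-ins for the two series described above
idx = pd.date_range('2011-01-01', periods=3, freq='15min', tz='Europe/Berlin')
timestamps = pd.Series(idx, index=idx)
deltas = pd.Series(['00:15:00', '00:15:00', '00:30:00'], index=idx)

# Coercing the deltas to timedelta64 makes the vectorized addition work
shifted = timestamps + pd.to_timedelta(deltas)
print(shifted)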
I have been using the between_time method of a pandas time series, which returns all values between the specified times, regardless of their date.
But I need to select by both date and time, because my time series spans
multiple dates.
One way of solving this, though quite inflexible, is to iterate over the values and remove those that are not relevant.
Is there a more elegant way of doing this?
You can select the dates of interest first and then use between_time. For example, suppose you have a time series covering 72 hours:
import pandas as pd
from numpy.random import randn
rng = pd.date_range('1/1/2013', periods=72, freq='H')
ts = pd.Series(randn(len(rng)), index=rng)
To select the times between 20:00 and 22:00 on the 2nd and 3rd of January you can simply do:
ts['2013-01-02':'2013-01-03'].between_time('20:00', '22:00')
Giving you something like this:
2013-01-02 20:00:00 0.144399
2013-01-02 21:00:00 0.886806
2013-01-02 22:00:00 0.126844
2013-01-03 20:00:00 -0.464741
2013-01-03 21:00:00 1.856746
2013-01-03 22:00:00 -0.286726