Turn pandas datetime to hour:min rounded to 15 min - python

I read this Excel sheet (only the 'DATEHEUREMAX' column) with pandas using this command:
xdata = read_excel('Data.xlsx', 'Data', usecols=['DATEHEUREMAX'])
Now I want to turn this df into a simplified df with only hour:min, rounded up to 15 min. The main idea is to plot a histogram based on hour:min.

Consider the following DataFrame, with a single column, read as datetime (not string):
Dat
0 2019-06-03 12:07:00
1 2019-06-04 10:04:00
2 2019-06-05 11:42:00
3 2019-06-06 10:17:00
To round these dates to the nearest 15 minutes, run:
df['Dat2'] = df.Dat.dt.round('15min').dt.time.map(lambda s: str(s)[:-3])
The result is:
Dat Dat2
0 2019-06-03 12:07:00 12:00
1 2019-06-04 10:04:00 10:00
2 2019-06-05 11:42:00 11:45
3 2019-06-06 10:17:00 10:15
For demonstration purposes, I saved the result in a new column, but you can
save it in the original column.
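The question asks for rounding up, while round() goes to the nearest quarter hour. A minimal sketch of the difference, assuming the same four sample timestamps as above (the column names nearest and up are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Dat": pd.to_datetime([
    "2019-06-03 12:07:00",
    "2019-06-04 10:04:00",
    "2019-06-05 11:42:00",
    "2019-06-06 10:17:00",
])})

# round() picks the nearest quarter hour, ceil() always moves up to the next one
df["nearest"] = df["Dat"].dt.round("15min").dt.strftime("%H:%M")
df["up"] = df["Dat"].dt.ceil("15min").dt.strftime("%H:%M")
print(df)
```

If the values should be rounded down instead, dt.floor takes the same frequency string.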

I think this is what you are asking for:
rounded_column = df['time_column'].dt.round('15min').dt.strftime("%H:%M")
Although I agree with the commenters that you might not really need to do this and could just use a time grouper.

There is no need to round your column in order to get a histogram of dates with your DATEHEUREMAX column. For this purpose you can just make use of pd.Grouper as detailed below.
Toy sample code
You can work out this example to get a solution with your date column:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Generating a sample of 10000 timestamps and selecting 500 to randomize them
df = pd.DataFrame(np.random.choice(pd.date_range(start=pd.to_datetime('2015-01-14'),periods = 10000, freq='S'), 500), columns=['date'])
# Setting the date as the index since the TimeGrouper works on Index, the date column is not dropped to be able to count
df.set_index('date', drop=False, inplace=True)
# Getting the histogram
df.groupby(pd.Grouper(freq='15min')).count().plot(kind='bar')
plt.show()
This code produces a bar chart with one bar per 15-minute bin.
Solution with your data
For your data you should be able to do something like:
import pandas as pd
import matplotlib.pyplot as plt
xdata = pd.read_excel('Data.xlsx', 'Data', usecols=['DATEHEUREMAX'])
xdata.set_index('DATEHEUREMAX', drop=False, inplace=True)
xdata.groupby(pd.Grouper(freq='15min')).count().plot(kind='bar')
plt.show()
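Note that pd.Grouper bins along the full timeline, so 12:07 on two different days ends up in two different bins. If the histogram should be by time of day regardless of date, something like the following might work. This is only a sketch on made-up timestamps, and the names stamps, slots and counts are hypothetical:

```python
import pandas as pd

# Made-up timestamps spread over several days
stamps = pd.Series(pd.to_datetime([
    "2019-06-03 12:07:00",
    "2019-06-04 12:10:00",
    "2019-06-05 11:42:00",
    "2019-06-06 10:17:00",
]))

# Drop the date and keep only the 15-minute slot each timestamp falls into
slots = stamps.dt.floor("15min").dt.strftime("%H:%M")

# Count how many timestamps land in each slot, ordered by time of day
counts = slots.value_counts().sort_index()
print(counts)
```

Calling counts.plot(kind='bar') would then draw the histogram. Slots with no observations are simply absent from counts.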

Related

Make datetime line look nice on seaborn plot x axis

How do you reformat from datetime to Week 1, Week 2... to plot onto a seaborn line chart?
Input
Date Ratio
0 2019-10-04 0.350365
1 2019-10-04 0.416058
2 2019-10-11 0.489051
3 2019-10-18 0.540146
4 2019-10-25 0.598540
5 2019-11-08 0.547445
6 2019-11-01 0.722628
7 2019-11-15 0.788321
8 2019-11-22 0.875912
9 2019-11-27 0.948905
Desired output
I was able to cheese it by matching the natural index of the dataframe to the week. I wonder if there's another way to do this.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {'Date': ['2019-10-04',
'2019-10-04',
'2019-10-11',
'2019-10-18',
'2019-10-25',
'2019-11-08',
'2019-11-01',
'2019-11-15',
'2019-11-22',
'2019-11-27'],
'Ratio': [0.350365,
0.416058,
0.489051,
0.540146,
0.598540,
0.547445,
0.722628,
0.788321,
0.875912,
0.948905]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
graph = sns.lineplot(data=df,x='Date',y='Ratio')
plt.show()
# First plot looks bad.
week_mapping = dict(zip(df['Date'].unique(),range(len(df['Date'].unique()))))
df['Week'] = df['Date'].map(week_mapping)
graph = sns.lineplot(data=df,x='Week',y='Ratio')
plt.show()
# This plot looks better, but method seems cheesy.
It looks like your data is already spaced weekly, so you can just do:
df.groupby('Date',as_index=False)['Ratio'].mean().plot()
Output:
You can make a new column with the week number and use that as your x value. This would give you the week of the year. If you want to start your week numbers with 0, just subtract the week number of the first date from the value (see the commented out section of the code)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime as dt
data = {'Date': ['2019-10-04',
'2019-10-04',
'2019-10-11',
'2019-10-18',
'2019-10-25',
'2019-11-08',
'2019-11-01',
'2019-11-15',
'2019-11-22',
'2019-11-27'],
'Ratio': [0.350365,
0.416058,
0.489051,
0.540146,
0.598540,
0.547445,
0.722628,
0.788321,
0.875912,
0.948905]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
# To get the ISO week number of the year (Series.dt.week was removed in pandas 2.0)
df.loc[:, 'Week'] = df['Date'].dt.isocalendar().week.astype(int)
# Or you can use the line below for the exact output you had
#df.loc[:, 'Week'] = df['Date'].dt.isocalendar().week.astype(int) - (df.sort_values(by='Date').iloc[0,0].week)
graph = sns.lineplot(data=df,x='Week',y='Ratio')
plt.show()
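Week-of-year numbers restart in January, so for data that crosses a year boundary it may be safer to count the weeks elapsed since the first date. A minimal sketch, assuming a subset of the dates from the question:

```python
import pandas as pd

# A few of the rows from the question, for illustration only
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-10-04", "2019-10-11", "2019-11-01", "2019-11-27"]),
    "Ratio": [0.350365, 0.489051, 0.722628, 0.948905],
})

# Whole weeks elapsed since the earliest date, starting at 0
df["Week"] = (df["Date"] - df["Date"].min()).dt.days // 7
print(df)
```

The resulting Week column can be passed as x to sns.lineplot in the same way as above.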

How to add x-axis on plot?

I am trying to plot some data, but I don't know how I can add the date values on the x-axis on my graph. Here is my code:
import pandas as pd
import numpy as np
%matplotlib inline
%pylab inline
import matplotlib.pyplot as plt
pylab.rcParams['figure.figsize'] = (15, 9)
df["msft"].plot(grid = True)
The result is a plot, but the x-axis just has numbers, whereas I want the dates to appear on the x-axis. The dates are in the date column of the dataframe.
Here is what the dataframe looks like:
date msft nok aapl ibm amzn
1 2018-01-01 09:00:00 112 1 143 130 1298
2 2018-01-01 10:00:00 109 10 185 137 1647
3 2018-01-01 11:00:00 98 11 146 105 1331
4 2018-01-01 12:00:00 83 3 214 131 1355
Can you offer some help on what I am missing?
To pandas, your date column is just another column; you have to tell it that you want to plot against this specific one. One way is to plot against this column:
from matplotlib import pyplot as plt
import pandas as pd
#load dataframe
df = pd.read_csv("test.txt", sep=r"\s+")
#convert date column to datetime object, if it is not already one
df["date"] = pd.to_datetime(df["date"])
#plot the specified columns vs dates
df.plot(x = "date", y = ["msft", "ibm"], kind = "line", grid = True)
plt.show()
For more pandas plot options, please have a look at the documentation.
Another way would be to set date as the index of the dataframe. Then you can use your approach:
df.set_index("date", inplace = True)
df[["msft", "ibm"]].plot(grid = True)
plt.show()
The automatic date labels might not be what you want to display, but there are ways to format the output, and you can find examples on SO.
One way to do it is the set_xticklabels function, though Mr. T's answer is the proper way to go:
ax = plt.subplot(111)
df["msft"].plot(grid = True)
ax.set_xticklabels(df['date'])
plt.xticks(np.arange(4))

Work with and change the layout of an csv file in pandas

I read CSV data with pandas and now I would like to change the layout of my dataset. My dataset from Excel looks like this:
I run the code with df = pd.read_csv(Location2)
This is what I get:
I would like to have a separated column for time and Watt and their values.
I looked at the documentation but I couldn't find something to make it work.
It seems you need to set the correct delimiter that separates the two fields. Try adding delimiter=";" to the parameters.
Use read_excel
df = pd.read_excel(Location2)
I think you need the sep parameter in read_csv, because the default separator is a comma:
df = pd.read_csv(Location2, sep=';')
Sample:
import pandas as pd
from io import StringIO
temp=u"""time;Watt
0;00:00:00;50
1;01:00:00;45
2;02:00:00;40
3;00:03:00;35"""
#after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep=";")
print (df)
time Watt
0 00:00:00 50
1 01:00:00 45
2 02:00:00 40
3 00:03:00 35
Then it is possible to convert the time column with to_timedelta:
df['time'] = pd.to_timedelta(df['time'])
print (df)
time Watt
0 00:00:00 50
1 01:00:00 45
2 02:00:00 40
3 00:03:00 35
print (df.dtypes)
time timedelta64[ns]
Watt int64
dtype: object

How to use Pandas Series to plot two Time Series of different lengths/starting dates?

I am plotting several pandas series objects of "total events per week". The data in the series events_per_week looks like this:
Datetime
1995-10-09 45
1995-10-16 63
1995-10-23 83
1995-10-30 91
1995-11-06 101
Freq: W-SUN, dtype: int64
My problem is as follows. All of the pandas series are the same length, i.e. they begin in the same year, 1995, except one: events_per_week2003 begins in 2003.
Datetime
2003-09-08 25
2003-09-15 36
2003-09-22 74
2003-09-29 25
2003-09-05 193
Freq: W-SUN, dtype: int64
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(20,5))
ax = plt.subplot(111)
plt.plot(events_per_week)
plt.plot(events_per_week2003)
I get the following value error.
ValueError: setting an array element with a sequence.
How can I do this?
I really don't get where you're having problems.
I tried to recreate a piece of the dataframe, and it plotted with no problems.
import numpy
import pandas as pd
import matplotlib.pyplot
data = numpy.array([45,63,83,91,101])
df1 = pd.DataFrame(data, index=pd.date_range('2005-10-09', periods=5, freq='W'), columns=['events'])
df2 = pd.DataFrame(numpy.arange(10,21,2), index=pd.date_range('2003-01-09', periods=6, freq='W'), columns=['events'])
matplotlib.pyplot.plot(df1.index, df1.events)
matplotlib.pyplot.plot(df2.index, df2.events)
matplotlib.pyplot.show()
Using Series instead of Dataframe:
ds1 = pd.Series(data, index=pd.date_range('2005-10-09', periods=5, freq='W'))
ds2 = pd.Series(numpy.arange(10,21,2), index=pd.date_range('2003-01-09', periods=6, freq='W'))
matplotlib.pyplot.plot(ds1)
matplotlib.pyplot.plot(ds2)
matplotlib.pyplot.show()
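If the two series should share one table as well as one axis, pd.concat with axis=1 aligns them on the union of their dates and fills the gaps with NaN. A rough sketch using the same toy values as above (the column names from_2003 and from_2005 are made up):

```python
import numpy as np
import pandas as pd

# Two weekly series that start in different years
early = pd.Series(np.arange(10, 21, 2),
                  index=pd.date_range("2003-01-09", periods=6, freq="W"))
late = pd.Series([45, 63, 83, 91, 101],
                 index=pd.date_range("2005-10-09", periods=5, freq="W"))

# Align on the union of both indexes; dates missing from a series become NaN
combined = pd.concat({"from_2003": early, "from_2005": late}, axis=1)
print(combined)
```

combined.plot() would then draw both lines on a common date axis.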

Segmenting a series of Timedeltas to a minute by minute graph (pandas)

I have a dataframe with the index as a Timedelta, ranging from 0 to 5 minutes, and a column of floating point numbers.
Here's an example subset:
32 0.740283
34 0.572126
36 0.524788
38 0.509685
40 0.490219
42 0.545977
44 0.444170
46 1.098387
48 2.209113
51 1.426835
53 1.536439
55 1.196625
56 1.923569
The left being the timedelta in seconds, the right being the floating point number.
The issue is when plotting with pandas I get an x axis with labels such as:
0 days 00:00:00, 0 days 00:01:10, 0 days 00:02:15
and so on. Is there any way I can maybe resample (wrong word?) the data so that I can have axes on a minute by minute basis while still maintaining the data points in the right place?
Example code/data:
df = pd.DataFrame({'td':[32,34,36,38,40,42,44,51,53,152,283],
'val': np.random.rand(11)})
df.index = df.td.map(lambda x: pd.Timedelta(seconds=x.astype(int)))
df.drop(['td'], axis=1, inplace=True)
df.val.plot()
Pandas only provides plotting functions for convenience. To have full control, you need to use Matplotlib directly.
As a workaround, you could just use datetime instead of timedelta as index. As long as your timespans are within minutes, Pandas won't plot the day or month.
To use your example, this works:
from datetime import datetime as dt, timedelta
df = pd.DataFrame({'td':[32,34,36,38,40,42,44,51,53,152,283],
'val': np.random.rand(11)})
df.index = [dt(2010, 1, 1) + timedelta(seconds=int(i)) for i in df.td]
df.drop(['td'], axis=1, inplace=True)
df.val.plot()
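Another option, sketched here on the same example seconds, is to convert the Timedelta index to plain float minutes, so that the axis is numeric and each point keeps its exact position. The index name minutes is an arbitrary choice:

```python
import numpy as np
import pandas as pd

seconds = [32, 34, 36, 38, 40, 42, 44, 51, 53, 152, 283]
df = pd.DataFrame({"val": np.random.rand(len(seconds))},
                  index=pd.to_timedelta(seconds, unit="s"))

# Express the Timedelta index as fractional minutes
df.index = df.index.total_seconds() / 60
df.index.name = "minutes"
print(df)
```

df.val.plot() then labels the x-axis in minutes rather than in "0 days 00:01:10" style strings.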
Is this what you want?
import pandas as pd
import numpy as np
td = np.array([32,34,36,38,40,42,44,51,53,152,283])*1e9 # if you don't multiply by 1e9, then pandas will assume you are referring to nanoseconds when you use the function pd.to_datetime()
df = pd.DataFrame({'td':td,
'val': np.random.rand(11)})
df.index = pd.to_datetime(df.td)
df.index = df.index.time # select the time component of the index ... ignoring the date
df.drop('td', axis=1, inplace=True)
print(df)
val
00:00:32 0.825991
00:00:34 0.578752
00:00:36 0.348558
00:00:38 0.221674
00:00:40 0.706031
00:00:42 0.912452
00:00:44 0.448185
00:00:51 0.368867
00:00:53 0.188401
00:02:32 0.855828
00:04:43 0.494732
df.plot() # it gets the plot you want
