Overlapping two plots with different dates - python

I have a pandas dataframe with subscriber information, and I'm trying to plot two graphs on the same figure to compare the same period of time at two different instances. The dataframe resembles:
df =
Date 1_Month_Sub 3_Month_Sub
0 2010-01-01 00:00:00 2 4
1 2010-01-02 00:00:00 1 1
2 2010-01-03 00:00:00 3 6
3 2010-01-04 00:00:00 0 3
4 2010-01-05 00:00:00 2 1
...
1381 2014-01-01 00:00:00 4 3
1382 2014-01-02 00:00:00 2 2
1383 2014-01-03 00:00:00 3 0
1384 2014-01-04 00:00:00 2 4
...
If I do the following:
df.plot(x='Date', y=['Year Sold'], stacked=False, grid=True,
xlim=['2011-10-01', '2011-10-31'], ylim=[0, 7], figsize=(40, 16),
marker='o', color='red')
df.plot(x='Date', y=['Year Sold'], stacked=False, grid=True,
xlim=['2012-10-01', '2012-10-31'], ylim=[0, 7], figsize=(40, 16),
marker='o', color='green')
plt.show()
I get two separate figures. I would like to put both plots on the same graph.
I realize that the x-axis is different in each case, one being October 2011 and the other October 2012, and I'm not sure how to change that to accommodate both plots. My idea was to use the index, but that's not very helpful when the two periods aren't the same month in different years (for instance, a 15-day period in January 2011 and a 15-day period in May 2011). Is there a way to have both axes present?
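One possible approach, sketched here under assumptions, is to draw the first period on a normal axes and the second on a twin axes created with matplotlib's Axes.twiny(), so each period keeps its own date axis (bottom and top) while sharing the y-axis. The data below is synthetic, and the column 1_Month_Sub stands in for the 'Year Sold' column used in the question, which is not part of the dataframe shown.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the subscriber data (values are made up).
rng = np.random.default_rng(0)
dates = pd.date_range("2010-01-01", "2014-01-04", freq="D")
df = pd.DataFrame({"Date": dates, "1_Month_Sub": rng.integers(0, 7, len(dates))})

# Select the two periods to compare; they do not need to be the same month.
first = df[df["Date"].between("2011-10-01", "2011-10-31")]
second = df[df["Date"].between("2012-10-01", "2012-10-31")]

fig, ax_bottom = plt.subplots(figsize=(12, 5))
ax_top = ax_bottom.twiny()  # second x-axis on top, sharing the same y-axis

line1, = ax_bottom.plot(first["Date"], first["1_Month_Sub"],
                        marker="o", color="red", label="Oct 2011")
line2, = ax_top.plot(second["Date"], second["1_Month_Sub"],
                     marker="o", color="green", label="Oct 2012")

ax_bottom.set_ylim(0, 7)
ax_bottom.grid(True)
ax_bottom.legend(handles=[line1, line2])  # one legend covering both axes
```

Because each axes scales its own dates, this also works when the two periods fall in different months, as long as they are roughly the same length.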

Related

Vertical box plots on the same chart

I have two datasets
Date Daily Frequency
0 2019-01-01 1
1 2019-01-02 5
2 2019-01-03 11
3 2019-01-04 9
4 2019-01-06 1
5 2019-01-07 8
6 2019-01-08 7
7 2019-01-09 4
8 2019-01-10 5
9 2019-01-11 3
and
Date Daily Frequency
0 2020-01-01 1
1 2020-01-02 13
2 2020-01-03 13
3 2020-01-04 4
4 2020-01-06 1
5 2020-01-07 15
6 2020-01-08 11
7 2020-01-09 12
8 2020-01-10 11
9 2020-01-11 4
I would like to plot vertical boxplots on the same chart, one beside the other, in order to compare them.
import seaborn as sns
ax = sns.boxplot( y="Daily Frequency", data=df1)
ax1 = sns.boxplot( y="Daily Frequency", data=df2)
but it draws one box plot inside the other.
Can you please tell me how to create two distinct box plots on the same chart?
Thanks
Try this:
pd.concat([df1,df2], axis=1).boxplot()
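Note that concatenating with axis=1 leaves two columns that are both named "Daily Frequency", so the boxes cannot be told apart by label. A minimal sketch of one way around that, assuming it is enough to label each box by year (the labels "2019" and "2020" are my choice, not from the question):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# The two datasets from the question (frequency columns only).
df1 = pd.DataFrame({"Daily Frequency": [1, 5, 11, 9, 1, 8, 7, 4, 5, 3]})
df2 = pd.DataFrame({"Daily Frequency": [1, 13, 13, 4, 1, 15, 11, 12, 11, 4]})

# One column per dataset, each with a distinct name used as the box label.
combined = pd.DataFrame({
    "2019": df1["Daily Frequency"].to_numpy(),
    "2020": df2["Daily Frequency"].to_numpy(),
})

fig, ax = plt.subplots()
combined.boxplot(ax=ax)  # draws one vertical box per column, side by side
ax.set_ylabel("Daily Frequency")
```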

extract only hour:minute from datetimeindex to x-axis when creating sns.pointplot

I have a dataframe that looks like:
deploy deployed_today_rent total_rent cum_deploy hourly percent cum_percent
10min
2019-10-01 05:30:00 6 0 0 6 0.000000 0.000000
2019-10-01 05:40:00 0 0 0 6 0.000000 0.000000
2019-10-01 05:50:00 6 0 0 12 0.000000 0.000000
2019-10-01 06:00:00 13 0 0 25 0.000000 0.000000
2019-10-01 06:10:00 0 0 0 25 0.000000 0.000000
2019-10-01 06:20:00 0 1 1 25 0.040000 0.040000
2019-10-01 06:30:00 0 0 0 25 0.000000 0.040000
2019-10-01 06:40:00 0 1 1 25 0.040000 0.080000
2019-10-01 06:50:00 1 1 1 26 0.038462 0.118462
From this I am trying to create a pointplot where the x-axis is the datetime index and the y-axis is deployed_today_rent.
My code for creating visualization:
fig,(ax1, ax2)= plt.subplots(nrows=2)
fig.set_size_inches(22,17)
sns.pointplot(data=test, x=test.index, y="total_rent", ax=ax1,color="blue")
sns.pointplot(data=test, x=test.index, y="deployed_today_rent", ax=ax1, color="green")
ax1.set_xticklabels(test.index, rotation=90,
fontdict={
"fontsize":16,
"fontweight":30
})
I have two axes in the figure. Right now my x-axis ticks are full datetimes rotated 90 degrees, so the whole tick label does not fit. I want to extract only 05:30:00 from 2019-10-01 05:30:00 and use that for the x-ticks. How can I do this?
Also, in the ax1.set_xticklabels call above, fontweight is not working.
Instead of test.index in your plot lines, use test.index.strftime('%H:%M:%S'); this should give you just the hours, minutes, and seconds from the index.
Your code should be:
sns.pointplot(data=test, x=test.index.strftime('%H:%M:%S'), y="total_rent", ax=ax1,color="blue")
sns.pointplot(data=test, x=test.index.strftime('%H:%M:%S'), y="deployed_today_rent", ax=ax1, color="green")
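To get only hour:minute as in the title, the format string can be shortened to '%H:%M'. The sketch below uses a small made-up index and plain matplotlib instead of seaborn, so the values and variable names are assumptions. On the fontweight issue: matplotlib accepts numeric weights from 0 to 1000, so a value of 30 is probably just too light to notice; a named weight such as "bold" is used here instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up 10-minute index resembling the one in the question.
idx = pd.date_range("2019-10-01 05:30", periods=4, freq="10min")
values = [6, 0, 6, 13]

# DatetimeIndex.strftime returns an Index of formatted strings.
labels = idx.strftime("%H:%M")
print(list(labels))  # ['05:30', '05:40', '05:50', '06:00']

fig, ax = plt.subplots()
positions = range(len(idx))
ax.plot(positions, values, marker="o", color="green")
ax.set_xticks(positions)
# Text properties such as fontsize and fontweight can be passed as keywords.
ax.set_xticklabels(labels, rotation=90, fontsize=16, fontweight="bold")
```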

Group data into bins of 30 minutes

I have a .csv file with some data. There is only one column in this file, which contains timestamps. I need to organize that data into bins of 30 minutes. This is what my data looks like:
Timestamp
04/01/2019 11:03
05/01/2019 16:30
06/01/2019 13:19
08/01/2019 13:53
09/01/2019 13:43
So in this case, the last two data points would be grouped together in the bin that includes all the data from 13:30 to 14:00.
This is what I have already tried
df = pd.read_csv('book.csv')
df['Timestamp'] = pd.to_datetime(df.Timestamp)
df.groupby(pd.Grouper(key='Timestamp',
freq='30min')).count().dropna()
I am getting around 7000 rows showing all hours for all days with the count next to them, like this:
2019-09-01 03:00:00 0
2019-09-01 03:30:00 0
2019-09-01 04:00:00 0
...
I want to create bins for only the hours that I have in my dataset. I want to see something like this:
Time Count
11:00:00 1
13:00:00 1
13:30:00 2 (we have two data points in this interval)
16:30:00 1
Thanks in advance!
Use groupby.size as:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.Timestamp.dt.floor('30min').dt.time.to_frame()\
.groupby('Timestamp').size()\
.reset_index(name='Count')
Or as per suggestion by jpp:
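# value_counts orders by count, so sort by time to match the output below
df = df.Timestamp.dt.floor('30min').dt.time.value_counts().sort_index().rename_axis('Timestamp').reset_index(name='Count')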
print(df)
Timestamp Count
0 11:00:00 1
1 13:00:00 1
2 13:30:00 2
3 16:30:00 1
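For reference, here is a self-contained sketch of the same idea on the five sample timestamps from the question. It assumes the dates are written day-first (dd/mm/yyyy), which is a guess; the time of day, and therefore the bins, come out the same either way.

```python
import pandas as pd

# Sample data from the question, built inline instead of reading book.csv.
df = pd.DataFrame({"Timestamp": [
    "04/01/2019 11:03",
    "05/01/2019 16:30",
    "06/01/2019 13:19",
    "08/01/2019 13:53",
    "09/01/2019 13:43",
]})

# Assumption: day-first dates. An explicit format avoids ambiguous parsing.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d/%m/%Y %H:%M")

counts = (
    df["Timestamp"]
    .dt.floor("30min")   # round each timestamp down to its 30-minute bin
    .dt.time             # drop the date, keep only the time of day
    .value_counts()      # count rows per bin; empty bins never appear
    .sort_index()        # order the bins by time instead of by count
    .rename_axis("Time")
    .reset_index(name="Count")
)
print(counts)
```

Grouping on the time of day rather than on the full timestamp is what avoids the thousands of empty half-hour rows that pd.Grouper produced.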

Bar Plot with recent dates left where date is datetime index

I tried to sort the dataframe by its datetime index and then plot the graph, but nothing changed: the latest dates, like 2017 and 2018, were still on the right, and 2008 and 2009 on the left.
I want the latest year on the left and the oldest on the right.
This was the dataframe earlier.
Title
Date
2001-01-01 0
2002-01-01 9
2003-01-01 11
2004-01-01 17
2005-01-01 23
2006-01-01 25
2007-01-01 51
2008-01-01 55
2009-01-01 120
2010-01-01 101
2011-01-01 95
2012-01-01 118
2013-01-01 75
2014-01-01 75
2015-01-01 3
2016-01-01 35
2017-01-01 75
2018-01-01 55
Ignore the values.
Then I sorted the above dataframe by index and plotted it, but the plot still did not change:
df.sort_index(ascending=False, inplace=True)
Use invert_xaxis():
df.plot.bar().invert_xaxis()
If you don't want to use Pandas plotting:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
plt.bar(df.index, df['Title'])
ax.invert_xaxis()
I guess you haven't changed your index to the year, which is why it is not working. You can do so with:
# Date is the index here, so take the year from the index itself
df.index = pd.to_datetime(df.index).year
# then sort the index in descending order
df.sort_index(ascending=False, inplace=True)
df.plot.bar()
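A runnable sketch of this second approach on the first few rows of the data, assuming Date is a DatetimeIndex as shown in the question (the name by_year is mine):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# First rows of the dataframe from the question, with Date as the index.
df = pd.DataFrame(
    {"Title": [0, 9, 11, 17]},
    index=pd.to_datetime(["2001-01-01", "2002-01-01", "2003-01-01", "2004-01-01"]),
)
df.index.name = "Date"

by_year = df.copy()
by_year.index = by_year.index.year             # DatetimeIndex exposes .year directly
by_year = by_year.sort_index(ascending=False)  # newest year first

# pandas bar plots draw one bar per row, in row order, so 2004 ends up leftmost.
ax = by_year.plot.bar()
```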

Plotting pandas DataFrame with matplotlib

Here is a sample of the code I am using, which works perfectly well.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
# Data
df=pd.DataFrame({'x': np.arange(10), 'y1': np.random.randn(10), 'y2': np.random.randn(10)+
range(1,11), 'y3': np.random.randn(10)+range(11,21) })
print(df)
# multiple line plot
plt.plot( 'x', 'y1', data=df, marker='o', markerfacecolor='blue', markersize=12, color='skyblue', linewidth=4)
plt.plot( 'x', 'y2', data=df, marker='', color='olive', linewidth=2)
plt.plot( 'x', 'y3', data=df, marker='', color='olive', linewidth=2, linestyle='dashed', label="y3")
plt.legend()
plt.show()
The values in column 'x' actually refer to a 10-hour period of the day, with 0 standing for 6 AM, 1 for 7 AM, and so on. Is there any way I could replace those x-axis values in my figure with the times, e.g. show 6 AM instead of 0?
It's always a good idea to store time or datetime information as Pandas datetime datatype.
In your example, if you only want to keep the time information:
df['time'] = (df.x + 6) * pd.Timedelta(1, unit='h')
Output
x y1 y2 y3 time
0 0 -0.523190 1.681115 11.194223 06:00:00
1 1 -1.050002 1.727412 13.360231 07:00:00
2 2 0.284060 4.909793 11.377206 08:00:00
3 3 0.960851 2.702884 14.054678 09:00:00
4 4 -0.392999 5.507870 15.594092 10:00:00
5 5 -0.999188 5.581492 15.942648 11:00:00
6 6 -0.555095 6.139786 17.808850 12:00:00
7 7 -0.074643 7.963490 18.486967 13:00:00
8 8 0.445099 7.301115 19.005115 14:00:00
9 9 -0.214138 9.194626 20.432349 15:00:00
If you have a starting date:
start_date='2018-07-29' # change this date appropriately
df['datetime'] = pd.to_datetime(start_date) + (df.x + 6) * pd.Timedelta(1, unit='h')
Output
x y1 y2 y3 time datetime
0 0 -0.523190 1.681115 11.194223 06:00:00 2018-07-29 06:00:00
1 1 -1.050002 1.727412 13.360231 07:00:00 2018-07-29 07:00:00
2 2 0.284060 4.909793 11.377206 08:00:00 2018-07-29 08:00:00
3 3 0.960851 2.702884 14.054678 09:00:00 2018-07-29 09:00:00
4 4 -0.392999 5.507870 15.594092 10:00:00 2018-07-29 10:00:00
5 5 -0.999188 5.581492 15.942648 11:00:00 2018-07-29 11:00:00
6 6 -0.555095 6.139786 17.808850 12:00:00 2018-07-29 12:00:00
7 7 -0.074643 7.963490 18.486967 13:00:00 2018-07-29 13:00:00
8 8 0.445099 7.301115 19.005115 14:00:00 2018-07-29 14:00:00
9 9 -0.214138 9.194626 20.432349 15:00:00 2018-07-29 15:00:00
Now the time and datetime columns have special datatypes:
print(df.dtypes)
Out[5]:
x int32
y1 float64
y2 float64
y3 float64
time timedelta64[ns]
datetime datetime64[ns]
dtype: object
These have a lot of nice properties, including automatic string formatting, which you will find very useful in later parts of your project.
Finally, to plot using matplotlib:
# multiple line plot
plt.plot( df.datetime.dt.hour, df['y1'], marker='o', markerfacecolor='blue', markersize=12, color='skyblue', linewidth=4)
plt.plot( df.datetime.dt.hour, df['y2'], marker='', color='olive', linewidth=2)
plt.plot( df.datetime.dt.hour, df['y3'], marker='', color='olive', linewidth=2, linestyle='dashed', label="y3")
plt.legend()
plt.show()
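The plot above labels the x-axis with the integers 6 to 15. If you want literal labels such as "6 AM", one option is to keep plotting against x and replace the tick labels. This is only a sketch; hour_label is a hypothetical helper, not part of pandas or matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the figure with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(10), "y1": np.random.randn(10)})

def hour_label(hour):
    """Format an hour of the day (0-23) as a 12-hour label such as '6 AM'."""
    suffix = "AM" if hour < 12 else "PM"
    return f"{hour % 12 or 12} {suffix}"

# x = 0 corresponds to 6 AM, so shift by 6 before formatting.
labels = [hour_label(int(h)) for h in df["x"] + 6]

fig, ax = plt.subplots()
ax.plot(df["x"], df["y1"], marker="o", color="skyblue", linewidth=4, label="y1")
ax.set_xticks(df["x"].to_numpy())  # one tick per data point
ax.set_xticklabels(labels)         # show '6 AM', '7 AM', ... instead of 0, 1, ...
ax.legend()
```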
