Plotting a time series with distinctive events - python

I'm new to data viz and was wondering what the simplest way is to plot my data.
I have a pd.DataFrame that looks like this:
df.head()
price event
1 123
2 456 A
3 789
...
I would like to have a time series just as if I did
df.plot(y='price')
But with events visible on the plot for each entry in my DataFrame where the 'event' column is set to something.
What are my best options?
Thanks.

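I took the liberty of adding one more row with event z.
import matplotlib.pyplot as plt
import pandas as pd
fig = plt.figure()
ax = fig.add_subplot(111)
x = df.reset_index()['index']  # original index values, re-indexed 0..n-1
y = df['price']
ax.scatter(x, y)
ax.plot(y)
for i, txt in enumerate(df['event']):
    if pd.notna(txt):  # skip rows without an event
        # y keeps the original index, so look it up by position
        ax.annotate(txt, (x[i] + 0.1, y.iloc[i]))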
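A minimal sketch of the same idea that only marks rows which actually carry an event, assuming empty events are stored as NaN. The four-row frame is hypothetical sample data (the fourth price, 321, is made up to stand in for the extra row with event z).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's frame (index starts at 1)
df = pd.DataFrame(
    {"price": [123, 456, 789, 321], "event": [np.nan, "A", np.nan, "z"]},
    index=[1, 2, 3, 4],
)

fig, ax = plt.subplots()
ax.plot(df.index, df["price"])  # the plain time series

# Keep only the rows that have an event, then mark and label them
events = df.dropna(subset=["event"])
ax.scatter(events.index, events["price"], color="red", zorder=3)
for x, y, label in zip(events.index, events["price"], events["event"]):
    # Offset the text a few points up and right of the marker
    ax.annotate(label, (x, y), xytext=(5, 5), textcoords="offset points")
```

To highlight one particular event instead, replace the dropna line with a filter such as df[df["event"] == "A"].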

Related

Changing plot title through loop

I am new to Python and need your help. I have several DataFrames, one per day, and I use a for loop to plot all of them. For each plot I want to add the date to the title. I have created a variable date_created and assigned it the dates I want. I want my title to look like this:
'Voltage vs time
28-01-2022'
for df in (df1,df2,df3,df4,df5,df6,df7,df8):
    y = df[' Voltage']
    x = df['time']
    date_created = [ '28-01-2022, 29-01-2022, 30-01-2022, 31-08-2022, 01-02-2022, 02-02-2022, 03-02-2022, 04-02-2022' ]
    fig, ax = plt.subplots(figsize=(18,7))
    plt.plot(x,y, 'b')
    plt.xlabel("time")
    plt.ylabel(" Voltage [V]")
    plt.title("Voltage vs time")
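To make the code more effective, it would be better to create a dictionary of dates and DataFrames (if you haven't got a date column in your DataFrames). DataFrames are unhashable, so they cannot be dictionary keys; use the date strings as keys instead (and avoid naming the variable dict, which shadows the built-in):
dfs_by_date = {'28-01-2022': df1, '29-01-2022': df2, '30-01-2022': df3}
Then we use a for loop over the items of this dictionary:
for date, df in dfs_by_date.items():
    y = df[' Voltage']  # column name as in the question (note the leading space)
    x = df['time']
    fig, ax = plt.subplots(figsize=(18,7))
    ax.plot(x, y, 'b')
    ax.set_xlabel("time")
    ax.set_ylabel("Voltage [V]")
    ax.set_title(f"Voltage vs time\n{date}")  # date on a second line
Hope this works for you!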
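Another option, as a minimal sketch: date_created in the question is a list holding one long string, so it first has to become one string per day. Zipping that list with the DataFrames then gives each plot its own title. The two small DataFrames below are hypothetical stand-ins for df1, df2, and so on.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for df1, df2, ... (same column names as the question)
dfs = [
    pd.DataFrame({"time": [0, 1, 2], " Voltage": [1.0, 1.2, 0.9]}),
    pd.DataFrame({"time": [0, 1, 2], " Voltage": [1.1, 1.0, 1.3]}),
]
# One date string per DataFrame, in the same order
dates = ["28-01-2022", "29-01-2022"]

figures = []
for df, date in zip(dfs, dates):
    fig, ax = plt.subplots(figsize=(18, 7))
    ax.plot(df["time"], df[" Voltage"], "b")
    ax.set_xlabel("time")
    ax.set_ylabel("Voltage [V]")
    ax.set_title(f"Voltage vs time\n{date}")  # date on a second line
    figures.append(fig)
```

zip stops at the shorter input, so a missing date silently drops a plot; checking len(dfs) == len(dates) first is a cheap safeguard.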

Matplotlib customize rank line plot

I have the following DataFrame, which contains the best equipment in operation ranked from 1 to 300 (1 is the best, 300 is the worst) over a few days (the DataFrame columns):
Equipment 21-03-27 21-03-28 21-03-29 21-03-30 21-03-31 21-04-01 21-04-02
P01-INV-1-1 1 1 1 1 1 2 2
P01-INV-1-2 2 2 4 4 5 1 1
P01-INV-1-3 4 4 3 5 6 10 10
I would like to customize a line plot (example found here), but I'm having some trouble modifying the example code provided:
import matplotlib.pyplot as plt
import numpy as np
def energy_rank(data, marker_width=0.1, color='blue'):
    y_data = np.repeat(data, 2)
    # float dtype: with integer ranks, empty_like would truncate the +/- marker_width offsets
    x_data = np.empty_like(y_data, dtype=float)
    x_data[0::2] = np.arange(1, len(data)+1) - (marker_width/2)
    x_data[1::2] = np.arange(1, len(data)+1) + (marker_width/2)
    lines = []
    lines.append(plt.Line2D(x_data, y_data, lw=1, linestyle='dashed', color=color))
    for x in range(0,len(data)*2, 2):
        lines.append(plt.Line2D(x_data[x:x+2], y_data[x:x+2], lw=2, linestyle='solid', color=color))
    return lines
data = ranks.head(4).to_numpy() #ranks is the above dataframe
artists = []
for row, color in zip(data, ('red','blue','green','magenta')):
    artists.extend(energy_rank(row, color=color))
fig, ax = plt.subplots()
ax.set_xticklabels(ranks.columns) # set X axis to be dataframe columns
ax.set_xticklabels(ax.get_xticklabels(), rotation=35, fontsize = 10)
for artist in artists:
    ax.add_artist(artist)
ax.set_ybound([15,0])
ax.set_xbound([.5,8.5])
When using ax.set_xticklabels(ranks.columns), for some reason only 5 of the 7 days from the ranks columns are shown; specifically, the first and last values are missing. I tried duplicating those values, but that did not work either. I end up with this:
In summary, I would like to know if it's possible to make these 3 customizations:
show all dates from the ranks columns on the X axis
invert the Y axis. ax.set_ybound([15,0]) is not working. It would make more sense to see the graph starting with 0 at the top, since 1 is the most important rank to look at
add labels at the end of each line on the last day (the last value on the X axis). I could add a small legend box, but it often gets really messy when more data is plotted, so adding just the text at the end of each line would make it look much cleaner
Please let me know if these customizations are impossible; any help is really appreciated. Thank you in advance!
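ax.set_xticklabels only relabels whatever tick positions matplotlib picked automatically, so the labels do not line up with the days. To show all the dates, set the tick positions and labels together with plt.xticks(), at the x positions where energy_rank draws the points (1 to n, because of np.arange(1, len(data)+1)). set_ybound keeps the current axis direction regardless of argument order, so it cannot flip the axis; ax.invert_yaxis() (or ax.set_ylim(ax.get_ylim()[::-1])) does. To place the labels the way you described, use annotations positioned just right of the last data point of each series.
fig, ax = plt.subplots()
n_days = len(ranks.columns)
# Ticks at x = 1..n_days, matching the x positions used in energy_rank
plt.xticks(np.arange(1, n_days + 1), list(ranks.columns), rotation = 35, fontsize = 10)
plt.xlabel('Date')
plt.ylabel('Rank')
for artist in artists:
    ax.add_artist(artist)
ax.set_ybound([0,15])
ax.invert_yaxis()  # rank 1 at the top
ax.set_xbound([0.5, n_days + 1.5])  # leave room on the right for the end labels
# Label each series just right of its last data point
for i, (row, color) in enumerate(zip(data, ('red','blue','green','magenta'))):
    ax.annotate(f'Series {i + 1}', xy=(n_days + 0.1, row[-1]), color=color)
plt.show()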
Here is the plot for the three rows of data in your sample dataframe:
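For comparison, here is a minimal sketch that covers all three customizations with ordinary ax.plot calls instead of the Line2D segments from energy_rank. Lines added with ax.plot take part in autoscaling (artists added with add_artist do not), so no manual y bounds are needed; the horizontal "_" markers only approximate the original solid segments. The sample frame is the three rows from the question with Equipment as the index, and plot_ranks is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# The three sample rows from the question (Equipment as index, days as columns)
ranks = pd.DataFrame(
    [[1, 1, 1, 1, 1, 2, 2],
     [2, 2, 4, 4, 5, 1, 1],
     [4, 4, 3, 5, 6, 10, 10]],
    index=["P01-INV-1-1", "P01-INV-1-2", "P01-INV-1-3"],
    columns=["21-03-27", "21-03-28", "21-03-29", "21-03-30",
             "21-03-31", "21-04-01", "21-04-02"],
)

def plot_ranks(ranks, colors=("red", "blue", "green", "magenta")):
    """One rank line per row, best rank on top, label at each line's end."""
    fig, ax = plt.subplots()
    x = range(1, len(ranks.columns) + 1)  # one x position per day
    for (name, row), color in zip(ranks.iterrows(), colors):
        ax.plot(x, row.values, linestyle="dashed", marker="_",
                markersize=15, color=color)
        # Text just right of the last point instead of a legend box
        ax.annotate(name, xy=(x[-1] + 0.1, row.values[-1]), color=color,
                    va="center")
    ax.set_xticks(list(x))  # a tick for every day...
    ax.set_xticklabels(ranks.columns, rotation=35, fontsize=10)  # ...labelled
    ax.set_xlim(0.5, len(ranks.columns) + 2)  # room for the end labels
    ax.invert_yaxis()  # rank 1 at the top
    ax.set_xlabel("Date")
    ax.set_ylabel("Rank")
    return fig, ax

fig, ax = plot_ranks(ranks)
```

Because the annotation text is taken from the DataFrame index, the labels stay correct when more rows are plotted; only the colors tuple needs extending.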

How to plot a count bar chart with a Pandas DF, grouping by one categorical column and colouring by another

I have a dataframe that looks roughly like this:
Property Name industry
1 123 name1 industry 1
1 144 name1 industry 1
2 456 name2 industry 1
3 789 name3 industry 2
4 367 name4 industry 2
. ... ... ...
. ... ... ...
n 123 name1 industry 1
I want to make a bar plot showing how many rows there are for each Name, with the bars coloured by industry. I've tried something like this:
ax = df['name'].value_counts().plot(kind='bar',
figsize=(14,8),
title="Number for each Owner Name")
ax.set_xlabel("Owner Names")
ax.set_ylabel("Frequency")
I get the following:
My question is: how do I colour the bars according to the industry column in the DataFrame (and add a legend)?
Thanks!
This is my answer:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
def plot_bargraph_with_groupings(df, groupby, colourby, title, xlabel, ylabel, figsize=(14, 8)):
    """
    Plots a dataframe showing the frequency of datapoints grouped by one column and coloured by another.
    df : dataframe
    groupby: the column to group by
    colourby: the column to colour by
    title: the graph title
    xlabel: the x label
    ylabel: the y label
    figsize: the figure size (replaces the undefined FIG_SIZE constant)
    """
    # Map each unique colourby value to a colour from the Paired colormap.
    categories = df[colourby].unique()
    ind_col_map = dict(zip(categories, plt.cm.Paired(np.arange(len(categories)))))
    # Find which colourby value each groupby value belongs to.
    unique_comb = df[[groupby, colourby]].drop_duplicates()
    name_ind_map = dict(zip(unique_comb[groupby], unique_comb[colourby]))
    counts = df[groupby].value_counts()
    colours = [ind_col_map[name_ind_map[x]] for x in counts.index]
    # Make the bar graph, one colour per bar.
    ax = counts.plot(kind='bar', figsize=figsize, title=title, color=colours)
    # Make a legend using ind_col_map.
    legend_list = [mpatches.Patch(color=colour, label=key) for key, colour in ind_col_map.items()]
    ax.legend(handles=legend_list)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return ax
Use seaborn.countplot
import seaborn as sns
sns.set(style="darkgrid")
titanic = sns.load_dataset("titanic")
ax = sns.countplot(x="class", data=titanic)
See the seaborn documentation (to colour the bars by a second column, pass it as hue, e.g. hue="industry"):
https://seaborn.pydata.org/generated/seaborn.countplot.html
It might be a little too complicated, but it does the job. I first defined the mappings from name to industry and from industry to colour (it seems there are only two industries, but you can adjust the dictionary to your case; the keys must match the values in the industry column exactly):
ind_col_map = {
    "industry 1": "red",
    "industry 2": "blue"
}
unique_comb = df[["Name","industry"]].drop_duplicates()
name_ind_map = {x:y for x, y in zip(unique_comb["Name"],unique_comb["industry"])}
Then the color can be generated by using the above two mappings:
c = df['Name'].value_counts().index.map(lambda x: ind_col_map[name_ind_map[x]])
Finally, simply add the colours to your plotting call:
ax = df['Name'].value_counts().plot(kind='bar',
figsize=(14,8),
title="Number for each Owner Name", color=c)
ax.set_xlabel("Owner Names")
ax.set_ylabel("Frequency")
plt.show()
Let's use some DataFrame reshaping and matplotlib:
ax = df.groupby(['industry','Name'])['Name'].count().unstack(0).plot.bar(title="Number for each Owner Name", figsize=(14,8))
_ = ax.set_xlabel('Owner')
_ = ax.set_ylabel('Frequency')
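A variant of the reshaping approach, as a minimal sketch: the unstacked version draws side-by-side bars with empty slots, whereas stacked=True puts each Name's counts into a single bar. Assuming every Name belongs to exactly one industry (as in the sample), each bar then has a single colour and the legend maps colours to industries. The data is the first five rows of the question's sample.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of the sample from the question
df = pd.DataFrame({
    "Property": [123, 144, 456, 789, 367],
    "Name": ["name1", "name1", "name2", "name3", "name4"],
    "industry": ["industry 1", "industry 1", "industry 1",
                 "industry 2", "industry 2"],
})

# Rows per (Name, industry); industries become columns, missing pairs become 0
counts = df.groupby(["Name", "industry"]).size().unstack("industry", fill_value=0)
# Order the bars by total count, largest first (like value_counts)
counts = counts.loc[counts.sum(axis=1).sort_values(ascending=False).index]

# stacked=True draws one bar per Name; the zero-height segment of the other
# industry is invisible, so each bar shows only its own industry's colour
ax = counts.plot.bar(stacked=True, figsize=(14, 8),
                     title="Number for each Owner Name")
ax.set_xlabel("Owner Names")
ax.set_ylabel("Frequency")
```

If a Name can appear under several industries, the same code still works and simply shows a multi-coloured stacked bar for that Name.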

Adding Legends in Pandas Plot

I am plotting density graphs using pandas plot, but I am not able to add appropriate legends for each of the graphs. My code and result are as below:
for i in tickers:
    df = pd.DataFrame(dic_2[i])
    mean=np.average(dic_2[i])
    std=np.std(dic_2[i])
    maximum=np.max(dic_2[i])
    minimum=np.min(dic_2[i])
    df1=pd.DataFrame(np.random.normal(loc=mean,scale=std,size=len(dic_2[i])))
    ax=df.plot(kind='density', title='Returns Density Plot for '+ str(i),colormap='Reds_r')
    df1.plot(ax=ax,kind='density',colormap='Blues_r')
As you can see in the picture, the legend box at the top right shows the entries as 0. How do I add something meaningful there?
print(df.head())
0
0 -0.019043
1 -0.0212065
2 0.0060413
3 0.0229895
4 -0.0189266
I think you may want to restructure the way you've created the graph. An easy way to do this is to create the ax before plotting:
# sample data
df = pd.DataFrame()
df['returns_a'] = np.random.randn(100)
df['returns_b'] = np.random.randn(100)
print(df.head())
returns_a returns_b
0 1.110042 -0.111122
1 -0.045298 -0.140299
2 -0.394844 1.011648
3 0.296254 -0.027588
4 0.603935 1.382290
fig, ax = plt.subplots()
I then created the dataframe using the parameters specified in your variables:
mean=np.average(df.returns_a)
std=np.std(df.returns_a)
maximum=np.max(df.returns_a)
minimum=np.min(df.returns_a)
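normal = pd.DataFrame(np.random.normal(loc=mean, scale=std, size=len(df.returns_a)))
normal = normal.rename(columns={0: 'std_normal'})  # this name becomes the legend label
normal.plot(kind='density', colormap='Blues_r', ax=ax)
# Pass the column as y: a positional first argument is treated as x, which
# would set returns_a as the index and plot the density of returns_b instead
df.plot(y='returns_a', kind='density', ax=ax)
The second (simulated) DataFrame is created with the default column name 0, which is what appears in the legend; renaming the column gives the legend a meaningful label.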
I figured out a simpler way to do this. Just add column names to the dataframes.
for i in tickers:
    df = pd.DataFrame(dic_2[i],columns=['Empirical PDF'])
    print(df.head())
    mean=np.average(dic_2[i])
    std=np.std(dic_2[i])
    maximum=np.max(dic_2[i])
    minimum=np.min(dic_2[i])
    df1=pd.DataFrame(np.random.normal(loc=mean,scale=std,size=len(dic_2[i])),columns=['Normal PDF'])
    ax=df.plot(kind='density', title='Returns Density Plot for '+ str(i),colormap='Reds_r')
    df1.plot(ax=ax,kind='density',colormap='Blues_r')

Pandas groupby results on the same plot

I am dealing with the following data frame (only for illustration, actual df is quite large):
seq x1 y1
0 2 0.7725 0.2105
1 2 0.8098 0.3456
2 2 0.7457 0.5436
3 2 0.4168 0.7610
4 2 0.3181 0.8790
5 3 0.2092 0.5498
6 3 0.0591 0.6357
7 5 0.9937 0.5364
8 5 0.3756 0.7635
9 5 0.1661 0.8364
I am trying to plot multiple line graphs for the above coordinates (x as "x1" against y as "y1").
Rows with the same "seq" form one path and have to be plotted as one separate line; for example, all the x, y coordinates corresponding to seq = 2 belong to one line, and so on.
I am able to plot them, but only on separate graphs. I want all the lines on the same graph; I tried using subplots but am not getting it right.
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib notebook
df.groupby("seq").plot(kind = "line", x = "x1", y = "y1")
This creates hundreds of graphs (one per unique seq). Please suggest a way to get all the lines on the same graph.
**UPDATE**
To resolve the above problem, I implemented the following code:
fig, ax = plt.subplots(figsize=(12,8))
df.groupby('seq').plot(kind='line', x = "x1", y = "y1", ax = ax)
plt.title("abc")
plt.show()
Now I want a way to plot the lines in specific colors. I am clustering the paths: seq = 2 and 5 are in cluster 1, and seq = 3 is in cluster 2.
So there are two lines in cluster 1, which I want in red, and one line in cluster 2, which can be green.
How should I proceed with this?
You need to create the axes before plotting and pass it to each plot call with ax=, as in this example:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# random df
df = pd.DataFrame(np.random.randint(0,10,size=(25, 3)), columns=['ProjID','Xcoord','Ycoord'])
# plot groupby results on the same canvas
fig, ax = plt.subplots(figsize=(8,6))
df.groupby('ProjID').plot(kind='line', x = "Xcoord", y = "Ycoord", ax=ax)
plt.show()
Consider the dataframe df
df = pd.DataFrame(dict(
ProjID=np.repeat(range(10), 10),
Xcoord=np.random.rand(100),
Ycoord=np.random.rand(100),
))
Then we create abstract art like this
df.set_index('Xcoord').groupby('ProjID').Ycoord.plot()
Another way:
for k,g in df.groupby('ProjID'):
    plt.plot(g['Xcoord'],g['Ycoord'])
plt.show()
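The loop form also handles the coloring asked about in the question's update. A minimal sketch, using the sample frame and the cluster assignment from the question (seq 2 and 5 in cluster 1 drawn red, seq 3 in cluster 2 drawn green); cluster_of_seq and cluster_colors are hypothetical names for those mappings.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# The sample frame from the question
df = pd.DataFrame({
    "seq": [2, 2, 2, 2, 2, 3, 3, 5, 5, 5],
    "x1": [0.7725, 0.8098, 0.7457, 0.4168, 0.3181,
           0.2092, 0.0591, 0.9937, 0.3756, 0.1661],
    "y1": [0.2105, 0.3456, 0.5436, 0.7610, 0.8790,
           0.5498, 0.6357, 0.5364, 0.7635, 0.8364],
})

# Cluster of each path, and the color used for each cluster
cluster_of_seq = {2: 1, 5: 1, 3: 2}
cluster_colors = {1: "red", 2: "green"}

fig, ax = plt.subplots(figsize=(12, 8))
for seq, path in df.groupby("seq"):  # one line per path, all on one axes
    cluster = cluster_of_seq[seq]
    ax.plot(path["x1"], path["y1"], color=cluster_colors[cluster],
            label=f"seq {seq} (cluster {cluster})")
ax.set_title("abc")
ax.legend()
```

For hundreds of paths, the cluster mapping would typically come from a column in the DataFrame (or a Series indexed by seq) rather than a hand-written dictionary.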
Here is a working example, including the ability to adjust legend names ('groupCol' stands for your grouping column):
fig, ax = plt.subplots()
grp = df.groupby('groupCol')
legendNames = grp.apply(lambda x: x.name)  # get group names using the name attribute
# legendNames = list(grp.groups.keys())  # alternative way to get the group names; I'm not sure which is faster
plots = grp.plot('x1','y1',legend=True, ax=ax)
for txt, name in zip(ax.legend_.texts, legendNames):
    txt.set_text(name)
Explanation:
Legend values are stored in ax.legend_, whose texts attribute is a list of Text() objects, one per group; the Text class is part of the matplotlib.text API. To change a label, use the setter method set_text(s).
As a side note, the Text class has a number of set_*() methods for changing the font size, font family, colour, etc. I haven't used those, so I don't know for sure that they work here, but I can't see why not.
Based on Serenity's answer, I made the legend better.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# random df
df = pd.DataFrame(np.random.randint(0,10,size=(25, 3)), columns=['ProjID','Xcoord','Ycoord'])
# plot groupby results on the same canvas
grouped = df.groupby('ProjID')
fig, ax = plt.subplots(figsize=(8,6))
grouped.plot(kind='line', x = "Xcoord", y = "Ycoord", ax=ax)
ax.legend(labels=grouped.groups.keys()) ## better legend
plt.show()
You can also do it like this:
grouped = df.groupby('ProjID')
fig, ax = plt.subplots(figsize=(8,6))
g_plot = lambda g: g.plot(x = "Xcoord", y = "Ycoord", ax=ax, label=g.name)  # g.name is the group key
grouped.apply(g_plot)
plt.show()