My plot has two sets of x-ticks and two sets of y-ticks when using matplotlib - python

The code is simple, but the plot shows two sets of x-ticks and two sets of y-ticks, which is strange.
fig = plt.figure(figsize=(16,12))
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(data['timestamp'], data['value'], 'r', label='value')
ax1.set_xlabel('date', fontsize=16)
ax1.set_ylabel('profit', fontsize=16)
ax1.legend(loc='upper left')
ax1.grid(True)
The data is shown below:
0 2010-01-04
1 2010-01-04
2 2010-03-08
3 2010-07-05
4 2010-11-04
Name: timestamp, dtype: datetime64[ns]
0 1.037868
1 1.085912
2 1.092537
3 1.077828
4 1.160641
plot:
I just want data['timestamp'] and data['value'] to be shown on the plot.
I tried adding the code below, but the result is the same.
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax1.xaxis.set_major_locator(mdates.YearLocator())
ax1.xaxis.set_minor_locator(mdates.MonthLocator())
I also printed the x-ticks and y-ticks. The result is below; it contains no values such as 0, 0.2, 0.4, 0.6, 0.8, 1.0.
[14610. 14641. 14669. 14700. 14730. 14761. 14791. 14822. 14853. 14883. 14914.]
[1.02 1.04 1.06 1.08 1.1 1.12 1.14 1.16 1.18]
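The numbers in the first list look like Matplotlib date numbers. As a rough check, assuming they are days counted from 1970-01-01 (Matplotlib's default date epoch since version 3.3; older versions used a different epoch), they can be converted back to calendar dates with the standard library alone. The variable names below are only for illustration.

```python
from datetime import date, timedelta

# x-tick values copied from the printed output above
xticks = [14610, 14641, 14669, 14700, 14730, 14761,
          14791, 14822, 14853, 14883, 14914]

# Assumption: the values are days since 1970-01-01
epoch = date(1970, 1, 1)
tick_dates = [epoch + timedelta(days=n) for n in xticks]

for n, d in zip(xticks, tick_dates):
    print(n, d.isoformat())
```

Under that assumption every tick falls on the first day of a month in 2010, so this list belongs to the date axis and not to a 0 to 1 axis.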

Related

How to plot values in a line plot with string xaxis?

I have this df:
Month MEAN
0 JAN 1.0
1 FEB 2.0
2 MAR 5.0
3 APR 3.0
4 MAY 4.0
5 JUN 2.0
6 JUL 1.0
7 AUG 1.0
8 SEP 0.0
9 OCT 0.0
10 NOV 2.0
11 DEC 3.0
I want to annotate the values in a line plot, so I tried this code:
fig = plt.figure('Graphic', figsize=(20,15), dpi=300)
ax1 = fig.add_axes([0.15, 0.20, 0.70, 0.60])
df.plot(kind='line', marker='o',style=['--'],linewidth=7,color='black', ms=15,ax=ax1)
for x,y in zip(df['Month'],df['MEAN']):
    label = "{:.2f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y),
                 textcoords="offset points",
                 xytext=(0,10),
                 ha='center')
But I get this error:
ConversionError: Failed to convert value(s) to axis units: 'JAN'
How can I solve this?
P.S.: Maybe I should change the df['Month'] values to numbers, but I need to show the string values in the graphic.
Thanks in advance.
This should work:
fig = plt.figure('Graphic', figsize=(20,15), dpi=300)
ax1 = fig.add_axes([0.15, 0.20, 0.70, 0.60])
df.plot(kind='line', marker='o',style=['--'],linewidth=7,color='black', ms=15,ax=ax1)
plt.xticks(range(0,len(df['Month'])), df['Month'])
plt.show()
Let me know if you have any questions.
Because df.plot draws the line against the DataFrame's numeric index, the annotation coordinates must also be numbers, not strings. So the graph can be created by using the DataFrame index and then setting the string tick labels.
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure('Graphic', figsize=(10,7.5), dpi=72)
ax1 = fig.add_axes([0.15, 0.20, 0.70, 0.60])
df.plot(kind='line', marker='o', style=['--'], linewidth=7, color='black', ms=15, ax=ax1)
for x,y in zip(df.index, df['MEAN']):
    label = "{:.2f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y),
                 textcoords="offset points",
                 xytext=(0,10),
                 ha='center')
ax1.set_xticks(np.arange(0,12,1))
ax1.set_xticklabels(df['Month'].unique())
plt.show()
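A variation on the same idea, sketched with a hand-built copy of the table from the question: pass the month strings to ax.plot so that Matplotlib treats them as categories (placed at positions 0, 1, 2, ...), and annotate using those integer positions. The Agg backend line and the variable names are assumptions made for a self-contained script, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Month": ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
              "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
    "MEAN": [1.0, 2.0, 5.0, 3.0, 4.0, 2.0, 1.0, 1.0, 0.0, 0.0, 2.0, 3.0],
})

fig, ax = plt.subplots(figsize=(10, 7.5))
# String x values are treated as categories at positions 0..11
ax.plot(df["Month"], df["MEAN"], color="black", linestyle="--", marker="o")

# Annotate each point at its integer position, not at the string label
for pos, y in enumerate(df["MEAN"]):
    ax.annotate(f"{y:.2f}", (pos, y), textcoords="offset points",
                xytext=(0, 10), ha="center")

# Call plt.show() or fig.savefig(...) here to view the result
```

The tick labels come from the category names, so no separate set_xticklabels call is needed in this version.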

How to plot sequential data, changing the color according to cluster

I have a dataframe with the date and the cluster each date belongs to (the clustering was done beforehand, based on temperatures collected each day). I want to plot this data in sequence, like a stacked bar chart, changing the color of each element according to its assigned cluster. Here is my table (the data goes up to 100 days):
Date        order  ClusterNo2  constant
2020-08-07  1      3.0         1
2020-08-08  2      0.0         1
2020-08-09  3      1.0         1
2020-08-10  4      3.0         1
2020-08-11  5      1.0         1
2020-08-12  6      1.0         1
2020-08-13  7      3.0         1
2020-08-14  8      2.0         1
2020-08-15  9      2.0         1
2020-08-16  10     2.0         1
2020-08-17  11     2.0         1
2020-08-18  12     1.0         1
2020-08-19  13     1.0         1
2020-08-20  14     0.0         1
2020-08-21  15     0.0         1
2020-08-22  16     1.0         1
Note: I can't simply group the data by cluster because the plot should be sequential. I thought of writing code to count the elements of each cluster sequentially, but then I would face the same problem when plotting. Does anyone know how to solve this?
The expected result should be something like this (the numbers inside the bars represent the cluster, the x-axis is the time in days, and the bar width is the number of consecutive days with the same cluster):
You could use the dates for the x-axis, the 'constant' column for the y-axis,
and the Cluster id for the coloring.
You can create a custom legend using a list of colored rectangles.
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import pandas as pd
import numpy as np
N = 100
df = pd.DataFrame({'Date': pd.date_range('2020-08-07', periods=N, freq='D'),
'order': np.arange(1, N + 1),
'ClusterNo2': np.random.randint(0, 4, N).astype(float),
'constant': 1})
df['ClusterNo2'] = df['ClusterNo2'].astype(int) # convert to integers
fig, ax = plt.subplots(figsize=(15, 3))
num_clusters = df['ClusterNo2'].max() + 1
colors = plt.cm.Set2.colors
ax.bar(x=range(len(df)), height=df['constant'], width=1, color=[colors[i] for i in df['ClusterNo2']], edgecolor='none')
ax.set_xticks(range(len(df)))
labels = ['' if i % 3 != 0 else day.strftime('%d\n%b %Y') if i == 0 or day.day <= 3 else day.strftime('%d')
for i, day in enumerate(df['Date'])]
ax.set_xticklabels(labels)
ax.margins(x=0, y=0)
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
legend_handles = [plt.Rectangle((0, 0), 0, 0, color=colors[i], label=f'{i}') for i in range(num_clusters)]
ax.legend(handles=legend_handles, title='Clusters', bbox_to_anchor=(1.01, 1.01), loc='upper left')
fig.tight_layout()
plt.show()
You could just plot a normal bar graph, with 1 bar corresponding to 1 day. If you make the width also 1, it will look as if the patches are contiguous.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# simulate data
total_datapoints = 16
total_clusters = 4
order = np.arange(total_datapoints)
clusters = np.random.randint(0, total_clusters, size=total_datapoints)
# map clusters to colors
cmap = plt.cm.tab10
bounds = np.arange(total_clusters + 1)
norm = BoundaryNorm(bounds, cmap.N)
colors = [cmap(norm(cluster)) for cluster in clusters]
# plot
fig, ax = plt.subplots()
ax.bar(order, np.ones_like(order), width=1, color=colors, align='edge')
# xticks
change_points = np.where(np.diff(clusters) != 0)[0] + 1
change_points = np.unique([0] + change_points.tolist() + [total_datapoints])
ax.set_xticks(change_points)
# annotate clusters
for ii, dx in enumerate(np.diff(change_points)):
    xx = change_points[ii] + dx/2
    ax.text(xx, 0.5, str(clusters[int(xx)]), ha='center', va='center')
ax.set_xlabel('Time (days)')
plt.show()
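The asker's own idea, counting how many consecutive days share a cluster, can be sketched with itertools.groupby. This is only an illustration: the cluster list is copied from the 16 rows of the table in the question (converted to integers), and the helper name cluster_runs is made up here.

```python
from itertools import groupby

# ClusterNo2 values for the 16 days shown in the question's table
clusters = [3, 0, 1, 3, 1, 1, 3, 2, 2, 2, 2, 1, 1, 0, 0, 1]


def cluster_runs(values):
    """Return (cluster, start_index, run_length) for each run of equal values."""
    runs = []
    start = 0
    for cluster, group in groupby(values):
        length = sum(1 for _ in group)  # number of consecutive equal values
        runs.append((cluster, start, length))
        start += length
    return runs


runs = cluster_runs(clusters)
for cluster, start, length in runs:
    print(f"cluster {cluster}: starts at day {start}, lasts {length} day(s)")
```

Each (start, length) pair has the shape that Axes.broken_barh accepts in its xranges argument, so the runs could be drawn as one rectangle per cluster change instead of one bar per day.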

Seaborn Concatenated Bar Charts Different Colors

I have this dataframe called cases_deaths:
week daily_case_totals daily_death_totals
0 1 2.0 0.0
1 2 12.0 0.0
2 3 12.0 0.0
3 4 2.0 0.0
4 5 573.0 6.0
5 6 3134.0 12.0
6 7 3398.0 32.0
7 8 992.0 25.0
.
.
.
And this code to generate two Seaborn charts:
fig, axes = plt.subplots(2, 1, figsize=(11, 10))
for name, ax in zip(['daily_case_totals', 'daily_death_totals'], axes):
    sns.barplot(data=cases_deaths, x='week', y=name, ax=ax, color = 'red')
And the chart looks like this:
But I want the top one to be blue and bottom to be red. Not sure how to do that, I've tried passing in a list of colors to the color parameter in the for loop but that yielded an error.
Just add one more iterable to zip for the colors:
import seaborn as sns
fig, axes = plt.subplots(2, 1, figsize=(11, 10))
for name, color, ax in zip(('daily_case_totals', 'daily_death_totals'),
                           ('blue', 'red'),
                           axes):
    sns.barplot(data=cases_deaths, x='week', y=name, ax=ax, color=color)
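The same zip pattern also works with plain Matplotlib bars, which makes it easy to check that each axes received its own color. A minimal sketch, assuming the eight rows shown in the question stand in for the full cases_deaths frame; the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as plt
import pandas as pd

cases_deaths = pd.DataFrame({
    "week": [1, 2, 3, 4, 5, 6, 7, 8],
    "daily_case_totals": [2.0, 12.0, 12.0, 2.0, 573.0, 3134.0, 3398.0, 992.0],
    "daily_death_totals": [0.0, 0.0, 0.0, 0.0, 6.0, 12.0, 32.0, 25.0],
})

columns = ("daily_case_totals", "daily_death_totals")
colors = ("blue", "red")

fig, axes = plt.subplots(2, 1, figsize=(11, 10))
# zip pairs the i-th column with the i-th color and the i-th axes
for name, color, ax in zip(columns, colors, axes):
    ax.bar(cases_deaths["week"], cases_deaths[name], color=color)
    ax.set_xlabel("week")
    ax.set_ylabel(name)

# Face colors of the first bar on each axes, as RGBA tuples
top_color = tuple(axes[0].patches[0].get_facecolor())
bottom_color = tuple(axes[1].patches[0].get_facecolor())
print(top_color, bottom_color)
```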

Wrong Dates in Dataframe and Subplots

I am trying to plot the data from my CSV file. Currently the dates are not shown properly in the plot, even though I am converting them. How can I make them appear in the Y-m-d format? My second question: I am currently plotting all the data in one plot, but I want one subplot for every Valuegroup.
My code looks like the following:
import pandas as pd
import matplotlib.pyplot as plt
csv_loader = pd.read_csv('C:/Test.csv', encoding='cp1252', sep=';', index_col=0).dropna()
csv_loader['Date'] = pd.to_datetime(csv_loader['Date'], format="%Y-%m-%d")
print(csv_loader)
fig, ax = plt.subplots()
csv_loader.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
plt.grid(True)
The csv file looks like the following:
Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45
You can just tell pandas to parse that column as a datetime and it will just work:
In[151]:
import io
import matplotlib.pyplot as plt
import pandas as pd
t="""Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45"""
df = pd.read_csv(io.StringIO(t), parse_dates=['Date'], sep=';', index_col=0)
df
Out[151]:
Valuegroup id Date Value
Calcgroup
Group1 A 1 2008-01-03 0.10
Group1 A 1 2008-01-04 0.30
Group1 A 1 2008-01-07 0.50
Group1 A 1 2008-01-08 0.90
Group1 B 1 2008-01-03 0.50
Group1 B 1 2008-01-04 1.30
Group1 B 1 2008-01-07 2.00
Group1 B 1 2008-01-08 0.15
Group1 C 1 2008-01-03 1.90
Group1 C 1 2008-01-04 2.10
Group1 C 1 2008-01-07 2.90
Group1 C 1 2008-01-08 0.45
fig, ax = plt.subplots()
df.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
plt.grid(True)
plt.show()
results in:
Besides, your format string was incorrect anyway; it should be:
csv_loader['Date'] = pd.to_datetime(csv_loader['Date'], format="%Y%m%d")
However, this won't work, because that column will have been loaded with an int dtype, so you would need to convert it to string first:
csv_loader['Date'] = pd.to_datetime(csv_loader['Date'].astype(str), format="%Y%m%d")
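The difference between the two format strings can be checked with the standard library alone. A small sketch using one Date value from the CSV; datetime.strptime uses the same %Y, %m and %d codes.

```python
from datetime import datetime

raw = "20080103"  # one Date value from the CSV, as a string

# "%Y%m%d" matches the value: four-digit year, month, day, no separators
parsed = datetime.strptime(raw, "%Y%m%d")
print(parsed.date())

# "%Y-%m-%d" expects hyphens between the parts, so the same value is rejected
try:
    datetime.strptime(raw, "%Y-%m-%d")
    hyphen_format_ok = True
except ValueError:
    hyphen_format_ok = False
print(hyphen_format_ok)
```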
To format the dates on the x-axis, you can use DateFormatter from matplotlib. See the related question: Editing the date formatting of x-axis tick labels in matplotlib
from matplotlib.dates import DateFormatter
fig, ax = plt.subplots()
df.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
plt.grid(True)
myFmt = DateFormatter("%d-%m-%Y")
ax.xaxis.set_minor_formatter(myFmt)
plt.show()
now gives plot:
You're parsing your dates wrong; "%Y-%m-%d" would work for dates like 2017-12-11 (which is Dec 11, 2017). Your dates are of the form "%Y%m%d", without the hyphens.
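For the second part of the question, one subplot per Valuegroup, a possible sketch is to create one axes per group and loop over the groupby result. The layout choices (stacked axes, shared x-axis, the Agg backend) and the variable names are assumptions, not taken from the answers above.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import pandas as pd

csv_text = """Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45"""

df = pd.read_csv(io.StringIO(csv_text), sep=";")
# The Date column is read as integers, so convert to string before parsing
df["Date"] = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")

groups = list(df.groupby("Valuegroup"))
# squeeze=False keeps a 2-D array of axes even if there is only one group
fig, axes = plt.subplots(len(groups), 1, sharex=True, squeeze=False,
                         figsize=(8, 6))
axes_flat = axes.ravel()
for ax, (name, group) in zip(axes_flat, groups):
    ax.plot(group["Date"], group["Value"])
    ax.set_title(f"Valuegroup {name}")
    ax.grid(True)
    ax.xaxis.set_major_formatter(DateFormatter("%Y-%m-%d"))
fig.tight_layout()
```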

Matplotlib graph displaying aggregate functions in a strange manner

I've run into the following problem while trying to display data from a DataFrame with Matplotlib. The idea is to build a line graph where the Y-axis is the mean score for each gamer and the X-axis is the number of shots performed. I have applied aggregate functions to the data in my DataFrame, but the resulting graph doesn't look as I expected.
Here is what I've done so far:
The DataFrame
Score Gamer Shots
a 5.0 gamer1 7
b 3.0 gamer2 2
c 2.5 gamer1 8
d 7.1 gamer3 9
e 1.8 gamer3 2
f 2.2 gamer3 1
The Plot
plt.title('Plot 1', size=14)
plt.xlabel('Number of Shots', size=14)
plt.ylabel('Mean Score', size=14)
plt.grid(b=True, which='major', color='g', linestyle='-')
x = df[['Gamer','Shots']].groupby(['Gamer']).count()
y = df[['Gamer','Score']].groupby(['Gamer']).mean()
plt.plot(x, y)
IIUC, you need something like this:
In [52]: df.groupby('Gamer').agg({'Score':'mean','Shots':'count'}).plot()
Out[52]: <matplotlib.axes._subplots.AxesSubplot at 0xb41e710>
corresponding data:
In [54]: df.groupby('Gamer').agg({'Score':'mean','Shots':'count'})
Out[54]:
Score Shots
Gamer
gamer1 3.75 2
gamer2 3.00 1
gamer3 3.70 3
UPDATE:
I need just a single line plot for displaying the dependency of mean
score of a gamer (Y-axis) on the number of shots(X-axis)
In [90]: df.groupby('Gamer').agg({'Score':'mean','Shots':'count'}).set_index('Shots').plot()
Out[90]: <matplotlib.axes._subplots.AxesSubplot at 0xbe749b0>
UPDATE2:
In [155]: g = df.groupby('Gamer').agg({'Score':'mean','Shots':'count'}).sort_values('Shots')
In [156]: x,y = g['Shots'], g['Score']
In [157]: plt.plot(x, y)
Out[157]: [<matplotlib.lines.Line2D at 0xbdbf668>]
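UPDATE2 sorts by Shots because plt.plot connects points in the order they are given, so unsorted x values make the line double back on itself. A self-contained sketch with the six rows from the question (the Agg backend line is an assumption so that it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"Score": [5.0, 3.0, 2.5, 7.1, 1.8, 2.2],
     "Gamer": ["gamer1", "gamer2", "gamer1", "gamer3", "gamer3", "gamer3"],
     "Shots": [7, 2, 8, 9, 2, 1]},
    index=list("abcdef"),
)

# Mean score and number of rows per gamer, ordered by the x values
g = (df.groupby("Gamer")
       .agg({"Score": "mean", "Shots": "count"})
       .sort_values("Shots"))

fig, ax = plt.subplots()
ax.plot(g["Shots"], g["Score"], marker="o")
ax.set_xlabel("Number of Shots")
ax.set_ylabel("Mean Score")
print(g)
```

Note that 'count' here gives the number of rows per gamer, as in the answer above, not the sum of the Shots column.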
