My matplotlib plot shows two sets of x ticks and two sets of y ticks


The code is simple, but the plot shows two sets of x ticks and two sets of y ticks, which is strange.
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
fig = plt.figure(figsize=(16,12))
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(data['timestamp'], data['value'], 'r', label='value')
ax1.set_xlabel('date', fontsize=16)
ax1.set_ylabel('profit', fontsize=16)
ax1.legend(loc='upper left')
ax1.grid(True)
The data looks like this:
0 2010-01-04
1 2010-01-04
2 2010-03-08
3 2010-07-05
4 2010-11-04
Name: timestamp, dtype: datetime64[ns]
0 1.037868
1 1.085912
2 1.092537
3 1.077828
4 1.160641
plot:
I only want data['timestamp'] and data['value'] to appear in the figure.
I tried adding the code below, but the result is the same.
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax1.xaxis.set_major_locator(mdates.YearLocator())
ax1.xaxis.set_minor_locator(mdates.MonthLocator())
I printed the x-tick and y-tick locations; the result is below. It contains none of the values 0, 0.2, 0.4, 0.6, 0.8, 1.0.
[14610. 14641. 14669. 14700. 14730. 14761. 14791. 14822. 14853. 14883. 14914.]
[1.02 1.04 1.06 1.08 1.1 1.12 1.14 1.16 1.18]
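
The code as posted creates only one Axes, so the extra ticks at 0, 0.2, ..., 1.0 probably belong to a second, empty Axes created somewhere else in the session. That is an assumption; the question does not confirm it. The sketch below reproduces the symptom with a hypothetical extra Axes, then shows one way to detect and remove it. The sample data and the `add_axes` rectangle are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical reproduction: plt.subplots() already creates one Axes ...
fig, ax0 = plt.subplots(figsize=(8, 6))
# ... and a second Axes is then stacked on top at the default subplot position.
ax1 = fig.add_axes([0.125, 0.11, 0.775, 0.77])
ax1.plot([14610, 14700, 14914], [1.04, 1.09, 1.16], "r", label="value")

# More than one Axes in the figure means overlapping tick sets are possible.
n_axes_before = len(fig.axes)
print(n_axes_before)
print(ax0.get_xlim())  # the unused Axes keeps its default limits

# One possible fix: remove every Axes that has nothing plotted on it.
for ax in list(fig.axes):
    if not ax.has_data():
        ax.remove()
print(len(fig.axes))
```

If `len(fig.axes)` is already 1 for the real figure, the cause lies elsewhere and this sketch does not apply.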

Related

How to plot values in a line plot with string xaxis?

I have this df:
Month MEAN
0 JAN 1.0
1 FEB 2.0
2 MAR 5.0
3 APR 3.0
4 MAY 4.0
5 JUN 2.0
6 JUL 1.0
7 AUG 1.0
8 SEP 0.0
9 OCT 0.0
10 NOV 2.0
11 DEC 3.0
I want to annotate the values in a line plot, so I tried this code:
fig = plt.figure('Graphic', figsize=(20,15), dpi=300)
ax1 = fig.add_axes([0.15, 0.20, 0.70, 0.60])
df.plot(kind='line', marker='o',style=['--'],linewidth=7,color='black', ms=15,ax=ax1)
for x,y in zip(df['Month'],df['MEAN']):
    label = "{:.2f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y),
                 textcoords="offset points",
                 xytext=(0,10),
                 ha='center')
But I get this error:
ConversionError: Failed to convert value(s) to axis units: 'JAN'
How can I solve this?
P.S.: Maybe I should change the df['Month'] values to numbers, but I need the string values to appear on the plot.
Thanks in advance.

This should work:
fig = plt.figure('Graphic', figsize=(20,15), dpi=300)
ax1 = fig.add_axes([0.15, 0.20, 0.70, 0.60])
df.plot(kind='line', marker='o',style=['--'],linewidth=7,color='black', ms=15,ax=ax1)
plt.xticks(range(0,len(df['Month'])), df['Month'])
plt.show()
Let me know if you have any questions.

As you are aware, the x-axis value must be a number, not a string, so the graph can be created by using the data frame index and then setting the string ticks.
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure('Graphic', figsize=(10,7.5), dpi=72)
ax1 = fig.add_axes([0.15, 0.20, 0.70, 0.60])
df.plot(kind='line', marker='o', style=['--'], linewidth=7, color='black', ms=15, ax=ax1)
for x,y in zip(df.index, df['MEAN']):
    label = "{:.2f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y),
                 textcoords="offset points",
                 xytext=(0,10),
                 ha='center')
ax1.set_xticks(np.arange(0,12,1))
ax1.set_xticklabels(df['Month'].unique())
plt.show()
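
The answers above assume that df already exists. A self-contained sketch of the same idea (plot against integer positions, then relabel the ticks with the month strings) could look like the following. The DataFrame is rebuilt from the table in the question, and the variable name `positions` is only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Month": ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
              "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
    "MEAN": [1.0, 2.0, 5.0, 3.0, 4.0, 2.0, 1.0, 1.0, 0.0, 0.0, 2.0, 3.0],
})

fig, ax = plt.subplots(figsize=(10, 7.5))
positions = list(range(len(df)))  # numeric x values: 0, 1, ..., 11

ax.plot(positions, df["MEAN"], "--o", color="black")

# Annotate each point using its numeric position, not the month string.
for x, y in zip(positions, df["MEAN"]):
    ax.annotate(f"{y:.2f}", (x, y), textcoords="offset points",
                xytext=(0, 10), ha="center")

# Show the month names on the axis while the data stays numeric.
ax.set_xticks(positions)
ax.set_xticklabels(df["Month"].tolist())
```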

How to plot sequential data, changing the color according to cluster

I have a dataframe with the date and the cluster each date belongs to (the clustering was done beforehand, based on the temperatures collected for each day). I want to plot this data in sequence, like a stacked bar chart, changing the color of each element according to the assigned cluster. Here is my table (the data goes up to 100 days):
Date        order  ClusterNo2  constant
2020-08-07      1         3.0         1
2020-08-08      2         0.0         1
2020-08-09      3         1.0         1
2020-08-10      4         3.0         1
2020-08-11      5         1.0         1
2020-08-12      6         1.0         1
2020-08-13      7         3.0         1
2020-08-14      8         2.0         1
2020-08-15      9         2.0         1
2020-08-16     10         2.0         1
2020-08-17     11         2.0         1
2020-08-18     12         1.0         1
2020-08-19     13         1.0         1
2020-08-20     14         0.0         1
2020-08-21     15         0.0         1
2020-08-22     16         1.0         1
Note: I can't simply group the data by cluster because the plot should be sequential. I thought about writing code to count the elements of each cluster run sequentially, but then I would face the same problem when plotting. Does anyone know how to solve this?
The expected result should be something like this (the numbers inside the bars represent the cluster, the x-axis is the time in days, and the bar width is the number of consecutive days with the same cluster):

You could use the dates for the x-axis, the 'constant' column for the y-axis,
and the Cluster id for the coloring.
You can create a custom legend using a list of colored rectangles.
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import pandas as pd
import numpy as np
N = 100
df = pd.DataFrame({'Date': pd.date_range('2020-08-07', periods=N, freq='D'),
                   'order': np.arange(1, N + 1),
                   'ClusterNo2': np.random.randint(0, 4, N).astype(float),
                   'constant': 1})
df['ClusterNo2'] = df['ClusterNo2'].astype(int) # convert to integers
fig, ax = plt.subplots(figsize=(15, 3))
num_clusters = df['ClusterNo2'].max() + 1
colors = plt.cm.Set2.colors
ax.bar(x=range(len(df)), height=df['constant'], width=1, color=[colors[i] for i in df['ClusterNo2']], edgecolor='none')
ax.set_xticks(range(len(df)))
labels = ['' if i % 3 != 0 else day.strftime('%d\n%b %Y') if i == 0 or day.day <= 3 else day.strftime('%d')
          for i, day in enumerate(df['Date'])]
ax.set_xticklabels(labels)
ax.margins(x=0, y=0)
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
legend_handles = [plt.Rectangle((0, 0), 0, 0, color=colors[i], label=f'{i}') for i in range(num_clusters)]
ax.legend(handles=legend_handles, title='Clusters', bbox_to_anchor=(1.01, 1.01), loc='upper left')
fig.tight_layout()
plt.show()

You could just plot a normal bar graph, with 1 bar corresponding to 1 day. If you make the width also 1, it will look as if the patches are contiguous.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# simulate data
total_datapoints = 16
total_clusters = 4
order = np.arange(total_datapoints)
clusters = np.random.randint(0, total_clusters, size=total_datapoints)
# map clusters to colors
cmap = plt.cm.tab10
bounds = np.arange(total_clusters + 1)
norm = BoundaryNorm(bounds, cmap.N)
colors = [cmap(norm(cluster)) for cluster in clusters]
# plot
fig, ax = plt.subplots()
ax.bar(order, np.ones_like(order), width=1, color=colors, align='edge')
# xticks
change_points = np.where(np.diff(clusters) != 0)[0] + 1
change_points = np.unique([0] + change_points.tolist() + [total_datapoints])
ax.set_xticks(change_points)
# annotate clusters
for ii, dx in enumerate(np.diff(change_points)):
    xx = change_points[ii] + dx/2
    ax.text(xx, 0.5, str(clusters[int(xx)]), ha='center', va='center')
ax.set_xlabel('Time (days)')
plt.show()
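
The question also mentions counting consecutive days per cluster. A minimal pandas sketch of that run-length step is shown below, assuming the 16 cluster values from the question's table. The names `run_id`, `runs`, `length`, and `start` are illustrative, not from the original posts.

```python
import pandas as pd

# ClusterNo2 values for 2020-08-07 .. 2020-08-22, copied from the question.
clusters = pd.Series([3, 0, 1, 3, 1, 1, 3, 2, 2, 2, 2, 1, 1, 0, 0, 1])

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those change flags gives each run its own id.
run_id = (clusters != clusters.shift()).cumsum()

# One row per run: which cluster it is and how many days it lasts.
runs = clusters.groupby(run_id).agg(cluster="first", length="size")

# Day offset at which each run begins (0-based).
runs["start"] = runs["length"].cumsum() - runs["length"]
print(runs)
```

Each row of `runs` could then be drawn as a single wide bar, for example with `ax.barh(0, width=runs["length"], left=runs["start"])`, instead of one bar per day.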

Seaborn Concatenated Bar Charts Different Colors

I have this dataframe called cases_deaths:
week daily_case_totals daily_death_totals
0 1 2.0 0.0
1 2 12.0 0.0
2 3 12.0 0.0
3 4 2.0 0.0
4 5 573.0 6.0
5 6 3134.0 12.0
6 7 3398.0 32.0
7 8 992.0 25.0
.
.
.
And this code to generate two Seaborn charts:
fig, axes = plt.subplots(2, 1, figsize=(11, 10))
for name, ax in zip(['daily_case_totals', 'daily_death_totals'], axes):
    sns.barplot(data=cases_deaths, x='week', y=name, ax=ax, color = 'red')
And the chart looks like this:
But I want the top one to be blue and bottom to be red. Not sure how to do that, I've tried passing in a list of colors to the color parameter in the for loop but that yielded an error.

Just add one more iterable to zip for the colors:
import matplotlib.pyplot as plt
import seaborn as sns
fig, axes = plt.subplots(2, 1, figsize=(11, 10))
for name, color, ax in zip(('daily_case_totals', 'daily_death_totals'),
                           ('blue', 'red'),
                           axes):
    sns.barplot(data=cases_deaths, x='week', y=name, ax=ax, color=color)

Wrong Dates in Dataframe and Subplots

I am trying to plot the data from my csv file. Currently my dates are not shown properly in the plot, even though I am converting them. How can I make the plot show the dates in the Y-m-d format? My second question: I am currently plotting all the data in one plot, but I want one subplot for every Valuegroup.
My code looks like the following:
import pandas as pd
import matplotlib.pyplot as plt
csv_loader = pd.read_csv('C:/Test.csv', encoding='cp1252', sep=';', index_col=0).dropna()
csv_loader['Date'] = pd.to_datetime(csv_loader['Date'], format="%Y-%m-%d")
print(csv_loader)
fig, ax = plt.subplots()
csv_loader.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
plt.grid(True)
The csv file looks like the following:
Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45

You can just tell pandas to parse that column as a datetime and it will just work:
In[151]:
import io
import pandas as pd
import matplotlib.pyplot as plt
t="""Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45"""
df = pd.read_csv(io.StringIO(t), parse_dates=['Date'], sep=';', index_col=0)
df
Out[151]:
Valuegroup id Date Value
Calcgroup
Group1 A 1 2008-01-03 0.10
Group1 A 1 2008-01-04 0.30
Group1 A 1 2008-01-07 0.50
Group1 A 1 2008-01-08 0.90
Group1 B 1 2008-01-03 0.50
Group1 B 1 2008-01-04 1.30
Group1 B 1 2008-01-07 2.00
Group1 B 1 2008-01-08 0.15
Group1 C 1 2008-01-03 1.90
Group1 C 1 2008-01-04 2.10
Group1 C 1 2008-01-07 2.90
Group1 C 1 2008-01-08 0.45
fig, ax = plt.subplots()
df.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
plt.grid(True)
plt.show()
results in:
Besides, your format string was incorrect anyway; it should be:
csv_loader['Date'] = pd.to_datetime(csv_loader['Date'], format="%Y%m%d")
However, this won't work because that column will have been loaded with an int dtype, so you would need to convert it to string first:
csv_loader['Date'] = pd.to_datetime(csv_loader['Date'].astype(str), format="%Y%m%d")
To format the dates on the x-axis you can use DateFormatter from matplotlib; see the related question: Editing the date formatting of x-axis tick labels in matplotlib
from matplotlib.dates import DateFormatter
fig, ax = plt.subplots()
df.groupby('Valuegroup').plot(x='Date', y='Value', ax=ax, legend=False, kind='line')
plt.grid(True)
myFmt = DateFormatter("%d-%m-%Y")
ax.xaxis.set_minor_formatter(myFmt)
plt.show()
now gives plot:

You're parsing your dates wrong; "%Y-%m-%d" would work for dates like 2017-12-11 (which is Dec 11, 2017). Your dates are of the form "%Y%m%d", without the hyphens.
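
The second part of the question (one subplot per Valuegroup) is not covered by the answers. A minimal sketch of one way to do it follows, assuming the CSV content from the question and parsing the dates with the "%Y%m%d" format discussed above. The layout choices (stacked rows, shared x-axis, titles) are illustrative only.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import DateFormatter

# CSV content copied from the question.
csv_text = """Calcgroup;Valuegroup;id;Date;Value
Group1;A;1;20080103;0.1
Group1;A;1;20080104;0.3
Group1;A;1;20080107;0.5
Group1;A;1;20080108;0.9
Group1;B;1;20080103;0.5
Group1;B;1;20080104;1.3
Group1;B;1;20080107;2.0
Group1;B;1;20080108;0.15
Group1;C;1;20080103;1.9
Group1;C;1;20080104;2.1
Group1;C;1;20080107;2.9
Group1;C;1;20080108;0.45"""

df = pd.read_csv(io.StringIO(csv_text), sep=";")
# The Date column is read as integers, so convert to str before parsing.
df["Date"] = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")

groups = df.groupby("Valuegroup")
# One row of the figure per Valuegroup, all sharing the same date axis.
fig, axes = plt.subplots(groups.ngroups, 1, sharex=True, figsize=(8, 6))

for ax, (name, group) in zip(axes, groups):
    ax.plot(group["Date"], group["Value"], marker="o")
    ax.set_title(f"Valuegroup {name}")
    ax.grid(True)
    ax.xaxis.set_major_formatter(DateFormatter("%Y-%m-%d"))

fig.autofmt_xdate()  # rotate the date labels so they do not overlap
fig.tight_layout()
```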

Matplotlib graph displaying aggregate functions in a strange manner

I've run into the following problem while trying to display data from a DataFrame with Matplotlib. The idea is to build a line graph where the Y-axis is the mean score for each gamer and the X-axis is the number of shots performed. I have applied aggregate functions to the data in my DataFrame, but the resulting graph doesn't look as I expected.
Here is what I've done so far:
The DataFrame
Score Gamer Shots
a 5.0 gamer1 7
b 3.0 gamer2 2
c 2.5 gamer1 8
d 7.1 gamer3 9
e 1.8 gamer3 2
f 2.2 gamer3 1
The Plot
plt.title('Plot 1', size=14)
plt.xlabel('Number of Shots', size=14)
plt.ylabel('Mean Score', size=14)
plt.grid(True, which='major', color='g', linestyle='-')
x = df[['Gamer','Shots']].groupby(['Gamer']).count()
y = df[['Gamer','Score']].groupby(['Gamer']).mean()
plt.plot(x, y)

IIUC, you need something like this:
In [52]: df.groupby('Gamer').agg({'Score':'mean','Shots':'count'}).plot()
Out[52]: <matplotlib.axes._subplots.AxesSubplot at 0xb41e710>
corresponding data:
In [54]: df.groupby('Gamer').agg({'Score':'mean','Shots':'count'})
Out[54]:
Score Shots
Gamer
gamer1 3.75 2
gamer2 3.00 1
gamer3 3.70 3
UPDATE:
"I need just a single line plot for displaying the dependency of mean score of a gamer (Y-axis) on the number of shots (X-axis)"
In [90]: df.groupby('Gamer').agg({'Score':'mean','Shots':'count'}).set_index('Shots').plot()
Out[90]: <matplotlib.axes._subplots.AxesSubplot at 0xbe749b0>
UPDATE2:
In [155]: g = df.groupby('Gamer').agg({'Score':'mean','Shots':'count'}).sort_values('Shots')
In [156]: x,y = g['Shots'], g['Score']
In [157]: plt.plot(x, y)
Out[157]: [<matplotlib.lines.Line2D at 0xbdbf668>]
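
For reference, a self-contained sketch of the aggregation step, with the DataFrame rebuilt from the question. Note that 'count' counts rows per gamer rather than adding up the Shots values; the `total_shots` column uses 'sum' as an alternative, which is an assumption about what the asker may have intended. The output column names are illustrative.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame(
    {
        "Score": [5.0, 3.0, 2.5, 7.1, 1.8, 2.2],
        "Gamer": ["gamer1", "gamer2", "gamer1", "gamer3", "gamer3", "gamer3"],
        "Shots": [7, 2, 8, 9, 2, 1],
    },
    index=list("abcdef"),
)

g = (
    df.groupby("Gamer")
      .agg(
          mean_score=("Score", "mean"),   # mean score per gamer
          n_rows=("Shots", "count"),      # number of rows per gamer
          total_shots=("Shots", "sum"),   # sum of the Shots column per gamer
      )
      .sort_values("n_rows")              # sort so a line plot runs left to right
)
print(g)
```

Sorting by the x column before calling plt.plot is what keeps the line from doubling back on itself.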
