Matplotlib graph displaying aggregate functions in a strange manner - python

I ran into the following problem while trying to display data from a DataFrame with Matplotlib. The idea is to build a line graph where the Y-axis is the mean score for each gamer and the X-axis is the number of shots performed. I have applied aggregate functions to the data in my DataFrame, but the resulting graph doesn't look the way I expected.
Here is what I've done so far:
The DataFrame
   Score   Gamer  Shots
a    5.0  gamer1      7
b    3.0  gamer2      2
c    2.5  gamer1      8
d    7.1  gamer3      9
e    1.8  gamer3      2
f    2.2  gamer3      1
The Plot
plt.title('Plot 1', size=14)
plt.xlabel('Number of Shots', size=14)
plt.ylabel('Mean Score', size=14)
plt.grid(b=True, which='major', color='g', linestyle='-')
x = df[['gamer','shots']].groupby(['gamer']).count()
y = df[['gamer','score']].groupby(['gamer']).mean()
plt.plot(x, y)

IIUC, you need something like this:
In [52]: df.groupby('Gamer').agg({'Score':'mean','Shots':'count'}).plot()
Out[52]: <matplotlib.axes._subplots.AxesSubplot at 0xb41e710>
corresponding data:
In [54]: df.groupby('Gamer').agg({'Score':'mean','Shots':'count'})
Out[54]:
        Score  Shots
Gamer
gamer1   3.75      2
gamer2   3.00      1
gamer3   3.70      3
UPDATE (in response to the comment):
"I need just a single line plot for displaying the dependency of mean score of a gamer (Y-axis) on the number of shots (X-axis)"
In [90]: df.groupby('Gamer').agg({'Score':'mean','Shots':'count'}).set_index('Shots').plot()
Out[90]: <matplotlib.axes._subplots.AxesSubplot at 0xbe749b0>
UPDATE2:
In [155]: g = df.groupby('Gamer').agg({'Score':'mean','Shots':'count'}).sort_values('Shots')
In [156]: x,y = g['Shots'], g['Score']
In [157]: plt.plot(x, y)
Out[157]: [<matplotlib.lines.Line2D at 0xbdbf668>]
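For reference, the steps above can be put together in one self-contained sketch. It rebuilds the DataFrame from the question and uses named aggregation; the output column names `mean_score` and `n_shots` and the forced `Agg` backend are choices made for this illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# The DataFrame from the question
df = pd.DataFrame(
    {
        "Score": [5.0, 3.0, 2.5, 7.1, 1.8, 2.2],
        "Gamer": ["gamer1", "gamer2", "gamer1", "gamer3", "gamer3", "gamer3"],
        "Shots": [7, 2, 8, 9, 2, 1],
    },
    index=list("abcdef"),
)

# One row per gamer: mean score and number of rows recorded for that gamer.
# Sorting by the x values keeps the line from doubling back on itself.
g = (
    df.groupby("Gamer")
    .agg(mean_score=("Score", "mean"), n_shots=("Shots", "count"))
    .sort_values("n_shots")
)

fig, ax = plt.subplots()
ax.plot(g["n_shots"], g["mean_score"], marker="o")
ax.set_title("Plot 1", size=14)
ax.set_xlabel("Number of Shots", size=14)
ax.set_ylabel("Mean Score", size=14)
ax.grid(True, which="major", color="g", linestyle="-")
```

Note that `count` gives the number of rows per gamer, as in the answer above. If the X-axis should instead be the total number of shots, `("Shots", "sum")` would be the aggregation to use.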

Related

my plot picture have two xticks and two yticks by using matplotlib

The code is very simple, but the plot shows two sets of x-ticks and two sets of y-ticks, which is strange:
fig = plt.figure(figsize=(16,12))
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(data['timestamp'], data['value'], 'r', label='value')
ax1.set_xlabel('date', fontsize=16)
ax1.set_ylabel('profit', fontsize=16)
ax1.legend(loc='upper left')
ax1.grid(True)
The data values are shown below:
0 2010-01-04
1 2010-01-04
2 2010-03-08
3 2010-07-05
4 2010-11-04
Name: timestamp, dtype: datetime64[ns]
0 1.037868
1 1.085912
2 1.092537
3 1.077828
4 1.160641
plot:
I just want data['timestamp'] and data['value'] to be shown in the picture.
I have tried adding the code below, but the result is the same.
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax1.xaxis.set_major_locator(mdates.YearLocator())
ax1.xaxis.set_minor_locator(mdates.MonthLocator())
I printed the x-ticks and y-ticks of ax1; the result is below. It contains no values such as 0, 0.2, 0.4, 0.6, 0.8, 1.0.
[14610. 14641. 14669. 14700. 14730. 14761. 14791. 14822. 14853. 14883. 14914.]
[1.02 1.04 1.06 1.08 1.1 1.12 1.14 1.16 1.18]
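Ticks at 0, 0.2, ..., 1.0 are what an empty Axes shows by default, so one thing worth checking is whether a second Axes has ended up on the same figure. This is only a guess, because the full script is not shown. The sketch below reproduces that symptom on purpose: the `extra` Axes is a hypothetical culprit added for illustration, and counting `fig.axes` is the diagnostic.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# The values from the question
data = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2010-01-04", "2010-01-04", "2010-03-08", "2010-07-05", "2010-11-04"]
        ),
        "value": [1.037868, 1.085912, 1.092537, 1.077828, 1.160641],
    }
)

fig = plt.figure(figsize=(8, 6))
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(data["timestamp"], data["value"], "r", label="value")

n_axes_clean = len(fig.axes)  # only ax1 exists at this point

# Hypothetical culprit: a second, empty Axes covering the same area.
# Its default 0..1 limits are drawn on top of the ticks of ax1.
extra = fig.add_axes(ax1.get_position().bounds, label="extra")
n_axes_overlap = len(fig.axes)

print(n_axes_clean, n_axes_overlap, extra.get_xlim())
```

If `len(fig.axes)` is greater than 1 in the real script, looking for an earlier `plt.subplots()`, `plt.axes()` or similar call on the same figure would be a reasonable next step.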

How to interpolate values between points

I have the dataset shown below:
temp = [0.1, 1, 4, 10, 15, 20, 25, 30, 35, 40]
sg =[0.999850, 0.999902, 0.999975, 0.999703, 0.999103, 0.998207, 0.997047, 0.995649, 0.99403, 0.99222]
sg_temp = pd.DataFrame({'temp': temp,
                        'sg': sg})
   temp        sg
0   0.1  0.999850
1   1.0  0.999902
2   4.0  0.999975
3  10.0  0.999703
4  15.0  0.999103
5  20.0  0.998207
6  25.0  0.997047
7  30.0  0.995649
8  35.0  0.994030
9  40.0  0.992220
I would like to interpolate all the values between 0.1 and 40 in steps of 0.001 with a spline interpolation and have those points in the DataFrame as well. I have used resample() before but can't seem to find an equivalent for this case.
I have tried the following, based on other questions, but it doesn't work.
scale = np.linspace(0, 40, 40*1000)
interpolation_sg = interpolate.CubicSpline(list(sg_temp.temp), list(sg_temp.sg))
It works very well for me. What exactly does not work for you?
Have you correctly used the returned CubicSpline to generate your interpolated values? Or is there some kind of error?
Basically you obtain your interpolated y values by plugging in the new x values (scale) to your returned CubicSpline function:
y = interpolation_sg(scale)
I believe this is the issue here. You probably expect that the interpolation function returns you the values, but it returns a function. And you use this function to obtain your values.
If I plot this, I obtain this graph:
import matplotlib.pyplot as plt
plt.plot(sg_temp['temp'], sg_temp['sg'], marker='o', ls='') # Plots the original data
plt.plot(scale, interpolation_sg(scale)) # Plots the interpolated data
Call the result of the interpolation with scale:
from scipy import interpolate
out = pd.DataFrame(
    {'temp': scale,
     'sg': interpolate.CubicSpline(sg_temp['temp'],
                                   sg_temp['sg'])(scale)
    })
Visual output:
Code for the plot
ax = plt.subplot()
out.plot(x='temp', y='sg', label='interpolated', ax=ax)
sg_temp.plot(x='temp', y='sg', marker='o', label='sg', ls='', ax=ax)
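The question asks for points between 0.1 and 40 in steps of 0.001, while `np.linspace(0, 40, 40*1000)` starts at 0 (outside the data) and its spacing is not exactly 0.001. A minimal sketch of a grid that matches the stated range, with the point count computed so that both end points are included (the names `spline`, `grid` and `n_points` are chosen for this illustration):

```python
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline

temp = [0.1, 1, 4, 10, 15, 20, 25, 30, 35, 40]
sg = [0.999850, 0.999902, 0.999975, 0.999703, 0.999103,
      0.998207, 0.997047, 0.995649, 0.99403, 0.99222]
sg_temp = pd.DataFrame({"temp": temp, "sg": sg})

# CubicSpline returns a callable; evaluating it on new x values gives the
# interpolated y values.
spline = CubicSpline(sg_temp["temp"].to_numpy(), sg_temp["sg"].to_numpy())

# 0.1, 0.101, ..., 40.0: (40 - 0.1) / 0.001 intervals, plus one for the end point
n_points = int(round((40 - 0.1) / 0.001)) + 1
grid = np.linspace(0.1, 40, n_points)

out = pd.DataFrame({"temp": grid, "sg": spline(grid)})
print(out.head())
```

Because the grid stays inside the range of the measured temperatures, every value in `out` is interpolated rather than extrapolated.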

How to plot sequential data, changing the color according to cluster

I have a dataframe with the date and the cluster each date belongs to (the clustering was done beforehand, based on the temperatures collected for each day). I want to plot this data in sequence, like a stacked bar chart, changing the color of each element according to its assigned cluster. Here is my table (the data goes up to 100 days):
Date        order  ClusterNo2  constant
2020-08-07      1         3.0         1
2020-08-08      2         0.0         1
2020-08-09      3         1.0         1
2020-08-10      4         3.0         1
2020-08-11      5         1.0         1
2020-08-12      6         1.0         1
2020-08-13      7         3.0         1
2020-08-14      8         2.0         1
2020-08-15      9         2.0         1
2020-08-16     10         2.0         1
2020-08-17     11         2.0         1
2020-08-18     12         1.0         1
2020-08-19     13         1.0         1
2020-08-20     14         0.0         1
2020-08-21     15         0.0         1
2020-08-22     16         1.0         1
Note: I can't simply group the data by cluster, because the plot should be sequential. I thought about writing code to count the number of consecutive elements in each cluster, but then I would face the same problem when plotting. Does anyone know how to solve this?
The expected result should be something like this (the numbers inside the bars represent the cluster, the x-axis is the time in days, and the bar width is the number of consecutive days observed with the same cluster):
You could use the dates for the x-axis, the 'constant' column for the y-axis,
and the Cluster id for the coloring.
You can create a custom legend using a list of colored rectangles.
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import pandas as pd
import numpy as np
N = 100
df = pd.DataFrame({'Date': pd.date_range('2020-08-07', periods=N, freq='D'),
                   'order': np.arange(1, N + 1),
                   'ClusterNo2': np.random.randint(0, 4, N).astype(float),
                   'constant': 1})
df['ClusterNo2'] = df['ClusterNo2'].astype(int) # convert to integers
fig, ax = plt.subplots(figsize=(15, 3))
num_clusters = df['ClusterNo2'].max() + 1
colors = plt.cm.Set2.colors
ax.bar(x=range(len(df)), height=df['constant'], width=1, color=[colors[i] for i in df['ClusterNo2']], edgecolor='none')
ax.set_xticks(range(len(df)))
labels = ['' if i % 3 != 0 else day.strftime('%d\n%b %Y') if i == 0 or day.day <= 3 else day.strftime('%d')
          for i, day in enumerate(df['Date'])]
ax.set_xticklabels(labels)
ax.margins(x=0, y=0)
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
legend_handles = [plt.Rectangle((0, 0), 0, 0, color=colors[i], label=f'{i}') for i in range(num_clusters)]
ax.legend(handles=legend_handles, title='Clusters', bbox_to_anchor=(1.01, 1.01), loc='upper left')
fig.tight_layout()
plt.show()
You could just plot a normal bar graph, with 1 bar corresponding to 1 day. If you make the width also 1, it will look as if the patches are contiguous.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# simulate data
total_datapoints = 16
total_clusters = 4
order = np.arange(total_datapoints)
clusters = np.random.randint(0, total_clusters, size=total_datapoints)
# map clusters to colors
cmap = plt.cm.tab10
bounds = np.arange(total_clusters + 1)
norm = BoundaryNorm(bounds, cmap.N)
colors = [cmap(norm(cluster)) for cluster in clusters]
# plot
fig, ax = plt.subplots()
ax.bar(order, np.ones_like(order), width=1, color=colors, align='edge')
# xticks
change_points = np.where(np.diff(clusters) != 0)[0] + 1
change_points = np.unique([0] + change_points.tolist() + [total_datapoints])
ax.set_xticks(change_points)
# annotate clusters
for ii, dx in enumerate(np.diff(change_points)):
    # center of the run that starts at change_points[ii]
    xx = change_points[ii] + dx/2
    ax.text(xx, 0.5, str(clusters[int(xx)]), ha='center', va='center')
ax.set_xlabel('Time (days)')
plt.show()
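The question also mentions counting consecutive days with the same cluster so that each bar is as wide as its run. A minimal sketch of that idea, using the 16 rows shown in the question: the `shift`/`cumsum` trick labels each run, and `barh` draws one rectangle per run. The `runs` table and its column names are made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# ClusterNo2 values for the 16 days listed in the question
clusters = pd.Series([3, 0, 1, 3, 1, 1, 3, 2, 2, 2, 2, 1, 1, 0, 0, 1])

# A new run starts whenever the cluster differs from the previous day's cluster.
run_id = (clusters != clusters.shift()).cumsum()
runs = (
    clusters.groupby(run_id)
    .agg(cluster="first", length="count")
    .reset_index(drop=True)
)
runs["start"] = runs["length"].cumsum() - runs["length"]  # first day of each run

colors = plt.cm.Set2.colors
fig, ax = plt.subplots(figsize=(10, 2))
ax.barh(
    y=0,
    width=runs["length"].to_numpy(),
    left=runs["start"].to_numpy(),
    height=1,
    color=[colors[c] for c in runs["cluster"]],
    edgecolor="white",
)
# Write the cluster number in the middle of each run
for _, run in runs.iterrows():
    ax.text(run["start"] + run["length"] / 2, 0, str(int(run["cluster"])),
            ha="center", va="center")
ax.set_yticks([])
ax.set_xlabel("Time (days)")
```

Compared with one bar per day, this draws fewer patches and makes the run lengths available as data in `runs`.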

display last N values in x axis range as label in matplotlib in Python

In my Python script, df.Value holds a set of n values (200). I need the last 100 values as my x-axis labels, i.e. the values at index 100-200.
plt.figure(figsize=(100, 5), dpi=100)
plt.plot(df['Time'], df['sale'], label='sales')
plt.xlabel('Time ')
plt.ylabel('sales')
plt.title('sales')
plt.legend()
plt.show()
It shows values 0-200 on the x-axis, but I need only the last N values as x-axis labels.
Sample data (sales and time):
1 604.802656 13:00:00
2 604.400000 13:01:00
3 604.900024 13:02:00
4 604.099976 13:03:00
5 604.000000 13:04:00
6 604.250000 13:05:00
7 604.400024 13:06:00
8 604.150024 13:07:00
9 604.000000 13:08:00
plt.xticks(np.arange(100),df['Time'].values[100:200])
This shows 100 x-axis labels taken from the last 100 values.
Try this:
plt.xticks(np.arange(100, 200, step=1))
For your case, i.e. Time on the x-axis, see this post: https://stackoverflow.com/a/16428019/5202279
plt.figure(figsize=(100, 5), dpi=100)
plt.plot(df['Time'], df['sale'], label='sales')
plt.xlabel('Time ')
plt.xticks(np.arange(100),df.Time[100:200],rotation=45)
plt.ylabel('sales')
plt.title('sales')
plt.legend()
plt.show()
np.arange(100) gives the 100 x-axis tick positions to show,
df.Time[100:200] takes the last 100 string values from df.Time,
and rotation=45 rotates the labels by 45 degrees.
thanks for your support
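If the goal is to show only the last N points, with their own time labels, one alternative is to slice the DataFrame before plotting instead of relabelling the ticks. The sketch below assumes that reading of the question and uses synthetic data in place of the real `df`: 200 one-minute samples starting at 13:00:00, with random sales values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the data in the question
n = 200
times = pd.date_range("2021-01-01 13:00:00", periods=n, freq="min").strftime("%H:%M:%S")
rng = np.random.default_rng(0)
df = pd.DataFrame({"sale": 604 + rng.normal(0, 0.3, n), "Time": times})

last = df.tail(100)  # keep only the last 100 rows

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(last["Time"], last["sale"], label="sales")

# Strings are plotted on a categorical axis at positions 0..99,
# so every 10th position gets a tick to keep the labels readable.
ax.set_xticks(range(0, len(last), 10))
ax.tick_params(axis="x", rotation=45)
ax.set_xlabel("Time")
ax.set_ylabel("sales")
ax.set_title("sales")
ax.legend()
fig.tight_layout()
```

Slicing first keeps the tick labels and the plotted points in agreement, because the labels come from the same rows that are drawn.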

Create a plot with x axis as timestamp and y axis as shifted price

I am new to time-series programming with pandas. Can somebody help me with this?
Create a plot with the timestamp on the x-axis and the shifted price on the y-axis. In the plot, draw the following dotted lines:
A green dotted line that indicates the mean.
Say the mean of the shifted price distribution is 0.5 and the standard deviation is 2.25.
The line should be y = 0.5, i.e. a horizontal line parallel to the x-axis.
Red dotted lines that indicate one standard deviation above and below the x-axis.
The lines should be y = 2.25 and y = -2.25.
The following sample image shows the shifted price on the y-axis, time on the x-axis, a green dotted line at the mean and red dotted lines at +/- one standard deviation.
Here is the sample data:
0 2017-11-05 09:20:01.134 2123.0 12.23 34.12 300.0
1 2017-11-05 09:20:01.789 2133.0 32.43 45.62 330.0
2 2017-11-05 09:20:02.238 2423.0 35.43 55.62 NaN
3 2017-11-05 09:20:02.567 3423.0 65.43 56.62 NaN
4 2017-11-05 09:20:02.948 2463.0 45.43 58.62 NaN
Consider your price as a Series and plot it as follows:
import numpy as np
import pandas as pd
# Date index
rng = pd.date_range('1/1/2000', periods=1000)
# Create a random Series
ts = pd.Series(np.random.randn(len(rng)), index=rng)
# Create the plot
ax = ts.plot()
# Plot the mean
ax.axhline(y=ts.mean(), color='r', linestyle='--', lw=2)
# Plot mean +/- 1.96 standard deviations (roughly a 95% interval for normally distributed data)
ax.axhline(y=ts.mean() + 1.96*np.sqrt(np.var(ts)), color='g', linestyle=':', lw=2)
ax.axhline(y=ts.mean() - 1.96*np.sqrt(np.var(ts)), color='g', linestyle=':', lw=2)
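The answer above draws the mean in red and a 1.96-sigma band in green, whereas the question asks for a green line at the mean and red lines at plus and minus one standard deviation. A minimal sketch with the colors and levels from the question follows. It runs on synthetic prices, and it assumes that "shifted price" means the price minus the previous price, which the question does not define.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic price series standing in for the real data
rng = np.random.default_rng(0)
idx = pd.date_range("2017-11-05 09:20:00", periods=500, freq="min")
price = pd.Series(2000 + rng.normal(0, 2, len(idx)).cumsum(), index=idx)

# Assumption: "shifted price" = change relative to the previous observation
shifted = (price - price.shift(1)).dropna()

mean = shifted.mean()
std = shifted.std()

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(shifted.index, shifted.to_numpy(), label="shifted price")
ax.axhline(mean, color="g", linestyle=":", lw=2, label="mean")
# As in the question, the red lines sit at y = +std and y = -std
ax.axhline(std, color="r", linestyle=":", lw=2, label="+1 std")
ax.axhline(-std, color="r", linestyle=":", lw=2, label="-1 std")
ax.set_xlabel("timestamp")
ax.set_ylabel("shifted price")
ax.legend()
```

To center the band on the mean instead of on the x-axis, the red lines would go at `mean + std` and `mean - std`.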
