Using matplotlib to do a scatter plot - Python

When I try to plot with matplotlib, I see an empty figure with nothing drawn on it. I am attaching the code and the blank figure. Please let me know what I am missing. Thanks!
[Image: empty window with no plot]
from datetime import datetime
start_time = datetime.now()
print(start_time)
import pandas as pd
import numpy as np
file1 = 'fn_data.csv'
import matplotlib.pyplot as plt
#import pylab
# Read the .txt file into a dataframe
data = pd.read_csv(file1, encoding = "ISO-8859-1", header=0, delimiter=',')
rating=data.iloc[:,0]
chef=data.iloc[:,3]
print(rating)
mydict={}
i = 0
for item in chef:
    if(i>0 and item in mydict):
        continue
    else:
        i = i+1
        mydict[item] = i
chef_codes=[]
for item in chef:
    chef_codes.append(mydict[item])
print(chef_codes)
chef_codes_new=np.array(chef_codes)
rating_new=np.array(rating)
print(type(chef_codes_new),type(rating_new))
print(np.max(chef_codes_new),np.max(rating_new))
plt.plot(kind='scatter',x=chef_codes_new,y=rating_new, marker='o', ms = 10, alpha=1, color='b')
plt.axis([0, 1000, 0, 5])
plt.show()
plt.savefig("fig1.png")
end_time = datetime.now()
print(end_time)

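def scatterPlot(X,Y):
    # color each point by its label: green for 0, red otherwise
    ids= ['green' if y == 0 else 'red' for y in Y]
    plt.scatter(X[:,0], X[:,1], color=ids)
    plt.title("Scatter Plot")
    return
# x is expected to be a 2-D array of points and y the matching labels
scatterPlot(x,y)
This is only an example of a scatter plot; I hope it helps you solve your problem.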

If plt.scatter works, plt.plot will work as well. The problem is that you have mixed the syntax of pandas.DataFrame.plot with the command of pyplot.plot.
So, instead of
plt.plot(kind='scatter',x=chef_codes_new,y=rating_new, marker='o', ms = 10, alpha=1, color='b')
you need
plt.plot(chef_codes_new,rating_new, marker='o', ms = 10, alpha=1, color='b')
The syntax is plt.plot(x,y, *args, **kwargs) and you cannot use kind="scatter" in a pyplot plot. If you want a scatter plot, use plt.scatter.
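As a minimal sketch of the plt.scatter route mentioned above: the data frame below is a made-up stand-in for fn_data.csv (the column names and values are assumptions, not the asker's data), and pd.factorize is swapped in for the manual dictionary loop that assigns a code to each chef. The figure is saved before it is shown, because with interactive backends the figure is usually gone once the window from show() has been closed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the rating and chef columns read from fn_data.csv
data = pd.DataFrame({
    "rating": [4.5, 3.0, 5.0, 2.5, 4.0, 3.5],
    "chef": ["Ann", "Bob", "Ann", "Cy", "Bob", "Dee"],
})

# factorize assigns 0, 1, 2, ... in order of first appearance;
# add 1 so the codes start at 1 like in the original loop
codes, uniques = pd.factorize(data["chef"])
chef_codes = codes + 1

fig, ax = plt.subplots()
# positional x and y, no kind= argument
ax.scatter(chef_codes, data["rating"], s=100, alpha=1, color="b")
ax.set_xlim(0, chef_codes.max() + 1)
ax.set_ylim(0, 5.5)

fig.savefig("fig1.png")  # save first, then show
plt.close(fig)
```

With a real file, the two columns would come from pd.read_csv as in the question; the axis limits here follow the data instead of the hard-coded [0, 1000, 0, 5].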

Related

How can I keep the graphing result of a Python function while calling it multiple times?

I am having fun creating yield curves for a "put" option. I made a function that plots a curve from its arguments. I want to call it with several different sets of arguments and display the results at the same time to compare them. This is what I have so far:
import matplotlib.pyplot as plt
import numpy as np
#counter=1
def my_function(option,strike,bid,price):
    if option=="put":
        breakeven=price-bid
        x=[breakeven-10,breakeven,price,price+bid]
        y=[0]*len(x)
        i=0
        while i<len(x):
            if x[i]<price:
                y[i]=(x[i]*-100) + breakeven*100
            else:
                y[i]=-100*bid
            print(x[i],y[i])
            i+=1
        plt.figure(counter)
        plt.plot(x, y, label = str(strike))
        #naming the x axis
        plt.xlabel('price')
        #naming the y axis
        plt.ylabel('profit')
        plt.show()
        #counter+=1
my_function("put",90,20,100)
my_function("put",90,10,100)
However, instead of generating another figure, it just replaces it.
I've tried using a global counter and calling plt.figure(counter) before my plot, but it doesn't accept an incrementing counter.
You need to take both fig, ax = plt.subplots() and plt.show() out of the function to achieve this. Otherwise you are writing over the same plot.
So the code will look like this:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
def my_function(option,strike,bid,price):
    if option=="put":
        breakeven=price-bid
        x=[breakeven-10,breakeven,price,price+bid]
        y=[0]*len(x)
        i=0
        while i<len(x):
            if x[i]<price:
                y[i]=(x[i]*-100) + breakeven*100
            else:
                y[i]=-100*bid
            # print(x[i],y[i])
            i+=1
        ax.plot(x, y, label = str(strike))
        #naming the x axis
        ax.set_xlabel('price')
        #naming the y axis
        ax.set_ylabel('profit')
        # plt.show() <-- remove this !!
my_function("put",90,20,100)
my_function("put",90,10,100)
# plot all lines
plt.show()
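A variation on the same idea, sketched under the assumption that only the "put" case matters: instead of relying on a module-level ax, the function can receive the axes as an argument, which makes it explicit where each curve is drawn. The name plot_put, the legend label format, and the output file name are illustrative choices, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt

def plot_put(ax, strike, bid, price):
    """Draw one put curve on the given axes and return its points."""
    breakeven = price - bid
    x = [breakeven - 10, breakeven, price, price + bid]
    # same arithmetic as the while loop in the question
    y = [(breakeven - xi) * 100 if xi < price else -100 * bid for xi in x]
    ax.plot(x, y, label=f"strike {strike}, bid {bid}")
    return x, y

fig, ax = plt.subplots()   # created once, outside the function
plot_put(ax, 90, 20, 100)
plot_put(ax, 90, 10, 100)
ax.set_xlabel("price")
ax.set_ylabel("profit")
ax.legend()
fig.savefig("put_curves.png")  # or plt.show() once, after all calls
```

Because both calls share one ax, both curves end up in the same figure. Calling plt.subplots() once per call would instead give one figure per curve.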

Plotting a scatterplot gif from a dataframe

I have a DataFrame with 6 rows and 4 columns. Is there a way to generate an animated scatter plot (a GIF) in which the four columns are plotted in different colors on the y-axis against the row index on a shared x-axis? In the first frame, the first data point of each of the four columns should be plotted; each following frame should add the next point for all four columns, until the sixth and last point has been plotted. If a GIF is not possible at all, is there another way to generate at least a movie that I can include in a PowerPoint slide? I appreciate any feedback you might have! My code generates an empty plot and raises TypeError: cannot unpack non-iterable AxesSubplot object, but I am not sure whether this error is what prevents the plot from being drawn.
This is a sample of my data and code effort:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import random
from itertools import count
from IPython import display
row_data = np.arange(0, 6)
column_X = np.random.rand(6,)
column_Y = np.random.rand(6,)
column_Z = np.random.rand(6,)
column_K = np.random.rand(6,)
my_df = pd.DataFrame()
my_df['column_X'] = column_X
my_df['column_Y'] = column_Y
my_df['column_Z'] = column_Z
my_df['column_K'] = column_K
my_df.index = row_data
my_df['index'] = row_data
def animate(j):
    fig, ax = plt.subplot(sharex= True)
    ax[1]=my_df['column_X', color = 'blue']
    ax[2]=my_df['column_Y', color = 'red']
    ax[3]=my_df['column_Z', color = 'brown']
    ax[4]=my_df['column_K', color = 'green']
    y=my_df['index']
    x.append()
    y.append()
    plt.xlabel(color = 'blue')
    plt.ylabel(color = 'red')
    ax.set_ylabel("progressive sales through time")
    ax.set_xlabel("progressive time")
    plt.plot(x,y)
animation_1 = animation.FuncAnimation(plt.gcf(),animate,interval=1000)
plt.show()
# Inside Jupyter:
video_1 = animation_1.to_html5_video()
html_code_1 = display.HTML(video_1)
display.display(html_code_1)
plt.tight_layout()
plt.show()
Good question! matplotlib animations can be tricky. I struggled a bit with this one, mainly because you want different colors for the different columns. You need 4 different Line2D objects to do this.
# VSCode notebook magic
%matplotlib widget
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
my_df = pd.DataFrame()
my_df["column_X"] = np.random.rand(6)
my_df["column_Y"] = np.random.rand(6)
my_df["column_Z"] = np.random.rand(6)
my_df["column_K"] = np.random.rand(6)
fig, ax = plt.subplots()
# four y-data lists, x-data is shared
xdata, y1, y2, y3, y4 = [], [], [], [], []
# four Line2D objects with different colors
graph1, = ax.plot([], [], 'ro-')
graph2, = ax.plot([], [], 'go-')
graph3, = ax.plot([], [], 'bo-')
graph4, = ax.plot([], [], 'ko-')
# set up the plot
plt.xlim(-1, 6)
plt.xlabel('Time')
plt.ylim(0, 1)
plt.ylabel('Price')
# animation function
def animate(i):
    xdata.append(i)
    y1.append(my_df.iloc[i,0])
    y2.append(my_df.iloc[i,1])
    y3.append(my_df.iloc[i,2])
    y4.append(my_df.iloc[i,3])
    graph1.set_data(xdata, y1)
    graph2.set_data(xdata, y2)
    graph3.set_data(xdata, y3)
    graph4.set_data(xdata, y4)
    return (graph1,graph2,graph3,graph4,)
anim = animation.FuncAnimation(fig, animate, frames=6, interval=500, blit=True)
anim.save('test.mp4')
#plt.show()
Here's the resulting .gif (converted from .mp4 using Adobe Express):
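Since the question asks for a GIF, here is a sketch of writing one directly from matplotlib with animation.PillowWriter, which relies on the Pillow package rather than ffmpeg. The random data, the file name test.gif, and the fps value are placeholders. Each frame slices the first i + 1 rows of the DataFrame instead of appending to lists, so the lines do not keep growing if the animation function is called more than once for the same frame.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as animation

# Placeholder data: 6 rows, 4 columns of random values in [0, 1)
rng = np.random.default_rng(0)
my_df = pd.DataFrame(rng.random((6, 4)),
                     columns=["column_X", "column_Y", "column_Z", "column_K"])

fig, ax = plt.subplots()
ax.set_xlim(-1, 6)
ax.set_ylim(0, 1)
ax.set_xlabel("Time")
ax.set_ylabel("Price")

# one Line2D per column, each with its own color
colors = ["r", "g", "b", "k"]
lines = [ax.plot([], [], color + "o-", label=name)[0]
         for color, name in zip(colors, my_df.columns)]
ax.legend(loc="upper right")

def animate(i):
    # frame i shows rows 0..i of every column
    x = my_df.index[: i + 1]
    for line, name in zip(lines, my_df.columns):
        line.set_data(x, my_df[name].iloc[: i + 1])
    return lines

anim = animation.FuncAnimation(fig, animate, frames=len(my_df),
                               interval=500, blit=True)
anim.save("test.gif", writer=animation.PillowWriter(fps=2))
```

The resulting test.gif can be inserted into a slide like any other image, without a separate conversion step.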

How to make the mplcursors module only show labels for points plotted on a line graph

So I have a line graph where the mplcursors module in python shows the coordinates for any point on it.
I want it to only show labels for points that are explicitly plotted, not the ones that are in-between the plotted points and happen to be on the line connecting them.
I am willing to update the question with the code if you want.
One approach is to create an invisible scatterplot for the same points, and attach the mplcursor to it.
import matplotlib.pyplot as plt
import numpy as np
import mplcursors
x = np.arange(30)
y = 30 + np.random.randint(-5, 6, x.size).cumsum()
fig, ax = plt.subplots()
ax.plot(x, y)
dots = ax.scatter(x, y, color='none')
mplcursors.cursor(dots, hover=True)
plt.show()
The functionality could be wrapped into a helper function:
import matplotlib.pyplot as plt
import numpy as np
import mplcursors
def create_mplcursor_for_points_on_line(lines, ax=None, annotation_func=None, **kwargs):
    ax = ax or plt.gca()
    scats = [ax.scatter(x=line.get_xdata(), y=line.get_ydata(), color='none') for line in lines]
    cursor = mplcursors.cursor(scats, **kwargs)
    if annotation_func is not None:
        cursor.connect('add', annotation_func)
    return cursor
x = np.arange(10, 301, 10)
y = 30 + np.random.randint(-5, 6, x.size).cumsum()
fig, ax = plt.subplots()
lines = ax.plot(x, y)
cursor = create_mplcursor_for_points_on_line(lines, ax=ax, hover=True)
plt.show()
I cannot show the whole code, but I have found a good solution to this problem: you can read the data points of the line inside an mplcursors callback.
# Label function
def show_datapoints(sel):
    # sel[0] is the selected line artist
    xi, yi = sel[0], sel[0]
    xi, yi = xi._xorig.tolist(), yi._yorig.tolist()
    sel.annotation.set_text('x: '+ str(xi[round(sel.target.index)]) +'\n'+ 'y: '+ str(yi[round(sel.target.index)]))
Connect show_datapoints to the cursor's 'add' event to show the data points:
mplcursors.cursor(self.ax1).connect('add',show_datapoints)

Matplotlib and plt.connect

Good Morning
I have made a bar chart, and I want a horizontal line to change its position on the chart in response to a mouse event. I'm a novice and couldn't get it to work; the code is below.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(12345)
df = pd.DataFrame([np.random.normal(32000,200000,3650),
np.random.normal(43000,100000,3650),
np.random.normal(43500,140000,3650),
np.random.normal(48000,70000,3650)],
index=[1992,1993,1994,1995])
df['mean']=df.mean(axis=1)
df['std']=df.std(axis=1)
fig, ax = plt.subplots()
years = df.index.values.tolist()
averages = df['mean'].values.tolist()
stds =df['std'].values.tolist()
x_pos = np.arange(len(years))
min_value = int(df.values.min())
max_value =int(df.values.max())
yaxis = np.arange(min_value,max_value, 100)
plt.bar(x_pos,averages, color='red')
ax.set_xticks(x_pos)
ax.set_xticklabels(years)
ax.set_ylabel('Values')
ax.set_title('Average values per year')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
line = ax.axhline(y=10000,color='black')
traza=[]
def onclick(event):
    if not event.inaxes:
        return
    traza.append('onclick '+event.ydata())
    line.set_data([0,1], event.ydata())
plt.connect('button_press_event', onclick)
plt.show()
I can't even get the onclick procedure done. Could you help me?
Thank you
Several things are going wrong:
event.ydata is not a function, so you can't call it as event.ydata(). Just use it directly.
When some graphical information changes, the image on the screen isn't updated immediately (as there can be many changes and redrawing continuously could be very slow). After all changes are done, calling fig.canvas.draw() will update the screen.
'onclick ' + event.ydata doesn't work. 'onclick ' is a string and ydata is a number. To concatenate a string and a number, first convert the number to a string: 'onclick ' + str(event.ydata)
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
def onclick(event):
    if not event.inaxes:
        return
    # set_ydata expects a sequence, so pass the clicked y value for both end points
    line.set_ydata([event.ydata, event.ydata])
    fig.canvas.draw()
np.random.seed(12345)
df = pd.DataFrame([np.random.normal(32000, 200000, 3650),
np.random.normal(43000, 100000, 3650),
np.random.normal(43500, 140000, 3650),
np.random.normal(48000, 70000, 3650)],
index=[1992, 1993, 1994, 1995])
df['mean'] = df.mean(axis=1)
df['std'] = df.std(axis=1)
fig, ax = plt.subplots()
years = df.index.values.tolist()
averages = df['mean'].values.tolist()
stds = df['std'].values.tolist()
x_pos = np.arange(len(years))
min_value = int(df.values.min())
max_value = int(df.values.max())
yaxis = np.arange(min_value, max_value, 100)
plt.bar(x_pos, averages, color='red')
ax.set_xticks(x_pos)
ax.set_xticklabels(years)
ax.set_ylabel('Values')
ax.set_title('Average values per year')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
line = ax.axhline(y=10000, color='black', linestyle=':')
plt.connect('button_press_event', onclick)
plt.show()
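The three points above can be checked without clicking by hand. The sketch below uses placeholder bar heights instead of the random data, keeps a trace list like the traza list in the question, and simulates a single click by building a MouseEvent and sending it through the canvas callback registry. The simulated click is only there for demonstration; in normal use the GUI backend generates the event.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.backend_bases import MouseEvent

fig, ax = plt.subplots()
ax.bar([0, 1, 2, 3], [32000, 43000, 43500, 48000], color="red")  # placeholder heights
line = ax.axhline(y=10000, color="black", linestyle=":")
trace = []

def onclick(event):
    if event.inaxes is not ax:
        return
    # ydata is an attribute (no parentheses) and must be converted before concatenation
    trace.append("onclick " + str(event.ydata))
    line.set_ydata([event.ydata, event.ydata])
    fig.canvas.draw()  # redraw so the moved line becomes visible

cid = fig.canvas.mpl_connect("button_press_event", onclick)

# Simulate one click at data coordinates (1.5, 20000)
fig.canvas.draw()  # make sure the axes have their final position
x_pix, y_pix = ax.transData.transform((1.5, 20000))
click = MouseEvent("button_press_event", fig.canvas, x_pix, y_pix, button=1)
fig.canvas.callbacks.process("button_press_event", click)

print(trace)
print(line.get_ydata())
```

After the simulated click, the line sits at roughly y = 20000; the value can differ slightly because the click position is rounded to whole pixels.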

How to force the plot to show the x-axis values in python

I have an issue making a simple plot while setting the x-axis in python.
Here is my code:
import import matplotlib.pyplot as plt
y = [2586.087776040828,2285.8044466570227,1991.0556336526986,1719.7261325405243,1479.8272625661773,1272.5176077500348,1096.4367842436593,949.02201512882527,826.89866676342137,726.37921828890637,636.07392349697909,553.52559247838076,480.71257022562935,418.00424110010181,364.41801903538288,318.67575156686001,280.17668207838426,248.15399589447813,221.75070551820284,199.59983992701842,179.72014852370447,162.27141772637697,147.14507926321306,134.22828323366301,123.36572367962557,114.33589702168332,106.8825327470323,100.69181027167537,95.515144406404971,91.091036326792434]
x = range(0,30)
fig3_4 ,ax3_4 = plt.subplots()
ax3_4.semilogx(range(0,30),(loss_ave_hist))
ax3_4.set_title('Validation Performance')
# ax3_4.set_xticks(np.arange(0,30, 1.0))
ax3_4.set_xlabel('i')
ax3_4.set_ylabel('Average Loss')
fig3_4.show()
plt.show()
I believe my code is right! Notice the line of code I commented out: it should set the axis ticks to the values I want, but it throws an error. I cannot figure out why!
Here is my plot:
Here is a way to make a semilogx plot but with xticks labelled according to their original (non-log) values.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
y = np.array([2586.087776040828, 2285.8044466570227, 1991.0556336526986, 1719.7261325405243, 1479.8272625661773, 1272.5176077500348, 1096.4367842436593, 949.02201512882527, 826.89866676342137, 726.37921828890637, 636.07392349697909, 553.52559247838076, 480.71257022562935, 418.00424110010181, 364.41801903538288, 318.67575156686001, 280.17668207838426, 248.15399589447813, 221.75070551820284, 199.59983992701842, 179.72014852370447, 162.27141772637697, 147.14507926321306, 134.22828323366301, 123.36572367962557, 114.33589702168332, 106.8825327470323, 100.69181027167537, 95.515144406404971, 91.091036326792434])
x = np.arange(1, len(y)+1)
fig, ax = plt.subplots()
ax.plot(x, y, 'o-')
ax.set_xlim(x.min(), x.max())
ax.set_xscale('log')
formatter = mticker.ScalarFormatter()
ax.xaxis.set_major_formatter(formatter)
ax.xaxis.set_major_locator(mticker.FixedLocator(np.arange(0, x.max()+1, 5)))
plt.show()
yields
FixedLocator(np.arange(0, x.max()+1, 5)) places a tick mark at every 5th value in x.
With ax.xaxis.set_major_locator(mticker.FixedLocator(x)), the xticklabels got a bit too crowded.
Note I changed x = range(0, 30) to x = np.arange(1, len(y)+1) since the length of x should match the length of y and since we are using a logarithmic x-axis, it does not make sense to start at x=0.
Notice also that in your original code the first y value (2586.08...) is missing since its associated x value, 0, is off-the-chart on a logarithmic scale.
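A small numeric check, using only NumPy, illustrates both remarks: x = 0 has no finite position on a log axis, and the gaps between consecutive integers shrink quickly, which is why a tick at every integer looks crowded.

```python
import numpy as np

x = np.arange(0, 30)

# position of each x value on a log10 axis; log10(0) is -inf
with np.errstate(divide="ignore"):
    pos = np.log10(x.astype(float))

# distance between neighbouring integers, skipping x = 0
gaps = np.diff(pos[1:])

print(pos[0])              # -inf, so the point at x = 0 cannot be drawn
print(round(gaps[0], 3))   # 0.301, the gap between 1 and 2
print(round(gaps[-1], 3))  # 0.015, the gap between 28 and 29
```

The gap between 28 and 29 is about twenty times smaller than the gap between 1 and 2, so labels at the right end of the axis overlap unless only some of them are shown.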
I used the following and it ran without errors.
All I changed was the typo in the first line of your imports, and I replaced loss_ave_hist with y (i.e. what you called your data in your question).
y = [2586.087776040828,2285.8044466570227,1991.0556336526986,1719.7261325405243,1479.8272625661773,1272.5176077500348,1096.4367842436593,949.02201512882527,826.89866676342137,726.37921828890637,636.07392349697909,553.52559247838076,480.71257022562935,418.00424110010181,364.41801903538288,318.67575156686001,280.17668207838426,248.15399589447813,221.75070551820284,199.59983992701842,179.72014852370447,162.27141772637697,147.14507926321306,134.22828323366301,123.36572367962557,114.33589702168332,106.8825327470323,100.69181027167537,95.515144406404971,91.091036326792434]
import matplotlib.pyplot as plt
fig3_4 ,ax3_4 = plt.subplots()
x = range(0,30)
ax3_4.semilogx(range(0,30),(y))
ax3_4.set_title('Validation Performance')
# ax3_4.set_xticks(np.arange(0,30, 1.0))
ax3_4.set_xlabel('i')
ax3_4.set_ylabel('Average Loss')
plt.show()
UPDATE: I understand you want to label the x-axis with values from 0..29, but on a log scale, all those numbers are very close.
Here is an image with xticks set (I didn't get any errors):
fig3_4 ,ax3_4 = plt.subplots()
x = range(0,30)
ax3_4.semilogx(range(0,30),(y))
ax3_4.set_title('Validation Performance')
ax3_4.set_xticks(np.arange(0,30, 1.0))
ax3_4.set_xlabel('i')
ax3_4.set_ylabel('Average Loss')
plt.show()
Here is an image where I replace semilogx with semilogy.
fig3_4 ,ax3_4 = plt.subplots()
x = range(0,30)
ax3_4.semilogy(range(0,30),(y))
ax3_4.set_title('Validation Performance')
ax3_4.set_xticks(np.arange(0,30, 1.0))
ax3_4.set_xlabel('i')
ax3_4.set_ylabel('Average Loss')
plt.show()
Does any of this resemble your goal?
