Making a histogram from a nested list in Python

I have the following list and would like to make a histogram out of that data, but I don't know how to do it.
finished = [('https', 38), ('on', 33), ('with', 32), ('model', 28), ('com', 26), ('evaluation', 19), ('detection', 19), ('br', 18), ('models', 18), ("href='g3doc", 17), ('trained', 17)]
I have tried the following:
import matplotlib.pyplot as plt
z=0
for i in finished:
    plt.hist(finished[z], bins = range(38))
    z=z+1
plt.show()
I'm always confused about the labels and the values.
Thank you and have a nice day

I would use a bar chart like so:
import matplotlib.pyplot as plt
import numpy as np
plt.rcdefaults()
finished = [('https', 38), ('on', 33), ('with', 32), ('model', 28), ('com', 26), ('evaluation', 19), ('detection', 19), ('br', 18), ('models', 18), ("href='g3doc", 17), ('trained', 17)]
names = list(f[0] for f in finished)
values = list(f[1] for f in finished)
y_pos = np.arange(len(finished))
plt.figure(figsize=(20,10))
plt.bar(y_pos, values, align='center', alpha=0.5)
plt.xticks(y_pos, names)
plt.ylabel('Values')
plt.title('Word usage')
plt.show()
You may be better off with a different format for your data. But this works with your sample data.
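The split into names and values above can also be written with zip, and matplotlib versions with categorical axis support (2.1 or later, as far as I know) accept the string labels directly in bar(), so the numeric positions become optional. A minimal sketch of that variant; the shortened list is only for illustration, and the Agg backend line is an assumption so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for an interactive window
import matplotlib.pyplot as plt

# Shortened copy of the list from the question, for illustration only.
finished = [('https', 38), ('on', 33), ('with', 32), ('model', 28), ('com', 26)]

# Unzip the (label, count) pairs into two parallel tuples.
names, values = zip(*finished)

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(names, values, alpha=0.5)  # the string labels become the x tick labels
ax.set_ylabel('Values')
ax.set_title('Word usage')
fig.canvas.draw()  # render the figure; call plt.show() instead in an interactive session
```

Keeping the labels and the counts in two separate sequences is also what removes the confusion about which is which: the first argument to bar() is where each bar sits, the second is how tall it is.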

As suggested in the comments, you want to create a bar chart:
import pandas as pd
import matplotlib.pyplot as plt
finished = [('https', 38), ('on', 33), ('with', 32), ('model', 28), ('com', 26), ('evaluation', 19), ('detection', 19), ('br', 18), ('models', 18), ("href='g3doc", 17), ('trained', 17)]
df = pd.DataFrame(finished)
ax = df.plot(kind="bar")
ax.set_xticklabels(list(df.iloc[:,0].values))
plt.xticks(rotation=90)
plt.show()

Related

ValueError: Shape of passed values is (14, 14), indices imply (15, 15)

I'm getting the following error when trying to run the code below
ValueError: Shape of passed values is (14, 14), indices imply (15, 15)
def plot_confusion_matrix(cm):
    cm = cm[::-1]
    cm = pd.DataFrame(cm, columns=classes, index=classes[::-1])
    fig = ff.create_annotated_heatmap(z=cm.values, x=list(cm.columns), y=list(cm.index), colorscale='ice', showscale=True, reversescale=True)
    fig.update_layout(width=500, height=500, title='Confusion Matrix', font_size=16)
    fig.show()
import plotly.figure_factory as ff
plot_confusion_matrix(cm)
Does anyone know how this can be resolved?
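No answer is recorded here for this question. The message itself says that the array passed to pd.DataFrame is 14x14 while the index and columns labels each have 15 entries, so a likely cause (an assumption, since classes and cm are not shown) is that classes lists one more label than the confusion matrix has rows. A minimal sketch of a check that surfaces the mismatch before plotting; check_labels is a hypothetical helper and the small arrays are made-up stand-ins:

```python
import numpy as np
import pandas as pd

def check_labels(cm, classes):
    """Hypothetical helper: build the labelled frame only if the sizes agree."""
    cm = np.asarray(cm)
    n_rows, n_cols = cm.shape
    if n_rows != len(classes) or n_cols != len(classes):
        raise ValueError(
            "cm is {}x{} but {} class labels were given".format(n_rows, n_cols, len(classes))
        )
    flipped = cm[::-1]  # same row reversal as in plot_confusion_matrix
    return pd.DataFrame(flipped, columns=classes, index=classes[::-1])

# Made-up stand-in data: a 2x2 matrix with two labels works ...
frame = check_labels([[5, 1], [2, 7]], ['cat', 'dog'])

# ... while three labels for the same matrix is the kind of mismatch the error describes.
try:
    check_labels([[5, 1], [2, 7]], ['cat', 'dog', 'bird'])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

If the check fails on the real data, comparing len(classes) with cm.shape should show which side has the extra entry.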

Matplotlib plot function creates two lines from one data list, or a triangle, but no normal x y graph [duplicate]

I have the following data set. I would like to use Python or Gnuplot to plot the data. The tuples are of the form (x, y). The Y-axis should be a log axis, that is, log(y). A scatter plot or line plot would be ideal.
How can this be done?
[(0, 6.0705199999997801e-08), (1, 2.1015700100300739e-08),
(2, 7.6280656623374823e-09), (3, 5.7348209304555086e-09),
(4, 3.6812203579604238e-09), (5, 4.1572516753310418e-09)]
If I get your question correctly, you could do something like this.
>>> import matplotlib.pyplot as plt
>>> testList = [(0, 6.0705199999997801e-08), (1, 2.1015700100300739e-08),
...             (2, 7.6280656623374823e-09), (3, 5.7348209304555086e-09),
...             (4, 3.6812203579604238e-09), (5, 4.1572516753310418e-09)]
>>> from math import log
>>> testList2 = [(elem1, log(elem2)) for elem1, elem2 in testList]
>>> testList2
[(0, -16.617236475334405), (1, -17.67799605473062), (2, -18.691431541177973), (3, -18.9767093108359), (4, -19.420021520728017), (5, -19.298411635970396)]
>>> zip(*testList2)
[(0, 1, 2, 3, 4, 5), (-16.617236475334405, -17.67799605473062, -18.691431541177973, -18.9767093108359, -19.420021520728017, -19.298411635970396)]
>>> plt.scatter(*zip(*testList2))
>>> plt.show()
which gives a scatter plot of the log-transformed points.
Or, as a line plot:
>>> plt.plot(*zip(*testList2))
>>> plt.show()
EDIT - If you want to add a title and labels for the axis, you could do something like
>>> plt.scatter(*zip(*testList2))
>>> plt.title('Random Figure')
>>> plt.xlabel('X-Axis')
>>> plt.ylabel('Y-Axis')
>>> plt.show()
which gives the same scatter plot with a title and axis labels.
In matplotlib it would be:
import matplotlib.pyplot as plt
data = [(0, 6.0705199999997801e-08), (1, 2.1015700100300739e-08),
(2, 7.6280656623374823e-09), (3, 5.7348209304555086e-09),
(4, 3.6812203579604238e-09), (5, 4.1572516753310418e-09)]
x_val = [x[0] for x in data]
y_val = [x[1] for x in data]
print(x_val)
plt.plot(x_val,y_val)
plt.plot(x_val,y_val,'or')
plt.show()
which produces a line through the points with a red circle marker on each one.
As others have answered, scatter() or plot() will generate the plot you want. I suggest two refinements to the answers that are already here:
1. Use numpy to create the x-coordinate list and y-coordinate list. Working with large data sets is faster in numpy than using the Python iteration suggested in other answers.
2. Use pyplot to apply the logarithmic scale rather than operating directly on the data, unless you actually want to have the logs.
import matplotlib.pyplot as plt
import numpy as np
data = [(2, 10), (3, 100), (4, 1000), (5, 100000)]
data_in_array = np.array(data)
# data_in_array looks like:
# array([[     2,     10],
#        [     3,    100],
#        [     4,   1000],
#        [     5, 100000]])
transposed = data_in_array.T
# transposed looks like:
# array([[     2,      3,      4,      5],
#        [    10,    100,   1000, 100000]])
x, y = transposed
# Here is the OO method
# You could also use the state-based methods of pyplot
fig, ax = plt.subplots(1,1) # gets a handle for the AxesSubplot object
ax.plot(x, y, 'ro')
ax.plot(x, y, 'b-')
ax.set_yscale('log')
plt.show()
I've also used ax.set_xlim(1, 6) and ax.set_ylim(.1, 1e6) to make it pretty.
I've used the object-oriented interface to matplotlib. Because it offers greater flexibility and explicit clarity by using names of the objects created, the OO interface is preferred over the interactive state-based interface.
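Applied to the data from the question, the same idea looks roughly like this. It is only a sketch: the axis labels are placeholders, and the Agg backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np

data = [(0, 6.0705199999997801e-08), (1, 2.1015700100300739e-08),
        (2, 7.6280656623374823e-09), (3, 5.7348209304555086e-09),
        (4, 3.6812203579604238e-09), (5, 4.1572516753310418e-09)]

# Transposing the (n, 2) array yields one row of x values and one row of y values.
x, y = np.array(data).T

fig, ax = plt.subplots()
ax.plot(x, y, 'b-')             # connecting line
ax.plot(x, y, 'ro')             # marker at each data point
ax.set_yscale('log')            # the axis is logarithmic; the data itself is left untouched
ax.set_xlabel('x')              # placeholder label
ax.set_ylabel('y (log scale)')  # placeholder label
fig.canvas.draw()               # render; use plt.show() in an interactive session
```

Because only the axis scale changes, the tick labels still show the original y values rather than their logarithms.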
You could also use zip
import matplotlib.pyplot as plt
l = [(0, 6.0705199999997801e-08), (1, 2.1015700100300739e-08),
(2, 7.6280656623374823e-09), (3, 5.7348209304555086e-09),
(4, 3.6812203579604238e-09), (5, 4.1572516753310418e-09)]
x, y = zip(*l)
plt.plot(x, y)
plt.show()
With gnuplot using gplot.py
from gplot import *
l = [(0, 6.0705199999997801e-08), (1, 2.1015700100300739e-08),
(2, 7.6280656623374823e-09), (3, 5.7348209304555086e-09),
(4, 3.6812203579604238e-09), (5, 4.1572516753310418e-09)]
gplot.log('y')
gplot(*zip(*l))

Matplotlib path contains_point

I've just discovered the matplotlib path functionality and I'm using it with path.contains_point to check whether points are found within a region defined by 2 bezier curves.
I'm getting some unexpected behaviour where contains_point is returning True when I would have expected it to return False. Specifically, if the point to be tested is to the left of the region then it seems to be incorrect. On the right is ok.
Defining my paths as a number of straight lines rather than curves seems to work as expected.
A failing test case is as follows:
import matplotlib
import matplotlib.path as mplPath
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import pylab
import pandas as pd
print "MPL Version {}".format(matplotlib.__version__) #1.5.0
print "MPL NP Version {}".format(matplotlib.__version__numpy__) #1.6
path_data = [
(mplPath.Path.MOVETO, (2, 10)),
(mplPath.Path.CURVE4, (0, 100)),
(mplPath.Path.CURVE4, (20, 100)),
(mplPath.Path.CURVE4, (40, 150)),
(mplPath.Path.MOVETO, (40, 150)),
(mplPath.Path.CURVE4, (42, 45)),
(mplPath.Path.CURVE4, (20, 30)),
(mplPath.Path.CURVE4, (2, 10))
]
codes, verts = zip(*path_data)
path = mplPath.Path(verts, codes)
patch = patches.PathPatch(path, facecolor='r', alpha=0.5)
# Plot the patch and some of the test points to visualise
fig = plt.figure()
ax = fig.add_subplot(111)
ax.add_patch(patch)
ax.set_xlim(0, 50)
ax.set_ylim(0, 200)
ax.scatter(1, 50)
ax.scatter(20, 120)
ax.scatter(20, 25)
print path.contains_point((1,50)) #This should be false but is actually true
print path.contains_point((20,120)) #This should be false but is actually true
print path.contains_point((20, 25)) #This should be false and it is
plt.show()
Thanks in advance for any help you can provide. Python version is 2.7, Anaconda Distro on Linux Mint 17.3
Jim
You have an open path: the extra MOVETO at (40, 150) starts a second sub-path, so the two curves no longer form a single closed region. Once you comment it out, it works fine.
path_data = [
    (mplPath.Path.MOVETO, (2, 10)),
    (mplPath.Path.CURVE4, (0, 100)),
    (mplPath.Path.CURVE4, (20, 100)),
    (mplPath.Path.CURVE4, (40, 150)),
    # (mplPath.Path.MOVETO, (40, 150)),
    (mplPath.Path.CURVE4, (42, 45)),
    (mplPath.Path.CURVE4, (20, 30)),
    (mplPath.Path.CURVE4, (2, 10))
]
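For reference, a Python 3 sketch of the same check with the extra MOVETO removed, so that the two curves form one outline that ends where it started. The interior test point (20, 80) is my own addition, and exact results may depend on the matplotlib version:

```python
from matplotlib.path import Path

# One MOVETO, then two cubic Bezier segments (three CURVE4 vertices each).
# The last vertex equals the first, so the outline ends where it began.
path_data = [
    (Path.MOVETO, (2, 10)),
    (Path.CURVE4, (0, 100)),
    (Path.CURVE4, (20, 100)),
    (Path.CURVE4, (40, 150)),
    (Path.CURVE4, (42, 45)),
    (Path.CURVE4, (20, 30)),
    (Path.CURVE4, (2, 10)),
]
codes, verts = zip(*path_data)
path = Path(verts, codes)

# The three test points from the question, plus one between the two curves.
left_of_region = path.contains_point((1, 50))
above_region = path.contains_point((20, 120))
below_region = path.contains_point((20, 25))
inside_region = path.contains_point((20, 80))  # added test point
print(left_of_region, above_region, below_region, inside_region)
```

Adding a point that is expected to be inside is a cheap way to confirm that the test does not simply return False everywhere.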

How do I show legend for different colors in bokeh scatter plot?

This is my plot.
I need to show the legend for the different colors used.
How do I do that?
My code for the plot:
def mscatter(p, x, y, c, typestr, source):
    p.scatter(x, y, marker=typestr,
              line_color="#6666ee", fill_color=c, fill_alpha=0.5, size=y*1.5, source=source)
p = figure(title="CGPA of 4th year students",tools=[hover])
mscatter(p, xdata, ydata, colors,"circle",source)
show(p)
Here is a sample from the Bokeh documentation on adding legends.
You will have to modify it as you see fit:
from collections import OrderedDict
from bokeh.charts import Scatter, output_file, show
# (dict, OrderedDict, lists, arrays and DataFrames of (x, y) tuples are valid inputs)
xyvalues = OrderedDict()
xyvalues['python'] = [(1, 2), (3, 3), (4, 7), (5, 5), (8, 26)]
xyvalues['pypy'] = [(1, 12), (2, 23), (4, 47), (5, 15), (8, 46)]
xyvalues['jython'] = [(1, 22), (2, 43), (4, 10), (6, 25), (8, 26)]
scatter = Scatter(xyvalues, title="Scatter", legend="top_left", ylabel='Languages')
output_file('scatter.html')
show(scatter)
The above code produces a scatter plot with a legend in the top-left corner.
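Note that bokeh.charts is no longer part of current Bokeh releases. With the plotting API used in the question, one way to get a legend entry per color is to call scatter once per group and pass legend_label (available in Bokeh 1.4 and later, as far as I know). A minimal sketch; the group names, colors and CGPA values below are made up for illustration:

```python
from bokeh.plotting import figure, show

# Made-up stand-in data: one (x, y) series per color group.
groups = {
    "CSE": {"color": "#e41a1c", "x": [1, 2, 3], "y": [7.1, 8.4, 6.9]},
    "ECE": {"color": "#377eb8", "x": [1, 2, 3], "y": [6.5, 7.8, 8.8]},
}

p = figure(title="CGPA of 4th year students")
for name, g in groups.items():
    # Each call with its own legend_label adds one entry to the legend.
    p.scatter(g["x"], g["y"], marker="circle", size=12,
              line_color="#6666ee", fill_color=g["color"], fill_alpha=0.5,
              legend_label=name)

p.legend.location = "top_left"
# show(p)  # uncomment to open the plot in a browser
```

The trade-off is one glyph renderer per group instead of a single scatter call, which is usually fine for a handful of colors.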

why does my pyplot show no lines?

I am new to pyplot and wondering what I'm doing wrong here.
I would like to plot a series of random line segments:
Here's some example code:
import matplotlib.pyplot as plt
def testPlot():
    minx = miny = -1
    maxx = maxy = 30
    # some points randomly generated
    points = [((10, 21), (19, 22)), ((11, 9), (22, 27)), ((9, 13), (5, 9)), ((18, 4), (2, 21)), ((25, 27
    for pair in points:
        print(pair)
        for point in pair:  # plot each point with a small dot
            x = point[0]
            y = point[1]
            plt.plot(x,y,'bo')
        # draw a line between the pairs of points
        plt.plot(pair[0][0],pair[0][1],pair[1][0],pair[1][1],color='r',linewidth=2)
    plt.axis([minx,maxx,miny,maxy])
    plt.show()
Here is what I get after running that. There should be lines between the points, but where are they?
((10, 21), (19, 22))
((11, 9), (22, 27))
((9, 13), (5, 9))
((18, 4), (2, 21))
((25, 27), (11, 13))
Thank you for any clues
This is the problem line:
...
plt.plot(pair[0][0],pair[0][1],pair[1][0],pair[1][1],color='r',linewidth=2)
...
You are passing the coordinates as x, y, x1, y1, but plt.plot expects the x values and the y values grouped separately, that is (x, x1), (y, y1). With that correction it works:
def testPlot():
    minx = miny = -1
    maxx = maxy = 30
    # some points randomly generated
    points = [((10, 21), (19, 22)), ((11, 9), (22, 27)), ((9, 13), (5, 9)), ((18, 4), (2, 21)), ((25, 27), (11, 13))]
    for pair in points:
        print(pair)
        for point in pair:  # plot each point with a small dot
            x = point[0]
            y = point[1]
            plt.plot(x,y,'bo')
        # change this line to ((x, x1), (y, y1))
        plt.plot((pair[0][0],pair[1][0]),(pair[0][1],pair[1][1]), color='r',linewidth=2)
    plt.axis([minx,maxx,miny,maxy])
    plt.show()
With this change, a red line segment is drawn between each pair of points.
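The same fix in Python 3 form, as a sketch: zip(*pair) regroups each pair of points into its x values and its y values, which is the shape plot expects. The Agg backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for an interactive window
import matplotlib.pyplot as plt

points = [((10, 21), (19, 22)), ((11, 9), (22, 27)), ((9, 13), (5, 9)),
          ((18, 4), (2, 21)), ((25, 27), (11, 13))]

fig, ax = plt.subplots()
segments = []
for pair in points:
    # ((x0, y0), (x1, y1)) -> (x0, x1), (y0, y1)
    xs, ys = zip(*pair)
    # One call draws the red segment and a blue dot at each end point.
    line, = ax.plot(xs, ys, color='r', linewidth=2,
                    marker='o', markerfacecolor='b', markeredgecolor='b')
    segments.append(line)

ax.axis([-1, 30, -1, 30])
fig.canvas.draw()  # render; use plt.show() in an interactive session
```

For a large number of segments, matplotlib.collections.LineCollection would avoid creating one line object per pair, but for five segments the loop is simpler to read.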
