Python3 (Anaconda3) and Matplotlib: plot with specified x-axis tick values - python

My question is about a readability issue with a plot. I read several similar questions on StackOverflow, but none of them completely solved the problem.
I have a txt file with 100 abscissa and ordinate values.
I want to plot them, but on the x-axis I want only specified tick values to be shown,
e.g. the 1st, 2nd, 3rd, 4th, 5th, 44th, 88th, and 99th points. This is only for better readability; I still want to plot all the points.
What I tried is:
import matplotlib.pyplot as plt
import numpy as np
plt.xlabel("Values")
plt.ylabel("Percentage")
for i in range(99):
    try:
        filename = "Folder_Name/foo_%d.txt" % i
        filevals = np.loadtxt(filename, usecols=1)
        idx = [1, 2, 3, 4, 5, 44, 88, 99]
        y = [filevals[k]*100 for k in idx]
        plt.plot(range(len(idx)), y, 'o-', label="values_foo_%s" % i)
        plt.xticks(range(len(idx)), idx)
    except (IOError, IndexError):  # "IOError or IndexError" would only catch IOError
        break
plt.legend(loc=4)
plt.grid(True)
plt.tight_layout()
plt.savefig("plot_test.pdf")
plt.close()
As a result, of course, the graph shows only those values and ignores the other points; as a consequence, the distance between the 5th and the 44th points is the same as between the 4th and the 5th.

Plot all the points, then just write: plt.xticks([0, 1, 2, 3, 4, 43, 87, 98])
Also, don't forget that list indices begin at 0.
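A minimal sketch of this idea, assuming the 100 values are already loaded into an array (the data below is made up, and the names positions and labels are only illustrative). Every point is plotted at its true index, so the spacing stays proportional, and only the chosen positions receive a tick:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-in for the 100 values read from the text file
y = np.linspace(0.0, 1.0, 100) ** 2 * 100
x = np.arange(len(y))  # 0-based index of every point

# 0-based positions of the 1st..5th, 44th, 88th and 99th points
positions = [0, 1, 2, 3, 4, 43, 87, 98]
labels = [str(p + 1) for p in positions]  # show 1-based labels

fig, ax = plt.subplots()
ax.plot(x, y, "o-", markersize=3)  # all points are drawn
ax.set_xticks(positions)           # ticks only where requested
ax.set_xticklabels(labels)
ax.set_xlabel("Values")
ax.set_ylabel("Percentage")
ax.grid(True)
fig.tight_layout()
```

The first five labels will sit very close together on a linear axis; rotating the labels or using a smaller font size may help.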

Related

Automatically find and add the coordinates to add Annotations (e.g. count) on a Boxplot made from a Dictionary of uneven Lists

I'm pretty new to programming and I'm really frustrated by a problem that I thought would be easy to solve...
Case: Let's say I have a dictionary of lists of uneven length; the number of keys (strings) and values (numbers) could change at any time.
Need: I want to annotate (add text or whatever) each subplot or category with some information (e.g. the count); each key is an individual category.
Problem: I found many solutions for categories with equal numbers of values, which apparently don't work for me, e.g. Solution
I also found some answers, e.g. Solution, saying that I should first get the coordinates of each key on the x-axis and then apply an inverted transformation to work with the "log scales". That was the best solution for me so far, but unfortunately it does not really match the coordinates, and I couldn't get and add the points automatically before calling plt.show().
I could also guess the coordinates by trial and error with the transformation method or with an offset, e.g. Solution. But as I said, my dictionary could change at any time, and then I would have to do it all over again!
I think there should be a much simpler way to solve this problem, but I couldn't find it.
Here is a simplified example of my code and what I tried:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage,
                                  AnnotationBbox)
dictionary = {}
dictionary["a"] = [1, 2, 3, 4, 5]
dictionary["b"] = [1, 2, 3, 4, 5, 6, 7]
fig, ax = plt.subplots()
ax.boxplot(dictionary.values())
x = ax.set_xticklabels(dictionary.keys())
fig.text(x = 0.25, y = 0, s = str(len(dictionary["a"])))
fig.text(x = 0.75, y = 0, s = str(len(dictionary["b"])))
plt.show()
crd = np.vstack((ax.get_xticks(), np.zeros_like(ax.get_xticks()))).T
ticks = ax.transAxes.inverted().transform(ax.transData.transform(crd))
print(ticks[:,0])
# ab = AnnotationBbox(TextArea("text"), xy=(1, 0), xybox =(0, -30), boxcoords="offset points",pad=0,frameon=False )
# ax.add_artist(ab)
Output of my code
As I understand it, you may want something like this:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage,
                                  AnnotationBbox)
dictionary = {}
dictionary["a"] = [1, 2, 3, 4, 5]
dictionary["b"] = [1, 2, 3, 4, 5, 6, 7]
dictionary["cex"] = [1, 2, 3]
fig, ax = plt.subplots()
ax.boxplot(dictionary.values())
x = ax.set_xticklabels(dictionary.keys())
ticksList = ax.get_xticks()
print(ticksList)
for x in ticksList:
    # boxplot positions start at 1, so position x belongs to the (x-1)-th list
    ax.text(x, 0, str(len(list(dictionary.values())[x-1])), fontdict={'horizontalalignment': 'center'})
plt.show()
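One possible variation, sketched under the assumption that the counts should sit just above the bottom of the axes regardless of the y-scale (the question mentions log scales). It uses ax.get_xaxis_transform(), which interprets x in data coordinates and y as a fraction of the axes height, so no manual coordinate conversion is needed. The name groups and the "n=" label format are illustrative only:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Illustrative data: any number of keys, lists of uneven length
groups = {"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, 4, 5, 6, 7], "cex": [1, 2, 3]}

fig, ax = plt.subplots()
ax.boxplot(list(groups.values()))
ax.set_xticklabels(list(groups.keys()))
ax.set_yscale("log")  # the text placement below does not depend on this

texts = []
# After boxplot, the x-ticks are the box positions (1, 2, 3, ...)
for pos, values in zip(ax.get_xticks(), groups.values()):
    t = ax.text(pos, 0.02, "n=%d" % len(values),
                transform=ax.get_xaxis_transform(),  # x: data, y: axes fraction
                ha="center", va="bottom")
    texts.append(t)
```

Because the loop pairs each tick position with the corresponding list, it keeps working when keys are added to or removed from the dictionary.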

How to plot a histogram using numpy.histogram output? [duplicate]

I'd like to use Matplotlib to plot a histogram over data that's been pre-counted. For example, say I have the raw data
data = [1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 10]
Given this data, I can use
pylab.hist(data, bins=[...])
to plot a histogram.
In my case, the data has been pre-counted and is represented as a dictionary:
counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}
Ideally, I'd like to pass this pre-counted data to a histogram function that lets me control the bin widths, plot range, etc, as if I had passed it the raw data. As a workaround, I'm expanding my counts into the raw data:
from itertools import chain, repeat
data = list(chain.from_iterable(repeat(value, count)
                                for (value, count) in counted_data.items()))
This is inefficient when counted_data contains counts for millions of data points.
Is there an easier way to use Matplotlib to produce a histogram from my pre-counted data?
Alternatively, if it's easiest to just bar-plot data that's been pre-binned, is there a convenience method to "roll-up" my per-item counts into binned counts?
You can use the weights keyword argument to np.histogram (which plt.hist calls underneath):
val, weight = zip(*[(k, v) for k,v in counted_data.items()])
plt.hist(val, weights=weight)
Assuming you only have integers as the keys, you can also use bar directly:
min_bin = min(counted_data)  # built-in min/max iterate over the dict keys
max_bin = max(counted_data)
bins = np.arange(min_bin, max_bin + 1)
vals = np.zeros(max_bin - min_bin + 1)
for k, v in counted_data.items():
    vals[k - min_bin] = v
plt.bar(bins, vals, ...)
where ... is whatever arguments you want to pass to bar (doc)
If you want to re-bin your data, see Histogram with separate list denoting frequency
I used pyplot.hist's weights option to weight each key by its value, producing the histogram that I wanted:
pylab.hist(list(counted_data.keys()), weights=list(counted_data.values()), bins=range(50))
This allows me to rely on hist to re-bin my data.
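For the "roll-up" part of the question, a small sketch using np.histogram directly with weights, without drawing anything. The bin edges below are chosen arbitrarily for illustration; the result matches what binning the expanded raw data would give:

```python
import numpy as np

counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}

values = np.array(list(counted_data.keys()))
weights = np.array(list(counted_data.values()))

# Arbitrary example edges: [0, 3), [3, 6), [6, 9), [9, 12]
edges = np.arange(0, 13, 3)

# Each value contributes its count to the bin it falls into
binned, edges_out = np.histogram(values, bins=edges, weights=weights)

# Same binning applied to the expanded raw data, for comparison
raw = [1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 10]
binned_raw, _ = np.histogram(raw, bins=edges)
```

The binned counts can then be passed to a bar plot, which avoids expanding millions of data points.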
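You can also use seaborn to plot the histogram (note that distplot has been deprecated since seaborn 0.11; sns.histplot also accepts a weights argument):
import seaborn as sns
sns.distplot(
    list(counted_data.keys()),
    hist_kws={"weights": list(counted_data.values())},
)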
The "bins" array (the bin edges) must be one element longer than the "counts" array. Here's a way to fully reconstruct the histogram:
import numpy as np
import matplotlib.pyplot as plt
bins = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).astype(float)
counts = np.array([5, 3, 4, 5, 6, 1, 3, 7]).astype(float)
centroids = (bins[1:] + bins[:-1]) / 2
counts_, bins_, _ = plt.hist(centroids, bins=len(counts),
                             weights=counts, range=(min(bins), max(bins)))
plt.show()
assert np.allclose(bins_, bins)
assert np.allclose(counts_, counts)
Adding to tacaswell's comment, plt.bar can be much more efficient than plt.hist here for large numbers of bins (>1e4), especially for a crowded random plot where you only need to plot the highest bars, because the width required to see them will cover most of their neighbors anyway. You can pick out the highest bars and plot them with
i, = np.where(vals > min_height)
plt.bar(i,vals[i],width=len(bins)//50)
For data with other statistical trends, it may be preferable to plot every 100th bar or something similar.
The trick here is that plt.hist wants to plot all of your bins whereas plt.bar will let you just plot the sparser set of visible bins.
hist uses bar under the hood, so this will produce something similar to what hist creates (assuming bins of equal size):
bins = [1,2,3]
heights = [10,20,30]
ax = plt.gca()
ax.bar(bins, heights, align='center', width=bins[-1] - bins[-2])
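If the bins are not of equal size, one option is to pass the bin edges and derive each bar's width from them. This is only a sketch with made-up edges and counts; align="edge" makes each bar start at its left edge instead of being centered on it:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up example: 4 edges define 3 bins of unequal width
edges = np.array([0.0, 1.0, 3.0, 8.0])
counts = np.array([10, 20, 30])  # one pre-binned count per bin

fig, ax = plt.subplots()
# Left edge of each bin as x, distance to the next edge as width
bars = ax.bar(edges[:-1], counts, width=np.diff(edges), align="edge")
```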

Pyplot/Matplotlib: Binary data with strings on x-axis

I know it's such a basic thing, but due to ridiculous time constraints and the severity of the situation I'm forced to ask something like this:
I've got two arrays of 160,000 entries. One contains strings (names I need to use), the other contains the corresponding 1's and 0's.
I'm trying to make a simple "step" graph in pyplot with the array of names along the X-axis and 0 and 1 along the Y-axis.
I have this currently:
import numpy as np
import matplotlib.pyplot as plt
data = [1, 2, 4, 5, 9]
bindata = [0,1,1,0,1,1,0,0,0,1]
xaxis = np.arange(0, data[-1] + 1)
yaxis = np.array(bindata)
plt.step(xaxis, yaxis)
plt.xlabel('Filter Degree Combinations')
plt.ylabel('Negative Or Positive')
plt.title("Car 1")
#plt.savefig('foo.png') #For saving
plt.show()
It gives me this:
But I want something like this:
I cobbled the code together from some examples, tutorials and stackoverflow questions, but I run into "ValueError: x and y must have same first dimension" so often that I'm not getting anywhere when I try to experiment my way forward.
You can achieve the desired plot by specifying the tick labels and their positions on the x-axis using plt.xticks. The first argument, range(0, 10, 2), gives the tick positions; the second gives the label strings:
import numpy as np
import matplotlib.pyplot as plt
data = [1, 2, 4, 5, 9]
bindata = [0,1,1,0,1,1,0,0,0,1]
xaxis = np.arange(0, data[-1] + 1)
yaxis = np.array(bindata)
plt.step(xaxis, yaxis)
xlabels = ['Josh', 'Anna', 'Kevin', 'Sophie', 'Steve'] # <-- specify tick-labels
plt.xlabel('Filter Degree Combinations')
plt.ylabel('Negative Or Positive')
plt.title("Car 1")
plt.xticks(range(0, 10, 2), xlabels) # <-- assign tick-labels
plt.show()
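When the names array has the same length as the 0/1 array, as in the question, a variation is to plot against integer positions and label only every n-th tick. The sketch below uses made-up names and a small stride; with 160,000 entries a much larger stride would be needed to keep the labels readable:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: one name per binary value
names = ["Josh", "Anna", "Kevin", "Sophie", "Steve", "Mia"]
flags = [0, 1, 1, 0, 1, 0]

positions = np.arange(len(names))  # same length as flags, so no ValueError

fig, ax = plt.subplots()
ax.step(positions, flags, where="mid")

stride = 2                         # label every 2nd entry only
shown = names[::stride]
ax.set_xticks(positions[::stride])
ax.set_xticklabels(shown, rotation=45, ha="right")
ax.set_yticks([0, 1])
fig.tight_layout()
```

Building the x-values from len(names) keeps x and y the same length, which avoids the "x and y must have same first dimension" error.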

Plotting/marking selected points from a 1D array

This seems like a simple question, but I have been trying for a really long time.
I have a 1D data array (named 'hightemp_unlocked'). After finding its peaks (an array of the locations where the peaks are), I wanted to mark the peaks on the plot.
import matplotlib
from matplotlib import pyplot as plt
.......
plt.plot([x for x in range(len(hightemp_unlocked))],hightemp_unlocked,label='200 mk db ramp')
plt.scatter(peaks, hightemp_unlocked[x in peaks], marker='x', color='y', s=40)
For some reason, it keeps telling me that x and y must be the same size.
It shows:
  File "period.py", line 86, in <module>
    plt.scatter(peaks, hightemp_unlocked[x in peaks], marker='x', color='y', s=40)
  File "/usr/local/lib/python2.6/dist-packages/matplotlib/pyplot.py", line 2548, in scatter
    ret = ax.scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, faceted, verts, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/matplotlib/axes.py", line 5738, in scatter
    raise ValueError("x and y must be the same size")
I don't think hightemp_unlocked[x in peaks] is what you want. Here x in peaks reads as the conditional statement "is x in peaks?" and will return True or False depending on what was last stored in x. When hightemp_unlocked[x in peaks] is evaluated, True or False is interpreted as 1 or 0, which returns only the second or first element of hightemp_unlocked. This explains the array size error.
If peaks is an array of indices (and hightemp_unlocked is a NumPy array), then hightemp_unlocked[peaks] will simply return the corresponding values.
You are almost on the right track, but hightemp_unlocked[x in peaks] is not what you are looking for. How about something like:
from matplotlib import pyplot as plt
# dummy temperatures
temps = [10, 11, 14, 12, 10, 8, 5, 7, 10, 12, 15, 13, 12, 11, 10]
# list of x-values for plotting
xvals = list(range(len(temps)))
# say our peaks are at indices 2 and 10 (temps of 14 and 15)
peak_idx = [2, 10]
# make a new list of just the peak temp values
peak_temps = [temps[i] for i in peak_idx]
# repeat for x-values
peak_xvals = [xvals[i] for i in peak_idx]
# now we can plot the temps
plt.plot(xvals, temps)
# and add the scatter points for the peak values
plt.scatter(peak_xvals, peak_temps)
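If the data is held in NumPy arrays, the index-array approach mentioned above (hightemp_unlocked[peaks]) does the same thing without list comprehensions. A sketch with the same dummy temperatures; the names temps, peaks and peak_values are illustrative only:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Dummy temperatures standing in for hightemp_unlocked
temps = np.array([10, 11, 14, 12, 10, 8, 5, 7, 10, 12, 15, 13, 12, 11, 10])
peaks = np.array([2, 10])   # indices of the peaks

# Indexing with an integer array selects the values at those indices
peak_values = temps[peaks]

fig, ax = plt.subplots()
ax.plot(np.arange(len(temps)), temps, label="200 mk db ramp")
# x and y now have the same size, so scatter no longer raises ValueError
ax.scatter(peaks, peak_values, marker="x", color="y", s=40)
```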
