Plotting histogram from numpy array

I need to create histograms from the 2D arrays that I obtain from convolving an input array and a filter. The bins should span the range of the values in the array.
I tried following this example: How does numpy.histogram() work?
The code is this:
import matplotlib.pyplot as plt
import numpy as np
plt.hist(result, bins = (np.min(result), np.max(result),1))
plt.show()
I always get this error message:
AttributeError: bins must increase monotonically.
Thanks for any help.

What you are actually doing is specifying three bin edges: np.min(result), np.max(result) and 1. The bins argument must list where the bin edges fall, and the edges must be in increasing order; the trailing 1 breaks that order whenever the maximum exceeds 1, which is why you get the error. My guess is that you want bins running from np.min(result) to np.max(result). The 1 seems a bit odd, but I'm going to ignore it. Also, you want to plot a 1D histogram of values, yet your input is 2D. Given a 2D array, plt.hist treats each column as a separate dataset, so to plot the distribution of all values together you need to flatten the array first. Use np.ravel for that.
Now, I'd like to refer you to np.linspace. You can specify a minimum and a maximum value and how many uniformly spaced points to generate between them, endpoints included. So:
bins = np.linspace(start, stop)
The default number of points in between start and stop is 50, but you can override this:
bins = np.linspace(start, stop, num=100)
This generates 100 evenly spaced points from start to stop, inclusive. Used as bin edges, 100 points delimit 99 bins.
As such, try doing this:
import matplotlib.pyplot as plt
import numpy as np
num_bins = 100  # <-- Change here - total number of bins for the histogram
# num_bins + 1 edges are needed to delimit num_bins bins
edges = np.linspace(np.min(result), np.max(result), num=num_bins + 1)
plt.hist(result.ravel(), bins=edges)  # <-- Change here. Note the use of ravel.
plt.show()
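The relationship between edges and bins can be checked without plotting. The sketch below uses np.histogram (which plt.hist calls internally) on a made-up 2D array standing in for result; the array contents and the variable names are assumptions for illustration only.

```python
import numpy as np

# Stand-in for the convolution output; the real `result` comes from your own code.
rng = np.random.default_rng(0)
result = rng.normal(size=(64, 64))

num_bins = 100
# num_bins + 1 evenly spaced edges delimit exactly num_bins bins.
edges = np.linspace(result.min(), result.max(), num=num_bins + 1)

# ravel() flattens the 2D array so every value is counted once.
counts, returned_edges = np.histogram(result.ravel(), bins=edges)

print(len(edges), len(counts))        # 101 edges, 100 bins
print(counts.sum() == result.size)    # every value lands in some bin
```

Because the last bin is closed on the right, the maximum value is counted too, so the counts add up to the number of elements in the array.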

Related

How can I produce multiple plots on one graph where each plot has a different color? Can I set a colormap to an array of scalar variables?

I have a series of simple mass-radius relationships (so a 2d plot) that I'd like to include in one plot according to how well of a fit it is to my data. I have the radii (x), masses (y), and a separate 1d array that quantifies how well the M-R relationship fits to my data. This 1d array can be likened to error, but it isn't calculated using a standard Python function (I calculate it myself).
Ideally, my end result is a series of ~2000 mass-radius relationships on one plot, where each mass-radius relationship is color coded according to its agreement with my data. So something like this, but instead of two colors, it's on a grayscale:
Here's a snippet of what I'm trying to do but obviously isn't working, as I didn't even define a colormap:
for i in range(10):
    plt.plot(x,y,c=error[i])
plt.colorbar()
plt.show()
And again, I'd like to have each element in error correspond to a color in greyscale.
I know this is simple so I'm definitely outing myself as an amateur here, but I really appreciate any help!
EDIT: Here is the code snippet where I made the plot:
for i in range(2396):
    if eps[i]==0.:
        plt.plot(f[i,:,1],f[i,:,0],c='g',linewidth=0.1)
    else:
        plt.plot(f[i,:,1],f[i,:,0],c='r',linewidth=0.1)
plt.xlabel('Radius')
plt.ylabel('Mass')
plt.title('Neutron Star Mass-Radius Relationships')

You have one fit value for each series of points:
Here is a script to plot multiple series on a single plot, where each series (i.e. each line) is colored based on a third fit variable:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
fit = np.random.rand(25)
cmap = plt.get_cmap('binary')  # replaces mpl.cm.get_cmap('binary'), which newer Matplotlib removed
color_gradients = cmap(fit)  # this line changed! it was incorrect before
fig, (ax1, ax2) = plt.subplots(1, 2, gridspec_kw={'width_ratios': [30, 1]})
for i, _ in enumerate(fit):
    x = sorted(np.random.randint(100, size=25))
    y = sorted(np.random.randint(100, size=25))
    ax1.plot(x, y, c=color_gradients[i])
cb = mpl.colorbar.ColorbarBase(ax2, cmap=cmap,
                               orientation='vertical',
                               ticks=[0, 1])
plt.show()
Now responding to your questions from the comments:
How does fit play into the rest of the plot?
fit is an array of random decimals between 0 and 1, corresponding to the "error" values for each series:
>>> fit
array([0.76458568, 0.15017328, 0.70686393, 0.98885091, 0.18449953,
0.62506401, 0.49513702, 0.69138913, 0.96844495, 0.48937011,
0.09878352, 0.68965829, 0.13524182, 0.95419698, 0.39844843,
0.63095159, 0.95933663, 0.00693236, 0.98212815, 0.16262205,
0.26274884, 0.56880703, 0.68233984, 0.18304883, 0.66759496])
fit is used to generate the divisions of the color gradient in these lines:
cmap = plt.get_cmap('binary')  # equivalent to mpl.cm.get_cmap('binary'), which newer Matplotlib removed
color_gradients = cmap(fit)
I'm not sure where the specific documentation for this is, but basically, calling the cmap with an array of numbers between 0 and 1 returns an array of RGBA color values, one row per number, taken from the corresponding positions along the colormap:
>>> color_gradients
array([[0.23529412, 0.23529412, 0.23529412, 1. ],
[0.85098039, 0.85098039, 0.85098039, 1. ],
[0.29411765, 0.29411765, 0.29411765, 1. ],
[0.00784314, 0.00784314, 0.00784314, 1. ],
.
.
.
So this array can be used to assign specific colors to each line, based on their fit. And it assumes the higher numbers are better fits, and that you want better fits to be colored darker.
Note that before I had color_gradient_divisions = [(1/len(fit))*i for i in range(len(fit))], which was incorrect as it evenly divides the color map into 25 pieces, not actually returning values corresponding to the fit.
The cmap is also passed to the colorbar when constructing it. Often you can just call plt.colorbar to create one, but here matplotlib doesn't automatically know what to create a colorbar for, as the lines are separate and manually colored. So instead, we create 2 axes, one for the plot and one for the colorbar (sizing them with the gridspec_kw argument), and then use mpl.colorbar.ColorbarBase to make the colorbar (I also removed a norm argument b/c I don't think it is needed).
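If the error values do not already lie between 0 and 1, one option is to normalize them first and build the colorbar from a ScalarMappable, which lets fig.colorbar work without a hand-made second axes. This is a sketch of that alternative, not the approach used in the script; the error array and the curves are made up for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop it to use plt.show()
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

rng = np.random.default_rng(1)
error = rng.uniform(5.0, 40.0, size=25)   # hypothetical fit values on an arbitrary scale

cmap = plt.get_cmap("binary")
norm = Normalize(vmin=error.min(), vmax=error.max())  # rescales error to [0, 1]
colors = cmap(norm(error))                             # one RGBA row per line

fig, ax = plt.subplots()
x = np.linspace(0, 1, 50)
for i, color in enumerate(colors):
    ax.plot(x, x * (i + 1), color=color)   # placeholder curves

# The ScalarMappable carries the norm and the cmap, so the colorbar ticks
# show the original error values rather than 0..1.
fig.colorbar(ScalarMappable(norm=norm, cmap=cmap), ax=ax, label="error")
```

With the norm in place, the smallest error maps to white and the largest to black; swap to "binary_r" if lower error should be darker.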
why have you used an underscore in the for loop?
This is a pattern in Python, typically meaning "I'm not using this thing". enumerate returns an iterator of tuples with the structure (index, value). So enumerate(fit) returns (0, 0.76458568), (1, 0.15017328), etc. (based on the data shown above). I am only using the index (i) to get the corresponding position (and color) in color_gradients (ax1.plot(x, y, c=color_gradients[i])). Even though the values from fit are being returned by enumerate, I am not using them, so I instead point them to _. If I were using them within the loop, I would use a typical variable name instead.
enumerate is the encouraged way to loop over an iterable if you need to access both the count of the values and the values themselves. People tend to use for i in range(len(fit)) also to do this (which works fine!) but the further I've gone with Python the more I've seen people avoiding that.
This was a little bit of a confusing example; I set my loop to iterate over fit b/c I was conceptualizing "creating one graph for each value in fit". But I could have just looped over color_gradients (for c in color_gradients) which might be more clear.
But in your real data, something like enumerate may be helpful if you are looping over multiple aligned arrays. In my example, I just create new random data within each loop. But you will likely want to have an array of fit values, an array of color values, an array (of series) of radii, and an array (of series) of masses, such that the ith element of each array corresponds to the same star. You may be iterating over one array and want to access the same position in another (zip is used for this also).
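For aligned arrays, zip avoids index bookkeeping entirely. A minimal sketch, assuming the radii and masses are stored as 2D arrays with one row per star (these array names and shapes are invented here and are not taken from the question's f array):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n_stars, n_points = 5, 20
# Hypothetical data: one row of radii and one row of masses per star.
radii = np.sort(rng.uniform(8, 16, size=(n_stars, n_points)), axis=1)
masses = np.sort(rng.uniform(0.5, 2.5, size=(n_stars, n_points)), axis=1)
fit = rng.random(n_stars)                 # one fit value per star

colors = plt.get_cmap("binary")(fit)      # one RGBA row per star

fig, ax = plt.subplots()
# zip yields the i-th row of each array together, so no index is needed.
for r, m, color in zip(radii, masses, colors):
    ax.plot(r, m, color=color)
ax.set_xlabel("Radius")
ax.set_ylabel("Mass")
```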
I'll leave this second answer here, even though it wasn't what OP was getting at:
You have one fit value for each point:
Here, each pair of x,y coordinates has its own fit value:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.randint(100, size=25)
y = np.random.randint(100, size=25)
fit = np.random.rand(25)
plt.scatter(x, y, c=fit, cmap='binary')
plt.colorbar()
Note that with either approach, poorly fitting points or lines may be invisible, because low fit values map to near-white in the 'binary' colormap.

Is there a way to convert scalar values in an array to matplotlib colormap indices?

I have an array that consists of a bunch of floats (e.g. [1202.21, -124.4, 23, ....]) that I've plotted with matplotlib using the colormap jet. Is there any way to get the indices of the jet scale (i.e. a single value 0-255) for each float in my array? I want to display some stats about the data but it only will make sense if the stats (mean, standard deviation, etc.) are within the 0-255 range.
I've tried returning the array used by matplotlib using get_array() but that doesn't seem to change the data.
Thanks!

numpy.digitize gives you the bin number for the data when put into bins. Here you have 256 bins and the last bin is closed. Hence,
import numpy as np
a = np.array([1,2,3])
N = 256
bins = np.linspace(a.min(), a.max(), N+1)
dig = np.digitize(a, bins)-1
dig[dig == N] = N-1 # map the last half-open interval back
print(dig)
Now verify that those are indeed the indices of the colormap:
import matplotlib.pyplot as plt
cmap = plt.cm.jet
norm = plt.Normalize(a.min(), a.max())
colors1 = cmap(norm(a))
colors2 = cmap(dig)
assert(np.all(colors1 == colors2))
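The same indices can also be computed arithmetically from the normalized values, which avoids building the bin edges. This is a sketch assuming the colormap has cmap.N entries (256 for jet) and that the normalization spans the data's own minimum and maximum:

```python
import numpy as np
import matplotlib.pyplot as plt

a = np.array([1202.21, -124.4, 23.0])   # sample values from the question
cmap = plt.get_cmap("jet")
norm = plt.Normalize(a.min(), a.max())

scaled = np.asarray(norm(a), dtype=float)      # values in [0, 1]
idx = np.floor(scaled * cmap.N).astype(int)    # 0 .. N
idx = np.clip(idx, 0, cmap.N - 1)              # the maximum maps to N-1, not N

print(idx)                    # colormap indices, one per value
print(idx.mean(), idx.std())  # statistics on the 0-255 scale
```

Statistics such as the mean and standard deviation can then be taken directly on idx.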

Bin counting with hex-like bins in 2D

I use numpy's histogram2d to count how many (training) data points lie in each bin. For a new point (x,y) I can then query how many points are in the same bin as (x,y).
Is there something similar for "hex" bins, like in the matplotlib plots, where I can fill the bins and then later query how many points are in each bin?

You can get the bin data, but it's not as simple as doing the same operation on a rectangular grid. The reason is that hex bins do not lend themselves to straightforward two-dimensional indexing. The function hexbin() returns a PolyCollection, which has the bin locations accessible through get_offsets() and the bin values accessible through get_array(). So:
import matplotlib.pyplot as plt
hb = plt.hexbin(...)
bin_xy = hb.get_offsets()
bin_counts = hb.get_array()
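To query the count for a new point, one rough option is to look up the bin whose center is closest to it. The sketch below does that with made-up data; count_near is a hypothetical helper, and nearest-center lookup is only an approximation of hexbin's own bin assignment, so points near a hexagon boundary may be attributed to a neighbouring bin.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.normal(size=1000)   # made-up training points
y = rng.normal(size=1000)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=20)
bin_xy = np.asarray(hb.get_offsets())      # (n_bins, 2) hexagon centers
bin_counts = np.asarray(hb.get_array())    # (n_bins,) counts per hexagon

def count_near(px, py):
    """Return the count of the bin whose center is closest to (px, py)."""
    d2 = (bin_xy[:, 0] - px) ** 2 + (bin_xy[:, 1] - py) ** 2
    return bin_counts[np.argmin(d2)]

print(count_near(0.0, 0.0))
```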

Plotting a curve from numpy array with large values

I am trying to plot a curve from molecular dynamics potential energy data stored in a numpy array. As you can see in the attached figure, a large number appears at the top left, which is related to the labels on the y-axis.
Even if I rescale the data, a number still appears there, and I do not want it. Can you suggest how to sort out this issue? Thank you very much.

This is likely happening because your data is a small value offset by a large one. That's what the - sign means at the front of the number, "take the plotted y-values and subtract this number to get the actual values". You can remove it by plotting with the mean subtracted. Here's an example:
import numpy as np
import matplotlib.pyplot as plt
y = -1.5*1e7 + np.random.random(100)
plt.plot(y)
plt.ylabel("units")
gives the form you don't like:
but subtracting the mean (or some other number close to that, like min or max, etc) will remove the large offset:
plt.figure()
plt.plot(y - np.mean(y))
plt.ylabel("offset units")
plt.show()

You can remove the offset by using:
plt.ticklabel_format(useOffset=False)
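A short sketch of this on synthetic data of the same shape as the question's (small variation on top of a large constant). Passing style="plain" as well is an addition here: for values this large, switching off the offset alone may still leave a scientific-notation multiplier above the axis.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
y = -1.5e7 + rng.random(100)   # small variation on top of a large offset

fig, ax = plt.subplots()
ax.plot(y)
# useOffset=False stops the formatter from factoring out a common offset;
# style="plain" additionally turns off scientific notation for the tick labels.
ax.ticklabel_format(axis="y", useOffset=False, style="plain")
ax.set_ylabel("units")
```

The trade-off is that every tick label now shows the full number, so the labels get long.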

It seems your data is displayed in exponential form like: 1e+10, 2e+10, etc.
This question here might help:
How to prevent numbers being changed to exponential form in Python matplotlib figure

How to fit powerlaw to a histogram with matplotlib

I am trying to fit a power law (more exactly, a Pareto distribution) to a histogram. I did it with my own function, where I check for the smallest sum of squared differences. But this means I need to loop through all the coefficients, which can take some time. Another problem is that I need to build my own data list to get the histogram data.
So I am looking for a function that returns the data computed by matplotlib.pyplot.hist() and not just a picture, and then I would like to fit this data with a Pareto distribution a bit faster than looping so many times, and obtain the coefficients.

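I think you are looking for the bin counts and the bin edges.
The matplotlib.pyplot.hist() function returns a tuple (n, bins, patches), where n holds the count in each bin and bins holds the bin edges.
See the matplotlib.pyplot.hist documentation for more information.
For example, to plot some 'data' with 150 bins:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.pareto(3.0, 1000)  # placeholder samples; substitute your own data
counts, bin_edges, patches = plt.hist(data, 150)
print(counts)     # number of samples in each bin (150 values)
print('')
print(bin_edges)  # bin edges (151 values)
plt.show()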
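Once the counts and edges are available, the exponent can be estimated without looping over candidate coefficients. The sketch below is one quick way to do it, not the only one: it fits a straight line to the histogram in log-log space with np.polyfit, using synthetic Pareto samples and logarithmic bins (both are choices made for this illustration). Maximum-likelihood estimators are usually considered more reliable for power laws, so treat the result as a rough estimate.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha_true = 3.0
# Synthetic classical Pareto samples with x_min = 1 (numpy's pareto draws Lomax, hence the + 1).
data = rng.pareto(alpha_true, size=100_000) + 1.0

# Logarithmically spaced bins up to the 99th percentile keep every bin well populated.
edges = np.logspace(0.0, np.log10(np.percentile(data, 99)), num=21)
density, edges = np.histogram(data, bins=edges, density=True)
centers = np.sqrt(edges[:-1] * edges[1:])   # geometric bin centers

# A Pareto density is proportional to x**-(alpha + 1): a straight line in log-log space.
mask = density > 0
slope, intercept = np.polyfit(np.log(centers[mask]), np.log(density[mask]), 1)
alpha_est = -slope - 1.0
print(alpha_est)
```

The same fit works on the n and bins arrays returned by plt.hist, provided the bins are converted to centers and empty bins are masked out before taking logarithms.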
