Extracting data from a histogram with custom bins in Python

I have a data set of distances between two particles, and I want to bin these data in custom bins. For example, I want to see how many distance values lie in the interval from 1 to 2 micrometers, and so on. I wrote some code for this, and it seems to work. This is my code for this part:
#Custom binning of data
bins= [0,1,2,3,4,5,6,7,8,9,10]
fig, ax = plt.subplots(n,m,figsize = (30,10)) #using this because I actually have 5 histograms, but only posted one here
ax.hist(dist_from_spacer1, bins=bins, edgecolor="k")
ax.set_xlabel('Distance from spacer 1 [µm]')
ax.set_ylabel('counts')
plt.xticks(bins)
plt.show()
However, now I wish to extract those data values from the intervals, and store them into lists. I tried to use:
np.histogram(dist_from_spacer1, bins=bins)
However, this just gives the number of data points in each bin and the bin edges, like this:
(array([ 0, 0, 44, 567, 481, 279, 309, 202, 117, 0]),
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
How can I get the exact data that belong to each histogram bin?

Yes, np.histogram only calculates what is needed to draw a histogram, so it returns the bin boundaries and the count for each bin rather than the individual data points. However, the bin boundaries are sufficient to achieve what you want by using np.digitize:
counts, bins = np.histogram(dist_from_spacer1)
indices = np.digitize(dist_from_spacer1, bins)
# digitize returns 0 below the first edge and len(bins) at or above the last edge,
# so one extra list is needed; lists[i] collects values with bins[i-1] <= x < bins[i]
lists = [[] for _ in range(len(bins) + 1)]
for i, x in zip(indices, dist_from_spacer1):
    lists[i].append(x)
In your case, the bin boundaries are predefined, so you can use np.digitize directly.
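As a minimal sketch of using np.digitize directly with predefined edges: the sample values below are made up to stand in for dist_from_spacer1, and values_per_bin is a hypothetical name. The one adjustment is that a value equal to the last edge is moved into the last bin, so that the grouping agrees with the counts np.histogram reports.

```python
import numpy as np

# Hypothetical sample data standing in for dist_from_spacer1
dist_from_spacer1 = np.array([2.1, 2.9, 3.0, 3.5, 4.2, 8.7, 10.0])
bins = np.arange(0, 11)  # edges 0, 1, ..., 10

# digitize returns i such that bins[i-1] <= x < bins[i]
indices = np.digitize(dist_from_spacer1, bins)

# np.histogram closes the last bin on the right, so a value equal to the
# last edge is moved from index len(bins) into the last bin
indices[dist_from_spacer1 == bins[-1]] = len(bins) - 1

# One list per bin; entry k holds the values that fall between bins[k] and bins[k+1]
values_per_bin = [dist_from_spacer1[indices == k + 1].tolist()
                  for k in range(len(bins) - 1)]

# The list lengths should agree with the histogram counts
counts, _ = np.histogram(dist_from_spacer1, bins=bins)
```

Here values_per_bin[2] would hold the distances between 2 and 3 µm, values_per_bin[3] those between 3 and 4 µm, and so on.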

Related

Values on the X-axis shifting toward zero

I generated a graph in which the values on the X-axis start from 0 and go to 1000, fifty by fifty, like 0, 50, 100, 150, ..., 900, 950, 1000. However, I want to divide the values on the X-axis by 10 (I want to convert the values on the x-axis into 0, 5, 10, 15, ..., 90, 95, 100).
Index_time is 1001
index_time = len(df.index)
ax.plot(np.arange(index_time), df["SoluteHBonds"], color="blue")
ranges=(np.arange(0,index_time,50))
ax.set_xticks(ranges)
When I divide the values on the X-axis via np.true_divide(ranges, 10), all the values on the X-axis shift toward 0.
I also tried to create a list first and then divide each element by 10, but the result is still the same:
lst_range=list(range(0,int((index_time-1)/10),5))
ax.set_xticks([time/10 for time in lst_range])
What could be the problem or what is the thing that I am missing in this case?
Thanks in advance!

Is it possible to leave blank spaces in matplotlib's pcolormesh plots?

I'm displaying some data using matplotlib.pyplot.pcolormesh in python, and I want to leave blank spaces where there are missing data points.
Suppose I've collected data for x values 0 to 10, and y values 0 to 10, but not every such value. At present, I initialize my data storage array using np.zeros((11,11)), then use a for loop to change the values of that array to the data value if I have the data for that point.
That leaves me with a bunch of data plus some zeros in an array. When I plot this, it is impossible to distinguish between points that have no data and points which have data with small values.
Is it possible to have missing data points clearly distinct from non-missing data points? For example, in the code below I want the squares at (3,1), (5,7), and (8,8) colored but the rest of the squares white.
I've tried initializing my data storage array with np.empty((11,11)) and np.full((11,11),np.nan) as well, but they both produce the same output as np.zeros. Here's the code below:
import numpy as np
import matplotlib.pyplot as plt
data_storage = np.zeros((11,11))
collected_data = [[3, 1, 45.2], [5, 7, 23.9], [8, 8, 78.4]]
for data in collected_data:
    x_coord = data[0]
    y_coord = data[1]
    value = data[2]
    data_storage[y_coord,x_coord] = value
all_x_values = np.linspace(0,10,11)
all_y_values = np.linspace(0,10,11)
plt.pcolormesh(all_x_values, all_y_values, data_storage)
plt.show()

One approach is to change all zeros to NaN, which would make the corresponding cells transparent.
Please note that the x and y values for pcolormesh are for the grid points, not for the centers, so you need one more value in each dimension (11 cells, 12 cell borders). This makes it possible to create color meshes with unequal cell sizes. If you want the ticks to be nicely in the center of the cells, you can put the cell borders at the halves.
(In the code below, the for loop has been written more concisely.)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
data_storage = np.zeros((11, 11))
collected_data = [[3, 1, 45.2], [5, 7, 23.9], [8, 8, 78.4]]
for x_coord, y_coord, value in collected_data:
    data_storage[y_coord, x_coord] = value
all_x_values = np.arange(0, 12) - 0.5
all_y_values = np.arange(0, 12) - 0.5
plt.pcolormesh(all_x_values, all_y_values, np.where(data_storage == 0, np.nan, data_storage))
plt.gca().xaxis.set_major_locator(MultipleLocator(1))
plt.gca().yaxis.set_major_locator(MultipleLocator(1))
plt.colorbar()
plt.show()
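A variation on this idea, sketched under the assumption that a measured value of exactly 0 could occur and should stay visible: fill the array with NaN from the start and mask the NaN cells, instead of converting zeros afterwards. The names masked and edges are hypothetical; the data are the three points from the question.

```python
import numpy as np

collected_data = [[3, 1, 45.2], [5, 7, 23.9], [8, 8, 78.4]]

# Start from NaN rather than 0, so a genuine 0.0 measurement is not lost
data_storage = np.full((11, 11), np.nan)
for x_coord, y_coord, value in collected_data:
    data_storage[y_coord, x_coord] = value

# Explicitly mask the cells that never received a value
masked = np.ma.masked_invalid(data_storage)

# 11 cells need 12 borders; the 0.5 offset centres each cell on an integer
edges = np.arange(12) - 0.5
```

Passing these to plt.pcolormesh(edges, edges, masked) should leave the masked cells undrawn, so only the three filled cells are colored.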
An alternative approach could be to create a colormap, set an 'under' color and set vmin to a value slightly larger than 0. Optionally, the 'under' color can be visualized in the colorbar with extend='min'.
from copy import copy
my_cmap = copy(plt.get_cmap('viridis'))  # plt.cm.get_cmap is unavailable in recent Matplotlib releases
my_cmap.set_under('lightgrey')
plt.pcolormesh(all_x_values, all_y_values, data_storage, cmap=my_cmap, vmin=0.000001)
plt.colorbar(extend='min', extendrect=True)

Python Matplotlib pyplot histogram

I am plotting a histogram of a fairly simple simulation. On the histogram the last two columns are merged and it looks odd. Please find the code and the plot result below.
Thanks in advance!
import numpy as np
import random
import matplotlib.pyplot as plt
die = [1, 2, 3, 4, 5, 6]
N = 100000
results = []
# first round
for i in range(N):
    X1 = random.choice(die)
    if X1 > 4:
        results.append(X1)
    else:
        X2 = random.choice(die)
        if X2 > 3:
            results.append(X2)
        else:
            X3 = random.choice(die)
            results.append(X3)
plt.hist(results)
plt.ylabel('Count')
plt.xlabel('Result');
plt.title("Mean results: " + str(np.mean(results)))
plt.show()
The output looks like this. I don't understand why the last two columns are stuck together.
Any help appreciated!

By default, matplotlib divides the range of the input into 10 equally sized bins. All bins span a half-open interval [x1,x2), but the rightmost bin includes the end of the range. Your range is [1,6], so your bins are [1,1.5), [1.5,2), ..., [5.5,6], so all the integers end up in the first, third, etc. odd-numbered bins, but the sixes end up in the tenth (even) bin.
To fix the layout, specify the bins:
# This will give you a slightly different layout with only 6 bars
plt.hist(results, bins=die + [7])
# This will simulate your original plot more closely, with empty bins in between
plt.hist(results, bins=np.arange(2, 14)/2)
The last bit generates the number sequence 2,3,...,13 and then divides each number by 2, which gives 1, 1.5, ..., 6.5 so that the last bin spans [6,6.5].
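The binning described above can be checked without plotting, since plt.hist uses the same binning as np.histogram. This is a small sketch with made-up data (ten of each die face) rather than the simulation output:

```python
import numpy as np

# Hypothetical stand-in for `results`: ten of each face
results = np.array([1, 2, 3, 4, 5, 6] * 10)

# Default: 10 equal-width bins over [1, 6], i.e. edges 1, 1.5, ..., 6
default_counts, default_edges = np.histogram(results)

# Explicit half-width bins up to 6.5: the sixes get their own odd-numbered bin
half_counts, _ = np.histogram(results, bins=np.arange(2, 14) / 2)

# Explicit unit-width bins with edges 1, 2, ..., 7: one bar per face
unit_counts, _ = np.histogram(results, bins=[1, 2, 3, 4, 5, 6, 7])
```

With the default bins the fives land in the ninth bin and the sixes in the tenth, which is why the last two bars touch; with either explicit choice the bars are evenly spaced.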

You need to tell matplotlib which bins to use for the histogram.
Otherwise matplotlib chooses the default of 10 bins for you, which in this case does not line up well with six integer outcomes.
# ... your code ...
plt.hist(results, bins=6)  # not bins=die: the edges 1..6 make only 5 bins, and the last one, [5, 6], holds both fives and sixes
plt.ylabel('Count')
plt.xlabel('Result');
plt.title("Mean results: " + str(np.mean(results)))
plt.show()
Full documentation is here: https://matplotlib.org/3.2.2/api/_as_gen/matplotlib.pyplot.hist.html

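It doesn't, but you can try seaborn instead.
import seaborn as sns
sns.distplot(results, kde=False)  # deprecated since seaborn 0.11; sns.histplot(results, discrete=True) is the newer alternative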

Change linear x-axis to circular x-axis

I have a series of numbers from 0 to 360 that I want to plot on the x-axis. The x-axis should be "circular", i.e. there should be no negative numbers before zero, but 359 instead of -1, 358 instead of -2, etc.
I would like a plot whose x-axis goes from 320 to 40, something like:
https://imgur.com/k1Ss2WJ
I don't want to manually change the data and the ticks on the axes, but I'd like to know if there is a more direct way, keeping the data as it is.

It's pretty simple. You need to use %, known as the modulo operator. This is how you'll convert your x axis numbers:
# Say your numbers are like these:
xaxis = [-1, -2, 600, 200, 360, 0, 6]
mod_xaxis = [x % 360 for x in xaxis]
# mod_xaxis is now [359, 358, 240, 200, 0, 0, 6]
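Building on the modulo idea, one possible way to get an axis that runs from 320 through 0 to 40 is sketched below. This is an assumption about what the plot needs, not something stated in the answer: a shifted copy of the angles is used for plotting positions (the original array is left untouched), and the tick labels are mapped back with % 360. The names angles, shifted, tick_positions and tick_labels are hypothetical.

```python
import numpy as np

# Hypothetical angles in degrees, stored in [0, 360)
angles = np.array([320, 350, 0, 10, 40])

# Map into [-180, 180) so that 320..359 sit just left of 0 on a linear axis
shifted = (angles + 180) % 360 - 180

# Tick positions on the shifted axis and the labels to display for them
tick_positions = np.arange(-40, 41, 20)
tick_labels = [int(t) % 360 for t in tick_positions]
```

The data would then be plotted against shifted, with ax.set_xticks(tick_positions) followed by ax.set_xticklabels(tick_labels) so that the axis reads 320, 340, 0, 20, 40.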

How to encircle some pixels on a heat map with a continuous, not branched line using Python?

I am using plt.imshow() to plot values on a grid (CCD data in my case). An example plot:
I need to indicate a barrier on it, to show which pixels I care about. This is similar to what I need:
I know how to add squares and gridlines to an image, and how to add single squares to the picture, but none of this solves the issue. I need a line which encircles an area on the grid (this line will always need to go between pixels, not across them, which might make it a bit simpler).
How can I do this?
Iury Sousa has provided a nice work-around to the question above. However, it is not strictly circling the area with a line (rather, it plots a mask on the picture and then covers most of it with the picture again), and it fails when I try to encircle overlapping groups of pixels. ImportanceOfBeingErnest suggested in the comments that I should simply use the plt.plot sample. Using Iury Sousa's example as a starting point, let's have:
X,Y = np.meshgrid(range(30),range(30))
Z = np.sin(X)+np.sin(Y)
selected1 = Z>1.5
Now selected1 is an array of boolean arrays, and we would like to circle only those pixels which have corresponding Z value above 1.5. We also would like to circle selected2, which contains True values for pixels with value above 0.2 and below 1.8:
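upperlim_selected2 = Z<1.8
# combine both conditions; comparing the boolean array itself with 0.2 would only reproduce upperlim_selected2
selected2 = upperlim_selected2 & (Z>0.2)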
Iury Sousa's great work-around doesn't work for this case. plt.plot would, in my opinion. What is an efficient way to achieve the circling of selected1 and selected2, either using plt.plot or another method?

I tried something that should fit your needs.
First of all, I defined some arbitrary data:
X,Y = np.meshgrid(range(30),range(30))
Z = np.sin(X)+np.sin(Y)
Here you can define the condition which fits in the pattern you want to highlight:
selected = Z>1.5
To plot you will use scatter instead of imshow. You will plot all the data, then the selected data two more times, one with larger squares with the highlight color and another normally using the same color reference and limits.
info = dict(marker='s',vmin=-2,vmax=2)
fig,ax = plt.subplots()
plt.scatter(X.ravel(),Y.ravel(),100,c=Z.ravel(),**info)
plt.scatter(X[selected].ravel(),Y[selected].ravel(),150,c='r',marker='s')
plt.scatter(X[selected].ravel(),Y[selected].ravel(),100,c=Z[selected].ravel(),**info)
ax.axis('equal')

Similar to the answer in Can matplotlib contours match pixel edges?
you can create a grid with a higher resolution and draw a contour plot.
import numpy as np
import matplotlib.pyplot as plt
X,Y = np.meshgrid(range(30),range(30))
Z = np.sin(X)+np.sin(Y)
resolution = 25
f = lambda x,y: Z[int(y),int(x) ]
g = np.vectorize(f)
x = np.linspace(0,Z.shape[1], Z.shape[1]*resolution)
y = np.linspace(0,Z.shape[0], Z.shape[0]*resolution)
X2, Y2= np.meshgrid(x[:-1],y[:-1])
Z2 = g(X2,Y2)
plt.pcolormesh(Z)  # without X, Y the cell edges sit at the integers 0..30, matching the contour coordinates
plt.contour(X2,Y2,Z2, [1.5], colors='r', linewidths=[1])
plt.show()
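np.vectorize runs a Python-level call for every sample point, which can be slow at high resolution. As an alternative sketch of the same nearest-pixel upsampling (not the answer's method), np.repeat can blow up each pixel directly; x2 and y2 are hypothetical names for the sample coordinates in pixel units.

```python
import numpy as np

X, Y = np.meshgrid(range(30), range(30))
Z = np.sin(X) + np.sin(Y)
resolution = 25

# Repeat every pixel `resolution` times along both axes
Z2 = np.repeat(np.repeat(Z, resolution, axis=0), resolution, axis=1)

# Sample coordinates in pixel units: pixel i covers [i, i + 1)
x2 = np.arange(Z2.shape[1]) / resolution
y2 = np.arange(Z2.shape[0]) / resolution
```

These arrays could then be passed to plt.contour(x2, y2, Z2, [1.5], colors='r') in place of X2, Y2 and the vectorized Z2.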

Another solution which works for me:
Let's take a grid, for example:
grid=[[0, 6, 8, 2, 2, 5, 25, 24, 11],
[4, 15, 3, 22, 225, 1326, 2814, 1115, 18],
[6, 10, 9, 201, 3226, 3549, 3550, 3456, 181],
[42, 24, 46, 1104, 3551, 3551, 3551, 3382, 27],
[9, 7, 73, 2183, 3551, 3551, 3551, 3294, 83],
[9, 7, 5, 669, 3544, 3551, 3074, 1962, 18],
[10, 3545, 9, 10, 514, 625, 16, 14, 5],
[5, 6, 128, 10, 8, 6, 7, 40, 4]]
We plot it:
plt.pcolormesh(grid)
Assume we want to encircle every pixel with value higher than 1420. We create a boolean array:
threshold=1420
booleangrid=np.asarray(grid)>threshold
intgrid=booleangrid*1
We then create a line segment around every pixel:
down=[];up=[];left=[];right=[]
for i, eachline in enumerate(intgrid):
    for j, each in enumerate(eachline):
        if each==1:
            down.append([[j,j+1],[i,i]])
            up.append([[j,j+1],[i+1,i+1]])
            left.append([[j,j],[i,i+1]])
            right.append([[j+1,j+1],[i,i+1]])
and join them together:
together=[]
for each in down: together.append(each)
for each in up: together.append(each)
for each in left: together.append(each)
for each in right: together.append(each)
(Created separately for clarity.)
We go through these individual line segments and keep only those which appear exactly once, i.e. the ones on the edge of the feature defined by the boolean array (booleangrid) above:
filtered=[]
for each in together:
    c=0
    for EACH in together:
        if each==EACH:
            c+=1
    if c==1:
        filtered.append(each)
Then we plot the grid and the individual line segments with a for loop:
plt.pcolormesh(grid)
for x in range(len(filtered)):
    plt.plot(filtered[x][0],filtered[x][1],c='red', linewidth=8)
giving us the result:
This is a result we can be happy with.
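The "keep only segments that appear once" step compares every segment with every other one, which grows quadratically with the number of selected pixels. A possible alternative, sketched here with a small made-up grid rather than the one above, is to count segments in a collections.Counter; segments and boundary are hypothetical names, and each segment keeps the same (xs, ys) layout used for plt.plot.

```python
from collections import Counter

import numpy as np

# Hypothetical small grid; the selected pixels form an L shape
grid = np.array([[0, 5, 5],
                 [0, 5, 0]])
mask = grid > 1

# Count how often each pixel edge occurs among the selected pixels
segments = Counter()
for i, j in zip(*np.nonzero(mask)):
    i, j = int(i), int(j)
    segments[((j, j + 1), (i, i))] += 1          # lower edge
    segments[((j, j + 1), (i + 1, i + 1))] += 1  # upper edge
    segments[((j, j), (i, i + 1))] += 1          # left edge
    segments[((j + 1, j + 1), (i, i + 1))] += 1  # right edge

# Edges shared by two selected pixels are counted twice; the outline is the rest
boundary = [seg for seg, n in segments.items() if n == 1]
```

Each entry of boundary can be drawn with plt.plot(seg[0], seg[1], c='red'), exactly like the entries of filtered.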
