Integration of a curve generated using matplotlib - python

I have generated a graph using the basic function:
plt.plot(tm, o1)
tm is a list of all x coordinates and o1 is a list of all y coordinates.
NOTE
There is no specific function such as y=f(x); rather, a certain y value remains constant for a given range of x. See the figure for clarity.
My question is how to integrate this function, either using the matplotlib figure or using the lists (tm and o1)

The integral corresponds to computing the area under the curve.
The simplest way to compute (or approximate) the integral "numerically" is the rectangle rule, which approximates the area under the curve by summing the areas of rectangles (see https://en.wikipedia.org/wiki/Numerical_integration#Quadrature_rules_based_on_interpolating_functions).
In your case this is quite straightforward, since it is a step function.
First, I recommend using numpy arrays instead of lists (handier for numerical computing):
import matplotlib.pyplot as plt
import numpy as np
x = np.array([0,1,3,4,6,7,8,11,13,15])
y = np.array([8,5,2,2,2,5,6,5,9,9])
plt.plot(x,y)
Then, we compute the width of rectangles using np.diff():
w = np.diff(x)
Then, the height of the same rectangles (multiple possibilities exist):
h = y[:-1]
Here I chose the first value of each pair of successive y values, so the top-left corner of each rectangle lies on the curve. Alternatively, you can use the mean of each pair of successive y values, h = (y[1:]+y[:-1])/2, in which case the middle of the top of each rectangle coincides with the curve (this is the trapezoidal rule).
Then multiply and sum:
area = (w*h).sum()
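The height choices described above can be wrapped in one helper. This is a minimal sketch; the function name area_under_curve and its rule argument are hypothetical, not from the answer. For a true step function that keeps y[i] on the interval [x[i], x[i+1]), the left rule is exact; for the straight-line curve that plt.plot(x, y) draws, the trapezoid rule is exact. To draw the step shape that the left rule integrates, plt.step(x, y, where='post') can be used instead of plt.plot(x, y).

```python
import numpy as np

def area_under_curve(x, y, rule="left"):
    """Approximate the area under the curve through the points (x, y).

    rule="left":      height = left value of each interval
    rule="right":     height = right value of each interval
    rule="trapezoid": height = mean of both ends of each interval
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.diff(x)                   # rectangle widths
    if rule == "left":
        h = y[:-1]                   # top-left corner on the curve
    elif rule == "right":
        h = y[1:]                    # top-right corner on the curve
    elif rule == "trapezoid":
        h = (y[1:] + y[:-1]) / 2     # middle of the top edge on the curve
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    return float((w * h).sum())
```

With the question's lists this would be called as area_under_curve(tm, o1). For the sample arrays x and y above, all three rules happen to give 77.0.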

Related

Calculate the area enclosed by a 2D array of unordered points in python

I am trying to calculate the area of a shape enclosed by a large set of unordered points in python. I have a 2D array of points which I can plot as a scatterplot like this.
There are several ways to calculate the area enclosed by points, but these all assume ordered points, such as here and here. This method calculates the area of unordered points, but it doesn't appear to work for complex shapes, as seen here. How would I calculate this area from unordered points in python?
Sample data looks like this:
[[225.93459 -27.25677 ]
[226.98128 -32.001945]
[223.3623 -34.119724]
[225.84741 -34.416553]]
With pen and paper one can see that this shape contains an area of ~12 (unitless), but putting these coordinates into one of the algorithms linked to previously returns an area of ~0.78.
First, note that in the question "How would I calculate this area from unordered points in python?" the phrase 'unordered points', in the context of an area calculation, usually means that the given points lie on a contour enclosing the area to be calculated, just not in order.
The data sample provided in the question, however, is not a set of contour points but just a cloud of points which, when visualized as a scatterplot, forms a visually perceivable area.
That is why the algorithms for calculating areas from 'unordered points' linked in the question don't apply at all to what the question is about.
In other words, the actual question I will answer below is:
Calculate the visually perceivable area a cloud of (x,y) points forms when visualized as a scatterplot
One possible option is mentioned in a comment on the question:
Honestly, you might consider taking THAT graph as a bitmap, and counting the number of non-white pixels in it. That is probably as close as you can get. – Tim Roberts
Given an image that exactly covers (without any margin) all the non-white pixels, you can express the area the image covers in the units of the underlying (x,y) data. First calculate the area TA of the rectangle visible in the image from the underlying list of points P with (x,y) coordinates (P = [(x1,y1), (x2,y2), ...]):
X = [x for x,y in P]
Y = [y for x,y in P]
TA = (max(X)-min(X))*(max(Y)-min(Y))
If N_white is the number of white pixels in an image of N pixels, the area A covered by non-white pixels, expressed in the units of the points in P, is:
A = TA*(N-N_white)/N
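The two formulas above can be sketched as follows. The helper names are hypothetical, and the mask assumes a float image in [0, 1] (for example a PNG read with matplotlib.pyplot.imread) where "white" means all three colour channels equal 1.0; anti-aliased edges or a JPEG would need a tolerance instead of an exact comparison.

```python
import numpy as np

def nonwhite_mask(img):
    """True where a pixel is not pure white.

    Assumes a float image in [0, 1] with shape (rows, cols, 3) or
    (rows, cols, 4); an alpha channel, if present, is ignored."""
    return (img[..., :3] < 1.0).any(axis=-1)

def area_from_image(P, nonwhite):
    """A = TA * (N - N_white) / N for an image that spans exactly the
    bounding box of the points P (no margin)."""
    X = [x for x, y in P]
    Y = [y for x, y in P]
    TA = (max(X) - min(X)) * (max(Y) - min(Y))   # area of the image rectangle
    N = nonwhite.size                             # all pixels
    N_white = N - np.count_nonzero(nonwhite)      # white pixels
    return TA * (N - N_white) / N
```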
Another approach, using only the list of points P with (x,y) coordinates (without creating an image), consists of the following steps:
decide which area Ap each point covers and calculate h2, half the side length of a square with this area centered on that point (h2 = 0.5*sqrt(Ap))
create a list R of squares around all points in the list P: R = [(x-h2, y+h2, x+h2, y-h2) for x,y in P]
use the code linked from the Stack Overflow question "Area of Union Of Rectangles using Segment Trees" to calculate the total area covered by the rectangles in the list R.
Compared with the graphical approach based on the scatterplot, this approach has the advantage that the chosen area per point directly controls the precision/resolution/granularity of the area calculation.
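A minimal sketch of these steps, assuming a square of area Ap around every point. Instead of the segment-tree code from the linked question, it uses a simpler coordinate-compression union (a different and slower technique): all square edges cut the plane into cells, and each cell covered by at least one square is counted once. Its memory use grows with the square of the number of points, so it only suits a few thousand points; the function name is hypothetical.

```python
import numpy as np

def union_area_of_squares(points, ap):
    """Total area covered by axis-aligned squares of area `ap`, one centred
    on each (x, y) point; overlapping parts are counted only once."""
    h2 = 0.5 * np.sqrt(ap)                     # half the side length of a square
    pts = np.asarray(points, dtype=float)
    x0, x1 = pts[:, 0] - h2, pts[:, 0] + h2
    y0, y1 = pts[:, 1] - h2, pts[:, 1] + h2
    xs = np.unique(np.concatenate([x0, x1]))   # sorted distinct x edges
    ys = np.unique(np.concatenate([y0, y1]))   # sorted distinct y edges
    covered = np.zeros((len(xs) - 1, len(ys) - 1), dtype=bool)
    for a, c, b, d in zip(x0, x1, y0, y1):
        i0, i1 = np.searchsorted(xs, [a, c])   # cells between x edges a and c
        j0, j1 = np.searchsorted(ys, [b, d])   # cells between y edges b and d
        covered[i0:i1, j0:j1] = True
    cell_areas = np.outer(np.diff(xs), np.diff(ys))
    return float(cell_areas[covered].sum())
```

Usage would be union_area_of_squares(P, Ap) with P and Ap as in the steps above.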
Given a 2D array of points, the area covered by the points can also be calculated from the return value of matplotlib.pyplot.hist2d(), the same function used to show the scatterplot.
The 'trick' is to set the function's cmin parameter to 1 (cmin=1), so that bins containing no points are returned as numpy.nan, and then to count the numpy.nan values in the returned array relative to the total number of bins.
In other words, everything needed for a simple area calculation is already computed when the scatterplot is created, once you know that the histogram function returns it.
Below is the code of a ready-to-use function for the area calculation, along with a demonstration of its usage:
def area_of_points(points, grid_size = [1000, 1000]):
    """
    Returns the area covered by N 2D-points provided in a 'points' array
        points = [ (x1,y1), (x2,y2), ... , (xN, yN) ]
    'grid_size' gives the number of grid cells in x and y direction the
    'points' bounding box is divided into for calculation of the area.
    Larger 'grid_size' values mean smaller grid cells, higher precision
    of the area calculation and longer runtime.
    area_of_points() requires installed matplotlib module. """
    import matplotlib.pyplot as plt
    import numpy as np
    pts_x = [x for x,y in points]
    pts_y = [y for x,y in points]
    pts_bb_area = (max(pts_x)-min(pts_x))*(max(pts_y)-min(pts_y))
    # cmin=1: bins containing no point are returned as numpy.nan
    h2D,_,_,_ = plt.hist2d( pts_x, pts_y, bins = grid_size, cmin=1)
    numberOfWhiteBins = np.count_nonzero(np.isnan(h2D))
    numberOfAll2Dbins = h2D.shape[0]*h2D.shape[1]
    areaFactor = 1.0 - numberOfWhiteBins/numberOfAll2Dbins
    pts_pts_area = areaFactor * pts_bb_area
    print(f'Areas: b-box = {pts_bb_area:8.4f}, points = {pts_pts_area:8.4f}')
    plt.show()
    return pts_pts_area
#:def area_of_points(points, grid_size = [1000, 1000])
import numpy as np
np.random.seed(12345)
x = np.random.normal(size=100000)
y = x + np.random.normal(size=100000)
pts = [[xi,yi] for xi,yi in zip(x,y)]
print(area_of_points(pts))
# ^-- prints: Areas: b-box = 114.5797, points = 7.8001
# ^-- prints: 7.800126875291629
The above code creates the following scatterplot:
Notice that both the printed output Areas: b-box = 114.5797, points = 7.8001 and the returned value 7.800126875291629 give the area in the same units as the x,y coordinates in the array of points.
Instead of using a function, you can apply the same know-how directly and play around with the parameters of the scatterplot, calculating the area of what can be seen in the scatterplot.
The code below changes the displayed scatterplot while using the same underlying point data:
import numpy as np
np.random.seed(12345)
x = np.random.normal(size=100000)
y = x + np.random.normal(size=100000)
pts = [[xi,yi] for xi,yi in zip(x,y)]
# pts_values_example only illustrates what pts contains (middle entries elided)
pts_values_example = \
[[0.53005, 2.79209],
[0.73751, 0.18978],
... ,
[-0.6633, -2.0404],
[1.51470, 0.86644]]
# ---
pts_x = [x for x,y in pts]
pts_y = [y for x,y in pts]
pts_bb_area = (max(pts_x)-min(pts_x))*(max(pts_y)-min(pts_y))
# ---
import matplotlib.pyplot as plt
bins = [320, 300] # resolution of the grid (for the scatter plot)
# ^-- resolution of precision for the calculation of area
pltRetVal = plt.hist2d( pts_x, pts_y, bins = bins, cmin=1, cmax=15 )
plt.colorbar() # display the colorbar (for a 2d density histogram)
plt.show()
# ---
h2D, xedges1D, yedges1D, h2DhistogramObject = pltRetVal
numberOfWhiteBins = np.count_nonzero(np.isnan(h2D))
numberOfAll2Dbins = (len(xedges1D)-1)*(len(yedges1D)-1)
areaFactor = 1.0 - numberOfWhiteBins/numberOfAll2Dbins
area = areaFactor * pts_bb_area
print(f'Areas: b-box = {pts_bb_area:8.4f}, points = {area:8.4f}')
# prints "Areas: b-box = 114.5797, points = 20.7174"
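creating the following scatterplot:
Notice that the calculated area is now larger, because the coarser grid (fewer, larger bins) results in more of the area being colored. Note also that cmax=15 sets bins containing more than 15 points to numpy.nan as well, so those bins are left uncolored in the plot and are counted as "white" here; omit cmax to count every non-empty bin.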
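If only the number is needed, the same counting can be done without drawing anything by using numpy.histogram2d: its empty bins correspond to the bins that hist2d(..., cmin=1) returns as numpy.nan (when no cmax is given). A sketch; the function name is hypothetical, and like area_of_points() it uses the points' bounding box as the histogram range.

```python
import numpy as np

def area_of_points_hist(points, grid_size=(1000, 1000)):
    """Bounding-box area times the fraction of non-empty histogram bins."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    bb_area = (x.max() - x.min()) * (y.max() - y.min())
    counts, _, _ = np.histogram2d(x, y, bins=grid_size)   # range = bounding box
    return float(bb_area * np.count_nonzero(counts) / counts.size)
```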

Comparing value in dataframe and calculating another attribute using it

I have a pandas DataFrame with a lot of points in the XY plane. The dataframe consists of the points' x and y coordinates. I want to check every point's distance to all other points using the Pythagorean theorem and count the number of points within a certain distance of that point.
import math
import random
import pandas as pd
def distance(x1, y1, x2, y2):
    return math.sqrt((x1 - x2)**2 + (y1 - y2)**2)
df = pd.DataFrame({'X':[random.randint(1,100) for i in range(100)], 'Y':[random.randint(1,100) for i in range(100)]})
I realise that I can loop over the dataframe, but that is not best practice and it takes too long. Is there a way I can optimize this process?
Ultimately I'd want another column in the dataframe that stores the number of points in the dataframe that are within a certain distance of each point.
EDIT:
Another thing I am trying to do is look for arbitrary points (or zones) in the XY plane with the most number of points within a given radius. What I basically mean is I want to also look at positions in the plane that are not necessarily points in the dataframe but are still within the limits of the plane.
If you want your code to run fast with pandas and numpy, try to get used to writing functions that look like they only work with numbers but to which you can actually pass numpy arrays/pandas series. E.g. if you want to find all points in your df at distance r or less from the point (cx, cy), you could do it like so:
def close_to_my_point(x,y):
    return (x-cx)**2+(y-cy)**2 <= r**2
close_to_my_point(df["X"],df["Y"])
This gives you a series of booleans indicating whether the point at each position in the dataframe is close to (cx, cy). Note that when summing True/False values, True behaves like 1 and False like 0, so sum(close_to_my_point(df["X"],df["Y"])) does what you want for one point.
For functions that can't be applied to series by default, np.vectorize can wrap them. np.vectorize itself is a convenience wrapper around a Python-level loop, so the speed-up comes from the comparison against whole columns inside each call, not from np.vectorize. Putting all that together you get something that can calculate the number of points within a given distance fairly quickly:
import numpy as np
def disk_equation(cx,cy,r):
    return lambda x,y: (x-cx)**2+(y-cy)**2<= r**2
points_in_distance = lambda x,y: sum(disk_equation(x,y,20)(df["X"],df["Y"]))
# note: each point also counts itself (distance 0)
df["points_closer_than_20"] = np.vectorize(points_in_distance)(df["X"],df["Y"])
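For the EDIT (positions that are not themselves points in the dataframe), the same disk test can be evaluated at arbitrary candidate centres. Below is a brute-force sketch that tries every node of a regular grid spanning the data, using NumPy broadcasting instead of np.vectorize. The function name and the grid spacing step are assumptions; a finer grid gives a more precise location at the cost of a (centres x points) boolean matrix in memory.

```python
import numpy as np
import pandas as pd

def densest_centre(df, r, step=1.0):
    """Return (cx, cy, count) for the grid node with the most points within r."""
    px = df["X"].to_numpy(dtype=float)
    py = df["Y"].to_numpy(dtype=float)
    gx = np.arange(px.min(), px.max() + step, step)
    gy = np.arange(py.min(), py.max() + step, step)
    GX, GY = np.meshgrid(gx, gy)
    cx, cy = GX.ravel(), GY.ravel()            # candidate centres
    # disk equation for every (centre, point) pair: shape (centres, points)
    inside = (px[None, :] - cx[:, None])**2 + (py[None, :] - cy[:, None])**2 <= r**2
    counts = inside.sum(axis=1)                # points inside each candidate disk
    best = counts.argmax()
    return cx[best], cy[best], int(counts[best])

demo = pd.DataFrame({"X": [10, 10, 11, 50], "Y": [10, 11, 10, 50]})
best_x, best_y, n_inside = densest_centre(demo, r=2)
```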
SciPy includes a whole set of tools for pairwise distance calculations in scipy.spatial.
The simplest one to use would be a distance_matrix that calculates pairwise distances and returns those as a matrix. First you need to convert your dataframe into a properly formatted numpy array:
import random
from scipy.spatial import distance_matrix
import pandas as pd
import numpy as np
df = pd.DataFrame({'X':[random.randint(1,100) for i in range(100)], 'Y':[random.randint(1,100) for i in range(100)]})
foo = np.array([(x,y) for x, y in zip(df.X, df.Y)])
baz = distance_matrix(foo, foo)
Here we're using foo twice since we want all pairwise distances to all points in the array.
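To get the column asked for in the question from such a distance matrix, compare it against the radius and count per row. A sketch with a small hand-made DataFrame (the function name, column name and radius are illustrative); 1 is subtracted so that a point does not count itself. The matrix has n x n entries, so for many points a k-d tree such as scipy.spatial.cKDTree avoids building it in full.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix

def count_within(df, radius):
    """For each row, count the other rows whose (X, Y) lies within `radius`."""
    pts = df[["X", "Y"]].to_numpy(dtype=float)
    dist = distance_matrix(pts, pts)           # (n, n) pairwise distances
    return (dist <= radius).sum(axis=1) - 1    # -1 removes the point itself

df = pd.DataFrame({"X": [0, 3, 10], "Y": [0, 4, 10]})
df["neighbours_within_6"] = count_within(df, 6)
```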

Using python to plot a heat map from five arrays: x,y and 3 arrays indicating RGB

I have 2 arrays, x and y, respectively representing each point's coordinate on a 2D plane. I also have another 3 arrays of the same length as x and y. These three arrays represent the RGB values of a color. Therefore, each point in x,y corresponds to a color indicated by the RGB arrays. In Python, how can I plot a heat map with x,y as its axes and colors from the three RGB arrays? Each array is, say, 1000 in length.
As an example that takes the first 10 points, I have:
x = [10.946028, 16.229064, -36.855, -38.719057, 11.231684, 33.256904999999996, -41.21, 12.294958, 16.113228, -43.429027000000005]
y = [-21.003803, 4.5, 4.5, -22.135853, 4.084630000000001, 17.860079000000002, -18.083685, -3.98297, -19.565272, 0.877016]
R = [0,1,2,3,4,5,6,7,8,9]
G = [2,4,6,8,10,12,14,16,18,20]
B = [0,255,0,255,0,255,0,255,0,255]
I'd like to draw a heat map that, for example, the first point would have the coordinates (10.946028,-21.003803) and has a color of R=0,G=2,B=0. The second point would have the coordinates (16.229064, 4.5) and has a color of R=1,G=4,B=255.
OK, it seems you want your own colormap for your heatmap. You can write your own, or just use one of matplotlib's templates. Check out this post for the use of heatmaps with matplotlib. If you want to do it on your own, the easiest way is to recombine the 5 one-dimensional vectors into a 3D RGB image. Afterwards you have to define a mapping function that combines the R, G and B values into a new single value for every pixel, like:
f(R,G,B) = a*R +b*G + c*B
a, b, c can be whatever you like, and the formula can be far more complex, but you have to decide how the three channel values should relate to each other. From that you get a 2D matrix filled with the values of your function f(R,G,B). Now you have to define which value of this new matrix gets which color. This can be a linear mapping by hand (like just writing a list: 0 = deep blue, 1 = light red, ...). Using this look-up table you get your own specific heatmap. As you can see, that path takes some time, so I would recommend not doing it and just using one of matplotlib's many templates. Example:
import matplotlib.pyplot as plt
import numpy as np
a = np.random.random((16, 16))
plt.imshow(a, cmap='hot', interpolation='nearest')
plt.show()
You can use other colormaps by changing the string in cmap='hot' to something from that list. Hope I could help you, gl hf.
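If the aim is simply to draw each (x, y) point in the colour given by its own R, G, B values, a colormap may not be needed at all: plt.scatter accepts an (N, 3) array of RGB values in [0, 1] as its c argument. The sketch below assumes the channels are on a 0-255 scale (the sample B values reach 255) and also shows the f(R,G,B) = a*R + b*G + c*B route with a built-in colormap; the helper names and the default weights a = b = c = 1 are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def rgb_to_colors(R, G, B, max_value=255.0):
    """Stack three channel arrays into an (N, 3) array scaled to [0, 1]."""
    return np.column_stack([R, G, B]).astype(float) / max_value

def weighted_value(R, G, B, a=1.0, b=1.0, c=1.0):
    """f(R, G, B) = a*R + b*G + c*B, one scalar per point."""
    return a * np.asarray(R, float) + b * np.asarray(G, float) + c * np.asarray(B, float)

def plot_points(x, y, R, G, B):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    # left: every point drawn in its own RGB colour
    ax1.scatter(x, y, c=rgb_to_colors(R, G, B))
    # right: one scalar per point mapped through a built-in colormap
    sc = ax2.scatter(x, y, c=weighted_value(R, G, B), cmap="hot")
    fig.colorbar(sc, ax=ax2)
    return fig
```

With the ten sample lists from the question, calling plot_points(x, y, R, G, B) and then plt.show() should display both panels.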

How to avoid "TypeError: Input z must be at least a 2x2 array" when trying to generate a contour plot

I am trying to generate a contour plot of the gravitational potential of a 2 body system. The input is the masses of the 2 bodies and their separation. I keep receiving the error
TypeError: Input z must be at least a 2x2 array.
which I assume is referring to the PHI term in ax.contour(X,Y,PHI).
I have tried changing the x, y, and phi so that they match (I don't understand why they don't already match because phi is generated from the x and y values). Also, I have never used the contour plot before and I have limited programming experience so please forgive my ignorance.
import matplotlib.pyplot as plt
import numpy as np
n=100
#x and y evenly spaced values
x=np.arange(-n,n,0.01)
y=np.arange(-n,n,0.01)
X,Y=np.meshgrid(x,y)
r=np.array([X,Y])
#r is the scalar distance from the center that each test particle resides at
#r=np.array([(X**2+Y**2)**0.5])<--this was used to generate the output image
def Lagrange(m1,m2,a):
    mtot=m1+m2
    #the distance from the bodies to the center of mass of the system
    x1=-(m2/mtot)*a
    x2=(m1/mtot)*a
    omsq=mtot/(a**3) #omega squared term
    def phi(r):#gravitational potential function
        phi= -m1/abs(r-x1)-m2/abs(r-x2)-0.5*omsq*r**2
        return phi
    #I also had a vector plot included in this code (among other details),
    #but decided to omit them as it's not relevant to the error I am receiving.
    fig=plt.figure()
    ax=fig.add_subplot(111)
    PHI=np.meshgrid(phi(r))
    ax.contour(X,Y,PHI) #3rd dimension is the contour lines
    plt.show()
Lagrange(3.0,1.0,1.0)
I expect by taking the input x and y coordinates of the system (their lengths match) and using the generated output values from the phi function (which should also match the length of x as well as y) to generate a contour plot, where the contour lines will represent the gravitational potential given by phi.
Here is the pair of images I mentioned in the comments. python code (left) vs working matlab code (right)
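The error most likely comes from PHI=np.meshgrid(phi(r)): called with a single argument, np.meshgrid returns a one-element sequence holding a flattened 1-D array, so contour does not receive a 2D grid matching X and Y. In addition, r=np.array([X,Y]) stacks the X and Y grids rather than computing distances, so abs(r-x1) is not the distance to a body. Below is a minimal sketch of one way to build a 2D PHI with the same shape as X and Y, assuming G = 1 and both bodies on the x-axis at x1 and x2 as in the code; it replaces abs(r-x1) with the 2D distance of each grid point to each body. It also uses a 400 x 400 grid, because np.arange(-100, 100, 0.01) gives 20000 points per axis and meshgrid arrays of 20000 x 20000 (about 3.2 GB each in float64). The function names, grid extent and the clipping floor are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

def effective_potential(m1, m2, a, half_width=2.0, num=400):
    """Return X, Y, PHI on a num x num grid; PHI has the same shape as X and Y."""
    mtot = m1 + m2
    x1 = -(m2 / mtot) * a          # position of body 1 on the x-axis
    x2 = (m1 / mtot) * a           # position of body 2 on the x-axis
    omsq = mtot / a**3             # omega squared
    x = np.linspace(-half_width, half_width, num)
    y = np.linspace(-half_width, half_width, num)
    X, Y = np.meshgrid(x, y)                 # both of shape (num, num)
    d1 = np.sqrt((X - x1)**2 + Y**2)         # distance of each grid point to body 1
    d2 = np.sqrt((X - x2)**2 + Y**2)         # distance of each grid point to body 2
    PHI = -m1 / d1 - m2 / d2 - 0.5 * omsq * (X**2 + Y**2)
    return X, Y, PHI

def plot_potential(m1, m2, a, levels=50, floor=-10.0):
    X, Y, PHI = effective_potential(m1, m2, a)
    fig, ax = plt.subplots()
    # clip the deep wells near the bodies (arbitrary floor) so levels spread out
    ax.contour(X, Y, np.maximum(PHI, floor), levels=levels)
    ax.set_aspect("equal")
    return fig, ax
```

For the question's example this would be plot_potential(3.0, 1.0, 1.0) followed by plt.show().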

Plotting a 3D Cylindrical Surface plot in Python

I am trying to create a cylindrical 3D surface plot using Python, where my independent variables are z and theta, and the dependent variable is radius (i.e., radius is a function of vertical position and azimuth angle).
So far, I have only been able to find ways to create a 3D surface plot that:
has z as a function of r and theta
has r as a function of z, but does not change with theta (so, the end product looks like a revolved contour; for example, the case of r = sin(z) + 1 ).
I would like to have r as a function of z and theta, because my function will produce a shape that, at any given height, will be a complex function of theta.
On top of that, I need the surface plot to be able to have (but it does not have to have, depending on the properties of the function) an open top or bottom. For example, if r is constant from z = 0 to z = 1 (a perfect cylinder), I would want a surface plot that consists only of the side of the cylinder, not the top or bottom. The plot should look like a hollow shell.
I already have the function r defined.
Thanks for any help!
Apparently, after some trial and error, the best/easiest thing to do in this case is just to convert the r, theta, and z data points (defined as 2D arrays, just like for an x,y,z plot) into Cartesian coordinates:
# convert to rectangular
x = r*numpy.cos(theta)
y = r*numpy.sin(theta)
z = z
The new x,y,z arrays can be plotted just like any other x,y,z arrays generated from a polynomial where z is a function of x,y. I had originally thought that the data points would get screwed up because of overlapping z values or maybe the adjacent data points would not be connected correctly, but apparently that is not the case.
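A self-contained sketch of this approach, with a hypothetical r_func standing in for the asker's own r(z, theta). Because plot_surface only draws the surface defined by the grid, no top or bottom cap is added and the shell stays open; theta runs up to and including 2*pi so the seam closes. The function names and grid sizes are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older matplotlib)

def r_func(z, theta):
    """Hypothetical stand-in for the asker's r(z, theta)."""
    return 1.0 + 0.3 * np.sin(3 * theta) * np.cos(2 * np.pi * z)

def cylindrical_surface(r_of, z_min=0.0, z_max=1.0, nz=50, ntheta=100):
    """Sample r = r_of(z, theta) on a (theta, z) grid and convert to x, y, z."""
    z = np.linspace(z_min, z_max, nz)
    theta = np.linspace(0.0, 2 * np.pi, ntheta)  # ends at 2*pi so the seam closes
    Z, THETA = np.meshgrid(z, theta)             # 2D arrays, shape (ntheta, nz)
    R = r_of(Z, THETA)
    X = R * np.cos(THETA)                        # convert to rectangular
    Y = R * np.sin(THETA)
    return X, Y, Z

def plot_shell(X, Y, Z):
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.plot_surface(X, Y, Z)   # side surface only: no top or bottom cap
    return fig, ax
```

Usage: plot_shell(*cylindrical_surface(r_func)) followed by plt.show().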
