Find speed of vehicle from images

Find speed of vehicle from images - python

I am working on a project to find the speed of a vehicle from images taken from inside the vehicle. We mark an object in the first image as a reference and, using the properties of the same object in the next image, must calculate the speed of the moving vehicle. I am using Python with OpenCV. So far I have managed to find the marked pixel in the second image using the optical flow method. Can anyone help me with the rest?

Knowing the acquisition frequency, you must now find the distance between the successive positions of the marker.
To find this distance, I suggest you estimate the pose of the marker for each image. Loosely speaking, the "pose" is the transformation matrix expressing the coordinates of an object relative to a camera. Once you have those successive coordinates, you can compute the distance, and then the speed.
Pose estimation is the process of computing the position and orientation of a known 3D object relative to a 2D camera. The resulting pose is the transformation matrix that expresses the object's reference frame in the camera's reference frame.
OpenCV implements a pose estimation algorithm: Posit. The doc says:
Given some 3D points (in object
coordinate system) of the object, at
least four non-coplanar points, their
corresponding 2D projections in the
image, and the focal length of the
camera, the algorithm is able to
estimate the object's pose.
This means:
You must know the focal length of your camera
You must know the geometry of your marker
You must be able to match four known points of your marker in the 2D image
You may have to compute the focal length of the camera using the calibration routines provided by OpenCV. I think you have the two other required data.
Edit:
// Algorithm example
MarkerCoords = {Four coordinates of known 3D points}
I1 = take 1st image
F1 = focal(I1)
MarkerPixels1 = {Matching pixels in I1}
Pose1 = posit(MarkerCoords, MarkerPixels1, F1)
I2 = take 2nd image
F2 = focal(I2)
MarkerPixels2 = {Matching pixels in I2 by optical flow}
Pose2 = posit(MarkerCoords, MarkerPixels2, F2)
o1 = origin_of_camera * Pose1 // Origin of camera is
o2 = origin_of_camera * Pose2 // typically [0,0,0]
dist = euclidean_distance(o1, o2)
speed = dist * frequency // same as dist / (t2 - t1), since t2 - t1 = 1/frequency
Edit 2: (Answers to comments)
"What is the acquisition frequency?"
Computing the speed of your vehicle is equivalent to computing the speed of the marker. (In the first case, the referential is the marker attached to the earth, in the second case, the referential is the camera attached to the vehicle.) This is expressed by the following equation:
speed = D/(t2-t1)
With:
D the distance [o1 o2]
o1 the position of the marker at time t1
o2 the position of the marker at time t2
You can retrieve the elapsed time either by extracting t1 and t2 from the metadata of your photos, or from the acquisition frequency of your imaging device: t2-t1 = T = 1/F.
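As a minimal sketch of this last step, assuming the two marker positions o1 and o2 have already been recovered (for example as the translation part of each pose) and are expressed in metres, the speed follows directly from the elapsed time. The function name `speed_from_positions` and the sample numbers are illustrative and not part of the original answer.

```python
import math

def speed_from_positions(o1, o2, t1=None, t2=None, frequency=None):
    """Return speed = D / (t2 - t1) for two 3D marker positions.

    Pass either both timestamps (in seconds) or the acquisition
    frequency in Hz, in which case t2 - t1 = 1 / frequency.
    """
    if frequency is not None:
        elapsed = 1.0 / frequency
    elif t1 is not None and t2 is not None:
        elapsed = t2 - t1
    else:
        raise ValueError("provide t1 and t2, or frequency")
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    distance = math.dist(o1, o2)  # Euclidean distance [o1 o2]
    return distance / elapsed

# Illustrative values: the marker moved 0.5 m between two frames at 30 Hz.
speed = speed_from_positions((0.0, 0.0, 10.0), (0.0, 0.0, 9.5), frequency=30.0)
```

Timestamps taken from the image metadata are preferable when the camera does not deliver frames at a strictly constant rate.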
"Won't it be better to mark simple things like posters? And if doing so can't we consider it as a 2d object?"
This is not possible with the Posit algorithm (or with any other pose estimation algorithm as far as I know): it requires four non-coplanar points. This means you cannot choose a 2D object embedded in a 3D space; you have to choose an object with some depth.
On the other hand, you can use a really simple shape, as long as it is a volume. (A cube for example.)

Related

Taking the coordinates of an object and creating a formula to drag an arrow

I am using OpenCV to triangulate the position of an object, and am trying to create some kind of formula to pass the coordinates that I obtain through to drag a pull arrow, casting a fishing rod. I tried using polynomial regression to a very high degree, but it is still inaccurate due to the regression not being able to take into account an (x,y) input to an (x,y) output, rather just an x input to x output etc. I have attached screenshots below for clarity, alongside my obtained formulas from the regression. Any help/ideas/suggestions would be appreciated, thanks.
Edit:
The xy coordinates are organized from the landing position to the position where the arrow was pulled to for the bobber to land there. This is because the fishing blob is the input, and the arrow pull end location comes from the blob location. I am using OpenCV to obtain the x,y coordinates, which I believe is just an x,y coordinate system of the 2d screen.
The avatar position is locked, and the button to cast the rod is located at an absolute position of (957,748).
The camera position is locked with no rotation or movement.
I believe that the angle the rod is cast at is likely a 1:1 opposite of where it is pulled to. Ex: if the rod was pulled to 225 degrees it would cast at 45 degrees. I am not 100% sure, but I think that the strength is linear. I used linear regression partially because I was not sure about this. There is no altitude difference/slope/wind that affects the cast. The only affecting factor of landing position is where the arrow is dragged to. The arrow will not drag past the 180/360 degree position sideways (relative to cast button) and will simply lock the cast angle in the x direction if it is held there.
The x-y data was collected with a simple program to move the mouse to the same position (957,748) and drag the arrow to cast the rod with different drag strengths/positions to create some kind of line of best fit for a general formula for casting the rod. The triang_x and y functions included are what the x and y coordinates were run through respectively to triangulate the ending drag coordinate for the arrow. This does not work very well because matching the x-to-x and y-to-y doesn't account for x and y data in each formula, just x-to-x etc.
Left column is fishing spot coordinates, right column is where arrow is dragged to to hit the fish spot.
(1133,359) to (890,890)
(858,334) to (886, 900)
(755,579) to (1012,811)
(1013,255) to (933,934)
(1166,469) to (885,855)
(1344,654) to (855,794)
(804,260) to (1024,939)
(1288,287) to (822,918)
(624,422) to (1075,869)
(981,460) to (949,851)
(944,203) to (963,957)
(829,367) to (1005,887)
(1129,259) to (885,932)
(773,219) to (1036,949)
(1052,314) to (919,908)
(958,662) to (955,782)
(1448,361) to (775,906)
(1566,492) to (751,837)
(1275,703) to (859,764)
(1210,280) to (852,926)
(668,513) to (1050,836)
(830,243) to (1011,939)
(688,654) to (1022,792)
(635,437) to (1072,864)
(911,252) to (976,935)
(1499,542) to (785,825)
(793,452) to (1017,860)
(1309,354) to (824,891)
(1383,522) to (817,838)
(1262,712) to (867,758)
(927,225) to (980,983)
(644,360) to (1097,919)
(1307,648) to (862,798)
(1321,296) to (812,913)
(798,212) to (1026,952)
(1315,460) to (836,854)
(700,597) to (1028,809)
(868,573) to (981,811)
(1561,497) to (758,838)
(1172,588) to (896,816)
Shows bot actions taken within function and how formula is used.
coeffs_x = np.float64([
-7.9517089428836911e+005,
4.1678460255861210e+003,
-7.5075555590709371e+000,
4.2001528427460097e-003,
2.3767929866943760e-006,
-4.7841176483548307e-009,
6.1781765539212100e-012,
-5.2769581174002655e-015,
-4.3548777375857698e-019,
2.5342561455214514e-021,
-1.4853535063513160e-024,
1.5268121610772846e-027,
-2.9667978919426497e-031,
-9.5670287721717018e-035,
-2.0270490020866057e-037,
-2.8248895597371365e-040,
-4.6436110892973750e-044,
6.7719507722602512e-047,
7.1944028726480678e-050,
1.2976299392064562e-052,
7.3188205383162127e-056,
-6.3972284918241943e-059,
-4.1991571617797430e-062,
2.5577340340980386e-066,
-4.3382682133956009e-068,
1.5534384486024757e-071,
5.1736875087411699e-075,
7.8137258396620031e-078,
2.6423817496804479e-081,
2.5418438527686641e-084,
-2.8489136942892384e-087,
-2.3969101111450846e-091,
-3.3499890707855620e-094,
-1.4462592756075361e-096,
6.8375394909274851e-100,
-2.4083095685910846e-103,
7.0453288171977301e-106,
-2.8342463921987051e-109
])
triang_x = np.polynomial.Polynomial(coeffs_x)
coeffs_y = np.float64([
2.6215449742035207e+005,
-5.7778572049616614e+003,
5.1995066291482431e+001,
-2.3696608508824663e-001,
5.2377319234985116e-004,
-2.5063316505492962e-007,
-9.2022083686040928e-010,
3.8639053124052189e-013,
2.7895763914453325e-015,
7.3703786336356152e-019,
-1.3411964395287408e-020,
1.5532055573746500e-023,
-6.9719956967963252e-027,
1.9573598517734802e-029,
-3.3847482160483597e-032,
-5.5368209294319872e-035,
7.1463648457003723e-038,
4.6713369979545088e-040,
-7.5070219026265008e-043,
-4.5089676791698693e-047,
-3.2970870269153785e-049,
1.6283636917056585e-051,
-1.4312555782661719e-054,
7.8463441723355399e-058,
1.9439588820918080e-060,
2.1292310369635749e-063,
-1.4191866473449773e-065,
-2.1353539347524828e-070,
2.5876946863828411e-071,
-1.6182477348921458e-074
])
triang_y = np.polynomial.Polynomial(coeffs_y)

First you need to clarify a few things:
the xy data
Is it the position of the object you want to hit, or the position you actually hit with some specific input data (which would be missing in that case)? In what coordinate system?
what is the position of your avatar?
how is the view defined?
Is it fully 3D with 6DOF, or fixed (no rotation or movement) relative to the avatar?
what is the physics/logic of your rod casting?
Is it an angle (one or two) and a strength? Is the strength linear with distance? Does the throw account for the altitude difference between avatar and target? Does ground elevation (slope) play a role? Are there any other factors such as wind, type of rod, etc.?
You shared the xy data, but what do you want to correlate it against or build a formula for? It does not make sense on its own; you apparently forgot to state the conditions under which each position was recorded.
I would solve it like this (no further details until you clarify the points above):
1) transform the target's xy into a player-relative coordinate system aligned to the ground
2) compute the azimuth angle (geometrically)
a simple atan2(y,x) will do, but you need to take your coordinate system conventions into account.
3) compute the elevation angle and strength (geometrically)
simple ballistic physics should apply, but it depends on the physics used by the game (or whatever you are writing this for).
4) adjust for additional factors
for example, wind can slightly change your angle and strength.
If you have real physics and data, you can do #3 and #4 at the same time. See similar:
C++ intersection time of 2 bullets
[Edit1] putting your data into your image
OK, your coordinates obviously do not match your screenshot because the image was scaled. After some guesswork I rescaled them and drew them into the image in C++ so they match again; here is the result:
I converted your Cartesian points:
int ava_x=957,ava_y=748; // avatar
int data[]= // target(x1,y1) , drag(x0,y0)
{
1133,359,890,890,
858,334,886, 900,
755,579,1012,811,
1013,255,933,934,
1166,469,885,855,
1344,654,855,794,
804,260,1024,939,
1288,287,822,918,
624,422,1075,869,
981,460,949,851,
944,203,963,957,
829,367,1005,887,
1129,259,885,932,
773,219,1036,949,
1052,314,919,908,
958,662,955,782,
1448,361,775,906,
1566,492,751,837,
1275,703,859,764,
1210,280,852,926,
668,513,1050,836,
830,243,1011,939,
688,654,1022,792,
635,437,1072,864,
911,252,976,935,
1499,542,785,825,
793,452,1017,860,
1309,354,824,891,
1383,522,817,838,
1262,712,867,758,
927,225,980,983,
644,360,1097,919,
1307,648,862,798,
1321,296,812,913,
798,212,1026,952,
1315,460,836,854,
700,597,1028,809,
868,573,981,811,
1561,497,758,838,
1172,588,896,816,
};
into polar coordinates relative to ava_x,ava_y using atan2 and the 2D distance formula, then printed the angular difference +180deg and the ratio between the line lengths (the yellow text on the left of the screenshot): first the ordinal number, then the angle difference [deg], then the ratio between line lengths.
As you can see, the angle difference is within +/-10.6deg and the length ratio lies in <2.5,3.6>, probably because of inaccuracy in the OpenCV detections and some randomness in the game's own casting logic.
Polar coordinates are the best fit for this. For starters you could simply do this:
// wanted target in polar (obtained by CV)
x = target_x-ava_x;
y = target_y-ava_y;
a = atan2(y,x);
l = sqrt((x*x)+(y*y));
// aiming drag in polar
a += 3.1415926535897932384626433832795; // +=180 deg
l /= 3.0; // "avg" ratio between line sizes
// aiming drag in cartesian
aim_x = ava_x + l*cos(a);
aim_y = ava_y + l*sin(a);
You can optimize it to:
aim_x = ava_x - ((target_x-ava_x)/3);
aim_y = ava_y - ((target_y-ava_y)/3);
Now, to improve precision, you could measure how the line ratio depends on the line length (it might not be linear); the angular difference might also be bigger for longer lines...
Also note that the second cast (ordinal 2) is probably a bug (x,y wrongly detected by CV): if you render the two lines you will see they do not match, so you should throw it out of the dataset.
Also note that I code in C++, so my trigonometric functions use radians (Python's math.atan2, math.cos and math.sin use radians too, so no conversion is needed there). The equations might need some additional tweaking for your coordinate system (negate y?).
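Since the question is in Python, the polar recipe above could be sketched as follows. This is only a starting point under the same assumptions as the answer (drag direction opposite to the target, length scaled by a constant ratio of about 3); the helper names `drag_for_target` and `length_ratio` are made up for illustration, and the three sample pairs are taken from the question's data.

```python
import math

AVATAR = (957, 748)  # cast button position given in the question

def drag_for_target(target, avatar=AVATAR, ratio=3.0):
    """Return the drag end point expected to land the bobber on `target`."""
    dx = target[0] - avatar[0]
    dy = target[1] - avatar[1]
    angle = math.atan2(dy, dx) + math.pi   # drag points opposite to the target
    length = math.hypot(dx, dy) / ratio    # drag is shorter by `ratio`
    return (avatar[0] + length * math.cos(angle),
            avatar[1] + length * math.sin(angle))

def length_ratio(target, drag, avatar=AVATAR):
    """Ratio between avatar->target and avatar->drag distances for one cast."""
    t = math.hypot(target[0] - avatar[0], target[1] - avatar[1])
    d = math.hypot(drag[0] - avatar[0], drag[1] - avatar[1])
    return t / d

# A few (target, drag) pairs from the question, used to estimate the ratio.
samples = [((1133, 359), (890, 890)),
           ((755, 579), (1012, 811)),
           ((1166, 469), (885, 855))]
ratios = [length_ratio(t, d) for t, d in samples]
mean_ratio = sum(ratios) / len(ratios)

aim = drag_for_target((1133, 359), ratio=mean_ratio)
```

Estimating the ratio from the whole dataset (after discarding obvious detection errors) and checking whether it varies with distance would be the natural next refinement.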

Quantify roughness of a 2D surface based on given scatter points geometrically

How to design a simple code to automatically quantify a 2D rough surface based on given scatter points geometrically? For example, to use a number, r=0 for a smooth surface, r=1 for a very rough surface and the surface is in between smooth and rough when 0 < r < 1.
To more explicitly illustrate this question, the attached figure below is used to show several sketches of 2D rough surfaces. The dots are the scattered points with given coordinates. Accordingly, every two adjacent dots can be connected and a normal vector of each segment can be computed (marked with arrow). I would like to design a function like
def roughness(x, y):
    ...
    return r
where x and y are sequences of coordinates of each scatter point. For example, in case (a), x=[0,1,2,3,4,5,6], y=[0,1,0,1,0,1,0]; in case (b), x=[0,1,2,3,4,5], y=[0,0,0,0,0,0]. When we call the function roughness(x, y), we will get r=1 (very rough) for case (a) and r=0 (smooth) for case (b). Maybe r=0.5 (medium) for case (d). The question is refined to what appropriate components do we need to put inside the function roughness?
Some initial thoughts:
Roughness of a surface is a local concept, which we only consider within a specific range of area, i.e. only with several local points around the location of interest. To use mean of local normal vectors? This may fail: (a) and (b) are with the same mean, (0,1), but (a) is rough surface and (b) is smooth surface. To use variance of local normal vectors? This may also fail: (c) and (d) are with the same variance, but (c) is rougher than (d).

maybe something like this:
import numpy as np
def roughness(x, y):
    # angle of each segment between successive points
    t = np.arctan2(np.diff(y), np.diff(x))
    # sine of the difference between successive angles:
    # sin(t1 - t0) = sin(t1)*cos(t0) - cos(t1)*sin(t0)
    ts = np.sin(t)
    tc = np.cos(t)
    dt = ts[1:] * tc[:-1] - tc[1:] * ts[:-1]
    # mean of squares
    return np.sum(dt**2) / len(dt)
would give you something like you're asking?
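As a quick check, the function above can be run on cases (a) and (b) from the question. The block repeats the function so that it is self-contained; the variable names `r_a` and `r_b` are only for illustration.

```python
import numpy as np

def roughness(x, y):
    # angle of each segment between successive points
    t = np.arctan2(np.diff(y), np.diff(x))
    ts = np.sin(t)
    tc = np.cos(t)
    # sine of the turn angle between successive segments
    dt = ts[1:] * tc[:-1] - tc[1:] * ts[:-1]
    return np.sum(dt**2) / len(dt)

# case (a): zigzag, every turn is 90 degrees
r_a = roughness([0, 1, 2, 3, 4, 5, 6], [0, 1, 0, 1, 0, 1, 0])
# case (b): straight line, no turns at all
r_b = roughness([0, 1, 2, 3, 4, 5], [0, 0, 0, 0, 0, 0])
```

Here `r_a` comes out at 1 (up to rounding) and `r_b` at 0, matching what the question asks for. Because the measure is the squared sine of the turn angle, it peaks at 90-degree turns and falls again for sharper reversals, which may or may not fit the intended notion of roughness.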

Maybe you should consider a protocol definition:
1) geometric definition of the surface first
2) grant unto that geometric surface intrinsic properties.
2.a) step function can be based on quadratic curve between two peaks or two troughs with their concatenated point as the focus of the 'roughness quadratic' using the slope to define roughness in analogy to the science behind road speed-bumps.
2.b) elliptical objects can be defined by a combination of deformation analysis with centered circles on the incongruity within the body. This can be solved in many ways analogous to step functions.
2.c) flat lines: select points that deviate from the mean and do a Newtonian around with a window of 5-20 concatenated points or whatever is clever.
3) define a proper threshold that fits whatever intuition you are defining as "roughness" or apply conventions of any professional field to your liking.
This branched approach might be quicker to program, but I am certain this solution can be refactored into a Euclidean construct of 3-point ellipticals, if someone is up for a geometry problem.

The mathematical definitions of many surface parameters can be found here, which can be easily put into numpy:
https://www.keyence.com/ss/products/microscope/roughness/surface/parameters.jsp
Image (d) shows a challenge: basically you want to flatten the shape before doing the calculation. This requires prior knowledge of the type of geometry you want to fit. I found an app Gwyddion that can do this in 3D, but it can only interface with Python 2.7, not 3.
If you know which base shape lies underneath:
1) fit the known shape
2) calculate the arc distance between each two points
3) remap the numbers by subtracting 1) from the original data and assigning new coordinates according to 2)
4) perform normal 2D/3D roughness calculations
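A simplified sketch of this flatten-then-measure idea, assuming the base shape can be approximated by a low-degree polynomial: fit it, subtract it, then compute the arithmetic mean deviation (Ra) and the root mean square deviation (Rq) of the residual. The arc-length remapping of steps 2) and 3) is skipped here, so deviations are measured vertically rather than along the normal of the base shape; the function name `detrended_roughness` is an assumption for illustration.

```python
import numpy as np

def detrended_roughness(x, y, degree=1):
    """Return (Ra, Rq) of y after removing a fitted polynomial base shape."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)        # step 1: fit the base shape
    residual = y - np.polyval(coeffs, x)     # subtract it from the data
    ra = np.mean(np.abs(residual))           # arithmetic mean deviation
    rq = np.sqrt(np.mean(residual ** 2))     # root mean square deviation
    return ra, rq

x = np.arange(7)
ra_zigzag, rq_zigzag = detrended_roughness(x, [0, 1, 0, 1, 0, 1, 0])
ra_slope, rq_slope = detrended_roughness(x, 2.0 * x)  # tilted but smooth
```

Unlike the 0-to-1 score asked for in the question, Ra and Rq carry the units of y, so a normalisation step would still be needed to map them onto that range.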

How to use optical flow tracking in opencv to segment an image?

Using calcOpticalFlowPyrLK from OpenCV to track the motion of objects I picked on the first frame (green dots):
I draw a line between the old points fed to calcOpticalFlowPyrLK and the ones it outputs.
At the end I get this nice track.
Quoting @rotating_image's answer to a similar question:
You can measure the direction and the magnitude of the displacement
each pixel of interest undergoes in two successive frames to get an
idea of their movement pattern
Indeed, using previous and current spot of the tracked object, I can find the flow vector angle and magnitude.
But I still can't see how this helps me segment the image.
Should I compute the vectors of all the pixels, and treat those with roughly the same angle and magnitude as found previously as the object, and everything else as the background?
Or am I missing something?

Assuming you have flow images and want to automatically track blobs of flow vectors that move in the same direction.
What you get is sparse flow, which would look something like the image below.
You can use OpenCV's partition for this. partition is a distance-based clustering algorithm, which is more convenient than k-means here because you don't have to specify the number of clusters k. The problem is that it is sensitive to noise and false associations, so I prefer to apply it only to the set of flow vectors whose magnitude is larger than a threshold.
You can find a sample below:
int th_distance = 18; // radius tolerance
int th2 = th_distance * th_distance; // squared radius tolerance
vector<int> labels;
int n_labels = partition(pts, labels, [th2](const Point& lhs, const Point& rhs) {
    return ((lhs.x - rhs.x)*(lhs.x - rhs.x) + (lhs.y - rhs.y)*(lhs.y - rhs.y)) < th2;
});
where each color denotes one segment. You can adjust the parameters for your video.
Then, based on the initial clustering, use a convex hull to extract the proper shape of each car.
Here is a sample: https://docs.opencv.org/2.4/doc/tutorials/imgproc/shapedescriptors/hull/hull.html
Finally, aggregate the motion vectors of each cluster into a final vector K and draw it at the center of the hull.
Then concatenate the final vectors K from each image to form a trajectory.
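For readers working in Python, where a direct equivalent of the C++ partition call may not be available, the clustering and aggregation steps above can be approximated in plain Python. This is a stand-in written with a small union-find, not OpenCV's implementation; `partition_points` and `mean_flow_per_cluster` are hypothetical helper names, and the sample points and flows are invented.

```python
def partition_points(points, radius):
    """Label points so that points closer than `radius` end up in one cluster."""
    parent = list(range(len(points)))

    def find(i):
        # follow parent links to the cluster root, compressing the path
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    r2 = radius * radius
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy < r2:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two clusters

    roots = {}
    labels = [roots.setdefault(find(i), len(roots)) for i in range(len(points))]
    return labels, len(roots)

def mean_flow_per_cluster(flows, labels, n_labels):
    """Aggregate the flow vectors of each cluster into one mean vector K."""
    sums = [[0.0, 0.0, 0] for _ in range(n_labels)]
    for (fx, fy), lab in zip(flows, labels):
        sums[lab][0] += fx
        sums[lab][1] += fy
        sums[lab][2] += 1
    return [(sx / n, sy / n) for sx, sy, n in sums]

pts = [(0, 0), (10, 0), (20, 0), (200, 200), (205, 198)]
flows = [(2.0, 0.0), (2.0, 0.0), (2.0, 0.0), (0.0, -3.0), (0.0, -1.0)]
labels, n_labels = partition_points(pts, radius=18)
k_vectors = mean_flow_per_cluster(flows, labels, n_labels)
```

The pairwise loop is quadratic in the number of points, which is usually acceptable for sparse flow but would need a spatial index for dense fields.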

OpenCV - Tilted camera and triangulation landmark for stereo vision

I am using a stereo system and am trying to get world coordinates of some points by triangulation.
My cameras are tilted: the Z axis direction (the direction of depth) is not normal to my surface. That is why, when I observe a flat surface, I do not get a constant depth but a "linear" variation, correct? And I want the depth from the baseline direction... How can I re-project?
A piece of my code with my projection matrices and the triangulation call:
# C1 and C2 are the camera matrices (left and right)
# R_0 and T_0 are the transformation between the cameras
# Coord1 and Coord2 are the corresponding coordinates in the left and right images, respectively
P1 = np.dot(C1, np.hstack((np.identity(3), np.zeros((3, 1)))))
P2 = np.dot(C2, np.hstack((R_0, T_0)))
for i in range(Coord1.shape[0]):
    z = cv2.triangulatePoints(P1, P2, Coord1[i,], Coord2[i,])
-------- EDIT LATER -----------
Thanks scribbleink, I tried to apply your proposal, but I think I have made a mistake because it does not work well, as you can see below. The point cloud seems to be warped and curved towards the edges of the image.
U, S, Vt = linalg.svd(F)
V = Vt.T
# Right epipole
U[:,2]/U[2,2]
# The expected X-direction, with C1 the camera matrix and C1[0,0] the focal length
vecteurX = np.array([(U[:,2]/U[2,2])[0],(U[:,2]/U[2,2])[1],C1[0,0]])
vecteurX_unit = vecteurX/np.sqrt(vecteurX[0]**2 + vecteurX[1]**2 + vecteurX[2]**2)
# The expected Y axis :
height = 2048
vecteurY = np.array([0, height -1, 0])
vecteurY_unit = vecteurY/np.sqrt(vecteurY[0]**2 + vecteurY[1]**2 + vecteurY[2]**2)
# The expected Z direction :
vecteurZ = np.cross(vecteurX,vecteurY)
vecteurZ_unit = vecteurZ/np.sqrt(vecteurZ[0]**2 + vecteurZ[1]**2 + vecteurZ[2]**2)
#Normal of the Z optical (the current Z direction)
Zopitcal = np.array([0,0,1])
cos_theta = np.arccos(np.dot(vecteurZ_unit, Zopitcal)/np.sqrt(vecteurZ_unit[0]**2 + vecteurZ_unit[1]**2 + vecteurZ_unit[2]**2)*np.sqrt(Zopitcal[0]**2 + Zopitcal[1]**2 + Zopitcal[2]**2))
sin_theta = (np.cross(vecteurZ_unit, Zopitcal))[1]
#Definition of the Rodrigues vector and use of cv2.Rodrigues to get rotation matrix
v1 = Zopitcal
v2 = vecteurZ_unit
v_rodrigues = v1*cos_theta + (np.cross(v2,v1))*sin_theta + v2*(np.cross(v2,v1))*(1. - cos_theta)
R = cv2.Rodrigues(v_rodrigues)[0]

Your expected z direction is arbitrary as far as the reconstruction method is concerned. In general, you have a rotation matrix that rotates the left camera from your desired direction. You can easily build that matrix, R. Then all you need to do is to multiply your reconstructed points by the transpose of R.

To add to fireant's response, here is one candidate solution, assuming that the expected X-direction coincides with the line joining the centers of projection of the two cameras.
Compute the focal lengths f_1 and f_2 (via pinhole model calibration).
Solve for the location of camera 2's epipole in camera 1's frame. For this, you can use either the Fundamental matrix (F) or the Essential matrix (E) of the stereo camera pair. Specifically, the two epipoles span the right and left null spaces of F (F e = 0 and F^T e' = 0), so you can use Singular Value Decomposition. For a solid theoretical reference, see Hartley and Zisserman, Second edition, Table 9.1 "Summary of fundamental matrix properties" on Page 246 (freely available PDF of the chapter).
The center of projection of camera 1, i.e. (0, 0, 0) and the location of the right epipole, i.e. (e_x, e_y, f_1) together define a ray that aligns with the line joining the camera centers. This can be used as the expected X-direction. Call this vector v_x.
Assume that the expected Y axis faces downward in the image plane, i.e. from (0, 0, f_1) to (0, height-1, f_1), where f_1 is the focal length of camera 1. Call this vector v_y.
The expected Z direction is now the cross-product of vectors v_x and v_y.
Using the expected Z direction along with the optical axis (Z-axis) of camera 1, you can then compute a rotation matrix from two 3D vectors using, say the method listed in this other stackoverflow post.
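The linear algebra in these steps can be sketched with NumPy. This is an illustrative outline under stated assumptions, not the linked method: the epipole is taken as the right null vector of F (convention x2^T F x1 = 0), the rotation between two directions is built with Rodrigues' formula, and the helper names `epipole_from_F`, `rotation_between` and `to_expected_frame` are hypothetical. When combining the epipole with the focal length as in the steps above, its pixel coordinates would have to be taken relative to the principal point.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def epipole_from_F(F):
    """Epipole in image 1: the right null vector of F, scaled so e[2] == 1.

    Assumes the epipole is not at infinity (e[2] != 0).
    """
    _, _, vt = np.linalg.svd(F)
    e = vt[-1]
    return e / e[2]

def rotation_between(a, b):
    """Rotation matrix R with R @ a_unit == b_unit (Rodrigues' formula)."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))      # cosine of the angle
    s2 = float(np.dot(v, v))     # squared sine of the angle
    if s2 < 1e-16:
        if c > 0:
            return np.eye(3)     # already aligned
        raise ValueError("vectors are antiparallel: rotation axis is ambiguous")
    K = skew(v)
    return np.eye(3) + K + K @ K * ((1.0 - c) / s2)

def to_expected_frame(points, R):
    """Apply R.T to every row of an (N, 3) array of reconstructed points."""
    return np.asarray(points, dtype=float) @ R

z_optical = np.array([0.0, 0.0, 1.0])
z_expected = np.array([0.2, 0.0, 1.0])   # illustrative tilt, not measured data
R = rotation_between(z_optical, z_expected)
```

With this R, a point lying along the expected Z direction in camera coordinates ends up on the Z axis after `to_expected_frame`, which is the "multiply by the transpose of R" step from the other answer.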
Practical note:
Expecting the planar object to exactly align with the stereo baseline is unlikely without considerable effort, in my practical experience. Some amount of plane-fitting and additional rotation would be required.
One-time effort:
If you only need to do this once, e.g. for a one-time calibration, simply make this estimation process run in real time, then rotate your stereo camera pair until the depth map variance is minimized. Then lock your camera positions and pray nobody bumps into them later.
Repeatability:
If you need to keep aligning your estimated depth maps to truly arbitrary Z-axes that change for every new frame captured, then you should consider investing time in the plane-estimation method and making it more robust.

How do I calculate a 3D centroid?

Is there even such a thing as a 3D centroid? Let me be perfectly clear: I've been reading and reading about centroids for the last 2 days both on this site and across the web, so I'm perfectly aware of the existing posts on the topic, including Wikipedia.
That said, let me explain what I'm trying to do. Basically, I want to take a selection of edges and/or vertices, but NOT faces. Then, I want to place an object at the 3D centroid position.
I'll tell you what I don't want:
The vertices average, which would pull too far in any direction that has a more high-detailed mesh.
The bounding box center, because I already have something working for this scenario.
I'm open to suggestions about center of mass, but I don't see how this would work, because vertices or edges alone don't define any sort of mass, especially when I just have an edge loop selected.
For kicks, I'll show you some PyMEL that I worked up, using @Emile's code as reference, but I don't think it's working the way it should:
from pymel.core import ls, spaceLocator
from pymel.core.datatypes import Vector
from pymel.core.nodetypes import NurbsCurve
def get_centroid(node):
    if not isinstance(node, NurbsCurve):
        raise TypeError("Requires NurbsCurve.")
    centroid = Vector(0, 0, 0)
    signed_area = 0.0
    cvs = node.getCVs(space='world')
    v0 = cvs[len(cvs) - 1]
    for i, cv in enumerate(cvs[:-1]):
        v1 = cv
        a = v0.x * v1.y - v1.x * v0.y
        signed_area += a
        centroid += sum([v0, v1]) * a
        v0 = v1
    signed_area *= 0.5
    centroid /= 6 * signed_area
    return centroid
texas = ls(selection=True)[0]
centroid = get_centroid(texas)
print(centroid)
spaceLocator(position=centroid)

In theory centroid = SUM(pos*volume)/SUM(volume) when you split the part into finite volumes each with a location pos and volume value volume.
This is precisely the calculation done for finding the center of gravity of a composite part.

There is not just a 3D centroid, there is an n-dimensional centroid, and the formula for it is given in the "By integral formula" section of the Wikipedia article you cite.
Perhaps you are having trouble setting up this integral? You have not defined your shape.
[Edit] I'll beef up this answer in response to your comment. Since you have described your shape in terms of edges and vertices, then I'll assume it is a polyhedron. You can partition a polyhedron into pyramids, find the centroids of the pyramids, and then the centroid of your shape is the centroid of the centroids (this last calculation is done using ja72's formula).
I'll assume your shape is convex (no hollow parts---if this is not the case then break it into convex chunks). You can partition it into pyramids (triangulate it) by picking a point in the interior and drawing edges to the vertices. Then each face of your shape is the base of a pyramid. There are formulas for the centroid of a pyramid (you can look this up, it's 1/4 the way from the centroid of the face to your interior point). Then as was said, the centroid of your shape is the centroid of the centroids---ja72's finite calculation, not an integral---as given in the other answer.
This is the same algorithm as in Hugh Bothwell's answer, however I believe that 1/4 is correct instead of 1/3. Perhaps you can find some code for it lurking around somewhere using the search terms in this description.

I like the question. Centre of mass sounds right, but the question then becomes, what mass for each vertex?
Why not use the average length of the edges that include the vertex? This should compensate nicely for areas with a dense mesh.
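A minimal sketch of that weighting for a single closed edge loop, assuming the vertices are given in order around the loop: each vertex is weighted by the mean length of its two incident edges, so densely sampled stretches no longer pull the result towards them. The function name `edge_weighted_center` and the sample loop are illustrative.

```python
import numpy as np

def edge_weighted_center(loop):
    """Weighted vertex average for an ordered, closed edge loop of shape (N, 3)."""
    pts = np.asarray(loop, dtype=float)
    nxt = np.roll(pts, -1, axis=0)                      # vertex i+1 (wraps around)
    edge_len = np.linalg.norm(nxt - pts, axis=1)        # edge i joins vertex i and i+1
    # weight of vertex i = mean length of edge i-1 and edge i
    weights = 0.5 * (edge_len + np.roll(edge_len, 1))
    return (pts * weights[:, None]).sum(axis=0) / weights.sum()

# Unit square whose bottom edge is sampled much more densely than the rest.
loop = [(0, 0, 0), (0.25, 0, 0), (0.5, 0, 0), (0.75, 0, 0),
        (1, 0, 0), (1, 1, 0), (0, 1, 0)]
center = edge_weighted_center(loop)
plain_average = np.mean(np.asarray(loop, dtype=float), axis=0)
```

For a closed loop this is the same as averaging edge midpoints weighted by edge length, i.e. the centroid of the loop treated as a wire; the plain vertex average, by contrast, is dragged towards the dense bottom edge.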

You will have to recreate face information from the vertices (essentially a Delaunay triangulation).
If your vertices define a convex hull, you can pick any arbitrary point A inside the object. Treat your object as a collection of pyramidal prisms having apex A and each face as a base.
For each face, find the area Fa and the 2d centroid Fc; then the prism's mass is proportional to the volume (== 1/3 base * height (component of Fc-A perpendicular to the face)) and you can disregard the constant of proportionality so long as you do the same for all prisms; the center of mass is (2/3 A + 1/3 Fc), or a third of the way from the apex to the 2d centroid of the base.
You can then do a mass-weighted average of the center-of-mass points to find the 3d centroid of the object as a whole.
The same process should work for non-convex hulls - or even for A outside the hull - but the face-calculation may be a problem; you will need to be careful about the handedness of your faces.
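The pyramid decomposition described in these answers can be sketched with NumPy, assuming the faces are already available as consistently oriented triangles (which the question's edge and vertex selection does not provide directly). Each triangle and the apex form a tetrahedron whose centroid is the mean of its four vertices, i.e. one quarter of the way from the face centroid to the apex; signed volumes let the apex sit anywhere. The function name `polyhedron_centroid` is hypothetical.

```python
import numpy as np

def polyhedron_centroid(vertices, triangles, apex=None):
    """Volume centroid of a closed, consistently oriented triangle mesh."""
    v = np.asarray(vertices, dtype=float)
    a = v.mean(axis=0) if apex is None else np.asarray(apex, dtype=float)
    total_volume = 0.0
    weighted_sum = np.zeros(3)
    for i, j, k in triangles:
        p0, p1, p2 = v[i] - a, v[j] - a, v[k] - a
        volume = np.dot(p0, np.cross(p1, p2)) / 6.0   # signed tetrahedron volume
        centroid = a + (p0 + p1 + p2) / 4.0           # mean of the 4 tetra vertices
        total_volume += volume
        weighted_sum += volume * centroid
    return weighted_sum / total_volume

# Tetrahedron with outward-facing triangles; its centroid is (0.25, 0.25, 0.25).
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tris = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
c_inside = polyhedron_centroid(verts, tris)
c_outside = polyhedron_centroid(verts, tris, apex=(5.0, 5.0, 5.0))
```

Both calls give the same point, which illustrates the remark that the apex may even lie outside the hull as long as the face orientation is consistent.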
