I've been trying to use the HOnnotate dataset to extract perspective-correct hand and object masks, as shown in the images of Task 3 of the Hands 2019 challenge.
The dataset comes with the following annotations, provided as pickled files under the meta folder of each sequence. The pickle files in the training data contain a dictionary with the following keys (a minimal loading sketch follows the list):
objTrans: A 3x1 vector representing object translation
objRot: A 3x1 vector representing object rotation in axis-angle representation
handPose: A 48x1 vector representing the 3D rotation of the 16 hand joints, including the root joint, in axis-angle representation. The ordering of the joints follows the MANO model convention (see joint_order.png), and the vector can be fed directly to the MANO model.
handTrans: A 3x1 vector representing the hand translation
handBeta: A 10x1 vector representing the MANO hand shape parameters
handJoints3D: A 21x3 matrix representing the 21 3D hand joint locations
objCorners3D: An 8x3 matrix representing the 3D bounding box corners of the object
objCorners3DRest: An 8x3 matrix representing the 3D bounding box corners of the object before applying the transformation
objName: Name of the object as given in YCB dataset
objLabel: Object label as given in YCB dataset
camMat: Intrinsic camera parameters
handVertContact: A 778D boolean vector in which each element indicates whether the corresponding MANO vertex is in contact with the object. A MANO vertex is in contact if its distance to the object surface is <4mm
handVertDist: A 778D float vector representing the distance of MANO vertices to the object surface.
handVertIntersec: A 778D boolean vector specifying if the MANO vertices are inside the object surface.
handVertObjSurfProj: A 778x3 matrix representing the projection of MANO vertices on the object surface.
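For reference, a minimal sketch of loading one of these files; the path is just my guess at the usual <sequence>/meta/<frame>.pkl layout, so adjust it to your copy of the dataset:
import pickle
import numpy as np

# hypothetical path following the <split>/<sequence>/meta/<frame>.pkl layout
with open('train/ABF10/meta/0000.pkl', 'rb') as f:
    anno = pickle.load(f)

print(sorted(anno.keys()))
print(np.asarray(anno['handJoints3D']).shape)   # expected (21, 3)
print(np.asarray(anno['camMat']))               # 3x3 intrinsic matrix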
It also comes with a visualization script (https://github.com/shreyashampali/ho3d) that can render the annotations as 3D meshes (using Open3D) or as 2D projections of the object corners and hand points on the image (using Matplotlib).
What I am trying to do is project the visualization created by Open3D back onto the original image.
So far I have not managed to do this. What I have been able to do is take the point cloud from the 3D mesh and apply the camera intrinsics to it so that it is perspective correct. The question now is how to create a mask out of the point cloud, for both the hand and the object, like the one from the Open3D rendering.
# The code looks as follows
import numpy as np
import cv2
import open3d

# "mesh" is an Open3D triangle mesh, i.e. "open3d.geometry.TriangleMesh()"
pcd = open3d.geometry.PointCloud()
pcd.points = mesh.vertices
pcd.colors = mesh.vertex_colors
pcd.normals = mesh.vertex_normals
pts3D = np.asarray(pcd.points)

# hand/object lie along the negative z-axis, so flip y/z before projecting with OpenCV
cord_change_mat = np.array([[1., 0., 0.], [0., -1., 0.], [0., 0., -1.]], dtype=np.float32)
pts3D = pts3D.dot(cord_change_mat.T)

# "anno['camMat']" is the 3x3 camera intrinsic matrix
img_points, _ = cv2.projectPoints(pts3D, np.zeros(3), np.zeros(3),
                                  anno['camMat'], np.zeros(4, dtype='float32'))

# draw the perspective-correct point cloud back onto the image
for point in img_points:
    p1, p2 = int(point[0][0]), int(point[0][1])
    img[p2, p1] = (255, 255, 255)
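The best I have come up with so far is something rough like the following: splat the projected points into an empty image and close the gaps with a morphological closing (the kernel size is a guess), which still leaves a rather ragged mask:
mask = np.zeros(img.shape[:2], dtype=np.uint8)
pts2D = img_points.reshape(-1, 2).astype(int)
# keep only points that land inside the image
valid = (pts2D[:, 0] >= 0) & (pts2D[:, 0] < mask.shape[1]) & \
        (pts2D[:, 1] >= 0) & (pts2D[:, 1] < mask.shape[0])
pts2D = pts2D[valid]
mask[pts2D[:, 1], pts2D[:, 0]] = 255
# close the small holes between the splatted points
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))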
Basically, I'm trying to get this segmentation mask out:
PS. Sorry if this doesn't make much sense; I'm very much new to 3D meshes, point clouds and their projections, and I don't know all the correct technical words for them yet. Leave a comment with a question and I will try to explain as far as I can.
Turns out there is an easy way to do this using Open3D and the camera intrinsic values. Basically, we instruct Open3D to render the image from the POV of the camera.
import numpy as np
import cv2
import open3d
import open3d.visualization.rendering as rendering
# Create a renderer with a set image width and height
render = rendering.OffscreenRenderer(img_width, img_height)
# set up the camera intrinsic values
# (not strictly needed below, since set_projection() is given the 3x3 matrix camMat directly)
pinhole = open3d.camera.PinholeCameraIntrinsic(img_width, img_height, fx, fy, cx, cy)
# Pick a background colour of the rendered image, I set it as black (default is light gray)
render.scene.set_background([0.0, 0.0, 0.0, 1.0]) # RGBA
# now create your mesh
mesh = open3d.geometry.TriangleMesh()
mesh.paint_uniform_color([1.0, 0.0, 0.0]) # set Red color for mesh
# define further mesh properties, shape, vertices etc (omitted here)
# Define a simple unlit Material.
# (The base color does not replace the mesh's own colors.)
mtl = rendering.Material()  # called MaterialRecord in newer Open3D releases
mtl.base_color = [1.0, 1.0, 1.0, 1.0] # RGBA
mtl.shader = "defaultUnlit"
# add mesh to the scene
render.scene.add_geometry("MyMeshModel", mesh, mtl)
# render the scene from the camera's point of view;
# camMat is the 3x3 intrinsic matrix, 0.1 and 1.0 are the near/far clipping planes
render.scene.camera.set_projection(camMat, 0.1, 1.0, 640, 480)
img_o3d = render.render_to_image()
# we can now save the rendered image right at this point
open3d.io.write_image("output.png", img_o3d, 9)
# Optionally, we can convert the image to OpenCV format and play around.
# For my use case I mapped it onto the original image to check quality of
# segmentations and to create masks.
# (Note: OpenCV expects the color in BGR format, so swap red and blue.)
img_cv2 = cv2.cvtColor(np.array(img_o3d), cv2.COLOR_RGBA2BGR)
cv2.imwrite("cv_output.png", img_cv2)
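Since the background was set to black and the mesh to a solid colour, a binary mask can then be recovered by thresholding the render (assuming only the hand/object meshes were added to the scene):
gray = cv2.cvtColor(img_cv2, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY)  # any non-black pixel belongs to the mesh
cv2.imwrite("mask.png", mask)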
This answer borrows a lot from this answer
Related
Using the Trimesh library (https://github.com/mikedh/trimesh), I'm trying to get the pixel colors of a mesh triangle from the texture loaded with the 3D mesh (.obj + .png + .mtl). In other words, given vertex #0, I want to map its incident faces to the loaded texture and recover the pixels inside the mapped faces. For this I load the textured 3D mesh as:
from PIL import Image
import numpy as np
import trimesh

im = Image.open("cow_texture.png")
im_arr = np.array(im)
mesh = trimesh.load('cow.obj', process=False, maintain_order=False)
tex = trimesh.visual.TextureVisuals(image=im)
mesh.visual.texture = tex
The problem is that when I try to get the incident faces of a vertex (let's take vertex #0), the function returns the wrong number of incident faces (4 instead of 6; please see the image below, vertex #0 in red):
mesh.vertex_faces[0]  # prints array([3009, 3008, 2960, -1, -1, -1, -1, -1])
Note, however, that the call to trimesh.load was made with maintain_order=False. If the mesh is loaded with maintain_order=True, the number of incident faces is correct, but the uv coordinates of adjacent neighboring vertices are wrong (they end up far apart when they should be close).
How can I retrieve the right number of incident faces (or the right number of adjacent neighboring vertices) when loading a textured 3D mesh?
I tried to load a textured 3D mesh and find the incident faces of vertex #0.
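For clarity, this is how I am counting the incident faces; as far as I understand, trimesh pads vertex_faces with -1 entries, so I drop those before counting:
import numpy as np
import trimesh

mesh = trimesh.load('cow.obj', process=False, maintain_order=False)
vf = mesh.vertex_faces[0]
incident = vf[vf != -1]      # drop the -1 padding entries
print(len(incident), incident)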
I have five pictures of wound models taken from different angles; links to the pictures are provided here.
I have used SFM to compute a mesh; a picture of the mesh is presented below. I would like to extract only the features associated with the wound region and compute its volume and depth accordingly.
To work on this, I have used U-Net segmentation to generate a 2D mask of the wound from the 2D pictures; an example of a 2D mask generated using U-Net is shown below.
I would like to know how I can map this mask onto the 3D mesh and extract the specific region within the 3D mesh that corresponds to the wound, while removing the other regions.
Any other ideas on how to segment the 3D mesh and extract a specific region of interest are greatly appreciated; since I don't have different wound models, I cannot apply supervised learning using a 3D U-Net.
Convert the image and the mesh to numpy arrays and follow these steps:
Duplicate the 2D mesh and stack it to make a 3-channel array, for example like this:
from skimage.transform import resize
import numpy as np

mesh = np.array(mesh)                     # the 2D mesh/mask as a numpy array
mesh = resize(mesh, (HEIGHT, WIDTH))      # resize to the image dimensions
mesh3D = np.stack([mesh] * 3, axis=-1)    # duplicate to 3 channels -> (HEIGHT, WIDTH, 3)
Convert the pixel value of the mesh to binary (0,1). Set the part of the mesh where the wound is present to 1 and the rest to 0.
Multiply the mesh with the image.
Where the mesh value is 1, that part of the image will remain as it is; where the mesh value is 0, that part of the image will be set to 0.
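A short sketch of the binarise-and-multiply steps, assuming the image is already a numpy array of the same HEIGHT x WIDTH (the 0.5 threshold is an arbitrary choice):
binary = (mesh3D > 0.5).astype(image.dtype)   # wound region -> 1, everything else -> 0
wound_only = image * binary                   # keep wound pixels, zero out the rest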
The goal is to get the Bird's Eye View from KITTI images (dataset), and I have the Projection Matrix (3x4).
There are many ways to generate transformation matrices. For the Bird's Eye View I have read about several mathematical expressions, like:
H12 = H2 * H1^-1 = A * R * A^-1 = P * A^-1 in OpenCV - Projection, homography matrix and bird's eye view
and x = Pi * Tr * X in kitti dataset camera projection matrix
but none of these options worked for my purpose.
PYTHON CODE
import numpy as np
import cv2
image = cv2.imread('Data/RGB/000007.png')
maxHeight, maxWidth = image.shape[:2]
# M has 3x4 dimensions
M = np.array(([721.5377, 0.0, 609.5593, 44.85728], [0.0, 721.5377, 72.854, 0.2163791], [0.0, 0.0, 1.0, .002745884]))
# warpPerspective needs an M matrix with 3x3 dimensions here
warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
# show the original and warped images
cv2.imshow("Original", image)
cv2.imshow("Warped", warped)
cv2.waitKey(0)
I need to know how to handle the Projection Matrix to get the Bird's Eye View.
So far, everything I've tried produces warped images that contain nothing even close to what I need.
This is an example of an image from the KITTI dataset.
This is another example of an image from the KITTI dataset.
On the left, images are shown detecting cars in 3D (above) and 2D (below). On the right is the Bird's Eye View that I want to obtain. Therefore, I need to obtain the transformation matrix to transform the coordinates of the boxes that delimit the cars.
Here is my code to manually build a bird's eye view transform:
cv::Mat1d CameraModel::getInversePerspectiveMapping(double pixelPerMeter, cv::Point const & origin) const {
    double f = pixelPerMeter * cameraPosition()[2];
    cv::Mat1d R(3,3);
    R << 0, 1, 0,
         1, 0, 0,
         0, 0, 1;
    cv::Mat1d K(3,3);
    K << f, 0, origin.x,
         0, f, origin.y,
         0, 0, 1;
    cv::Mat1d transformtoGround = K * R * mCameraToCarMatrix(cv::Range(0,3), cv::Range(0,3));
    return transformtoGround * mIntrinsicMatrix.inv();
}
The member variables/functions used inside the functions are
mCameraToCarMatrix: a 4x4 matrix holding the homogeneous rigid transformation from the camera's coordinate system to the car's coordinate system. The camera's axes are x-right, y-down, z-forward. The car's axes are x-forward, y-left, z-up. Within this function only the rotation part of mCameraToCarMatrix is used.
mIntrinsicMatrix: the 3x3 matrix holding the camera's intrinsic parameters
cameraPosition()[2]: the Z-coordinate (height) of the camera in the car's coordinate frame. It's the same as mCameraToCarMatrix(2,3).
The function parameters:
pixelPerMeter: the resolution of the bird's eye view image. A distance of 1 meter on the XY plane will translate to pixelPerMeter pixels in the bird's eye view image.
origin: the camera's position in the bird's eye view image
You can pass the transform matrix to cv::initUndistortRectifyMap() as newCameraMatrix and then use cv::remap to create the bird's eye view image.
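If a quick test without handling lens distortion is enough, here is a rough Python sketch of the same homography applied with cv2.warpPerspective instead; the calibration values (camera-to-car rotation, camera height, pixels per meter, BEV origin) are placeholders to replace with your own:
import numpy as np
import cv2

def bev_homography(K, cam_to_car_R, cam_height, pixel_per_meter, origin):
    f = pixel_per_meter * cam_height          # focal length of the virtual top-down camera
    R_axes = np.array([[0., 1., 0.],          # swap axes: car x (forward) -> rows, y (left) -> columns
                       [1., 0., 0.],
                       [0., 0., 1.]])
    K_bev = np.array([[f, 0., origin[0]],     # intrinsics of the virtual bird's eye camera
                      [0., f, origin[1]],
                      [0., 0., 1.]])
    return K_bev @ R_axes @ cam_to_car_R @ np.linalg.inv(K)

image = cv2.imread('Data/RGB/000007.png')
K = np.array([[721.5377, 0., 609.5593],       # left 3x3 block of the projection matrix above
              [0., 721.5377, 72.854],
              [0., 0., 1.]])
cam_to_car_R = np.eye(3)                      # placeholder: your camera-to-car rotation
H = bev_homography(K, cam_to_car_R, cam_height=1.65, pixel_per_meter=20.0, origin=(400, 780))
bev = cv2.warpPerspective(image, H, (800, 800))
cv2.imwrite('bev.png', bev)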
I am using VTK in Python to import .stl files. What I then want to do is scale down the mesh, making it smaller, without changing the orientation matrix.
I tried vtkTransform with a scale tuple, but the problem is that the scaled polydata gets rotated.
Here is the code:
import vtk

def scaleSTL(filenameSTL, opacity=0.75, scale=(1, 1, 1), mesh_color="gold"):
    colors = vtk.vtkNamedColors()

    reader = vtk.vtkSTLReader()
    reader.SetFileName(filenameSTL)
    reader.Update()

    transform = vtk.vtkTransform()
    transform.Scale(scale)

    transformFilter = vtk.vtkTransformPolyDataFilter()
    transformFilter.SetInputConnection(reader.GetOutputPort())
    transformFilter.SetTransform(transform)
    transformFilter.Update()

    mapper = vtk.vtkPolyDataMapper()
    mapper.SetInputConnection(transformFilter.GetOutputPort())

    actor = vtk.vtkActor()
    actor.SetMapper(mapper)
    actor.GetProperty().SetColor(colors.GetColor3d(mesh_color))
    actor.GetProperty().SetOpacity(opacity)
    return actor

def render_scene(my_actor_list):
    renderer = vtk.vtkRenderer()
    for arg in my_actor_list:
        renderer.AddActor(arg)
    namedColors = vtk.vtkNamedColors()
    renderer.SetBackground(namedColors.GetColor3d("SlateGray"))

    window = vtk.vtkRenderWindow()
    window.SetWindowName("Oriented Cylinder")
    window.AddRenderer(renderer)

    interactor = vtk.vtkRenderWindowInteractor()
    interactor.SetRenderWindow(window)

    # Visualize
    window.Render()
    interactor.Start()

if __name__ == "__main__":
    filename = "400_tri.stl"
    scale01 = (1, 1, 1)
    scale02 = (0.5, 0.5, 0.5)
    my_list = []
    my_list.append(scaleSTL(filename, 0.75, scale01, "Gold"))
    my_list.append(scaleSTL(filename, 0.75, scale02, "DarkGreen"))
    render_scene(my_list)
I used my mesh file kidney.stl (the yellow one), but what I am getting is a scaled and rotated mesh. I set the opacity to 0.75 to see both meshes. In the picture below you can see that the green one has moved completely, but I want to scale it so that the green one ends up completely inside the original yellow mesh.
Simple answer (no explanation) can be found here: Scaling 3D models, finding the origin
That is because the scaling transformation is defined simply as multiplying the coordinates by a given factor (see e.g. https://www.tutorialspoint.com/computer_graphics/3d_transformation.htm). This intrinsically means that it is done with respect to a certain reference point. Your transform.Scale() call will use the origin (0,0,0) as this reference point, and since your object is apparently not centered around the origin, you get a translation (not a rotation as you claim, btw).
To get a locally centered scaling, you need to choose a reference point R on your object around which you want to scale (in your case, since you want the scaled object to be inside the original, you want some kind of center - since the object is "almost convex", centroid - average of all points - could be good enough). Translate the object by -R to align it with the coordinate system, scale and then translate back by +R.
Try a little exercise to visualize this: simple 2D example - draw yourself a square made of points with coordinates (2,2), (2,3), (3,3), (3,2) and "scale it by 2" - you get (4,4), (4,6),(6,6), (6,4) - draw it as well. Now try the alternative - first translate by the square's center (2.5,2.5), you get (-0.5,-0.5), (-0.5,0.5), (0.5,0.5), (0.5,-0.5) (draw it), scale by two, you get (-1,-1), (-1, 1), (1,1), (1,-1) (draw) and finally translate back by 2.5: (1.5, 1.5), (1.5,3.5), (3.5,3.5), (3.5, 1.5) and draw - see the difference?
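Here is a minimal VTK sketch of this translate-scale-translate idea, using the polydata's bounding-box centre from GetCenter() as a stand-in for the reference point R (the file name is taken from your code):
import vtk

reader = vtk.vtkSTLReader()
reader.SetFileName("400_tri.stl")
reader.Update()

cx, cy, cz = reader.GetOutput().GetCenter()   # reference point R (bounding-box centre)

transform = vtk.vtkTransform()
transform.PostMultiply()              # apply the calls below in the order they are written
transform.Translate(-cx, -cy, -cz)    # move R to the origin
transform.Scale(0.5, 0.5, 0.5)        # scale about the origin
transform.Translate(cx, cy, cz)       # move back to the original position

transformFilter = vtk.vtkTransformPolyDataFilter()
transformFilter.SetInputConnection(reader.GetOutputPort())
transformFilter.SetTransform(transform)
transformFilter.Update()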
I'm using skimage.measure.marching_cubes to extract a surface, defined as faces and vertices. marching_cubes also outputs values for each face.
How do I "smooth" these values (the actual smoothing could be a low-pass filter, a median filter, etc.)? I thought one way to achieve this would be to project, or represent, this surface in 2D and then apply standard filters, but I can't think of how to do this from a list of faces and vertices.
The reason for this "smoothing" is because the values are not informative at the scale of a single face of the surface, but over larger areas of the surface represented by many faces.
Thanks in advance!
I eventually found a way to do this, based on MATLAB code from this paper:
Welf et al. "Quantitative Multiscale Cell Imaging in Controlled 3D Microenvironments" in Developmental Cell, 2016, Vol 36, Issue 4, p462-475
from scipy import spatial
import numpy as np

def median_filter_surface(faces, verts, measure, radius, p_norm=2):
    # INPUT:
    #   faces: triangular surface faces, each defined by 3 vertex indices
    #   verts: the vertices, defined by x, y, z coordinates
    #   measure: the value related to each face that needs to be filtered
    #   radius: the radius for median filtering (larger = more filtering)
    #   p_norm: distance metric for the radius, default 2 (Euclidean)
    # OUTPUT:
    #   measure_med_filt: the "measure" after filtering

    # get face centre positions in 3D space (mean of each face's vertex coordinates)
    num_faces = len(faces)
    face_centres = np.zeros((num_faces, 3))
    for face in range(num_faces):
        face_centres[face, :] = np.mean(verts[faces[face, :], :], 0)

    # for each face centre, find all other face centres within the radius
    tree = spatial.KDTree(face_centres)
    faces_in_radius = tree.query_ball_point(face_centres, radius, p_norm)

    # median-filter the measure over each face's neighbourhood
    measure_med_filt = np.zeros(num_faces)
    for face in range(num_faces):
        measure_med_filt[face] = np.median(measure[faces_in_radius[face]])

    return measure_med_filt
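For example, it can be plugged into skimage's output like this. Note that marching_cubes returns per-vertex values, so here each face's measure is taken as the mean of its three vertex values; the synthetic sphere volume is just a stand-in for real data:
import numpy as np
from skimage import measure

# synthetic volume: distance from the centre, so the 0.7 level set is a sphere
x, y, z = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]
volume = np.sqrt(x**2 + y**2 + z**2)

verts, faces, normals, values = measure.marching_cubes(volume, level=0.7)
face_measure = values[faces].mean(axis=1)     # one value per face
smoothed = median_filter_surface(faces, verts, face_measure, radius=2.0)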