US12475605B2 · Active · Utility

Systems and methods for mesh geometry prediction based on a centroid-normal representation

Assignee: ADEIA GUIDES INC
Priority: May 31, 2023 · Filed: May 31, 2023 · Granted: Nov 18, 2025
Est. expiry: May 31, 2043 (~16.9 yrs left) · nominal 20-yr term from priority
Inventors: LI ZHU, CHEN TAO
Classifications: G06T 9/001, G06T 17/205, G06T 9/002
PatentIndex Score: 52 · Cited by: 0 · References: 28 · Claims: 20

Abstract

Systems and methods are provided for predictive mesh coding based on a centroid-normal (C-N) representation. An encoder generates C-N representations of a high-resolution (hi-res) mesh and a downscaling of the mesh (lo-res mesh), each representation having respective centroids and normals. The encoder generates predicted centroids corresponding to the hi-res mesh based on the lo-res centroids using a centroid prediction model. The encoder generates predicted normals corresponding to the hi-res mesh based on the predicted centroids and lo-res normals using a normal vector prediction model. Residuals are computed for the respective predicted geometry data. The encoder transmits encodings of the lo-res mesh and the residuals for decoding at a client device.

Claims

exact text as granted — not AI-modified
What is claimed is: 
     
1. A method comprising:
    generating a low-resolution mesh from a high-resolution mesh, each of the low-resolution mesh and the high-resolution mesh representing a same media content;
    generating a first centroid-normal representation of the high-resolution mesh, the first centroid-normal representation comprising a first plurality of centroids and a first plurality of normal vectors;
    generating a second centroid-normal representation of the low-resolution mesh, the second centroid-normal representation comprising a second plurality of centroids and second plurality of normal vectors;
    using a centroid occupancy prediction model to generate, from the second plurality of centroids, a predicted representation corresponding to the first centroid-normal representation, wherein the centroid occupancy prediction model is trained according to a first learning algorithm;
    using a normal vector prediction model to generate, from the second plurality of normal vectors and the predicted representation, predicted normal vectors corresponding to the first centroid-normal representation, wherein the normal vector prediction model is trained according to a second learning algorithm;
    computing a centroid residual based on a difference between centroids of the predicted representation and the first plurality of centroids;
    computing a normal vector residual based on a difference between the predicted normal vectors and the first plurality of normal vectors; and
    transmitting, for decoding at a client device, encodings of the low-resolution mesh, the centroid residual, and the normal vector residual for reconstruction of the high-resolution mesh and display of the same media content.
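The two residual steps and the decoder-side reconstruction recited in claim 1 can be illustrated with a small round-trip sketch. It rests on assumptions that are not in the claims: centroids are modeled as a set of occupied integer voxel coordinates, the centroid "difference" is taken to be a set symmetric difference, predicted and actual normals are listed in the same order, and the two trained prediction models are replaced by hard-coded stand-in outputs. All function names are hypothetical.

```python
# Hypothetical sketch: residuals let a decoder recover hi-res geometry
# from predictions. Not the patented encoder; predictors are stand-ins.

def centroid_residual(predicted, actual):
    """Assumed form of the residual: voxels mispredicted in either direction."""
    return predicted ^ actual  # set symmetric difference


def apply_centroid_residual(predicted, residual):
    """Decoder side: flipping the mispredicted voxels restores the actual set."""
    return predicted ^ residual


def normal_residual(predicted, actual):
    """Component-wise difference (actual minus predicted), one tuple per normal."""
    return [tuple(a - p for a, p in zip(av, pv))
            for av, pv in zip(actual, predicted)]


def apply_normal_residual(predicted, residual):
    """Decoder side: add the residual back onto the predicted normals."""
    return [tuple(p + r for p, r in zip(pv, rv))
            for pv, rv in zip(predicted, residual)]


# Stand-in data: what the hi-res mesh contains vs. what a model predicted.
actual_centroids = {(0, 0, 0), (1, 0, 0), (1, 1, 0)}
predicted_centroids = {(0, 0, 0), (1, 0, 0), (0, 1, 0)}
c_res = centroid_residual(predicted_centroids, actual_centroids)
restored_centroids = apply_centroid_residual(predicted_centroids, c_res)

actual_normals = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
predicted_normals = [(0.0, 0.25, 0.75), (0.0, 1.0, 0.0)]
n_res = normal_residual(predicted_normals, actual_normals)
restored_normals = apply_normal_residual(predicted_normals, n_res)
```

The better the predictors, the smaller the residuals, which is what makes transmitting the low-resolution mesh plus residuals cheaper than transmitting the high-resolution mesh directly.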
     
     
2. The method of claim 1, wherein the generating the first centroid-normal representation comprises:
    accessing a first data structure comprising a plurality of mesh elements for the high-resolution mesh, each mesh element comprising a respective plurality of vertices;
    for each mesh element of the plurality of mesh elements:
        computing a respective centroid of the respective plurality of vertices of the mesh element; and
        computing a respective normal vector based on the respective plurality of vertices of the mesh element, wherein the respective normal vector is perpendicular to the mesh element at the respective centroid, and wherein the respective normal vector is one of the first plurality of normal vectors; and
    generating a second data structure associated with the first centroid-normal representation, wherein the second data structure comprises the first plurality of centroids and the first plurality of normal vectors.
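The per-element computation in claim 2 can be sketched as follows. The sketch assumes triangular mesh elements given as vertex-index triples, takes the centroid as the mean of the vertices, and derives the normal from the cross product of two edges; the claim does not prescribe these choices, and the function names and dictionary layout are hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of the element's vertices."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def unit_normal(points):
    """Unit normal from the cross product of two edges (assumes a planar,
    non-degenerate element; only the first three vertices are used)."""
    a, b, c = points[:3]
    u = tuple(b[i] - a[i] for i in range(3))
    v = tuple(c[i] - a[i] for i in range(3))
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(x * x for x in cross))
    if length == 0.0:
        raise ValueError("degenerate mesh element")
    return tuple(x / length for x in cross)


def centroid_normal_representation(vertices, faces):
    """Build the 'second data structure': one centroid and normal per element."""
    centroids, normals = [], []
    for face in faces:
        points = [vertices[i] for i in face]
        centroids.append(centroid(points))
        normals.append(unit_normal(points))
    return {"centroids": centroids, "normals": normals}


# One triangle lying in the z = 0 plane.
vertices = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
faces = [(0, 1, 2)]
cn = centroid_normal_representation(vertices, faces)
```

The vertex winding order decides which of the two perpendicular directions the normal points in, so a real implementation would need a consistent winding convention.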
     
     
3. The method of claim 2, wherein the computing the respective normal vector based on the respective plurality of vertices of each mesh element of the plurality of mesh elements comprises:
    determining a first angle and a second angle corresponding to the respective normal vector, wherein the first angle and the second angle collectively define a spatial direction that is perpendicular to the mesh element at the respective centroid, and wherein the second data structure comprises the first angle and the second angle for the respective normal vector based on the respective plurality of vertices of each mesh element of the plurality of mesh elements.
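A direction in 3D has two degrees of freedom, so two angles suffice to store a unit normal. The claim does not say which two angles are used; the sketch below assumes the common spherical convention (polar angle measured from the z-axis, azimuth measured in the x-y plane), and both function names are hypothetical.

```python
import math


def normal_to_angles(normal):
    """Convert a normal vector to (polar, azimuth) in radians.
    Assumed convention: polar from +z, azimuth from +x toward +y."""
    x, y, z = normal
    length = math.sqrt(x * x + y * y + z * z)
    # Clamp to guard against rounding pushing the ratio outside [-1, 1].
    polar = math.acos(max(-1.0, min(1.0, z / length)))
    azimuth = math.atan2(y, x)
    return polar, azimuth


def angles_to_normal(polar, azimuth):
    """Inverse mapping: rebuild the unit normal from the two angles."""
    s = math.sin(polar)
    return (s * math.cos(azimuth), s * math.sin(azimuth), math.cos(polar))


polar, azimuth = normal_to_angles((0.0, 1.0, 0.0))
recovered = angles_to_normal(polar, azimuth)
```

Storing two angles instead of three Cartesian components removes the redundant unit-length constraint from the data that has to be predicted and encoded.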
     
     
4. The method of claim 1, wherein the using the centroid occupancy prediction model to generate the predicted representation comprises:
    computing a probability of occupancy for centroids of a mesh object, wherein the mesh object is a 3D structure defining potential centroids for the first centroid-normal representation;
    comparing the probability of occupancy to a threshold value; and
    assigning, as part of the predicted representation, centroids of the mesh object associated with a probability of occupancy greater than the threshold value.
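The thresholding step in claim 4 reduces to a filter over candidate centroids. In this sketch the probabilities are hard-coded stand-ins for a model's output, the candidates are assumed to be integer voxel coordinates, and the threshold of 0.5 is an illustrative choice; the claim specifies none of these values.

```python
def select_occupied(probabilities, threshold=0.5):
    """Keep candidates whose occupancy probability is strictly greater
    than the threshold, matching the claim's 'greater than' wording."""
    return {coord for coord, p in probabilities.items() if p > threshold}


# Stand-in model output: candidate centroid -> probability of occupancy.
candidate_probs = {
    (0, 0, 0): 0.9,
    (0, 0, 1): 0.5,  # exactly at the threshold, so not selected
    (0, 1, 0): 0.2,
    (1, 0, 0): 0.7,
}
predicted_centroids = select_occupied(candidate_probs, threshold=0.5)
```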
     
     
5. The method of claim 4, further comprising determining centroid errors for the predicted representation based on a binary cross-entropy loss.
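Binary cross-entropy compares each predicted occupancy probability with a 0/1 ground-truth label. The sketch below uses the standard mean-reduced definition; the clamping constant `eps` and the averaging over candidates are assumptions, since the claim names only the loss type.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and
    0/1 occupancy labels. Probabilities are clamped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# A maximally uncertain predictor (p = 0.5 everywhere) scores ln 2.
uncertain_loss = binary_cross_entropy([0.5, 0.5], [1, 0])
# A confident, correct predictor scores close to zero.
confident_loss = binary_cross_entropy([1.0, 0.0], [1, 0])
```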
     
     
6. The method of claim 1, wherein the using the centroid occupancy prediction model to generate the predicted representation further comprises determining initial feature channels by using a 3D convolution model.
     
     
7. The method of claim 6, wherein the 3D convolution model comprises a Minkowski convolutional neural network model.
     
     
8. The method of claim 1, wherein the same media content comprises at least one of a 3D scanned model of a physical object or an animated digital object.
     
     
9. The method of claim 1, wherein the normal vector prediction model comprises a 3D convolution model.
     
     
10. The method of claim 9, wherein the using the normal vector prediction model further comprises:
    using the 3D convolution model to generate initial feature channels;
    using a first plurality of Multi-Resolution Convolution Blocks (MRCBs) and Stride-2 downscaling convolution layers to generate expanded feature channels based on the initial feature channels;
    inputting, in a second plurality of MRCBs and Stride-2 upscaling convolution layers, the expanded feature channels and a plurality of centroids having respective resolution scales for the Stride-2 upscaling convolution layers; and
    generating, from the second plurality of MRCBs and Stride-2 upscaling convolution layers, output feature channels based on the expanded feature channels and the plurality of centroids having the respective resolution scales.
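One plausible reading of "a plurality of centroids having respective resolution scales" is a coordinate pyramid in which each Stride-2 downscaling step halves the resolution of the occupied positions. The sketch below shows only that coordinate bookkeeping, under the assumption that centroids are integer voxel coordinates and that a stride-2 layer maps each coordinate to its floor-divided value. It contains no convolution weights or MRCBs, and it is not taken from the patent's description.

```python
def downscale_coords(coords, stride=2):
    """Occupied coordinates after one stride-2 step (assumed floor division).
    Neighbouring fine voxels collapse onto the same coarse voxel."""
    return {tuple(c // stride for c in coord) for coord in coords}


def coordinate_pyramid(coords, levels):
    """Coordinate sets at successive scales, finest first."""
    pyramid = [set(coords)]
    for _ in range(levels):
        pyramid.append(downscale_coords(pyramid[-1]))
    return pyramid


fine = {(0, 0, 0), (1, 0, 0), (2, 3, 0), (3, 3, 1)}
pyramid = coordinate_pyramid(fine, levels=2)
```

Under this reading, an upscaling path would walk the pyramid from coarsest to finest, using the stored coordinate set at each scale to decide where output features are produced.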
     
     
11. A system comprising:
    control circuitry configured to:
        generate a low-resolution mesh from a high-resolution mesh, each of the low-resolution mesh and the high-resolution mesh representing a same media content;
        generate a first centroid-normal representation of the high-resolution mesh comprising a first plurality of centroids and a first plurality of normal vectors;
        generate a second centroid-normal representation of the low-resolution mesh comprising a second plurality of centroids and second plurality of normal vectors;
        use a centroid occupancy prediction model to generate, from the second plurality of centroids, a predicted representation corresponding to the first centroid-normal representation, wherein the centroid occupancy prediction model is trained according to a first learning algorithm;
        use a normal vector prediction model to generate, from the second plurality of normal vectors and the predicted representation, predicted normal vectors corresponding to the first centroid-normal representation, wherein the normal vector prediction model is trained according to a second learning algorithm;
        compute a centroid residual based on a difference between centroids of the predicted representation and the first plurality of centroids;
        compute a normal vector residual based on a difference between the predicted normal vectors and the first plurality of normal vectors; and
    input/output (I/O) circuitry configured to:
        transmit, for decoding at a client device, encodings of the low-resolution mesh, the centroid residual, and the normal vector residual for reconstruction of the high-resolution mesh and display of the same media content.
   
     
     
12. The system of claim 11, wherein the control circuitry, when generating the first centroid-normal representation, is configured to:
    access a first data structure comprising a plurality of mesh elements for the high-resolution mesh, each mesh element comprising a respective plurality of vertices;
    for each mesh element of the plurality of mesh elements:
        compute a respective centroid of the respective plurality of vertices of the mesh element; and
        compute a respective normal vector based on the respective plurality of vertices of the mesh element, wherein the respective normal vector is perpendicular to the mesh element at the respective centroid, and wherein the respective normal vector is one of the first plurality of normal vectors; and
    generate a second data structure associated with the first centroid-normal representation, wherein the second data structure comprises the first plurality of centroids and the first plurality of normal vectors.
     
     
13. The system of claim 12, wherein the control circuitry, when computing the respective normal vector based on the respective plurality of vertices of each mesh element of the plurality of mesh elements, is configured to:
    determine a first angle and a second angle corresponding to the respective normal vector, wherein the first angle and the second angle collectively define a spatial direction that is perpendicular to the mesh element at the respective centroid, and wherein the second data structure comprises the first angle and the second angle for the respective normal vector based on the respective plurality of vertices of each mesh element of the plurality of mesh elements.
     
     
14. The system of claim 11, wherein the control circuitry, when using the centroid occupancy prediction model to generate the predicted representation, is configured to:
    compute a probability of occupancy for centroids of a mesh object, wherein the mesh object is a 3D structure defining potential centroids for the first centroid-normal representation;
    compare the probability of occupancy to a threshold value; and
    assign, as part of the predicted representation, centroids of the mesh object associated with a probability of occupancy greater than the threshold value.
     
     
15. The system of claim 14, wherein the control circuitry is further configured to:
    determine centroid errors for the predicted representation based on a binary cross-entropy loss.
     
     
16. The system of claim 11, wherein the control circuitry, when using the centroid occupancy prediction model to generate the predicted representation, is configured to determine initial feature channels by using a 3D convolution model.
     
     
17. The system of claim 16, wherein the 3D convolution model comprises a Minkowski convolutional neural network model.
     
     
18. The system of claim 11, wherein the same media content comprises at least one of a 3D scanned model of a physical object or an animated digital object.
     
     
19. The system of claim 11, wherein the normal vector prediction model comprises a 3D convolution model.
     
     
20. The system of claim 19, wherein the control circuitry, when using the normal vector prediction model, is configured to:
    use the 3D convolution model to generate initial feature channels;
    use a first plurality of Multi-Resolution Convolution Blocks (MRCBs) and Stride-2 downscaling convolution layers to generate expanded feature channels based on the initial feature channels;
    input, in a second plurality of MRCBs and Stride-2 upscaling convolution layers, the expanded feature channels and a plurality of centroids having respective resolution scales for the Stride-2 upscaling convolution layers; and
    generate, from the second plurality of MRCBs and Stride-2 upscaling convolution layers, output feature channels based on the expanded feature channels and the plurality of centroids having the respective resolution scales.

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.