US8437537B2 · Active · Utility · PatentIndex 76
Method and system for estimating 3D pose of specular objects

Assignee: CHANG JU YONG
Priority: Mar 27, 2009
Filed: Jul 27, 2009
Granted: May 7, 2013
Est. expiry: Mar 27, 2029 (~2.7 yrs left) · nominal 20-yr term from priority
Inventors: CHANG JU YONG; AGRAWAL AMIT KUMAR; VEERARAGHAVAN ASHOK N; RASKAR RAMESH N; THORTON JAY E
G06V 20/647 · G06T 7/75 · G06T 2207/30164
Abstract

A method estimates a 3D pose of a 3D specular object in an environment. In a preprocessing step, a set of pairs of 2D reference images are generated using a 3D model of the object, and a set of poses of the object, wherein each pair of reference images is associated with one of the poses. Then, a pair of 2D input images are acquired of the object. A rough 3D pose of the object is estimated by comparing features in the pair of 2D input images and the features in each pair of 2D reference images using a rough cost function. The rough estimate is optionally refined using a fine cost function.
Claims

We claim: 
     
       1. A method for estimating a 3D pose of a 3D object in an environment, comprising the steps of:
 rendering a set of pairs of 2D reference images using a 3D model of the object, wherein the object has a specular surface, and a set of poses of the object, wherein each pair of reference images is associated with one of the poses; 
 acquiring a pair of 2D input images of the object; and 
 estimating a 3D pose of the object by comparing features in the pair of 2D input images and features in each pair of 2D reference images using a cost function matching the features, wherein the features are specular flows, wherein the steps are performed in a processor. 
 
     
     
       2. The method of claim 1, wherein the 2D input images are obtained from a single image acquired by a camera with a non-linear intensity response.
     
     
       3. The method of claim 1, wherein the 3D pose is defined by a 3D translation vector (X, Y, Z), and 3D Euler angles (θ, φ, σ) for orientation.
     
     
       4. The method of  claim 1 , further comprising:
 refining the pose using a fine cost function. 
 
     
     
       5. The method of  claim 1 , wherein the features are obtained by image processing intensities due to specular reflection. 
     
     
       6. The method of  claim 5 , wherein reference specular intensities are rendered by using a mirror bidirectional reflectance distribution function (BRDF), or some other known BRDF. 
     
     
       7. The method of  claim 6 , wherein the reflectance is mirror-like. 
     
     
       8. The method of  claim 6 , wherein the other known BRDF is of the object. 
     
     
       9. The method of  claim 5 , further comprising:
 arranging a mirror-like sphere in the environment; 
 acquiring an environment map image via reflection of the environment in the mirror-like sphere; 
 constructing an environment map from the environment map image, using a 2D plenoptic function, which models appearance of the surrounding, and wherein the reference images are rendered from a 3D model of the object reflecting the environment map. 
 
     
     
       10. The method of  claim 9 , further comprising:
 acquiring a set of images of the environment and generating a mosaic from the set of images in order to construct the environment map. 
 
     
     
       11. The method of claim 5, wherein the pose is obtained by solving

$$(\hat{X}, \hat{Y}, \hat{\theta}, \hat{\phi}, \hat{\sigma}) = \arg\min_{\theta,\phi,\sigma} \left( \min_{X,Y} C_R\left(I_L, I_S, R^L_{\theta,\phi,\sigma}, R^S_{\theta,\phi,\sigma}, X, Y\right) \right),$$

       where $(\hat{X}, \hat{Y}, \hat{\theta}, \hat{\phi}, \hat{\sigma})$ denotes translation and Euler angles of the initial pose, $C_R(\cdot)$ is a rough cost function, $I_L$ and $R^L$ are a long exposure input image and reference image, $I_S$ is a short exposure input image and $R^S$ is a reference image, arg min is a function that returns the arguments that provide a minimum value, and an inner minimum is determined before an outer minimum.
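
The nested minimization in claim 11 can be sketched as an exhaustive search. This is a minimal Python illustration under stated assumptions, not the patented implementation: `rough_pose_search`, `toy_cost`, and the candidate grids are hypothetical names and values, and the toy cost stands in for a rough cost that would compare the input image pair against the reference pair rendered for each rotation.

```python
from itertools import product

def rough_pose_search(cost, rotations, translations):
    """Return (X, Y, theta, phi, sigma) with the inner minimum over (X, Y) taken first."""
    best = None
    for rot in rotations:
        # Inner minimum: best translation for this fixed rotation candidate.
        inner_cost, inner_xy = min((cost(rot, xy), xy) for xy in translations)
        # Outer minimum: keep the rotation whose inner minimum is smallest.
        if best is None or inner_cost < best[0]:
            best = (inner_cost, inner_xy, rot)
    _, (x, y), (theta, phi, sigma) = best
    return x, y, theta, phi, sigma

# Hypothetical stand-in cost with a known minimum at
# rotation (10, 20, 30) and translation (2, -1).
def toy_cost(rot, xy):
    rot_term = sum((a - b) ** 2 for a, b in zip(rot, (10, 20, 30)))
    return rot_term + (xy[0] - 2) ** 2 + (xy[1] + 1) ** 2

rotations = list(product((0, 10, 20), (10, 20), (30, 40)))
translations = list(product(range(-3, 4), repeat=2))
pose = rough_pose_search(toy_cost, rotations, translations)
```

Because the reference images depend only on the rotation, each rotation candidate is visited once and only the cheap translation search runs inside it.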
     
     
       12. The method of claim 11, wherein the rough cost function is

$$C_R\left(I_L, I_S, R^L_{\theta,\phi,\sigma}, R^S_{\theta,\phi,\sigma}, X, Y\right) = (1-\lambda)\, C_1\left(I_S, R^S_{\theta,\phi,\sigma}, X, Y\right) + \lambda\, C_2\left(I_L, R^L_{\theta,\phi,\sigma}, X, Y\right),$$

       where λ is a control parameter, and $C_1(\cdot)$ and $C_2(\cdot)$ are cost functions for the short exposure image and the long exposure image, respectively.
     
     
       13. The method of claim 12, wherein highlight pixels are used for $C_1(\cdot)$ and the highlight pixels are determined by thresholding to produce a corresponding binary image, and further comprising:
 constructing corresponding reference distance image $D_R$ and input distance image $D_I$ by application of a distance transform to the binary images.
 
     
     
       14. The method of claim 13, wherein the cost function $C_1(\cdot)$ is

$$C_1\left(I_S, R^S_{\theta,\phi,\sigma}, X, Y\right) = \frac{1}{N_{\mathrm{highlight}}} \sum_{(u,v)} \left( D_I(u,v) - D_R(u-x, v-y) \right)^2,$$

       where (x, y) are projection points, (u, v) are pixel coordinates, $N_{\mathrm{highlight}}$ denotes a number of pixels for the summation, and S denotes a short exposure.
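
The thresholding, distance transform, and highlight cost of claims 13 and 14 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: the helper names, the threshold `t`, and the brute-force Euclidean distance transform are hypothetical choices, each binary image is assumed to contain at least one highlight pixel, and the sum is assumed to run over pixels that land on a reference highlight after the shift (x, y).

```python
import math

def threshold(image, t):
    """Binary highlight image: 1 where intensity exceeds t."""
    return [[1 if p > t else 0 for p in row] for row in image]

def distance_transform(binary):
    """Brute-force Euclidean distance from each pixel to the nearest highlight pixel."""
    on = [(u, v) for v, row in enumerate(binary) for u, p in enumerate(row) if p]
    return [[min(math.hypot(u - a, v - b) for a, b in on)
             for u in range(len(row))] for v, row in enumerate(binary)]

def cost_c1(input_img, ref_img, x, y, t=0.5):
    """Mean squared difference of distance images at shifted reference highlights."""
    b_r = threshold(ref_img, t)
    d_i = distance_transform(threshold(input_img, t))
    d_r = distance_transform(b_r)
    total, n = 0.0, 0
    for v in range(len(input_img)):
        for u in range(len(input_img[0])):
            ru, rv = u - x, v - y  # reference pixel compared with input pixel (u, v)
            inside = 0 <= rv < len(ref_img) and 0 <= ru < len(ref_img[0])
            # Assumption: only reference highlight pixels contribute to the sum.
            if inside and b_r[rv][ru]:
                total += (d_i[v][u] - d_r[rv][ru]) ** 2
                n += 1
    return total / n if n else math.inf

# Toy 5x5 images: one highlight at (u, v) = (1, 1) in the reference
# and at (3, 2) in the input, so the matching shift is (2, 1).
ref = [[0.0] * 5 for _ in range(5)]
inp = [[0.0] * 5 for _ in range(5)]
ref[1][1] = 1.0
inp[2][3] = 1.0
aligned = cost_c1(inp, ref, 2, 1)
unshifted = cost_c1(inp, ref, 0, 0)
```

The distance images make the cost grow smoothly with misalignment, which is what lets a coarse translation search find the right neighborhood.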
     
     
       15. The method of claim 12, wherein the cost function $C_2(\cdot)$ is

$$C_2\left(I_L, R^L_{\theta,\phi,\sigma}, X, Y\right) = 1 - \mathrm{NCC}\left(I_L(u,v),\, R^L_{\theta,\phi,\sigma}(u-x, v-y)\right),$$

       where NCC denotes normalized cross correlation, and L denotes a long exposure.
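
The correlation-based cost of claim 15 can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the patented implementation: `ncc` and `cost_c2` are hypothetical names, the correlation is computed in its zero-mean form, and only pixels where the shifted reference overlaps the input are compared.

```python
import math

def ncc(a, b):
    """Zero-mean normalized cross correlation of two equal-length pixel lists."""
    n = len(a)
    if n == 0:
        return 0.0
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [p - mean_a for p in a]
    db = [q - mean_b for q in b]
    denom = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return sum(p * q for p, q in zip(da, db)) / denom if denom else 0.0

def cost_c2(input_img, ref_img, x, y):
    """1 - NCC between I(u, v) and R(u - x, v - y) over the overlapping pixels."""
    a, b = [], []
    for v, row in enumerate(input_img):
        for u, p in enumerate(row):
            ru, rv = u - x, v - y
            if 0 <= rv < len(ref_img) and 0 <= ru < len(ref_img[0]):
                a.append(p)
                b.append(ref_img[rv][ru])
    return 1.0 - ncc(a, b)

# Toy data: the input is the reference with a gain and offset applied,
# which NCC is designed to ignore.
ref = [[1, 2, 3], [4, 5, 6], [7, 9, 8]]
inp = [[2 * p + 10 for p in row] for row in ref]
aligned = cost_c2(inp, ref, 0, 0)
shifted = cost_c2(inp, ref, 1, 0)
```

Because the correlation is normalized, a uniform change in brightness or contrast between the input and the rendered reference leaves the cost unchanged.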
     
     
       16. The method of claim 15, wherein (X, Y) denotes translation and (θ, φ, σ) denote Euler angles of the fine pose, and wherein the fine cost function is

$$C_F(\theta,\phi,\sigma) = 1 - \mathrm{NCC}\left(I_L(u,v),\, R_{\theta,\phi,\sigma,X,Y}(u,v)\right),$$

       where (u, v) are pixel coordinates of the input image I and the reference images R, NCC denotes normalized cross correlation, and L denotes a long exposure.
     
     
       17. The method of  claim 1 , wherein exposures used while acquiring the input images are different. 
     
     
       18. The method of  claim 17 , wherein a short exposure is about 1/60 second and a long exposure is about ¼ second, and a camera aperture is adjusted for ambient illumination so that the long exposure produces an image with normal intensity. 
     
     
       19. The method of  claim 1 , wherein the specular flows are due to a rotation of the environment around a predetermined viewing direction of a camera acquiring the 2D input images. 
     
     
       20. The method of claim 19, wherein the rotation is about ±5 degrees.
     
     
       21. The method of  claim 1 , wherein the specular flows are determined using block matching and a color coded environment map. 
     
     
       22. The method of claim 1, wherein (X, Y) denotes translation and (θ, φ, σ) denote Euler angles of the pose, and the rough cost function is

$$C_R\left(I, R_{\theta,\phi,\sigma}, X, Y\right) = (1-\lambda)\, C_1\left(I, R_{\theta,\phi,\sigma}, X, Y\right) + \lambda\, C_2\left(I, R_{\theta,\phi,\sigma}, X, Y\right),$$

       where λ is a control parameter, and $C_1(\cdot)$ and $C_2(\cdot)$ are cost functions based on motion segmentation and the specular flows, respectively, and R and I represent the reference images and the input images, respectively.
     
     
       23. The method of  claim 22  further comprising:
 constructing corresponding reference distance image $D_R$ and input distance image $D_I$ from the binary images obtained by thresholding magnitudes of specular flows and a distance transform, and wherein the cost function $C_1(\cdot)$ is

$$C_1\left(I, R_{\theta,\phi,\sigma}, X, Y\right) = \frac{1}{N_{\mathrm{motion}}} \sum_{(u,v)} \left( D_I(u,v) - D_R(u-x, v-y) \right)^2,$$

       where (x, y) are projection points, (u, v) are pixel coordinates, the summation is carried out for motion segmentation pixels of the reference image R, and $N_{\mathrm{motion}}$ denotes a number of such pixels.
     
     
       24. The method of  claim 22 , further comprising:
 comparing the reference image R and input image I, finding inlier pixels where a difference between an input specular flow vector and a reference specular flow vector is less than a small threshold, and wherein the cost function $C_2(\cdot)$ is $C_2(I, R_{\theta,\phi,\sigma}, X, Y) = -|M|$, where M is the set of inlier pixels.
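
The inlier count of claim 24 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: `cost_c2_flow` and the threshold value are hypothetical, the difference between flow vectors is measured with the Euclidean norm, and the reference flow field is assumed to be already shifted by the candidate translation (X, Y).

```python
import math

def cost_c2_flow(input_flow, ref_flow, threshold=0.5):
    """Return -|M|, where M is the set of pixels whose flow vectors agree."""
    inliers = 0
    for in_row, ref_row in zip(input_flow, ref_flow):
        for (iu, iv), (ru, rv) in zip(in_row, ref_row):
            # A pixel is an inlier when the two flow vectors differ by
            # less than the (assumed) threshold in Euclidean norm.
            if math.hypot(iu - ru, iv - rv) < threshold:
                inliers += 1
    return -inliers

# Toy 2x2 flow fields stored as rows of (du, dv) vectors.
input_flow = [[(1.0, 0.0), (0.0, 1.0)],
              [(2.0, 2.0), (0.0, 0.0)]]
ref_flow = [[(1.1, 0.0), (0.0, 3.0)],
            [(2.0, 2.2), (5.0, 5.0)]]
cost = cost_c2_flow(input_flow, ref_flow)
```

Negating the count turns "more agreeing pixels" into "lower cost", so the term can be minimized alongside the segmentation-based cost.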
 
     
     
       25. The method of claim 1, where (X, Y) represents translation and (θ, φ, σ) represent Euler angles of the 3D pose and a fine cost function is

$$C_F(\theta,\phi,\sigma) = \frac{1}{N_{\mathrm{mask}}} \sum_{(u,v)} \left( I(u,v) - R_{\theta,\phi,\sigma,X,Y}(u,v) \right)^2,$$

       where (u, v) are pixel coordinates, R is the reference image with the pose parameters (θ, φ, σ, X, Y), and $N_{\mathrm{mask}}$ denotes the number of pixels in a stencil, which is defined as an object segmentation mask.
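
The fine cost of claim 25 is a masked mean squared difference, which can be sketched as below. This is a minimal Python sketch under stated assumptions, not the patented implementation: `fine_cost` is a hypothetical name, images are plain nested lists of the same size, and the reference is assumed to be already rendered at the candidate pose.

```python
def fine_cost(input_img, ref_img, mask):
    """Mean squared difference between input and reference over the stencil pixels."""
    total, n = 0.0, 0
    for in_row, ref_row, mask_row in zip(input_img, ref_img, mask):
        for i, r, m in zip(in_row, ref_row, mask_row):
            if m:  # only pixels inside the object segmentation mask contribute
                total += (i - r) ** 2
                n += 1
    return total / n if n else float("inf")

# Toy 2x2 example: the mask selects the top row only.
input_img = [[1, 2], [3, 4]]
ref_img = [[1, 0], [3, 9]]
mask = [[1, 1], [0, 0]]
cost = fine_cost(input_img, ref_img, mask)
```

Restricting the sum to the mask keeps background pixels, which the rendered reference does not model, from dominating the cost.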
     
     
       26. The method of  claim 1 , wherein each pair of 2D input images is generated from a single high dynamic range image. 
     
     
       27. The method of  claim 1 , wherein each pair of 2D input images is generated from a set of images collected with varying exposures. 
     
     
       28. The method of  claim 1 , further comprising:
 picking the object out of a bin using a robot arm according to the estimated pose. 
 
     
     
       29. The method of  claim 28 , wherein the bin includes a single or multiple objects. 
     
     
       30. The method of  claim 28 , wherein the input images are acquired by a camera mounted on the robot arm. 
     
     
       31. The method of  claim 28 , where the bin includes active lighting. 
     
     
       32. The method of  claim 1 , wherein the 3D pose has six degrees of freedom. 
     
     
       33. The method of  claim 1 , further comprising:
 segmenting the object in the input images while estimating the pose. 
 
     
     
       34. The method of  claim 1 , further comprising:
 estimating a reflectance of the object in the input images while estimating the pose. 
 
     
     
       35. The method of  claim 1 , wherein the input images are acquired from multiple views of the object. 
     
     
       36. The method of  claim 1 , further comprising:
 actively illuminating the environment with an illumination source. 
 
     
     
       37. The method of  claim 36 , where the illumination source includes one or more projectors. 
     
     
       38. The method of  claim 1 , where the input images are acquired using polarization to estimate specular components. 
     
     
       39. The method of  claim 1 , further comprising:
 illuminating the environment with different colors, and performing the method independently for each color. 
 
     
     
       40. An apparatus for estimating a 3D pose of a 3D object in an environment, wherein the object has a specular surface, comprising:
 means for generating a set of pairs of 2D reference images using a 3D model of the object, and a set of poses of the object, wherein each pair of reference images is associated with one of the poses; 
 a camera configured to acquire a pair of 2D input images of the object; and 
 means, implemented in a processor, for estimating a 3D pose of the object by comparing features in the pair of 2D input images and features in each pair of 2D reference images using a cost function matching the features, wherein the features are specular flows.