US8437537B2 · Active · Utility
Method and system for estimating 3D pose of specular objects
Est. expiry: Mar 27, 2029 (~2.7 yrs left) · nominal 20-yr term from priority
G06V 20/647 · G06T 7/75 · G06T 2207/30164
PatentIndex Score: 76 · Cited by: 14 · References: 12 · Claims: 40
Abstract
A method estimates a 3D pose of a 3D specular object in an environment. In a preprocessing step, a set of pairs of 2D reference images is generated using a 3D model of the object and a set of poses of the object, wherein each pair of reference images is associated with one of the poses. Then, a pair of 2D input images of the object is acquired. A rough 3D pose of the object is estimated by comparing features in the pair of 2D input images with the features in each pair of 2D reference images using a rough cost function. The rough estimate is optionally refined using a fine cost function.
Claims
(exact text as granted, not AI-modified)
We claim:
1. A method for estimating a 3D pose of a 3D object in an environment, comprising the steps of:
rendering a set of pairs of 2D reference images using a 3D model of the object, wherein the object has a specular surface, and a set of poses of the object, wherein each pair of reference images is associated with one of the poses;
acquiring a pair of 2D input images of the object; and
estimating a 3D pose of the object by comparing features in the pair of 2D input images and features in each pair of 2D reference images using a cost function matching the features, wherein the features are specular flows, wherein the steps are performed in a processor.
2. The method of claim 1 where the 2D input images are obtained from a single image acquired by a camera with a non-linear intensity response.
3. The method of claim 1 , wherein the 3D pose is defined by a 3D translation vector (X, Y, Z), and 3D Euler angles (μ, φ, σ) for orientation.
4. The method of claim 1 , further comprising:
refining the pose using a fine cost function.
5. The method of claim 1 , wherein the features are obtained by image processing intensities due to specular reflection.
6. The method of claim 5 , wherein reference specular intensities are rendered by using a mirror bidirectional reflectance distribution function (BRDF), or some other known BRDF.
7. The method of claim 6 , wherein the reflectance is mirror-like.
8. The method of claim 6 , wherein the other known BRDF is of the object.
9. The method of claim 5 , further comprising:
arranging a mirror-like sphere in the environment;
acquiring an environment map image via reflection of the environment in the mirror-like sphere;
constructing an environment map from the environment map image, using a 2D plenoptic function, which models appearance of the surrounding, and wherein the reference images are rendered from a 3D model of the object reflecting the environment map.
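As a hedged illustration of the mirror-sphere step in claim 9, the sketch below maps a pixel on the sphere image to the direction of the environment it reflects. It assumes an orthographic camera looking along -z at a unit sphere, which the claim does not specify, and the function name `sphere_reflection_direction` is hypothetical.

```python
import math

def sphere_reflection_direction(x, y):
    """Map normalised sphere-image coordinates to a reflected direction.

    Assumptions (not from the claim): orthographic camera, viewing ray
    d = (0, 0, -1), unit sphere centred at the origin, x*x + y*y <= 1.
    """
    r2 = x * x + y * y
    if r2 > 1.0:
        raise ValueError("point lies outside the sphere silhouette")
    nz = math.sqrt(1.0 - r2)  # surface normal is (x, y, nz)
    # Mirror reflection r = d - 2 (d . n) n with d = (0, 0, -1), so d . n = -nz.
    return (2.0 * nz * x, 2.0 * nz * y, 2.0 * nz * nz - 1.0)

# The sphere centre reflects straight back towards the camera;
# a point at 45 degrees of surface tilt reflects the side of the environment.
center = sphere_reflection_direction(0.0, 0.0)
side = sphere_reflection_direction(math.sqrt(0.5), 0.0)
```

Sampling this direction for every pixel inside the sphere silhouette would give one way to populate an environment map from a single sphere image.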
10. The method of claim 9 , further comprising:
acquiring a set of images of the environment and generating a mosaic from the set of images in order to construct the environment map.
11. The method of claim 5 , wherein the pose is obtained by solving
$$(\hat{X}, \hat{Y}, \hat{\theta}, \hat{\phi}, \hat{\sigma}) = \arg\min_{\theta,\phi,\sigma} \left( \min_{X,Y} C_R\left(I_L, I_S, R^L_{\theta,\phi,\sigma}, R^S_{\theta,\phi,\sigma}, X, Y\right) \right),$$
where $(\hat{X}, \hat{Y}, \hat{\theta}, \hat{\phi}, \hat{\sigma})$ denotes translation and Euler angles of the initial pose, and $C_R(\,)$ is a rough cost function, $I_L$ and $R^L$ are long exposure input image and reference images, and $I_S$ is a short exposure input image and $R^S$ is a reference image, and arg min is a function that returns the arguments that provide a minimum value, and an inner minimum is determined before an outer minimum.
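The nested minimization in claim 11 can be sketched as a grid search in which the inner minimum over (X, Y) is resolved for each candidate rotation before the outer minimum over the Euler angles is taken. This is a minimal sketch, not the patented implementation: `estimate_rough_pose`, the `cost` callable, and the toy cost are hypothetical stand-ins for evaluating the rough cost function against pre-rendered reference pairs.

```python
from itertools import product

def estimate_rough_pose(cost, rotations, translations):
    """Return (X, Y, theta, phi, sigma) minimising `cost` over a discrete grid.

    `cost(rotation, translation)` is a hypothetical callable standing in for
    the rough cost evaluated against the reference pair rendered at `rotation`.
    """
    best = None
    for rotation in rotations:  # outer minimum over (theta, phi, sigma)
        # The inner minimum over (X, Y) is determined first for this rotation.
        inner_cost, inner_translation = min(
            (cost(rotation, t), t) for t in translations
        )
        if best is None or inner_cost < best[0]:
            best = (inner_cost, inner_translation, rotation)
    _, (x, y), (theta, phi, sigma) = best
    return (x, y, theta, phi, sigma)

def toy_cost(rotation, translation):
    """Illustrative quadratic cost with a known minimum (an assumption)."""
    target_rotation, target_translation = (10, 20, 30), (2, -1)
    return (sum((a - b) ** 2 for a, b in zip(rotation, target_rotation))
            + sum((a - b) ** 2 for a, b in zip(translation, target_translation)))

rotations = list(product(range(0, 40, 10), repeat=3))   # Euler-angle grid, degrees
translations = list(product(range(-2, 3), repeat=2))    # pixel offsets
pose = estimate_rough_pose(toy_cost, rotations, translations)
```

With the toy cost, the search recovers the translation (2, -1) and the angles (10, 20, 30) that the cost was built around.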
12. The method of claim 11 , wherein the rough function is
$$C_R\left(I_L, I_S, R^L_{\theta,\phi,\sigma}, R^S_{\theta,\phi,\sigma}, X, Y\right) = (1-\lambda)\, C_1\left(I_S, R^S_{\theta,\phi,\sigma}, X, Y\right) + \lambda\, C_2\left(I_L, R^L_{\theta,\phi,\sigma}, X, Y\right),$$
where $\lambda$ is a control parameter, and $C_1(\,)$ and $C_2(\,)$ are cost functions for a long exposure image and a short exposure image, respectively.
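The weighted sum in claim 12 can be illustrated with a few lines of arithmetic. The sketch below takes the two already-evaluated cost terms as plain numbers; the function name `rough_cost` and the sample values are assumptions made for illustration only.

```python
def rough_cost(c1, c2, lam):
    """Blend two cost terms as (1 - lam) * c1 + lam * c2."""
    return (1.0 - lam) * c1 + lam * c2

# Hypothetical cost values for one candidate pose.
c1_value, c2_value = 0.2, 0.8
only_c1 = rough_cost(c1_value, c2_value, 0.0)   # lam = 0 keeps only the first term
only_c2 = rough_cost(c1_value, c2_value, 1.0)   # lam = 1 keeps only the second term
blended = rough_cost(c1_value, c2_value, 0.5)   # equal weighting
```

Setting the control parameter to 0 or 1 reduces the rough cost to a single term, which can be a convenient way to inspect each term in isolation.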
13. The method of claim 12, wherein highlight pixels are used for $C_1(\,)$ and the highlight pixels are determined by thresholding to produce a corresponding binary image, and further comprising:
constructing corresponding reference distance image $D_R$ and input distance image $D_I$ by application of a distance transform to the binary images.
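The thresholding and distance-transform steps of claim 13 can be sketched on a tiny image. This is a brute-force illustration only: the threshold level, the choice of Euclidean distance to the nearest highlight pixel, and the function names are assumptions, since the claim does not fix them.

```python
import math

def threshold(image, level):
    """Binary image: 1 where the pixel is at least `level`, else 0."""
    return [[1 if px >= level else 0 for px in row] for row in image]

def distance_transform(binary):
    """Euclidean distance from each pixel to the nearest set pixel (brute force)."""
    on = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v]
    if not on:
        raise ValueError("binary image has no highlight pixels")
    return [[min(math.hypot(r - pr, c - pc) for pr, pc in on)
             for c in range(len(row))]
            for r, row in enumerate(binary)]

# Toy short-exposure image with two saturated highlight pixels.
short_exposure = [
    [10, 12, 11, 9],
    [13, 250, 14, 10],
    [11, 12, 13, 240],
]
binary = threshold(short_exposure, 200)
dist = distance_transform(binary)
```

The brute-force transform is quadratic in the number of pixels, so a production system would likely use a linear-time distance transform instead.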
14. The method of claim 13, wherein the cost function $C_1(\,)$ is
$$C_1\left(I_S, R^S_{\theta,\phi,\sigma}, X, Y\right) = \frac{1}{N_{\text{highlight}}} \sum_{(u,v)} \left( D_I(u,v) - D_R(u-x, v-y) \right)^2,$$
where (x, y) are projection points, (u, v) are pixel coordinates, $N_{\text{highlight}}$ denotes a number of pixels for the summation, and S denotes a short exposure.
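The highlight cost of claim 14 is a mean squared difference between two distance images, with the reference shifted by (x, y). The sketch below assumes that u indexes columns and v indexes rows, and that the summation runs over highlight pixels of the shifted reference; neither convention is stated in the claim. The distance images are hand-written city-block distances for a single highlight.

```python
def highlight_cost(d_input, d_ref, ref_mask, x, y):
    """Mean squared difference D_I(u, v) - D_R(u - x, v - y) over reference highlights."""
    total, count = 0.0, 0
    for v in range(len(d_input)):
        for u in range(len(d_input[0])):
            ru, rv = u - x, v - y  # reference pixel that lands on (u, v)
            in_bounds = 0 <= rv < len(d_ref) and 0 <= ru < len(d_ref[0])
            if in_bounds and ref_mask[rv][ru]:
                total += (d_input[v][u] - d_ref[rv][ru]) ** 2
                count += 1
    # No overlapping highlight pixel: treat the offset as maximally costly.
    return total / count if count else float("inf")

# Reference highlight at column 0, row 0; input highlight at column 1, row 0.
d_ref = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
d_input = [[1, 0, 1], [2, 1, 2], [3, 2, 3]]
ref_mask = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]

aligned = highlight_cost(d_input, d_ref, ref_mask, 1, 0)     # shift matches
misaligned = highlight_cost(d_input, d_ref, ref_mask, 0, 0)  # shift off by one
```

Because the distance images vary smoothly away from the highlights, the cost degrades gradually with misalignment, which suits a coarse search.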
15. The method of claim 12, wherein the cost function $C_2(\,)$ is
$$C_2\left(I_L, R^L_{\theta,\phi,\sigma}, X, Y\right) = 1 - \mathrm{NCC}\left(I_L(u,v),\, R^L_{\theta,\phi,\sigma}(u-x, v-y)\right),$$
where NCC denotes normalized cross correlation, and L denotes a long exposure.
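The cost in claim 15 is one minus a normalized cross correlation between the long-exposure input and the shifted reference. The sketch below uses the zero-mean form of NCC over the overlapping pixels, which is an assumption because the claim does not state which NCC variant is meant; the function names are hypothetical.

```python
import math

def ncc(a, b):
    """Zero-mean normalized cross correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [p - mean_a for p in a]
    db = [q - mean_b for q in b]
    denom = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return sum(p * q for p, q in zip(da, db)) / denom if denom else 0.0

def ncc_cost(image, reference, x, y):
    """1 - NCC between image(u, v) and reference(u - x, v - y) on the overlap."""
    img_px, ref_px = [], []
    for v, row in enumerate(image):
        for u, value in enumerate(row):
            ru, rv = u - x, v - y
            if 0 <= rv < len(reference) and 0 <= ru < len(reference[0]):
                img_px.append(value)
                ref_px.append(reference[rv][ru])
    return 1.0 - ncc(img_px, ref_px)

reference = [[1, 5, 2, 8], [3, 9, 4, 7], [6, 2, 5, 1]]
# Input: the reference shifted one column right, with a gain and an offset applied.
image = [[0] + [2 * p + 10 for p in row[:-1]] for row in reference]

cost_aligned = ncc_cost(image, reference, 1, 0)
cost_unshifted = ncc_cost(image, reference, 0, 0)
```

The gain and offset in the toy input show why a normalized measure is attractive here: the aligned cost stays near zero even though the absolute intensities differ.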
16. The method of claim 15 , wherein (X, Y) denotes translation and (μ, φ, σ) denote Euler angles of the fine pose, and wherein the fine cost function is
$$C_F(\theta,\phi,\sigma) = 1 - \mathrm{NCC}\left(I_L(u,v),\, R_{\theta,\phi,\sigma,X,Y}(u,v)\right),$$
where (u, v) are pixel coordinates of the input image I and the reference images R, NCC denotes normalized cross correlation, and L denotes a long exposure.
17. The method of claim 1 , wherein exposures used while acquiring the input images are different.
18. The method of claim 17 , wherein a short exposure is about 1/60 second and a long exposure is about ¼ second, and a camera aperture is adjusted for ambient illumination so that the long exposure produces an image with normal intensity.
19. The method of claim 1 , wherein the specular flows are due to a rotation of the environment around a predetermined viewing direction of a camera acquiring the 2D input images.
20. The method of claim 19, wherein the rotation is about ±5 degrees.
21. The method of claim 1 , wherein the specular flows are determined using block matching and a color coded environment map.
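Block matching, named in claim 21, can be sketched for a single block as an exhaustive search for the displacement with the smallest sum of absolute differences (SAD). This is a generic illustration under stated assumptions: it works on small grayscale grids, uses SAD as the matching score, and does not model the color coded environment map; `block_match` is a hypothetical name.

```python
def block_match(prev, curr, top, left, size, radius):
    """Return the (du, dv) shift of a block from `prev` that best matches `curr`."""
    height, width = len(curr), len(curr[0])
    best = None
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            r0, c0 = top + dv, left + du
            if r0 < 0 or c0 < 0 or r0 + size > height or c0 + size > width:
                continue  # candidate block would leave the image
            sad = sum(abs(prev[top + i][left + j] - curr[r0 + i][c0 + j])
                      for i in range(size) for j in range(size))
            if best is None or sad < best[0]:
                best = (sad, du, dv)
    return best[1], best[2]

def blank(rows, cols):
    return [[0] * cols for _ in range(rows)]

# A 2x2 textured patch moves 2 pixels right and 1 pixel down between frames.
patch = [[9, 7], [5, 3]]
prev, curr = blank(5, 6), blank(5, 6)
for i in range(2):
    for j in range(2):
        prev[1 + i][1 + j] = patch[i][j]
        curr[2 + i][3 + j] = patch[i][j]

flow = block_match(prev, curr, top=1, left=1, size=2, radius=2)
```

Repeating the search for every block would yield a dense flow field of the kind the claim refers to as specular flow.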
22. The method of claim 1 , wherein (X, Y) denotes translation and (μ, φ, σ) denote Euler angles of the pose, and the rough cost function is
$$C_1\left(I, R_{\theta,\phi,\sigma}, X, Y\right) = \frac{1}{N_{\text{motion}}} \sum_{(u,v)} \left( D_I(u,v) - D_R(u-x, v-y) \right)^2,$$
where λ is a control parameter, and C1( ) and C2( ) are cost functions based on motion segmentation and the specular flows, respectively, and R and I represent the reference images and the input images, respectively.
23. The method of claim 22 further comprising:
constructing corresponding reference distance image $D_R$ and input distance image $D_I$ from the binary images obtained by thresholding magnitudes of specular flows and a distance transform, and wherein the cost function $C_1(\,)$ is
$$C_1\left(I, R_{\theta,\phi,\sigma}, X, Y\right) = \frac{1}{N_{\text{motion}}} \sum_{(u,v)} \left( D_I(u,v) - D_R(u-x, v-y) \right)^2,$$
where (x, y) are projection points, (u, v) are pixel coordinates, the summation is carried out for motion segmentation pixels of the reference image R, and $N_{\text{motion}}$ denotes a number of such pixels.
24. The method of claim 22 , further comprising:
comparing the reference image R and input image I, finding inlier pixels where a difference between an input specular flow vector and a reference specular flow vector is less than a small threshold, and wherein the cost function $C_2(\,)$ is $C_2(I, R_{\theta,\phi,\sigma}, X, Y) = -|M|$ where M is the set of inlier pixels.
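The inlier-count cost of claim 24 can be sketched by comparing two flow fields pixel by pixel and returning the negated number of agreeing pixels. The sketch assumes the difference between flow vectors is measured as a Euclidean norm and that the reference is shifted by (x, y) as in the other cost terms; both are assumptions, as are the function name and the sample flows.

```python
import math

def inlier_cost(flow_input, flow_ref, x, y, threshold):
    """Return -|M|, where M holds pixels whose flow vectors differ by < threshold."""
    inliers = 0
    for v, row in enumerate(flow_input):
        for u, (fu, fv) in enumerate(row):
            ru, rv = u - x, v - y
            if 0 <= rv < len(flow_ref) and 0 <= ru < len(flow_ref[0]):
                gu, gv = flow_ref[rv][ru]
                if math.hypot(fu - gu, fv - gv) < threshold:
                    inliers += 1
    return -inliers

flow_ref = [[(1.0, 0.0), (0.0, 1.0)],
            [(1.0, 1.0), (2.0, 0.0)]]
flow_input = [[(1.0, 0.1), (0.0, 1.0)],
              [(3.0, 3.0), (2.0, 0.05)]]  # one pixel disagrees strongly

cost = inlier_cost(flow_input, flow_ref, 0, 0, threshold=0.2)
```

Because the cost is a negated count, more agreement gives a lower value, so it can be minimized alongside the other terms.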
25. The method of claim 1 , where (X, Y) represents translation and (μ, φ, σ) represent Euler angles of the 3D pose and a fine cost function is
$$C_F(\theta,\phi,\sigma) = \frac{1}{N_{\text{mask}}} \sum_{(u,v)} \left( I(u,v) - R_{\theta,\phi,\sigma,X,Y}(u,v) \right)^2,$$
where (u, v) are pixel coordinates, R is the reference image, with the pose parameter (θ, φ, σ, X, Y), and $N_{\text{mask}}$ denotes a number of a stencil, which is defined as an object segmentation mask.
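The fine cost of claim 25 is a mean squared intensity difference restricted to the pixels of an object mask. The sketch below evaluates it for one candidate reference image; the function name and the sample values are hypothetical, and the reference is assumed to be already rendered at the candidate pose.

```python
def fine_cost(image, reference, mask):
    """Mean squared difference between image and reference over masked pixels."""
    total, count = 0.0, 0
    for v, row in enumerate(mask):
        for u, inside in enumerate(row):
            if inside:
                total += (image[v][u] - reference[v][u]) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return total / count

image = [[1, 2], [3, 4]]
reference = [[1, 4], [3, 0]]
mask = [[1, 1], [0, 0]]  # only the top row belongs to the object

cost = fine_cost(image, reference, mask)
```

The large mismatch in the bottom row does not affect the result because those pixels lie outside the mask.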
26. The method of claim 1 , wherein each pair of 2D input images is generated from a single high dynamic range image.
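One way to picture deriving a short and a long exposure from a single high dynamic range image, as in claim 26, is to scale the radiance by the exposure time and clip to the sensor range. This is a sketch under loud assumptions: a linear sensor response, 8-bit output, and made-up radiance values; the exposure times of 1/60 second and 1/4 second are the approximate values given in claim 18.

```python
def simulate_exposure(radiance, exposure_s, max_value=255):
    """Scale radiance by exposure time and clip (linear response is an assumption)."""
    return [[min(max_value, int(round(px * exposure_s))) for px in row]
            for row in radiance]

# Hypothetical radiance values; the last pixel is a bright specular highlight.
hdr = [[200, 800], [400, 30000]]

short_image = simulate_exposure(hdr, 1 / 60)  # only the highlight is bright
long_image = simulate_exposure(hdr, 1 / 4)    # normal intensities, highlight clipped
```

In the short exposure only the highlight stands out, while the long exposure shows the rest of the scene at usable intensities.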
27. The method of claim 1 , wherein each pair of 2D input images is generated from a set of images collected with varying exposures.
28. The method of claim 1 , further comprising:
picking the object out of a bin using a robot arm according to the estimated pose.
29. The method of claim 28 , wherein the bin includes a single or multiple objects.
30. The method of claim 28 , wherein the input images are acquired by a camera mounted on the robot arm.
31. The method of claim 28 , where the bin includes active lighting.
32. The method of claim 1 , wherein the 3D pose has six degrees of freedom.
33. The method of claim 1 , further comprising:
segmenting the object in the input images while estimating the pose.
34. The method of claim 1 , further comprising:
estimating a reflectance of the object in the input images while estimating the pose.
35. The method of claim 1 , wherein the input images are acquired from multiple views of the object.
36. The method of claim 1 , further comprising:
actively illuminating the environment with an illumination source.
37. The method of claim 36 , where the illumination source includes one or more projectors.
38. The method of claim 1 , where the input images are acquired using polarization to estimate specular components.
39. The method of claim 1 , further comprising:
illuminating the environment with different colors, and performing the method independently for each color.
40. An apparatus for estimating a 3D pose of a 3D object in an environment, wherein the object has a specular surface, comprising:
means for generating a set of pairs of 2D reference images using a 3D model of the object, and a set of poses of the object, wherein each pair of reference images is associated with one of the poses;
a camera configured to acquire a pair of 2D input images of the object; and
means, implemented in a processor, for estimating a 3D pose of the object by comparing features in the pair of 2D input images and features in each pair of 2D reference images using a cost function matching the features, wherein the features are specular flows.