Fast template-based tracking
Abstract
Techniques to identify and track a pre-identified region-of-interest (ROI) through a temporal sequence of frames/images are described. In general, a down-sampled color gradient (edge map) of an arbitrarily sized ROI from a prior frame may be used to generate a small template. This initial template may be used to identify a region of a new or current frame that may be overscanned and used to create the current frame's edge map. By comparing the prior frame's template to the current frame's edge map, a cost value or image may be found and used to identify the current frame's ROI center. The size of the current frame's ROI may be found by varying the size of putative new ROIs and testing for their congruence with the prior frame's template. Subsequent ROIs for subsequent frames may be identified to, effectively, track an arbitrarily sized ROI through a sequence of video frames.
Claims
The invention claimed is:
1. An object tracking method, comprising:
receiving an initial frame from a temporal sequence of frames, the initial frame having an initial region-of-interest (ROI), every ROI having a size and location;
determining an initial template of the initial frame based on the initial ROI and a specified size;
receiving a first frame from the temporal sequence of frames, the first frame arriving later in the temporal sequence of frames than the initial frame;
identifying a first region of the first frame based on the initial ROI;
finding a first plurality of first metric values based on the first region and a cost function;
determining a first location of a first ROI of the first frame based on the plurality of first metric values;
determining a second plurality of putative ROIs for the first frame, each putative ROI having a different size and centered at the first location;
determining a second metric value for each of the putative ROIs; and
selecting one of the putative ROIs as the first frame's first ROI based on the second metric values.
2. The method of claim 1 , wherein the initial ROI comprises less than the entire initial frame.
3. The method of claim 2 , wherein determining an initial template comprises:
down-sampling the initial ROI to the specified size; and
determining a color gradient of the down-sampled initial ROI.
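As an illustration only, the two steps recited in claim 3 might be sketched as follows. The sketch assumes a single-channel frame stored as a list of rows, nearest-neighbour down-sampling, and a central-difference gradient magnitude; the names `downsample`, `edge_map`, and `make_template` are hypothetical, and a color gradient would apply the same idea per channel.

```python
from math import hypot

def downsample(img, roi, size):
    """Nearest-neighbour resample of roi=(x, y, w, h) to size x size."""
    x, y, w, h = roi
    return [[img[y + (r * h) // size][x + (c * w) // size]
             for c in range(size)] for r in range(size)]

def edge_map(patch):
    """Gradient magnitude from central differences, clamped at the border."""
    rows, cols = len(patch), len(patch[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = patch[r][min(c + 1, cols - 1)] - patch[r][max(c - 1, 0)]
            gy = patch[min(r + 1, rows - 1)][c] - patch[max(r - 1, 0)][c]
            out[r][c] = hypot(gx, gy)
    return out

def make_template(img, roi, size=8):
    """Down-sample the ROI to the specified size, then take its edge map."""
    return edge_map(downsample(img, roi, size))

# Toy 16x16 frame with a vertical step edge at x = 8 (assumed test data).
frame = [[0.0 if x < 8 else 1.0 for x in range(16)] for _ in range(16)]
template = make_template(frame, (0, 0, 16, 16), size=8)
```

Because the template has a fixed, small size regardless of the ROI's size, later comparisons cost the same for large and small ROIs.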
4. The method of claim 3 , further comprising determining a color descriptor of the down-sampled initial ROI.
5. The method of claim 4 , wherein identifying a first region of the first frame comprises:
identifying a temporary region of the first frame corresponding to the initial ROI of the initial frame; and
overscanning the temporary region.
6. The method of claim 5 , wherein the cost function is based on a congruence between the initial template and each n-by-n sub-region of the first frame's first region, wherein ‘n’ indicates the amount of overscan of the temporary region.
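One possible reading of the cost function in claim 6 is a sliding-window comparison: the template is placed at every offset that the overscan allows inside the first region's edge map, and a cost is recorded per offset. The sketch below assumes that reading and assumes a sum of absolute differences as the congruence measure; neither is stated in the claims, and `sad`, `cost_image`, and `best_offset` are hypothetical names.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-sized 2-D lists."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def cost_image(template, region_edges):
    """Cost of placing the template at every offset inside the region."""
    s = len(template)
    k = len(region_edges) - s + 1  # offsets per axis permitted by the overscan
    return [[sad(template, [row[dx:dx + s] for row in region_edges[dy:dy + s]])
             for dx in range(k)] for dy in range(k)]

def best_offset(costs):
    """(dx, dy) of the minimum cost, i.e. the best-matching placement."""
    offsets = ((dx, dy) for dy in range(len(costs))
               for dx in range(len(costs[0])))
    return min(offsets, key=lambda p: costs[p[1]][p[0]])

# Toy data: a 2x2 template hidden at offset (dx=2, dy=1) in a 4x4 edge map.
template = [[1.0, 2.0], [3.0, 4.0]]
region = [[0.0] * 4 for _ in range(4)]
region[1][2], region[1][3] = 1.0, 2.0
region[2][2], region[2][3] = 3.0, 4.0
costs = cost_image(template, region)
dx, dy = best_offset(costs)
```

The offset with the lowest cost, added to the overscanned region's origin, would give the first location of the first ROI.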
7. The method of claim 6 , wherein determining a second metric value for each of the putative ROIs comprises, for each putative ROI:
selecting a region centered about the first location, the region having a size;
converting the region to the specified size;
finding an edge map of the converted region; and
finding a value indicative of the congruence between the initial template and the edge map of the converted region.
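The size search of claim 7 can be illustrated with a short sketch: each candidate region is centered at the first location, converted to the template's size, turned into an edge map, and scored against the template. The sketch assumes square regions, a single-channel frame, nearest-neighbour resampling, and a sum of absolute differences as the congruence value (lower is better); `resample`, `edge_map`, `sad`, and `pick_size` are hypothetical names.

```python
from math import hypot

def resample(img, cx, cy, side, size):
    """Nearest-neighbour resample of the side x side square centred at (cx, cy)."""
    x0, y0 = cx - side // 2, cy - side // 2
    h, w = len(img), len(img[0])

    def px(r, c):
        # Clamp reads to the frame border so large candidates stay valid.
        y = min(max(y0 + (r * side) // size, 0), h - 1)
        x = min(max(x0 + (c * side) // size, 0), w - 1)
        return img[y][x]

    return [[px(r, c) for c in range(size)] for r in range(size)]

def edge_map(patch):
    """Gradient magnitude from central differences, clamped at the border."""
    rows, cols = len(patch), len(patch[0])
    return [[hypot(patch[r][min(c + 1, cols - 1)] - patch[r][max(c - 1, 0)],
                   patch[min(r + 1, rows - 1)][c] - patch[max(r - 1, 0)][c])
             for c in range(cols)] for r in range(rows)]

def sad(a, b):
    """Sum of absolute differences between two equal-sized 2-D lists."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def pick_size(frame, template, center, sides):
    """Score every candidate side length and return the best one with all scores."""
    size = len(template)
    cx, cy = center
    scores = {s: sad(template, edge_map(resample(frame, cx, cy, s, size)))
              for s in sides}
    return min(scores, key=scores.get), scores

# Toy 32x32 frame holding a bright 12x12 square centred at (16, 16).
frame = [[1.0 if 10 <= x < 22 and 10 <= y < 22 else 0.0 for x in range(32)]
         for y in range(32)]
template = edge_map(resample(frame, 16, 16, 16, 8))  # built from a 16-pixel ROI
best, scores = pick_size(frame, template, (16, 16), sides=[8, 16, 24])
```

In this toy case the object has not changed size, so the 16-pixel candidate reproduces the template exactly and is selected.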
8. The method of 1 , further comprising:
determining a first edge map of the first frame based on the first ROI;
combining the first edge map with a plurality of other edge maps to generate an updated template, wherein each of the plurality of other edge maps corresponds to an ROI of a different frame from the temporal sequence of frames, each of the plurality of other frames arriving earlier in the temporal sequence of frames than the first frame; and
using the updated template as the initial template when evaluating a next frame from the temporal sequence of frames, the next frame arriving later in the temporal sequence of frames than the first frame.
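Claim 8 recites combining the current edge map with edge maps from earlier frames but does not say how. A minimal sketch, assuming an unweighted mean over a short sliding window of recent edge maps (both the mean and the window length are assumptions, and `TemplateHistory` is a hypothetical name):

```python
from collections import deque

class TemplateHistory:
    """Keeps the most recent edge maps and averages them into a template."""

    def __init__(self, max_frames=4):
        self.maps = deque(maxlen=max_frames)  # oldest map is dropped when full

    def update(self, edge_map):
        """Add the newest edge map and return the element-wise mean template."""
        self.maps.append(edge_map)
        k = len(self.maps)
        rows, cols = len(edge_map), len(edge_map[0])
        return [[sum(m[r][c] for m in self.maps) / k for c in range(cols)]
                for r in range(rows)]

history = TemplateHistory(max_frames=2)
history.update([[0.0, 2.0]])
t2 = history.update([[2.0, 4.0]])  # mean of the two stored maps
t3 = history.update([[4.0, 0.0]])  # the first map has been dropped
```

Averaging over several frames lets the template follow gradual appearance changes while damping the effect of any single noisy frame.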
9. An object tracking digital image capture unit, comprising:
an image sensor;
a lens system configured to focus light from a scene onto the image sensor;
a memory communicatively coupled to the image sensor and configured to store multiple images from the image sensor; and
one or more processors coupled to the lens system and the memory, the one or more processors configured for—
receiving an initial frame from a temporal sequence of frames, the initial frame having an initial region-of-interest (ROI), every ROI having a size and location;
determining an initial template of the initial frame based on the initial ROI and a specified size;
receiving a first frame from the temporal sequence of frames, the first frame arriving later in the temporal sequence of frames than the initial frame;
identifying a first region of the first frame based on the initial ROI;
finding a first plurality of first metric values based on the first region and a cost function;
determining a first location of a first ROI of the first frame based on the plurality of first metric values;
determining a second plurality of putative ROIs for the first frame, each putative ROI having a different size and centered at the first location;
determining a second metric value for each of the putative ROIs; and
selecting one of the putative ROIs as the first frame's first ROI based on the second metric values.
10. The digital image capture unit of claim 9 , wherein the initial ROI comprises less than the entire initial frame.
11. The digital image capture unit of claim 10 , wherein determining an initial template comprises:
down-sampling the initial ROI to the specified size; and
determining a color gradient of the down-sampled initial ROI.
12. The digital image capture unit of claim 11 , wherein the one or more processors are further configured for determining a color descriptor of the down-sampled initial ROI.
13. The digital image capture unit of claim 12 , wherein identifying a first region of the first frame comprises:
identifying a temporary region of the first frame corresponding to the initial ROI of the initial frame; and
overscanning the temporary region.
14. The digital image capture unit of claim 13 , wherein the cost function is based on a congruence between the initial template and each n-by-n sub-region of the first frame's first region, wherein ‘n’ indicates the amount of overscan of the temporary region.
15. The digital image capture unit of claim 14 , wherein determining a second metric value for each of the putative ROIs comprises, for each putative ROI:
selecting a region centered about the first location, the region having a size;
converting the region to the specified size;
finding an edge map of the converted region; and
finding a value indicative of the congruence between the initial template and the edge map of the converted region.
16. The digital image capture unit of 9 , wherein the one or more processors are further configured for:
determining a first edge map of the first frame based on the first ROI;
combining the first edge map with a plurality of other edge maps to generate an updated template, wherein each of the plurality of other edge maps corresponds to an ROI of a different frame from the temporal sequence of frames, each of the plurality of other frames arriving earlier in the temporal sequence of frames than the first frame; and
using the updated template as the initial template when evaluating a next frame from the temporal sequence of frames, the next frame arriving later in the temporal sequence of frames than the first frame.
17. A non-transitory program storage device comprising instructions stored thereon to cause one or more processors to:
receive an initial frame from a temporal sequence of frames, the initial frame having an initial region-of-interest (ROI), every ROI having a size and location;
determine an initial template of the initial frame based on the initial ROI and a specified size;
receive a first frame from the temporal sequence of frames, the first frame arriving later in the temporal sequence of frames than the initial frame;
identify a first region of the first frame based on the initial ROI;
find a first plurality of first metric values based on the first region and a cost function;
determine a first location of a first ROI of the first frame based on the plurality of first metric values;
determine a second plurality of putative ROIs for the first frame, each putative ROI having a different size and centered at the first location;
determine a second metric value for each of the putative ROIs; and
select one of the putative ROIs as the first frame's first ROI based on the second metric values.
18. The non-transitory program storage device of claim 17 , wherein the initial ROI comprises less than the entire initial frame.
19. The non-transitory program storage device of claim 18 , wherein the instructions to determine an initial template comprise instructions to:
down-sample the initial ROI to the specified size; and
determine a color gradient of the down-sampled initial ROI.
20. The non-transitory program storage device of claim 19 , further comprising instructions to determine a color descriptor of the down-sampled initial ROI.
21. The non-transitory program storage device of claim 20 , wherein instructions to identify a first region of the first frame comprise instructions to:
identify a temporary region of the first frame corresponding to the initial ROI of the initial frame; and
overscan the temporary region.
22. The non-transitory program storage device of claim 21 , wherein the cost function is based on a congruence between the initial template and each n-by-n sub-region of the first frame's first region, wherein ‘n’ indicates the amount of overscan of the temporary region.
23. The non-transitory program storage device of claim 22 , wherein instructions to determine a second metric value for each of the putative ROIs comprise instructions to, for each putative ROI:
select a region centered about the first location, the region having a size;
convert the region to the specified size;
find an edge map of the converted region; and
find a value indicative of the congruence between the initial template and the edge map of the converted region.
24. The non-transitory program storage device of 17 , further comprising instructions to:
determine a first edge map of the first frame based on the first ROI;
combine the first edge map with a plurality of other edge maps to generate an updated template, wherein each of the plurality of other edge maps corresponds to an ROI of a different frame from the temporal sequence of frames, each of the plurality of other frames arriving earlier in the temporal sequence of frames than the first frame; and
use the updated template as the initial template when evaluating a next frame from the temporal sequence of frames, the next frame arriving later in the temporal sequence of frames than the first frame.