US11812154B2 · Active · Utility · PatentIndex 67

Method, apparatus and system for video processing

Assignee: REALSEE BEIJING TECH CO LTD · Priority: Jul 30, 2021 · Filed: Jun 16, 2022 · Granted: Nov 7, 2023
Est. expiry: Jul 30, 2041 (~15.1 yrs left) · nominal 20-yr term from priority
Inventors: Rao Tong, ZHU YI, HUANG CHENGFENG
H04N 23/698, G06T 7/337, G06V 10/751, H04N 5/265, H04N 5/2622, H04N 13/156, G06T 2207/10016, G06T 2207/20221, H04N 5/262, G06T 7/33, G06T 3/4038, G06V 20/46, G06V 10/26, G06V 10/16
PatentIndex Score: 67 · Cited by: 2 · References: 43 · Claims: 20

Abstract

A method for video processing is provided. The method comprises obtaining an image of a scene, obtaining a video that records an area included in the scene, determining one or more frames from the plurality of frames of the video, determining pairs of matched features, generating a plurality of composite frames by combining each of the selected one or more frames with the image of the scene based on the pairs of matched features, and generating a composite video based on the plurality of composite frames. Each of the pairs of matched features is related to an object that is in both the image and the one or more frames. Each of the pairs of matched features is associated with one or more pixels of the image of the scene and one or more pixels of a selected frame of the one or more frames.
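As a rough illustration of how pairs of matched features could yield a relationship between a frame's image plane and the scene image's plane, the sketch below fits a planar homography to exactly four matched pixel pairs and uses it to project a frame pixel into the scene image. The homography model, the four-pair restriction, and the names `solve_linear`, `estimate_homography` and `project` are assumptions made for this example; the abstract does not specify the form of the relationship or how it is computed.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(frame_pts: Sequence[Point],
                        image_pts: Sequence[Point]) -> Matrix:
    """Fit a 3x3 homography (bottom-right entry fixed to 1) that maps
    four frame pixels onto their matched scene-image pixels."""
    if len(frame_pts) != 4 or len(image_pts) != 4:
        raise ValueError("this sketch needs exactly four matched pairs")
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(frame_pts, image_pts):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h: Matrix, pt: Point) -> Point:
    """Project a frame pixel into the scene image's plane."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

A practical pipeline would typically use many more matched pairs with a robust estimator to tolerate mismatches; the four-pair exact solve is chosen here only to keep the example dependency-free.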

Claims

exact text as granted — not AI-modified
What is claimed is: 
     
       1. A method for video processing, comprising:
 obtaining an image of a scene; 
 obtaining a video that records an area included in the scene, the video comprising a plurality of frames, wherein the image of the scene is associated with a first image plane, and each frame in the video is associated with a second image plane; 
 determining one or more frames from the plurality of frames of the video; 
 determining pairs of matched features, wherein each of the pairs of matched features is related to an object that is in both the image and the one or more frames, and wherein each of the pairs of matched features is associated with one or more pixels of the image of the scene and one or more pixels of a selected frame of the one or more frames; 
 determining, based on the matched features, one or more relationships between the first image plane and one or more second image planes; 
 generating, based on the pairs of matched features, a plurality of composite frames by combining each of the selected one or more frames with the image of the scene, wherein generating the plurality of composite frames further comprises:
 projecting, based on the one or more relationships, pixels of the frames in the video from the associated second image planes to the first image plane; and 
 combining the projected pixels with the pixels of the image of the scene in the first image plane to generate the plurality of composite frames; and 
 
 generating a composite video based on the plurality of composite frames. 
 
     
     
       2. The method according to  claim 1 , wherein the area recorded by the video is a target scene included in the scene, wherein pixels in the composite frames that are related to the target scene are the projected pixels from the respective second image planes, and wherein the remainder pixels in the composite frames are from the pixels in the image of the scene. 
     
     
       3. The method according to  claim 1 , wherein the second image planes for the frames in the video are the same, and one relationship is determined between the first image plane and the second image planes based on the one or more frames. 
     
     
       4. The method according to  claim 1 , wherein the second image planes for the frames in the video include different second image planes, wherein the frames in the video are divided into groups, and each group of the frames is associated with one second image plane, and wherein one relationship is determined for each group of the frames. 
     
     
       5. The method according to  claim 4 , wherein processing the plurality of composite frames further comprises:
 mitigating boundaries caused by combining each of the one or more frames with the image of the scene; or 
 adjusting colors in the composite frames. 
 
     
     
       6. The method according to  claim 5 , further comprising:
 processing the plurality of composite frames to improve the quality of the composite frames, 
 wherein the composite video is generated based on the processed composite frames. 
 
     
     
       7. The method according to  claim 1 , wherein determining the matched features between the image of the scene and the one or more frames further comprises:
 determining a set of first features from the image of the scene; 
 determining a set of second features from each of the one or more frames; and 
 comparing the set of first features and each set of second features, 
 wherein the matched features include the first features and the corresponding second features that are related to the same objects in the scene based on the comparison results. 
 
     
     
       8. The method according to  claim 1 , wherein obtaining the image of the scene further comprises:
 obtaining a plurality of images from different perspectives; and 
 generating the image of the scene by combining the plurality of images. 
 
     
     
       9. The method according to  claim 1 , wherein the video is recorded by an imaging device, and the settings of the imaging device remain the same during the recording of the video. 
     
     
       10. The method according to  claim 1 , wherein the video is recorded for motions of one or more objects in the area included in the scene. 
     
     
       11. The method according to  claim 1 , further comprising:
 causing display of the composite video. 
 
     
     
       12. A device for video processing, comprising:
 one or more processors; and 
 a non-transitory computer-readable medium, having computer-executable instructions stored thereon, the computer-executable instructions, when executed by the one or more processors, causing the one or more processors to facilitate: 
 obtaining an image of a scene; 
 obtaining a video that records an area included in the scene, the video comprising a plurality of frames, wherein the image of the scene is associated with a first image plane, and each frame in the video is associated with a second image plane; 
 determining one or more frames from the plurality of frames of the video; 
 determining pairs of matched features, wherein each of the pairs of matched features is related to an object that is in both the image and the one or more frames, and wherein each of the pairs of matched features is associated with one or more pixels of the image of the scene and one or more pixels of a selected frame of the one or more frames; 
 determining, based on the matched features, one or more relationships between the first image plane and one or more second image planes; 
 generating, based on the pairs of matched features, a plurality of composite frames by combining each of the selected one or more frames with the image of the scene, wherein generating the plurality of composite frames further comprises:
 projecting, based on the one or more relationships, pixels of the frames in the video from the associated second image planes to the first image plane; and 
 combining the projected pixels with the pixels of the image of the scene in the first image plane to generate the plurality of composite frames; and 
 
 generating a composite video based on the plurality of composite frames. 
 
     
     
       13. The device according to  claim 12 , wherein the area recorded by the video is a target scene included in the scene, wherein pixels in the composite frames that are related to the target scene are the projected pixels from the respective second image planes, and wherein the remainder pixels in the composite frames are from the pixels in the image of the scene. 
     
     
       14. The device according to  claim 12 , wherein the second image planes for the frames in the video are the same, and one relationship is determined between the first image plane and the second image planes based on the one or more frames. 
     
     
       15. The device according to  claim 12 , wherein the second image planes for the frames in the video include different second image planes, wherein the frames in the video are divided into groups, and each group of the frames is associated with one second image plane, and wherein one relationship is determined for each group of the frames. 
     
     
       16. The device according to  claim 12 , wherein the instructions cause the one or more processors to further facilitate:
 processing the plurality of composite frames to improve the quality of the composite frames, 
 wherein the composite video is generated based on the processed composite frames. 
 
     
     
       17. The device according to  claim 16 , wherein processing the plurality of composite frames further comprises:
 mitigating boundaries caused by combining each of the one or more frames with the image of the scene; or 
 adjusting colors in the composite frames. 
 
     
     
       18. A non-transitory computer-readable medium, having computer-executable instructions stored thereon, the computer-executable instructions, when executed by one or more processors, causing a processor to facilitate:
 obtaining an image of a scene; 
 obtaining a video that records an area included in the scene, the video comprising a plurality of frames, wherein the image of the scene is associated with a first image plane, and each frame in the video is associated with a second image plane; 
 determining one or more frames from the plurality of frames of the video; 
 determining pairs of matched features, wherein each of the pairs of matched features is related to an object that is in both the image and the one or more frames, and wherein each of the pairs of matched features is associated with one or more pixels of the image of the scene and one or more pixels of a selected frame of the one or more frames; 
 determining, based on the matched features, one or more relationships between the first image plane and one or more second image planes; 
 generating, based on the pairs of matched features, a plurality of composite frames by combining each of the selected one or more frames with the image of the scene, wherein generating the plurality of composite frames further comprises:
 projecting, based on the one or more relationships, pixels of the frames in the video from the associated second image planes to the first image plane; and 
 combining the projected pixels with the pixels of the image of the scene in the first image plane to generate the plurality of composite frames; and 
 
 generating a composite video based on the plurality of composite frames. 
 
     
     
       19. The non-transitory computer-readable medium according to  claim 18 , wherein the area recorded by the video is a target scene included in the scene, wherein pixels in the composite frames that are related to the target scene are the projected pixels from the respective second image planes, and wherein the remainder pixels in the composite frames are from the pixels in the image of the scene. 
     
     
       20. The non-transitory computer-readable medium according to  claim 18 , wherein the second image planes for the frames in the video are the same, and one relationship is determined between the first image plane and the second image planes based on the one or more frames.
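The combining step recited in claims 1 and 2 can be pictured with a minimal sketch, given below under stated assumptions. It takes frame pixels that have already been projected into the first image plane, overwrites the matching pixels of the scene image, and leaves the remaining pixels from the scene image. The grayscale nested-list image format, the `(row, col)` dictionary of projected pixels, and the names `composite_frame` and `composite_video` are hypothetical choices for illustration. The uniform `alpha` blend is only a crude stand-in for the boundary mitigation mentioned in claims 5 and 17, not the patented technique.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]                # grayscale rows of pixel values
Pixels = Dict[Tuple[int, int], float]    # (row, col) in the first image plane


def composite_frame(scene: Image, projected: Pixels,
                    alpha: float = 1.0) -> Image:
    """Overlay projected frame pixels on a copy of the scene image.

    With alpha == 1.0 the target-area pixels come entirely from the frame;
    every other pixel is taken from the scene image unchanged.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    out = [row[:] for row in scene]      # never modify the scene image
    for (r, c), value in projected.items():
        # Projected pixels that fall outside the scene image are dropped.
        if 0 <= r < len(out) and 0 <= c < len(out[r]):
            out[r][c] = alpha * value + (1.0 - alpha) * out[r][c]
    return out


def composite_video(scene: Image, projected_frames: Sequence[Pixels],
                    alpha: float = 1.0) -> List[Image]:
    """Build one composite frame per video frame, in playback order."""
    return [composite_frame(scene, p, alpha) for p in projected_frames]
```

Because each composite frame starts from a fresh copy of the scene image, the static background stays identical across the composite video and only the target area changes from frame to frame.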
