US12154376B2 · Active · Utility

Extracting facial imagery from online sessions

Assignee: DELL PRODUCTS LP
Priority: Jan 21, 2022 · Filed: Jan 21, 2022 · Granted: Nov 26, 2024
Est. expiry: Jan 21, 2042 (~15.5 yrs left) · nominal 20-yr term from priority
Inventors: SHEPHERD MICHAEL, WHITSON JONATHAN
Classifications: G06V 40/169, G06V 30/10, G06V 20/635, G06V 40/166, G06V 40/161
PatentIndex Score: 47 · Cited by: 0 · References: 4 · Claims: 20

Abstract

A system can determine, from a video of an online session, respective bounding boxes of text names of people, wherein the text names are presented in the video, and wherein images of the people are present in the video. The system can determine, from the video, respective faces of the people. The system can associate a first bounding box of the bounding boxes with a first face of the faces based on the first bounding box satisfying a function of distance with respect to the first face among the faces. The system can extract a name from the first bounding box via optical character recognition. The system can extract a subportion of the video that comprises the first face. The system can store an association between the name and the subportion of the video that comprises the first face.
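
The association step in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent. It assumes boxes are (left, top, right, bottom) tuples in pixel coordinates with y growing downward, and it uses the corner choice recited in claim 1 (upper right of the name box to lower left of the face). The names `Box`, `corner_distance`, and `associate_names_to_faces` are hypothetical, and text detection, face detection, and OCR are left out.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (assumed: y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def corner_distance(name_box: Box, face_box: Box) -> float:
    """Euclidean distance from the name box's upper-right corner
    to the face box's lower-left corner."""
    dx = name_box.right - face_box.left
    dy = name_box.top - face_box.bottom
    return math.hypot(dx, dy)


def associate_names_to_faces(name_boxes: list[Box], face_boxes: list[Box]) -> dict[int, int]:
    """Map each name-box index to the index of the face with the shortest
    corner distance. Returns an empty mapping when no faces were detected."""
    if not face_boxes:
        return {}
    return {
        i: min(range(len(face_boxes)), key=lambda j: corner_distance(nb, face_boxes[j]))
        for i, nb in enumerate(name_boxes)
    }
```

With two side-by-side tiles, each name label pairs with the face directly above and to its right, regardless of the order in which the detectors returned them. A sketch like this allows two names to map to the same face; a one-to-one assignment would need an extra step that the abstract does not describe.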

Claims

exact text as granted — not AI-modified
What is claimed is: 
     
       1. A system, comprising:
 at least one processor; and 
 at least one memory that stores executable instructions that, when executed by the at least one processor, facilitate performance of operations, comprising:
 determining, from a video of an online session, respective bounding boxes of text names of people, wherein the text names are presented in the video, and wherein images of the people are present in the video; 
 determining, from the video, respective faces of the people; 
 associating a bounding box of the bounding boxes with a face of the faces based on the bounding box satisfying a function of distance with respect to the face among the faces based on the bounding box having a shortest distance with respect to the face among the faces, and wherein the shortest distance is measured between an upper right element of the bounding box and a lower left element of the face; 
 extracting a name from the bounding box via optical character recognition; 
 extracting a subportion of the video that comprises the face; and 
 storing an association between the name and the subportion of the video that comprises the face. 
 
 
     
     
       2. The system of  claim 1 , wherein the operations further comprise:
 determining that a first user interface location of the bounding box has changed in the video to a new first user interface location; 
 determining that a second user interface location of the face has changed in the video to a new second user interface location; and 
 associating the bounding box at the new first user interface location with the face at the second user interface location. 
 
     
     
       3. The system of  claim 2 , wherein the second user interface location of the face has changed in the video to the new second user interface location based on a determination that a user account previously joined in the online session has signed out of participating in the online session. 
     
     
       4. The system of  claim 2 , wherein the second user interface location of the face has changed in the video to the new second user interface location based on a change in user account of the online session associated with a person speaking in the online session. 
     
     
       5. The system of  claim 1 , wherein the bounding box satisfying the function of distance comprises the bounding box having a first shortest distance with respect to the face among the faces, wherein the video is a first video created with a first online session program, wherein the bounding box is a first bounding box, wherein the face is a first face, wherein the association is a first association, wherein the images are first images, wherein the subportion is a first subportion, and wherein the operations further comprise:
 determining, from a second video of a second online session created with a second online session program, respective second bounding boxes of text names of second people, wherein second images of the second people are present in the second video; 
 associating a second bounding box of the respective second bounding boxes with a second face of the second people based on the second bounding box having a second shortest distance with the second face among the second images of the second people; 
 extracting a second name from the second bounding box via optical character recognition; 
 extracting a second subportion of the second video that comprises the second face; and 
 storing a second association between the second name and the second subportion of the second video that comprises the second face. 
 
     
     
       6. The system of  claim 1 , wherein the association is a first association, and wherein the operations further comprise:
 storing respective associations between respective text names of the text names and respective people of the people for respective frames of the video. 
 
     
     
       7. A method, comprising:
 determining, by a system comprising at least one processor, and from a video of an online session, respective bounding boxes of text names of people, wherein the text names are presented in the video, and wherein images of the people are present in the video; 
 determining, by the system and from the video, respective faces of the people; 
 associating, by the system, a bounding box of the bounding boxes with a face of the faces based on the bounding box satisfying a criterion of distance with respect to the face among the faces based on the bounding box having a shortest distance with respect to the face among the faces, and wherein the shortest distance is measured between an upper right element of the bounding box and a lower left element of the face; 
 extracting, by the system, a name from the bounding box via optical character recognition; 
 extracting, by the system, a subportion of the video that comprises the face; and 
 storing, by the system, an association between the name and the subportion of the video that comprises the face. 
 
     
     
       8. The method of  claim 7 , further comprising:
 determining, by the system, a group of bounding box information, respective bounding box information of the group of bounding box information comprising respective first timestamps, respective first sources, and respective coordinates within the video; and 
 determining, by the system, a group of face information, respective face information of the group of face information comprising respective second timestamps within the video, respective second coordinates within the video, and respective masks, 
 wherein associating the bounding box with the face is based on associating a first timestamp of the respective first timestamps of the group of bounding box information with a second timestamp of the respective second timestamps of the group of face information. 
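
As an illustration only (not part of the granted claim text), the timestamp matching in claim 8 might be sketched as a join on frame timestamps. The record types `NameBoxInfo` and `FaceInfo`, the function `group_by_timestamp`, the string type for "source", and the pixel-set mask representation are all assumptions. The sketch also assumes both detectors ran on the same frames, so matching timestamps are exactly equal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NameBoxInfo:
    timestamp: float                       # position in the video, in seconds (assumed unit)
    source: str                            # assumed: identifier of the video the box came from
    coords: tuple[float, float, float, float]


@dataclass(frozen=True)
class FaceInfo:
    timestamp: float
    coords: tuple[float, float, float, float]
    mask: frozenset                        # assumed: set of (x, y) pixels covering the face


def group_by_timestamp(name_infos, face_infos):
    """Return {timestamp: (name boxes, faces)} for timestamps present in both groups."""
    names_at = defaultdict(list)
    for info in name_infos:
        names_at[info.timestamp].append(info)
    faces_at = defaultdict(list)
    for info in face_infos:
        faces_at[info.timestamp].append(info)
    shared = sorted(names_at.keys() & faces_at.keys())
    return {ts: (names_at[ts], faces_at[ts]) for ts in shared}
```

Each value in the result holds the candidates for one frame, so a distance-based association can then be run per timestamp.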
 
     
     
       9. The method of  claim 7 , wherein the association between the name and the portion of the video that comprises the face comprises a timestamp, the name, a location of the face in the video, and a mask. 
     
     
       10. The method of  claim 7 , wherein determining the respective faces of the people comprises:
 determining, by the system, the bounding box of the face and a mask of the face, wherein the mask covers the face, and wherein the mask is contained within the bounding box; and 
 wherein storing the association between the name and the portion of the video that comprises the face comprises storing, by the system, the bounding box of the face and the mask of the face. 
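
For illustration only (not part of the granted claim text), the stored association described in claims 9 and 10 could be modeled as a small record holding a timestamp, the name, the face location, and a mask, plus a check that the mask lies inside the face's bounding box. `FaceRecord`, `mask_within_box`, and the pixel-set mask are hypothetical choices; the claims do not specify a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceRecord:
    timestamp: float                          # when the face appears in the video
    name: str                                 # name extracted via OCR
    face_box: tuple[int, int, int, int]       # (left, top, right, bottom), right/bottom exclusive
    mask: frozenset                           # assumed: (x, y) pixels covering the face


def mask_within_box(mask, box) -> bool:
    """True when every mask pixel falls inside the bounding box."""
    left, top, right, bottom = box
    return all(left <= x < right and top <= y < bottom for x, y in mask)
```

A store could reject or clip a record for which `mask_within_box(record.mask, record.face_box)` is false, since claim 10 recites a mask contained within the bounding box.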
 
     
     
       11. The method of  claim 7 , further comprising:
 determining, by the system, stored participant names of participants in the online session, the stored participant names being separate from the video; and 
 updating, by the system, the name based on the name being different from the stored participant names, to produce an updated name, 
 wherein storing the association comprises storing, by the system, the updated name. 
 
     
     
       12. The method of  claim 7 , wherein the name is a first name, and further comprising:
 determining, by the system, stored participant names of participants in the online session, the stored participant names being separate from the video; 
 determining, by the system, that a second name of the text names is not identified in the stored participant names; and 
 omitting, by the system, from associating the second name with the faces. 
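
As an illustration only (not part of the granted claim text), claims 11 and 12 can be read together as a reconciliation step against a separately stored participant list. The sketch below uses fuzzy string matching from Python's standard `difflib` module; that technique, the function name `reconcile_name`, and the 0.8 cutoff are assumptions, since the claims do not say how a name is updated or how a non-participant name is recognized.

```python
from difflib import get_close_matches
from typing import Optional


def reconcile_name(ocr_name: str, roster: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the roster name that best matches the OCR output, or None.

    An exact match is returned unchanged. A close match replaces the OCR
    text with the stored name (the "updated name" of claim 11). No match
    returns None, signalling that the text should not be associated with
    any face (claim 12).
    """
    if ocr_name in roster:
        return ocr_name
    matches = get_close_matches(ocr_name, roster, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, an OCR result of "A1ice Nguyen" would be corrected to a stored "Alice Nguyen", while on-screen text such as "Screen Share" would return None and be skipped.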
 
     
     
       13. The method of  claim 7 , wherein the distance is a first distance, and further comprising:
 determining, by the system, a predetermined location relative to the face; and 
 associating, by the system, the bounding box with the face based on the bounding box predetermined location relative to the face satisfying the criterion with respect to distance. 
 
     
     
       14. A non-transitory computer-readable medium comprising instructions that, in response to execution, cause a system comprising a processor to perform operations, comprising:
 determining, from a video of an online session, respective bounding boxes of text names of people, wherein the text names are presented in the video, and wherein images of the people are present in the video; 
 determining, from the video, respective faces of the people; 
 associating a bounding box of the bounding boxes with a face of the faces based on the bounding box satisfying a function of distance with respect to the face among the faces based on the bounding box having a shortest distance with respect to the face among the faces, and wherein the shortest distance is measured between an upper right element of the bounding box and a lower left element of the face; 
 extracting a name from the bounding box via optical character recognition; 
 extracting a subportion of the video that comprises the face; and 
 storing an association between the name and the subportion of the video that comprises the face. 
 
     
     
       15. The non-transitory computer-readable medium of  claim 14 , wherein the operations further comprise:
 extracting the name from the bounding box via optical character recognition before storing the association. 
 
     
     
       16. The non-transitory computer-readable medium of  claim 14 , wherein the operations further comprise:
 extracting the part of the video that comprises the face before storing the association. 
 
     
     
       17. The non-transitory computer-readable medium of  claim 14 , wherein the operations further comprise:
 determining the respective faces using a convolutional neural network. 
 
     
     
       18. The non-transitory computer-readable medium of  claim 14 , wherein the operations further comprise:
 determining the respective bounding boxes of text names of the people in the video using a fully convolutional network. 
 
     
     
       19. The non-transitory computer-readable medium of  claim 14 , wherein the operations further comprise:
 determining the respective faces using a first neural network; and 
 determining the respective bounding boxes of text names of the people in the video using a second neural network, wherein the first neural network differs from the second neural network. 
 
     
     
       20. The non-transitory computer-readable medium of  claim 14 , wherein the operations further comprise:
 determining that a first user interface location of the bounding box has changed in the video to a new first user interface location; 
 determining that a second user interface location of the face has changed in the video to a new second user interface location; and 
 associating the bounding box at the new first user interface location with the face at the second user interface location.

Cited by (0)

No later patents cite this yet.

References (0)

No backward citations on record.