US9626358B2 · Active · Utility · PatentIndex 73

Creating ontologies by analyzing natural language texts

Assignee: ABBYY INFOPOISK LLC · Priority: Nov 26, 2014 · Filed: Jan 2, 2015 · Granted: Apr 18, 2017
Est. expiry: Nov 26, 2034 (~8.4 yrs left) · nominal 20-yr term from priority
Inventors: DANIELYAN TATIANA
Classifications: G06V 30/418, G06F 40/30, G06V 30/10, G06F 17/2785, G06K 9/00463, G06K 2209/01, G06K 9/00483, G06V 30/414
PatentIndex Score: 73 · Cited by: 6 · References: 230 · Claims: 20

Abstract

Systems and methods for creating ontologies by analyzing natural language texts. An example method comprises: receiving a plurality of semantic structures associated with a text corpus; identifying a first semantic structure and a second semantic structure, wherein the first semantic structure comprises a first substructure and a second substructure, wherein the second semantic structure comprises a third substructure and a fourth substructure, and wherein the first substructure is similar to the third substructure in view of a first similarity criterion; and responsive to determining that the second substructure is similar to the fourth substructure in view of a second similarity criterion, associating, with a certain concept of an ontology associated with the text corpus, objects represented by the second substructure and the fourth substructure.
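The method summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `SemanticStructure` record, the two similarity functions, and the sample data are all hypothetical. The sketch assumes the first similarity criterion is exact equality of the surrounding context and the second is equality of the filler's semantic class; the patent leaves both criteria open.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SemanticStructure:
    context: tuple     # first/third substructure: (left context, right context)
    filler_class: str  # semantic class of the second/fourth substructure
    obj: str           # object represented by the second/fourth substructure


def contexts_similar(a, b):
    # Hypothetical first similarity criterion: identical contexts.
    return a.context == b.context


def fillers_similar(a, b):
    # Hypothetical second similarity criterion: same semantic class.
    return a.filler_class == b.filler_class


def build_ontology(structures):
    """Map each concept name to the set of objects associated with it."""
    ontology = {}
    for a, b in combinations(structures, 2):
        # Only pairs that satisfy both criteria contribute instances.
        if contexts_similar(a, b) and fillers_similar(a, b):
            ontology.setdefault(a.filler_class, set()).update({a.obj, b.obj})
    return ontology


structures = [
    SemanticStructure(("the city of", "is located"), "CITY", "Paris"),
    SemanticStructure(("the city of", "is located"), "CITY", "Berlin"),
    SemanticStructure(("the city of", "is located"), "PERSON", "Smith"),
    SemanticStructure(("the river", "flows"), "CITY", "Rome"),
]
ontology = build_ontology(structures)
print(ontology)
```

In this toy corpus only "Paris" and "Berlin" share both a context and a filler class, so they are the only objects associated with the `CITY` concept. "Smith" fails the second criterion and "Rome" fails the first.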

Claims

exact text as granted — not AI-modified
What is claimed is: 
     
       1. A method, comprising:
 receiving a plurality of semantic structures associated with a text corpus; 
 identifying, by a processing device, a first semantic structure and a second semantic structure, wherein the first semantic structure comprises a first substructure and a second substructure, wherein the second semantic structure comprises a third substructure and a fourth substructure, and wherein the first substructure is similar to the third substructure in view of a first similarity criterion; and 
 responsive to determining that the second substructure is similar to the fourth substructure in view of a second similarity criterion, associating, with a certain concept of an ontology associated with the text corpus, objects represented by the second substructure and the fourth substructure. 
 
     
     
       2. The method of  claim 1 , wherein the ontology comprises one or more concepts, each concept associated with one or more instances of the concept represented by one or more objects. 
     
     
       3. The method of  claim 1 , wherein the first substructure comprises a left context and a right context surrounding the second substructure. 
     
     
       4. The method of  claim 1 , wherein the second substructure comprises a left context and a right context surrounding the fourth substructure. 
     
     
       5. The method of  claim 1 , wherein determining that the second substructure is similar to the fourth substructure comprises: identifying a third semantic structure and a fourth semantic structure, wherein the third semantic structure comprises the second substructure and a fifth substructure, wherein the fourth semantic structure comprises the fourth substructure and a sixth substructure, and wherein the firth substructure is similar to the sixth substructure in view of the first similarity criterion. 
     
     
       6. The method of  claim 1 , wherein at least one of the first semantic structure and the second semantic structure is represented by a graph comprising a plurality of nodes corresponding to a plurality of semantic classes and a plurality of edges corresponding to a plurality of semantic relationships. 
     
     
       7. The method of  claim 1 , wherein identifying the first semantic structure and the second semantic structure comprises comparing a first plurality of semantic classes associated with a first plurality of nodes of a first graph representing the first semantic structure to a second plurality of semantic classes associated with a second plurality of nodes of a second graph representing the second semantic structure. 
     
     
       8. The method of  claim 1 , wherein identifying the first semantic structure and the second semantic structure comprises comparing a first plurality of semantemes associated with a first plurality of nodes of a first graph representing the first semantic structure to a second plurality of semantemes associated with a second plurality of nodes of a second graph representing the second semantic structure. 
     
     
       9. The method of  claim 1 , wherein identifying the first semantic structure and the second semantic structure comprises comparing a first plurality of deep slots associated with a first plurality of nodes of a first graph representing the first semantic structure to a second plurality of deep slots associated with a second plurality of nodes of a second graph representing the second semantic structure. 
     
     
       10. The method of  claim 1 , further comprising producing the plurality of semantic structures by performing a syntactico-semantic analysis of the text corpus. 
     
     
       11. A system, comprising:
 a memory; 
 a processor, coupled to the memory, the processor configured to: 
 receiving a plurality of semantic structures associated with a text corpus;
 identify a first semantic structure and a second semantic structure, wherein the first semantic structure comprises a first substructure and a second substructure, wherein the second semantic structure comprises a third substructure and a fourth substructure, and wherein the first substructure is similar to the third substructure in view of a first similarity criterion; and 
 responsive to determining that the second substructure is similar to the fourth substructure in view of a second similarity criterion, associate, with a certain concept of an ontology associated with the text corpus, objects represented by the second substructure and the fourth substructure. 
 
 
     
     
       12. The system of  claim 11 , wherein determining that the second substructure is similar to the fourth substructure comprises: identifying a third semantic structure and a fourth semantic structure, wherein the third semantic structure comprises the second substructure and a fifth substructure, wherein the fourth semantic structure comprises the fourth substructure and a sixth substructure, and wherein the firth substructure is similar to the sixth substructure in view of the first similarity criterion. 
     
     
       13. The system of  claim 11 , wherein identifying the first semantic structure and the second semantic structure comprises comparing a first plurality of semantic classes associated with a first plurality of nodes of a first graph representing the first semantic structure to a second plurality of semantic classes associated with a second plurality of nodes of a second graph representing the second semantic structure. 
     
     
       14. The system of  claim 11 , wherein identifying the first semantic structure and the second semantic structure comprises comparing a first plurality of semantemes associated with a first plurality of nodes of a first graph representing the first semantic structure to a second plurality of semantemes associated with a second plurality of nodes of a second graph representing the second semantic structure. 
     
     
       15. The system of  claim 11 , wherein identifying the first semantic structure and the second semantic structure comprises comparing a first plurality of deep slots associated with a first plurality of nodes of a first graph representing the first semantic structure to a second plurality of deep slots associated with a second plurality of nodes of a second graph representing the second semantic structure. 
     
     
       16. A computer-readable non-transitory storage medium comprising executable instructions that, when executed by a computing device, cause the computing device to perform operations comprising:
 receiving a plurality of semantic structures associated with a text corpus; 
 identifying, by a processing device, a first semantic structure and a second semantic structure, wherein the first semantic structure comprises a first substructure and a second substructure, wherein the second semantic structure comprises a third substructure and a fourth substructure, and wherein the first substructure is similar to the third substructure in view of a first similarity criterion; and 
 responsive to determining that the second substructure is similar to the fourth substructure in view of a second similarity criterion, associating, with a certain concept of an ontology associated with the text corpus, objects represented by the second substructure and the fourth substructure. 
 
     
     
       17. The computer-readable non-transitory storage medium of  claim 16 , wherein determining that the second substructure is similar to the fourth substructure comprises: identifying a third semantic structure and a fourth semantic structure, wherein the third semantic structure comprises the second substructure and a fifth substructure, wherein the fourth semantic structure comprises the fourth substructure and a sixth substructure, and wherein the firth substructure is similar to the sixth substructure in view of the first similarity criterion. 
     
     
       18. The computer-readable non-transitory storage medium of  claim 16 , wherein identifying the first semantic structure and the second semantic structure comprises comparing a first plurality of semantic classes associated with a first plurality of nodes of a first graph representing the first semantic structure to a second plurality of semantic classes associated with a second plurality of nodes of a second graph representing the second semantic structure. 
     
     
       19. The computer-readable non-transitory storage medium of  claim 16 , wherein identifying the first semantic structure and the second semantic structure comprises comparing a first plurality of semantemes associated with a first plurality of nodes of a first graph representing the first semantic structure to a second plurality of semantemes associated with a second plurality of nodes of a second graph representing the second semantic structure. 
     
     
       20. The computer-readable non-transitory storage medium of  claim 16 , wherein identifying the first semantic structure and the second semantic structure comprises comparing a first plurality of deep slots associated with a first plurality of nodes of a first graph representing the first semantic structure to a second plurality of deep slots associated with a second plurality of nodes of a second graph representing the second semantic structure.
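
The graph comparisons recited in claims 6 to 9 can be sketched in the same hedged spirit. The claims say only that semantic classes, semantemes, and deep slots attached to graph nodes are compared; the Jaccard overlap, the equal weighting, the `Node` and `SemanticGraph` types, and the sample labels below are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    semantic_class: str
    deep_slot: str
    semantemes: frozenset = frozenset()


@dataclass
class SemanticGraph:
    nodes: list
    # Each edge is (parent index, child index, semantic relationship).
    edges: list = field(default_factory=list)


def jaccard(a, b):
    # Overlap of two label collections; two empty collections count as identical.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def graph_similarity(g1, g2):
    """Average the overlap of classes, deep slots, and semantemes (assumed metric)."""
    classes = jaccard((n.semantic_class for n in g1.nodes),
                      (n.semantic_class for n in g2.nodes))
    slots = jaccard((n.deep_slot for n in g1.nodes),
                    (n.deep_slot for n in g2.nodes))
    semantemes = jaccard(set().union(*(n.semantemes for n in g1.nodes)),
                         set().union(*(n.semantemes for n in g2.nodes)))
    return (classes + slots + semantemes) / 3


g1 = SemanticGraph([
    Node("TO_BUY", "Predicate", frozenset({"Past"})),
    Node("COMPANY", "Agent"),
    Node("COMPANY", "Object"),
])
g2 = SemanticGraph([
    Node("TO_ACQUIRE", "Predicate", frozenset({"Past"})),
    Node("COMPANY", "Agent"),
    Node("COMPANY", "Object"),
])
score = graph_similarity(g1, g2)
print(round(score, 3))
```

A threshold on `score` could serve as one possible similarity criterion. Edge labels are stored but not compared here, since claims 7 to 9 refer only to node attributes.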
