Edge conditioned dynamic neighborhood aggregation based molecular property prediction
Abstract
This disclosure relates generally to a system and method for molecular property prediction. Conventional methods for molecular property prediction are inherently limited in how effectively they capture the characteristics of a molecular graph. Moreover, known methods are computationally intensive, so they perform poorly in real-time scenarios. The disclosed method overcomes the limitations of the typical dynamic neighborhood aggregation (DNA) method by fusing static edge attributes into the computation of the self-attention coefficients. In an embodiment, the disclosed method transforms the hidden state of a sink node using a neural-network function that takes two inputs: an aggregated single-message vector obtained by the self-attention mechanism, and the hidden state of the node as transformed by the self-attention mechanism.
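The update described in the abstract can be illustrated with a minimal, non-authoritative sketch in plain Python. It is not the disclosed implementation. All names (`update_sink`, `W_edge_k`, `W_m`, and so on) are hypothetical, the weights are random rather than trained, the single `tanh` layer merely stands in for the unspecified neural-network function, one softmax is assumed to pool every (source node, previous iteration) pair, and the latest sink state is assumed to serve as the query for both attention steps.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Softmax over query-key dot products, then a weighted sum of the values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return pooled, weights

def update_sink(sink_history, neighbors, p):
    """sink_history: sink hidden states from previous iterations, latest last.
    neighbors: list of (source_history, edge_attr) pairs, one per source node."""
    query = matvec(p["W_q"], sink_history[-1])

    # Step 1: edge-conditioned attention over the messages from source nodes.
    keys, values = [], []
    for source_history, edge_attr in neighbors:
        edge_k = matvec(p["W_edge_k"], edge_attr)  # linearly transformed edge info
        edge_v = matvec(p["W_edge_v"], edge_attr)
        for h_src in source_history:
            keys.append(matvec(p["W_k"], vec_add(h_src, edge_k)))
            values.append(matvec(p["W_v"], vec_add(h_src, edge_v)))
    message, edge_weights = attend(query, keys, values)

    # Step 2: self-attention over the sink node's own earlier hidden states.
    self_keys = [matvec(p["W_k"], h) for h in sink_history]
    self_values = [matvec(p["W_v"], h) for h in sink_history]
    self_state, _ = attend(query, self_keys, self_values)

    # Step 3: placeholder neural-network function combining both inputs.
    pre = vec_add(matvec(p["W_m"], message), matvec(p["W_s"], self_state))
    return [math.tanh(x) for x in pre], edge_weights

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def random_history(steps, dim, rng):
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(steps)]

rng = random.Random(0)
DIM, EDGE_DIM = 4, 3  # toy sizes chosen for illustration only
params = {n: random_matrix(DIM, DIM, rng) for n in ("W_q", "W_k", "W_v", "W_m", "W_s")}
params["W_edge_k"] = random_matrix(DIM, EDGE_DIM, rng)
params["W_edge_v"] = random_matrix(DIM, EDGE_DIM, rng)

sink_history = random_history(2, DIM, rng)
neighbors = [
    (random_history(2, DIM, rng), [1.0, 0.0, 0.0]),  # e.g. one-hot bond type
    (random_history(2, DIM, rng), [0.0, 1.0, 0.0]),
]
new_state, edge_weights = update_sink(sink_history, neighbors, params)
```

Because the edge attributes are added to the source hidden states before the key and value projections, two neighbors with identical hidden states but different bond types receive different attention weights, which is the effect the abstract attributes to fusing static edge attributes.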
Claims
exact text as granted — not AI-modified
What is claimed is:
1. A processor-implemented method, comprising:
accessing, via one or more hardware processors, a database comprising a plurality of molecular graphs associated with a plurality of molecules and a plurality of labels indicative of chemical properties of the plurality of the molecular graphs, wherein each molecular graph of the plurality of molecular graphs comprises a plurality of sink nodes, each sink node of the plurality of sink nodes connected to a plurality of source nodes for passing neural messages through a plurality of connecting edges;
updating, via the one or more hardware processors, hidden states of the plurality of nodes of each molecular graph from amounts of the plurality of molecular graphs by aggregating encoded neural messages from the plurality of sink nodes associated with each of the molecular graphs to transform a hidden representation of each sink node from amongst the plurality of sink nodes in a plurality of iterations, wherein transforming the hidden state of a sink node from amongst the plurality of sink nodes in a current iteration from amongst the plurality of iterations comprises:
determining a first key matrix representative of a plurality of edge-incorporated neural messages sent by the plurality of source nodes to the sink node in a set of previous iterations that occurred prior to the current iteration;
determining a first value matrix representative of the plurality of edge-incorporated neural messages sent by the plurality of source nodes to the sink node in the set of previous iterations;
determining a first query matrix representative of a linearly transformed hidden state of the sink node;
determining a first set of self-attention coefficients to give weightage to the plurality of edge-incorporated neural messages sent from the plurality of source nodes, the first set of self-attention coefficients determined as a softmax transform product of the first query matrix and the first key matrix;
calculating a single message vector to be perceived by the sink node based on a matrix multiplication of the first value matrix and the first set of self-attention coefficients, wherein the single message vector determines the hidden state of the sink node in a next iteration occurring subsequent to the current iteration;
determining a second key matrix representative of the hidden state of the sink node in the set of previous iterations;
determining a second value matrix representative of the hidden state of the sink node in the set of previous iterations;
determining a second query matrix as a product of the hidden state of the sink node determined at each of the plurality of previous iterations and a query projection matrix at the current iteration step;
determining a second set of self-attention coefficients to give weightage to the hidden stage of the sink node determined at each of the plurality of previous iterations, the second set of self-attention coefficients determined as a softmax transform product of the second query matrix and the second key matrix;
calculating a self-attention based transformed hidden state of the sink node based on a product of the second set of self-attention coefficients with the second value matrix;
determining the hidden state of the sink node at the current iteration using the single message vector and the self-attention based transformed hidden state of the sink node; and
transforming the hidden state vector of the sink node to obtain a graph level embedding of the molecular graph; and
determining, via the one or more hardware processors, one or more molecular properties using a linear layer from the graph level embedding of the molecular graph.
2. The method of claim 1 , wherein the first key matrix is determined by computing a transpose of a product of a key projection matrix and a sum of a concatenated matrix of the source node hidden states from the set of previous iterations and the linearly transformed edge-information, wherein the linearly transformed edge-information is obtained by parameterizing edge-information with a first trainable weight matrix.
3. The method of claim 2 , wherein the second key matrix is determined by a transpose of a product of the key projection matrix at the current iteration and a concatenated matrix of the sink node hidden states from the set of previous iterations.
4. The method of claim 1 , wherein the first value matrix is determined by computing a transpose of a product of a value projection matrix and a sum of a concatenated matrix of the source node hidden states from the set of previous iterations and the linearly transformed edge-information, wherein the linearly transformed edge-information is obtained by parameterizing edge-information with a second trainable weight matrix.
5. The method of claim 4 , wherein the second value matrix is determined by the product of (1) the value projection matrix at the current iteration, and (2) a sum of a concatenated matrix of the sink node hidden states from the set of previous iterations.
6. The method of claim 1 , wherein the first query matrix is determined as a product of a query projection matrix at the current iteration and the hidden state of the sink node at a previous iteration from amongst the set of previous iterations.
7. A system, comprising:
a memory storing instructions;
one or more communication interfaces; and
one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to:
access a database comprising a plurality of molecular graphs associated with a plurality of molecules and a plurality of labels indicative of chemical properties of the plurality of the molecular graphs, wherein each molecular graph of the plurality of molecular graphs comprises a plurality of sink nodes, each sink node of the plurality of sink nodes connected to a plurality of source nodes for passing neural messages through a plurality of connecting edges;
update hidden states of the plurality of nodes of each molecular graph by aggregating encoded neural messages from the plurality of sink nodes associated with each molecular graph to transform a hidden representation of each sink node among the plurality of sink nodes in a plurality of iterations, wherein to transform the hidden state of a sink node from amongst the plurality of sink nodes in a current iteration from amongst the plurality of iterations, the one or more hardware processors are configured by the instructions to:
determine a first key matrix representative of a plurality of edge-incorporated neural messages sent by the plurality of source nodes to the sink node in a set of previous iterations that occurred prior to the current iteration;
determine a first value matrix representative of the plurality of edge-incorporated neural messages sent by the plurality of source nodes to the sink node in the set of previous iterations;
determine a first query matrix representative of a linearly transformed hidden state of the sink node;
determine a first set of self-attention coefficients to give weightage to the plurality of edge-incorporated neural messages sent from the plurality of source nodes, the first set of self-attention coefficients determined as a softmax transform product of the first query matrix and the first key matrix;
calculate a single message vector to be perceived by the sink node based on a matrix multiplication of the first value matrix and the first set of self-attention coefficients, wherein the single message vector determines the hidden state of the sink node in a next iteration occurring subsequent to the current iteration;
determine a second key matrix representative of the hidden state of the sink node in the set of previous iterations;
determining a second value matrix representative of the hidden state of the sink node in the set of previous iterations;
determine a second query matrix as a product of the hidden state of the sink node determined at each of the plurality of previous iterations and a query projection matrix at the current iteration step;
determine a second set of self-attention coefficients to give weightage to the hidden state of the sink node determined at each of the plurality of previous iterations, the second set of self-attention coefficients determined as a softmax transform product of the second query matrix and the second key matrix;
calculate a self-attention based transformed hidden state of the sink node based on a product of the second set of self-attention coefficients with the second value matrix;
determine the hidden state of the sink node at the current iteration using the single message vector and the self-attention based transformed hidden state of the sink node; and
transform the hidden state vector of the sink node to obtain a graph level embedding of the molecular graph; and
determine one or more molecular properties using a linear layer from the graph level embedding of the molecular graph.
8. The system of claim 7 , wherein the first key matrix is determined by computing a transpose of a product of a key projection matrix and a sum of a concatenated matrix of the source node hidden states from the set of previous iterations and the linearly transformed edge-information, wherein the linearly transformed edge-information is obtained by parameterizing edge-information with a first trainable weight matrix.
9. The system of claim 8 , wherein the second key matrix is determined by a transpose of a product of the key projection matrix at the current iteration and a concatenated matrix of the sink node hidden states from the set of previous iterations.
10. The system of claim 7 , wherein the first value matrix is determined by computing a transpose of a product of a value projection matrix and a sum of a concatenated matrix of the source node hidden states from the set of previous iterations and the linearly transformed edge-information, wherein the linearly transformed edge-information is obtained by parameterizing edge-information with a second trainable weight matrix.
11. The system of claim 10 , wherein the second value matrix is determined by the product of (1) the value projection matrix at the current iteration, and (2) a sum of a concatenated matrix of the sink node hidden states from the set of previous iterations.
12. The system of claim 7 , wherein the first query matrix is determined as a product of a query projection matrix at the current iteration and the hidden state of the sink node at a previous iteration from amongst the set of previous iterations.
13. One or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause:
accessing, via one or more hardware processors, a database comprising a plurality of molecular graphs associated with a plurality of molecules and a plurality of labels indicative of chemical properties of the plurality of the molecular graphs, wherein each molecular graph of the plurality of molecular graphs comprises a plurality of sink nodes, each sink node of the plurality of sink nodes connected to a plurality of source nodes for passing neural messages through a plurality of connecting edges;
updating, via the one or more hardware processors, hidden states of the plurality of nodes of each molecular graph from amounts of the plurality of molecular graphs by aggregating encoded neural messages from the plurality of sink nodes associated with each of the molecular graphs to transform a hidden representation of each sink node from amongst the plurality of sink nodes in a plurality of iterations, wherein transforming the hidden state of a sink node from amongst the plurality of sink nodes in a current iteration from amongst the plurality of iterations comprises:
determining a first key matrix representative of a plurality of edge-incorporated neural messages sent by the plurality of source nodes to the sink node in a set of previous iterations that occurred prior to the current iteration;
determining a first value matrix representative of the plurality of edge-incorporated neural messages sent by the plurality of source nodes to the sink node in the set of previous iterations;
determining a first query matrix representative of a linearly transformed hidden state of the sink node;
determining a first set of self-attention coefficients to give weightage to the plurality of edge-incorporated neural messages sent from the plurality of source nodes, the first set of self-attention coefficients determined as a softmax transform product of the first query matrix and the first key matrix;
calculating a single message vector to be perceived by the sink node based on a matrix multiplication of the first value matrix and the first set of self-attention coefficients, wherein the single message vector determines the hidden state of the sink node in a next iteration occurring subsequent to the current iteration;
determining a second key matrix representative of the hidden state of the sink node in the set of previous iterations;
determining a second value matrix representative of the hidden state of the sink node in the set of previous iterations;
determining a second query matrix as a product of the hidden state of the sink node determined at each of the plurality of previous iterations and a query projection matrix at the current iteration step;
determining a second set of self-attention coefficients to give weightage to the hidden state of the sink node determined at each of the plurality of previous iterations, the second set of self-attention coefficients determined as a softmax transform product of the second query matrix and the second key matrix;
calculating a self-attention based transformed hidden state of the sink node based on a product of the second set of self-attention coefficients with the second value matrix;
determining the hidden state of the sink node at the current iteration using the single message vector and the self-attention based transformed hidden state of the sink node; and
transforming the hidden state vector of the sink node to obtain a graph level embedding of the molecular graph; and
determining, via the one or more hardware processors, one or more molecular properties using a linear layer from the graph level embedding of the molecular graph.
14. The one or more non-transitory machine readable information storage mediums of claim 13 , wherein the first key matrix is determined by computing a transpose of a product of a key projection matrix and a sum of concatenated matrix of the source node hidden states from the set of previous iterations and the linearly transformed edge-information, wherein the linearly transformed edge-information is obtained by parameterizing edge-information with a first trainable weight matrix.
15. The one or more non-transitory machine readable information storage mediums of claim 14 , wherein the second key matrix is determined by a transpose of a product of the key projection matrix at the current iteration and a concatenated matrix of the sink node hidden states from the set of previous iterations.
16. The one or more non-transitory machine readable information storage mediums of claim 13 , wherein the first value matrix is determined by computing a transpose of a product of a value projection matrix and a sum of a concatenated matrix of the source node hidden states from the set of previous iterations and the linearly transformed edge-information, wherein the linearly transformed edge-information is obtained by parameterizing edge-information with a second trainable weight matrix.
17. The one or more non-transitory machine readable information storage mediums of claim 16 , wherein the second value matrix is determined by the product of (1) the value projection matrix at the current iteration, and (2) a sum of a concatenated matrix of the sink node hidden states from the set of previous iterations.
18. The one or more non-transitory machine readable information storage mediums of claim 13 , wherein the first query matrix is determined as a product of a query projection matrix at the current iteration and the hidden state of the sink node at a previous iteration from amongst the set of previous iterations.