Server grouping system
Abstract
In one embodiment, a method includes receiving client-server connection data for clients and servers, the data including IP addresses corresponding to the servers, for each one of a plurality of IP address pairs performing a statistical test to determine whether the IP addresses in the one IP address pair are related by common clients based on the number of the clients connecting to each of the IP addresses in the one IP address pair, generating a graph including a plurality of vertices and edges, each of the vertices corresponding to a different IP address, each edge corresponding to a different IP address pair determined to be related by common clients in the statistical test, and clustering the vertices yielding clusters, a subset of the IP addresses in one of the clusters providing an indication of the IP addresses of the servers serving a same application.
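The steps summarized in the abstract can be illustrated with a minimal sketch. Several choices here are assumptions made for illustration and are not stated in the source: the statistical test is taken to be a one-sided binomial test on the number of common clients, the clustering step is approximated by connected components, and the names `binomial_tail`, `related_pairs`, `cluster`, and `alpha` are hypothetical.

```python
from itertools import combinations
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def related_pairs(connections, alpha=0.05):
    """Return server IP pairs that share more clients than chance suggests.

    connections maps a client id to the set of server IPs it contacted.
    The binomial test is an assumed choice of statistical test.
    """
    n = len(connections)
    clients_of = {}
    for client, ips in connections.items():
        for ip in ips:
            clients_of.setdefault(ip, set()).add(client)

    edges = set()
    for a, b in combinations(sorted(clients_of), 2):
        common = len(clients_of[a] & clients_of[b])
        # Probability that a random client contacts both IPs if the two
        # connections were independent of each other.
        p_both = (len(clients_of[a]) / n) * (len(clients_of[b]) / n)
        p_value = binomial_tail(common, n, p_both)
        if p_value < alpha:
            edges.add((a, b))
    return edges


def cluster(edges):
    """Group vertices into connected components (a stand-in for clustering)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for ip in list(parent):
        groups.setdefault(find(ip), set()).add(ip)
    return list(groups.values())
```

In this sketch a server IP that is not related to any other IP gets no edge and therefore appears in no cluster. A production system would likely use a dedicated community-detection algorithm rather than connected components.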
Claims
What is claimed is:
1. A method comprising:
receiving, by a communication interface, client-server connection data for a plurality of clients and a plurality of servers, the client-server connection data including a plurality of server Internet Protocol (IP) addresses corresponding to the plurality of servers;
for each one IP address pair of a plurality of IP address pairs selected from the plurality of server IP addresses, performing, by a hardware processor, a statistical test to determine whether the server IP addresses in the one IP address pair are related by common clients based on the number of the clients connecting to each of the server IP addresses in the one IP address pair;
generating, by the hardware processor, a graph including a plurality of vertices and a plurality of edges between the vertices, each of the vertices corresponding to a different one of the server IP addresses, each one edge of the edges corresponding to a different one of the IP address pairs determined to be related by the common clients in the statistical test, the one edge connecting the server IP addresses in the one IP address pair determined to be related by the common clients in the statistical test; and
clustering, by the hardware processor, the vertices in the graph yielding a plurality of clusters, each of the clusters including the vertices representing a subset of the server IP addresses, the subset of the server IP addresses in one of the clusters providing an indication of the server IP addresses of the servers serving a same application.
2. The method according to claim 1, further comprising:
analyzing the client-server connection data of the servers having the server IP addresses included in a first cluster of the clusters; and
based on the analyzing, determining an application type served by the servers having the server IP addresses included in the first cluster.
3. The method according to claim 2, further comprising outputting the application type to an output device.
4. The method according to claim 1, further comprising:
identifying a first server IP address of the server IP addresses as a security threat;
identifying a first cluster of the clusters including the first server IP address;
identifying the server IP addresses included in the first cluster; and
associating the server IP addresses included in the first cluster with the security threat.
5. The method according to claim 4, further comprising at least one selected from a group including: blocking, the server IP addresses associated with the security threat; and adding to a blacklist, the server IP addresses associated with the security threat.
6. The method according to claim 1, further comprising:
for each one server IP address of the server IP addresses, calculating a first probability of any of the clients connecting to the one server IP address;
for each one IP address pair of the IP address pairs, calculating a second probability of any of the clients randomly connecting to both of the server IP addresses in the one IP address pair based on the first probability of any of the clients connecting individually to each of the server IP addresses in the one IP address pair; and
for each one IP address pair of the IP address pairs, performing the statistical test based on the second probability of the one IP address pair, yielding a p-value, the server IP addresses in the one IP address pair being related by common clients if the p-value is less than a certain value.
7. The method according to claim 6, wherein the certain value is within a range between 0.05 and 0.001.
8. The method according to claim 1, further comprising:
calculating a quality score for each one cluster of the clusters based on the following formula: IE/(IE+AE), where IE is a number of the edges between the vertices in the one cluster and AE is a number of edges from vertices within the one cluster to vertices outside of the one cluster; and
selecting ones of the clusters having the quality score above a certain quality score.
9. The method according to claim 1, further comprising generating the graph as an aggregate graph aggregated from a plurality of component graphs, the component graphs being generated based on client-server connection data collected for a plurality of time periods wherein each of the component graphs is generated based on the client-server connection data for a different one of the time periods, the generation of each one component graph of the component graphs including performing the statistical test on IP address pairs included in the client-server connection data for the one time period of the one component graph generating component vertices and component edges, the aggregate graph being generated to include a plurality of aggregate vertices and a plurality of aggregate edges, the aggregate edges corresponding to the component edges that appear, above a certain limit, in the component graphs, the aggregate vertices corresponding to the component vertices that appear in the component graphs and are connected by the aggregate edges in the aggregate graph.
10. A system comprising:
a communication interface to receive client-server connection data for a plurality of clients and a plurality of servers, the client-server connection data including a plurality of server Internet Protocol (IP) addresses corresponding to the plurality of servers;
a hardware processor to:
perform, for each one IP address pair of a plurality of IP address pairs selected from the plurality of server IP addresses, a statistical test to determine whether the server IP addresses in the one IP address pair are related by common clients based on the number of the clients connecting to each of the server IP addresses in the one IP address pair;
generate a graph including a plurality of vertices and a plurality of edges between the vertices, each of the vertices corresponding to a different one of the server IP addresses, each one edge of the edges corresponding to a different one of the IP address pairs determined to be related by the common clients in the statistical test, the one edge connecting the server IP addresses in the one IP address pair determined to be related by the common clients in the statistical test; and
cluster the vertices in the graph yielding a plurality of clusters, each of the clusters including the vertices representing a subset of the server IP addresses, the subset of the server IP addresses in one of the clusters providing an indication of the server IP addresses of the servers serving a same application.
11. The system according to claim 10, wherein the hardware processor is operative to:
analyze the client-server connection data of the servers having the server IP addresses included in a first cluster; and
based on the analyzing, determine an application type served by the servers having the server IP addresses included in the first cluster.
12. The system according to claim 11, further comprising an output interface to output the application type to an output device.
13. The system according to claim 10, wherein the hardware processor is operative to:
identify a first server IP address of the server IP addresses as a security threat;
identify a first cluster of the clusters including the first server IP address;
identify the server IP addresses included in the first cluster; and
associate the server IP addresses included in the first cluster with the security threat.
14. The system according to claim 13, wherein the hardware processor is operative to perform at least one operation selected from a group including: block, the server IP addresses associated with the security threat; and add to a blacklist, the server IP addresses associated with the security threat.
15. The system according to claim 10, wherein the hardware processor is operative to:
for each one server IP address of the server IP addresses, calculate a first probability of any of the clients connecting to the one server IP address;
for each one IP address pair of the IP address pairs, calculate a second probability of any of the clients randomly connecting to both of the server IP addresses in the one IP address pair based on the first probability of any of the clients connecting individually to each of the server IP addresses in the one IP address pair; and
for each one IP address pair of the IP address pairs, perform the statistical test based on the second probability of the one IP address pair, yielding a p-value, the server IP addresses in the one IP address pair being related by common clients if the p-value is less than a certain value.
16. The system according to claim 15, wherein the certain value is within a range between 0.05 and 0.001.
17. The system according to claim 10, wherein the hardware processor is operative to:
calculate a quality score for each one cluster of the clusters based on the following formula: IE/(IE+AE), where IE is a number of the edges between the vertices in the one cluster and AE is a number of edges from vertices within the one cluster to vertices outside of the one cluster; and
select ones of the clusters having the quality score above a certain quality score.
18. The system according to claim 10, wherein the hardware processor is operative to generate the graph as an aggregate graph aggregated from a plurality of component graphs, the component graphs being generated based on client-server connection data collected for a plurality of time periods wherein each of the component graphs is generated based on the client-server connection data for a different one of the time periods, the generation of each one component graph of the component graphs including performing the statistical test on IP address pairs included in the client-server connection data for the one time period of the one component graph generating component vertices and component edges, the aggregate graph being generated to include a plurality of aggregate vertices and a plurality of aggregate edges, the aggregate edges corresponding to the component edges that appear, above a certain limit, in the component graphs, the aggregate vertices corresponding to the component vertices that appear in the component graphs and are connected by the aggregate edges in the aggregate graph.
19. A computer software product, comprising a non-transitory, computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to:
receive client-server connection data for a plurality of clients and a plurality of servers, the client-server connection data including a plurality of server Internet Protocol (IP) addresses corresponding to the plurality of servers;
perform, for each one IP address pair of a plurality of IP address pairs selected from the plurality of server IP addresses, a statistical test to determine whether the server IP addresses in the one IP address pair are related by common clients based on the number of the clients connecting to each of the server IP addresses in the one IP address pair;
generate a graph including a plurality of vertices and a plurality of edges between the vertices, each of the vertices corresponding to a different one of the server IP addresses, each one edge of the edges corresponding to a different one of the IP address pairs determined to be related by the common clients in the statistical test, the one edge connecting the server IP addresses in the one IP address pair determined to be related by the common clients in the statistical test; and
cluster the vertices in the graph yielding a plurality of clusters, each of the clusters including the vertices representing a subset of the server IP addresses, the subset of the server IP addresses in one of the clusters providing an indication of the server IP addresses of the servers serving a same application.
20. The computer software product according to claim 19, wherein the program instructions also cause the computer to:
for each one server IP address of the server IP addresses, calculate a first probability of any of the clients connecting to the one server IP address;
for each one IP address pair of the IP address pairs, calculate a second probability of any of the clients randomly connecting to both of the server IP addresses in the one IP address pair based on the first probability of any of the clients connecting individually to each of the server IP addresses in the one IP address pair; and
for each one IP address pair of the IP address pairs, perform the statistical test based on the second probability of the one IP address pair, yielding a p-value, the server IP addresses in the one IP address pair being related by common clients if the p-value is less than a certain value.
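As an illustration of two refinements recited in the dependent claims, the cluster quality score IE/(IE+AE) and the aggregate graph built from per-period component graphs, the following is a minimal sketch. The function names `quality_score` and `aggregate_graph`, the parameter `limit`, and the representation of a graph as a set of IP pairs are assumptions made for this example. Treating "above a certain limit" as a strict count of component graphs is also an assumed reading.

```python
from collections import Counter


def quality_score(cluster, edges):
    """Return IE / (IE + AE) for one cluster.

    IE counts edges with both endpoints inside the cluster; AE counts edges
    with exactly one endpoint inside it. Returns 0.0 when the cluster touches
    no edges (an assumed convention, to avoid division by zero).
    """
    ie = sum(1 for a, b in edges if a in cluster and b in cluster)
    ae = sum(1 for a, b in edges if (a in cluster) != (b in cluster))
    return ie / (ie + ae) if ie + ae else 0.0


def aggregate_graph(component_edge_sets, limit):
    """Keep edges that appear in more than `limit` component graphs.

    component_edge_sets holds one edge set per time period. Returns the
    aggregate vertices and aggregate edges.
    """
    counts = Counter()
    for edges in component_edge_sets:
        # Normalize edge direction and count each edge once per period.
        counts.update({tuple(sorted(edge)) for edge in edges})
    agg_edges = {edge for edge, seen in counts.items() if seen > limit}
    # Only vertices joined by a surviving edge remain in the aggregate graph.
    agg_vertices = {ip for edge in agg_edges for ip in edge}
    return agg_vertices, agg_edges
```

For example, a cluster with two internal edges and one edge leaving it scores 2/3, so it would be kept under a quality threshold of 0.5 and dropped under a threshold of 0.7.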