Bot-network detection based on simple mail transfer protocol (SMTP) characteristics of e-mail senders within IP address aggregates
Abstract
A method and system for determining whether an IP address is part of a bot-network are provided. The IP-address-aggregate associated with the IP address of an e-mail sender is determined. The IP-address-aggregate is associated with an IP-address-aggregate-category based on the current SMTP traffic characteristics of the IP-address-aggregate and the known SMTP traffic characteristics of the IP-address-aggregate-category. A bot-likelihood score of the IP-address-aggregate-category is then associated with the IP-address-aggregate. IP-address-aggregate-categories can be established based on historical SMTP traffic characteristics of the IP-address-aggregates. The IP-address-aggregates are grouped based on SMTP characteristics, and the IP-address-aggregate-categories are defined based on a selection of IP-address-aggregates with similar SMTP traffic characteristics that are diagnostic of spam bots versus non-botnet-controlled spammers. Bot-likelihood scores are determined for the resulting IP-address-aggregate-categories based on historically known bot IP addresses.
Claims
Exact text as granted, not AI-modified.
We claim:
1. A computer-implemented method for determining whether an internet protocol address is part of a bot-network, the method comprising:
determining an internet protocol-address-aggregate associated with the internet protocol-address;
associating the internet protocol-address-aggregate with an internet protocol-address-aggregate-category of a plurality of internet protocol-address-aggregate-categories based on simple mail transfer protocol traffic characteristics of the internet protocol-address-aggregate and a plurality of simple mail transfer protocol traffic characteristics of the internet protocol-address-aggregate-category, wherein one of the plurality of simple mail transfer protocol traffic characteristics is a simple mail transfer protocol entropy within a period of time, wherein the plurality of simple mail transfer protocol traffic characteristics of the internet protocol-address-aggregate-category further comprising:
a number of black-listed e-mail senders within the period of time;
a number of active unique e-mail senders adjusted for internet protocol-address-aggregate simple mail transfer protocol inactivity; and
a number of active unique e-mail senders adjusted for individual e-mail sender inactivity given a simple mail transfer protocol-active internet protocol-address-aggregate; and
determining whether the internet protocol-address is part of a bot-network based on a bot-likelihood score of the internet protocol-address-aggregate-category, wherein the bot-likelihood score indicates a probability that the internet protocol-address is part of the bot-network.
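Claim 1 names a "simple mail transfer protocol entropy within a period of time" without giving a formula. As a minimal sketch only, the Python below assumes one plausible reading: the Shannon entropy of how SMTP events in the window are distributed across the senders of an aggregate. The function name `smtp_entropy` and the input format (one sender IP string per observed SMTP event) are hypothetical and do not come from the claims.

```python
import math
from collections import Counter


def smtp_entropy(sender_ips):
    """Shannon entropy (in bits) of the per-sender share of SMTP events.

    ASSUMPTION: this definition is illustrative; the claims do not define
    how the SMTP entropy is computed.
    """
    counts = Counter(sender_ips)          # events observed per sender
    total = sum(counts.values())
    if total == 0:
        return 0.0                        # no SMTP activity in the window
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical window: two senders, two SMTP events each.
window = ["192.0.2.1", "192.0.2.1", "192.0.2.9", "192.0.2.9"]
print(smtp_entropy(window))  # 1.0
```

Under this assumed definition, an aggregate whose traffic comes from a single sender scores 0, and the value grows as activity spreads evenly over more senders.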
2. The method of claim 1 , further comprising:
assigning the bot-likelihood score of the internet protocol-address-aggregate-category to the internet protocol-address-aggregate.
3. The method of claim 2 , further comprising:
comparing the bot-likelihood score to a threshold.
4. The method of claim 1 , wherein the plurality of internet protocol-address-aggregate-categories are determined by:
determining simple mail transfer protocol traffic characteristics of a plurality of internet protocol-address-aggregates based on observed network data;
grouping the plurality of internet protocol-address-aggregates; and
defining the plurality of internet protocol-address-aggregate-categories based on a selection of groups.
5. The method of claim 4 , wherein the plurality of internet protocol-address-aggregates is grouped based on k-means clustering.
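The grouping step of claims 4 and 5 can be sketched with a small k-means routine. This is an illustration under stated assumptions, not the patented implementation: the two-element feature vectors (for example, an entropy value and a blacklisted-sender count), the function names, and the seeding strategy are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centers):
    """Index of the center closest to vector v."""
    return min(range(len(centers)), key=lambda i: sq_dist(v, centers[i]))


def kmeans(vectors, k, iters=50, seed=0):
    """Plain k-means: returns (centers, label per input vector)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]  # random initial centers
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centers)].append(v)
        # Recompute each center as the mean of its group; keep empty groups fixed.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if updated == centers:
            break                                         # converged
        centers = updated
    return centers, [nearest(v, centers) for v in vectors]


# Hypothetical per-aggregate feature vectors.
features = [[0.1, 1.0], [0.2, 1.1], [3.0, 9.0], [3.1, 9.2]]
centers, labels = kmeans(features, k=2)
print(labels)
```

With well-separated data such as this, the first two aggregates land in one group and the last two in the other; which group receives label 0 depends on the random initialization.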
6. The method of claim 4 , further comprising:
determining a simple mail transfer protocol-traffic-characteristic-vector for each of the plurality of internet protocol-address-aggregates; and
grouping the plurality of internet protocol-address-aggregates based on the simple mail transfer protocol-traffic-characteristic-vectors.
7. The method of claim 6 , wherein the plurality of internet protocol-address-aggregate-categories are determined by:
deriving internet protocol-address-aggregate-category models for the plurality of internet protocol-address-aggregates, each of the internet protocol-address-aggregate-category models having a simple mail transfer protocol-traffic-characteristic-mean value-vector and a simple mail transfer protocol-traffic-characteristic-vector variance matrix; and
defining the plurality of internet protocol-address-aggregate-categories by selecting certain ones of the internet protocol-address-aggregate-categories based on the internet protocol-address-aggregate-category models derived.
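Claim 7 describes category models made of a mean value-vector and a variance matrix. The sketch below is a simplification under two labeled assumptions: it keeps only the diagonal of the variance matrix (one variance per characteristic), and it matches an aggregate to the category with the smallest variance-scaled squared distance. Neither the matching rule nor the names `fit_model`, `scaled_distance`, and `nearest_category` come from the claims.

```python
from statistics import fmean, pvariance


def fit_model(members):
    """Mean vector and per-feature variances for one group of feature vectors.

    ASSUMPTION: only the diagonal of the variance matrix is kept.
    """
    cols = list(zip(*members))
    mean = [fmean(c) for c in cols]
    var = [max(pvariance(c), 1e-9) for c in cols]  # floor avoids division by zero
    return mean, var


def scaled_distance(v, model):
    """Squared distance from v to the model mean, scaled by each variance."""
    mean, var = model
    return sum((x - m) ** 2 / s for x, m, s in zip(v, mean, var))


def nearest_category(v, models):
    """Name of the category model closest to feature vector v."""
    return min(models, key=lambda name: scaled_distance(v, models[name]))


# Hypothetical groups of per-aggregate feature vectors.
models = {
    "category_a": fit_model([[0.1, 1.0], [0.3, 2.0], [0.2, 1.5]]),
    "category_b": fit_model([[3.0, 9.0], [3.4, 11.0], [3.2, 10.0]]),
}
print(nearest_category([3.1, 9.5], models))  # category_b
```

A full variance matrix would also capture correlations between characteristics; the diagonal form is used here only to keep the example dependency-free.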
8. The method of claim 1 , wherein a bot-likelihood score of a particular internet protocol-address-aggregate-category is determined by a ratio of a number of bot-internet protocol addresses in the particular internet protocol-address-aggregate-category to a number of bot-internet protocol addresses in the other internet protocol-address-aggregate-categories, divided by a ratio of a number of non-bot-internet protocol addresses in the particular internet protocol-address-aggregate-category to a number of non-bot-internet protocol addresses in one or more other internet protocol-address-aggregate-categories,
wherein the number of bot-internet protocol addresses in the particular internet protocol-address-aggregate-category, the number of bot-internet protocol addresses in the one or more internet protocol-address-aggregate-categories, the number of non-bot-internet protocol addresses in the particular internet protocol-address-aggregate-category, and the number of non-bot-internet protocol addresses in the one or more internet protocol-address-aggregate-categories are determined based on observed data.
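The ratio recited in claim 8, followed by the threshold comparison of claim 3, can be written out as a short sketch. The counts, the threshold value, and the function name below are hypothetical; the claims state only that the counts are determined from observed data.

```python
def bot_likelihood_score(bots_in, bots_other, nonbots_in, nonbots_other):
    """(bots in category / bots elsewhere) / (non-bots in category / non-bots elsewhere)."""
    if bots_other == 0 or nonbots_in == 0 or nonbots_other == 0:
        raise ValueError("ratio undefined when a denominator count is zero")
    return (bots_in / bots_other) / (nonbots_in / nonbots_other)


# Hypothetical observed counts for one category versus the other categories.
score = bot_likelihood_score(bots_in=80, bots_other=20, nonbots_in=10, nonbots_other=40)
print(score)  # 16.0

THRESHOLD = 1.0  # hypothetical cut-off; the claims do not specify a value
print(score > THRESHOLD)  # True
```

In this example, known bots are four times as common inside the category as outside it, while non-bots are a quarter as common, so the score is 4 / 0.25 = 16.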
9. The method of claim 1 , wherein the internet protocol-address-aggregate comprises a range of internet protocol addresses for a particular class C of internet protocol-addresses.
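For the class C case of claim 9, one way to map a sender address to its aggregate is to take the enclosing /24 network. The sketch below uses Python's standard `ipaddress` module; treating "class C" as a /24 prefix and the function name `class_c_aggregate` are assumptions made for illustration.

```python
import ipaddress


def class_c_aggregate(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 network (a class C sized range) that contains ip."""
    # strict=False lets the host bits be set; they are masked off.
    return ipaddress.IPv4Network(f"{ip}/24", strict=False)


agg = class_c_aggregate("192.0.2.77")
print(agg)                 # 192.0.2.0/24
print(agg.num_addresses)   # 256
```

Every sender in the same /24 maps to the same aggregate key, so per-aggregate SMTP characteristics can be accumulated against that key.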
10. A system for determining whether an internet protocol address is part of a bot-network comprising:
a processor; and
a data storage device storing computer program instructions that, when executed by the processor, cause the processor to perform operations comprising:
determining an internet protocol-address-aggregate associated with the internet protocol-address;
associating the internet protocol-address-aggregate with an internet protocol-address-aggregate-category of a plurality of internet protocol-address-aggregate-categories based on simple mail transfer protocol traffic characteristics of the internet protocol-address-aggregate and a plurality of simple mail transfer protocol traffic characteristics of the internet protocol-address-aggregate-category, wherein one of the plurality of simple mail transfer protocol traffic characteristics is a simple mail transfer protocol entropy within a period of time, wherein the plurality of simple mail transfer protocol traffic characteristics of the internet protocol-address-aggregate-category further comprising:
a number of black-listed e-mail senders within the period of time;
a number of active unique e-mail senders adjusted for internet protocol-address-aggregate simple mail transfer protocol inactivity; and
a number of active unique e-mail senders adjusted for individual e-mail sender inactivity given a simple mail transfer protocol-active internet protocol-address-aggregate; and
determining whether the internet protocol-address is part of a bot-network based on a bot-likelihood score of the internet protocol-address-aggregate-category, wherein the bot-likelihood score indicates a probability that the internet protocol-address is part of the bot-network.
11. The system of claim 10 , the operations further comprising:
assigning the bot-likelihood score of the internet protocol-address-aggregate-category to the internet protocol-address-aggregate.
12. The system of claim 10 , wherein the plurality of internet protocol-address-aggregate-categories are determined by:
determining simple mail transfer protocol traffic characteristics of a plurality of internet protocol-address-aggregates based on observed network data;
grouping the plurality of internet protocol-address-aggregates; and
defining the plurality of internet protocol-address-aggregate-categories based on a selection of groups.
13. The system of claim 12 , wherein the plurality of internet protocol-address-aggregates is grouped based on k-means clustering.
14. The system of claim 12 , the operations further comprising:
determining a simple mail transfer protocol-traffic-characteristic-vector for each of the plurality of internet protocol-address-aggregates; and
grouping the plurality of internet protocol-address-aggregates based on the simple mail transfer protocol-traffic-characteristic-vectors.
15. The system of claim 14 , wherein the plurality of internet protocol-address-aggregate-categories are determined by:
deriving internet protocol-address-aggregate-category models for the plurality of internet protocol-address-aggregates, each of the internet protocol-address-aggregate-category models having a simple mail transfer protocol-traffic-characteristic-mean value-vector and a simple mail transfer protocol-traffic-characteristic-vector variance matrix; and
defining the plurality of internet protocol-address-aggregate-categories by selecting certain ones of the internet protocol-address-aggregate-categories based on the internet protocol-address-aggregate-category models derived.
16. The system of claim 10 , wherein a bot-likelihood score of a particular internet protocol-address-aggregate-category is determined by a ratio of a number of bot-internet protocol addresses in the particular internet protocol-address-aggregate-category to a number of bot-internet protocol addresses in the other internet protocol-address-aggregate-categories, divided by a ratio of a number of non-bot-internet protocol addresses in the particular internet protocol-address-aggregate-category to a number of non-bot-internet protocol addresses in one or more other internet protocol-address-aggregate-categories,
wherein the number of bot-internet protocol addresses in the particular internet protocol-address-aggregate-category, the number of bot-internet protocol addresses in the one or more internet protocol-address-aggregate-categories, the number of non-bot-internet protocol addresses in the particular internet protocol-address-aggregate-category, and the number of non-bot-internet protocol addresses in the one or more internet protocol-address-aggregate-categories are determined based on observed data.
17. The system of claim 10 , the operations further comprising:
comparing the bot-likelihood score to a threshold.
18. The system of claim 10 , wherein the internet protocol-address-aggregate comprises a range of internet protocol addresses for a particular class C of internet protocol-addresses.