US9055012B2 · Active · Utility · PatentIndex 82

Bot-network detection based on simple mail transfer protocol (SMTP) characteristics of e-mail senders within IP address aggregates

Assignee: AT & T IP I LP · Priority: Apr 8, 2010 · Filed: Apr 5, 2013 · Granted: Jun 9, 2015
Est. expiry: Apr 8, 2030 (~3.8 yrs left) · nominal 20-yr term from priority
Inventors: EHRLICH WILLA KAY; HOEFLIN DAVID A; LIU DANIELLE; SPIELMAN CHAIM; WOOD STEPHEN
CPC: H04L 63/1441; H04L 51/00; H04L 51/02
Cited by: 6 · References: 10 · Claims: 18

Abstract

A method and system for determining whether an IP address is part of a bot-network are provided. The IP-address-aggregate associated with the IP address of an e-mail sender is determined. The IP-address-aggregate is associated with an IP-address-aggregate-category based on the current SMTP traffic characteristics of the IP-address-aggregate and the known SMTP traffic characteristics of the IP-address-aggregate-category. The bot-likelihood score of that IP-address-aggregate-category is then associated with the IP-address-aggregate. IP-address-aggregate-categories can be established from historical SMTP traffic characteristics of IP-address-aggregates: the IP-address-aggregates are grouped by their SMTP characteristics, and the categories are defined by selecting groups of IP-address-aggregates whose SMTP traffic characteristics distinguish spam bots from spammers that are not part of a botnet. Bot-likelihood scores for the resulting IP-address-aggregate-categories are determined from historically known bot IP addresses.
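The online lookup described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the aggregate is taken to be the /24 block containing the sender (one reading of the "class C" range in claims 9 and 18), "SMTP entropy" is assumed to mean the Shannon entropy of message counts across the sender IPs in the aggregate, an aggregate is assigned to the category with the nearest mean feature vector, and the threshold of 1.0 is arbitrary. The names Category, aggregate_of, smtp_entropy, nearest_category and score_sender, and all numbers, are hypothetical.

```python
import ipaddress
import math
from dataclasses import dataclass


@dataclass
class Category:
    """Hypothetical IP-address-aggregate-category learned offline."""
    name: str
    mean: list[float]  # mean SMTP traffic-characteristic vector
    bot_score: float   # bot-likelihood score of the category


def aggregate_of(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 (class C-sized) block containing a sender IP."""
    return ipaddress.IPv4Network(f"{ip}/24", strict=False)


def smtp_entropy(msgs_per_sender: dict[str, int]) -> float:
    """Shannon entropy, in bits, of message counts across senders in one aggregate.

    Assumed definition: the claims name an "SMTP entropy" without defining it here.
    """
    total = sum(msgs_per_sender.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in msgs_per_sender.values() if n > 0)


def nearest_category(features: list[float], categories: list[Category]) -> Category:
    """Pick the category whose mean vector is closest in Euclidean distance."""
    return min(categories, key=lambda c: math.dist(features, c.mean))


def score_sender(ip, features_by_aggregate, categories, threshold=1.0):
    """Return (category name, bot-likelihood score, flagged) for one sender IP."""
    agg = aggregate_of(ip)
    category = nearest_category(features_by_aggregate[agg], categories)
    # Hypothetical threshold: flag when the score exceeds it (cf. claims 3 and 17).
    return category.name, category.bot_score, category.bot_score > threshold


# Toy data. Feature order: [SMTP entropy, blacklisted senders,
# active senders adjusted for aggregate inactivity,
# active senders adjusted for sender inactivity]. All values are made up.
categories = [
    Category("bot-like", [6.5, 40.0, 120.0, 90.0], bot_score=8.0),
    Category("mail-server-like", [1.0, 0.0, 2.0, 2.0], bot_score=0.1),
]
msgs = {"192.0.2.10": 5, "192.0.2.11": 5, "192.0.2.12": 5, "192.0.2.13": 5}
features = [smtp_entropy(msgs), 3.0, 4.0, 4.0]  # entropy is 2.0 bits here
features_by_aggregate = {aggregate_of("192.0.2.10"): features}
result = score_sender("192.0.2.77", features_by_aggregate, categories)
print(result)  # ('mail-server-like', 0.1, False)
```

Euclidean distance on raw features lets large counts swamp the small entropy value; a fuller version would likely standardise the features or use the per-category variance matrices of claim 7 (for example, a Mahalanobis distance) instead.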

Claims

exact text as granted — not AI-modified
We claim: 
     
       1. A computer-implemented method for determining whether an internet protocol address is part of a bot-network, the method comprising:
 determining an internet protocol-address-aggregate associated with the internet protocol-address; 
 associating the internet protocol-address-aggregate with an internet protocol-address-aggregate-category of a plurality of internet protocol-address-aggregate-categories based on simple mail transfer protocol traffic characteristics of the internet protocol-address-aggregate and a plurality of simple mail transfer protocol traffic characteristics of the internet protocol-address-aggregate-category, wherein one of the plurality of simple mail transfer protocol traffic characteristics is a simple mail transfer protocol entropy within a period of time, wherein the plurality of simple mail transfer protocol traffic characteristics of the internet protocol-address-aggregate-category further comprising: 
 a number of black-listed e-mail senders within the period of time; 
 a number of active unique e-mail senders adjusted for internet protocol-address-aggregate simple mail transfer protocol inactivity; and 
 a number of active unique e-mail senders adjusted for individual e-mail sender inactivity given a simple mail transfer protocol-active internet protocol-address-aggregate; and 
 determining whether the internet protocol-address is part of a bot-network based on a bot-likelihood score of the internet protocol-address-aggregate-category, wherein the bot-likelihood score indicates a probability that the internet protocol-address is part of the bot-network. 
 
     
     
       2. The method of claim 1, further comprising:
 assigning the bot-likelihood score of the internet protocol-address-aggregate-category to the internet protocol-address-aggregate. 
 
     
     
       3. The method of claim 2, further comprising:
 comparing the bot-likelihood score to a threshold. 
 
     
     
       4. The method of claim 1, wherein the plurality of internet protocol-address-aggregate-categories are determined by:
 determining simple mail transfer protocol traffic characteristics of a plurality of internet protocol-address-aggregates based on observed network data; 
 grouping the plurality of internet protocol-address-aggregates; and 
 defining the plurality of internet protocol-address-aggregate-categories based on a selection of groups. 
 
     
     
       5. The method of claim 4, wherein the plurality of internet protocol-address-aggregates is grouped based on k-means clustering. 
     
     
       6. The method of claim 4, further comprising:
 determining a simple mail transfer protocol-traffic-characteristic-vector for each of the plurality of internet protocol-address-aggregates; and 
 grouping the plurality of internet protocol-address-aggregates based on the simple mail transfer protocol-traffic-characteristic-vectors. 
 
     
     
       7. The method of claim 6, wherein the plurality of internet protocol-address-aggregate-categories are determined by:
 deriving internet protocol-address-aggregate-category models for the plurality of internet protocol-address-aggregates, each of the internet protocol-address-aggregate-category models having a simple mail transfer protocol-traffic-characteristic-mean value-vector and a simple mail transfer protocol-traffic-characteristic-vector variance matrix; and 
 defining the plurality of internet protocol-address-aggregate-categories by selecting certain ones of the internet protocol-address-aggregate-categories based on the internet protocol-address-aggregate-category models derived. 
 
     
     
       8. The method of claim 1, wherein a bot-likelihood score of a particular internet protocol-address-aggregate-category is determined by a ratio of a number of bot-internet protocol addresses in the particular internet protocol-address-aggregate-category to a number of bot-internet protocol addresses in the other internet protocol-address-aggregate-categories, divided by a ratio of a number of non-bot-internet protocol addresses in the particular internet protocol-address-aggregate-category to a number of non-bot-internet protocol addresses in one or more other internet protocol-address-aggregate-categories,
 wherein the number of bot-internet protocol addresses in the particular internet protocol-address-aggregate-category, the number of bot-internet protocol addresses in the one or more internet protocol-address-aggregate-categories, the number of non-bot-internet protocol addresses in the particular internet protocol-address-aggregate-category, and the number of non-bot-internet protocol addresses in the one or more internet protocol-address-aggregate-categories are determined based on observed data. 
 
     
     
       9. The method of claim 1, wherein the internet protocol-address-aggregate comprises a range of internet protocol addresses for a particular class C of internet protocol-addresses. 
     
     
       10. A system for determining whether an internet protocol address is part of a bot-network comprising:
 a processor; and 
 a data storage device storing computer program instructions that, when executed by the processor, cause the processor to perform operations comprising: 
 determining an internet protocol-address-aggregate associated with the internet protocol-address; 
 associating the internet protocol-address-aggregate with an internet protocol-address-aggregate-category of a plurality of internet protocol-address-aggregate-categories based on simple mail transfer protocol traffic characteristics of the internet protocol-address-aggregate and a plurality of simple mail transfer protocol traffic characteristics of the internet protocol-address-aggregate-category, wherein one of the plurality of simple mail transfer protocol traffic characteristics is a simple mail transfer protocol entropy within a period of time, wherein the plurality of simple mail transfer protocol traffic characteristics of the internet protocol-address-aggregate-category further comprising: 
 a number of black-listed e-mail senders within the period of time; 
 a number of active unique e-mail senders adjusted for internet protocol-address-aggregate simple mail transfer protocol inactivity; and 
 a number of active unique e-mail senders adjusted for individual e-mail sender inactivity given a simple mail transfer protocol-active internet protocol-address-aggregate; and 
 determining whether the internet protocol-address is part of a bot-network based on a bot-likelihood score of the internet protocol-address-aggregate-category, wherein the bot-likelihood score indicates a probability that the internet protocol-address is part of the bot-network. 
 
     
     
       11. The system of claim 10, the operations further comprising:
 assigning the bot-likelihood score of the internet protocol-address-aggregate-category to the internet protocol-address-aggregate. 
 
     
     
       12. The system of claim 10, wherein the plurality of internet protocol-address-aggregate-categories are determined by:
 determining simple mail transfer protocol traffic characteristics of a plurality of internet protocol-address-aggregates based on observed network data; 
 grouping the plurality of internet protocol-address-aggregates; and 
 defining the plurality of internet protocol-address-aggregate-categories based on a selection of groups. 
 
     
     
       13. The system of claim 12, wherein the plurality of internet protocol-address-aggregates is grouped based on k-means clustering. 
     
     
       14. The system of claim 12, the operations further comprising:
 determining a simple mail transfer protocol-traffic-characteristic-vector for each of the plurality of internet protocol-address-aggregates; and 
 grouping the plurality of internet protocol-address-aggregates based on the simple mail transfer protocol-traffic-characteristic-vectors. 
 
     
     
       15. The system of claim 14, wherein the plurality of internet protocol-address-aggregate-categories are determined by:
 deriving internet protocol-address-aggregate-category models for the plurality of internet protocol-address-aggregates, each of the internet protocol-address-aggregate-category models having a simple mail transfer protocol-traffic-characteristic-mean value-vector and a simple mail transfer protocol-traffic-characteristic-vector variance matrix; and 
 defining the plurality of internet protocol-address-aggregate-categories by selecting certain ones of the internet protocol-address-aggregate-categories based on the internet protocol-address-aggregate-category models derived. 
 
     
     
       16. The system of claim 10, wherein a bot-likelihood score of a particular internet protocol-address-aggregate-category is determined by a ratio of a number of bot-internet protocol addresses in the particular internet protocol-address-aggregate-category to a number of bot-internet protocol addresses in the other internet protocol-address-aggregate-categories, divided by a ratio of a number of non-bot-internet protocol addresses in the particular internet protocol-address-aggregate-category to a number of non-bot-internet protocol addresses in one or more other internet protocol-address-aggregate-categories,
 wherein the number of bot-internet protocol addresses in the particular internet protocol-address-aggregate-category, the number of bot-internet protocol addresses in the one or more internet protocol-address-aggregate-categories, the number of non-bot-internet protocol addresses in the particular internet protocol-address-aggregate-category, and the number of non-bot-internet protocol addresses in the one or more internet protocol-address-aggregate-categories are determined based on observed data. 
 
     
     
       17. The system of claim 10, the operations further comprising:
 comparing the bot-likelihood score to a threshold. 
 
     
     
       18. The system of claim 10, wherein the internet protocol-address-aggregate comprises a range of internet protocol addresses for a particular class C of internet protocol-addresses.
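The offline steps in claims 4 through 8 (mirrored by claims 12 through 16) can be sketched in plain Python. This is a toy illustration under stated assumptions, not AT&T's implementation: k-means uses a deterministic first-k initialisation, each category model is the cluster's mean vector and population covariance matrix (claim 7), every non-empty cluster is kept as a category, and the score follows the claim 8 ratio score(C) = (B_C / B_other) / (N_C / N_other), where B and N count known bot and non-bot IP addresses. The functions kmeans, mean_and_covariance, bot_likelihood and build_categories, and all data, are hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; initial centroids are the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each aggregate goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def mean_and_covariance(members):
    """Category model: mean vector and population variance-covariance matrix."""
    n, d = len(members), len(members[0])
    mu = [sum(col) / n for col in zip(*members)]
    cov = [[sum((m[a] - mu[a]) * (m[b] - mu[b]) for m in members) / n
            for b in range(d)] for a in range(d)]
    return mu, cov


def bot_likelihood(bots, nonbots, j):
    """Claim-8-style ratio (B_j / B_other) / (N_j / N_other).

    Raises ZeroDivisionError if B_other or N_j is zero; the claim does not
    specify any smoothing, so none is applied here.
    """
    b_other = sum(bots) - bots[j]
    n_other = sum(nonbots) - nonbots[j]
    return (bots[j] * n_other) / (b_other * nonbots[j])


def build_categories(points, bot_counts, nonbot_counts, k):
    """Cluster aggregates, model each cluster, and score it with labelled counts."""
    labels, _ = kmeans(points, k)
    models, bots, nonbots = {}, [0] * k, [0] * k
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:  # keeping all non-empty clusters stands in for "selecting" groups
            models[j] = mean_and_covariance(members)
    for lab, b, n in zip(labels, bot_counts, nonbot_counts):
        bots[lab] += b
        nonbots[lab] += n
    scores = {j: bot_likelihood(bots, nonbots, j) for j in models}
    return labels, models, scores


# Toy data: each row is [SMTP entropy, blacklisted senders] for one /24 aggregate;
# bot/non-bot counts stand in for historically labelled IPs. All values are made up.
points = [[6.0, 30.0], [6.2, 34.0], [5.8, 32.0], [1.0, 0.0], [1.2, 1.0], [0.8, 0.0]]
bot_counts = [20, 25, 15, 1, 0, 1]
nonbot_counts = [5, 4, 6, 100, 120, 90]
labels, models, scores = build_categories(points, bot_counts, nonbot_counts, k=2)
print(labels)     # [1, 1, 1, 0, 0, 0]
print(scores[1])  # 620.0
```

Cluster labels depend on the initial centroids, so a real pipeline would fix a seed or use a better initialisation. The claims also leave open how groups are "selected" as categories, so keeping every non-empty cluster is only a placeholder.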
