US7536366B2ExpiredUtilityPatentIndex 62
Contingency table estimation via sketches
Est. expirySep 15, 2025(expired)· nominal 20-yr term from priority
G06F 16/328Y10S707/99935G06F 16/93G06F 2216/03G06F 16/38Y10S707/99936
62
PatentIndex Score
4
Cited by
4
References
10
Claims
Abstract
Systems and methods that enhance estimate(s) of features (e.g., word associations), via employing a sampling component (e.g., sketches) that facilitates computations of sample contingency tables, and designates occurrences (or absence) of features in data (e.g., words in document lists). The sampling component can further include a contingency table generator and an estimation that employs a likelihood argument (e.g., partial likelihood, maximum likelihood, and the like) to estimate features/word pair(s) associations in the contingency tables.
Claims
exact text as granted — not AI-modified1. A computer implemented system comprising the following computer executable components:
a memory;
a processor that executes the following software components;
a sampling component that employs sketches to facilitate generation of contingency tables and forms a sample of data; and
an estimation component that employs a probabilistic argument to locate a most likely contingency table for the sample;
wherein the contingency tables comprise a sample intersection cell (a s ) that is equal to an intersection cell (a) multiplied by a sampling rate (s);
wherein the processor finds and outputs to a computer display, estimates on word pair associations in searching document lists on a computer; and
wherein the sketches employed to facilitate generation of the contingency tables comprise sketches K 1 , K 2 , and
D
s
=
min
{
K
1
(
k
1
)
,
K
2
(
k
2
)
}
,
k
1
′
=
k
1
-
{
j
:
K
1
(
j
)
>
D
s
}
,
k
2
′
=
k
2
-
{
j
:
K
2
(
j
)
>
D
s
}
,
a
s
=
K
1
⋂
K
2
,
b
s
=
k
1
′
-
a
s
,
c
s
=
k
2
′
-
a
s
,
d
s
=
D
s
-
a
s
-
b
s
-
c
s
.
wherein the most likely contingency table is determined via defining a maximum likelihood estimation as
P
(
a
s
,
b
s
,
c
s
,
d
s
|
D
s
;
a
,
r
)
=
(
a
a
s
)
(
b
b
s
)
(
c
c
s
)
(
d
d
s
)
(
a
+
b
+
c
+
d
a
s
+
b
s
+
c
s
+
d
s
)
=
(
a
a
s
)
(
f
1
-
a
b
s
)
(
f
2
-
a
c
s
)
(
D
-
f
1
-
f
2
+
a
d
s
)
(
D
D
s
)
.
and a closed form solution is presented in the form of
a
^
MLE
,
a
=
f
1
(
2
a
s
+
c
s
)
+
f
2
(
2
a
s
+
b
s
)
-
(
f
1
(
2
a
s
+
c
s
)
-
f
2
(
2
a
s
+
b
s
)
)
2
+
4
f
1
f
2
b
s
c
s
2
(
2
a
s
+
b
s
+
c
s
)
.
2. The computer implemented system of claim 1 , the estimation component further comprising a maximum likelihood estimation.
3. The computer implemented system of claim 2 , further comprising a stopping rule that is based on the likelihood estimation.
4. The computer implemented system of claim 1 , further comprising an artificial intelligence component that facilitates estimation of word associations, via utilizing an automatic classification system that learns explicitly or implicitly when the contingency tables should be generated, wherein the classification system employs a probabilistic and/or statistical-based analysis to infer an action that a user desires to be automatically performed.
5. A computer implemented method comprising the following computer executable acts:
sampling a collection of data to form a sample space;
computing a contingency table within the sample space for features that require association estimation;
employing sketches to facilitate generation of the contingency table and to form a sample of data;
employing a probabilistic argument on the contingency table to determine the most likely contingency table;
employing a linear relation between a sample intersection cell and a sampling rate, wherein the contingency table comprises a sample intersection cell (a s ) that is equal to an intersection cell (a) multiplied by a sampling rate (s);
finding and outputting estimates on word pair associations in searching document lists on a computer;
employing
D
s
=
min
{
K
1
(
k
1
)
,
K
2
(
k
2
)
}
,
k
1
′
=
k
1
-
{
j
:
K
1
(
j
)
>
D
s
}
,
k
2
′
=
k
2
-
{
j
:
K
2
(
j
)
>
D
s
}
,
a
s
=
K
1
⋂
K
2
,
b
s
=
k
1
′
-
a
s
,
c
s
=
k
2
′
-
a
s
,
d
s
=
D
s
-
a
s
-
b
s
-
c
s
for sketches K 1 , K 2 ;
determining the most likely contingency table via defining a maximum likelihood estimation as
P
(
a
s
,
b
s
,
c
s
,
d
s
|
D
s
;
a
,
r
)
=
(
D
s
a
s
,
b
s
,
c
s
,
d
s
)
(
a
D
)
a
s
(
b
D
)
b
s
(
c
D
)
c
s
(
d
D
)
d
s
∝
a
a
s
(
f
1
-
a
)
b
s
-
(
f
2
-
a
)
c
s
(
D
-
f
1
-
f
2
+
a
)
d
s
;
and presenting a closed form solution in form of
a
^
MLE
,
a
=
f
1
(
2
a
s
+
c
s
)
+
f
2
(
2
a
s
+
b
s
)
-
(
f
1
(
2
a
s
+
c
s
)
-
f
2
(
2
a
s
+
b
s
)
)
2
+
4
f
1
f
2
b
s
c
s
2
(
2
a
s
+
b
s
+
c
s
)
.
6. The method of claim 5 further comprising employing a maximum likelihood estimation.
7. The method of claim 5 further comprising employing variances based on document frequencies.
8. The method of claim 5 further comprising representing a partial likelihood as
P
(
a
s
,
b
s
,
c
s
,
d
s
|
D
s
;
a
)
=
(
a
a
s
)
(
b
b
s
)
(
c
c
s
)
(
d
d
s
)
(
a
+
b
+
c
+
d
a
s
+
b
s
+
c
s
+
d
s
)
=
(
a
a
s
)
(
f
1
-
a
b
s
)
(
f
2
-
a
c
s
)
(
D
-
f
1
-
f
2
+
a
d
s
)
(
D
D
s
)
.
9. The method of claim 8 further comprising presenting a solution as:
g
(
a
)
=
a
a
-
a
s
f
1
-
a
+
1
-
b
s
f
1
-
a
+
1
f
2
-
a
+
1
-
c
s
f
2
-
a
+
1
D
-
f
1
-
f
2
+
a
D
-
f
1
-
f
2
+
a
-
d
s
=
1.
10. A computer implemented system comprising the following computer executable components:
means for sampling a collection of documents to form a sample space;
means for computing a contingency table within the sample space;
means for employing sketches to facilitate generation of the contingency table and to form a sample of data;
means for employing a probabilistic argument on the contingency table to determine the most likely contingency table;
means for employing a linear relation between a sample intersection cell and a sampling rate, wherein the contingency table comprises a sample intersection cell (a s ) that is equal to an intersection cell (a) multiplied by a sampling rate (s);
means for finding and outputting estimates on word pair associations in using the sample intersection cell and the sampling rate;
means for employing
D
s
=
min
{
K
1
(
k
1
)
,
K
2
(
k
2
)
}
,
k
1
′
=
k
1
-
{
j
:
K
1
(
j
)
>
D
s
}
,
k
2
′
=
k
2
-
{
j
:
K
2
(
j
)
>
D
s
}
,
a
s
=
K
1
⋂
K
2
,
b
s
=
k
1
′
-
a
s
,
c
s
=
k
2
′
-
a
s
,
d
s
=
D
s
-
a
s
-
b
s
-
c
s
for sketches K1, K2;
means for determining the most likely contingency table via defining a maximum likelihood estimation as
P
(
a
s
,
b
s
,
c
s
,
d
s
|
D
s
;
a
,
r
)
=
(
D
s
a
s
,
b
s
,
c
s
,
d
s
)
(
a
D
)
a
s
(
b
D
)
b
s
(
c
D
)
c
s
(
d
D
)
d
s
∝
a
a
s
(
f
1
-
a
)
b
s
-
(
f
2
-
a
)
c
s
(
D
-
f
1
-
f
2
+
a
)
d
s
;
and means for presenting a closed form solution in form of
a
^
MLE
,
a
=
f
1
(
2
a
s
+
c
s
)
+
f
2
(
2
a
s
+
b
s
)
-
(
f
1
(
2
a
s
+
c
s
)
-
f
2
(
2
a
s
+
b
s
)
)
2
+
4
f
1
f
2
b
s
c
s
2
(
2
a
s
+
b
s
+
c
s
)
.Cited by (0)
No later patents cite this yet.
References (0)
No backward citations on record.